ContentSproute

Complete silence is always hallucinated as “ترجمة نانسي قنقر” in Arabic



VAD, probably.
I’ve only tried the turbo one, but what I can say is that v3 is different from the earlier models.
It looks like it doesn’t have the audio descriptions to fall back on and produces hallucinations instead.
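For anyone wondering what a VAD gate in front of the model could look like, here is a rough sketch, not what any particular tool does. It is a plain energy threshold rather than a real voice activity detector (a trained VAD would be more robust), and the function name, frame size, and threshold are all assumptions picked for illustration. The idea is simply to never hand fully silent audio to the model.

```python
import math


def has_speech(samples, sample_rate=16000, frame_ms=30, rms_threshold=0.01):
    """Return True if any frame's RMS energy exceeds rms_threshold.

    samples: mono audio as a sequence of floats in [-1.0, 1.0].
    The threshold is a guess and would need tuning for real recordings.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms > rms_threshold:
            return True
    return False


# One second of digital silence and one second of a 440 Hz tone.
silence = [0.0] * 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]

print(has_speech(silence), has_speech(tone))
```

Audio for which `has_speech` returns False would be skipped instead of transcribed. Note that an energy gate also lets through music and applause, so it only covers the pure-silence case.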

The earlier models will also produce some miscellaneous crap when they encounter silence
(they do this regardless of language), but there are more options for how to deal with that.

For example, these things can be effective for the small model (but not for v3):

  • the suppress_tokens trick
  • setting initial prompt to something like “.”
  • adjusting logprob_threshold to -0.4 (works for this empty audio, probably not good for general use)
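As a rough sketch of how those three settings might be passed to the openai-whisper Python package: the helper names are made up, and since the list above doesn't spell out what the suppress_tokens trick is, this assumes it means suppressing extra token IDs on top of the default non-speech set. Which IDs to pass is left to the caller.

```python
def small_model_silence_options(extra_suppress_ids=()):
    """Build keyword arguments for whisper's transcribe() (hypothetical helper).

    extra_suppress_ids: token IDs to suppress in addition to the defaults.
    Which IDs to use is an assumption left to the caller.
    """
    return {
        # A minimal prompt, as suggested above.
        "initial_prompt": ".",
        # Stricter than the default of -1.0; tuned for empty audio only.
        "logprob_threshold": -0.4,
        # -1 keeps whisper's default non-speech suppression; extras are appended.
        "suppress_tokens": [-1, *extra_suppress_ids],
    }


def transcribe_small(path, extra_suppress_ids=()):
    """Transcribe a file with the small model using the options above."""
    # Imported lazily so the options helper can be used without whisper installed.
    import whisper

    model = whisper.load_model("small")
    return model.transcribe(path, **small_model_silence_options(extra_suppress_ids))
```

Something like `transcribe_small("silence.wav")` would then run it, where `silence.wav` is a placeholder file name. As noted above, none of this is expected to help with v3.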




Is there any good Arabic model you guys have found that is better than large-v3?
@misutoneko @puthre




I found a similar thing happens in German where it says
“Untertitelung des ZDF für funk, 2017.”

For both German and Arabic I found that this pretty much only happens at the very end of videos / when there is sustained silence.




Essentially this seems to be an artifact of the fact that Whisper was trained on (amongst other things) YouTube audio plus the available subtitles. Subtitlers often add their copyright notice to the end of the subtitles, and the ends of videos are often credits with music, applause, or silence. Thus Whisper learned that silence == “copyright notice”.

See some research for the Norwegian example here:

https://medium.com/@lehandreassen/who-is-nicolai-winther-985409568201
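If the hallucination really is a memorized credit line, one blunt workaround is to filter it out after transcription. Below is a minimal sketch under that assumption. The blocklist only contains the two phrases reported in this thread, the function names are made up, and the segments are assumed to be dicts with a "text" key, as in the output of openai-whisper's transcribe().

```python
import unicodedata

# Credit lines reported in this thread; extend per language as needed.
KNOWN_CREDIT_LINES = {
    "ترجمة نانسي قنقر",
    "Untertitelung des ZDF für funk, 2017.",
}


def _normalize(text):
    """Casefold, collapse whitespace, and trim surrounding spaces and periods."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(text.split()).strip(" .")


BLOCKLIST = {_normalize(line) for line in KNOWN_CREDIT_LINES}


def drop_credit_hallucinations(segments):
    """Return the segments whose text is not a known credit line."""
    return [seg for seg in segments if _normalize(seg["text"]) not in BLOCKLIST]


segments = [
    {"text": " Hello there."},
    {"text": " Untertitelung des ZDF für funk, 2017."},
    {"text": "ترجمة نانسي قنقر"},
]
print(drop_credit_hallucinations(segments))
```

The obvious downside is that this also drops the phrase when someone genuinely says it, so it is a stopgap rather than a fix.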




In English there is always applause




This also happens when you don’t speak in voice mode: the transcript usually comes out as the same Arabic phrase.




I’ve also seen this happen a lot in English with Skyeye:

[image]

It also happens a lot with hallucinations saying stuff like “This is the end of the video, remember to like and subscribe.”

[image]




I have built https://arabicworksheet.com for learning Arabic, from absolute beginners to professional speakers. It creates dynamic exercises and worksheets based on your level and topics. Behind the scenes I have used Gemini 2.5-pro & GPT-4o for the overall agentic workflows.


@nyxiereal






Ok? This doesn’t have anything to do with the topic of this discussion






In German it’s “Vielen Dank” (Thank you very much).
