VAD (voice activity detection), probably.
I’ve only tried the turbo variant, but what I can say is that v3 behaves differently from the earlier models.
It looks like it doesn’t have the audio descriptions to fall back on and produces hallucinations instead.
The earlier models will also produce some miscellaneous crap when they encounter silence
(they do this regardless of language), but there are more options for how to deal with that.
For example, these things can be effective for the small model (but not for v3):
- the suppress_tokens trick
- setting initial_prompt to something like “.”
- adjusting logprob_threshold to -0.4 (works for this empty audio, probably not good for general use)
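For reference, a minimal sketch of how those three settings might be passed to the openai-whisper Python package. The helper names and the audio path are hypothetical. The post doesn't say which token ids to suppress, so they are left as a caller-supplied argument. As far as I know, keeping -1 in the list preserves Whisper's default suppression of non-speech symbols.

```python
from typing import Any, Dict, Iterable, Optional


def build_silence_options(extra_suppress: Optional[Iterable[int]] = None) -> Dict[str, Any]:
    """Collect the transcribe() keyword arguments listed above (hypothetical helper)."""
    options: Dict[str, Any] = {
        "initial_prompt": ".",       # the “.” prompt from the list above
        "logprob_threshold": -0.4,   # worked for empty audio, likely too strict in general
    }
    if extra_suppress is not None:
        # -1 stands for Whisper's default set of non-speech symbols;
        # the extra ids are whatever tokens you want to ban on top of that.
        options["suppress_tokens"] = [-1, *extra_suppress]
    return options


def transcribe_small(path: str, extra_suppress: Optional[Iterable[int]] = None) -> str:
    """Run the small model with the options above (needs openai-whisper installed)."""
    import whisper  # imported lazily so build_silence_options works without the package

    model = whisper.load_model("small")
    result = model.transcribe(path, **build_silence_options(extra_suppress))
    return result["text"]
```

Calling `transcribe_small("silence.wav")` (the path is just a placeholder) runs the small model with the prompt and threshold applied. As noted above, none of this seems to help with v3.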
0 replies
Has anyone found a good Arabic model that is better than large-v3?
@misutoneko @puthre
1 reply
I found a similar thing happens in German, where it says:
“Untertitelung des ZDF für funk, 2017.” (Subtitling by ZDF for funk, 2017.)
For both German and Arabic I found that this pretty much only happens at the very end of videos / when there is sustained silence.
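Since it mostly shows up on sustained silence at the end, one option is to cut the trailing silence before the audio ever reaches Whisper. Below is a rough sketch using a plain energy gate, which is a crude stand-in for a real VAD. It assumes the audio is already loaded as float samples in [-1, 1]; the frame length and threshold are guesses that would need tuning, and the function names are made up for illustration.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def trim_trailing_silence(samples: Sequence[float],
                          frame_len: int = 480,
                          threshold: float = 0.01) -> Sequence[float]:
    """Drop trailing frames whose RMS falls below the threshold."""
    energies = frame_rms(samples, frame_len)
    keep = len(energies)
    # Walk backwards until a frame that is loud enough to count as speech.
    while keep > 0 and energies[keep - 1] < threshold:
        keep -= 1
    return samples[:keep * frame_len]
```

An energy gate like this will not help when the ending is music or applause rather than true silence; a trained VAD should cope better there.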
0 replies
Essentially this seems to be an artifact of the fact that Whisper was trained on (among other things) YouTube audio plus the available subtitles. Subtitlers often add their copyright notice to the end of the subtitles, and the end of a video is often credits with music, applause, or silence. Thus Whisper learned that silence == “copyright notice”.
See some research for the Norwegian example here:
https://medium.com/@lehandreassen/who-is-nicolai-winther-985409568201
0 replies
In English there is always applause
0 replies
This also happens when you don’t speak in voice mode: the transcript usually comes out as the same Arabic phrase.
0 replies
I’ve also seen this happen a lot in English with Skyeye.
It also frequently hallucinates things like “This is the end of the video, remember to like and subscribe”.
0 replies
I have built https://arabicworksheet.com for learning Arabic, from absolute beginners to professional speakers. It creates dynamic exercises and worksheets based on your level and topics. Behind the scenes I have used Gemini 2.5-pro & GPT-4o for the overall agentic workflows.
1 reply
In German it’s “Vielen Dank” (Thank you very much)
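If the phrases are this predictable, a blunt workaround is to drop segments that consist of nothing but a known hallucination. A small sketch, assuming segments shaped like the dicts in the `segments` list that openai-whisper returns (each with a `text` key). The phrase list is just what has been reported in this thread and is certainly not complete, and the function names are hypothetical.

```python
from typing import Dict, List

# Phrases reported in this thread; extend for your own language and model.
KNOWN_HALLUCINATIONS = [
    "Untertitelung des ZDF für funk, 2017.",
    "Vielen Dank",
    "This is the end of the video, remember to like and subscribe",
]


def normalize(text: str) -> str:
    """Lowercase and strip surrounding whitespace and trailing punctuation."""
    return text.strip().rstrip(".!?").casefold()


_BLOCKLIST = {normalize(p) for p in KNOWN_HALLUCINATIONS}


def drop_known_hallucinations(segments: List[Dict]) -> List[Dict]:
    """Keep only segments whose whole text is not a known hallucination."""
    return [seg for seg in segments if normalize(seg["text"]) not in _BLOCKLIST]
```

This only matches whole segments, so a sentence that merely contains “Vielen Dank” is kept. The catch is that a speaker who really does say only “Vielen Dank” gets dropped too, so it may be worth applying the filter only to the last segment or two.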
0 replies