
Only translating last 30s or so of the audio file. #172

Open
SergioEstevao opened this issue Jun 21, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@SergioEstevao

When using the WhisperKit CLI or apps with a large file (50 minutes of audio), the final report (.srt file) appears to contain only the last ~30 s of transcribed content.

Is this expected, or am I missing a command-line argument?

@ZachNagengast
Contributor

What kind of audio is it? Also, could you provide the command you are using to call the CLI? This may be the result of log prob errors causing entire windows to be treated as silent, which can happen if the audio is particularly noisy. Can you try adjusting the log prob threshold and see whether the results improve?
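For context, here is a minimal Swift sketch of the kind of silence check being described. It is modeled on the rule in OpenAI's reference Whisper `transcribe()`: a window is skipped when its no-speech probability exceeds the no-speech threshold, unless its average log probability is above the log-prob threshold. The `WindowResult` type and `shouldSkipAsSilence` function are hypothetical illustrations, and WhisperKit's actual logic may differ; the defaults 0.6 and -1.0 are those of the reference implementation.

```swift
// Hypothetical per-window decoding statistics (illustrative names,
// not WhisperKit's actual types).
struct WindowResult {
    let avgLogProb: Float    // mean log probability of the decoded tokens
    let noSpeechProb: Float  // probability assigned to the no-speech token
}

/// Returns true when a window would be dropped as silence.
/// Assumed rule (after OpenAI's reference implementation): skip if the
/// no-speech probability is above its threshold, unless the average log
/// probability is high enough to trust the decoded text anyway.
func shouldSkipAsSilence(_ r: WindowResult,
                         noSpeechThreshold: Float? = 0.6,
                         logProbThreshold: Float? = -1.0) -> Bool {
    // No no-speech threshold means the silence check is disabled.
    guard let noSpeechThreshold = noSpeechThreshold else { return false }
    var skip = r.noSpeechProb > noSpeechThreshold
    // A confident decode overrides the no-speech signal.
    if let logProbThreshold = logProbThreshold, r.avgLogProb > logProbThreshold {
        skip = false
    }
    return skip
}

let noisy = WindowResult(avgLogProb: -1.4, noSpeechProb: 0.7)
print(shouldSkipAsSilence(noisy))                          // true
print(shouldSkipAsSilence(noisy, logProbThreshold: -2.0))  // false
```

Under this rule, lowering the log-prob threshold (making it more negative) makes it less likely that a noisy window with low-confidence tokens is discarded as silence.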

@ZachNagengast ZachNagengast added the needs info Further information is requested label Jun 24, 2024
@SergioEstevao
Author

I was trying to transcribe an episode of the Cautionary Tales podcast. The audio is clear for most of the episode.

I was using the CLI with this command:
swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-tiny" --audio-path ../transcripts/audio.mp3 --report

You can get the audio file from here:

https://chtbl.com/track/39E17/podtrac.com/pts/redirect.mp3/pdrl.fm/18db03/traffic.omny.fm/d/clips/e73c998e-6e60-432f-8610-ae210140c5b1/c0ae8c6e-22f0-4e9b-ac1c-ae390037ac53/a4efe84f-d748-4730-98f5-b1770137cb8e/audio.mp3

@ZachNagengast ZachNagengast added bug Something isn't working and removed needs info Further information is requested labels Jun 25, 2024
@SergioEstevao
Author

@ZachNagengast After some more testing, I believe the bug is in the conversion process when the source file is not mono (1 channel) at 16 kHz.

This line here

While reading new data into the input buffer chunk by chunk, we always write to the same position (0) of the outputBuffer, so in the end the outputBuffer contains only the data from the last chunk read from the input file.
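The reported behavior can be illustrated with a simplified Swift sketch. It assumes plain `[Float]` arrays in place of the audio buffers the real converter uses and skips resampling and downmixing entirely, so it isolates only the write-offset problem; `convertInChunks` is a hypothetical name, not WhisperKit's code.

```swift
/// Copies `input` into an output buffer `chunkSize` frames at a time.
/// `buggy == true` reproduces the reported issue: every chunk is written
/// at offset 0, so each chunk overwrites the previous one.
func convertInChunks(_ input: [Float], chunkSize: Int, buggy: Bool) -> [Float] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var output = [Float](repeating: 0, count: input.count)
    var writeOffset = 0  // next free frame in `output` (fixed version)
    var validFrames = 0  // analogous to the output buffer's frame length
    var readOffset = 0   // frames already consumed from `input`
    while readOffset < input.count {
        let end = min(readOffset + chunkSize, input.count)
        let chunk = input[readOffset..<end]
        // Bug: always writing at 0 overwrites the previous chunk.
        // Fix: write at the running offset so chunks are appended.
        let start = buggy ? 0 : writeOffset
        output.replaceSubrange(start..<(start + chunk.count), with: chunk)
        writeOffset += chunk.count
        validFrames = start + chunk.count
        readOffset = end
    }
    return Array(output.prefix(validFrames))
}

let samples: [Float] = (0..<10).map { Float($0) }
print(convertInChunks(samples, chunkSize: 4, buggy: true))
// [8.0, 9.0]
print(convertInChunks(samples, chunkSize: 4, buggy: false))
// [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```

With the offset fixed, a 50-minute file converted to 16 kHz mono should yield 50 × 60 × 16,000 = 48,000,000 frames; with the bug, only the last chunk's frames survive.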

@ZachNagengast
Contributor

Hi @SergioEstevao, I'm having trouble reproducing this. Can you share the hardware and OS you're using when this error occurs?

This is the file I get when running your exact command on that file:
last30bug.srt.zip

Projects
Status: TODO
Development

No branches or pull requests

2 participants