BenAAndrew/speech-transcriber

Speech Transcriber

A web app and library for transcribing speech

Installation

  1. Install Python 3.9
  2. Install ffmpeg
    • Windows: download the zip and add ffmpeg/bin to the PATH environment variable
    • Linux (Debian/Ubuntu): apt-get install ffmpeg
  3. pip install -r requirements.txt
  4. (Optional) Download the punctuator model and save it as INTERSPEECH-T-BRNN.pcl
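The installation steps above can be checked from Python before running anything. This is a hypothetical pre-flight helper, not part of the repository; it assumes the optional punctuator file is expected in the current working directory, which this README does not state.

```python
import os
import shutil
import sys

# Hypothetical pre-flight check for the installation steps; not part of the repo.
TARGET_PYTHON = (3, 9)
PUNCTUATOR_FILE = "INTERSPEECH-T-BRNN.pcl"  # assumed to live in the working directory


def python_version_ok(target=TARGET_PYTHON, actual=None):
    """True if the interpreter's major.minor version matches the target."""
    actual = sys.version_info if actual is None else actual
    return tuple(actual[:2]) == tuple(target)


def ffmpeg_available():
    """True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def preflight():
    """Collect human-readable warnings; an empty list means no problems were found."""
    warnings = []
    if not python_version_ok():
        found = ".".join(str(part) for part in sys.version_info[:2])
        warnings.append(f"Python {TARGET_PYTHON[0]}.{TARGET_PYTHON[1]} expected, found {found}")
    if not ffmpeg_available():
        warnings.append("ffmpeg not found on PATH")
    if not os.path.exists(PUNCTUATOR_FILE):
        warnings.append(f"optional punctuator model {PUNCTUATOR_FILE} not found")
    return warnings


if __name__ == "__main__":
    for message in preflight():
        print("warning:", message)
```

The checks only warn rather than exit, since the punctuator model is optional and a newer Python may still work for some transcribers.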

Usage

Web app

Run pip install flask before running the web app.

Then run python app.py and open the web app at http://localhost:5000/
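A script that drives the web app may need to wait until the server answers before using it. A minimal standard-library sketch; the helper names are hypothetical, and the only assumption is that the app serves the URL given above.

```python
import http.client
import time
import urllib.request

APP_URL = "http://localhost:5000/"  # address from this README


def web_app_is_up(url=APP_URL, timeout=2.0):
    """True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts and HTTP error statuses all count as "not up".
        return False


def wait_for_web_app(url=APP_URL, attempts=10, delay=1.0):
    """Poll the URL up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if web_app_is_up(url):
            return True
        time.sleep(delay)
    return False
```

For example, after starting python app.py in another terminal, wait_for_web_app() returns True once the page loads.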

CLI

python main.py --path filename --transcriber transcriber

  • --path: Path to the audio/video file to transcribe
  • --transcriber: Transcription model to use; choose from:
    • cmu_sphinx
    • librispeech
    • silero
    • vosk
    • wav2vec2
    • wav2vec2_commonvoice
    • whisper
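The CLI can also be called from another Python script. This sketch validates the transcriber name against the list above before running main.py through subprocess. It assumes it is run from the repository root, and it simply returns whatever main.py prints to stdout, since this README does not say how the transcript is written out. The function names are hypothetical.

```python
import subprocess
import sys

# Values accepted by --transcriber, as listed in this README.
TRANSCRIBERS = (
    "cmu_sphinx",
    "librispeech",
    "silero",
    "vosk",
    "wav2vec2",
    "wav2vec2_commonvoice",
    "whisper",
)


def build_command(path, transcriber):
    """Build the argument list for `python main.py --path ... --transcriber ...`."""
    if transcriber not in TRANSCRIBERS:
        raise ValueError(
            f"unknown transcriber {transcriber!r}; choose from {', '.join(TRANSCRIBERS)}"
        )
    # sys.executable reuses the current interpreter (e.g. the Python 3.9 environment).
    return [sys.executable, "main.py", "--path", path, "--transcriber", transcriber]


def run_transcriber(path, transcriber):
    """Run main.py (assumed to be in the working directory) and return its stdout."""
    completed = subprocess.run(
        build_command(path, transcriber),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if main.py exits with an error
    )
    return completed.stdout
```

Usage, with a hypothetical file name: print(run_transcriber("interview.mp3", "vosk")). Validating the name first gives a clear error before any model is loaded.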

Transcription models

When selecting transcription models, the following requirements were used:

  1. Must be supported in Python 3.9
  2. Must work locally (without calling an external API)
  3. Must have a straightforward installation process
    • Should not require building from source
    • Should not require additional OS libraries
    • Should not require manually downloading additional files

Below is a comparison of transcription model performance, produced using the LibriSpeech test-clean dataset and the analysis script:

Name                 | Dependencies                                         | Model size | Average processing time | Score
-------------------- | ---------------------------------------------------- | ---------- | ----------------------- | -----
Wav2Vec2 CommonVoice | speechbrain                                          | 1.18GB     | 3.351s                  | 0.87
Librispeech          | torch, transformers, torchaudio, librosa             | 113MB      | 0.558s                  | 0.85
Wav2Vec2             | torch, transformers, torchaudio, librosa             | 360MB      | 1.325s                  | 0.8
Whisper              | whisper                                              | 138MB      | 3.848s                  | 0.77
Vosk                 | vosk                                                 | 67.7MB     | 1.206s                  | 0.76
Silero               | torch, transformers, torchaudio, librosa, omegaconf  | 111MB      | 0.261s                  | 0.68
CMU Sphinx           | SpeechRecognition, pocketsphinx                      | 33.9MB*    | 1.123s                  | 0.55

*Size of the pocketsphinx package
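To pick a model for a given accuracy or size budget, the table's figures can be encoded and filtered. This is only a sketch under stated assumptions: the mapping from table names to --transcriber values is inferred from the names, 1GB is taken as 1000MB, the CMU Sphinx size is the package size noted above, and a higher Score is assumed to be better, as the table's ordering suggests. Processing times will vary by machine.

```python
from typing import List, NamedTuple, Optional


class ModelResult(NamedTuple):
    name: str          # name as shown in the comparison table
    cli_name: str      # assumed matching --transcriber value
    size_mb: float     # model size in MB (1GB taken as 1000MB)
    avg_time_s: float  # average processing time in seconds
    score: float       # benchmark score; assumed higher is better


# Figures copied from the comparison table above.
RESULTS: List[ModelResult] = [
    ModelResult("Wav2Vec2 CommonVoice", "wav2vec2_commonvoice", 1180.0, 3.351, 0.87),
    ModelResult("Librispeech", "librispeech", 113.0, 0.558, 0.85),
    ModelResult("Wav2Vec2", "wav2vec2", 360.0, 1.325, 0.80),
    ModelResult("Whisper", "whisper", 138.0, 3.848, 0.77),
    ModelResult("Vosk", "vosk", 67.7, 1.206, 0.76),
    ModelResult("Silero", "silero", 111.0, 0.261, 0.68),
    ModelResult("CMU Sphinx", "cmu_sphinx", 33.9, 1.123, 0.55),  # package size
]


def fastest_model(min_score: float = 0.0,
                  max_size_mb: Optional[float] = None) -> Optional[ModelResult]:
    """Return the fastest model meeting the score and size limits, or None."""
    candidates = [
        r for r in RESULTS
        if r.score >= min_score and (max_size_mb is None or r.size_mb <= max_size_mb)
    ]
    return min(candidates, key=lambda r: r.avg_time_s, default=None)
```

For example, fastest_model(0.8) returns the Librispeech entry (0.558s at a score of 0.85), while fastest_model(0.75, max_size_mb=100) returns Vosk.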
