
NOTE

This is a fork of NVIDIA's tacotron2 repository, modified to run on NVIDIA K80 GPUs instead of the V100 GPUs used upstream.

Tacotron 2 (using Griffin Lim, not Wavenet)

PyTorch implementation of Tacotron 2, as described in Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.

This implementation includes distributed and fp16 support and uses the LJSpeech dataset.

Distributed and FP16 support relies on work by Christian Sarofeen and NVIDIA's Apex Library.

Results from TensorBoard while training: Alignment, Predicted Mel Spectrogram, Target Mel Spectrogram

Pre-requisites

  1. NVIDIA GPU + CUDA + cuDNN

Setup

  1. Download and extract the LJ Speech dataset
  2. Clone repo: git clone https://github.com/RiccardoGrin/NVIDIA-tacotron2.git
  3. CD into repo: cd NVIDIA-tacotron2
  4. Update .wav paths: sed -i -- 's,DUMMY,/home/ubuntu/LJSpeech-1.1/wavs,g' filelists/*.txt
    • Alternatively, set load_mel_from_disk=True in hparams.py and update mel-spectrogram paths
  5. Install PyTorch 0.4
  6. Install python requirements: pip install -r requirements.txt
  7. Change dist_url in hparams.py to point into the repo directory, at a location where the test.dpt file does not already exist (the sketch after this list shows the relevant hparams values)
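
For reference, after steps 4 and 7 the relevant hparams end up looking roughly like the snippet below. This is an illustrative sketch only: it assumes the upstream create_hparams() helper, and the LJSpeech path and dist_url value are example values for one machine, not this fork's exact defaults.

```python
# Illustrative sketch of the hparams touched during setup (example paths, not exact defaults).
from hparams import create_hparams

hparams = create_hparams()

# Step 4: the filelists now point at real .wav files (or precomputed mels).
hparams.load_mel_from_disk = False  # set True if you precomputed mel spectrograms
hparams.training_files = 'filelists/ljs_audio_text_train_filelist.txt'
hparams.validation_files = 'filelists/ljs_audio_text_val_filelist.txt'

# Step 7: file-based init method for distributed training -- a path inside the
# repo checkout where test.dpt does not already exist (example value).
hparams.dist_url = 'file:///home/ubuntu/NVIDIA-tacotron2/test.dpt'
```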

Training

  1. python train.py --output_directory=outdir --log_directory=logdir
  2. (OPTIONAL) tensorboard --logdir=outdir/logdir

Multi-GPU (distributed) and FP16 Training

  1. python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True

This trains much faster (and typically converges better) than the single-GPU setup. However, it may overflow for the first few steps, printing messages like the following, before it starts training correctly:

'OVERFLOW! Skipping step. Attempted loss scale: 4294967296'
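
These messages come from dynamic loss scaling in mixed-precision training: the loss is multiplied by a large scale factor before backpropagation, and if the scaled gradients overflow, the optimizer step is skipped and the scale is halved until a usable value is found. The snippet below is a conceptual sketch of that mechanism, not Apex's actual implementation:

```python
# Conceptual sketch of dynamic loss scaling (not the Apex implementation).
# Usage: state = {'loss_scale': 2.0 ** 32, 'good_steps': 0}

def grad_overflowed(param):
    """Apex-style check: a gradient sum containing inf/NaN is itself inf/NaN."""
    if param.grad is None:
        return False
    s = float(param.grad.data.sum())
    return s != s or s == float('inf') or s == float('-inf')

def fp16_step(model, loss, optimizer, state):
    """Scale the loss, back-propagate, and skip the update on overflow."""
    optimizer.zero_grad()
    (loss * state['loss_scale']).backward()

    if any(grad_overflowed(p) for p in model.parameters()):
        # "OVERFLOW! Skipping step. Attempted loss scale: ..."
        state['loss_scale'] /= 2.0
        state['good_steps'] = 0
        return False

    # Unscale the gradients, then take the normal optimizer step.
    for p in model.parameters():
        if p.grad is not None:
            p.grad.data.div_(state['loss_scale'])
    optimizer.step()

    # After a long run of good steps, try a larger scale again.
    state['good_steps'] += 1
    if state['good_steps'] >= 2000:
        state['loss_scale'] *= 2.0
        state['good_steps'] = 0
    return True
```

The "Attempted loss scale: 4294967296" in the message is 2^32, the initial dynamic loss scale, before the first halving.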

Inference

  1. Start and open a Jupyter Notebook
  2. Open inference.ipynb
  3. Follow the instructions in the notebook and run the cells (a rough outline of what the notebook does is sketched below)
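
For orientation, the notebook roughly performs the steps sketched below. Treat this as a sketch under assumptions: it uses the upstream NVIDIA module layout (create_hparams, load_model, text_to_sequence, TacotronSTFT, griffin_lim), the checkpoint path is an example, and details such as the mel-to-linear scaling factor may differ from the actual notebook.

```python
# Rough sketch of inference with Griffin-Lim, assuming the upstream module layout.
# The checkpoint path and the mel-to-linear scaling factor are example values.
import numpy as np
import torch

from hparams import create_hparams
from train import load_model
from text import text_to_sequence
from layers import TacotronSTFT
from audio_processing import griffin_lim

hparams = create_hparams()

# Load a trained Tacotron 2 checkpoint (example path).
checkpoint_path = "outdir/checkpoint_15750"
model = load_model(hparams)
model.load_state_dict(torch.load(checkpoint_path)['state_dict'])
model.cuda().eval()

# Encode the input text as a sequence of symbol IDs.
text = "You stay in Wonderland and I show you how deep the rabbit hole goes."
sequence = np.array(text_to_sequence(text, ['english_cleaners']))[None, :]
sequence = torch.from_numpy(sequence).cuda().long()

# Predict the mel spectrogram and attention alignment.
mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)

# Reconstruct audio with Griffin-Lim instead of WaveNet: undo the mel compression,
# project back to a linear-frequency magnitude spectrogram, then iterate Griffin-Lim.
taco_stft = TacotronSTFT(hparams.filter_length, hparams.hop_length,
                         hparams.win_length, sampling_rate=hparams.sampling_rate)
mel = taco_stft.spectral_de_normalize(mel_outputs_postnet)
spec = torch.mm(mel[0].transpose(0, 1).data.cpu(), taco_stft.mel_basis)  # (T, n_fft/2+1)
spec = spec.transpose(0, 1).unsqueeze(0) * 1000                          # empirical scaling
audio = griffin_lim(spec[:, :, :-1], taco_stft.stft_fn, n_iters=60)
# audio[0] now holds the waveform; write it out with e.g. scipy.io.wavfile.write.
```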

Inference results

Below are the inference results after 15750 and 4750 steps respectively, for the input text: "You stay in Wonderland and I show you how deep the rabbit hole goes." - Morpheus, The Matrix

Predicted Mel Spectrogram, Alignment

Predicted Mel Spectrogram, Alignment

You can download inference_test_15750.wav and inference_test_4750.wav to listen to the corresponding generated audio. Around step 4750 the network started to form a proper alignment and produce understandable sounds.

Related repos

nv-wavenet: Faster than real-time wavenet inference

Acknowledgements

This implementation uses code from the following repos: Keith Ito and Prem Seetharaman, as described in our code.

We are inspired by Ryuichi Yamamoto's Tacotron PyTorch implementation.

We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan Wang and Zongheng Yang.
