
Emotional-Text-to-Speech/pytorch-dc-tts

 
 


Pytorch Implementation of DC-TTS for Emotional TTS

This fork is modified to work for transfer learning for low-resource emotional TTS, as described here.

Training

  1. Install the dependencies using pip install -r requirements.txt
  2. Preprocess the EmoV-DB dataset using process_emovdb.py
  3. Change the logdir argument in hyperparams.py. Other parameters can be edited optionally, but do not edit the hyperparameters marked as fixed.
  4. Add the path to the pre-trained Text2Mel model in the logdir
  5. Comment out this line if you are not running the train-text2mel.py file for the first time.
  6. Run the training script, e.g. python train-text2mel.py --dataset=emovdb
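The transfer-learning step above restores a pre-trained Text2Mel checkpoint before fine-tuning on EmoV-DB. A minimal sketch of the idea, not code from this repo: only pre-trained weights whose names and shapes still match the new model can be restored, and any reshaped layer is trained from scratch. Shapes are modeled as plain tuples here; a real version would compare torch tensors in the two state dicts.

```python
# Hedged sketch of partial checkpoint restoration for transfer learning.
# Layer names and shapes below are hypothetical, not from this repository.

def filter_transferable(pretrained, model):
    """Return the subset of pretrained weights loadable into model.

    Both arguments map parameter names to shape tuples; a weight is kept
    only if the fine-tuned model has a parameter with the same name and
    the same shape.
    """
    return {name: shape
            for name, shape in pretrained.items()
            if model.get(name) == shape}

# Toy state dicts: the embedding grew (e.g. extra tokens), so it is
# dropped and trained from scratch, while the conv layer transfers.
pretrained = {"embed.weight": (128, 256), "conv1.weight": (256, 256, 3)}
finetuned  = {"embed.weight": (132, 256), "conv1.weight": (256, 256, 3)}
print(filter_transferable(pretrained, finetuned))
# {'conv1.weight': (256, 256, 3)}
```

With torch, the same effect is what `load_state_dict(..., strict=False)` combined with shape filtering achieves.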

Synthesis

  1. Write the sentences that you want to generate here
  2. Add the checkpoint for the fine-tuned Text2Mel model in place of this line
  3. Edit the paths for the output.
  4. Run the synthesis script, e.g. python synthesize.py --dataset=emovdb
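Step 1 above amounts to listing the target sentences one per line. A minimal sketch, assuming a plain one-sentence-per-line text file (the file name "sentences.txt" is an assumption for illustration, not necessarily the path the repo reads):

```python
# Hedged sketch: prepare input sentences for synthesis, one per line.
from pathlib import Path

Path("sentences.txt").write_text(
    "I am so happy today!\n"
    "That is absolutely disgusting.\n"
)

# A synthesis script would then iterate over the non-empty lines:
sentences = [s for s in Path("sentences.txt").read_text().splitlines() if s.strip()]
print(sentences)
```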


Readme of the original repository

PyTorch implementation of Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention, based partially on several other open-source projects.
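The "guided attention" in the paper's title is a loss mask that penalizes attention far from the text/audio diagonal, pushing the alignment to be roughly monotonic. A minimal sketch of the per-position weight from the paper (g = 0.2 is the paper's value):

```python
import math

def guided_attention_weight(n, t, N, T, g=0.2):
    """Penalty W[n, t] = 1 - exp(-((n/N - t/T)^2) / (2 g^2)).

    n: text position (of N), t: mel frame (of T). Near-diagonal pairs
    (n/N close to t/T) get a weight near 0; far off-diagonal pairs
    approach 1, so attention placed there is penalized.
    """
    return 1.0 - math.exp(-((n / N - t / T) ** 2) / (2.0 * g ** 2))

# Diagonal position: no penalty; far off-diagonal: almost full penalty.
print(guided_attention_weight(10, 10, 100, 100))  # 0.0
print(guided_attention_weight(0, 99, 100, 100))   # ~0.999995
```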

Online Text-To-Speech Demo

The following notebooks are executable on https://colab.research.google.com:

For audio samples and pretrained models, visit the above notebook links.

Training/Synthesizing English Text-To-Speech

The English TTS uses the LJ-Speech dataset.

  1. Download the dataset: python dl_and_preprop_dataset.py --dataset=ljspeech
  2. Train the Text2Mel model: python train-text2mel.py --dataset=ljspeech
  3. Train the SSRN model: python train-ssrn.py --dataset=ljspeech
  4. Synthesize sentences: python synthesize.py --dataset=ljspeech
    • The WAV files are saved in the samples folder.
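The steps above split synthesis across two networks: Text2Mel predicts a time-reduced mel spectrogram from text, and SSRN upsamples it to full-resolution STFT magnitudes for waveform reconstruction. A shape-only sketch of how they compose (80 mel bins, 513 STFT bins, and a 4x time reduction are typical DC-TTS values, assumed here rather than read from this repo's hyperparams.py):

```python
# Shape-only sketch of the two-stage DC-TTS pipeline (assumed typical
# values, not read from this repository's hyperparams.py).
N_MELS, N_FFT_BINS, TIME_REDUCTION = 80, 513, 4

def text2mel_output_shape(coarse_frames):
    """Text2Mel predicts a time-reduced mel spectrogram from text."""
    return (N_MELS, coarse_frames)

def ssrn_output_shape(mel_shape):
    """SSRN upsamples the coarse mel to full-resolution STFT magnitudes,
    which Griffin-Lim then inverts to the final WAV."""
    n_mels, frames = mel_shape
    return (N_FFT_BINS, frames * TIME_REDUCTION)

mel = text2mel_output_shape(100)
print(mel)                     # (80, 100)
print(ssrn_output_shape(mel))  # (513, 400)
```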

Training/Synthesizing Mongolian Text-To-Speech

The Mongolian text-to-speech uses 5 hours of audio from the Mongolian Bible.

  1. Download the dataset: python dl_and_preprop_dataset.py --dataset=mbspeech
  2. Train the Text2Mel model: python train-text2mel.py --dataset=mbspeech
  3. Train the SSRN model: python train-ssrn.py --dataset=mbspeech
  4. Synthesize sentences: python synthesize.py --dataset=mbspeech
    • The WAV files are saved in the samples folder.

About

Text to Speech with PyTorch (English and Mongolian)
