Tacotron (2017)

This repository is an unofficial PyTorch implementation of the Tacotron speech synthesis model. Tacotron converts input text into a spectrogram, and the Griffin-Lim algorithm then reconstructs a waveform from that spectrogram to synthesize speech. For details, see the original paper [1].

The implementation is still in progress.

Architecture

How to run

The code was developed and run inside a container based on a PyTorch Docker image. There is no need to download a dataset separately: the torchaudio package downloads the LJSpeech-1.1 dataset automatically.

Prepare docker image / container

docker pull pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

Install requirements
```
pip install -r requirements.txt
```
Start training
```
python train.py
```

Reference

[1] Wang, Yuxuan, et al. "Tacotron: Towards end-to-end speech synthesis." arXiv preprint arXiv:1703.10135 (2017).

[2] https://github.com/r9y9/tacotron_pytorch

[3] https://github.com/Kyubyong/tacotron

[4] Keith Ito and Linda Johnson, The LJ Speech Dataset, https://keithito.com/LJ-Speech-Dataset/, 2017

