PaddleSpeech Architecture
To iterate rapidly on speech models and to simplify model development and maintenance, an overly unified architecture is not appropriate, so we organize the code model-first.
Common modules, such as audio and text feature transformation and augmentation, are implemented as libraries. The model training and validation processes are composed from these fundamental modules.
For each speech task there is an exps directory and a models directory. Under each of them we create a directory named after the model, so that model-specific processing is possible, e.g. finer-grained data and/or feature processing, or a more complicated training and/or evaluation loop.
We implement recipes for conventional speech tasks on common datasets, with reported performance. Users can apply these templates to private data or implement new models of their own design. This philosophy makes the code easy to use, develop and extend, which accelerates industrial and/or academic iteration.
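The idea of composing a model's processing from shared building blocks can be sketched as follows. This is only an illustration under stated assumptions: `compose`, `peak_normalize` and `make_gain` are hypothetical names invented for the example, not PaddleSpeech APIs, and audio is represented as a plain list of floats.

```python
from typing import Callable, List, Sequence

# A "transform" here is any callable mapping a list of samples to a list of samples.
Transform = Callable[[List[float]], List[float]]


def compose(transforms: Sequence[Transform]) -> Transform:
    """Chain transforms left to right into a single callable."""
    def pipeline(samples: List[float]) -> List[float]:
        for transform in transforms:
            samples = transform(samples)
        return samples
    return pipeline


def peak_normalize(samples: List[float]) -> List[float]:
    """Hypothetical feature transform: scale so the largest magnitude is 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    return [s / peak for s in samples] if peak > 0 else list(samples)


def make_gain(factor: float) -> Transform:
    """Hypothetical augmentation: multiply every sample by a fixed gain."""
    def gain(samples: List[float]) -> List[float]:
        return [s * factor for s in samples]
    return gain


# A model-specific recipe picks the shared pieces it needs.
preprocess = compose([peak_normalize, make_gain(0.5)])
features = preprocess([0.2, -0.4, 0.1])
print(features)  # [0.25, -0.5, 0.125]
```

A model that needs finer processing would build its own pipeline from the same pieces, instead of forcing every model through one shared pipeline.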
The full directory structure is as follows:
paddlespeech
├── cli
│   ├── asr
│   ├── cls
│   ├── st
│   ├── text
│   └── tts
├── cls
│   ├── exps
│   │   └── panns
│   │       └── deploy
│   └── models
│       └── panns
├── s2t
│   ├── decoders
│   │   ├── beam_search
│   │   ├── ctcdecoder
│   │   │   └── tests
│   │   └── scorers
│   ├── exps
│   │   ├── deepspeech2
│   │   │   └── bin
│   │   │       └── deploy
│   │   ├── lm
│   │   │   └── transformer
│   │   │       └── bin
│   │   ├── u2
│   │   │   └── bin
│   │   ├── u2_kaldi
│   │   │   └── bin
│   │   └── u2_st
│   │       └── bin
│   ├── frontend
│   │   ├── augmentor
│   │   └── featurizer
│   ├── io
│   ├── models
│   │   ├── ds2
│   │   ├── ds2_online
│   │   ├── lm
│   │   ├── u2
│   │   └── u2_st
│   ├── modules
│   ├── training
│   │   ├── extensions
│   │   ├── triggers
│   │   └── updaters
│   ├── transform
│   └── utils
├── t2s
│   ├── audio
│   ├── data
│   ├── datasets
│   ├── exps
│   │   ├── fastspeech2
│   │   ├── gan_vocoder
│   │   │   ├── hifigan
│   │   │   ├── multi_band_melgan
│   │   │   ├── parallelwave_gan
│   │   │   └── style_melgan
│   │   ├── new_tacotron2
│   │   ├── speedyspeech
│   │   ├── tacotron2
│   │   ├── transformer_tts
│   │   ├── voice_cloning
│   │   │   └── tacotron2_ge2e
│   │   └── waveflow
│   ├── frontend
│   │   ├── normalizer
│   │   └── zh_normalization
│   ├── models
│   │   ├── fastspeech2
│   │   ├── hifigan
│   │   ├── melgan
│   │   ├── new_tacotron2
│   │   ├── parallel_wavegan
│   │   ├── speedyspeech
│   │   └── transformer_tts
│   ├── modules
│   │   ├── conformer
│   │   ├── predictor
│   │   ├── tacotron2
│   │   └── transformer
│   ├── training
│   │   ├── extensions
│   │   ├── triggers
│   │   └── updaters
│   └── utils
├── text
│   ├── exps
│   │   └── ernie_linear
│   └── models
│       ├── ernie_crf
│       └── ernie_linear
└── vector
    ├── exps
    │   └── ge2e
    └── models
97 directories
The speech tasks are as follows:
paddlespeech
├── cls (audio classification/detection, emotion/gender/age recognition and so on)
├── s2t (speech to text tasks, e.g. ASR, ST)
├── t2s (text to speech, e.g. TTS, Voice Cloning, Voice Conversion, Music, Singing Voice Synthesis)
├── text (speech-related text tasks, e.g. punctuation restoration, text correction, model-based text front-end)
└── vector (speech tasks that need to extract vector features, e.g. Speaker Verification/Identification, Language Identification, Speaker Diarization)
The speech toolboxes are as follows:
paddlespeech
├── cli (CLI toolbox for multiple speech tasks)
└── server (Server/Client for speech tasks)
exps contains the bins (entry scripts) for each model; models contains the model implementation and model-specific processing.
Each speech task has these directories.
├── t2s
│   ├── exps
│   │   ├── fastspeech2
│   │   ├── gan_vocoder
│   │   │   ├── hifigan
│   │   │   ├── multi_band_melgan
│   │   │   ├── parallelwave_gan
│   │   │   └── style_melgan
│   │   ├── new_tacotron2
│   │   ├── speedyspeech
│   │   ├── tacotron2
│   │   ├── transformer_tts
│   │   ├── voice_cloning
│   │   │   └── tacotron2_ge2e
│   │   └── waveflow
│   ├── models
│   │   ├── fastspeech2
│   │   ├── hifigan
│   │   ├── melgan
│   │   ├── new_tacotron2
│   │   ├── parallel_wavegan
│   │   ├── speedyspeech
│   │   └── transformer_tts
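The naming convention in the listing above (one directory per model under both exps and models) can be written down as a small path helper. This is a sketch only: `model_dirs` and `TASKS` are hypothetical names for illustration, and the helper builds paths without checking that they exist.

```python
from pathlib import PurePosixPath

# Task names taken from the directory listing above.
TASKS = ("cls", "s2t", "t2s", "text", "vector")


def model_dirs(task: str, model: str, root: str = "paddlespeech") -> dict:
    """Return the conventional exps/ and models/ locations for one model.

    Hypothetical helper for illustration; it only builds paths and does not
    check that the directories exist.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    base = PurePosixPath(root) / task
    return {
        "exps": base / "exps" / model,
        "models": base / "models" / model,
    }


dirs = model_dirs("t2s", "fastspeech2")
print(dirs["exps"])    # paddlespeech/t2s/exps/fastspeech2
print(dirs["models"])  # paddlespeech/t2s/models/fastspeech2
```

Not every entry follows the pattern exactly. For example, the listing places hifigan under exps/gan_vocoder/hifigan but under models/hifigan, so the helper describes the common case rather than a strict rule.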
All model implementations and the core updater logic are placed in models, as shown below:
paddlespeech/t2s/models/
├── __init__.py
├── fastspeech2
│   ├── __init__.py
│   ├── fastspeech2.py
│   └── fastspeech2_updater.py
├── hifigan
│   ├── __init__.py
│   ├── hifigan.py
│   └── hifigan_updater.py
├── melgan
│   ├── __init__.py
│   ├── melgan.py
│   ├── multi_band_melgan_updater.py
│   ├── style_melgan.py
│   └── style_melgan_updater.py
├── new_tacotron2
│   ├── __init__.py
│   ├── tacotron2.py
│   └── tacotron2_updater.py
├── parallel_wavegan
│   ├── __init__.py
│   ├── parallel_wavegan.py
│   └── parallel_wavegan_updater.py
├── speedyspeech
│   ├── __init__.py
│   ├── speedyspeech.py
│   └── speedyspeech_updater.py
├── tacotron2.py
├── transformer_tts
│   ├── __init__.py
│   ├── transformer_tts.py
│   └── transformer_tts_updater.py
└── waveflow.py
7 directories, 26 files
fastspeech2.py is the model implementation; fastspeech2_updater.py implements the core training and validation process.
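The split between a model file and its updater file can be illustrated with a toy example. The sketch below is not PaddleSpeech code: `TinyModel`, `TinyUpdater`, `TinyEvaluator` and the method names `update_core` and `evaluate` are assumptions chosen for the illustration, and the "model" is a single scalar weight trained with plain gradient descent.

```python
from typing import Iterable, List, Tuple

Batch = Tuple[float, float]  # (input, target)


class TinyModel:
    """Stand-in for a model file such as fastspeech2.py: forward pass only."""

    def __init__(self, weight: float = 0.0):
        self.weight = weight

    def __call__(self, x: float) -> float:
        return self.weight * x


class TinyUpdater:
    """Stand-in for an *_updater.py file: owns one training step."""

    def __init__(self, model: TinyModel, lr: float = 0.1):
        self.model = model
        self.lr = lr

    def update_core(self, batch: Batch) -> float:
        x, y = batch
        error = self.model(x) - y
        # Gradient of the squared error (error ** 2) with respect to weight.
        self.model.weight -= self.lr * 2.0 * error * x
        return error ** 2


class TinyEvaluator:
    """Validation counterpart: computes the loss without updating weights."""

    def __init__(self, model: TinyModel):
        self.model = model

    def evaluate(self, batches: Iterable[Batch]) -> float:
        losses: List[float] = [(self.model(x) - y) ** 2 for x, y in batches]
        return sum(losses) / len(losses)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # target = 2 * input
model = TinyModel()
updater = TinyUpdater(model, lr=0.05)
evaluator = TinyEvaluator(model)

loss_before = evaluator.evaluate(data)
for _ in range(50):
    for batch in data:
        updater.update_core(batch)
loss_after = evaluator.evaluate(data)
print(loss_after < loss_before)  # True
```

Keeping the training step outside the model class means that a model with an unusual loop, such as a GAN vocoder with separate generator and discriminator steps, can ship its own updater without touching other models.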
We put the train/valid/test scripts and other processing in the exps directory, as shown below:
paddlespeech/t2s/exps/
├── __init__.py
├── fastspeech2
│   ├── __init__.py
│   ├── gen_gta_mel.py
│   ├── normalize.py
│   ├── preprocess.py
│   ├── train.py
│   └── voice_cloning.py

paddlespeech/s2t/exps/
├── __init__.py
├── deepspeech2
│   ├── __init__.py
│   ├── bin
│   │   ├── __init__.py
│   │   ├── deploy
│   │   │   ├── __init__.py
│   │   │   ├── client.py
│   │   │   ├── record.py
│   │   │   ├── runtime.py
│   │   │   ├── send.py
│   │   │   └── server.py
│   │   ├── export.py
│   │   ├── test.py
│   │   ├── test_export.py
│   │   ├── test_wav.py
│   │   └── train.py
│   └── model.py
├── lm
│   └── transformer
│       ├── __init__.py
│       ├── bin
│       │   ├── __init__.py
│       │   └── cacu_perplexity.py
│       └── lm_cacu_perplexity.py
├── u2
│   ├── __init__.py
│   ├── bin
│   │   ├── __init__.py
│   │   ├── alignment.py
│   │   ├── export.py
│   │   ├── test.py
│   │   ├── test_wav.py
│   │   └── train.py
│   ├── model.py
│   └── trainer.py
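Most directories in these listings carry an __init__.py, so the scripts under exps are addressable as package modules. A minimal sketch of the path-to-module mapping follows; `script_to_module` is a hypothetical helper, and how each script is actually launched (directly, via `python -m`, or from a recipe script) is not stated in this section.

```python
from pathlib import PurePosixPath


def script_to_module(script_path: str) -> str:
    """Convert a repository-relative .py path into a dotted module name."""
    path = PurePosixPath(script_path)
    if path.suffix != ".py":
        raise ValueError(f"not a Python file: {script_path!r}")
    parts = list(path.with_suffix("").parts)
    # A package's __init__.py is imported under the package name itself.
    if parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)


print(script_to_module("paddlespeech/s2t/exps/u2/bin/train.py"))
# paddlespeech.s2t.exps.u2.bin.train
print(script_to_module("paddlespeech/t2s/exps/fastspeech2/preprocess.py"))
# paddlespeech.t2s.exps.fastspeech2.preprocess
```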