# FlowVocoder

**Repository Path**: wj_gxy/FlowVocoder

## Basic Information

- **Project Name**: FlowVocoder
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: final
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-05-12
- **Last Updated**: 2025-05-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## FlowVocoder: A small Footprint Neural Vocoder based Normalizing Flow for Speech Synthesis

## Setup

1. Clone this repo and install the requirements:

```command
git clone https://github.com/tienmanhptit1312/FlowVocoder.git
cd FlowVocoder
pip install -r requirements.txt
```

2. Install [Apex](https://github.com/NVIDIA/apex) for mixed-precision training.

## Train your model

1. Download the [LJ Speech Data](https://keithito.com/LJ-Speech-Dataset/), then extract the archive where you downloaded it.

2. Copy the WAV files from the LJ Speech directory to the FlowVocoder directory:

```
cp -r [LJ-Speech dataset's directory]/wavs [FlowVocoder's directory]
```

3. Make lists of the file names to use for training and testing:

```command
ls wavs/*.wav | tail -n +1311 > train_files.txt
ls wavs/*.wav | head -n 1310 > test_files.txt
```

`head -n 1310` reserves the first 1310 audio clips for testing, and the remaining clips are used for training. Because `tail -n +K` starts its output at line K, the training list starts at `+1311` (not `+1310`) so that no clip appears in both lists.

4. Edit the configuration file and train the model. The example below uses `configs/flowvocoder.json`:

```command
python train.py -c configs/flowvocoder.json --tr
```

Single-node multi-GPU training is enabled automatically with `DataParallel` (used instead of `DistributedDataParallel` for simplicity). For mixed-precision training, set `"fp16_run": true` in the configuration file. To load trained weights from saved checkpoints, set `checkpoint_path` in the config file.
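Because `fp16_run` and `checkpoint_path` are changed by editing the JSON config, a small script can make those edits reproducibly. The sketch below is a hypothetical helper, not part of this repository: `set_config_value` and `update_config_file` are illustrative names. This README does not show whether those keys sit at the top level of `configs/flowvocoder.json` or inside a nested section, so the helper updates the key wherever it appears and raises `KeyError` if it is absent instead of guessing a location.

```python
import json
from pathlib import Path
from typing import Any, Dict


def set_config_value(config: Dict[str, Any], key: str, value: Any) -> bool:
    """Set every occurrence of `key` in a possibly nested config dict.

    Returns True if the key was found at least once. Hypothetical helper:
    it makes no assumption about where the FlowVocoder config places the key.
    """
    found = False
    for k, v in config.items():
        if k == key:
            config[k] = value  # replacing an existing key's value while iterating is safe
            found = True
        elif isinstance(v, dict):
            found = set_config_value(v, key, value) or found
    return found


def update_config_file(path: str, **updates: Any) -> None:
    """Load a JSON config, apply `updates`, and write it back in place."""
    cfg_path = Path(path)
    config = json.loads(cfg_path.read_text())
    for key, value in updates.items():
        if not set_config_value(config, key, value):
            # Refuse to invent keys: an unknown key may simply be ignored.
            raise KeyError(f"{key!r} not found in {cfg_path}")
    # Nothing is written unless every key was found.
    cfg_path.write_text(json.dumps(config, indent=4) + "\n")
```

For example, `update_config_file("configs/flowvocoder.json", fp16_run=True, checkpoint_path="experiments/flowvocoder/flowvocoder_5000")` would enable mixed precision and point at a saved checkpoint, assuming both keys already exist in that file. Rewriting the file through `json.dumps` normalizes its indentation but keeps key order.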
`checkpoint_path` accepts either an explicit checkpoint path or, when resuming from weights averaged over multiple checkpoints, their parent directory. Training takes about a week on two NVIDIA V100 GPUs with a batch size of 2. For reproducibility, you can download our pretrained model, trained for about 1M iterations: [link](https://drive.google.com/file/d/1K-NAXjh9DvBEiAXQHay5jC-oivMgX7RQ/view?usp=sharing).

### Examples

To load weights from a single checkpoint, set `"checkpoint_path": "experiments/flowvocoder/flowvocoder_5000"` in the config file, then run:

```command
python train.py -c configs/flowvocoder.json --tr
```

To load weights averaged over the 10 most recent checkpoints, set `"checkpoint_path": "experiments/flowvocoder"` in the config file, then run:

```command
python train.py -a 10 -c configs/flowvocoder.json
```

5. Synthesize waveforms from the trained model. Set `checkpoint_path` in the config file and pass `--synthesize` to `train.py`. The model generates a waveform for each file listed in `test_files.txt`.

```command
python train.py --synthesize -c configs/flowvocoder.json
```

If `"fp16_run": true`, the model uses FP16 (half-precision) arithmetic, which runs faster on GPUs equipped with Tensor Cores.

## Reference

- NVIDIA Tacotron2: https://github.com/NVIDIA/tacotron2
- WaveFlow: https://github.com/L0SG/WaveFlow
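Step 4 of the training instructions loads weights averaged over the most recent checkpoints (`-a 10`). The sketch below illustrates the general technique, element-wise averaging of parameter values across checkpoints; it is not taken from this repository's `train.py`. The `<name>_<iteration>` file naming is an assumption based on the `flowvocoder_5000` example, and `latest_checkpoints` and `average_state_dicts` are hypothetical helper names.

```python
import re
from pathlib import Path
from typing import Any, Dict, List, Mapping, Sequence

# Assumed naming scheme, inferred from "experiments/flowvocoder/flowvocoder_5000":
# <name>_<iteration>, with no file extension.
_CKPT_PATTERN = re.compile(r"_(\d+)$")


def latest_checkpoints(directory: str, n: int) -> List[Path]:
    """Return up to n checkpoint files with the highest iteration numbers, newest first."""
    candidates = []
    for path in Path(directory).iterdir():
        match = _CKPT_PATTERN.search(path.name)
        if path.is_file() and match:
            candidates.append((int(match.group(1)), path))
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in candidates[:n]]


def average_state_dicts(state_dicts: Sequence[Mapping[str, Any]]) -> Dict[str, Any]:
    """Element-wise mean of state dicts that share identical keys.

    Works for any value type supporting + and / (floats, NumPy arrays,
    torch tensors). Integer values come back as floats after the division.
    """
    if not state_dicts:
        raise ValueError("need at least one state dict")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("state dicts must have identical keys")
    n = len(state_dicts)
    # Start the sum from the first dict's value so non-numeric types (tensors) work.
    return {
        k: sum((sd[k] for sd in state_dicts[1:]), state_dicts[0][k]) / n
        for k in keys
    }
```

With PyTorch, you would load each selected file with `torch.load(path, map_location="cpu")`, extract the model weights, average them, and pass the result to `model.load_state_dict`. Where the weights are stored inside this repository's checkpoint files is not documented here, so that step is left out.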