# detail_tts

**Repository Path**: ruby11dog/detail_tts

## Basic Information

- **Project Name**: detail_tts
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-06-27
- **Last Updated**: 2024-07-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Detail TTS

The model newly proposed three significant important methods to become the best practice of AR TTS.

- Although RVQ is used, the actual training employs continuous features, I call it fake discretization.
- All in one model. The model contains gpt, diffusion, vqvae, gan and flowvae all in one. One train one inference.
- Both prefixed spk emb and prompt are used to get benefit from both Valle type inference and Tortoise type training.

## Inference

check `api.py`

## Dataset prepare

Change the path contains audios in script and run

```
python prepare/0_vad_asr_save_to_jsonl.py
```

## Train and Fine Tune

```
accelerate launch train.py
```

For fine tuning, change the pretrain model load path.

## Acknowledgements

VQ and VITS from [GSV](https://github.com/RVC-Boss/GPT-SoVITS)

Diffusion and GPT from [tortoise](https://github.com/neonbjb/tortoise-tts)