# contentvec **Repository Path**: ruby11dog/contentvec ## Basic Information - **Project Name**: contentvec - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2024-01-16 - **Last Updated**: 2024-01-16 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers This repository provides the official PyTorch implementation of [ContentVec](https://arxiv.org/abs/2204.09224). This is a short video that explains the main concepts of our work. If you find this work useful and use it in your research, please consider citing our paper. [![ContentVec](./assets/cover.png)](https://youtu.be/aiGp1g-dCY4) ## Cite this paper https://proceedings.mlr.press/v162/qian22b.html ## Pre-trained models The legacy model only contains the representation module, which may be loaded using plain fairseq installation without setting up this code repo. |Model | Classes | | |---|---|---| |ContentVec_legacy | 100 | [download](https://ibm.box.com/s/t76fff0dciyjqt1db03y48323qp99bg9) |ContentVec | 100 | [download](https://ibm.box.com/s/oxly542k5v3bhkfw6g8esatxziarymam) |ContentVec_legacy | 500 | [download](https://ibm.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr) |ContentVec | 500 | [download](https://ibm.box.com/s/nv35hsry0v2y595etzysgnn2amsxxb0u) ## Load a model ``` ckpt_path = "/path/to/the/checkpoint_best_legacy.pt" models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path]) model = models[0] ``` For detailed feature extraction steps, please refer to [Hubert](https://github.com/facebookresearch/fairseq/blob/main/examples/hubert/simple_kmeans/dump_hubert_feature.py). ## Train a new model ### Data preparation Download the [zip file](https://ibm.box.com/s/zeyr94mkfs2g896oug31ml0gxv5ny43y) consisting of the following files: - `{train,valid}.tsv` waveform list files in metadata - `{train,valid}.km` frame-aligned pseudo label files in labels - `dict.km.txt` a dummy dictionary in labels - `spk2info.dict` a dictionary mapping from speaker id to speaker embedding in metadata Modify the root directory in the `{train,valid}.tsv` waveform list files ### Setup code repo Follow steps in `setup.sh` to setup the code repo ### Pretrain ContentVec Use `run_pretrain_single.sh` to run on a single node Use `run_pretrain_multi.sh` and the corresponding slurm template to run on multiple GPUs and nodes