# bark.cpp

**Repository Path**: ShamerZhao/bark.cpp

## Basic Information

- **Project Name**: bark.cpp
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: add_readme_stats
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-09-20
- **Last Updated**: 2024-09-20

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# bark.cpp

![bark.cpp](./assets/banner.png)

[![Actions Status](https://github.com/PABannier/bark.cpp/actions/workflows/build.yml/badge.svg)](https://github.com/PABannier/bark.cpp/actions)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)

[Roadmap](https://github.com/users/PABannier/projects/1) / [encodec.cpp](https://github.com/PABannier/encodec.cpp) / [ggml](https://github.com/ggerganov/ggml)

Inference of [SunoAI's bark model](https://github.com/suno-ai/bark) in pure C/C++.

## Description

With `bark.cpp`, our goal is to bring **real-time realistic multilingual** text-to-speech generation to the community.

- [x] Plain C/C++ implementation without dependencies
- [x] AVX, AVX2 and AVX512 for x86 architectures
- [x] CPU and GPU compatible backends
- [x] Mixed F16 / F32 precision
- [x] 4-bit, 5-bit and 8-bit integer quantization
- [x] Metal and CUDA backends

**Models supported**

- [x] [Bark Small](https://huggingface.co/suno/bark-small)
- [x] [Bark Large](https://huggingface.co/suno/bark)

**Models we want to implement!
Please open a PR :)**

- [ ] [AudioCraft](https://audiocraft.metademolab.com/) ([#62](https://github.com/PABannier/bark.cpp/issues/62))
- [ ] [AudioLDM2](https://audioldm.github.io/audioldm2/) ([#82](https://github.com/PABannier/bark.cpp/issues/82))
- [ ] [Piper](https://github.com/rhasspy/piper) ([#135](https://github.com/PABannier/bark.cpp/issues/135))

Demo on [Google Colab](https://colab.research.google.com/drive/1JVtJ6CDwxtKfFmEd8J4FGY2lzdL0d0jT?usp=sharing) ([#95](https://github.com/PABannier/bark.cpp/issues/95))

---

Here is a typical run using `bark.cpp`:

```text
./main -p "This is an audio generated by bark.cpp"

    __               __
   / /_  ____ ______/ /__        _________  ____
  / __ \/ __ `/ ___/ //_/       / ___/ __ \/ __ \
 / /_/ / /_/ / /  / ,<    _    / /__/ /_/ / /_/ /
/_.___/\__,_/_/  /_/|_|  (_)   \___/ .___/ .___/
                                  /_/   /_/

bark_tokenize_input: prompt: 'This is an audio generated by bark.cpp'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20795 20172 20199 33733 58966 20203 28169 20222

Generating semantic tokens: 17%

bark_print_statistics: sample time = 10.98 ms / 138 tokens
bark_print_statistics: predict time = 614.96 ms / 4.46 ms per token
bark_print_statistics: total time = 633.54 ms

Generating coarse tokens: 100%

bark_print_statistics: sample time = 3.75 ms / 410 tokens
bark_print_statistics: predict time = 3263.17 ms / 7.96 ms per token
bark_print_statistics: total time = 3274.00 ms

Generating fine tokens: 100%

bark_print_statistics: sample time = 38.82 ms / 6144 tokens
bark_print_statistics: predict time = 4729.86 ms / 0.77 ms per token
bark_print_statistics: total time = 4772.92 ms

write_wav_on_disk: Number of frames written = 65600.

main: load time = 324.14 ms
main: eval time = 8806.57 ms
main: total time = 9131.68 ms
```

Here is a video of Bark running on the iPhone:

https://github.com/PABannier/bark.cpp/assets/12958149/bc807c0b-adfa-4c47-a05b-a2d8ba157dd8

## Usage

Here are the steps to use `bark.cpp`.

### Get the code

```bash
git clone --recursive https://github.com/PABannier/bark.cpp.git
cd bark.cpp
git submodule update --init --recursive
```

### Build

To build `bark.cpp`, you must use `CMake`:

```bash
mkdir build
cd build
cmake ..
cmake --build . --config Release
```

### Prepare data & Run

```bash
# Run these commands from the repository root (paths such as ./build and ./models are relative to it)

# Install Python dependencies
python3 -m pip install -r requirements.txt

# Download the Bark checkpoints and vocabulary
python3 download_weights.py --out-dir ./models --models bark-small bark

# Convert the model to ggml format
python3 convert.py --dir-model ./models/bark-small --use-f16

# Run the inference
./build/examples/main/main -m ./models/bark-small/ggml_weights.bin -p "this is an audio generated by bark.cpp" -t 4
```

### (Optional) Quantize weights

Weights can be quantized using one of the following strategies: `q4_0`, `q4_1`, `q5_0`, `q5_1`, `q8_0`.

Note that to preserve audio quality, we do not quantize the codec model. The bulk of the computation is in the forward pass of the GPT models.

```bash
./build/examples/quantize/quantize ./ggml_weights.bin ./ggml_weights_q4.bin q4_0
```

### Seminal papers

- Bark
    - [Text Prompted Generative Audio](https://github.com/suno-ai/bark)
- Encodec
    - [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438)
- GPT-3
    - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)

### Statistics

Here are some statistics on a MacBook Pro M1.
They were obtained by generating audio for the prompt *"this is the last random sentence I will be writing and I am going to stop mid-sent"*. All figures are in milliseconds per token.

**Bark-small**

| ms/token | Semantic | Coarse | Fine |
|----------|----------|--------|------|
| Python   |          |        |      |
| F16      | 4.77     | 13.48  | 0.49 |
| Q8_0     | 4.47     | 14.80  | 0.90 |
| Q5_0     | 3.79     | 13.41  | 0.92 |
| Q5_1     | 3.76     | 12.71  | 0.95 |
| Q4_0     | 3.73     | 13.47  | 0.88 |
| Q4_1     | 3.54     | 11.90  | 0.84 |

**Bark**

| ms/token | Semantic | Coarse | Fine |
|----------|----------|--------|------|
| Python   |          |        |      |
| F16      |          |        |      |
| Q8_0     | 12.29    | 42.99  | 2.61 |
| Q5_0     | 12.84    | 44.71  | 2.75 |
| Q5_1     | 13.29    | 48.87  | 2.75 |
| Q4_0     | 14.53    | 51.33  | 2.68 |
| Q4_1     | 12.75    | 50.96  | 2.80 |

### Contributing

`bark.cpp` is a continuous endeavour that relies on community efforts to last and evolve. Your contribution is welcome and highly valuable. It can be a:

- bug report: you may encounter a bug while using `bark.cpp`. Don't hesitate to report it in the issues section.
- feature request: you want to add a new model or support a new platform. You can use the issues section to make suggestions.
- pull request: you may have fixed a bug, added a feature, or even fixed a small typo in the documentation. You can submit a pull request and a reviewer will reach out to you.

### Coding guidelines

- Avoid adding third-party dependencies, extra files, extra headers, etc.
- Always consider cross-compatibility with other operating systems and architectures.
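
### Worked example: real-time factor

Since the stated goal is real-time generation, it can help to turn the timings from the typical run shown under Description into a real-time factor (processing time divided by the duration of the audio produced). The sketch below is only an illustration and is not part of `bark.cpp`: the function name `real_time_factor` is hypothetical, and the 24 kHz sample rate is an assumption about Bark's output audio. The frame count and timings are copied from that sample log.

```python
# Assumption: the generated WAV is mono audio sampled at 24 kHz.
SAMPLE_RATE_HZ = 24_000


def real_time_factor(frames_written: int, total_time_ms: float,
                     sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Return processing time / audio duration (below 1.0 is faster than real time)."""
    audio_seconds = frames_written / sample_rate_hz
    return (total_time_ms / 1000.0) / audio_seconds


# Values copied from the sample run in this README.
frames = 65_600          # "Number of frames written"
total_ms = 9131.68       # "main: total time"
stage_ms = {             # "total time" reported after each stage
    "semantic": 633.54,
    "coarse": 3274.00,
    "fine": 4772.92,
}

audio_s = frames / SAMPLE_RATE_HZ
rtf = real_time_factor(frames, total_ms)
print(f"audio duration:   {audio_s:.2f} s")
print(f"real-time factor: {rtf:.2f}")

# Share of the summed stage time spent in each stage.
stages_total = sum(stage_ms.values())
for name, ms in stage_ms.items():
    print(f"{name:>8}: {ms / stages_total:.0%}")
```

Under these assumptions the sample run takes roughly 3.3 times longer than the audio it produces, with most of the time spent in the coarse and fine stages.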