# bark.cpp

**Repository Path**: ShamerZhao/bark.cpp

## Basic Information

- **Project Name**: bark.cpp
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: add_readme_stats
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-09-20
- **Last Updated**: 2024-09-20

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# bark.cpp

![bark.cpp](./assets/banner.png)

[![Actions Status](https://github.com/PABannier/bark.cpp/actions/workflows/build.yml/badge.svg)](https://github.com/PABannier/bark.cpp/actions)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)

[Roadmap](https://github.com/users/PABannier/projects/1) / [encodec.cpp](https://github.com/PABannier/encodec.cpp) / [ggml](https://github.com/ggerganov/ggml)

Inference of [SunoAI's bark model](https://github.com/suno-ai/bark) in pure C/C++.

## Description

With `bark.cpp`, our goal is to bring **real-time realistic multilingual** text-to-speech generation to the community.

- [x] Plain C/C++ implementation without dependencies
- [x] AVX, AVX2 and AVX512 for x86 architectures
- [x] CPU and GPU compatible backends
- [x] Mixed F16 / F32 precision
- [x] 4-bit, 5-bit and 8-bit integer quantization
- [x] Metal and CUDA backends

**Models supported**

- [x] [Bark Small](https://huggingface.co/suno/bark-small)
- [x] [Bark Large](https://huggingface.co/suno/bark)

**Models we want to implement! Please open a PR :)**

- [ ] [AudioCraft](https://audiocraft.metademolab.com/) ([#62](https://github.com/PABannier/bark.cpp/issues/62))
- [ ] [AudioLDM2](https://audioldm.github.io/audioldm2/) ([#82](https://github.com/PABannier/bark.cpp/issues/82))
- [ ] [Piper](https://github.com/rhasspy/piper) ([#135](https://github.com/PABannier/bark.cpp/issues/135))

Demo on [Google Colab](https://colab.research.google.com/drive/1JVtJ6CDwxtKfFmEd8J4FGY2lzdL0d0jT?usp=sharing) ([#95](https://github.com/PABannier/bark.cpp/issues/95))

---

Here is a typical run using `bark.cpp`:

```text
./main -p "This is an audio generated by bark.cpp"

   __               __
   / /_  ____ ______/ /__        _________  ____
  / __ \/ __ `/ ___/ //_/       / ___/ __ \/ __ \
 / /_/ / /_/ / /  / ,<    _    / /__/ /_/ / /_/ /
/_.___/\__,_/_/  /_/|_|  (_)   \___/ .___/ .___/
                                  /_/   /_/

bark_tokenize_input: prompt: 'This is an audio generated by bark.cpp'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20795 20172 20199 33733 58966 20203 28169 20222

Generating semantic tokens: 17%

bark_print_statistics:   sample time =    10.98 ms / 138 tokens
bark_print_statistics:  predict time =   614.96 ms / 4.46 ms per token
bark_print_statistics:    total time =   633.54 ms

Generating coarse tokens: 100%

bark_print_statistics:   sample time =     3.75 ms / 410 tokens
bark_print_statistics:  predict time =  3263.17 ms / 7.96 ms per token
bark_print_statistics:    total time =  3274.00 ms

Generating fine tokens: 100%

bark_print_statistics:   sample time =    38.82 ms / 6144 tokens
bark_print_statistics:  predict time =  4729.86 ms / 0.77 ms per token
bark_print_statistics:    total time =  4772.92 ms

write_wav_on_disk: Number of frames written = 65600.

main:     load time =   324.14 ms
main:     eval time =  8806.57 ms
main:    total time =  9131.68 ms
```
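
The figures in this log can be cross-checked with a little arithmetic. The sketch below recomputes the per-token latencies and estimates how long the generated clip is. It assumes a 24 kHz mono output (Bark's codec sample rate) and that "frames written" counts audio samples; neither is stated in the log itself. The helper names are illustrative and not part of `bark.cpp`.

```python
# Assumption: Bark emits 24 kHz mono audio, so one frame equals one sample.
SAMPLE_RATE_HZ = 24_000


def ms_per_token(predict_ms, n_tokens):
    """Average prediction latency per generated token, in milliseconds."""
    return predict_ms / n_tokens


def audio_seconds(frames, sample_rate=SAMPLE_RATE_HZ):
    """Duration of the written audio, in seconds."""
    return frames / sample_rate


def real_time_factor(eval_ms, frames, sample_rate=SAMPLE_RATE_HZ):
    """Seconds of compute spent per second of generated audio (below 1 is real time)."""
    return (eval_ms / 1000.0) / audio_seconds(frames, sample_rate)


if __name__ == "__main__":
    # Values copied from the log above.
    print(f"semantic: {ms_per_token(614.96, 138):.2f} ms/token")
    print(f"coarse:   {ms_per_token(3263.17, 410):.2f} ms/token")
    print(f"fine:     {ms_per_token(4729.86, 6144):.2f} ms/token")
    print(f"audio:    {audio_seconds(65600):.2f} s")
    print(f"RTF:      {real_time_factor(8806.57, 65600):.2f}")
```

Under these assumptions the run produces roughly 2.7 seconds of audio and spends a little over three seconds of compute per second of audio.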

Here is a video of Bark running on an iPhone:

https://github.com/PABannier/bark.cpp/assets/12958149/bc807c0b-adfa-4c47-a05b-a2d8ba157dd8


## Usage

Follow these steps to use `bark.cpp`:

### Get the code

```bash
git clone --recursive https://github.com/PABannier/bark.cpp.git
cd bark.cpp
git submodule update --init --recursive
```

### Build

Build `bark.cpp` with CMake:

```bash
mkdir build
cd build
cmake ..
cmake --build . --config Release
```

### Prepare data & Run

```bash
# Install Python dependencies
python3 -m pip install -r requirements.txt

# Download the Bark checkpoints and vocabulary
python3 download_weights.py --out-dir ./models --models bark-small bark

# Convert the model to ggml format
python3 convert.py --dir-model ./models/bark-small --use-f16

# Run inference
./build/examples/main/main -m ./models/bark-small/ggml_weights.bin -p "this is an audio generated by bark.cpp" -t 4
```
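
If you prefer to drive the binary from a script, the inference command above can be wrapped in a few lines of Python. This is a minimal sketch: it uses only the `-m`, `-p` and `-t` flags shown in the command above, and the helper names `build_bark_command` and `run_bark` are hypothetical, not part of the repository.

```python
import subprocess
from pathlib import Path

# Path of the example binary produced by the CMake build described above.
DEFAULT_BINARY = "./build/examples/main/main"


def build_bark_command(model_path, prompt, threads=4, binary=DEFAULT_BINARY):
    """Return the argv list for one inference run (hypothetical helper)."""
    return [str(binary), "-m", str(model_path), "-p", prompt, "-t", str(threads)]


def run_bark(model_path, prompt, threads=4, binary=DEFAULT_BINARY):
    """Run inference once, failing early if the converted weights are missing."""
    if not Path(model_path).is_file():
        raise FileNotFoundError(f"model weights not found: {model_path}")
    cmd = build_bark_command(model_path, prompt, threads, binary)
    # check=True raises CalledProcessError if the binary exits with an error.
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains quotes or other special characters.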

### (Optional) Quantize weights

Weights can be quantized using one of the following strategies: `q4_0`, `q4_1`, `q5_0`, `q5_1`, `q8_0`.

Note that the codec model is not quantized, in order to preserve audio quality. The bulk of the computation is in the forward pass of the GPT models, so those are the weights that benefit from quantization.

```bash
./build/examples/quantize/quantize ./ggml_weights.bin ./ggml_weights_q4.bin q4_0
```
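
To give an intuition for what a strategy such as `q4_0` does, here is a simplified sketch of block-wise 4-bit quantization. It is an illustration under stated assumptions, not the code used by `bark.cpp` or ggml: it assumes blocks of 32 weights sharing one scale, keeps the scale as a Python float, and does not pack two codes per byte. All function names are hypothetical.

```python
import math

BLOCK_SIZE = 32  # assumption: 32 weights share one scale, as in a q4_0-style block


def quantize_block(block):
    """Map one block of floats to a scale and a list of 4-bit codes in 0..15."""
    extreme = max(block, key=abs)      # signed value with the largest magnitude
    scale = extreme / -8.0             # the extreme value maps to code 0
    inv = 1.0 / scale if scale != 0.0 else 0.0
    codes = [min(15, max(0, math.floor(x * inv + 8.5))) for x in block]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from one quantized block."""
    return [(code - 8) * scale for code in codes]


def bits_per_weight(scale_bits=16):
    """Storage cost per weight, assuming 4-bit codes plus one scale per block."""
    return (BLOCK_SIZE * 4 + scale_bits) / BLOCK_SIZE
```

With a 16-bit scale per block, this scheme costs 4.5 bits per weight instead of 16 for F16. The reconstruction error of each weight is bounded by the magnitude of its block's scale.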

### Seminal papers

- Bark
  - [Text Prompted Generative Audio](https://github.com/suno-ai/bark)
- Encodec
  - [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438)
- GPT-3
  - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)

### Statistics

Here are some statistics measured on a MacBook Pro M1. They were obtained by generating audio for the prompt *"this is the last random sentence I will be writing and I am going to stop mid-sent"*. All values are in milliseconds per token; empty cells have not been measured.

**Bark-small**

|ms/token       | Semantic          | Coarse          | Fine          |
|---------------|-------------------|-----------------|---------------|
| Python        |                   |                 |               |
| F16           |              4.77 |           13.48 |          0.49 |
| Q8_0          |              4.47 |           14.80 |          0.90 |
| Q5_0          |              3.79 |           13.41 |          0.92 |
| Q5_1          |              3.76 |           12.71 |          0.95 |
| Q4_0          |              3.73 |           13.47 |          0.88 |
| Q4_1          |              3.54 |           11.90 |          0.84 |

**Bark**

|ms/token       | Semantic          | Coarse          | Fine          |
|---------------|-------------------|-----------------|---------------|
| Python        |                   |                 |               |
| F16           |                   |                 |               |
| Q8_0          |             12.29 |           42.99 |          2.61 |
| Q5_0          |             12.84 |           44.71 |          2.75 |
| Q5_1          |             13.29 |           48.87 |          2.75 |
| Q4_0          |             14.53 |           51.33 |          2.68 |
| Q4_1          |             12.75 |           50.96 |          2.80 |
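
To compare the rows more easily, the latencies can be turned into speedups relative to a baseline. The short sketch below does this for the Bark-small table, using F16 as the baseline; the numbers are copied from the table and the function name is illustrative.

```python
# ms/token for Bark-small, copied from the table above.
BARK_SMALL_MS_PER_TOKEN = {
    "F16":  {"semantic": 4.77, "coarse": 13.48, "fine": 0.49},
    "Q8_0": {"semantic": 4.47, "coarse": 14.80, "fine": 0.90},
    "Q5_0": {"semantic": 3.79, "coarse": 13.41, "fine": 0.92},
    "Q5_1": {"semantic": 3.76, "coarse": 12.71, "fine": 0.95},
    "Q4_0": {"semantic": 3.73, "coarse": 13.47, "fine": 0.88},
    "Q4_1": {"semantic": 3.54, "coarse": 11.90, "fine": 0.84},
}


def speedup_over_baseline(table, baseline="F16"):
    """Return baseline latency divided by each row's latency (above 1 is faster)."""
    base = table[baseline]
    return {
        name: {stage: base[stage] / ms for stage, ms in row.items()}
        for name, row in table.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, stages in speedup_over_baseline(BARK_SMALL_MS_PER_TOKEN).items():
        summary = ", ".join(f"{stage} x{value:.2f}" for stage, value in stages.items())
        print(f"{name}: {summary}")
```

With these numbers, `Q4_1` is about 1.35 times faster than F16 on the semantic stage, while every quantized variant is slower than F16 on the fine stage.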


### Contributing

`bark.cpp` is an ongoing effort that relies on the community to last and evolve. Your contributions are welcome and highly valuable. A contribution can be a:

- bug report: you may encounter a bug while using `bark.cpp`. Don't hesitate to report it in the issue section.
- feature request: you may want to add a new model or support a new platform. You can use the issue section to make suggestions.
- pull request: you may have fixed a bug, added a feature, or even fixed a small typo in the documentation. You can submit a pull request and a reviewer will reach out to you.

### Coding guidelines

- Avoid adding third-party dependencies, extra files, extra headers, etc.
- Always consider cross-compatibility with other operating systems and architectures.