# WeNet Reproduction and Innovation Work

**Repository Path**: vickyuuun/wenet

## Basic Information

- **Project Name**: WeNet reproduction and innovation work
- **Description**: Final project for the 2025 Speech Information Processing Technology course at Nankai University: WeNet code reproduction plus innovation
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-06-16
- **Last Updated**: 2025-06-22

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Performance Record

## Conformer Result

* Feature info: using fbank feature, dither, cmvn, online speed perturb
* Training info: lr 0.002, batch size 18, 4 gpu, acc_grad 4, 240 epochs, dither 0.1
* Decoding info: ctc_weight 0.5, average_num 20
* Git hash: 919f07c4887ac500168ba84b39b535fd8e58918a

| decoding mode             | CER  |
|---------------------------|------|
| attention decoder         | 5.18 |
| ctc greedy search         | 4.94 |
| ctc prefix beam search    | 4.94 |
| attention rescoring       | 4.61 |
| LM + attention rescoring  | 4.36 |

## U2++ Conformer Result

* Feature info: using fbank feature, dither=1.0, cmvn, online speed perturb
* Training info: lr 0.001, batch size 16, 8 gpu, acc_grad 1, 360 epochs
* Decoding info: ctc_weight 0.3, reverse_weight 0.5, average_num 30, lm_scale 0.7, decoder_scale 0.1, r_decoder_scale 0.7
* Git hash: 5a1342312668e7a5abb83aed1e53256819cebf95

| decoding mode/chunk size                 | full | 16   |
|------------------------------------------|------|------|
| ctc greedy search                        | 5.19 | 5.81 |
| ctc prefix beam search                   | 5.17 | 5.81 |
| attention rescoring                      | 4.63 | 5.05 |
| LM + attention rescoring                 | 4.40 | 4.75 |
| HLG(k2 LM)                               | 4.81 | 5.27 |
| HLG(k2 LM) + attention rescoring         | 4.32 | 4.70 |
| HLG(k2 LM) + attention rescoring + LFMMI | 4.11 | 4.47 |

## U2++ lite Conformer Result (uio shard)

* Feature info: using fbank feature, dither=1.0, cmvn, online speed perturb
* Training info: lr 0.001, batch size 16, 8 gpu, acc_grad 1, load a well-trained model and continue training for 80 epochs with the u2++ lite config
* Decoding info: ctc_weight 0.3, reverse_weight 0.5, average_num 30
* Git hash: 73185808fa1463b0163a922dc722513b7baabe9e

| decoding mode/chunk size | full | 16   |
|--------------------------|------|------|
| ctc greedy search        | 5.21 | 5.91 |
| ctc prefix beam search   | 5.20 | 5.91 |
| attention rescoring      | 4.67 | 5.10 |

## Unified Conformer Result

* Feature info: using fbank feature, dither=0, cmvn, online speed perturb
* Training info: lr 0.001, batch size 16, 8 gpu, acc_grad 1, 180 epochs, dither 0.0
* Decoding info: ctc_weight 0.5, average_num 20
* Git hash: 919f07c4887ac500168ba84b39b535fd8e58918a

| decoding mode/chunk size | full | 16   | 8    | 4    |
|--------------------------|------|------|------|------|
| attention decoder        | 5.40 | 5.60 | 5.74 | 5.86 |
| ctc greedy search        | 5.56 | 6.29 | 6.68 | 7.10 |
| ctc prefix beam search   | 5.57 | 6.30 | 6.67 | 7.10 |
| attention rescoring      | 5.05 | 5.45 | 5.69 | 5.91 |
| LM + attention rescoring | 4.73 | 5.08 | 5.22 | 5.38 |
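The `attention rescoring` rows above rerank the n-best hypotheses from CTC prefix beam search with the attention decoder, mixing the two scores with `ctc_weight`; U2++ models additionally weight a right-to-left decoder score with `reverse_weight`. Below is a minimal, illustrative Python sketch of that score fusion. The function name `rescore_hypotheses` and its arguments are assumptions made for illustration, not WeNet's actual API.

```python
from typing import List, Tuple

def rescore_hypotheses(
    hyps: List[Tuple[List[int], float]],  # (token ids, CTC score) from prefix beam search
    decoder_scores: List[float],          # left-to-right attention decoder score per hypothesis
    r_decoder_scores: List[float],        # right-to-left decoder score (U2++ only), else zeros
    ctc_weight: float = 0.5,
    reverse_weight: float = 0.0,
) -> List[int]:
    """Return the hypothesis with the best fused score (illustrative sketch)."""
    best_score, best_hyp = float("-inf"), []
    for (tokens, ctc_score), score, r_score in zip(hyps, decoder_scores, r_decoder_scores):
        # Blend forward and reverse decoder scores, then add the weighted CTC score.
        fused = score * (1.0 - reverse_weight) + r_score * reverse_weight
        fused += ctc_weight * ctc_score
        if fused > best_score:
            best_score, best_hyp = fused, tokens
    return best_hyp
```

With `reverse_weight = 0` this reduces to plain left-to-right rescoring. The `LM + attention rescoring` rows additionally mix in a language-model score using the listed `lm_scale`, `decoder_scale`, and `r_decoder_scale`; that part is omitted from the sketch.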
## U2++ Transformer Result

* Feature info: using fbank feature, dither, cmvn, online speed perturb
* Training info: lr 0.001, batch size 26, 8 gpu, acc_grad 1, 360 epochs, dither 0.1
* Decoding info: ctc_weight 0.2, reverse_weight 0.5, average_num 30
* Git hash: 65270043fc8c2476d1ab95e7c39f730017a670e0

| decoding mode/chunk size | full | 16   |
|--------------------------|------|------|
| ctc greedy search        | 6.05 | 6.92 |
| ctc prefix beam search   | 6.05 | 6.90 |
| attention rescoring      | 5.11 | 5.63 |
| LM + attention rescoring | 4.82 | 5.24 |

## Transformer Result

* Feature info: using fbank feature, dither, with cmvn, online speed perturb
* Training info: lr 0.002, batch size 26, 4 gpu, acc_grad 4, 240 epochs, dither 0.1
* Decoding info: ctc_weight 0.5, average_num 20
* Git hash: 919f07c4887ac500168ba84b39b535fd8e58918a

| decoding mode             | CER  |
|---------------------------|------|
| attention decoder         | 5.69 |
| ctc greedy search         | 5.92 |
| ctc prefix beam search    | 5.91 |
| attention rescoring       | 5.30 |
| LM + attention rescoring  | 5.04 |

## Unified Transformer Result

* Feature info: using fbank feature, dither=0, with cmvn, online speed perturb
* Training info: lr 0.002, batch size 16, 4 gpu, acc_grad 1, 240 epochs, dither 0.1
* Decoding info: ctc_weight 0.5, average_num 20
* Git hash: 919f07c4887ac500168ba84b39b535fd8e58918a

| decoding mode/chunk size | full | 16   | 8    | 4    |
|--------------------------|------|------|------|------|
| attention decoder        | 6.04 | 6.35 | 6.45 | 6.70 |
| ctc greedy search        | 6.28 | 6.99 | 7.39 | 7.89 |
| ctc prefix beam search   | 6.28 | 6.98 | 7.40 | 7.89 |
| attention rescoring      | 5.52 | 6.05 | 6.28 | 6.62 |
| LM + attention rescoring | 5.11 | 5.59 | 5.86 | 6.17 |

## AMP Training Transformer Result

* Feature info: using fbank feature, dither, cmvn, online speed perturb
* Training info: lr 0.002, batch size, 4 gpus, acc_grad 4, 240 epochs, dither 0.1, warm up steps 25000
* Decoding info: ctc_weight 0.5, average_num 20
* Git hash: 1bb4e5a269c535340fae5b0739482fa47733d2c1

| decoding mode          | CER  |
|------------------------|------|
| attention decoder      | 5.73 |
| ctc greedy search      | 5.92 |
| ctc prefix beam search | 5.92 |
| attention rescoring    | 5.31 |

## Multi-machine Training Conformer Result

* Feature info: using fbank feature, dither, cmvn, online speed perturb
* Training info: lr 0.004, batch size 16, 2 machines, 8\*2=16 gpus, acc_grad 4, 240 epochs, dither 0.1, warm up steps 10000
* Decoding info: ctc_weight 0.5, average_num 20
* Git hash: f6b1409023440da1998d31abbcc3826dd40aaf35

| decoding mode          | CER  |
|------------------------|------|
| attention decoder      | 4.90 |
| ctc greedy search      | 5.07 |
| ctc prefix beam search | 5.06 |
| attention rescoring    | 4.65 |

## Conformer with/without Position Encoding Result

* Feature info: using fbank feature, dither, cmvn, online speed perturb
* Training info: lr 0.002, batch size 16, 8 gpu, acc_grad 4, 240 epochs, dither 0.1
* Decoding info: ctc_weight 0.5, average_num 20

| decoding mode          | with PE | without PE |
|------------------------|---------|------------|
| attention decoder      | 5.18    | 5.73       |
| ctc greedy search      | 4.94    | 4.97       |
| ctc prefix beam search | 4.94    | 4.97       |
| attention rescoring    | 4.61    | 4.69       |
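Every `Decoding info` line above includes an `average_num` (20 or 30): the parameters of that many checkpoints are averaged before decoding. Below is a minimal sketch of such parameter averaging, assuming plain PyTorch checkpoints saved as state dicts; the file paths in the usage comment are hypothetical.

```python
import torch

def average_checkpoints(paths):
    """Average parameters across several checkpoints saved as state dicts."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().to(torch.float64) for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.to(torch.float64)
    # Divide by the number of checkpoints and restore the original dtypes.
    return {k: (v / len(paths)).to(state[k].dtype) for k, v in avg.items()}

# Hypothetical usage: average the last 20 epochs (average_num 20).
# avg_state = average_checkpoints([f"exp/epoch_{i}.pt" for i in range(220, 240)])
# torch.save(avg_state, "exp/avg_20.pt")
```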
## Efficient Conformer v1 Result

* Feature info:
    * using fbank feature, cmvn, speed perturb, dither
* Training info:
    * train_u2++_efficonformer_v1.yaml
    * 8 gpu, batch size 16, acc_grad 1, 200 epochs
    * lr 0.001, warmup_steps 25000
* Model info:
    * Model Params: 48,488,347
    * Downsample rate: 1/4 (conv2d) * 1/2 (efficonformer block)
    * encoder_dim 256, output_size 256, head 8, linear_units 2048
    * num_blocks 12, cnn_module_kernel 15, group_size 3
* Decoding info:
    * ctc_weight 0.5, reverse_weight 0.3, average_num 20
* Model Download: [wenet_efficient_conformer_aishell_v1](https://huggingface.co/58AILab/wenet_efficient_conformer_aishell_v1)

| decoding mode/chunk size | full | 18   | 16   |
|--------------------------|------|------|------|
| attention decoder        | 4.99 | 5.13 | 5.16 |
| ctc prefix beam search   | 4.98 | 5.23 | 5.23 |
| attention rescoring      | 4.64 | 4.86 | 4.85 |

## Efficient Conformer v2 Result

* Feature info:
    * using fbank feature, cmvn, speed perturb, dither
* Training info:
    * train_u2++_efficonformer_v2.yaml
    * 8 gpu, batch size 16, acc_grad 1, 200 epochs
    * lr 0.001, warmup_steps 25000
* Model info:
    * Model Params: 49,354,651
    * Downsample rate: 1/2 (conv2d2) * 1/4 (efficonformer block)
    * encoder_dim 256, output_size 256, head 8, linear_units 2048
    * num_blocks 12, cnn_module_kernel 15, group_size 3
* Decoding info:
    * ctc_weight 0.5, reverse_weight 0.3, average_num 20
* Model Download: [wenet_efficient_conformer_aishell_v2](https://huggingface.co/58AILab/wenet_efficient_conformer_aishell_v2)

| decoding mode/chunk size | full | 18   | 16   |
|--------------------------|------|------|------|
| attention decoder        | 4.87 | 5.03 | 5.07 |
| ctc prefix beam search   | 4.97 | 5.18 | 5.20 |
| attention rescoring      | 4.56 | 4.75 | 4.77 |

## U2++ Branchformer Result

* Feature info: using fbank feature, dither=1.0, cmvn, online speed perturb
* Model info:
    * Model Params: 48,384,667
    * Num Encoder Layer: 24
    * CNN Kernel Size: 63
    * Merge Method: concat
* Training info: lr 0.001, weight_decay 0.000001, batch size 16, 3 gpu, acc_grad 1, 360 epochs
* Decoding info: ctc_weight 0.3, reverse_weight 0.5, average_num 30, lm_scale 0.7, decoder_scale 0.1, r_decoder_scale 0.7
* Git hash: 5a1342312668e7a5abb83aed1e53256819cebf95

| decoding mode             | CER  |
|---------------------------|------|
| ctc greedy search         | 5.28 |
| ctc prefix beam search    | 5.28 |
| attention decoder         | 5.12 |
| attention rescoring       | 4.81 |
| LM + attention rescoring  | 4.46 |

## E-Branchformer Result

* Feature info: using fbank feature, dither=1.0, cmvn, online speed perturb
* Model info:
    * Model Params: 47,570,132
    * Num Encoder Layer: 17
    * CNN Kernel Size: 31
* Training info: lr 0.001, weight_decay 0.000001, batch size 16, 4 gpu, acc_grad 1, 240 epochs
* Decoding info: ctc_weight 0.3, average_num 30
* Git hash: 89962d1dcae18dd3a281782a40e74dd2721ae8fe

| decoding mode             | CER  |
|---------------------------|------|
| attention decoder         | 4.73 |
| ctc greedy search         | 4.77 |
| ctc prefix beam search    | 4.77 |
| attention rescoring       | 4.39 |
| LM + attention rescoring  | 4.22 |
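`ctc greedy search`, reported in almost every table above, simply takes the arg-max CTC label per frame, collapses consecutive repeats, and drops blanks. A minimal sketch, assuming frame-level log-probabilities from the CTC head and a blank id of 0 (both assumptions for illustration):

```python
import torch

def ctc_greedy_search(log_probs: torch.Tensor, blank_id: int = 0) -> list:
    """log_probs: (T, V) frame-level log-probabilities from the CTC output layer."""
    best_path = log_probs.argmax(dim=-1).tolist()  # best label per frame
    tokens, prev = [], blank_id
    for label in best_path:
        # Collapse consecutive repeats and remove blank labels.
        if label != blank_id and label != prev:
            tokens.append(label)
        prev = label
    return tokens
```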