# voice_benchmark

**Repository Path**: taj5/voice_benchmark

## Basic Information

- **Project Name**: voice_benchmark
- **Description**: Speech benchmark, including metric evaluation scripts (sentence error rate, word error rate), an audio-duration statistics script, an audio format conversion script, and more
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-11-04
- **Last Updated**: 2024-11-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

#### Introduction

A benchmark for offline speech transcription (*input: an audio file; output: the transcribed text*). It includes metric evaluation scripts (sentence error rate, word error rate), an audio-duration statistics script, an audio format conversion script, and more.

For details, see the [Feishu document](https://x9k4wliip8.feishu.cn/docx/LkLwd5QuqoETAdxPXZocVCHinb4?from=from_copylink).

#### Directory Structure

```text
voice_benchmark
├── data                  # datasets
├── models                # models
├── docs                  # documentation
├── README.md             # this file
└── src
    ├── benchmark.py      # evaluation
    └── utils             # utility code
        ├── audio_cvt.py  # audio format conversion
        └── audio_stat.py # audio statistics
```

#### Datasets

The `data` directory contains speech datasets for different purposes:

- `test.json`: test set
- `train.json`: training set
- `val.json`: validation set

Sample format:

field | description | type | example
---|---|---|---
audio_path | Path to the audio file | string | audio/000002.wav
text | Transcript label for the audio | string | 打开3306摄像头
tags | Tags describing the content type of the utterance | list | ["number", "chinese"]
duration | Audio duration, in seconds | float | 5.23
language | Language or dialect | string | sichuanhua
from | Source of the sample: `human` or `ai` | string | human
created_at | Date the sample was created | string | 2024-11-04

```json
[
  {
    "audio_path": "audio/000002.wav",
    "text": "打开3306A摄像头",
    "tags": ["chinese", "number", "alphabet"],
    "duration": 2.16,
    "language": "sichuanhua",
    "from": "human",
    "created_at": "2024-11-04"
  }
]
```

#### Usage

- Client/server mode
  - Start the speech recognition server (e.g. funasr, iflytek, kaldi)
  - Start the test client
- Local invocation (requires a powerful local GPU)
  - `python src/benchmark.py`

#### Test Results

#### Evaluation Metrics

1. [Character Error Rate (CER)](https://zhuanlan.zhihu.com/p/114414797): the metric commonly used for Chinese.
2. Word Error Rate (WER): commonly used for English speech recognition.
3. Sentence Error Rate (SER): the fraction of sentences that contain at least one error.

#### Text Normalization / Inverse Normalization

- `pip install funtextprocessing -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html`
- `pip install WeTextProcessing`

#### Model Notes

1. Short-audio version: Paraformer speech recognition, Chinese, general-purpose, 16k, offline, large, PyTorch
   - `iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch`
2. Long-audio version: VAD + punctuation + ASR
   - `iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch`
3. Short-audio hotword version: contextual
   - `iic/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404`
4. Short-audio hotword version: SeACo
   - `iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch`

```python
# Short model aliases mapped to ModelScope model IDs (alias table as used by FunASR)
name_maps_ms = {
    "paraformer": "iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    "paraformer-zh": "iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    "paraformer-en": "iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020",
    "paraformer-en-spk": "iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020",
    "paraformer-zh-streaming": "iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online",
    "fsmn-vad": "iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    "ct-punc": "iic/punc_ct-transformer_cn-en-common-vocab471067-large",
    "ct-punc-c": "iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
    "fa-zh": "iic/speech_timestamp_prediction-v1-16k-offline",
    "cam++": "iic/speech_campplus_sv_zh-cn_16k-common",
    "Whisper-large-v2": "iic/speech_whisper-large_asr_multilingual",
    "Whisper-large-v3": "iic/Whisper-large-v3",
    "Qwen-Audio": "Qwen/Qwen-Audio",
    "emotion2vec_plus_large": "iic/emotion2vec_plus_large",
    "emotion2vec_plus_base": "iic/emotion2vec_plus_base",
    "emotion2vec_plus_seed": "iic/emotion2vec_plus_seed",
    "Whisper-large-v3-turbo": "iic/Whisper-large-v3-turbo",
}
```
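The contents of `src/utils/audio_cvt.py` are not shown in this README. As a minimal sketch of format conversion, assuming `ffmpeg` is installed and on `PATH`, the helper below resamples any input to 16 kHz (the `16k` in the Chinese model IDs above). Downmixing to mono and writing 16-bit PCM WAV are assumptions, and `build_ffmpeg_cmd` and `convert_to_16k_wav` are hypothetical names, not the repository's API.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command converting `src` to mono 16-bit PCM WAV (hypothetical helper)."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file, any format ffmpeg can decode
        "-ar", str(sample_rate),  # resample; 16000 matches the "16k" models
        "-ac", "1",               # downmix to a single channel (assumption)
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM (assumption)
        dst,
    ]


def convert_to_16k_wav(src: str, dst: str, sample_rate: int = 16000) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(src, dst, sample_rate), check=True)
```

Usage, with hypothetical paths: `convert_to_16k_wav("raw/000002.m4a", "data/audio/000002.wav")`. Building the argument list separately keeps the command inspectable and testable without invoking `ffmpeg`.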
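`src/utils/audio_stat.py` is likewise not reproduced here. A minimal sketch of duration statistics, assuming the audio is uncompressed PCM WAV so that Python's standard `wave` module can read it: duration in seconds is the frame count divided by the sample rate. `wav_duration` and `duration_stats` are hypothetical names.

```python
from __future__ import annotations

import wave
from pathlib import Path


def wav_duration(path: str | Path) -> float:
    """Duration in seconds = number of frames / sample rate (PCM WAV only)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def duration_stats(paths) -> dict:
    """Count, total, min, max and mean duration (seconds) over WAV files."""
    durations = [wav_duration(p) for p in paths]
    if not durations:
        return {"count": 0, "total": 0.0, "min": 0.0, "max": 0.0, "mean": 0.0}
    total = sum(durations)
    return {
        "count": len(durations),
        "total": total,
        "min": min(durations),
        "max": max(durations),
        "mean": total / len(durations),
    }
```

Usage, with a hypothetical directory: `duration_stats(sorted(Path("data/audio").glob("*.wav")))`. Compressed formats such as MP3 would first need converting to WAV.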
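The dataset files such as `data/test.json` hold a JSON list of samples with the fields listed in the sample-format table. A minimal sketch of loading a split and checking each sample against that table; `load_samples` and `hours_per_language` are hypothetical helpers, and treating every field as required is an assumption rather than something `benchmark.py` is documented to do.

```python
import json
from collections import defaultdict

# Expected Python type for each field in the sample-format table.
FIELD_TYPES = {
    "audio_path": str,
    "text": str,
    "tags": list,
    "duration": (int, float),
    "language": str,
    "from": str,
    "created_at": str,
}


def load_samples(path: str) -> list:
    """Load a JSON list of samples and validate field presence and types."""
    with open(path, encoding="utf-8") as f:
        samples = json.load(f)
    for i, sample in enumerate(samples):
        for field, expected in FIELD_TYPES.items():
            if field not in sample:
                raise ValueError(f"sample {i}: missing field {field!r}")
            if not isinstance(sample[field], expected):
                raise TypeError(
                    f"sample {i}: field {field!r} has type {type(sample[field]).__name__}"
                )
        if sample["from"] not in ("human", "ai"):
            raise ValueError(f"sample {i}: 'from' must be 'human' or 'ai'")
    return samples


def hours_per_language(samples: list) -> dict:
    """Sum the `duration` field (seconds) per language and convert to hours."""
    totals = defaultdict(float)
    for s in samples:
        totals[s["language"]] += s["duration"]
    return {lang: secs / 3600 for lang, secs in totals.items()}
```

Summing the stored `duration` field avoids re-reading the audio; it assumes the field is kept in sync with the files.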
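The three evaluation metrics can be computed from the Levenshtein edit distance: CER = (S + D + I) / N, where S, D and I are the character substitutions, deletions and insertions needed to turn the reference into the hypothesis and N is the number of reference characters; WER is the same over words; SER is the fraction of sentences with any error. A minimal corpus-level sketch, not the code in `src/benchmark.py`; stripping whitespace before CER and splitting words on whitespace for WER are assumptions.

```python
from __future__ import annotations


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (chars or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free if equal)
            )
        prev = cur
    return prev[-1]


def corpus_error_rates(refs: list[str], hyps: list[str]) -> dict[str, float]:
    """Corpus-level CER, WER and SER: total errors / total reference units."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    char_errs = char_total = word_errs = word_total = sent_errs = 0
    for ref, hyp in zip(refs, hyps):
        ref_chars = "".join(ref.split())  # drop whitespace for CER (assumption)
        hyp_chars = "".join(hyp.split())
        char_errs += edit_distance(ref_chars, hyp_chars)
        char_total += len(ref_chars)
        ref_words, hyp_words = ref.split(), hyp.split()
        word_errs += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        sent_errs += int(ref_words != hyp_words)  # any difference = sentence error
    n = len(refs)
    return {
        "cer": char_errs / char_total if char_total else 0.0,
        "wer": word_errs / word_total if word_total else 0.0,
        "ser": sent_errs / n if n else 0.0,
    }
```

For the sample in the dataset section, a hypothesis `打开3306摄像头` against the reference `打开3306A摄像头` gives one deletion over 10 characters, so CER = 0.1, while whitespace-based WER is 1.0 because the unsegmented Chinese sentence is a single "word". This is why CER, not WER, is the usual metric for Chinese.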
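Before computing CER/WER/SER it is common to normalize both reference and hypothesis so that formatting differences are not counted as recognition errors. The sketch below is a simple stand-in using only the standard library, not the funtextprocessing or WeTextProcessing API: it folds full-width characters with NFKC, drops Unicode punctuation, collapses whitespace and lowercases letters. It does not convert between spoken and written number forms (e.g. 三三零六 vs 3306); that is what the TN/ITN packages listed above are for. `normalize_for_scoring` is a hypothetical name.

```python
import unicodedata


def normalize_for_scoring(text: str) -> str:
    """Light normalization applied to both sides before scoring (hypothetical helper)."""
    # NFKC folds full-width forms, e.g. "３３０６" -> "3306", "Ａ" -> "A".
    text = unicodedata.normalize("NFKC", text)
    # Drop every character whose Unicode category is punctuation (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    # Collapse runs of whitespace to a single space, then lowercase.
    return " ".join(text.split()).lower()
```

Apply it to both references and hypotheses; normalizing only one side would inflate the error rates.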