modelscope/FunASR
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
🐛 Wheel packaging (fixes #2943)
- FunASR now publishes a `py3-none-any` wheel alongside the source distribution. Installation is faster since pip no longer needs to build from source.
🐛 Bug fixes
- SenseVoice + speaker diarization: Fixed crash when using `spk_model="cam++"` with SenseVoice (auto-falls back to VAD-segment mode since SenseVoice doesn't produce word-level timestamps)
- torchaudio >= 2.11 compatibility: Added `soundfile` as intermediate fallback for users with newer torchaudio versions that removed legacy backends
📦 Install / Upgrade
- ```bash
- pip install --upgrade funasr
- ```
- Full changelog: https://github.com/modelscope/FunASR/compare/v1.3.3...v1.3.9
📦 Highlights
- This release makes FunASR a drop-in speech backend for AI agents.
✨ New: `funasr-server` CLI
- ```bash
- pip install funasr fastapi uvicorn python-multipart
- funasr-server --device cuda
- ```
- One command starts an OpenAI-compatible `/v1/audio/transcriptions` endpoint.
✨ New: MCP Server
- AI assistants (Claude, Cursor, Windsurf) can now transcribe audio directly.
✨ New: OpenAI-Compatible API
- Works with any agent framework: LangChain, AutoGen, CrewAI, Dify, Flowise, Open WebUI.
- ```python
- from openai import OpenAI
- client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
- result = client.audio.transcriptions.create(model="sensevoice", file=open("a.wav","rb"))
- ```
🐛 Bug Fixes
- Fixed `hub="hf"` parameter propagation to sub-models (v1.3.2)
- Fixed Qwen3-ASR ImportError masking
📦 Upgrade
- ```bash
- pip install --upgrade funasr
- ```
📦 Links
- [Agent Integration Guide](https://modelscope.github.io/FunASR/agent.html)
- [OpenAI API Docs](https://github.com/modelscope/FunASR/tree/main/examples/openai_api)
- [MCP Server Docs](https://github.com/modelscope/FunASR/tree/main/examples/mcp_server)
- [Benchmark](https://modelscope.github.io/FunASR/benchmark.html)
🐛 Bug Fix
- Fixed hub parameter propagation — When using `hub="hf"`, the parameter is now correctly forwarded to VAD/PUNC/SPK sub-models. Previously, users on HuggingFace would get 404 errors for sub-models. (#2859)
📦 Improvements
- Updated PyPI metadata with better description, keywords, and project URLs
- Added comprehensive benchmark page: https://modelscope.github.io/FunASR/benchmark.html
📦 Benchmark Results (PyTorch, GPU)
- | Model | Type | Speed |
- |-------|------|-------|
- | SenseVoice-Small | NAR | 170x realtime |
- | Paraformer-Large | NAR | 120x realtime |
- | Whisper-large-v3-turbo | AR | 46x realtime |
- | Fun-ASR-Nano | LLM | 17x realtime |
- | Whisper-large-v3 | AR | 13.4x realtime |
📦 Install / Upgrade
- ```bash
- pip install --upgrade funasr
- ```
📦 Quick Start
- ```python
- from funasr import AutoModel
- model = AutoModel(model="FunAudioLLM/SenseVoiceSmall", hub="hf", vad_model="funasr/fsmn-vad", device="cuda")
- result = model.generate(input="audio.wav")
- ```
📦 2023.3.17, funasr-0.3.0, modelscope-1.4.1
- New Features:
- Added support for GPU runtime solution, [nv-triton](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu), which allows easy export of Paraformer models from ModelScope and deployment as services. We conducted benchmark tests on a single GPU-V100, and achieved an RTF of 0.0032 and a speedup of 300.
- Added support for CPU runtime [quantization solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export), which supports export of quantized ONNX and Libtorch models from ModelScope. We conducted [benchmark](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python) tests on a CPU-8369B, and found that RTF increased by 50% (0.00438 -> 0.00226) and double speedup (228 -> 442).
- Added support for C++ version of the gRPC service deployment solution. The C++ version of ONNXRuntime and quantization solution, provides double higher efficiency compared to the Python runtime, [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/grpc).
- Added streaming inference pipeline to the [16k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary), [8k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-8k-common/summary), with support for audio input streams (>= 10ms) , [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/236).
- Improved the [punctuation prediction model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), resulting in increased accuracy (F-score increased from 55.6 to 56.5).
- Added real-time subtitle example based on gRPC service, using a 2-pass recognition model. [Paraformer streaming](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) model is used to output text in real time, while [Paraformer-large offline model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) is used to correct recognition results, [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc).
- New Models:
- + 8 more
📦 最新更新:
- 2023年3月17日:[funasr-0.3.0](https://github.com/alibaba-damo-academy/FunASR/tree/main), modelscope-1.4.1
- 功能完善:
- 新增GPU runtime方案,[nv-triton](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu),可以将modelscope中Paraformer模型便捷导出,并部署成triton服务,实测,单GPU-V100,RTF为0.0032,吞吐率为300,[benchmark](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu#performance-benchmark)。
- 新增CPU [runtime量化方案](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export),支持从modelscope导出量化版本onnx与libtorch,实测,CPU-8369B,量化后,RTF提升50%(0.00438->0.00226),吞吐率翻倍(228->442),[benchmark](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python)。
- [新增加C++版本grpc服务部署方案](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/grpc),配合C++版本[onnxruntime](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/onnxruntime),以及[量化方案](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export),相比python-runtime性能翻倍。
- [16k VAD模型](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary),[8k VAD模型](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-8k-common/summary),modelscope pipeline,新增加流式推理方式,,最小支持10ms语音输入流,[用法](https://github.com/alibaba-damo-academy/FunASR/discussions/236)。
- 优化[标点预测模型](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary),主观体验标点准确性提升(fscore绝对提升 55.6->56.5)。
- 基于grpc服务,新增实时字幕[demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc),采用2pass识别模型,[Paraformer流式模型](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) 用来上屏,[Paraformer-large离线模型](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)用来纠正识别结果。
- + 9 more
✨ New Contributors
- @dingbig made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/147
- @yuekaizhang made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/161
- @zhuzizyf made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/180
- @znsoftm made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/185
- @songtaoshi made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/227
- Full Changelog: https://github.com/alibaba-damo-academy/FunASR/compare/v0.2.0...v0.3.0
📦 2023.2.17, funasr-0.2.0, modelscope-1.3.0
- We support a new feature, export paraformer models into [onnx and torchscripts](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export) from modelscope. The local finetuned models are also supported.
- We support a new feature, [onnxruntime](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python), you could deploy the runtime without modelscope or funasr, for the [paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) model, the rtf of onnxruntime is 3x speedup(0.110->0.038) on cpu, [details](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/onnxruntime/paraformer/rapid_paraformer#speed).
- We support a new feature, [grpc](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc), you could build the ASR service with grpc, by deploying the modelscope pipeline or onnxruntime.
- We release a new model [paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary), which supports the hotword customization based on the incentive enhancement, and improves the recall and precision of hotwords.
- We optimize the timestamp alignment of [Paraformer-large-long](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), the prediction accuracy of timestamp is much improved, and achieving accumulated average shift (aas) of 74.7ms, [details](https://arxiv.org/abs/2301.12343).
- We release a new model, [8k VAD model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary), which could predict the duration of none-silence speech. It could be freely integrated with any ASR models in [modelscope](https://github.com/alibaba-damo-academy/FunASR/discussions/134).
- We release a new model, [MFCCA](https://www.modelscope.cn/models/NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/summary), a multi-channel multi-speaker model which is independent of the number and geometry of microphones and supports Mandarin meeting transcription.
- We release several new UniASR model: [Southern Fujian Dialect model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-minnan-16k-common-vocab3825/summary), [French model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online/summary), [German model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online/summary), [Vietnamese model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online/summary), [Persian model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online/summary).
- + 4 more
📦 最新更新:
- 2023年2月(2月17号发布):[funasr-0.2.0](https://github.com/alibaba-damo-academy/FunASR/tree/main), modelscope-1.3.0
- 功能完善:
- 新增加模型导出功能,Modelscope中所有Paraformer模型与本地finetune模型,支持一键导出[onnx格式模型与torchscripts格式模型](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export),用于模型部署。
- 新增加Paraformer模型[onnxruntime部署功能](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python),无须安装Modelscope与FunASR,即可部署,cpu实测,onnxruntime推理速度提升近3倍(rtf: 0.110->0.038)。
- 新增加[grpc服务功能](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc),支持对Modelscope推理pipeline进行服务部署,也支持对onnxruntime进行服务部署。
- 优化[Paraformer-large长音频模型](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)时间戳,对badcase时间戳预测准确率有较大幅度提升,平均首尾时间戳偏移74.7ms,[详见论文](https://arxiv.org/abs/2301.12343)。
- 新增加任意VAD模型、ASR模型与标点模型自由组合功能,可以自由组合Modelscope中任意模型以及本地finetune后的模型进行推理,[用法示例](https://github.com/alibaba-damo-academy/FunASR/discussions/134)。
- 优化[标点通用模型](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary),增加标点召回和精度,修复缺少标点等问题。
- + 7 more
✨ New Contributors
- @zjc6666 made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/35
- @lyblsgo made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/37
- @lingyunfly made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/42
- @fangd123 made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/44
- @dyyzhmm made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/48
- @R1ckShi made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/50
- @chenmengzheAAA made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/57
- @ZhihaoDU made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/95
- + 4 more
📦 2023.1.16, funasr-0.1.6
- We release a new version model [Paraformer-large-long](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), which integrate the [VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) model, [ASR](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary),
- [Punctuation](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary) model and timestamp together. The model could take in several hours long inputs.
- We release a new type model, [VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary), which could predict the duration of none-silence speech. It could be freely integrated with any ASR models in [Model Zoo](docs/modelscope_models.md).
- We release a new type model, [Punctuation](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), which could predict the punctuation of ASR models's results. It could be freely integrated with any ASR models in [Model Zoo](docs/modelscope_models.md).
- We release a new model, [Data2vec](https://www.modelscope.cn/models/damo/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/summary), an unsupervised pretraining model which could be finetuned on ASR and other downstream tasks.
- We release a new model, [Paraformer-Tiny](https://www.modelscope.cn/models/damo/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/summary), a lightweight Paraformer model which supports Mandarin command words recognition.
- We release a new type model, [SV](https://www.modelscope.cn/models/damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch/summary), which could extract speaker embeddings and further perform speaker verification on paired utterances. It will be supported for speaker diarization in the future version.
- We improve the pipeline of modelscope to speedup the inference, by integrating the process of build model into build pipeline.
- + 1 more
📦 最新更新
- 2023年1月(1月16号发布):[funasr-0.1.6](https://github.com/alibaba-damo-academy/FunASR/tree/main), modelscope-1.2.0
- 上线新模型:
- [Paraformer-large长音频模型](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary),集成VAD、ASR、标点与时间戳功能,可直接对时长为数小时音频进行识别,并输出带标点文字与时间戳。
- [中文无监督预训练Data2vec模型](https://www.modelscope.cn/models/damo/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/summary),采用Data2vec结构,基于AISHELL-2数据的中文无监督预训练模型,支持ASR或者下游任务微调模型。
- [16k语音端点检测VAD模型](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary),可用于检测长语音片段中有效语音的起止时间点。
- [中文标点预测通用模型](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary),可用于语音识别模型输出文本的标点预测。
- [8K UniASR流式模型](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/summary),[8K UniASR模型](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline/summary),一种流式与离线一体化语音识别模型,进行流式语音识别的同时,能够以较低延时输出离线识别结果来纠正预测文本。
- Paraformer-large基于[AISHELL-1微调模型](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-aishell1-vocab8404-pytorch/summary)、[AISHELL-2微调模型](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-aishell2-vocab8404-pytorch/summary),将Paraformer-large模型分别基于AISHELL-1与AISHELL-2数据微调。
- + 5 more
✨ New Contributors
- @nichongjia-2007 made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/27
- Full Changelog: https://github.com/alibaba-damo-academy/FunASR/compare/v0.1.4...v0.1.6
The is the first release version. 1. Paraformer model could be decoding with batch >1. 2. UniASR model and recipes are new added. 3. Transformer and Conformer are also contained. 4. The inference and finetuning of models in modelscope are more convenience.