modelscope/FunASR

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

7 Releases

Latest: yesterday

v1.3.9: Wheel packaging + SenseVoice speaker diarization fix (Latest)

LauraGPT · May 29, 2026


🐛 Wheel packaging (fixes #2943)

FunASR now publishes a `py3-none-any` wheel alongside the source distribution. Installation is faster since pip no longer needs to build from source.
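
As a rough illustration of what the `py3-none-any` tag means, the sketch below splits a wheel filename into its standard components. The filename `funasr-1.3.9-py3-none-any.whl` is assumed from the wheel naming convention rather than copied from PyPI, and `parse_wheel_filename` is a hypothetical helper, not part of FunASR or pip.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = filename[: -len(".whl")].split("-")
    # Wheel names have 5 parts, or 6 when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of components")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
        # Pure-Python wheels need no compiler and install on any platform.
        "pure_python": abi_tag == "none" and platform_tag == "any",
    }


# Assumed filename, following the wheel naming convention.
info = parse_wheel_filename("funasr-1.3.9-py3-none-any.whl")
print(info["python"], info["abi"], info["platform"], info["pure_python"])
```

A wheel tagged this way contains no compiled extensions, which is why pip can install it without a build step.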

🐛 Bug fixes

SenseVoice + speaker diarization: Fixed a crash when using `spk_model="cam++"` with SenseVoice. FunASR now falls back automatically to VAD-segment mode, since SenseVoice doesn't produce word-level timestamps.
torchaudio >= 2.11 compatibility: Added `soundfile` as an intermediate fallback for users on newer torchaudio versions that removed the legacy backends.
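
The general shape of a backend fallback like the one described for torchaudio and `soundfile` can be sketched as below. This is an illustration of the pattern only, not FunASR's actual loading code: `load_with_fallback` and the two stand-in loaders are hypothetical, and the stand-ins replace real calls so the example runs without either library installed.

```python
from typing import Any, Callable, Sequence, Tuple

Loader = Callable[[str], Any]


def load_with_fallback(path: str, loaders: Sequence[Tuple[str, Loader]]) -> Tuple[str, Any]:
    """Try each loader in order; return the first backend name and result that works."""
    errors = []
    for name, loader in loaders:
        try:
            return name, loader(path)
        except Exception as exc:  # a backend may be missing or unable to decode the file
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all audio backends failed: " + "; ".join(errors))


# Stand-in loaders (hypothetical) so the sketch runs without torchaudio or soundfile.
def broken_backend(path: str):
    raise ImportError("legacy backend removed")


def working_backend(path: str):
    return {"path": path, "sample_rate": 16000}


backend, audio = load_with_fallback(
    "audio.wav", [("torchaudio", broken_backend), ("soundfile", working_backend)]
)
print(backend)
```

In real use the loaders would wrap calls such as `torchaudio.load` and `soundfile.read`.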

📦 Install / Upgrade

```bash
pip install --upgrade funasr
```
Full changelog: https://github.com/modelscope/FunASR/compare/v1.3.3...v1.3.9

v1.3.3: Agent Integration (OpenAI API + MCP Server + funasr-server CLI)

LauraGPT · May 23, 2026


📦 Highlights

This release makes FunASR a drop-in speech backend for AI agents.

✨ New: `funasr-server` CLI

```bash
pip install funasr fastapi uvicorn python-multipart
funasr-server --device cuda
```
One command starts an OpenAI-compatible `/v1/audio/transcriptions` endpoint.
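
For clients that cannot use the OpenAI SDK, the endpoint can in principle be called with a plain multipart POST. The sketch below uses only the standard library. It assumes the server listens on `http://localhost:8000` and accepts `file` and `model` form fields and returns JSON with a `text` key, as the OpenAI transcription API does; none of these details are confirmed by this release note beyond the endpoint path, and `build_multipart` and `transcribe` are hypothetical helpers.

```python
import json
import urllib.request
import uuid


def build_multipart(fields: dict, file_field: str, filename: str, data: bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    chunks = []
    for key, value in fields.items():
        chunks.append(f"--{boundary}\r\n".encode())
        chunks.append(f'Content-Disposition: form-data; name="{key}"\r\n\r\n'.encode())
        chunks.append(f"{value}\r\n".encode())
    chunks.append(f"--{boundary}\r\n".encode())
    chunks.append(
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'.encode()
    )
    chunks.append(b"Content-Type: application/octet-stream\r\n\r\n")
    chunks.append(data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST an audio file to the transcription endpoint and return the text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart({"model": "sensevoice"}, "file", path, f.read())
    request = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Assumes an OpenAI-style JSON response: {"text": "..."}
        return json.load(response)["text"]


# Build a request body from dummy bytes to show the encoding (no network call).
body, content_type = build_multipart({"model": "sensevoice"}, "file", "a.wav", b"RIFF")
print(content_type.split(";")[0])
```

Calling `transcribe("a.wav")` would send the request once a server is running.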

✨ New: MCP Server

AI assistants (Claude, Cursor, Windsurf) can now transcribe audio directly.

✨ New: OpenAI-Compatible API

Works with any agent framework: LangChain, AutoGen, CrewAI, Dify, Flowise, Open WebUI.
```python
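from openai import OpenAI
# Point the OpenAI client at the local funasr-server endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
with open("a.wav", "rb") as audio:  # close the file handle after the request
    result = client.audio.transcriptions.create(model="sensevoice", file=audio)
print(result.text)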
```

🐛 Bug Fixes

Fixed `hub="hf"` parameter propagation to sub-models (v1.3.2)
Fixed Qwen3-ASR ImportError masking

📦 Upgrade

```bash
pip install --upgrade funasr
```

📦 Links

[Agent Integration Guide](https://modelscope.github.io/FunASR/agent.html)
[OpenAI API Docs](https://github.com/modelscope/FunASR/tree/main/examples/openai_api)
[MCP Server Docs](https://github.com/modelscope/FunASR/tree/main/examples/mcp_server)
[Benchmark](https://modelscope.github.io/FunASR/benchmark.html)

v1.3.2: HuggingFace Hub Fix + Performance Benchmark

LauraGPT · May 23, 2026


🐛 Bug Fix

Fixed hub parameter propagation: when using `hub="hf"`, the parameter is now correctly forwarded to the VAD/PUNC/SPK sub-models. Previously, users on HuggingFace would get 404 errors for sub-models. (#2859)
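
The idea behind this fix can be sketched as follows. This is not FunASR's internal code: `build_sub_model_kwargs` is a hypothetical helper, the `punc_model` key and the `"ms"` default are assumptions, and the sketch only shows how a top-level `hub` value might be copied into each sub-model's settings.

```python
def build_sub_model_kwargs(kwargs: dict) -> dict:
    """Return per-sub-model settings that inherit the top-level `hub` value."""
    hub = kwargs.get("hub", "ms")  # assumed default: ModelScope
    sub_models = {}
    for key in ("vad_model", "punc_model", "spk_model"):
        if key in kwargs:
            # Forward the hub so the sub-model is fetched from the same hub as the main model.
            sub_models[key] = {"model": kwargs[key], "hub": hub}
    return sub_models


config = build_sub_model_kwargs(
    {"model": "FunAudioLLM/SenseVoiceSmall", "hub": "hf", "vad_model": "funasr/fsmn-vad"}
)
print(config)
```

Without the forwarding step, a sub-model would be looked up on the default hub, which matches the 404 symptom described above.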

📦 Improvements

Updated PyPI metadata with better description, keywords, and project URLs
Added comprehensive benchmark page: https://modelscope.github.io/FunASR/benchmark.html

📦 Benchmark Results (PyTorch, GPU)

| Model | Type | Speed |
|-------|------|-------|
| SenseVoice-Small | NAR | 170x realtime |
| Paraformer-Large | NAR | 120x realtime |
| Whisper-large-v3-turbo | AR | 46x realtime |
| Fun-ASR-Nano | LLM | 17x realtime |
| Whisper-large-v3 | AR | 13.4x realtime |
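
To make the table concrete, a speed of N times realtime means one hour of audio takes 3600 / N seconds to process. The short sketch below applies that arithmetic to the figures in the table; `processing_seconds` is a hypothetical helper, and the estimate ignores model loading and I/O.

```python
# Speeds from the table above, expressed as multiples of realtime.
SPEEDS = {
    "SenseVoice-Small": 170.0,
    "Paraformer-Large": 120.0,
    "Whisper-large-v3-turbo": 46.0,
    "Fun-ASR-Nano": 17.0,
    "Whisper-large-v3": 13.4,
}


def processing_seconds(audio_seconds: float, speed: float) -> float:
    """Time to transcribe audio of the given length at `speed`x realtime."""
    return audio_seconds / speed


for name, speed in SPEEDS.items():
    print(f"{name}: {processing_seconds(3600, speed):.1f} s per hour of audio")
```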

📦 Install / Upgrade

```bash
pip install --upgrade funasr
```

📦 Quick Start

```python
from funasr import AutoModel
model = AutoModel(model="FunAudioLLM/SenseVoiceSmall", hub="hf", vad_model="funasr/fsmn-vad", device="cuda")
result = model.generate(input="audio.wav")
```

v0.3.0

LauraGPT · March 16, 2023


📦 2023.3.17, funasr-0.3.0, modelscope-1.4.1

New Features:
Added a GPU runtime solution, [nv-triton](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu), which makes it easy to export Paraformer models from ModelScope and deploy them as services. In benchmark tests on a single V100 GPU, it achieved an RTF of 0.0032 and a speedup of 300.
Added a CPU runtime [quantization solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export), which supports exporting quantized ONNX and Libtorch models from ModelScope. In [benchmark](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python) tests on a CPU-8369B, quantization improved RTF by about 50% (0.00438 -> 0.00226) and roughly doubled the speedup (228 -> 442).
Added a C++ version of the gRPC service deployment solution. Combined with the C++ ONNXRuntime and the quantization solution, it is about twice as efficient as the Python runtime, [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/grpc).
Added a streaming inference pipeline for the [16k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) and the [8k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-8k-common/summary), with support for audio input streams as short as 10 ms, [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/236).
Improved the [punctuation prediction model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), raising the F-score from 55.6 to 56.5.
Added a real-time subtitle example based on the gRPC service, using a 2-pass recognition model: the [Paraformer streaming](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) model outputs text in real time, while the [Paraformer-large offline model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) corrects the recognition results, [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc).
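
The RTF and speedup figures quoted above are two views of the same measurement: RTF is processing time divided by audio duration, so the speedup is its reciprocal. The quick check below confirms that 1/0.00438 is about 228 and 1/0.00226 is about 442, and that an RTF of 0.0032 corresponds to roughly the quoted 300. `speedup_from_rtf` is a hypothetical helper for illustration.

```python
def speedup_from_rtf(rtf: float) -> float:
    """RTF is processing time / audio duration, so speedup is its reciprocal."""
    return 1.0 / rtf


measurements = [
    ("Triton GPU (V100)", 0.0032),
    ("CPU before quantization", 0.00438),
    ("CPU after quantization", 0.00226),
]
for label, rtf in measurements:
    print(f"{label}: RTF {rtf} -> about {speedup_from_rtf(rtf):.0f}x realtime")
```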
New Models:
+ 8 more

📦 Latest updates:

March 17, 2023: [funasr-0.3.0](https://github.com/alibaba-damo-academy/FunASR/tree/main), modelscope-1.4.1
Improvements:
Added a GPU runtime solution, [nv-triton](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu), which makes it easy to export Paraformer models from ModelScope and deploy them as a Triton service. Measured on a single V100 GPU, RTF is 0.0032 and throughput is 300, [benchmark](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu#performance-benchmark).
Added a CPU [runtime quantization solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export), which supports exporting quantized ONNX and Libtorch models from ModelScope. Measured on a CPU-8369B, quantization improves RTF by 50% (0.00438 -> 0.00226) and doubles throughput (228 -> 442), [benchmark](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python).
[Added a C++ gRPC service deployment solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/grpc). Combined with the C++ [onnxruntime](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/onnxruntime) and the [quantization solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export), it doubles performance compared with the Python runtime.
The ModelScope pipelines for the [16k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) and the [8k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-8k-common/summary) now support streaming inference, with audio input streams as short as 10 ms, [usage](https://github.com/alibaba-damo-academy/FunASR/discussions/236).
Improved the [punctuation prediction model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary): punctuation is subjectively more accurate (absolute F-score gain, 55.6 -> 56.5).
Added a real-time subtitle [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc) based on the gRPC service, using a 2-pass recognition model: the [Paraformer streaming model](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) displays text on screen, and the [Paraformer-large offline model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) corrects the recognition results.
+ 9 more

✨ New Contributors

@dingbig made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/147
@yuekaizhang made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/161
@zhuzizyf made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/180
@znsoftm made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/185
@songtaoshi made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/227
Full Changelog: https://github.com/alibaba-damo-academy/FunASR/compare/v0.2.0...v0.3.0

v0.2.0

LauraGPT · February 20, 2023


📦 2023.2.17, funasr-0.2.0, modelscope-1.3.0

We added a new feature: exporting Paraformer models from ModelScope to [ONNX and TorchScript](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export). Locally fine-tuned models are also supported.
We added a new feature, [onnxruntime](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python): you can deploy the runtime without ModelScope or FunASR. For the [paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) model, onnxruntime is about 3x faster on CPU (RTF 0.110 -> 0.038), [details](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/onnxruntime/paraformer/rapid_paraformer#speed).
We added a new feature, [grpc](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc): you can build an ASR service with gRPC by deploying either the ModelScope pipeline or onnxruntime.
We released a new model, [paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary), which supports hotword customization based on incentive enhancement and improves the recall and precision of hotwords.
We optimized the timestamp alignment of [Paraformer-large-long](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary): timestamp prediction accuracy is much improved, with an accumulated average shift (AAS) of 74.7 ms, [details](https://arxiv.org/abs/2301.12343).
We released a new model, [8k VAD model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary), which predicts the duration of non-silence speech. It can be freely combined with any ASR model in [modelscope](https://github.com/alibaba-damo-academy/FunASR/discussions/134).
We released a new model, [MFCCA](https://www.modelscope.cn/models/NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/summary), a multi-channel multi-speaker model that is independent of the number and geometry of microphones and supports Mandarin meeting transcription.
We released several new UniASR models: [Southern Fujian Dialect model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-minnan-16k-common-vocab3825/summary), [French model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online/summary), [German model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online/summary), [Vietnamese model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online/summary), [Persian model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online/summary).
+ 4 more
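
The free combination of a VAD model with an ASR model mentioned above amounts to a simple pipeline: detect speech segments, recognize each one, then post-process the text. The sketch below shows that structure with injected callables. It is not the ModelScope pipeline API: `transcribe_long_audio`, the millisecond segment format, and the three toy stand-in models are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms)


def transcribe_long_audio(
    audio: List[float],
    sample_rate: int,
    vad: Callable[[List[float], int], List[Segment]],
    asr: Callable[[List[float]], str],
    punc: Callable[[str], str],
) -> List[Tuple[Segment, str]]:
    """Run VAD, recognize each speech segment, then punctuate the text."""
    results = []
    for start_ms, end_ms in vad(audio, sample_rate):
        # Convert millisecond boundaries to sample indices.
        start = start_ms * sample_rate // 1000
        end = end_ms * sample_rate // 1000
        text = asr(audio[start:end])
        results.append(((start_ms, end_ms), punc(text)))
    return results


# Toy stand-ins for the three models so the sketch runs on its own.
def fake_vad(audio, sample_rate):
    return [(0, 500), (1000, 1500)]


def fake_asr(samples):
    return f"{len(samples)} samples"


def fake_punc(text):
    return text + "."


audio = [0.0] * 32000  # two seconds at 16 kHz
result = transcribe_long_audio(audio, 16000, fake_vad, fake_asr, fake_punc)
print(result)
```

Because each stage is passed in as a callable, any VAD, ASR, or punctuation model with a matching interface can be swapped in independently.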

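📦 Latest updates:

February 2023 (released February 17): [funasr-0.2.0](https://github.com/alibaba-damo-academy/FunASR/tree/main), modelscope-1.3.0
Improvements:
Added model export: all Paraformer models in ModelScope, as well as locally fine-tuned models, can be exported with one command to [ONNX and TorchScript formats](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export) for deployment.
Added [onnxruntime deployment](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python) for Paraformer models, which works without installing ModelScope or FunASR. Measured on CPU, onnxruntime inference is nearly 3x faster (RTF: 0.110 -> 0.038).
Added a [gRPC service](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc) that supports deploying either the ModelScope inference pipeline or onnxruntime as a service.
Optimized the timestamps of the [Paraformer-large long-audio model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary): timestamp prediction accuracy on bad cases improves substantially, with an average start/end timestamp shift of 74.7 ms, [see the paper](https://arxiv.org/abs/2301.12343).
Added free combination of any VAD, ASR, and punctuation models: any model in ModelScope, as well as locally fine-tuned models, can be combined for inference, [usage example](https://github.com/alibaba-damo-academy/FunASR/discussions/134).
Improved the [general punctuation model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), with higher punctuation recall and precision and fixes for missing punctuation.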
+ 7 more

✨ New Contributors

@zjc6666 made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/35
@lyblsgo made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/37
@lingyunfly made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/42
@fangd123 made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/44
@dyyzhmm made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/48
@R1ckShi made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/50
@chenmengzheAAA made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/57
@ZhihaoDU made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/95
+ 4 more

v0.1.6

LauraGPT · January 16, 2023


📦 2023.1.16, funasr-0.1.6

We released a new model version, [Paraformer-large-long](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), which integrates the [VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) model, the [ASR](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) model, the [Punctuation](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary) model, and timestamps. The model can take inputs several hours long.
We released a new type of model, [VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary), which predicts the duration of non-silence speech. It can be freely combined with any ASR model in the [Model Zoo](docs/modelscope_models.md).
We released a new type of model, [Punctuation](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), which predicts punctuation for ASR output. It can be freely combined with any ASR model in the [Model Zoo](docs/modelscope_models.md).
We released a new model, [Data2vec](https://www.modelscope.cn/models/damo/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/summary), an unsupervised pretraining model that can be fine-tuned on ASR and other downstream tasks.
We released a new model, [Paraformer-Tiny](https://www.modelscope.cn/models/damo/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/summary), a lightweight Paraformer model that supports Mandarin command-word recognition.
We released a new type of model, [SV](https://www.modelscope.cn/models/damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch/summary), which extracts speaker embeddings and performs speaker verification on paired utterances. Speaker diarization support is planned for a future version.
We improved the ModelScope pipeline to speed up inference by integrating model building into pipeline building.
+ 1 more

📦 Latest updates

January 2023 (released January 16): [funasr-0.1.6](https://github.com/alibaba-damo-academy/FunASR/tree/main), modelscope-1.2.0
New models:
[Paraformer-large long-audio model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary): integrates VAD, ASR, punctuation, and timestamps; it can directly recognize audio several hours long and output punctuated text with timestamps.
[Chinese unsupervised pretraining Data2vec model](https://www.modelscope.cn/models/damo/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/summary): a Chinese unsupervised pretraining model that uses the Data2vec architecture and is trained on AISHELL-2 data; it supports fine-tuning for ASR or other downstream tasks.
[16k voice activity detection (VAD) model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary): detects the start and end times of valid speech in long audio.
[General Chinese punctuation prediction model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary): predicts punctuation for the text output of speech recognition models.
[8K UniASR streaming model](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/summary) and [8K UniASR model](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline/summary): a unified streaming and offline speech recognition model that, while performing streaming recognition, outputs offline recognition results with low latency to correct the predicted text.
Paraformer-large [AISHELL-1 fine-tuned model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-aishell1-vocab8404-pytorch/summary) and [AISHELL-2 fine-tuned model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-aishell2-vocab8404-pytorch/summary): the Paraformer-large model fine-tuned on AISHELL-1 and AISHELL-2 data, respectively.
+ 5 more

✨ New Contributors

@nichongjia-2007 made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/27
Full Changelog: https://github.com/alibaba-damo-academy/FunASR/compare/v0.1.4...v0.1.6

v0.1.4

LauraGPT · December 10, 2022


This is the first release. 1. Paraformer models can decode with batch size > 1. 2. UniASR models and recipes have been added. 3. Transformer and Conformer models are also included. 4. Inference and fine-tuning of models in ModelScope are more convenient.
