modelscope/FunASR

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

7 Releases
Latest: yesterday
v1.3.9: Wheel packaging + SenseVoice speaker diarization fix (Latest)
LauraGPT · May 29, 2026

🐛 Wheel packaging (fixes #2943)

  • FunASR now publishes a `py3-none-any` wheel alongside the source distribution. Installation is faster since pip no longer needs to build from source.

🐛 Bug fixes

  • SenseVoice + speaker diarization: Fixed crash when using `spk_model="cam++"` with SenseVoice (auto-falls back to VAD-segment mode since SenseVoice doesn't produce word-level timestamps)
  • torchaudio >= 2.11 compatibility: Added `soundfile` as intermediate fallback for users with newer torchaudio versions that removed legacy backends
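
The fallback behaviour described above can be illustrated with a small, generic sketch. This is not FunASR's actual loading code: `load_with_fallback` and the two wrapper functions are hypothetical names, and the only library calls assumed are `torchaudio.load` and `soundfile.read`. Imports happen lazily inside each wrapper, so a missing or broken backend simply moves on to the next loader.

```python
from typing import Any, Callable, Sequence, Tuple


def load_with_fallback(path: str, loaders: Sequence[Tuple[str, Callable[[str], Any]]]):
    """Try each (name, loader) pair in order; return (name, result) of the first success."""
    errors = []
    for name, loader in loaders:
        try:
            return name, loader(path)
        except Exception as exc:  # missing backend, unsupported format, ...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all audio loaders failed: " + "; ".join(errors))


def _torchaudio_load(path: str):
    import torchaudio  # lazy import: an ImportError here triggers the fallback

    return torchaudio.load(path)  # (waveform tensor, sample rate)


def _soundfile_load(path: str):
    import soundfile

    return soundfile.read(path)  # (numpy array, sample rate)


# Hypothetical preference order: torchaudio first, soundfile as the fallback.
DEFAULT_LOADERS = [("torchaudio", _torchaudio_load), ("soundfile", _soundfile_load)]
```

Calling `load_with_fallback("audio.wav", DEFAULT_LOADERS)` returns the name of the backend that worked together with its result, which makes it easy to log which path was taken.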

📦 Install / Upgrade

```bash
pip install --upgrade funasr
```
  • Full changelog: https://github.com/modelscope/FunASR/compare/v1.3.3...v1.3.9
v1.3.3: Agent Integration (OpenAI API + MCP Server + funasr-server CLI)
LauraGPT · May 23, 2026

📦 Highlights

  • This release makes FunASR a drop-in speech backend for AI agents.

New: `funasr-server` CLI

```bash
pip install funasr fastapi uvicorn python-multipart
funasr-server --device cuda
```
  • One command starts an OpenAI-compatible `/v1/audio/transcriptions` endpoint.
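
For clients that cannot use an SDK, the endpoint can in principle be called with a plain multipart POST. The sketch below uses only the Python standard library. It assumes the server accepts the same `model` and `file` form fields as OpenAI's transcription endpoint and replies with JSON; the port `8000` and model name `sensevoice` are taken from the client example in these notes. Both function names are hypothetical.

```python
import json
import urllib.request
import uuid


def build_transcription_request(audio_bytes: bytes, filename: str, model: str):
    """Return (headers, body) for a multipart/form-data POST with `model` and `file` fields."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"--{boundary}--\r\n".encode()
    body = model_part + file_header + audio_bytes + b"\r\n" + closing
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body


def transcribe(path: str,
               url: str = "http://localhost:8000/v1/audio/transcriptions",
               model: str = "sensevoice"):
    """POST a local audio file and return the decoded JSON response (assumed format)."""
    with open(path, "rb") as f:
        headers, body = build_transcription_request(f.read(), path, model)
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server.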

New: MCP Server

  • AI assistants (Claude, Cursor, Windsurf) can now transcribe audio directly.

New: OpenAI-Compatible API

  • Works with any agent framework: LangChain, AutoGen, CrewAI, Dify, Flowise, Open WebUI.
```python
from openai import OpenAI

# Point the client at the local funasr-server; the API key is a placeholder value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
with open("a.wav", "rb") as audio:
    result = client.audio.transcriptions.create(model="sensevoice", file=audio)
```

🐛 Bug Fixes

  • Fixed `hub="hf"` parameter propagation to sub-models (v1.3.2)
  • Fixed Qwen3-ASR ImportError masking

📦 Upgrade

```bash
pip install --upgrade funasr
```

📦 Links

  • [Agent Integration Guide](https://modelscope.github.io/FunASR/agent.html)
  • [OpenAI API Docs](https://github.com/modelscope/FunASR/tree/main/examples/openai_api)
  • [MCP Server Docs](https://github.com/modelscope/FunASR/tree/main/examples/mcp_server)
  • [Benchmark](https://modelscope.github.io/FunASR/benchmark.html)
v1.3.2: HuggingFace Hub Fix + Performance Benchmark
LauraGPT · May 23, 2026

🐛 Bug Fix

  • Fixed hub parameter propagation: when using `hub="hf"`, the parameter is now correctly forwarded to the VAD/PUNC/SPK sub-models. Previously, users loading models from HuggingFace got 404 errors for the sub-models. (#2859)
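
To make the idea of parameter propagation concrete, here is a minimal sketch of how shared options such as `hub` could be copied into each sub-model's configuration. It is purely illustrative and does not reflect FunASR's internal code: `build_submodel_kwargs` and the `prefixes` tuple are hypothetical, and only the `hub`, `vad_model`, and `device` keyword names come from these release notes.

```python
def build_submodel_kwargs(parent_kwargs: dict, prefixes=("vad", "punc", "spk")) -> dict:
    """Return per-sub-model kwargs that inherit shared options from the parent call."""
    # Options every sub-model should share with the main model (assumed set).
    shared = {key: parent_kwargs[key] for key in ("hub", "device") if key in parent_kwargs}
    submodels = {}
    for prefix in prefixes:
        name = parent_kwargs.get(f"{prefix}_model")
        if name is None:
            continue  # this sub-model was not requested
        submodels[prefix] = {"model": name, **shared}
    return submodels


config = build_submodel_kwargs(
    {"model": "FunAudioLLM/SenseVoiceSmall", "hub": "hf", "vad_model": "funasr/fsmn-vad"}
)
print(config)
```

Without the shared options being copied, a sub-model would fall back to its default hub, which matches the 404 symptom described in the fix.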

📦 Improvements

  • Updated PyPI metadata with better description, keywords, and project URLs
  • Added comprehensive benchmark page: https://modelscope.github.io/FunASR/benchmark.html

📦 Benchmark Results (PyTorch, GPU)

| Model | Type | Speed |
|-------|------|-------|
| SenseVoice-Small | NAR | 170x realtime |
| Paraformer-Large | NAR | 120x realtime |
| Whisper-large-v3-turbo | AR | 46x realtime |
| Fun-ASR-Nano | LLM | 17x realtime |
| Whisper-large-v3 | AR | 13.4x realtime |
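
As a rough aid to reading the table, the snippet below converts each speed factor into the time needed to process one hour of audio, assuming "Nx realtime" means N seconds of audio are processed per second of compute. The helper name `processing_seconds` is ours, and real throughput will depend on hardware, batch size, and audio content.

```python
# Speed factors copied from the benchmark table above.
SPEEDS = {
    "SenseVoice-Small": 170,
    "Paraformer-Large": 120,
    "Whisper-large-v3-turbo": 46,
    "Fun-ASR-Nano": 17,
    "Whisper-large-v3": 13.4,
}


def processing_seconds(audio_seconds: float, speed: float) -> float:
    """Estimated compute time for `audio_seconds` of audio at `speed` times realtime."""
    return audio_seconds / speed


for name, speed in SPEEDS.items():
    print(f"{name}: {processing_seconds(3600, speed):.1f} s per hour of audio")
```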

📦 Install / Upgrade

```bash
pip install --upgrade funasr
```

📦 Quick Start

```python
from funasr import AutoModel

model = AutoModel(model="FunAudioLLM/SenseVoiceSmall", hub="hf", vad_model="funasr/fsmn-vad", device="cuda")
result = model.generate(input="audio.wav")
```
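
A possible way to turn the Quick Start result into plain text is sketched below. It rests on two assumptions not stated in these notes: that `generate()` returns a list of dictionaries with a `text` field, and that SenseVoice output may carry inline tags of the form `<|...|>` (language, emotion, event markers). The helper `plain_text` is hypothetical and works on any data of that shape, so it can be tried without a model.

```python
import re

# Matches inline markers such as <|en|> or <|NEUTRAL|> (assumed tag format).
TAG = re.compile(r"<\|[^|]*\|>")


def plain_text(results) -> str:
    """Join the `text` fields of a generate()-style result list, dropping <|...|> tags."""
    return " ".join(TAG.sub("", item["text"]).strip() for item in results)


# Mock result in the assumed shape; a real call would pass `result` from the Quick Start.
mock = [{"key": "audio", "text": "<|en|><|NEUTRAL|><|Speech|>hello world"}]
print(plain_text(mock))
```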
v0.3.0
LauraGPT · March 16, 2023

📦 2023.3.17, funasr-0.3.0, modelscope-1.4.1

  • New Features:
  • Added support for a GPU runtime solution, [nv-triton](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu), which allows easy export of Paraformer models from ModelScope and deployment as services. In benchmark tests on a single V100 GPU, it achieved an RTF of 0.0032 and a speedup of 300.
  • Added support for a CPU runtime [quantization solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export), which supports export of quantized ONNX and Libtorch models from ModelScope. In [benchmark](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python) tests on a CPU-8369B, quantization improved RTF by about 50% (0.00438 -> 0.00226) and roughly doubled the speedup (228 -> 442).
  • Added support for a C++ version of the gRPC service deployment solution. Combined with the C++ version of ONNXRuntime and the quantization solution, it is about twice as efficient as the Python runtime, [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/grpc).
  • Added a streaming inference pipeline for the [16k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) and the [8k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-8k-common/summary), with support for audio input streams (>= 10 ms), [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/236).
  • Improved the [punctuation prediction model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), increasing accuracy (F-score rose from 55.6 to 56.5).
  • Added a real-time subtitle example based on the gRPC service, using a 2-pass recognition model. The [Paraformer streaming](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) model outputs text in real time, while the [Paraformer-large offline model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) corrects the recognition results, [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc).
  • New Models:
  • + 8 more
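
The RTF and speedup figures quoted in this release are related by a reciprocal: RTF (real-time factor) is processing time divided by audio duration, so speedup is 1 / RTF. The short check below reproduces the quoted numbers. The function name is ours, and the labels are only descriptive.

```python
def speedup_from_rtf(rtf: float) -> float:
    """Speedup over realtime is the reciprocal of the real-time factor."""
    return 1.0 / rtf


figures = [
    ("GPU, Triton on V100", 0.0032),
    ("CPU, before quantization", 0.00438),
    ("CPU, after quantization", 0.00226),
]
for label, rtf in figures:
    print(f"{label}: RTF {rtf} -> about {speedup_from_rtf(rtf):.0f}x realtime")

# Relative RTF reduction from quantization (the notes round this to 50%).
reduction = (0.00438 - 0.00226) / 0.00438
print(f"RTF reduction: {reduction:.1%}")
```

The GPU figure works out to about 312x, which the notes round to 300.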

📦 Latest updates:

  • March 17, 2023: [funasr-0.3.0](https://github.com/alibaba-damo-academy/FunASR/tree/main), modelscope-1.4.1
  • Improvements:
  • Added a GPU runtime solution, [nv-triton](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu), which makes it easy to export Paraformer models from ModelScope and deploy them as a Triton service. Measured on a single V100 GPU: RTF 0.0032, throughput 300, [benchmark](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu#performance-benchmark).
  • Added a CPU [runtime quantization solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export) that supports exporting quantized ONNX and Libtorch models from ModelScope. Measured on a CPU-8369B after quantization: RTF improved by about 50% (0.00438 -> 0.00226) and throughput doubled (228 -> 442), [benchmark](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python).
  • [Added a C++ gRPC service deployment solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/grpc). Combined with the C++ [onnxruntime](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/onnxruntime) and the [quantization solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export), it delivers twice the performance of the Python runtime.
  • Added streaming inference to the ModelScope pipeline for the [16k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) and the [8k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-8k-common/summary), supporting audio input streams as short as 10 ms, [usage](https://github.com/alibaba-damo-academy/FunASR/discussions/236).
  • Optimized the [punctuation prediction model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), improving perceived punctuation accuracy (absolute F-score gain, 55.6 -> 56.5).
  • Added a real-time subtitle [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc) based on the gRPC service, using a 2-pass recognition model: the [Paraformer streaming model](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) puts text on screen in real time, and the [Paraformer-large offline model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) corrects the recognition results.
  • + 9 more
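
The 2-pass subtitle scheme in this release (a streaming model for immediate text, an offline model to correct it) can be sketched as a small generator. This is only an illustration of the control flow under simplifying assumptions: the function and parameter names are hypothetical, segment boundaries are taken as given, and the two recognizers are passed in as plain callables rather than real models.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def two_pass_subtitles(
    segments: Iterable[List[str]],
    stream_asr: Callable[[str], str],
    offline_asr: Callable[[List[str]], str],
) -> Iterator[Tuple[str, str]]:
    """Yield ('partial', text) while a segment streams in, then ('final', text) to replace it."""
    for chunks in segments:  # one segment = the chunks between two pauses
        partial = ""
        for chunk in chunks:
            partial += stream_asr(chunk)  # first pass: low latency, may contain errors
            yield ("partial", partial)
        yield ("final", offline_asr(chunks))  # second pass: corrected text for the whole segment


# Toy recognizers standing in for the streaming and offline models.
events = list(
    two_pass_subtitles(
        [["he", "llo"]],
        stream_asr=lambda chunk: chunk,
        offline_asr=lambda chunks: "".join(chunks).upper(),
    )
)
print(events)
```

A subtitle display would overwrite the current line on every `partial` event and commit it on `final`.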

New Contributors

  • @dingbig made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/147
  • @yuekaizhang made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/161
  • @zhuzizyf made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/180
  • @znsoftm made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/185
  • @songtaoshi made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/227
  • Full Changelog: https://github.com/alibaba-damo-academy/FunASR/compare/v0.2.0...v0.3.0
v0.2.0
LauraGPT · February 20, 2023

📦 2023.2.17, funasr-0.2.0, modelscope-1.3.0

  • New feature: export Paraformer models from ModelScope to [ONNX and TorchScript](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export). Locally fine-tuned models are also supported.
  • New feature: [onnxruntime](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python) deployment, which runs without ModelScope or FunASR installed. For the [paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) model, onnxruntime gives a roughly 3x speedup on CPU (RTF 0.110 -> 0.038), [details](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/onnxruntime/paraformer/rapid_paraformer#speed).
  • New feature: [grpc](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc). You can build an ASR service with gRPC by deploying either the ModelScope pipeline or onnxruntime.
  • New model: [paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary), which supports hotword customization based on incentive enhancement and improves the recall and precision of hotwords.
  • Optimized the timestamp alignment of [Paraformer-large-long](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary). Timestamp prediction accuracy is much improved, achieving an accumulated average shift (AAS) of 74.7 ms, [details](https://arxiv.org/abs/2301.12343).
  • New model: [8k VAD model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary), which predicts the duration of non-silent speech. It can be freely combined with any ASR model in [modelscope](https://github.com/alibaba-damo-academy/FunASR/discussions/134).
  • New model: [MFCCA](https://www.modelscope.cn/models/NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/summary), a multi-channel multi-speaker model that is independent of the number and geometry of microphones and supports Mandarin meeting transcription.
  • Released several new UniASR models: [Southern Fujian Dialect model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-minnan-16k-common-vocab3825/summary), [French model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online/summary), [German model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online/summary), [Vietnamese model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online/summary), [Persian model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online/summary).
  • + 4 more
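
The timestamp figure in this release (a shift of 74.7 ms) is a boundary-error measure. One plausible reading, offered here only as an assumption, is the mean absolute difference between predicted and reference start/end times over all tokens; the linked paper's exact definition may differ. The function name below is hypothetical.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds) for one token


def average_boundary_shift_ms(predicted: List[Span], reference: List[Span]) -> float:
    """Mean absolute shift of start and end boundaries, in milliseconds (assumed metric)."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty, equally long lists of (start, end) spans")
    shifts = [
        abs(p - r)
        for pred_span, ref_span in zip(predicted, reference)
        for p, r in zip(pred_span, ref_span)
    ]
    return 1000.0 * sum(shifts) / len(shifts)


# Two tokens: the first starts 100 ms late, the second ends 100 ms early.
pred = [(0.10, 0.50), (0.50, 0.90)]
ref = [(0.00, 0.50), (0.50, 1.00)]
print(f"{average_boundary_shift_ms(pred, ref):.1f} ms")
```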

📦 Latest updates:

  • February 2023 (released February 17): [funasr-0.2.0](https://github.com/alibaba-damo-academy/FunASR/tree/main), modelscope-1.3.0
  • Improvements:
  • Added model export: all Paraformer models in ModelScope, as well as locally fine-tuned models, can be exported with one command to [ONNX and TorchScript formats](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export) for deployment.
  • Added [onnxruntime deployment](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python) for Paraformer models, which works without installing ModelScope or FunASR. Measured on CPU, onnxruntime inference is nearly 3x faster (RTF: 0.110 -> 0.038).
  • Added a [gRPC service](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc) that supports serving either the ModelScope inference pipeline or onnxruntime.
  • Optimized timestamps for the [Paraformer-large long-audio model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary): timestamp prediction accuracy on bad cases improved substantially, with an average start/end timestamp shift of 74.7 ms, [see the paper](https://arxiv.org/abs/2301.12343).
  • Added free combination of any VAD, ASR, and punctuation models: any model in ModelScope, as well as locally fine-tuned models, can be combined for inference, [usage example](https://github.com/alibaba-damo-academy/FunASR/discussions/134).
  • Optimized the [general punctuation model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), improving punctuation recall and precision and fixing issues such as missing punctuation.
  • + 7 more

New Contributors

  • @zjc6666 made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/35
  • @lyblsgo made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/37
  • @lingyunfly made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/42
  • @fangd123 made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/44
  • @dyyzhmm made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/48
  • @R1ckShi made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/50
  • @chenmengzheAAA made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/57
  • @ZhihaoDU made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/95
  • + 4 more
v0.1.6
LauraGPT · January 16, 2023

📦 2023.1.16, funasr-0.1.6

  • Released a new model, [Paraformer-large-long](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), which integrates the [VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) model, the [ASR](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) model, the [Punctuation](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary) model, and timestamps. The model can take inputs several hours long.
  • Released a new type of model, [VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary), which predicts the duration of non-silent speech. It can be freely combined with any ASR model in the [Model Zoo](docs/modelscope_models.md).
  • Released a new type of model, [Punctuation](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), which predicts punctuation for ASR model output. It can be freely combined with any ASR model in the [Model Zoo](docs/modelscope_models.md).
  • Released a new model, [Data2vec](https://www.modelscope.cn/models/damo/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/summary), an unsupervised pretraining model that can be fine-tuned on ASR and other downstream tasks.
  • Released a new model, [Paraformer-Tiny](https://www.modelscope.cn/models/damo/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/summary), a lightweight Paraformer model that supports Mandarin command-word recognition.
  • Released a new type of model, [SV](https://www.modelscope.cn/models/damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch/summary), which extracts speaker embeddings and performs speaker verification on paired utterances. Speaker diarization support is planned for a future version.
  • Improved the ModelScope pipeline to speed up inference by integrating model building into pipeline building.
  • + 1 more
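
The way a long-audio model can combine VAD, ASR, and punctuation stages is easiest to see as a chain of callables. The sketch below is a conceptual outline only, not the Paraformer-large-long implementation: `transcribe_long` is a hypothetical name, audio is modelled as a plain sequence of samples, and each stage is a stand-in function supplied by the caller.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def transcribe_long(
    audio: Sequence,
    vad: Callable[[Sequence], List[Tuple[int, int]]],
    asr: Callable[[Sequence], str],
    punc: Callable[[str], str],
) -> List[Dict]:
    """Split audio with VAD, recognize each speech segment, then restore punctuation."""
    results = []
    for start, end in vad(audio):  # sample indices of each speech segment
        text = asr(audio[start:end])  # recognition sees only short segments
        results.append({"start": start, "end": end, "text": punc(text)})
    return results


# Stand-in stages so the flow can be run without any model.
segments = transcribe_long(
    audio=list(range(10)),
    vad=lambda samples: [(0, 4), (6, 10)],
    asr=lambda segment: f"{len(segment)} samples",
    punc=lambda text: text + ".",
)
print(segments)
```

Because recognition only ever sees VAD-sized segments, the input length is bounded by the segmenter rather than by the recognizer, which is what makes hours-long inputs practical.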

📦 Latest updates

  • January 2023 (released January 16): [funasr-0.1.6](https://github.com/alibaba-damo-academy/FunASR/tree/main), modelscope-1.2.0
  • New models:
  • [Paraformer-large long-audio model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary): integrates VAD, ASR, punctuation, and timestamps; it can directly recognize audio several hours long and outputs punctuated text with timestamps.
  • [Chinese unsupervised pretraining Data2vec model](https://www.modelscope.cn/models/damo/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/summary): a Chinese unsupervised pretraining model that uses the Data2vec architecture and is trained on AISHELL-2 data; it supports fine-tuning for ASR or other downstream tasks.
  • [16k voice activity detection (VAD) model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary): detects the start and end times of valid speech in long audio segments.
  • [General Chinese punctuation prediction model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary): predicts punctuation for the text output by speech recognition models.
  • [8K UniASR streaming model](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/summary) and [8K UniASR model](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline/summary): a unified streaming and offline speech recognition model that, while performing streaming recognition, outputs offline recognition results with low latency to correct the predicted text.
  • Paraformer-large [AISHELL-1 fine-tuned model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-aishell1-vocab8404-pytorch/summary) and [AISHELL-2 fine-tuned model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-aishell2-vocab8404-pytorch/summary): the Paraformer-large model fine-tuned on AISHELL-1 and AISHELL-2 data respectively.
  • + 5 more

New Contributors

  • @nichongjia-2007 made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/27
  • Full Changelog: https://github.com/alibaba-damo-academy/FunASR/compare/v0.1.4...v0.1.6
v0.1.4
LauraGPT · December 10, 2022

This is the first release.

  • Paraformer models can be decoded with batch size > 1.
  • The UniASR model and recipes have been added.
  • Transformer and Conformer models are also included.
  • Inference and fine-tuning of models in ModelScope are more convenient.