FunASR

FunASR is an open-source, end-to-end automatic speech recognition (ASR) toolkit led by Alibaba DAMO Academy. It supports ASR, voice activity detection (VAD), punctuation restoration, speaker verification and diarization, multi-talker ASR, emotion recognition, and more. FunASR provides industrial-grade pretrained models, inference scripts, and deployment runtimes for research and production use.

Introduction

Overview

FunASR is an end-to-end speech recognition toolkit designed to bridge academic research and industrial deployment. Originating from Alibaba DAMO Academy and maintained with an active community, FunASR bundles training and fine-tuning pipelines, a rich zoo of pretrained models (published on ModelScope and Hugging Face), runtime components for both batch and real-time inference, and utilities for related speech tasks (VAD, punctuation restoration, speaker tasks, emotion recognition, etc.).

Key features
  • Multi-task support: non-streaming and streaming ASR, VAD, punctuation restoration, timestamp prediction, speaker verification/diarization, keyword spotting, emotion recognition, and multi-talker pipelines.
  • Extensive model zoo: industrial and academic pretrained models including Paraformer variants, Conformer, Whisper integrations, the SenseVoice family, and large-scale-trained Fun-ASR-Nano models covering many languages and accents.
  • Deployment-ready runtimes: offline file transcription services, real-time transcription services, GPU/CPU runtimes, and ONNX export for optimized inference.
  • Production oriented: supports hotword customization, WFST/ngram decoding, low-latency transducers (BAT), and optimizations for memory and throughput.
  • Ecosystem integrations: direct support for ModelScope and Hugging Face model hubs, examples and demos for common tasks, and packaging to PyPI (funasr) for easy install.

Model zoo & notable models

FunASR publishes many pretrained models aimed at production use. Representative entries include Paraformer (Chinese/English variants), SenseVoiceSmall (multilingual speech understanding), Fun-ASR-Nano (large-scale trained model supporting dozens of languages and dialects), Whisper integrations, Qwen-Audio/Qwen-Audio-Chat adapters, and specialized models for punctuation, timestamp prediction and keyword spotting. Models are available on ModelScope and Hugging Face, enabling easy downloading and inference.
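
Loading any of these is typically a one-liner through AutoModel. A minimal sketch, assuming the "iic/SenseVoiceSmall" hub id and the generate() keywords shown on its model card (verify against the current card before relying on them):

    # Pull a hub-published model by id (downloaded on first use) and transcribe a file.
    # The model id and generate() keywords are assumptions from the public model card.
    from funasr import AutoModel

    model = AutoModel(
        model="iic/SenseVoiceSmall",  # multilingual speech-understanding model
        vad_model="fsmn-vad",         # optional VAD front-end for long recordings
        device="cpu",                 # or "cuda:0"
    )
    res = model.generate(input="example.wav", language="auto", use_itn=True)
    print(res[0]["text"])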

Usage & developer experience

FunASR provides both CLI tools (funasr, funasr-export) and Python APIs (AutoModel and generate/export flows). Common workflows are: quick inference with a pretrained model, streaming ASR with chunked inputs and low-latency settings, VAD segmentation, and model export to ONNX for optimized runtime. The repo includes many ready-to-run demos and example configurations for different languages and deployment scenarios.
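
As a concrete starting point, a quick non-streaming run looks roughly like the sketch below, following the AutoModel example in the repo's README; the audio path is a placeholder:

    # Quick non-streaming inference: ASR + VAD segmentation + punctuation restoration.
    # Model ids follow the repo's examples; "asr_example_zh.wav" is a placeholder path.
    from funasr import AutoModel

    model = AutoModel(
        model="paraformer-zh",  # non-streaming Paraformer for Mandarin
        vad_model="fsmn-vad",   # splits long recordings into speech segments
        punc_model="ct-punc",   # restores punctuation on the recognized text
    )
    res = model.generate(input="asr_example_zh.wav", batch_size_s=300)
    print(res[0]["text"])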

Typical usage patterns:

  • Quick non-streaming inference: instantiate AutoModel with a pretrained model id (e.g. "paraformer-zh" or "FunAudioLLM/Fun-ASR-Nano-2512"), optionally attach VAD and punctuation models, then call generate() on local audio files (as sketched above).
  • Streaming ASR: use chunk_size and lookback settings to trade latency against accuracy, calling generate() incrementally with is_final flags; see the streaming sketch after this list.
  • Export: convert models to ONNX (via funasr-export or AutoModel's export flow) and run them with the funasr-onnx runtime for lower-latency CPU/GPU inference; see the export sketch after this list.
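
A minimal streaming sketch, assuming the "paraformer-zh-streaming" model id and the chunked generate() interface from the repo's examples; soundfile stands in for a live audio feed, and the wav path is a placeholder:

    # Streaming ASR: feed 600 ms chunks and carry decoder state between calls in `cache`.
    # chunk_size = [0, 10, 5]: 10 frames of 60 ms each per chunk, plus lookback context.
    import soundfile
    from funasr import AutoModel

    model = AutoModel(model="paraformer-zh-streaming")

    chunk_size = [0, 10, 5]      # latency/accuracy trade-off knob (600 ms chunks)
    encoder_chunk_look_back = 4  # encoder chunks of history to attend to
    decoder_chunk_look_back = 1  # decoder cross-attention lookback (chunks)

    speech, sample_rate = soundfile.read("asr_example_zh.wav")  # placeholder file
    chunk_stride = chunk_size[1] * 960  # 600 ms of samples at 16 kHz

    cache = {}
    total_chunks = (len(speech) - 1) // chunk_stride + 1
    for i in range(total_chunks):
        chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
        res = model.generate(
            input=chunk,
            cache=cache,                       # streaming state, updated in place
            is_final=(i == total_chunks - 1),  # flush remaining audio on last chunk
            chunk_size=chunk_size,
            encoder_chunk_look_back=encoder_chunk_look_back,
            decoder_chunk_look_back=decoder_chunk_look_back,
        )
        print(res)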
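
A matching export sketch, assuming AutoModel's export() method and the separately installed funasr-onnx package (pip install funasr-onnx); the exported-model directory is a placeholder:

    # Export a pretrained model to ONNX, then run it with the funasr-onnx runtime.
    from funasr import AutoModel

    model = AutoModel(model="paraformer")
    model.export(quantize=False)  # writes ONNX files next to the downloaded weights

    # Inference with the exported model (separate package: funasr-onnx).
    from funasr_onnx import Paraformer

    onnx_model = Paraformer("<exported-model-dir>", batch_size=1)  # placeholder dir
    print(onnx_model(["asr_example_zh.wav"]))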

Deployment & runtime

FunASR ships runtime modules and deployment docs for file transcription services (Mandarin/English, CPU/GPU variants) and real-time transcription services. The runtime has been incrementally improved with memory-leak fixes, ARM64 Docker images, dynamic batching, and GPU acceleration. Tools for hotword support, sentence-level timestamps, and automated threading configuration are provided to ease production deployment.

Community, license & citation

The project is MIT-licensed and includes contributions from Alibaba DAMO Academy and multiple academic/industrial partners. Pretrained models may carry their own model license terms. The repo includes citation entries for the FunASR Interspeech paper and related works (Paraformer, BAT, SeACo-Paraformer). Community support is via GitHub issues and communication groups linked from the repo.

Who should use it

Researchers and engineers building ASR systems, speech analytics, meeting transcription, voice assistants, and other speech-enabled products can use FunASR to prototype, fine-tune on industrial data, and deploy production services with pretrained models and runtime components.

Information

  • Website: github.com
  • Authors: Alibaba DAMO Academy, Northwestern Polytechnical University (NWPU), China Telecom, RapidAI, AIHealthX, XVERSE, community contributors
  • Published date: 2022/11/24