Amphion

Amphion is an open-source toolkit for audio, music, and speech generation. It provides reproducible implementations of state-of-the-art models (TTS, VC, SVC, TTA, etc.), visualization tools, neural codecs, evaluation metrics, and dataset pipelines (e.g., Emilia). Amphion targets researchers and engineers who want a practical, educational platform for building and evaluating audio generation systems.

Introduction

Amphion — Open-Source Audio, Music, and Speech Generation Toolkit

Overview

Amphion (/æmˈfaɪən/) is an open-source toolkit for research and development in audio, music, and speech generation. It provides reproducible implementations, educational visualizations, dataset preprocessing pipelines, pretrained models, and evaluation metrics, with the aim of speeding up experiments and helping junior researchers and engineers get started in audio generation.

Supported tasks
  • Text-to-Speech (TTS) — implementations of FastSpeech2, VITS, VALL-E, NaturalSpeech2, MaskGCT, Vevo-TTS, and others.
  • Voice Conversion (VC) — zero-shot and controllable methods such as Vevo and FACodec.
  • Singing Voice Synthesis (SVS) and Singing Voice Conversion (SVC).
  • Accent Conversion (AC) and various speech editing tasks.
  • Text-to-Audio (TTA) / Text-to-Music (TTM) via latent-diffusion style pipelines.
  • Neural audio codecs and tokenizers for efficient discrete-token generation.
Key features
  • Comprehensive model implementations: diffusion-, transformer-, VAE-, flow- and GAN-based architectures for generation and vocoding.
  • Visualization tools: interactive visualization (e.g., SingVisio) to illustrate internal mechanisms of models for educational purposes.
  • Evaluation suite: objective metrics for F0 and energy, intelligibility (WER/CER computed from Whisper transcriptions), quality and distortion scores (FAD, PESQ, STOI), and speaker similarity, among others.
  • Dataset support & preprocessing: unified preprocessing for common datasets (LJSpeech, LibriTTS, VCTK, AudioCaps, etc.) and for the large in-the-wild Emilia dataset, with Emilia-Pipe for cleaning and annotation.
  • Pretrained models and demos: Hugging Face hosting, ModelScope integrations, and demo pages for several released systems.
  • Extensible & reproducible: designed to support reproducible research and to serve as a learning platform for newcomers.
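
As a rough illustration of the intelligibility metrics listed above, the sketch below computes WER and CER with a plain edit distance. It is not Amphion's implementation: the function names are hypothetical, the text normalization (lowercasing, whitespace splitting) is an assumption, and in practice the hypothesis would come from an ASR model such as Whisper rather than a hand-written string.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # i deletions turn ref[:i] into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference token
                curr[j - 1] + 1,     # insert a hypothesis token
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring whitespace (an assumed convention)."""
    ref_chars = "".join(reference.lower().split())
    hyp_chars = "".join(hypothesis.lower().split())
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` counts one deleted word against six reference words, giving 1/6. Real evaluation pipelines usually also strip punctuation and normalize numbers before scoring, which this sketch leaves out.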
Notable releases (selected)
  • Amphion v0.1 (2023-12-18) and v0.2 (technical report released 2025-01-30).
  • Emilia dataset (101k+ hours) and the later Emilia-Large, which adds further hours (announced 2025-02-26).
  • Model releases such as Vevo (zero-shot voice imitation), MaskGCT (non-autoregressive TTS), Metis (foundation model for unified speech generation), and DualCodec (neural audio codec for discrete tokens).
Installation & usage
  • Install from GitHub or use the provided Docker image. A typical workflow is to clone the repository, create a conda environment (Python 3.9.15 recommended), and run env.sh to install dependencies. Alternatively, pull the official Docker image and mount your datasets.
  • Recipes and examples are organized under egs/ (TTS, SVC, TTA, vocoder, evaluation, visualization), with README guides for each task.
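
The conda-based workflow described above can be sketched as a small helper that prints the shell commands in order. This is an illustration only: `setup_commands` is a hypothetical name, and the repository URL and environment name are assumptions not stated in this text, so check them against the repository README before running anything.

```python
# Assumed values (not given in the text above); verify against the README.
REPO_URL = "https://github.com/open-mmlab/Amphion.git"
ENV_NAME = "amphion"
PYTHON_VERSION = "3.9.15"  # the version recommended in the text


def setup_commands(repo_url=REPO_URL, env_name=ENV_NAME,
                   python_version=PYTHON_VERSION):
    """Return the setup steps as shell command strings, in order."""
    # Derive the checkout directory from the URL, e.g. ".../Amphion.git" -> "Amphion".
    repo_dir = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if repo_dir.endswith(".git"):
        repo_dir = repo_dir[:-4]
    return [
        f"git clone {repo_url}",
        f"cd {repo_dir}",
        f"conda create --name {env_name} python={python_version}",
        f"conda activate {env_name}",
        "sh env.sh",  # installs dependencies, per the workflow above
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed first.
    print("\n".join(setup_commands()))
```

Printing rather than executing keeps the sketch side-effect free. The commands are meant to be pasted into an interactive shell, since `conda activate` does not carry over between separate subprocess calls.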
Ecosystem & interoperability
  • Amphion integrates with Hugging Face (models and datasets) and ModelScope, and provides example notebooks and demos. It uses common pretrained backbones (Whisper, WavLM, ContentVec, WeNet) and supports widely used vocoders (HiFi-GAN, BigVGAN, WaveNet, DiffWave, etc.).
License & citation
  • Licensed under the MIT License — free for research and commercial use.
  • Citation information for Amphion v0.1 and v0.2 is provided in the repository README.
Who is it for
  • Researchers building or reproducing state-of-the-art audio generation models.
  • Engineers prototyping TTS, voice conversion, singing synthesis, text-to-audio, and codec-based generation systems.
  • Educators and students who want interactive visualizations to understand model internals.

(See the project homepage and repository README for detailed examples, API usage, and task-specific instructions.)

Information

  • Website: github.com
  • Authors: OpenMMLab, Amphion contributors
  • Published date: 2023/11/15
