Chatterbox TTS

Chatterbox is an open-source family of state-of-the-art text-to-speech models from Resemble AI. It includes Chatterbox-Turbo (an efficient 350M-parameter model with paralinguistic tags and single-step mel decoding), the original Chatterbox, and a multilingual model supporting 23+ languages. The models are designed for low-latency voice agents, narration, and creative workflows, and they ship with built-in PerTh watermarking, demo pages, and Hugging Face Hub integration.

Introduction

Chatterbox is an open-source suite of text-to-speech models published by Resemble AI. The project provides three main model families tailored to different use cases:

  • Chatterbox-Turbo: a 350M-parameter, highly efficient model optimized for low compute and VRAM usage. Turbo introduces native paralinguistic tags (e.g., [laugh], [chuckle], [cough]) for inserting realistic non-speech events, and a distilled speech-token-to-mel decoder that reduces mel decoding to a single step while maintaining high-fidelity audio output. It is particularly suited to zero-shot voice agents and production scenarios where latency and resource use matter.

  • Chatterbox: the original English-focused model, offering flexible control through classifier-free guidance (CFG) and exaggeration tuning for expressive output. It suits creative TTS tasks and general zero-shot voice cloning.

  • Chatterbox-Multilingual: a larger variant (about 500M parameters) supporting 23+ languages, with zero-shot voice cloning and multilingual synthesis for localization and global applications.
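
The CFG control mentioned for the original Chatterbox model can be illustrated with the textbook classifier-free guidance blend, which mixes a conditional and an unconditional prediction. The sketch below is a generic illustration on plain lists of logits. cfg_blend is a hypothetical helper, not a Chatterbox function, and whether Chatterbox applies guidance in exactly this form is an assumption. According to the project README, the library exposes the guidance strength as the cfg_weight argument of generate, alongside exaggeration.

```python
from typing import List


def cfg_blend(cond: List[float], uncond: List[float], weight: float) -> List[float]:
    """Textbook classifier-free guidance: uncond + weight * (cond - uncond).

    weight = 0 gives the unconditional prediction, weight = 1 the conditional
    one, and weight > 1 extrapolates past the conditional prediction.
    Hypothetical helper for illustration; not part of the Chatterbox API.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + weight * (c - u) for c, u in zip(cond, uncond)]


cond = [2.0, 0.5, -1.0]
uncond = [1.0, 1.0, 1.0]
print(cfg_blend(cond, uncond, 1.5))  # [2.5, 0.25, -2.0]
```

Larger weights push the output further toward what the conditioning asks for, at some cost in naturalness. That trade-off is why guidance strength is exposed as a tunable parameter rather than fixed.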

Key features

  • Paralinguistic tags: native support for non-verbal tokens to boost realism.
  • Low-latency Turbo inference: the distilled decoder reduces mel decoding to a single step, giving faster generation and lower VRAM requirements.
  • Zero-shot voice cloning: models accept a short reference audio clip to mimic a target voice.
  • Multi-language support: the multilingual model supports 23+ languages including Chinese, Spanish, French, Arabic, Hindi, Japanese, Korean, and others.
  • Built-in PerTh watermarking: all generated audio carries Resemble AI's PerTh implicit watermark, which survives common edits and compression and can be extracted programmatically for provenance checks and responsible-AI workflows.
  • Demos & integrations: demo pages and Hugging Face Spaces are provided for quick listening and evaluation.
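
The paralinguistic-tag feature above works by writing tags inline in the input text. The sketch below is a hypothetical pre-flight check (find_tags and KNOWN_TAGS are not part of the library) that scans a Turbo prompt for bracketed tags before synthesis. KNOWN_TAGS holds only the three tags named on this page. The model may accept others, so unrecognized tags are reported rather than rejected, which is mainly useful for catching typos.

```python
import re
from typing import List, Tuple

# Only the tags named on this page; the model may support more (assumption).
KNOWN_TAGS = {"laugh", "chuckle", "cough"}

# Assumed tag syntax: a lowercase word in square brackets, e.g. "[laugh]".
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def find_tags(text: str) -> Tuple[List[str], List[str]]:
    """Return (known, unknown) tag names in the order they appear in text."""
    known: List[str] = []
    unknown: List[str] = []
    for name in TAG_PATTERN.findall(text):
        if name in KNOWN_TAGS:
            known.append(name)
        else:
            unknown.append(name)
    return known, unknown


prompt = "That's hilarious! [chuckle] Sorry [cough], where was I? [lauhg]"
known, unknown = find_tags(prompt)
print("known:", known)      # known: ['chuckle', 'cough']
print("unknown:", unknown)  # unknown: ['lauhg']
```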

Installation & usage (summary)

  • Install from PyPI with pip install chatterbox-tts, or install from source for development.
  • Typical usage involves loading a model (e.g., ChatterboxTurboTTS.from_pretrained(device="cuda")) and calling generate(text), passing audio_prompt_path=... with a short reference clip when voice cloning is required. The library uses torch/torchaudio for audio handling and provides example scripts for TTS and voice conversion.
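
The load-then-generate workflow above can be sketched as follows. This sketch uses the original ChatterboxTTS class. Its import path (chatterbox.tts), from_pretrained, generate, the sr attribute, and saving with torchaudio.save all follow the project README as we recall it and should be checked against the installed version. The README shows the Turbo class under chatterbox.tts_turbo. build_generate_kwargs and synthesize_to_file are hypothetical helpers. The heavy imports are deferred so the module can be loaded without a GPU stack installed.

```python
from typing import Any, Dict, Optional


def build_generate_kwargs(reference_wav: Optional[str] = None) -> Dict[str, Any]:
    """Pass audio_prompt_path only when a reference clip is given for cloning."""
    kwargs: Dict[str, Any] = {}
    if reference_wav is not None:
        kwargs["audio_prompt_path"] = reference_wav
    return kwargs


def synthesize_to_file(text: str, out_path: str,
                       reference_wav: Optional[str] = None,
                       device: str = "cuda") -> None:
    """Load Chatterbox, synthesize text, and write the result as a WAV file.

    Imports are deferred so this module can be imported without the
    chatterbox-tts and torchaudio packages installed.
    """
    import torchaudio as ta
    from chatterbox.tts import ChatterboxTTS  # import path per README (verify)

    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(text, **build_generate_kwargs(reference_wav))
    # model.sr is the model's output sample rate.
    ta.save(out_path, wav, model.sr)
```

For example, synthesize_to_file("Hello from Chatterbox.", "hello.wav", reference_wav="my_voice.wav") would clone the voice in the hypothetical file my_voice.wav. Omitting reference_wav falls back to the model's default voice.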

Supported languages

  • Arabic, Danish, German, Greek, English, Spanish, Finnish, French, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Swahili, Turkish, Chinese.
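
The multilingual model selects the target language with a short code rather than a full name. The README passes, for example, language_id="fr" to generate. The mapping below pairs the 23 listed languages with their ISO 639-1 codes. It is an assumption that every code matches what the installed package expects, and to_language_id is a hypothetical helper.

```python
from typing import Dict

# ISO 639-1 codes for the 23 languages listed above. That the multilingual
# model's language_id argument accepts exactly these codes is an assumption.
LANGUAGE_CODES: Dict[str, str] = {
    "Arabic": "ar", "Danish": "da", "German": "de", "Greek": "el",
    "English": "en", "Spanish": "es", "Finnish": "fi", "French": "fr",
    "Hebrew": "he", "Hindi": "hi", "Italian": "it", "Japanese": "ja",
    "Korean": "ko", "Malay": "ms", "Dutch": "nl", "Norwegian": "no",
    "Polish": "pl", "Portuguese": "pt", "Russian": "ru", "Swedish": "sv",
    "Swahili": "sw", "Turkish": "tr", "Chinese": "zh",
}


def to_language_id(name: str) -> str:
    """Map a language name (case- and whitespace-insensitive) to its code."""
    wanted = name.strip().lower()
    for lang, code in LANGUAGE_CODES.items():
        if lang.lower() == wanted:
            return code
    raise ValueError(f"{name!r} is not in the supported-language list")


print(len(LANGUAGE_CODES))       # 23
print(to_language_id("french"))  # fr
```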

Responsible AI

  • The project includes PerTh watermark embedding and extraction tools to help detect generated audio. The README explicitly discourages misuse and provides detection code samples.
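
The detection workflow can be sketched as below. The extraction calls (librosa.load, perth.PerthImplicitWatermarker, and its get_watermark method) follow the README's detection sample as we recall it and should be verified against the installed packages. According to the README, the extracted value is 0.0 for unwatermarked audio and 1.0 for watermarked audio. The helpers is_watermarked and check_file and the 0.5 threshold are hypothetical choices made for this sketch.

```python
def is_watermarked(score: float, threshold: float = 0.5) -> bool:
    """Interpret an extracted watermark score (0.0 = none, 1.0 = watermarked).

    The 0.5 threshold is an assumption of this sketch.
    """
    return score >= threshold


def check_file(path: str) -> bool:
    """Extract the PerTh watermark score from an audio file and interpret it.

    Imports are deferred; calling this requires the perth and librosa packages.
    """
    import librosa
    import perth

    audio, sr = librosa.load(path, sr=None)  # sr=None keeps the native rate
    watermarker = perth.PerthImplicitWatermarker()
    score = watermarker.get_watermark(audio, sample_rate=sr)
    return is_watermarked(float(score))
```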

Repository & publishing

  • The GitHub repository is the canonical open-source distribution for the Chatterbox models and their associated code, demo pages, and examples. The project metadata indicates it was created on 2025-04-23.

Use cases

  • Real-time voice agents and assistants (low-latency requirements)
  • Audiobook and narration production
  • Multilingual localization and zero-shot voice cloning experiments
  • Research and fine-tuning for higher-accuracy or bespoke voice models (with commercial Resemble AI services available for production scaling)

Links and demos

  • Official demo page: provided on the repository homepage
  • Hugging Face Spaces: demo spaces linked from the README

(For implementation details, examples and API parameters, refer to the repository README and example scripts.)

Information

  • Website: github.com
  • Authors: Resemble AI
  • Published date: 2025/04/23
