Overview
NeuTTS is an open-source project from Neuphonic that provides compact, high-quality text-to-speech (TTS) models optimized for on-device inference. The project focuses on producing natural, realistic voices while keeping model size and compute costs low so that synthesis can run in real time on mid-range phones, laptops and embedded devices (e.g., Raspberry Pi).
Key components and design
- Lightweight speech language models built from small LLM backbones (examples: NeuTTS-Air, NeuTTS-Nano).
- NeuCodec: a 50 Hz neural audio codec designed for low-bitrate, high-quality audio encoding using a single codebook.
- GGML / GGUF format distribution for efficient local inference (compatible with llama-cpp-python / llama.cpp style runtimes).
- Instant speaker cloning: the model can clone a speaker from a short reference (as little as ~3 seconds of clean speech plus reference text).
- Watermarked outputs (Perth perceptual-threshold watermarker) to help with provenance and responsible use.
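Taken together, these components form a two-stage pipeline: the speech LM turns text plus reference speaker codes into NeuCodec tokens, and NeuCodec decodes those tokens back to audio. The sketch below is conceptual only; the function and object names are illustrative, not the repo's actual API.

```python
# Conceptual two-stage flow implied by the components above.
# `slm` and `codec` are illustrative stand-ins, not repo objects.
def synthesize(text, ref_codes, slm, codec):
    # Stage 1: the small speech LM autoregressively predicts NeuCodec
    # tokens (single codebook, 50 tokens per second of audio),
    # conditioned on the input text and the cloned speaker's codes.
    codec_tokens = slm.generate(text, ref_codes)
    # Stage 2: NeuCodec's decoder reconstructs a waveform from the
    # token stream; the project watermarks the audio output (Perth).
    return codec.decode(codec_tokens)
```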
Features
- Best-in-class realism relative to model size: aims for a balance of speed, size, and naturalness for real-world embedded use.
- On-device-first: models and artifacts are provided in formats and quantisations suitable for CPU-only mobile devices as well as GPUs.
- Streaming support: the project includes examples for streaming generation so audio can play as it is produced (see the sketch after this list).
- Multiple model families: e.g., NeuTTS-Air (larger) and NeuTTS-Nano (very small), each with its own trade-offs in parameters, quality, and latency.
- Benchmarked throughput numbers across devices (mobile CPU, desktop CPU, and high-end GPU) to guide deployment choices.
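As a concrete illustration of the streaming pattern referenced above, the sketch below plays each chunk as it arrives rather than buffering the whole clip. The generator is a stand-in for the repo's streaming example, and the 24 kHz sample rate is an assumption to verify against the repo's samples.

```python
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 24_000  # assumed NeuCodec output rate; verify against the repo

def audio_chunks():
    # Stand-in for the repo's streaming generator, which yields audio
    # chunks as the model produces them; here: ten 0.5 s chunks of silence.
    for _ in range(10):
        yield np.zeros(SAMPLE_RATE // 2, dtype=np.float32)

# Play each chunk as soon as it is available instead of waiting for the clip.
with sd.OutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32") as stream:
    for chunk in audio_chunks():
        stream.write(chunk)
```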
Model & technical details
- Supported languages: English (per repo info).
- Context window: ~2048 tokens, sufficient for ~30 seconds of audio including prompts (at NeuCodec's 50 tokens per second, 30 s of audio occupies 1,500 tokens, leaving roughly 500 for text and reference prompts).
- Typical active parameter counts (approx): NeuTTS-Air ~360M active params, NeuTTS-Nano ~120M active params (plus embedding params reported separately).
- Codec & format: NeuCodec for audio coding; models are distributed in GGML/GGUF formats and also available on Hugging Face model repos.
- Watermarking: outputs embed a perceptual (Perth) watermark to support provenance and responsible use.
- Licenses: NeuTTS-Air under Apache 2.0; NeuTTS-Nano under NeuTTS Open License 1.0 (per repo metadata).
Benchmarks (summary)
The project provides throughput benchmarks (tokens/s) for several device classes and quantisations. These compare prefill and decode performance on CPU-only phones and desktops as well as on high-end GPUs. The codec component is not included in the reported SLM throughput numbers and must be considered when estimating full pipeline latency.
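One way to read these numbers: because NeuCodec produces 50 tokens per second of audio, SLM decode throughput divided by 50 gives a rough real-time factor (with codec decode cost excluded, as noted above). A minimal sketch, using a placeholder throughput value rather than a measured benchmark:

```python
CODEC_TOKENS_PER_SEC = 50  # NeuCodec rate, from the component list above

def realtime_factor(decode_tokens_per_sec: float) -> float:
    """Seconds of audio generated per wall-clock second (codec excluded)."""
    return decode_tokens_per_sec / CODEC_TOKENS_PER_SEC

# Placeholder number, not a measured benchmark: 120 tok/s decode -> 2.4x.
print(realtime_factor(120.0))
```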
Quick start
- Clone the repo and install dependencies (Python >= 3.11 recommended for full PyTorch workflows).
```
git clone https://github.com/neuphonic/neutts.git
cd neutts
pip install -r requirements.txt
```
- (Optional) Install llama-cpp-python to run GGUF models with local inference acceleration.
```
pip install llama-cpp-python
```
- Run an example to synthesise audio (the repo includes sample scripts and a streaming example):
```
python -m examples.basic_example --input_text "Hello world" --ref_audio samples/jo.wav --ref_text samples/jo.txt
```
Alternatively, use the one-block API provided in the repo to instantiate NeuTTS, encode a reference, and call infer (an example is included in the README); a sketch follows.
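A minimal sketch of that one-block flow, assuming the class imports as shown and the reference encoder is exposed as encode_reference (the import path, method name, and 24 kHz output rate are assumptions to check against the README):

```python
import soundfile as sf
from neutts import NeuTTS  # import path is an assumption

tts = NeuTTS()  # backbone/codec selection arguments omitted; see the README

# Encode the ~3 s reference clip once, then reuse it across calls.
ref_text = open("samples/jo.txt").read().strip()
ref_codes = tts.encode_reference("samples/jo.wav")

wav = tts.infer("Hello world", ref_codes, ref_text)
sf.write("output.wav", wav, 24_000)  # assumed sample rate; match the codec
```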
Deployment tips
- Use GGUF quantised backbones and pre-encode references to reduce latency on-device (see the caching sketch after this list).
- For lowest-latency audio decode, use the ONNX codec decoder provided by the project where applicable.
- Follow the repo examples for streaming usage to play audio chunks as they are generated.
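For the pre-encoding tip above, a small caching helper might look like the following; it reuses the assumed encode_reference method from the Quick start sketch, so treat that name as an assumption.

```python
import pickle
from pathlib import Path

def load_ref_codes(tts, wav_path: str, cache_path: str = "ref_codes.pkl"):
    """Encode a reference once and cache the codes to skip the cost later."""
    cache = Path(cache_path)
    if cache.exists():
        return pickle.loads(cache.read_bytes())
    codes = tts.encode_reference(wav_path)  # assumed API, per the Quick start
    cache.write_bytes(pickle.dumps(codes))
    return codes
```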
Responsibility & usage
The project watermarks its outputs for traceability and includes a short disclaimer urging responsible use. The repo also warns about third-party websites falsely claiming affiliation (e.g., similarly named domains) and points users to Neuphonic's official domain for authoritative information.
Contributing and development
The repository includes developer tooling, tests and pre-commit hooks. There are example finetuning scripts and a training guide (TRAINING.md) for users who want to adapt models to new voices or datasets. Tests can be run with pytest as outlined in the README.
Where to find models
NeuTTS model checkpoints and quantised GGUF artifacts are published to Hugging Face under the Neuphonic account (NeuTTS-Air, NeuTTS-Nano and quantised variants), and the repository README lists these collections and spaces.
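To pull an artifact locally with huggingface_hub, something like the following works; the repo id below is illustrative, so check the Neuphonic account for the exact names of the full-precision and quantised GGUF repos.

```python
from huggingface_hub import snapshot_download

# Illustrative repo id; the Neuphonic account lists the actual model repos.
local_dir = snapshot_download(repo_id="neuphonic/neutts-nano")
print(local_dir)
```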
