Why this matters
Most consumer TTS and dubbing services ship as cloud APIs that require API keys and per-use billing. This project flips that model: it packages a multi-engine TTS + ASR desktop pipeline so teams and creators can run voice cloning, long-form audiobook production, and cinematic dubbing entirely on their own hardware, without uploading audio to external services.
What Sets It Apart
- Local-first, multi-engine stack: ships a built-in OmniVoice diffusion TTS engine plus adapters for roughly 10 other engines and 8 ASR backends, and lets you choose GPU or CPU execution per engine. That makes it practical to run large-vocabulary dubbing and zero-shot cloning without sending data to a cloud provider.
- Broad language and workflow coverage: zero-shot voice cloning and TTS across 600+ languages, integrated speaker diarization (pyannote + WhisperX), Demucs vocal isolation, sentence-chunked generation for text of unlimited length, and one-click batch dubbing (transcribe → translate → synthesize → mux to MP4).
- Production ergonomics for desktop: a cross-platform Tauri app with a dictation widget (global hotkey), projects and voice profiles, A/B voice auditions, exportable persona bundles, a diagnostics/self-check suite, and an optional remote-backend/MCP server mode for remote clients.
- AI provenance and safety tools: embeds an imperceptible AudioSeal watermark in generated files, offers a detection endpoint for it, and provides configurable visible branding for exported videos.
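Sentence-chunked, unlimited-length generation generally means splitting long text at sentence boundaries and synthesizing it one piece at a time, so no single engine call exceeds its input limit and the audio segments can be concatenated afterwards. The sketch below shows only the chunking step, under stated assumptions: the regex splitter and the `max_chars` budget are hypothetical choices for illustration, not the project's actual implementation.

```python
import re
from typing import List

# Hypothetical splitter: break after ".", "!" or "?" followed by whitespace.
# The project's real sentence segmentation is not described in the source.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own chunk
    instead of being cut mid-sentence.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Try to append the sentence to the chunk being built.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Budget exceeded: close the current chunk and start a new one.
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    book = "It was late. The rain kept falling! Would it ever stop? Nobody knew."
    for i, chunk in enumerate(chunk_text(book, max_chars=40)):
        print(i, chunk)
```

With `max_chars=40`, the example text yields two chunks: "It was late. The rain kept falling!" and "Would it ever stop? Nobody knew." A splitter this simple trips over abbreviations such as "Dr." and ignores non-Latin sentence punctuation, so coverage across hundreds of languages would need language-aware segmentation.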
Who It's For and Trade-offs
Great fit if you need privacy or offline control (no API keys, no cloud), want multi-language zero-shot cloning, or need to process large batches of videos locally. It’s also well suited to audio-first workflows such as audiobook production or studio dubbing, where per-asset cloud costs add up quickly.
Look elsewhere if you require a turnkey cloud API or an out-of-the-box hosted voice library with an enterprise SLA. Running large models locally still demands disk space (models plus cache), a fair amount of memory (16 GB+ RAM recommended, and 8 GB+ VRAM for the best GPU concurrency), and occasional manual model installs. The project is licensed under AGPL-3.0 (commercial licensing is available), and some engines are platform-specific (e.g., MLX optimizations for Apple Silicon).
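When each engine can run on either GPU or CPU, a fixed VRAM budget such as 8 GB has to be shared among whichever engines are loaded at once. One possible policy is a greedy placement: fill the GPU in priority order and send whatever no longer fits to the CPU. This is a minimal sketch under loud assumptions: the engine names, VRAM estimates, priorities, and PyTorch-style `"cuda"`/`"cpu"` device strings are all hypothetical, not the project's configuration or measured figures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EngineSpec:
    name: str        # hypothetical engine identifier
    vram_gb: float   # assumed VRAM needed to run on the GPU
    priority: int    # lower number = considered for the GPU first


def assign_devices(engines: List[EngineSpec], vram_budget_gb: float) -> Dict[str, str]:
    """Greedily place engines on the GPU in priority order.

    An engine that does not fit in the remaining budget falls back to
    the CPU; later, smaller engines may still fit on the GPU.
    """
    placement: Dict[str, str] = {}
    remaining = vram_budget_gb
    for engine in sorted(engines, key=lambda e: e.priority):
        if engine.vram_gb <= remaining:
            placement[engine.name] = "cuda"
            remaining -= engine.vram_gb
        else:
            placement[engine.name] = "cpu"
    return placement


if __name__ == "__main__":
    engines = [
        EngineSpec("tts_main", vram_gb=4.0, priority=0),
        EngineSpec("asr", vram_gb=3.0, priority=1),
        EngineSpec("vocal_isolation", vram_gb=2.5, priority=2),
    ]
    print(assign_devices(engines, vram_budget_gb=8.0))
    # → {'tts_main': 'cuda', 'asr': 'cuda', 'vocal_isolation': 'cpu'}
```

Greedy placement is simple and predictable but not optimal; choosing the best subset of engines for a VRAM budget is a knapsack problem, and real VRAM use also varies with batch size and input length.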
