Converts text into expressive conversational speech across 100+ languages with zero-shot voice cloning and inline control tokens for emotion, style, prosody, pauses, and sound effects. Released under a research/non-commercial license; commercial use requires separate licensing.