Sanskrit chant requires precise, metrically consistent prosody that ordinary speech datasets do not capture. This corpus delivers a single-speaker, tradition-faithful recording set with per-clip prosodic annotations so TTS systems can learn chant-specific timing, pausing, and vowel/consonant shaping rather than plain speech patterns.
What Sets It Apart
- Metrically aware recordings: each clip preserves pāda-level breath groups, daṇḍa pauses, and yati caesura rules, so a model can learn chant-accurate pause placement and phrasing rather than generic sentence boundaries.
- Two complementary cuts: style_a (764 clips, ~2.70 h) and style_b (703 clips, ~2.64 h) contain largely different verses and metadata variants (style_b includes explicit meter and syllable counts), enabling experiments on data-splitting and prosody conditioning.
- High-quality, consistent capture: a single reciter, a fixed microphone setup, lossless WAV audio delivered at 24 kHz, a low noise floor, and controlled peaks minimize speaker and recording variability for cleaner model training.
- Rich metadata per clip: Devanagari text, SLP1 transliteration, Kannada-routed text, duration, session/take, and (style_b) meter and n_syll, useful for meter-conditioned synthesis and evaluation.
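As a minimal sketch of how the per-clip metadata could support meter-conditioned experiments, the Python below groups clips by meter and derives a syllables-per-second rate. Only `meter` and `n_syll` are names taken from this card; the `Clip` class, the other field names, and the sample rows are hypothetical and do not reflect the corpus's actual file layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clip:
    # Hypothetical record layout; only `meter` and `n_syll` are named in the card.
    clip_id: str
    devanagari: str
    slp1: str
    duration_s: float
    meter: Optional[str] = None   # present in style_b only
    n_syll: Optional[int] = None  # present in style_b only


def group_by_meter(clips):
    """Map meter name -> list of clips; clips without a meter fall under None."""
    groups = defaultdict(list)
    for clip in clips:
        groups[clip.meter].append(clip)
    return dict(groups)


def syllable_rate(clip):
    """Syllables per second, or None when n_syll is unavailable (e.g. style_a)."""
    if clip.n_syll is None or clip.duration_s <= 0:
        return None
    return clip.n_syll / clip.duration_s


# Made-up placeholder rows for illustration, not taken from the corpus.
clips = [
    Clip("b_0001", "<devanagari>", "<slp1>", 16.0, meter="anustubh", n_syll=32),
    Clip("b_0002", "<devanagari>", "<slp1>", 11.0, meter="indravajra", n_syll=22),
    Clip("a_0001", "<devanagari>", "<slp1>", 12.5),  # style_a: no meter fields
]

by_meter = group_by_meter(clips)
rates = {c.clip_id: syllable_rate(c) for c in clips}
```

Keeping the meter-less style_a clips under a `None` key rather than dropping them makes it easy to compare conditioned and unconditioned subsets in the same pass.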
Who It's For and Trade-offs
Great fit if you want to train or fine-tune chant- or meter-aware TTS, study prosody/meter in Sanskrit verse, or create accessible renditions of classical ślokas. The single-speaker, tradition-faithful design reduces inter-speaker variance and highlights prosody learning.
Look elsewhere if you need large-scale multi-speaker conversational speech, Vedic svara recitation (which this corpus excludes), or highly diverse acoustic conditions. This dataset prioritizes fidelity to traditional chant over breadth. Its total duration (~5.3 h) is also modest for large-scale neural TTS pretraining but appropriate for fine-tuning or focused prosody research.
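The size figures can be checked with a short calculation. The sketch below uses only the clip counts and approximate hours quoted for style_a and style_b; because those hours are rounded, the results are estimates, and the variable and function names are illustrative.

```python
# Clip counts and approximate durations (hours) as quoted in this card.
CUTS = {
    "style_a": (764, 2.70),
    "style_b": (703, 2.64),
}


def summarize(cuts):
    """Return total clips, total hours, and mean clip length in seconds per cut."""
    total_clips = sum(n for n, _ in cuts.values())
    total_hours = sum(h for _, h in cuts.values())
    mean_clip_s = {name: h * 3600 / n for name, (n, h) in cuts.items()}
    return total_clips, total_hours, mean_clip_s


total_clips, total_hours, mean_clip_s = summarize(CUTS)
print(f"{total_clips} clips, ~{total_hours:.2f} h total")
for name, seconds in mean_clip_s.items():
    print(f"{name}: mean clip length ~{seconds:.1f} s")
```

Under these figures the two cuts together hold 1,467 clips and about 5.34 h of audio, with mean clip lengths of roughly 12.7 s (style_a) and 13.5 s (style_b).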
Where It Fits
Use this as fine-tuning data for meter-conditioned TTS models, a benchmark for chant prosody research, or as high-quality training examples when building accessible audio renditions of Sanskrit ślokas. License: CC-BY-4.0; respect the author's voice and attribution guidance when releasing synthesized outputs.
