RealtimeSTT — Realtime Speech-to-Text Library
RealtimeSTT is an open-source library designed for low-latency, realtime speech-to-text (ASR) applications. It emphasizes robustness and efficiency, combining multiple components to deliver fast, practical transcription for voice assistants, live captioning, and interactive voice UIs.
Key characteristics
- Low-latency realtime transcription suitable for interactive apps.
- Integrated Voice Activity Detection (VAD) to start/stop recordings automatically.
- Wake-word support (Porcupine or OpenWakeWord) for activation by keyword.
- Uses Faster_Whisper for fast (GPU-accelerated) transcription and supports CPU-only operation with smaller models.
- Community-driven project status — the author has stepped back from active maintenance but merges well-written PRs.
Tech stack
- Voice Activity Detection: WebRTCVAD for initial detection; SileroVAD for more accurate verification.
- Speech-to-Text: Faster_Whisper (GPU accelerated) / Whisper-compatible models.
- Wake Word Detection: Picovoice Porcupine or OpenWakeWord.
- Written in Python; installs via pip and depends on PyTorch (CPU or CUDA builds) and other audio libraries.
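The two-stage VAD pipeline above (a cheap WebRTCVAD pass on every frame, with SileroVAD consulted only to verify candidate speech) can be sketched generically. This is an illustration of the pattern, not RealtimeSTT's internal code; the helper names and thresholds are hypothetical stand-ins:

```python
# Illustrative two-stage VAD gate (hypothetical helpers, not RealtimeSTT's API):
# a cheap detector screens every frame, and a more accurate (more expensive)
# model is consulted only when the cheap gate fires.

def webrtc_like_check(frame: bytes) -> bool:
    # Stand-in for WebRTCVAD: a cheap mean-energy threshold on raw samples.
    return sum(frame) / max(len(frame), 1) > 10

def silero_like_check(frame: bytes) -> bool:
    # Stand-in for SileroVAD: pretend a model yields a speech probability.
    probability = min(sum(frame) / max(len(frame), 1) / 255, 1.0)
    return probability > 0.2

def is_speech(frame: bytes) -> bool:
    # Run the expensive verification only if the cheap gate passes.
    return webrtc_like_check(frame) and silero_like_check(frame)

silence = bytes([0] * 160)   # one 10 ms frame of silence at 16 kHz
voiced = bytes([200] * 160)  # one loud frame
print(is_speech(silence))  # False
print(is_speech(voiced))   # True
```

The design point is latency: the cheap check keeps per-frame cost low during silence, while the accurate model is only paid for when speech is plausible.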
Typical use cases
- Wake-word activated voice assistants
- Real-time transcription and captioning
- Voice-driven automation and tooling
Quick install
pip install RealtimeSTT
The package ships with a CPU-only PyTorch build by default; for better realtime performance, install a CUDA-capable PyTorch matching your CUDA toolkit (the README documents commands for CUDA 11.8 / 12.x examples).
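The README documents the exact commands; a typical invocation for CUDA 11.8 looks like the following (the index URL suffix varies with your CUDA toolkit, so verify against pytorch.org before running):

```shell
# Replace the default CPU-only build with a CUDA-enabled PyTorch.
# cu118 matches CUDA 11.8; use the cu12x index for CUDA 12.x builds —
# check pytorch.org for the current index URLs and supported versions.
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
```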
Quick example
from RealtimeSTT import AudioToTextRecorder

def process_text(text):
    print(text)

if __name__ == '__main__':
    recorder = AudioToTextRecorder()
    while True:
        recorder.text(process_text)

Configuration highlights
RealtimeSTT exposes many initialization parameters to tune behavior, including:
- realtime vs. final transcription settings
- VAD sensitivity
- wake-word backend and sensitivity
- GPU device selection
- transcription model choice (tiny/base/small/medium/large-*)
- callbacks for recording/transcription events and realtime stabilization
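A sketch of how such a tuned configuration might be assembled. The keyword names below follow the README's configuration section but should be verified against your installed version; collecting them in a dict keeps the tuning choices in one reviewable place:

```python
# Hypothetical configuration sketch — keyword names follow the README's
# configuration section; verify them against your installed RealtimeSTT version.

recorder_kwargs = {
    "model": "small",             # transcription model: tiny/base/small/medium/large-*
    "language": "en",             # fix the language instead of auto-detecting
    "silero_sensitivity": 0.4,    # SileroVAD sensitivity (0..1)
    "webrtc_sensitivity": 3,      # WebRTCVAD aggressiveness (0..3)
    "wake_words": "jarvis",       # Porcupine keyword; omit for always-on listening
    "enable_realtime_transcription": True,
    "on_realtime_transcription_update": lambda text: print("partial:", text),
}

# With RealtimeSTT installed, the dict would be splatted into the recorder:
# recorder = AudioToTextRecorder(**recorder_kwargs)
print(sorted(recorder_kwargs))
```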
Notable repository facts
- GitHub stars (snapshot at time of writing): 9,246
- Created at: 2023-08-29 (repository created date)
- Latest release mentioned in README: v0.3.104
Caveats & community status
The repository README clearly states the project is "Community-Driven" and the original author is no longer actively adding features or providing support due to time constraints. The project remains open to contributions and the author will review/merge well-written pull requests.
License & contribution
- License: MIT
- Contributions: PRs welcome. The README points to tests, demos, and example scripts to help contributors and users evaluate features.
Where to look next
Visit the GitHub repository for release history, installation details for CUDA/PyTorch, deeper configuration options, example scripts, and instructions for wake-word model training when using OpenWakeWord.
