
RealtimeSTT

RealtimeSTT is an open-source, low-latency speech-to-text library for real-time applications. It combines voice activity detection, wake-word activation, and fast, GPU-ready transcription to power voice assistants and other low-latency ASR use cases. The project is community-driven and accepts contributions.

Introduction

RealtimeSTT — Realtime Speech-to-Text Library

RealtimeSTT is an open-source library designed for low-latency, realtime speech-to-text (ASR) applications. It emphasizes robustness and efficiency, combining voice activity detection, wake-word detection, and fast transcription to deliver practical results for voice assistants, live captioning, and interactive voice UIs.

Key characteristics
  • Low-latency realtime transcription suitable for interactive apps.
  • Integrated Voice Activity Detection (VAD) to start/stop recordings automatically.
  • Wake-word support (Porcupine or OpenWakeWord) for activation by keyword.
  • Uses Faster_Whisper for fast (GPU-accelerated) transcription and supports CPU-only operation with smaller models.
  • Community-driven project status — the author has stepped back from active maintenance but merges well-written PRs.
Tech stack
  • Voice Activity Detection: WebRTCVAD for initial detection; SileroVAD for more accurate verification.
  • Speech-to-Text: Faster_Whisper (GPU accelerated) / Whisper-compatible models.
  • Wake Word Detection: Picovoice Porcupine or OpenWakeWord.
  • Written in Python; installs via pip and depends on PyTorch (CPU or CUDA builds) and other audio libraries.
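The two-stage VAD design above (a cheap WebRTCVAD screen on every frame, with the slower but more accurate SileroVAD confirming positives) can be sketched as pure logic, with the actual detectors passed in as callables. This is a hypothetical helper for illustration, not RealtimeSTT's API:

```python
def two_stage_vad(frame, fast_check, accurate_check):
    """Hypothetical two-stage gate: a cheap detector screens every frame,
    and a more accurate (slower) detector confirms positives only."""
    if not fast_check(frame):       # e.g. WebRTCVAD: cheap, runs on every frame
        return False
    return accurate_check(frame)    # e.g. SileroVAD: accurate, runs only on hits

# Usage with toy stand-in detectors (real ones would wrap webrtcvad / Silero):
frames = [b"silence", b"speech!", b"noise.."]
fast = lambda f: b"s" in f              # toy first-stage screen
accurate = lambda f: f == b"speech!"    # toy second-stage confirmation
flags = [two_stage_vad(f, fast, accurate) for f in frames]
```

The benefit of this arrangement is that the expensive detector only runs on frames the cheap one already flagged, keeping per-frame cost low during silence.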
Typical use cases
  • Wake-word activated voice assistants
  • Real-time transcription and captioning
  • Voice-driven automation and tooling
Quick install

pip install RealtimeSTT

The package installs a CPU-only PyTorch build by default; for better realtime performance, install a CUDA-enabled PyTorch build matching your CUDA toolkit (the README documents example commands for CUDA 11.8 and 12.x).
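As a sketch (the CUDA version tag is illustrative; check the README and pytorch.org for the exact command matching your toolkit before running):

```shell
# Replace the default CPU-only PyTorch with a CUDA 11.8 build (illustrative).
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
```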

Quick example
from RealtimeSTT import AudioToTextRecorder

def process_text(text):
    # Called once per finished utterance with the transcribed text.
    print(text)

if __name__ == '__main__':
    # Default setup: microphone input, automatic VAD-based start/stop.
    recorder = AudioToTextRecorder()
    while True:
        recorder.text(process_text)
Configuration highlights

RealtimeSTT exposes many initialization parameters to tune behavior for realtime vs final transcription, VAD sensitivity, wake-word backend and sensitivity, GPU device selection, transcription model choice (tiny/base/small/medium/large-*), callbacks for recording/transcription events, and realtime stabilization callbacks.
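As an illustrative sketch of such a configuration (parameter names follow the README's initialization options, but exact names and defaults may differ between versions, so treat them as assumptions):

```python
# Illustrative AudioToTextRecorder configuration; parameter names are taken
# from the README's initialization options and may vary by version.
RECORDER_CONFIG = {
    "model": "tiny",                       # transcription model (tiny/base/small/medium/large-*)
    "language": "en",                      # fix the language instead of auto-detecting
    "wake_words": "jarvis",                # Porcupine wake word for activation
    "silero_sensitivity": 0.4,             # SileroVAD sensitivity (0-1)
    "enable_realtime_transcription": True, # stream partial results while speaking
}

if __name__ == '__main__':
    # Import here so the config above can be inspected without the library.
    from RealtimeSTT import AudioToTextRecorder
    recorder = AudioToTextRecorder(**RECORDER_CONFIG)
```

Keeping the options in a plain dict like this makes it easy to swap model sizes or VAD sensitivities per deployment without touching the recorder code.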

Notable repository facts
  • GitHub stars (snapshot at time of writing): 9,246
  • Repository created: 2023-08-29
  • Latest release mentioned in the README: v0.3.104
Caveats & community status

The repository README clearly states the project is "Community-Driven" and the original author is no longer actively adding features or providing support due to time constraints. The project remains open to contributions and the author will review/merge well-written pull requests.

License & contribution
  • License: MIT
  • Contributions: PRs welcome. The README points to tests, demos, and example scripts to help contributors and users evaluate features.
Where to look next

Visit the GitHub repository for release history, installation details for CUDA/PyTorch, deeper configuration options, example scripts, and instructions for wake-word model training when using OpenWakeWord.

Information

  • Website: github.com
  • Author: Kolja Beigel
  • Published: 2023/08/29
