Overview
Pathway is a source-available (BSL 1.1-licensed) live data framework that lets developers write pipelines in Python and execute them on a scalable Rust engine. It targets both batch and stream processing and emphasizes real-time analytics, incremental computation, and easy integration with machine learning and LLM tooling.
Key features
- Easy-to-use Python API for composing ETL and streaming pipelines while running the computation in a Rust backend for performance and concurrency.
- Unified batch + streaming semantics, enabling the same code to operate in development, CI, local runs, or distributed production environments.
- Stateful transformations (joins, windows, sorting) and built-in reducers with support for custom Python UDFs.
- Persistence to save computation state for restarts and fault recovery.
- Rich connector ecosystem (Kafka, PostgreSQL, Google Drive, and many more sources through an Airbyte integration).
- Dedicated LLM/RAG tooling: wrappers for common LLM services, embedders, splitters, an in-memory real-time vector index, and integrations with LlamaIndex and LangChain.
- Deployment options: local execution, Docker images, Kubernetes-friendly, and guides for cloud deployment.
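The "in-memory real-time vector index" listed above can be illustrated with a small, dependency-free sketch. Documents are upserted or deleted as they change, and every query sees the current set. This is not Pathway's index implementation or API. The class name LiveVectorIndex, its methods, and the toy 3-dimensional vectors are hypothetical stand-ins for real embeddings, and the search is brute-force cosine similarity.

```python
import math


class LiveVectorIndex:
    """Hypothetical toy index: brute-force cosine search over current documents."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def upsert(self, doc_id: str, vector: list[float]) -> None:
        # A changed document simply overwrites its previous embedding.
        self._vectors[doc_id] = vector

    def delete(self, doc_id: str) -> None:
        # Removed documents stop appearing in query results immediately.
        self._vectors.pop(doc_id, None)

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def query(self, vector: list[float], k: int = 2) -> list[str]:
        # Rank every stored document by similarity to the query vector.
        ranked = sorted(
            self._vectors,
            key=lambda doc_id: self._cosine(vector, self._vectors[doc_id]),
            reverse=True,
        )
        return ranked[:k]


# Usage: hypothetical documents with made-up embeddings.
index = LiveVectorIndex()
index.upsert("faq", [1.0, 0.0, 0.0])
index.upsert("pricing", [0.0, 1.0, 0.0])
index.upsert("changelog", [0.7, 0.7, 0.0])
print(index.query([1.0, 0.1, 0.0], k=2))  # ['faq', 'changelog']
index.delete("faq")
print(index.query([1.0, 0.1, 0.0], k=2))  # ['changelog', 'pricing']
```

A production index would use approximate nearest-neighbour search rather than a full scan. The point here is only that inserts and deletions are reflected in the very next query.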
Architecture and implementation
Although pipelines are authored in Python, Pathway executes them on a high-performance Rust engine based on differential dataflow and incremental computation. This architecture enables multithreading, multiprocessing, and distributed runs while preserving the developer ergonomics of Python. Pathway keeps pipeline state mostly in memory, with optional persistence for durability. The free version provides at-least-once consistency, and exactly-once semantics are offered in the enterprise edition.
Typical use cases
- Real-time ETL and streaming analytics (metrics, aggregations, alerting).
- Live ML/LLM pipelines and RAG (retrieval-augmented generation) systems that require low-latency ingestion and on-the-fly embedding/indexing.
- Event-driven processing: temporal joins, windowed aggregations, and streaming joins that handle out-of-order or late data.
- Prototyping locally and running the same code in production via containers or Kubernetes.
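The windowed aggregations and late-data handling mentioned above can be sketched in plain Python. Events are assigned to fixed-size (tumbling) windows by event time, so out-of-order events still land in the correct window. A watermark, defined here as the maximum event time seen minus an allowed lateness, decides when a window is final and an event is too late to accept. This is a generic illustration of the technique: Pathway's own temporal API differs, and the function name, window size, and lateness values are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, allowed_lateness):
    """Count events per tumbling window, dropping events that arrive too late.

    events: iterable of (event_time, payload) pairs in *arrival* order.
    Returns (counts, dropped): counts maps window start -> event count, and
    dropped lists events whose window had already been finalized.
    """
    counts = defaultdict(int)
    dropped = []
    max_seen = float("-inf")
    for event_time, payload in events:
        max_seen = max(max_seen, event_time)
        watermark = max_seen - allowed_lateness
        window_start = (event_time // window_size) * window_size
        # Window [start, start + size) is closed once the watermark reaches its end.
        if window_start + window_size <= watermark:
            dropped.append((event_time, payload))
            continue
        counts[window_start] += 1
    return dict(counts), dropped


# Event 8 arrives after 12 but is still accepted; event 9 arrives after the
# watermark (25 - 5 = 20) has closed window [0, 10), so it is dropped.
arrivals = [(1, "a"), (12, "b"), (8, "c"), (25, "d"), (9, "e"), (21, "f")]
counts, dropped = tumbling_window_counts(arrivals, window_size=10, allowed_lateness=5)
print(counts)   # {0: 2, 10: 1, 20: 2}
print(dropped)  # [(9, 'e')]
```

A larger allowed lateness accepts more stragglers at the cost of keeping windows open, and therefore in memory, for longer.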
Getting started
Install via pip (requires Python 3.10+):
pip install -U pathway
A pipeline is defined with the Python API and started by calling pw.run() at the end of the script. The script can be run directly with python main.py, or with pathway spawn --threads 3 python main.py to use multiple threads. The project provides runnable examples and templates for common patterns, including LLM/RAG templates.
Deployment & integration
Pathway provides an official Docker image (pathwaycom/pathway) and documentation for deploying on Kubernetes and cloud providers. It integrates with third-party tooling like LangChain, LlamaIndex, and many storage/streaming systems via connectors.
License & community
The repository is distributed under the Business Source License 1.1, and code in the repository automatically converts to the Apache 2.0 license after four years. The project maintains an active community via Discord, GitHub issues, and documentation at pathway.com/developers. The GitHub project was created on 2022-11-27, has seen notable community adoption, and maintains a gallery of examples.
