AIAny - pg_durable

Most AI and data pipelines stitch together schedulers, queues, workers, and status tables, and then spend more effort handling partial failures and restarts than doing useful work. Putting durable execution inside the database removes most of that glue: workflows execute as checkpointed SQL graphs, workflow state falls under the same backup and access-control (ACL) model as your data, and the runtime resumes from the last checkpoint after a crash.
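To make "resumes from the last checkpoint" concrete, here is a minimal sketch of the general idea in Python. It is not pg_durable's API: it uses SQLite from the standard library as a stand-in for Postgres, and the `checkpoints` table, the `run_workflow` function, and the step names are all hypothetical. Each step's writes and its checkpoint row commit in one transaction, so a step that fails leaves no partial state, and a rerun skips whatever already completed.

```python
import sqlite3


def run_workflow(conn, instance_id, steps):
    """Run steps in order, skipping any already checkpointed for this instance.

    Hypothetical helper for illustration only; not part of pg_durable.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        " instance_id TEXT, step_index INTEGER, step_name TEXT,"
        " PRIMARY KEY (instance_id, step_index))"
    )
    last_done = conn.execute(
        "SELECT COALESCE(MAX(step_index), -1) FROM checkpoints"
        " WHERE instance_id = ?",
        (instance_id,),
    ).fetchone()[0]

    executed = []
    for index, (name, action) in enumerate(steps):
        if index <= last_done:
            continue  # completed in an earlier run
        # The step's writes and its checkpoint commit together, or not at all.
        with conn:
            action(conn)
            conn.execute(
                "INSERT INTO checkpoints VALUES (?, ?, ?)",
                (instance_id, index, name),
            )
        executed.append(name)
    return executed


def demo():
    """Simulate a crash in the second step, then resume."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (body TEXT)")
    crash = {"armed": True}

    def load(c):
        c.execute("INSERT INTO docs VALUES ('hello')")

    def transform(c):
        c.execute("UPDATE docs SET body = upper(body)")
        if crash["armed"]:
            crash["armed"] = False
            raise RuntimeError("simulated crash")

    steps = [("load", load), ("transform", transform)]
    try:
        run_workflow(conn, "wf-1", steps)
    except RuntimeError:
        pass  # the failed step was rolled back; "load" stays checkpointed

    resumed = run_workflow(conn, "wf-1", steps)
    body = conn.execute("SELECT body FROM docs").fetchone()[0]
    count = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
    return resumed, body, count
```

In this toy version the second call to `run_workflow` re-executes only the step that failed, and the row inserted by the first step is not duplicated. Because the checkpoints are ordinary rows, progress can be inspected with a plain `SELECT`, which is the property the paragraph above attributes to keeping workflow state in the database.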

What Sets It Apart

SQL-first durable execution: workflow graphs are authored in SQL using composable operators (e.g., |=>, ~>) so pipeline logic lives next to the rows it touches instead of in external orchestrators. This reduces duplication of state and simplifies auditing.
In-process background worker + Postgres state: a background worker hosts the duroxide runtime while duroxide-pg persists instance state in Postgres schemas. That design avoids external queues or Redis for checkpointing and lets you query progress directly from the database.
Practical for AI pipelines: the project explicitly calls out embedding pipelines (chunk → embed → upsert to pgvector), ingest/transform flows, fan-out aggregations, and scheduled tasks, all common patterns in model preprocessing, dataset builds, and inference enrichment.
Packaging & compatibility: the project publishes Debian packages for PostgreSQL 17 and 18 and provides development workflows (codespace, pgrx), which makes it practical to test or run wherever installing extensions is acceptable.
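The chunk → embed → upsert pattern mentioned above can be sketched as follows. This is an illustration of the pattern only, under loud assumptions: it uses SQLite instead of Postgres and pgvector, stores the vector as JSON text, and `fake_embed` is a deterministic placeholder rather than a real embedding model. The function names and the `chunks` table are hypothetical and do not come from pg_durable.

```python
import hashlib
import json
import sqlite3


def chunk_text(text, size, overlap):
    """Split text into windows of `size` characters overlapping by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def fake_embed(chunk, dims=4):
    """Placeholder embedding: hash bytes scaled to [0, 1]. Not a real model."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dims]]


def upsert_chunks(conn, doc_id, text, size=200, overlap=20):
    """Chunk a document, embed each chunk, and upsert keyed on (doc_id, index)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        " doc_id TEXT, chunk_index INTEGER, body TEXT, embedding TEXT,"
        " PRIMARY KEY (doc_id, chunk_index))"
    )
    chunks = chunk_text(text, size, overlap)
    with conn:  # all chunks for the document commit together
        for index, body in enumerate(chunks):
            conn.execute(
                "INSERT INTO chunks VALUES (?, ?, ?, ?) "
                "ON CONFLICT (doc_id, chunk_index) DO UPDATE SET "
                "body = excluded.body, embedding = excluded.embedding",
                (doc_id, index, body, json.dumps(fake_embed(body))),
            )
    return len(chunks)
```

Keying the upsert on `(doc_id, chunk_index)` makes the step safe to re-run: repeating it for the same document overwrites rows instead of duplicating them, which matters when a durable runtime may retry a step after a crash. This sketch does not delete stale chunks if a document shrinks; a real pipeline would need to handle that.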

Who It's For and Trade-offs

Great fit if you already keep canonical state in Postgres and want background workflows that must survive restarts, need to be auditable in SQL, or can be expressed as a sequence or graph of SQL steps. It removes many external moving parts (cron, workers, queues) and is especially attractive for embedding/ETL pipelines and database-driven automation.

Look elsewhere if you cannot install extensions, or if your workflows depend on heavy in-memory application logic that doesn't map to SQL (such steps need to be wrapped as SQL-callable functions or HTTP endpoints). The background worker role must be a superuser, so multi-tenant or highly restricted managed environments may require extra operational controls.

Where It Fits

Think of pg_durable as an alternative to glue-layer orchestrators (cron+jobs tables, lightweight workers) and a complement to broader orchestrators (Airflow, Temporal) when you prefer state locality. Use it to reduce latency and complexity for database-centric pipelines; keep a general orchestrator when workflows must tightly coordinate heterogeneous external systems.
