Long-horizon agents quickly exhaust token budgets and lose traceability when history is either naively accumulated or irreversibly summarized. This project addresses both problems by combining symbolic short-term memory (compact Mermaid canvases) with progressive L0→L3 long-term layering, so agents keep high-density abstractions while preserving a deterministic drill-down path to raw evidence. In benchmarks with OpenClaw, the plugin reports up to 61.38% token reduction and a ~51.5% relative improvement in pass rate.
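To make the drill-down idea concrete, here is a minimal sketch of symbolic short-term memory. It is an illustration under stated assumptions, not the project's actual API: the `Canvas` class, its method names, the `N1`-style node ids, and the JSON file layout are all hypothetical. The point it shows is that verbose tool output goes to disk, a compact Mermaid graph stays in context, and any node can be resolved back to its raw evidence by `node_id`.

```python
import json
import tempfile
from pathlib import Path


class Canvas:
    """Hypothetical short-term memory: compact graph in context, raw logs on disk."""

    def __init__(self, root: Path):
        self.root = root
        self.nodes = {}  # node_id -> (short label, path to raw evidence file)
        self.edges = []  # (source node_id, destination node_id)

    def add_step(self, label, raw_output, parent=None):
        # Offload the verbose output to a file and keep only a short label.
        node_id = f"N{len(self.nodes) + 1}"
        ref = self.root / f"{node_id}.json"
        ref.write_text(json.dumps({"node_id": node_id, "raw": raw_output}))
        self.nodes[node_id] = (label, ref)
        if parent is not None:
            self.edges.append((parent, node_id))
        return node_id

    def to_mermaid(self):
        # Render the compact view that would be placed in the prompt.
        lines = ["graph TD"]
        for node_id, (label, _) in self.nodes.items():
            lines.append(f'    {node_id}["{label}"]')
        for src, dst in self.edges:
            lines.append(f"    {src} --> {dst}")
        return "\n".join(lines)

    def drill_down(self, node_id):
        # Deterministic lookup: node_id -> file -> raw evidence.
        _, ref = self.nodes[node_id]
        return json.loads(ref.read_text())["raw"]


workdir = Path(tempfile.mkdtemp())
canvas = Canvas(workdir)
n1 = canvas.add_step("run tests: 3 failed", "FAILED test_a\n" * 500)
n2 = canvas.add_step("patch parser.py", "placeholder diff output", parent=n1)

context_view = canvas.to_mermaid()  # small, stays in the prompt
evidence = canvas.drill_down(n1)    # large, fetched only when needed
```

In this sketch the Mermaid text is well under a hundred characters while the offloaded log is several kilobytes, which is the kind of gap that would drive token savings; the real system's storage format and node naming may differ.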
What Sets It Apart
- Layered memory architecture (L0 Conversation → L1 Atom → L2 Scenario → L3 Persona). So what: recall becomes hierarchical and contextual rather than a blind vector search, which improves relevance for personalization and long-session tasks.
- Symbolic short-term memory via Mermaid canvases and node_id tracing. So what: large verbose tool logs are offloaded to files while a compact, machine- and human-readable graph stays in context — you can verify any node by drilling down to its raw ref.
- Local-first, production-ready engineering: SQLite + sqlite-vec backend by default, with a Hermes Gateway adapter and an OpenClaw plugin. So what: it runs without cloud APIs and is easy to operate in privacy-sensitive or air-gapped environments.
- Hybrid retrieval (BM25 + embeddings + RRF) and explicit white-box artifacts (markdown personas, jsonl scenarios). So what: operators can inspect and debug memory artifacts directly rather than interpreting opaque vector scores.
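The hybrid retrieval bullet above names RRF (Reciprocal Rank Fusion) as the way BM25 and embedding results are combined. The following is a minimal sketch of standard RRF, not the project's implementation: the function name, the `atom_*` ids, and the constant `k = 60` (the value commonly used in the RRF literature) are assumptions, and the project may weight or tune the rankers differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k = 60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword ranker and an embedding ranker.
bm25_ranking = ["atom_7", "atom_2", "atom_9"]
embedding_ranking = ["atom_2", "atom_5", "atom_7"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)  # ['atom_2', 'atom_7', 'atom_5', 'atom_9']
```

Because RRF uses only rank positions, it sidesteps the problem that BM25 scores and cosine similarities live on different scales. Items that both rankers surface (`atom_2`, `atom_7`) rise above items that only one ranker found.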
Who It's For and Trade-offs
Great fit if you run long, stateful LLM-driven workflows (assistant platforms, task orchestration, developer agents) and need auditable, local memory that cuts token costs while keeping the underlying evidence. It’s also useful when you want an out-of-the-box plugin for OpenClaw or a Hermes sidecar.
Look elsewhere if you need cross-device or cloud-synced enterprise memory today (portable memory / cross-framework migration is on the roadmap), or if you need a managed, high-availability remote vector DB by default. This project prioritizes local, inspectable storage and traceability over managed cloud offerings.
Where It Fits
Use it as a drop-in memory layer for agents that accumulate long interaction traces or heavy tool outputs. It’s most valuable when token budget, traceability, and local privacy are priorities, and when you want a debuggable memory pipeline rather than an opaque vector-only store.
