Long-horizon interaction histories break standard "retrieve-then-reason" pipelines: static retrieval pulls a fixed set of documents that cannot adapt as intermediate evidence emerges. The core insight here is that memory for LLM agents should be reconstructed during reasoning, not treated as a one-shot retrieval target. By exposing lightweight associative tags and letting the model explore and prune candidate retrieval paths, the system adapts memory access to the evolving inference context and avoids a combinatorial explosion of candidate paths.
Key Findings
- Active reconstruction mechanism: the agent traverses a Cue–Tag–Content graph, using associative tags as semantic bridges to choose paths before loading full content, then iteratively prunes branches based on intermediate evidence. This reduces unnecessary content expansion.
- Empirical gains: experiments on the LoCoMo and LongMemEval benchmarks report improvements of up to ~23% over strong baselines, along with lower token usage and shorter runtime at inference.
- Practical effect: tags enable coarse, cheap selection of promising memory paths; integrating LLM reasoning into retrieval lets the system dynamically focus retrieval on high-information paths instead of retrieving a large static support set.
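The tag-first selection described in these findings can be illustrated with a minimal sketch. It assumes a toy in-memory graph and uses a word-overlap score as a stand-in for the LLM's relevance judgment; the names `MemoryGraph`, `score_tag`, and `reconstruct` are hypothetical and do not come from the paper. It performs a single round of pruning, whereas the described system prunes iteratively as evidence accumulates.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryGraph:
    """Toy Cue-Tag-Content graph: cues point to tags, tags point to content ids."""
    cue_to_tags: dict[str, set[str]] = field(default_factory=dict)
    tag_to_contents: dict[str, set[str]] = field(default_factory=dict)
    contents: dict[str, str] = field(default_factory=dict)

    def add(self, cue: str, tags: list[str], content_id: str, text: str) -> None:
        # Store the full content once and link it through every tag.
        self.contents[content_id] = text
        for tag in tags:
            self.cue_to_tags.setdefault(cue, set()).add(tag)
            self.tag_to_contents.setdefault(tag, set()).add(content_id)


def score_tag(tag: str, query_terms: set[str]) -> int:
    # ASSUMPTION: word overlap stands in for an LLM judging tag relevance.
    return len(set(tag.lower().split()) & query_terms)


def reconstruct(graph: MemoryGraph, cue: str, query: str, max_tags: int = 2):
    """Pick promising tags first, then load content only for the tags that survive."""
    query_terms = set(query.lower().split())
    candidates = graph.cue_to_tags.get(cue, set())
    # Rank the cheap tags before touching any full content (ties broken by name).
    ranked = sorted(candidates, key=lambda t: (-score_tag(t, query_terms), t))
    # Prune branches with no support, and cap how many paths are expanded.
    kept = [t for t in ranked if score_tag(t, query_terms) > 0][:max_tags]
    loaded = {}
    for tag in kept:
        for content_id in sorted(graph.tag_to_contents[tag]):
            loaded[content_id] = graph.contents[content_id]
    return kept, loaded
```

In this sketch, content attached to a pruned tag is never read, which is the property the findings attribute to tag-based selection: the cost of a query scales with the number of surviving paths rather than with the size of the stored history.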
Who it's for and tradeoffs
Great fit if you build or evaluate LLM-based agents that must reason over long, multi-session histories (personal assistants, long-dialog QA, persistent agents) and you need a retrieval strategy that adapts during multi-step reasoning. Look elsewhere if your tasks are short-context or you need minimal engineering overhead: constructing and maintaining a Cue–Tag–Content graph and running iterative LLM-guided reconstruction adds pipeline complexity and extra model calls compared with single-shot retrieval.
Method snapshot
The pipeline rewrites dialogue turns into normalized cues, builds a graph linking cues → tags → contents, retrieves candidate content by coarse embedding similarity, re-ranks the candidates with an LLM, and then runs a tool-calling loop that performs tag-conditioned keyword, topic, personal, and temporal lookups to refine answers. The design explicitly trades upfront graph construction and multiple lightweight LLM calls for more targeted, lower-cost content access during complex reasoning tasks.
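The coarse-retrieval-then-re-rank stage of this pipeline can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: bag-of-words counts stand in for a real embedding model, the `rerank_score` callable stands in for the LLM re-ranker, and the names `embed`, `cosine`, and `retrieve_then_rerank` are hypothetical.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    # ASSUMPTION: term counts stand in for a learned embedding vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    memories: list[str],
    rerank_score: Callable[[str, str], float],
    coarse_k: int = 3,
    final_k: int = 1,
) -> list[str]:
    """Cheap similarity filter first, expensive scorer only on the survivors."""
    q = embed(query)
    # Stage 1: coarse filter over every memory using the cheap similarity.
    coarse = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    coarse = coarse[:coarse_k]
    # Stage 2: the expensive scorer (an LLM in the described pipeline)
    # is called only on the coarse_k candidates that passed stage 1.
    reranked = sorted(coarse, key=lambda m: rerank_score(query, m), reverse=True)
    return reranked[:final_k]
```

The point of the two stages is cost control: the expensive scorer is invoked at most `coarse_k` times per query regardless of how large the memory store grows.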
