AIAny - MemSyco-Bench: Benchmarking Sycophancy in Agent Memory

Most memory benchmarks check storage, retrieval, or update fidelity, not how retrieved memories sway downstream decisions. MemSyco-Bench targets that gap: it measures when memory helps and when it instead pushes a model to over-align with the user (sycophancy), producing systematically wrong or biased outputs.

Key Findings

MemSyco-Bench organizes evaluation into five complementary tasks (Objective Fact Judgment, Contextual Scope Control, Memory–Evidence Conflict, Valid Memory Selection, Personalized Memory Use) and supplies 1,550 final samples with standardized scoring. It compares three kinds of setting: NoMemory, full dialogue (RawDialogue), and multiple memory-system configurations. It also includes open-ended LLM judging and unified baseline adapters, so researchers can isolate failure modes such as stale, conflicting, or overgeneralized memories.
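As a rough illustration of how results across tasks and settings might be tabulated, here is a minimal Python sketch. The record layout (`task`, `setting`, `correct`) and the function name are assumptions made for this example, not the repository's actual schema or scoring code; only the task and setting names come from the description above.

```python
from collections import defaultdict

# Task names as listed in the benchmark description.
TASKS = [
    "Objective Fact Judgment",
    "Contextual Scope Control",
    "Memory–Evidence Conflict",
    "Valid Memory Selection",
    "Personalized Memory Use",
]


def accuracy_by_task_and_setting(records):
    """Return {(task, setting): accuracy} for an iterable of result records.

    Each record is a dict with hypothetical keys:
      'task'    - one of TASKS
      'setting' - e.g. 'NoMemory', 'RawDialogue', or a memory-system name
      'correct' - bool, whether the answer was scored as correct
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [n_correct, n_total]
    for record in records:
        key = (record["task"], record["setting"])
        totals[key][0] += int(record["correct"])
        totals[key][1] += 1
    return {key: n_correct / n_total for key, (n_correct, n_total) in totals.items()}


# Toy, made-up records purely to show the call shape.
records = [
    {"task": TASKS[0], "setting": "NoMemory", "correct": True},
    {"task": TASKS[0], "setting": "NoMemory", "correct": True},
    {"task": TASKS[0], "setting": "RawDialogue", "correct": True},
    {"task": TASKS[0], "setting": "RawDialogue", "correct": False},
]
scores = accuracy_by_task_and_setting(records)
for (task, setting), acc in sorted(scores.items()):
    print(f"{task:28s} {setting:12s} {acc:.2f}")
```

Keeping the key as a (task, setting) pair makes it easy to read off, for each task, whether a memory setting scores above or below the NoMemory baseline.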

Who it's for and trade-offs

MemSyco-Bench is a good fit if you build or evaluate memory-augmented agents and need targeted tests for preference-driven failures or personalization harms. The benchmark surfaces when memory retrieval helps and when it should be ignored, but it is specialized: it emphasizes the preference-related and decision-making effects of memory rather than exhaustively measuring all memory competencies (e.g., capacity, low-level retrieval latency). Expect to complement it with other benchmarks for throughput, long-range factual retention, or systems-level deployment metrics.
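One simple way to see where memory helps and where it should have been ignored is to compare per-sample correctness with and without memory. The sketch below assumes you already have two hypothetical dictionaries mapping sample IDs to a correct/incorrect flag; it is not the benchmark's own metric, and a "hurt" sample is only a candidate for sycophancy until it has been inspected.

```python
from collections import Counter


def memory_effect(baseline_correct, memory_correct):
    """Count samples that memory helped, hurt, or left unchanged.

    baseline_correct: {sample_id: bool} for a no-memory run (hypothetical format)
    memory_correct:   {sample_id: bool} for a memory-enabled run (hypothetical format)
    Samples missing from either run are skipped.
    """
    counts = Counter(helped=0, hurt=0, unchanged=0)
    for sample_id, base_ok in baseline_correct.items():
        if sample_id not in memory_correct:
            continue
        mem_ok = memory_correct[sample_id]
        if mem_ok and not base_ok:
            counts["helped"] += 1      # memory fixed a wrong answer
        elif base_ok and not mem_ok:
            counts["hurt"] += 1        # memory broke a right answer
        else:
            counts["unchanged"] += 1   # same outcome either way
    return counts


# Toy, made-up outcomes for three samples.
baseline = {"s1": True, "s2": False, "s3": True}
with_memory = {"s1": False, "s2": True, "s3": True}
effect = memory_effect(baseline, with_memory)
print(dict(effect))
```

Reporting the helped and hurt counts separately, rather than only the net accuracy change, keeps cases where memory degraded a previously correct answer from being masked by gains elsewhere.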

Where it fits

Use MemSyco-Bench to compare memory extraction strategies, retrieval formats, and mitigation techniques (e.g., richer context extraction or summarization) when your agent must avoid blindly echoing user beliefs. The repository includes evaluation scripts, baseline adapters, and leaderboards to facilitate reproducible comparisons.

More Items

ELDR: Expert-Locality-Aware Decode Routing for PD-Disaggregated MoE Serving

Kairos: A Native World Model Stack for Physical AI

Guava: An Effective and Universal Harness for Embodied Manipulation