AI Village captures what happens when multiple frontier LLM-based agents live with their own computers, web access, and long-term goals for months at a time. Beyond chat transcripts, the dataset records concrete, timestamped agent actions (clicks, commands, web interactions), persistent memories, and evolving goals. This supports analyses of agentic behaviour over long horizons rather than short single-turn evaluations.
What Sets It Apart
- Action-level traces with context: includes ~1.14M turn-by-turn computer-use records and ~37k computer session entries. These pair model messages with the precise actions agents took and with referenced screenshots, so you can link intent, message text, and observed effects.
- Long-horizon, multi-agent ecology: covers over a year of continuous operation (agents live together, coordinate and compete, and pursue village-wide goals), with agent metadata (31 agents), village goals (~45), and chat rooms (5), letting you study social dynamics and role specialization.
- Real-world tooling & web interactions: agents had Google Workspace and GitHub accounts and could install and run tools. The data shows real web side effects (repo creation, site deployments), which is rare in agent datasets and important for safety/robustness research.
- Structured, research-friendly exports: dataset files include computer_use_turns.jsonl, computer_sessions, agents.jsonl, village_goals, daily summaries and referenced screenshots (not inlined), easing reproducible analyses and eval construction.
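As a rough illustration of working with the JSONL exports listed above, the sketch below streams a file line by line and groups turn records by session. The field name `session_id` is a hypothetical placeholder, not a confirmed part of the schema; check the actual keys in computer_use_turns.jsonl before relying on it.

```python
import json
from collections import defaultdict
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def group_turns_by_session(turns, key="session_id"):
    """Group turn records by a session key.

    `key` is an assumed field name; records that lack it are
    collected under None so nothing is silently dropped.
    """
    grouped = defaultdict(list)
    for turn in turns:
        grouped[turn.get(key)].append(turn)
    return dict(grouped)
```

Streaming with a generator avoids loading the full turns file into memory at once, which matters at the ~1.14M-record scale mentioned above. Note that the grouped result still holds every record, so filter first if memory is tight.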
Who It's For and Tradeoffs
Great fit if you study agentic behaviour, multi-agent coordination, long-term memory and retrieval, emergent social dynamics, or AI safety scenarios that require concrete action traces. The dataset is less suited for general-purpose pretraining: access is gated and the dataset license/terms restrict training use. It also contains potentially sensitive interactions that require careful handling and adherence to the provider's access terms and no-reidentification constraints. Expect engineering work to reconstruct timelines from the many JSONL files and to fetch referenced screenshots/tool outputs where needed.
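One possible shape for the timeline reconstruction mentioned above is sketched here: records from several sources are merged into a single chronologically ordered list. It assumes each record carries an ISO 8601 string under a field named `timestamp`; both the field name and the format are assumptions for illustration, not a documented part of the dataset.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 string into a timezone-aware UTC datetime.

    Assumed format; a trailing 'Z' is treated as UTC, and naive
    values are interpreted as UTC.
    """
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def build_timeline(sources, ts_field="timestamp"):
    """Merge records from several sources into one sorted timeline.

    sources: dict mapping a label (e.g. "turns") to an iterable of dicts.
    Returns a list of (datetime, label, record) tuples, oldest first.
    Records without the timestamp field are skipped.
    """
    events = []
    for label, records in sources.items():
        for record in records:
            raw = record.get(ts_field)
            if raw is None:
                continue
            events.append((parse_timestamp(raw), label, record))
    # Sort on the datetime only, so record dicts are never compared.
    events.sort(key=lambda event: event[0])
    return events
```

Keeping the source label on each event makes it easy to filter the merged timeline later, for example to view only chat messages that fall between two computer-use actions.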
