Why This Matters
Recorded, real-world agent sessions show how coding-focused LLM agents actually behave when invoking tools, editing code, and interacting with developers. Synthetic benchmarks and API logs alone often miss this information. Trace Commons provides these traces as raw agent-native session files (plus a Parquet table), with a donation-and-scrub workflow designed to balance openness against privacy risk.
What Sets It Apart
- Raw, harness-native traces: sessions are kept as the agent produced them (sessions/claude_code, sessions/codex, sessions/pi, sessions/cursor, sessions/opencode), enabling researchers to inspect full message exchanges, parsed tool calls, and command outputs rather than a flattened summary.
- Contributor-driven donation with local scrubbing: a donate-trace skill runs a deterministic scrubber (scrub.py) on the contributor's machine, then asks for human review before a PR is opened. The ingestion server re-runs broader scanners (TruffleHog), and maintainers review each donation. The result is an auditable pipeline rather than an ad-hoc upload.
- Practical tradeoffs surfaced: the compilation is licensed CC-BY-4.0 and curated for openness, but provenance rests on contributor certification; maintainers do quality and secret checks, not legal provenance checks. The traces reflect real usage patterns, but because donation is voluntary they form a non-representative sample.
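To make the "deterministic scrubber" idea concrete, here is a minimal sketch of a regex-based redaction pass. It is not the actual scrub.py: the pattern list, the labels, and the function name scrub are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns for illustration only; the real scrub.py rules
# are not documented here and may differ substantially.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("AWS_KEY_ID", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("HOME_PATH", re.compile(r"/(?:home|Users)/[^/\s]+")),
]


def scrub(text: str) -> str:
    """Replace every match of each pattern with a fixed placeholder.

    The pass is deterministic: the same input always yields the same
    output, because patterns are applied in a fixed order and the
    placeholders contain no random or time-dependent parts.
    """
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


if __name__ == "__main__":
    sample = "cd /home/alice/project && mail alice@example.com"
    print(scrub(sample))
```

A fixed pattern list only catches what it was written to catch, so anything it does not anticipate passes through unchanged. That is why the workflow described above adds human review and a server-side scanner on top of the local scrub.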
Who It's For & Tradeoffs
Great fit if you need realistic agent interaction data for evaluation, prompting research, tool-use analysis, or training models that handle multi-step tool invocations. The dataset preserves execution telemetry and agent-native structure, which helps reproducibility and fine-grained analysis.
Look elsewhere if you require guaranteed anonymity, complete provenance guarantees, or a statistically representative corpus of agent usage. Anonymization is best-effort (deterministic regex scrub, TruffleHog scanning, human review); novel secrets, names, or internal references can slip through, and individual traces may carry their own original licenses.
Where It Fits
Use this dataset when you want empirical traces of coding-agent workflows (multi-tool calls, terminal outputs, patch edits) rather than synthetic prompts or aggregated logs. It complements benchmark suites and simulator-generated traces by offering human-contributed, operational sessions that reveal real-world prompting patterns, failure modes, and tool orchestration behavior.
Practical Notes
- Data format: raw session files per harness plus a Parquet table under data/ for convenient loading.
- Collection process: donation via the donate-trace skill, local deterministic scrub (scrub.py), contributor review, then PR-based ingestion with a TruffleHog backstop and maintainer review.
- License & responsibility: the compilation is CC-BY-4.0; users must verify licenses and privacy suitability of individual traces before reuse.
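The layout described above can be explored with a short script. This is a sketch under stated assumptions: the harness directory names come from the list earlier in this document, but the function names, the local checkout path, and the assumption that the table is stored as one or more *.parquet files somewhere under data/ are illustrative and may not match the actual repository.

```python
from pathlib import Path

# Harness directories named in this document (sessions/<harness>).
HARNESSES = ["claude_code", "codex", "pi", "cursor", "opencode"]


def list_sessions(root):
    """Map each harness name to a sorted list of its raw session files.

    `root` is a local checkout of the dataset (hypothetical path).
    Harness directories that are missing simply map to an empty list.
    """
    sessions_dir = Path(root) / "sessions"
    found = {}
    for harness in HARNESSES:
        harness_dir = sessions_dir / harness
        if harness_dir.is_dir():
            found[harness] = sorted(p for p in harness_dir.rglob("*") if p.is_file())
        else:
            found[harness] = []
    return found


def load_table(root):
    """Load the Parquet table under data/ into one pandas DataFrame.

    Assumes *.parquet files exist under data/ (file names unknown) and
    that pandas plus a Parquet engine such as pyarrow are installed.
    """
    import pandas as pd  # imported lazily so list_sessions works without pandas

    files = sorted((Path(root) / "data").rglob("*.parquet"))
    if not files:
        raise FileNotFoundError(f"no .parquet files found under {Path(root) / 'data'}")
    return pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)


if __name__ == "__main__":
    for harness, files in list_sessions(".").items():
        print(f"{harness}: {len(files)} session file(s)")
```

The raw files keep each harness's native structure, so any parsing beyond listing them depends on the harness. The Parquet table is the more convenient starting point for aggregate analysis.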
