Why this matters
Large, diverse trajectory data is a key bottleneck for building capable software-engineering agents. Open-SWE-Traces fills that gap by delivering over 207k agentic trajectories — full conversation histories, tool calls, and final patches — specifically formatted for supervised fine-tuning and distillation of LLM-based coding agents.
What Sets It Apart
- Scale + structure: 207,489 trajectories with structured fields (instance_id, repo, license, language, trajectory, model_patch, resolved), ready for SFT or offline distillation without heavy pre-processing. This dataset aggregates traces from two agent scaffolds (OpenHands, SWE-agent) and two trajectory generators (MiniMax-M2.5 for explicit “thinking” traces, Qwen3.5-122B for behavioral traces).
- SWE-focused and multilingual: Tasks are drawn from permissively licensed PRs (MIT/Apache/BSD) and cover nine programming languages (Python, Go, TypeScript, JavaScript, Rust, Java, PHP, C, C++), making it suitable for building agents that must navigate real repos and tests.
- Reproducibility & commercial readiness: Distributed in Parquet with dataset configs for OpenHands and SWE-agent splits, licensed under CC BY 4.0 (with source repos’ original licenses noted), so teams can use it for both research and product development.
Who it’s for, and trade‑offs
Great fit if you are training or distilling coding assistants or autonomous SWE agents that require long-horizon tool usage and repo-aware behavior. The dataset is practical for supervised fine-tuning, behavioral cloning, offline RL pretraining, and benchmarking against SWE-bench–style tasks.
Look elsewhere if you need human-authored, hand-curated expert traces only: Open-SWE-Traces is predominantly model-synthesized and inherits biases and failure modes from the generating LLMs and agent scaffolds. Also, while licenses are permissive at the repo level, downstream users must still respect original repo licenses for redistributed code.
Where it fits
Use this dataset as a high-volume training corpus to bootstrap agent policies or to distill dual-mode (thinking vs. non-thinking) behaviors into smaller models. It complements executable environment collections (e.g., OpenSWE) rather than replacing them: pair traces with runnable environments when you need end-to-end evaluation.
