Why this matters
Long, uncensored model traces reveal process-level behavior that short Q&A datasets hide: planning, backtracking, verification and multi-stage decomposition become visible mainly across very long outputs. This collection condenses that signal into a compact package of 462 examples carrying roughly 104.7M characters of reasoning. That makes it efficient to study long-context dynamics, failure modes that arise only after many intermediate steps, and the raw reasoning manifold of an unaligned base model.
What sets it apart
- Full-model distillate, not a light or safety-tuned snapshot: the dataset preserves extended thinking traces that the author attributes to the complete Mythos V2 weights, so you can inspect raw planning and verification patterns that aligned variants often suppress. This matters if you need to observe authentic failure modes or build evaluators that target base-model cognition.
- Extreme reasoning density in a small example count: traces range from about 12.8K to about 552K characters (94 examples exceed 300K), for a total of about 104.7M reasoning characters. That density lets you run targeted long-context experiments without parsing millions of short examples.
- Domain breadth with deep traces: entries cover cybersecurity, biomedicine, software architecture, AI/LLM reasoning and formal math, enabling cross-domain analysis of how long-form reasoning strategies generalize or break.
- Formatting ready for SFT and process supervision: records are JSONL rows with a query and a thinking field (and sometimes a response), suitable for supervised fine-tuning or trace-aware evaluation pipelines where platform and license conditions permit.
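As a rough illustration of how the row format and length figures above could be checked, the sketch below summarizes trace lengths from JSONL rows. It assumes the fields are named exactly `query`, `thinking` and `response`, which the card implies but does not confirm; the function name `summarize_traces` and the sample rows are hypothetical.

```python
import json
from typing import Iterable


def summarize_traces(lines: Iterable[str], long_threshold: int = 300_000) -> dict:
    """Summarize the character length of the `thinking` field across JSONL rows.

    Field names (`thinking`, `response`) are assumed from the card's description.
    """
    lengths = []
    with_response = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        row = json.loads(line)
        # A missing or null `thinking` field counts as length 0.
        lengths.append(len(row.get("thinking") or ""))
        if row.get("response"):
            with_response += 1
    return {
        "examples": len(lengths),
        "total_chars": sum(lengths),
        "min_chars": min(lengths, default=0),
        "max_chars": max(lengths, default=0),
        "over_threshold": sum(1 for n in lengths if n > long_threshold),
        "with_response": with_response,
    }


# Tiny in-memory sample standing in for real rows (made-up content).
sample = [
    json.dumps({"query": "q1", "thinking": "a" * 10, "response": "r1"}),
    json.dumps({"query": "q2", "thinking": "b" * 25}),
]
stats = summarize_traces(sample, long_threshold=20)
print(stats)
```

To run this against a real file, pass an open text-mode file handle in place of `sample`; the handle is iterated line by line, so traces of several hundred thousand characters are processed one row at a time rather than loaded all at once.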
Who it's for, and trade-offs
This is a good fit if you are building or evaluating long-context models, researching chain-of-thought or process supervision, or analyzing domain-specific reasoning (e.g., vulnerability analysis or biomedical argumentation) and need authentic, uncensored traces to surface nuanced behaviors. Look elsewhere if you require permissive, well-documented licensing (this dataset's license is listed as unknown) or strictly sanitized, safety-aligned outputs: the collection intentionally preserves raw model cognition and may contain harmful, sensitive, or technically actionable content. Verify legal and ethical constraints before using it for SFT or public deployment.
