Why This Matters
Many large-model coding workflows pay a high price in tokens and latency when every step enters long-form internal reasoning. This release intentionally flips that product assumption: instead of maximizing visible chain-of-thought, it tunes an LLM to act reliably and repeatedly with minimal deliberation so agent loops (read→edit→test→fix) run faster and cheaper.
What Sets It Apart
- Thinking-off agent target: trained to make fast next-step decisions (inspect, edit, test, summarize) rather than produce long visible reasoning each turn, which reduces token waste and turnaround time.
- MoE + MTP inference stack: built on a hybrid sparse mixture-of-experts (MoE) foundation with 35B total and ~3B active parameters. The multi-token prediction (MTP) heads are preserved so speculative multi-token decoding can improve local throughput.
- Agent-first evaluation: the Q5_K_M quantized build completed a SWE-bench submitted-patch run, scoring 62.4% on 300 cases. Benchmarking emphasizes sustained execution, multi-turn orchestration, and the quality of real submitted patches rather than isolated reasoning scores alone.
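To make the speculative multi-token decoding idea from the list above concrete, here is a toy greedy draft-and-verify loop. It is a sketch under stated assumptions, not this release's inference code: `speculative_decode`, `target_next`, and `draft_next` are hypothetical names, the draft function merely stands in for what MTP heads would propose, and a real runtime verifies all drafted tokens in one batched forward pass instead of calling the target token by token.

```python
from typing import Callable, List

# A "model" here is just a function from a token context to the next token id.
NextToken = Callable[[List[int]], int]


def speculative_decode(target_next: NextToken, draft_next: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: cheap drafts, checked against the target."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. Draft up to k tokens with the cheap proposer (stand-in for MTP heads).
        draft: List[int] = []
        ctx = list(out)
        for _ in range(min(k, limit - len(out))):
            tok = draft_next(ctx)
            draft.append(tok)
            ctx.append(tok)

        # 2. Verify: always emit the target's own greedy token, and stop at the
        #    first position where the draft disagreed with it.
        for tok in draft:
            expected = target_next(out)
            out.append(expected)
            if expected != tok:
                break
        else:
            # 3. Every drafted token was accepted: take one bonus target token.
            if len(out) < limit:
                out.append(target_next(out))
    return out
```

Because every emitted token is the target's greedy choice, the output matches plain greedy decoding; the draft only affects how many tokens are settled per verification round, which is where the throughput gain would come from in a batched implementation.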
Who It's For and Trade-offs
Great fit if you run local or self-hosted coding agents, need repeated test-fix cycles, or want lower-latency multi-file edits in a structured harness. It pairs well with Codex/OpenCode-style agent loops and tooling that enforces strict schemas for tool calls and outputs.
Look elsewhere if your primary need is maximal reasoning transparency, long-form chain-of-thought for research, or the highest engineering-competence scores: evaluations show that thinking-on models can outperform it on metacognition and some recall-heavy engineering metrics.
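As an illustration of the structured harness described in the "great fit" case, the sketch below runs a short-decision agent loop in which every model output must be a JSON tool call that passes a strict schema check before it is executed. Everything named here is an assumption for illustration: `TOOL_SCHEMAS`, `parse_tool_call`, `run_agent_loop`, and the tool names are hypothetical and not part of any Codex or OpenCode API, and `model` is any callable that returns one JSON string per step.

```python
import json
from typing import Callable, Dict, List

# Hypothetical schema: tool name -> exact set of required string arguments.
TOOL_SCHEMAS: Dict[str, set] = {
    "read_file": {"path"},
    "apply_patch": {"path", "diff"},
    "run_tests": {"command"},
    "finish": {"summary"},
}


def parse_tool_call(raw: str) -> dict:
    """Parse one model-emitted JSON tool call; raise ValueError if off-schema."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict) or set(call) != {"tool", "args"}:
        raise ValueError("expected exactly the keys 'tool' and 'args'")
    tool, args = call["tool"], call["args"]
    if not isinstance(tool, str) or tool not in TOOL_SCHEMAS:
        raise ValueError(f"unknown tool: {tool!r}")
    if not isinstance(args, dict) or set(args) != TOOL_SCHEMAS[tool]:
        raise ValueError(f"{tool} needs exactly {sorted(TOOL_SCHEMAS[tool])}")
    if not all(isinstance(value, str) for value in args.values()):
        raise ValueError("all argument values must be strings")
    return call


def run_agent_loop(model: Callable[[List[str]], str],
                   tools: Dict[str, Callable[..., str]],
                   max_steps: int = 10) -> List[str]:
    """One validated tool call per step; each result is fed back as history."""
    history: List[str] = []
    for _ in range(max_steps):
        try:
            call = parse_tool_call(model(history))
        except ValueError as exc:
            # Report the violation instead of executing anything off-schema.
            history.append(f"schema_error: {exc}")
            continue
        if call["tool"] == "finish":
            history.append(f"finish: {call['args']['summary']}")
            break
        result = tools[call["tool"]](**call["args"])
        history.append(f"{call['tool']}: {result}")
    return history
```

Rejecting off-schema calls before dispatch keeps each turn cheap to check, which suits a model tuned to emit short next-step actions rather than long reasoning. The `max_steps` cap is one simple way to bound a test-fix cycle that never converges.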
Where It Fits
Positioned as a practical execution model rather than a pure reasoning champion. In held-out comparisons it scores higher on operational categories (legitimate-request compliance, multi-turn orchestration, integrity under pressure), while thinking-on models retain advantages in long-context recall, metacognition, and context-poison resistance.
Training & Evaluation Notes
Built on Qwopus3.6-35B-A3B-v1 (derived from Qwen3.6-35B-A3B). Fine-tuning used Unsloth for memory-efficient adaptation. Hardware and live-agent demos were done in collaboration with an external engineer to validate multi-file work (an RTS game sample) and long-horizon patch workflows. Recommended quantized evaluation build: Q5_K_M.
Practical takeaway: adopt this model when execution throughput, token efficiency, and stable multi-turn agent behavior matter more than exposing long internal reasoning traces.
