A thinking-off fine-tune for coding-agent workflows that prioritizes fast next-step decisions, lower token usage, and stable multi-turn tool calling. Highlights: 35B MoE base, MTP (multi-token prediction) speculative decoding, 62.4% on SWE-bench (300 cases). Best suited to local agent loops and automated debug cycles; requires a disciplined harness and consistent schemas.
Predicts each request's MoE expert footprint from prefill activations and routes decode requests to the workers that maximize expert locality, lowering decode latency. It combines offline K-means partitioning with online locality-band routing and a signature cache co-indexed with KV blocks.
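
The online routing step might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the system's actual implementation. The footprint is approximated as the top-k experts seen during prefill. The expert-to-worker partition is taken as given, standing in for the output of the offline K-means step. "Locality band" is interpreted here as the set of workers whose locality score is within a tolerance `band` of the best score. The names `Worker`, `predict_footprint`, `locality`, and `route` are hypothetical, and the signature cache is omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    experts: frozenset  # expert ids resident on this worker (assumed offline partition)
    load: int = 0       # in-flight decode requests currently assigned


def predict_footprint(prefill_expert_ids, top_k):
    """Approximate the decode footprint by the top_k experts most used in prefill.

    Returns {expert_id: share of all prefill activations}. This is an assumed
    predictor, chosen only to keep the sketch simple.
    """
    counts = Counter(prefill_expert_ids)
    total = sum(counts.values())
    return {e: c / total for e, c in counts.most_common(top_k)}


def locality(footprint, worker):
    """Share of footprint weight whose experts are resident on the worker."""
    return sum(w for e, w in footprint.items() if e in worker.experts)


def route(footprint, workers, band=0.1):
    """Pick the least-loaded worker whose locality is within `band` of the best."""
    scores = {w.name: locality(footprint, w) for w in workers}
    best = max(scores.values())
    candidates = [w for w in workers if scores[w.name] >= best - band]
    # Prefer lower load; break ties in favor of higher locality.
    chosen = min(candidates, key=lambda w: (w.load, -scores[w.name]))
    chosen.load += 1
    return chosen


if __name__ == "__main__":
    workers = [
        Worker("A", frozenset({0, 1, 2, 3})),
        Worker("B", frozenset({4, 5, 6, 7})),
        Worker("C", frozenset({0, 1, 4, 5})),
    ]
    fp = predict_footprint([0, 0, 0, 1, 1, 2, 2, 4], top_k=3)
    print(route(fp, workers, band=0.1).name)
```

A wider `band` trades expert locality for load balance: more workers qualify as candidates, so the request is less likely to queue behind a busy but well-matched worker.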