2026
Hugging Face, Alibaba Cloud (Qwen) +1
Jackrong
A thinking-off fine-tune for coding-agent workflows that prioritizes fast next-step decisions, lower token usage, and stable multi-turn tool calling. Highlights: 35B MoE base, MTP (multi-token prediction) speculative decoding, 62.4% on SWE-bench (300 cases). Best suited to local agent loops and automated debug cycles; requires a disciplined harness and consistent schemas.
2026
KAIST (Daejeon, Korea), Microsoft Research (Beijing, China) +2
Sangjin Choi, Sukmin Cho +4
Predicts each request's MoE expert footprint from prefill activations and routes decode requests to the workers that maximize expert locality, lowering decode latency. The approach combines offline K-means partitioning with online locality-band routing and a signature cache co-indexed with KV blocks.
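
The routing idea in this summary can be illustrated with a minimal sketch. It is not the paper's implementation: the `Worker` class, the `locality` score (the fraction of a request's predicted experts hosted on a worker), and the reading of "locality band" as "any worker within `band` of the best score, tie-broken by load" are all assumptions made for illustration. The expert-to-worker assignment is hard-coded here in place of the offline K-means step, and the signature cache is omitted.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    experts: frozenset  # expert IDs hosted here (stand-in for an offline partition)
    load: int = 0       # decode requests currently assigned to this worker


def locality(footprint, worker):
    """Fraction of the request's predicted experts that the worker hosts."""
    if not footprint:
        return 0.0
    return len(footprint & worker.experts) / len(footprint)


def route(footprint, workers, band=0.1):
    """Pick the least-loaded worker whose locality is within `band` of the best.

    `band` is a hypothetical tolerance: it trades a little locality
    for better load balance across workers.
    """
    scores = {w.name: locality(footprint, w) for w in workers}
    best = max(scores.values())
    candidates = [w for w in workers if scores[w.name] >= best - band]
    # Prefer lower load; among equally loaded workers prefer higher locality.
    chosen = min(candidates, key=lambda w: (w.load, -scores[w.name]))
    chosen.load += 1
    return chosen


workers = [
    Worker("w0", frozenset({0, 1, 2, 3})),
    Worker("w1", frozenset({4, 5, 6, 7})),
]

# Footprints would come from prefill activations; these are made-up values.
first = route({0, 1, 2, 5}, workers)   # mostly experts hosted on w0
second = route({4, 5, 6}, workers)     # entirely experts hosted on w1
print(first.name, second.name)
```

A wider `band` lets more workers qualify for each request, which evens out load at the cost of more cross-worker expert traffic; a band of zero always picks the highest-locality worker.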
