AIAny - Qwopus-3.6-35B-A3B-Coder-MTP-GGUF

Why This Matters

Many large-model coding workflows pay a high price in tokens and latency when every step triggers long-form internal reasoning. This release deliberately inverts that assumption: instead of maximizing visible chain-of-thought, it tunes the model to act reliably and repeatedly with minimal deliberation, so agent loops (read→edit→test→fix) run faster and cheaper.
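As a rough illustration of the read→edit→test→fix loop mentioned above, here is a minimal sketch. It assumes two hypothetical callables that are not part of this release: `propose_edit` stands in for a model call, and `run_tests` stands in for a test runner. The toy stand-ins exist only so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    passed: bool
    steps: int
    log: List[str] = field(default_factory=list)


def run_agent_loop(
    propose_edit: Callable[[str, str], str],
    run_tests: Callable[[str], str],
    source: str,
    max_steps: int = 5,
) -> LoopResult:
    """Repeat edit -> test until the tests pass or the step budget runs out.

    propose_edit(source, failure) is a hypothetical model call returning new source.
    run_tests(source) returns an empty string on success, else a failure message.
    """
    log: List[str] = []
    failure = run_tests(source)  # read and test the starting state
    steps = 0
    while failure and steps < max_steps:
        steps += 1
        log.append(f"step {steps}: fixing '{failure}'")
        source = propose_edit(source, failure)  # edit
        failure = run_tests(source)             # test again
    return LoopResult(passed=not failure, steps=steps, log=log)


# Toy stand-ins (assumptions) so the sketch runs without a model or test runner.
def toy_tests(src: str) -> str:
    return "" if "return a + b" in src else "add() returns wrong value"


def toy_edit(src: str, failure: str) -> str:
    return src.replace("return a - b", "return a + b")


result = run_agent_loop(toy_edit, toy_tests, "def add(a, b):\n    return a - b\n")
print(result.passed, result.steps)  # True 1
```

The step budget (`max_steps`) is the part that matters for cost: each iteration is one model call plus one test run, so a model that decides quickly keeps the whole loop short.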

What Sets It Apart

Thinking-off agent target: trained to make fast next-step decisions (inspect, edit, test, summarize) rather than produce long visible reasoning each turn, which reduces token waste and turnaround time.
MoE + MTP inference stack: built on a hybrid sparse MoE foundation with 35B total and ~3B active parameters, and keeps its MTP heads for speculative multi-token decoding to improve local throughput.
Agent-first evaluation: completed a SWE-bench submitted-patch run on the Q5_K_M quantized build, scoring 62.4% on 300 cases; benchmarking emphasizes sustained execution, multi-turn orchestration, and real patch submission quality rather than only isolated reasoning scores.
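To make the idea of speculative multi-token decoding more concrete, the sketch below shows a generic greedy verification step: a cheap drafter proposes several tokens, and the main model keeps the longest prefix it agrees with. This is a simplified illustration, not a description of how this model's MTP heads are wired into any particular runtime. The names `verify_draft` and `toy_target` are hypothetical, and a real implementation would check all draft positions in one batched forward pass rather than one call per token.

```python
from typing import Callable, List, Sequence


def verify_draft(
    context: Sequence[int],
    draft: Sequence[int],
    target_next: Callable[[Sequence[int]], int],
) -> List[int]:
    """Greedy speculative verification (illustrative sketch).

    Accept draft tokens while they match the target model's own greedy choice.
    At the first mismatch, emit the target's token instead and stop.
    If every draft token is accepted, append one extra target token.
    """
    accepted: List[int] = []
    for token in draft:
        expected = target_next(list(context) + accepted)
        if token != expected:
            accepted.append(expected)  # correction replaces the rejected token
            return accepted
        accepted.append(token)
    accepted.append(target_next(list(context) + accepted))
    return accepted


# Toy target "model" (assumption): always predicts previous token + 1.
def toy_target(tokens: Sequence[int]) -> int:
    return tokens[-1] + 1


print(verify_draft([1, 2], [3, 4, 9], toy_target))  # [3, 4, 5]
print(verify_draft([1, 2], [3, 4, 5], toy_target))  # [3, 4, 5, 6]
```

With greedy verification the emitted tokens are the same ones the target would have produced on its own, so any throughput gain comes from accepting several tokens per verification step when the drafts are good.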

Who It's For and Trade-offs

Great fit if you run local or self-hosted coding agents, need repeated test-fix cycles, or want lower-latency multi-file edits in a structured harness. It pairs well with Codex/OpenCode-style agent loops and tooling that enforces strict schemas for tool calls and outputs.
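One way a harness might enforce a strict schema on tool calls is sketched below, using only the Python standard library. The tool names (`read_file`, `apply_patch`, `run_tests`) and the `{"tool": ..., "args": ...}` envelope are assumptions made for illustration; they are not taken from this release or from any specific agent framework.

```python
import json
from typing import Any, Dict

# Hypothetical tool registry: tool name -> required argument names and types.
TOOL_SCHEMAS: Dict[str, Dict[str, type]] = {
    "read_file": {"path": str},
    "apply_patch": {"path": str, "diff": str},
    "run_tests": {"target": str},
}


def parse_tool_call(raw: str) -> Dict[str, Any]:
    """Parse a model's JSON tool call and reject anything off-schema."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict) or set(call) != {"tool", "args"}:
        raise ValueError("expected an object with exactly 'tool' and 'args'")
    tool, args = call["tool"], call["args"]
    if not isinstance(tool, str) or tool not in TOOL_SCHEMAS:
        raise ValueError(f"unknown tool: {tool!r}")
    schema = TOOL_SCHEMAS[tool]
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ValueError(f"{tool} expects exactly these args: {sorted(schema)}")
    for name, expected in schema.items():
        if not isinstance(args[name], expected):
            raise ValueError(f"{tool}.{name} must be {expected.__name__}")
    return call


ok = parse_tool_call('{"tool": "read_file", "args": {"path": "src/app.py"}}')
print(ok["tool"], ok["args"]["path"])  # read_file src/app.py
```

Rejecting malformed calls with a specific error message gives the harness something concrete to feed back to the model on the next turn, which suits a model tuned for short, repeated steps.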

Look elsewhere if your primary need is maximal reasoning transparency, long-form chain-of-thought for research, or the highest engineering-competence scores: evaluations show that thinking-on models can outperform it on metacognition and some recall-heavy engineering metrics.

Where It Fits

Positioned as a practical execution model rather than a pure reasoning champion. In held-out comparisons it scores higher on operational categories (legit-request compliance, multi-turn orchestration, integrity under pressure) while models optimized for thinking-on retain advantages in long-context recall, metacognition, and context-poison resistance.

Training & Evaluation Notes

Built on Qwopus3.6-35B-A3B-v1 (derived from Qwen3.6-35B-A3B). Fine-tuning used Unsloth for memory-efficient adaptation. Hardware setup and live-agent demos were done in collaboration with an external engineer to validate multi-file demos (an RTS game sample) and long-horizon patch workflows. Recommended quantized evaluation build: Q5_K_M.

Practical takeaway: adopt this model when execution throughput, token efficiency, and stable multi-turn agent behavior matter more than exposing long internal reasoning traces.
