Why this matters
Most coding models trade off verifiable correctness for scale or convenience. This GGUF release prioritizes execution-verified reasoning: training traces were preserved only when the reasoning led to passing, runnable Python solutions, so the model is explicitly optimized to "think through" algorithmic problems and emit code that runs.
Key Capabilities
- Execution-verified chain-of-thought: distilled primarily from Composer 2.5 traces, with a small set of Fable 5 re-solves to cover cases Composer missed. Training kept an example only when its generated code passed tests, which raises the likelihood that the model's solutions run without manual fixes.
- Multiple GGUF quantizations: Q2_K (~4.5 GB), Q4_K_M (~6.9 GB, recommended), plus Q6_K and Q8_0 for higher fidelity. The range lets the model run on constrained GPUs or unified-memory Apple Silicon, with a clear tradeoff between file size, VRAM, and usable context.
- Large context & native Gemma behavior: supports Gemma’s large context window (up to ~131K tokens depending on quant/KV cache) and uses a dedicated "thinking" channel consistent with its training, so enabling the thinking channel yields outputs resembling the training regime.
- Local-first deployment: packaged for llama.cpp and one-click apps (LM Studio, Jan, Ollama), enabling offline, private inference without cloud APIs.
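The execution-verified filtering described in the first bullet can be illustrated with a minimal sketch. This is not the author's actual pipeline: the record fields (`reasoning`, `code`, `tests`), the helper names `passes_tests` and `filter_traces`, and the 5-second timeout are all assumptions made for illustration. The idea is only that a trace survives if its code plus its tests exit cleanly in a separate Python process.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(solution_code: str, test_code: str, timeout_s: float = 5.0) -> bool:
    """Return True if the solution followed by its tests exits with code 0."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(solution_code + "\n\n" + test_code, encoding="utf-8")
        try:
            # Run in a child process so a crash or hang cannot take down the filter.
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


def filter_traces(traces: list[dict]) -> list[dict]:
    """Keep only the traces whose final code passes its own tests."""
    return [t for t in traces if passes_tests(t["code"], t["tests"])]


# Hypothetical records: one correct solution, one buggy one.
traces = [
    {
        "reasoning": "(reasoning text)",
        "code": "def add(a, b):\n    return a + b",
        "tests": "assert add(2, 3) == 5",
    },
    {
        "reasoning": "(reasoning text)",
        "code": "def add(a, b):\n    return a - b",
        "tests": "assert add(2, 3) == 5",
    },
]

kept = filter_traces(traces)
print(len(kept))  # 1
```

A child process with a timeout is used here rather than `exec` so that infinite loops and hard crashes count as failures; a real pipeline would likely add stronger isolation.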
Who it's for and tradeoffs
Great fit if you need a local coding assistant that favors runnable Python solutions over generic text fluency, especially for algorithmic and function-level problems where execution can serve as a fidelity signal. It’s also practical for developers on constrained hardware who want multiple quant options.
Look elsewhere if you require production-grade safety alignment, accurate broad non-coding knowledge, or an extensively benchmarked SOTA reasoning model. This release is a personal project focused on coding reasoning and reduced refusal behavior, and it carries an explicit warning that it is not safety-aligned. The Fable 5 supplement is limited in size (the author notes access was pulled), so the author plans for future versions to rely more on Composer 2.5 or other teachers for generalization. Also remember the base-model terms: derivatives must comply with the Gemma Terms of Use.
Decision note
If your goal is to iterate on runnable Python snippets locally and you can accept adding your own safety and validation layers, this GGUF build is a practical, low-overhead option. If you need a production service with safety guarantees or trustworthy non-coding knowledge, treat this as a specialized tool to be combined with validation, guardrails, or alternative models.
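As one example of what a lightweight validation layer might look like, the sketch below statically inspects a generated snippet before anything is executed. It is an illustration under stated assumptions, not part of this release: the `static_check` helper and the `BLOCKED_MODULES` deny-list are hypothetical, and a check like this is not a sandbox. It only rejects snippets that fail to parse or that import modules you have chosen to block.

```python
import ast

# Hypothetical deny-list; adjust it to your own threat model.
BLOCKED_MODULES = {"os", "subprocess", "socket", "shutil"}


def static_check(code: str) -> list[str]:
    """Return a list of problems found; an empty list means the snippet passed."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in BLOCKED_MODULES:
                problems.append(f"blocked import: {root}")
    return problems


print(static_check("import math\nprint(math.sqrt(4))"))  # []
print(static_check("import os\nos.remove('x')"))  # ['blocked import: os']
```

A static pass like this is cheap enough to run on every generation, but it can be bypassed (for example through `__import__`), so it works best as a first filter in front of isolated execution and human review.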
