AIAny - Gemma4-12B v2 — Coding + Agentic Edition (GGUF)

Why this matters

Running a capable agentic coding assistant locally usually requires heavy hardware or cloud access. This release packs a Gemma 4 12B fine-tune into compact GGUF quants so you can run a private coding + tool-using agent on modest hardware (≈4.5 GB VRAM/unified memory) while keeping an agentic read→reason→act→verify workflow intact.

What Sets It Apart

Agentic-first fine-tune: training emphasizes multi-step terminal/tool trajectories (read → reason → act → verify), which markedly improves real-world debugging/terminal loops compared to the base Gemma 4 assistant. The author reports a tau2-bench telecom jump from ~15% (base) to ~55% (v2) under identical local Q8_0 conditions — roughly a 3.5× relative improvement on that agentic benchmark.
Practical local deployment: ships ready GGUF quants in several sizes (Q3_K_M 5.7 GB, Q4_K_M ~6.87 GB recommended, Q6_K ~9.11 GB, Q8_0 ~11.8 GB) so users can pick a trade-off between VRAM and fidelity. Recommended runtime is llama.cpp with the gemma4_unified loader; a specific llama.cpp build (b9553) is recommended for MTP/speculative-draft support due to loader sensitivity in newer builds.
Grounded tool behavior: the fine-tune preserves a “read-before-act” habit — it tends to grep/read/ls first and avoid fabricating file paths or values in terminal tasks, matching the base model on a fabrication probe.
Open license and provenance: published under Apache-2.0 and built on google/gemma-4-12B-it; the release includes a full-precision safetensors master for builders and quantized GGUFs for users.

Who it's for and trade-offs

Great fit if: you need a local coding assistant that can use tools and operate in multi-step terminal workflows, or you want an on-device agentic model that runs on small GPUs or unified-memory laptops.

Look elsewhere if: you need a broad generalist for knowledge-heavy benchmarks (v2 deliberately trades a bit of general MMLU-style breadth for agentic/coding capability), require strong safety guardrails out of the box (v2 is task-focused and not safety-aligned), or depend on GUI-first integrations rather than terminal/tool pipelines.

Practical notes

Recommended quant: Q4_K_M (sweet spot). Smallest reliable quant: Q3_K_M; Q2_K was withheld. Full-quality: Q8_0.
Runtime tips: use llama.cpp with --jinja to pass tools via the OpenAI-style tools field; for MTP/speculative decoding, llama.cpp b9553 (commit cited by the author) is noted as verified. If you see repeating-output artifacts, adjust sampler settings (rep_pen and temperature) as recommended by the author.
Limitations: English-centric, reduced refusals due to task-focused fine-tuning (add external guardrails for production), and some remaining failure modes include over-trying or retry loops on hard agentic tasks.

Bottom line: an opinionated, locally runnable Gemma 4 12B fine-tune that substantially ups agentic/terminal performance at the cost of a small hit to generalist benchmarks — a practical choice for developers who want a private, tool-using coding agent on constrained hardware.

Gemma4-12B v2 — Coding + Agentic Edition (GGUF)

Introduction

What Sets It Apart

Who it's for and trade-offs

Practical notes

Information

Categories

Tags

More Items

AFTER

BugTraceAI-CORE-Ultra-27B-Q6

Qwopus-3.6-35B-A3B-Coder-MTP-GGUF