AIAny - Gemma-4-12B-OBLITERATED

The core insight behind this release is unsettling: refusal behavior can be surgically removed from a transformer through targeted weight modifications, with no measurable loss on standard benchmarks. That result exposes a fragility in post-training alignment, and it makes the model useful as a controlled baseline for studying how safety constraints are geometrically encoded and how robust RLHF/DPO-style training is to weight-level attacks.

Key Capabilities

Zero refusals with benchmark parity: The authors report 0/842 refusals while matching stock MMLU-Pro performance. This gives researchers an uncensored baseline for capability-vs-safety experiments and lets them separate capability regressions from the effects of safety removal.
Two-pass abliteration pipeline (SOM + ASPA): A first pass removes refusal directions; a second pass selectively blends weights back to recover capabilities. This offers a reproducible method for probing where safety features sit across layers.
Multi-format local deployment: The authors provide BF16 weights and multiple GGUF quantizations (Q8_0, Q6_K, Q5_K_M, Q4_K_M), along with usage examples for transformers and llama.cpp. This enables red-team and interpretability experiments on commodity hardware.
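
The listing does not say how the 0/842 refusal figure was scored. As a rough illustration only, a refusal count over a set of model responses could be computed with a keyword heuristic like the one below. The marker list, the 200-character window, and the function names are all assumptions made for this sketch, not the authors' protocol.

```python
# Hypothetical refusal scorer: a keyword heuristic, NOT the authors' method.
# The marker phrases and the 200-character window are illustrative assumptions.
REFUSAL_MARKERS = (
    "i can't",
    "i cannot",
    "i won't",
    "i'm sorry",
    "as an ai",
)


def is_refusal(response, markers=REFUSAL_MARKERS):
    """Return True if the opening of the response contains a refusal marker."""
    # Normalize curly apostrophes so "can’t" and "can't" match the same marker.
    head = response.replace("\u2019", "'").strip().lower()[:200]
    return any(marker in head for marker in markers)


def refusal_count(responses):
    """Return (number_of_refusals, total_responses)."""
    refused = sum(1 for text in responses if is_refusal(text))
    return refused, len(responses)


if __name__ == "__main__":
    samples = [
        "I'm sorry, but I can't help with that.",
        "Sure, here is a short summary of the topic.",
    ]
    refused, total = refusal_count(samples)
    print(f"{refused}/{total} refusals")
```

Keyword matching misses soft refusals and can flag harmless apologies, so a reported count is only as meaningful as the scoring rule behind it.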

Who it's for & Trade‑offs

Great fit if you are an alignment researcher, red‑teamer, or safety evaluator who needs an uncensored baseline to (a) measure how alignment is stored in weights, (b) stress-test defenses, or (c) build measurement protocols for refusal geometry. Look elsewhere if you need a production-ready, safety‑guarded model: this release intentionally removes guardrails and can generate harmful content. Users are responsible for legal and ethical compliance.

Where it fits

Compared with stock google/gemma-4-12B-it, this artifact is an experimental, surgically‑modified variant intended as a research probe rather than a consumer model. It belongs alongside open red‑team benchmarks (HarmBench, JailbreakBench) and the abliteration toolkits used in mechanistic interpretability and alignment robustness work.

Methodology (brief)

The pipeline uses two complementary steps: (1) SOM Refusal Geometry Removal on intermediate layers to ablate primary refusal directions; (2) ASPA (Abliteration Source‑Tethering) step-gradient blending across upper layers to restore capability parity without reintroducing refusal. The release documents layer groups, gamma schedules, and empirical sweep results used to choose the final step gradient.
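
The two steps above can be illustrated with a generic sketch of directional ablation followed by per-layer blending. This is not the release's SOM or ASPA code: it assumes a single refusal direction r, a weight matrix W that writes into the residual stream, plain linear interpolation for the blend, and a piecewise-constant gamma schedule over layer groups. All function names and the example layer groups are hypothetical.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(W, r):
    """Remove the component along r from every column of W.

    W is a d_out x d_in matrix (list of rows) whose outputs live in the
    residual stream; r is a direction of length d_out.
    Computes W - r r^T W with r normalized to unit length.
    """
    r = unit(r)
    d_out, d_in = len(W), len(W[0])
    # proj[j] = r^T W[:, j]: how strongly column j writes along r.
    proj = [sum(r[i] * W[i][j] for i in range(d_out)) for j in range(d_in)]
    return [[W[i][j] - r[i] * proj[j] for j in range(d_in)] for i in range(d_out)]


def blend(W_ablated, W_source, gamma):
    """Interpolate: gamma=0 keeps ablated weights, gamma=1 restores the source."""
    return [
        [(1.0 - gamma) * a + gamma * s for a, s in zip(row_a, row_s)]
        for row_a, row_s in zip(W_ablated, W_source)
    ]


def step_gamma_schedule(num_layers, groups):
    """Assign each layer the gamma of its group; ungrouped layers get 0.0.

    groups is a list of (start, stop, gamma) with stop exclusive.
    The group boundaries and values are illustrative assumptions.
    """
    gammas = [0.0] * num_layers
    for start, stop, gamma in groups:
        for layer in range(start, min(stop, num_layers)):
            gammas[layer] = gamma
    return gammas


if __name__ == "__main__":
    W_source = [[1.0, 2.0], [3.0, 4.0]]
    W_ablated = ablate_direction(W_source, [1.0, 0.0])
    # Hypothetical schedule: no blending in lower layers, more in upper ones.
    schedule = step_gamma_schedule(6, [(2, 4, 0.1), (4, 6, 0.3)])
    print(schedule)
    print(blend(W_ablated, W_source, schedule[-1]))
```

With plain interpolation, a gamma above zero restores that same fraction of the removed component, so this sketch shows the trade-off rather than resolving it. How the release restores capability without reintroducing refusal depends on its documented layer groups and gamma sweeps, which are not reproduced here.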
