A surgically modified Gemma 4 (12B) that removes refusal behavior while preserving benchmark parity; released as an uncensored research artifact with GGUF quantizations for local inference and red‑team/alignment evaluation.
The core insight behind this release is unsettling: refusal behavior can be surgically removed from a transformer via targeted weight modifications without measurable loss on standard benchmarks. That result exposes a fragility in post-training alignment, and it makes the model useful as a controlled baseline for studying how safety constraints are geometrically encoded and how robust RLHF/DPO-style training is to weight-level attacks.
Great fit if you are an alignment researcher, red‑teamer, or safety evaluator who needs a controlled baseline to (a) measure how alignment is stored in weights, (b) stress-test defenses, or (c) build measurement protocols for refusal geometry. Look elsewhere if you need a production-ready, safety‑guarded model: this release intentionally removes guardrails and can generate harmful content. Users are responsible for legal and ethical compliance.
Compared with stock google/gemma-4-12B-it, this artifact is an experimental, surgically‑modified variant intended as a research probe rather than a consumer model. It belongs alongside open red‑team benchmarks such as HarmBench and JailbreakBench, and the abliteration toolkits used in mechanistic interpretability and alignment robustness work.
The pipeline uses two complementary steps: (1) SOM Refusal Geometry Removal on intermediate layers to ablate primary refusal directions; (2) ASPA (Abliteration Source‑Tethering) step-gradient blending across upper layers to restore capability parity without reintroducing refusal. The release documents layer groups, gamma schedules, and empirical sweep results used to choose the final step gradient.
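The two steps above can be illustrated with a minimal NumPy sketch. It is not the release's SOM or ASPA code: it assumes the generic abliteration recipe (a difference-of-means direction projected out of a weight matrix) for step 1 and a plain linear interpolation toward the source weights for step 2. All function names, the (d_model, d_in) weight layout, and the linear gamma schedule are hypothetical choices made for illustration.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations.

    Both inputs are (num_prompts, d_model) arrays of residual-stream
    activations collected at one layer (an assumed data layout).
    """
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate_direction(weight, direction):
    """Project `direction` out of a matrix that writes to the residual stream.

    weight has shape (d_model, d_in); afterwards the layer's output has
    no component along `direction`.
    """
    r = direction / np.linalg.norm(direction)
    return weight - np.outer(r, r @ weight)

def blend_with_source(ablated, source, gamma):
    """Linear interpolation: gamma=0 keeps the ablated weights,
    gamma=1 restores the source weights (assumed blending rule)."""
    return (1.0 - gamma) * ablated + gamma * source

def gamma_schedule(num_layers, step):
    """Hypothetical linear schedule: gamma rises by `step` per layer, capped at 1."""
    return [min(1.0, step * i) for i in range(num_layers)]
```

Note that a plain linear blend restores a gamma-sized fraction of the removed component, so this sketch does not by itself guarantee that refusal stays absent; the release's documented layer groups and sweep results are the authority on how the actual step gradient was chosen.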