Why this model matters
This release packages a locally runnable, quantized 40B model derived from Qwen3.6. The author claims it improves reasoning, stability, and long-conversation behavior through three steps: (1) expanding a 27B base to 40B parameters, (2) multi-stage fine-tuning on Claude 4.6 Opus and Deckard/PDK datasets, and (3) custom "NEO-CODE-Di-IMatrix" GGUF quants. If you need a large, locally deployable multimodal LLM with a very large context window and quants tailored to extended sessions, it is a candidate worth evaluating, with strong caveats on safety and reliability.
Key Capabilities
- Multimodal (image-text-to-text) operation: packaged as a Hugging Face model with pipeline_tag image-text-to-text and tested image-processing support (requires a companion mmproj for vision). So what: lets you run VLM-style prompts locally without cloud inference.
- Large context and long-conversation tuning: the author claims a native 256K-token context and tuning that preserves "thinking" traces. So what: better handling of long documents, multi-turn reasoning, and agent-like workflows, provided your inference stack supports contexts of that size.
- Custom GGUF quants (NEO-CODE-Di-IMatrix-MAX): several imatrix-based quant options (IQ4_XS, Q6/Q8 variants) benchmarked vs bf16. So what: reduced memory and faster local inference while aiming to retain high fidelity for coding and long sessions.
- Uncensored/Heretic tuning and aggressive persona: safety alignment is intentionally removed to produce uncensored outputs and a strong character voice. So what: highly creative, unconstrained outputs, but significant safety and compliance risks in many contexts.
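To make the memory tradeoff between the quant options listed above concrete, here is a rough back-of-the-envelope sketch. It assumes the commonly cited bits-per-weight figures for llama.cpp formats (the release only mentions "IQ4_XS" and "Q6/Q8 variants", so Q8_0 and Q6_K are stand-ins), a flat 40 billion parameters, and a hypothetical fixed overhead for KV cache and buffers. Real GGUF files mix tensor types, so actual file sizes will differ.

```python
# Approximate bits per weight for common llama.cpp formats. These are
# assumptions for illustration: the release's exact quant mix is not
# specified here, and real GGUF files mix tensor types.
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "IQ4_XS": 4.25,
}


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def formats_that_fit(n_params: float, budget_gib: float,
                     overhead_gib: float = 4.0) -> list[str]:
    """Formats whose weights plus a flat overhead fit the memory budget.

    overhead_gib is a hypothetical allowance for KV cache and runtime
    buffers; long contexts can need far more.
    """
    return [
        name for name, bpw in BITS_PER_WEIGHT.items()
        if weight_gib(n_params, bpw) + overhead_gib <= budget_gib
    ]


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name:>7}: {weight_gib(40e9, bpw):6.1f} GiB")
    print(formats_that_fit(40e9, budget_gib=24.0))
```

Under these assumptions the weights alone come to roughly 74.5 GiB at bf16 and about 19.8 GiB at 4.25 bits per weight, which is why the lower quants are the only practical option on a single consumer GPU. The estimate ignores the KV cache growth that a 256K context would bring.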
Who it's for & tradeoffs
Great fit if you run local inference and need a high-context, multimodal LLM for creative writing, roleplay, coding assistance, or offline experimentation, and you can accept large downloads and manual deployment. It is also attractive if you test custom GGUF quants and want artifacts compatible with vLLM, sglang, or transformers.
Look elsewhere if you need a safety-aligned production model, strict content moderation, or a vendor-supported model: this release explicitly states that safety alignment is removed and uses provocative marketing (“uncensored”, “no nanny”). Expect heavy resource needs, potential instability with lower quants, and full responsibility for maintenance, updates, and safety controls.
Additional notes
Author: DavidAU. Published on Hugging Face (Apache-2.0 license listed in the model card). Created 2026-05-01; last modified 2026-06-11. Community metrics (downloads and likes) indicate notable interest, but the model has no official institutional backing. Practical deployment requires attention to quant selection, the accompanying mmproj for vision, and the legal and ethical implications of uncensored outputs.
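As a small pre-flight sketch for the deployment note on the companion mmproj, the helper below checks a local download folder for a model file and a vision projector before you launch anything. It assumes both are `.gguf` files and that the projector has "mmproj" in its filename, which is a common convention but is not confirmed for this release; the function name and directory layout are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_vision_pair(model_dir: str) -> Tuple[Path, Optional[Path]]:
    """Return (model_path, mmproj_path) from a folder of GGUF files.

    Assumption: the vision projector is a .gguf file whose name contains
    "mmproj". mmproj_path is None when no such file is present, in which
    case only text prompts should be expected to work.
    """
    files = sorted(Path(model_dir).glob("*.gguf"))
    projectors = [f for f in files if "mmproj" in f.name.lower()]
    models = [f for f in files if "mmproj" not in f.name.lower()]
    if not models:
        raise FileNotFoundError(f"no model .gguf file found in {model_dir}")
    # With several quants in one folder, this picks the first by name;
    # pass a narrower directory if you need a specific one.
    return models[0], (projectors[0] if projectors else None)
```

A caller can treat a `None` projector as a signal to skip image inputs rather than fail at inference time.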
