AIAny - GLM-5.2 (Unsloth GGUF)

Long-context LLMs are difficult to run locally at scale; this GGUF distribution packages GLM-5.2 so you can run the model with Unsloth dynamic quantization and common local inference tooling.

Key Capabilities

1M-token context: Enables stable long-horizon tasks (editing, long-form reasoning, multi-file codebases) without fragmenting context.
Quantized GGUF builds: Dynamic 1-bit and 2-bit variants (plus higher-bit options) let you trade model footprint against fidelity. In common distributions, the 1-bit build needs about 223 GB of total memory and the 2-bit build is about 239 GB on disk.
Runtime & integration: Designed for the llama.cpp, Unsloth Studio, vLLM, and transformers ecosystems; includes presets for reasoning effort (non-thinking, high, max) and speculative decoding improvements.
Architecture notes: Builds on GLM-5.2 innovations (IndexShare sparse indexing and MTP improvements) to reduce per-token FLOPs at very long contexts and increase speculative decoding acceptance.

Who it's for and trade-offs

Great fit if you want to run a large long-context LLM locally or on-premise (researchers, teams testing agentic chains, developers evaluating long-form code generation) and can provide large unified memory or a mix of VRAM and RAM. Look elsewhere if you need a tiny footprint (edge devices) or cannot meet the hundreds of GB of total memory that the useful quantized variants require. The package prioritizes reproducible local inference and measurable trade-offs between quant levels (file size vs. accuracy).
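
To make the memory trade-off concrete, here is a minimal Python sketch that checks which quantized build would fit in a given mix of VRAM and RAM. It is an illustration, not part of the distribution. The function name pick_quant and the 16 GB headroom default are hypothetical, and the sketch treats both sizes quoted in this listing (about 223 GB and 239 GB) as approximate in-memory footprints, which is an assumption. Real requirements also depend on context length and KV-cache size, which can be substantial at long contexts.

```python
from typing import Optional

# Approximate sizes in GB as quoted in this listing.
# Assumption: both are treated here as rough in-memory footprints.
QUANT_SIZES_GB = {"1-bit": 223, "2-bit": 239}


def pick_quant(vram_gb: float, ram_gb: float, headroom_gb: float = 16.0) -> Optional[str]:
    """Return the largest listed quant that fits in VRAM + RAM minus headroom.

    headroom_gb is a hypothetical reserve for the OS, KV cache, and buffers;
    tune it for your own context length and runtime.
    """
    budget = vram_gb + ram_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None  # nothing listed fits; consider more memory or a smaller model
    # Prefer the largest build that fits, since higher bit widths keep more fidelity.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for vram, ram in [(24, 256), (24, 220), (24, 128)]:
        print(f"VRAM={vram} GB, RAM={ram} GB -> {pick_quant(vram, ram)}")
```

With these assumed figures, a machine with 24 GB of VRAM and 256 GB of RAM would select the 2-bit build, while 24 GB plus 128 GB would fit neither listed variant.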
