
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

GLM-4.5 is a foundation model series from Zhipu AI (GLM Team) designed for agentic tasks, unifying reasoning, coding, and tool-use capabilities. The series spans multiple sizes (e.g., 355B and 106B variants), supports a thinking mode for complex reasoning and tool calls, and is released with open-source weights, including FP8 versions, for research and deployment.

Introduction

Overview

GLM-4.5 is a family of large foundation models developed by the GLM Team (Zhipu AI) aimed at powering intelligent agents and complex reasoning/coding workflows. The series unifies reasoning, coding, and tool usage in a hybrid architecture that supports a dedicated "thinking mode" for multi-step reasoning and tool-integrated inference. GLM-4.5 comes in multiple sizes (notably a 355B parameter version and a more compact 106B "Air" variant) and the project provides both full-precision and FP8 releases to facilitate efficient inference and research.

Key Features
  • Agentic capabilities: Designed for agent frameworks and tool-using workflows; supports calling tools with structured tool-call parsers and reasoning parsers.
  • Thinking modes: Offers Interleaved Thinking (reasoning before actions), Preserved Thinking (retain reasoning across turns for agentic consistency), and turn-level control to trade off latency vs. depth of reasoning.
  • Coding & "vibe coding": Strong performance on coding tasks, with particular attention to front-end/UI page generation quality (referred to as "vibe coding") and measured gains on coding benchmarks.
  • Multiple precisions & deployment options: Provides BF16 and FP8 checkpoints; includes guidance for running with vLLM, SGLang, and transformers integrations, plus hardware recommendations (H100/H200 configurations) for different precisions and context-length targets.
  • Open-source release: Base models, hybrid reasoning models, and FP8 versions are released under the MIT license with download mirrors on Hugging Face and ModelScope.
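As a rough sketch of how turn-level thinking control and tool use might come together in a request to an OpenAI-compatible endpoint serving GLM-4.5 (the `enable_thinking` kwarg name and the example tool schema are illustrative assumptions, not the documented interface):

```python
# Sketch: building an OpenAI-compatible chat payload for a GLM-4.5 server.
# The "enable_thinking" chat-template kwarg and the get_weather tool are
# illustrative assumptions, not guaranteed parts of the official interface.

def build_payload(prompt: str, thinking: bool = True) -> dict:
    """Construct a chat-completion request with one example tool attached."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "GLM-4.5",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool],
        # Turn-level control of the thinking mode (assumed kwarg name).
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

payload = build_payload("What's the weather in Beijing?", thinking=False)
```

A payload like this would be POSTed to the server's chat-completions route; disabling thinking per turn trades reasoning depth for latency, as the feature list describes.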
Use Cases
  • Building intelligent agents that require tool use, multi-step planning, and preserved multi-turn reasoning.
  • Coding assistants and automated code generation (including multilingual coding scenarios and terminal-based task automation).
  • Research and production deployment of large LLMs with options for FP8 efficient inference.
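To see why FP8 matters for deployment, a back-of-envelope estimate of weight memory alone (ignoring KV cache, activations, and serving overhead, so real requirements are higher) can be sketched as:

```python
# Back-of-envelope weight-memory estimate for GLM-4.5 checkpoints.
# Counts parameters only; KV cache, activations, and framework overhead
# add substantially to real serving requirements.

def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate checkpoint size in GiB for a given precision."""
    return params_billion * 1e9 * bytes_per_param / 2**30

for name, params in [("GLM-4.5 (355B)", 355), ("GLM-4.5-Air (106B)", 106)]:
    bf16 = weight_gib(params, 2)  # BF16: 2 bytes per parameter
    fp8 = weight_gib(params, 1)   # FP8: 1 byte per parameter
    print(f"{name}: ~{bf16:.0f} GiB BF16, ~{fp8:.0f} GiB FP8")
```

The halved footprint under FP8 is what makes multi-GPU configurations with fewer cards (or longer context windows on the same cards) feasible.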
Artifacts & Resources
  • GitHub repo: the implementation, tooling, inference scripts, and guidance live in the repository (this project).
  • Official blog/technical report: technical blog and an arXiv technical report provide evaluation details and benchmarks.
  • Model downloads: checkpoints available on Hugging Face and ModelScope; integration examples for vLLM and SGLang are included.
Technical & Operational Notes

GLM-4.5 emphasizes realistic system-level deployment: the README documents recommended GPU counts and configurations to realize full context windows (up to 128K for the series), speculative decoding settings for competitive latency, and guidelines for LoRA/SFT/RL fine-tuning experiments. The project also provides parser hooks (tool-call parser, reasoning parser) for smooth integration with agent frameworks.
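The role of a tool-call parser hook can be pictured with a toy extractor. The `<tool_call>...</tool_call>` wrapper below is an assumed format chosen for illustration, not the model's documented output syntax; a real deployment would use the parser shipped with the serving framework:

```python
# Toy tool-call extractor, illustrating what a tool-call parser hook does:
# scan raw generated text for structured call blocks and decode them.
# The <tool_call> wrapper is an assumed format for this sketch only.
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(model_output: str) -> list[dict]:
    """Pull JSON tool-call bodies out of raw generated text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(model_output):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the turn
    return calls

sample = (
    "Let me check that for you.\n"
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Beijing"}}'
    "</tool_call>"
)
calls = extract_tool_calls(sample)
```

An agent framework would route each decoded call to the matching tool, append the result to the conversation, and resume generation; the reasoning parser plays the analogous role for thinking-mode segments.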

Information

  • Website: github.com
  • Authors: GLM Team (Zhipu AI), zai-org (GitHub)
  • Published date: 2025/07/20