
GLM-5.1

by Zhipu AI · glm family

754B

parameters

text-generation code-generation reasoning multilingual tool-use math

GLM-5.1 is Zhipu AI's next-generation flagship model for agentic engineering, succeeding GLM-5 with significantly stronger coding and long-horizon task capabilities. A 754B-parameter Mixture-of-Experts model with 40B active parameters per token, it achieves state-of-the-art performance on SWE-Bench Pro (58.4), outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. Designed for sustained autonomous work, GLM-5.1 can operate on a single task for up to 8 hours: planning, executing, testing, and iterating across hundreds of rounds and thousands of tool calls. Its MoE architecture keeps VRAM requirements manageable despite the massive parameter count, making it accessible on high-end consumer hardware at Q4_K_M quantization.
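As a rough illustration of why the MoE design matters for inference cost, the common approximation that a forward pass costs about 2 FLOPs per active parameter per token can be sketched as follows (the 2-FLOPs rule of thumb is an assumption, not a figure published for this model):

```python
# Rough sketch: per-token compute for a dense vs. MoE forward pass.
# Uses the common ~2 FLOPs per active parameter per token estimate.
TOTAL_PARAMS = 754e9   # GLM-5.1 total parameter count
ACTIVE_PARAMS = 40e9   # parameters activated per token

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params

dense = flops_per_token(TOTAL_PARAMS)  # hypothetical dense model of the same size
moe = flops_per_token(ACTIVE_PARAMS)   # what the MoE actually computes
print(f"MoE uses ~{moe / dense:.1%} of the dense compute per token")
```

By this estimate the model does roughly 5% of the per-token compute a dense 754B model would, which is what makes multi-hour agentic runs economically plausible.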

Quick Start with Ollama

ollama run glm-5.1
Resources: Ollama · Hugging Face · Official Page
Creator: Zhipu AI
Parameters: 754B
Architecture: transformer-moe
Context: 198K tokens
Released: Apr 7, 2026
License: MIT
Ollama: glm-5.1

Quantization Options

Format              File Size  VRAM Required  Ollama Tag
Q2_K (recommended)  285 GB     305 GB         latest
Q4_K_M              430 GB     450 GB         q4_K_M
Q8_0                800 GB     820 GB         q8_0
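The file sizes above are consistent with simple bits-per-weight arithmetic. A minimal sketch, assuming typical llama.cpp average bits-per-weight for each scheme (these averages are ballpark assumptions, not values published for this model, and the real figure varies with the tensor mix):

```python
# Estimate quantized file size from parameter count and average bits per weight.
# Bits-per-weight values are typical llama.cpp ballparks (assumption).
PARAMS = 754e9  # GLM-5.1 total parameters

def est_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate quantized file size in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

for name, bpw in [("Q2_K", 3.0), ("Q4_K_M", 4.6), ("Q8_0", 8.5)]:
    print(f"{name}: ~{est_size_gb(PARAMS, bpw):.0f} GB")
```

The estimates land within a few percent of the table, and the gap between file size and the VRAM-required column is the overhead for KV cache and activations.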

Compatible Hardware

Q2_K requires 305 GB VRAM

Hardware                   VRAM    Type  Fit   Est. Speed
Mac Studio M4 Ultra 512GB  512 GB  mac   Runs  ~3 tok/s
106 other hardware devices cannot run this model at Q2_K.
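The fit check behind the table reduces to comparing a device's available VRAM against each quantization's requirement. A minimal sketch, using the VRAM figures from the quantization table above (the 512 GB example device is illustrative):

```python
# Which GLM-5.1 quantizations fit in a given amount of VRAM?
# Requirements taken from the quantization table on this page.
VRAM_REQUIRED_GB = {"Q2_K": 305, "Q4_K_M": 450, "Q8_0": 820}

def runnable_quants(device_vram_gb: int) -> list[str]:
    """Return the quantization formats that fit in the given VRAM."""
    return [q for q, need in VRAM_REQUIRED_GB.items() if device_vram_gb >= need]

print(runnable_quants(512))  # 512 GB unified memory -> ['Q2_K', 'Q4_K_M']
```

Note that on Apple silicon the comparison is against unified memory rather than discrete VRAM, which is why a 512 GB Mac Studio appears in the compatible list.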

Benchmark Scores

SWE-Bench Pro: 58.4