
GLM-5.1

by Zhipu AI · glm family

754B

parameters

text-generation code-generation reasoning multilingual tool-use math

GLM-5.1 is Zhipu AI's next-generation flagship model for agentic engineering, succeeding GLM-5 with significantly stronger coding and long-horizon task capabilities. A 754B-parameter Mixture-of-Experts model with 40B active parameters per token, it achieves state-of-the-art performance on SWE-Bench Pro (58.4), outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. Designed for sustained autonomous work, GLM-5.1 can operate on a single task for up to 8 hours: planning, executing, testing, and iterating across hundreds of rounds and thousands of tool calls. Its MoE architecture keeps VRAM requirements manageable despite the massive parameter count, making it accessible on high-end consumer hardware at Q4_K_M quantization.
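As a rough illustration of why the MoE design matters for inference cost, the common approximation that a forward pass costs about 2 FLOPs per active parameter per token can be sketched as follows (the 2-FLOPs rule of thumb is an assumption, not a figure published for this model):

```python
# Rough sketch: per-token compute for a dense vs. MoE forward pass.
# Uses the common ~2 FLOPs per active parameter per token estimate.
TOTAL_PARAMS = 754e9   # GLM-5.1 total parameter count
ACTIVE_PARAMS = 40e9   # parameters activated per token

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params

dense = flops_per_token(TOTAL_PARAMS)  # hypothetical dense model of the same size
moe = flops_per_token(ACTIVE_PARAMS)   # what the MoE actually computes
print(f"MoE uses ~{moe / dense:.1%} of the dense compute per token")
```

By this estimate the model does roughly 5% of the per-token compute a dense 754B model would, which is what makes multi-hour agentic runs economically plausible.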

Quick Start with Ollama

ollama run glm-5.1
Resources: Ollama · Hugging Face · Official Page
Creator: Zhipu AI
Parameters: 754B
Architecture: transformer-moe
Context: 198K tokens
Released: Apr 7, 2026
License: MIT
Ollama: glm-5.1

Quantization Options

Format              File Size  VRAM Required  Ollama Tag
Q2_K (recommended)  285 GB     305 GB         latest
Q4_K_M              430 GB     450 GB         q4_K_M
Q8_0                800 GB     820 GB         q8_0
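The file sizes above are consistent with simple bits-per-weight arithmetic. A minimal sketch, assuming typical llama.cpp average bits-per-weight for each scheme (these averages are ballpark assumptions, not values published for this model, and the real figure varies with the tensor mix):

```python
# Estimate quantized file size from parameter count and average bits per weight.
# Bits-per-weight values are typical llama.cpp ballparks (assumption).
PARAMS = 754e9  # GLM-5.1 total parameters

def est_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate quantized file size in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

for name, bpw in [("Q2_K", 3.0), ("Q4_K_M", 4.6), ("Q8_0", 8.5)]:
    print(f"{name}: ~{est_size_gb(PARAMS, bpw):.0f} GB")
```

The estimates land within a few percent of the table, and the gap between file size and the VRAM-required column is the overhead for KV cache and activations.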

Compatible Hardware

Q2_K requires 305 GB VRAM

Hardware                   VRAM    Type  Fit   Est. Speed
Mac Studio M4 Ultra 512GB  512 GB  mac   Runs  ~3 tok/s
106 other hardware devices cannot run this model at Q2_K.
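The fit check behind the table reduces to comparing a device's available VRAM against each quantization's requirement. A minimal sketch, using the VRAM figures from the quantization table above (the 512 GB example device is illustrative):

```python
# Which GLM-5.1 quantizations fit in a given amount of VRAM?
# Requirements taken from the quantization table on this page.
VRAM_REQUIRED_GB = {"Q2_K": 305, "Q4_K_M": 450, "Q8_0": 820}

def runnable_quants(device_vram_gb: int) -> list[str]:
    """Return the quantization formats that fit in the given VRAM."""
    return [q for q, need in VRAM_REQUIRED_GB.items() if device_vram_gb >= need]

print(runnable_quants(512))  # 512 GB unified memory -> ['Q2_K', 'Q4_K_M']
```

Note that on Apple silicon the comparison is against unified memory rather than discrete VRAM, which is why a 512 GB Mac Studio appears in the compatible list.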

Benchmark Scores

SWE-Bench Pro: 58.4