GLM-5

by Zhipu AI · glm family

744B

parameters

text-generation code-generation reasoning multilingual tool-use math

GLM-5 is Zhipu AI's flagship reasoning model — a 744B parameter Mixture-of-Experts with 40B active parameters per token. It achieves state-of-the-art performance on reasoning and agentic benchmarks, competing with the best closed-source models. At 281 GB even at aggressive 2-bit quantization, GLM-5 requires enterprise-grade hardware — multiple high-VRAM GPUs or a Mac Studio/Pro with 300 GB+ unified memory. Not practical for consumer hardware, but available through Ollama for those with the resources.
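As a rough sanity check on the listed file size, a quantized checkpoint's footprint is approximately parameter count times effective bits per weight. A back-of-the-envelope sketch in Python; the ~3.0 effective bits/weight figure for Q2_K is an assumption (K-quants mix quantization types, so the effective rate sits above the nominal 2 bits), not a number from this page:

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough checkpoint size in decimal GB: params x bits / 8 bits per byte."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 744B parameters at an assumed ~3.0 effective bits/weight
print(round(quantized_size_gb(744, 3.0)))  # -> 279, close to the listed 281 GB
```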

Quick Start with Ollama

ollama run glm-5
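Beyond the CLI, a running Ollama daemon also exposes a local REST API (`POST http://localhost:11434/api/generate`). A minimal sketch of the request body, assuming the model has been pulled under the `glm-5` tag listed on this page:

```python
import json

# Request body for Ollama's local REST API (POST /api/generate),
# assuming the model tag "glm-5" from this page.
payload = {
    "model": "glm-5",
    "prompt": "Summarize mixture-of-experts routing in two sentences.",
    "stream": False,  # return a single JSON object instead of a token stream
}
body = json.dumps(payload)
```

Send `body` with `curl -d` or `requests.post` against the endpoint above; with `"stream": False` the response arrives as one JSON object containing the generated text.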
Resources Ollama Hugging Face Official Page
Creator Zhipu AI
Parameters 744B
Architecture transformer-decoder
Context 195K tokens
Released Mar 15, 2026
License MIT
Ollama glm-5

Quantization Options

Format               File Size   VRAM Required   Ollama Tag
Q2_K (recommended)   281 GB      300 GB          latest

Compatible Hardware

Q2_K requires 300 GB VRAM

Hardware                        VRAM     Type   Fit    Est. Speed
Mac Studio M4 Ultra (512 GB)    512 GB   Mac    Runs   ~3 tok/s
106 other hardware devices cannot run this model at Q2_K.
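The fit check in the table above reduces to a simple memory comparison. A minimal sketch using the 300 GB Q2_K requirement from this page; the helper name is ours:

```python
# Memory requirement per quantization, taken from the table above
REQUIRED_VRAM_GB = {"Q2_K": 300}

def can_run(quant: str, available_gb: float) -> bool:
    """True if a device's VRAM/unified memory meets the requirement at this quant."""
    return available_gb >= REQUIRED_VRAM_GB[quant]

print(can_run("Q2_K", 512))  # Mac Studio M4 Ultra, 512 GB unified memory -> True
print(can_run("Q2_K", 192))  # a 192 GB machine falls short -> False
```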