GLM-5

by Zhipu AI · glm family

744B

parameters

text-generation code-generation reasoning multilingual tool-use math

GLM-5 is Zhipu AI's flagship reasoning model — a 744B parameter Mixture-of-Experts with 40B active parameters per token. It achieves state-of-the-art performance on reasoning and agentic benchmarks, competing with the best closed-source models. At 281 GB even at aggressive 2-bit quantization, GLM-5 requires enterprise-grade hardware — multiple high-VRAM GPUs or a Mac Studio/Pro with 300 GB+ unified memory. Not practical for consumer hardware, but available through Ollama for those with the resources.
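As a rough sanity check on the listed file size, a quantized checkpoint's footprint is approximately parameter count times effective bits per weight. A back-of-the-envelope sketch in Python; the ~3.0 effective bits/weight figure for Q2_K is an assumption (K-quants mix quantization types, so the effective rate sits above the nominal 2 bits), not a number from this page:

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough checkpoint size in decimal GB: params x bits / 8 bits per byte."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 744B parameters at an assumed ~3.0 effective bits/weight
print(round(quantized_size_gb(744, 3.0)))  # -> 279, close to the listed 281 GB
```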

Quick Start with Ollama

ollama run glm-5
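Beyond the CLI, a running Ollama daemon also exposes a local REST API (`POST http://localhost:11434/api/generate`). A minimal sketch of the request body, assuming the model has been pulled under the `glm-5` tag listed on this page:

```python
import json

# Request body for Ollama's local REST API (POST /api/generate),
# assuming the model tag "glm-5" from this page.
payload = {
    "model": "glm-5",
    "prompt": "Summarize mixture-of-experts routing in two sentences.",
    "stream": False,  # return a single JSON object instead of a token stream
}
body = json.dumps(payload)
```

Send `body` with `curl -d` or `requests.post` against the endpoint above; with `"stream": False` the response arrives as one JSON object containing the generated text.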
Resources Ollama Hugging Face Official Page
Creator Zhipu AI
Parameters 744B
Architecture transformer-decoder
Context 195K tokens
Released Mar 15, 2026
License MIT
Ollama glm-5

Quantization Options

Format               File Size   VRAM Required   Ollama Tag
Q2_K (recommended)   281 GB      300 GB          latest

Compatible Hardware

Q2_K requires 300 GB VRAM

Hardware                        VRAM     Type   Fit    Est. Speed
Mac Studio M4 Ultra (512 GB)    512 GB   Mac    Runs   ~3 tok/s
106 other hardware devices cannot run this model at Q2_K.
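The fit check in the table above reduces to a simple memory comparison. A minimal sketch using the 300 GB Q2_K requirement from this page; the helper name is ours:

```python
# Memory requirement per quantization, taken from the table above
REQUIRED_VRAM_GB = {"Q2_K": 300}

def can_run(quant: str, available_gb: float) -> bool:
    """True if a device's VRAM/unified memory meets the requirement at this quant."""
    return available_gb >= REQUIRED_VRAM_GB[quant]

print(can_run("Q2_K", 512))  # Mac Studio M4 Ultra, 512 GB unified memory -> True
print(can_run("Q2_K", 192))  # a 192 GB machine falls short -> False
```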