Gemma 4 26B

Name: Gemma 4 26B
Author: Google

License: Apache 2.0

Google · 26B · transformer-decoder

🤗 HuggingFace · Ollama · Official

Released 2026-04-02 · 262K context · 26B params

Use Cases

chat · code · reasoning · multilingual · vision · tools · math

Quantization Options

Quant	Bits	VRAM	Quality	Status
Q4_K_M (recommended)	4	20.0 GB	Good	—
Q8_0	8	30.0 GB	Excellent	—
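
As a rough sanity check on the VRAM column, the sketch below computes the weights-only footprint from the nominal bit widths in the table (parameters × bits / 8). It is an estimate under stated assumptions, not how the listed figures were derived: GGUF quantization formats typically store somewhat more than the nominal bit count per weight, so the real weight size is higher, and the gap to the listed VRAM is assumed here to cover the KV cache, activations, and runtime buffers.

```python
# Weights-only memory estimate: parameters x bits per weight / 8 bytes.
# Bit widths and listed VRAM come from the table; treating the remainder
# as "everything else" (KV cache, buffers) is an assumption.

TOTAL_PARAMS = 26e9  # total parameters; all experts must be resident in memory

# quant name -> (nominal bits per weight, listed VRAM in GB)
QUANTS = {
    "Q4_K_M": (4, 20.0),
    "Q8_0": (8, 30.0),
}


def weights_gb(params: float, bits: int) -> float:
    """Return weight storage in decimal gigabytes for a given bit width."""
    return params * bits / 8 / 1e9


for name, (bits, listed_gb) in QUANTS.items():
    w = weights_gb(TOTAL_PARAMS, bits)
    print(f"{name}: ~{w:.1f} GB weights, ~{listed_gb - w:.1f} GB of the listed figure left over")
```

Because the nominal widths understate the real storage cost, the weight numbers printed here are best read as lower bounds.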

About this model

Gemma 4 26B is a Mixture-of-Experts (MoE) model with 26B total parameters, of which only 3.8B are active per token, so per-token compute is closer to that of a roughly 4B dense model. It ranks #6 on Arena AI among open models and scores 88.3% on AIME 2026, a high score for its active parameter count. All 26B parameters still have to be loaded, but at Q4 the model fits in about 20 GB of VRAM while delivering reasoning quality that rivals much larger dense models. It supports a 262K-token context (256K in binary units) with native vision and tool use.
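
The gap between total and active parameters can be made concrete with a little arithmetic. The sketch below uses only the 26B and 3.8B figures from the paragraph above; the "about 2 FLOPs per active parameter per token" figure is a common rule of thumb for a forward pass, used here as an assumption rather than a measured property of this model.

```python
# Active-parameter arithmetic for a Mixture-of-Experts model.
# 26B total / 3.8B active are from the model description; the FLOPs
# rule of thumb (2 x active parameters per token) is an assumption.

TOTAL_PARAMS = 26e9
ACTIVE_PARAMS = 3.8e9


def active_fraction(active: float, total: float) -> float:
    """Share of the parameters that participate in each token's forward pass."""
    return active / total


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token (rule of thumb, not measured)."""
    return 2 * active_params


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"active share per token: {frac:.1%}")
print(f"approx. forward FLOPs per token: {forward_flops_per_token(ACTIVE_PARAMS):.2e}")
print(f"approx. compute saving vs. a 26B dense model: {TOTAL_PARAMS / ACTIVE_PARAMS:.1f}x")
```

Memory does not shrink the same way: every expert has to be resident, so VRAM scales with the 26B total while compute scales with the 3.8B active.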

Benchmarks

AIME 2026: 88.3%
LiveCodeBench: 77.1

Install

Ollama

ollama run gemma4:26b-a4b-it-q4_K_M
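
Once the model has been pulled with the command above, it can also be called programmatically. The following is a minimal sketch that talks to Ollama's local HTTP API using only the Python standard library. It assumes an Ollama server is running on the default local address (`http://localhost:11434`) and that the tag from the command above is available; `build_request` and `generate` are hypothetical helper names introduced for illustration.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model tag taken from the `ollama run` command above.
MODEL_TAG = "gemma4:26b-a4b-it-q4_K_M"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=600) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarize Mixture-of-Experts in one sentence.")` would return the model's reply as a string. With `"stream": False` the server answers with a single JSON object instead of a stream of chunks, which keeps the client code short.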

llama.cpp / GGUF

Download GGUF from HuggingFace

Specs

Parameters: 26B
Architecture: transformer-decoder
Context: 262K tokens
Min VRAM: 20.0 GB
Recommended: 20.0 GB
Family: Gemma 4
Released: 2026-04-02
License: Apache 2.0