QwQ 32B

Name: QwQ 32B
Author: Alibaba

License: Apache 2.0

Alibaba · 32B · transformer-decoder

2025-03-06 · 131K context · 32B params

Use Cases

chat, code, reasoning, multilingual, math

Quantization Options

Quant	Bits	VRAM	Quality	Status
Q4_K_M (recommended)	4	21.5 GB	Good	—
Q8_0	8	37.0 GB	Good	—
F16	16	68.0 GB	Excellent	—
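
As a rough illustration of how the table can be read against a VRAM budget, the sketch below picks the highest-precision quantization whose listed VRAM figure fits. The function name `pick_quant` and the `headroom_gb` parameter are hypothetical, introduced only for this example; the VRAM figures are copied from the table and are not measured here.

```python
# VRAM figures (GB) copied from the Quantization Options table above.
# Each entry is (quant name, bits, listed VRAM in GB).
QUANTS = [
    ("Q4_K_M", 4, 21.5),
    ("Q8_0", 8, 37.0),
    ("F16", 16, 68.0),
]


def pick_quant(vram_gb, headroom_gb=0.0):
    """Return the highest-bit quant whose listed VRAM fits the budget.

    headroom_gb is a hypothetical safety margin (for example, memory
    used by other processes); it is an assumption of this sketch, not
    a figure from the table. Returns None if nothing fits.
    """
    budget = vram_gb - headroom_gb
    fitting = [q for q in QUANTS if q[2] <= budget]
    if not fitting:
        return None
    # Prefer the entry with the most bits per weight.
    return max(fitting, key=lambda q: q[1])[0]


if __name__ == "__main__":
    for vram in (16, 24, 48, 80):
        print(vram, "GB ->", pick_quant(vram))
```

With these figures a 24 GB card selects Q4_K_M, but only with about 2.5 GB to spare, so a non-zero `headroom_gb` can change the answer.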

About this model

QwQ 32B is Alibaba's reasoning-focused model from the Qwen family. It is built for complex mathematical and logical reasoning through chain-of-thought processing, working through a problem step by step before giving a final answer. The model is particularly strong at competition math, code reasoning, and scientific problem solving, and at 32B parameters its reasoning performance is competitive with larger models. At Q4_K_M quantization it needs about 21.5 GB of VRAM, which puts it within reach of high-end consumer GPUs.

Benchmarks

MMLU: 82.5

Install

Ollama

ollama run qwq:q4_K_M
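
Once the model is pulled, it can also be queried programmatically. The following is a minimal sketch, assuming an Ollama server is running locally on its default port (11434) and exposes the `/api/generate` endpoint; the helper names `build_request` and `generate` are hypothetical, and the model tag is simply the one from the command above.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Tag taken from the `ollama run` command above.
MODEL_TAG = "qwq:q4_K_M"


def build_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build a non-streaming generate request for the given prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=600):
    """Send the prompt and return the model's text response.

    Reasoning models can emit long chains of thought, so the timeout
    is deliberately generous; adjust it for your hardware.
    """
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

Calling `generate("What is 17 * 23? Show your reasoning.")` should return the model's full answer as a single string, provided the server is up and the model tag is available locally.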

llama.cpp / GGUF

Download GGUF from HuggingFace

Specs

Parameters: 32B
Architecture: transformer-decoder
Context: 131K tokens
Min VRAM: 21.5 GB
Recommended: 21.5 GB
Family: Qwen 2.5
Released: 2025-03-06
License: Apache 2.0