Qwen 3 8B

Name: Qwen 3 8B
Author: Alibaba

Apache 2.0

Alibaba · 8B · transformer-decoder

🤗 HuggingFace Ollama Official

2025-04-29 131K context 8B params

Use Cases

chat code reasoning multilingual math tools summary

Quantization Options

Quant	Bits	VRAM	Quality	Status
Q4_K_Mrec	4	7.5 GB	Good	—
Q8_0	8	11.5 GB	Good	—
F16	16	20.0 GB	Excellent	—

About this model

Qwen 3 8B is the workhorse of the Qwen 3 dense lineup, offering an excellent balance of capability and resource efficiency. Features hybrid thinking mode for adaptive reasoning depth and supports tool calling for agentic workflows. At Q4 it fits on 8 GB GPUs with some headroom, and runs comfortably on 12-16 GB hardware. Strong at coding, math, and multilingual tasks — a direct upgrade over Llama 3.1 8B in most benchmarks.

Benchmarks

73.5

mmlu

Your Hardware

DevicePick…

VRAM—

Bandwidth—

Detecting…

Install

Ollama

ollama run qwen3:8b-q4_K_M

llama.cpp / GGUF

Download GGUF from HuggingFace

Specs

Parameters: 8B
Architecture: transformer-decoder
Context: 131K tokens
Min VRAM: 7.5 GB
Recommended: 7.5 GB
Family: Qwen 3
Released: 2025-04-29
License: Apache 2.0