Qwen 3 30B-A3B (MoE)

Name: Qwen 3 30B-A3B (MoE)
Author: Alibaba

Apache 2.0

Alibaba · 30B · mixture-of-experts

2025-04-29 131K context 30B params

Use Cases

chat code reasoning multilingual math tools summary

Quantization Options

Quant	Bits	VRAM	Quality	Status
Q4_K_M (recommended)	4	22.0 GB	Good	—
Q8_0	8	37.0 GB	Excellent	—
F16	16	67.0 GB	Excellent	—
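
The VRAM figures in the table are consistent with a simple estimate: weight size is parameters × bits / 8, plus a fixed allowance for everything else (KV cache, buffers). The sketch below reproduces the three rows under that assumption. The 7 GB overhead is inferred from the table, not stated by the source, and `estimate_vram_gb` is a hypothetical helper name.

```python
def estimate_vram_gb(params_billion: float, bits: int, overhead_gb: float = 7.0) -> float:
    """Rough VRAM estimate in GB: weight storage plus a fixed overhead.

    overhead_gb=7.0 is an assumption back-fitted to the table above;
    real overhead depends on context length and runtime.
    """
    weights_gb = params_billion * bits / 8  # billions of params * bytes per param
    return weights_gb + overhead_gb


for name, bits in [("Q4_K_M", 4), ("Q8_0", 8), ("F16", 16)]:
    print(f"{name}: {estimate_vram_gb(30, bits):.1f} GB")
```

The nominal bit widths from the table are used here; actual GGUF files may differ somewhat in size.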

About this model

Qwen 3 30B-A3B is a mixture-of-experts model with 30B total parameters, of which only about 3B are active per token. Per-token compute is therefore close to that of a 3B dense model, while results are comparable to much larger dense models. All expert weights must still be loaded, so the model needs about 22 GB of VRAM at Q4 even though only a fraction of the weights are used for any one token. It suits users with 24 GB or more of VRAM who want both quality and speed.
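
To illustrate why every expert must sit in memory while only a few run per token, here is a toy top-k routing sketch. It is not the model's actual implementation: the expert count, the value of k, and the scalar "experts" are hypothetical, chosen only to keep the example small.

```python
import math
import random


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the k selected experts' outputs; other experts are skipped."""
    routed = top_k_route(router_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routed)
    return output, [i for i, _ in routed]


# Toy setup (hypothetical sizes): 8 experts held in memory, 2 run per token.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
random.seed(0)
logits = [random.gauss(0, 1) for _ in experts]
y, used = moe_forward(3.0, experts, logits, k=2)
print(f"experts loaded: {len(experts)}, experts run for this token: {len(used)}")
```

A different token would produce different router scores and so a different pair of experts, which is why none of them can be left out of memory.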

Benchmarks

MMLU: 72.0

Install

Ollama

ollama run qwen3:30b-a3b-q4_K_M
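
Once the model has been pulled, it can also be queried from a script through Ollama's local HTTP API. The following is a minimal sketch, assuming a default Ollama install listening on `localhost:11434`; `build_generate_request` and `generate` are hypothetical helper names, and error handling is omitted.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes the default local port
MODEL_TAG = "qwen3:30b-a3b-q4_K_M"                  # tag from the command above


def build_generate_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt: str) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(generate("Summarize mixture-of-experts in one sentence."))` would print the model's reply.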

llama.cpp / GGUF

Download a GGUF build of the model from HuggingFace and load it with llama.cpp.

Specs

Parameters: 30B
Architecture: mixture-of-experts
Context: 131K tokens
Min VRAM: 22.0 GB
Recommended: 22.0 GB
Family: Qwen 3
Released: 2025-04-29
License: Apache 2.0