Qwen 3 30B-A3B (MoE)

by Alibaba · qwen-3 family

30B parameters

text-generation code-generation reasoning multilingual math tool-use summarization

Qwen 3 30B-A3B is a mixture-of-experts (MoE) model with 30B total parameters, of which only about 3B are active per token. All expert weights must be loaded, so it needs ~22 GB of VRAM at Q4, but because only 3B parameters are used for each token, it generates tokens about as quickly as a 3B dense model while achieving results comparable to much larger dense models. It is an efficient pick for users with 24 GB or more of VRAM who want both quality and speed.

Quick Start with Ollama

ollama run qwen3:30b-a3b-q4_K_M
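Once the model is pulled, it can also be called programmatically. The following is a minimal sketch, assuming a local Ollama server on its default port (11434) and its `/api/generate` endpoint; the helper names `build_generate_request` and `generate` are hypothetical and exist only for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model tag as listed on this page.
MODEL = "qwen3:30b-a3b"


def build_generate_request(prompt, model=MODEL, url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model=MODEL, timeout=300):
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # With "stream": False the server replies with one JSON object
    # whose "response" field holds the full completion.
    return body["response"]
```

Calling `generate("Summarize mixture-of-experts in one sentence.")` should return the completion as a string, provided the server is running and the model has been pulled.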
Resources: Ollama, Hugging Face, Official Page
Creator: Alibaba
Parameters: 30B
Architecture: mixture-of-experts
Context: 128K tokens
Released: Apr 29, 2025
License: Apache 2.0
Ollama: qwen3:30b-a3b

Quantization Options

Format                  File Size   VRAM Required   Ollama Tag
Q4_K_M (recommended)    19 GB       22 GB           30b-a3b-q4_K_M
Q8_0                    32.5 GB     37 GB           30b-a3b-q8_0
F16                     62 GB       67 GB           30b-a3b-fp16
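The file sizes and VRAM figures above can be sanity-checked with a little arithmetic. This is a rough sketch, assuming 1 GB = 1e9 bytes, that each file consists almost entirely of weights, and the rounded 30B parameter count from this page; the function names are hypothetical.

```python
# (format, file size in GB, VRAM required in GB), copied from the table above.
QUANTS = [
    ("Q4_K_M", 19.0, 22.0),
    ("Q8_0", 32.5, 37.0),
    ("F16", 62.0, 67.0),
]

# Total parameter count in billions, as stated on this page (rounded).
TOTAL_PARAMS_B = 30.0


def bits_per_weight(file_size_gb, params_b=TOTAL_PARAMS_B):
    """Effective bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return file_size_gb * 8 / params_b


def runtime_overhead_gb(file_size_gb, vram_gb):
    """VRAM needed beyond the weights themselves (KV cache, buffers, etc.)."""
    return vram_gb - file_size_gb


def summarize(quants=QUANTS):
    """Return (format, bits per weight, overhead in GB) for each row."""
    return [
        (name, round(bits_per_weight(size), 2), runtime_overhead_gb(size, vram))
        for name, size, vram in quants
    ]


for name, bpw, overhead in summarize():
    print(f"{name}: ~{bpw} bits/weight, ~{overhead} GB runtime overhead")
```

The results land slightly above the nominal 4, 8, and 16 bits, plausibly because the parameter count is rounded down to 30B and quantized formats store extra scale data alongside the weights.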

Compatible Hardware

for Q4_K_M (22 GB VRAM)

Hardware                          VRAM    Type  Fit           Est. Speed
Mac Studio M4 Ultra 512GB         512 GB  mac   Runs          ~37 tok/s
Mac Pro M2 Ultra 192GB            192 GB  mac   Runs          ~36 tok/s
Mac Studio M4 Ultra 192GB         192 GB  mac   Runs          ~37 tok/s
Mac Studio M4 Max 128GB           128 GB  mac   Runs          ~25 tok/s
MacBook Pro M4 Max 128GB          128 GB  mac   Runs          ~25 tok/s
MacBook Pro M3 Max 96GB           96 GB   mac   Runs          ~18 tok/s
Mac mini M4 Pro 64GB              64 GB   mac   Runs          ~12 tok/s
Mac Studio M4 Max 64GB            64 GB   mac   Runs          ~25 tok/s
MacBook Pro M4 Max 64GB           64 GB   mac   Runs          ~25 tok/s
Mac mini M4 Pro 48GB              48 GB   mac   Runs          ~12 tok/s
MacBook Pro M3 Max 48GB           48 GB   mac   Runs          ~18 tok/s
MacBook Pro M4 Max 48GB           48 GB   mac   Runs          ~25 tok/s
MacBook Pro M4 Pro 48GB           48 GB   mac   Runs          ~12 tok/s
Mac Studio M4 Max 36GB            36 GB   mac   Runs          ~25 tok/s
MacBook Pro M3 Pro 36GB           36 GB   mac   Runs          ~7 tok/s
NVIDIA GeForce RTX 5090           32 GB   gpu   Runs          ~81 tok/s
iMac M4 32GB                      32 GB   mac   Runs          ~5 tok/s
Mac mini M4 32GB                  32 GB   mac   Runs          ~5 tok/s
MacBook Air M4 32GB               32 GB   mac   Runs          ~5 tok/s
AMD Radeon RX 7900 XTX            24 GB   gpu   Runs (tight)  ~44 tok/s
NVIDIA GeForce RTX 3090           24 GB   gpu   Runs (tight)  ~43 tok/s
NVIDIA GeForce RTX 4090           24 GB   gpu   Runs (tight)  ~46 tok/s
iMac M3 24GB                      24 GB   mac   Runs (tight)  ~5 tok/s
Mac mini M2 24GB                  24 GB   mac   Runs (tight)  ~5 tok/s
Mac mini M4 Pro 24GB              24 GB   mac   Runs (tight)  ~12 tok/s
MacBook Air M2 24GB               24 GB   mac   Runs (tight)  ~5 tok/s
MacBook Air M4 24GB               24 GB   mac   Runs (tight)  ~5 tok/s
MacBook Pro M4 Pro 24GB           24 GB   mac   Runs (tight)  ~12 tok/s
AMD Radeon RX 7900 XT             20 GB   gpu   CPU Offload   ~36 tok/s
MacBook Pro M3 Pro 18GB           18 GB   mac   CPU Offload   ~7 tok/s
AMD Radeon RX 6800 XT             16 GB   gpu   CPU Offload   ~23 tok/s
AMD Radeon RX 7800 XT             16 GB   gpu   CPU Offload   ~28 tok/s
Intel Arc A770                    16 GB   gpu   CPU Offload   ~25 tok/s
NVIDIA GeForce RTX 4060 Ti 16GB   16 GB   gpu   CPU Offload   ~13 tok/s
NVIDIA GeForce RTX 4070 Ti Super  16 GB   gpu   CPU Offload   ~31 tok/s
NVIDIA GeForce RTX 4080 Super     16 GB   gpu   CPU Offload   ~33 tok/s
NVIDIA GeForce RTX 4080           16 GB   gpu   CPU Offload   ~33 tok/s
NVIDIA GeForce RTX 5070 Ti        16 GB   gpu   CPU Offload   ~41 tok/s
NVIDIA GeForce RTX 5080           16 GB   gpu   CPU Offload   ~44 tok/s
iMac M1 16GB                      16 GB   mac   CPU Offload   ~3 tok/s
iMac M4 16GB                      16 GB   mac   CPU Offload   ~5 tok/s
Mac mini M1 16GB                  16 GB   mac   CPU Offload   ~3 tok/s
Mac mini M4 16GB                  16 GB   mac   CPU Offload   ~5 tok/s
MacBook Air M2 16GB               16 GB   mac   CPU Offload   ~5 tok/s
MacBook Air M3 16GB               16 GB   mac   CPU Offload   ~5 tok/s
MacBook Air M4 16GB               16 GB   mac   CPU Offload   ~5 tok/s
MacBook Pro M1 16GB               16 GB   mac   CPU Offload   ~3 tok/s
MacBook Pro M2 Pro 16GB           16 GB   mac   CPU Offload   ~9 tok/s
17 hardware devices cannot run this model configuration.
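The Fit column appears to follow from comparing a device's VRAM with the 22 GB that Q4_K_M requires. The sketch below is one way to reproduce the labels in the table; the page does not state its actual cutoffs, so `tight_headroom_gb` and `min_offload_gb` are assumptions chosen only to be consistent with the rows shown (24 GB is "tight", 32 GB is not, and 16 GB is the smallest device listed with CPU offload).

```python
def classify_fit(vram_gb, required_gb=22.0,
                 tight_headroom_gb=4.0, min_offload_gb=16.0):
    """Label how well a model fits on a device.

    required_gb defaults to the Q4_K_M requirement from this page.
    tight_headroom_gb and min_offload_gb are assumed thresholds,
    not values published by the page.
    """
    if vram_gb >= required_gb + tight_headroom_gb:
        return "Runs"
    if vram_gb >= required_gb:
        # Fits, but leaves little room for context or other applications.
        return "Runs (tight)"
    if vram_gb >= min_offload_gb:
        # Part of the model must live in system RAM.
        return "CPU Offload"
    return "Cannot run"


# A few devices from the table above: (name, VRAM in GB).
for name, vram in [
    ("NVIDIA GeForce RTX 5090", 32),
    ("NVIDIA GeForce RTX 4090", 24),
    ("AMD Radeon RX 7900 XT", 20),
]:
    print(f"{name}: {classify_fit(vram)}")
```

Passing a different `required_gb` (for example 37 for Q8_0) gives the corresponding labels for the other quantizations, under the same assumed thresholds.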

Benchmark Scores

MMLU: 72.0