
Best AI Models for 32 GB VRAM

Premium tier: with quantization, even 70B-class models come within reach. Here are all 80 models you can run locally with 32 GB of memory.

Runs Comfortably (72)

These models fit with room to spare for context window and OS overhead.

| Model | Params | Quantization | VRAM | Quality (1–5) |
|---|---|---|---|---|
| Dolphin Mixtral 8x7B | 47B | Q4_K_M | 26 GB | 4 |
| Command R 35B | 35B | Q5_K_M | 26 GB | 4 |
| Qwen 3.5 35B A3B | 35B | Q8_0 | 20 GB | 5 |
| Nous Hermes 2 34B | 34B | Q4_K_M | 19 GB | 4 |
| Yi 1.5 34B | 34B | Q4_K_M | 21 GB | 4 |
| WizardCoder 33B | 33B | Q4_K_M | 22 GB | 4 |
| Aya Expanse 32B | 32B | Q4_K_M | 22 GB | 4 |
| Cogito 32B | 32B | Q4_K_M | 21.5 GB | 4 |
| DeepSeek R1 32B | 32B | Q5_K_M | 23.9 GB | 4 |
| Qwen 2.5 32B | 32B | Q5_K_M | 23.9 GB | 4 |
| Qwen 2.5 Coder 32B | 32B | Q4_K_M | 23 GB | 4 |
| Qwen 3 32B | 32B | Q4_K_M | 23 GB | 4 |
| QwQ 32B | 32B | Q4_K_M | 21.5 GB | 4 |
| Gemma 4 31B | 31B | Q4_K_M | 22 GB | 4 |
| Qwen 3 30B-A3B (MoE) | 30B | Q4_K_M | 22 GB | 4 |
| Gemma 3 27B | 27B | Q4_K_M | 20 GB | 4 |
| Qwen 3.5 27B | 27B | Q4_K_M | 19 GB | 4 |
| Codestral 22B | 22B | Q8_0 | 24 GB | 5 |
| InternLM 2.5 20B | 20B | Q8_0 | 22 GB | 5 |
| StarCoder2 15B | 15B | Q8_0 | 17 GB | 5 |
| DeepSeek R1 14B | 14B | Q8_0 | 16 GB | 5 |
| Phi-4 14B | 14B | Q8_0 | 16 GB | 5 |
| Phi-4 Reasoning 14B | 14B | Q8_0 | 18 GB | 4 |
| Qwen 2.5 14B | 14B | Q8_0 | 16 GB | 5 |
| Qwen 2.5 Coder 14B | 14B | Q8_0 | 19 GB | 5 |
| Qwen 3 14B | 14B | Q8_0 | 19 GB | 5 |
| Llama 3.2 Vision 11B | 11B | F16 | 26 GB | 5 |
| Falcon 3 10B | 10B | F16 | 24 GB | 5 |
| Gemma 2 9B | 9B | F16 | 20 GB | 5 |
| Qwen 3.5 9B | 9B | Q8_0 | 11.5 GB | 5 |
| Yi 1.5 9B | 9B | F16 | 20 GB | 5 |
| Yi Coder 9B | 9B | F16 | 21 GB | 5 |
| Aya Expanse 8B | 8B | Q8_0 | 10.5 GB | 5 |
| Cogito 8B | 8B | F16 | 19.5 GB | 5 |
| DeepSeek R1 8B | 8B | F16 | 20 GB | 5 |
| Dolphin 3 8B | 8B | F16 | 18 GB | 5 |
| Granite 3.3 8B | 8B | F16 | 18 GB | 5 |
| Llama 3.1 8B | 8B | F16 | 18 GB | 5 |
| Nemotron 3 Nano 8B | 8B | F16 | 19.5 GB | 5 |
| Nous Hermes 2 8B | 8B | F16 | 18 GB | 5 |
| Qwen 3 8B | 8B | F16 | 20 GB | 5 |
| Codestral Mamba 7B | 7B | F16 | 17 GB | 5 |
| DeepSeek R1 7B | 7B | F16 | 16 GB | 5 |
| Falcon 3 7B | 7B | F16 | 17.5 GB | 5 |
| InternLM 2.5 7B | 7B | F16 | 16 GB | 5 |
| Mistral 7B | 7B | F16 | 16 GB | 5 |
| OpenChat 3.5 7B | 7B | F16 | 17 GB | 5 |
| Qwen 2.5 7B | 7B | F16 | 16 GB | 5 |
| Qwen 2.5 Coder 7B | 7B | F16 | 16 GB | 5 |
| Qwen 2.5 VL 7B | 7B | F16 | 18.5 GB | 5 |
| StarCoder2 7B | 7B | F16 | 16 GB | 5 |
| WizardLM 2 7B | 7B | F16 | 17 GB | 5 |
| Yi 1.5 6B | 6B | F16 | 14 GB | 5 |
| Gemma 3 4B | 4B | F16 | 11.5 GB | 5 |
| Gemma 3n E4B | 4B | F16 | 11 GB | 5 |
| Gemma 4 E4B | 4B | Q8_0 | 10 GB | 5 |
| Qwen 3 4B | 4B | F16 | 11 GB | 5 |
| Qwen 3.5 4B | 4B | Q8_0 | 6.5 GB | 4 |
| Phi-3 Mini 3.8B | 3.8B | F16 | 9.6 GB | 5 |
| Phi-4 Mini 3.8B | 3.8B | F16 | 10.5 GB | 5 |
| Llama 3.2 3B | 3B | F16 | 8 GB | 5 |
| StarCoder2 3B | 3B | F16 | 8 GB | 5 |
| Gemma 2 2B | 2B | F16 | 6 GB | 5 |
| Gemma 3n E2B | 2B | F16 | 6.5 GB | 5 |
| Gemma 4 E2B | 2B | Q8_0 | 6 GB | 5 |
| Qwen 3.5 2B | 2B | Q8_0 | 4.5 GB | 4 |
| SmolLM2 1.7B | 1.7B | F16 | 4.4 GB | 5 |
| DeepSeek R1 1.5B | 1.5B | F16 | 5 GB | 5 |
| Gemma 3 1B | 1B | F16 | 3.5 GB | 5 |
| Llama 3.2 1B | 1B | F16 | 4 GB | 5 |
| Qwen 3.5 0.8B | 0.8B | Q8_0 | 2 GB | 4 |
| Qwen 3 0.6B | 0.6B | F16 | 3.3 GB | 5 |
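The VRAM figures in the table above roughly track weight size plus runtime overhead. As a back-of-the-envelope check, here is a sketch of that estimate; the bits-per-weight values and the 1.2x overhead factor are illustrative assumptions, not measurements from any particular runtime.

```python
# Rough VRAM estimate: weights (params x bits per weight) plus a
# flat overhead factor for runtime buffers. Effective bits-per-weight
# for the K-quants include quantization metadata; all numbers here
# are approximations for illustration only.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "F16": 16.0}

def estimate_vram_gb(params_billions: float, quant: str,
                     overhead: float = 1.2) -> float:
    """Approximate VRAM in GB for a quantized model's weights."""
    weight_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return round(weight_gb * overhead, 1)

# A 32B model at Q4_K_M lands close to the ~23 GB shown in the table:
print(estimate_vram_gb(32, "Q4_K_M"))  # -> 23.0
```

Actual usage varies by runtime and context length, so treat this as a sanity check rather than a guarantee.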

Tight Fit (8)

These models run, but only with a reduced context window. Close other apps to free memory.

| Model | Params | Quantization | VRAM | Quality (1–5) |
|---|---|---|---|---|
| Mixtral 8x7B | 47B | Q4_K_M | 29.7 GB | 4 |
| Gemma 2 27B | 27B | Q8_0 | 29 GB | 5 |
| Gemma 4 26B | 26B | Q8_0 | 30 GB | 5 |
| Devstral 24B | 24B | Q8_0 | 29 GB | 5 |
| Magistral Small 24B | 24B | Q8_0 | 28 GB | 4 |
| Mistral Small 3.1 24B | 24B | Q8_0 | 30 GB | 5 |
| Gemma 3 12B | 12B | F16 | 28 GB | 5 |
| Mistral Nemo 12B | 12B | F16 | 28 GB | 5 |
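The context window competes with the weights for that last bit of memory: the KV cache grows linearly with context length. A rough sizing sketch, assuming a Llama-3.1-8B-style layout (32 layers, 8 KV heads, head dim 128, fp16 cache); all architecture numbers here are illustrative assumptions.

```python
# KV cache = 2 (K and V) x layers x kv_heads x head_dim x tokens x bytes.
# Architecture defaults below are assumptions for illustration,
# not measured from any specific model file.
def kv_cache_gb(ctx_tokens: int, layers: int = 32, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_val: int = 2) -> float:
    """Approximate fp16 KV-cache size in GB for a given context length."""
    total = 2 * layers * kv_heads * head_dim * ctx_tokens * bytes_per_val
    return round(total / 1024**3, 2)

print(kv_cache_gb(8192))    # 8K context  -> 1.0
print(kv_cache_gb(131072))  # 128K context -> 16.0
```

This is why a model whose weights use 29 GB of a 32 GB card only leaves room for a few thousand tokens of context.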

Want to check a specific combination?

Use the compatibility checker to see exactly how a model runs on your hardware, complete with performance estimates.
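The comfortable/tight split above can be mimicked with a simple headroom rule. This is a hypothetical sketch: the 85% threshold is an assumption chosen to match the tables on this page, not the checker's actual logic.

```python
# Hypothetical fit classifier mirroring the page's buckets.
# The 0.85 headroom threshold is an assumption for illustration.
def fit_category(model_gb: float, vram_gb: float = 32.0) -> str:
    if model_gb > vram_gb:
        return "does not fit"
    if model_gb > 0.85 * vram_gb:
        return "tight fit"        # little memory left for context
    return "runs comfortably"     # headroom for context and OS overhead

print(fit_category(26.0))  # largest "comfortable" entry -> runs comfortably
print(fit_category(29.7))  # Mixtral 8x7B at Q4_K_M     -> tight fit
```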
