
Best AI Models for 16 GB VRAM

The sweet spot for local AI: 14B models at Q4/Q5 quantization, 7–9B models at Q8, and the smallest models at full precision. Here are all 58 models you can run locally with 16 GB of memory.

Runs Comfortably (55)

These models fit with room to spare for context window and OS overhead.

| Model | Params | Quantization | VRAM | Quality |
|---|---|---|---|---|
| Qwen 3.5 35B A3B | 35B | Q4_K_M | 12 GB | 4 |
| InternLM 2.5 20B | 20B | Q4_K_M | 12 GB | 4 |
| StarCoder2 15B | 15B | Q5_K_M | 12 GB | 4 |
| DeepSeek R1 14B | 14B | Q5_K_M | 11.3 GB | 4 |
| Phi-4 14B | 14B | Q5_K_M | 11.3 GB | 4 |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 11 GB | 4 |
| Qwen 2.5 14B | 14B | Q5_K_M | 11.3 GB | 4 |
| Qwen 2.5 Coder 14B | 14B | Q4_K_M | 12 GB | 4 |
| Qwen 3 14B | 14B | Q4_K_M | 12 GB | 4 |
| Gemma 3 12B | 12B | Q4_K_M | 10.5 GB | 4 |
| Mistral Nemo 12B | 12B | Q4_K_M | 9.5 GB | 4 |
| Falcon 3 10B | 10B | Q8_0 | 13 GB | 4 |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | 4 |
| Qwen 3.5 9B | 9B | Q8_0 | 11.5 GB | 5 |
| Yi 1.5 9B | 9B | Q8_0 | 11 GB | 5 |
| Yi Coder 9B | 9B | Q8_0 | 12 GB | 5 |
| Aya Expanse 8B | 8B | Q8_0 | 10.5 GB | 5 |
| Cogito 8B | 8B | Q8_0 | 11 GB | 4 |
| DeepSeek R1 8B | 8B | Q8_0 | 11.5 GB | 4 |
| Dolphin 3 8B | 8B | Q8_0 | 10 GB | 5 |
| Granite 3.3 8B | 8B | Q8_0 | 10 GB | 4 |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | 4 |
| Nemotron 3 Nano 8B | 8B | Q8_0 | 11 GB | 4 |
| Nous Hermes 2 8B | 8B | Q8_0 | 10 GB | 5 |
| Qwen 3 8B | 8B | Q8_0 | 11.5 GB | 4 |
| Codestral Mamba 7B | 7B | Q8_0 | 9.9 GB | 5 |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | 4 |
| Falcon 3 7B | 7B | Q8_0 | 10 GB | 4 |
| InternLM 2.5 7B | 7B | Q8_0 | 9 GB | 5 |
| Mistral 7B | 7B | Q8_0 | 9 GB | 4 |
| OpenChat 3.5 7B | 7B | Q8_0 | 9.9 GB | 5 |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | 4 |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | 4 |
| Qwen 2.5 VL 7B | 7B | Q8_0 | 10.5 GB | 4 |
| StarCoder2 7B | 7B | Q8_0 | 9 GB | 5 |
| WizardLM 2 7B | 7B | Q8_0 | 9.9 GB | 5 |
| Gemma 3 4B | 4B | F16 | 11.5 GB | 5 |
| Gemma 3n E4B | 4B | F16 | 11 GB | 5 |
| Gemma 4 E4B | 4B | Q8_0 | 10 GB | 5 |
| Qwen 3 4B | 4B | F16 | 11 GB | 5 |
| Qwen 3.5 4B | 4B | Q8_0 | 6.5 GB | 4 |
| Phi-3 Mini 3.8B | 3.8B | F16 | 9.6 GB | 5 |
| Phi-4 Mini 3.8B | 3.8B | F16 | 10.5 GB | 5 |
| Llama 3.2 3B | 3B | F16 | 8 GB | 5 |
| StarCoder2 3B | 3B | F16 | 8 GB | 5 |
| Gemma 2 2B | 2B | F16 | 6 GB | 5 |
| Gemma 3n E2B | 2B | F16 | 6.5 GB | 5 |
| Gemma 4 E2B | 2B | Q8_0 | 6 GB | 5 |
| Qwen 3.5 2B | 2B | Q8_0 | 4.5 GB | 4 |
| SmolLM2 1.7B | 1.7B | F16 | 4.4 GB | 5 |
| DeepSeek R1 1.5B | 1.5B | F16 | 5 GB | 5 |
| Gemma 3 1B | 1B | F16 | 3.5 GB | 5 |
| Llama 3.2 1B | 1B | F16 | 4 GB | 5 |
| Qwen 3.5 0.8B | 0.8B | Q8_0 | 2 GB | 4 |
| Qwen 3 0.6B | 0.6B | F16 | 3.3 GB | 5 |
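
The VRAM figures above are roughly predictable from parameter count and quantization: weights take params × bytes-per-weight, plus headroom for the KV cache and runtime buffers. A minimal sketch of that estimate (the bytes-per-weight averages and the flat 1.5 GB overhead are illustrative assumptions, not exact runtime numbers):

```python
# Rough VRAM estimate: weights + flat overhead for KV cache / runtime.
# Bytes-per-weight values are approximate averages (assumed here);
# K-quants store some tensors at higher precision, so they sit above
# the nominal bit width.
BYTES_PER_WEIGHT = {
    "Q4_K_M": 0.60,   # ~4.8 bits/weight
    "Q5_K_M": 0.69,   # ~5.5 bits/weight
    "Q8_0":   1.06,   # ~8.5 bits/weight
    "F16":    2.00,
}

def estimate_vram_gb(params_billion: float, quant: str,
                     overhead_gb: float = 1.5) -> float:
    """Estimated VRAM in GB for a dense model at the given quantization."""
    weights_gb = params_billion * BYTES_PER_WEIGHT[quant]
    return round(weights_gb + overhead_gb, 1)

# Compare with the table above:
print(estimate_vram_gb(14, "Q5_K_M"))  # DeepSeek R1 14B, listed at 11.3 GB
print(estimate_vram_gb(8, "Q8_0"))     # Llama 3.1 8B, listed at 10 GB
```

The estimate lands within a few hundred MB of most table entries; multimodal models (extra vision towers) and architectures with unusual layer shapes deviate more.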

Tight Fit (3)

These models run but with limited context window. Close other apps to free memory.

| Model | Params | Quantization | VRAM | Quality |
|---|---|---|---|---|
| Codestral 22B | 22B | Q4_K_M | 14.7 GB | 4 |
| Llama 3.2 Vision 11B | 11B | Q8_0 | 14 GB | 4 |
| Yi 1.5 6B | 6B | F16 | 14 GB | 5 |
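
"Limited context window" is a KV-cache budget problem: every cached token costs 2 × layers × KV heads × head dim × bytes per element. A sketch of that arithmetic, using Llama 3.1 8B's published shape (32 layers, 8 KV heads via GQA, head dim 128) and an FP16 cache as the assumed configuration:

```python
def kv_cache_gb(context_len: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """GB consumed by the K and V caches at a given context length."""
    # Factor of 2 covers both the K and the V cache per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_len / 1024**3

# Llama 3.1 8B shape (32 layers, 8 KV heads of dim 128), FP16 cache:
for ctx in (4096, 8192, 32768):
    print(f"{ctx:>6} tokens -> {kv_cache_gb(ctx, 32, 8, 128):.2f} GB")
```

At 128 KB per token, 8K of context adds a full gigabyte on top of the weights, which is why a 14 GB model leaves so little room on a 16 GB card.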

Want to check a specific combination?

Use the compatibility checker to see exactly how a model runs on your specific hardware, with performance estimates.

Other memory tiers: 8 GB (entry-level local AI), 12 GB (the minimum for a good experience), 24 GB (serious local AI), 32 GB (premium tier).