
Best AI Models for 24 GB VRAM

Serious local AI: 30B+ models and larger context windows. Here are all 76 models you can run locally with 24 GB of memory.


Runs Comfortably (64)

These models fit with room to spare for context window and OS overhead.

| Model | Params | Quantization | VRAM | Quality |
|-------|--------|--------------|------|---------|
| Qwen 3.5 35B A3B | 35B | Q8_0 | 20 GB | 5 |
| Nous Hermes 2 34B | 34B | Q4_K_M | 19 GB | 4 |
| Gemma 2 27B | 27B | Q5_K_M | 20.4 GB | 4 |
| Gemma 3 27B | 27B | Q4_K_M | 20 GB | 4 |
| Qwen 3.5 27B | 27B | Q4_K_M | 19 GB | 4 |
| Gemma 4 26B | 26B | Q4_K_M | 20 GB | 4 |
| Devstral 24B | 24B | Q4_K_M | 17 GB | 4 |
| Magistral Small 24B | 24B | Q4_K_M | 17 GB | 4 |
| Mistral Small 3.1 24B | 24B | Q4_K_M | 18 GB | 4 |
| Codestral 22B | 22B | Q5_K_M | 16.9 GB | 4 |
| StarCoder2 15B | 15B | Q8_0 | 17 GB | 5 |
| DeepSeek R1 14B | 14B | Q8_0 | 16 GB | 5 |
| Phi-4 14B | 14B | Q8_0 | 16 GB | 5 |
| Phi-4 Reasoning 14B | 14B | Q8_0 | 18 GB | 4 |
| Qwen 2.5 14B | 14B | Q8_0 | 16 GB | 5 |
| Qwen 2.5 Coder 14B | 14B | Q8_0 | 19 GB | 5 |
| Qwen 3 14B | 14B | Q8_0 | 19 GB | 5 |
| Gemma 3 12B | 12B | Q8_0 | 16 GB | 5 |
| Mistral Nemo 12B | 12B | Q8_0 | 16 GB | 4 |
| Llama 3.2 Vision 11B | 11B | Q8_0 | 14 GB | 4 |
| Falcon 3 10B | 10B | Q8_0 | 13 GB | 4 |
| Gemma 2 9B | 9B | F16 | 20 GB | 5 |
| Qwen 3.5 9B | 9B | Q8_0 | 11.5 GB | 5 |
| Yi 1.5 9B | 9B | F16 | 20 GB | 5 |
| Aya Expanse 8B | 8B | Q8_0 | 10.5 GB | 5 |
| Cogito 8B | 8B | F16 | 19.5 GB | 5 |
| DeepSeek R1 8B | 8B | F16 | 20 GB | 5 |
| Dolphin 3 8B | 8B | F16 | 18 GB | 5 |
| Granite 3.3 8B | 8B | F16 | 18 GB | 5 |
| Llama 3.1 8B | 8B | F16 | 18 GB | 5 |
| Nemotron 3 Nano 8B | 8B | F16 | 19.5 GB | 5 |
| Nous Hermes 2 8B | 8B | F16 | 18 GB | 5 |
| Qwen 3 8B | 8B | F16 | 20 GB | 5 |
| Codestral Mamba 7B | 7B | F16 | 17 GB | 5 |
| DeepSeek R1 7B | 7B | F16 | 16 GB | 5 |
| Falcon 3 7B | 7B | F16 | 17.5 GB | 5 |
| InternLM 2.5 7B | 7B | F16 | 16 GB | 5 |
| Mistral 7B | 7B | F16 | 16 GB | 5 |
| OpenChat 3.5 7B | 7B | F16 | 17 GB | 5 |
| Qwen 2.5 7B | 7B | F16 | 16 GB | 5 |
| Qwen 2.5 Coder 7B | 7B | F16 | 16 GB | 5 |
| Qwen 2.5 VL 7B | 7B | F16 | 18.5 GB | 5 |
| StarCoder2 7B | 7B | F16 | 16 GB | 5 |
| WizardLM 2 7B | 7B | F16 | 17 GB | 5 |
| Yi 1.5 6B | 6B | F16 | 14 GB | 5 |
| Gemma 3 4B | 4B | F16 | 11.5 GB | 5 |
| Gemma 3n E4B | 4B | F16 | 11 GB | 5 |
| Gemma 4 E4B | 4B | Q8_0 | 10 GB | 5 |
| Qwen 3 4B | 4B | F16 | 11 GB | 5 |
| Qwen 3.5 4B | 4B | Q8_0 | 6.5 GB | 4 |
| Phi-3 Mini 3.8B | 3.8B | F16 | 9.6 GB | 5 |
| Phi-4 Mini 3.8B | 3.8B | F16 | 10.5 GB | 5 |
| Llama 3.2 3B | 3B | F16 | 8 GB | 5 |
| StarCoder2 3B | 3B | F16 | 8 GB | 5 |
| Gemma 2 2B | 2B | F16 | 6 GB | 5 |
| Gemma 3n E2B | 2B | F16 | 6.5 GB | 5 |
| Gemma 4 E2B | 2B | Q8_0 | 6 GB | 5 |
| Qwen 3.5 2B | 2B | Q8_0 | 4.5 GB | 4 |
| SmolLM2 1.7B | 1.7B | F16 | 4.4 GB | 5 |
| DeepSeek R1 1.5B | 1.5B | F16 | 5 GB | 5 |
| Gemma 3 1B | 1B | F16 | 3.5 GB | 5 |
| Llama 3.2 1B | 1B | F16 | 4 GB | 5 |
| Qwen 3.5 0.8B | 0.8B | Q8_0 | 2 GB | 4 |
| Qwen 3 0.6B | 0.6B | F16 | 3.3 GB | 5 |
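If you want to sanity-check figures like the ones above, a quantized model's VRAM footprint can be roughly estimated from its parameter count and the bits per weight of the quantization, plus a flat allowance for runtime overhead. Here is a minimal sketch; the bits-per-weight values and the overhead constant are rough assumptions, not exact llama.cpp numbers:

```python
# Rough VRAM estimate: weights ≈ params × bits-per-weight / 8, plus overhead.
# Bits-per-weight values are approximate averages for GGUF quant types
# (K-quants mix block sizes, so the effective rate is above the nominal bits).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def estimate_vram_gb(params_billions: float, quant: str,
                     overhead_gb: float = 1.5) -> float:
    """Approximate VRAM in GB for model weights plus a flat runtime overhead."""
    weights_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return round(weights_gb + overhead_gb, 1)

# A 14B model at Q8_0 lands near the table's 16 GB figure:
print(estimate_vram_gb(14, "Q8_0"))  # ≈ 16.4 GB
```

Note this ignores the KV cache, which grows with context length, so real usage at long contexts will be higher than the estimate.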

Tight Fit (12)

These models run, but with a limited context window. Close other apps to free memory.

| Model | Params | Quantization | VRAM | Quality |
|-------|--------|--------------|------|---------|
| Command R 35B | 35B | Q4_K_M | 22.5 GB | 4 |
| Yi 1.5 34B | 34B | Q4_K_M | 21 GB | 4 |
| WizardCoder 33B | 33B | Q4_K_M | 22 GB | 4 |
| Aya Expanse 32B | 32B | Q4_K_M | 22 GB | 4 |
| Cogito 32B | 32B | Q4_K_M | 21.5 GB | 4 |
| DeepSeek R1 32B | 32B | Q4_K_M | 20.7 GB | 4 |
| Qwen 2.5 32B | 32B | Q4_K_M | 20.7 GB | 4 |
| QwQ 32B | 32B | Q4_K_M | 21.5 GB | 4 |
| Gemma 4 31B | 31B | Q4_K_M | 22 GB | 4 |
| Qwen 3 30B-A3B (MoE) | 30B | Q4_K_M | 22 GB | 4 |
| InternLM 2.5 20B | 20B | Q8_0 | 22 GB | 5 |
| Yi Coder 9B | 9B | F16 | 21 GB | 5 |
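The reason these models are a tight fit is that the weights alone leave only 1–3 GB free, and the KV cache claims more of that as the context grows. A quick sketch of how KV-cache size scales with context length; the layer and head counts below are illustrative round numbers for a 32B-class model, not taken from any specific model card:

```python
# KV-cache memory grows linearly with context length: each token stores
# a key and a value vector per layer per KV head.

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: int = 2) -> float:
    """KV-cache size in GiB; 2× covers keys and values, fp16 = 2 bytes/elem."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem
    return total_bytes / 2**30

# Hypothetical 32B-class model: 64 layers, 8 KV heads (GQA), head_dim 128.
print(kv_cache_gib(64, 8, 128, 8192))   # 2.0 GiB at 8K context
print(kv_cache_gib(64, 8, 128, 32768))  # 8.0 GiB at 32K context
```

With a 21 GB model on a 24 GB card, an 8K context still fits, but 32K clearly does not; that is the trade-off the "tight fit" label is flagging.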

Want to check a specific combination?

Use the compatibility checker to see exactly how a model runs on your specific hardware, with performance estimates.
