
Best AI Models for 48 GB VRAM

Professional-tier memory: 48 GB runs large 70B models at higher-quality quants. Here are all 86 models you can run locally with 48 GB of memory.

Hardware with 48 GB Memory

Runs Comfortably (79)

These models fit with room to spare for context window and OS overhead.

| Model | Params | Quantization | VRAM | Quality |
| --- | --- | --- | --- | --- |
| Dolphin Mixtral 8x7B | 47B | Q4_K_M | 26 GB | 4 |
| Mixtral 8x7B | 47B | Q5_K_M | 34.4 GB | 4 |
| Command R 35B | 35B | Q8_0 | 37 GB | 5 |
| Qwen 3.5 35B A3B | 35B | Q8_0 | 20 GB | 5 |
| Nous Hermes 2 34B | 34B | Q8_0 | 36 GB | 5 |
| Yi 1.5 34B | 34B | Q8_0 | 36 GB | 5 |
| WizardCoder 33B | 33B | Q8_0 | 38.5 GB | 5 |
| Aya Expanse 32B | 32B | Q8_0 | 38 GB | 5 |
| Cogito 32B | 32B | Q8_0 | 37 GB | 4 |
| DeepSeek R1 32B | 32B | Q8_0 | 34 GB | 5 |
| Qwen 2.5 32B | 32B | Q8_0 | 34 GB | 5 |
| Qwen 2.5 Coder 32B | 32B | Q8_0 | 39 GB | 5 |
| Qwen 3 32B | 32B | Q8_0 | 39 GB | 5 |
| QwQ 32B | 32B | Q8_0 | 37 GB | 4 |
| Gemma 4 31B | 31B | Q8_0 | 38 GB | 5 |
| Qwen 3 30B-A3B (MoE) | 30B | Q8_0 | 37 GB | 5 |
| Gemma 2 27B | 27B | Q8_0 | 29 GB | 5 |
| Gemma 3 27B | 27B | Q8_0 | 34 GB | 5 |
| Qwen 3.5 27B | 27B | Q8_0 | 33 GB | 5 |
| Gemma 4 26B | 26B | Q8_0 | 30 GB | 5 |
| Devstral 24B | 24B | Q8_0 | 29 GB | 5 |
| Magistral Small 24B | 24B | Q8_0 | 28 GB | 4 |
| Mistral Small 3.1 24B | 24B | Q8_0 | 30 GB | 5 |
| Codestral 22B | 22B | Q8_0 | 24 GB | 5 |
| StarCoder2 15B | 15B | Q8_0 | 17 GB | 5 |
| DeepSeek R1 14B | 14B | Q8_0 | 16 GB | 5 |
| Phi-4 14B | 14B | Q8_0 | 16 GB | 5 |
| Phi-4 Reasoning 14B | 14B | F16 | 32 GB | 5 |
| Qwen 2.5 14B | 14B | Q8_0 | 16 GB | 5 |
| Qwen 2.5 Coder 14B | 14B | F16 | 33 GB | 5 |
| Qwen 3 14B | 14B | F16 | 33 GB | 5 |
| Gemma 3 12B | 12B | F16 | 28 GB | 5 |
| Mistral Nemo 12B | 12B | F16 | 28 GB | 5 |
| Llama 3.2 Vision 11B | 11B | F16 | 26 GB | 5 |
| Falcon 3 10B | 10B | F16 | 24 GB | 5 |
| Gemma 2 9B | 9B | F16 | 20 GB | 5 |
| Qwen 3.5 9B | 9B | Q8_0 | 11.5 GB | 5 |
| Yi 1.5 9B | 9B | F16 | 20 GB | 5 |
| Yi Coder 9B | 9B | F16 | 21 GB | 5 |
| Aya Expanse 8B | 8B | Q8_0 | 10.5 GB | 5 |
| Cogito 8B | 8B | F16 | 19.5 GB | 5 |
| DeepSeek R1 8B | 8B | F16 | 20 GB | 5 |
| Dolphin 3 8B | 8B | F16 | 18 GB | 5 |
| Granite 3.3 8B | 8B | F16 | 18 GB | 5 |
| Llama 3.1 8B | 8B | F16 | 18 GB | 5 |
| Nemotron 3 Nano 8B | 8B | F16 | 19.5 GB | 5 |
| Nous Hermes 2 8B | 8B | F16 | 18 GB | 5 |
| Qwen 3 8B | 8B | F16 | 20 GB | 5 |
| Codestral Mamba 7B | 7B | F16 | 17 GB | 5 |
| DeepSeek R1 7B | 7B | F16 | 16 GB | 5 |
| Falcon 3 7B | 7B | F16 | 17.5 GB | 5 |
| InternLM 2.5 7B | 7B | F16 | 16 GB | 5 |
| Mistral 7B | 7B | F16 | 16 GB | 5 |
| OpenChat 3.5 7B | 7B | F16 | 17 GB | 5 |
| Qwen 2.5 7B | 7B | F16 | 16 GB | 5 |
| Qwen 2.5 Coder 7B | 7B | F16 | 16 GB | 5 |
| Qwen 2.5 VL 7B | 7B | F16 | 18.5 GB | 5 |
| StarCoder2 7B | 7B | F16 | 16 GB | 5 |
| WizardLM 2 7B | 7B | F16 | 17 GB | 5 |
| Yi 1.5 6B | 6B | F16 | 14 GB | 5 |
| Gemma 3 4B | 4B | F16 | 11.5 GB | 5 |
| Gemma 3n E4B | 4B | F16 | 11 GB | 5 |
| Gemma 4 E4B | 4B | Q8_0 | 10 GB | 5 |
| Qwen 3 4B | 4B | F16 | 11 GB | 5 |
| Qwen 3.5 4B | 4B | Q8_0 | 6.5 GB | 4 |
| Phi-3 Mini 3.8B | 3.8B | F16 | 9.6 GB | 5 |
| Phi-4 Mini 3.8B | 3.8B | F16 | 10.5 GB | 5 |
| Llama 3.2 3B | 3B | F16 | 8 GB | 5 |
| StarCoder2 3B | 3B | F16 | 8 GB | 5 |
| Gemma 2 2B | 2B | F16 | 6 GB | 5 |
| Gemma 3n E2B | 2B | F16 | 6.5 GB | 5 |
| Gemma 4 E2B | 2B | Q8_0 | 6 GB | 5 |
| Qwen 3.5 2B | 2B | Q8_0 | 4.5 GB | 4 |
| SmolLM2 1.7B | 1.7B | F16 | 4.4 GB | 5 |
| DeepSeek R1 1.5B | 1.5B | F16 | 5 GB | 5 |
| Gemma 3 1B | 1B | F16 | 3.5 GB | 5 |
| Llama 3.2 1B | 1B | F16 | 4 GB | 5 |
| Qwen 3.5 0.8B | 0.8B | Q8_0 | 2 GB | 4 |
| Qwen 3 0.6B | 0.6B | F16 | 3.3 GB | 5 |
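If a model you want isn't listed, you can ballpark its footprint yourself: weight memory is roughly parameter count times bits-per-weight, plus a fixed runtime allowance. A minimal sketch, assuming approximate bits-per-weight figures for llama.cpp-style quants and a flat 1.5 GB overhead (both assumptions, not exact for any specific model):

```python
# Rough VRAM estimate: weights + flat runtime overhead.
# Bits-per-weight values are assumed averages for llama.cpp-style
# quant formats; real files vary slightly per model.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "F16": 16.0}

def estimate_vram_gb(params_billions: float, quant: str,
                     overhead_gb: float = 1.5) -> float:
    """Weight memory in decimal GB plus a fixed overhead allowance."""
    weight_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return round(weight_gb + overhead_gb, 1)

print(estimate_vram_gb(32, "Q8_0"))    # ~35.5 GB, near the 34-39 GB 32B rows
print(estimate_vram_gb(70, "Q4_K_M"))  # ~43.5 GB, matching the 70B Q4_K_M rows
```

This ignores the KV cache, which is why the table's figures leave headroom below the full 48 GB.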

Tight Fit (7)

These models run, but only with a limited context window. Close other apps to free memory.

| Model | Params | Quantization | VRAM | Quality |
| --- | --- | --- | --- | --- |
| Qwen 2.5 72B | 72B | Q4_K_M | 44.7 GB | 4 |
| Qwen 2.5 VL 72B | 72B | Q4_K_M | 41 GB | 4 |
| Cogito 70B | 70B | Q4_K_M | 43 GB | 4 |
| DeepSeek R1 70B | 70B | Q4_K_M | 43.5 GB | 4 |
| Llama 3.1 70B | 70B | Q4_K_M | 43.5 GB | 4 |
| Llama 3.3 70B | 70B | Q4_K_M | 43.5 GB | 4 |
| InternLM 2.5 20B | 20B | F16 | 42 GB | 5 |
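The context limit comes from the KV cache, which grows linearly with the number of tokens held in context. A minimal sketch of the arithmetic, using assumed Llama-3.1-70B-like architecture values (80 layers, 8 KV heads via grouped-query attention, head dimension 128, fp16 cache) rather than figures from the table:

```python
# KV cache size grows linearly with context length:
# 2 (keys + values) * layers * kv_heads * head_dim * bytes per element.
# Architecture numbers are assumed Llama-3.1-70B-like values.
def kv_cache_gb(tokens: int, n_layers: int = 80, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens / 1e9

print(round(kv_cache_gb(8192), 2))  # ~2.68 GB for an 8K context
```

With ~43.5 GB of weights already resident on a 48 GB card, only a few GB remain for the cache, which is why these models cannot run at their full advertised context lengths here.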

Want to check a specific combination?

Use the compatibility checker to see exactly how a model runs on your hardware, with performance estimates.
