Best AI Models for 128 GB VRAM

128 GB sits at the top of the consumer range: enough for 200B+ models and research workloads. Here are all 94 models you can run locally with 128 GB of memory.

Runs Comfortably (92)

These models fit with room to spare for the context window and OS overhead.

| Model | Params | Quantization | VRAM | Quality |
|---|---|---|---|---|
| Mixtral 8x22B | 141B | Q4_K_M | 86 GB | 4 |
| Devstral 2 123B | 123B | Q4_K_M | 67 GB | 4 |
| Mistral Large 2 123B | 123B | Q4_K_M | 67 GB | 4 |
| Qwen 3.5 122B | 122B | Q4_K_M | 85 GB | 4 |
| Llama 4 Scout (109B/17B active) | 109B | Q4_K_M | 72 GB | 4 |
| Llama 3.2 Vision 90B | 90B | Q8_0 | 96 GB | 4 |
| Qwen 2.5 72B | 72B | Q8_0 | 74 GB | 5 |
| Qwen 2.5 VL 72B | 72B | Q8_0 | 78 GB | 4 |
| Cogito 70B | 70B | Q8_0 | 76 GB | 4 |
| DeepSeek R1 70B | 70B | Q8_0 | 72 GB | 5 |
| Llama 3.1 70B | 70B | Q8_0 | 72 GB | 5 |
| Llama 3.3 70B | 70B | Q8_0 | 72 GB | 5 |
| Dolphin Mixtral 8x7B | 47B | F16 | 94 GB | 5 |
| Mixtral 8x7B | 47B | Q8_0 | 49 GB | 5 |
| Command R 35B | 35B | Q8_0 | 37 GB | 5 |
| Qwen 3.5 35B A3B | 35B | Q8_0 | 20 GB | 5 |
| Nous Hermes 2 34B | 34B | F16 | 70 GB | 5 |
| Yi 1.5 34B | 34B | F16 | 70 GB | 5 |
| WizardCoder 33B | 33B | F16 | 69.5 GB | 5 |
| Aya Expanse 32B | 32B | Q8_0 | 38 GB | 5 |
| Cogito 32B | 32B | F16 | 68 GB | 5 |
| DeepSeek R1 32B | 32B | Q8_0 | 34 GB | 5 |
| Qwen 2.5 32B | 32B | Q8_0 | 34 GB | 5 |
| Qwen 2.5 Coder 32B | 32B | F16 | 70 GB | 5 |
| Qwen 3 32B | 32B | F16 | 70 GB | 5 |
| QwQ 32B | 32B | F16 | 68 GB | 5 |
| Gemma 4 31B | 31B | F16 | 66 GB | 5 |
| Qwen 3 30B-A3B (MoE) | 30B | F16 | 67 GB | 5 |
| Gemma 2 27B | 27B | Q8_0 | 29 GB | 5 |
| Gemma 3 27B | 27B | F16 | 60 GB | 5 |
| Qwen 3.5 27B | 27B | Q8_0 | 33 GB | 5 |
| Gemma 4 26B | 26B | Q8_0 | 30 GB | 5 |
| Devstral 24B | 24B | F16 | 53 GB | 5 |
| Magistral Small 24B | 24B | F16 | 52 GB | 5 |
| Mistral Small 3.1 24B | 24B | F16 | 54 GB | 5 |
| Codestral 22B | 22B | Q8_0 | 24 GB | 5 |
| InternLM 2.5 20B | 20B | F16 | 42 GB | 5 |
| StarCoder2 15B | 15B | Q8_0 | 17 GB | 5 |
| DeepSeek R1 14B | 14B | Q8_0 | 16 GB | 5 |
| Phi-4 14B | 14B | Q8_0 | 16 GB | 5 |
| Phi-4 Reasoning 14B | 14B | F16 | 32 GB | 5 |
| Qwen 2.5 14B | 14B | Q8_0 | 16 GB | 5 |
| Qwen 2.5 Coder 14B | 14B | F16 | 33 GB | 5 |
| Qwen 3 14B | 14B | F16 | 33 GB | 5 |
| Gemma 3 12B | 12B | F16 | 28 GB | 5 |
| Mistral Nemo 12B | 12B | F16 | 28 GB | 5 |
| Llama 3.2 Vision 11B | 11B | F16 | 26 GB | 5 |
| Falcon 3 10B | 10B | F16 | 24 GB | 5 |
| Gemma 2 9B | 9B | F16 | 20 GB | 5 |
| Qwen 3.5 9B | 9B | Q8_0 | 11.5 GB | 5 |
| Yi 1.5 9B | 9B | F16 | 20 GB | 5 |
| Yi Coder 9B | 9B | F16 | 21 GB | 5 |
| Aya Expanse 8B | 8B | Q8_0 | 10.5 GB | 5 |
| Cogito 8B | 8B | F16 | 19.5 GB | 5 |
| DeepSeek R1 8B | 8B | F16 | 20 GB | 5 |
| Dolphin 3 8B | 8B | F16 | 18 GB | 5 |
| Granite 3.3 8B | 8B | F16 | 18 GB | 5 |
| Llama 3.1 8B | 8B | F16 | 18 GB | 5 |
| Nemotron 3 Nano 8B | 8B | F16 | 19.5 GB | 5 |
| Nous Hermes 2 8B | 8B | F16 | 18 GB | 5 |
| Qwen 3 8B | 8B | F16 | 20 GB | 5 |
| Codestral Mamba 7B | 7B | F16 | 17 GB | 5 |
| DeepSeek R1 7B | 7B | F16 | 16 GB | 5 |
| Falcon 3 7B | 7B | F16 | 17.5 GB | 5 |
| InternLM 2.5 7B | 7B | F16 | 16 GB | 5 |
| Mistral 7B | 7B | F16 | 16 GB | 5 |
| OpenChat 3.5 7B | 7B | F16 | 17 GB | 5 |
| Qwen 2.5 7B | 7B | F16 | 16 GB | 5 |
| Qwen 2.5 Coder 7B | 7B | F16 | 16 GB | 5 |
| Qwen 2.5 VL 7B | 7B | F16 | 18.5 GB | 5 |
| StarCoder2 7B | 7B | F16 | 16 GB | 5 |
| WizardLM 2 7B | 7B | F16 | 17 GB | 5 |
| Yi 1.5 6B | 6B | F16 | 14 GB | 5 |
| Gemma 3 4B | 4B | F16 | 11.5 GB | 5 |
| Gemma 3n E4B | 4B | F16 | 11 GB | 5 |
| Gemma 4 E4B | 4B | Q8_0 | 10 GB | 5 |
| Qwen 3 4B | 4B | F16 | 11 GB | 5 |
| Qwen 3.5 4B | 4B | Q8_0 | 6.5 GB | 4 |
| Phi-3 Mini 3.8B | 3.8B | F16 | 9.6 GB | 5 |
| Phi-4 Mini 3.8B | 3.8B | F16 | 10.5 GB | 5 |
| Llama 3.2 3B | 3B | F16 | 8 GB | 5 |
| StarCoder2 3B | 3B | F16 | 8 GB | 5 |
| Gemma 2 2B | 2B | F16 | 6 GB | 5 |
| Gemma 3n E2B | 2B | F16 | 6.5 GB | 5 |
| Gemma 4 E2B | 2B | Q8_0 | 6 GB | 5 |
| Qwen 3.5 2B | 2B | Q8_0 | 4.5 GB | 4 |
| SmolLM2 1.7B | 1.7B | F16 | 4.4 GB | 5 |
| DeepSeek R1 1.5B | 1.5B | F16 | 5 GB | 5 |
| Gemma 3 1B | 1B | F16 | 3.5 GB | 5 |
| Llama 3.2 1B | 1B | F16 | 4 GB | 5 |
| Qwen 3.5 0.8B | 0.8B | Q8_0 | 2 GB | 4 |
| Qwen 3 0.6B | 0.6B | F16 | 3.3 GB | 5 |
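If you want a quick sanity check on figures like those above, a rough estimate is parameters times bytes per weight, plus a small allowance for runtime overhead. A minimal sketch, assuming approximate bytes-per-parameter averages for each quantization and a hypothetical flat 2 GB overhead (the table's own numbers come from the site's calculator and will differ slightly):

```python
# Rough VRAM estimate: weight size + a flat allowance for runtime overhead.
# Bytes-per-parameter values are approximate averages, including quantization scales.
BYTES_PER_PARAM = {
    "F16": 2.0,      # 16-bit floats
    "Q8_0": 1.06,    # ~8.5 bits per weight
    "Q4_K_M": 0.60,  # ~4.8 bits per weight
}

def estimate_vram_gb(params_billion: float, quant: str, overhead_gb: float = 2.0) -> float:
    """Approximate memory needed to load a model, in GB."""
    weights_gb = params_billion * BYTES_PER_PARAM[quant]
    return round(weights_gb + overhead_gb, 1)

print(estimate_vram_gb(7, "F16"))     # 16.0 GB, matching the Mistral 7B row
print(estimate_vram_gb(70, "Q8_0"))   # ~76 GB, in the ballpark of the 70B Q8_0 rows
```

Estimates like this ignore the KV cache, which grows with context length, so treat them as a floor rather than a ceiling.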

Tight Fit (2)

These models run, but with a limited context window. Close other apps to free memory.

| Model | Params | Quantization | VRAM | Quality |
|---|---|---|---|---|
| Command A 111B | 111B | Q8_0 | 117 GB | 5 |
| Command R+ 104B | 104B | Q8_0 | 110 GB | 5 |

Want to check a specific combination?

Use the compatibility checker to see exactly how a model runs on your hardware, with performance estimates.
