Best AI Models for 12 GB VRAM
12 GB of VRAM is the minimum for a comfortable local-AI experience: enough to run 8B models at Q8_0, or 14B models at Q4_K_M. Here are all 53 models you can run locally with 12 GB of memory.
Runs Comfortably (38)
These models fit with room to spare for the context window and OS overhead.
| Model | Params | Quantization | VRAM | Quality (/5) |
|---|---|---|---|---|
| Mistral Nemo 12B | 12B | Q4_K_M | 9.5 GB | 4 |
| Llama 3.2 Vision 11B | 11B | Q4_K_M | 8.5 GB | 4 |
| Falcon 3 10B | 10B | Q4_K_M | 8.5 GB | 4 |
| Qwen 3.5 9B | 9B | Q4_K_M | 7.5 GB | 4 |
| Yi Coder 9B | 9B | Q4_K_M | 8 GB | 4 |
| DeepSeek R1 8B | 8B | Q4_K_M | 7.5 GB | 4 |
| Dolphin 3 8B | 8B | Q8_0 | 10 GB | 5 |
| Granite 3.3 8B | 8B | Q8_0 | 10 GB | 4 |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | 4 |
| Nous Hermes 2 8B | 8B | Q8_0 | 10 GB | 5 |
| Qwen 3 8B | 8B | Q4_K_M | 7.5 GB | 4 |
| Codestral Mamba 7B | 7B | Q8_0 | 9.9 GB | 5 |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | 4 |
| Falcon 3 7B | 7B | Q8_0 | 10 GB | 4 |
| InternLM 2.5 7B | 7B | Q8_0 | 9 GB | 5 |
| Mistral 7B | 7B | Q8_0 | 9 GB | 4 |
| OpenChat 3.5 7B | 7B | Q8_0 | 9.9 GB | 5 |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | 4 |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | 4 |
| StarCoder2 7B | 7B | Q8_0 | 9 GB | 5 |
| WizardLM 2 7B | 7B | Q8_0 | 9.9 GB | 5 |
| Yi 1.5 6B | 6B | Q8_0 | 8 GB | 5 |
| Gemma 3 4B | 4B | Q8_0 | 7.5 GB | 4 |
| Gemma 4 E4B | 4B | Q8_0 | 10 GB | 5 |
| Qwen 3.5 4B | 4B | Q8_0 | 6.5 GB | 4 |
| Phi-3 Mini 3.8B | 3.8B | F16 | 9.6 GB | 5 |
| Llama 3.2 3B | 3B | F16 | 8 GB | 5 |
| StarCoder2 3B | 3B | F16 | 8 GB | 5 |
| Gemma 2 2B | 2B | F16 | 6 GB | 5 |
| Gemma 3n E2B | 2B | F16 | 6.5 GB | 5 |
| Gemma 4 E2B | 2B | Q8_0 | 6 GB | 5 |
| Qwen 3.5 2B | 2B | Q8_0 | 4.5 GB | 4 |
| SmolLM2 1.7B | 1.7B | F16 | 4.4 GB | 5 |
| DeepSeek R1 1.5B | 1.5B | F16 | 5 GB | 5 |
| Gemma 3 1B | 1B | F16 | 3.5 GB | 5 |
| Llama 3.2 1B | 1B | F16 | 4 GB | 5 |
| Qwen 3.5 0.8B | 0.8B | Q8_0 | 2 GB | 4 |
| Qwen 3 0.6B | 0.6B | F16 | 3.3 GB | 5 |
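The VRAM figures above follow a simple rule of thumb: weight memory is parameter count times bytes per weight, plus a flat allowance for runtime buffers. The sketch below is an approximation, not how any particular runtime computes its footprint; the effective bits-per-weight values and the 1.5 GB overhead are assumptions.

```python
# Rough VRAM estimator for the figures in the table above.
# Assumed effective bits per weight: ~4.85 for Q4_K_M and ~8.5 for Q8_0
# (GGUF quants carry some metadata overhead), 16 for F16.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}

def estimate_vram_gb(params_billion: float, quant: str,
                     overhead_gb: float = 1.5) -> float:
    """Approximate VRAM (GB) to load a model at a given quantization."""
    weights_gb = params_billion * BITS_PER_WEIGHT[quant] / 8  # bytes per weight
    return round(weights_gb + overhead_gb, 1)

print(estimate_vram_gb(8, "Q8_0"))     # 8B at Q8_0 -> 10.0 GB
print(estimate_vram_gb(14, "Q4_K_M"))  # 14B at Q4_K_M -> 10.0 GB
```

The estimates land within roughly a gigabyte of the table values; real usage varies with context length and runtime.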
Tight Fit (15)
These models run, but only with a limited context window. Close other apps to free memory before loading them.
| Model | Params | Quantization | VRAM | Quality (/5) |
|---|---|---|---|---|
| StarCoder2 15B | 15B | Q4_K_M | 10.5 GB | 3 |
| DeepSeek R1 14B | 14B | Q5_K_M | 11.3 GB | 4 |
| Phi-4 14B | 14B | Q5_K_M | 11.3 GB | 4 |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 11 GB | 4 |
| Qwen 2.5 14B | 14B | Q5_K_M | 11.3 GB | 4 |
| Gemma 3 12B | 12B | Q4_K_M | 10.5 GB | 4 |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | 4 |
| Yi 1.5 9B | 9B | Q8_0 | 11 GB | 5 |
| Aya Expanse 8B | 8B | Q8_0 | 10.5 GB | 5 |
| Cogito 8B | 8B | Q8_0 | 11 GB | 4 |
| Nemotron 3 Nano 8B | 8B | Q8_0 | 11 GB | 4 |
| Qwen 2.5 VL 7B | 7B | Q8_0 | 10.5 GB | 4 |
| Gemma 3n E4B | 4B | F16 | 11 GB | 5 |
| Qwen 3 4B | 4B | F16 | 11 GB | 5 |
| Phi-4 Mini 3.8B | 3.8B | F16 | 10.5 GB | 5 |
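The context-window squeeze comes from the KV cache, which grows linearly with context length. The sketch below uses Llama 3.1 8B's published architecture (32 layers, 8 KV heads via GQA, head dimension 128) and assumes an fp16 cache; many runtimes can quantize the cache to shrink this further.

```python
def kv_cache_bytes(ctx: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache at a given context length (K and V, all layers)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return ctx * per_token

# Llama 3.1 8B: 32 layers, 8 KV heads (GQA), head dim 128, fp16 cache
print(kv_cache_bytes(8192, 32, 8, 128) / 2**30)  # -> 1.0 GiB at 8K context
```

At 8K context the cache alone costs about a gigabyte on this architecture, which is exactly the headroom the tight-fit models lack.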
Want to check a specific combination?
Use the compatibility checker to see exactly how a model runs on your hardware, with performance estimates.
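In the same spirit, the two-tier split used in this article can be sketched as a tiny classifier. The 85% comfort threshold is an illustrative assumption (it happens to separate the two tables above at 12 GB), not the checker's actual rule.

```python
def fit_category(model_vram_gb: float, total_vram_gb: float = 12.0,
                 comfort_headroom: float = 0.85) -> str:
    """Classify how a model fits, mirroring the two tables above.

    comfort_headroom is an assumed threshold: below 85% of total VRAM,
    there is room left for context and OS overhead.
    """
    if model_vram_gb <= total_vram_gb * comfort_headroom:
        return "runs comfortably"
    if model_vram_gb <= total_vram_gb:
        return "tight fit"
    return "needs CPU offload or a smaller quant"

print(fit_category(9.5))   # Mistral Nemo 12B at Q4_K_M
print(fit_category(11.3))  # Qwen 2.5 14B at Q5_K_M
```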