# Best AI Models for 16 GB VRAM
16 GB is the sweet spot: 14B models at Q4–Q5 quantization, 7B–9B models at Q8, and small models at full precision. Here are all 58 models you can run locally with 16 GB of memory.
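As a rule of thumb, a quantized model's footprint is roughly its parameter count times bits per weight, plus runtime overhead; actual usage also grows with context length. A minimal sketch of that arithmetic — the bits-per-weight figures and the 1.5 GB overhead are assumptions for illustration, not measurements:

```python
# Rough VRAM estimate: weights + fixed runtime overhead.
# Bits-per-weight values are approximations (K-quants keep some
# tensors at higher precision), and the overhead is an assumption.

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def estimate_vram_gb(params_billions: float, quant: str,
                     overhead_gb: float = 1.5) -> float:
    """Approximate VRAM in GB for model weights plus runtime overhead."""
    weights_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return round(weights_gb + overhead_gb, 1)

# A 14B model at Q5_K_M lands near the table's ~11 GB figure:
print(estimate_vram_gb(14, "Q5_K_M"))  # ≈ 11.5
# ...while 14B at Q8 overshoots 16 GB, which is why the tables
# reserve Q8 for the 7B–10B class:
print(estimate_vram_gb(14, "Q8_0"))    # ≈ 16.4
```

This also explains the pattern in the tables below: each step down in model size buys a step up in precision at the same memory budget.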
## Hardware with 16 GB Memory
- **AMD:** Radeon RX 6800 XT, RX 6900 XT, RX 7800 XT, RX 9060 XT 16GB, RX 9070, RX 9070 XT
- **Intel:** Arc A770
- **NVIDIA:** GeForce RTX 4060 Ti 16GB, RTX 4070 Ti Super, RTX 4080, RTX 4080 Super, RTX 5060 Ti 16GB, RTX 5070 Ti, RTX 5080, RTX A4000
- **Apple (16GB unified memory):** iMac M1, iMac M4, Mac mini M1, Mac mini M4, MacBook Air M2, MacBook Air M3, MacBook Air M4, MacBook Air M5, MacBook Pro M1, MacBook Pro M2 Pro, MacBook Pro M5
## Runs Comfortably (55)
These models fit with room to spare for the context window and OS overhead.
| Model | Params | Quantization | VRAM | Quality (1–5) |
|---|---|---|---|---|
| Qwen 3.5 35B A3B | 35B | Q4_K_M | 12 GB | 4 |
| InternLM 2.5 20B | 20B | Q4_K_M | 12 GB | 4 |
| StarCoder2 15B | 15B | Q5_K_M | 12 GB | 4 |
| DeepSeek R1 14B | 14B | Q5_K_M | 11.3 GB | 4 |
| Phi-4 14B | 14B | Q5_K_M | 11.3 GB | 4 |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 11 GB | 4 |
| Qwen 2.5 14B | 14B | Q5_K_M | 11.3 GB | 4 |
| Qwen 2.5 Coder 14B | 14B | Q4_K_M | 12 GB | 4 |
| Qwen 3 14B | 14B | Q4_K_M | 12 GB | 4 |
| Gemma 3 12B | 12B | Q4_K_M | 10.5 GB | 4 |
| Mistral Nemo 12B | 12B | Q4_K_M | 9.5 GB | 4 |
| Falcon 3 10B | 10B | Q8_0 | 13 GB | 4 |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | 4 |
| Qwen 3.5 9B | 9B | Q8_0 | 11.5 GB | 5 |
| Yi 1.5 9B | 9B | Q8_0 | 11 GB | 5 |
| Yi Coder 9B | 9B | Q8_0 | 12 GB | 5 |
| Aya Expanse 8B | 8B | Q8_0 | 10.5 GB | 5 |
| Cogito 8B | 8B | Q8_0 | 11 GB | 4 |
| DeepSeek R1 8B | 8B | Q8_0 | 11.5 GB | 4 |
| Dolphin 3 8B | 8B | Q8_0 | 10 GB | 5 |
| Granite 3.3 8B | 8B | Q8_0 | 10 GB | 4 |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | 4 |
| Nemotron 3 Nano 8B | 8B | Q8_0 | 11 GB | 4 |
| Nous Hermes 2 8B | 8B | Q8_0 | 10 GB | 5 |
| Qwen 3 8B | 8B | Q8_0 | 11.5 GB | 4 |
| Codestral Mamba 7B | 7B | Q8_0 | 9.9 GB | 5 |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | 4 |
| Falcon 3 7B | 7B | Q8_0 | 10 GB | 4 |
| InternLM 2.5 7B | 7B | Q8_0 | 9 GB | 5 |
| Mistral 7B | 7B | Q8_0 | 9 GB | 4 |
| OpenChat 3.5 7B | 7B | Q8_0 | 9.9 GB | 5 |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | 4 |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | 4 |
| Qwen 2.5 VL 7B | 7B | Q8_0 | 10.5 GB | 4 |
| StarCoder2 7B | 7B | Q8_0 | 9 GB | 5 |
| WizardLM 2 7B | 7B | Q8_0 | 9.9 GB | 5 |
| Gemma 3 4B | 4B | F16 | 11.5 GB | 5 |
| Gemma 3n E4B | 4B | F16 | 11 GB | 5 |
| Gemma 4 E4B | 4B | Q8_0 | 10 GB | 5 |
| Qwen 3 4B | 4B | F16 | 11 GB | 5 |
| Qwen 3.5 4B | 4B | Q8_0 | 6.5 GB | 4 |
| Phi-3 Mini 3.8B | 3.8B | F16 | 9.6 GB | 5 |
| Phi-4 Mini 3.8B | 3.8B | F16 | 10.5 GB | 5 |
| Llama 3.2 3B | 3B | F16 | 8 GB | 5 |
| StarCoder2 3B | 3B | F16 | 8 GB | 5 |
| Gemma 2 2B | 2B | F16 | 6 GB | 5 |
| Gemma 3n E2B | 2B | F16 | 6.5 GB | 5 |
| Gemma 4 E2B | 2B | Q8_0 | 6 GB | 5 |
| Qwen 3.5 2B | 2B | Q8_0 | 4.5 GB | 4 |
| SmolLM2 1.7B | 1.7B | F16 | 4.4 GB | 5 |
| DeepSeek R1 1.5B | 1.5B | F16 | 5 GB | 5 |
| Gemma 3 1B | 1B | F16 | 3.5 GB | 5 |
| Llama 3.2 1B | 1B | F16 | 4 GB | 5 |
| Qwen 3.5 0.8B | 0.8B | Q8_0 | 2 GB | 4 |
| Qwen 3 0.6B | 0.6B | F16 | 3.3 GB | 5 |
## Tight Fit (3)
These models run, but only with a limited context window. Close other apps to free memory.
| Model | Params | Quantization | VRAM | Quality (1–5) |
|---|---|---|---|---|
| Codestral 22B | 22B | Q4_K_M | 14.7 GB | 4 |
| Llama 3.2 Vision 11B | 11B | Q8_0 | 14 GB | 4 |
| Yi 1.5 6B | 6B | F16 | 14 GB | 5 |
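The reason a tight fit costs context is the KV cache, which grows linearly with context length: roughly 2 (keys and values) × layers × KV heads × head dimension × context length × bytes per value. A sketch with an illustrative configuration — the layer and head counts below are assumptions, not any model's published specs:

```python
# Rough KV-cache size in GB. The formula is the standard
# per-token cache for a transformer; the example config is
# hypothetical, not taken from a real model card.

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bytes_per_val: int = 2) -> float:
    """Approximate KV-cache memory for a given context length."""
    return (2 * layers * kv_heads * head_dim
            * context_len * bytes_per_val) / 1024**3

# A hypothetical 22B-class config: halving context halves the cache.
print(round(kv_cache_gb(56, 8, 128, 8192), 2))  # ≈ 1.75
print(round(kv_cache_gb(56, 8, 128, 2048), 2))  # ≈ 0.44
```

With ~14–15 GB already spent on weights, trimming the context from 8K to 2K tokens is often what makes these models fit at all.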
## Want to check a specific combination?
Use the compatibility checker to see how a model runs on your exact hardware, with performance estimates.