# Best AI Models for 8 GB VRAM
Entry-level local AI: 7B–8B models at Q4 quantization. Here are all 43 models you can run locally with 8 GB of memory.
## Hardware with 8 GB Memory
- AMD Radeon RX 6600 XT
- AMD Radeon RX 7600
- AMD Radeon RX 9060 XT 8GB
- Intel Arc A750
- NVIDIA GeForce GTX 1070
- NVIDIA GeForce RTX 2060 Super
- NVIDIA GeForce RTX 2070 Super
- NVIDIA GeForce RTX 2080 Super
- NVIDIA GeForce RTX 3050
- NVIDIA GeForce RTX 3060 Ti
- NVIDIA GeForce RTX 3070
- NVIDIA GeForce RTX 4060 Ti 8GB
- NVIDIA GeForce RTX 4060
- NVIDIA GeForce RTX 5050
- NVIDIA GeForce RTX 5060 Ti 8GB
- NVIDIA GeForce RTX 5060
- MacBook Air M1 8GB
- MacBook Air M2 8GB
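A quick way to sanity-check the VRAM figures in the tables below: weight memory is roughly parameter count times bits per weight, plus headroom for the KV cache and runtime buffers. Here is a minimal sketch; the bits-per-weight and overhead constants are rough approximations I've assumed for illustration, not values taken from this article:

```python
# Rough VRAM estimate: quantized weights plus a flat allowance for
# KV cache and runtime buffers. Bits-per-weight values are approximate.
BPW = {"Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}

def estimate_vram_gb(params_b: float, quant: str, overhead_gb: float = 1.5) -> float:
    """params_b: parameter count in billions of weights."""
    weights_gb = params_b * 1e9 * BPW[quant] / 8 / 1e9
    return round(weights_gb + overhead_gb, 1)

print(estimate_vram_gb(8, "Q4_K_M"))  # close to the table's 6.3 GB for Llama 3.1 8B
print(estimate_vram_gb(3, "Q8_0"))    # in the ballpark of the 3B Q8_0 entries
```

This is why 7B–8B at Q4 is the sweet spot for 8 GB cards: the weights land around 5 GB, leaving a few gigabytes for context.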
## Runs Comfortably (32)
These models fit with room to spare for context window and OS overhead.
| Model | Params | Quantization | VRAM | Quality (of 5) |
|---|---|---|---|---|
| Yi 1.5 9B | 9B | Q4_K_M | 6.5 GB | 4 |
| Aya Expanse 8B | 8B | Q4_K_M | 6.5 GB | 4 |
| Dolphin 3 8B | 8B | Q4_K_M | 6 GB | 4 |
| Granite 3.3 8B | 8B | Q4_K_M | 6 GB | 3 |
| Llama 3.1 8B | 8B | Q4_K_M | 6.3 GB | 3 |
| Nous Hermes 2 8B | 8B | Q4_K_M | 6 GB | 4 |
| DeepSeek R1 7B | 7B | Q4_K_M | 5.7 GB | 3 |
| Falcon 3 7B | 7B | Q4_K_M | 6.8 GB | 4 |
| InternLM 2.5 7B | 7B | Q4_K_M | 5.5 GB | 4 |
| Mistral 7B | 7B | Q4_K_M | 5.7 GB | 3 |
| Qwen 2.5 7B | 7B | Q4_K_M | 5.7 GB | 3 |
| Qwen 2.5 Coder 7B | 7B | Q4_K_M | 5.7 GB | 3 |
| StarCoder2 7B | 7B | Q4_K_M | 5.5 GB | 4 |
| Yi 1.5 6B | 6B | Q4_K_M | 5 GB | 4 |
| Gemma 3n E4B | 4B | Q8_0 | 6.5 GB | 4 |
| Gemma 4 E4B | 4B | Q4_K_M | 6 GB | 4 |
| Qwen 3 4B | 4B | Q8_0 | 6.5 GB | 4 |
| Qwen 3.5 4B | 4B | Q8_0 | 6.5 GB | 4 |
| Phi-3 Mini 3.8B | 3.8B | Q8_0 | 5.8 GB | 4 |
| Phi-4 Mini 3.8B | 3.8B | Q8_0 | 6.5 GB | 4 |
| Llama 3.2 3B | 3B | Q8_0 | 5 GB | 4 |
| StarCoder2 3B | 3B | Q8_0 | 5 GB | 5 |
| Gemma 2 2B | 2B | F16 | 6 GB | 5 |
| Gemma 3n E2B | 2B | F16 | 6.5 GB | 5 |
| Gemma 4 E2B | 2B | Q8_0 | 6 GB | 5 |
| Qwen 3.5 2B | 2B | Q8_0 | 4.5 GB | 4 |
| SmolLM2 1.7B | 1.7B | F16 | 4.4 GB | 5 |
| DeepSeek R1 1.5B | 1.5B | F16 | 5 GB | 5 |
| Gemma 3 1B | 1B | F16 | 3.5 GB | 5 |
| Llama 3.2 1B | 1B | F16 | 4 GB | 5 |
| Qwen 3.5 0.8B | 0.8B | Q8_0 | 2 GB | 4 |
| Qwen 3 0.6B | 0.6B | F16 | 3.3 GB | 5 |
## Tight Fit (11)
These models run, but only with a reduced context window. Close other apps to free memory.
| Model | Params | Quantization | VRAM | Quality (of 5) |
|---|---|---|---|---|
| Gemma 2 9B | 9B | Q4_K_M | 6.9 GB | 3 |
| Qwen 3.5 9B | 9B | Q4_K_M | 7.5 GB | 4 |
| Cogito 8B | 8B | Q4_K_M | 7.5 GB | 4 |
| DeepSeek R1 8B | 8B | Q4_K_M | 7.5 GB | 4 |
| Nemotron 3 Nano 8B | 8B | Q4_K_M | 7.5 GB | 4 |
| Qwen 3 8B | 8B | Q4_K_M | 7.5 GB | 4 |
| Codestral Mamba 7B | 7B | Q4_K_M | 6.9 GB | 4 |
| OpenChat 3.5 7B | 7B | Q4_K_M | 6.9 GB | 4 |
| Qwen 2.5 VL 7B | 7B | Q4_K_M | 7 GB | 4 |
| WizardLM 2 7B | 7B | Q4_K_M | 6.9 GB | 4 |
| Gemma 3 4B | 4B | Q8_0 | 7.5 GB | 4 |
## Want to check a specific combination?
Use the compatibility checker to see exactly how a given model runs on your hardware, with performance estimates.