NVIDIA GeForce RTX 3080 12GB
NVIDIA · 12GB GDDR6X · Can run 40 models
| Spec | Value |
|---|---|
| Manufacturer | NVIDIA |
| VRAM | 12 GB |
| Memory Type | GDDR6X |
| Architecture | Ampere |
| CUDA Cores | 8,960 |
| Tensor Cores | 280 |
| Bandwidth | 912 GB/s |
| TDP | 350W |
| MSRP | $799 |
| Released | Jan 11, 2022 |
AI Notes
The RTX 3080 12GB is a decent option for running local AI models. With 12GB of GDDR6X VRAM, it can handle 7B models at 8-bit quantization and 13–14B models at 4-bit quantization (full-precision 7B weights alone need roughly 14 GB, so they do not fit). Its older Ampere architecture is slower per core than Ada Lovelace but still delivers solid inference performance.
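The "VRAM Used" figures below follow roughly from parameter count times bits per weight, plus runtime overhead. A minimal sketch of that rule of thumb (the effective bits-per-weight values and the flat 1.5 GB overhead are assumptions; real usage varies with context length and backend):

```python
def estimate_vram_gb(params_b, bits_per_weight, overhead_gb=1.5):
    """Rough VRAM estimate in GB for a quantized model.

    params_b: parameter count in billions
    bits_per_weight: effective bits per weight for the quant
                     (~8.5 for Q8_0, ~4.85 for Q4_K_M -- assumed values)
    overhead_gb: flat allowance for KV cache and runtime buffers (assumed)
    """
    weight_gb = params_b * bits_per_weight / 8
    return weight_gb + overhead_gb

# A 7B model at Q8_0 lands near the table's 9 GB figure:
print(round(estimate_vram_gb(7, 8.5), 1))   # → 8.9
```

Anything the estimate puts above ~11 GB leaves no headroom on a 12 GB card, which is where the "Runs (tight)" and "CPU Offload" rows begin.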
Compatible Models
| Model | Parameters | Best Quant | VRAM Used | Fit | Est. Speed |
|---|---|---|---|---|---|
| Qwen 3 0.6B | 600M | Q4_K_M | 2.5 GB | Runs | ~365 tok/s |
| Gemma 3 1B | 1B | Q8_0 | 2 GB | Runs | ~456 tok/s |
| Llama 3.2 1B | 1B | Q8_0 | 3 GB | Runs | ~304 tok/s |
| DeepSeek R1 1.5B | 1.5B | Q8_0 | 3 GB | Runs | ~304 tok/s |
| Gemma 2 2B | 2B | Q8_0 | 4 GB | Runs | ~228 tok/s |
| Gemma 3n E2B | 2B | Q4_K_M | 3.3 GB | Runs | ~276 tok/s |
| Llama 3.2 3B | 3B | Q8_0 | 5 GB | Runs | ~182 tok/s |
| Phi-3 Mini 3.8B | 3.8B | Q8_0 | 5.8 GB | Runs | ~157 tok/s |
| Phi-4 Mini 3.8B | 3.8B | Q4_K_M | 4.5 GB | Runs | ~203 tok/s |
| Gemma 3 4B | 4B | Q4_K_M | 5 GB | Runs | ~182 tok/s |
| Gemma 3n E4B | 4B | Q4_K_M | 4.5 GB | Runs | ~203 tok/s |
| Qwen 3 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~203 tok/s |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | Runs | ~101 tok/s |
| Falcon 3 7B | 7B | Q4_K_M | 6.8 GB | Runs | ~134 tok/s |
| Mistral 7B | 7B | Q8_0 | 9 GB | Runs | ~101 tok/s |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | Runs | ~101 tok/s |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | Runs | ~101 tok/s |
| Qwen 2.5 VL 7B | 7B | Q4_K_M | 7 GB | Runs | ~130 tok/s |
| Cogito 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~122 tok/s |
| DeepSeek R1 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~122 tok/s |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | Runs | ~91 tok/s |
| Nemotron 3 Nano 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~122 tok/s |
| Qwen 3 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~122 tok/s |
| Falcon 3 10B | 10B | Q4_K_M | 8.5 GB | Runs | ~107 tok/s |
| Llama 3.2 Vision 11B | 11B | Q4_K_M | 8.5 GB | Runs | ~107 tok/s |
| Mistral Nemo 12B | 12B | Q4_K_M | 9.5 GB | Runs | ~96 tok/s |
| DeepSeek R1 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~92 tok/s |
| Phi-4 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~92 tok/s |
| Qwen 2.5 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~92 tok/s |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | Runs (tight) | ~83 tok/s |
| Gemma 3 12B | 12B | Q4_K_M | 10.5 GB | Runs (tight) | ~87 tok/s |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 11 GB | Runs (tight) | ~83 tok/s |
| Qwen 2.5 Coder 14B | 14B | Q4_K_M | 12 GB | CPU Offload | ~23 tok/s |
| Qwen 3 14B | 14B | Q4_K_M | 12 GB | CPU Offload | ~23 tok/s |
| StarCoder2 15B | 15B | Q8_0 | 17 GB | CPU Offload | ~16 tok/s |
| Codestral 22B | 22B | Q4_K_M | 14.7 GB | CPU Offload | ~19 tok/s |
| Devstral 24B | 24B | Q4_K_M | 17 GB | CPU Offload | ~16 tok/s |
| Magistral Small 24B | 24B | Q4_K_M | 17 GB | CPU Offload | ~16 tok/s |
| Mistral Small 3.1 24B | 24B | Q4_K_M | 18 GB | CPU Offload | ~15 tok/s |
| Gemma 2 27B | 27B | Q4_K_M | 17.7 GB | CPU Offload | ~16 tok/s |
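The "CPU Offload" rows can still run by splitting transformer layers between GPU and CPU, e.g. via llama.cpp's `-ngl`/`--n-gpu-layers` flag. A sketch of how many layers fit on a 12 GB card, assuming weights split evenly across layers (the 56-layer count and 1 GB reserve are illustrative assumptions):

```python
def gpu_layers_that_fit(model_gb, n_layers, vram_gb=12.0, reserve_gb=1.0):
    """How many evenly-sized layers fit in VRAM after a safety reserve.

    model_gb: total quantized weight size in GB
    n_layers: transformer layer count (assumed known for the model)
    reserve_gb: headroom kept for KV cache and buffers (assumed)
    """
    per_layer_gb = model_gb / n_layers
    return min(n_layers, int((vram_gb - reserve_gb) / per_layer_gb))

# A 14.7 GB 22B model, assuming 56 layers -> pass this to -ngl:
print(gpu_layers_that_fit(14.7, 56))  # → 41
```

The remaining layers run on the CPU, which is why offloaded speeds drop to the ~15–23 tok/s range shown above.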
29 models are too large for this hardware.