NVIDIA GeForce RTX 4070
NVIDIA · 12GB GDDR6X · Can run 40 models
| Specification | Value |
|---|---|
| Manufacturer | NVIDIA |
| VRAM | 12 GB |
| Memory Type | GDDR6X |
| Architecture | Ada Lovelace |
| CUDA Cores | 5,888 |
| Tensor Cores | 184 |
| Bandwidth | 504 GB/s |
| TDP | 200W |
| MSRP | $599 |
| Released | Apr 13, 2023 |
AI Notes
The RTX 4070 is a popular entry point for local AI work. With 12 GB of GDDR6X VRAM, it runs 7B models comfortably, even at Q8_0, and fits most 13B to 14B models with Q4 quantization, though some of those are a tight fit or need CPU offload. Its 200W TDP makes it efficient for sustained AI inference workloads in smaller builds.
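As a rough check of the sizing claim above, quantized weights take about parameters × bits-per-weight / 8 bytes. The sketch below is an illustration under stated assumptions, not the method this page uses. The bits-per-weight figures are approximations (8.5 for Q8_0, about 4.85 for Q4_K_M, which varies by model), and the helper names `weights_gb` and `headroom_gb` are hypothetical. It counts weights only; KV cache and runtime overhead come on top, so real usage is higher.

```python
# Approximate bits per weight for two common GGUF quantization formats.
# Q8_0 stores 32 int8 weights plus one fp16 scale per block (8.5 bits/weight);
# the Q4_K_M figure is an assumed average and varies by model.
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q4_K_M": 4.85}

VRAM_GB = 12.0  # RTX 4070


def weights_gb(params_billions: float, quant: str) -> float:
    """Estimated size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


def headroom_gb(params_billions: float, quant: str, vram_gb: float = VRAM_GB) -> float:
    """VRAM left over for KV cache, activations and runtime overhead."""
    return vram_gb - weights_gb(params_billions, quant)


for params, quant in [(7, "Q8_0"), (7, "Q4_K_M"), (13, "Q4_K_M"), (13, "Q8_0")]:
    print(
        f"{params}B {quant}: weights ~{weights_gb(params, quant):.1f} GB, "
        f"headroom ~{headroom_gb(params, quant):.1f} GB"
    )
```

Under these assumptions a 13B model at Q8_0 already exceeds 12 GB on weights alone (negative headroom), while the same model at Q4_K_M leaves a few gigabytes for context.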
Compatible Models
| Model | Parameters | Best Quant | VRAM Used | Fit | Est. Speed |
|---|---|---|---|---|---|
| Qwen 3 0.6B | 600M | Q4_K_M | 2.5 GB | Runs | ~202 tok/s |
| Gemma 3 1B | 1B | Q8_0 | 2 GB | Runs | ~252 tok/s |
| Llama 3.2 1B | 1B | Q8_0 | 3 GB | Runs | ~168 tok/s |
| DeepSeek R1 1.5B | 1.5B | Q8_0 | 3 GB | Runs | ~168 tok/s |
| Gemma 2 2B | 2B | Q8_0 | 4 GB | Runs | ~126 tok/s |
| Gemma 3n E2B | 2B | Q4_K_M | 3.3 GB | Runs | ~153 tok/s |
| Llama 3.2 3B | 3B | Q8_0 | 5 GB | Runs | ~101 tok/s |
| Phi-3 Mini 3.8B | 3.8B | Q8_0 | 5.8 GB | Runs | ~87 tok/s |
| Phi-4 Mini 3.8B | 3.8B | Q4_K_M | 4.5 GB | Runs | ~112 tok/s |
| Gemma 3 4B | 4B | Q4_K_M | 5 GB | Runs | ~101 tok/s |
| Gemma 3n E4B | 4B | Q4_K_M | 4.5 GB | Runs | ~112 tok/s |
| Qwen 3 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~112 tok/s |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | Runs | ~56 tok/s |
| Falcon 3 7B | 7B | Q4_K_M | 6.8 GB | Runs | ~74 tok/s |
| Mistral 7B | 7B | Q8_0 | 9 GB | Runs | ~56 tok/s |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | Runs | ~56 tok/s |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | Runs | ~56 tok/s |
| Qwen 2.5 VL 7B | 7B | Q4_K_M | 7 GB | Runs | ~72 tok/s |
| Cogito 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~67 tok/s |
| DeepSeek R1 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~67 tok/s |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | Runs | ~50 tok/s |
| Nemotron 3 Nano 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~67 tok/s |
| Qwen 3 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~67 tok/s |
| Falcon 3 10B | 10B | Q4_K_M | 8.5 GB | Runs | ~59 tok/s |
| Llama 3.2 Vision 11B | 11B | Q4_K_M | 8.5 GB | Runs | ~59 tok/s |
| Mistral Nemo 12B | 12B | Q4_K_M | 9.5 GB | Runs | ~53 tok/s |
| DeepSeek R1 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~51 tok/s |
| Phi-4 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~51 tok/s |
| Qwen 2.5 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~51 tok/s |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | Runs (tight) | ~46 tok/s |
| Gemma 3 12B | 12B | Q4_K_M | 10.5 GB | Runs (tight) | ~48 tok/s |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 11 GB | Runs (tight) | ~46 tok/s |
| Qwen 2.5 Coder 14B | 14B | Q4_K_M | 12 GB | CPU Offload | ~13 tok/s |
| Qwen 3 14B | 14B | Q4_K_M | 12 GB | CPU Offload | ~13 tok/s |
| StarCoder2 15B | 15B | Q8_0 | 17 GB | CPU Offload | ~9 tok/s |
| Codestral 22B | 22B | Q4_K_M | 14.7 GB | CPU Offload | ~10 tok/s |
| Devstral 24B | 24B | Q4_K_M | 17 GB | CPU Offload | ~9 tok/s |
| Magistral Small 24B | 24B | Q4_K_M | 17 GB | CPU Offload | ~9 tok/s |
| Mistral Small 3.1 24B | 24B | Q4_K_M | 18 GB | CPU Offload | ~8 tok/s |
| Gemma 2 27B | 27B | Q4_K_M | 17.7 GB | CPU Offload | ~8 tok/s |
29 models are too large for this hardware.
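The Est. Speed values for rows marked Runs or Runs (tight) appear consistent with a simple bandwidth-bound estimate: memory bandwidth divided by VRAM used. This is an observation from the listed numbers, not a documented method, and the function name `est_tokens_per_s` is hypothetical. A minimal sketch that checks the estimate against a few rows from the table above:

```python
BANDWIDTH_GB_S = 504.0  # RTX 4070 memory bandwidth, from the spec table


def est_tokens_per_s(vram_used_gb: float, bandwidth_gb_s: float = BANDWIDTH_GB_S) -> int:
    """Bandwidth-bound estimate: assumes each generated token reads the
    full VRAM footprint from memory once."""
    return round(bandwidth_gb_s / vram_used_gb)


# (model, VRAM used in GB, listed Est. Speed) copied from the table above
ROWS = [
    ("Gemma 3 1B", 2.0, 252),
    ("Llama 3.2 3B", 5.0, 101),
    ("Mistral 7B", 9.0, 56),
    ("Llama 3.1 8B", 10.0, 50),
    ("Phi-4 14B", 9.9, 51),
    ("Gemma 2 9B", 11.0, 46),
]

for name, vram, listed in ROWS:
    print(f"{name}: computed ~{est_tokens_per_s(vram)} tok/s, listed ~{listed} tok/s")
```

The CPU Offload rows do not follow this estimate: a 12 GB footprint would give about 42 tok/s by this formula, against the ~13 tok/s listed, presumably because part of the model is served from slower system memory. Treat all of these figures as estimates, not benchmarks.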