NVIDIA GeForce RTX 3060 Ti
NVIDIA · 8GB GDDR6 · Can run 57 models
| Spec | Value |
|---|---|
| Manufacturer | NVIDIA |
| VRAM | 8 GB |
| Memory Type | GDDR6 |
| Architecture | Ampere |
| CUDA Cores | 4,864 |
| Tensor Cores | 152 |
| Bandwidth | 448 GB/s |
| TDP | 200W |
| MSRP | $399 |
| Released | Dec 1, 2020 |
AI Notes
The RTX 3060 Ti offers 8GB of VRAM with significantly higher memory bandwidth than the RTX 3060 12GB (448 GB/s vs. 360 GB/s). The 8GB cap limits it to models in the 7-9B range at 4-bit quantization, but the 448 GB/s bandwidth delivers faster token generation than most other 8GB cards. A solid used-market option for budget local AI.
Compatible Models
| Model | Parameters | Best Quant | VRAM Used | Fit | Est. Speed |
|---|---|---|---|---|---|
| Qwen 3 0.6B | 600M | Q4_K_M | 2.5 GB | Runs | ~179 tok/s |
| Qwen 3.5 0.8B | 800M | Q4_K_M | 1.5 GB | Runs | ~299 tok/s |
| Gemma 3 1B | 1B | Q8_0 | 2 GB | Runs | ~224 tok/s |
| Llama 3.2 1B | 1B | Q8_0 | 3 GB | Runs | ~149 tok/s |
| DeepSeek R1 1.5B | 1.5B | Q8_0 | 3 GB | Runs | ~149 tok/s |
| SmolLM2 1.7B | 1.7B | Q8_0 | 2.7 GB | Runs | ~166 tok/s |
| Gemma 2 2B | 2B | Q8_0 | 4 GB | Runs | ~112 tok/s |
| Gemma 3n E2B | 2B | Q4_K_M | 3.3 GB | Runs | ~136 tok/s |
| Gemma 4 E2B | 2B | Q4_K_M | 4 GB | Runs | ~112 tok/s |
| Qwen 3.5 2B | 2B | Q4_K_M | 3 GB | Runs | ~149 tok/s |
| Llama 3.2 3B | 3B | Q8_0 | 5 GB | Runs | ~90 tok/s |
| StarCoder2 3B | 3B | Q4_K_M | 3.5 GB | Runs | ~128 tok/s |
| Phi-3 Mini 3.8B | 3.8B | Q8_0 | 5.8 GB | Runs | ~77 tok/s |
| Phi-4 Mini 3.8B | 3.8B | Q4_K_M | 4.5 GB | Runs | ~100 tok/s |
| Gemma 3 4B | 4B | Q4_K_M | 5 GB | Runs | ~90 tok/s |
| Gemma 3n E4B | 4B | Q4_K_M | 4.5 GB | Runs | ~100 tok/s |
| Gemma 4 E4B | 4B | Q4_K_M | 6 GB | Runs | ~75 tok/s |
| Qwen 3 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~100 tok/s |
| Qwen 3.5 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~100 tok/s |
| Yi 1.5 6B | 6B | Q4_K_M | 5 GB | Runs | ~90 tok/s |
| Falcon 3 7B | 7B | Q4_K_M | 6.8 GB | Runs | ~66 tok/s |
| InternLM 2.5 7B | 7B | Q4_K_M | 5.5 GB | Runs | ~81 tok/s |
| StarCoder2 7B | 7B | Q4_K_M | 5.5 GB | Runs | ~81 tok/s |
| Aya Expanse 8B | 8B | Q4_K_M | 6.5 GB | Runs | ~69 tok/s |
| Dolphin 3 8B | 8B | Q4_K_M | 6 GB | Runs | ~75 tok/s |
| Nous Hermes 2 8B | 8B | Q4_K_M | 6 GB | Runs | ~75 tok/s |
| Yi 1.5 9B | 9B | Q4_K_M | 6.5 GB | Runs | ~69 tok/s |
| Codestral Mamba 7B | 7B | Q4_K_M | 6.9 GB | Runs (tight) | ~65 tok/s |
| OpenChat 3.5 7B | 7B | Q4_K_M | 6.9 GB | Runs (tight) | ~65 tok/s |
| Qwen 2.5 VL 7B | 7B | Q4_K_M | 7 GB | Runs (tight) | ~64 tok/s |
| WizardLM 2 7B | 7B | Q4_K_M | 6.9 GB | Runs (tight) | ~65 tok/s |
| Cogito 8B | 8B | Q4_K_M | 7.5 GB | Runs (tight) | ~60 tok/s |
| DeepSeek R1 8B | 8B | Q4_K_M | 7.5 GB | Runs (tight) | ~60 tok/s |
| Nemotron 3 Nano 8B | 8B | Q4_K_M | 7.5 GB | Runs (tight) | ~60 tok/s |
| Qwen 3 8B | 8B | Q4_K_M | 7.5 GB | Runs (tight) | ~60 tok/s |
| Qwen 3.5 9B | 9B | Q4_K_M | 7.5 GB | Runs (tight) | ~60 tok/s |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | CPU Offload | ~15 tok/s |
| Mistral 7B | 7B | Q8_0 | 9 GB | CPU Offload | ~15 tok/s |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | CPU Offload | ~15 tok/s |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | CPU Offload | ~15 tok/s |
| Granite 3.3 8B | 8B | Q8_0 | 10 GB | CPU Offload | ~14 tok/s |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | CPU Offload | ~14 tok/s |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | CPU Offload | ~12 tok/s |
| Yi Coder 9B | 9B | Q4_K_M | 8 GB | CPU Offload | ~17 tok/s |
| Falcon 3 10B | 10B | Q4_K_M | 8.5 GB | CPU Offload | ~16 tok/s |
| Llama 3.2 Vision 11B | 11B | Q4_K_M | 8.5 GB | CPU Offload | ~16 tok/s |
| Gemma 3 12B | 12B | Q4_K_M | 10.5 GB | CPU Offload | ~13 tok/s |
| Mistral Nemo 12B | 12B | Q4_K_M | 9.5 GB | CPU Offload | ~14 tok/s |
| DeepSeek R1 14B | 14B | Q4_K_M | 9.9 GB | CPU Offload | ~14 tok/s |
| Phi-4 14B | 14B | Q4_K_M | 9.9 GB | CPU Offload | ~14 tok/s |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 11 GB | CPU Offload | ~12 tok/s |
| Qwen 2.5 14B | 14B | Q4_K_M | 9.9 GB | CPU Offload | ~14 tok/s |
| Qwen 2.5 Coder 14B | 14B | Q4_K_M | 12 GB | CPU Offload | ~11 tok/s |
| Qwen 3 14B | 14B | Q4_K_M | 12 GB | CPU Offload | ~11 tok/s |
| StarCoder2 15B | 15B | Q4_K_M | 10.5 GB | CPU Offload | ~13 tok/s |
| InternLM 2.5 20B | 20B | Q4_K_M | 12 GB | CPU Offload | ~11 tok/s |
| Qwen 3.5 35B A3B | 35B | Q4_K_M | 12 GB | CPU Offload | ~11 tok/s |
48 models are too large for this hardware.
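The Fit column above can be reproduced with a simple threshold check against the card's 8 GB of VRAM. A hedged sketch, where the ~85% "tight" cutoff is inferred from the table (6.8 GB rows run clean, 6.9 GB rows run tight) rather than taken from any official source:

```python
# Illustrative classifier mirroring the Fit column: thresholds are inferred
# from this table, not an official spec.
VRAM_GB = 8.0

def fit_category(vram_used_gb: float, vram_total_gb: float = VRAM_GB) -> str:
    if vram_used_gb <= 0.85 * vram_total_gb:   # comfortable headroom
        return "Runs"
    if vram_used_gb <= vram_total_gb:          # fits, little room for context
        return "Runs (tight)"
    return "CPU Offload"                        # spills layers to system RAM

print(fit_category(6.5))   # Runs
print(fit_category(7.5))   # Runs (tight)
print(fit_category(9.0))   # CPU Offload
```

Models in the "Runs (tight)" band fit the weights but leave little room for KV cache, so long contexts may still force offloading.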