NVIDIA GeForce RTX 4060 Ti 16GB
NVIDIA · 16GB GDDR6 · Can run 49 models
| Specification | Value |
|---|---|
| Manufacturer | NVIDIA |
| VRAM | 16 GB |
| Memory Type | GDDR6 |
| Architecture | Ada Lovelace |
| CUDA Cores | 4,352 |
| Tensor Cores | 136 |
| Bandwidth | 288 GB/s |
| TDP | 165W |
| MSRP | $499 |
| Released | Jul 18, 2023 |
AI Notes
The RTX 4060 Ti 16GB variant is a compelling option for budget-conscious AI enthusiasts. Despite its modest core count, its 16 GB of VRAM comfortably holds models up to the 13B-14B class at Q4_K_M quantization, and even a 22B model like Codestral fits (tightly); models in the 24B-35B range only run with partial CPU offload at much lower speeds. The relatively low 165W TDP makes it well suited to always-on inference servers.
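The fit verdicts in the table below follow directly from model size and quantization. Here is a minimal sketch of the arithmetic; the effective bits-per-weight values and the ~1.2x runtime overhead factor (KV cache, CUDA buffers) are rough assumptions, not measured numbers, and actual usage varies with context length:

```python
# Rough VRAM estimate for a quantized model on a 16 GB card.
# Effective bits per weight below are approximations (assumption).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}

def estimate_vram_gb(params_billion: float, quant: str, overhead: float = 1.2) -> float:
    """Approximate VRAM (GB) to fully load the model: weights plus runtime buffers."""
    weights_gb = params_billion * BITS_PER_WEIGHT[quant] / 8  # weights alone
    return weights_gb * overhead  # ~1.2x overhead factor is an assumption

if __name__ == "__main__":
    for name, params, quant in [("Llama 3.1 8B", 8, "Q8_0"),
                                ("Qwen 2.5 14B", 14, "Q4_K_M"),
                                ("Gemma 3 27B", 27, "Q4_K_M")]:
        need = estimate_vram_gb(params, quant)
        fit = "Runs" if need <= 16 else "CPU Offload"
        print(f"{name}: ~{need:.1f} GB -> {fit}")
```

Running this reproduces the table's verdicts to within a gigabyte or so: an 8B model at Q8_0 lands around 10 GB (runs), a 14B at Q4_K_M around 10 GB (runs), and a 27B at Q4_K_M around 20 GB (needs CPU offload).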
Compatible Models
| Model | Parameters | Best Quant | VRAM Used | Fit | Est. Speed |
|---|---|---|---|---|---|
| Qwen 3 0.6B | 600M | Q4_K_M | 2.5 GB | Runs | ~115 tok/s |
| Gemma 3 1B | 1B | Q8_0 | 2 GB | Runs | ~144 tok/s |
| Llama 3.2 1B | 1B | Q8_0 | 3 GB | Runs | ~96 tok/s |
| DeepSeek R1 1.5B | 1.5B | Q8_0 | 3 GB | Runs | ~96 tok/s |
| Gemma 2 2B | 2B | Q8_0 | 4 GB | Runs | ~72 tok/s |
| Gemma 3n E2B | 2B | Q4_K_M | 3.3 GB | Runs | ~87 tok/s |
| Llama 3.2 3B | 3B | Q8_0 | 5 GB | Runs | ~58 tok/s |
| Phi-3 Mini 3.8B | 3.8B | Q8_0 | 5.8 GB | Runs | ~50 tok/s |
| Phi-4 Mini 3.8B | 3.8B | Q4_K_M | 4.5 GB | Runs | ~64 tok/s |
| Gemma 3 4B | 4B | Q4_K_M | 5 GB | Runs | ~58 tok/s |
| Gemma 3n E4B | 4B | Q4_K_M | 4.5 GB | Runs | ~64 tok/s |
| Qwen 3 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~64 tok/s |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | Runs | ~32 tok/s |
| Falcon 3 7B | 7B | Q4_K_M | 6.8 GB | Runs | ~42 tok/s |
| Mistral 7B | 7B | Q8_0 | 9 GB | Runs | ~32 tok/s |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | Runs | ~32 tok/s |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | Runs | ~32 tok/s |
| Qwen 2.5 VL 7B | 7B | Q4_K_M | 7 GB | Runs | ~41 tok/s |
| Cogito 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~38 tok/s |
| DeepSeek R1 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~38 tok/s |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | Runs | ~29 tok/s |
| Nemotron 3 Nano 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~38 tok/s |
| Qwen 3 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~38 tok/s |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | Runs | ~26 tok/s |
| Falcon 3 10B | 10B | Q4_K_M | 8.5 GB | Runs | ~34 tok/s |
| Llama 3.2 Vision 11B | 11B | Q4_K_M | 8.5 GB | Runs | ~34 tok/s |
| Gemma 3 12B | 12B | Q4_K_M | 10.5 GB | Runs | ~27 tok/s |
| Mistral Nemo 12B | 12B | Q4_K_M | 9.5 GB | Runs | ~30 tok/s |
| DeepSeek R1 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~29 tok/s |
| Phi-4 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~29 tok/s |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 11 GB | Runs | ~26 tok/s |
| Qwen 2.5 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~29 tok/s |
| Qwen 2.5 Coder 14B | 14B | Q4_K_M | 12 GB | Runs | ~24 tok/s |
| Qwen 3 14B | 14B | Q4_K_M | 12 GB | Runs | ~24 tok/s |
| Codestral 22B | 22B | Q4_K_M | 14.7 GB | Runs (tight) | ~20 tok/s |
| StarCoder2 15B | 15B | Q8_0 | 17 GB | CPU Offload | ~5 tok/s |
| Devstral 24B | 24B | Q4_K_M | 17 GB | CPU Offload | ~5 tok/s |
| Magistral Small 24B | 24B | Q4_K_M | 17 GB | CPU Offload | ~5 tok/s |
| Mistral Small 3.1 24B | 24B | Q4_K_M | 18 GB | CPU Offload | ~5 tok/s |
| Gemma 2 27B | 27B | Q4_K_M | 17.7 GB | CPU Offload | ~5 tok/s |
| Gemma 3 27B | 27B | Q4_K_M | 20 GB | CPU Offload | ~4 tok/s |
| Qwen 3 30B-A3B (MoE) | 30B | Q4_K_M | 22 GB | CPU Offload | ~4 tok/s |
| Cogito 32B | 32B | Q4_K_M | 21.5 GB | CPU Offload | ~4 tok/s |
| DeepSeek R1 32B | 32B | Q4_K_M | 20.7 GB | CPU Offload | ~4 tok/s |
| Qwen 2.5 32B | 32B | Q4_K_M | 20.7 GB | CPU Offload | ~4 tok/s |
| Qwen 2.5 Coder 32B | 32B | Q4_K_M | 23 GB | CPU Offload | ~4 tok/s |
| Qwen 3 32B | 32B | Q4_K_M | 23 GB | CPU Offload | ~4 tok/s |
| QwQ 32B | 32B | Q4_K_M | 21.5 GB | CPU Offload | ~4 tok/s |
| Command R 35B | 35B | Q4_K_M | 22.5 GB | CPU Offload | ~4 tok/s |
20 models are too large for this hardware.
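The speed estimates above follow a simple memory-bandwidth model: single-stream token generation reads roughly the entire set of weights once per token, so throughput is approximately bandwidth divided by model size. A minimal sketch of that calculation (the one-full-read-per-token assumption is a simplification that ignores KV-cache traffic and compute overhead):

```python
# Back-of-the-envelope decode speed for a memory-bandwidth-bound workload.
BANDWIDTH_GB_S = 288  # RTX 4060 Ti 16GB memory bandwidth (from spec table)

def estimate_tok_s(model_size_gb: float) -> float:
    """Each generated token requires reading (roughly) all weights once."""
    return BANDWIDTH_GB_S / model_size_gb

print(f"9 GB model:   ~{estimate_tok_s(9):.0f} tok/s")    # matches the table's ~32
print(f"2.5 GB model: ~{estimate_tok_s(2.5):.0f} tok/s")  # matches the table's ~115
```

For the CPU Offload rows, runtimes such as llama.cpp can keep part of the model on the GPU (its --n-gpu-layers option) while the remaining layers run from system RAM; the much slower CPU-side memory path is why those entries fall to roughly 4-5 tok/s.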