
DeepSeek R1 VRAM Requirements — Every Size from 1.5B to 671B

Complete VRAM requirements guide for all DeepSeek R1 model sizes. Find out exactly how much memory you need to run R1 locally with Ollama at different quantization levels.

DeepSeek R1 is one of the most popular reasoning models to run locally, and for good reason — it’s MIT-licensed, genuinely strong at math and code, and the distilled variants cover everything from a Raspberry Pi to a multi-GPU server. But the VRAM requirements vary wildly across the seven available sizes. Here’s exactly what you need for each one.

VRAM Requirements at a Glance

Every DeepSeek R1 variant is available through Ollama at multiple quantization levels. Fewer bits per weight mean smaller files and less VRAM, at the cost of some output quality. Q4_K_M is the practical sweet spot for most people; Q8_0 is noticeably better if you have the headroom.

| Model | Q4_K_M | Q8_0 | F16 |
|---|---|---|---|
| R1 1.5B | 2.0 GB | 3.0 GB | 5.0 GB |
| R1 7B | 5.7 GB | 9.0 GB | 16.0 GB |
| R1 8B | 7.5 GB | 11.5 GB | 20.0 GB |
| R1 14B | 9.9 GB | 16.0 GB | – |
| R1 32B | 20.7 GB | 34.0 GB | – |
| R1 70B | 43.5 GB | 72.0 GB | – |
| R1 671B | 362 GB | 685 GB | – |

The 14B, 32B, and 70B variants offer Q5_K_M as a middle ground (11.3 GB, 23.9 GB, and 50.5 GB respectively) if you want better quality than Q4 but can’t afford Q8.
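
Want to sanity-check these numbers, or estimate a model that isn't listed? A rough rule of thumb: parameter count × bits per weight ÷ 8 gives the weight size in GB, and adding about 20% covers the KV cache and runtime buffers. The ~4.8 effective bits for Q4_K_M and the 1.2× overhead factor are rough approximations, not Ollama's exact accounting:

# params (billions) × bits per weight ÷ 8 × ~1.2 overhead ≈ VRAM in GB
echo "14 * 4.8 / 8 * 1.2" | bc -l    # ≈ 10.1, close to the 9.9 GB listed for R1 14B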

Breaking It Down by Size

R1 1.5B — Runs on Anything

At 2 GB VRAM for Q4_K_M, this fits on virtually any GPU made in the last decade, and every Mac with Apple Silicon. Good for experimenting with chain-of-thought reasoning, but don’t expect it to solve hard problems. Think of it as a demo of the R1 approach rather than a production tool.

Runs on: Any GPU with 4+ GB VRAM, any Apple Silicon Mac. Even a MacBook Air M1 8GB handles it easily.
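
If you just want to kick the tires, the default tag pulls the Q4_K_M build (Ollama's default quantization can change over time, so check the library page if the exact quant matters):

ollama run deepseek-r1:1.5b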

R1 7B and 8B — The Entry Point for Real Reasoning

The 7B (distilled from Qwen 2.5) and 8B (distilled from Llama 3.1) are comparable in quality but built on different base architectures. At Q4_K_M, the 7B needs 5.7 GB and the 8B needs 7.5 GB. Both show meaningful chain-of-thought reasoning on math and logic tasks — a clear step up from the 1.5B.

Runs on: RTX 3060 12GB, RTX 4060, RX 7600, any Mac with 8+ GB unified memory. The 7B even fits at Q8_0 on a Mac Mini M4 16GB.
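
Both pull with a single command. The bare size tags below default to Q4_K_M; Q8_0 and F16 builds are published under quant-specific tags whose exact names are easiest to confirm on the Ollama library page:

ollama run deepseek-r1:7b    # Qwen 2.5 distill
ollama run deepseek-r1:8b    # Llama 3.1 distill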

R1 14B — The Sweet Spot

This is the size most people should start with. The 14B at Q4_K_M needs 9.9 GB of VRAM — too much to fit entirely on an 8 GB card, but comfortable on anything with 12+ GB. It handles complex multi-step reasoning, code generation, and math problems significantly better than the 7B/8B variants, and it's the most-searched R1 size for a reason.

Runs on: RTX 3060 12GB, RTX 4070, RX 7800 XT, any Mac with 16+ GB. For Q8_0 (16 GB), you'll want an RTX 4070 Ti or Mac Mini M4 Pro 24GB.

Pull and run it with:

ollama run deepseek-r1:14b
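
Once the model is loaded, ollama ps reports its actual memory footprint and whether it landed fully on the GPU; it's the quickest way to confirm you aren't spilling into system RAM:

ollama ps    # SIZE shows memory in use; PROCESSOR should read 100% GPU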

R1 32B — Serious Reasoning Power

The 32B is where DeepSeek R1 starts competing with much larger models on reasoning benchmarks. At 20.7 GB for Q4_K_M, it requires high-end consumer hardware — but that hardware is increasingly common.

Runs on: RTX 3090 (24 GB), RTX 4090 (24 GB — tight fit), RTX 5090 (32 GB), or a Mac with 24+ GB unified memory like the Mac Mini M4 Pro 24GB. For Q8_0 at 34 GB, you’ll need a Mac Mini M4 Pro 48GB or a workstation GPU like the RTX A6000.
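
One caveat for the tight-fit cards: these figures assume Ollama's default context window, and the KV cache grows with context length. On a 24 GB card you can buy back headroom by shrinking the context from inside an interactive session (4096 here is just an example value):

ollama run deepseek-r1:32b
/set parameter num_ctx 4096    # smaller context window, smaller KV cache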

R1 70B — Near-Frontier Quality

The 70B (Llama 3.3-based) is the largest distilled model and captures the most reasoning capability from the full 671B. At 43.5 GB for Q4_K_M, this is firmly in workstation or Mac Studio territory.

Runs on: Mac Mini M4 Pro 48GB (tight), Mac Studio M4 Max 64GB, Mac Studio M4 Max 128GB, RTX A6000 (48 GB), or RTX Pro 6000 Blackwell. Dual consumer GPU setups can also work if Ollama is configured for multi-GPU.
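
Ollama splits a model's layers across every GPU it can see, so "configured for multi-GPU" mostly means controlling which devices are visible to the server. On NVIDIA hardware that's the standard CUDA_VISIBLE_DEVICES variable; the device indices here are illustrative:

CUDA_VISIBLE_DEVICES=0,1 ollama serve    # two 24 GB cards ≈ 48 GB, enough for 70B at Q4_K_M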

R1 671B — The Full Model

The 671B is the original teacher model — a Mixture-of-Experts architecture with 671 billion total parameters, of which roughly 37 billion are active per token. The MoE routing cuts compute, but every expert still has to be resident in memory, so at 362 GB for Q4_K_M this is strictly datacenter or extreme enthusiast territory (think Mac Studio M4 Ultra 512GB or a multi-GPU server). Most people should skip this and run one of the distilled variants.

Which Size Should You Start With?

If you want to try R1 today with whatever hardware you have: Start with the 14B at Q4_K_M. It needs ~10 GB VRAM, runs on a wide range of GPUs and Macs, and gives you a genuine taste of what R1-style reasoning looks like. It’s the best ratio of quality to hardware cost in the lineup.

If you have a high-end GPU (24 GB): Go for the 32B. The jump in reasoning quality over 14B is substantial, and it fits on an RTX 3090 or 4090 at Q4.

If you have a Mac with 48+ GB: The 70B is within reach and delivers near-frontier reasoning. Apple Silicon handles these large models well thanks to unified memory, even if token generation is slower than on a dedicated GPU.
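
One macOS detail: by default the OS reserves a chunk of unified memory for itself, so only roughly two-thirds to three-quarters is available to the GPU. Recent macOS versions expose a sysctl knob to raise that limit; treat it as an at-your-own-risk tweak that resets on reboot, and note the 57344 value is just an example sized for a 64 GB machine:

sudo sysctl iogpu.wired_limit_mb=57344    # let the GPU use ~56 of 64 GB; resets on reboot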

Not sure what fits your hardware? Use our compatibility checker — select your GPU or Mac and see exactly which R1 variants run at each quantization level.