DeepSeek R1 70B

License: MIT

DeepSeek · 70B · transformer-decoder

Released 2025-01-20 · 131K context · 70B params

Use Cases

chat · code · reasoning · math · writing · summary

Quantization Options

Quant                  Bits  VRAM     Quality
Q4_K_M (recommended)   4     43.5 GB  Good
Q5_K_M                 5     50.5 GB  Good
Q8_0                   8     72.0 GB  Excellent

About this model

DeepSeek R1 70B is the largest distilled reasoning model in the DeepSeek R1 series and is built on the Llama 3.3 70B architecture. Of the distilled variants, it retains the most reasoning capability from the full DeepSeek R1 671B model, approaching that model's quality on many benchmarks while requiring far less compute. It is strongest at advanced mathematics, competitive programming, scientific reasoning, and complex analytical tasks. Multi-GPU setups are recommended for comfortable inference: even the Q4_K_M quantization needs 43.5 GB of VRAM, more than a single 24 GB consumer GPU provides.
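
To make the multi-GPU recommendation concrete, the following is a minimal sketch that estimates how many GPUs of a given size would be needed for each quantization. It uses the VRAM figures listed under Quantization Options and assumes the model can be split evenly across identical GPUs. The names `min_gpus`, `plan`, and `reserved_gb` are hypothetical helpers introduced for illustration, not part of any DeepSeek or inference-runtime API. Real requirements also depend on context length, KV cache size, and runtime overhead, so treat the result as a lower bound.

```python
import math

# VRAM figures (in GB) copied from the Quantization Options table.
QUANT_VRAM_GB = {"Q4_K_M": 43.5, "Q5_K_M": 50.5, "Q8_0": 72.0}


def min_gpus(required_gb: float, gpu_gb: float, reserved_gb: float = 0.0) -> int:
    """Smallest number of identical GPUs whose usable memory covers required_gb.

    reserved_gb is memory per GPU assumed to be unavailable to the model
    (an illustrative knob, not a measured value).
    """
    usable_gb = gpu_gb - reserved_gb
    if required_gb <= 0 or usable_gb <= 0:
        raise ValueError("required_gb and usable GPU memory must be positive")
    return math.ceil(required_gb / usable_gb)


def plan(gpu_gb: float, reserved_gb: float = 0.0) -> dict[str, int]:
    """Map each quantization to its estimated minimum GPU count."""
    return {
        quant: min_gpus(vram, gpu_gb, reserved_gb)
        for quant, vram in QUANT_VRAM_GB.items()
    }


if __name__ == "__main__":
    for gpu_gb in (24, 48, 80):
        print(f"{gpu_gb} GB GPUs: {plan(gpu_gb)}")
```

Under these assumptions, 24 GB cards need at least two GPUs for Q4_K_M and three for Q5_K_M or Q8_0, while a single 80 GB card covers every listed quantization. Setting `reserved_gb` above zero models per-GPU overhead and raises the counts accordingly.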

Benchmarks

MMLU: 85.5