DeepSeek R1 70B
MIT · DeepSeek · 70B · transformer-decoder
2025-01-20 · 131K context
70B params
Use Cases
chat · code · reasoning · math · writing · summary
About this model
DeepSeek R1 70B is the largest distilled reasoning model in the DeepSeek R1 series, built on the Llama 3.3 70B architecture. Of the distilled variants, it retains the most reasoning capability from the full 671B-parameter DeepSeek R1 model and performs strongly on complex reasoning tasks.
This model approaches the reasoning quality of the full R1 model on many benchmarks while requiring far less compute. It excels at advanced mathematics, competitive programming, scientific reasoning, and complex analytical tasks. Multi-GPU setups are recommended for comfortable inference.
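To see why multi-GPU setups are suggested, a rough back-of-the-envelope estimate of weight memory helps. The sketch below assumes 70 billion parameters (from the card) and a few common precisions chosen for illustration; the function name `weight_memory_gb` is hypothetical, and the figures cover weights only, not the KV cache, activations, or runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 70e9  # parameter count stated on the card

# Illustrative precisions (assumed, not taken from the card).
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
# 16-bit: ~140 GB
# 8-bit: ~70 GB
# 4-bit: ~35 GB
```

Under these assumptions, 16-bit weights alone exceed the memory of any single consumer GPU, which is consistent with the multi-GPU recommendation; actual requirements are higher once context length and runtime overhead are included.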
Benchmarks
MMLU: 85.5