DeepSeek V3

DeepSeek License

DeepSeek · 671B · mixture-of-experts

2024-12-26 131K context 671B params

Use Cases

chat code reasoning multilingual math tools writing summary

Quantization Options

Quant          Bits  VRAM      Quality    Status
Q4_K_M (rec)   4     362.0 GB  Good
Q8_0           8     685.0 GB  Excellent

About this model

DeepSeek V3 is a 671B-parameter mixture-of-experts (MoE) model that rivals top proprietary models on coding, math, and general reasoning benchmarks. Its MoE architecture activates only a fraction of its parameters per token (about 37B), which lowers the compute cost per token, but all 671B weights must still be loaded into VRAM. The model is particularly strong on coding and mathematical tasks, making it a compelling open-weight alternative to GPT-4-class models for users with sufficient hardware.
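
The VRAM requirement follows from the total parameter count, not from the number of parameters active per token. As a rough sketch, the Python snippet below estimates a lower bound on the memory needed for the weights alone from the parameter count and a nominal bit width. The function name weight_memory_gb is hypothetical, and the estimate assumes every weight is stored at exactly the nominal bit width with no additional overhead, which real quantization formats do not guarantee.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Lower-bound memory for the weights alone, in decimal GB.

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    Real quantized files carry extra metadata, so actual sizes are larger.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> decimal GB


TOTAL_PARAMS_B = 671  # total parameter count from the model card

# Nominal bit widths as listed under Quantization Options.
for name, bits in [("Q4_K_M", 4), ("Q8_0", 8)]:
    estimate = weight_memory_gb(TOTAL_PARAMS_B, bits)
    print(f"{name}: at least {estimate:.1f} GB for weights")
```

The VRAM figures listed under Quantization Options (362.0 GB and 685.0 GB) are higher than these lower bounds. This is likely because quantized formats store scaling metadata and may keep some tensors at higher precision, and because inference needs additional memory beyond the weights.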

Benchmarks

MMLU: 88.5