DeepSeek V3
by DeepSeek · deepseek-v3 family
671B parameters
text-generation · code-generation · reasoning · multilingual · math · tool-use · creative-writing · summarization
DeepSeek V3 is a 671B-parameter mixture-of-experts model that rivals top proprietary models across coding, math, and general reasoning benchmarks. Its MoE architecture activates only about 37B parameters per token, which keeps per-token compute low, but all 671B weights must still be resident in VRAM. The model is particularly strong on coding and mathematical tasks, making it a compelling open-weight alternative to GPT-4-class models for users with sufficient hardware.
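As a rough sanity check on those memory numbers, a quantized checkpoint's size is approximately parameters × bits-per-weight ÷ 8 bytes. A minimal shell sketch, assuming Q8_0 at roughly 8 bits per weight (the result lines up with the ~670 GB Q8_0 figure in the table below):

```bash
# Back-of-envelope checkpoint size: parameters * bits-per-weight / 8 bits per byte.
# Q8_0 stores ~8 bits/weight, so 671B parameters come to roughly 671 GB on disk,
# in line with the ~670 GB listed under Quantization Options.
awk 'BEGIN { params = 671e9; bpw = 8; printf "%.0f GB\n", params * bpw / 8 / 1e9 }'
```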
Quick Start with Ollama
```bash
ollama run deepseek-v3:q4_K_M
```

Model Details

| Spec | Value |
|---|---|
| Creator | DeepSeek |
| Parameters | 671B |
| Architecture | mixture-of-experts |
| Context | 128K tokens |
| Released | Dec 26, 2024 |
| License | DeepSeek License |
| Ollama | deepseek-v3 |
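Beyond the interactive quick start, a pulled model can also be queried over Ollama's local HTTP API. A minimal sketch, assuming an Ollama server running on its default port 11434 and the same `deepseek-v3:q4_K_M` tag as above; the prompt is just an illustration:

```bash
# Send a one-shot generation request to a local Ollama server.
# "stream": false returns a single JSON response instead of a token stream.
curl -s http://localhost:11434/api/generate -d '{
  "model": "deepseek-v3:q4_K_M",
  "prompt": "Write a binary search function in Python.",
  "stream": false
}'
```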
Quantization Options
| Format | File Size | VRAM Required | Ollama Tag |
|---|---|---|---|
| Q4_K_M (recommended) | 350 GB | 362 GB | q4_K_M |
| Q8_0 | 670 GB | 685 GB | q8_0 |
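To fetch a specific quantization, append its tag from the table to the model name:

```bash
# Pull a specific quantization by tag (sizes from the table above).
ollama pull deepseek-v3:q4_K_M   # ~350 GB download, ~362 GB VRAM to run
ollama pull deepseek-v3:q8_0     # ~670 GB download, ~685 GB VRAM to run
```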
Compatible Hardware
The recommended Q4_K_M quantization requires roughly 362 GB of VRAM, which exceeds the memory of any single GPU, so a multi-GPU server or a machine with comparable unified memory is required.
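One way to check whether a multi-GPU box clears that bar is to sum the memory reported by `nvidia-smi`. A minimal sketch, assuming NVIDIA GPUs with `nvidia-smi` on the PATH:

```bash
# Sum memory.total (reported in MiB) across all visible GPUs and print GiB.
nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
  | awk '{ total += $1 } END { printf "total VRAM: %.0f GiB (Q4_K_M needs ~362 GB)\n", total / 1024 }'
```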
Benchmark Scores
| Benchmark | Score |
|---|---|
| MMLU | 88.5 |