DeepSeek V3

by DeepSeek · deepseek-v3 family

671B parameters

Tags: text-generation · code-generation · reasoning · multilingual · math · tool-use · creative-writing · summarization

DeepSeek V3 is a 671B-parameter mixture-of-experts (MoE) model that rivals top proprietary models on coding, math, and general reasoning benchmarks. Its MoE architecture activates only about 37B parameters per token, although all 671B weights must still be loaded into memory for inference. The model is particularly strong on coding and mathematical tasks, making it a compelling open-weight alternative to GPT-4-class models for users with sufficient hardware.
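The MoE trade-off described above (sparse compute per token, full weights resident in memory) can be sketched with a toy top-k router. This is a minimal illustration, not DeepSeek's actual routing code; the expert count, dimensions, and top-k value here are arbitrary:

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Route one token vector through only its top-k experts.

    Every expert's weights stay in memory, but each token's compute
    touches just top_k of them -- the sparsity that lets a huge MoE
    model run with the per-token cost of a much smaller dense model.
    """
    logits = x @ gate_w                    # gating scores, one per expert
    top = np.argsort(logits)[-top_k:]      # indices of the chosen experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
experts = [rng.standard_normal((d, d)) for _ in range(num_experts)]
gate_w = rng.standard_normal((d, num_experts))
y = moe_forward(rng.standard_normal(d), experts, gate_w, top_k=2)
print(y.shape)  # (8,)
```

Only two of the four expert matrices are multiplied per call, yet all four must exist in memory — the same reason the table below lists full-weights VRAM requirements despite the sparse activation.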

Quick Start with Ollama

ollama run deepseek-v3:q4_K_M
Resources: Ollama · Hugging Face · Official Page

Creator: DeepSeek
Parameters: 671B
Architecture: mixture-of-experts
Context: 128K tokens
Released: Dec 26, 2024
License: DeepSeek License
Ollama tag: deepseek-v3
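Once pulled, the model can also be queried programmatically through Ollama's local REST API. A minimal sketch in Python (standard library only; it assumes Ollama is running on its default port 11434 and that the q4_K_M tag has already been downloaded):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local endpoint

def build_payload(prompt, model="deepseek-v3:q4_K_M"):
    # Non-streaming request body for Ollama's /api/generate endpoint.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="deepseek-v3:q4_K_M"):
    """POST one completion request to a locally running Ollama server
    and return the generated text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Setting `"stream": False` returns the whole completion in one JSON object; omit it to receive newline-delimited JSON chunks as tokens are generated.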

Quantization Options

Format                 File Size   VRAM Required   Ollama Tag
Q4_K_M (recommended)   350 GB      362 GB          q4_K_M
Q8_0                   670 GB      685 GB          q8_0

Compatible Hardware

Q4_K_M requires 362 GB of VRAM.

Hardware                    VRAM     Type   Fit    Est. Speed
Mac Studio M4 Ultra 512GB   512 GB   mac    Runs   ~2 tok/s
106 hardware device(s) cannot run this model at Q4_K_M.

Benchmark Scores

MMLU: 88.5