DeepSeek V3

by DeepSeek · deepseek-v3 family

671B parameters

Tags: text-generation · code-generation · reasoning · multilingual · math · tool-use · creative-writing · summarization

DeepSeek V3 is a 671B-parameter mixture-of-experts (MoE) model that rivals top proprietary models on coding, math, and general reasoning benchmarks. Its MoE architecture activates only about 37B parameters per token, although all 671B weights must still be loaded into memory for inference. The model is particularly strong on coding and mathematical tasks, making it a compelling open-weight alternative to GPT-4-class models for users with sufficient hardware.
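The MoE trade-off described above (sparse compute per token, full weights resident in memory) can be sketched with a toy top-k router. This is a minimal illustration, not DeepSeek's actual routing code; the expert count, dimensions, and top-k value here are arbitrary:

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Route one token vector through only its top-k experts.

    Every expert's weights stay in memory, but each token's compute
    touches just top_k of them -- the sparsity that lets a huge MoE
    model run with the per-token cost of a much smaller dense model.
    """
    logits = x @ gate_w                    # gating scores, one per expert
    top = np.argsort(logits)[-top_k:]      # indices of the chosen experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
experts = [rng.standard_normal((d, d)) for _ in range(num_experts)]
gate_w = rng.standard_normal((d, num_experts))
y = moe_forward(rng.standard_normal(d), experts, gate_w, top_k=2)
print(y.shape)  # (8,)
```

Only two of the four expert matrices are multiplied per call, yet all four must exist in memory — the same reason the table below lists full-weights VRAM requirements despite the sparse activation.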

Quick Start with Ollama

ollama run deepseek-v3:q4_K_M
Resources: Ollama · Hugging Face · Official Page

Creator: DeepSeek
Parameters: 671B
Architecture: mixture-of-experts
Context: 128K tokens
Released: Dec 26, 2024
License: DeepSeek License
Ollama tag: deepseek-v3
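Once pulled, the model can also be queried programmatically through Ollama's local REST API. A minimal sketch in Python (standard library only; it assumes Ollama is running on its default port 11434 and that the q4_K_M tag has already been downloaded):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local endpoint

def build_payload(prompt, model="deepseek-v3:q4_K_M"):
    # Non-streaming request body for Ollama's /api/generate endpoint.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="deepseek-v3:q4_K_M"):
    """POST one completion request to a locally running Ollama server
    and return the generated text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Setting `"stream": False` returns the whole completion in one JSON object; omit it to receive newline-delimited JSON chunks as tokens are generated.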

Quantization Options

Format                 File Size   VRAM Required   Ollama Tag
Q4_K_M (recommended)   350 GB      362 GB          q4_K_M
Q8_0                   670 GB      685 GB          q8_0

Compatible Hardware

Q4_K_M requires 362 GB of VRAM.

Hardware                    VRAM     Type   Fit    Est. Speed
Mac Studio M4 Ultra 512GB   512 GB   mac    Runs   ~2 tok/s
106 hardware device(s) cannot run this model at Q4_K_M.

Benchmark Scores

MMLU: 88.5