Kimi K2.5

by Moonshot AI · kimi family

1040B parameters

text-generation code-generation reasoning multilingual vision tool-use math

Kimi K2.5 is Moonshot AI's flagship open-weight model — a 1.04 trillion parameter Mixture-of-Experts with 32B active parameters per token. It employs 384 experts with 8 activated per forward pass, using Multi-head Latent Attention (MLA) to cut memory bandwidth by 40-50%. Trained on 15 trillion mixed visual and text tokens, it delivers state-of-the-art coding (76.8% SWE-Bench Verified) and agentic capabilities with Agent Swarm technology coordinating up to 100 sub-agents. At 374 GB even at aggressive 2-bit quantization, Kimi K2.5 demands enterprise-grade hardware — multiple high-VRAM GPUs or a Mac with 400 GB+ unified memory. The native INT4 weights from Quantization-Aware Training make 4-bit quantization practically lossless compared to FP16. Available on Ollama with a cloud-backed tag for those without the local resources.
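The sparsity figures above can be sanity-checked with back-of-envelope arithmetic, using only the numbers quoted in this card (1.04T total parameters, 32B active, 384 experts with 8 routed per token); exact per-component splits are not published here:

```python
# Back-of-envelope MoE arithmetic for Kimi K2.5 (figures from the card above).
total_params = 1.04e12   # total parameters
active_params = 32e9     # parameters used per forward pass
experts_total = 384
experts_active = 8

# Fraction of the full model touched on each token.
active_fraction = active_params / total_params

# Fraction of experts routed to per token.
expert_fraction = experts_active / experts_total

print(f"active params per token: {active_fraction:.1%} of total")   # ~3.1%
print(f"experts routed per token: {expert_fraction:.1%} of experts")  # ~2.1%
```

The active fraction (~3.1%) is slightly higher than the expert fraction (~2.1%) because dense components such as attention and embeddings run for every token regardless of routing.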

Quick Start with Ollama

ollama run kimi-k2.5
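The model can also be driven programmatically through Ollama's local HTTP API. A minimal sketch of a request body for `POST /api/chat` on the default port 11434; the prompt here is a placeholder, and the `kimi-k2.5` tag is assumed to be pulled already:

```python
import json

# Request body for Ollama's /api/chat endpoint (default: http://localhost:11434).
# Sketch only -- assumes the kimi-k2.5 tag has been pulled locally.
payload = {
    "model": "kimi-k2.5",
    "messages": [
        {"role": "user", "content": "Summarize the MLA attention mechanism."}
    ],
    "stream": False,  # return one complete response instead of a token stream
}

body = json.dumps(payload)
print(body)
# Send with e.g. requests.post("http://localhost:11434/api/chat", data=body)
```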
Resources: Ollama · Hugging Face · Official Page · Research Paper
Creator: Moonshot AI
Parameters: 1.04T
Architecture: mixture-of-experts
Context: 250K tokens
Released: Jan 15, 2026
License: Modified MIT
Ollama tag: kimi-k2.5

Quantization Options

Format               File Size   VRAM Required   Ollama Tag
Q2_K (recommended)   374 GB      390 GB          latest
Q4_K_M               580 GB      600 GB          q4_K_M
Q8_0                 1040 GB     1060 GB         q8_0

Compatible Hardware

Q2_K requires 390 GB VRAM

Hardware                    VRAM     Type   Fit    Est. Speed
Mac Studio M4 Ultra 512GB   512 GB   mac    Runs   ~2 tok/s
106 other hardware configurations cannot run this model at Q2_K.

Benchmark Scores

MMLU: 87.1