Kimi K2.5
by Moonshot AI · kimi family
Kimi K2.5 is Moonshot AI's flagship open-weight model — a 1.04 trillion parameter Mixture-of-Experts with 32B active parameters per token. It employs 384 experts with 8 activated per forward pass, using Multi-head Latent Attention (MLA) to cut memory bandwidth by 40-50%. Trained on 15 trillion mixed visual and text tokens, it delivers state-of-the-art coding (76.8% SWE-Bench Verified) and agentic capabilities with Agent Swarm technology coordinating up to 100 sub-agents. At 374 GB even at aggressive 2-bit quantization, Kimi K2.5 demands enterprise-grade hardware — multiple high-VRAM GPUs or a Mac with 400 GB+ unified memory. The native INT4 weights from Quantization-Aware Training make 4-bit quantization practically lossless compared to FP16. Available on Ollama with a cloud-backed tag for those without the local resources.
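To see why a trillion-parameter MoE is tractable at inference time, here is a quick back-of-the-envelope check in shell, using only the numbers from the description above:

```shell
# Only 8 of 384 experts fire per token, so per-token compute is that
# of a ~32B dense model despite 1.04T total weights.
total_b=1040      # total parameters, in billions
active_b=32       # active parameters per token, in billions
experts=384
active_experts=8
echo "active fraction: $((100 * active_b / total_b))%"
echo "experts per token: ${active_experts} of ${experts}"
```

Roughly 3% of the weights participate in any single forward pass, which is why the active-parameter count, not the total, dominates per-token compute.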
Quick Start with Ollama
```shell
ollama run kimi-k2.5
```

| Spec | Value |
|---|---|
| Creator | Moonshot AI |
| Parameters | 1T |
| Architecture | mixture-of-experts |
| Context | 250K tokens |
| Released | Jan 15, 2026 |
| License | Modified MIT |
| Ollama | kimi-k2.5 |
Quantization Options
| Format | File Size | VRAM Required | Ollama Tag |
|---|---|---|---|
| Q2_K (recommended) | 374 GB | 390 GB | latest |
| Q4_K_M | 580 GB | 600 GB | q4_K_M |
| Q8_0 | 1040 GB | 1060 GB | q8_0 |
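One way to choose a tag from the table above is to branch on how much total GPU VRAM or unified memory you have. A minimal sketch follows; the thresholds come from the table, but the `cloud` tag name is a placeholder assumption, so check `ollama` for the actual cloud-backed tag:

```shell
avail_gb=700   # your total GPU VRAM / unified memory, in GB
if   [ "$avail_gb" -ge 1060 ]; then tag="q8_0"
elif [ "$avail_gb" -ge 600  ]; then tag="q4_K_M"
elif [ "$avail_gb" -ge 390  ]; then tag="latest"  # Q2_K
else tag="cloud"   # hypothetical name for the cloud-backed tag
fi
echo "ollama run kimi-k2.5:${tag}"
```

With 700 GB available this selects `q4_K_M`; anything under 390 GB falls through to the cloud-backed option.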
Compatible Hardware
The recommended Q2_K quantization needs roughly 390 GB of VRAM or unified memory. In practice that means multiple high-VRAM GPUs or a Mac with 400 GB+ unified memory; without that hardware, use the cloud-backed tag instead.
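The 390 GB requirement is presumably the 374 GB Q2_K weight file plus headroom for the KV cache and activations. A rough fit check, assuming about 16 GB of overhead (the overhead figure is our estimate, not a published number):

```shell
file_gb=374       # Q2_K weights on disk, from the quantization table
overhead_gb=16    # assumed headroom for KV cache + activations
avail_gb=400      # e.g. a Mac with 400 GB+ unified memory
need_gb=$((file_gb + overhead_gb))
if [ "$avail_gb" -ge "$need_gb" ]; then
  echo "fits: need ${need_gb} GB, have ${avail_gb} GB"
else
  echo "too big: need ${need_gb} GB, have ${avail_gb} GB"
fi
```

A 400 GB machine clears the 390 GB requirement with little margin, which is why long-context workloads may still want more memory than the table's minimum.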