Qwen 2.5 32B
by Alibaba · qwen-2.5 family
32B parameters
Tags: text-generation · code-generation · reasoning · multilingual · tool-use · math · creative-writing · summarization
Qwen 2.5 32B is a powerful model from Alibaba that delivers excellent performance across reasoning, coding, and multilingual tasks. With 32 billion parameters, it occupies a sweet spot between efficiency and capability that makes it highly popular for local deployment. The model supports 128K context, tool use, and structured output generation. At Q4 quantization, it fits on a single high-end consumer GPU, offering near-70B-class performance at a fraction of the resource cost.
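For example, the structured-output support can be exercised through Ollama's JSON mode. The snippet below is a minimal sketch, not an official example: it assumes the model has already been pulled and is being served by a local Ollama instance on the default port (see the quick start below), and the prompt and expected keys are purely illustrative.

```python
import json
import requests

# Ask for JSON-only output; Ollama's `"format": "json"` constrains generation
# so the returned text parses cleanly. Endpoint and payload follow Ollama's
# REST API; the prompt and expected keys are illustrative, not prescribed.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:32b",
        "prompt": (
            "Return a JSON object with a key 'strengths' listing three "
            "strengths of running a 32B model locally."
        ),
        "format": "json",
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
structured = json.loads(resp.json()["response"])  # generated text is itself JSON
print(structured)
```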
Quick Start with Ollama
ollama run qwen2.5:32b-instruct-q4_K_M

| Creator | Alibaba |
|---|---|
| Parameters | 32B |
| Architecture | transformer-decoder |
| Context Length | 128K tokens |
| License | Apache 2.0 |
| Released | Sep 19, 2024 |
| Ollama | qwen2.5:32b |
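Beyond the CLI, the same model can be queried programmatically over Ollama's local REST API. The sketch below assumes the server is running on its default port and that the recommended Q4_K_M tag from the table below has already been pulled; otherwise it mirrors the quick-start command above.

```python
import requests

# Chat request against a local Ollama server (default port 11434), using the
# recommended Q4_K_M tag. Non-streaming, so the full reply arrives in one JSON body.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5:32b-instruct-q4_K_M",
        "messages": [
            {
                "role": "user",
                "content": "Summarize the trade-offs of running a 32B model locally.",
            }
        ],
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```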
Quantization Options
| Format | File Size | VRAM Required | Quality | Ollama Tag |
|---|---|---|---|---|
| Q4_K_M (recommended) | 16 GB | 20.7 GB | ★★★★★ | 32b-instruct-q4_K_M |
| Q5_K_M | 18.7 GB | 23.9 GB | ★★★★★ | 32b-instruct-q5_K_M |
| Q8_0 | 28.8 GB | 34 GB | ★★★★★ | 32b-instruct-q8_0 |
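When scripting which tag to pull, the VRAM figures from the table can be encoded directly. The helper below is a small, self-contained sketch (the numbers are copied from the table above; the function name and VRAM-budget argument are illustrative) that picks the largest quantization fitting a given VRAM budget.

```python
from typing import Optional

# Estimated VRAM requirements in GB per Ollama tag, taken from the table above.
QUANT_VRAM_GB = {
    "32b-instruct-q4_K_M": 20.7,
    "32b-instruct-q5_K_M": 23.9,
    "32b-instruct-q8_0": 34.0,
}

def pick_quant(vram_budget_gb: float) -> Optional[str]:
    """Return the most demanding (highest-quality) tag that fits the budget, or None."""
    fitting = [(vram, tag) for tag, vram in QUANT_VRAM_GB.items() if vram <= vram_budget_gb]
    return max(fitting)[1] if fitting else None

# A 24 GB card (e.g. an RTX 4090) fits Q5_K_M but not Q8_0.
print(pick_quant(24.0))  # -> 32b-instruct-q5_K_M
```

Prefix the chosen tag with the repository name when pulling, e.g. `ollama pull qwen2.5:32b-instruct-q5_K_M`.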
Compatible Hardware for Q4_K_M
Showing compatibility for the recommended quantization (Q4_K_M, 20.7 GB VRAM).
9 hardware device(s) cannot run this model configuration.
Benchmark Scores
| Benchmark | Score |
|---|---|
| MMLU | 83.3 |