Qwen 2.5 VL 7B
by Alibaba · qwen-2.5 family
7B parameters
Tags: text-generation, vision, reasoning, multilingual
Qwen 2.5 VL 7B is Alibaba's vision-language model with 7 billion parameters. It processes images and video alongside text and supports tasks such as visual question answering, document understanding, and image description. The model has strong multilingual capabilities and competitive performance on vision benchmarks, and its quantized builds fit on consumer GPUs with moderate VRAM, which makes it a practical choice for local multimodal applications.
Quick Start with Ollama
ollama run qwen2.5vl:7b-q4_K_M

| Property | Value |
|---|---|
| Creator | Alibaba |
| Parameters | 7B |
| Architecture | transformer-decoder |
| Context | 32K tokens |
| Released | Jan 26, 2025 |
| License | Apache 2.0 |
| Ollama | qwen2.5vl:7b |
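
Beyond the interactive `ollama run` command, the model can be called from a script. The following is a minimal sketch, assuming an Ollama server is running locally on its default port (11434) and that the model has already been pulled under the name `qwen2.5vl:7b`. It sends one image plus a text prompt to Ollama's `/api/generate` endpoint. The function names `build_payload` and `describe_image` are illustrative helpers, not part of any library.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5vl:7b"  # Ollama name listed in the table above


def build_payload(prompt: str, image_bytes: bytes, model: str = MODEL) -> dict:
    """Build a non-streaming /api/generate request body with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path: str, prompt: str = "Describe this image.") -> str:
    """Send the image at `path` to the local Ollama server and return its reply."""
    with open(path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `print(describe_image("photo.jpg"))` would print the model's description, where `photo.jpg` is a placeholder for any local image file.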
Quantization Options
| Format | File Size | VRAM Required | Ollama Tag |
|---|---|---|---|
| Q4_K_M (recommended) | 4.7 GB | 7 GB | 7b-q4_K_M |
| Q8_0 | 8 GB | 10.5 GB | 7b-q8_0 |
| F16 | 15 GB | 18.5 GB | 7b-fp16 |
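
One way to read the table above is as a lookup: given the VRAM available on a GPU, choose the highest-precision format that still fits. The sketch below hard-codes the figures from the table and assumes that a larger VRAM requirement corresponds to higher precision. `pick_quantization` is a hypothetical helper name, and the figures should be treated as approximate, since actual usage also depends on context length and image size.

```python
from typing import Optional

# (format, file size in GB, VRAM required in GB), copied from the table above.
QUANTIZATIONS = [
    ("Q4_K_M", 4.7, 7.0),
    ("Q8_0", 8.0, 10.5),
    ("F16", 15.0, 18.5),
]


def pick_quantization(available_vram_gb: float) -> Optional[str]:
    """Return the highest-precision format whose VRAM requirement fits, or None."""
    fitting = [q for q in QUANTIZATIONS if q[2] <= available_vram_gb]
    if not fitting:
        return None
    # Assumption: the largest VRAM requirement marks the highest precision.
    return max(fitting, key=lambda q: q[2])[0]
```

For example, `pick_quantization(12)` selects `Q8_0`, while a 6 GB card returns `None` because even Q4_K_M needs 7 GB.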
Compatible Hardware
The recommended Q4_K_M quantization requires 7 GB of VRAM.
Benchmark Scores
| Benchmark | Score |
|---|---|
| MMLU | 70.0 |