Qwen 2.5 VL 7B

Author: Alibaba

License: Apache 2.0

Alibaba · 7B · transformer-decoder


Released 2025-01-26 · 33K context · 7B params

Use Cases

chat · vision · reasoning · multilingual

Quantization Options

Quant	Bits	VRAM	Quality
Q4_K_M (recommended)	4	7.0 GB	Good
Q8_0	8	10.5 GB	Good
F16	16	18.5 GB	Excellent
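The table can be read as a simple lookup: pick the highest-precision quantization whose VRAM figure fits your GPU. The sketch below assumes the listed VRAM values are total requirements; `pick_quant` and its optional `headroom_gb` safety margin are hypothetical and not part of the model card.

```python
from typing import Optional

# VRAM figures (GB) copied from the quantization table,
# ordered from highest to lowest precision.
QUANTS = [
    ("F16", 18.5),
    ("Q8_0", 10.5),
    ("Q4_K_M", 7.0),
]


def pick_quant(available_vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the highest-precision quant that fits, or None if none fits.

    headroom_gb is a hypothetical margin reserved for other GPU users
    (display, other processes); it is not taken from the model card.
    """
    budget = available_vram_gb - headroom_gb
    for name, vram_gb in QUANTS:
        if vram_gb <= budget:
            return name
    return None
```

For example, a 12 GB card gets `Q8_0`, a 24 GB card gets `F16`, and an 8 GB card with 1.5 GB reserved gets `None`, because 6.5 GB is below the smallest listed requirement.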

About this model

Qwen 2.5 VL 7B is Alibaba's 7-billion-parameter vision-language model. It processes images and video alongside text and supports tasks such as visual question answering, document understanding, and image description. It also offers strong multilingual capabilities and competitive performance on vision benchmarks. Its size makes it a practical choice for local multimodal applications on consumer GPUs with moderate VRAM: the Q4_K_M build needs about 7 GB.

Benchmarks

MMLU: 70.0


Install

Ollama

ollama run qwen2.5vl:7b-q4_K_M
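Once the model is pulled, it can also be queried programmatically. A minimal sketch, assuming an Ollama server on its default local address (`http://localhost:11434`) and using Ollama's REST `/api/generate` endpoint, which accepts base64-encoded images in an `images` list. The helper names `build_payload` and `describe_image` are hypothetical.

```python
import base64
import json
import urllib.request

# Default local Ollama endpoint (assumed; adjust if your server differs).
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "qwen2.5vl:7b-q4_K_M"  # same tag as the `ollama run` command


def build_payload(prompt: str, image_bytes: bytes, model: str = MODEL_TAG) -> dict:
    """Build a non-streaming generate request carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path: str, prompt: str = "Describe this image.") -> str:
    """Send an image file to the local Ollama server and return the text reply."""
    with open(path, "rb") as f:
        payload = build_payload(prompt, f.read())
    # Passing `data` makes urllib issue a POST request.
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With "stream": False the server returns one JSON object whose
        # "response" field holds the generated text.
        return json.loads(resp.read())["response"]
```

Usage: `describe_image("invoice.png", "What is the total amount?")`. Disabling streaming keeps the sketch simple at the cost of waiting for the full answer; a streaming client would read newline-delimited JSON chunks instead.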

llama.cpp / GGUF

Download the GGUF files from Hugging Face.

Specs

Parameters: 7B
Architecture: transformer-decoder
Context: 33K tokens
Min VRAM: 7.0 GB (Q4_K_M)
Recommended: 7.0 GB
Family: Qwen 2.5
Released: 2025-01-26
License: Apache 2.0