Llama 3.2 Vision 11B

Name: Llama 3.2 Vision 11B
Author: Meta

Llama 3.2 Community License

Meta · 11B · transformer-decoder

🤗 HuggingFace Ollama Official

2024-09-25 131K context 11B params

Use Cases

chat vision reasoning multilingual summary

Quantization Options

Quant	Bits	VRAM	Quality	Status
Q4_K_Mrec	4	8.5 GB	Good	—
Q8_0	8	14.0 GB	Good	—
F16	16	26.0 GB	Excellent	—

About this model

Llama 3.2 Vision 11B is Meta's multimodal model capable of understanding both text and images. It can perform visual reasoning, image captioning, document understanding, and visual question answering alongside standard text generation tasks. Built on the Llama 3.2 architecture with a 128K context window, this model brings vision capabilities to a relatively compact size, making it accessible for local deployment on consumer hardware with sufficient VRAM.

Benchmarks

73.0

mmlu

Your Hardware

DevicePick…

VRAM—

Bandwidth—

Detecting…

Install

Ollama

ollama run llama3.2-vision:11b-q4_K_M

llama.cpp / GGUF

Download GGUF from HuggingFace

Specs

Parameters: 11B
Architecture: transformer-decoder
Context: 131K tokens
Min VRAM: 8.5 GB
Recommended: 8.5 GB
Family: Llama 3
Released: 2024-09-25
License: Llama 3.2 Community License