Llama 3.2 Vision 90B

Name: Llama 3.2 Vision 90B
Author: Meta

Llama 3.2 Community License

Meta · 90B · transformer-decoder

🤗 HuggingFace Ollama Official

2024-09-25 131K context 90B params

Use Cases

chat vision reasoning multilingual summary writing

Quantization Options

Quant	Bits	VRAM	Quality	Status
Q4_K_Mrec	4	50.0 GB	Good	—
Q8_0	8	96.0 GB	Good	—

About this model

Llama 3.2 Vision 90B is Meta's largest multimodal model, combining powerful text generation with advanced image understanding capabilities. It delivers state-of-the-art performance on visual reasoning, document analysis, chart understanding, and image captioning tasks. With 90 billion parameters and a 128K context window, this model represents the top tier of Meta's vision-language offerings, providing significantly stronger visual comprehension and reasoning compared to the smaller 11B variant.

Benchmarks

86.0

mmlu

Your Hardware

DevicePick…

VRAM—

Bandwidth—

Detecting…

Install

Ollama

ollama run llama3.2-vision:90b-q4_K_M

llama.cpp / GGUF

Download GGUF from HuggingFace

Specs

Parameters: 90B
Architecture: transformer-decoder
Context: 131K tokens
Min VRAM: 50.0 GB
Recommended: 50.0 GB
Family: Llama 3
Released: 2024-09-25
License: Llama 3.2 Community License