
Llama 3.2 Vision 90B

Llama 3.2 Community License

Meta · 90B · transformer-decoder

2024-09-25 · 131K context · 90B params

Use Cases

chat, vision, reasoning, multilingual, summary, writing

Quantization Options

Quant                 Bits  VRAM     Quality  Status
Q4_K_M (recommended)  4     50.0 GB  Good
Q8_0                  8     96.0 GB  Good
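
As a rough sanity check on the VRAM column, the memory needed for the weights alone can be estimated as parameters × bits ÷ 8. The sketch below is illustrative, not how this page derives its figures: the function names are hypothetical, and the calculation assumes exactly 90 billion parameters at the nominal bit widths listed. Real quantized files store extra scaling data, and inference also needs room for the KV cache and activations, so the result is a lower bound.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Lower-bound memory for the weights alone, in decimal gigabytes."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Rows copied from the table above: (quant name, nominal bits, listed VRAM in GB).
LISTED = [("Q4_K_M", 4, 50.0), ("Q8_0", 8, 96.0)]


def headroom_report(params_billions: float = 90.0):
    """Return (name, weight-only GB, listed VRAM minus weight-only GB) per row."""
    rows = []
    for name, bits, listed_gb in LISTED:
        weights_gb = weight_memory_gb(params_billions, bits)
        rows.append((name, weights_gb, listed_gb - weights_gb))
    return rows


if __name__ == "__main__":
    for name, weights_gb, extra_gb in headroom_report():
        print(f"{name}: weights >= {weights_gb:.1f} GB, "
              f"listed VRAM leaves {extra_gb:.1f} GB beyond that")
```

Under these assumptions the weight-only estimates come out at 45 GB and 90 GB, a few gigabytes below the listed 50.0 GB and 96.0 GB. That gap is consistent with quantization overhead plus runtime buffers, though the page does not state how its numbers were calculated.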

About this model

Llama 3.2 Vision 90B is the largest model in Meta's Llama 3.2 Vision family, combining text generation with image understanding. It targets visual reasoning, document analysis, chart understanding, and image captioning. With 90 billion parameters and a 128K-token context window (131,072 tokens, listed as 131K above), it sits at the top of the Llama 3.2 vision lineup and offers stronger visual comprehension and reasoning than the smaller 11B variant.

Benchmarks

MMLU: 86.0