Llama 3.2 Vision 90B
Llama 3.2 Community License · Meta · 90B · transformer-decoder
2024-09-25 131K context
90B params
Use Cases
chat, vision, reasoning, multilingual, summary, writing
About this model
Llama 3.2 Vision 90B is the largest multimodal model in Meta's Llama 3.2 family. It accepts images and text as input and generates text. It targets visual reasoning, document analysis, chart understanding, and image captioning, and delivers strong results on these tasks.
With 90 billion parameters and a 128K-token context window (131,072 tokens), this model is the top tier of Meta's vision-language offerings and provides substantially stronger visual comprehension and reasoning than the smaller 11B variant.
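To give a rough sense of what 90 billion parameters means for deployment, the sketch below estimates how much memory the weights alone occupy at a few numeric precisions. It is a back-of-the-envelope calculation under stated assumptions: it uses the nominal 90 × 10^9 parameter count from the model name (not an exact count), ignores the KV cache, activations, and runtime overhead, and the format names and bit widths (`bf16`, `int8`, `int4`) are illustrative choices, not a list of officially published quantizations.

```python
# Back-of-the-envelope weight-memory estimate for a nominal 90B-parameter model.
# Assumptions: only the weights are counted; KV cache, activations, and
# framework/runtime overhead are ignored. The parameter count is the nominal
# figure from the model name, not an exact count.

NOMINAL_PARAMS = 90e9

# Bits per parameter for a few common storage formats (illustrative choices).
BITS_PER_PARAM = {
    "bf16": 16,
    "int8": 8,
    "int4": 4,
}


def weight_memory_gb(num_params: float, bits: int) -> float:
    """Return weight storage in gigabytes (10**9 bytes) for the given precision."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    for fmt, bits in BITS_PER_PARAM.items():
        print(f"{fmt}: {weight_memory_gb(NOMINAL_PARAMS, bits):.0f} GB")
```

Running it prints roughly 180 GB for bf16, 90 GB for int8, and 45 GB for int4. Real quantized checkpoints are usually somewhat larger than these figures, because most low-bit schemes also store per-group scale factors and often keep some layers at higher precision.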
Benchmarks
MMLU: 86.0