Llama 4 Scout (109B/17B active)

by Meta · llama-4 family

Tags: text-generation, code-generation, reasoning, multilingual, vision, math, tool-use, creative-writing, summarization

Llama 4 Scout is Meta's mixture-of-experts model: 109B total parameters, of which only 17B are active per token, routed across 16 experts. It is natively multimodal (text + images), and Meta quotes a 10M-token maximum context window, though local runtimes typically serve far less. At Q4 quantization it needs about 72 GB of memory — too large for any single consumer GPU, but it fits on Macs with 96-128 GB of unified memory or on multi-GPU setups. Despite the large memory footprint, inference is comparatively fast because only the 17B active parameters are read per token. It is currently the most capable open-weight model from Meta.
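The memory figures quoted here can be sanity-checked from the parameter count: a quantized model's file size is roughly total parameters times average bits per parameter. A minimal sketch (the ~4.8 and ~8.5 bits-per-parameter averages for Q4_K_M and Q8_0 are common approximations, not official figures):

```python
def quantized_size_gb(total_params_b: float, bits_per_param: float) -> float:
    """Approximate on-disk size of a quantized model in decimal GB."""
    # params (billions) * 1e9 * bits / 8 bytes, divided by 1e9 for GB
    return total_params_b * bits_per_param / 8

print(round(quantized_size_gb(109, 4.8)))  # ~65 GB, near the 67 GB Q4_K_M file
print(round(quantized_size_gb(109, 8.5)))  # ~116 GB, near the 117 GB Q8_0 file
```

Memory needed at runtime is higher than the file size because the KV cache and activations also live in VRAM, which is why 67 GB of weights translates to roughly 72 GB required.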

Quick Start with Ollama

ollama run llama4:scout
Resources: Ollama · Hugging Face · Official Page
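Once the model is pulled, it can also be queried programmatically over Ollama's local HTTP API. A minimal sketch, assuming Ollama is listening on its default port 11434 and the `llama4:scout` tag from this page has been pulled:

```python
import json
from urllib import request

# Request body for Ollama's /api/generate endpoint; stream=False returns
# the whole completion in a single JSON response.
payload = {
    "model": "llama4:scout",
    "prompt": "Summarize the benefits of mixture-of-experts models.",
    "stream": False,
}

def generate(payload: dict, host: str = "http://localhost:11434") -> str:
    req = request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["response"]

# print(generate(payload))  # requires a running Ollama instance
```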
Creator: Meta
Parameters: 109B
Architecture: mixture-of-experts
Context: 512K tokens
Released: Apr 5, 2025
License: Llama 4 Community License
Ollama: llama4:scout

Quantization Options

Format                 File Size  VRAM Required  Ollama Tag
Q4_K_M (recommended)   67 GB      72 GB          scout-q4_K_M
Q8_0                   117 GB     125 GB         scout-q8_0

Compatible Hardware

for Q4_K_M (72 GB VRAM)

Hardware                    VRAM     Type  Fit          Est. Speed
Mac Studio M4 Ultra 512GB   512 GB   mac   Runs         ~11 tok/s
Mac Pro M2 Ultra 192GB      192 GB   mac   Runs         ~11 tok/s
Mac Studio M4 Ultra 192GB   192 GB   mac   Runs         ~11 tok/s
Mac Studio M4 Max 128GB     128 GB   mac   Runs         ~8 tok/s
MacBook Pro M4 Max 128GB    128 GB   mac   Runs         ~8 tok/s
MacBook Pro M3 Max 96GB     96 GB    mac   Runs         ~6 tok/s
Mac mini M4 Pro 64GB        64 GB    mac   CPU Offload  ~4 tok/s
Mac Studio M4 Max 64GB      64 GB    mac   CPU Offload  ~8 tok/s
MacBook Pro M4 Max 64GB     64 GB    mac   CPU Offload  ~8 tok/s
Mac mini M4 Pro 48GB        48 GB    mac   CPU Offload  ~4 tok/s
MacBook Pro M3 Max 48GB     48 GB    mac   CPU Offload  ~6 tok/s
MacBook Pro M4 Max 48GB     48 GB    mac   CPU Offload  ~8 tok/s
MacBook Pro M4 Pro 48GB     48 GB    mac   CPU Offload  ~4 tok/s
52 hardware device(s) cannot run this model configuration.
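The Mac speeds above are bounded by memory bandwidth: each decoded token requires reading all active weights, so bandwidth divided by active-weight bytes gives a hard ceiling on decode speed. A back-of-envelope sketch (546 GB/s is Apple's published M4 Max bandwidth; ~4.8 bits/param for Q4_K_M is an estimate):

```python
def decode_ceiling_tok_s(active_params_b: float,
                         bits_per_param: float,
                         bandwidth_gb_s: float) -> float:
    """Bandwidth-bound upper limit on decode speed (tokens/second)."""
    # Bytes of weight data that must be streamed per generated token.
    bytes_per_token = active_params_b * 1e9 * bits_per_param / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

print(round(decode_ceiling_tok_s(17, 4.8, 546)))  # ~54 tok/s ceiling
```

Observed throughput (~8 tok/s on an M4 Max) sits well below this ceiling because of KV-cache traffic, expert-routing overhead, and compute-bound prompt processing; the value is useful only for comparing devices, not predicting real speeds.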

Benchmark Scores

MMLU: 80.0