Llama 3.2 3B

by Meta · llama-3 family

3B parameters

text-generation · code-generation · multilingual · summarization

Llama 3.2 3B is a lightweight model from Meta designed for edge deployment and on-device inference. Despite its small size, it performs well on text generation, summarization, and basic coding tasks, making it a good fit for users with limited hardware who still want a capable assistant. It runs comfortably on most modern laptops and even some mobile devices, one of the most accessible models in the Llama family.

Quick Start with Ollama

ollama run llama3.2:3b-instruct-q8_0
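Beyond the interactive CLI, a locally running Ollama instance also exposes an HTTP API on port 11434. A minimal sketch of calling it from Python, assuming the default endpoint and the `llama3.2` tag listed below (the `build_request` helper is illustrative, not part of any library):

```python
import json
import urllib.request

# Default Ollama endpoint; adjust if your server runs elsewhere
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> bytes:
    # stream=False asks for a single JSON object instead of
    # newline-delimited streaming chunks
    payload = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(payload).encode("utf-8")

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Requires a local Ollama server with the model pulled
    print(generate("llama3.2", "Summarize what a 3B-parameter model is good for."))
```

The same request shape works for any quantization tag from the table below, e.g. `llama3.2:3b-instruct-q4_K_M`.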
Creator: Meta
Parameters: 3B
Architecture: transformer-decoder
Context Length: 128K tokens
License: Llama 3.2 Community License
Released: Sep 25, 2024
Ollama: llama3.2

Quantization Options

Format               File Size  VRAM Required  Ollama Tag
Q4_K_M               1.8 GB     3.3 GB         3b-instruct-q4_K_M
Q8_0 (recommended)   2.7 GB     5 GB           3b-instruct-q8_0
F16                  5.7 GB     8 GB           3b-instruct-fp16
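As a rough rule of thumb, weight storage scales linearly with bits per weight: bits/8 gives about one gigabyte per billion parameters. A back-of-envelope sketch (the function name is illustrative; actual packaged file sizes, like those in the table, differ somewhat because of per-block quantization scales, layers left unquantized, and file metadata):

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal GB.

    Runtime VRAM use is higher than this: the KV cache and
    activations add overhead on top of the stored weights.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 3B parameters at 8 bits/weight -> ~3 GB of weights;
# at 4 bits/weight the same model needs roughly half that.
```

This is why the F16 file (16 bits/weight) is roughly twice the size of Q8_0, and why Q4_K_M fits in markedly less VRAM.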

Compatible Hardware for Q8_0

Showing compatibility for the recommended quantization (Q8_0, 5 GB VRAM).

Benchmark Scores

MMLU: 63.4