Llama 3.1 8B

by Meta · llama-3 family

8B parameters

Tags: text-generation, code-generation, multilingual, tool-use, summarization

Llama 3.1 8B is Meta's versatile mid-range model offering strong performance across text generation, coding, and multilingual tasks. It supports the full 128K context window and includes native tool-use capabilities. This model strikes an excellent balance between capability and resource requirements, running well on consumer GPUs with 8-16 GB of VRAM. It is one of the most popular open-source models for local inference and serves as a strong baseline for many use cases.
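The native tool-use support mentioned above is exposed through Ollama's local /api/chat endpoint, which accepts OpenAI-style function schemas for tool-capable models like Llama 3.1. A minimal sketch of building such a request (the get_weather tool is a hypothetical example, not part of Ollama):

```python
import json

# Hypothetical example tool, described in the OpenAI-style function schema
# that Ollama's /api/chat endpoint accepts.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_chat_request(user_message: str) -> dict:
    """Build a /api/chat request body that offers the model one callable tool."""
    return {
        "model": "llama3.1",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [WEATHER_TOOL],
        "stream": False,  # one complete JSON response instead of a token stream
    }

# POST this body to http://localhost:11434/api/chat; if the model decides to
# call the tool, the call appears under message.tool_calls in the response.
print(json.dumps(build_chat_request("What's the weather in Paris?"), indent=2))
```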

Quick Start with Ollama

ollama run llama3.1:8b-instruct-q8_0

Creator: Meta
Parameters: 8B
Architecture: transformer-decoder
Context Length: 128K tokens
License: Llama 3.1 Community License
Released: Jul 23, 2024
Ollama: llama3.1
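Beyond the CLI quick start, the same model can be queried programmatically through Ollama's local HTTP API. A minimal sketch using only the standard library, assuming an Ollama server is running on its default port (11434) with the q8_0 tag pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(prompt: str, model: str = "llama3.1:8b-instruct-q8_0") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one complete response instead of a token stream
    }

def generate(prompt: str) -> str:
    """Send the prompt to a locally running Ollama server; return the response text."""
    body = json.dumps(build_generate_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running server):
#   generate("Summarize the Llama 3.1 Community License in one sentence.")
```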

Quantization Options

Format             | File Size | VRAM Required | Ollama Tag
Q4_K_M             | 4.1 GB    | 6.3 GB        | 8b-instruct-q4_K_M
Q8_0 (recommended) | 7.2 GB    | 10 GB         | 8b-instruct-q8_0
F16                | 15.2 GB   | 18 GB         | 8b-instruct-fp16
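File sizes track the effective bits per weight of each format: roughly parameters × bits ÷ 8 bytes, plus quantization metadata. A back-of-the-envelope sketch (the bits-per-weight values are assumptions; K-quants mix block types and quoted sizes may use binary gigabytes, so real GGUF files deviate from these estimates by several percent):

```python
def approx_file_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: parameters x bits per weight, converted to gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumed effective bits per weight for common GGUF formats.
BITS = {"Q4_K_M": 4.8, "Q8_0": 8.5, "F16": 16.0}

# Ballpark sizes for an 8B-parameter model.
for fmt, bits in BITS.items():
    print(f"{fmt}: ~{approx_file_size_gb(8.0e9, bits):.1f} GB")
```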

Compatible Hardware

Any GPU with at least 10 GB of VRAM can run the recommended quantization (Q8_0).

Benchmark Scores

MMLU: 73.0