Llama 3.1 8B
by Meta · llama-3 family
8B parameters
Tags: text-generation · code-generation · multilingual · tool-use · summarization
Llama 3.1 8B is Meta's versatile mid-range model offering strong performance across text generation, coding, and multilingual tasks. It supports the full 128K context window and includes native tool-use capabilities. This model strikes an excellent balance between capability and resource requirements, running well on consumer GPUs with 8-16GB VRAM. It is one of the most popular open-source models for local inference and serves as a strong baseline for many use cases.
Quick Start with Ollama
```shell
ollama run llama3.1:8b-instruct-q8_0
```

Model Details

| | |
|---|---|
| Creator | Meta |
| Parameters | 8B |
| Architecture | transformer-decoder |
| Context Length | 128K tokens |
| License | Llama 3.1 Community License |
| Released | Jul 23, 2024 |
| Ollama | llama3.1 |
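Note that Ollama serves models with a much smaller default context window than the 128K the model supports. One way to raise it is a custom Modelfile; a minimal sketch (the model name `llama31-32k` and the 32K value are illustrative choices, not defaults from this page):

```
# Modelfile: extend the serving context window (example value: 32K tokens).
# Larger num_ctx values increase KV-cache memory use accordingly.
FROM llama3.1:8b-instruct-q8_0
PARAMETER num_ctx 32768
```

Build it with `ollama create llama31-32k -f Modelfile`, then run `ollama run llama31-32k` as usual.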
Quantization Options
| Format | File Size | VRAM Required | Quality | Ollama Tag |
|---|---|---|---|---|
| Q4_K_M | 4.1 GB | 6.3 GB | ★★★★★ | 8b-instruct-q4_K_M |
| Q8_0 (recommended) | 7.2 GB | 10 GB | ★★★★★ | 8b-instruct-q8_0 |
| F16 | 15.2 GB | 18 GB | ★★★★★ | 8b-instruct-fp16 |
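The file sizes above follow roughly from parameter count × bits per weight. A back-of-the-envelope sketch (the bits-per-weight figures are approximate averages, not exact values for these files, so estimates will not match the table precisely):

```python
# Rough GGUF file-size estimate: parameters * bits-per-weight / 8 bytes.
# Bits-per-weight values are approximate averages per format (assumptions,
# since mixed-precision quants vary by tensor type and block overhead).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,   # approximate average for this mixed 4/6-bit scheme
    "Q8_0": 8.5,     # 8 bits plus per-block scale overhead (approximate)
    "F16": 16.0,     # exact: two bytes per weight
}

def est_file_size_gb(params_billions: float, fmt: str) -> float:
    """Estimated on-disk size in GB for a model quantized with `fmt`."""
    total_bits = params_billions * 1e9 * BITS_PER_WEIGHT[fmt]
    return total_bits / 8 / 1e9

for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{est_file_size_gb(8.0, fmt):.1f} GB")
```

Actual VRAM needs run a few GB higher than file size, since the KV cache and activation buffers also live on the GPU.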
Compatible Hardware for Q8_0
Showing compatibility for the recommended quantization (Q8_0, 10 GB VRAM).
Benchmark Scores

| Benchmark | Score |
|---|---|
| MMLU | 73.0 |
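The native tool-use capability mentioned above is exposed through Ollama's `/api/chat` endpoint, which accepts an OpenAI-style `tools` list. A sketch of the request payload (the `get_weather` function is a hypothetical example, not part of the API):

```python
import json

# Example /api/chat payload with a tool definition in the JSON-schema
# style Ollama accepts. The get_weather tool is hypothetical.
payload = {
    "model": "llama3.1",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

print(json.dumps(payload, indent=2))
```

POST this to `http://localhost:11434/api/chat` with a running Ollama server; when the model decides to call the tool, the response message carries a `tool_calls` field instead of plain text.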