Mistral Nemo 12B

Name: Mistral Nemo 12B
Author: Mistral AI

Apache 2.0

Mistral AI · 12B · transformer-decoder

2024-07-18 · 131K context · 12B params

Use Cases

chat · code · reasoning · multilingual · tools · summary

Quantization Options

Quant	Bits	VRAM	Quality	Status
Q4_K_M (recommended)	4	9.5 GB	Good	—
Q8_0	8	16.0 GB	Good	—
F16	16	28.0 GB	Excellent	—
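
As a rough cross-check on the table above, the weight-only footprint can be estimated from the parameter count and the nominal bits per weight. This is a minimal sketch, not the method used to produce the table: the function name `weight_memory_gb` is hypothetical, it treats 12B as exactly 12 billion parameters, and it ignores the KV cache and runtime overhead. Real GGUF quantizations also tend to average somewhat more than their nominal bit-width, so the results are a lower bound and come out below the listed VRAM figures.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weight-only footprint in GB (10^9 bytes); excludes KV cache and overhead."""
    # params_billions * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billions * bits_per_weight / 8


# Nominal bit-widths taken from the Bits column of the table.
for name, bits in [("Q4_K_M", 4), ("Q8_0", 8), ("F16", 16)]:
    print(f"{name}: ~{weight_memory_gb(12, bits):.1f} GB for weights alone")
```

The difference between these estimates and the table's VRAM column is presumably context memory and runtime overhead, which grow with the context length actually used.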

About this model

Mistral Nemo 12B was built jointly by Mistral AI and NVIDIA. It has a 128K-token context window (131,072 tokens, listed as 131K elsewhere on this page) and uses the Tekken tokenizer, which is more efficient across languages than the tokenizers of earlier Mistral models. With more than 3.4M Ollama pulls, it is one of the most popular models at its size. The Q4_K_M quantization needs about 9.5 GB of VRAM, so it fits on a 12 GB GPU with room to spare, making it a strong contender alongside Gemma 3 12B. It is strong at function calling, multilingual tasks, and general instruction following.
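
The fit-on-a-12-GB-GPU claim can be turned into a small lookup. The sketch below is illustrative only: the VRAM figures are copied from the Quantization Options table, while `pick_quant` and its `headroom_gb` parameter are hypothetical names introduced here, with headroom standing in for whatever memory you want to keep free for other processes.

```python
from typing import Optional

# VRAM requirements from the Quantization Options table, highest precision first.
QUANT_VRAM_GB = {"F16": 28.0, "Q8_0": 16.0, "Q4_K_M": 9.5}


def pick_quant(vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the highest-precision quant that fits, or None if none do."""
    for name, needed_gb in QUANT_VRAM_GB.items():
        if needed_gb + headroom_gb <= vram_gb:
            return name
    return None


for gpu_gb in (8, 12, 24, 32):
    print(f"{gpu_gb} GB GPU -> {pick_quant(gpu_gb)}")
```

On a 12 GB card this selects Q4_K_M, matching the recommendation in the table; a card with less than 9.5 GB gets no match.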

Benchmarks

MMLU: 68.0

Install

Ollama

ollama run mistral-nemo:12b-instruct-q4_K_M
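
Once the model has been pulled, it can also be queried programmatically through Ollama's local HTTP API. The following is a minimal sketch using only the Python standard library. It assumes the Ollama server is running on its default address (`http://localhost:11434`) and reuses the model tag from the command above; `build_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL_TAG = "mistral-nemo:12b-instruct-q4_K_M"      # tag copied from the command above


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama server with the model pulled):
# print(generate("Summarize the Apache 2.0 license in one sentence."))
```

Setting `"stream": False` asks the server for a single JSON object rather than a stream of partial responses, which keeps the parsing to one `json.loads` call.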

llama.cpp / GGUF

Download the GGUF file from Hugging Face

Specs

Parameters: 12B
Architecture: transformer-decoder
Context: 131K tokens
Min VRAM: 9.5 GB
Recommended: 9.5 GB
Family: Mistral
Released: 2024-07-18
License: Apache 2.0