Nemotron Ultra 253B
by NVIDIA · nemotron family
parameters
Nemotron Ultra 253B is NVIDIA's most capable open-weight reasoning model, derived from Llama 3.1 405B and compressed to 253B parameters using Neural Architecture Search (NAS). It delivers state-of-the-art performance on math, coding, and complex reasoning benchmarks while fitting on a single 8xH100 node at FP8 precision. The model features a dual-mode system supporting both standard chat and explicit chain-of-thought reasoning, toggled via system prompt. It supports a 128K context window and excels at tool calling, RAG, and agentic workflows. With multilingual support for English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, it is one of the most versatile open-weight models available.
Quick Start with Ollama
ollama run 253b-q4_K_M | Creator | NVIDIA |
| Parameters | 253B |
| Architecture | transformer-decoder |
| Context | 128K tokens |
| Released | Apr 7, 2025 |
| License | NVIDIA Open Model License |
| Ollama | nemotron-ultra:253b |
Quantization Options
| Format | File Size | VRAM Required | Quality | Ollama Tag |
|---|---|---|---|---|
| Q4_K_M rec | 151 GB | 155 GB | | 253b-q4_K_M |
| Q8_0 | 269 GB | 275 GB | | 253b-q8_0 |
| F16 | 506 GB | 508 GB | | 253b-fp16 |
Compatible Hardware
Q4_K_M requires 155 GB VRAM