5 New Models You Should Try on Ollama Right Now (May 2026)
GLM-5.1, Kimi K2.5, Nemotron Ultra, Granite 3.3, and SmolLM2 are the latest additions to Ollama. Here's what each one does, how much VRAM they need, and which hardware can run them.
The Ollama library hit 200+ models in April 2026, and the pace of new releases shows no signs of slowing down. Five models dropped in the last few months that are worth your attention, whether you have 4 GB of VRAM or 96 GB.
GLM-5.1 — The New Open-Source Frontier
Zhipu AI’s GLM-5.1 is a 744B-parameter Mixture-of-Experts model with 40B active parameters per token. It was released in April 2026 and immediately became one of the most discussed models in the local AI community.
Why it matters: GLM-5.1 competes with frontier closed-source models on coding and agentic tasks, supports a 200K+ context window, and is fully open-source. The MoE architecture means it’s far more efficient than the parameter count suggests.
As with its predecessor GLM-5, all expert weights must be loaded into memory even though only 40B activate per token. That means the real VRAM requirements are much higher than the active parameter count suggests:
VRAM requirements (estimated):
| Quantization | VRAM Required |
|---|---|
| Q2_K | ~305 GB |
| Q4_K_M | ~450 GB |
| Q8_0 | ~820 GB |
This is enterprise hardware territory: multi-GPU setups, or a Mac Studio M4 Ultra with 192 GB of unified memory using aggressive quantization and offloading. Ollama also offers a cloud-backed tag for those without the local resources.
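Where do numbers like these come from? A simple rule of thumb: total parameters times average bits per weight, divided by eight. A rough sketch below; the bits-per-weight averages are back-of-envelope approximations (K-quants and Q8_0 carry per-block scale data, so they land above their nominal bit widths), not values taken from Ollama itself:

```python
# Approximate average bits per weight for common GGUF quant levels.
# These are rough estimates (assumption), not official figures.
BITS_PER_WEIGHT = {"Q2_K": 3.3, "Q4_K_M": 4.85, "Q8_0": 8.8, "F16": 16.0}

def estimate_vram_gb(total_params_b: float, quant: str) -> float:
    """Estimate weight memory in GB for a model with total_params_b
    billion parameters. For MoE models, use TOTAL parameters, not
    active ones: every expert must be resident in memory even though
    only a few fire per token. KV cache and runtime buffers add more
    on top of this figure.
    """
    return total_params_b * BITS_PER_WEIGHT[quant] / 8

# GLM-5.1: 744B total parameters
for quant in ("Q2_K", "Q4_K_M", "Q8_0"):
    print(f"{quant}: ~{estimate_vram_gb(744, quant):.0f} GB")
```

The same function explains the other giants in this list: plug in 1000 (billion) for Kimi K2.5 or 253 for Nemotron Ultra and you get figures in the same ballpark as the tables below.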
```
ollama run glm-5.1
```
Kimi K2.5 — The Agent Specialist
Moonshot AI’s Kimi K2.5 is a 1-trillion-parameter MoE model built specifically for long-horizon agentic tasks. It’s the largest open-weight model currently available on Ollama and has been generating massive buzz on r/LocalLLaMA.
The catch: at 1T total parameters, even aggressive 2-bit quantization leaves a 374 GB model file. Unlike smaller MoE models, all 384 experts must be loaded into memory. This is firmly enterprise or multi-GPU territory, though Ollama offers a cloud-backed tag for those without the local resources.
```
ollama run kimi-k2.5
```
Nemotron Ultra 253B — NVIDIA’s Reasoning Powerhouse
NVIDIA’s Nemotron Ultra is a 253B-parameter dense model focused on reasoning, math, and code generation. It builds on the Llama architecture with NVIDIA’s training infrastructure and datasets, and delivers strong benchmark scores that rival much larger models.
VRAM requirements (estimated):
| Quantization | VRAM Required |
|---|---|
| Q4_K_M | ~128 GB |
| Q8_0 | ~253 GB |
This is firmly in Mac Studio Ultra or multi-GPU territory. A Mac Studio M4 Ultra with 192 GB can run it at Q4 with room for context.
```
ollama run nemotron-ultra
```
Granite 3.3 8B — Enterprise-Grade, Consumer-Friendly
IBM’s Granite 3.3 is a refreshing counterpoint to the “bigger is better” trend. At 8B parameters, it runs on virtually any modern GPU and focuses on enterprise use cases: structured data extraction, tool use, RAG pipelines, and multilingual support.
VRAM requirements (estimated):
| Quantization | VRAM Required |
|---|---|
| Q4_K_M | ~6 GB |
| Q8_0 | ~10 GB |
| F16 | ~18 GB |
An RTX 3060 12 GB runs Granite 3.3 at Q8 with room to spare. Even an 8 GB card handles Q4 fine.
```
ollama run granite3.3
```
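Structured data extraction is where a small enterprise-tuned model like this earns its keep. Here's a minimal sketch against Ollama's local REST API, using the documented `format: "json"` option on `/api/generate` to constrain the model to valid JSON. The endpoint and payload shape follow Ollama's API; the prompt wording and a server running on the default port 11434 are assumptions:

```python
import json
import urllib.request

def build_request(text: str) -> urllib.request.Request:
    """Build a request asking Granite to extract fields as JSON."""
    payload = {
        "model": "granite3.3",
        "prompt": (
            "Extract the company name and date from this text as JSON "
            f'with keys "name" and "date": {text}'
        ),
        "format": "json",  # ask Ollama to emit valid JSON only
        "stream": False,
    }
    return urllib.request.Request(
        "http://localhost:11434/api/generate",  # default local Ollama port
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def extract(text: str) -> dict:
    """Send the request and parse the model's JSON answer."""
    with urllib.request.urlopen(build_request(text)) as resp:
        body = json.loads(resp.read())
    return json.loads(body["response"])

# Example (requires a running Ollama server with granite3.3 pulled):
# extract("Invoice from Acme Corp, dated 2026-05-01.")
```

The same pattern works with any model in this list; only the `"model"` field changes.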
SmolLM2 1.7B — Tiny but Capable
Hugging Face’s SmolLM2 is the opposite end of the spectrum. At 1.7B parameters, it needs under 2 GB of VRAM at Q4 and runs on literally anything, including a Raspberry Pi. It’s designed for edge deployment, mobile apps, and situations where you need a local model but don’t have a GPU.
VRAM requirements (estimated):
| Quantization | VRAM Required |
|---|---|
| Q4_K_M | ~1.9 GB |
| Q8_0 | ~2.7 GB |
| F16 | ~4.4 GB |
Every single GPU and Mac in our database can run SmolLM2. If you’ve been wanting to try local AI but thought your hardware wasn’t good enough, this is your entry point.
```
ollama run smollm2
```
Which One Should You Try?
It depends on your hardware and what you need:
- Best on 8 GB VRAM: Granite 3.3 8B (Q4) or SmolLM2 1.7B (Q8)
- Best on 24 GB VRAM: Granite 3.3 8B at Q8 for maximum quality at this tier
- Best on 128+ GB Mac: Nemotron Ultra for reasoning tasks
- Enterprise / multi-GPU: GLM-5.1 or Kimi K2.5 for frontier-level performance
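The guidance above can be sketched as a simple picker. The VRAM thresholds are our reading of the tiers in this list, not an official Ollama mapping, and the 8 GB tier simplifies to the Granite pick:

```python
def pick_model(vram_gb: int, multi_gpu: bool = False) -> str:
    """Map available VRAM (GB) to the recommendations above.
    Thresholds are assumptions based on the tiers in this article."""
    if multi_gpu or vram_gb >= 256:
        return "glm-5.1 or kimi-k2.5"  # frontier-level, enterprise hardware
    if vram_gb >= 128:
        return "nemotron-ultra"        # Q4 fits on a 192 GB Mac Studio
    if vram_gb >= 24:
        return "granite3.3 (Q8)"       # maximum quality at this tier
    if vram_gb >= 8:
        return "granite3.3 (Q4)"       # comfortable on an 8 GB card
    return "smollm2"                   # runs on almost anything

print(pick_model(24))  # granite3.3 (Q8)
```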
Use our compatibility checker to see exactly how each model runs on your specific hardware, or browse the full model list to compare all 200+ models side by side.