5 New Models You Should Try on Ollama Right Now (May 2026)
GLM-5.1, Kimi K2.5, Nemotron Ultra, Granite 3.3, and SmolLM2 are the latest additions to Ollama. Here's what each one does, how much VRAM they need, and which hardware can run them.
The Ollama library hit 200+ models in April 2026, and the pace of new releases shows no signs of slowing down. Five models dropped in the last few months that are worth your attention, whether you have 4 GB of VRAM or 96 GB.
GLM-5.1 — The New Open-Source Frontier
Zhipu AI’s GLM-5.1 is a 744B-parameter Mixture-of-Experts model with 40B active parameters per token. It was released in April 2026 and immediately became one of the most discussed models in the local AI community.
Why it matters: GLM-5.1 competes with frontier closed-source models on coding and agentic tasks, supports a 200K+ context window, and is fully open-source. The MoE architecture means it’s far more efficient than the parameter count suggests.
As with its predecessor GLM-5, all expert weights must be loaded into memory even though only 40B activate per token. That means the real VRAM requirements are much higher than the active parameter count suggests:
VRAM requirements (estimated):
| Quantization | VRAM Required |
|---|---|
| Q2_K | ~305 GB |
| Q4_K_M | ~450 GB |
| Q8_0 | ~820 GB |
This is enterprise hardware territory: multi-GPU setups, or a Mac Studio M4 Ultra with 192 GB of unified memory using aggressive quantization and offloading. Ollama also offers a cloud-backed tag for those without the local resources.
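Where do numbers like these come from? A simple rule of thumb: total parameters times average bits per weight, divided by eight. A rough sketch below; the bits-per-weight averages are back-of-envelope approximations (K-quants and Q8_0 carry per-block scale data, so they land above their nominal bit widths), not values taken from Ollama itself:

```python
# Approximate average bits per weight for common GGUF quant levels.
# These are rough estimates (assumption), not official figures.
BITS_PER_WEIGHT = {"Q2_K": 3.3, "Q4_K_M": 4.85, "Q8_0": 8.8, "F16": 16.0}

def estimate_vram_gb(total_params_b: float, quant: str) -> float:
    """Estimate weight memory in GB for a model with total_params_b
    billion parameters. For MoE models, use TOTAL parameters, not
    active ones: every expert must be resident in memory even though
    only a few fire per token. KV cache and runtime buffers add more
    on top of this figure.
    """
    return total_params_b * BITS_PER_WEIGHT[quant] / 8

# GLM-5.1: 744B total parameters
for quant in ("Q2_K", "Q4_K_M", "Q8_0"):
    print(f"{quant}: ~{estimate_vram_gb(744, quant):.0f} GB")
```

The same function explains the other giants in this list: plug in 1000 (billion) for Kimi K2.5 or 253 for Nemotron Ultra and you get figures in the same ballpark as the tables below.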
```
ollama run glm-5.1
```
Kimi K2.5 — The Agent Specialist
Moonshot AI’s Kimi K2.5 is a 1-trillion-parameter MoE model built specifically for long-horizon agentic tasks. It’s the largest open-weight model currently available on Ollama and has been generating massive buzz on r/LocalLLaMA.
The catch: at 1T total parameters, even aggressive 2-bit quantization leaves a 374 GB model file. Unlike smaller MoE models, all 384 experts must be loaded into memory. This is firmly enterprise or multi-GPU territory, though Ollama offers a cloud-backed tag for those without the local resources.
```
ollama run kimi-k2.5
```
Nemotron Ultra 253B — NVIDIA’s Reasoning Powerhouse
NVIDIA’s Nemotron Ultra is a 253B-parameter dense model focused on reasoning, math, and code generation. It builds on the Llama architecture with NVIDIA’s training infrastructure and datasets, and delivers strong benchmark scores that rival much larger models.
VRAM requirements (estimated):
| Quantization | VRAM Required |
|---|---|
| Q4_K_M | ~128 GB |
| Q8_0 | ~253 GB |
This is firmly in Mac Studio Ultra or multi-GPU territory. A Mac Studio M4 Ultra with 192 GB can run it at Q4 with room for context.
```
ollama run nemotron-ultra
```
Granite 3.3 8B — Enterprise-Grade, Consumer-Friendly
IBM’s Granite 3.3 is a refreshing counterpoint to the “bigger is better” trend. At 8B parameters, it runs on virtually any modern GPU and focuses on enterprise use cases: structured data extraction, tool use, RAG pipelines, and multilingual support.
VRAM requirements (estimated):
| Quantization | VRAM Required |
|---|---|
| Q4_K_M | ~6 GB |
| Q8_0 | ~10 GB |
| F16 | ~18 GB |
An RTX 3060 12 GB runs Granite 3.3 at Q8 with room to spare. Even an 8 GB card handles Q4 fine.
```
ollama run granite3.3
```
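Structured data extraction is where a small enterprise-tuned model like this earns its keep. Here's a minimal sketch against Ollama's local REST API, using the documented `format: "json"` option on `/api/generate` to constrain the model to valid JSON. The endpoint and payload shape follow Ollama's API; the prompt wording and a server running on the default port 11434 are assumptions:

```python
import json
import urllib.request

def build_request(text: str) -> urllib.request.Request:
    """Build a request asking Granite to extract fields as JSON."""
    payload = {
        "model": "granite3.3",
        "prompt": (
            "Extract the company name and date from this text as JSON "
            f'with keys "name" and "date": {text}'
        ),
        "format": "json",  # ask Ollama to emit valid JSON only
        "stream": False,
    }
    return urllib.request.Request(
        "http://localhost:11434/api/generate",  # default local Ollama port
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def extract(text: str) -> dict:
    """Send the request and parse the model's JSON answer."""
    with urllib.request.urlopen(build_request(text)) as resp:
        body = json.loads(resp.read())
    return json.loads(body["response"])

# Example (requires a running Ollama server with granite3.3 pulled):
# extract("Invoice from Acme Corp, dated 2026-05-01.")
```

The same pattern works with any model in this list; only the `"model"` field changes.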
SmolLM2 1.7B — Tiny but Capable
Hugging Face’s SmolLM2 is the opposite end of the spectrum. At 1.7B parameters, it needs under 2 GB of VRAM at Q4 and runs on literally anything, including a Raspberry Pi. It’s designed for edge deployment, mobile apps, and situations where you need a local model but don’t have a GPU.
VRAM requirements (estimated):
| Quantization | VRAM Required |
|---|---|
| Q4_K_M | ~1.9 GB |
| Q8_0 | ~2.7 GB |
| F16 | ~4.4 GB |
Every single GPU and Mac in our database can run SmolLM2. If you’ve been wanting to try local AI but thought your hardware wasn’t good enough, this is your entry point.
```
ollama run smollm2
```
Which One Should You Try?
It depends on your hardware and what you need:
- Best on 8 GB VRAM: Granite 3.3 8B (Q4) or SmolLM2 1.7B (Q8)
- Best on 24 GB VRAM: Granite 3.3 8B at Q8 for maximum quality at this tier
- Best on 128+ GB Mac: Nemotron Ultra for reasoning tasks
- Enterprise / multi-GPU: GLM-5.1 or Kimi K2.5 for frontier-level performance
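The guidance above can be sketched as a simple picker. The VRAM thresholds are our reading of the tiers in this list, not an official Ollama mapping, and the 8 GB tier simplifies to the Granite pick:

```python
def pick_model(vram_gb: int, multi_gpu: bool = False) -> str:
    """Map available VRAM (GB) to the recommendations above.
    Thresholds are assumptions based on the tiers in this article."""
    if multi_gpu or vram_gb >= 256:
        return "glm-5.1 or kimi-k2.5"  # frontier-level, enterprise hardware
    if vram_gb >= 128:
        return "nemotron-ultra"        # Q4 fits on a 192 GB Mac Studio
    if vram_gb >= 24:
        return "granite3.3 (Q8)"       # maximum quality at this tier
    if vram_gb >= 8:
        return "granite3.3 (Q4)"       # comfortable on an 8 GB card
    return "smollm2"                   # runs on almost anything

print(pick_model(24))  # granite3.3 (Q8)
```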
Use our compatibility checker to see exactly how each model runs on your specific hardware, or browse the full model list to compare all 200+ models side by side.