Guides, comparisons, and tips for running AI models locally on your own hardware.
Which AI models can your Mac actually run? A practical guide to local LLMs on Apple Silicon — from Mac Mini M4 Pro to Mac Studio M4 Ultra, with specific model recommendations for each memory config.
Head-to-head comparison of Google's Gemma 3 and Meta's Llama 3 for local inference. Compare VRAM requirements, performance, and quality at each size tier to pick the right model for your GPU.
Complete VRAM requirements guide for all DeepSeek R1 model sizes. Find out exactly how much memory you need to run R1 locally with Ollama at different quantization levels.
Everything you need to know about running Qwen 2.5 models locally. Compare Qwen 2.5 base vs Coder variants, VRAM requirements for each size, and which quantization to use with Ollama.
GLM-5.1, Kimi K2.5, Nemotron Ultra, Granite 3.3, and SmolLM2 are the latest additions to Ollama. Here's what each one does, how much VRAM they need, and which hardware can run them.
A practical breakdown of which AI models fit at each VRAM tier. From budget 8 GB cards to the RTX 5090, find the best Ollama models for your exact amount of VRAM.
Google's Gemma 4 ships four model sizes under Apache 2.0, ranging from 2B edge models to a 31B dense powerhouse that competes with 400B+ rivals. Here's what it means for local AI.
An honest look at the best GPUs for local LLM inference in 2026 — covering the new RTX 50-series, used-market bargains, and why a 3-year-old card might still be your best bet.
How to set up a self-hosted AI chat server using a Mac Mini for inference and a Raspberry Pi for the web interface — accessible from anywhere, running for about $1/month.
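For a sense of the glue involved, here is a minimal sketch of how the Pi-side web app might forward a prompt to Ollama running on the Mac Mini. The hostname and model name are placeholders, and it assumes Ollama is listening on its default port (11434) with streaming turned off so the generate endpoint returns a single JSON object:

```python
# Minimal sketch of the Pi-side glue: forward a prompt to Ollama on the
# Mac Mini and return the reply. "mac-mini.local" and the model name are
# placeholders; 11434 is Ollama's default port.
import json
import urllib.request

OLLAMA_URL = "http://mac-mini.local:11434/api/generate"  # placeholder host

def ask(prompt: str, model: str = "llama3.1:8b") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask("Say hello in one sentence."))
```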
Step-by-step guide to running Meta's Llama models on your own hardware with Ollama — covering Llama 3.x, Llama 4 Scout, and how to pick the right model for your GPU.
A plain-English explanation of AI model quantization — what Q4_K_M, Q8_0, and the newer formats actually mean, and how to pick the right one for your hardware.
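To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. The bits-per-weight figures and the flat overhead are approximations for illustration, not exact GGUF numbers, and real file sizes vary by architecture and quant mix:

```python
# Rough estimate of memory footprint for a model at common quantization
# levels: weights plus a flat allowance for KV cache and runtime buffers.
# All constants below are approximations, not GGUF specifications.

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,  # roughly 4-5 bits per weight on average
    "Q8_0": 8.5,    # roughly 8.5 bits per weight
    "F16": 16.0,    # unquantized half precision
}

def estimate_gb(params_billion: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Approximate footprint in GB: weight bytes plus a fixed overhead guess."""
    weight_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return weight_gb + overhead_gb

if __name__ == "__main__":
    for size in (7, 14, 32, 70):
        row = ", ".join(f"{q}: {estimate_gb(size, q):.1f} GB" for q in BITS_PER_WEIGHT)
        print(f"{size}B -> {row}")
```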