Getting Started with Local AI

Run AI models privately on your own machine — no API keys, no subscriptions, no data leaving your computer. This guide gets you from zero to chatting with a local model in about 10 minutes.

What you'll set up: Ollama (runs models) + Open WebUI (chat interface) — giving you a private, ChatGPT-like experience running entirely on your hardware.

1. Install Ollama

Ollama is the engine that downloads and runs AI models on your machine. Pick your platform:

macOS

Install via Homebrew (recommended) or download from ollama.com:

brew install ollama

Linux

Use the official install script:

curl -fsSL https://ollama.com/install.sh | sh

Windows

Download the installer from ollama.com/download and run the setup wizard.

2. Pull & Run Your First Model

Once installed, pull a model and start chatting. We'll use Llama 3.2 3B — it's small (~2GB download), fast, and runs on virtually any modern hardware:

ollama run llama3.2:3b

Ollama will download the model on first run, then drop you into a chat session. Type a message and press Enter. Type /bye to exit.

$ ollama run llama3.2:3b
pulling manifest...
pulling model... 100%

>>> What is the capital of France?
The capital of France is Paris.

>>> _
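Beyond the interactive chat, Ollama also exposes a local HTTP API on port 11434 (the same one Open WebUI connects to in the next step), so you can script against your model. A minimal sketch using only the Python standard library — the `ask` helper requires Ollama to be running:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one complete JSON reply instead of a token stream
    }).encode("utf-8")
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the reply text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# With Ollama running:
# print(ask("llama3.2:3b", "What is the capital of France?"))
```

Setting `"stream": False` keeps the example simple; by default the API streams the response token by token as newline-delimited JSON.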

3. Add a Chat UI with Open WebUI

The terminal works, but a proper chat interface is much nicer. Open WebUI gives you a ChatGPT-like experience that connects to your local Ollama instance.

Option A: Docker (Recommended)

If you have Docker installed, this is the fastest way:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Then open http://localhost:3000 in your browser. Create an account (stored locally) and start chatting.

Option B: pip install

If you prefer Python:

pip install open-webui && open-webui serve

The interface opens at http://localhost:8080 by default.

Tip: Open WebUI auto-detects Ollama running on localhost. If Ollama is on another machine, go to Settings → Connections and set the Ollama URL (e.g. http://192.168.1.100:11434).
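If Open WebUI can't see your models, it helps to confirm the Ollama server is actually reachable first. A small stdlib-only check against Ollama's /api/version endpoint (the URL default matches the tip above):

```python
import json
import urllib.error
import urllib.request

def check_ollama(url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at `url`."""
    try:
        with urllib.request.urlopen(f"{url}/api/version", timeout=2) as resp:
            version = json.loads(resp.read()).get("version", "unknown")
            print(f"Ollama {version} is reachable at {url}")
            return True
    except (urllib.error.URLError, OSError):
        print(f"No Ollama server at {url} - is `ollama serve` running?")
        return False
```

Point it at the same URL you entered in Settings → Connections to rule out network or firewall issues before debugging Open WebUI itself.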

4. Pick the Right Model for Your Hardware

The model you can run depends on your GPU's VRAM (or unified memory on Macs). Here are popular starting points at each VRAM tier:

| Your VRAM | Recommended Model | Command | Best For |
| --- | --- | --- | --- |
| 4-6 GB | Llama 3.2 3B | ollama run llama3.2:3b | Quick tasks, older GPUs |
| 8 GB | Llama 3.1 8B | ollama run llama3.1:8b | Great all-rounder |
| 8 GB | Qwen 2.5 Coder 7B | ollama run qwen2.5-coder:7b | Code generation |
| 12-16 GB | DeepSeek R1 14B | ollama run deepseek-r1:14b | Reasoning, math |
| 24 GB | Qwen 2.5 32B | ollama run qwen2.5:32b | High quality, multilingual |
| 48+ GB | Llama 3.3 70B | ollama run llama3.3:70b | Best open-weight quality |

Not sure what VRAM you have? Use our compatibility checker to find exactly which models your hardware can run.
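If you'd rather do a quick back-of-the-envelope check yourself: as a rough heuristic (our assumption, not an official Ollama figure), a 4-bit quantized model needs on the order of 0.55 GB of memory per billion parameters, plus some fixed overhead for the context window. A sketch of that estimate:

```python
def estimate_vram_gb(params_billion: float,
                     gb_per_billion: float = 0.55,
                     overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate for a 4-bit quantized model.

    Heuristic only: real usage varies with quantization level and
    context length, so treat the result as a lower bound.
    """
    return params_billion * gb_per_billion + overhead_gb

for size in (3, 8, 14, 32, 70):
    print(f"{size}B model: ~{estimate_vram_gb(size):.1f} GB")
```

The numbers this produces line up with the tiers in the table above — e.g. an 8B model lands just under 8 GB, and a 70B model at roughly 40 GB fits the 48 GB tier.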

5. Useful Ollama Commands

ollama list

List all downloaded models

ollama pull llama3.1:8b

Download a model without starting a chat

ollama rm llama3.1:8b

Delete a downloaded model to free space

ollama show llama3.1:8b

View model details — parameters, quantization, license

ollama serve

Start the Ollama API server (runs on port 11434 — needed for Open WebUI)
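The same information these commands print is also available over the API, which is handy for scripts. For example, the /api/tags endpoint returns the same list as `ollama list`; a stdlib-only sketch (the helper names are ours):

```python
import json
import urllib.request

def parse_tags_response(data: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in data.get("models", [])]

def list_local_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Return names of locally downloaded models (requires `ollama serve`)."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return parse_tags_response(json.loads(resp.read()))

# With Ollama running:
# print(list_local_models())
```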

What's Next?