Getting Started with Local AI
Run AI models privately on your own machine — no API keys, no subscriptions, no data leaving your computer. This guide gets you from zero to chatting with a local model in about 10 minutes.
What you'll set up: Ollama (runs models) + Open WebUI (chat interface) — giving you a private, ChatGPT-like experience running entirely on your hardware.
1. Install Ollama
Ollama is the engine that downloads and runs AI models on your machine. Pick your platform:
Linux
Use the official install script:
curl -fsSL https://ollama.com/install.sh | sh
Windows
Download the installer from ollama.com/download and run the setup wizard.
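On either platform, you can sanity-check the install from a terminal. A minimal check (prints the version if the `ollama` binary is on your PATH, otherwise a hint):

```shell
# Verify the installation: print Ollama's version if the binary is on PATH.
if command -v ollama >/dev/null 2>&1; then
  ollama --version
else
  echo "ollama not found on PATH"
fi
```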
2. Pull & Run Your First Model
Once installed, pull a model and start chatting. We'll use Llama 3.2 3B — it's small (~2GB download), fast, and runs on virtually any modern hardware:
ollama run llama3.2:3b
Ollama will download the model on first run, then drop you into a chat session.
Type a message and press Enter. Type /bye to exit.
$ ollama run llama3.2:3b
pulling manifest...
pulling model... 100%
>>> What is the capital of France?
The capital of France is Paris.
>>> _
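Beyond the interactive session, `ollama run` also accepts the prompt as a command-line argument, which is handy in scripts. A sketch (the guard just keeps the snippet safe to paste before Ollama is installed):

```shell
# One-shot mode: pass the prompt directly instead of opening a chat session.
if command -v ollama >/dev/null 2>&1; then
  ollama run llama3.2:3b "What is the capital of France?"
else
  echo "install Ollama first (step 1)"
fi
```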
3. Add a Chat UI with Open WebUI
The terminal works, but a proper chat interface is much nicer. Open WebUI gives you a ChatGPT-like experience that connects to your local Ollama instance.
Option A: Docker (Recommended)
If you have Docker installed, this is the fastest way:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Then open http://localhost:3000 in your browser. Create an account (stored locally) and start chatting.
Option B: pip install
If you prefer Python:
pip install open-webui && open-webui serve
Opens on http://localhost:8080 by default.
Tip: Open WebUI auto-detects Ollama running on localhost. If Ollama is on another machine, go to Settings → Connections and set the Ollama URL (e.g. http://192.168.1.100:11434).
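To confirm that Open WebUI can actually reach Ollama, you can hit Ollama's API directly: the `/api/tags` endpoint lists installed models. A quick connectivity check (swap in your server's address if Ollama runs remotely):

```shell
# Ask Ollama's API for its model list; falls back to a message if
# nothing is listening on port 11434.
curl -s --max-time 3 http://localhost:11434/api/tags \
  || echo "Ollama is not reachable on localhost:11434"
```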
4. Pick the Right Model for Your Hardware
The model you can run depends on your GPU's VRAM (or unified memory on Macs). Here are popular starting points at each VRAM tier:
| Your VRAM | Recommended Model | Command | Best For |
|---|---|---|---|
| 4-6 GB | Llama 3.2 3B | ollama run llama3.2:3b | Quick tasks, older GPUs |
| 8 GB | Llama 3.1 8B | ollama run llama3.1:8b | Great all-rounder |
| 8 GB | Qwen 2.5 Coder 7B | ollama run qwen2.5-coder:7b | Code generation |
| 12-16 GB | DeepSeek R1 14B | ollama run deepseek-r1:14b | Reasoning, math |
| 24 GB | Qwen 2.5 32B | ollama run qwen2.5:32b | High quality, multilingual |
| 48+ GB | Llama 3.3 70B | ollama run llama3.3:70b | Best open-weight quality |
Not sure what VRAM you have? Use our compatibility checker to find exactly which models your hardware can run.
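The table above can be condensed into a tiny helper script. This is an illustrative sketch only (the `suggest_model` function is hypothetical, not part of Ollama); on NVIDIA GPUs you can read total VRAM with `nvidia-smi --query-gpu=memory.total --format=csv,noheader`:

```shell
# Map available VRAM (in GB) to a suggested "ollama run" command,
# mirroring the tiers in the table above. At the 8 GB tier, swap in
# qwen2.5-coder:7b if you mainly want code generation.
suggest_model() {
  vram_gb=$1
  if   [ "$vram_gb" -ge 48 ]; then echo "ollama run llama3.3:70b"
  elif [ "$vram_gb" -ge 24 ]; then echo "ollama run qwen2.5:32b"
  elif [ "$vram_gb" -ge 12 ]; then echo "ollama run deepseek-r1:14b"
  elif [ "$vram_gb" -ge 8 ];  then echo "ollama run llama3.1:8b"
  else                             echo "ollama run llama3.2:3b"
  fi
}

suggest_model 8   # prints the 8 GB all-rounder pick from the table
```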
5. Useful Ollama Commands
| Command | What it does |
|---|---|
| ollama list | List all downloaded models |
| ollama pull llama3.1:8b | Download a model without starting a chat |
| ollama rm llama3.1:8b | Delete a downloaded model to free space |
| ollama show llama3.1:8b | View model details (parameters, quantization, license) |
| ollama serve | Start the Ollama API server on port 11434 (needed for Open WebUI) |
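With `ollama serve` running, port 11434 speaks the same HTTP API that Open WebUI uses, so any HTTP client can talk to your models. A one-off, non-streaming generation request against a model you've already pulled might look like:

```shell
# POST a single prompt to Ollama's /api/generate endpoint; the fallback
# message fires if no server is listening on port 11434.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "What is the capital of France?",
  "stream": false
}' || echo "Ollama server is not running"
```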