Feb 12, 2026

Build Your Own Local ChatGPT with a Mac Mini, Ollama, and Open WebUI

How to set up a self-hosted AI chat server using a Mac Mini for inference and a Raspberry Pi for the web interface — accessible from anywhere, running for about $1/month.

Cloud AI subscriptions add up. ChatGPT Plus is $20/month. Claude Pro is $20/month. If you’ve got a Mac Mini or similar hardware sitting around (or you’re willing to buy one), you can build your own private ChatGPT that runs 24/7 for about a dollar a month in electricity. No API keys. No usage limits. Your data stays on your network.

Here’s the full setup.

The Architecture

The idea is simple: split the work between two devices.

  • Mac Mini (M4) — Runs Ollama for AI inference. Apple Silicon’s unified memory and Metal GPU acceleration make this surprisingly capable for running LLMs.
  • Raspberry Pi 4 — Runs Open WebUI, a ChatGPT-style web interface. The Pi draws about 5W and serves the frontend.
  • Tailscale — Connects everything over an encrypted mesh VPN so you can access your AI from anywhere, without exposing anything to the public internet.

You don’t technically need the Pi — you can run everything on the Mac Mini. But splitting it out keeps things clean, and the Pi can also run other services (Pi-hole for ad blocking, Home Assistant, whatever).

What You’ll Need

  • Mac Mini M4 (16 GB unified memory minimum, 24 GB+ recommended)
  • Raspberry Pi 4 (4 GB+ RAM) with Raspberry Pi OS
  • Both connected to your local network
  • A Tailscale account (free tier works)

Total one-time cost: around $700–900 depending on Mac Mini config. Running cost: roughly $1–2/month in electricity.

Part 1: Set Up Ollama on the Mac Mini

Install Ollama:

brew install ollama

Pull a model. For a 16 GB Mac Mini, Qwen3 14B at Q4 is a solid daily driver — hits 30–60 tokens/sec on M4:

ollama pull qwen3:14b
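Before wiring anything else up, it's worth a quick smoke test from the terminal. A minimal check (the guard just makes the snippet a no-op on a machine where Ollama isn't installed yet):

```shell
# Quick smoke test: ask the freshly pulled model a question.
# Guarded so the snippet does nothing on machines without Ollama.
if command -v ollama >/dev/null 2>&1; then
  ollama run qwen3:14b "Reply with the single word: ready"
else
  echo "ollama not found -- install it first with: brew install ollama"
fi
```

If you get a sensible reply, inference is working and everything after this is just plumbing.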

Other good options for 16 GB:

  • deepseek-r1:14b — DeepSeek’s reasoning model, strong at math and logic
  • gemma3:12b — Google’s open model, strong all-rounder
  • llama3.1:8b — Reliable, well-tested, lighter on memory

If you have a 24 GB Mac Mini, you can run larger models or use higher quantizations. Check our Mac compatibility pages for specifics.
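A rough rule of thumb for what fits: a Q4 quantization stores roughly half a byte per parameter, plus a couple of GB for KV cache and runtime overhead. A quick sanity check — the 0.5 bytes/param and 2 GB overhead figures are ballpark assumptions, not exact:

```shell
# Estimate memory needed for a Q4-quantized model.
# Assumptions: ~0.5 bytes per parameter at Q4, plus ~2 GB of
# KV cache / runtime overhead. Ballpark only.
estimate_q4_gb() {
  awk -v params_b="$1" 'BEGIN { printf "%.1f GB\n", params_b * 0.5 + 2 }'
}

estimate_q4_gb 14   # 14B model -> 9.0 GB, matching the Qwen3 14B figure
estimate_q4_gb 8    # 8B model  -> 6.0 GB
```

Leave a few GB of headroom on top for macOS itself — the model shares unified memory with everything else on the machine.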

Enable Remote Access

By default, Ollama only listens on localhost. To let other devices (like the Pi) connect, you need to set the host to 0.0.0.0. On macOS, setting environment variables in your shell profile alone won’t work for the background Ollama service. Use launchctl:

launchctl setenv OLLAMA_HOST "0.0.0.0:11434"

Then restart Ollama (quit and reopen, or brew services restart ollama).

Verify it’s accessible from another device:

curl http://<mac-mini-ip>:11434/api/version
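One caveat: `launchctl setenv` does not survive a reboot. If you want the setting to stick, one common approach is a small LaunchAgent that re-applies it at login. A sketch — the label `local.ollama.env` and file path are arbitrary choices, not anything Ollama requires:

```shell
# Persist OLLAMA_HOST across reboots via a login LaunchAgent.
# The label "local.ollama.env" is an arbitrary choice.
mkdir -p "$HOME/Library/LaunchAgents"
cat > "$HOME/Library/LaunchAgents/local.ollama.env.plist" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>local.ollama.env</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/launchctl</string>
    <string>setenv</string>
    <string>OLLAMA_HOST</string>
    <string>0.0.0.0:11434</string>
  </array>
  <key>RunAtLoad</key><true/>
</dict>
</plist>
EOF
echo "wrote $HOME/Library/LaunchAgents/local.ollama.env.plist"
```

Log out and back in (or reboot) and the variable is set before Ollama starts.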

Lock Down Privacy

If you want to guarantee nothing leaves your machine:

launchctl setenv OLLAMA_NO_CLOUD "1"

This disables any cloud model features and ensures all inference stays local.

Part 2: Set Up Open WebUI on the Raspberry Pi

Open WebUI gives you a proper web interface — conversation history, multiple model switching, file uploads, the works. It looks and feels like ChatGPT.

Install Docker on the Pi

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER

Log out and back in for the group change to take effect.

Run Open WebUI

This is the key command. Point it at your Mac Mini’s Ollama instance:

docker run -d \
  --name open-webui \
  --restart unless-stopped \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://<mac-mini-ip>:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

Replace <mac-mini-ip> with your Mac Mini’s local IP address.

Open http://<pi-ip>:3000 in a browser. Create an admin account on first launch. You should see your Ollama models available in the model dropdown.
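If no models show up, check that the Pi can actually reach Ollama before digging into Open WebUI itself. A quick check to run on the Pi — the IP below is an example placeholder, and the 5-second timeout is just to fail fast:

```shell
# From the Pi: confirm the Mac Mini's Ollama API is reachable.
# Replace MAC_MINI_IP with your Mac Mini's actual local address.
MAC_MINI_IP="192.168.1.50"   # example placeholder
if curl -fsS --connect-timeout 5 "http://$MAC_MINI_IP:11434/api/tags" >/dev/null; then
  echo "Ollama reachable -- models should appear in the dropdown"
else
  echo "Ollama unreachable -- check OLLAMA_HOST and the firewall on the Mac"
fi
```

If this fails, the usual culprits are the `OLLAMA_HOST` setting from Part 1 or macOS's firewall blocking port 11434.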

A Note on Pi Compatibility

Open WebUI’s Docker images work on the Pi’s ARM64 architecture, but newer versions can be heavy. If you hit issues with container startup or builds failing, check the Open WebUI GitHub releases for known ARM64 compatibility notes. The Docker approach (pulling a pre-built image rather than building from source) is the most reliable path on Pi hardware.

Part 3: Connect Everything with Tailscale

Tailscale creates an encrypted WireGuard mesh network between your devices. This means you can access your AI from your phone, laptop, or anywhere — without opening ports or dealing with dynamic DNS.

Install on Both Devices

Mac Mini:

brew install tailscale

Raspberry Pi:

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

Follow the auth URL to link each device to your Tailscale account.
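To find each device's Tailscale address (you'll need both in the next step), `tailscale ip -4` prints it directly. The guard just keeps the snippet harmless on a machine where Tailscale isn't installed yet:

```shell
# Print this machine's Tailscale IPv4 address (run on both devices).
if command -v tailscale >/dev/null 2>&1; then
  tailscale ip -4
else
  echo "tailscale not installed on this machine"
fi
```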

Use Tailscale IPs

Once connected, each device gets a stable Tailscale IP (usually 100.x.x.x). Update your Open WebUI config to use the Mac Mini’s Tailscale IP:

docker rm -f open-webui

docker run -d \
  --name open-webui \
  --restart unless-stopped \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://<mac-mini-tailscale-ip>:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

Now you can access http://<pi-tailscale-ip>:3000 from any device on your Tailscale network — your phone on cellular, your laptop at a coffee shop, whatever.

Part 4: Daily Use

Once everything is running, the workflow is simple:

  1. Open http://<pi-tailscale-ip>:3000 on any device
  2. Pick a model from the dropdown
  3. Chat

The Mac Mini handles inference. The Pi serves the UI. Tailscale encrypts everything in between. The Mac Mini draws about 10–15W at idle and spikes to 30–40W during inference. The Pi draws ~5W constantly. Total power cost is roughly $1–2/month depending on your electricity rate.
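The dollar figure checks out with simple arithmetic. A sketch, assuming the Mac averages ~12 W (it idles most of the day) and electricity costs $0.15/kWh — swap in your own numbers:

```shell
# Back-of-the-envelope monthly electricity cost.
# Assumptions: Mac Mini averages ~12 W, Pi draws a constant 5 W,
# electricity at $0.15/kWh.
awk 'BEGIN {
  watts = 12 + 5                # combined average draw in watts
  kwh   = watts * 730 / 1000    # ~730 hours in a month
  printf "%.2f kWh/month -> $%.2f/month\n", kwh, kwh * 0.15
}'
# -> 12.41 kWh/month -> $1.86/month
```

Even doubling the rate keeps it well under any cloud subscription.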

| Use case | Model | Memory (Q4) | Speed on M4 |
| --- | --- | --- | --- |
| Daily chat | Qwen3 14B | ~9 GB | 30–60 tok/s |
| Coding help | Qwen3-Coder 14B | ~9 GB | 30–60 tok/s |
| Reasoning/math | DeepSeek R1 14B | ~10 GB | 25–50 tok/s |
| Quick tasks | Llama 3.1 8B | ~5.5 GB | 40–80 tok/s |
| Multimodal (text + images) | Gemma 3 12B | ~8 GB | 30–60 tok/s |

Why Bother?

Fair question. Cloud AI is convenient and often better. But there are real reasons to run your own:

  • Privacy. Your conversations never leave your network. No training on your data.
  • No recurring cost. After the hardware, it’s $1/month in electricity versus $20+/month for subscriptions.
  • No rate limits. Use it as much as you want.
  • Offline access. Works without internet (minus the Tailscale remote access, obviously).
  • Learning. Running this stuff yourself teaches you how these models actually work.

The gap between local open-weight models and cloud APIs has narrowed considerably. A 14B model running on a Mac Mini handles most everyday tasks — writing, brainstorming, summarizing, light coding — without issue. It’s not GPT-5, but for most things, it doesn’t need to be.

Next Steps