Skip to main content

Ollama on Mac: Run Local Models and Use Them on iPhone

How to install Ollama on macOS, run local models like Gemma 4 or Qwen on a Mac mini or MacBook, connect to them from Cumbersome on Mac and iPhone, expose them on your LAN, reach them remotely over HTTPS with Tailscale, and keep logs off disk for more privacy.

Published Last updated

Qwen 3.5 small models punch way above their weight. The chart is from when I first wrote this guide. The setup works the same for Gemma 4 and whatever else Ollama ships next.

I have been spending more time with local models lately. I think a lot more people should try them. The model lineup moves fast, but the plumbing does not: install Ollama, pull a model, point Cumbersome at localhost:11434/v1, and you are off.

Four reasons to run AI locally:

  • Cheap. You pay for the Mac and the electricity. There's no per-token bill, no monthly subscription, no metered API.
  • Private. The model runs on hardware you control. Prompts never leave your machine unless you deliberately send them somewhere.
  • Increasingly capable. Local models used to feel like a science fair project. That changed fast.
  • Low barrier. If you already have Apple silicon sitting around, the setup takes about ten minutes.

Ollama's model library (opens in new tab) is the catalog. Right now I reach for Gemma 4 (opens in new tab) on a Mac mini (gemma4:e4b or gemma4:12b depending on RAM) and still keep Qwen around for comparison. Gemma 4's edge variants (e2b, e4b) are built for laptops and smaller machines, with 128K context on the small sizes and multimodal support on most variants. Qwen 3.5 remains a solid option if you want a different flavor or already have pulls cached. So do Llama, Mistral, DeepSeek, and dozens of others. Pick what fits your RAM and task, not what was trending when someone wrote a blog post.

If you want a rough idea of what your hardware can support before you download a 20 GB artifact, ModelFit (opens in new tab) is a useful sanity check.

I originally bought a Mac mini for OpenClaw (opens in new tab) experiments. Then I got frustrated by WhatsApp constantly disconnecting from it. That detour pushed me toward running models locally instead of wiring up another cloud dependency, and I am glad it did. Small and mid-size open models deliver an absurd amount of useful AI for a fraction of the compute cost of the largest cloud options.

This guide covers the full path: install Ollama on a Mac, pull a local model, connect to it from Cumbersome on that same Mac, then make it available to an iPhone or another Mac on your local network or over Tailscale. I will also cover the simplest privacy hardening so your "local" setup does not quietly spray logs all over disk.

My Setup

A base M4 Mac mini with 16 GB of RAM. Nothing exotic. Enough for gemma4:e4b, qwen3.5:4b, and similar edge-sized models.

My Mac mini M4 handles those models capably. Not magically. Capably. That distinction matters.

If you are expecting flagship cloud performance on every hard reasoning task, recalibrate. That's not what this is. But for private writing help, summarization, brainstorming, coding assistance, and general-purpose "think with me" AI on your own hardware, this setup works well. I use it daily.

Step 1: Install Ollama on Your Mac

Ollama has a Quickstart (opens in new tab), but here is the Mac version in plain English:

  1. Go to ollama.com/download (opens in new tab) and grab the macOS app.
  2. Open the download and move it into Applications if macOS asks.
  3. Launch Ollama once. A background service starts automatically.
  4. Leave it running. Ollama now serves a local API on http://localhost:11434.

Two endpoints matter:

Cumbersome uses the second one.

Step 2: Pull a Local Model

Open Terminal. Pick a model from Ollama's library (opens in new tab). Here are sensible starting points on Apple silicon:

ollama pull gemma4:e4b
ollama pull gemma4:12b

Or, if you already have Qwen pulls cached:

ollama pull qwen3.5:4b
ollama pull qwen3.5:9b

Rough sizing guide (check the library page for current disk footprints):

  • gemma4:e2b and gemma4:e4b are edge-oriented. e4b is about 9.6 GB and my default on a 16 GB Mac mini when I want Gemma 4.
  • gemma4:12b is about 7.6 GB with a 256K context window. Stronger, still practical on 16 GB if you are not running ten other hungry apps.
  • qwen3.5:4b and qwen3.5:9b remain good lightweight options if you prefer that family.
  • Larger tags (gemma4:26b, gemma4:31b, qwen3.5:27b, and up) need significantly more RAM and are a different hardware conversation.

You can pull more than one and switch in Cumbersome's model dropdown. That is the whole point of Ollama: one server, many models.

Step 3: Connect Cumbersome to Ollama

Download Cumbersome on the Mac where Ollama is running. Then add Ollama as an OpenAI-compatible provider.

Point Cumbersome at http://localhost:11434/v1. Leave the API key blank.

The settings:

  • Provider Name: Local or Ollama (your choice)
  • Base URL: http://localhost:11434/v1
  • API key: leave it blank

The latest version of Cumbersome makes the API key optional, so you can leave it empty for a local Ollama provider. This matches how Ollama works: its OpenAI compatibility docs (opens in new tab) note the key is ignored anyway. If you are on an older build that still requires a value, type any string at all, such as ollama, and Ollama will ignore it.

Quick sanity check (Terminal). On the same Mac where Ollama is running, run curl -sS http://localhost:11434/v1/models. On that machine 127.0.0.1 and localhost are the same for this check; use whichever host form you use here in Cumbersome's base URL too.

If you don't get JSON like this (including a data array of models), the problem is not Cumbersome or your API key string. Check Ollama installation, that the daemon is running, port 11434, firewall rules, and anything else blocking loopback on that Mac, then try again.

Save the provider and start a conversation. If everything is working, your pulled Ollama models show up in the model dropdown.

Screenshot shows Qwen 3.5, but any model you pulled (Gemma 4, Llama, Mistral, etc.) appears here. Set the Title Model to the same model you plan to chat with.

One practical tip: set the Title Model to match your main chat model. In local setups, bouncing between one model for conversation and a different one for auto-generated titles adds latency you do not need. Keep them the same.

Step 4: Turn Thinking Off (When the Model Supports It)

Some local models ship with a "thinking" or reasoning mode. Qwen 3.5 and Gemma 4 both do. My strongest recommendation for everyday use: leave it off unless you are deliberately testing reasoning on a hard prompt.

With thinking enabled, you can ask for a single fact and get three pages of internal deliberation first. The screenshots below use qwen3.5:9b, but the behavior is the same class of annoyance on other thinking-capable models.

To find the setting, tap the + button at the bottom of the chat composer. That opens the advanced features panel:

The AI Reasoning toggle lives behind the + button. Yes, people complain that the plus icon buries these controls. We are following the same pattern ChatGPT and Claude use: they both hide their little knobs behind a plus-style menu too.

Thinking should be off by default. Here is a concrete example of why.

I asked qwen3.5:9b a dead-simple prompt: "tell me an interesting fact about flowers."

With thinking off:

Clean, direct answer. Fast. No drama.

With thinking on:

An enormous amount of internal throat-clearing for a one-sentence flower fact.

For everyday local use, thinking off wins:

  • faster responses
  • less rambling
  • no multi-page internal monologues
  • better fit for writing, summarization, and utility work

You can always flip it back on for a single hard prompt. I wouldn't leave it on by default.

Step 5: Expose Ollama to Your Local Network

Running Ollama on the same Mac is useful. Running it from your phone on the couch is better.

Open Ollama's settings and flip the network toggle:

"Expose Ollama to the network" turns your Mac from localhost-only into a local AI server for every device on your LAN.

With this on, other devices on the same Wi-Fi can reach Ollama.

Step 6: Find Your Mac's Local IP Address

You need the Mac's LAN IP to point other devices at it.

Two quick ways:

  1. System SettingsWi-Fi → click your current network → look for the IP address.
  2. In Terminal:
ipconfig getifaddr en0

You are looking for something like 192.168.0.108. Build the base URL from that:

http://192.168.0.108:11434/v1

Same Ollama server, addressed over the local network instead of localhost.

Step 7: Use It from Your iPhone or Another Mac

On your iPhone (or a second Mac), add another OpenAI-compatible provider in Cumbersome using the LAN address instead of localhost:

Same setup, different device. Swap localhost for the Mac's LAN IP and you are running local models from your phone.

That's it. The Mac does the heavy lifting. The phone or second Mac is just a client, with a much nicer interface than poking at a terminal or a browser tab.

Three things need to be true:

  • the Mac running Ollama stays on and awake
  • both devices are on the same network
  • Cumbersome points to the LAN URL, not localhost

Privacy Tips: Keep the Good Part Local

The whole point of running local is that your prompts stay on hardware you control. Do not undermine that by leaving a trail of logs behind.

1. Only expose the network when you need it

If you are only using Ollama from the same Mac, leave network exposure off. One machine, one process, no surface area.

When you turn it on for same-network phone or laptop access, understand what that means: the Mac is now serving AI to every device on your LAN. That's still vastly more private than a cloud provider, but it's no longer "this process only talks to itself."

2. Drop routine logs

The simplest privacy move is to stop writing request logs to disk. If you start Ollama manually from a shell, redirect stdout to nowhere and only keep errors:

ollama serve >/dev/null 2>>"$HOME/Library/Logs/ollama-error.log"

If a helper or wrapper insists on writing to a specific log path, you can also symlink it to /dev/null:

ln -sf /dev/null /path/to/whatever.log

Blunt, but effective. If your prompts are privacy-sensitive, do not casually keep request traces you do not need.

3. Watch for cloud features

Ollama now includes cloud models (opens in new tab) and cloud-based web search. The web-search piece is probably useful for a lot of tasks. I haven't played with it yet.

But if the reason you set all of this up is privacy and keeping everything on your own hardware, leave the cloud features off. The moment you enable cloud models or cloud search, you are back in a hybrid setup. Local means local.

Practical Recommendations

The short version:

  • Start with an edge-sized model that fits your RAM (gemma4:e4b, qwen3.5:4b, or similar).
  • Step up (gemma4:12b, qwen3.5:9b) when you want more capability and have headroom.
  • Set the Title Model to match your main chat model.
  • Leave thinking off by default on models that support it.
  • Use localhost on the Ollama machine, LAN IP on everything else.
  • Keep logs off disk unless you genuinely need them.

The Tradeoffs

Local AI is not free. You are paying in three currencies: hardware, electricity, and patience when a small model decides to be weird.

But compared with paying cloud token bills in perpetuity, I think this trade is getting more attractive every few months. The small models keep getting better. The hardware keeps getting cheaper.

A well-chosen local model will not match ChatGPT or Claude on every hard task. For a surprising amount of everyday use, it gets close enough that the privacy and cost advantages tip the balance.

Part 2: Remote Access with Tailscale

The local-network setup above works when your iPhone and Mac are on the same Wi-Fi. For remote access, I would not open Ollama's port to the public Internet. Use a private mesh network instead.

Tailscale is the easiest version of that for normal humans. It makes your Mac and iPhone behave like they are on the same private network, even when they are on different networks.

One catch matters for Cumbersome: Tailscale connects the devices, but Ollama still serves plain HTTP. Cumbersome requires HTTPS for a remote provider. The fix is Tailscale Serve, which puts a private HTTPS URL in front of Ollama.

Step 8: Create a Tailscale Account

Go to tailscale.com (opens in new tab) and create an account.

Use the same account on every device you want in this private network. In this setup, that means:

  • the Mac where Ollama is running
  • the iPhone, iPad, or second Mac where you run Cumbersome remotely

Step 9: Install Tailscale on the Ollama Mac

On the Mac where Ollama is running, download Tailscale for macOS: tailscale.com/download/macos (opens in new tab).

During setup, macOS may ask you to install or approve a system extension. Do it. Tailscale needs a network extension so it can create the encrypted private network interface and route traffic for your tailnet. Without that system-level networking piece, it cannot make your Mac reachable from your iPhone over Tailscale.

Turn Tailscale on and sign in with the account you created above.

Step 10: Install Tailscale on Your iPhone

Install Tailscale on the iPhone where Cumbersome is installed: Tailscale on the App Store (opens in new tab).

Open it, sign in with the same Tailscale account, and turn Tailscale on.

At this point, your Mac and iPhone are connected privately. They are not connected through HTTPS yet, which is the part Cumbersome needs.

Step 11: Rename Your Mac Before Enabling HTTPS

Before you create certificates, clean up your machine names in Tailscale: login.tailscale.com/admin/machines (opens in new tab).

Once both devices are signed in, the Tailscale admin console shows them as machines in your tailnet. Rename the Ollama Mac before issuing a certificate if the current name includes personal information.

Pick a short, boring machine name with no email address, company name, real name, or other personal information. Something like ollama-mac, home-ai, or gemma-box is enough.

This matters because Tailscale HTTPS uses public TLS certificates. Certificate Transparency logs are public, and the fully qualified machine name can appear there. Access to the device is still private inside your tailnet, but the name itself can be visible publicly.

Step 12: Enable MagicDNS and HTTPS

Open the Tailscale DNS admin page: login.tailscale.com/admin/dns (opens in new tab).

MagicDNS is usually enabled by default. If it is off, turn it on.

Then under HTTPS Certificates, select Enable HTTPS. Tailscale will warn you that machine names and your tailnet DNS name can appear in public certificate logs. That is why the previous step matters.

After this, your Mac gets a MagicDNS name that looks roughly like this:

ollama-mac.example-name.ts.net

Use your actual machine name and tailnet name, not that example.

Step 13: Put HTTPS in Front of Ollama

On the Mac where Ollama is running, first make sure Ollama answers locally:

curl -sS http://127.0.0.1:11434/v1/models

Then run:

tailscale serve --bg 11434

That tells Tailscale to create a private HTTPS reverse proxy for the local service on port 11434. Tailscale terminates HTTPS for your tailnet and forwards the request to Ollama on the same Mac.

Keep "Expose Ollama to the network" turned on (Step 5). With the macOS Tailscale app, I found the Serve proxy could not reach Ollama until network exposure was on. The macOS app runs as a sandboxed system extension, and in that sandbox it cannot reliably reach a loopback-only (127.0.0.1) service. Turning on network exposure makes Ollama listen on all interfaces (*:11434), which is what lets the Tailscale proxy connect. If tailscale serve status shows the proxy but requests hang or fail, this toggle is the first thing to check.

Understand the security tradeoff. Network exposure binds Ollama to every interface, including your local Wi-Fi or Ethernet, not just Tailscale. Ollama has no authentication, so on an untrusted network (a cafe, a shared apartment, an office LAN) anyone who can reach http://your-mac-ip:11434 can use your models and read your prompts. Tailscale adds HTTPS and tailnet access control for the *.ts.net path, but it does not lock down the raw port on your local network. A few ways to handle that:

  • Only enable network exposure when you actually need remote access, and turn it off when you are done. Run tailscale serve off at the same time.
  • On untrusted networks, turn on the macOS firewall and block incoming connections for Ollama, or stay on trusted networks (home) when the port is open.
  • Treat the open port as "anyone on this LAN can talk to my Ollama" and decide if that is acceptable wherever you are sitting.

Check what Tailscale is serving:

tailscale serve status

You should see a URL like:

https://ollama-mac.example-name.ts.net

The Cumbersome base URL is that HTTPS URL plus /v1:

https://ollama-mac.example-name.ts.net/v1

Step 14: Add the Remote Provider in Cumbersome

On the iPhone, keep Tailscale turned on. Then open Cumbersome and add a new OpenAI-compatible provider.

Use:

  • Provider Name: Ollama Remote or Local Remote
  • Base URL: https://ollama-mac.example-name.ts.net/v1
  • API key: leave it blank (or any string on older builds)

Replace ollama-mac.example-name.ts.net with the HTTPS MagicDNS name from tailscale serve status.

That is the missing iPhone step. Do not use the 100.x.y.z Tailscale IP in Cumbersome for this setup. Do not use http://machine-name:11434/v1. Use the HTTPS MagicDNS URL that Tailscale Serve prints, with /v1 at the end.

If it does not connect, check these in order:

  • Ollama is running on the Mac.
  • Tailscale is on and signed in on both devices.
  • The Mac is online in the Tailscale admin console.
  • tailscale serve status on the Mac shows the HTTPS proxy.
  • Cumbersome's base URL ends in /v1.
  • You used the full .ts.net HTTPS name, not the short MagicDNS name.

Gotcha: A Second DNS App Will Break MagicDNS

This one cost me time, so it gets its own heading. If the iPhone reports something like "a server with the specified hostname could not be found," the most likely cause is another DNS app fighting Tailscale for control of DNS on the phone.

I had NextDNS installed. NextDNS sets up its own DNS profile, and iOS let it win. So lookups for my *.tail06cbba.ts.net name never reached Tailscale's resolver, and the hostname looked like it did not exist. Tailscale routing itself was fine the whole time.

Here is how to tell them apart:

  • The .ts.net HTTPS URL fails to resolve, but the raw http://100.x.y.z:11434/v1/models URL loads (after the expected "not secure" HTTP warning). That means routing works and only name resolution is broken. That points at DNS, not Tailscale.

The fastest fix is to disable the other DNS app (NextDNS, AdGuard, Cloudflare 1.1.1.1, or any profile-based DNS or content blocker) while you use Tailscale. After I turned NextDNS off, the .ts.net name resolved immediately and Cumbersome connected.

If you want to keep a custom DNS provider running, do not stack two competing DNS profiles on the phone. Instead, add your provider as a global nameserver in the Tailscale admin console on the DNS page (opens in new tab). Tailscale then handles MagicDNS names like *.ts.net and forwards everything else to your provider, so the two stop fighting.

To turn this remote access off later:

tailscale serve off

Why All the HTTPS Work? (And Why There's No Shortcut Yet)

You might be wondering why you cannot just point Cumbersome at http://machine-name:11434 over Tailscale and skip the certificates entirely. The connection is already private inside your tailnet, so the extra HTTPS step feels redundant.

The reason is Apple's App Transport Security (ATS). On iOS, ATS blocks plain http:// requests by default, including requests to a local Ollama endpoint over Tailscale. Ollama is usually happy on local HTTP, but iOS treats that traffic as insecure and refuses it unless the app explicitly allows it. That is why the reverse proxy with a Tailscale-issued TLS cert is the reliable path today.

People ask why Cumbersome does not just ship an ATS exception so any local HTTP endpoint works out of the box. I went back and forth on this, and for now I am deliberately not adding that escape hatch.

Cumbersome sends your API keys and your full chat content to the provider endpoint. Allowing HTTP broadly would make that traffic readable on the network, which is a genuine privacy risk on untrusted networks. Apple's HTTPS-by-default posture exists for good reason, and I would rather match that conservative default than silently weaken security for everyone to save a few people some setup. It is easier to get security right from the start than to bolt it on after the fact. The HTTPS setup above is the right way to do this.

Bless up! 🙏✨