Endemic app welcome screen showing "Your Models, Your Device" with a feature list

Run Qwen 3.5 On Your iPhone. No Cloud. No API Keys.

Download an open-weight model and chat offline. Qwen 3.5 9B outscores models 13 times its size on reasoning benchmarks. Your prompts never leave your device.

iPhone, iPad, and Mac

Download Endemic on the App Store

On-Device Chat: Endemic running Qwen 3.5 locally on iPhone with no cloud connection

Model Catalog: Endemic model catalog showing Qwen 3.5 variants from 0.8B to 9B with device recommendations

Why Run AI On Your Own Hardware

Cloud AI apps charge monthly subscriptions and route your prompts through someone else's servers. You pay whether you use the service or not. You trust the provider with every conversation. And if their infrastructure goes down, you wait.

Endemic takes the opposite approach. You download an open-weight model once, and from that point on the model runs on your device's CPU and GPU. No request leaves your phone during a conversation. This is not "encrypted cloud" or "private API." The computation happens on the hardware in your hand.

The tradeoff is real: a 4B-parameter model on a phone will not match a frontier model running on datacenter GPUs. If you need the absolute best responses, cloud AI is the right tool. But for everyday questions, writing, brainstorming, and code help, Qwen 3.5 running locally produces genuinely useful results. And your prompts stay yours.

Qwen 3.5: Flagship-Class AI That Fits on a Phone

Alibaba's Qwen 3.5 Small series made on-device AI practical. The 9B-parameter model scores 81.7 on GPQA Diamond and 82.5 on MMLU-Pro, outperforming models more than 13 times its size. The 4B fits comfortably in 8GB of RAM. The 0.8B runs on any recent iPhone and responds fast enough for back-and-forth conversation. If you want to run Qwen 3.5 on your Mac through Ollama, Cumbersome handles that. Endemic puts the model directly on your phone.

Endemic ships with the full Qwen 3.5 lineup: 0.8B, 2B, 4B, and 9B. The app detects your device's RAM and free storage, then recommends the strongest model that fits. An iPhone with 8GB handles the 4B. iPads and Macs with 16GB or more run the 9B. You pick the one that works for your hardware.
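The recommendation step boils down to filtering the catalog by what the device can hold and picking the largest model that survives. The Swift sketch below illustrates that idea under stated assumptions; it is not Endemic's code. `ModelOption`, `catalog`, `recommend`, and `deviceRAMGB` are hypothetical names, and the numbers are illustrative: the 8GB and 16GB floors for the 4B and 9B follow the guidance above, while the other RAM floors and all file sizes are guesses. Free storage is passed in as a plain number; on Apple platforms it could plausibly come from the `volumeAvailableCapacityForImportantUsage` resource value.

```swift
import Foundation

/// Hypothetical catalog entry. Every number is an illustrative
/// assumption, not Endemic's real catalog data.
struct ModelOption {
    let name: String          // display label (hypothetical)
    let parametersB: Double   // parameter count in billions
    let minRAMGB: Double      // assumed minimum physical RAM, in GiB
    let fileSizeGB: Double    // assumed download size of a 4-bit GGUF, in GB
}

let catalog: [ModelOption] = [
    ModelOption(name: "Qwen3.5-0.8B", parametersB: 0.8, minRAMGB: 4, fileSizeGB: 0.6),
    ModelOption(name: "Qwen3.5-2B", parametersB: 2, minRAMGB: 6, fileSizeGB: 1.3),
    ModelOption(name: "Qwen3.5-4B", parametersB: 4, minRAMGB: 8, fileSizeGB: 2.6),
    ModelOption(name: "Qwen3.5-9B", parametersB: 9, minRAMGB: 16, fileSizeGB: 5.6),
]

/// Physical RAM in GiB, rounded to the nearest whole number because the
/// figure the OS reports can differ slightly from the marketed capacity.
func deviceRAMGB() -> Double {
    (Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824).rounded()
}

/// Returns the largest model whose RAM floor and download size both fit,
/// or nil when nothing in the catalog fits.
func recommend(ramGB: Double, freeStorageGB: Double,
               from catalog: [ModelOption]) -> ModelOption? {
    catalog
        .filter { ramGB >= $0.minRAMGB && freeStorageGB >= $0.fileSizeGB }
        .max { $0.parametersB < $1.parametersB }
}
```

Under these assumed numbers, `recommend(ramGB: 8, freeStorageGB: 20, from: catalog)` picks the 4B entry, and a 16GB machine with only 3GB free falls back to the 4B as well, because the 9B download would not fit. Ranking by parameter count rather than by list order keeps "strongest" tied to the model itself.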

Models are open-weight GGUF files downloaded from public model hosting. Each is a single download, stored locally and excluded from iCloud backup so it does not eat into your iCloud storage quota. No account, no sign-up, no approval process. Download and chat.
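On Apple platforms, Foundation exposes a standard way to keep a large file out of device backups: the `isExcludedFromBackup` resource value on a file URL. The sketch below shows that call; it assumes, without confirmation from this page, that Endemic uses this mechanism, and `excludeFromBackup` is a hypothetical helper name. The `#if canImport(Darwin)` guard turns it into a no-op on non-Apple systems, where the flag has no meaning.

```swift
import Foundation

/// Hypothetical helper: marks a downloaded model file so iCloud backups skip it.
/// Uses Foundation's URLResourceValues.isExcludedFromBackup (Apple platforms only).
func excludeFromBackup(_ fileURL: URL) throws {
    #if canImport(Darwin)
    var url = fileURL                 // setResourceValues is a mutating method
    var values = URLResourceValues()
    values.isExcludedFromBackup = true
    try url.setResourceValues(values)
    #endif
}
```

The flag applies to the file on disk, so the model stays available locally and offline; it is only skipped when the device is backed up.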

What You Get

On-Device Inference

The model runs on your CPU and GPU. No request leaves your device during a conversation. This is local computation, not "private cloud."

iCloud Sync for Conversations

Conversations sync through your personal iCloud across iPhone, iPad, and Mac. Start a chat on your phone, continue on your laptop. No account, no backend.

Private by Architecture

There is no Folding Sky backend. No analytics on your conversations. Prompts and completions stay on your hardware. iCloud is your container, not ours.

Device-Aware Model Catalog

Endemic detects your hardware and recommends the strongest model that fits. No guessing about file sizes or RAM requirements.

Download Once, Chat Offline

Download a model over Wi-Fi, then use it anywhere. Airplane mode, subway, backcountry. No API keys, no subscriptions, no internet required after the initial download.

Edit Messages and Switch Personas

Edit any message in a conversation. Set system prompts to shape the model's behavior. The same polished chat UX from Cumbersome, running entirely on your device.

Why I Built Endemic

I built Cumbersome for people who want direct API access to cloud AI. But some conversations should not touch a server at all. Journal entries, personal brainstorming, sensitive drafts. I wanted an option where the model lived on my phone and nothing left the device.

I also wanted iCloud sync. Other local LLM apps treat each device as an island: you run a model on your phone, but the conversation is stuck there. Endemic syncs conversations through your personal iCloud container, so you can start a thread on your iPhone and pick it up on your Mac. No backend, no account creation. Just Apple's built-in sync.

When Alibaba released the Qwen 3.5 Small series, the math changed. A 4B model that scores above much larger models on reasoning benchmarks, running at conversational speed on an iPhone. That turned "local AI on a phone" from a novelty into a practical tool.

Endemic is a free app that runs open-weight models, built by one person in Beaverton, Oregon.

See It in Action

Mac: Endemic running on Mac, showing the chat interface with sidebar and a local Qwen 3.5 model

Built by Peter Bray. Bootstrapper in Beaverton. He sometimes takes on outside work.