
Local Qwen 3.5 and Gemma 4 on iPhone. No Cloud. No API Keys.

Download an open-weight model and chat offline. Attach a photo and the model sees it on-device. Qwen 3.5 9B outscores models more than 13 times its size on reasoning benchmarks. Conversations sync through your iCloud. Inference stays on your device.

iPhone, iPad, and Mac


Why Run AI On Your Own Hardware

Cloud AI apps charge monthly fees and route your prompts through someone else's servers. You pay whether you use the service or not. You trust the provider with every conversation. If their infrastructure goes down, you wait.

Endemic runs open-weight models on your device's CPU and GPU. Download once, chat offline. Attach a photo and vision stays local too. Inference runs on your hardware, not on someone else's servers. Conversations sync through your personal iCloud when you want them on every device.

A 4B parameter model on a phone won't match a frontier model on datacenter GPUs. If you need the absolute best responses, cloud AI is the right tool. For everyday questions, writing, brainstorming, and code help, Qwen 3.5 and Gemma 4 running locally give you useful answers. Your conversations sync through iCloud, not Folding Sky's servers.

Qwen 3.5 and Gemma 4: Open Models That Fit Your Device

Alibaba's Qwen 3.5 Small series made on-device AI practical. The 9B scores 81.7 on GPQA Diamond and 82.5 on MMLU-Pro, outperforming models over 13 times its size. The 4B fits comfortably in 8GB of RAM. The 0.8B runs on any recent iPhone and keeps up in back-and-forth chat. If you want to run Qwen 3.5 on your Mac through Ollama, Cumbersome handles that. Endemic puts the model directly on your phone.

Google's Gemma 4 lineup adds another option: E2B and E4B for phones and tablets, plus larger tiers for Macs with enough RAM. Both families support on-device vision. Attach a photo and the model reads it locally through the same GGUF stack, no upload step.

Endemic ships the full Qwen 3.5 catalog (0.8B through 9B) plus Gemma 4. The app reads your device's RAM and available storage, then recommends the strongest model that fits. An iPhone with 8GB handles the 4B. iPads and Macs with 16GB or more run the 9B. You pick what works for your hardware.
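
The recommendation step described above can be sketched in a few lines. This is a minimal illustration, not Endemic's actual code: the `ModelOption` type, the `recommend` function, and the catalog are hypothetical. Only the 8GB floor for the 4B and the 16GB floor for the 9B come from the text. The 0.8B floor and all download sizes are placeholder assumptions, and parameter count stands in for "strongest".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str
    params_billions: float  # used here as a rough proxy for model strength
    min_ram_gb: float       # RAM the device should have to run the model
    download_gb: float      # size of the model file on disk (placeholder values)


# Hypothetical catalog. The 8 GB and 16 GB floors follow the text above;
# the 0.8B floor and every download size are illustrative assumptions.
CATALOG = [
    ModelOption("Qwen 3.5 0.8B", 0.8, 4.0, 1.0),
    ModelOption("Qwen 3.5 4B", 4.0, 8.0, 3.0),
    ModelOption("Qwen 3.5 9B", 9.0, 16.0, 6.0),
]


def recommend(
    ram_gb: float,
    free_storage_gb: float,
    catalog: Sequence[ModelOption] = CATALOG,
) -> Optional[ModelOption]:
    """Return the largest model that fits both RAM and free storage, or None."""
    fitting = [
        m for m in catalog
        if m.min_ram_gb <= ram_gb and m.download_gb <= free_storage_gb
    ]
    # max() with default=None covers the case where nothing fits.
    return max(fitting, key=lambda m: m.params_billions, default=None)
```

With these placeholder numbers, `recommend(8, 64)` returns the 4B entry and `recommend(16, 64)` returns the 9B entry, matching the examples above. A device short on free storage falls back to a smaller model even when its RAM would allow a larger one.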

Models are open-weight GGUF files from public hosting. Each one is downloaded once, stored locally, and excluded from iCloud backup so it doesn't eat your iCloud storage quota. No account, no sign-up. Download and chat.

What You Get

On-Device Inference

The model runs on your CPU and GPU. Attach a photo and vision runs through the same local stack. Each reply is computed on your device, not sent to a cloud model provider.

iCloud Sync for Conversations

Conversations sync through your personal iCloud across iPhone, iPad, and Mac. Start a chat on your phone, continue on your laptop. No account, no backend.

Private by Architecture

There is no Folding Sky backend. No analytics on your conversations. Conversations sync through your personal iCloud. Inference stays on your hardware.

Device-Aware Model Catalog

Endemic detects your hardware and recommends the strongest model that fits. No guessing about file sizes or RAM requirements.

Download Once, Chat Offline

Download a model over Wi-Fi, then use it anywhere. Airplane mode, subway, backcountry. No API keys and no internet required after the initial download.

Edit Messages and Switch Personas

Edit any message in a conversation. Set system prompts to shape the model's behavior. The same polished chat UX from Cumbersome, running entirely on your device.

Why I Built Endemic

I built Cumbersome for people who want direct API access to cloud AI. But some conversations should not touch a server at all. Journal entries, personal brainstorming, sensitive drafts. I wanted an option where the model ran on my phone and inference stayed local.

I also wanted iCloud sync. Other local LLM apps treat each device as an island: you run a model on your phone, but the conversation is stuck there. Endemic syncs conversations through your personal iCloud container, so you can start a thread on your iPhone and pick it up on your Mac. No backend, no account creation. Just Apple's built-in sync.

When Alibaba released the Qwen 3.5 Small series, the math changed. Suddenly a 4B model could score above much larger models on reasoning benchmarks and still run at conversational speed on an iPhone. I've since added Gemma 4 and on-device vision, but the core bet is the same: models that stay on your hardware.

Endemic runs open-weight models on your device. It is built by one person in Beaverton, Oregon.


Built by Peter Bray. Bootstrapper in Beaverton. He sometimes takes on outside work.