On-Device Inference
The model runs on your CPU and GPU. Attach a photo and vision runs through the same local stack. Each reply is computed on your device, not sent to a cloud model provider.
Download an open-weight model and chat offline. Attach a photo and the model sees it on-device. Qwen 3.5 9B outscores models 13 times its size on reasoning benchmarks. Conversations sync through your iCloud. Inference stays on your device.
iPhone, iPad, and Mac
Cloud AI apps charge monthly fees and route your prompts through someone else's servers. You pay whether you use the service or not. You trust the provider with every conversation. If their infrastructure goes down, you wait.
Endemic runs open-weight models on your device's CPU and GPU. Download once, chat offline. Attach a photo and vision stays local too. Inference runs on your hardware, not on someone else's servers. Conversations sync through your personal iCloud when you want them on every device.
A 4B-parameter model on a phone won't match a frontier model on datacenter GPUs. If you need the absolute best responses, cloud AI is the right tool. For everyday questions, writing, brainstorming, and code help, Qwen 3.5 and Gemma 4 running locally give you useful answers. Your conversations sync through iCloud, not Folding Sky's servers.
Alibaba's Qwen 3.5 Small series made on-device AI practical. The 9B scores 81.7 on GPQA Diamond and 82.5 on MMLU-Pro, outperforming models over 13 times its size. The 4B fits comfortably in 8GB of RAM. The 0.8B runs on any recent iPhone and keeps up in back-and-forth chat. If you want to run Qwen 3.5 on your Mac through Ollama, Cumbersome handles that. Endemic puts the model directly on your phone.
Google's Gemma 4 lineup adds another option: E2B and E4B for phones and tablets, plus larger tiers for Macs with enough RAM. Both families support on-device vision. Attach a photo and the model reads it locally through the same GGUF stack, no upload step.
Endemic ships the full Qwen 3.5 catalog (0.8B through 9B) plus Gemma 4. The app reads your device's RAM and available storage, then recommends the strongest model that fits. An iPhone with 8GB handles the 4B. iPads and Macs with 16GB or more run the 9B. You pick what works for your hardware.
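As a rough illustration of that recommendation step, here is a minimal Python sketch. It is not Endemic's actual code. The 8 GB and 16 GB RAM floors come from the text above; the 4 GB floor for the 0.8B, the file sizes, and the names `ModelOption`, `CATALOG`, and `recommend` are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str
    min_ram_gb: float    # RAM floor (8 and 16 from the text; 4 is assumed)
    file_size_gb: float  # hypothetical download size, for illustration only


# Ordered weakest to strongest. Sizes are placeholders, not real figures.
CATALOG = [
    ModelOption("Qwen 3.5 0.8B", min_ram_gb=4.0, file_size_gb=1.0),
    ModelOption("Qwen 3.5 4B", min_ram_gb=8.0, file_size_gb=3.0),
    ModelOption("Qwen 3.5 9B", min_ram_gb=16.0, file_size_gb=6.0),
]


def recommend(
    ram_gb: float,
    free_storage_gb: float,
    catalog: Sequence[ModelOption] = CATALOG,
) -> Optional[ModelOption]:
    """Return the strongest model that fits both RAM and free storage."""
    fitting = [
        m for m in catalog
        if ram_gb >= m.min_ram_gb and free_storage_gb >= m.file_size_gb
    ]
    # The catalog is ordered weakest to strongest, so the last fit wins.
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for ram, free in [(8, 64), (16, 64), (16, 2)]:
        choice = recommend(ram, free)
        print(ram, free, choice.name if choice else "no model fits")
```

Checking storage as well as RAM matters in this sketch: a 16 GB device with only 2 GB free falls back to the smallest model instead of being offered a download that cannot complete.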
Models are open-weight GGUF files from public hosting. One download, stored locally, excluded from iCloud backup so it doesn't eat your storage quota. No account, no sign-up. Download and chat.
Conversations sync through your personal iCloud across iPhone, iPad, and Mac. Start a chat on your phone, continue on your laptop. No account, no backend.
There is no Folding Sky backend. No analytics on your conversations. Conversations sync through your personal iCloud. Inference stays on your hardware.
Endemic detects your hardware and recommends the strongest model that fits. No guessing about file sizes or RAM requirements.
Download a model over Wi-Fi, then use it anywhere. Airplane mode, subway, backcountry. No API keys and no internet required after the initial download.
Edit any message in a conversation. Set system prompts to shape the model's behavior. The same polished chat UX from Cumbersome, running entirely on your device.
I built Cumbersome for people who want direct API access to cloud AI. But some conversations should not touch a server at all. Journal entries, personal brainstorming, sensitive drafts. I wanted an option where the model ran on my phone and inference stayed local.
I also wanted iCloud sync. Other local LLM apps treat each device as an island: you run a model on your phone, but the conversation is stuck there. Endemic syncs conversations through your personal iCloud container, so you can start a thread on your iPhone and pick it up on your Mac. No backend, no account creation. Just Apple's built-in sync.
When Alibaba released the Qwen 3.5 Small series, the math changed: a 4B model that scores above much larger models on reasoning benchmarks and runs at conversational speed on an iPhone. I've since added Gemma 4 and on-device vision, but the core bet is the same: models that stay on your hardware.
Endemic runs open-weight models on your device, built by one person in Beaverton, Oregon.

Built by Peter Bray. Bootstrapper in Beaverton. He sometimes takes on outside work.