February 23, 2026
I built Cumbersome, an iOS and Mac AI client that talks to APIs from the leading providers all day long. I see the good, the bad, and the ugly from direct providers as well as the newer multi-providers like Vercel AI Gateway and OpenRouter. I recently adjusted the "recommended" provider in the app Settings. It says it all.
Cumbersome gives you a lot of options for adding direct AI API keys, as well as "multi-providers." After a long stretch in the trenches, we now solidly recommend that folks use OpenRouter.
Two days ago I published a balanced comparison of Vercel AI Gateway and OpenRouter. I recommended choosing based on which providers you needed ZDR for and how much you spent. After spending the weekend going deeper with both gateways (and reflecting on months of building against every major provider's raw API), I have changed my mind. I now think most people managing their own API keys should route everything through OpenRouter instead of juggling direct provider keys.
The 5.5% platform fee is worth what you get.
Why Use a Multi-Provider Gateway at All
If you are reading this, you probably already manage your own API keys for OpenAI, Anthropic, or Google AI Studio. You know the advantages over subscriptions: pay-per-use pricing, model control, no subscription trap.
But here is what I have noticed after months of daily use. The AI model landscape is a rotating cast. Last month GPT-5.2 was my default for most tasks. This month Claude Sonnet 4.6 handles certain work better. Kimi K2.5 showed up recently and it handles high-volume workloads at a fraction of the cost. The model comparison I wrote a few weeks ago is already partially outdated because new models keep shipping.
Models are becoming commodities. SOTA changes week to week. If you are like me, you are constantly swapping between providers as new models drop and benchmarks shift. That means managing separate API keys for each provider, separate credit balances, separate usage dashboards, and separate billing cycles.
A multi-provider gateway gives you one API key that covers all of them. You get a single balance, a single dashboard, and a single place to manage spending. Cumbersome already makes it easy to switch between providers mid-conversation, but using a gateway underneath means you configure one key and access everything.
Why OpenRouter Specifically
I have used both OpenRouter and Vercel AI Gateway. Both are legitimate gateways. But OpenRouter wins on the features that matter in daily use:
- Zero Data Retention that actually covers OpenAI. No enterprise agreement required.
- Per-key spending limits with daily reset. Cap your exposure if a key leaks or a bug runs away.
- Guardrails and provider restrictions. Control which models and providers each key can access.
- Unified web search across providers. One integration, every model family (with caveats).
- Standardized thinking and reasoning. Extended thinking works across OpenAI and Anthropic through a single code path.
- One dashboard for all spending. Every provider, every model, one view.
Zero Data Retention That Actually Covers OpenAI
This is the big one.
OpenRouter offers Zero Data Retention across a broad set of endpoints, including OpenAI. Vercel AI Gateway's ZDR list does not include OpenAI. If you use GPT-5.2 and want ZDR, OpenRouter is the only gateway option.
Stored Data Is Discoverable Data
Most providers say they will not train on your API data and will not store it beyond some retention period (often 30 days, sometimes longer). That sounds reasonable until you think about what "stored for 30 days" actually means in practice.
In December 2025, a federal judge ordered OpenAI to hand over 20 million ChatGPT chat logs to the New York Times in a copyright lawsuit. OpenAI fought to keep them secret and lost. The logs existed because OpenAI stores them. "We delete after 30 days" does not protect you when a court orders preservation before those 30 days are up. Stored data is discoverable data, regardless of what the privacy policy promises.
Zero Data Retention is fundamentally different. ZDR means no storage beyond the brief in-memory caching needed to process your request. There is nothing to subpoena because nothing was ever saved. It is not "we will delete it soon." It is "we never stored it."
You could get ZDR from providers directly, but that usually requires an enterprise agreement with legal review and enough API volume to justify the special treatment. Consumer apps and direct API usage without that deal still mean 30 days or more of retention. Enterprise ZDR agreements require the kind of leverage that individual developers and small teams do not have. OpenRouter negotiated these agreements on your behalf. You enable ZDR once at the account level and every request routes only to compliant endpoints. In Cumbersome, you set your OpenRouter key, enable ZDR in your OpenRouter account, and it just works.
OpenRouter's ZDR toggle. Enable it once at the account level and every request from Cumbersome routes to zero data retention endpoints only. No enterprise agreement required.
Your Prompts Are Not as Private as You Think
"What do you have to hide?" is the wrong question. The risk is not just about what you are doing with AI today. It is about what happens to stored data tomorrow. Police are already obtaining warrants for reverse keyword searches, asking Google to reveal everyone who searched for specific terms in a given time window. Courts are upholding this practice. It is not a stretch to imagine the same approach applied to AI prompts. You searched for information about a topic that later became part of an investigation. Your "deleted" data turns out to have been preserved on a backup somewhere. Now you are explaining yourself.
And it is not just law enforcement. Courts are ruling that AI prompts are discoverable in litigation and that conversations with consumer AI tools are not protected by attorney-client privilege. A federal judge in the Southern District of New York held this month that a defendant's AI-generated documents were neither privileged nor work product, in part because the AI provider's privacy policy reserved the right to collect inputs and share data with third parties. If the provider stores your prompts, those prompts can be subpoenaed, discovered, and used against you.
Beyond legal exposure, stored data is a target. Breaches happen. Provider databases get compromised. Data ends up on the dark web where adversarial actors can profile you, craft targeted scams, impersonate you, or use your own words to social-engineer access to your accounts. The less data that exists about your AI usage, the smaller your attack surface. ZDR does not just protect your privacy from the provider. It eliminates an entire category of risk.
Per-Key Spending Limits with Daily Reset
If you have ever worried about an API key leaking (or a client app going haywire and burning through credits), OpenRouter has a practical answer. When you create an API key, you can set a credit limit that resets on a schedule: daily, weekly, or monthly.
Set a $5 daily limit. If the key gets compromised or a bug sends a runaway loop of requests, the damage caps at $5 before the key stops working. Next day, it resets and you are back to normal.
Per-key spending limits with daily reset. If a key leaks, the damage is capped.
Guardrails go further. You can restrict specific keys to specific models, set budget caps, and control which providers a key can access. This matters when you are working with expensive models. Some reasoning models cost over $100 per million output tokens. An accidental loop against one of those gets expensive fast. Guardrails put a ceiling on it.
Guardrails restrict which models a key can access, set budget caps, and control routing. A firewall for your AI spending.
Lock Down Your Providers
Guardrails also let you lock down which providers handle your requests. OpenRouter does not always call the model owner directly. It routes through third-party providers: some are gold standard (Azure, Google Vertex, Amazon Bedrock) with SOC 2 and trust centers; others are smaller and harder to verify. ZDR only means something if the provider on the other end actually follows through. With open-source models (DeepSeek, Qwen, etc.), anyone can host. Commercial models go through vetted providers; open-weight models can be served by anyone. Lock down your provider list. OpenRouter lets you do this at the account level under Provider Restrictions, or per guardrail. I selected only US-based, large, SOC 2-compliant providers with published trust centers.
My provider allowlist on OpenRouter.
This list is separate from ZDR. The ZDR toggle routes to zero-retention endpoints regardless. When I use models that do not offer ZDR (some open-source or newer releases), I still want trusted providers. This allowlist does that. Feel free to crib from it. SOC 2 can feel like process theater, but I want providers that have documented controls and something to lose if they cut corners. Net effect: requests go to US providers with demonstrated data-handling practices, and lower latency if you are stateside.
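If you would rather pin providers per request instead of relying only on the account-level setting, OpenRouter's chat completions endpoint accepts a provider preferences object in the request body. Here is a minimal TypeScript sketch; the field names (only, allow_fallbacks) follow OpenRouter's provider routing docs at the time of writing, so verify against the current API reference before relying on them:

```typescript
// Hypothetical helper: attach a provider allowlist to an OpenRouter
// chat-completions request body. Field names follow OpenRouter's
// provider-routing docs; check the current reference.
type ChatBody = {
  model: string;
  messages: { role: string; content: string }[];
  provider?: { only: string[]; allow_fallbacks: boolean };
};

function withProviderAllowlist(body: ChatBody, providers: string[]): ChatBody {
  return {
    ...body,
    // Restrict routing to these providers, and fail rather than fall
    // back to an unvetted host if none of them can serve the model.
    provider: { only: providers, allow_fallbacks: false },
  };
}

const allowlisted = withProviderAllowlist(
  {
    model: "deepseek/deepseek-chat",
    messages: [{ role: "user", content: "Hello" }],
  },
  ["azure", "google-vertex", "amazon-bedrock"],
);
// `allowlisted` is then POSTed to /api/v1/chat/completions as usual,
// with your OpenRouter key in the Authorization header.
```

The account-level restriction is still the safer default, since it applies even when a client forgets to set the per-request preference.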
How ZDR Routes Through Providers
For top commercial models like GPT-5.2 and Claude Sonnet 4.6, ZDR through OpenRouter is often not directly through OpenAI or Anthropic themselves. OpenRouter does not appear to have established direct ZDR agreements with those companies. Instead, ZDR routing for these models typically goes through Azure or Google Vertex, where OpenRouter has ZDR agreements in place. This is another reason locking down your provider list matters: you want to make sure ZDR requests land on providers that actually have those agreements, not on a third party that might be serving the model without one.
Web Search That Works Across Providers (With a Big Caveat)
This is a developer concern that translates into a user benefit, but the user experience has real limitations you should know about.
The good part.
AI providers all implement web search differently. OpenAI has their own tool calling format. Anthropic has theirs. Google has Grounding. Each requires different code paths, different parameter handling, and different response parsing. OpenRouter standardized this with their web search plugin. It works consistently across model families. I added support for it in Cumbersome and it worked on the first try. When I tried implementing Perplexity web search through Vercel AI Gateway, I had to revert it because of compatibility issues with different provider APIs.
OpenRouter also gives you two search engine options. For OpenAI, Anthropic, Perplexity, and xAI models, it can use their native built-in search. For everything else (or by your choice), it uses Exa, an independent search engine that combines keyword and embedding-based search. This separation of concerns is genuinely useful. It means models like Kimi K2.5, which have no native web search, get search capabilities through OpenRouter. And you can force Exa even on models that have native search if you want a different perspective on the results.
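As a concrete sketch, here is roughly what a request body with the web plugin attached looks like. The plugin shape (id: "web", engine, max_results) follows OpenRouter's web search docs at the time of writing, and the model slug is illustrative; check the current reference before copying this:

```typescript
// Sketch of an OpenRouter chat-completions body with the web search
// plugin attached. Field names follow OpenRouter's web search docs;
// verify against the current API reference.
const requestWithSearch = {
  model: "moonshotai/kimi-k2.5", // illustrative slug; Kimi has no native search
  messages: [
    { role: "user", content: "What changed in the EU AI Act this month?" },
  ],
  plugins: [
    {
      id: "web",
      engine: "exa",  // force Exa even on models with native search
      max_results: 5, // cap injected results to limit token cost
    },
  ],
};
```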
The not-so-good part.
The way OpenRouter implements search has a fundamental design problem that affects daily use. Whether you choose native search or Exa, OpenRouter forces a search on every request. The model never decides whether a search is needed. A preprocessor runs before the model sees your message, searches based on your most recent text, injects the results into the context, and then hands everything to the model. Every single time.
This defeats the purpose of native search for models that have it. GPT-5.2, Claude, Gemini: these models can decide on their own when a web search would help and when it would not. When you use them through their own APIs, the model sees your message first and only triggers a search if the question warrants one. Through OpenRouter, that judgment is stripped away. The preprocessor searches regardless, which means you are paying for search tokens on every request, including ones where the model would have known search was unnecessary. Ask "tell me a story" and OpenRouter searches the web for "tell me a story" before the model even sees your prompt.
This causes three real problems:
- Search always fires, even when it should not. If you are mid-conversation and type a follow-up like "tell me more" or "crazy story," OpenRouter searches the web for those exact words. It does not consider the conversation context. It does not know you are referring to something discussed three messages ago.
- The model gets flooded with irrelevant data. Because the search results are injected before the model processes anything, the AI has to integrate a pile of web results that may have nothing to do with what you actually asked. This confuses the model and degrades response quality.
- It slows everything down. The web search runs on every request, adding latency even when you do not need fresh information from the web. There is no way for the model to skip the search step.
Here is a concrete example from Cumbersome. I asked GPT-5.2 (via OpenRouter) to summarize the plot of the TV show 56 Days. With search enabled, it nailed it. Then I followed up with "crazy story" (meaning the plot I just read about). In the first screenshot, search is off for the follow-up. The AI correctly continues the conversation about 56 Days, calling it a "body found, who did it" setup with identity deception.
Search off for the follow-up. The AI stays on topic and discusses the 56 Days plot.
In the second screenshot, I used Cumbersome's "replay from here" to test search-on from the same conversation point. OpenRouter's preprocessor sees "crazy story," searches the web for those words, and injects results about King Von's rap single "Crazy Story." The model dutifully summarizes the song instead of continuing the conversation.
Search on for the follow-up. The preprocessor searches for "crazy story" out of context, finds a rap song, and the AI runs with it. The conversation about 56 Days is gone.
This is not a rare edge case. It happens any time a follow-up message is short or ambiguous. The search preprocessor has no awareness of conversation history. It treats every message as a standalone query.
The fix in Cumbersome.
I built a feature called "Only Search When Requested" that solves this. When enabled, web search only activates when your message explicitly mentions "search" or "crawl." Follow-up messages like "tell me more" or "crazy story" go straight to the model without triggering a web search. You get the full power of OpenRouter's unified search when you want it, and clean model responses when you do not.
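Cumbersome's actual implementation is not public, but the idea is simple enough to sketch. This hypothetical shouldSearch helper gates the web plugin on explicit trigger words, so ambiguous follow-ups never fire a search:

```typescript
// Minimal sketch of keyword-triggered search, in the spirit of
// "Only Search When Requested". The trigger words and helper names
// are illustrative, not Cumbersome's real code.
const SEARCH_TRIGGERS = ["search", "crawl"];

function shouldSearch(message: string): boolean {
  const lower = message.toLowerCase();
  return SEARCH_TRIGGERS.some((t) => lower.includes(t));
}

function buildBody(model: string, message: string) {
  return {
    model,
    messages: [{ role: "user", content: message }],
    // Attach the web plugin only when the user explicitly asked for it.
    ...(shouldSearch(message) ? { plugins: [{ id: "web" }] } : {}),
  };
}
```

With this gate in place, "crazy story" goes straight to the model, while "search for 56 Days reviews" gets the full web plugin treatment.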
"Only Search When Requested" in Cumbersome. Web search stays available but only fires when you explicitly ask for it. No more King Von interrupting your TV show conversations.
OpenAI and Anthropic are already agentic: their native tool-calling search lets the model decide when a web search adds value. Oddly, OpenRouter seems to bypass that. It either forces Exa when you want native search, or forces native providers to search on every request instead of letting the model choose. Until OpenRouter's plugin approach supports true agentic search, keyword triggering in Cumbersome is a practical workaround that eliminates the worst failure mode.
This is still one of OpenRouter's stronger features relative to managing provider APIs yourself. The unified search interface saved me weeks of integration work. And with Cumbersome's keyword triggering on top, you get the benefits without the noise.
Thinking and Reasoning Across Providers
API standardization is where OpenRouter consistently delivers. Extended thinking (sometimes called "reasoning") is another case where every provider does things differently. OpenAI's o3, Anthropic's Claude with extended thinking, and other reasoning models each have different APIs for how they stream thinking content.
OpenRouter standardized this too. I added reasoning support for OpenRouter in Cumbersome and it works across OpenAI and Anthropic reasoning models through a single code path. Supporting each provider's native thinking API separately is significantly more complex, and it means users wait longer for new reasoning models to be supported.
With OpenRouter handling the API differences, Cumbersome users get thinking and reasoning support across more models, faster.
The rough edges.
OpenRouter's reasoning abstraction works great until you try to turn it off. We hit a bug where Claude Opus 4.6 kept returning <thinking> tags even with AI Reasoning toggled off. We were sending reasoning: { effort: "none" }, which disables reasoning for OpenAI models. It does nothing for Anthropic.

The issue: effort is only defined for OpenAI and Grok. Anthropic models need reasoning: { enabled: false }, which OpenRouter maps to the native thinking: { type: "disabled" }. The docs do not make this clear. We now detect the anthropic/ prefix and send the right payload for each model family.

If your "off" signal is being ignored, check that you are using enabled: false for Anthropic models and effort: "none" for everything else. The unified API is convenient until it silently fails.
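Putting that fix into code, a small helper that picks the right "reasoning off" payload per model family might look like this. This is a sketch based on our findings, not OpenRouter's official guidance; the function name and type are hypothetical:

```typescript
// Hypothetical helper: build the "reasoning off" payload for the
// OpenRouter reasoning parameter, per model family.
type ReasoningPayload = { effort: string } | { enabled: boolean };

function reasoningOff(model: string): ReasoningPayload {
  if (model.startsWith("anthropic/")) {
    // Anthropic models ignore `effort`; they need an explicit disable,
    // which OpenRouter maps to the native thinking: { type: "disabled" }.
    return { enabled: false };
  }
  // `effort` is defined for OpenAI- and Grok-style models.
  return { effort: "none" };
}
```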
One Dashboard for All Spending
Instead of checking OpenAI's usage page, then Anthropic's, then Google's, you see everything in one place: spend by model, request counts, and token usage over time.
All spending, requests, and token usage across every provider in one view. This month: $3.59 across Kimi K2.5, GPT-5.2, and Claude Sonnet.
The Cost: 5.5% Platform Fee
OpenRouter charges a 5.5% service fee on top of provider token prices. That is transparent and visible when you purchase credits.
$100 in credits costs $105.50. The 5.5% is the price of ZDR, spending limits, guardrails, and a unified API.
For context: if you spend $10/month on AI (typical for many API key users), the OpenRouter fee is 55 cents. Less than a dollar for ZDR, spending limits, and a consolidated dashboard. At $100/month, it is $5.50.
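The arithmetic is simple enough to write down, since the fee is a flat percentage on top of what you put in:

```typescript
// OpenRouter's platform fee as described above: 5.5% on top of
// provider token prices when you purchase credits.
const FEE_RATE = 0.055;

function creditCost(providerSpend: number): number {
  return providerSpend * (1 + FEE_RATE);
}

// $10 of provider spend costs $10.55; $100 costs $105.50.
```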
Vercel AI Gateway is cheaper at roughly 3% in payment processing fees, but it does not cover OpenAI for ZDR, lacks spending limits and guardrails, and the API standardization is not as mature for features like web search and reasoning.
Is 5.5% nothing? No. For someone spending $500/month, that is $27.50. But consider what you get: ZDR that would otherwise require an enterprise contract, automatic spending caps that protect you from runaway costs, and an API layer that handles the growing complexity of multi-provider integration. For most individual users and small teams, the math works.
Why You Can Trust Me on This
I built Cumbersome, an iOS and Mac app that connects directly to AI provider APIs. Every day I am in the trenches integrating with OpenAI, Anthropic, Google AI Studio, OpenRouter, and Vercel AI Gateway. I debug streaming responses, implement new model features, and test across providers. It is my full-time job to understand how these APIs actually work in practice.
I have no relationship with OpenRouter. They do not pay me. I do not get a referral fee. Two days ago I published a post that recommended choosing between Vercel and OpenRouter based on your needs. After more hands-on time, I am updating that recommendation because the evidence convinced me.
OpenRouter is the better default gateway for most people. The broader ZDR coverage, spending limits, guardrails, and API standardization are worth the 5.5%.
How to Set It Up in Cumbersome
- Create an OpenRouter account at openrouter.ai.
- Purchase credits or start with their free tier.
- Create an API key with a spending limit. I recommend a daily cap as a safety net.
- Enable ZDR in your privacy settings if you want zero data retention.
- Add the key in Cumbersome under Settings. It sits alongside your direct provider keys.
That is it. One key, hundreds of models, ZDR, and spending limits. You can still keep your direct OpenAI and Anthropic keys configured for comparison or for the rare case where you want the absolute cheapest token cost. But the OpenRouter key covers everything.
The Bottom Line
The case for using your own API keys instead of AI subscriptions has not changed. Pay per use, pick your model, keep your data private.
What has changed is my recommendation for how to manage those keys. Instead of juggling separate keys for every provider, route everything through OpenRouter. The 5.5% fee buys you zero data retention (including on OpenAI models), per-key spending limits that reset daily, guardrails that cap your exposure to expensive models, and a unified API that handles the messy differences between providers.
For the handful of cases where you need the absolute cheapest token cost and nothing else matters, keep a direct key. For everything else, OpenRouter is the better default.
Try It
Cumbersome is free for iPhone, iPad, and Mac. Add your OpenRouter key and you have access to hundreds of models with one key. Enable ZDR for privacy. Set spending limits for peace of mind. You pay the providers (plus 5.5%), not us.
Bless up! 🙏✨