
Replacing Server-Side AI Search with iOS 26's New Headless Browser

iOS 26 and macOS 26 quietly shipped a headless browser: a SwiftUI WebKit API called WebPage that loads pages, runs JavaScript, and returns rendered HTML without any view on screen. We use it in Cumbersome to give every AI provider on-device web access, no matter whether they offer server-side search or not.

iOS 26's SwiftUI WebPage API doing an on-device web search: user prompt, DDG Lite loading in the in-app browser, tool call and extracted results back in the chat, AI answer streaming in. WebPage can run fully headless with one toggle, but the modal is visible here so you can see the API at work.

If you watched the WWDC25 session called "Meet WebKit for SwiftUI", you probably remember it as a pleasant tour of scroll positions, find-in-page modifiers, and a demo app about lakes. You may not have caught the line at 4:08:

WebPage is a brand new Observable class that represents your web content. It's been designed from the ground up to work perfectly with Swift and SwiftUI. You use WebPage to load, control, and communicate with web content. WebPage can be used completely on its own. But when combined with WebView, you can build rich experiences for your web content.

"Can be used completely on its own" is doing a lot of work in that sentence. What Apple is saying is: you can instantiate a WebPage, tell it to load a URL, observe its navigation events, run JavaScript against the loaded document, and read the rendered HTML back out, without putting a view anywhere. No UIViewRepresentable wrapper. No UIKit import. No hidden-WKWebView-in-a-zero-size-frame tricks.

In other words, a headless browser. iOS 26 and macOS 26 shipped one, and we wired it into Cumbersome, our iPhone and Mac AI client.

What WebPage actually is

If you've ever embedded a web view in an iOS app, you know the old story: import WebKit, wrap a WKWebView in a UIViewRepresentable, forward delegate callbacks into Combine or async streams, and accept that your SwiftUI app now has a UIKit seam running through it.

WebPage replaces all of that. It's a plain Observable Swift class with four things worth knowing about:

  1. Create one. WebPage(). No configuration required for the default case.
  2. Load a URL. webPage.load(URLRequest(url:)).
  3. Observe state. isLoading, estimatedProgress, title, url, and currentNavigationEvent are all observable properties, same as any other SwiftUI state.
  4. Run JavaScript. await webPage.callJavaScript("...") returns an optional Any. Cast it to what you expect.

There's also a WebPage.NavigationDeciding protocol for per-request policy (allow, cancel, redirect to the system browser), useful for intercepting external links.
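Put together, those four pieces are already a complete headless fetch. A minimal sketch, assuming the API shapes described above (the helper name is ours, and the crude isLoading polling stands in for proper navigation-event observation):

```swift
import WebKit

// Sketch only: polls isLoading for brevity and ignores the small race
// right after load(), before isLoading flips true. A production version
// would observe navigation events instead.
@available(iOS 26.0, macOS 26.0, *)
@MainActor
func fetchRenderedHTML(from url: URL) async throws -> String? {
    let page = WebPage()              // no configuration, no view anywhere
    page.load(URLRequest(url: url))

    while page.isLoading {            // observable property, polled crudely
        try await Task.sleep(for: .milliseconds(100))
    }

    // callJavaScript evaluates a function body, hence the explicit return.
    return try await page.callJavaScript(
        "return document.documentElement.outerHTML;"
    ) as? String
}
```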

WebView(webPage) is the SwiftUI view that renders a WebPage if you want to show it. Attaching one doesn't change what WebPage does, it just puts pixels on screen. Same navigation events, same JavaScript bridge, same everything. So the mental model is:

  • WebPage on its own = headless browser.
  • WebPage inside a WebView = in-app browser.

One code path, two presentations.

Availability gate is #available(iOS 26.0, macOS 26.0, *).

Why replace server-side search?

Every major AI provider ships a server-side web search tool. OpenAI, Anthropic, and Gemini all have one built in. If you just want the model to look something up, that works, most of the time. Four things about the status quo pushed us to build a replacement.

  1. Blocking. Server-side search runs from the provider's data-center IPs. News sites, Reddit, Cloudflare-fronted properties, and a long tail of independent publishers have spent the last few years tightening their bot defenses. Fetches that returned clean HTML a year ago now come back as an empty shell, a 403, or a Cloudflare challenge. The model sees nothing, shrugs, and writes a confident-sounding answer anyway.
  2. Transparency. You don't know which query the model actually sent, which search engine served it, or which pages it read. Some providers expose citations, some don't. None show you the raw HTML. "AI with web access" is one of the easiest places to hide a bad result behind fluent prose.
  3. Privacy. Your question leaves the device, hits the provider's server, and fans out to whichever third-party search engine they've partnered with (Bing, Brave, and friends). Even if the provider is careful about logging, your query now lives on more servers than it needs to.
  4. Coverage. Several providers Cumbersome supports don't expose server-side search over the OpenAI-compatible endpoints we talk to: Z.AI, Ollama, custom OpenAI-compatible backends, and Vercel AI Gateway, whose advertised web search is locked behind their TypeScript SDK and a separate undocumented endpoint. A user running a local Qwen through Ollama had no web access at all.

An on-device WebPage fixes all four with the same move. The fetch comes from the user's residential IP using the real Safari engine, so sites that reflexively gate data-center traffic just load. The visible modal is the transparency. You watch the page render before the extractor runs. The query never leaves the device for anything except the actual page load. And the whole mechanism is provider-agnostic, so Z.AI, Ollama, and anything else OpenAI-compatible gets web access for free.

What we used it for

Cumbersome is a bring-your-own-key AI client. You plug in an OpenAI, Anthropic, or Gemini key (or any of a handful of other providers, covered in the OpenRouter vs direct API keys post) and talk to the models directly from your phone or Mac, with no subscription layer in between.

On iOS 26 and macOS 26 we expose two tools to the model, both backed by WebPage:

  • searchWeb(query): runs a DuckDuckGo Lite search in a WebPage.
  • openWebPageLocally(url): fetches one URL and extracts markdown.
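Because the mechanism is provider-agnostic, the tools are advertised to the model the same way any function tool is over an OpenAI-compatible endpoint. A sketch of the wire format for the search tool (the tool name matches the article; the description and parameter wording are illustrative, not Cumbersome's actual schema):

```json
{
  "type": "function",
  "function": {
    "name": "searchWeb",
    "description": "Search the web from the user's device via DuckDuckGo Lite.",
    "parameters": {
      "type": "object",
      "properties": {
        "query": { "type": "string", "description": "The search query." }
      },
      "required": ["query"]
    }
  }
}
```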

Both ship off by default and are enabled per-user in Settings. Off by default because the first time you see it, it's a surprise. A browser modal springs up mid-conversation, loads a page in front of you, and you watch the extractor strip the contents before the answer streams in. We think it's great. It fits Cumbersome's "show me the internals" posture the same way Face/Off Mode shows all three candidate responses side by side instead of silently picking one.

Settings → Local Tools (highlighted). Enable Local Web Search and Enable Local Web Browse are the two toggles that turn on the on-device WebPage-backed tools. Both ship off by default. Headless Mode is a third toggle, covered below. The Server Tools group above is unrelated. It controls provider server-side search, which already worked before iOS 26.

openWebPageLocally is the bigger half

searchWeb is the headline feature, but openWebPageLocally(url) is where on-device browsing really earns its keep. Search gets you a list of links. Reading them is where most of the value lives, and reading is where server-side fetches fall over hardest. News sites, Substacks, blogs, product pages, and company PR sites all sit behind exactly the same data-center blocks, Cloudflare challenges, and cookie walls that bite server-side search.

openWebPageLocally fixes that because it runs from the user's device. The BBC loads. The NYT loads. A random Substack loads. The AI gets the fully-rendered DOM through callJavaScript, we strip it to markdown, and the model answers from the actual article instead of a SERP snippet or a paraphrase it invented.

All four problems from the previous section hit harder here. A blocked SERP is annoying. A blocked article body means the model has nothing to answer from. A search result you can't verify is suspicious. A model summary of an article you can't see is worse, because people trust summaries. Privacy is worse too. A server-side fetch of an article hands the provider both your query and every URL it decides to open. On device, all of that stays local.

openWebPageLocally opening bbc.com/news inside the in-app modal. Same WebPage-backed browser as the search flow, just pointed at a single URL. Once isLoading flips to false, the extractor pulls the article to markdown and the AI writes its answer from the real page.

The code

Here's the shape of our LocalBrowserToolModal on iOS 26 and later, trimmed to the parts that actually matter:

@available(iOS 26.0, macOS 26.0, *)
private struct LocalBrowserToolModalModern: View {
    var coordinator: LocalToolCoordinator
    @State private var webPage = WebPage()
    @State private var lastLoadedURL: URL?

    var body: some View {
        WebView(webPage)
            .onAppear { startLoadingCurrentURLIfNeeded() }
            .onChange(of: webPage.isLoading) { wasLoading, isLoading in
                // Finished loading? Grab the rendered HTML and hand it to
                // the extractor that feeds the AI's tool result.
                guard wasLoading, !isLoading else { return }
                Task {
                    try? await Task.sleep(for: .milliseconds(1200))
                    let html = try? await webPage.callJavaScript(
                        "return document.documentElement.outerHTML.toString();"
                    ) as? String
                    coordinator.extractAndComplete(
                        renderedHTML: html,
                        finalURL: webPage.url
                    )
                }
            }
    }

    private func startLoadingCurrentURLIfNeeded() {
        guard let url = coordinator.currentBrowserURL,
              lastLoadedURL != url else { return }
        lastLoadedURL = url
        webPage.load(URLRequest(url: url))
    }
}

Three things worth calling out:

  1. @State private var webPage = WebPage() is the whole setup. No configuration object required for the default case. If you need a custom URL scheme handler, there's a WebPage.Configuration you pass in.
  2. webPage.isLoading, webPage.url, and webPage.title are all just observable properties. .onChange(of:) works on them the same way it works on any other SwiftUI state. Apple's session recommends currentNavigationEvent for finer control (you get the typed sequence startedProvisionalNavigation → committed → finished), but isLoading is enough for most cases and noticeably easier to read.
  3. callJavaScript returns Any?. You cast it. document.documentElement.outerHTML gives you the full rendered DOM after JavaScript has run, which for most sites is the difference between "got the content" and "got a shell with a skeleton loader." A 1.2 second settle delay after isLoading goes false catches the slower single-page apps. Not elegant. It works.

The extractor that consumes the HTML is a separate concern. We do plain regex-driven tag stripping in LocalBrowserHTMLParser, scoped to <main>, <article>, or attribute-based primary content containers, with anchors converted to inline markdown links. Nothing clever. All of it lives in about 250 lines across three files, each under Cumbersome's 500-line-per-file project rule.

The reason we go to markdown instead of passing raw HTML to the model is token efficiency and relevance. A typical article page is 300-600 KB of DOM once scripts, styles, analytics beacons, and layout chrome are included. The same page, stripped to its main content with preserved anchors, is more like 5-30 KB of clean markdown.

Models are trained heavily on markdown, handle it well, and stop burning context on <div class="sidebar-nav"> noise.
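As a rough illustration of the stripping step (not Cumbersome's LocalBrowserHTMLParser — the type name and regexes here are simplified sketches), scoping to <main>/<article> and converting anchors before dropping the remaining tags might look like:

```swift
import Foundation

// Illustrative, simplified version of regex-driven tag stripping.
enum MiniHTMLStripper {
    // Scope to <main> or <article> when present, else use the whole page.
    static func primaryContent(of html: String) -> String {
        for tag in ["main", "article"] {
            if let range = html.range(
                of: "<\(tag)[^>]*>([\\s\\S]*?)</\(tag)>",
                options: .regularExpression
            ) {
                return String(html[range])
            }
        }
        return html
    }

    static func markdown(from html: String) -> String {
        var text = primaryContent(of: html)
        // Anchors become inline markdown links before tags are dropped.
        text = text.replacingOccurrences(
            of: "<a[^>]*href=\"([^\"]*)\"[^>]*>([\\s\\S]*?)</a>",
            with: "[$2]($1)",
            options: .regularExpression
        )
        // Scripts and styles go entirely; every other tag becomes a space.
        text = text.replacingOccurrences(
            of: "<script[\\s\\S]*?</script>|<style[\\s\\S]*?</style>",
            with: "",
            options: .regularExpression
        )
        text = text.replacingOccurrences(
            of: "<[^>]+>",
            with: " ",
            options: .regularExpression
        )
        return text
            .components(separatedBy: .whitespacesAndNewlines)
            .filter { !$0.isEmpty }
            .joined(separator: " ")
    }
}
```

A real extractor also handles headings, lists, and relative URLs; the point is that the whole step is ordinary Foundation string work, no parser dependency required.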

The same extractor runs in headless mode too. Only the source of the HTML changes. In visible mode it comes from callJavaScript against a fully-rendered WebPage, so you get the post-JavaScript DOM, which matters for SPAs. In headless mode it comes from a plain URLSession fetch. Faster and cheaper, but you get the raw HTML before any client-side JavaScript has run.

The parse-and-markdown step is shared code in both cases.

One more flow note worth calling out: if the user dismisses the visible modal mid-load, we cancel the in-flight tool call cleanly. The model receives a structured "cancelled by user" result and continues the conversation. It doesn't hang waiting for a tool response that will never come.
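The cancellation contract can be sketched as a small actor that resolves with either extracted content or a structured cancellation, whichever comes first. All names here are illustrative, not Cumbersome's actual types, and a single in-flight tool call is assumed:

```swift
// The tool call awaits a result; dismissing the modal resumes it with a
// structured "cancelled" value instead of leaving the model hanging.
enum LocalToolResult: Equatable {
    case extracted(markdown: String)
    case cancelledByUser
}

actor ToolCallGate {
    private var continuation: CheckedContinuation<LocalToolResult, Never>?
    private var pendingResult: LocalToolResult?

    func awaitResult() async -> LocalToolResult {
        // If the result arrived before anyone awaited, hand it over now.
        if let result = pendingResult {
            pendingResult = nil
            return result
        }
        return await withCheckedContinuation { continuation = $0 }
    }

    private func finish(_ result: LocalToolResult) {
        if let continuation {
            continuation.resume(returning: result)
            self.continuation = nil
        } else {
            pendingResult = result
        }
    }

    // Called when extraction completes.
    func complete(markdown: String) { finish(.extracted(markdown: markdown)) }

    // Called from the modal's dismiss handler.
    func cancel() { finish(.cancelledByUser) }
}
```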

A real human copilot

Keeping the browser visible makes the user a real copilot, not a passive observer of a chat transcript. The AI drives, you watch, and when it gets stuck you take the wheel for a second and hand it back.

The one case we've fully wired up is the search provider's occasional "prove you're human" check. The tool call pauses, a Retry / Give Up banner drops at the bottom of the modal, the user solves the puzzle in the embedded WebView, onChange(of: webPage.isLoading) fires on the reload, extraction runs, and the AI continues. Cookie walls, age gates, and expired sessions would use the same plumbing. We haven't built those flows yet because they haven't come up in practice, but none of them are more than a flag-and-wait on the same reload hook.

The captcha case isn't even the best one. Sometimes a "Load more" button is hiding the content the AI actually needs, and the user taps it so re-extraction can pick up the expanded DOM. Sometimes the search query was plain wrong, and the user edits the URL in the embedded browser and lets the reload fire against a better page. Both use the same plumbing that drives captcha recovery. Neither needs a new tool or a new API, and neither pulls the user out of the conversation.

The visible default keeps the whole thing honest. The system runs itself when it can, and the user takes over when it can't. "AI drives, you can grab the wheel" instead of "AI drives, you hope it did the right thing." Headless Mode (next section) exists for the fully autonomous case. The visible modal is the one that earns user trust. On top of WKWebView-in-a-representable this would have been ugly. With WebPage it's a handful of lines of SwiftUI.

Cumbersome on macOS (the rest of the screenshots here are iOS, same app, same WebPage-backed modal). The one ejectable flow shipped today: when the search page asks the user to prove they're human, the user does it in the modal, the Retry banner fires, extraction runs, and the AI's answer continues. Same code path, no copy-paste, no context switch.

The headless toggle

The visible modal is the default, and it's the path most users want, because the point of running a browser on-device is being able to watch the browser. But there's also a Headless Mode toggle in Settings for users who want the AI to fetch pages quietly without a sheet flashing up mid-chat.

The tradeoff, copied straight from our Settings copy: "Runs quieter and faster without opening a browser window. You lose the visible audit trail: the browser won't flash up to show what the AI is fetching." A user who wants to watch what the AI is doing keeps it off. A user who wants speed flips it on.

Settings → Local Tools → Headless Mode. Off by default. Flip it on and the same searchWeb and openWebPageLocally tool calls run without the modal ever appearing: the page gets fetched, extracted, and the AI's answer streams in as usual.

Our implementation is strictly user-initiated. The toggle is off by default, the user opts in, and the tool only runs when the model calls it during a conversation the user started. Nothing runs in the background, and no session state persists between tool calls.

What else this unlocks

A handful of things I'm now planning for, specifically because WebPage exists in a form that doesn't drag UIKit into a SwiftUI app.

Parallel on-device crawls

Right now Cumbersome loads one page at a time, which is part of why we still lean on server-side search for simple questions. OpenAI and friends parallelize fan-out on the backend. WebPage is an ordinary Swift object, so you can spin up several of them concurrently and extract four or five pages at once. The extraction pipeline is the same code, just called N times.

The hard part is the UI. In headless mode, parallel is basically free. In visible mode, you have to decide what a concurrent fetch even looks like. Four stacked modals is not it. A tab strip inside one modal, a grid of thumbnails, a rotating single-modal view, all plausible, all nontrivial design and code work. We'll see how much the current serial feature gets used before spending that effort.

On-device agents with a local LLM

We already have Endemic, a sibling app that runs Qwen 3.5 on-device via llama.cpp (0.8B through 9B, device-aware catalog). Pairing those models with an in-app WebPage means you can have an AI agent that reads the web on your device without any of the conversation touching a provider.

Private web data behind the user's own session

The one I keep thinking about. WebPage has its own cookie jar scoped to the app, so a user who signs into Gmail, Google Docs, their bank, their company's internal wiki, or any other cookie-authed site once inside Cumbersome stays signed in for future tool calls. The AI can then open a specific Gmail thread, a specific doc, a specific internal page, extract it locally, and answer against it without any credentials, cookies, or page HTML ever leaving the device.

The app never has to implement OAuth per provider, never impersonates the user on a server, and never holds a credential vault. The user signs in once in the embedded browser and the cookie stays in the app's own jar.

Worth being precise about what this is and isn't. Cumbersome can't read your Safari cookies or attach to an existing Safari session. iOS sandboxes each app's web storage. What it can do is host its own in-app browser via WebPage, and anything the user signs into inside that browser stays signed in for later tool calls. It's opt-in per site: log into Gmail once in the Cumbersome browser and the AI has semi-persistent access to that Gmail session going forward. Nothing gets pulled from Safari, and nothing gets pulled from other apps.

It's the same shape as OpenClaw's user profile, where the agent attaches to your real signed-in Chrome and acts as you, minus the "attach to your existing browser" half. You sign in inside Cumbersome, the cookie lives in the Cumbersome app, and the model reads from there. Because the whole loop runs inside the app with the model in a sidecar (especially when paired with Endemic for local inference), it can read your session but can't exfiltrate it.

PDF ingest for chat

callJavaScript plus a rendered PDF viewer is a path to "drop a URL into chat, AI reads it."

Lightweight research workflows

"Open these five links, summarize each, find the contradictions" becomes a tool loop with a hop cap instead of a server-side agent.

These aren't new ideas. What's new is that you can now do them on an iPhone, in a pure SwiftUI app, without adopting a UIKit integration pattern that SwiftUI devs have been trying to shed for five years.

Closing

If you build anything agent-adjacent on Apple platforms, the WebKit framework docs are worth a scroll. The API is young and under-documented, but the pieces are all there, and nobody seems to be talking about the headless half of it.

If you want to try it as a user, Cumbersome and Endemic both ship these tools. Flip them on, ask about something recent, and watch a page load on your device.

Bless up! 🙏✨