
AI Reasoning and Thinking: What It Is, When to Use It, and When to Skip It

AI reasoning makes models stop and think before answering. It costs more tokens, takes longer, and does not always help. Here is what reasoning actually does, when it improves results, and when you are better off without it.

February 23, 2026

Same prompt. Three different outcomes. This is AI reasoning in a nutshell:

[Screenshots: the same prompt run three ways]

No reasoning: GPT-5.2 responds instantly with a clean 10-word story. No thinking overhead.
Brief reasoning: GPT-5.2 shows brief thinking before responding. Arguably more creative.
Excessive reasoning: Kimi K2.5 generates a massive thinking monologue, counting words over and over. Not meaningfully better.

The left screenshot shows GPT-5.2 answering "tell me a story in 10 words" with reasoning off. Fast, cheap, done. The middle shows GPT-5.2 with reasoning on (I had to adjust the prompt to trigger it, because models do not always think just because you enable the toggle). The right shows Kimi K2.5 via OpenRouter producing a massive internal monologue for the same simple task, burning reasoning tokens for no real improvement.

AI models got a new trick in late 2024: the ability to think before they answer. OpenAI called it "reasoning" when they shipped o1. Anthropic calls it "extended thinking" in Claude. Google followed with thinking in Gemini. The marketing makes it sound like an obvious upgrade. More thinking equals better answers, right?

Not always. Reasoning costs more tokens, takes longer to respond, and can produce wildly verbose internal monologues that add nothing to the final output. I have been testing reasoning across multiple models and providers through Cumbersome, and the results are more nuanced than the hype suggests.

Here is what reasoning actually is, where it genuinely helps, and where you should leave it off.

What Is AI Reasoning?

Standard AI models generate text left to right, one token at a time. They predict the next most likely word based on everything before it. This works remarkably well for most tasks, but it means the model commits to its answer as it writes it. There is no pause to plan, no scratch paper to work through a problem.

Reasoning (or "thinking") adds that scratch paper. When reasoning is enabled, the model generates an internal chain of thought before producing the visible response. It breaks the problem down, considers approaches, checks its work, and then writes the final answer. You see the thinking process in a collapsible section above the response.

The key insight: reasoning uses extra tokens for that internal thinking. You pay for those tokens. On some models, reasoning tokens are cheaper than output tokens. On others, they are the same price or more expensive. Either way, the total cost per request goes up.

A Brief History

OpenAI introduced reasoning in September 2024 with o1-preview. It was the first commercial model explicitly trained to think step by step before answering. The o1 family (o1, o1-mini, o1-pro) showed that chain-of-thought reasoning genuinely improved performance on math, coding, and logic tasks.

Anthropic followed with extended thinking in Claude, adding a thinking budget that controls how many tokens the model spends reasoning. Google brought thinking to Gemini models. By early 2026, most frontier model families offer some form of reasoning.

The latest development: reasoning is no longer limited to specialized reasoning models. OpenAI's GPT-5 family supports reasoning as a toggle. You can use GPT-5.2 with reasoning off (fast, cheap) or reasoning on (slower, more thorough). Same model, different modes. This is a better design than forcing users to pick between entirely separate model families.

What Those Three Examples Show

The screenshots above tell the whole story of reasoning's trade-offs.

No reasoning (left): GPT-5.2 answers instantly. For a simple creative task, this is all you need. No extra tokens, no waiting.

Brief reasoning (middle): I had to adjust the prompt to trigger reasoning. The original "tell me a story in 10 words" did not make GPT-5.2 think, even with reasoning enabled. I changed it to "Tell me a story in 10 words. Research a unique idea. Think hard to get creative. Reply only with the story." That was enough. The model thought briefly, and the result was arguably more creative. This highlights an important point: enabling reasoning does not guarantee the model will use it. Models assess prompt complexity and skip thinking when the task seems simple enough.

Excessive reasoning (right): Kimi K2.5 via OpenRouter shows the other extreme. The model generates a massive internal monologue, obsessively counting words, brainstorming multiple options, second-guessing itself. The final story is fine. But it is not meaningfully better than what GPT-5.2 produced without thinking. You just paid for hundreds of extra reasoning tokens that added nothing.

This is the core tension with reasoning: sometimes it helps, sometimes it is pure overhead. Knowing when to use it is the skill that matters.

When to Use Reasoning

Reasoning genuinely helps when the task requires multi-step problem solving that benefits from planning before execution.

Use reasoning for:

  • Complex coding tasks. Architecture decisions, debugging subtle issues, refactoring with multiple constraints. The model plans its approach before writing code, which reduces errors.
  • Math and logic problems. Anything with multiple steps where getting step 3 wrong invalidates everything after it. Reasoning models check their work.
  • Analysis with constraints. "Compare these three approaches considering cost, performance, and maintainability." Reasoning helps the model hold multiple dimensions in mind.
  • Tasks where accuracy matters more than speed. Legal analysis, data interpretation, anything where a wrong answer is worse than a slow answer.

When to Skip Reasoning

Reasoning adds cost and latency with no benefit for tasks the model already handles well.

Skip reasoning for:

  • Simple questions. "What is the capital of France?" Reasoning adds nothing here.
  • Creative writing. Stories, blog drafts, marketing copy. These benefit from fluency, not deliberation. Reasoning can actually make creative output feel overthought and mechanical.
  • Translation and summarization. The model already does these well without thinking overhead.
  • High-volume tasks. If you are processing hundreds of requests, reasoning multiplies your token costs for marginal improvement. Batch processing and simple prompts work better.
  • Conversational responses. Chat, Q&A, brainstorming. The overhead is not worth it for back-and-forth conversation.

The general rule: if you would not spend 10 minutes thinking about this problem yourself, the model probably does not need to either.

Effort Levels: Low, Medium, and High

Most reasoning implementations let you control how hard the model thinks. OpenAI uses an effort parameter with three levels:

  • Low: Minimal reasoning. Quick sanity check before answering. Good for tasks that might benefit from a brief pause but do not need deep analysis.
  • Medium: Moderate reasoning. The default. Works well for most tasks where you want thinking without excessive overhead.
  • High: Maximum reasoning. The model takes as long as it needs. Reserved for genuinely complex problems where you want exhaustive consideration.

Higher effort means more reasoning tokens and higher cost. Start with medium and only bump to high when medium is not producing good enough results.
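In practice, the effort level is just a field on the request. Here is a minimal sketch of what that payload might look like for OpenAI's Responses API; the model name `gpt-5.2` and the exact payload shape are illustrative assumptions, so check your provider's documentation for the fields it actually accepts.

```python
def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build an illustrative Responses-API-style payload with a reasoning effort level."""
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unknown effort level: {effort!r}")
    return {
        "model": "gpt-5.2",                # illustrative model name
        "input": prompt,
        "reasoning": {"effort": effort},   # low / medium / high
    }

# Start at the default and only escalate when results fall short.
payload = build_request("Refactor this function to remove the race condition.", effort="high")
```

The point of keeping effort as a per-request field rather than a global setting: you can run the same model at low effort for routine calls and high effort only for the handful of requests that need it.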

The Cost Math

Reasoning tokens add to your bill. The exact cost depends on the model and provider.

For OpenAI's GPT-5 family, reasoning tokens are priced at the model's output token rate. If you send a simple prompt that generates 50 output tokens without reasoning, enabling reasoning might add 200-500 reasoning tokens on top. For a complex problem, reasoning can generate thousands of tokens internally.

On expensive models, this adds up fast. A model charging $60 per million output tokens that generates 2,000 reasoning tokens per request adds roughly $0.12 per request just for thinking. At 100 requests a day, that is $12/day in reasoning tokens alone.
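The back-of-envelope math above is simple enough to verify directly. A quick sketch, using the same assumed numbers ($60 per million output tokens, 2,000 reasoning tokens per request, 100 requests a day):

```python
def reasoning_cost(tokens: int, price_per_million_usd: float) -> float:
    """Dollar cost of a given number of reasoning tokens, billed at the output rate."""
    return tokens / 1_000_000 * price_per_million_usd

per_request = reasoning_cost(2_000, 60.0)  # $0.12 per request, just for thinking
per_day = per_request * 100                # $12.00/day at 100 requests
```

Swap in your own model's output price and observed reasoning-token counts to see what a toggle actually costs you at volume.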

On cheaper models, reasoning overhead is negligible. GPT-5 nano with reasoning costs fractions of a cent per request regardless of effort level.

The takeaway: match the model and reasoning level to the task. Do not run high-effort reasoning on an expensive model for tasks that a cheap model handles fine without thinking.

How to Enable Reasoning in Cumbersome

Cumbersome is an iOS and Mac app that connects directly to AI provider APIs. You use your own API keys, pick exactly which model runs your request, and pay per token instead of monthly subscriptions.

Reasoning support is available now for direct OpenAI and OpenRouter providers (not all models support it). To enable it:

  1. Tap the + button below the chat input to open Advanced Features.
  2. Toggle AI Reasoning on.
  3. Pick your effort level: Low, Medium, or High.

Cumbersome Advanced Features showing AI Reasoning toggle with effort level picker

The AI Reasoning toggle in Cumbersome's Advanced Features. Enable it, pick an effort level, and the model thinks before it answers. The feature is free during beta.

When reasoning is enabled, you see a collapsible "Thinking" section above each response. It shows exactly what the model considered before writing its answer. Expand it when you are curious about the model's process. Collapse it when you just want the answer.

Your reasoning preference syncs across devices through iCloud. Set it once and it carries over to your iPhone, iPad, and Mac.

Cumbersome also tracks reasoning tokens separately in each message's metadata, so you can see exactly how much thinking cost you on each request.
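If you are calling provider APIs yourself, the same per-request accounting is available in the response's usage object. A sketch of pulling the count out of an OpenAI-style chat completion response, assuming the nested field names from OpenAI's usage object (other providers may report this differently or not at all):

```python
def reasoning_tokens(response: dict) -> int:
    """Extract the reasoning-token count from an OpenAI-style usage object, or 0 if absent."""
    usage = response.get("usage", {})
    details = usage.get("completion_tokens_details", {})
    return details.get("reasoning_tokens", 0)

# Illustrative response fragment, not real API output.
sample = {
    "usage": {
        "completion_tokens": 540,
        "completion_tokens_details": {"reasoning_tokens": 480},
    }
}
```

Here 480 of the 540 billed completion tokens went to thinking, which is exactly the kind of ratio worth watching before you leave reasoning on for high-volume work.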

The Bottom Line

Reasoning is a genuine capability improvement for AI, but it is not a universal upgrade. It shines on complex, multi-step problems where planning before execution matters. It wastes tokens and time on simple tasks.

The models themselves are getting better at deciding when to think. GPT-5.2 skipped reasoning entirely on a simple creative prompt even with reasoning enabled. That is the right behavior. But not all models are this disciplined, and you should not rely on the model to make this decision for you on every request.

My approach: leave reasoning off by default. Turn it on when I am working through a hard coding problem, debugging something subtle, or asking a question where I need the model to really consider its answer. Turn it back off when I am done.

Reasoning is a power tool. Use it when the job calls for it.

Bless up! 🙏✨