There is a pattern emerging on OpenAI's Developer Community forum that, once you see it, you cannot unsee. A user pays for ChatGPT Plus or Pro. A user opens the model picker. A user selects GPT-5 Thinking, the tier OpenAI has spent two years marketing as its slower, more deliberate, more capable reasoning model. A user sends a prompt. And the response comes back instantly, with no thinking step, in the cadence and signature of a model that costs OpenAI a fraction of what the user is paying for.

The forum thread is titled, with the kind of bluntness only paying customers earn, "It's No Longer GPT-5 Thinking, It's GPT-4 Mini Now." It is filed under Bugs. It has forty-eight replies as of this writing. And it is not a one-off. It is a category.

The Quote That Names the Problem

The cleanest description of what is happening came from a user posting under the handle buttersnaps. After running a controlled test with the GPT-5 Thinking selector explicitly chosen, they wrote: "All of them signed off as 4o… lol. There was no thinking process present. Instant answers." That sentence is doing a lot of work. The user did not infer the swap from vibe. The user inferred it from two specific tells: the model self-identified as 4o when asked, and the response arrived without the latency or interface signals that come with a real Thinking inference. Both of those tells are observable, repeatable, and not subjective.

That detail is the difference between a complaint and an investigation. Asking the model what version it is and getting "4o" back is not proof on its own — language models are happy to lie about their identity. But pairing the self-identification with the absence of the Thinking interface state, on a tier the user is paying specifically to access, narrows the failure modes to two: either the router is silently downgrading paid traffic to a cheaper model, or the Thinking pipeline is broken in a way that leaves the cheaper model as the fallback. Both of those are bad. Only one is accidental.
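Both tells can be scripted. The forum reports concern the ChatGPT web app, which is not directly scriptable, but the API reports a server-side model string on every response, so the same two-signal check can at least be sketched there. A minimal sketch follows; the alias "gpt-5-thinking" is a hypothetical stand-in for the Thinking tier, and the idea that API routing mirrors the web router is an assumption, not something the thread establishes.

```python
# Sketch of the two-signal test against the API. Assumptions, not facts from
# the thread: the alias "gpt-5-thinking", and that API routing mirrors the UI.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Which model are you? Answer with the exact model name only."

for i in range(5):
    start = time.monotonic()
    resp = client.chat.completions.create(
        model="gpt-5-thinking",  # hypothetical stand-in for the Thinking tier
        messages=[{"role": "user", "content": PROMPT}],
    )
    latency = time.monotonic() - start
    served = resp.model                         # what the server says it ran
    claimed = resp.choices[0].message.content   # what the model claims it is
    print(f"run {i}: served={served!r} claimed={claimed!r} latency={latency:.1f}s")
```

Neither signal is proof on its own, for exactly the reason above: models lie about their identity. It is the server-reported model string combined with consistently instant latency on a reasoning tier that makes the result hard to explain away.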

"All of them signed off as 4o… lol. There was no thinking process present. Instant answers."u/buttersnaps, OpenAI Community Forum, April 2026

The Workflow Damage Is Real and Named

What separates this thread from a standard forum gripe is the specificity of the cost. Users are not saying the model "feels" worse. They are saying their work has stopped.

u/stupullen77, replying inside the same thread: "im having the same problem and its driving me mad. i cant find a solution so ive had to halt work on my project." That is a paying customer who has put a project on hold because the tier they bought has, without notice, stopped delivering the product that justified the purchase. There is no obvious workaround. The selector still shows Thinking. The billing still bills Thinking. The output is something else.

u/Mujtaba_Tahir, on the same thread: "I am also facing this issue and am very worried as I am no longer able to work. Any Update how it can be resolved?" Two things to notice. First, the user identifies their inability to work as the headline harm. Not response quality, not tone, not personality. Work. Second, the user asks for a status update from OpenAI. The thread, weeks old at this point, contains no acknowledgment from anyone with an OpenAI badge. The status update never arrived.

This Is the Mirror of the GPT-4o Stealth Routing

The Thinking-to-mini downgrade is the second half of a bait-and-switch that has now been documented on both ends of the model picker. Earlier this month, on the same forum, u/danahavrdova posted under the title "I am repeatedly and without my consent being served GPT-5 responses, despite having GPT-4o clearly selected." That thread, also under Bugs, also weeks old, also unaddressed by OpenAI moderators, described users selecting the older 4o because they preferred its output style and getting GPT-5 traffic instead.

Read those two threads together and a routing policy comes into focus that nobody at OpenAI has ever publicly described. Pick GPT-4o, get served 5. Pick 5 Thinking, get served 4o-mini. The router is making decisions that override the user's selection in both directions. Whatever optimization function it is solving for, it does not appear to be "honor the customer's choice." It appears to be cost-shaped or load-shaped or test-shaped, and the customer-facing UI is not being updated to reflect any of that.

That is what a stealth model swap is. It is not a bug. It is a routing decision the company has not put in the release notes, and would arguably struggle to justify to a regulator if asked.

The Pro Tier Is Showing the Same Symptoms

The Thinking-to-mini complaint has a sibling thread, two weeks newer, in which Pro subscribers are describing the same flavor of regression at the top of the price sheet. The thread is titled "ChatGPT 5.4 Pro Standard Mode — Adaptive Thinking or Nerfing Model?" It is also filed under Bugs. The replies converge fast.

u/johnnydanger84 posted the most easily auditable observation on the entire thread: "Tasks that ran for 25 to 40 minutes only 2 days ago are finishing between 4 and 8 minutes now with noticeably shallower output." That is a measurement. Runtime is a server-side metric the user can see directly in the interface, and the gap between thirty-five minutes and six minutes is not a perceptual artifact. It is a six-fold difference in how much compute the model is allocating to the same prompt.
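That kind of audit requires nothing exotic. Here is a minimal sketch of the bookkeeping, assuming a hypothetical `run_task` callable that stands in for whatever long-running Pro job the user submits:

```python
# Minimal runtime bookkeeping for auditing the collapse described above.
# `run_task` is a hypothetical stand-in for submitting one long Pro job.
import csv
import time
from datetime import datetime, timezone

LOG = "pro_runtimes.csv"
BASELINE_MIN = 25.0  # lower bound of the pre-rollout 25-40 minute range

def timed_run(run_task, prompt: str) -> float:
    start = time.monotonic()
    run_task(prompt)  # the actual long-running request
    minutes = (time.monotonic() - start) / 60
    with open(LOG, "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.now(timezone.utc).isoformat(), f"{minutes:.1f}"]
        )
    if minutes < BASELINE_MIN / 3:  # e.g. a 6-minute run against a 25-minute floor
        print(f"runtime collapse: {minutes:.1f} min vs {BASELINE_MIN:.0f} min baseline")
    return minutes
```

A log like that, kept over a few weeks, turns "it feels shallower" into a dated series of server-side runtimes, which is the form of evidence johnnydanger84's post already approximates.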

u/Bercanees, two posts later: "Post-rollout, I'm seeing the same class of prompts resolve in under 10 minutes, with outputs that are coherent on the surface but miss the depth and specificity that made the Pro model worth the premium." That is what a downgrade looks like when the surface fluency is preserved. The sentences scan. The reasoning behind them does not.

u/Hewei8622 made the competitive case in one line: "The quality of answers in these faster thinking and exploration loses the value I feel ChatGPT is offering over Claude and Gemini." The comparison is unflattering for OpenAI for a specific reason. Pro users do not switch on a whim. They switch when the workflow they paid to optimize stops returning the output that justified the price tag. The moment a Pro buyer says Claude or Gemini in a forum reply, the renewal is already in jeopardy.

What the OpenAI Forum Is Actually Documenting in April 2026

Distinct failure categories surfacing on the Bugs board this month, drawn from named, dated user posts

Thinking → 4o-mini swap: Dominant
5.4 Pro runtime collapse: Very common
Frontend freeze after 10–15 turns: Frequent
Pro verification lockouts: Frequent
Long-conversation 413 errors: Frequent

Source: ChatGPT Disaster qualitative review of OpenAI Community Forum Bugs threads, April 2026. Categories assembled from named posts; prevalence labels reflect thematic frequency, not statistical sampling.

What "Adaptive Thinking" Actually Means When You Read It Carefully

OpenAI's defense, when one eventually arrives, will almost certainly use the phrase "adaptive thinking" or some close cousin. The argument is that the model now decides on its own how much compute a given query deserves, and that for many queries, less compute is sufficient. That argument has a kernel of legitimate truth and a hollow center.

The kernel: if a user asks the model what time it is in Tokyo, a quick, cheap inference is fine. The hollow center: the prompts users are running on Thinking and Pro tiers are not "what time is it in Tokyo." They are research synthesis, code review, document analysis, and long-context reasoning. The decision of how much compute a research synthesis "deserves" is a decision the user already made, with their wallet, by selecting the tier in the first place. Reverting that decision server-side, without disclosure, without an opt-out, and without a refund mechanism, is not adaptive. It is unilateral.
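To make "reverting that decision server-side" concrete, here is a purely speculative sketch of a cost-shaped router. Nothing in it is OpenAI's actual logic, and every name is invented; it exists only to show the shape of the failure mode the threads describe, in which the served model and the displayed model diverge and the user sees only one of them:

```python
# Purely speculative sketch of a cost-shaped router. Not OpenAI's logic;
# it only illustrates the failure mode the threads describe: the override
# happens server-side while the UI keeps showing the user's selection.
from dataclasses import dataclass

@dataclass
class Decision:
    displayed_model: str  # what the picker keeps showing
    served_model: str     # what actually runs

def route(selected: str, estimated_cost: float, budget: float) -> Decision:
    if selected == "gpt-5-thinking" and estimated_cost > budget:
        # The user's tier choice is silently reverted to a cheaper model.
        return Decision(displayed_model=selected, served_model="gpt-4o-mini")
    return Decision(displayed_model=selected, served_model=selected)

# The contract violation is the gap between the two fields:
d = route("gpt-5-thinking", estimated_cost=1.0, budget=0.1)
assert d.displayed_model != d.served_model  # the swap the users are reporting
```

The entire complaint, across both threads, reduces to that final assert: the two fields differ, and no part of the interface surfaces the difference.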

"Tasks that ran for 25 to 40 minutes only 2 days ago are finishing between 4 and 8 minutes now with noticeably shallower output."u/johnnydanger84, OpenAI Community Forum, April 2026

The Silence From OpenAI Is the Story

The thing that makes this an investigation, rather than a vent thread, is what is missing from the forum. There is no OpenAI staff reply pinned at the top of the Thinking-to-mini thread. There is no pinned reply on the Pro nerfing thread. There is no acknowledgment that any routing change has occurred. There is no changelog entry that maps "April 2026" to the silent reduction in compute Pro users can observe in their own runtime metrics.

OpenAI runs the forum. OpenAI moderates the forum. The Bugs board is, by design, the channel through which the company says it surfaces and triages defects. A defect category that contains specific, reproducible, named, dated reports of paying customers being served a cheaper model than they paid for is the kind of thing that, in a healthy support function, would carry a status update within the same business day. That update has not arrived in five weeks across multiple threads.

The most charitable reading of that silence is that the routing behavior is intentional and the company has made a deliberate decision not to discuss it publicly. That is also the least flattering reading. There is no reading in which a company that wants to be trusted as enterprise SaaS lets the Bugs board for its flagship product fill up with reports of stealth model swaps, says nothing, and comes out looking good.

Why This Lands Differently in 2026 Than It Would Have in 2024

Two years ago, the Thinking-to-mini complaint would have been a curiosity. The market was thinner. The alternatives were weaker. Anthropic was not a household name and Gemini was a beta. A paying customer who got a quietly worse model in 2024 had limited choices, and the friction of switching was high enough that most users absorbed the loss.

That is no longer the market. Claude topped the U.S. App Store earlier this year. Gemini's free tier is now competitive with Plus. Anthropic, Google, and several smaller open-weight providers each ship a credible substitute for the workload most paying ChatGPT users are running. The switching cost has collapsed. The differentiator that kept Plus and Pro renewals stable was the reliability of the premium tiers. That is exactly the surface area now being eroded by silent routing.

The forum threads are not just complaints. They are leading indicators of churn. Every reply that ends with a sentence like "I've had to halt work on my project" or "Claude resolves this in one prompt" is, in the procurement language of the people who renew enterprise contracts, a logged dissatisfaction event tied to a named user and a specific failure mode. That is the worst possible record to have sitting in public, on your own forum, with your own customers writing it in real time.

What to Watch Next

Three signals are worth tracking through May. First, whether OpenAI publishes any clarification on routing inside the model picker. A genuine clarification would name the model actually serving the request. Marketing language about "adaptive intelligence" is not a clarification. Second, whether the Bugs threads continue to grow without staff replies, or whether the company starts engaging publicly. Engagement would lower the temperature; continued silence will raise it. Third, whether the runtime metrics on Pro queries return to the prior thirty-to-forty minute baseline, or whether the four-to-eight minute window persists. Runtime is the only completely unspinnable measurement on the entire stack. If it stays compressed, the rest of the picture follows.
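For anyone already keeping the runtime log sketched earlier, the third signal reduces to a few lines of analysis. The file name pro_runtimes.csv below refers to that hypothetical log, not to anything OpenAI provides:

```python
# Reading the hypothetical pro_runtimes.csv log from the earlier sketch and
# checking whether runtimes have returned toward the 25-40 minute baseline.
import csv
from statistics import median

with open("pro_runtimes.csv") as f:
    minutes = [float(row[1]) for row in csv.reader(f)]

recent = minutes[-10:]  # last ten runs
print(f"median of last {len(recent)} runs: {median(recent):.1f} min")
print("recovered" if median(recent) >= 25 else "still compressed")
```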

A model picker is a contract between a paying customer and a vendor. The customer picks a tier. The vendor delivers it. When that contract starts being silently rewritten by a routing layer the customer cannot see, the product the customer thought they were paying for stops existing. The forum threads are the moment users realize that. The receipts are dated. The thread links work. The screenshots are uploaded. And the silence from the vendor is, at this point, its own answer.