In December 2025, Uber gave 5,000 engineers access to Claude Code. By April 2026, the company had burned through its entire annual AI budget. It capped every engineer at $1,500 per month and the COO went on record: "If you're not actually able to draw a direct line to how useful features you're shipping to your users, that trade becomes harder to justify."

One month later, Microsoft quietly cancelled Claude Code licenses for its Experiences and Devices division. 5,000 engineers. The team that builds Windows, Microsoft 365, Outlook, Teams, and Surface. Token billing had reached $2,000 per engineer per month. The division had burned its annual AI budget well ahead of schedule. Engineers migrated to GitHub Copilot.

Two of the most sophisticated software organisations in the world. Same pattern. Same outcome. Same diagnosis being written in every post-mortem: the tokens cost too much.

That diagnosis is wrong.

The Numbers Everyone Is Looking At

The token pricing shock is real. In 2023, a standard chatbot interaction cost about $0.04. A coordinated agentic workflow in 2026 costs about $1.20. That is a 30x increase in cost per task, at a time when per-token prices have actually dropped by 98%. The gap comes entirely from how agents work: they do not answer one question, they loop. One task might hit a model ten or twenty times before it resolves.

  • 30x: cost increase per task, chatbot to agentic workflow
  • +1,001%: token usage growth, Jan 2025 to Apr 2026
  • 80%: share of enterprises that miss their AI budget forecast by 25% or more
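
The arithmetic behind the 30x figure can be checked directly. The sketch below assumes that cost per task is simply tokens multiplied by price per token, and that the 98% price drop applies evenly to every token involved. Both are simplifying assumptions for illustration, not figures from the sources.

```python
# Figures quoted in the text; the cost model (cost = tokens * price) is an assumption.
chatbot_cost_2023 = 0.04   # dollars per chatbot interaction
agentic_cost_2026 = 1.20   # dollars per agentic workflow
price_drop = 0.98          # decline in per-token price

cost_multiple = agentic_cost_2026 / chatbot_cost_2023
price_multiple = 1.0 - price_drop          # a token now costs 2% of what it did
token_multiple = cost_multiple / price_multiple

print(f"cost per task: {cost_multiple:.0f}x")
print(f"implied tokens per task: {token_multiple:.0f}x")
```

Under these assumptions, one agentic task consumes roughly 1,500 times the tokens of one chat interaction, which is why falling prices did not produce falling bills.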

Most companies set their 2026 AI budgets in the autumn of 2025, before agentic tools existed at any meaningful scale. The models they budgeted for were chat assistants. The models their engineers are running are autonomous coding agents that loop for hours. Those are not the same budget line.

So yes, the bill was a shock. But the bill was not the problem. The bill was the symptom.

The Inversion Most CFOs Are Missing

At GTC 2026, Jensen Huang said something that should be printed in every enterprise AI post-mortem and taped to the wall of every CFO who has spent the last quarter panicking about token costs.

Jensen Huang

"That $500K engineer at the end of the year, I'm going to ask him, how much did you spend in tokens? If that $500K engineer did not consume at least $250K worth of tokens, I am going to be deeply alarmed."

CEO, Nvidia — GTC 2026

The point is not that every engineer should burn $250K in tokens. The point is that spending tokens is the signal. Not spending them is the problem. An engineer with access to frontier models who is not using them for complex work is either blocked, unaware, or doing work the model should be doing for them. None of those are good answers.

Uber's engineers were spending. Microsoft's engineers were spending. That part was working. What was not working was the architecture around the spend.

What Deployed Access Without Architecture Looks Like

Here is what both companies actually did: they gave engineers a tool and a budget and told them to use it. That is not a deployment. That is procurement.

A real deployment answers three questions before a single engineer opens a session:

Question 1
Which work gets which effort level?

Fable 5 runs at effort levels from Low to xhigh. Fable Low beats Opus High on routine work and costs a fraction of the price. xhigh is for complex reasoning that earns it. Without routing, engineers default to maximum effort on everything, including work that did not need it.
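
A minimal sketch of what effort routing could look like. Only the Low and xhigh levels come from the text; the intermediate levels, the `Task` fields, and the thresholds are hypothetical placeholders that a real team would replace with its own complexity signals.

```python
from dataclasses import dataclass

# "low" and "xhigh" come from the text; the intermediate levels are assumed.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh")

@dataclass
class Task:
    name: str
    files_touched: int             # hypothetical complexity signal
    needs_design_reasoning: bool   # hypothetical complexity signal

def route_effort(task: Task) -> str:
    """Pick the cheapest effort level that plausibly fits the task."""
    if task.needs_design_reasoning:
        return "xhigh"
    if task.files_touched > 20:
        return "high"
    if task.files_touched > 3:
        return "medium"
    return "low"

for t in [
    Task("fix typo in error message", 1, False),
    Task("rename config key across services", 12, False),
    Task("redesign retry semantics for payments", 8, True),
]:
    print(f"{t.name}: {route_effort(t)}")
```

The design choice worth noting is the direction of the default: routine work falls through to the lowest level, and a task has to earn a higher one.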

Question 2
Which workflows are actually worth the token cost?

An agent that loops 20 times to fix a one-line bug is not a good investment. An agent that runs a 6-hour overnight migration that would have taken three engineers two weeks is an extraordinary one. Without workflow design, both tasks look the same in the budget.
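
One way to make the two tasks look different in the budget is to compare token cost against the engineer time each run replaces. The sketch below is illustrative only: the hourly rate is derived from a $500K engineer over an assumed 2,080 working hours, and every token cost and time estimate is a hypothetical input, not a measured figure.

```python
from dataclasses import dataclass

# Assumed loaded cost: a $500K/year engineer over roughly 2,080 working hours.
ENGINEER_HOURLY_COST = 500_000 / 2_080

@dataclass
class Workflow:
    name: str
    token_cost: float            # dollars spent on the agent run (hypothetical)
    engineer_hours_saved: float  # human effort the run replaces (hypothetical)

    def value_ratio(self) -> float:
        """Dollars of engineer time replaced per dollar of tokens."""
        return self.engineer_hours_saved * ENGINEER_HOURLY_COST / self.token_cost

# A five-minute human fix that took the agent 20 loops.
one_line_bug = Workflow("one-line bug, 20 loops", token_cost=24.0,
                        engineer_hours_saved=5 / 60)
# Three engineers for two 40-hour weeks, replaced by one overnight run.
migration = Workflow("overnight migration", token_cost=2_000.0,
                     engineer_hours_saved=3 * 2 * 40)

for w in (one_line_bug, migration):
    verdict = "worth it" if w.value_ratio() > 1 else "not worth it"
    print(f"{w.name}: {w.value_ratio():.1f}x ({verdict})")
```

With these placeholder inputs the bug fix returns less than a dollar of engineer time per token dollar, while the migration returns many times its cost.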

Question 3
What does the output connect to?

If an agent produces output that lands in an email thread, you paid frontier model prices for a fancier email. If the output writes back to a queryable state layer and triggers the next workflow step, the token cost compounds into something the business can measure.
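
As a rough illustration of a queryable state layer, the sketch below uses an in-memory SQLite database from the Python standard library. The table schema, the `record_run` helper, the workflow name, and the callback standing in for the next workflow step are all hypothetical.

```python
import sqlite3
from typing import Callable

# In-memory SQLite stands in for the state layer; the schema is hypothetical.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE agent_runs ("
    " id INTEGER PRIMARY KEY, workflow TEXT, result TEXT,"
    " token_cost REAL, status TEXT)"
)

def record_run(workflow: str, result: str, token_cost: float,
               next_step: Callable[[int], None]) -> int:
    """Persist the agent's output, then hand the row id to the next step."""
    cur = db.execute(
        "INSERT INTO agent_runs (workflow, result, token_cost, status)"
        " VALUES (?, ?, ?, 'done')",
        (workflow, result, token_cost),
    )
    db.commit()
    run_id = cur.lastrowid
    next_step(run_id)   # trigger whatever comes next in the workflow
    return run_id

triggered: list[int] = []   # stand-in for a real downstream trigger

run_id = record_run("dependency-upgrade", "PR opened", 18.40, triggered.append)

# Unlike an email thread, the result and its cost can be queried later.
total = db.execute(
    "SELECT SUM(token_cost) FROM agent_runs WHERE workflow = ?",
    ("dependency-upgrade",),
).fetchone()[0]
print(run_id, triggered, total)
```

The point of the sketch is the shape: the output is stored alongside its cost, and the next step fires from the stored record rather than from someone reading a message.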

Neither Uber nor Microsoft had answers to these questions at deployment time. They had access. Access without architecture is a fire waiting to be lit.

Fig. 1: Same budget. Different outcomes.
Deployed access: $120M token budget; all tasks at xhigh effort; output in Slack / email threads; no workflow connection; budget gone, ROI invisible.
Deployed architecture: $120M token budget; effort routing + workflow design; output writes to queryable state; downstream workflows trigger; spend is tied to shipped outcomes.
The budget is the same. The missing piece is the layer between access and outcome.
Analysis by nativefirst.ai · sources: Ramp, Fortune, Enterprise DNA

The $1,500 Cap Is Not a Solution

Uber's response to burning its budget was a $1,500 per engineer per month cap. Microsoft's was a migration to a cheaper tool. Both responses treat the cost as the problem, which means both responses make the underlying problem worse.

A cap does not fix effort routing. It just means engineers hit a ceiling and stop using the model, including for the work that justified the cost. A cheaper tool does not fix workflow design. It means you are now doing unarchitected work with a less capable model.

Neither company is asking the question that matters: which of our engineers' tasks are actually worth frontier model cost, and how do we ensure those are the only tasks running at frontier model effort?

That question requires an operator who understands the work, the model, and the production system well enough to route between them. It cannot be answered by a budget cap or a procurement decision.

What the Token Bill Should Tell You

There is a version of this story where Uber's $120M annual token bill is not a failure. It is evidence that 5,000 engineers are using the most powerful coding models in history on real work, shipping features at a pace that was previously impossible. The bill is exactly what should happen at that scale.

The missing piece is not less spending. It is knowing which spend produced which outcome.

Ethan Mollick

"If you are considering taking a job offer, you may want to ask what your token budget will be. If they ask 'what do you mean?' run."

Wharton professor, One Useful Thing — February 2026

Mollick is not talking about salary. He is talking about a company's willingness to invest in the infrastructure around the model. A high token budget signals that the organisation knows how to use it. No token budget signals that the organisation treats AI as a cost to be minimised.

Uber and Microsoft had the budget. They did not have the architecture. The question for every scaling company right now is not how to spend less on AI. It is how to make the spend defensible.

Making the Spend Defensible

A defensible AI spend is not one that is small. It is one where every significant token expenditure is tied to a workflow with a measurable output. Three things make this possible: effort routing, workflow design, and spend attribution.

Access Without Architecture
  • All tasks at maximum effort by default
  • Output lands in email or Slack, no further connection
  • No way to attribute spend to shipped features
  • Budget cap the only governance mechanism
  • COO cannot justify the line item
Deployed Architecture
  • Effort routing maps task complexity to model effort level
  • Workflow design connects agent output to production systems
  • Spend attribution ties token cost to shipped features and outcomes
  • Budget conversation changes from "why did we spend this" to "what did we ship"
  • COO can draw the line Uber's COO said was missing
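
Spend attribution, the third item in the second list, can be sketched as a simple aggregation. The version below assumes each agent session is tagged with the feature it served; the record format, the feature names, and the dollar amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical usage records: each agent session is tagged with a feature.
usage = [
    {"engineer": "a", "feature": "checkout-v2", "token_cost": 410.0},
    {"engineer": "b", "feature": "checkout-v2", "token_cost": 265.5},
    {"engineer": "a", "feature": None, "token_cost": 120.0},   # untagged work
    {"engineer": "c", "feature": "fraud-rules", "token_cost": 530.0},
]
shipped = {"checkout-v2"}   # features that reached users (hypothetical)

# Roll spend up by feature, keeping untagged spend visible instead of dropping it.
spend_by_feature: dict[str, float] = defaultdict(float)
for row in usage:
    spend_by_feature[row["feature"] or "unattributed"] += row["token_cost"]

total = sum(spend_by_feature.values())
shipped_spend = sum(cost for feat, cost in spend_by_feature.items() if feat in shipped)

for feat, cost in sorted(spend_by_feature.items()):
    print(f"{feat}: ${cost:,.2f}")
print(f"share of spend tied to shipped features: {shipped_spend / total:.0%}")
```

Keeping an explicit "unattributed" bucket is deliberate in this sketch: it shows how much spend still cannot be traced to anything that shipped.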

None of this is complicated to describe. All of it requires someone on-site who can hold the model behaviour, the production system, and the workflow design in their head at the same time. That person is not a tool vendor. They are not a change management consultant. They are an operator.

Your token budget should be a board asset, not a board question.

The Diagnostic maps where your spend is producing outcomes and where it is producing noise. 30 to 45 minutes. You leave with a concrete read on which workflows to instrument first and what the defensible spend looks like at your scale.

Sources
1. Fortune: "Uber burned through its entire 2026 AI budget in four months. Now its COO is questioning whether it's worth it." May 2026.
2. Enterprise DNA: "Microsoft Cancels Claude Code After Token Costs Blow Budget." June 2026.
3. Jensen Huang, GTC 2026 (via All-In Podcast). On token spend as a productivity signal, not a cost problem.
4. Ethan Mollick (@emollick), X, February 2026. On token budgets as a compensation and culture signal.
5. Ramp: AI token usage benchmarks 2026. Token usage +1,001% from Jan 2025 to Apr 2026. 80% of enterprises miss AI budget forecasts by 25% or more.
6. Oplexa: "AI Inference Cost Crisis 2026." Agentic workflow cost $1.20 vs $0.04 chatbot baseline. June 2026.
John Tan

Founder and CEO of nativefirst.ai. Embeds with scaling founders and CEOs to ship Level-3 agents and AI workflows in production.