In December 2025, Uber gave 5,000 engineers access to Claude Code. By April 2026, the company had burned through its entire annual AI budget. It capped every engineer at $1,500 per month and the COO went on record: "If you're not actually able to draw a direct line to how useful features you're shipping to your users, that trade becomes harder to justify."
One month later, Microsoft quietly cancelled Claude Code licenses for its Experiences and Devices division. 5,000 engineers. The team that builds Windows, Microsoft 365, Outlook, Teams, and Surface. Token billing had reached $2,000 per engineer per month. The division had burned its annual AI budget well ahead of schedule. Engineers migrated to GitHub Copilot.
Two of the most sophisticated software organisations in the world. Same pattern. Same outcome. Same diagnosis being written in every post-mortem: the tokens cost too much.
That diagnosis is wrong.
The Numbers Everyone Is Looking At
The token pricing shock is real. In 2023, a standard chatbot interaction cost about $0.04. A coordinated agentic workflow in 2026 costs about $1.20. That is a 30x increase in cost per task, at a time when per-token prices have actually dropped by 98%. The gap comes entirely from how agents work: they do not answer one question, they loop. One task might hit a model ten or twenty times before it resolves.
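The arithmetic behind that gap can be made explicit. The sketch below assumes cost per task is simply tokens consumed times a blended per-token price, which is a simplification. The dollar figures and the 98% drop are the ones quoted above; the variable names are illustrative.

```python
# Back-of-envelope: if cost per task = tokens_per_task * price_per_token,
# how much must token volume per task have grown?
# Inputs are the figures quoted in the text; the cost model is an assumption.

chat_cost_2023 = 0.04    # dollars per chatbot interaction
agent_cost_2026 = 1.20   # dollars per agentic workflow
price_drop = 0.98        # per-token price fell by 98%

cost_multiple = agent_cost_2026 / chat_cost_2023   # about 30x
price_multiple = 1.0 - price_drop                  # price is about 0.02x
token_multiple = cost_multiple / price_multiple    # implied token growth

print(f"cost per task:   {cost_multiple:.0f}x")
print(f"price per token: {price_multiple:.2f}x")
print(f"tokens per task: {token_multiple:.0f}x")
```

Under this simple model, a 30x rise in cost alongside a 98% fall in price implies roughly 1,500 times as many tokens per task. Ten or twenty model calls alone would not account for that, so each call in the loop must also be carrying far more context than a single chat turn did.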
Most companies set their 2026 AI budgets in the autumn of 2025, before agentic tools existed at any meaningful scale. The models they budgeted for were chat assistants. The models their engineers are running are autonomous coding agents that loop for hours. Those are not the same budget line.
So yes, the bill was a shock. But the bill was not the problem. The bill was the symptom.
The Inversion Most CFOs Are Missing
At GTC 2026, Jensen Huang said something that should be printed in every enterprise AI post-mortem and taped to the wall of every CFO who has spent the last quarter panicking about token costs.
Huang: "That $500K engineer at the end of the year, I'm going to ask him, how much did you spend in tokens? If that $500K engineer did not consume at least $250K worth of tokens, I am going to be deeply alarmed."
The point is not that every engineer should burn $250K in tokens. The point is that spending tokens is the signal. Not spending them is the problem. An engineer with access to frontier models who is not using them for complex work is either blocked, unaware, or doing work the model should be doing for them. None of those are good answers.
Uber's engineers were spending. Microsoft's engineers were spending. That part was working. What was not working was the architecture around the spend.
What Deployed Access Without Architecture Looks Like
Here is what both companies actually did: they gave engineers a tool and a budget and told them to use it. That is not a deployment. That is procurement.
A real deployment answers three questions before a single engineer opens a session:
Which effort level does each task need? Fable 5 runs at effort levels from Low to xhigh. Fable Low beats Opus High on routine work and costs a fraction of the price. xhigh is for complex reasoning that earns it. Without routing, engineers default to maximum effort on everything, including work that did not need it.
Which workflows are worth running an agent on? An agent that loops 20 times to fix a one-line bug is not a good investment. An agent that runs a 6-hour overnight migration that would have taken three engineers two weeks is an extraordinary one. Without workflow design, both tasks look the same in the budget.
Where does the output land? If an agent produces output that lands in an email thread, you paid frontier model prices for a fancier email. If the output writes back to a queryable state layer and triggers the next workflow step, the token cost compounds into something the business can measure.
Neither Uber nor Microsoft had answers to these questions at deployment time. They had access. Access without architecture is a fire waiting to be lit.
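Effort routing can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the `Task` fields, the thresholds, and the intermediate effort levels between Low and xhigh are all assumptions made for the example.

```python
from dataclasses import dataclass

# Effort levels, cheapest first. The text names only "Low" and "xhigh";
# the intermediate levels are assumed for illustration.
EFFORT_LEVELS = ["low", "medium", "high", "xhigh"]


@dataclass
class Task:
    description: str
    files_touched: int       # rough size of the change
    needs_design: bool       # requires architectural reasoning
    est_human_hours: float   # what the task would cost an engineer


def route_effort(task: Task) -> str:
    """Pick the lowest effort level that plausibly fits the task.

    Thresholds are hypothetical; a real router would be tuned against
    observed outcomes per workflow.
    """
    if task.needs_design or task.est_human_hours >= 40:
        return "xhigh"
    if task.files_touched > 10 or task.est_human_hours >= 8:
        return "high"
    if task.files_touched > 2:
        return "medium"
    return "low"


# The two cases from the text: a one-line bug fix, and a migration that
# would take three engineers two weeks (3 * 80 = 240 hours).
one_line_fix = Task("fix off-by-one in pagination", 1, False, 0.5)
migration = Task("migrate billing service to new schema", 120, True, 240)

print(route_effort(one_line_fix))  # low
print(route_effort(migration))     # xhigh
```

The design choice that matters is the default: the router falls through to the cheapest level, and a task has to earn its way up, which is the reverse of engineers defaulting to maximum effort on everything.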
The $1,500 Cap Is Not a Solution
Uber's response to burning its budget was a $1,500 per engineer per month cap. Microsoft's was a migration to a cheaper tool. Both responses treat the cost as the problem, which means both responses make the underlying problem worse.
A cap does not fix effort routing. It just means engineers hit the ceiling faster and stop using the model. A cheaper tool does not fix workflow design. It means you are now doing unarchitected work with a less capable model.
Neither company is asking the question that matters: which of our engineers' tasks are actually worth frontier model cost, and how do we ensure those are the only tasks running at frontier model effort?
That question requires an operator who understands the work, the model, and the production system well enough to route between them. It cannot be answered by a budget cap or a procurement decision.
What the Token Bill Should Tell You
There is a version of this story where Uber's $120M annual token bill is not a failure. It is evidence that 5,000 engineers are using the most powerful coding models in history on real work, shipping features at a pace that was previously impossible. The bill is exactly what should happen at that scale.
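It helps to put that figure on a per-engineer basis. A quick check using only numbers quoted in this article (the $120M bill, the 5,000 engineers, and Huang's $250K threshold); the variable names are illustrative.

```python
# Per-engineer view of the annual token bill, using figures from the text.
annual_bill = 120_000_000   # dollars
engineers = 5_000
huang_threshold = 250_000   # dollars of tokens per engineer per year

per_engineer_year = annual_bill / engineers
per_engineer_month = per_engineer_year / 12
share_of_threshold = per_engineer_year / huang_threshold

print(f"per engineer per year:  ${per_engineer_year:,.0f}")
print(f"per engineer per month: ${per_engineer_month:,.0f}")
print(f"share of $250K level:   {share_of_threshold:.1%}")
```

That works out to $24,000 per engineer per year, or $2,000 per month: above the $1,500 cap, and still under a tenth of the consumption Huang described.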
The missing piece is not less spending. It is knowing which spend produced which outcome.
Mollick: "If you are considering taking a job offer, you may want to ask what your token budget will be. If they ask 'what do you mean?' run."
Mollick is not talking about salary. He is talking about a company's willingness to invest in the infrastructure around the model. A high token budget signals that the organisation knows how to use it. No token budget signals that the organisation treats AI as a cost to be minimised.
Uber and Microsoft had the budget. They did not have the architecture. The question for every scaling company right now is not how to spend less on AI. It is how to make the spend defensible.
Making the Spend Defensible
A defensible AI spend is not one that is small. It is one where every significant token expenditure is tied to a workflow with a measurable output. Three things make this possible: effort routing, workflow design, and spend attribution.
Access without architecture:
- All tasks at maximum effort by default
- Output lands in email or Slack, no further connection
- No way to attribute spend to shipped features
- Budget cap the only governance mechanism
- COO cannot justify the line item
Access with architecture:
- Effort routing maps task complexity to model effort level
- Workflow design connects agent output to production systems
- Spend attribution ties token cost to shipped features and outcomes
- Budget conversation changes from "why did we spend this" to "what did we ship"
- COO can draw the line Uber's COO said was missing
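Spend attribution, the third item, is mostly bookkeeping once each agent run is tagged. A minimal sketch, assuming every run is logged with a workflow label and, where applicable, the feature it shipped; the record shape, workflow names, and dollar amounts are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRun:
    workflow: str                    # hypothetical workflow label
    token_cost_usd: float
    shipped_feature: Optional[str]   # None if output never reached production


def attribute_spend(runs):
    """Sum token cost per workflow, split into attributed vs. unattributed."""
    report = defaultdict(lambda: {"attributed": 0.0, "unattributed": 0.0})
    for run in runs:
        bucket = "attributed" if run.shipped_feature else "unattributed"
        report[run.workflow][bucket] += run.token_cost_usd
    return dict(report)


# Illustrative data only; none of these figures come from a real system.
runs = [
    AgentRun("schema-migration", 400.0, "billing-v2"),
    AgentRun("schema-migration", 150.0, "billing-v2"),
    AgentRun("ad-hoc-chat", 90.0, None),
    AgentRun("bug-triage", 35.0, "pagination-fix"),
    AgentRun("bug-triage", 60.0, None),
]

report = attribute_spend(runs)
for workflow, totals in report.items():
    print(workflow, totals)
```

With a report like this, the budget conversation starts from which workflows turned spend into shipped features and which produced only unattributed cost.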
None of this is complicated to describe. All of it requires someone on-site who can hold the model behaviour, the production system, and the workflow design in their head at the same time. That person is not a tool vendor. They are not a change management consultant. They are an operator.
Your token budget should be a board asset, not a board question.
The Diagnostic maps where your spend is producing outcomes and where it is producing noise. 30 to 45 minutes. You leave with a concrete read on which workflows to instrument first and what the defensible spend looks like at your scale.