The $40 Billion Shrug: Why AI ROI Hides in the Error Tail

More than $40 billion has gone into enterprise AI, and most of the executives who signed it off say they have nothing to show for it. The gap has surprisingly little to do with the models.

Pegasystems opened PegaWorld 2026 in Las Vegas this week with a figure that deserves more attention than any of the product announcements that followed it: over $40 billion invested in enterprise AI, and 56% of CEOs reporting no return on it. The second number is the one to sit with. A majority of the people who approved the budgets are now, in effect, shrugging.

The reflex is to blame the technology. The models hallucinate, the benchmarks were misleading, the vendor oversold. Sometimes that's true. But after two years of watching AI programmes stall at proof-of-concept, we'd argue the more common failure has a different shape. Most organisations pointed their AI at the wrong work, measured the wrong number, and paid for it in a way that punished them for scaling. All three are fixable. None of them are fixed by buying a better model.

The 85% trap

Picture a model hitting 85% accuracy on a claims-handling task. In a demo, that number looks like a finish line. In production, nobody who matters is looking at the 85. The regulator and the audit committee are looking at the other 15 – or the 5, or the 2 – because in regulated, customer-facing work, the cost lives in the error tail. A wrongly approved payment is a fine. A mishandled complaint is a churned account, and occasionally a headline.

This is also why the enterprise is the hardest place to make AI pay, and the most valuable. The work with the highest potential return is regulated, customer-facing, variable across regions and product lines, and unforgiving when something goes wrong. Those properties stack. An 85% model bolted onto that kind of process doesn't deliver 85% of the value – it can deliver negative value, because the cost of the failures outweighs the benefit of the successes.
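A rough expected-value sketch shows how an 85% model can go negative. The figures here are illustrative assumptions of ours, not numbers from the keynote: each correctly handled case saves 40 units, and each error costs 400.

```python
def net_value_per_case(accuracy: float, benefit_per_success: float,
                       cost_per_failure: float) -> float:
    """Expected net value of automating a single case."""
    return accuracy * benefit_per_success - (1.0 - accuracy) * cost_per_failure


def break_even_accuracy(benefit_per_success: float,
                        cost_per_failure: float) -> float:
    """Accuracy at which expected benefit exactly offsets expected error cost."""
    return cost_per_failure / (benefit_per_success + cost_per_failure)


# Hypothetical figures: a success saves 40, an error costs 400.
BENEFIT = 40.0
COST = 400.0

print(net_value_per_case(0.85, BENEFIT, COST))   # negative at 85% accuracy
print(break_even_accuracy(BENEFIT, COST))        # accuracy needed to break even
```

Under these assumed figures the 85% model loses about 26 units per case, and break-even sits near 91% accuracy. The ratio of error cost to success benefit, rather than the headline accuracy, decides the sign.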

A bigger model won't fix that. Being honest about which 2% of outcomes you cannot afford to get wrong, and designing the process around them, will. That sounds obvious written down. In practice, very few AI programmes we encounter have ever produced that list. They have accuracy targets. They don't have a consequence map.

Decoration is where the money went

The second failure mode is older than AI itself: taking a broken process and putting new technology on top of it. McKinsey's State of AI research has been making this point for a while now – the organisations actually seeing bottom-line impact are the ones redesigning their workflows around the technology, with more than half of high performers rebuilding how the work gets done rather than adding an assistant to the side of it. The winners redesign. The disappointed decorate.

Decoration is seductive because it's cheap to start and nobody has to have a difficult conversation about how the work is actually structured. You buy licences, you run enablement sessions, adoption metrics go up, and a year later the finance director asks where the return is. Adoption is not an outcome. It's a cost with good PR.

Redesign is harder. It means mapping the workflow end to end, deciding which steps are genuinely rule-based, which need human judgement, and which need machine reasoning – then rebuilding the process so each step gets the cheapest mechanism that can do it reliably. That work happens before anyone writes a prompt. It's unglamorous, and it's where the return has been hiding all along.

Spend the intelligence where the reasoning is

The line from the PegaWorld keynote we expect to hear repeated for months: keep the deterministic things deterministic. Where rules exist, follow the rules, every time, and reserve agents for the parts of the process that are genuinely hard to specify. Rules give you predictability and auditability. Agents give you flexibility. Confusing the two is how you end up with an expensive system re-reasoning a postcode lookup from first principles ten thousand times a day.
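As a minimal sketch of that principle, assuming a hypothetical postcode-to-region table and a stand-in agent function (neither comes from Pega), the dispatch below tries the deterministic rule first and calls the agent only when no rule applies.

```python
from typing import Callable, Optional, Tuple

# Hypothetical rule table: exact answers, no model call required.
POSTCODE_REGIONS = {"SW1A": "London", "M1": "Manchester", "EH1": "Edinburgh"}


def lookup_region(postcode: str) -> Optional[str]:
    """Deterministic step: map the outward code to a region, if known."""
    parts = postcode.strip().upper().split()
    if not parts:
        return None
    return POSTCODE_REGIONS.get(parts[0])


def route_step(postcode: str, agent: Callable[[str], str]) -> Tuple[str, str]:
    """Return (answer, mechanism). The agent runs only when no rule applies."""
    region = lookup_region(postcode)
    if region is not None:
        return region, "rule"
    return agent(postcode), "agent"


def stub_agent(postcode: str) -> str:
    """Stand-in for a metered model call."""
    return "needs review"


print(route_step("sw1a 1aa", stub_agent))
print(route_step("ZZ9 9ZZ", stub_agent))
```

Returning the mechanism alongside the answer gives the audit trail a record of which steps were rule-driven and which were reasoned.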

Pega put numbers on this. A workload of 10,000 monthly workflows costs roughly $693,000 a year if every step is sent to a frontier model. Route the same work intelligently, spending premium reasoning only where a step actually needs it, and it runs at roughly $117,000 – the same outcomes for 83% less. Whatever you think of any individual vendor's pricing claims, the underlying architectural point stands on its own. The unit economics of agentic AI are decided by routing decisions, and routing decisions are architecture.
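The quoted figures can be unpacked with a few lines of arithmetic. This sketch uses only the numbers above and assumes the 10,000 monthly workflows hold steady across twelve months; the variable names are ours.

```python
WORKFLOWS_PER_MONTH = 10_000
ALL_FRONTIER_ANNUAL = 693_000   # quoted: every step sent to a frontier model
ROUTED_ANNUAL = 117_000         # quoted: premium reasoning only where needed

# Assumption: constant monthly volume over a full year.
workflows_per_year = WORKFLOWS_PER_MONTH * 12

cost_per_workflow_frontier = ALL_FRONTIER_ANNUAL / workflows_per_year
cost_per_workflow_routed = ROUTED_ANNUAL / workflows_per_year
saving_fraction = 1 - ROUTED_ANNUAL / ALL_FRONTIER_ANNUAL
cost_ratio = ALL_FRONTIER_ANNUAL / ROUTED_ANNUAL

print(f"per workflow, all frontier: ${cost_per_workflow_frontier:.3f}")
print(f"per workflow, routed:       ${cost_per_workflow_routed:.3f}")
print(f"saving: {saving_fraction:.0%}, cost ratio: {cost_ratio:.1f}x")
```

On those assumptions the all-frontier design costs a little under $6 per workflow and the routed design a little under $1, which is where the 83% saving comes from.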

There's a corollary on the commercial side. Consumption-based AI pricing quietly rewards the vendor when you fail to optimise – every wasted token is their revenue. Outcome-based pricing, which Pega is now betting on, aligns the vendor with your unit economics instead. We expect that shift to spread across the industry quickly, and procurement teams negotiating AI contracts this year should be asking pointed questions about it now, before they lock in three years of paying for waste.

The most credible thing said all week

Asked to put a clean dollar figure on AI's return, Pega declined. The benefit is real, they said, and the measurement is still being worked out. After a year of suspiciously round percentages and ROI claims that evaporate under questioning, an admission like that is worth more than another case study. Trust is currently the scarcest resource in enterprise AI, and you don't earn it with fantasy maths.

The centre of gravity is moving from what the model can do to whether it delivered the outcome, at a predictable cost, without breaking a process that has to work. Raw capability is becoming a commodity – every one of your competitors can buy the same models you can. Accountability for outcomes is becoming the differentiator. That's uncomfortable for anyone whose AI strategy is a licence agreement, and very good news for anyone willing to do the process work.

Q&A: Getting a Return Out of Enterprise AI

Our pilots show strong accuracy figures. Why aren't they translating into returns?
Because accuracy in a pilot measures the average case, and production cost concentrates in the worst cases. A 90% accurate system in a regulated workflow generates a steady stream of expensive exceptions: reworked claims, compliance reviews, customer complaints. Until you've quantified what the error tail costs you per failure, your accuracy figure tells you very little about ROI. Measure the consequences, then decide whether the maths works.

Should we keep AI away from regulated, customer-facing work entirely?
No – that's where most of the value sits, even though the risk makes it tempting to retreat. The answer is designing for failure rather than hoping it away: deterministic rules wherever rules exist, human checkpoints at the steps where an error is unaffordable, full audit trails throughout, and agents only at the steps that genuinely require reasoning. Avoiding the high-stakes work means conceding the only deployments that move the P&L.

What does "designing around the error tail" actually involve?
Map the workflow end to end and classify every step by the consequence of getting it wrong, rather than by how easy it is to automate. A misdrafted internal summary costs minutes. A misprocessed payment costs a fine and a customer. Steps in the first category can take an imperfect model today. Steps in the second need rules, escalation paths, or a human in the loop. The output is a consequence map, and it should exist before any model is selected.
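One way to hold a consequence map is as a small data structure. The sketch below is illustrative only: the two consequence levels, the step names and the control labels are our assumptions, and a real map would be richer.

```python
from dataclasses import dataclass
from enum import Enum


class Consequence(Enum):
    MINOR = 1    # costs minutes, e.g. a misdrafted internal summary
    SEVERE = 2   # costs a fine or a customer, e.g. a misprocessed payment


@dataclass(frozen=True)
class Step:
    name: str
    consequence: Consequence
    has_rule: bool  # True if the step can be fully specified as a rule


def control_for(step: Step) -> str:
    """Choose a control from the consequence of error, not ease of automation."""
    if step.has_rule:
        return "deterministic rule"
    if step.consequence is Consequence.SEVERE:
        return "model with human checkpoint"
    return "model, spot-checked"


# Hypothetical claims workflow.
WORKFLOW = [
    Step("draft internal summary", Consequence.MINOR, has_rule=False),
    Step("validate policy number", Consequence.MINOR, has_rule=True),
    Step("approve payment", Consequence.SEVERE, has_rule=False),
]

consequence_map = {step.name: control_for(step) for step in WORKFLOW}
for name, control in consequence_map.items():
    print(f"{name}: {control}")
```

Nothing in the structure refers to a particular model, which is the point: the map can be drawn up before any model is chosen.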

Can we get value from AI assistants without redesigning our workflows?
Some, and it caps out quickly. Assistants layered onto existing processes deliver modest individual productivity gains that rarely show up in business outcomes – which is broadly what that 56% of CEOs are describing. The organisations reporting real returns rebuilt the workflow around the technology. If your AI programme hasn't changed how any process actually runs, it has decorated rather than transformed, and the results will reflect that.

How should we think about AI costs when planning these programmes?
Treat routing as an architectural decision, because it determines your unit economics. On the figures Pega quoted, sending every step of every workflow to a frontier model costs roughly six times as much as reserving premium reasoning for the steps that need it, for identical outcomes. And scrutinise consumption-based contracts carefully – a vendor paid per token has no commercial incentive to help you optimise. Where outcome-based pricing is on the table, take the conversation seriously.

Working Through This With Vertex Agility

The pattern in this article – strong models, weak returns, and a gap that turns out to be architectural and procedural rather than technological – is one we see constantly in our consultancy work. The organisations getting it right aren't the ones with the biggest AI budgets. They're the ones that did the unfashionable work first: mapping their processes, classifying steps by consequence, and deciding deliberately where machine reasoning earns its cost.

Our AI Consultancy works with organisations on exactly this – AI strategy and implementation, workflow automation, and the governance frameworks that make deployment in regulated environments defensible rather than hopeful. Our Software Consultancy sits alongside it, rebuilding and modernising the processes and applications that AI gets deployed into, because an agent dropped into a broken workflow automates the brokenness. Having both under one roof means the process redesign and the AI deployment happen as one piece of work, with senior-led oversight throughout.

If you're trying to work out whether your organisation is set up to be in the 44% that sees a return, our free AI Readiness Mini Audit is a sensible place to start. For a more substantive conversation about where the return is hiding in your AI programme, get in touch with us directly below.

Get in touch
