Why Most AI Implementations Fail
95% of enterprise AI deployments produce zero or negative financial return.
The number comes from the MIT NANDA Project's State of AI in Business research and has been stable across multiple years and industries. Most organizations beginning an AI project today believe they will land in the other 5%.
That belief is the symptom of a deeper problem.
An open letter to all by our founder and Chief Architect: Matt Rollings
AI Is an Amplifier of Competence
Utopia (noun): A well-designed process, deployed within an AI framework, that autonomously completes tasks without hallucination or human intervention.
AI systems and AI agents are force multipliers. Make no mistake: AI systems will augment all known processes in the next 10 years. The challenge right now is that almost no one knows how to implement them successfully. Unfortunately, most extremely intelligent people are out of their element because AI systems are so new and complex. The old adage about software, that it does what you tell it to do 100% of the time, does not hold here. AI can be inconsistent, and it can simply go rogue without the right technical and functional architect.
Solutions do exist, but they are incredibly counterintuitive. I have seen some of the smartest people I have ever met paralyzed by confusion.
The technology is doing its job. The design and upstream inputs are the bottlenecks.
Most AI initiatives are amplifying the wrong inputs and accelerating in the wrong direction. AI is amplifying bad design choices. There is no forgiveness with these tools at the moment.
Most organizations have not asked the right questions yet.
The dominant questions inside boardrooms are ‘what can AI do for us?’ and ‘where can we deploy AI?’
These questions produce roadmaps that bolt AI onto existing processes and fail at the prevailing rate.
The harder questions get less attention because they require understanding what AI actually does.
1. Where is our process broken in ways AI cannot fix?
2. Which decisions in our operating model are appropriate for agent execution, and which require human judgment?
3. What would our operating model look like if we redesigned it from scratch for an AI-native era?
4. What does an economically viable AI solution look like?
Roadmaps that start from the harder questions have a different trajectory.
The 'AI Expert' Problem
The label has been adopted by anyone who has used ChatGPT or Claude for a few months. Conference speakers, LinkedIn thought leaders, freshly minted consultants. Real engineering depth in production AI systems is rare.
Using a model and engineering with one are different skill stacks. Engineering involves understanding model behavior under load, designing system prompts that constrain hallucination, choosing context strategies that match the task, managing token economics, and building the infrastructure that lets agents operate inside business processes.
Most individuals claiming to be experts today can't define what 'Dynamic Context Assembly' is. The cost of choosing the wrong implementation partner can be crippling. Be hypervigilant.
A useful filter: ask them what percentage of an AI solution should be driven by LLM calls, and what percentage of the orchestration should be structural and deterministic versus AI-driven and dynamic.
The more AI is used, the more fragile and expensive that solution will be.
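One way to make that filter concrete is to measure it. The sketch below is a minimal illustration, not a description of any particular framework: it routes each item through fixed rules first and calls a model only when no rule applies, then reports what fraction of the work needed an LLM call. The `Invoice` type, the `VENDOR_TO_ACCOUNT` table, the account codes, and the `llm_fallback` callable are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Invoice:
    vendor_id: str
    amount: float
    description: str


# Hypothetical lookup table standing in for a vendor-master mapping.
VENDOR_TO_ACCOUNT = {"V-100": "6100-Software", "V-200": "6400-Travel"}


def classify_deterministic(invoice: Invoice) -> Optional[str]:
    """Return a GL account from fixed rules, or None if no rule applies."""
    return VENDOR_TO_ACCOUNT.get(invoice.vendor_id)


def classify(
    invoice: Invoice, llm_fallback: Callable[[Invoice], str]
) -> Tuple[str, str]:
    """Try the rules first; call the model only when the rules have no answer.

    Returns (account, route), where route is "deterministic" or "llm".
    """
    account = classify_deterministic(invoice)
    if account is not None:
        return account, "deterministic"
    return llm_fallback(invoice), "llm"


def llm_share(
    invoices: List[Invoice], llm_fallback: Callable[[Invoice], str]
) -> float:
    """Fraction of items that needed an LLM call (0.0 for an empty batch)."""
    if not invoices:
        return 0.0
    routes = [classify(inv, llm_fallback)[1] for inv in invoices]
    return routes.count("llm") / len(routes)
```

In a real system `llm_fallback` would wrap a model call. Passing it in as a plain callable keeps the deterministic path testable without any model, and makes the LLM share a number you can track rather than an opinion.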
The Six-Layer Stack
An AI implementation in finance is six layers deep. Each has to be designed coherently for the integrated system to produce ROI.
1. LLM behavior and engineering. How the model behaves under different prompting strategies, what it hallucinates and why.
2. Inference architecture. How tokens move through the system, where latency comes from, when to cache, how to handle concurrency.
3. Agent design. Scope-limiting to reduce hallucination, system prompts that enforce behavior, when deterministic logic outperforms agentic reasoning.
4. Process design. The business processes the AI sits inside, where feedback loops belong, what fails at the process level.
5. Supporting architecture. How will the solution be grounded? It will need its own data and definition layer. How will it access your vendor master, chart of accounts, and data warehouses? Most importantly, how does the AI system evolve and improve over time? You need an observability layer that facilitates reporting, improvement, and optimization.
6. Economics. Token cost, latency cost, accuracy cost, opportunity cost of displaced human time.
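The four cost terms in layer 6 can be put on one line of arithmetic per run. The sketch below is one possible way to combine them, under stated assumptions: every field name is hypothetical, prices are inputs you supply rather than real vendor rates, accuracy cost is modeled simply as error rate times the cost of a human rework, and displaced human time is treated as the benefit side of the ledger.

```python
from dataclasses import dataclass


@dataclass
class TaskEconomics:
    input_tokens: int
    output_tokens: int
    price_in_per_1k: float          # assumed USD per 1,000 input tokens
    price_out_per_1k: float         # assumed USD per 1,000 output tokens
    latency_seconds: float
    latency_cost_per_second: float  # assumed cost of making a user or process wait
    error_rate: float               # fraction of runs that need human rework
    rework_cost: float              # assumed USD per rework
    human_minutes_saved: float
    human_cost_per_minute: float

    def token_cost(self) -> float:
        """Cost of the tokens sent to and received from the model."""
        return (
            self.input_tokens / 1000 * self.price_in_per_1k
            + self.output_tokens / 1000 * self.price_out_per_1k
        )

    def cost_per_run(self) -> float:
        """Token cost plus latency cost plus expected rework cost."""
        return (
            self.token_cost()
            + self.latency_seconds * self.latency_cost_per_second
            + self.error_rate * self.rework_cost
        )

    def net_value_per_run(self) -> float:
        """Value of displaced human time minus the cost of the run."""
        return self.human_minutes_saved * self.human_cost_per_minute - self.cost_per_run()
```

With numbers like these, the expected rework term often dwarfs the token term, which is why accuracy and process design belong in the same calculation as token price.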
The Gap Problem
Most organizations attack the stack by hiring specialists for each layer. AI engineer for layers 1 and 5. Cloud architect for layer 2. Agent specialist for layer 3. Process and Finance own layer 4. Much ambiguity surrounds layer 6.
Without knowing every layer, identifying the gaps is nearly impossible. Putting six experts in a room is likely to produce failure, because the gaps between the silos look small but are effectively fatal.
Every layer ends up technically sound. The integrated system does not work.
Consider one decision: choosing the context window size for an agent running a profitability analysis. The decision requires simultaneous command of how agents handle context, how finance data is structured, what the finance question actually needs, and the token economics of different context sizes. A specialist in any one of those four cannot formulate the question intelligently without the other three. The team does not fumble the handoff. The team cannot get to the decision point at all.
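To show how the four considerations meet in a single calculation, here is a deliberately simplified sketch. It assumes the finance data has already been split into chunks, that someone who understands the finance question has scored each chunk's relevance, and that price per 1,000 tokens is a number you supply. The `Chunk` type and both functions are hypothetical, and the packing is a greedy heuristic, not an optimal selection.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    label: str        # e.g. a ledger extract or a cost-center summary
    tokens: int       # estimated token count of the chunk
    relevance: float  # assumed score in [0, 1] for the question at hand


def pack_context(chunks: List[Chunk], budget_tokens: int) -> List[Chunk]:
    """Greedily keep the most relevant chunks that fit in the token budget."""
    chosen: List[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.relevance, reverse=True):
        if used + chunk.tokens <= budget_tokens:
            chosen.append(chunk)
            used += chunk.tokens
    return chosen


def compare_budgets(
    chunks: List[Chunk], budgets: List[int], price_per_1k_tokens: float
) -> List[Tuple[int, int, float, float]]:
    """For each budget, return (budget, tokens used, relevance covered, input cost)."""
    rows = []
    for budget in budgets:
        chosen = pack_context(chunks, budget)
        used = sum(c.tokens for c in chosen)
        covered = sum(c.relevance for c in chosen)
        rows.append((budget, used, covered, used / 1000 * price_per_1k_tokens))
    return rows
```

Each input comes from a different discipline: chunking from data structure, relevance from the finance question, the budget from agent behavior, and price from token economics. The comparison cannot be run until all four are on the table.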
Internal Politics Kills More Projects Than Technical Complexity
The technical complexity of an AI implementation is solvable. Internal politics can kill the project before it even starts.
An implementation that produces ROI usually disrupts existing reports, power structures, and job descriptions. The politically safe path bolts AI onto existing processes without disturbing anything. That path produces a press release. The path that produces ROI requires redesigning processes, which means someone has to lose turf.
Inside most organizations the instinct to protect the org chart wins over the instinct to improve the operating model. If the design coming back from your team looks additive, the team chose political safety over outcome.
Executives Need Advisors Who Tell Them Hard Truths
Most consulting sells executives a version of what they already believed, dressed in better PowerPoint. Real advisors say 'you are about to make a mistake' out loud, even when it costs them the engagement.
The economics of the industry reward agreement. Disagreement loses pitches. Pushback loses follow-on work. Consultants are trained, by the incentives of their own business model, to confirm executive instincts rather than challenge them.
The right test for an AI advisor: whether they will tell you something uncomfortable in the first meeting.
The Dunning-Kruger Trap
AI is complex enough that the people evaluating their own organization's readiness usually cannot see what they are missing. Most internal teams understand one or two of the six layers well. The remaining layers feel like detail. Failure lives in those layers.
Low competence in a domain produces overconfidence, because the metacognitive ability to assess competence requires the same skills as the competence itself. Organizations evaluating their own AI readiness usually pass their own evaluation. The 95 percent failure rate is composed of those teams.
The fact that your team is confident your AI initiative will succeed is statistically weak evidence that it will.
The Integrated Lead
The combination required to drive an enterprise AI implementation in finance is roughly this: process design, finance and accounting modeling, budgeting and forecasting, EPM implementation, data architecture, statistical modeling, LLM engineering, inference architecture, agent design, and the economics of inference at scale.
Each is a career. Each takes a decade to develop real depth in. The combination is statistically rare because the path through it is not a job description anywhere. The people who have it built it on purpose, stacking disciplines over a career, refusing to specialize and stop.
I built it on purpose. Twenty years stacking these disciplines.
The Three Months of Failure
The earned right comes from one specific moment. I built the AI framework that now executes much of what we deliver. The first version did not work. The second did not work. The third did not work. For three months it did not work, and I watched competent specialists deliver competent components that integrated into a broken whole.
The fix was a single design principle that became visible only after building and breaking the system three times. That principle is now applied inside client work.
The principle itself stays private. The right place for it is a working session with a client about to make a substantial investment in something that needs it.
What This Means For You
If your AI evaluation framework is 'do we have AI experts,' you are asking the wrong question.
The right question: do we have one integrated lead who holds all the layers, or are we hoping specialists will hand off cleanly across them?
The 95 percent failure rate is what happens when the answer is the second one. If you cannot tell which side you are on, that uncertainty itself is informative.
How To Work With Us
Two engagement shapes. Advisory engagements for organizations that are about to begin or are mid-flight on an AI implementation and want an honest read on whether the design is going to work. Implementation engagements for organizations that want senior-led delivery instead of distributed specialists. Fixed-price where scope allows. Senior-led from day one. No discovery-deck theater. AI execution layer plus delivery team in Cebu City behind the architect.
01
Advisory
Working sessions with your team on whether the AI design you have on the table is going to produce ROI. Honest reads on what is missing, what to sequence first, where the failure modes live. Useful when the investment has not been committed yet, or when the project is mid-flight and the trajectory is unclear.
02
Implementation
Full builds for organizations that want senior-led delivery instead of a team of distributed specialists. Three implementation tiers sized to the actual need: Implementation Only (11 to 14 weeks), Transformation Lite (14 to 17 weeks), or Full Transformation (24 to 42 weeks).
03
How We Engage
Fixed-price where scope allows. Senior-led from day one. The architect designs. The AI execution layer (a framework we built) handles much of the technical work at scale. A delivery team in Cebu City handles human-required execution. No discovery-deck theater. No rotating juniors.