AI Integration · 2026-06-03 · 14 min read

Why Mid-Market Software Projects Fail — and How AI-Driven Development Changes That

Michael Kaiser

Co-Founder & Head of Systems, Vincency

There is a statistic that no mid-market managing director likes to hear: most software projects do not end the way they were planned. For three decades the industry has circled the same uncomfortable finding, and the arrival of AI has not made it disappear — it has only changed its shape. The promise of 2026 is that AI writes the code for us. The data tells a more interesting, more useful story. This article looks at what the evidence actually says about why software projects fail, what AI does and does not change, and what that means for a mid-sized company that needs a working solution — not a research thesis.

The statistic nobody in the mid-market wants to hear

The most cited number comes from the Standish Group's CHAOS reports: in 1994 only about 16 percent of software projects were considered fully successful; by 2015, under a revised "modern" definition (on time, on budget, with a satisfactory result), roughly 29 percent qualified, while around half were "challenged" and about a fifth failed outright. Those figures are quoted everywhere, and they deserve a caveat that is almost never quoted alongside them. In a careful 2010 analysis, researchers J. Laurenz Eveleens and Chris Verhoef showed that the Standish definitions are misleading and one-sided: they only count overruns, never the equally common case of finishing under budget, which makes the numbers more pessimistic than reality and, worse, manipulable. So treat CHAOS as a directional signal, not gospel.

The more rigorous evidence is also the more sobering. In a 2022 study, the largest of its kind, Bent Flyvbjerg and colleagues analysed 5,392 IT projects and found that cost overruns do not follow a normal distribution at all: they have "fat tails". The median overrun is essentially zero: most projects land close to budget. But the average is dragged up to roughly 80 percent by a minority of catastrophic outliers. In an earlier dataset, Budzier and Flyvbjerg found that one in six IT projects is a "black swan" with an average cost overrun of around 200 percent and a schedule overrun of nearly 70 percent. The lesson is not "every project is doomed." It is that the real danger is the tail, the project that does not just slip but detonates, and that conventional, average-based planning is blind to exactly that risk.
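To make the gap between median and average concrete, here is a minimal sketch in Python. The six overrun values are invented for illustration and are not taken from the Flyvbjerg data; they only mirror the "one in six" pattern described above, with five projects near budget and one at roughly +200 percent.

```python
from statistics import mean, median

# Hypothetical cost overruns for six projects, as fractions of budget.
# Five land near budget; one is a "black swan" at +200%.
# These numbers are illustrative assumptions, not study data.
overruns = [-0.05, -0.02, 0.0, 0.0, 0.03, 2.0]


def summarize(values):
    """Return median, mean, and worst-case overrun for a portfolio."""
    return {
        "median": median(values),
        "mean": mean(values),
        "worst": max(values),
    }


summary = summarize(overruns)
print(f"median overrun: {summary['median']:.0%}")
print(f"mean overrun:   {summary['mean']:.0%}")
print(f"worst overrun:  {summary['worst']:.0%}")
```

In this toy portfolio the median sits at zero while the mean lands at about a third of budget, carried entirely by the single outlier. A contingency sized to the "typical" project would cover five of the six and miss the one that matters.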

Why classic software projects fail — four recurring causes

From our work with mid-sized companies, the failures rarely come from a lack of engineering talent. They come from four structural causes that compound each other:

  • Requirements drift. What the business needs is unclear at the start and changes during the build. Every change ripples through code that was already written.
  • Estimation under uncertainty. Bespoke software is forecast like a known quantity, even though — per Flyvbjerg — its cost distribution has no reliable average to forecast against.
  • Missing internal resources. Germany had roughly 109,000 open IT positions in 2025 (Bitkom), with an average vacancy lasting 7.7 months. Most mid-sized firms simply cannot staff and retain a full development team.
  • Underestimated maintenance. The build is the visible part; the years of operation, security, and change behind it are the expensive part — and the part that quietly kills custom systems.

None of these is a coding problem. They are problems of clarity, capacity, and method. That is exactly why the popular hope that "AI will write the software, so the problem goes away" misreads the situation.

The AI promise meets the evidence

Here the data gets genuinely interesting, because two of the most-cited studies point in opposite directions. A controlled study by Peng, Kalliamvakou, Cihon and Demirer (2023) had developers build an HTTP server with and without GitHub Copilot; the Copilot group finished 55.8 percent faster. A striking number — but note the conditions: a small, isolated, greenfield task, and authors partly affiliated with GitHub. It measures AI at its best.

Now the counter-evidence. In 2025, the research group METR ran a randomized controlled trial with experienced open-source developers working on their own mature repositories. The result was the inverse: with AI tools, their tasks took 19 percent longer. The most telling detail is the perception gap: the developers expected AI to speed them up by 24 percent, and even after the experiment still believed it had sped them up by 20 percent, while the stopwatch said otherwise. METR is explicit that this applies to experienced developers on familiar, complex codebases and should not be generalized to all software work. But that is precisely the point: the same technology produced +56 percent and −19 percent depending on context.
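A quick back-of-the-envelope calculation shows how far apart these numbers are in practice. The sketch below reads each percentage as a relative change in completion time, which is my interpretation of how the studies report their results, and applies it to a hypothetical 10-hour task. The function name and the baseline are assumptions made for illustration only.

```python
def hours_with_ai(baseline_hours: float, change_in_time: float) -> float:
    """Scale a baseline duration by a relative change in completion time.

    change_in_time is negative when the task gets shorter (-0.558 means
    55.8% less time) and positive when it gets longer (+0.19 means 19% more).
    """
    return baseline_hours * (1 + change_in_time)


BASELINE_HOURS = 10.0  # hypothetical task size, chosen only for illustration

scenarios = {
    "Copilot study, greenfield task": -0.558,
    "METR, measured": 0.19,
    "METR, developers' own estimate afterwards": -0.20,
}

for label, change in scenarios.items():
    print(f"{label}: {hours_with_ai(BASELINE_HOURS, change):.1f} h")

# The perception gap: what the stopwatch measured versus what developers believed.
measured = hours_with_ai(BASELINE_HOURS, 0.19)
believed = hours_with_ai(BASELINE_HOURS, -0.20)
perception_gap_hours = measured - believed
print(f"perception gap: {perception_gap_hours:.1f} h")
```

On these assumptions, the same 10-hour task shrinks to well under half in the greenfield setting and grows to almost 12 hours in the mature-codebase setting, while the developers themselves would have booked it at 8. Planning on the believed figure rather than the measured one is how a schedule slips without anyone noticing.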

Google's 2024 DORA report, drawn from a large industry survey, fills in the middle. Over 75 percent of respondents already use AI for at least one daily task, and a 25 percent increase in AI adoption was associated with measurable gains in individual factors: about +3.4 percent code quality and +3.1 percent faster code reviews. Yet the same report found a paradox: the same increase in adoption was associated with a 1.5 percent drop in delivery throughput and a 7.2 percent drop in delivery stability. And 39 percent of respondents reported little or no trust in AI-generated code. The Stack Overflow 2025 survey echoes this: 84 percent use or plan to use AI tools, but 66 percent name "almost right, but not quite" solutions as their leading frustration, and positive sentiment toward AI tools fell from over 70 percent to 60 percent in a single year.

The real lesson: it was never "AI instead of method"

Read those studies together and a consistent pattern emerges. AI is extraordinary at generating the new, the standard, and the well-specified — and unreliable exactly where classic projects already failed: ambiguous requirements, complex existing systems, and the judgment calls that decide whether code is correct, not just plausible. AI does not remove the four failure causes above. Applied naïvely, it does something worse: it lets you reproduce them faster. A team that automates a broken process with AI now has a broken process running at machine speed.

This is why the framing „AI replaces developers“ is the wrong one. The DORA paradox — individual speed up, system delivery down — is what happens when a powerful tool meets weak method. The throughput and stability only improve when AI sits on top of the boring fundamentals: clear requirements, small increments, real testing, and someone accountable for whether the output is right. AI raises the ceiling for teams that have method. It does nothing for teams that do not — and can lower their floor.

From "building software" to "solving the problem"

For the mid-market, the most consequential shift is not a faster way to write custom software. It is realizing how often custom software is the wrong question. Much of what mid-sized companies actually need (qualifying leads, answering the same forty support questions, taking calls after hours, processing documents) is already a solved problem. These needs do not require a bespoke codebase with a multi-year maintenance tail. They require AI-driven integration and automation, assembled from proven components and connected to the tools the business already runs.

That reframing changes the economics entirely. Instead of a months-long custom project carrying the fat-tail risk Flyvbjerg describes, a focused integration can be live in weeks. With a specialized technology partner such as ArkeonTech, agent-based solutions start at around EUR 1,500 setup, and standard projects go live in two to four weeks — with EU hosting and maintenance handled as a service. Where genuinely proprietary logic is the competitive advantage, in-house development still makes sense; we worked through that exact make-or-buy decision in a separate article. The point is to decide deliberately, not to default to building.

How we approach it

At Vincency we do not start a software conversation with the technology. We start with the process that hurts and the outcome the business actually wants, because that is where projects are won or lost, long before the first line of code. Only once the problem and the target state are clear do we choose the smallest sensible slice to implement, use AI to accelerate building it, and measure whether it moves the metric that matters. Then we expand. The technology layer (the agents, the integrations, the automation) plugs into that foundation, and where it calls for a dedicated specialist, ArkeonTech builds and operates it. The order is deliberate: strategy and process first, AI-accelerated execution second. It is the opposite of the failure pattern, not a faster version of it.

The market is moving the same way. Gartner projects that 40 percent of enterprise applications will feature task-specific AI agents by 2026, up from less than 5 percent in 2025 — but adoption alone is not an outcome. DORA already showed what adoption without method produces. The companies that win are not the ones that adopt AI fastest; they are the ones that pair it with the discipline the data keeps rewarding.

A note on transparency

I should be explicit about my position. I am a co-founder of Vincency and the founder of ArkeonTech, the AI software house referenced above. The two companies are deliberately separate — ArkeonTech builds the technology layer, Vincency orchestrates strategy, brand, and integration — and they collaborate where those layers meet. I disclose this so you can weigh the recommendations yourself. The studies cited here are independent of both companies, and I have flagged their limitations — including the contested methodology of the CHAOS reports and the narrow scope of the METR trial — precisely because honest data is more useful than convenient data.

Conclusion

Software projects in the mid-market do not fail because the code is hard. They fail because clarity, capacity, and method are hard, and those are exactly the things AI does not supply on its own. The evidence is consistent once you stop cherry-picking it: AI can make a well-run team dramatically faster and a poorly run one measurably worse. So the question for 2026 is not "should we use AI in software development?" Of course you should. The question is whether you will put it on top of a clear problem and a sound method, or on top of the same ambiguity that has been sinking projects for thirty years. The first path is how AI-driven development finally changes the statistic. The second is just the old failure, accelerated.

Frequently asked questions about software development and AI

Why do so many software projects fail?

Less because of technology than because of requirements drift, unrealistic estimates, missing internal resources, and underestimated maintenance. The Standish CHAOS reports have cited low success rates for decades (1994: ~16%; 2015, under the modern definition: ~29%), though their methodology is contested (Eveleens & Verhoef, 2010). More robust is the analysis by Flyvbjerg et al. (2022) of 5,392 IT projects: cost overruns follow a fat-tailed distribution. The median overrun is close to 0%, but one in six projects explodes (on average +200% cost, per Budzier and Flyvbjerg). The real risk lies not in the averages but in the outliers.

Does AI really make software development faster?

It depends on context, and that is exactly what gets overlooked. A controlled study on GitHub Copilot (Peng et al., 2023) measured 55.8% faster completion on an isolated greenfield task. A randomized experiment by METR (2025) with experienced developers on their own mature codebases found the opposite: 19% slower with AI. AI dramatically accelerates the new and the standardized but can slow you down in complex, familiar systems. Blanket productivity promises are not credible.

Do we still need our own developers if AI exists?

Yes, but their role shifts. In the Stack Overflow 2025 survey, 84% of developers use or plan to use AI tools, yet 66% name "almost right, but not quite" solutions as their biggest frustration; only 3% fully trust the output. AI produces code drafts; humans supply the judgment on architecture, correctness, and business fit. In the mid-market especially, with roughly 109,000 open IT positions in Germany (Bitkom, 2025), the question is rarely "human or AI" but how scarce specialists can do more with AI.

What does AI-driven software development cost in the mid-market?

Considerably less than a classic custom project, if you choose the right scope. Instead of months of in-house development, many requirements can be solved via AI integration and automation: with a specialized partner like ArkeonTech, agent solutions start at around EUR 1,500 setup, and standard projects are live in 2 to 4 weeks. The expensive option is almost always months of custom development without a clear strategy.

Build custom software or rely on AI integration?

In-house development only pays off when the software itself is the competitive advantage and a team can operate it. For solved problems — sales, support, document processing, workflows — AI-driven integration is faster, cheaper, and lower-risk. We covered this make-or-buy decision in detail in a separate article.

How do we get started with AI-driven development?

Not with the tool, but with the problem. First clarify which process actually hurts and what the target state looks like; then implement the smallest sensible slice with AI support, measure, and expand. This order — strategy and process before technology — is the only reliable safeguard against simply reproducing the failure of classic projects faster.

Sources and transparency note: Project-failure data: Standish Group CHAOS reports (1994 ff.), with the methodological critique by Eveleens & Verhoef, "The Rise and Fall of the Chaos Report Figures" (IEEE Software, 2010); Flyvbjerg et al., "The Empirical Reality of IT Project Cost Overruns" (Journal of Management Information Systems, 2022, n=5,392); Budzier & Flyvbjerg, "Why Your IT Project May Be Riskier Than You Think" (Harvard Business Review, 2011). AI productivity: Peng, Kalliamvakou, Cihon & Demirer (2023, GitHub Copilot RCT); METR, "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity" (2025); Google DORA, "Accelerate State of DevOps Report 2024"; Stack Overflow Developer Survey 2025. Mid-market context: Bitkom IT skills study (Aug 2025); Gartner press release (Aug 2025) on task-specific AI agents. Figures are quoted with their original scope and known limitations. Transparency: the author, Michael Kaiser, is a co-founder of Vincency and the founder of ArkeonTech; the cited studies are independent of both companies.