The $15B Problem
Every few months, a new AI builder launches with promises of "production-ready apps in minutes." Bolt.new went from zero to $40M ARR. Lovable hit a $6.6B valuation. Replit shipped autonomous agents. Cognition Labs launched Devin — billed as the first AI software engineer. The hype has been real, the funding has been enormous, and developer adoption has been remarkable.
And yet. According to the latest industry research from March 2026, 88% of AI-generated apps never make it to production.
That's not a rounding error. It's not a measurement artifact. It's a structural failure of the entire category — and it points to seven specific gaps that every major AI builder has quietly left unaddressed.
"Bolt wins ideation. Lovable wins speed. Devin wins autonomy. None of them own the complete journey from brief to live product."
This article breaks down exactly why those failures happen, what each major platform gets wrong, and what a production-capable workflow actually looks like. This is the analysis we used to build Prodcraft — so we're sharing it.
Platform-by-Platform: What's Actually Happening
Bolt.new: The Token Trap
Bolt is genuinely impressive for prototyping. It generates full-stack code (Next.js plus backend) faster than any other tool, with clean output and a satisfying demo experience. That's why it has 5 million users and an average session time of 22 minutes.
The production problem kicks in at iteration.
Developers routinely exhaust Bolt's 10M monthly token limit in 3 days during normal workflows. The $25/month Pro plan can burn through in "a few hours" on a moderately complex project. Reddit and Trustpilot are full of users who hit billing surprises after cancellation.
The fundamental issue is that token-based pricing punishes iteration. Every debug cycle costs more tokens. Every tweak, every refinement, every "can you just change this one thing" — all of it draws down a finite pool. Users stop experimenting. They stop iterating. And without iteration, you don't ship production software. You ship the first draft.
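To make the draw-down argument concrete, here is a minimal Python sketch of a fixed allowance being spent on iterations. It assumes, hypothetically, that each iteration costs a base amount plus a little more for every earlier change, because a growing project means more context per prompt. The function name `iterations_until_exhausted` and every parameter value except the 10M allowance are illustrative assumptions, not Bolt's actual metering.

```python
def iterations_until_exhausted(allowance: int, base_cost: int, growth_per_iteration: int) -> int:
    """Count how many full iterations fit in a fixed token allowance.

    Hypothetical model: iteration i (0-based) costs
    base_cost + growth_per_iteration * i tokens, standing in for a prompt
    that re-reads a project context which grows with every change.
    """
    if base_cost <= 0 or growth_per_iteration < 0:
        raise ValueError("base_cost must be positive and growth non-negative")
    used = 0
    iterations = 0
    while True:
        cost = base_cost + growth_per_iteration * iterations
        if used + cost > allowance:
            # The next iteration no longer fits in the remaining pool.
            return iterations
        used += cost
        iterations += 1


if __name__ == "__main__":
    monthly_allowance = 10_000_000  # the 10M monthly token limit quoted above
    # Hypothetical: 50k tokens per iteration, flat vs. growing by 2k each time.
    print(iterations_until_exhausted(monthly_allowance, 50_000, 0))      # 200
    print(iterations_until_exhausted(monthly_allowance, 50_000, 2_000))  # 78
```

With these made-up numbers, a flat per-iteration cost allows 200 iterations, while the growing cost exhausts the same 10M tokens after 78. The figures matter less than the shape: any per-iteration cost that rises with project size makes the later iterations, which are usually the debugging ones, the most expensive.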
Lovable: The Credit Cliff
Lovable's "vibe coding" paradigm is the right interface for non-technical founders. Describe instead of code. The Supabase integration removes the most frustrating part of building (backend setup). The code is clean and exportable. It's excellent for getting from zero to impressive demo.
The same structural problem surfaces, just wearing different clothes.
Lovable uses a credit-based pricing model: free tier gets 5 daily credits, paid tiers start at $25/month for 100 credits. One debugging session — particularly on a tricky authentication flow or a complex state management issue — can drain an entire monthly budget. Users describe it as "financially risky to experiment."
The gap between "impressive demo" and "production-ready" is, per Lovable's own user feedback, wider than the marketing suggests. The tool is optimized for MVPs. But MVPs aren't products. Getting from MVP to something you can actually charge for requires exactly the kind of iterative debugging that credit systems make prohibitively expensive.
Replit Agent: The Effort Maze
Replit's Agent 3 represented a genuine leap — 10x more autonomy, real-time collaboration, support for 50+ languages. Agent 4 added parallel task execution and a design canvas. For multi-developer teams working in a shared cloud IDE, it's a strong product.
The pricing model switched from flat per-checkpoint fees ($0.25 per checkpoint in earlier versions) to "effort-based" pricing, which sounds more transparent but remains opaque in practice. Effort-based charges don't map cleanly to what users would intuitively expect to pay, and complex projects that require the agent to recover from errors (which is most complex projects) generate cost spirals.
Users report needing 50%+ manual intervention on non-trivial work. That's not an autonomous agent — that's an expensive pair programmer who gets confused a lot.
Devin: The 15% Problem
Devin generated the most hype of any AI developer tool in 2025. Cognition Labs raised at extraordinary valuations. Goldman Sachs piloted Devin alongside 12,000 human developers. The promise: delegate entire engineering tasks, not just code snippets.
The benchmark and field results tell a different story: Devin completes roughly 15% of complex tasks without human intervention.
In early external evaluations (January 2025), independent testers logged 14 failures, 3 successes, and 3 inconclusive results across 20 real-world tasks. The April 2025 price drop, from $500/month to $20-60/month, was a reduction of up to 96% and signaled that the product was repositioning from "replace engineers" to "automate boilerplate."
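The headline numbers in this section can be checked with simple arithmetic, using only the figures quoted above:

```latex
\begin{aligned}
\text{unassisted success rate} &= \frac{3}{20} = 15\% \\
\text{price cut (cheapest tier)} &= \frac{500 - 20}{500} = 96\% \\
\text{price cut (top of range)} &= \frac{500 - 60}{500} = 88\%
\end{aligned}
```

So the 96% figure describes the cut to the cheapest tier; at the top of the quoted price range the reduction is 88%.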
For boilerplate and refactoring at scale, Devin is genuinely useful. For the complex, custom-logic work that distinguishes a real product from a template — the 15% success rate means you're mostly doing the work yourself anyway.
The 7 Critical Gaps
After mapping every major platform's limitations, the same seven gaps appear consistently. These aren't product bugs — they're architectural choices that reflect what each tool was designed to do (generate code) rather than what developers actually need (ship products).
The "Agency Experience" Gap Nobody Talks About
There's an eighth gap that doesn't fit neatly into a numbered list: the absence of an agency-style delivery experience.
Every AI builder is a self-serve developer tool. You arrive, you type, you get code. There's no requirements gathering, no design review, no testing coordination, no launch orchestration. The product assumes you already know what you want to build and how to ship it.
But the fastest-growing segment of users — non-technical founders, product teams, operators who understand their domain but not software engineering — don't have that background. They need an agency. They need someone (or something) to take responsibility for the complete delivery, from "here's my idea" to "here's your live product."
The market signal is unmistakable: 25% of Y Combinator's Winter 2025 cohort shipped products with 95%+ AI-generated code. These builders exist, they're paying, and they want someone to handle the complete process — not a code generator that outputs a first draft they can't deploy.
The Comparison That Matters
The AI app builder comparison most developers search for — Bolt.new vs Lovable, or Lovable vs Replit, or any permutation — misses the point. They're all in the same category: prototype generators with opaque pricing that stop well short of production.
| | Prodcraft | Bolt.new | Lovable | Devin |
|---|---|---|---|---|
| Pricing | From $199 flat per project | Token-based — burns fast | Credit-based — unpredictable | ACU-based — opaque |
| Workflow Coverage | Brief → Research → Build → Deploy → Growth | Code only | Code + Supabase | Code + PR |
| Production Readiness | Deployed & live on real infra | Manual deploy required | Export + manual infra | Creates PRs only |
| Debugging Cost | Included — no extra | More tokens = more cost | More credits = more cost | More ACUs = more cost |
| Complex Task Success | Full agency treatment | Good for prototypes | Good for MVPs | 15% on complex tasks |
| Market Research | Competitive analysis included | None | None | None |
The real comparison isn't between AI builders. It's between AI builders and full-service development agencies. The former give you code. The latter give you a product. That's the gap the market needs filled, and it's the gap that explains why 88% of AI-generated apps never make it to production.
What Production Actually Requires
After analyzing hundreds of failed AI builder projects, the failure patterns cluster around the same four moments:
1. The deployment cliff. The code works in the builder's environment. It doesn't work anywhere else. Setting up real infrastructure — even with tools like Render, Railway, or Fly.io — requires ops knowledge that most builders don't have.
2. The debugging spiral. When something breaks (and something always breaks), the token or credit cost of iterating toward a fix quickly exceeds the original project budget. Builders stop, abandon, or ship known bugs.
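The debugging spiral can be sketched with a simple expected-cost model. This is a hypothetical illustration, not any vendor's billing formula: it assumes each metered fix attempt costs the same amount and succeeds independently with a fixed probability, so the number of attempts per bug follows a geometric distribution with mean `1 / success_rate`. The function name `expected_debug_cost` and all inputs are illustrative assumptions.

```python
def expected_debug_cost(bugs: int, cost_per_attempt: float, success_rate: float) -> float:
    """Expected metered spend to fix `bugs` independent bugs.

    Hypothetical model: every fix attempt costs `cost_per_attempt` and
    succeeds with probability `success_rate`, independently of earlier
    attempts. Attempts per bug are then geometric with mean 1 / success_rate,
    so expected cost per bug is cost_per_attempt / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    if bugs < 0 or cost_per_attempt < 0:
        raise ValueError("bugs and cost_per_attempt must be non-negative")
    return bugs * cost_per_attempt / success_rate


if __name__ == "__main__":
    # Ten bugs at a hypothetical $1 per attempt.
    print(expected_debug_cost(10, 1.0, 0.5))   # 20.0
    print(expected_debug_cost(10, 1.0, 0.25))  # 40.0
```

Under this model, halving the per-attempt success rate doubles expected debugging spend, because cost scales with `1 / success_rate`. At a hypothetical 15% success rate, the same ten bugs would be expected to cost about $67, well past a $25 monthly plan.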
3. The backend gap. Frontend UIs are easy. APIs, databases, authentication, file storage, email systems, payment processing — each one requires specific expertise that generic code generators handle poorly.
4. The "now what" problem. Even when the code works and deploys correctly, most builders don't know how to get their first customer. None of these AI builders handles go-to-market: landing pages, outreach, positioning, content. You're on your own.
The Window Is Closing
The production readiness gap is well understood by the major platforms — they're just prioritizing growth over depth. Bolt will likely add fixed pricing. Lovable will add better deploy tooling. Replit is already extending its Agent toward full deployment workflows. Devin released a template library to improve that 15% score.
The window for a full-stack approach — one system that owns the complete journey from brief to live product, with transparent pricing and production-ready output — is approximately 6-12 months before these incremental improvements close the gap.
If you're evaluating AI builders right now, the question isn't which code generator has the best UX. The question is: which tool gets you to a product customers can actually use?
That answer, for most builders today, is none of them. Not without significant additional work.