Your AI Strategy Is Preserving the Code It Should Be Replacing
Your engineers just used Claude Code to generate 200 unit tests for a billing service written in 2016. Your tech lead used Cursor to auto-document the monolith nobody wanted to touch. Your VP of Engineering is showing velocity charts that are up and to the right.
None of that matters. You’re accelerating in the wrong direction.
The Polishing Trap
AI is making legacy code more comfortable to live in. That comfort is the trap.
A team adopts Copilot. They knock out a backlog of tech debt tickets. They generate test coverage for modules that haven’t been touched in three years. They produce architecture diagrams for a system that previously relied on tribal knowledge. Leadership sees the metrics improve and calls it a win.
Meanwhile, the same AI tooling could have scaffolded the replacement service in less time than it took to document the old one.
Here’s what makes this insidious: the better AI gets at maintaining legacy systems, the harder it becomes to justify replacing them. Every new test suite, every generated doc, every automated bug fix makes the old system feel more manageable. The case for replacement grows weaker every sprint, not because the system is getting better, but because the pain of living with it is growing duller.
And the rewrite math has changed underneath you while this was happening. The “never rewrite” rule assumed 12 to 18 months of parallel development, with a massive scope-creep risk. That assumption was based on human-speed development. A competent team with modern frameworks and AI tooling can now scaffold a functional replacement in weeks that would have taken quarters. Not toy prototypes. Working systems with test coverage that can absorb production traffic. But most engineering leaders are still running 2020 risk assumptions against 2026 tooling, and AI is actively making the status quo comfortable enough that the recalculation never happens.
You’re not paying down tech debt. You’re making it more livable. Those are very different things.
The Replace-or-Polish Framework
So, which systems should you accelerate maintenance on, and which should you accelerate replacement of? Most teams apply the “never rewrite” rule universally rather than evaluating each system individually. Here’s how to actually score it.
Bus Factor. How many people can deploy, modify, and troubleshoot this system without hand-holding? One person is red. Two to three is yellow. Four or more is green. If you’re at one or two, AI-assisted maintenance is making the bus factor problem more comfortable without solving it. You’re generating docs that nobody will read under pressure and tests that nobody else can debug when they fail. A bus factor of one with great AI-generated documentation is still a bus factor of one.
Dependency Freshness. How far behind are the core dependencies? One major version behind is green. Two is yellow. Three or more is red, and at that point, the upgrade path is effectively a rewrite anyway, you’re just pretending it isn’t. Check the framework, the runtime, and the critical libraries. If any core dependency is end-of-life with no upgrade path, that’s red regardless of version count. You’re maintaining code on a platform the maintainers abandoned.
Security Debt. This is the factor with an external clock. Two or more known CVEs in pinned dependencies with no clean patch path make this red. An EOL runtime that no longer receives security patches is red. Auth patterns that predate OAuth2 and OIDC are red. A compliance gap that would require rework and that approaches the replacement scope is red. One manageable CVE with a known remediation path is yellow. Clean scans and current auth standards are green. Bus factor is a risk. Feature drag is a cost. Security debt is a countdown. A disclosed vulnerability, a failed pen test, a new compliance requirement from your largest customer, and the “maintain” decision gets made for you under the worst possible conditions.
Feature Velocity Drag. What’s the ratio of estimated versus actual delivery time for features touching this system? Under 1.5x is green, that’s normal friction. 1.5x to 3x is yellow. Over 3x is red, and it means the architecture is actively fighting you. No amount of AI-assisted development changes a system that’s slow because of its design. If your team is using Copilot to write code faster inside that system, you’re optimizing the wrong layer.
AI Maintainability Ceiling. Can AI actually reason effectively about the codebase? Test this explicitly. Give an agent a real task in the legacy system and evaluate the output. If the agent produces usable code with minor corrections, that’s green. If it requires significant human rework but gets the intent right, yellow. If it produces consistently wrong or dangerous output because the codebase is too convoluted, red. When the polishing path doesn’t even work because the code defeats the agent, the entire argument for AI-assisted maintenance collapses.
Blast Radius. Is this a contained service with clean interfaces? Green. Does it have known dependencies on a handful of other systems? Yellow. Is it load-bearing for a significant portion of your platform with unclear boundaries? Red. High blast radius doesn’t mean “don’t replace.” It means replace with a strangler pattern and plan the migration carefully. Blast radius determines how you replace, not whether you should.
Run each system through these six factors. Red on three or more, and you’re in replacement territory. Red on security debt alone, and you may not have the luxury of waiting for the other factors to line up. Green across the board and AI-assisted maintenance is the right call. Feel good about it.
One caveat worth stating directly: if your operating horizon is short enough that a quarter-long replacement won’t pay off before an exit or a major strategic shift, maintain with AI and be honest that you’re managing the liability, not resolving it. That’s a legitimate business decision. Just don’t confuse it with a technical one.
The point isn’t to replace everything. It’s to stop applying the same default to every system when the factors are screaming different answers.
What This Means For You
The framework gives you the answer. Here’s why it won’t matter unless you do three things:
Run the framework on your top five most painful systems this quarter. Be honest about the security debt column. Most teams haven’t inventoried their CVE exposure against pinned dependencies in months. You’ll likely find two or three systems that are obvious replacement candidates hiding behind the “never rewrite” default.
Pick the strongest candidate and run a two-week spike. Give a strong team, Claude Code or Cursor, and a clear target: scaffold the replacement, absorb real production logic, produce something you can evaluate against the existing system. You’re not committing to a rewrite. You’re testing whether the timeline compression is real for your specific codebase. If the spike produces a credible path in two weeks, you have your answer.
Name the incentive problem out loud. This is the part that kills modernization efforts regardless of what the framework says, and it has nothing to do with technology.
Legacy systems create expertise moats. The engineer who’s the only person who understands the payment processing service has real job security. The team that’s been “modernizing” the same monolith for three years has a predictable roadmap. AI just strengthened the team’s position. They can point to velocity improvements and coverage metrics and say, “Look, the system is getting better.” The system isn’t getting better. The tooling around it is.
These aren’t lazy engineers. They’re rational humans responding to incentives. But if the people with the deepest institutional knowledge see replacement as a threat, the framework scores don’t matter. Give them ownership of the new architecture. They know every edge case, every buried requirement, every customer-specific hack. That expertise is the asset. The legacy codebase is the liability. Make the replacement the most interesting project on the roadmap, not a threat to their relevance.
AI didn’t just change how fast you can build. It changed how easy it is to avoid building the right thing.
Stop polishing. Start replacing.
