ARCHITECTURE · APR 2026 · 5 MIN READ

Five Legacy Systems, Zero Downtime: The Migration Playbook

NEXTJS · NESTJS · MIGRATION · LEGACY · STRANGLER-FIG

Rewriting a live platform is not a project. It's a programme — one that runs alongside feature development, responds to production incidents, and never fully ends. When you're serving over a million organic visitors a month, you don't get a maintenance window. You migrate live, or you don't migrate at all.

This is the story of how we're doing it at Simplilearn: what we inherited, the architectural decisions we made, what's shipped, and what's still in progress.

What We Inherited

The platform had grown through years of shipping. Two systems were the core load-bearers:

The legacy frontend app — a React-based website that had expanded well past its original scope. Fast to start, painful to extend. Slow deploys, accumulated component debt, no clear ownership boundaries between features. Engineers spent more time understanding what already existed than writing new code.

The legacy PHP backend — a PHP monolith that had been the backbone of the platform for years. It handled authentication, course delivery, background jobs, and cron-based data processing — often in the same codebase, tightly entangled. Touching one part risked breaking another.

Beyond these two, there were additional systems in the architecture: a legacy subdomain system handling course and learning path URLs, a payments platform with its own legacy codebase, WordPress-managed pages, and serverless functions on AWS Lambda handling async workflows. Each had accumulated its own debt.

The platform was not broken. That's important to understand. Both legacy systems were in production, serving real users, generating real revenue. The case for migration was not "things are on fire." It was "the cost of change is too high and getting higher."

The Architecture Decision: Two Monorepos, Not One

The most consequential early decision was how to structure the new architecture. The options were: one unified Turborepo monorepo, or two separate ones.

We chose two:

The frontend monorepo consolidates all customer-facing surfaces — five applications serving learners directly. These apps are performance-critical, SEO-sensitive, and owned by product-aligned squads. They deploy frequently with every content or feature change.

The backend monorepo consolidates internal systems — ten-plus applications covering backend APIs, ops tooling, admin interfaces, and internal workflows, all running on NestJS. These have different performance profiles, different team ownership, and don't need to move in lockstep with customer-facing deploys.

Putting them together would have introduced artificial coupling. A deploy to an internal admin tool would sit in the same pipeline as a learner-facing checkout flow. That's coordination overhead we didn't need and didn't want.

Turborepo makes this pattern sustainable: shared packages — TypeScript types, UI components, utilities — live in a packages layer that both monorepos consume, without forcing a shared deployment unit. The shared code is versioned together; the applications deploy independently.
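As a rough illustration of what a shared package buys you, the sketch below keeps one contract type in a single place and has both a backend-style producer and a frontend-style consumer depend on it. The names (`CourseSummary`, `CourseRecord`, `toCourseSummary`, `courseCardLabel`) and the field shapes are hypothetical, not taken from the actual codebase, and in a real setup the three pieces would live in separate packages and apps rather than in one file.

```typescript
// Hypothetical shared contract, e.g. packages/contracts/src/course.ts
interface CourseSummary {
  id: string;
  title: string;
  durationHours: number;
}

// Hypothetical internal record shape on the backend side.
interface CourseRecord {
  course_id: number;
  name: string;
  duration_minutes: number;
}

// Backend side: maps an internal record to the shared contract.
function toCourseSummary(record: CourseRecord): CourseSummary {
  return {
    id: String(record.course_id),
    title: record.name.trim(),
    durationHours: Math.round(record.duration_minutes / 60),
  };
}

// Frontend side: renders from the same contract type, so a change to
// CourseSummary surfaces as a compile error in both places.
function courseCardLabel(course: CourseSummary): string {
  return `${course.title} (${course.durationHours}h)`;
}
```

The point of the shared type is that a renamed or removed field fails the build on both sides at once, instead of showing up as a runtime mismatch after one side deploys.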

This two-monorepo split has since been adopted as the default migration direction across the engineering org. Any legacy system being modernised routes into one of these two monorepos, not into a new standalone repo.

Migration Strategy: Strangler Fig

We didn't rewrite. We encircled.

The strangler fig pattern: stand up the new application behind the same domain. Route traffic at the CDN layer. Legacy systems stay alive and handle specific paths while we rebuild route by route. Users never see a cutover — they see one domain, and the routing handles which system responds.

```typescript
// next.config.ts — legacy routing during migration
export default {
  async rewrites() {
    return {
      fallback: [
        { source: '/courses/:path*', destination: 'https://legacy-backend.internal/courses/:path*' },
        { source: '/checkout/:path*', destination: 'https://legacy-backend.internal/checkout/:path*' },
      ],
    };
  },
};
```

This configuration let us ship migrated pages incrementally. Every new page that goes live in the frontend monorepo is one fewer route proxied to the legacy system. The migration is a continuous process, not a cutover event.
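One way to keep that shrinking list of proxied routes explicit is to generate the fallback entries from a single list of path prefixes that are still legacy-owned. This is a minimal sketch under assumptions: `LEGACY_PREFIXES`, `LEGACY_ORIGIN`, and `buildLegacyFallbacks` are hypothetical names, and the origin simply mirrors the one in the config above.

```typescript
// Shape of a Next.js rewrite entry (source pattern -> destination URL).
interface Rewrite {
  source: string;
  destination: string;
}

// Hypothetical list of path prefixes still served by the legacy system.
// Finishing a section means deleting its entry from this list.
const LEGACY_PREFIXES: string[] = ['/courses', '/checkout'];

// Assumed internal origin, matching the config above.
const LEGACY_ORIGIN = 'https://legacy-backend.internal';

// Turns each legacy prefix into a catch-all rewrite to the legacy origin.
function buildLegacyFallbacks(prefixes: string[], origin: string): Rewrite[] {
  return prefixes.map((prefix) => ({
    source: `${prefix}/:path*`,
    destination: `${origin}${prefix}/:path*`,
  }));
}

// In next.config.ts this would be returned from rewrites():
//   return { fallback: buildLegacyFallbacks(LEGACY_PREFIXES, LEGACY_ORIGIN) };
const fallbackRewrites = buildLegacyFallbacks(LEGACY_PREFIXES, LEGACY_ORIGIN);
```

Because Next.js checks `fallback` rewrites only after pages, public files, and dynamic routes, a newly migrated page takes precedence over the proxy as soon as it exists. The list then doubles as a record of which sections are still legacy-owned.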

We migrated in waves, ordered by risk:

  • Wave 1 — Read-only public pages. Low risk, high confidence. Category pages, roles page, public-facing content.
  • Wave 2 — Authenticated content. Users logged in, but no transactions.
  • Wave 3 — Transactional flows. Enrollments, payments. Hard prerequisite: full API contract documentation and testing before a single page moved.

Each wave had a performance budget. If a migrated route didn't hit the target, it went back. The migration was not allowed to be a lateral move — every page had to be measurably better on Next.js before going live.
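A performance budget gate of this kind can be sketched as a pure function. The metric names and the rule used here (within budget on every metric, and strictly better than the legacy page on every metric) are assumptions chosen for illustration; the actual budgets and metrics are not stated in this article.

```typescript
// Hypothetical metrics: Largest Contentful Paint (ms), Time to First Byte (ms),
// and JavaScript shipped (KB).
type Metric = 'lcpMs' | 'ttfbMs' | 'jsKb';
type RouteMetrics = Record<Metric, number>;

interface BudgetResult {
  pass: boolean;
  failures: string[];
}

const METRICS: Metric[] = ['lcpMs', 'ttfbMs', 'jsKb'];

// A migrated route passes only if every metric is within budget
// AND strictly better than the legacy measurement.
function checkBudget(
  migrated: RouteMetrics,
  legacy: RouteMetrics,
  budget: RouteMetrics,
): BudgetResult {
  const failures: string[] = [];
  for (const metric of METRICS) {
    if (migrated[metric] > budget[metric]) {
      failures.push(`${metric} over budget: ${migrated[metric]} > ${budget[metric]}`);
    }
    if (migrated[metric] >= legacy[metric]) {
      failures.push(`${metric} not better than legacy: ${migrated[metric]} >= ${legacy[metric]}`);
    }
  }
  return { pass: failures.length === 0, failures };
}
```

Returning the list of failures rather than a bare boolean makes the "it went back" decision easy to explain in a pipeline log or a pull request comment.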

What's Shipped

The Backend: Full PHP Cutover (April 2026)

The biggest single migration milestone was replacing the legacy PHP backend entirely with a NestJS equivalent in the backend monorepo.

The backend was released to production on April 14, 2026. Full cutover, with all traffic moved off the legacy PHP backend, was completed on April 30, 2026. That is just over two weeks from initial production deployment to full handover, with zero downtime.

The approach that made zero-downtime possible: we ran both systems in parallel during the transition period. New API surface served from the NestJS backend monorepo; legacy PHP routes still live as a fallback. We moved traffic route by route, validating behaviour at each step. The moment all routes were verified, we cut the fallback.
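The route-by-route bookkeeping described above can be modelled as a small ledger. This is only a sketch: the class name `CutoverLedger`, the three states, and the example route paths are hypothetical and not the team's actual tooling.

```typescript
// legacy: served by PHP; migrated: served by NestJS with PHP still available
// as fallback; verified: behaviour confirmed on NestJS.
type RouteStatus = 'legacy' | 'migrated' | 'verified';
type Upstream = 'php' | 'nestjs';

class CutoverLedger {
  private readonly routes = new Map<string, RouteStatus>();

  register(route: string): void {
    this.routes.set(route, 'legacy');
  }

  markMigrated(route: string): void {
    this.transition(route, 'legacy', 'migrated');
  }

  markVerified(route: string): void {
    this.transition(route, 'migrated', 'verified');
  }

  // Which system should answer this route right now.
  upstreamFor(route: string): Upstream {
    const status = this.routes.get(route);
    if (status === undefined) throw new Error(`Unknown route: ${route}`);
    return status === 'legacy' ? 'php' : 'nestjs';
  }

  // The fallback can go only once every registered route is verified.
  canRemoveFallback(): boolean {
    const statuses = Array.from(this.routes.values());
    return statuses.length > 0 && statuses.every((s) => s === 'verified');
  }

  // Enforces the legacy -> migrated -> verified order.
  private transition(route: string, from: RouteStatus, to: RouteStatus): void {
    if (this.routes.get(route) !== from) {
      throw new Error(`${route} is not in state "${from}"`);
    }
    this.routes.set(route, to);
  }
}
```

Forcing every route through the same ordered states means a route cannot be counted as done without having passed through verification.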

AWS Secrets Manager was standardised as the source of truth for environment variables during this phase, a pattern now applied to every subsequent migration.
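A minimal sketch of that pattern, assuming each environment's secret is stored as a JSON object of key/value pairs. The function names and the `SecretFetcher` type are hypothetical; the fetcher is injected so the parsing logic stays testable without AWS credentials.

```typescript
// Fetches the raw secret string for a given secret id.
// With the AWS SDK for JavaScript v3 this could wrap
//   new SecretsManagerClient({}).send(new GetSecretValueCommand({ SecretId: id }))
// and return the response's SecretString.
type SecretFetcher = (secretId: string) => Promise<string>;

// Parses a JSON key/value secret into string env vars and fails fast
// if any required key is absent.
function parseSecretEnv(secretString: string, requiredKeys: string[]): Record<string, string> {
  const parsed: unknown = JSON.parse(secretString);
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('Secret must be a JSON object of key/value pairs');
  }
  const source = parsed as Record<string, unknown>;
  const env: Record<string, string> = {};
  for (const key of Object.keys(source)) {
    env[key] = String(source[key]);
  }
  const missing = requiredKeys.filter((key) => !(key in env));
  if (missing.length > 0) {
    throw new Error(`Missing required keys: ${missing.join(', ')}`);
  }
  return env;
}

// Loads and validates the environment for one application at startup.
async function loadEnvFromSecret(
  secretId: string,
  fetchSecret: SecretFetcher,
  requiredKeys: string[],
): Promise<Record<string, string>> {
  const secretString = await fetchSecret(secretId);
  return parseSecretEnv(secretString, requiredKeys);
}
```

Failing at startup on a missing key is a deliberate choice here: a misconfigured environment is caught at deploy time rather than on the first request that needs the value.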

Frontend Pages (April–May 2026)

Alongside the backend cutover, we migrated the first set of high-priority frontend pages, including the roles page and B2B home.

Each page went through the same process: map the existing API dependencies, document the contract, build in the frontend monorepo, validate on staging, deploy with the legacy route as fallback, confirm, remove fallback.
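The "document the contract" step can be made checkable with even a very small validator. The sketch below is an assumption about what such a check might look like, not the team's tooling: `Contract`, `contractViolations`, and the example fields are hypothetical, and it only covers flat objects with primitive fields.

```typescript
// A contract maps each expected field to the primitive type it must have.
type FieldType = 'string' | 'number' | 'boolean';
type Contract = Record<string, FieldType>;

// Returns one message per missing or mistyped field; an empty array
// means the payload satisfies the contract.
function contractViolations(contract: Contract, payload: unknown): string[] {
  if (typeof payload !== 'object' || payload === null) {
    return ['payload is not an object'];
  }
  const body = payload as Record<string, unknown>;
  const violations: string[] = [];
  for (const field of Object.keys(contract)) {
    const expected = contract[field];
    if (!(field in body)) {
      violations.push(`${field}: missing`);
    } else if (typeof body[field] !== expected) {
      violations.push(`${field}: expected ${expected}, got ${typeof body[field]}`);
    }
  }
  return violations;
}
```

Run against responses from both the legacy route and the migrated one on staging, a check like this turns the documented contract into something that can fail a build.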

The Figma MCP workflow — where design files feed directly into the code generation process — meaningfully reduced the time between design handoff and production-ready component for the roles page and B2B home. What previously required iterative back-and-forth between design and engineering was significantly compressed.

Cron Optimisation (May 2026)

The legacy PHP backend's cron jobs were a separate problem. The original plan was a 1:1 port to NestJS workers. We stopped before writing that code.

A 1:1 port would have moved the same architectural problem into a new language. The cron jobs had accumulated business logic, data dependencies, and timing assumptions over years. Porting them would lock that logic into a new system with the same constraints.

We paused and ran a step-based debugging analysis instead: instrument every stage of the cron execution, measure where time is actually being spent, fix the bottleneck, repeat. The result was a 25x improvement in cron execution speed — without a rewrite. The current crons now run fast enough to buy time to design the right replacement: an event-based architecture that removes the cron dependency entirely. That design is in progress.
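The measure-then-fix loop can be sketched as a wrapper that times each stage of a job and reports the slowest one. The stage names and helper functions are hypothetical, and the stages are synchronous to keep the example short; real cron stages would most likely be asynchronous.

```typescript
interface Stage {
  name: string;
  run: () => void;
}

interface StageTiming {
  name: string;
  ms: number;
}

// Runs each stage in order and records how long it took.
// The clock is injectable so the timing logic can be tested deterministically.
function timeStages(stages: Stage[], now: () => number = () => Date.now()): StageTiming[] {
  const timings: StageTiming[] = [];
  for (const stage of stages) {
    const start = now();
    stage.run();
    timings.push({ name: stage.name, ms: now() - start });
  }
  return timings;
}

// Picks the stage where the most time was spent: the next thing to fix.
function slowestStage(timings: StageTiming[]): StageTiming {
  if (timings.length === 0) throw new Error('No stages were timed');
  return timings.reduce((worst, t) => (t.ms > worst.ms ? t : worst));
}
```

After each fix the same measurement is repeated, because removing one bottleneck usually exposes the next.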

What's Planned

The payments platform: This migration is blocked until AWS Secrets Manager wiring is validated for the payments environment. The pattern was established during the PHP backend cutover; applying it to the payments platform is the prerequisite. Once that's done, migration planning begins.

Remaining legacy subdomain system pages: The subdomain system handles a large volume of course and learning path URLs. Full migration planning has not been done yet.

Remaining legacy PHP backend APIs and pages: Some PHP backend routes were not included in the April cutover. The plan is to extract them systematically, route by route.

WordPress pages: These sit at the edge of the architecture and are the least urgent. Planning has not started.

AI-Accelerated Migration

One pattern that's changed how we approach legacy code: before touching a legacy system, we now build an agent-readable layer on top of it.

Legacy codebases are expensive for AI coding agents to read. Years of accumulated debt, inconsistent structure, and buried context mean agents spend tokens on noise rather than signal. We applied a technique called LLM Wiki / Graphify to both the legacy frontend app and the payments platform — creating a structured documentation layer that agents read instead of raw source. The result was a 71.5x reduction in token consumption when agents explored those codebases, with meaningfully better output quality.

This matters for migration specifically: when you're extracting logic from a legacy system, you need the agent to understand what the code actually does. The wiki layer makes that possible without the agent getting lost in 8 years of accumulated context.

Outcomes So Far

Uptime improved from 95% to 99% after the core customer-facing pages moved to Next.js. The legacy architecture had failure modes baked into its deployment model — tightly coupled systems that couldn't fail independently. The new architecture isolates failure correctly.

Zero downtime on the full PHP backend cutover. A system that had been the backbone of the platform for years was replaced without a maintenance window or user-facing interruption.

25x improvement in cron execution speed through step-based optimisation, buying time to do the architectural redesign correctly.

Shared package layer that didn't exist before. TypeScript types, UI components, and utilities are now versioned and shared across the frontend monorepo's five applications. New pages don't start from scratch.

Org-wide adoption of the Turborepo pattern. The architecture established for this migration is now the default direction for all teams modernising legacy systems.

What I'd Do Differently

Design the event-based architecture before porting any crons. We caught it before porting, but the 1:1 porting instinct is strong. The right question is never "how do we run this in NestJS?" but "what problem is this cron actually solving, and what's the right way to solve that problem in the target system?"

Standardise Secrets Manager earlier. AWS Secrets Manager became the environment management standard during the PHP backend cutover. It should have been the standard from the first migration wave. Retrofitting environment management patterns mid-migration adds coordination overhead that's avoidable.

Build the LLM wiki layer before starting extraction, not after. We applied the agent-readable documentation layer after we'd already spent time manually exploring the legacy codebases. The 71.5x token reduction benefit would have compounded over the entire migration if we'd built it first.

Maintain a public migration backlog. Internal tracking exists, but a visible backlog — here's what's migrated, here's what's in progress, here's what's planned — prevents the false sense of completion that comes from celebrating individual milestones. The work is not done until the legacy systems are off.


The platform is not where we want it to be. It's measurably better than it was, with clear lines of sight to what comes next. That's the honest state of a live migration at scale — progress is continuous, completion is a horizon, and the work is shipping the whole way through.