What datacenter exits actually fail at

Ask a programme team whether their datacenter exit succeeded and they will answer with a date. We hit cutover on schedule. We came in under budget. We moved four hundred workloads in nine months with no major incident. All of that can be true, and the exit can still have failed — you just will not know it yet. Because the question that matters is not whether the migration completed. It is whether the platform you moved to is calmer to operate in year two than the datacenter was in its last year alive. By that measure, a great many exits that were declared successful were nothing of the sort.

This is the uncomfortable thing experience teaches. The migration is the visible, expensive, anxiety-soaked part, so it absorbs all the attention and all the credit. But the migration is also the part with the most people watching, the most budget, and the clearest deadline — which is precisely why it usually goes fine. The failures happen later, quietly, in the part nobody is measuring once the celebration email has gone out.

The clock to watch is not migration day. It is the on-call rotation eighteen months later.

Picture two curves plotted over time. Migration effort is the dashed line: it spikes hard at cutover and then falls away to almost nothing, which is exactly what everyone expects and plans for. The solid line is operational pain, and it does something the programme plan rarely shows: it stays low and flat all through year one, lulling everyone into believing the exit worked, and then it climbs steadily through year two. The cruelty of the shape is that the decisions which determine how high that second curve rises were all made back during the migration, in the weeks when nobody was thinking about year two at all.

So where, specifically, does the damage get done? In my experience it concentrates in three layers, and the same three show up again and again across industries and platforms.

Three shortcuts that cost nothing on migration day and a great deal by year two.

One — identity, bolted on at the end

Identity is the layer everyone agrees is important and almost nobody sequences first. The pattern is predictable: the migration is framed as a workload-moving exercise, identity is treated as a supporting concern to be tidied up once the applications are across, and so the team carries the old assumptions — the legacy directory, the implicit trust relationships, the access that accumulated over a decade — straight into the new environment. It works at cutover because it is, functionally, the same access model in a new location.

Then year two arrives. Nobody can say with confidence who has access to what. Joiners, movers, and leavers have been processed against a model that was never designed, only inherited. An audit asks a simple question and the honest answer takes three weeks to assemble. The cost was not paid during the migration. It was deferred, with interest, to the moment the organisation could least absorb it.
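One way to make "who has access to what, and why" answerable in minutes rather than weeks is to keep an owner and a justification against every grant and to review the ones that lack them. The sketch below is illustrative only: the `AccessGrant` record, its field names, and the sample data are assumptions, not the export format of any particular directory or cloud platform.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AccessGrant:
    """One row of a hypothetical access export: who can do what, and why."""
    principal: str
    resource: str
    role: str
    granted_on: date
    owner: Optional[str] = None          # person or team accountable for the grant
    justification: Optional[str] = None  # why the access exists


def review_grants(grants, active_principals, migration_date):
    """Return (grant, reasons) pairs for grants that need review before an audit."""
    findings = []
    for g in grants:
        reasons = []
        if g.principal not in active_principals:
            reasons.append("principal is no longer active (leaver)")
        if not g.owner:
            reasons.append("no accountable owner")
        if not g.justification:
            reasons.append("no recorded justification")
        if g.granted_on < migration_date:
            reasons.append("inherited from before the migration")
        if reasons:
            findings.append((g, reasons))
    return findings


# Sample data, invented for illustration.
grants = [
    AccessGrant("alice", "billing-db", "reader", date(2024, 3, 1),
                owner="finance-platform", justification="monthly reporting"),
    AccessGrant("bob", "billing-db", "admin", date(2015, 6, 9)),
    AccessGrant("carol", "hr-share", "writer", date(2024, 5, 2),
                owner="hr-it", justification="payroll run"),
]
findings = review_grants(grants, active_principals={"alice", "bob"},
                         migration_date=date(2024, 1, 1))
for grant, reasons in findings:
    print(f"{grant.principal} -> {grant.resource} ({grant.role}): {'; '.join(reasons)}")
```

In this sample, the decade-old admin grant and the leaver's write access are flagged, while the grant that was designed after the migration with an owner and a reason passes cleanly.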

Two — the landing zone as a checklist

A landing zone is an architecture. Treated well, it is a small number of opinionated decisions — how subscriptions are structured, how networking and identity and policy compose, what the default path looks like — that make the right thing the easy thing for years afterward. Treated badly, it is a checklist someone worked through once to satisfy a reference architecture, with no underlying point of view about how the organisation actually wants to operate.

The difference is invisible at cutover, because at cutover there are only a handful of workloads and a team that still remembers every decision. It becomes visible as the environment fills up. A checklist landing zone has no clean default, so every new workload is a negotiation, every team solves the same problem a slightly different way, and the variance compounds. Two years in, the platform is not one environment; it is forty slightly different ones wearing the same logo.

A landing zone built as a checklist gives you a hundred ways to do everything. A landing zone built as an architecture gives you one good way and the discipline to keep it.
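The drift described above can be made visible long before year two by comparing each workload against a single declared default. The following is a minimal sketch under stated assumptions: the `DEFAULT_PATH` settings, their values, and the workload names are all hypothetical, and a real platform would read this from its own inventory or policy tooling.

```python
from collections import Counter

# Hypothetical platform default: the "one good way" a workload should follow.
DEFAULT_PATH = {
    "network": "hub-spoke",
    "identity": "central-directory",
    "logging": "central-workspace",
    "deployment": "pipeline",
}


def deviations(workload_config):
    """Return the settings where a workload departs from (or omits) the default."""
    return {
        key: workload_config.get(key)
        for key, expected in DEFAULT_PATH.items()
        if workload_config.get(key) != expected
    }


def variance_report(workloads):
    """Count distinct configurations and how often each setting drifts."""
    shapes = Counter(tuple(sorted(cfg.items())) for cfg in workloads.values())
    drift = Counter()
    for cfg in workloads.values():
        drift.update(deviations(cfg).keys())
    return {"distinct_configurations": len(shapes), "drift_by_setting": dict(drift)}


# Sample inventory, invented for illustration.
workloads = {
    "payments": dict(DEFAULT_PATH),
    "reporting": {**DEFAULT_PATH, "network": "standalone-vnet"},
    "legacy-crm": {**DEFAULT_PATH, "network": "standalone-vnet", "deployment": "manual"},
}
report = variance_report(workloads)
print(report)
```

Tracked over time, the count of distinct configurations is a rough proxy for how many "slightly different environments" the platform has become, and the per-setting drift shows which default is being negotiated away most often.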

Three — runbooks that record the what, never the why

The third failure is the quietest and, over a long enough horizon, the most expensive. The migration produces documentation — runbooks, diagrams, configuration records — and on inspection it all looks thorough. It describes what was built and how to operate it. What it almost never captures is why. Why this region pairing and not that one. Why this workload was left on a legacy pattern deliberately. Why a particular constraint that looks arbitrary is actually load-bearing.

The why lives in the heads of the people who did the work, and people move on. Eighteen months later a new engineer inherits a system documented in full and understood not at all. They cannot tell the deliberate decisions from the accidental ones, so they treat all of it as fragile and touch none of it. The platform ossifies. The three-in-the-morning pages keep coming, because the knowledge that would have prevented them walked out the door with the people who never wrote it down.

What the good ones do differently

The exits that are still calm in year two are not the ones that migrated fastest or cheapest. They are the ones that treated the migration as the easy half of the job and the foundation as the hard half — and sequenced accordingly. They designed identity first, before a single workload moved, because identity is the boundary everything else depends on. They built the landing zone as a small set of defensible decisions rather than a checklist, and they defended those decisions against the steady pressure to make exceptions. And they wrote down the why — not exhaustively, but for every decision that someone two years later would otherwise have to reverse-engineer or fear.

None of this costs more in absolute terms. It is the same work, sequenced differently, with a different definition of "done". The cost is borne earlier, when there is budget and attention and the people who understand the decisions are still in the room. That is the whole trick: pay during the migration, when you can afford it, instead of in year two, when you cannot.

The temptation will always be to measure the exit on the day of cutover, because that is the day everyone is watching and the metrics are flattering. But the real verdict comes later, in the size of the on-call rotation, in how long an audit takes, in whether a new engineer can change something without flinching. The clock that decides whether your exit succeeded is not the one counting down to migration day. It is still running, quietly, eighteen months after everyone stopped paying attention.

Decision Architecture

The Four Tests of an Architecture Decision

Test 01: Clarity

Has the actual decision been stated? Not the option being considered, not the direction being explored — the decision itself, in writing, with the alternatives named.

Test 02: Sequence

Is it being made at the right point in the programme? The same decision costs almost nothing early and can cost a programme when made late. Sequence is where most of the cost is hidden.

Test 03: Ownership

Is someone accountable for making and sustaining it? A decision without an owner drifts. Ownership means someone will notice when the context changes and the decision no longer holds.

Test 04: Consequence

What becomes harder, more expensive, or irreversible afterward? Every significant decision forecloses options. Naming those options before deciding is the discipline that separates architecture from guesswork.
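The four tests can be applied mechanically to a written decision record. The sketch below is one possible encoding, not a standard format: the `Decision` fields, the phase names, and the example decisions are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Decision:
    """A hypothetical decision record with one field group per test."""
    statement: str = ""                                     # Clarity: the decision itself
    alternatives: List[str] = field(default_factory=list)   # Clarity: options rejected
    decide_by_phase: Optional[str] = None                   # Sequence: when it must be made
    owner: Optional[str] = None                             # Ownership: who sustains it
    forecloses: List[str] = field(default_factory=list)     # Consequence: options closed off


def failed_tests(d):
    """Return the names of the tests this record does not yet pass."""
    failures = []
    if not d.statement or not d.alternatives:
        failures.append("clarity")
    if d.decide_by_phase is None:
        failures.append("sequence")
    if not d.owner:
        failures.append("ownership")
    if not d.forecloses:
        failures.append("consequence")
    return failures


# Example records, invented for illustration.
identity_first = Decision(
    statement="Design the identity model before any workload moves",
    alternatives=["Carry the legacy directory across and tidy up later"],
    decide_by_phase="pre-migration",
    owner="platform-architecture",
    forecloses=["Lift-and-shift of legacy trust relationships"],
)
still_exploring = Decision(statement="Explore a multi-region option")

print(failed_tests(identity_first))   # no failures expected
print(failed_tests(still_exploring))  # every test fails
```

A record that fails any test is a direction being explored rather than a decision, which is a useful thing to know while the people who could settle it are still in the room.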

Applied to this essay

Does your exit programme have an explicit operating model for the post-migration platform — not just a migration plan?
Which decisions are being deferred until after the migration that should be made before it?
What would year two actually look like, and has anyone modelled it?
