How a single unit difference caused a $370 million space disaster

Would you happen to know the difference between the numbers 32,768 and 32,767? Just a single meager unit difference, right?
To computers, especially in critical operations, small differences in numbers can be the root cause of catastrophic events, such as a $ 370 million space disaster 1.
On June 4, 1996, the maiden flight of Ariane 5 was destroyed 37 seconds after liftoff 2. The cause, now famous, was that a 64-bit floating-point velocity was converted to a 16-bit signed integer. The value didn't fit, and the inertial reference system halted.
The piece often left out of the retelling is more interesting. The conversion was unprotected because an analysis of the previous-generation Ariane 4 had proven the value couldn't overflow. That proof was correct. It was also specific to Ariane 4's flight envelope, whereas Ariane 5 operated in a different one.
The bug wasn't a missing check. It was a constraint that held in one regime and, in another, silently ceased to hold.
Almost thirty years later, the lesson hasn't quite been internalized. And with reusable launch, the same class of failure mode now multiplies.
What was once a singular catastrophe is now an industrial process. The cadence of launch has flipped, with vehicles flying weekly, fleets flying together, and individual boosters flying dozens of times. A lesson that mattered once, on a maiden flight in 1996, now applies thousands of times a year across the industry. The frequency of the context determines the stakes.
A reusable booster has to be correct across regimes, and an expendable booster never visited: boostback, supersonic retropropulsion, hypersonic reentry, the final seconds before touchdown. Each regime has its own envelope of valid sensor values, control authorities, and timing assumptions. The state machine has to handle transitions between them. Mission-specific software loads change between flights. And learned components are starting to make decisions inside the loop.
Every one of these is an opportunity for a constraint to hold in one place and quietly stop holding in another.
Hardware-in-the-loop testing catches a lot. Anomaly investigations catch what makes it through. Test-fly-fix iteration is genuinely effective at the cadence the new generation of launch operates on. None of these techniques, individually or together, can guarantee that a property that holds in the regimes someone thought to test will continue to hold in the regimes the vehicle eventually flies through.
The formal methods community has spent the intervening decades building the tools that prove properties across full input spaces, not just sampled ones. The work that catches the next Ariane 501 already exists. The harder problem is integrating that work into a development pipeline that ships hardware every week.
That is the problem worth working on. Formal modeling and verification is the only path toward taming the complexity of modern software systems.

References: