Living on the Edge: What Fintech Can Learn from Flight Cancellations
This week our family got stranded in Oslo.
Our flight home on Monday was cancelled due to the air traffic control malfunction in the UK. We made the best of it with an impromptu city break, and I can certainly think of worse places than The Tiger City to be stranded for a few days.
But it is worth taking a look at what went wrong (it looks like faulty data was submitted by a French airline), what happened next (automatic processing was suspended to ensure that no incorrect safety-related information could be shared), and what the repercussions were.
A big problem
More than 1,500 flights were cancelled on Monday, and a further 345 on Tuesday, due to the air traffic control failure, with thousands of holidaymakers left stranded abroad. The industry expects to pay over £100 million in compensation claims.
Many families were stuck at their destinations for a week. Apparently Heathrow and Gatwick are, respectively, the busiest two-runway and single-runway airports in the world, and they run with very little slack in the schedule, so when something blows up like this, the repercussions are huge.
The company that provides the air traffic control system is NATS, a public-private partnership jointly owned by the UK government, a handful of British airlines, and some pension funds. The company was founded in 1962, and it’s probably fair to say that some of the core technology underpinning the system has been there from the very beginning. Apparently a similar incident in 2014 was traced back to a bug in the code from the 1990s!
Dealing with legacy systems
Whilst at Origin we don’t deal with such safety-critical situations as air traffic control, we do have experience in building tech that needs to interface with legacy systems that are not just years, but sometimes decades old.
Those who aren’t intimately involved in building technology don’t always appreciate how difficult it is to write code and build systems that are truly stable. The reason for this is “edge cases” – situations that are not the base-case scenario for a system’s operations, but are theoretically possible within the algorithm.
The more complicated an algorithm (and the more automation exists within the system), the greater the risk of edge cases tripping things up.
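To make the idea concrete, here is a minimal sketch in Python. It is purely illustrative and not taken from any real system: the function name `split_evenly` and its inputs are hypothetical. The base case is a single line of arithmetic; everything else in the function exists only to handle situations that rarely happen but are perfectly possible.

```python
def split_evenly(total_units, parties):
    """Split a whole number of units across named parties.

    Hypothetical helper, used only to illustrate edge cases.
    """
    # Edge case: a negative amount makes no sense to split.
    if total_units < 0:
        raise ValueError("total_units must not be negative")
    # Edge case: nobody to split between (would otherwise divide by zero).
    if not parties:
        raise ValueError("at least one party is required")
    # Edge case: a duplicated name would silently overwrite an earlier share.
    if len(set(parties)) != len(parties):
        raise ValueError("party names must be unique")

    # Base case: integer division.
    share, remainder = divmod(total_units, len(parties))

    # Edge case: the amount does not divide exactly, so hand the leftover
    # units out one at a time to keep the total correct.
    return {
        name: share + (1 if i < remainder else 0)
        for i, name in enumerate(parties)
    }
```

Three of the four checks guard against inputs that a quick test with "normal" data would never exercise, which is exactly why they tend to be found late.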
Hard-won experience
At Origin we have a lot of experience with this, especially in the building of our Documentation product, which now supports “multi-player mode” with many counterparties (dealers, issuer, law firm) able to interact simultaneously on a syndicated transaction.
There are so many edge cases that can potentially arise as different counterparties are reviewing, commenting, and collaborating on the documents we produce.
We put a lot of time into thoroughly testing all that we can, and also invest in design resource to ensure that if things fail, they “fail elegantly” (meaning that we revert any change, protect the rest of the platform, and deliver to the user an actionable message about what’s going on). This is hard enough when you own the whole tech stack, but becomes even harder when you are building new tech that has to interface with old tech.
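As a rough sketch of what “failing elegantly” can look like in code, the Python below applies an edit to a document, reverts it if anything goes wrong, and returns a message the user can act on. This is an illustration under stated assumptions, not Origin’s actual implementation: `Document`, `EditError`, `Result`, `apply_edit` and `apply_edit_safely` are all hypothetical names, and a real platform would more likely rely on database transactions than an in-memory snapshot.

```python
import copy
from dataclasses import dataclass, field


class EditError(Exception):
    """Raised when an edit leaves the document invalid (hypothetical)."""


@dataclass
class Document:
    # Maps a clause identifier to its text (hypothetical structure).
    clauses: dict = field(default_factory=dict)


@dataclass
class Result:
    ok: bool
    message: str


def apply_edit(doc, clause_id, new_text):
    """Apply an edit in place; may fail after the document has changed."""
    if clause_id not in doc.clauses:
        raise EditError(f"clause '{clause_id}' does not exist")
    doc.clauses[clause_id] = new_text
    # Validation happens after the write, so a failure here would leave
    # the document in a bad state unless the caller reverts it.
    if not new_text.strip():
        raise EditError("replacement text is empty")


def apply_edit_safely(doc, clause_id, new_text):
    """Apply an edit, reverting on any failure and explaining what happened."""
    snapshot = copy.deepcopy(doc.clauses)  # state to restore on failure
    try:
        apply_edit(doc, clause_id, new_text)
    except Exception as exc:
        # Revert the change so the failure stays contained.
        doc.clauses = snapshot
        return Result(
            ok=False,
            message=(
                f"Your change to '{clause_id}' was not saved: {exc}. "
                "The document is unchanged; please review and try again."
            ),
        )
    return Result(ok=True, message=f"Clause '{clause_id}' updated.")
```

The broad `except Exception` is deliberate in this sketch: the wrapper is the boundary that stops one failed edit from affecting anything else, so it catches everything, restores the snapshot, and turns the error into a message rather than letting it propagate.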
We’re well versed in this, having built integrations into clearing systems, stock exchanges, and paying agents all around the capital markets… all of whom have systems in use that are 20-30+ years old (if not more).
Focus on robustness
Development can be a bit slower than initially expected, but that’s because during the development and testing cycle we are constantly uncovering edge cases that need to be dealt with – even if the probability of those arising is extremely low. You’d be surprised by the number of random situations we’ve had to account for that could derail straight-through processing of an issuance.
As we’ve scaled up – we’re now processing over 1,000 transactions annually – we’ve made huge investments in QA and automated end-to-end testing to ensure our processes and platform are stable. And every once in a while, we pause all development and focus on robustness – specifically looking for edge cases to ensure they are covered.
Now that the air traffic control system is back up and running, and our family has arrived safely back in London, let’s hope that NATS takes a leaf out of Origin’s book – time for the technologists who keep our skies safe to embark on a sprint or two of robustness to prevent this from happening again!