This new report by the U.S. Government Accountability Office (GAO) on airline IT outages exposes how the problem is affecting air travel. IT outages are having a significant impact on the airline industry and on customers’ travel experience, including delays and flight cancellations.
This has been an ongoing issue for many years, caused by outdated software, post-merger integration gaps and a complete lack of attention to structural quality. The problem is much worse than in other industries.
IT outages are not specifically covered in contracts of carriage, and different airlines interpret those contracts differently. As a result, customers are concerned that they will not receive compensation for failures that are the airlines’ fault. Some consumer advocates worry that these events might be treated as an “Act of God”. The GAO has reviewed this situation and has brought to light the many issues surrounding these system errors.
The 34 issues the GAO identified in the 2015-2017 timeframe amount to a cadence of roughly one issue per month for the major US airlines over a three-year period. Mind you, these are just the big, headline-grabbing issues. That figure masks the large number of daily airline IT issues that detract from our experience as airline consumers.
The GAO reports that airlines have finally started working towards solutions by migrating to the cloud or using more than one data center. This is only one piece of the solution: the data center is just one layer of the stack, and it addresses an increasingly small part of the problem. Carrying out major upgrades to their aging systems would be the most challenging part, because these systems are always in use and online.
We at CAST have been vocal in the past about highlighting these airline IT outage issues and their underlying reasons.
We will have more to say about this specific GAO report in a later post. But for now, we’re delighted to see the GAO taking this issue seriously.
Read more about this latest GAO report on Airlines IT Outages here.
And just as I was about to wrap up my post, an alert hit my inbox about an outage last weekend that led to several cancelled flights across the U.S. I’ll wrap up this post with a quote on this latest outage:
“The root cause is a software design error that misinterprets GPS time updates. A ‘leap second’ event occurs once every 2.5 years within the U.S. Government GPS satellite almanac update. Our GPS-4000S (P/N 822-2189-100) and GLU-2100 (P/N 822-2532-100) software’s timing calculations have reacted to this leap second by not tracking satellites upon power-up and subsequently failing. A regularly scheduled almanac update with this ‘leap second’ was distributed by the U.S. government on 0:00 GMT Sunday, June 9, 2019, and the failures began to occur after this event.”
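To make the failure mode in that quote a little more concrete, here is a minimal Python sketch of how a receiver might convert GPS time to UTC while treating the broadcast leap-second offset defensively. This is not the vendor's code, and it does not claim to reproduce the actual defect. The function names `gps_to_utc` and `safe_leap_offset` are hypothetical, and the plausibility rule (accept a change of at most one second from the last known value) is an assumption chosen for illustration. The 18-second GPS-UTC offset is the value in effect since January 2017.

```python
from datetime import datetime, timedelta, timezone

# GPS time starts at 1980-01-06 00:00:00 UTC and does not include leap seconds.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
SECONDS_PER_WEEK = 7 * 24 * 3600

# GPS has been 18 seconds ahead of UTC since 2017-01-01.
LAST_KNOWN_OFFSET = 18


def safe_leap_offset(broadcast_offset, last_known=LAST_KNOWN_OFFSET):
    """Return a usable GPS-UTC offset (hypothetical helper).

    Assumption: leap seconds change the offset by one second at a time,
    so anything further from the last known value is treated as suspect
    and ignored instead of being allowed to break the time calculation.
    """
    if isinstance(broadcast_offset, int) and abs(broadcast_offset - last_known) <= 1:
        return broadcast_offset
    return last_known


def gps_to_utc(week, seconds_of_week, leap_offset):
    """Convert a GPS week number and seconds-of-week to a UTC datetime."""
    if not 0 <= seconds_of_week < SECONDS_PER_WEEK:
        raise ValueError("seconds_of_week out of range")
    gps_time = GPS_EPOCH + timedelta(weeks=week, seconds=seconds_of_week)
    # UTC lags GPS time by the accumulated leap seconds.
    return gps_time - timedelta(seconds=leap_offset)


if __name__ == "__main__":
    # A malformed offset falls back to the last known value.
    offset = safe_leap_offset(None)
    # Week 2057 (full week count, no rollover) begins on June 9, 2019 in GPS time.
    print(gps_to_utc(2057, 0, offset))
```

The design point is simply that an unexpected value in a routine data update should degrade gracefully to a known-good state rather than stop the unit from tracking satellites at power-up.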