As a nursing home consultant and her company’s top mentor for clients that have fallen afoul of their states’ regulatory commissions, my wife travels nearly every week of the year. As a result, I have a ton of second-hand experience with airline delays and passengers being stuck in an airport far from home…especially when she flies one particular airline, which shall remain anonymous (but which rhymes with “You Guess Stair Ways”).
So when Susan Carey at The Wall Street Journal reported last week on the recent spate of computer systems outages among U.S. air carriers, my initial reaction was, “So what else is new?”
But as I read more about these issues, I began to notice that they seem to bear a significant resemblance to the financial systems outages that occurred back in February and March. This resemblance became even more pronounced when Carey pointed to a major culprit behind the outages, noting:
The problems, often arcane, can be caused by bad hardware, corrupted software, the failure of backup systems to kick in or human error. Electric power supplies can go on the fritz, and so can telecom networks connecting internal airline operations with airports and data centers.
Experts say the disruptions often occur when an airline or technology vendor is performing maintenance, installing an upgrade or making a major technology transfer.
Haven’t we been here before?
The June computer outages about which Carey wrote resulted in hundreds of flights being cancelled, at least twice that number being delayed and tens of thousands of passengers – my wife included – stuck in some airport feeling very inconvenienced. And while some airlines, like Alaska Airlines, moved quickly to compensate passengers for their inconvenience, one has to wonder how these already put-out customers would feel if they knew their hassle had been brought on by a potentially avoidable problem.
As Carey noted, many of the problems stem from existing software that has been upgraded, customized or had new versions built on top of it. Power supply issues notwithstanding, the vulnerabilities and instability of old software should not be the reason that tens of thousands of people across the country are stuck eating airport food or sleeping overnight on benches.
Up, Up and Away!
Truth is, software failures like the ones experienced by the airlines have become all too commonplace in all industries. We treat news of software failures as though they were inevitable, almost expected, and, particularly with an industry like the airlines, we accept the apologies that are offered because we have no alternative.
But why? When exactly did we decide that software failure was an unavoidable part of business and an acceptable excuse to leave us stranded hundreds or even thousands of miles from home?
As we’ve said in this space in the past about other industries, the airlines need to do a better job of assessing the structural quality of software before it is deployed, rather than waiting for it to fail, fixing the problem and then apologizing for it. It’s not like they don’t know what causes poor software quality:
- Business Blindspot: Regardless of the industry, most developers are not experts in their particular domain when they begin working for a company. It takes time to learn about the business, but most of the learning, unfortunately, comes only by correcting mistakes after the software has malfunctioned.
- Inexperience with Technology: Mission-critical business applications are a complex array of multiple computer languages and software platforms. Rather than being built on a single platform or in a single language, they tend to be mash-ups of platforms, interfaces, business logic and data management that interact through middleware with enterprise resource systems and legacy applications. Additionally, in the case of some long-standing systems, developers often find themselves programming on top of archaic languages. It is rare to find a developer who knows “everything” when it comes to programming languages and those who don’t may make assumptions that result in software errors -- eventually leading to system outages, data corruption and security breaches.
- Speed Kills: The pace of business over the past decade has increased exponentially. Things move so fast that software is practically obsolete by the time it’s installed. The breakneck speeds at which developers are asked to ply their craft often mean software quality becomes a sacrificial lamb.
- Old Code Complexities: A significant majority of software development builds upon existing code. Studies show that developers spend half their time or more trying to figure out what the “old code” did and how it can be modified for use in the current project. The more complex the code, the more time spent trying to unravel it…or not. In the interest of time (see “Speed Kills” above), complexity can also lead to workarounds, leaving a high potential for mistakes.
- Buyer Beware: Mergers and acquisitions are a fact of life in today’s business climate, and most large applications from large “acquirers” are built using code from acquired companies. Unfortunately, the acquiring organization can’t control the quality of the software it is receiving, and poor structural quality is not immediately visible to it.
A quick application of automated analysis and measurement to diagnose the structural quality and health issues within the airlines’ application software would go a long way toward making the skies a friendlier place to fly…and get my wife home faster, too!