I was standing at the curb waiting for my daughter’s school bus to arrive when I instinctively pulled my BlackBerry Curve out of the holster on my hip. I do this dozens if not 100 times each day because I have the vibration turned down low so as not to be like “all the other” smartphone users out there who buzz every 30 seconds when they get an email or text. That doesn’t mean I check it any less; it just means I don’t buzz when I walk.
Today’s instinctive BlackBerry check had me a bit perplexed, though. When I looked at my “Messages” folder there was nothing in there from this morning. At the very least I should have had about a dozen e-newsletters and media alerts, not to mention my morning HARO subscriptions. Not even the emails I had sent out that morning on which I had cc’d myself could be found.
My initial reaction was “what’s wrong with this thing?” believing the problem to be peculiar to my own device. Even though I knew about the three-day outage in Europe, it did not dawn on me until I read about RIM’s outage extending into the U.S. that I was part of a system-wide failure.
And so it goes; another day, another outage.
Evidently, the outages in Europe formed the basis for what would become a global email failure for RIM. On Monday, the British publication The Guardian reported that:
“The problems began at about 11am on Monday. The Guardian understands that RIM was attempting a software upgrade on its database but suffered corruption problems, and that attempts to switch back to an older version led to a collapse.”
As the outage spread to North America, RIM CTO David Yach held a press conference on Wednesday to explain the now global failure and stem speculation about the reasons for the BlackBerry outage. He confirmed that the North American problems with BlackBerry service had in fact resulted from the outages in Europe and were not part of a security breach. As reported on Mashable:
“Yach described the initial outage as a failure of one of RIM’s core switches. However, the real trouble began when RIM’s redundant systems failed, as well. ‘The failover did not function as expected,’ Yach said, ‘despite the fact that we regularly test failover systems.’ This led to a significant backup of mail.”
So many businesses today believe the biggest threats to their IT systems are security breaches. With Halloween approaching, though, I can tell you that the image of the external hacker skulking in his basement trying to tap into someone’s system is, more often than not, about as real as “The Boogeyman.” Sure, there are hackers out there, but the percentage of systems taken down by an outside cyber attack amounts to a very small fraction of all the IT systems in the world.
Structural quality is a very real problem that affects the underlying code of a huge percentage of software applications. The only way to detect structural quality issues is through a thorough examination during pre-production. Waiting to test until the application is ready for roll-out is too late, because by then the minute fraction of flawed lines of code is buried under hundreds of thousands of other lines.
And quite honestly, RIM--and for that matter all other IT-intensive companies that fail to perform automated analysis and measurement of their application software--should be ashamed of themselves; they of all people should know better. If these companies do not direct more of their focus upon the structural quality of their software, they are certain to be headed for an all but apocalyptic, slow-rolling failure, much like the one that pushed RIM’s system over the brink. And, as their systems meet with failures, they are certain to drown in the market.
In the meantime, failure to perform thorough assessments of application software means structural quality issues will continue to go undetected and continue to take down entire systems…or in RIM’s case, tens of millions of BlackBerry email message folders – including mine!