Just one week after a series of global outages, Cloudflare went down again. This time the cause was a spike in CPU usage triggered by a bug in the company’s firewall software. The bug was not planted by a hacker; Cloudflare introduced it itself. The outage lasted about 30 minutes and affected thousands of websites that rely on Cloudflare to keep them operational and secure. When you hear about an outage or a software bug, you probably assume a hacker was trying to breach the site. Surprisingly, self-inflicted outages caused by poor software quality are more common than outages caused by attacks.
Modern software is very complex. With multiple system interfaces and demanding requirements, complexity can grow beyond control, making applications and portfolios costly to maintain and risky to enhance. Left unchecked, it runs rampant in delivered projects and leaves behind bloated, cumbersome applications.
On top of this, when a developer writes software in a highly creative way, it is often difficult for them or their colleagues to avoid introducing flaws when they return to make enhancements. Software today is often beyond what human software architects can fully understand. We can expect to see more unpredictable behavior as software grows more complex and as systems are assembled from many separate components. Only with software intelligence will it be possible to uncover these quality issues and bugs. Without this sort of advanced technology, by the time you realize there is a problem it will be too late.
Ironically, DownDetector, the website that tells people whether a site is experiencing issues, was also down. Cloudflare handled the outage well: it informed its customers of the mistake, assured them that there had been no attack, and promptly fixed the issue. The company is working to improve its testing system to prevent a similar incident. Testing, however, is only part of the solution. Let’s just hope they also get some software intelligence into the mix.