In the spirit of Yogi Berra, I’ve decided to list the obvious things that I know in life: water is wet, the sky is blue, and big software projects fail.
I’m sure you are aware of the very public failure of Healthcare.gov, the centerpiece of Obamacare, and by now have heard plenty of public interrogation of the project, the system, its agency, and the policy behind it.
Rather than add to that, I’d caution against staring too long and too closely at this incident. We should let it serve as a simple reminder that more and bigger failures are lurking.
“It’s like déjà vu all over again.” – Yogi Berra
I assure you that the conditions that led to this event are being repeated this very moment, just as they have been since the dawn of software. By the end of the Healthcare Exchange’s forensic analysis, every project management and software engineering cliché will be cited as a root cause: ill-defined requirements, complex interfaces, lack of system-level oversight, over-optimism, complex multi-sourcing, lack of vendor oversight, groupthink, and only notional insight into the fundamental state of systems.
Rather than piling on the criticism, let us use this recent event as a backdrop to remember something even more obvious but less accepted: the pace of business, combined with the size and complexity of systems, is too much for our traditional IT skills and processes.
Functional testing is not enough
After the dust settles, insufficient testing will be cited as a major factor in this outage, because our traditional view of quality is testing. Yet functional testing, by definition, only ensures that the application operates as designed: that it meets its functional specification.
We must acknowledge that our applications are fundamentally different. Today’s applications are really systems, or families of systems: interconnected applications, databases, and components linked by hundreds of interfaces across multiple technologies and platforms.
“If you don’t know where you are going you will end up somewhere else.” – Yogi Berra
Our imagination and ‘requirements’ are stretching components beyond their original intent or design. The Healthcare Exchange is an extreme example of this; a more common one is your bank’s online banking system.
Our current approach to system testing is insufficient because it relies on manual effort: someone has to write the tests for these systems. Yet no individual can fully comprehend the complexity and boundaries of these systems and therefore, even with an unlimited amount of time, create enough test cases to cover the entire system.
Even if they could, they are only testing the functional aspects of the system. Research shows that 90 percent of system outages are the result of issues that are outside the capability of functional testing. As a result, programs that invest in ‘extra’ testing are throwing money at a problem they can never fix.
Effective outsourcing and multi-sourcing are hard
Co-location, geographically dispersed development, multi-sourcing, and captive centers are simply resourcing strategies that promise savings yet increase communication challenges and add project and process complexity. Although the pros and cons of outsourcing have been debated for years, it is not going away, especially in the federal market. The government’s choice of a prime contractor for the Healthcare Exchange, along with the 54 other vendors contributing to the project, is evidence of that.
It’s not the resourcing strategy that should be in question. We should instead question traditional vendor management, contracting processes, and SLAs. The current schedule- and budget-centric perspective must evolve to include a measurement of the output: we must measure the product itself. Without a thorough understanding of what is actually delivered, how can you adequately judge a project’s progress? Or a team’s capability?
It is ‘obvious’ that output delivered on time but failing to meet defined measures of quality or non-functional requirements does not make a successful partnership. We must adjust our methods to include precise, objective measures of quality and quantity. Only then can we accurately assess progress toward milestones to determine whether we are on schedule, on budget, and on quality.
“I knew I was going to take the wrong train so I left early.” – Yogi Berra
Turning disaster recovery into disaster preparation
Some people may criticize the government’s decision to commission a ‘tech surge’ or ‘tiger team’ to quickly identify and resolve the Healthcare Exchange issues. Yet every Fortune 500 company would respond exactly the same way; they would simply call it recovery management or disaster recovery.
Preparing for a crisis is more expensive than preparing to prevent one. Period. It is time to question the mindset that it is better to dedicate contingency budget (proactively funding resources to reactively manage potential issues) than to proactively fund safeguards that keep those issues from ever occurring. Just as Agile challenged traditional development, the way we think about vendor management, system-level oversight, and manual processes must be challenged.
“You can observe a lot by watching.” – Yogi Berra
As stated above, business-critical IT systems have grown far beyond what we would normally consider a ‘system.’ They’re huge, architecturally complex families of systems that communicate with each other and with other software platforms. It’s time we stop thinking about and managing these systems as if they were stand-alone.
Developers need system-level tools and processes to help them understand these complexities. Application owners, in turn, need visibility into the systems. The traditional way is not working, and it is the responsibility of application owners and their business partners to require visibility into system health, risks, adherence to standards, benchmarks versus peers, and traceability against key performance indicators and non-functional requirements like performance, stability, and security.
This kind of visibility is achievable and effective in producing highly reliable systems while lowering their total cost of ownership.
“We made too many wrong mistakes.” – Yogi Berra
We all know the wisdom of hindsight, yet every one of the indicators on project dashboards is a retroactive measure: number of defects, system availability, uptime, etc. Rather than monitoring metrics for failures, we need proactive measures that prevent failure.
How? Leverage industry standards.
Before a new food product hits grocery shelves in the United States, it has to meet FDA standards: a set of defined criteria that assures the food is safe, nutritious, and clearly explained to the consumer (visibility). To change the “launch it and we’ll deal with the bugs later” mentality that pervades software development, we need a universal set of quality standards that all new software should adhere to. Organizations like the Consortium for IT Software Quality are publishing such standards, and oversight organizations across the globe are starting to adopt them.
“The future ain’t what it used to be.” – Yogi Berra
Technology is changing in a way that is almost too obvious to write down on paper. However, it’s clear that software-intensive organizations need to rapidly alter their current thinking and processes. It is irresponsible for us not to address the project factors that keep producing faulty software: ill-defined requirements, rushed deadlines, lack of system-level oversight, complex multi-sourcing, lack of vendor oversight, groupthink, and only notional insight into the fundamental state of systems.
We can do this with a proactive focus on system-level thinking and structural quality at the front end of development, so that the finished product is not just functional but stable, secure, and resilient. Otherwise, holding fast to traditional practices will continue to produce glitch-ridden software.
After all, “It ain’t over till it’s over.”
Want to hear more about how big software projects fail and what we can learn from them? Join me and representatives from Galorath Inc. for our webinar on Dec. 17th, Healthcare.gov IT Expert Examination, where we’ll examine what went wrong in Healthcare.gov’s rollout, what can be learned from the experience, and how to avoid failure on future projects.