While working in a CISQ technical work group to propose the "best" quality model that would efficiently provide visibility into application quality (mostly to ensure reliability, performance, and security), we discussed two approaches to expressing exposure. The first is a remediation cost approach, which measures the distance to the required internal quality level. The other is a risk level approach, which estimates the impact internal quality issues can have on the business.
Although both are based on the same raw data, the information differs when we identify situations that do not comply with some coding, structural, and architectural practices. The former approach will estimate the cost to fix the situations while the latter approach will estimate the risk the situations create.
The remediation cost approach
This approach has appeal because:
- It is simple to understand: we are talking effort and cost. Anyone can understand that fixing this type of issue takes that amount of time and money
- It is simple to aggregate: effort or time simply adds up
- It is simple to compare: more or less effort or time for this application to meet the requirements
- It is simple to translate into an IT budget
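The aggregation and budget points above can be made concrete with a minimal Python sketch. The rule names, effort figures, and hourly rate are hypothetical values chosen for illustration, not data from any real assessment.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    rule: str            # hypothetical rule name
    effort_hours: float  # assumed estimate of the time needed to fix it


def remediation_cost(violations, hourly_rate):
    """Sum the fix effort and convert it into a budget figure."""
    total_hours = sum(v.effort_hours for v in violations)
    return total_hours, total_hours * hourly_rate


# Illustrative findings for one application (made-up numbers).
violations = [
    Violation("unknown variable in a shared library", 4.0),
    Violation("SQL built by string concatenation", 16.0),
    Violation("overly complex method", 24.0),
]

hours, cost = remediation_cost(violations, hourly_rate=80.0)
print(f"{hours} hours of remediation, {cost:.2f} in budget")
```

Because effort is additive, the same function works for one module, one application, or a whole portfolio, which is exactly what makes this approach easy to compare and to budget.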
However, its major drawback is that it does not estimate the consequences. Using the technical debt metaphor, this approach only estimates the principal of the technical debt (that is, the amount you owe) without estimating the interest payments (the consequences on your development and maintenance activity as well as on the service level of the indebted application). Why should we care? Because you will have to decide: Which part of the debt am I going to repay? Where do I start for a maximum return on investment?
A half-day fix can relate to a situation that can crash the whole application. For example, an unknown variable in a massively used library might take almost no effort to fix, while the consequences on the application behavior in production are severe. However, the remediation cost does not convey any sense of urgency. If I were to monitor the progress of the project, a leftover half day would not scare me and force me to decide to fix it no matter what, even if it meant postponing the release date.
If the application did crash, would the answer, "Oh, we were just a half-day away from the required internal quality …" be acceptable? I think not. Something should have told me that despite the short distance to the required internal quality, the incurred risk was too high.
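The half-day example can be sketched in a few lines of Python. The issues, the effort figures, and the 1-to-5 severity scale are all hypothetical; the point is only that ranking by cost and ranking by consequence can give opposite answers.

```python
# Each issue: (description, effort_hours, severity on an assumed 1-5 scale).
# All values are made up for illustration.
issues = [
    ("unknown variable in a widely used library", 4, 5),
    ("duplicated code in a reporting module", 40, 2),
    ("missing comments in utility classes", 16, 1),
]

# Ranking by remediation cost puts the largest effort first.
by_cost = sorted(issues, key=lambda issue: issue[1], reverse=True)

# Ranking by severity puts the most dangerous issue first.
by_severity = sorted(issues, key=lambda issue: issue[2], reverse=True)

print("Last by cost:     ", by_cost[-1][0])
print("First by severity:", by_severity[0][0])
```

In this made-up data set, the issue that looks least important on a cost report is the one that should block the release.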
The risk level estimation approach
This approach has a different kind of appeal. Its proponents say that it models what truly matters: the risk an application faces regarding its resilience, its performance level in case of an unexpected peak of workload, its ability to ensure data integrity and confidentiality, its level of responsiveness to business requirements, and its ability to fit in agile development contexts and to benefit from all sourcing options.
It puts the focus back on the fact that applications are here to serve the business and serve it well. Technical debt would not matter so much if it had no consequences on the business -- it would remain a development organization issue and not a corporate issue.
There are some headlines in the news about late and over-budget projects in the IT sector. There are many more headlines in the mainstream news about major application blackouts and sensitive data leaks.
However, risk-level estimation’s major drawback is its lack of pure objectivity. What is the business impact of a vulnerability to SQL injection? Nothing, until you find out. This is less of a problem for an internal application than for a web-facing, mission-critical, data-sensitive application.
The two sides of the same coin?
Are these irreconcilable differences? Not so much if you think of the impact on the business as the interest-that-matters of the technical debt, while the remediation cost is the principal of the technical debt.
What does "interest-that-matters" mean? It means "it depends," of course. It depends on the value the application delivers to your organization. It depends on your application development and maintenance strategies. The context is key. The same principal amount of technical debt carries widely different interest in different contexts.
Why not use the same unit, that is, $ or €? First, the amounts could be too large to mean anything to the business (outside a Monopoly board game). They are also too unpredictable -- the amounts are application dependent and, even for a given application, the consequences are difficult to predict.
As for any other risk, this is more about giving a status: Is the risk level tolerable?
Many different statuses can be used:
- Severe, high, elevated, guarded, or low
- Unacceptable, poor, acceptable, good, or excellent
- Very high / extreme, high, moderate, or low
These statuses convey the interpretation of the risk assessment. The output already takes into account the different aspects of risk: likelihood and consequences in context.
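As a rough illustration of how likelihood, consequences, and context could be folded into one status, here is a minimal Python sketch. The 1-to-5 scales, the thresholds, and the web-facing adjustment are assumptions made for this example, not a calibrated or standard model; the labels reuse the first status scale listed above.

```python
def risk_status(likelihood, consequence, web_facing=False):
    """Map likelihood and consequence (each 1-5) to a status label.

    The thresholds and the web-facing adjustment are illustrative
    assumptions, not a calibrated model.
    """
    if not (1 <= likelihood <= 5 and 1 <= consequence <= 5):
        raise ValueError("likelihood and consequence must be between 1 and 5")

    # Context: assume exposure to the web makes exploitation more likely.
    if web_facing:
        likelihood = min(5, likelihood + 1)

    score = likelihood * consequence  # ranges from 1 to 25
    if score >= 20:
        return "severe"
    if score >= 12:
        return "high"
    if score >= 6:
        return "elevated"
    if score >= 3:
        return "guarded"
    return "low"


# The same vulnerability assessed in two different contexts.
print(risk_status(2, 4))                   # internal application
print(risk_status(2, 4, web_facing=True))  # web-facing application
```

With these assumed thresholds, the same finding is reported as "elevated" for an internal application and "high" for a web-facing one, which mirrors the idea that the same principal carries different interest in different contexts.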
If you are convinced, as I am, of the complementary nature of remediation cost and risk level, you would nonetheless point out the major hurdle: objective risk level estimation.
Stay tuned for my next post, where we’ll look at this major hurdle to providing visibility into application quality.
How have you gotten visibility into your application’s quality? Share your story in a comment.