Is There Any Agreement on Measuring Technical Debt?

by Bill Curtis

Want to get a roomful of software developers arguing for hours? Ask them how to measure technical debt. The first hour will be spent arguing about the definition, and when they inevitably fail to agree on one, they will move on to measuring the thing they cannot define. This debate has raged for decades.

In 2016, I was invited along with 40 or so other scientists studying technical debt to spend several days in Dagstuhl, Germany, finally resolving the issue. After many hours of presentations and debate, all we resolved was that Germans brew excellent beer. Two factions emerged from the conclave: an academic faction that believed technical debt could only be defined as conscious, sub-optimal design decisions that must be fixed later, and an industrial contingent that believed CIOs don't care whether a flaw was intentional or not -- if they have to pay to fix it, it is technical debt.

I am of the industrial persuasion. I worked with colleagues in the Consortium for Information and Software Quality (CISQ) to develop a standard for measuring technical debt based on estimating the time required to fix each of the CISQ weaknesses included in the four CISQ Automated Source Code Quality Measures. Philippe-Emmanuel Douziech developed a weighting method that adjusts the effort for the difficulty and complexity of the environment in which each weakness has to be fixed. The effort to fix each CISQ weakness detected in an application is therefore multiplied by the weight generated from its environment in the system, and the adjusted efforts are summed across all the weaknesses to produce an estimate of the application's technical debt. That effort can then be denominated in dollars, euros, rupees, or your favorite currency. This is, however, only one of many proposals for measuring technical debt.
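To make the arithmetic concrete, here is a minimal sketch of that kind of effort-times-weight calculation. The names (Weakness, hours_to_fix, context_weight, hourly_rate) and the numbers are illustrative assumptions for this post, not part of the CISQ standard itself.

```python
# Minimal sketch of an effort-times-weight technical debt estimate.
# All names and values are illustrative, not the CISQ specification.
from dataclasses import dataclass

@dataclass
class Weakness:
    hours_to_fix: float     # baseline remediation effort for this weakness
    context_weight: float   # adjustment for the difficulty of its environment

def technical_debt(weaknesses, hourly_rate=100.0):
    """Sum the environment-adjusted remediation effort and convert to currency."""
    adjusted_hours = sum(w.hours_to_fix * w.context_weight for w in weaknesses)
    return adjusted_hours * hourly_rate

# Example: three detected weaknesses of varying difficulty
debt = technical_debt([
    Weakness(hours_to_fix=2.0, context_weight=1.0),
    Weakness(hours_to_fix=4.0, context_weight=1.5),   # harder environment to change
    Weakness(hours_to_fix=1.0, context_weight=0.8),
])
print(f"Estimated technical debt: ${debt:,.2f}")
```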

Six computer scientists at three universities across Greece decided to study how much agreement there was between different commercial tools that provided a measure of technical debt¹. They analyzed 50 open source programs, 25 in Java and 25 in JavaScript, with three commercial technologies: CAST’s Application Intelligence Platform, SonarQube, and Squore. They compared how each technology ranked the technical debt of the classes in the 50 programs. Their results indicated statistically strong agreement among the three technologies on the ordering of the classes by their amount of technical debt. This was encouraging, since each technology measures technical debt differently and assesses a different collection of weaknesses in developing its measure.
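As a rough illustration of what rank agreement between tools means, the sketch below compares hypothetical per-class debt estimates from two tools using a Spearman rank correlation. The study built its own benchmark and statistical analysis, so the data and the choice of test here are assumptions for illustration only.

```python
# Illustration of rank agreement between two tools' per-class debt estimates.
# The figures are made up; the study used its own benchmark and statistics.
from scipy.stats import spearmanr

# Hypothetical remediation-effort estimates (minutes) for the same six classes
tool_a = [120, 45, 300, 10, 75, 200]
tool_b = [140, 50, 280, 15, 60, 310]

rho, p_value = spearmanr(tool_a, tool_b)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
# A rho close to 1.0 means the tools largely agree on which classes carry the most debt.
```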

Their conclusion was that while there is no agreement on exactly which elements should be measured to calculate technical debt, different ways of measuring it nevertheless reach generally similar conclusions about which classes suffer the most debt. The technologies do not always agree when assessing the technical debt in a class, but there is enough agreement that developers can use the measures to target specific classes for remediation. So, while there is no common method for measuring technical debt, the technology is getting closer, and IT can use technical debt measures to improve applications. In particular, we believe the CISQ measure offers the best approach since it is an open standard based on severe weaknesses included in the Common Weakness Enumeration.

¹ T. Amanatidis, N. Mittas, A. Moschou, A. Chatzigeorgiou, A. Ampatzoglou, & L. Angelis (2020). Evaluating the agreement among technical debt measurement tools: Building an empirical benchmark of technical debt liabilities. Empirical Software Engineering, 25(5), 4161–4204.

Filed in: Technical Debt
Bill Curtis, Senior Vice President and Chief Scientist
Dr. Bill Curtis is Senior Vice President and Chief Scientist of CAST and heads CAST Research Labs. With 40 years in the software industry, Dr. Curtis is also Executive Director of the Consortium for IT Software Quality (CISQ) and has co-edited several ISO 25000 software quality standards. He is best known for starting the Capability Maturity Model (CMM) and People CMM at the Software Engineering Institute at Carnegie Mellon University.