Is There Any Agreement on Measuring Technical Debt?

Nov 6, 2020

Want to get a roomful of software developers arguing for hours? Ask them how to measure technical debt. The first hour will be spent arguing about the definition, and when they inevitably cannot agree on one, they will move on to measuring the thing they cannot define. This debate has raged for decades.

In 2016, I was invited, along with 40 or so other scientists studying technical debt, to spend several days in Dagstuhl, Germany, finally resolving the issue. After many hours of presentations and debate, all we resolved was that Germans brew excellent beer. Two factions emerged from the conclave: an academic faction believing that technical debt could only be defined as conscious sub-optimal design decisions that must be fixed later, and an industrial contingent believing that CIOs don't care whether a flaw was intentional or not; if they have to pay to fix it, it is technical debt.

I am of the industrial persuasion. I worked with colleagues in the Consortium for Information and Software Quality (CISQ) to develop a standard for measuring technical debt based on estimating the time required to fix each of the CISQ weaknesses included in the four CISQ Automated Source Code Quality Measures. Philippe-Emmanuel Douziech developed a weighting method that adjusts the fix effort for the difficulty and complexity of the environment in which the weakness must be remediated. We adjust the effort to fix each CISQ weakness detected in an application by the weight derived from its environment in the system, then sum the adjusted efforts across all weaknesses to produce an estimate of the application's technical debt. That effort can then be denominated in dollars, euros, rupees, or your favorite currency. This, however, is only one of many proposals for measuring technical debt.
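To make the arithmetic concrete, here is a minimal sketch in Python of that weighted-sum calculation. The weakness names, weights, and hourly rate are invented for illustration; none of the values come from the CISQ standard itself.

```python
# Minimal sketch of the weighted-sum technical debt estimate described above.
# The weakness list, weights, and hourly rate are illustrative assumptions,
# not values taken from the CISQ standard.

from dataclasses import dataclass

@dataclass
class Weakness:
    name: str               # weakness identifier (hypothetical examples below)
    base_fix_hours: float   # estimated effort to fix this occurrence
    context_weight: float   # adjustment for the difficulty/complexity of its environment

def technical_debt_hours(weaknesses: list[Weakness]) -> float:
    """Sum the context-adjusted remediation effort across all detected weaknesses."""
    return sum(w.base_fix_hours * w.context_weight for w in weaknesses)

def technical_debt_cost(weaknesses: list[Weakness], hourly_rate: float) -> float:
    """Denominate the effort estimate in currency (dollars, euros, rupees, ...)."""
    return technical_debt_hours(weaknesses) * hourly_rate

# Illustrative usage with made-up detections
detected = [
    Weakness("SQL injection", base_fix_hours=4.0, context_weight=1.5),
    Weakness("Unreleased resource", base_fix_hours=2.0, context_weight=1.0),
    Weakness("Empty exception handler", base_fix_hours=1.0, context_weight=0.8),
]
print(f"Estimated debt: {technical_debt_hours(detected):.1f} hours")
print(f"Estimated cost: ${technical_debt_cost(detected, hourly_rate=75):.2f}")
```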

Six computer scientists at three universities across Greece decided to study how much agreement there was between different commercial tools that provide a measure of technical debt [1]. They analyzed 50 open source programs, 25 in Java and 25 in JavaScript, with three commercial technologies: CAST's Application Intelligence Platform, SonarQube, and Squore. They compared how each technology ranked the technical debt of the classes in the 50 programs. Their results indicated statistically strong agreement among the three technologies on the ordering of classes by their amount of technical debt. This was encouraging, since each technology measures technical debt differently and assesses a different collection of weaknesses in computing its measure.
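As a rough illustration of that kind of comparison, the sketch below ranks the same classes by debt under two hypothetical tools and checks rank agreement with Kendall's tau. The class names and debt values are invented, and the paper's actual statistical tests may differ.

```python
# Hypothetical illustration of checking rank agreement between two tools.
# The debt values below are invented; they are not data from the study.

from scipy.stats import kendalltau

# Technical debt estimates (in hours) per class, as reported by two tools
tool_a = {"ClassA": 120, "ClassB": 45, "ClassC": 300, "ClassD": 10}
tool_b = {"ClassA": 200, "ClassB": 80, "ClassC": 450, "ClassD": 30}

classes = sorted(tool_a)  # compare the same classes in the same order
tau, p_value = kendalltau([tool_a[c] for c in classes],
                          [tool_b[c] for c in classes])

print(f"Kendall's tau = {tau:.2f} (p = {p_value:.3f})")
# A tau near 1 means the tools largely agree on which classes carry the most
# debt, even if their absolute estimates differ.
```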

Their conclusion was that while there is no agreement on exactly which elements should be measured to calculate technical debt, different ways of measuring it nevertheless reach broadly similar conclusions about which classes carry the most debt. The technologies do not always agree on the debt in a particular class, but they agree enough that developers can use the measures to target specific classes for remediation. So, while there is still no common method for measuring technical debt, the tools are converging, and IT can already use technical debt measures to improve applications. In particular, we believe the CISQ measure offers the best approach, since it is an open standard based on severe weaknesses included in the Common Weakness Enumeration.

[1] T. Amanatidis, N. Mittas, A. Moschou, A. Chatzigeorgiou, A. Ampatzoglou, and L. Angelis (2020). Evaluating the agreement among technical debt measurement tools: Building an empirical benchmark of technical debt liabilities. Empirical Software Engineering, 25(5), 4161–4204.