Representative vs. non-representative measures: Bipolar disorder?


In my last post, I shared my opinion on the benefits of non-representative measures for some software risk mitigation use cases. But does that mean I am always better served by non-representative measures? Of course not.

No bipolar disorder here, just a pragmatic approach: different use cases are best served by different pieces of information, each adapted to its purpose.

Risk level assessment -- from decision to action

In my previous post I wrote: "I don't need to know -- at first at least -- if I have just improved but not solved the problem at hand." However, when it comes to digging into the root cause of risky situations, I do need a representative measure of the problem.

Let's look back at the definition of a representative measure, i.e., a measure that complies with the representation condition.

Representation condition

In computer science, the representation condition is defined as "the condition that, if one software entity is less than another entity in terms of a selected attribute, then any software metric for that attribute must associate a smaller number to the first entity than it does to the second entity." (McGraw-Hill Dictionary of Scientific & Technical Terms, 6E, Copyright © 2003 by The McGraw-Hill Companies, Inc.)
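The quoted definition can be restated as a single implication. The notation here is my own shorthand, not the dictionary's: E is the set of software entities, a ≺_A b reads "a is less than b in terms of the selected attribute A", and M is a software metric for A that maps each entity to a number.

```latex
\forall\, a, b \in E:\quad a \prec_A b \;\Longrightarrow\; M(a) < M(b)
```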

Non-representative risk level indicator for quick decision making

Turning information into a measurement that can remain identical despite variations in the attribute values therefore clearly breaches the representation condition.

Hence, getting the same "not solved" or "unworthy of marking" measurement result for two applications whose attribute values differ is not mathematically representative. Yet, as I wrote, it helps me quickly grasp the situation and avoid the meanders of endless discussions about whether this or that is one billionth more or less.
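To make the breach concrete, here is a minimal Python sketch that checks the representation condition pairwise over a few applications, using the number of open security vulnerabilities as the attribute. The application names, the counts, and the helper functions are hypothetical illustrations, not part of any real tool.

```python
from itertools import permutations


def satisfies_representation_condition(entities, attribute, metric):
    """Return True if, for every ordered pair (a, b) with
    attribute(a) < attribute(b), metric(a) < metric(b) also holds."""
    for a, b in permutations(entities, 2):
        if attribute(a) < attribute(b) and not metric(a) < metric(b):
            return False
    return True


# Hypothetical applications and their number of open vulnerabilities
# (illustrative numbers, not real data).
apps = {"app_a": 0, "app_b": 1, "app_c": 12}


def vulnerability_count(name):
    # Representative measure: the raw count preserves the ordering.
    return apps[name]


def not_solved(name):
    # Non-representative measure: 1 ("not solved") as soon as one
    # vulnerability remains, 0 otherwise.
    return 1 if apps[name] > 0 else 0


print(satisfies_representation_condition(apps, vulnerability_count, vulnerability_count))  # True
print(satisfies_representation_condition(apps, vulnerability_count, not_solved))  # False
```

The pass/fail measure fails the check because app_b (1 vulnerability) and app_c (12 vulnerabilities) both map to 1.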

Now that I know the risk-level situation, it’s time for action.

Representative risk level indicator for action

Is a "not solved" or "unworthy of marking" measurement result enough? When it comes to improving the situation or understanding whether current improvement initiatives are working, of course not.

Getting back to the example from my previous post, "a component will fail its tests as long as there is one security vulnerability, even if a dozen vulnerabilities were removed", I now need to know:

  • How many vulnerabilities were removed?
  • How many vulnerabilities are left? (With non-representative measures, I only know that at least one is left.)
  • At what pace are these vulnerabilities being removed?
  • Are new vulnerabilities being added at the same time? (A higher removal rate could hide this fact.)

To answer these questions, I need a basic yet efficient "count" of added, removed, and remaining vulnerabilities.
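A minimal Python sketch of such a count, assuming each analysis snapshot reports vulnerabilities as a set of identifiers that stay stable from one snapshot to the next. The identifiers, the weekly snapshots, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnerabilityDelta:
    added: int      # present in the newer snapshot only
    removed: int    # present in the older snapshot only
    remaining: int  # total still open in the newer snapshot


def compare_snapshots(before, after):
    """Count added, removed and remaining vulnerabilities between two
    snapshots, each an iterable of (assumed stable) vulnerability identifiers."""
    before, after = set(before), set(after)
    return VulnerabilityDelta(
        added=len(after - before),
        removed=len(before - after),
        remaining=len(after),
    )


def removal_pace(deltas, days_per_interval):
    """Average number of vulnerabilities removed per day over all intervals."""
    if not deltas:
        return 0.0
    return sum(d.removed for d in deltas) / (len(deltas) * days_per_interval)


# Hypothetical weekly snapshots (identifiers are illustrative only).
week0 = {"V1", "V2", "V3", "V4"}
week1 = {"V3", "V4", "V5"}  # V1 and V2 fixed, but V5 introduced
week2 = {"V5"}              # V3 and V4 fixed

d1 = compare_snapshots(week0, week1)
d2 = compare_snapshots(week1, week2)
print(d1)  # VulnerabilityDelta(added=1, removed=2, remaining=3)
print(d2)  # VulnerabilityDelta(added=0, removed=2, remaining=1)
print(removal_pace([d1, d2], days_per_interval=7))
```

Both newer snapshots would still get the same "not solved" verdict, since at least one vulnerability remains, whereas the counts show steady progress and reveal that the net drop of one in the first week hides one newly added vulnerability.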

And the winner is ...

So at the end of the day, what do I need?

I need both representative and non-representative measures, with the latter relying on the former: efficient non-representative measures must be built on representative ones. Indeed, a "not solved" or "unworthy of marking" measurement result is only useful to me if it is based on solid facts I can access on demand. The key here is to use the right information at the right time for the right purpose.
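One way to keep a quick verdict tied to the facts behind it is to derive it from the representative count and keep the underlying findings accessible. The class, its fields, and the identifiers below are hypothetical, a sketch of the layering rather than a prescribed design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RiskAssessment:
    """Hypothetical container: a quick verdict backed by the facts behind it."""
    open_vulnerabilities: List[str] = field(default_factory=list)

    @property
    def count(self) -> int:
        # Representative measure: more open vulnerabilities, larger number.
        return len(self.open_vulnerabilities)

    @property
    def verdict(self) -> str:
        # Non-representative measure derived from the count:
        # one remaining vulnerability is enough to flag the component.
        return "not solved" if self.count > 0 else "solved"

    def details(self) -> List[str]:
        # The facts behind the verdict, available on demand.
        return sorted(self.open_vulnerabilities)


component = RiskAssessment(["V7", "V2"])
print(component.verdict)    # not solved
print(component.count)      # 2
print(component.details())  # ['V2', 'V7']
```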

Was the title of my last post too provocative? I guess so, but only to make sure that mathematical purity does not prevent software analysis and measurement from helping the business.

Filed in: Technical Debt
Philippe-Emmanuel Douziech
Philippe-Emmanuel Douziech is a Principal Research Scientist at CAST Research Labs and is the Head of European Science Directorate at CISQ. He has worked in the software industry for more than 20 years and is skilled at assessing software risk and quality.