We just finished up the 30-minute webinar where Dr. Bill Curtis, our Chief Scientist, described some of the findings that are about to be published by CAST Research Labs. The CRASH (CAST Research on Application Software Health) report for 2014 is chock full of new data on software risk, code quality and technical debt. We expect the initial CRASH report to be produced in the next month, and based on some of the inquiries we’ve received so far, we will probably see a number of smaller follow-up studies come out of the 2014 CRASH data.
This year’s CRASH data, which we saw Bill present, is based on 1316 applications comprising 706 million lines of code – a pretty large subset of the overall Appmarq repository. That works out to an average application size of 536 KLOC. We’re talking big data for BIG apps here. This is by far the biggest repository of enterprise IT code quality and technical debt research data. Some of the findings presented included correlations between the health factors – we learned that Performance Efficiency is largely uncorrelated with the other health factors, while Security is highly correlated with software Robustness. We also saw how the health factor scores were distributed across the sample set, and the differences in structural code quality by outsourcing, offshoring, Agile adoption and CMMI level.
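For the curious, that 536 KLOC average falls straight out of the headline numbers – a quick back-of-the-envelope check:

```python
# Back-of-the-envelope check on the sample's average application size.
total_loc = 706_000_000  # 706 million lines of code in the sample
n_apps = 1316            # applications analyzed

avg_kloc = total_loc / n_apps / 1000  # thousands of lines of code per app
print(f"Average application size: {avg_kloc:.0f} KLOC")  # prints 536 KLOC
```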
There were many questions, so we ran 15 minutes past the 30-minute timeslot to address all of them. The main point of this post is to document some of the more important questions, along with my summary of the answers Bill Curtis provided, especially for those of you who could not stay on past the half hour. So, here goes:
1. If an application is better in one health factor, e.g., Robustness, does it tend to also have better scores in the other areas (Security, Performance)?
We have a lot of data on this, some of which was shown in the webinar. Overall, what we’re finding is that most of the health factors are correlated – especially Robustness and Security. The only exception to that is Performance Efficiency, which has very low correlation to any of the other health factors.
2. Does the age or maturity of an application correlate with its size?
Looking at the demographic age data we have, it seems that older applications are indeed larger. Certainly the COBOL applications tend to be larger than the average. But, we did not do a statistical analysis of this particular data correlation.
3. Do you track if an application is based on a commercial package (COTS) or completely from scratch?
We do, of course, for SAP-based and Oracle-based package customizations, for which we have a large dataset. For most others, it’s hard to determine whether the custom application was at some point based on a COTS package. In some cases, even the staff in that IT organization may not know this.
4. Do you have findings by industry? Or any details on technical debt? Or, by technology?
We will publish findings by industry in subsequent CRASH research. We are also revamping our Technical Debt analysis to make it more sophisticated; it will be published in a separate CRASH research report later this year. The technology analysis is difficult because so many of the software engineering rules we’re looking at are technology-specific. We decided to leave that out of this CRASH report and will evaluate how to accurately report that data later in the year.
5. Does Language have a significant impact on quality or development cost?
There are many things to consider when thinking about the impact of language. In general, COBOL developers have tended to be more disciplined and better trained than Java developers, though that may change over time. In C or C++, there are more opportunities to make technical mistakes in memory management and pointer arithmetic. Overall, though, it’s hard to make generalizations industry-wide, since each organization has stronger skillsets in certain technologies. We recommend analyzing the code quality in your own organization to draw conclusions for your technology roadmap or developer training regimen.
6. For Java EE applications, are you able to differentiate the quality and productivity of applications developed using industry frameworks versus earlier-generation applications built from the ground up?
Yes we can. The Appmarq repository contains data on applications with a number of enterprise Java frameworks, including Struts, Spring, Hibernate and others. We did not focus on this specific question in this year’s CRASH research, but we did publish a specific report about frameworks last year based on the last dataset pulled from Appmarq. You can take a look at it here – the findings might surprise you.
7. Do you have statistics on onshore vs offshore delivery by function points?
Not yet. We plan to collect this data over the next several years and analyze these statistics in subsequent CRASH reports.
8. What are the current practices to leverage CAST automated function points to measure productivity of global project teams? Are there mature models to follow?
This question is not specific to the CRASH report results, but it’s an important one that we hear with increasing frequency from IT practitioners. Large-scale adoption of function points for productivity measurement is a rich, detailed topic that deserves significant treatment. Bill Curtis has held full-day seminars on this topic several times over the last couple of years and is working on a book. Some of this material has been summarized in a whitepaper-style research report that you can find here. For more information, please contact us and we can set up more in-depth workshops for your organization.
9. If we look back to the previous CRASH reports, what do we learn when comparing to the current data?
This is a tough question to answer, as we don’t do trending analysis at this point. The difference between this report and the last is that our dataset has roughly doubled in size. Some of our past findings are being confirmed – cases where we saw a trend in the data but did not want to report conclusions because we lacked statistical significance. Now we have enough data points to make statements, like our conclusion about onshore versus offshore differences. We cannot, however, say whether quality in the industry is getting better or worse as a whole. We’ll need many more years of data to draw coherent conclusions on that scale.