Enterprise IT applications are all about processing data: data of many types, handled by large volumes of code. The number of lines of code devoted to data handling is therefore high enough to harbor a large number of software bugs that are waiting for specific events to damage the IT system and impact the business.
Even if we can say that a bug is a bug and that it will be fixed when it occurs, bugs related to data handling should not be underestimated, for several reasons:
- Such bugs are generally not easy to detect among the millions of lines of code that constitute an application. They can consist of a single statement that relies on a definition located elsewhere, so that the specifics of using it are not immediately visible. They can also result from the execution of a given control flow associated with a given data flow.
- Some of them can sit in the code for a long time and never be triggered. The problem is to identify which ones belong to this category in order to focus on the others.
- They can be activated by the conjunction of specific conditions that are not easy to identify.
- When the issue occurs, the impact on business data can be severe: applications can stop, data can be corrupted, and end-user and customer satisfaction can decrease.
- Consequences are not always clearly visible and, in this case, few users detect them.
Problems are distributed
Issues can be hidden anywhere in application code. Risk management methodologies can help select the most sensitive application areas and reduce the scope of the search. However, in most cases, detecting such potential issues requires the ability to check the types and structure of the data flowing from one component to another, or from one layer to another, as well as the algorithms implemented in your programming language of choice. This spells trouble for everyone.
Why does a bug activate suddenly?
There are different factors that contribute to activating a bug:
- Probability increases with the number of lines of code.
- The more a component is executed, the more its bugs can be activated.
- The more you modify the code, the more likely unexpected behavior becomes.
- Tight coupling between data and processing means that any change to the data impacts the code.
- Market pressure stresses the development team. Working quickly is often a good way to create new bugs and activate existing ones!
- Algorithms implementing business rules can be complex and distributed over multiple components, fostering the occurrence of bugs.
- Functional changes to the data are not always reflected across the whole application implementation, and can make code that was working well behave erratically.
The biggest challenge comes when several factors occur at the same time, which is difficult for any development team to handle!
The list of situations that can lead to trouble with data handling is not short. For instance, database access can become fragile when:
- Database tables are modified by several components. Data modifications are usually governed by specific update, insert, and delete routines, provided by a specific API or a data layer that is fully tested to maintain data integrity.
- Host variable sizes are not correctly defined compared to database fields. Some queries can return a volume of data that is higher than expected. Or a change is made in the database structure and has not been propagated to the rest of the application.
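The host variable problem above can be sketched as a simple cross-check. This is a minimal illustration, not a real schema tool: the `FieldSpec` type, the column names, and the lengths are all hypothetical, and in practice both lists would be extracted from the database catalog and from the program source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """A named field with its maximum length in characters (hypothetical type)."""
    name: str
    max_length: int


def find_undersized_host_variables(columns, host_variables):
    """Return (name, column_length, host_length) for every host variable
    that is shorter than the database column it is meant to receive."""
    column_lengths = {column.name: column.max_length for column in columns}
    problems = []
    for variable in host_variables:
        column_length = column_lengths.get(variable.name)
        if column_length is not None and variable.max_length < column_length:
            problems.append((variable.name, column_length, variable.max_length))
    return problems


# Hypothetical schema after CUSTOMER_NAME was widened from 30 to 50 characters.
COLUMNS = [FieldSpec("CUSTOMER_NAME", 50), FieldSpec("ACCOUNT_ID", 12)]
# Hypothetical host variables that were never updated to match.
HOST_VARIABLES = [FieldSpec("CUSTOMER_NAME", 30), FieldSpec("ACCOUNT_ID", 12)]

if __name__ == "__main__":
    for name, col_len, var_len in find_undersized_host_variables(COLUMNS, HOST_VARIABLES):
        print(f"{name}: column holds {col_len} chars, host variable only {var_len}")
```

The check only compares lengths; a fuller version would presumably also compare types, precision, and nullability.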
When manipulating variables and parameters, potential issues include:
- Type mismatches, which are generally insidious. For example, they can arise from implicit conversions between two compatible pieces of data, such as the ones found in different SQL dialects, injecting incorrect values into the database. Similar situations can also be found in COBOL programs when alphanumeric fields are moved into numeric fields, leading to abnormal terminations if the target variable is used in a calculation or simply in a computational format. Improper casts between C++ class pointers (e.g., from a base class to a derived class) can lead to lost data and to data corruption propagated through I/O.
- Data truncation, when a value is moved from one variable into another without checking variable sizes. Part of the value can be lost if the target variable is used to transport the information.
- Inconsistencies in function or program calls between the arguments sent by the caller and the parameters expected. This can occur when a change made to the function or program interface has not been propagated to all callers, making them terminate or corrupt data.
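Two of the cases above, truncation on a move and an alphanumeric value landing in a numeric field, can be illustrated with a short sketch. It uses Python rather than COBOL, and the function names `move_to_fixed_field` and `to_numeric_field` are hypothetical helpers invented for this example. The point is only that an explicit check turns a silent loss into a visible error.

```python
def move_to_fixed_field(value: str, width: int, *, strict: bool = True) -> str:
    """Copy value into a fixed-width field, padding with spaces.

    With strict=True an oversized value raises instead of being cut off.
    With strict=False the value is silently truncated, which is the risky
    behavior described above.
    """
    if len(value) > width:
        if strict:
            raise ValueError(f"value of length {len(value)} does not fit in {width} characters")
        value = value[:width]
    return value.ljust(width)


def to_numeric_field(text: str) -> int:
    """Convert an alphanumeric field to a number, rejecting non-digit content."""
    stripped = text.strip()
    if not (stripped.isascii() and stripped.isdigit()):
        raise ValueError(f"field {text!r} is not purely numeric")
    return int(stripped)


if __name__ == "__main__":
    print(repr(move_to_fixed_field("ABCDEFG", 5, strict=False)))  # part of the value is lost
    print(to_numeric_field("00123"))
```

Whether to raise, log, or reject at this point is a design choice; the sketch raises because a failed move is easier to trace than a corrupted value found weeks later.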
What about consequences?
Unfortunately, there is more than one type of consequence when such bugs activate. One of the big risks for the application is corruption of the data it manipulates, the worst case being when the corruption spreads throughout the IT system. Generally, this impacts users and the business.
I remember one such situation with a banking application. Everything was working fine when the phone rang: "Hi, the numbers on my weekly reports don’t look right. I checked but it seems there is a problem. Can you check on your side?"
Well, we looked at the programs that generated the report but did not find anything useful. We checked the report's inputs and found incorrect values. Then we looked at the program that produced these inputs, and finally we found the cause of the problem in a third program: a group of variables that had not been assigned correct values.
Fortunately, the problem was detected and fixed. A more critical situation arises when very small corruptions spread silently and insidiously across the IT system. They are too small or too dispersed to be noticed. For instance, a few decimal values that are improperly truncated or rounded can seem like a small issue, but in the end, the total can be significant!
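The arithmetic behind that last point is easy to sketch. The figures below are invented for illustration: 10,000 amounts of 10.005 each, truncated to the cent one by one before being summed. The helper name `truncation_drift` is likewise hypothetical.

```python
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")


def truncation_drift(amounts):
    """Return how much is lost when each amount is truncated to the cent
    before summing, compared with summing the exact amounts."""
    exact_total = sum(amounts, Decimal("0"))
    truncated_total = sum(
        (amount.quantize(CENT, rounding=ROUND_DOWN) for amount in amounts),
        Decimal("0"),
    )
    return exact_total - truncated_total


# Invented data: each value loses only half a cent when truncated.
AMOUNTS = [Decimal("10.005")] * 10_000

if __name__ == "__main__":
    print(f"Total lost to truncation: {truncation_drift(AMOUNTS)}")
```

Each individual loss is half a cent, which no user would report, yet across the batch the difference adds up to 50 currency units.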
Another consequence is related to application behavior. Bad development practices can rapidly lead an application to erratic behavior and sometimes termination. Finally, some issues, like buffer overruns, can even lead to security vulnerabilities if the data is exposed to end users, especially in web applications.
Manual search ...
Issues related to data handling are rarely discovered or anticipated when they are sought through manual and isolated operations only. The volume of code to look at, the number of data structures to check, the complex business rules to take into account, and the subtlety of the bugs (which sometimes seems diabolical!) are serious obstacles for developers, who cannot spend too much time trying to fix problems that might never occur.
... or automated system-level analysis?
The most efficient way to detect these types of issues is to analyze the whole application software with tools like CAST AIP to correlate findings concerning data structures with code logic. Such an analysis can establish what calls what in the code, and can inspect the components interacting in the data flow. Thus, issue detection can be carried out faster, helping developers secure the code. It can be automated to regularly check the applications without disturbing the development team’s activities, allowing them to manage prevention at a lower cost.
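To give a feel for what cross-referencing callers and definitions means, here is a toy sketch. It is not how CAST AIP or any commercial analyzer works. It uses Python's standard `ast` module to compare each call site with the definition it targets, and the function name `find_arity_mismatches` and the sample code are hypothetical. It handles only top-level functions called by plain name within a single source text.

```python
import ast


def find_arity_mismatches(source: str):
    """Return (function, line, supplied, (required, total)) for each call whose
    argument count does not fit the top-level function it targets."""
    tree = ast.parse(source)
    signatures = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            args = node.args
            if args.vararg or args.kwarg or args.kwonlyargs:
                continue  # keep the sketch simple: skip flexible signatures
            total = len(args.posonlyargs) + len(args.args)
            signatures[node.name] = (total - len(args.defaults), total)

    mismatches = []
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        signature = signatures.get(node.func.id)
        if signature is None:
            continue
        unpacked = any(isinstance(arg, ast.Starred) for arg in node.args)
        unpacked = unpacked or any(kw.arg is None for kw in node.keywords)
        if unpacked:
            continue  # the argument count is unknown statically
        supplied = len(node.args) + len(node.keywords)
        required, total = signature
        if not required <= supplied <= total:
            mismatches.append((node.func.id, node.lineno, supplied, signature))
    return sorted(mismatches, key=lambda item: item[1])


# Hypothetical code in which one caller was not updated after an interface change.
SAMPLE = '''
def post_payment(account_id, amount, currency):
    return (account_id, amount, currency)

def nightly_batch():
    post_payment("A-1", 100)
    post_payment("A-2", 250, "EUR")
'''

if __name__ == "__main__":
    for name, line, supplied, (required, total) in find_arity_mismatches(SAMPLE):
        print(f"line {line}: {name}() called with {supplied} args, expects {required} to {total}")
```

A system-level analyzer would have to do this across files, languages, and layers, and follow the data as well as the calls, which is why such checks are hard to reproduce by hand.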