Application Modernization – The Database (Part 1)

May 25, 2023 | IT Thought Leadership Application Modernization – The Database (Part 1)

As the pace of business-driven change and cloud adoption accelerates, the state of long-running critical applications, sometimes referred to as ‘legacy’, comes to the fore. They may require significant change–new functionality, complex integration, refactoring for cloud, or some other form of modernization.

The problem, of course, is that they span multiple technologies and millions of lines of code and are typically built for environments that are very different than the target destination.

Let’s look at the alternatives. According to Gartner, in general, projects can be classified according to difficulty and value of the effort, both going from lowest to highest as listed below: 

  1. Encapsulate: Leveraging and extending application functionality by encapsulating its data and functions, making them available as services via APIs. 
  2. Rehost: Redistribution of application components or of the application itself on another infrastructure (physical, virtual, or cloud) without modifying its code, characteristics or functions. 
  3. Replatform: Migrating to a new runtime platform, making minimal changes to the code, but not to the code structure, features, or functions. 
  4. Refactor: Restructuring and optimizing existing source code (but not its external behavior) to remove technical debt and improve non-functional attributes. 
  5. Rearchitect: Modifying source code to move it to a new application architecture to take advantage of new and improved capabilities. 
  6. Rebuild: Revisit the architectural design or rewrite application components from scratch while preserving their scope and specifications. 
  7. Replace: Elimination of all previous application components and their replacement, while taking new needs into account. 

Picture1

Example of automatic classification against "5 Rs" performed by CAST Highlight 

In several of those scenarios it is required, or at least advisable, to also review or redesign the database, which often contains a large part of the application logic in the form of stored procedures, functions, and/or views. 

Even if the database is not completely redesigned, it may be worth optimizing it, in terms of data access safety, changeability, reliability, and performance efficiency.

Below we will consider some of the important technical aspects that must be carefully evaluated when making significant database changes or moving to the cloud to maximize the value of the overall application modernization effort, in terms of compatibility, performance, space requirements, and SQL injection risks. 

Avoid SQL queries that no index can support for artifacts with high fan-in 

Determine the artifacts with a high fan-in containing at least one SQL query that doesn't use a table's indexes. SQL queries that don't use table indexes are extremely dangerous to application performance. In such cases, each execution of the query will result in a full table scan, which takes a long time. When artifacts have high fan-in, the risk is greater. The workaround is to use indexed columns in the WHERE and HAVING clauses. If this is not possible, create a secondary index for these columns. 

Avoid SQL queries on XXL tables that are not backed by any index  

Determine the artifacts containing at least one SQL query on an XXL table that doesn't use the indexes of the XXL table. XXL tables are extremely large tables that contain a huge amount of data. Also in this case, the same considerations made for the previous metric apply, in relation to execution times and the complete scan of the table. 

Avoid independent "exists" clauses 

Determine the SQL objects using "exists" that are independent (not referring to the parent object). The metric is used to isolate performance and disk space issues. 

Avoid having tables without a primary key 

Determine the list of tables and materialized views without primary key constraints. The presence of primary keys guarantees the unique identification by index of each single record, avoiding scanning the whole table. 

In relational database design, a candidate key is just a unique identifier. Next, a primary key is a candidate key that has been spotted to uniquely identify each row in a table or materialized view. A unique key or primary key encompasses a single column or a set of columns. Two distinct rows in a table or materialized view cannot have the same value (or combination of values) in those columns. 

Depending on the design, a table or materialized view can have arbitrarily many unique keys, but at most one primary key. 

In part two we will look at different aspects of database optimization from the standpoint of data access safety and address how you can automate the extraction of this type of software intelligence.