Software engineering practices have changed a great deal over the past couple of decades. CIOs and software architects are now as concerned about the nonfunctional aspects of software applications (robustness, security, maintainability, transferability, etc.) as they are about the functional ones. While these concerns are largely strategic, in this article I want to share my own experience: an engineer’s view of how badly written code can cause software engineering mayhem.
It was back in 2008. After graduating, I joined one of the most prestigious organizations as a software engineer. My first assignment was to work on a business-critical IT system used by more than 10,000 users every day. Any critical defect in this system was a major pain point for every stakeholder, from engineers to the CIO, because it directly impacted the business. A few days into my first assignment, I overheard colleagues saying that the system was not in great shape and that users had recently reported multiple performance glitches. I could sense the tension all around and wondered whether being assigned to the project was a punishment!
One month in, while I was still trying to get a hold on things, I came across the monster: a priority one (P1) defect in production. I finally learned what a priority one defect means. It was complete chaos. People were running around everywhere. Leads, managers, and architects were on never-ending calls. For a couple of days people tried hard to fix the issue, with no luck; it could not be reproduced in the test environment. The situation was scary. One evening, a couple of days later, my business unit head called me and three other colleagues into his cabin. He simply said:
“There is a production defect. I want you to fix it. I am dividing you into two teams. One will work the normal shift and transfer the knowledge to the other team, who will work at night. The only way out of this is to fix the issue.” Unfortunately, I was picked for the night shift, and I can’t really describe what a nightmare it was.
After five days of round-the-clock shifts, we finally figured out the issue. It was caused by bad code. Somebody had written a database query inside a loop, which caused a database lock, but only in a very rare case. We had to rewrite the entire logic, which is why the fix took so long (five days is a huge amount of time for a production P1 defect). Afterwards, we worked very hard and fixed all the performance issues on time; most of them were also due to bad coding practices. The system became stable within six months or so. That was the start of my professional career.
Over the past decade, a lot of people across the globe have experienced similar horrors in their careers. This is one reason why nonfunctional improvement has become one of the most important topics of discussion among software engineering practitioners. Fortunately, we have improved. A few software intelligence tools on the market can now identify such system-level structural defects. Besides helping CIOs protect and improve their IT infrastructure, these tools are surely making life more comfortable for all software engineering practitioners.