Every time a Wall Street firm’s trading software goes rogue, or an international bank’s back-end IT hiccups, leaving customers stranded, an audible groan rises from our offices. We’re frustrated because we’re tired of seeing common -- and preventable -- mistakes slip through the cracks in what are supposed to be world-class software development organizations.
Our goal is to let these organizations know just how easy it could be to improve the resiliency, security and performance of their enterprise systems. To that end, we’ve released an open letter to Chris Isaacson, the Chief Operating Officer at BATS Global Markets, offering a five-point plan, as well as some friendly advice, on how to avoid future errors.
There's no end of examples that show it only takes seconds for a faulty software system to squash your market value. Don’t make the same mistake twice. If I might borrow some wise words from history -- “Those who cannot remember the past are condemned to repeat it.”
The full text of our open letter is below:
TO: Chris Isaacson
Senior Vice President, Chief Operating Officer
BATS Global Markets
FROM: Philippe Guerin
Head of Solution Engineering for North America
DATE: February 5, 2013
SUBJECT: Recent BATS Software Glitch
The recent events at BATS were regrettable but avoidable. Unlike last year's outages, which cost your CEO his job and cancelled your IPO, these latest IT issues could easily be forgotten, and we do not want that to happen. Enough is enough. As experts in software analysis and measurement, we want to offer some friendly advice on how to avoid a recurrence of this latest error, which will no doubt prove costly.
If, as the Wall Street Journal alleges, these structural code defects '...went undetected for four years, violating securities laws and allowing hundreds of thousands of bad trades to be executed', there is a need for urgent action. Here is our five-point plan:
- 1. Know your existing code - Perform structural quality inspections for all business-critical code. This should happen once for the existing live system and during early development for all new code.
- 2. Enforce discipline - Create and adhere to a process of test cases for all new integrations, sorted by transaction-based risk priority.
- 3. Think big picture - Use application quality analysis to hunt down potential code quality issues at the system level, not just the unit level. Functional, load and stress tests are complementary checks for scalability at 'Go Live', but on their own they are not adequate protection to ensure system robustness.
- 4. Don't overcommit - Ensure there are no shortcuts in your 'commitment process'. Have sufficient project management so your developers are not following a chaotic 'death march'.
- 5. Go configure - None of the work above will bring long-term benefit unless you retain strict configuration control of your system's source code and mandate structural analysis along the way.
Not surprisingly, many of the world's most IT-dependent organizations, including governments, major financial institutions and telecom providers, use CAST to help solve the sorts of issues that have dogged BATS in the last year. Feel free to reach out to us, or use internal resources if you prefer, but please believe me when I borrow some wise words -- "Those who cannot remember the past are condemned to repeat it." Twice already is too much.