Closing the Gap Between the Testing and Production Environments
Why is it hard to close the gap between the testing and production environments? In other words, why is it hard to increase your confidence level that once something is tested it will work fine in production?
1. Some problems only emerge at a certain scale, or under a certain permutation of the type and sequence of end-user inputs. You do not know the triggering condition in advance, so you cannot set it up as a test case even if you have the means to. Because there is no algorithm for finding it, testing is always hypothesis driven, and there is no mechanical way to systematically reduce the test space. The problem is even harder because an application typically runs alongside a whole host of other applications in the production environment. Even if there are no functional dependencies, there can be application server, database server, and middleware configuration dependencies. Bottom line: the right permutations of load, configuration settings, and user input patterns are impossible to isolate systematically.
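To give a feel for how quickly the test space in point 1 grows, here is a minimal sketch that enumerates combinations of load, configuration, and input pattern. Every dimension and value in it is a hypothetical placeholder, not taken from a real system.

```python
from itertools import product

# Hypothetical dimensions; every value below is illustrative, not measured.
load_levels = [10, 100, 1_000, 10_000]  # concurrent users
app_server_configs = ["pool=10", "pool=50", "pool=200"]
db_configs = ["cache=small", "cache=large"]
middleware_configs = ["sync", "async"]
input_patterns = ["browse", "search", "checkout", "bulk-upload", "mixed"]


def test_space(*dimensions):
    """Return every combination of one value per dimension."""
    return list(product(*dimensions))


combos = test_space(load_levels, app_server_configs, db_configs,
                    middleware_configs, input_patterns)
print(len(combos))  # 4 * 3 * 2 * 2 * 5 = 240
```

Even this toy space has 240 cases, and it ignores both the order in which inputs arrive and the other applications sharing the environment.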
2. Information does help, but (a) there is too much of it and it is not clear what is relevant, and (b) even when you know what is relevant, it is hard to get that information when you need it; often you can only have it after it is too late.
3. Even testing by applying loads (otherwise known as "database loading") is not easy. It is hard to obtain vast amounts of data, either for privacy reasons or because you cannot generate the right kind of data: generated data tends to be either too random or too uniform to mimic the real world correctly.
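One way to avoid purely uniform test data without copying production records is to draw synthetic values from a skewed distribution. The sketch below assumes a Zipf-like skew, in which a few customers account for most rows. Whether that shape matches a given production workload is an assumption to check against real usage statistics, and the function names are hypothetical.

```python
import random
from collections import Counter


def zipf_weights(n, exponent=1.0):
    """Weights proportional to 1 / rank**exponent (a Zipf-like skew)."""
    return [1.0 / (rank ** exponent) for rank in range(1, n + 1)]


def synthetic_customer_ids(n_customers, n_rows, exponent=1.0, seed=0):
    """Draw customer ids so that a few customers account for most rows."""
    rng = random.Random(seed)  # seeded so test data is reproducible
    ids = list(range(1, n_customers + 1))
    weights = zipf_weights(n_customers, exponent)
    return rng.choices(ids, weights=weights, k=n_rows)


rows = synthetic_customer_ids(n_customers=1000, n_rows=100_000)
counts = Counter(rows)
top_10_share = sum(c for _, c in counts.most_common(10)) / len(rows)
print(f"top 10 customers produce {top_10_share:.0%} of rows")
```

With uniform data the top 10 of 1,000 customers would produce about 1% of rows; the skewed draw concentrates far more activity on a few ids, which is the kind of hot spot that uniform data hides.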
4. Infrastructure blind spots for Applications folks: these are things that have a significant impact on how an application performs in the production environment and can only be fixed by changing the application, yet Applications folks are almost always wholly unaware of the need for such a change. They include:
- Type of protocol used (especially in web applications)
- TCP window between client and server
- Number of application turns during a transaction (application "chattiness")
- Data payload transferred during a transaction
- Firewalls that the transaction has to travel through
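To see why the TCP window, the number of application turns, and the payload matter, a rough back-of-the-envelope estimate helps. The sketch below assumes a deliberately simplified model: each application turn costs one network round trip, and throughput is capped by the smaller of the link bandwidth and one TCP window per round trip. It ignores protocol overhead, firewall processing, and server time, and all numbers are hypothetical.

```python
def effective_throughput_bps(bandwidth_bps, tcp_window_bytes, rtt_s):
    """Throughput is capped by the link and by one window per round trip."""
    window_limit_bps = tcp_window_bytes * 8 / rtt_s
    return min(bandwidth_bps, window_limit_bps)


def estimated_response_time_s(turns, payload_bytes, bandwidth_bps,
                              tcp_window_bytes, rtt_s):
    """Rough estimate: round-trip waits plus time to move the payload."""
    turn_delay = turns * rtt_s
    throughput = effective_throughput_bps(bandwidth_bps, tcp_window_bytes, rtt_s)
    transfer = payload_bytes * 8 / throughput
    return turn_delay + transfer


# Hypothetical transaction: 40 turns, 2 MB payload, 100 Mbit/s link, 64 KiB window.
lan = estimated_response_time_s(40, 2_000_000, 100e6, 65_536, rtt_s=0.001)
wan = estimated_response_time_s(40, 2_000_000, 100e6, 65_536, rtt_s=0.080)
print(f"LAN: {lan:.2f} s, WAN: {wan:.2f} s")  # LAN: 0.20 s, WAN: 5.64 s
```

Under these assumptions the same transaction that takes a fraction of a second on a 1 ms test LAN takes several seconds over an 80 ms production link, and only changes to the application (fewer turns, smaller payload) bring the second number down.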
5. In general, there is a coordination problem between Infrastructure and Applications that is due to three fundamental tensions:
- Misaligned Incentives: Apps is rewarded for cutting-edge functionality; Infra is rewarded for rock-solid stability.
- Misaligned Metrics: Infra tracks availability and network latency, while Apps tracks whether its applications passed all the performance tests. Both sets of metrics can look healthy while performance from a business user's standpoint is severely impaired.
- Misaligned Resourcing Priorities: When Apps thinks it is time for Infra to work on its project, Infra has other priorities, and vice versa.