From a Mathematician’s Point of View - Load and Performance Issues in the News - ...
In between big news like the Irish financial crisis and the conflict on the Korean peninsula, two small news items from the Bavarian school system went mostly unnoticed by the public: “Computer-Chaos an den Berufsschulen” from November 16th and “Technik stoppt Abi-Umfrage” from November 17th, both from local Bavarian newspapers. Both reports demonstrate the scope and the potentially crushing effects of load and performance problems in modern computer systems, a topic that only rarely finds its way into the headlines of the mainstream media.
“Computer-Chaos an den Berufsschulen” (“Computer Chaos at Berufsschulen (~vocational schools)”) reports on a modernization of the “pedagogical network” of the city of Munich that went terribly wrong. After the upgrade, the system starts slowing down as soon as two or three students log in, and some time later it stops working altogether. Some 7,500 students depend on this system for their training with a variety of professional software, and it is even supposed to be used for their examinations. If nothing else works, a return to the old system will be necessary.
“Technik stoppt Abi-Umfrage” (“Technology stops examination survey”) covers the breakdown of a public website set up to survey the study plans of Bavarian high-school graduates. The site crashed under the onslaught of some 95,000 students within a couple of days. The plan now is to abandon the complete survey and to rely on random sampling instead.
These short reports demonstrate that load and performance problems can occur in very different setups: a large but local application network in one case, a public website in the other. In both cases a load that is too high leads to performance problems or outright system failure. In both cases the load-induced problems are also mission-critical and, as publicly visible failures, especially painful. Quick solutions are costly or not available at all.
As no further details about these two cases are given, there is no use in speculating about the concrete reasons for these performance issues. Instead, I want to share some general thoughts about acts and omissions that can lead to performance or load problems. The biggest and most obvious mistake is not to plan any load and performance tests at all. Given the potentially devastating effects of load and performance problems, it is plainly irresponsible not to address these risks. While theoretical considerations about load and performance can be a useful first step, unknown dependencies and surprises in practice are the norm rather than the exception. Therefore, in my opinion, practical load and performance testing is an indispensable part of every IT project.
Even if load and performance tests are planned, they are often scheduled towards the very end of the project, after the completion of the development phase and maybe even after functional testing. Quite frequently, budget limitations or delays then lead to performance testing being reduced or even cancelled, which brings enormous risks back into the project. According to the experience we have gained at codecentric, the best course of action is to incorporate performance testing from the very start of a project. As soon as there is some runnable chunk of code, it can be checked and tested for performance issues. As soon as even rudimentary use-case functionality is available, it can be load tested.
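To illustrate how small such an early check can be, here is a minimal sketch in Python. It is not a description of any particular tool or of the codecentric approach; the function names, the nearest-rank percentile, and the idea of a fixed latency budget are my own assumptions for the sake of the example.

```python
import math
import time


def measure_latencies(func, runs=50):
    """Call func() `runs` times and return each call's duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def percentile(samples, fraction):
    """Nearest-rank percentile: the smallest sample such that at least
    `fraction` of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def within_budget(samples, budget_seconds, fraction=0.95):
    """True if the chosen percentile of the samples stays within the budget."""
    return percentile(samples, fraction) <= budget_seconds


if __name__ == "__main__":
    # Hypothetical stand-in for "some runnable chunk of code".
    def chunk_under_test():
        sorted(range(10_000, 0, -1))

    samples = measure_latencies(chunk_under_test, runs=100)
    # The 50 ms budget is an assumed example value, not a recommendation.
    print("95th percentile (s):", percentile(samples, 0.95))
    print("within budget:", within_budget(samples, budget_seconds=0.05))
```

A check like this can run alongside the functional tests from the first iteration on, so that a performance regression shows up when it is introduced and not months later.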
In the end it is production that matters. Thus another huge risk is a flawed test concept. The test cases and the load should be as realistic as possible to prevent unwelcome surprises. The same holds true for the test environment (testing hardware, software and their configuration), which has to be as close to the production environment as possible. Ideally, the environment for load and performance testing will be the future production environment itself.
In some cases it can be very hard to come up with a realistic load forecast, for example for new websites where there is no prior experience of how many customers or users they will attract. In that case the tests have to cover a wide range of possible loads and of their distribution over time. Additionally, the system should be flexible and scalable, and these properties have to be tested, too. Of course, extensive stress testing and a flexible, scalable system are good ideas anyway, even if concrete load forecasts do exist.
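Covering a range of loads and their distribution over time can be sketched as follows. This is only an illustration under my own assumptions: the profile shapes (ramp and spike), all function names, and the use of a thread pool to simulate concurrent users are hypothetical choices, and `request` stands for whatever call exercises the system under test. A real project would normally use a dedicated load testing tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def ramp_profile(start_users, end_users, steps):
    """Evenly spaced user counts from start_users to end_users (inclusive)."""
    if steps < 2:
        return [end_users]
    span = end_users - start_users
    return [start_users + round(span * i / (steps - 1)) for i in range(steps)]


def spike_profile(base_users, peak_users, steps, spike_at):
    """Constant base load with a single step at peak load."""
    return [peak_users if i == spike_at else base_users for i in range(steps)]


def run_step(request, users):
    """Issue `users` concurrent calls to request() (users must be >= 1).
    Returns (list of durations in seconds, number of failed calls)."""
    def timed_call(_):
        start = time.perf_counter()
        try:
            request()
            ok = True
        except Exception:
            ok = False
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(timed_call, range(users)))
    durations = [duration for duration, _ in results]
    errors = sum(1 for _, ok in results if not ok)
    return durations, errors


def sweep(request, profile):
    """Run every step of a load profile; report worst latency and errors per step."""
    report = []
    for users in profile:
        durations, errors = run_step(request, users)
        report.append(
            {"users": users, "max_seconds": max(durations), "errors": errors}
        )
    return report


if __name__ == "__main__":
    # Hypothetical stand-in for a request against the system under test.
    def fake_request():
        time.sleep(0.01)

    for row in sweep(fake_request, ramp_profile(1, 50, 5)):
        print(row)
```

Running the same sweep with a ramp and with a spike profile gives a first impression of where latency starts to climb and whether the system recovers after a peak, which is exactly the kind of information that is missing when no load forecast exists.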
Whatever the reasons for the performance problems in these recent Bavarian cases may have been, it is always costly to gain this knowledge through real-life experience. Investing the necessary time and resources in load and performance tests before going live can help to reduce and manage these risks.