01 November 2008

Test Automation For Complex Systems Continued

In a previous post I discussed some innovative methods that had been developed to test a software product line built by 300 developers and comprising 3M lines of C/C++ server code and 10 client apps. In that post I discussed:
  • A middleware layer against which the tests were run
  • The benefits of getting started early on a prototype solution
  • A CruiseControl-like build+test framework that implemented continuous builds and optionally tested each code check-in
  • Using VMware Lab Manager to dramatically improve test reliability, test hardware usage efficiency and flow of tests through the automated build+test system
  • Using methods from Experimental Design to dramatically improve test efficiency

Some issues remained

  • Testing needed to become yet more efficient. At this time the entire server still needed to be tested comprehensively on any code check-in that could affect the entire system. The full system had not been broken down into sub-systems that could each be tested comprehensively with small subsets of the full system test suite.
  • We needed to improve detection of concurrency defects such as race conditions and deadlocks. This was (and is) a well-known problem in software development: faults found in the field were often produced under stress and timing conditions that were difficult to reproduce in testing.
  • The test system needed to find bugs without continual re-writing or re-tuning of tests.

Concurrent Testing
As developers of the server code being tested, we knew the fastest way to flow jobs through the system, and we had a good idea of where concurrency defects were likely to occur. We addressed both problems with a single change: we refactored our test programs into sets of threads that exercised different parts of the server. In their standard mode the new multi-threaded test programs maximized throughput by keeping all stages of all server processing pipelines full. This by itself unmasked many concurrency bugs, since it exercised all major threads in the system while efficiently exploring many combinations of input parameters. We then added code that triggered the user-switchable state transitions (abort, cancel, etc.) in the server's processing pipelines and control loops, firing them rapidly and with varied timings. This uncovered many more bugs.
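The real tests ran against the middleware layer of a proprietary server, so none of that code can be shown here. As an illustration only, here is a minimal Python sketch of the pattern: feeder and completer threads keep a (fake, in-process) pipeline full while a third thread fires abort transitions at random moments with random timing. All names (`FakePipeline`, the parameter space, etc.) are hypothetical stand-ins.

```python
import queue
import random
import threading
import time

class FakePipeline:
    """Hypothetical stand-in for one server processing pipeline."""
    def __init__(self):
        self.lock = threading.Lock()
        self.completed = 0
        self.aborted = 0
        self.active = {}   # job_id -> params
        self.next_id = 0

    def submit(self, params):
        with self.lock:
            self.next_id += 1
            self.active[self.next_id] = params
            return self.next_id

    def finish(self, job_id):
        # A job aborted by the chaos thread is gone; pop() makes the race safe.
        with self.lock:
            if self.active.pop(job_id, None) is not None:
                self.completed += 1

    def abort(self, job_id):
        with self.lock:
            if self.active.pop(job_id, None) is not None:
                self.aborted += 1

def feeder(pipeline, jobs, n_jobs, param_space, rng):
    # Keep the pipeline full, varying input parameter combinations.
    for _ in range(n_jobs):
        params = {k: rng.choice(v) for k, v in param_space.items()}
        jobs.put(pipeline.submit(params))

def completer(pipeline, jobs):
    # Drain and complete jobs; deliberately races with the chaos thread.
    while True:
        job_id = jobs.get()
        if job_id is None:
            return
        pipeline.finish(job_id)

def chaos(pipeline, stop, rng):
    # Fire user-switchable state transitions rapidly with random timing.
    while not stop.is_set():
        with pipeline.lock:
            candidates = list(pipeline.active)
        if candidates:
            pipeline.abort(rng.choice(candidates))
        time.sleep(rng.uniform(0, 0.001))

def run(n_jobs=500, seed=1):
    pipeline = FakePipeline()
    jobs = queue.Queue()
    stop = threading.Event()
    param_space = {"format": ["tiff", "pdf"], "priority": [1, 2, 3]}
    threads = [
        threading.Thread(target=feeder,
                         args=(pipeline, jobs, n_jobs, param_space,
                               random.Random(seed))),
        threading.Thread(target=completer, args=(pipeline, jobs)),
        threading.Thread(target=chaos, args=(pipeline, stop,
                                             random.Random(seed + 1))),
    ]
    for t in threads:
        t.start()
    threads[0].join()
    jobs.put(None)      # sentinel: no more jobs
    threads[1].join()
    stop.set()
    threads[2].join()
    # Invariant: every job is accounted for exactly once.
    assert pipeline.completed + pipeline.aborted == n_jobs
    return pipeline.completed, pipeline.aborted
```

In the real system the invariant checks were the server's own correctness checks; a hang, crash, or double-counted job under this load was a found bug.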

Finding New Bugs without Changing Tests 
Two well-known behaviors of software development organizations are

  • Defects found by tests tend to get fixed.
  • Code that is tested often tends to have fewer defects.

The inevitable result of this is that static tests will tend to find fewer bugs over time as the defects they find get fixed and the code they exercise gets debugged. The consensus among the testers in our organization was that they found at least 80% of their bugs through exploratory testing and less than 20% through running their standard test matrices. Their exploratory testing included such things as testing recently changed features and testing functionality they had observed to be fragile in the past.

Effectiveness starting at 20% then tapering off was not what we had in mind for our test system. We had partially addressed this in the initial design by:

  • Allowing tests to be based on server configuration. For example, meta-data attributes were a key test parameter, so the test programs had an option to read all the meta-data keys and their allowed values from a server and generate test cases from them. This distributed test coverage evenly between previously and newly exposed code paths. While this was less directed toward new code paths than the QA department's exploratory strategy of targeting recent changes, it was much more effective than continually re-testing the same well-tested code paths, as a purely static test would have done. It also had the nice property that two servers with identical code and configuration would always be tested the exact same way, while any change to the configuration or middleware layer would result in a different test with good coverage of the changes.
  • Having a pure exploratory mode in which a crawler found new test files and a time-seeded pseudo-random number generator created truly random test parameters.
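The two modes above differ mainly in how the random number generator is seeded. A minimal Python sketch of the idea, with a hypothetical `fetch_metadata_schema` standing in for the real query against the middleware layer: seeding the RNG from a digest of the configuration makes identical configurations produce identical test runs, while any configuration change produces a different (but still reproducible) run.

```python
import hashlib
import itertools
import random

def fetch_metadata_schema():
    """Hypothetical stand-in for querying the middleware layer for all
    meta-data keys and their allowed values."""
    return {
        "color_space": ["rgb", "cmyk", "gray"],
        "compression": ["none", "lzw", "jpeg"],
        "resolution": [72, 300, 600],
    }

def generate_cases(schema, n_cases, exploratory=False):
    """Derive test cases from server configuration.

    Configuration-based mode: RNG seeded from a digest of the schema, so
    two servers with identical configuration are tested the exact same way.
    Exploratory mode: RNG seeded unpredictably (system entropy / time),
    giving genuinely different cases on every run.
    """
    if exploratory:
        rng = random.Random()
    else:
        digest = hashlib.sha256(repr(sorted(schema.items())).encode()).hexdigest()
        rng = random.Random(int(digest, 16))
    keys = sorted(schema)
    all_combos = list(itertools.product(*(schema[k] for k in keys)))
    picks = rng.sample(all_combos, min(n_cases, len(all_combos)))
    return [dict(zip(keys, combo)) for combo in picks]

schema = fetch_metadata_schema()
run1 = generate_cases(schema, 10)
run2 = generate_cases(schema, 10)
assert run1 == run2  # identical configuration => identical test run
```

The real generators also applied the experimental-design methods from the previous post to pick parameter combinations, rather than sampling uniformly as this sketch does.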

The configuration-based testing mode was widely used and continued to find a lot of bugs as long as code or configuration changed.
The pure exploratory mode was seldom used, other than for creating data for the statistical analysis module described in the previous post.

Some of the things we learned from this work were:
  • It was important to fit the solution to the problem. IMO the one thing that most characterized our approach was that we did not start with a solution in mind. We continually re-analyzed the problem and the efficacy of our partial solutions to it and ended up with solutions that few people had expected.
  • We created a design verification system rather than automating the quality assurance department's manual testing
  • We emphasized bug find+fix volume over bug prioritization
  • We automated at the middleware layer rather than with client app button pushers
  • Virtualization was a key component of our solution
  • Statistical analysis played a key role in the design of the solution
  • Automated tests continued to find many bugs even after code stopped changing. We investigated and found that new code paths were being exposed to our tests by configuration changes made after code freeze. As a result the company started treating configuration changes the same as code changes, requiring their rate to asymptote to zero before shipping.
  • It can be difficult to fit solutions to a large problem. Big problems often require big solutions and building big solutions can be expensive. For example, we discovered that CruiseControl brought us little value so we had to invest a lot of money in developing our own CruiseControl-like automated build+test system.
  • Integration is a major cost in big solutions. Few of the off-the-shelf tools we looked at worked well together, and a large fraction of the cost of this work went on integration. The major exception was Lab Manager and the VMware tools supporting it, which integrated well with all parts of our system.
  • The value of an effective IT department. These people understood integration of heterogeneous systems, rolling out hardware and software and meeting service level requirements in a way that we developers did not. The VMware products, which were designed for IT departments, had the same qualities.
