The Science of Catching Hidden Bugs

Originally published on stickyminds.com

Some of my all-time favorite bugs are spectacular crashes. They’re fun to watch and rewarding to find. But when software crashes, it’s obvious that something has gone wrong. Much more difficult to catch—and often more important—are subtle bugs that mislead users.

I’m reminded of one of the earliest database queries I created. The SQL looked right. The data it returned looked right. But when I started using the data, I noticed several inconsistencies that made me suspicious. Sure enough, I had made a mistake in my query that resulted in real-looking, bogus data being returned. Since I created this query for my own purposes, it was easy for me to spot the problem. But what if it had been part of a bigger system? How could another tester have caught my error?

I believe that the answer lies in the scientific method of inquiry.

Scientists are an inherently curious bunch. They formulate questions, then design experiments to find answers. Those answers often evoke additional questions. The search for knowledge goes beyond a simple “pass” or “fail” test result. “I don’t know yet” becomes a third possibility.

Designing tests as experiments takes time and in-depth analysis, but the potential rewards are enormous. What’s it worth to discover that the software produces inaccurate results? In a complex system designed to convert raw data into usable knowledge, it can mean the difference between happy customers and lawsuits.

A Series of Experiments

Let’s take an example. If you were testing software that produces random numbers, how could you verify that the numbers are truly random? The facile answer is that you gather a large sample of generated numbers and analyze it both for the distribution of the numbers and for patterns. I thought that answer was sufficient until I began experimenting.

I created a program that produces a random integer between 0 and 9 (MyRand 1.0). You click a “Randomize!” button and it gives you a number, just like rolling a 10-sided die. I decided to test my creation using my standard answer of how to test a random number generator.

Experiment #1: Collect a large sample and analyze it for distribution and patterns.

I gathered a sample of 10,000 generated numbers. When I charted the results in a spreadsheet, I didn’t see any patterns. However, when I counted the number of times each integer came up, I found that some numbers occurred twice as often as others did. Oops. Not very random.
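A count per digit can also be reduced to a single number, which makes it easier to judge whether a lopsided tally is more than chance. The sketch below is only an illustration: it uses Python's own generator as a stand-in for MyRand (whose code is not shown here), and the function names are made up for this example. It computes a Pearson chi-square statistic against an even distribution and compares it with 21.67, the 1% critical value for 9 degrees of freedom.

```python
import random
from collections import Counter


def digit_counts(samples):
    """Count how often each digit 0-9 appears in the sample."""
    counts = Counter(samples)
    return [counts.get(d, 0) for d in range(10)]


def chi_square_uniform(counts):
    """Pearson chi-square statistic against an even distribution."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Stand-in for MyRand (assumption): Python's generator, seeded so the run repeats.
rng = random.Random(2024)
samples = [rng.randrange(10) for _ in range(10_000)]

counts = digit_counts(samples)
statistic = chi_square_uniform(counts)

# Chi-square critical value for 9 degrees of freedom at the 1% level.
CRITICAL_1_PERCENT = 21.67
print(counts)
print(f"chi-square = {statistic:.2f}, suspicious = {statistic > CRITICAL_1_PERCENT}")
```

A statistic above the threshold is a reason to keep investigating rather than a verdict, which fits the "I don't know yet" outcome: even a sound generator will exceed the 1% threshold about once in a hundred runs.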

Did MyRand 1.0 pass or fail? I decided I didn’t know yet. Perhaps the numbers that came up more often on the first test run would come up less often on the second.

Experiment #2: Re-run Experiment #1 and compare the results to see if the distribution changed.

The result of Experiment #2 was a definitive “fail.” If MyRand 1.0 were a Las Vegas betting game, I’d bet on the number 9; it came up a lot. I went back to the code and discovered my mistake. I fixed the bug with MyRand 2.0, then re-ran my first two experiments.
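The comparison in Experiment #2 can be sketched in a few lines. The bug in MyRand 1.0 is not described here, so the generator below uses a hypothetical bug of my own choosing (scaling by 11 and clamping, which makes 9 twice as likely as any other digit). The point is the comparison, not the bug: if the same digits are over-represented in two independent runs, chance is an unlikely explanation.

```python
import random
from collections import Counter


def biased_digit(rng):
    """Hypothetical bug: scaling by 11 and clamping makes 9 twice as likely."""
    return min(9, int(rng.random() * 11))


def run_counts(seed, n=10_000):
    """Collect n digits from one run and return the count for each digit 0-9."""
    rng = random.Random(seed)
    counts = Counter(biased_digit(rng) for _ in range(n))
    return [counts.get(d, 0) for d in range(10)]


def overrepresented(counts, tolerance=0.2):
    """Digits that appear more than `tolerance` above the even-split count."""
    expected = sum(counts) / len(counts)
    return {d for d, c in enumerate(counts) if c > expected * (1 + tolerance)}


first = run_counts(seed=1)
second = run_counts(seed=2)

# Digits that are too frequent in BOTH runs point to a systematic bias.
repeat_offenders = overrepresented(first) & overrepresented(second)
print("run 1:", first)
print("run 2:", second)
print("over-represented in both runs:", sorted(repeat_offenders))
```

The 20% tolerance is an arbitrary choice for this sketch; a tighter or statistically derived threshold may suit a real test better.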

MyRand 2.0 succeeded in producing an even distribution of numbers. Number 9 was no longer the odds-on favorite. But then I realized I needed to ask another question. Would the results be different if I restarted MyRand before generating each number? I’d gathered the 10,000 integers without exiting MyRand. Perhaps exiting and restarting the program would change the results.

Experiment #3: Exit and restart the program before each “Randomize!” while gathering the 10,000 numbers.

I discovered that the chances of getting a particular number changed dramatically depending on whether or not you had just started MyRand. If you left MyRand running, you had an even chance of getting any given number. However, if you’d just started MyRand, 9 was once again much more likely to come up. Another definitive “fail.” Back to the drawing board.
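To show how a restart can change the picture, here is a minimal sketch with an assumed cause: a generator that is seeded the same way on every launch. That is one common way to get this symptom, but it is my assumption, not a description of what was wrong in MyRand 2.0. The class and function names are hypothetical, and creating a new object stands in for exiting and relaunching the program.

```python
import random
from collections import Counter


class MyRandStandIn:
    """Hypothetical stand-in for MyRand; not the author's implementation."""

    STARTUP_SEED = 12345  # assumed bug: every launch seeds identically

    def __init__(self):
        self._rng = random.Random(self.STARTUP_SEED)

    def randomize(self):
        """Return a digit from 0 through 9, like clicking "Randomize!"."""
        return self._rng.randrange(10)


def sample_without_restart(n):
    """Start once, then draw n numbers from the same running instance."""
    app = MyRandStandIn()
    return Counter(app.randomize() for _ in range(n))


def sample_with_restart(n):
    """A new instance per draw models exiting and relaunching the program."""
    return Counter(MyRandStandIn().randomize() for _ in range(n))


kept_running = sample_without_restart(10_000)
restarted = sample_with_restart(10_000)
print("kept running:", sorted(kept_running.items()))
print("restarted:   ", sorted(restarted.items()))
```

In this stand-in the restarted sample collapses to a single digit, a more extreme skew than the one described above. Against a real application, the restart loop would need to launch the actual program each time, for example as a separate process.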

Tips for Experimenting

The MyRand experiments demonstrate the power of asking, “What if…?” Adopting an experimenter’s mindset adds a new dimension to testing. When I’m experimenting, I like to:

  • Gather lots of data. Notice that my sample for the MyRand tests was rather large: 10,000 numbers.
  • Perform the same tests under different conditions. By restarting MyRand before each “Randomize!” I learned that a small change had a huge, unexpected impact on the test results.
  • Perform a series of slightly different tests and analyze the patterns of results. When I began using the malformed SQL query I mentioned earlier, inconsistencies in the data alerted me to a problem. What patterns or inconsistencies might alert you to problems in your system?
  • Finally, question the tests. If the system were producing incorrect results, would the tests make the failure obvious?
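The last tip, questioning the tests, can be tried directly: feed a check some data that is known to be wrong and see whether it objects. The sketch below is an illustration under my own assumptions. The deliberately broken generator and both check functions are hypothetical, and the threshold of 21.67 is the 1% chi-square critical value for 9 degrees of freedom.

```python
import random
from collections import Counter

CRITICAL_1_PERCENT = 21.67  # chi-square, 9 degrees of freedom, 1% level


def weak_check(samples):
    """A check that only confirms every value is a digit from 0 through 9."""
    return all(0 <= s <= 9 for s in samples)


def looks_uniform(samples):
    """A check that compares the digit counts with an even distribution."""
    counts = Counter(samples)
    expected = len(samples) / 10
    statistic = sum(
        (counts.get(d, 0) - expected) ** 2 / expected for d in range(10)
    )
    return statistic <= CRITICAL_1_PERCENT


rng = random.Random(7)
# Deliberately broken generator: 9 is twice as likely as any other digit.
broken = [min(9, int(rng.random() * 11)) for _ in range(10_000)]

print("weak check passes broken data:", weak_check(broken))
print("distribution check passes broken data:", looks_uniform(broken))
```

The range check accepts the broken data without complaint, while the distribution check rejects it. A test that cannot fail on known-bad input is not telling you much when it passes.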

Scientists formulate hypotheses, design experiments, and compare predictions to observations. As testers, we also follow the scientific method. We hypothesize that we’ll find a bug if we exercise the software in a particular way. By taking our method a step further, using test results to suggest new tests, we increase our chances of exposing camouflaged bugs and gaining a deeper understanding of the complex systems we test.