Rigorous Exploratory Testing

Exploratory Testing is a style of testing in which you explore the software while simultaneously designing and executing tests, using feedback from the last test to inform the next. Exploratory Testing helps us find surprises, implications of interactions that no one ever considered, and misunderstandings about what the software is supposed to do. Cem Kaner coined the term “Exploratory Testing” a couple of decades ago, though exploratory or “ad hoc” testing has been around longer than that.

Recently, I was talking with a group of XP developers about using Exploratory Testing on XP projects to augment the TDD unit tests and automated acceptance tests.

“Oh, Exploratory Testing,” said one of the developers, “that’s where the tester does a bunch of wacky, random stuff, right?”

“Not exactly,” I replied, a little dismayed that myths about Exploratory Testing still abound after so many years. What looked wild to that developer was actually the result of careful analysis. He held a common misconception about Exploratory Testing: he noted the lack of formality and apparently arbitrary sequences and actions, and he concluded that Exploratory Testing was an exercise in keyboard pounding rather than a rigorous approach.

Two key things distinguish good Exploratory Testing as a disciplined form of testing:

  1. Using a wide variety of analysis/testing techniques to target vulnerabilities from multiple perspectives.
  2. Using charters to focus effort on those vulnerabilities that are of most interest to stakeholders.

Variety and Perspectives

The old saying goes, “If all you have is a hammer, everything looks like a nail.” If the only testing technique a tester knows is how to stuff long strings into fields in search of buffer overflow errors, that’s the only kind of vulnerability that tester is likely to find.
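To make the hammer concrete: a field attack boils down to a list of hostile values fed into a single input. The minimal Python sketch below assumes a hypothetical text field with a maximum length of 256 characters. The function name and the specific values are illustrative, not a complete catalog. It covers long strings plus a few other field-level attacks (malformed dates, wrong-type data), and even so it probes only one kind of vulnerability.

```python
import string


def field_attack_values(max_length=256):
    """Return (description, value) pairs for attacking one text field.

    max_length is a hypothetical field limit; the values are examples of
    classic field attacks, not an exhaustive list.
    """
    return [
        ("empty", ""),
        ("whitespace only", "   "),
        ("exactly max length", "A" * max_length),
        ("one over max length", "A" * (max_length + 1)),
        ("very long", "A" * (max_length * 100)),
        ("impossible date", "2023-02-30"),
        ("day-first date in an MM/DD/YYYY field", "31/12/2023"),
        ("date with letters", "12-Dec-20XX"),
        ("text where a number belongs", "twelve"),
        ("negative number", "-1"),
        ("punctuation", string.punctuation),
    ]
```

Each pair would be typed into the field under test; the description makes it easier to report which attack triggered a failure.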

Good test analysis requires looking at the software from multiple perspectives. Field attacks, such as entering long strings, badly formatted dates, or data of the wrong type altogether (strings where a number should be), are one approach. Other approaches include:

  • Varying sequences of actions
  • Varying timing
  • Using a deployment diagram to find opportunities to test error handling by making required resources unavailable or locked, or to break connections
  • Deriving transition and interrupt tests from state models
  • Using use cases or analyzing the user perspective to identify real-world scenarios
  • Inventing personae or soap operas to generate extreme scenarios
  • Using cause-effect diagrams to test business rules or logic
  • Using entity-relationship diagrams to test around data dependencies
  • Varying how data gets into and leaves the software under test using a data flow diagram as a guide

Each of these types of testing reveals different kinds of vulnerabilities. Some check for problems related to error handling, while others look at potential problems under normal use. Some find timing problems or race conditions; others identify logic problems. Using a combination of analysis techniques increases the probability that, if there’s a problem, the testing will find it.
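Deriving transition tests from a state model, one of the approaches listed above, can be sketched in a few lines of Python. The document states, events, and transitions below are hypothetical. The sketch produces one test per valid transition, plus every (state, event) pair the model leaves undefined; those are candidates for checking how the software handles events that should not be possible in the current state. It does not attempt to model interrupts.

```python
from itertools import product

# Hypothetical state model: keys are (state, event), values are the next state.
TRANSITIONS = {
    ("draft", "edit"): "draft",
    ("draft", "publish"): "published",
    ("published", "unpublish"): "draft",
    ("published", "archive"): "archived",
}

STATES = sorted({s for s, _ in TRANSITIONS} | set(TRANSITIONS.values()))
EVENTS = sorted({e for _, e in TRANSITIONS})


def transition_tests(transitions):
    """One test per valid transition: (start state, event, expected end state)."""
    return [(start, event, end) for (start, event), end in sorted(transitions.items())]


def invalid_event_tests(transitions, states, events):
    """Every (state, event) pair the model does not define."""
    return [(s, e) for s, e in product(states, events) if (s, e) not in transitions]


if __name__ == "__main__":
    for start, event, end in transition_tests(TRANSITIONS):
        print(f"In '{start}', send '{event}', expect '{end}'")
    for state, event in invalid_event_tests(TRANSITIONS, STATES, EVENTS):
        print(f"In '{state}', send '{event}', expect a graceful rejection")
```

With three states and four events the model has twelve (state, event) pairs; four are defined transitions and the remaining eight become "should this even be possible?" tests.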

Charters and Focus

Because good test analysis will inevitably reveal more tests than we could possibly execute in a lifetime, much less by the ship date, we have to be choosy about how we spend our time. It’s too easy to fall into a rat hole of potentially interesting sequence and data permutations and variations.

There are a variety of test selection strategies we can employ, such as equivalence analysis and all-pairs. But even before we begin combining or eliminating test cases, we need a charter: we need to know who we’re testing for and what information they need. Exploratory Testing charters define the area we’re testing and the kind of vulnerabilities we’re looking for. I’ve used charters like these in past Exploratory Testing sessions:

  • “Use the CRUD (Create, Read, Update, Delete) heuristic, Zero-One-Many heuristic, Some-None-All heuristic, and data dependencies to find potential problems with creating, viewing, updating, and deleting the different types of entities the system tracks.”
  • “Exercise the Publish feature in various ways to find any instances where a valid publish request does not complete successfully or where the user does not receive any feedback about the actions the Publish feature took on their behalf.”
  • “Use a combination of valid and invalid transactions to explore the responses from the SOAP/XML interface.”

Notice that each charter is general enough to cover numerous different types of tests, yet specific in that it constrains my exploration to a particular interface, feature, or type of action.
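The all-pairs strategy mentioned above can be sketched as a greedy search. The parameters here are hypothetical, borrowed loosely from the CRUD and Zero-One-Many heuristics in the first charter, and the entity names are made up. A greedy scan of the full product like this is a simple stand-in for dedicated pairwise tools and is not guaranteed to find the smallest covering set.

```python
from itertools import combinations, product

# Hypothetical parameters inspired by the CRUD and Zero-One-Many heuristics.
PARAMETERS = {
    "operation": ["create", "read", "update", "delete"],
    "count": ["zero", "one", "many"],
    "entity": ["customer", "order", "invoice"],
}


def all_pairs(parameters):
    """Greedily pick test cases until every pair of values is covered.

    A "pair" is one value from each of two different parameters.
    """
    names = list(parameters)
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in parameters[a]
        for vb in parameters[b]
    }
    candidates = [dict(zip(names, values)) for values in product(*parameters.values())]

    def pairs_in(case):
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

    tests = []
    while uncovered:
        # Choose the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda case: len(pairs_in(case) & uncovered))
        tests.append(best)
        uncovered -= pairs_in(best)
    return tests
```

With these three parameters the full product has 36 combinations, while covering every operation/count pair alone requires at least 12 tests (4 × 3), so the greedy result should land at or near that bound.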

Variety and Focus Yield Consistently Useful Information

Exploratory Testing is particularly good at revealing vulnerabilities that no one thought to look for before. Because you use the feedback from each experiment to inform the next, you have the opportunity to pick up on subtle cues and allow your intuition to guide you in your search for bugs.

But because Exploratory Testing involves designing tests on the fly, there’s a risk of falling into a rut of executing just one or two types of tests (the hammer/nail problem) or of discovering information that’s far afield from what your stakeholders need to know. Focusing with charters, then using a variety of analysis techniques to approach the targeted area from multiple perspectives, helps ensure that your Exploratory Testing efforts consistently yield information that your stakeholders will value.


It’s All About the Variables

Let’s talk about bugs for a moment. Bad bugs. The kind of bugs that make headlines. Bugs like these:

  • From 1985 to 1987, Therac-25 radiation therapy machines overdosed patients with radiation, killing some of them.
  • In 1996, the Ariane 5 rocket exploded spectacularly during its first flight.
  • In 2004, the NASA Mars rover “Spirit” was inoperable for several days as it rebooted itself over and over.
  • In 2003, a bug in GE energy management software contributed to the devastating blackout that cut off electricity to 50 million people.

So why do I want to talk about these bugs? Because they provide fascinating examples of how variables—things we can change while testing—are sometimes subtle and tricky. Variables can be difficult to identify, and even more difficult to control. And yet, if we want to design interesting tests that will give us the information we need about vulnerabilities in our software and systems, we need to identify those subtle variables and the interesting ways in which we can tweak them.

About “Variables” in Testing

But first, let’s take a step back and talk about what I mean by “variable.”

To a programmer, a variable is a named location in memory, declared with a statement such as “int foo;”. As a tester, however, I mean “variable” in the more garden-variety English sense of the word. According to www.m-w.com, a variable is something that changes. And as a system tester, I’m always alert for things I can change through external interfaces (like the UI or the file system) while executing the software.

Sometimes variables are obviously changeable things like the value in a field on a form. Sometimes they’re obvious, but not intended to be changed directly, like the key/value pairs in a URL string for a web-based application. Sometimes they’re subtle things that can only be controlled indirectly, like the number of users logged in at any given time or the number of results returned by a search. And as the bugs listed above demonstrate, the subtle variables are the ones we often miss when analyzing the software to design tests.
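The key/value pairs in a URL are a good example of an obvious variable that users aren’t meant to change directly. A minimal Python sketch, using only the standard library’s urllib.parse, generates copies of a URL with one query parameter changed at a time. The example URL and replacement values are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def url_variants(url, replacements):
    """Yield copies of url with one query parameter changed at a time.

    replacements maps a parameter name to the values worth trying for it;
    parameters not named there keep their original values.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    for index, (name, _original) in enumerate(params):
        for new_value in replacements.get(name, []):
            changed = list(params)
            changed[index] = (name, new_value)  # tweak exactly one variable
            yield urlunsplit(parts._replace(query=urlencode(changed)))
```

For example, url_variants("https://shop.example.com/orders?id=1001&page=2", {"id": ["0", "-1"]}) yields the URL with id=0 and then with id=-1, leaving page=2 alone. Changing one parameter per request keeps it clear which variable caused any surprising response.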

Horror Stories Provide Clues to Subtle Variables

So let’s consider the variables involved in these disastrous bugs.

In the case of the Therac-25 incidents, there were numerous contributing causes behind the patients’ injuries and deaths, including both software bugs and hardware safety deficiencies. This is not a simple case of one oversight but rather a cavalcade of factors. But some factors were entirely controlled by the software. Nancy Leveson explains in Safeware that in at least one of the incidents the malfunction could be traced back to the technician entering and then editing the treatment data in under 8 seconds, the time it took the machine to set its bending magnets. So here are two key subtle variables: speed of input and user actions. Later in Leveson’s account is an explanation of how, every 256th time the setup routine ran, it bypassed an important safety check. This provides yet another subtle variable: the number of times the setup routine ran.
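The 256th-run failure is easier to see in a model. The Python sketch below is a simplified illustration, not the actual Therac-25 code: it assumes, following Leveson’s account, a one-byte flag that the setup routine increments on every pass and a safety check that runs only while that flag is nonzero. The function names are hypothetical.

```python
COUNTER_BITS = 8  # assumption: a one-byte flag, so it wraps from 255 back to 0


def run_setup(counter):
    """Simulate one pass of a setup routine that increments a flag.

    Returns the new flag value and whether the safety check ran. The check
    only runs when the flag is nonzero, so it is silently skipped whenever
    the one-byte counter wraps around to zero.
    """
    counter = (counter + 1) % (1 << COUNTER_BITS)
    safety_check_ran = counter != 0
    return counter, safety_check_ran


def count_skipped_checks(passes):
    """Run the setup routine `passes` times and count skipped safety checks."""
    counter, skipped = 0, 0
    for _ in range(passes):
        counter, ran = run_setup(counter)
        if not ran:
            skipped += 1
    return skipped
```

A test that exercises the setup routine fewer than 256 times never sees a skipped check, which is why the number of runs is a variable worth pushing.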

The Ariane 5 rocket provides an example of code reuse gone awry. In investigating the incident, the review board concluded that the root cause of the explosion was the conversion of a 64-bit floating-point number (maximum value roughly 1.8 × 10^308) to a 16-bit signed integer (maximum value 32,767). That conversion caused an overflow error and, compounding the problem, the system interpreted the resulting error codes as data and attempted to act on them, causing the rocket to veer off course. The rocket self-destructed as designed when it detected the navigation failure. The conversion problem stemmed from differences between the Ariane 5 and its predecessor, the Ariane 4, for which the control software was originally developed. It turns out that the Ariane 5 was significantly faster than the Ariane 4, and the Ariane 5 software simply could not handle the horizontal velocity its sensors were registering. The variables involved here are both velocity and the presence of an error condition.
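Velocity, the variable at the heart of the Ariane 5 failure, lends itself to boundary-value testing. The Python sketch below is a hypothetical checked conversion to a 16-bit signed integer (range -32,768 to 32,767) that raises an error instead of returning a wrong number. The real flight software was written in Ada, and it was an unhandled conversion error that brought the inertial reference system down. The function names and test values are illustrative assumptions.

```python
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1  # -32,768 .. 32,767


def to_int16(value):
    """Convert a float to a 16-bit signed integer, refusing out-of-range values."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return result


def classify(values):
    """Split candidate test values into those that convert and those that overflow."""
    converted, overflowed = [], []
    for value in values:
        try:
            converted.append(to_int16(value))
        except OverflowError:
            overflowed.append(value)
    return converted, overflowed
```

Feeding classify values just inside and just outside the range (for example -32769.0, -32768.0, 32767.0, and 32768.0) shows exactly where the conversion stops working. A test suite built around Ariane 4 flight data would never have produced values on the far side of that boundary.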

An article in Spaceflight Now explains that the Mars rover “Spirit” rebooted over and over again because of the number of files in flash memory. Every time the rover created a new file, the DOS table of files grew. Some operations created numerous small files, and over time the table of files became huge. Part of the system mirrored the flash memory contents in RAM, and there was half as much RAM as flash memory. Eventually the DOS table of files swamped the RAM, causing the continuous reboots. Note the number of variables involved, all interdependent: number of files, size of the DOS table of files, space on flash memory and available RAM.
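The interplay among those variables can be illustrated with a toy model. The Python sketch below assumes, hypothetically, that every file costs a fixed number of bytes in a file table that must also fit in RAM, and that RAM is half the size of flash, as described above. All sizes and names are made up for illustration; a real flight file system is far more complicated.

```python
class MirroredFileTable:
    """Toy model of a file table that is mirrored in RAM (all sizes hypothetical).

    Every file costs a fixed number of bytes of table space, no matter how
    small the file itself is.
    """

    def __init__(self, flash_bytes, entry_bytes):
        self.flash_bytes = flash_bytes
        self.ram_bytes = flash_bytes // 2  # half as much RAM as flash
        self.entry_bytes = entry_bytes
        self.flash_used = 0
        self.entries = 0

    def create_file(self, size_bytes):
        if self.flash_used + size_bytes > self.flash_bytes:
            raise OSError("flash memory is full")
        self.flash_used += size_bytes
        self.entries += 1
        # The whole table is mirrored in RAM, so it must fit there too.
        if self.entries * self.entry_bytes > self.ram_bytes:
            raise MemoryError("file table no longer fits in RAM")


def first_failure(table, file_size, count):
    """Create `count` files of `file_size` bytes and report which limit breaks first."""
    try:
        for _ in range(count):
            table.create_file(file_size)
    except (OSError, MemoryError) as exc:
        return type(exc).__name__
    return None
```

With flash_bytes=1000 and entry_bytes=10, a hundred 1-byte files exhaust the RAM-resident table (MemoryError) while flash is still nearly empty, whereas a hundred 100-byte files fill flash first (OSError). The same total amount of data can succeed or fail depending on how many files it is split into.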

Finally, the GE energy management software provides a cautionary tale about the problem of silent failures. As in the other cases, there were numerous contributing factors in the massive blackout. Everything from lack of situational awareness to lack of operator training to inadequate tree-trimming is named in the final report submitted by a US-Canadian task force. However, there are tantalizing hints in that final report that software problems contributed to the operators’ blindness to the problems with the power grid. According to the report, FirstEnergy, the company responsible for monitoring the power grid, had reported problems with the GE XA/21 software’s alarm system in the past. In an article published on SecurityFocus, Kevin Poulsen quotes GE manager Mike Unum as pinning the blame for the software failure on a race condition that caused two processes to have write access to a data structure simultaneously. Event timing and concurrent processes turned out to be critical variables, and ones that took weeks to track down.
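Why timing is such a slippery variable shows up even in the smallest race. The Python sketch below is a generic lost-update illustration, not the XA/21 code: it assumes two hypothetical processes that each read a shared value and write back that value plus one, with no locking, and it enumerates every order in which their steps could interleave.

```python
from collections import Counter
from itertools import permutations

# Two hypothetical processes each do a read-modify-write on one shared value.
STEPS = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]


def interleavings(steps):
    """Yield every ordering of the steps in which each process reads before it writes."""
    for order in permutations(steps):
        if all(order.index((p, "read")) < order.index((p, "write")) for p in ("A", "B")):
            yield order


def run(order):
    """Execute one interleaving without locking and return the final shared value."""
    shared, local = 0, {}
    for process, action in order:
        if action == "read":
            local[process] = shared  # copy the shared value
        else:
            shared = local[process] + 1  # write back the copy plus one, possibly stale
    return shared


def outcome_counts():
    """Map each possible final value to the number of interleavings that produce it."""
    return Counter(run(order) for order in interleavings(STEPS))
```

Only two of the six possible orderings give the expected result of 2; the other four lose an update and leave 1. A test that always runs the two processes in the same order would never see the bug, which is why event timing deserves deliberate variation.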

Looking for Variables

We may not be testing software that can heal or kill people, that blasts into space, or that manages a nation’s energy supply, but we can apply these lessons to our projects. I don’t test such mission-critical systems, but the software I do work on still has variables around timing, speed, user actions, number of times a given routine or method executes, files, memory, and concurrent processes.

The final lesson in all these cases is that testing involves looking at all the variables, not just the obvious ones. So the next time you’re thinking up test cases, consider this question:

What variables can I change in the software under test, its data, or the environment in which it operates, either directly or indirectly, that might affect the system behavior? And what might be interesting ways to change them?

It’s not a simple question to answer. But just thinking about it is likely to improve your testing.


