Exploratory Testing – chapter from a forthcoming book

Jim Shore and Shane Warden are writing The Art of Agile Development to be published by O’Reilly in 2007.

I was honored when they asked me to be a guest author and write a chapter on Exploratory Testing on XP projects. It’s ready for review. In true Agile style, promoting visibility and seeking feedback, Jim and Shane have made much of the book available for public review prior to publication. And I’d like to know what folks who read my blog think of the chapter.

You can find it here: http://jamesshore.com/Agile-Book/exploratory_testing.html

Testing Triangles: a Classic Exercise Updated for the Web

I answered Matthew Heusser’s Testing Challenge in my last post.  I figured it was my turn to issue one.

The “Triangle Program” is a classic testing puzzle first published in Glenford Myers’s The Art of Software Testing in 1979. In that book, Myers envisioned the program using punched cards for input and output. I’ve rewritten the program for the modern age: here’s a JavaScript version.

This version of the program takes as input three numbers representing the size of the sides of a triangle. When the user clicks “Draw” the program draws a picture of the triangle with the size of the sides shown in proportion and also displays the type of triangle.
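To give would-be testers a starting point, here’s a minimal oracle for classifying a triangle from its three sides. This is my own sketch in Ruby, not the JavaScript implementation under test, and the method name is invented:

```ruby
# Classify a triangle from its three side lengths, or reject input
# that doesn't describe a triangle at all.
def triangle_type(a, b, c)
  sides = [a, b, c].sort
  return :invalid if sides.any? { |s| s <= 0 }
  # Triangle inequality: the two shorter sides must exceed the longest.
  return :invalid if sides[0] + sides[1] <= sides[2]

  case [a, b, c].uniq.length
  when 1 then :equilateral
  when 2 then :isosceles
  else        :scalene
  end
end
```

Comparing the program’s displayed triangle type against an oracle like this is one way to spot misclassification bugs, though it says nothing about how the picture is drawn.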

Given that description, and whatever wacky (or realistic) test cases you can imagine, I challenge you to go forth and test!  And if you feel so inclined, I invite you to report bugs you find here in the comments.  (Remember to document your bugs well enough that someone else can reproduce them.)

I’m a Sucker for a Testing Challenge

Matthew Heusser sent me an email this morning. “There’s a testing challenge for you up on my blog,” he said.

I’m supposed to be doing a long list of stuff today in preparation for my upcoming trip. But I’m a total sucker for testing challenges. So here’s my answer.

Matthew wants suggestions for how to tackle automated acceptance tests for code whose job is to transmogrify data created by System A so that System B can use it. As Matthew says in his challenge, “…the requirement is to take one set of black-box data, and import it into another black box. We can test the data file that is created, but the real proof is what the second system accepts — or rejects.”

Having both written and tested code to bridge systems, I agree with Matthew’s assessment that FIT doesn’t quite fit. Carefully crafted test data is great for unit testing in this situation, but it isn’t sufficient to determine how well the bridge code will work in production.

Now, because of the way Matthew framed this challenge, I’m assuming the bridge code is being unit tested, that the unit tests are automated, and that they use carefully crafted test data intended to catch problems like an apostrophe (‘) in the data causing the bridge code to crash or throw a SQL error, or an ampersand (&) causing data truncation. That kind of test doesn’t require end-to-end system tests, since it asks questions about the bridge code itself, not about the interaction between the bridge code and the target application.
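To make that concrete, here’s a sketch of the kind of crafted-data unit test I have in mind. The escape_for_sql helper is a hypothetical stand-in for the bridge code’s data-preparation step; its name and behavior are my inventions:

```ruby
require "minitest/autorun"

# Hypothetical stand-in for the bridge code's data-preparation step;
# the method name and its behavior are assumptions for illustration.
def escape_for_sql(value)
  value.gsub("'", "''") # double up apostrophes so SQL doesn't choke
end

class CraftedDataTest < Minitest::Test
  def test_apostrophe_does_not_break_sql
    assert_equal "O''Brien", escape_for_sql("O'Brien")
  end

  def test_ampersand_is_not_truncated
    assert_equal "Smith & Sons", escape_for_sql("Smith & Sons")
  end
end
```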

So my answer focuses on automated end-to-end acceptance tests.

Acceptance Test #1: Can the Target Application use the Transformed Data?

Rather than using something (whether FIT, FitNesse, or some other acceptance testing framework) that thinks of the world entirely in terms of discrete test cases with hard-coded test data, I’d want to try a batch-processing approach with verification checks that look for patterns of badness.

So given a snapshot of the production data (Prod DB) and a Target App that’s supposed to use the transmogrified data, the test program would:

  1. Call the code that transforms the data using Prod DB as the data source.
  2. For each record in the data, perform basic actions in the Target App and log the results.

The basic actions I’d want to perform include the RUD part of CRUD: read, update, and delete. Let’s imagine we’re talking about a billing system, and the imported data includes account information. I’d want to view the imported accounts in a list, view them in the detail screen, view them in a report, edit them, and delete them. Why all these actions? Because imported data can look fine, but the software may behave differently with imported data than with data entered through the software’s UI.

The program would compare the expected app behavior and appearance of the data against the actual results record by record, logging discrepancies as it went. In particular, it would look for any evidence of data corruption or truncation, as well as unexpected errors and crashes.
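A sketch of that drive-and-verify loop might look like this. The target app’s methods (view_detail, update, delete) and the corruption patterns are invented stand-ins for whatever driver and checks fit the real application:

```ruby
# Patterns of badness to scan for; these particular patterns are
# invented examples (replacement characters, double-escaped entities,
# error text leaking into output).
CORRUPTION_PATTERNS = [/\uFFFD/, /&amp;amp;/, /error/i].freeze

# Drive the (hypothetical) target app through read, update, and delete
# for one record, logging anything suspicious instead of stopping.
def verify_record(app, record, log)
  [:view_detail, :update, :delete].each do |action|
    begin
      result = app.send(action, record)
      if CORRUPTION_PATTERNS.any? { |pattern| result.to_s =~ pattern }
        log << "record #{record[:id]}: suspicious output from #{action}: #{result}"
      end
    rescue StandardError => e
      log << "record #{record[:id]}: #{action} raised #{e.class}: #{e.message}"
    end
  end
end
```

The begin/rescue inside the loop matters: one bad record should produce a log entry, not kill an overnight run against gigs of data.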

BTW, if I could, I would use Ruby for this. Jeff Fry’s on-the-fly test method generation in Ruby technique would be a particularly good way to turn the production data snapshot into a set of test cases automagically. Jeff presented this technique both at AWTA and at the DT/TD Summit.
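I won’t reproduce Jeff’s actual code here, but the flavor of the technique is to define one test method per record at load time, so each record shows up as its own pass/fail result. The records and the round-trip stand-in below are invented for illustration:

```ruby
require "minitest/autorun"

# Invented sample records standing in for rows pulled from the
# production data snapshot.
RECORDS = [
  { id: 1, name: "O'Brien" },
  { id: 2, name: "Smith & Sons" },
].freeze

class GeneratedRecordTests < Minitest::Test
  RECORDS.each do |record|
    # One test method per record, generated on the fly.
    define_method("test_record_#{record[:id]}_round_trips") do
      # Stand-in for "transform the record, feed it to the target
      # app, and read it back".
      round_tripped = record[:name].dup
      assert_equal record[:name], round_tripped
    end
  end
end
```

With real data behind it, a failing record’s test name tells you exactly which row to investigate.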

I see three key challenges with this approach:

  1. You have to be able to drive the Target App, and you’re not going to be able to get testability hooks added. Fortunately, the Target App probably isn’t changing every two weeks, but it may take some effort to write test code that can drive it.
  2. The production data snapshot may or may not contain examples of all the interesting test conditions. So I definitely want to test with a snapshot of the production data, but it may not be sufficient. For good measure, I’d probably throw in some additional test data with carefully crafted data conditions, like interesting dependencies (customer with 0 invoices, 1 invoice, multiple invoices) or data values (the aforementioned ampersands and apostrophes, etc.). See the Test Heuristics Cheat Sheet for a more comprehensive list of the kinds of conditions I’d want to exercise in the acceptance tests.
  3. This approach can be SLOW to execute. The production data could involve gigs’ worth of records. To increase the feedback rate, the test program should log incrementally so you can see suspicious results as soon as they occur.
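For instance, the supplemental crafted data could be generated by crossing risky string values with interesting dependency counts. The field names and values here are invented:

```ruby
# Invented risky values: apostrophes, ampersands, empty strings,
# leading whitespace, and a long value near a typical column limit.
RISKY_NAMES = ["O'Brien", "Smith & Sons", "", " leading space", "x" * 255].freeze
INVOICE_COUNTS = [0, 1, 3].freeze # zero, one, many

# Cross every risky name with every invoice count to get crafted
# customer records to append to the production snapshot.
def crafted_records
  id = 0
  RISKY_NAMES.flat_map do |name|
    INVOICE_COUNTS.map do |count|
      id += 1
      { id: id, name: name, invoices: Array.new(count) { { amount: 10.0 } } }
    end
  end
end
```

Fifteen records won’t meaningfully slow down a run that already chews through gigs of production data, and they guarantee the interesting conditions show up at least once.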

Acceptance Test #2: Bridge Operations

Just because the bridge code can transform data from a Source Application to a Target Application doesn’t mean it’s ready for production quite yet. There are numerous other things that can go wrong besides corrupting or truncating data.

Perhaps as part of the acceptance test program I described above, or perhaps in a separate test program, I would automatically check for:

  • Performance over time. If it takes 26 hours for the bridge code to handle a day’s worth of data, there’s a problem. The bridge code will fall behind and never catch up. I’d also look for evidence that per-record-processing time remains relatively constant over time: the 3827th record should be processed in approximately the same amount of time as the 1st (allowing for variations in the actual data, of course).
  • Stability/reliability over time. One of the first bits of bridge code I ever wrote worked great…for the first several hundred records. Then it fell over dead. Why? A simple coding error: I opened one file handle and closed another. Oops. After several hundred times through, I had hundreds of open file handles and the operating system reported a “too many file handles open” error, at which point my code fell over dead. Similarly, memory leaks and thread handle leaks can cause problems. I’d want my automated acceptance test to monitor such system resources.
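Monitoring like that can be folded into the test driver itself. Here’s a sketch of per-record timing plus a file-handle count; /proc/self/fd is Linux-specific, and the thresholds are invented:

```ruby
# Time a block using a monotonic clock (immune to wall-clock changes).
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Count this process's open file descriptors (Linux-specific).
def open_fd_count
  Dir.children("/proc/self/fd").size
rescue SystemCallError
  -1 # not on Linux; skip the check
end

# Process each record via the supplied block (standing in for the
# bridge code), flagging slowdowns and file-handle growth as we go.
# The 10x and 100-handle thresholds are invented examples.
def process_with_monitoring(records, log)
  baseline = nil
  records.each_with_index do |record, i|
    elapsed = time_it { yield record }
    baseline ||= elapsed
    log << "record #{i}: slow (#{elapsed}s vs baseline #{baseline}s)" if elapsed > baseline * 10
    fds = open_fd_count
    log << "record #{i}: #{fds} open file handles" if fds > 100
  end
end
```

Had my long-ago file-handle leak been run under a harness like this, the log would have shown the handle count climbing record by record instead of a mysterious crash hundreds of records in.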

How’s That?

Since I shouldn’t be writing all this anyway, I’ll stop there.

So Matthew, you said in your challenge: “…more than half of the people of which I ask this give an answer that I believe to be unsatisfactory. Can you do better?”

How’d I do?