This post started out as a quick little entry about a cool parlor trick you can do with RSpec to make it work for auto-generated test data. But in the middle of writing what was supposed to be a simple post, my tests found a subtle bug with bad consequences. (Yay for tests!)
So now this post is about auto-generated tests with RSpec, and what I learned hunting down my bug.
Meet RSpec
In case you haven’t encountered RSpec before, it’s one of the Behavior-Driven Development developer test frameworks, along with JBehave, easyb, and others.
Each RSpec test looks something like this:
it "should be able to greet the world" do greet.should equal("Hello, World!") end
I used RSpec to TDD a solution to a slider puzzle code challenge posted on the DailyWTF.
Auto-Generating LOTS of Tests with RSpec
So let’s imagine that you’re testing something where it would be really handy to auto-generate a bunch of test cases.
In my particular case, I wanted to test my slider-puzzle example against a wide range of starting puzzle configurations.
My code takes an array representing the starting values in a 3×3 slider puzzle and, following the rules of the slider puzzle, attempts to solve it. I knew that my code would solve the puzzle sometimes, but not always. I wanted to see how often my little algorithm would work. And to find out, I wanted to pump it through a bunch of tests and collect pass/fail statistics.
I could write individual solution tests like this:
it "should be able to solve a board" do @puzzle.load([1, 2, 3, 4, 5, 6, 8, 7, nil]) @puzzle.solve @puzzle.solved?.should be_true end
But with 362,880 possible permutations of the starting board, I most certainly was NOT going to hand-code all those tests. I hand-coded a few in my developer tests. But I wanted more tests. Lots more.
I knew that I could generate all the board permutations. But then what? Out of the box, RSpec isn’t designed to do data driven testing.
It occurred to me that I should try putting the “it” into a loop. So I tried a tiny experiment:
require 'rubygems'
require 'spec'

describe "data driven testing with rspec" do
  10.times { |count|
    it "should work on try #{count}" do
      # purposely fail to see test names
      true.should be_false
    end
  }
end
Lo and behold, it worked!
I was then able to write a little permute function that took an array and generated all the permutations of its elements. Then I instantiated a new test for each:
describe "puzzle solve algorithm" do permutations = permute([1,2,3,4,5,6,7,8,nil]) before(:each) do @puzzle = Puzzle.new end permutations.each{ |board| it "should be able to solve [#{board}]" do @puzzle.load(board) @puzzle.solve @puzzle.solved?.should be_true end } end
Sampling
Coming to my senses, I quickly realized that it would take a long, long time to run through all 362,880 permutations. So I adjusted, changing the loop to take just the first 1,000 permutations:
permutations[0..999].each { |board|
  it "should be able to solve [#{board}]" do
    @puzzle.load(board)
    @puzzle.solve
    @puzzle.solved?.should be_true
  end
}
That returned in about 20 seconds. Encouraged, I tried it with 5,000 permutations. That took about 90 seconds. I decided to push my luck with 10,000 permutations. That stalled out. I backed it down to 5,200 permutations. That returned in a little over 90 seconds. I cranked it up to 6,000 permutations. Stalled again.
I thought it might be some kind of limitation in RSpec, and I was content to keep my test runs to a sample of about 5,000. But I decided that sampling the first 5,000 generated boards every time wasn’t that interesting. So I wrote a little more code to randomly pick the sample.
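I won’t show that sampling code either, but a minimal sketch (not the original code, and assuming Ruby 1.8, where Array#sample doesn’t exist yet) might use the sort_by-with-rand shuffle idiom:

# A sketch of random sampling: sort_by { rand } shuffles the
# permutations; we then take the first 5,000 of the shuffled boards.
sample = permutations.sort_by { rand }[0...5000]

sample.each { |board|
  it "should be able to solve [#{board}]" do
    @puzzle.load(board)
    @puzzle.solve
    @puzzle.solved?.should be_true
  end
}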
My tests started hanging again.
My Tests Found a Bug! (But I Didn’t Believe It at First.)
Curious about why my tests would be hanging, I decided to pick a sample out of the middle of the generated boards by calling:
permutations[90000..90999]
The tests hung. I chose a different sample:
permutations[10000..10999]
No hang.
I experimented with a variety of values and found that there was a correlation: the higher the starting number for my sample, the longer the tests seemed to take.
“That’s just nuts,” I thought. “It makes no sense. But…maybe…”
In desperation, I texted my friend Glen.
I was hoping that Glen would say, “Yeah, that makes sense because [some deep arcane thing].” (Glen knows lots of deep arcane things.) Alas, he gently (but relentlessly) pushed me to try a variety of other experiments to eliminate RSpec as a cause. Sure enough, after a few experiments I figured out that my code was falling into an infinite loop.
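One such eliminating experiment, sketched here (not necessarily the exact code I ran), is to generate the same problematic sample with a trivial test body, so that RSpec does all the same work except calling the solver:

describe "rspec volume check" do
  permutations = permute([1,2,3,4,5,6,7,8,nil])

  permutations[90000..90999].each { |board|
    # No puzzle code involved: if these examples finish quickly,
    # the hang lives in the solver, not in RSpec.
    it "should run a trivial example for [#{board}]" do
      true.should be_true
    end
  }
end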
Once I recognized that it was my code at fault, it didn’t take long to isolate the bug to a specific condition that I had not previously checked. I added the missing low-level test and discovered the root cause of the infinite loop.
It turns out that my code had two similarly named variables, and I’d used one when I meant the other. The result was diabolically subtle: in most situations, the puzzle-solving code arrived at the same outcome it would have otherwise, just in a more roundabout way. But in a few specific situations the code ended up in an infinite loop. (And in fixing the bug, I eliminated one of the two confusing variables to make sure I wouldn’t make the same mistake again.)
I never would have found that bug if I hadn’t been running my code through its paces with a large sample of the input permutations. So it seems fitting that, while writing about the mechanics of auto-generating tests with RSpec, I discovered a bug that demonstrates the value of high-volume auto-generated tests.
In the meantime, if you would like to play with my slider puzzle sample code and tests, I’ve released it under a Creative Commons license and posted it on GitHub. Enjoy! (I’m not planning to do much more with the sample code myself, and I can’t promise to provide support for it. But I’ll do my best to answer questions. Oh, and yes, it really could use some refactoring. Seriously. A bazillion methods all on one class. Ick. But I’m publishing it anyway because I think it’s a handy example.)