Adventures with Auto-Generated Tests and RSpec

This post started out as a quick little entry about a cool parlor trick you can do with RSpec to make it work for auto-generated test data. But in the middle of writing what was supposed to be a simple post, my tests found a subtle bug with bad consequences. (Yay for tests!)

So now this post is about auto-generated tests with RSpec, and what I learned hunting down my bug.

Meet RSpec

In case you haven’t encountered RSpec before, it’s a Behavior Driven Development (BDD) test framework for Ruby, in the same family as JBehave, EasyB, and others.

Each RSpec test looks something like this:

  it "should be able to greet the world" do
      greet.should equal("Hello, World!")
  end

I used RSpec to TDD a solution to a slider puzzle code challenge posted on the DailyWTF.

Auto-Generating LOTS of Tests with RSpec

So let’s imagine that you’re testing something where it would be really handy to auto-generate a bunch of test cases.

In my particular case, I wanted to test my slider-puzzle example against a wide range of starting puzzle configurations.

My code takes an array representing the starting values in a 3×3 slider puzzle and, following the rules of the slider puzzle, attempts to solve it. I knew that my code would solve the puzzle sometimes, but not always. I wanted to see how often my little algorithm would work. To find out, I wanted to pump it through a bunch of tests and get pass/fail statistics.

I could write individual solution tests like this:

  it "should be able to solve a board" do
      @puzzle.load([1, 2, 3, 4, 5, 6, 8, 7, nil])
      @puzzle.solve
      @puzzle.solved?.should be_true
  end

But with 362,880 possible permutations of the starting board, I most certainly was NOT going to hand code all those tests. I hand coded a few in my developer tests. But I wanted more tests. Lots more.

I knew that I could generate all the board permutations. But then what? Out of the box, RSpec isn’t designed to do data driven testing.

It occurred to me that I should try putting the “it” into a loop. So I tried a tiny experiment:

  require 'rubygems'
  require 'spec'

  describe "data driven testing with rspec" do

      10.times { | count |
          it "should work on try #{count}" do
              # purposely fail to see test names
              true.should be_false
          end
      }

  end

Lo and behold, it worked!
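One way to see why this works: “it” is an ordinary method call that takes a name and a block, so any Ruby loop can call it repeatedly while the file loads. The sketch below is a toy stand-in written in plain Ruby, not RSpec's actual implementation; ToyGroup, examples, and run are hypothetical names used only for illustration.

```ruby
# Toy stand-in for an RSpec example group (hypothetical, not RSpec's real code).
class ToyGroup
  attr_reader :examples

  def initialize
    @examples = []
  end

  # Like "it": record the name and the block, but do not run the block yet.
  def it(name, &block)
    @examples << [name, block]
  end

  # Run every recorded block and return a hash of name => :passed / :failed.
  def run
    @examples.each_with_object({}) do |(name, block), results|
      results[name] = begin
        block.call
        :passed
      rescue StandardError
        :failed
      end
    end
  end
end

group = ToyGroup.new
3.times do |count|
  group.it("should work on try #{count}") do
    raise "purposely fail" if count == 1
  end
end

results = group.run
results.each { |name, status| puts "#{name}: #{status}" }
```

Each pass through the loop registers a separately named example, which is why every generated test shows up on its own line in the report.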

I was able then to write a little “permute” function that took an array and generated all the permutations of the elements in the array. And then I instantiated a new test for each:
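The “permute” function is not shown in this post, so the version below is only a guess at what such a helper might look like: a minimal recursive sketch, with Ruby's built-in Array#permutation shown for comparison.

```ruby
# A minimal recursive sketch of a permute helper (an assumption; the
# author's actual implementation is not shown in the post).
def permute(items)
  return [items] if items.length <= 1

  items.each_index.flat_map do |i|
    # Everything except the element at position i.
    rest = items[0...i] + items[(i + 1)..-1]
    permute(rest).map { |tail| [items[i]] + tail }
  end
end

small = permute([1, 2, nil])
puts small.length                   # 3! = 6 boards

# Ruby's built-in equivalent produces the same set of orderings.
builtin = [1, 2, nil].permutation.to_a
puts small.sort_by(&:inspect) == builtin.sort_by(&:inspect)

# Number of boards for the full 3x3 puzzle: 9! = 362,880.
total = (1..9).inject(:*)
puts total
```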

  describe "puzzle solve algorithm" do
      permutations = permute([1,2,3,4,5,6,7,8,nil])

      before(:each) do
        @puzzle = Puzzle.new
      end

      permutations.each{ |board|
          it "should be able to solve [#{board}]" do
              @puzzle.load(board)
              @puzzle.solve
              @puzzle.solved?.should be_true
          end
      }
  end

Sampling

Coming to my senses, I quickly realized that it would take a long, long time to run through all 362,880 permutations. So I adjusted, changing the loop to just take 1000 of the permutations:

  permutations[0..999].each{ |board|
      it "should be able to solve [#{board}]" do
          @puzzle.load(board)
          @puzzle.solve
          @puzzle.solved?.should be_true
      end
  }

That returned in about 20 seconds. Encouraged, I tried it with 5000 permutations. That took about 90 seconds. I decided to push my luck with 10,000 permutations. That stalled out. I backed it down to 5200 permutations. That returned in a little over 90 seconds. I cranked it up to 6000 permutations. Stalled again.

I thought it might be some kind of limitation with RSpec, and I was content to keep my test runs to a sample of about 5000. But I decided that sampling the first 5000 generated boards every time wasn’t that interesting. So I wrote a little more code to randomly pick the sample.
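The sampling code itself is not shown here. One possible approach, assuming Ruby 1.9.3 or later, is Array#sample with a seeded Random, so that a sample that hangs can be reproduced on the next run. The seed value and the small stand-in permutations array below are placeholders.

```ruby
# Stand-in data: all orderings of a 4-element board, so the sketch runs quickly.
permutations = [1, 2, 3, nil].permutation.to_a

# A fixed seed makes the "random" sample repeatable between runs.
seed = 12345
sample = permutations.sample(10, random: Random.new(seed))
again  = permutations.sample(10, random: Random.new(seed))

puts sample.length        # 10
puts sample == again      # same seed, same boards
puts sample.uniq.length   # sample picks distinct positions, so no repeats here
```

Printing the seed alongside the test output would make it possible to rerun exactly the boards that caused trouble.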

My tests started hanging again.

My Tests Found a Bug! (But I Didn’t Believe It at First.)

Curious about why my tests would be hanging, I decided to pick a sample out of the middle of the generated boards by calling:

  permutations[90000..90999]

The tests hung. I chose a different sample:

  permutations[10000..10999]

No hang.

I experimented with a variety of values and found that there was a correlation: the higher the starting number for my sample, the longer the tests seemed to take.

“That’s just nuts,” I thought. “It makes no sense. But…maybe…”

In desperation, I texted my friend Glen.

I was hoping that Glen would say, “Yeah, that makes sense because [some deep arcane thing].” (Glen knows lots of deep arcane things.) Alas, he gently (but relentlessly) pushed me to try a variety of other experiments to eliminate RSpec as a cause. Sure enough, after a few experiments I figured out that my code was falling into an infinite loop.

Once I recognized that it was my code at fault, it didn’t take long to isolate the bug to a specific condition that I had not previously checked. I added the missing low-level test and discovered the root cause of the infinite loop.

It turns out that my code had two similarly-named variables, and I’d used one when I meant the other. The result was diabolically subtle: in most situations, the puzzle solving code arrived at the same outcome it would have otherwise, just in a more roundabout way. But in a few specific situations the code ended up in an infinite loop. (And in fixing the bug, I eliminated one of the two confusing variables to make sure I wouldn’t make the same mistake again.)
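A hang like this is easier to diagnose when each example fails fast instead of stalling the whole run. One option, sketched here with Ruby's standard Timeout module, is to wrap the call under test in a time limit. The helper name finishes_within? is hypothetical, and the busy loop merely stands in for a solver stuck in an infinite loop.

```ruby
require 'timeout'

# Hypothetical helper: returns true if the block finishes within `seconds`,
# false if it is still running when the time limit expires.
def finishes_within?(seconds)
  Timeout.timeout(seconds) { yield }
  true
rescue Timeout::Error
  false
end

quick   = finishes_within?(1) { 1 + 1 }
runaway = finishes_within?(0.2) { loop { } }   # stands in for a stuck solver

puts quick     # true
puts runaway   # false
```

Inside a spec, the same idea might look like finishes_within?(5) { @puzzle.solve }.should be_true, so that a board that never converges shows up as one failing example rather than a stalled test run.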

I never would have found that bug if I hadn’t been running my code through its paces with a large sample of the various input permutations. So I think it’s appropriate to have discovered the bug, thus demonstrating the value of high-volume auto-generated tests, while writing about the mechanics of auto-generating tests with RSpec.

In the meantime, if you would like to play with my slider puzzle sample code and tests, I’ve released it under a Creative Commons license and posted it on GitHub. Enjoy! (I’m not planning to do much more with the sample code myself, and can’t promise to provide support on it. But I’ll do my best to answer questions. Oh, and yes, it really could use some refactoring. Seriously. A bazillion methods all on one class. Ick. But I’m publishing it anyway because I think it’s a handy example.)

8 Responses to Adventures with Auto-Generated Tests and RSpec

  1. David Chelimsky September 5, 2009 at 11:10 am #

    Hi Elizabeth – just a couple of comments on RSpec usage:

    foo.should equal(bar) checks object identity, so it will fail on “this”.should equal(“this”). Use either eql or == instead.
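For reference, the same distinction exists in plain Ruby, where equal? tests object identity while eql? and == compare values. A small sketch (variable names are arbitrary):

```ruby
a = "this"
b = a.dup          # a separate String object with the same characters

puts a.equal?(b)   # false: two different objects
puts a.equal?(a)   # true: the very same object
puts a.eql?(b)     # true: same type and same content
puts a == b        # true: same content
puts 1 == 1.0      # true: == allows numeric conversion
puts 1.eql?(1.0)   # false: eql? does not
```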

    For @puzzle.solved?.should be_true, you can also say “@puzzle.should be_solved” – RSpec grabs “solved” and delegates it to the @puzzle as “solved?”. Some folks don’t like this just because it’s a bit magical, in which case you can always phrase it as you did in this post.

    Cheers,
    David

  2. Bill Tozier September 7, 2009 at 8:30 am #

    Actually, there’s nothing magic about RSpec “out of the box” that precludes you from wrapping specs in Ruby iterators. It is, after all, Ruby itself!

    In fact, most of the projects I’ve worked on and encountered among our cohort of Rubyists here at Workantile Exchange have used Ruby to manage this sort of thing.

    But I find myself listening to your story and wanting to reach over and refactor this acceptance test into the Cucumber level and get it out of the RSpec level of your tests. I’d much prefer seeing Cucumber code like:

    Feature: Path to solution
    In order to feel secure that the algorithm works
    Every possible permuted game board should be solvable

    Scenario: No infinite loops
    Given a list of [large number] of initial layouts
    When I puzzle#solve each one
    Then it should never take more than [theoretical max] steps before converging

  3. Alex Chaffee September 7, 2009 at 9:24 am #

    Nice trick, huh? I call this “doing it again and again”: http://www.pivotalblabs.com/users/alex/blog/articles/277-doing-it-again-and-again

  4. David Stamm September 14, 2009 at 11:49 am #

    Thanks for the thrilling article!

    @Alex, your link URL should be:
    http://www.pivotallabs.com/users/alex/blog/articles/277-doing-it-again-and-again

    Crazy that someone owns the domain PivotalBlabs.com, eh? ;-)

  5. David Allen September 16, 2009 at 8:36 pm #

    A more efficient approach might be to use a smart test-case generation tool like Pex from Microsoft Research http://research.microsoft.com/en-us/projects/Pex/. Have you explored those?

  6. Glen Newton February 1, 2012 at 4:52 pm #

    My colleague tried the following and it failed:

    x=0
    while x<10 do
    it "should be able to solve [#{x}]" do
    x += 1
    end
    end

    This 'while' does not behave as expected. It just loops infinitely.

    Yet this works OK:

    (0..9).each do |x|
    it "should be able to solve [#{x}]" do
    end
    end

    Any explanation?

    rspec -version 2.7.1
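A likely explanation: the body of an “it” block does not run while the examples are being defined, so x += 1 never executes during the while loop and the condition x<10 stays true forever. The sketch below uses a toy stand-in for “it” (toy_it and EXAMPLES are hypothetical names, not RSpec API) plus a safety cap so that it terminates.

```ruby
EXAMPLES = []

# Toy stand-in for "it": store the block for later, as a test framework would.
def toy_it(name, &body)
  EXAMPLES << [name, body]
end

x = 0
definitions = 0
while x < 10
  toy_it("should be able to solve [#{x}]") { x += 1 }
  definitions += 1
  break if definitions >= 5   # safety cap; without it this loop never ends
end

puts x                 # still 0: no stored body has run yet
puts EXAMPLES.length   # 5 examples, all named "... [0]"

EXAMPLES.first.last.call   # only now does a body run
puts x                     # 1
```

The each version works because the iterator, not the example body, advances the counter, and each pass gets its own block-local x.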

  7. Bala Paranj May 10, 2012 at 6:31 pm #

    As Alex points out, this is a test smell according to Gerard Meszaros. .NET has something called a data-driven unit test. In Ruby we can accomplish the same thing using the following:

    def data_driven_spec(container)
    container.each do |element|
    yield(element)
    end
    end

    and in your test you can do :

    it "should return back slash pre-pended to all special characters" do
      SPECIAL_CHARACTERS = ["?", "^", "$", "/", "\\", "[", "]", "{", "}", "(", ")", "+", "*", "."]

      data_driven_spec(SPECIAL_CHARACTERS) do |special_character|
        result = Regexpgen::Component.match_this_character(special_character)
        result.should == "\\#{special_character}"
      end
    end

    The code is taken from my regexpgen ruby gem: https://github.com/bparanj/regexpgen

  8. ambar June 20, 2013 at 4:25 pm #

    Your post helped me figure out how to run a data-driven test where each iteration corresponds to one example.

    Until now I was using the idea at http://testdrivenwebsites.com/2011/08/17/different-ways-of-code-reuse-in-rspec/ to write a data-driven test, but there was only one example for all the data-iterations.

    Thank you!
