Not Exhaustively Tested

It sounds like Joe Stump is having a bad time of it right now.

Joe Stump left Digg to co-found a mobile games company. The company released its first game, Chess Wars, in late June.

Soon after, new players found serious problems that prevented them from playing the game. In response, the company re-submitted a new binary to Apple in July. As of this writing, the current version of Chess Wars is 1.1.

The trouble started with patch release #2. Apparently, even six weeks after Joe’s company submitted the new binary (release number 3 for those who are counting), Apple still hasn’t approved it.

Eventually Joe got so fed up with waiting, and with seeing an average rating of two-and-a-half out of five stars, that he wrote a vitriolic blog post [WARNING: LANGUAGE NOT SAFE FOR WORK (or for anyone with delicate sensibilities)] blaming Apple for his woes.

That garnered the attention of Business Insider, which then published an article about the whole mess.

Predictably, reactions in the comments called out Joe Stump for releasing crappy software.

I should mention here that I don’t know Joe. I don’t know anything about how he develops software. I think that there’s some delightful irony in the name of his company: Crash Corp. But I doubt he actually intended to release software that crashes.

Anyway, Joe submitted a comment to the Business Insider article defending his company’s development practices:

We have about 50 beta testers and exhaustively test the application before pushing the binary. In addition to that the application has around 200 unit tests. The two problems were edge cases that effect [sic] only users who had nobody who were friends with the application installed.

I’m having a great deal of trouble with this defense.

Problem #1: Dismissing the Problems as “Edge Cases”

The problems “only” occur when users do not have any Facebook Friends with the application. But that’s not an aberrant corner case. This is a new application. As of the first release, no one has it yet. That means any given new user has a high probability of being the first user within a circle of friends. So this is the norm for the target audience.

Joe seems to think that it’s perfectly understandable that they didn’t find the bugs during development. But just because you didn’t think of a condition doesn’t make it an “edge case.” It might well mean that you didn’t think hard enough.

Problem #2: Thinking that “50 Beta Testers” and “200 Unit Tests” Constitutes Exhaustive Testing

Having beta testers and unit tests is a good and groovy thing. But it’s not sufficient, as this story shows. What appears to be missing is any kind of rigorous end-to-end testing.

Given an understanding of the application under development, a skilled tester would probably have identified “Number of Friends with Chess Wars Installed” as an interesting thing to vary during testing.

And since it’s a thing we can count, it’s natural to apply the 0-1-Many heuristic (as described on the Test Heuristics Cheat Sheet). So we end up testing 0-friends-with-app, 1-friend-with-app, and Many-friends-with-app.
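To make that concrete, here is a minimal sketch in Python. I know nothing about the actual Chess Wars code, so every name in it (`friends_with_app`, `opponent_prompt`, the sample friends) is invented purely for illustration. The point is only that the variable gets exercised at zero, one, and many.

```python
# Hypothetical sketch: none of these names come from the real Chess Wars code.

def friends_with_app(friends, installed):
    """Return the user's friends who also have the app installed."""
    return [f for f in friends if f in installed]


def opponent_prompt(friends, installed):
    """Decide what a player sees on an imagined 'find an opponent' screen."""
    candidates = friends_with_app(friends, installed)
    if not candidates:
        # The 0 case: where every early adopter of a new app starts out.
        return "Invite a friend to play"
    if len(candidates) == 1:
        return f"Challenge {candidates[0]}"
    return f"Choose from {len(candidates)} friends"


# 0-1-Many: vary the number of friends who have the app installed.
CASES = {
    "zero": (["ann", "bob"], set()),
    "one": (["ann", "bob"], {"bob"}),
    "many": (["ann", "bob", "cy"], {"ann", "bob", "cy"}),
}

for name, (friends, installed) in CASES.items():
    print(name, "->", opponent_prompt(friends, installed))
```

Three cases, a few minutes of work, and the zero case is sitting right there at the top of the list.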

So even the most cursory Exploratory Testing by someone with testing skill would have been likely to reveal the problem.

I’m not suggesting that Joe’s company needed to hire a tester. I am saying that someone on the implementation team should have taken a step back from the guts of the code long enough to think about how to test it. Having failed to do that, they experienced sufficiently severe quality problems to warrant not one but two patch releases.

Blaming Apple for being slow to release the second update feels to me like a cheap way of sidestepping responsibility for figuring out how to make software that works as advertised.

In short, Joe’s defense doesn’t hold water.

It’s not that I think Apple is justified in holding up the release. I have no idea what Apple’s side of the story is.

But what I really wanted to hear from Joe, as a highly visible representative of his company, is something less like “Apple sucks” and something much more like “Dang. We screwed up. Here’s what we learned…”

And I’d really like to think that maybe, just maybe, Joe’s company has learned something about testing and risk, and about the danger of assuming that an app has been “exhaustively” tested just because 50 people haphazardly pounded on it for a while.

20 thoughts on “Not Exhaustively Tested”

  1. Pretty funny that Stump’s “edge case” is actually his base case, as you point out. It’s an “edge case” that none of my friends has your brand-new application installed? Trippy!

  2. Great post and yeah — he and his company have no excuse for:

    * not testing that obvious use case (a new user with zero friends)
    * expecting he could do a sloppy release-debug-patch-repeat cycle, when it can take weeks for the App Store to approve software
    * throwing Apple, the company that made his app possible and is the only one who can get him out of this mess, under the bus
    * not apologizing

  3. Nice post. I started jotting some notes a while back for an evolutionary tree for test ideas, approaches and terminology – the aim being to loosely chart their progression or “evolution”.

    I’ll include this reference to the “exhaustive testing” part – of course, in evolutionary terms it’ll be a dead end…

    And no, my list does not intend to be “exhaustive”!

  4. This bug is caused by what is sometimes called “environmental factors”. Many developers make the mistake of testing their code against different inputs but neglect to test it against different environments. It’s like two (orthogonal) dimensions.

    Of course, it is impossible to fully exercise even one dimension, let alone two. Luckily this is not needed. Exercising a relatively small, finite number of combinations (of inputs and environmental factors) gives great mileage in terms of finding bugs. To make this happen, all that is needed is some plumbing that makes it as easy as possible for the programmer to enter new tests.

    The equation is simple: Ease of writing tests => More tests => Less bugs

  5. Why do I have this image of Wallace Shawn running around yelling “I have 50 beta testers and two-HUNDRED unit tests! A defect? Escaping my exhaustive testing?! INCONCEIVABLE!”

  6. Exhaustive testing might mean testing until you’re tired of it…If I was relying on 50 beta testers to do that for me, getting tired might not take long.

  7. Fantastic post, Elisabeth, and a great reminder about the value of testing. Stories like this are a reason why your test heuristics cheat sheet should be required reading for anyone considering how to test software.

  8. This whole story is so rich – at first I read your blog entry and said, ‘eh – can’t be right’. Then reading your links to the original articles.. ‘ohh – oh dear’ … then reading the comments … ‘LOL!’

    It’s really such a simple problem, and it gets so clouded so fast that fascinatingly few people see it. There’s so much learning in this one, not about app deployment or even testing, but about how people deal with the information they get.

    Thank you for a great blog post.

  9. “We have about 50 beta testers and exhaustively test the application before pushing the binary”
    Translation: We have no testers and developers play around with the application before pushing the binary
    Which could be considered “good enough” for a $1 app, I guess. Just don’t throw a fit if you don’t get lucky. You reap what you sow and all that.
    Although it does raise the question of what Apple reviewers are doing for six weeks if they can’t find a bug that boils down to “Application crashes on startup”.

  10. OMG! I just found this today. It made me laugh. I think I’m going to make it through Wednesday now.

    I actually LOVE stories like this. Without sounding completely malicious, I can Google how to perform a vasectomy, but it doesn’t mean I’m just going to go ahead and let my best bud give it a shot on me! I mean what makes ANYONE think that they can just pick up ANYTHING and go with it? Huh, testing seems easy, why not give it a shot.

    Not that I’m looking for “props” to testers around the world, but give us an ounce of credit. 200 unit tests? Laughable. 50 beta testers? Shoot, Google had millions of beta testers testing for years on Gmail and….ummmm….wait a sec. Is it out of beta yet? Lol.

    Great post! Loved it!

  11. In general I’m not a fan of beta testing as an alternative to testing for startups. I think this is an old and historic way of thinking.

    With the advances in how software testing is performed there is no need to use beta testing instead of some really good exploratory testing (as you so well put in your post).

    Though I disagree with you on one thing: I do think they ought to have hired a tester!

    Thanks for a lovely post – Anne-Marie

  12. Elisabeth,

    Bravo! Nice entry. The only thing I can say is that Mr. Stump seems to have a JSTF (Just Ship The Fracker) mentality. And in this day and age that just doesn’t work. As they say in the south, “That dog don’t hunt”.

  13. I thought the same thing Rasmus did. “We have 50 beta testers” means that everyone in the office plays with it a little before they throw it over the wall. And having unit tests is, as you mention, a good thing, but hardly sufficient.

    “What appears to be missing is any kind of rigorous end-to-end testing.”

    Exactly. You know, the kind of testing that…TESTERS do. Not developers. Not your family and friends. Real, professional testers.

  14. Actually I think most of the comments that have been made on this topic (I’m referring predominantly to the ones following the linked articles rather than those on this blog) are a little premature. Ask yourself this: How would your attitude to this whole situation be affected by the app working perfectly when Apple finally updates it on the store?

    Then compare your answer to: How would your attitude to this whole situation be affected by the app presenting more bugs when Apple finally updates it on the store?

    Everyone is jumping on this guy because he is using beta testers and has a very small list of unit tests. It’s an iPhone app, not the new version of Windows. It might make some money for them or it might not, and it’s not surprising that there is no specific testing department. What we have is just a group of developers who were dumb. (Who hasn’t seen that before?)

    In the (slightly unlikely) event that the app works fine now, we are left with Apple taking 6 weeks to update it on the store while the developer’s reputation is being wiped out, and not responding until after they were published in an online article referencing them being completely flamed in a blog post.

    Contrary to the appearance of the above, I’m not trying to defend the guy, as the app should have had better testing in the first place. However, I am not impressed with Apple’s handling of the flow-on effects.

  15. For what it’s worth, I think “dang, we screwed up” is so rare in this world that it’s notable. I make an effort to applaud those who stand up and say that; they are rare enough to be memorable.

    I do appreciate your analysis of the situation; I think we could use a lot more of those kinds of “case study” approaches to software dev/test issues. Thank you, Elisabeth!

  16. Hi, Elisabeth.

    When conducting a failure analysis, people often look at an array of conditions, pick one that if you flipped it could have made things different, and declare that the problem. Joe Stump picked Apple’s approval process. You picked Joe Stump’s approach to testing. I might pick how ridiculously hard Apple makes it to do beta releases. Someone else might pick Facebook’s poor test infrastructure, or Joe’s inexperience building for a mobile platform, or the general lack of experience in testing mobile app + web service combinations.

    But really, picking just one thing to blame isn’t helpful. If changing any of those could fix it, then fixing just one leaves us on the edge of failure. I’d rather fix all of those.

  17. Itay Maman writes …

    “The equation is simple: Ease of writing tests => More tests => Less bugs”

    This assumes that testing stops bugs. It doesn’t: testing may or may not find defects, depending on where you look. It is correct implementation of the code in the first place that means less bugs. Whether you end up with less bugs also depends on how many of the defects you choose to fix.

    For me the equation for less bugs is significantly more complex and that’s why people like Joe have made wild assumptions about their code being ok to release and that their testing is adequate.
