Top 10 Testing Mistakes XP Teams Tend to Make
March 23rd, 2006
XP practices such as Test-Driven Development (TDD), continuous integration, refactoring, and pair programming make XP the most rigorous process I’ve ever encountered. The resulting code is usually of extremely high quality. Unfortunately, though, XP is not a panacea, and code produced by XP teams is not guaranteed to delight users. Drawing on my own observations as well as stories others have told me, here is my top 10 list of testing mistakes XP teams tend to make that can lead to bad surprises when the software is deployed.
- Attempting to substitute unit tests for acceptance tests. The automated unit tests that result from TDD provide fabulous fast feedback about the possible harmful side effects of any given change. That feedback is so fast, and so useful, that teams are sometimes tempted to claim that the software is good because the unit tests are green. But no matter how comprehensive the unit tests are, end-to-end acceptance tests give us information that unit tests can’t. Where unit tests tell us how well the internals of the system conform to explicit expectations, acceptance tests give us information about an end user’s experience. You can’t substitute one for the other.
- Not automating the acceptance tests. Numerous XP teams around the globe are rediscovering that automating end-to-end acceptance tests through a UI is painfully hard. All too often, I see XP teams skimp on end-to-end acceptance test automation because it’s too time-consuming and painful. But this is one case where the adage “if it hurts, do more of it” is especially helpful. If automating the acceptance tests is too time-consuming or painful, something needs to change to make it easier. Maybe that means using a different automated testing tool, maybe it means changing the interface to make it more testable, or maybe it means the team just needs more practice. Whatever the problem, doing it more gives the team more opportunities to find remedies for the pain. Avoiding it just causes more pain in the long run, when the manual testing becomes too big a burden and bugs start slipping through the cracks.
- Thinking the automated tests are sufficient. A fully automated suite of tests at both the unit and acceptance levels is such an appealing goal that it’s easy to forget we can’t predict, and write tests for, every interesting condition. Some amount of manual testing will always be necessary to catch the surprises we couldn’t possibly foresee.
- Letting the Customer accept features with insufficient testing. Sometimes Customers let their eagerness to see the software deployed get the better of their skepticism, and they skimp on the acceptance testing. If that happens, it’s up to the team to gently but firmly help the Customer understand the risks of accepting Stories too readily.
- Overly relying on the Customer to specify every detail of both desired and undesired behavior in every Story. Some XP teams place the entire burden of specifying behavior on the Customer, saying, “If the Customer didn’t ask for it specifically in the Story, it doesn’t count, and we shouldn’t do it.” Consider an application that crashes when the user enters invalid data in a field. Some XP teams will say, “We don’t need to write any code to guard against bad input unless the Customer explicitly asks for it.” The problem is that the Customer usually assumes that some acceptance criteria are obvious. Not crashing if the user happens to enter an ampersand (“&”) in a description field would surely count as “obvious” in the Customer’s mind. But how should the team draw the fine line between gold-plating a release with features the Customer didn’t request and anticipating the Customer’s needs to avoid rework? The best way I know is to discuss assumptions about these “Level 0” requirements: the acceptance criteria the Customer assumes will be in place without having to state them explicitly in each and every Story.
- Underestimating the need for integration testing. Story A has automated unit and acceptance tests. Story B has automated unit and acceptance tests. Story A works great. Story B works great. The Customer has carefully reviewed, tested, and accepted the Stories. Everyone’s happy. End of, er, well, Story. Right? The problem is that Story A and Story B might not work so well together. Perhaps Story A has a side effect that corrupts data used by Story B. The solution is to include end-to-end test scenarios that exercise multiple Stories together.
- Underestimating the need for extended sequence testing. Tests in XP environments tend to be straightforward. Set up the conditions. Perform the actions. Verify the results. Repeat. But that’s not how real-world users use software. A real-world user is more likely to set up some conditions, perform some actions, change some conditions, take a coffee break, undo and then redo some of the actions (but not all), view the results, revisit the actions, and so on. The real world is messy. Simplistic, linear tests don’t tell us enough about the risks lurking in the software when real users use it in real-world ways.
- Forgetting about non-functional criteria. How many XP teams write automated tests to detect memory leaks? Or random, high-volume automated tests designed to find reliability problems? My guess: only the teams that have already been bitten by memory- or reliability-related bugs. XP teams rarely articulate non-functional quality criteria such as reliability, usability, performance, scalability, and memory footprint in Stories, which means they rarely have tests designed to provide information about these attributes. Non-functional quality criteria are by nature more ambiguous than feature Stories, but they’re just as important to the overall user experience. It’s worth the extra effort to articulate acceptance criteria for the non-functional attributes of the system and to test against them.
- “Fixing” a build by commenting out a test that “shouldn’t be failing.” The JUnit mantra “Keep the code clean, keep the bar green” is so powerful that XP teams have been known to cheat by simply commenting out failing tests to get the build back to green. I know: you’re all shocked. “No one on my team would ever do such a thing!” you protest. Perhaps not. But I’ve seen it happen. And I’ve been tempted to do it myself. “There is no good reason this test should be failing,” I say to myself. “It must be something unrelated to this particular test.” And sometimes that’s true: after digging around, I discover that the problem is not with the failing test but with some data pollution caused by another test. And sometimes the assertions in the test are no longer valid; if the test is truly invalid, delete it instead of commenting it out. But sometimes the failing test is giving me a very important message, one that I’d be a fool to ignore. Commenting out the test without investigating the problem more deeply is like applying heavy cologne to cover a bad smell. And in the case of code, it’s risky behavior that undermines the power of those unit tests.
- Not including testing activities in the Planning Game. I’ve been in a number of Planning Game meetings at this point, and I’ve noticed that when I suggest we include time for activities such as creating test data or setting up test configurations, I usually encounter resistance. Sometimes the resistance is a Catch-22. “The Customer has to tell us he wants those activities done by putting them in Stories,” says the team. When I propose we add the activities as Stories, the team objects: “But those are infrastructure activities that have no inherent value to the Customer. They’re not Stories.” No matter how we account for the time, we’re going to have to do the testing tasks (or accept the risk of inadequate testing; see Item 4, on letting the Customer accept features with insufficient testing). So if we don’t want our Velocity to suffer because we spend unbudgeted time on testing tasks, we should budget the time, whether in a Story or by reducing our Velocity estimates. And to ensure the testing tasks get done, we should track them the same way we track other infrastructure activities.
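For the item on automating acceptance tests, one remedy it mentions is changing the interface to make it more testable. Here is a minimal Python sketch of that idea (the same shape works in JUnit), assuming a hypothetical `Checkout` service layer that the UI would call into; the class, item names, and prices are all invented for illustration. The acceptance scenario is phrased in the Customer’s terms and drives the same entry points the UI uses, without going through screen widgets.

```python
from dataclasses import dataclass, field


@dataclass
class Checkout:
    """Hypothetical service layer sitting between the UI and the domain code."""

    prices: dict[str, int]  # item name -> unit price in cents
    cart: dict[str, int] = field(default_factory=dict)  # item name -> quantity

    def add_item(self, name: str, quantity: int = 1) -> None:
        if name not in self.prices:
            raise KeyError(f"unknown item: {name}")
        self.cart[name] = self.cart.get(name, 0) + quantity

    def total_cents(self) -> int:
        return sum(self.prices[name] * qty for name, qty in self.cart.items())


def acceptance_buy_two_books_and_a_pen() -> None:
    # Acceptance scenario in the Customer's vocabulary: given a price list,
    # when the user buys two books and a pen, then the total is $26.99.
    checkout = Checkout(prices={"book": 1250, "pen": 199})
    checkout.add_item("book", quantity=2)
    checkout.add_item("pen")
    assert checkout.total_cents() == 2699, checkout.total_cents()


acceptance_buy_two_books_and_a_pen()
```

A test like this does not exercise the UI itself, so some UI-level checking (automated or manual) is still needed. The design choice is simply to move most of the acceptance coverage to a seam that is cheaper and more stable to drive.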
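For the “Level 0” item, here is a small Python illustration of how an unstated criterion like “don’t crash on an ampersand” can bite. It assumes a hypothetical persistence step that stores a description field as XML; the function names are invented, while `xml.etree.ElementTree` is Python’s standard library. Building the XML by string concatenation fails on a raw “&”, whereas letting the library construct the element escapes it.

```python
import xml.etree.ElementTree as ET


def to_record_naive(description: str) -> bytes:
    # Hypothetical persistence step that builds XML by string concatenation.
    # A raw "&" makes the document malformed, so parsing it raises ParseError.
    text = f"<item><description>{description}</description></item>"
    ET.fromstring(text)  # validate before "saving"; this is where it crashes
    return text.encode("utf-8")


def to_record_guarded(description: str) -> bytes:
    # Same step, but ElementTree escapes the text node ("&" becomes "&amp;").
    item = ET.Element("item")
    ET.SubElement(item, "description").text = description
    return ET.tostring(item)


def level0_description_accepts_ampersand(to_record) -> bool:
    """Assumed Level 0 criterion: ordinary punctuation round-trips intact."""
    try:
        record = ET.fromstring(to_record("Salt & pepper"))
    except ET.ParseError:
        return False
    return record.find("description").text == "Salt & pepper"


print(level0_description_accepts_ampersand(to_record_naive))    # False
print(level0_description_accepts_ampersand(to_record_guarded))  # True
```

Writing the assumed criterion down as a check like this is one way to make a “Level 0” conversation with the Customer concrete.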
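For the integration-testing item, here is a Python sketch of Story A silently corrupting data that Story B relies on. Both Stories and the `Customer` record are hypothetical: Story A archives a customer and, as a side effect, clears the email address; Story B chooses where to send an invoice. Each Story’s own tests pass, and only a scenario that exercises both Stories together exposes the problem.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    email: str
    archived: bool = False


def archive_customer(customer: Customer) -> None:
    """Hypothetical Story A: archive a customer.

    Clearing the email is a side effect that Story A's own tests never check.
    """
    customer.archived = True
    customer.email = ""


def invoice_recipient(customer: Customer) -> str:
    """Hypothetical Story B: choose the address an invoice goes to.

    Assumption for this sketch: archived customers can still be invoiced.
    """
    if not customer.email:
        raise ValueError(f"no email on file for {customer.name}")
    return customer.email


def test_story_a_alone() -> None:
    customer = Customer("Ada", "ada@example.com")
    archive_customer(customer)
    assert customer.archived


def test_story_b_alone() -> None:
    customer = Customer("Ada", "ada@example.com")
    assert invoice_recipient(customer) == "ada@example.com"


def scenario_archive_then_invoice() -> bool:
    """End-to-end scenario spanning both Stories; True means it passed."""
    customer = Customer("Ada", "ada@example.com")
    archive_customer(customer)
    try:
        return invoice_recipient(customer) == "ada@example.com"
    except ValueError:
        return False


test_story_a_alone()
test_story_b_alone()
print("cross-Story scenario passes:", scenario_archive_then_invoice())  # False
```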
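For the extended-sequence item, one way to get closer to messy real-world usage is to generate long random sequences of actions, including undo and redo, and to check the system against a simple model after every step. The Python sketch below assumes a hypothetical `Editor` with undo/redo as the system under test; the oracle just records every state the user could return to. A fixed seed per run makes any failure reproducible.

```python
import random


class Editor:
    """Hypothetical system under test: a text buffer with undo/redo.

    It records each operation and replays or reverses it, unlike the oracle
    in run_random_session, which simply remembers whole states.
    """

    def __init__(self) -> None:
        self.text = ""
        self._undo: list[tuple[str, str]] = []  # (operation, affected text)
        self._redo: list[tuple[str, str]] = []

    def append(self, s: str) -> None:
        self.text += s
        self._undo.append(("append", s))
        self._redo.clear()

    def delete_last(self, n: int) -> None:
        n = min(n, len(self.text))
        removed = self.text[len(self.text) - n:]
        self.text = self.text[: len(self.text) - n]
        self._undo.append(("delete", removed))
        self._redo.clear()

    def undo(self) -> None:
        if not self._undo:
            return
        op, s = self._undo.pop()
        if op == "append":
            self.text = self.text[: len(self.text) - len(s)]
        else:
            self.text += s
        self._redo.append((op, s))

    def redo(self) -> None:
        if not self._redo:
            return
        op, s = self._redo.pop()
        if op == "append":
            self.text += s
        else:
            self.text = self.text[: len(self.text) - len(s)]
        self._undo.append((op, s))


def run_random_session(seed: int, steps: int = 500) -> None:
    """Drive the editor with a random action sequence and compare to a model."""
    rng = random.Random(seed)
    editor = Editor()
    history, index = [""], 0  # oracle: reachable states and current position
    for step in range(steps):
        action = rng.choice(["append", "delete", "undo", "undo", "redo"])
        if action == "append":
            s = rng.choice(["a", "bc", "&", " "])
            editor.append(s)
            history = history[: index + 1] + [history[index] + s]
            index += 1
        elif action == "delete":
            n = rng.randint(0, 3)
            editor.delete_last(n)
            current = history[index]
            history = history[: index + 1] + [current[: max(0, len(current) - n)]]
            index += 1
        elif action == "undo":
            editor.undo()
            index = max(0, index - 1)
        else:
            editor.redo()
            index = min(len(history) - 1, index + 1)
        assert editor.text == history[index], f"seed={seed} step={step} {action}"


for seed in range(100):
    run_random_session(seed)
```

If a session fails, the seed and step number in the assertion message are enough to replay it. The oracle is deliberately naive; its only job is to be obviously right.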
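For the non-functional item, here is a Python sketch of a random, high-volume test aimed at a leak. It assumes a hypothetical `SessionRegistry` that exposes a `retained()` count as a test hook, plus a deliberately leaky variant that only releases the most recently opened session. A simple open-then-close test passes against both; only the high-volume run with out-of-order closes catches the leak. Counting retained objects is a stand-in for measuring memory directly; Python’s standard `tracemalloc` module can report actual traced memory if you need a number.

```python
import random


class SessionRegistry:
    """Hypothetical component that keeps per-session state in memory."""

    def __init__(self) -> None:
        self._sessions: dict[int, bytearray] = {}
        self._next_id = 0

    def open(self) -> int:
        self._next_id += 1
        self._sessions[self._next_id] = bytearray(64)  # per-session buffer
        return self._next_id

    def close(self, session_id: int) -> None:
        self._sessions.pop(session_id, None)

    def retained(self) -> int:
        """Test hook (assumed): how many sessions are still held in memory."""
        return len(self._sessions)


class LeakyRegistry(SessionRegistry):
    """Deliberately buggy variant: only releases the newest session."""

    def close(self, session_id: int) -> None:
        if session_id == self._next_id:
            self._sessions.pop(session_id, None)


def linear_check(registry: SessionRegistry) -> bool:
    """The kind of simple, linear test that passes for both variants."""
    session_id = registry.open()
    registry.close(session_id)
    return registry.retained() == 0


def high_volume_check(registry: SessionRegistry, operations: int = 100_000, seed: int = 1) -> bool:
    """Randomly open and close sessions, then compare retained vs. still-open."""
    rng = random.Random(seed)
    open_ids: list[int] = []
    for _ in range(operations):
        if open_ids and rng.random() < 0.5:
            # Close a random open session, not always the most recent one.
            session_id = open_ids.pop(rng.randrange(len(open_ids)))
            registry.close(session_id)
        else:
            open_ids.append(registry.open())
    return registry.retained() == len(open_ids)


print(linear_check(LeakyRegistry()))          # True: the leak goes unnoticed
print(high_volume_check(SessionRegistry()))   # True
print(high_volume_check(LeakyRegistry()))     # False: the leak shows up
```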
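For the item on commenting out failing tests, here is a Python sketch of the data-pollution case it describes: a check that fails only because another check left data behind in shared state. The module-level `SHARED_DB`, the `check_*` functions, and the tiny `run` helper are hypothetical stand-ins for a real test framework. Commenting out `check_no_accounts_initially` would hide the real problem; giving each test a fresh fixture fixes it.

```python
# Hypothetical shared fixture: a module-level "database" that tests mutate.
SHARED_DB: dict[str, int] = {}


def check_create_account() -> None:
    SHARED_DB["alice"] = 100  # leaves data behind for whatever runs next
    assert SHARED_DB["alice"] == 100


def check_no_accounts_initially() -> None:
    # Valid assertion; it fails only when run after check_create_account.
    assert SHARED_DB == {}


def fresh_db() -> dict[str, int]:
    """Per-test fixture: each test gets its own empty database."""
    return {}


def check_create_account_isolated() -> None:
    db = fresh_db()
    db["alice"] = 100
    assert db["alice"] == 100


def check_no_accounts_initially_isolated() -> None:
    assert fresh_db() == {}


def run(tests) -> list[str]:
    """Minimal runner: returns the names of tests whose assertions failed."""
    failures = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failures.append(test.__name__)
    return failures


SHARED_DB.clear()
print(run([check_no_accounts_initially, check_create_account]))  # []
SHARED_DB.clear()
print(run([check_create_account, check_no_accounts_initially]))  # ['check_no_accounts_initially']
print(run([check_create_account_isolated, check_no_accounts_initially_isolated]))  # []
```

In Python’s `unittest`, the equivalent of `fresh_db()` would typically live in a `setUp` method (JUnit’s `setUp()` or `@Before` plays the same role), so every test starts from a known state regardless of execution order.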