Better Testing, Worse Testing
March 20th, 2006
Filed under Agile, Lessons Learned, Ruminations
I presented “Better Testing, Worse Quality” in 2001 at the SM/ASM conference. The paper remains one of the most popular on my site. In it, I use a diagram of system effects to explain how a big improvement in system-level independent testing can, ironically, lead to worse quality as the level of developer testing goes down.
A few months ago, S.R. Ramachandran contacted me to point out that the paper only looked at the feedback loop from one direction. What happens, he asked, if development improves? Will the independent testing become worse?
I wrote “Better Testing, Worse Quality” long before I became involved in the Agile community. At the time, I didn’t know about test-infected developers, Extreme Programming, or Test-Driven Development. Re-reading my words now, I realize that the paper is one-sided. It identifies just one possible system effect: developer testing diminishing as system testing increases. So could improved developer testing, as often happens on Agile projects, ironically lead to worse quality as the level of independent testing goes down?
As I thought about the question more, I realized that I’ve seen this happen. While working with an Extreme Programming team, I overheard the Customer comment, “I don’t need to test all that because the developer tests cover it.” Uh oh.
Back before the organization adopted XP, the manager playing the Customer role would have had a swarm of testers cover every inch of the software looking for problems. But because the developer testing had improved so much, she felt that extensive system-level testing would duplicate the developers’ efforts. She accepted the new features after only a cursory examination. A few weeks later, the Customer was stunned when users surfaced bugs that more rigorous system testing would have caught. Better unit testing had led to worse system testing, and worse system testing had led to worse quality.
That memory triggered another. A development manager at another company described, in animated detail, how a COM architecture would reduce the system testing burden. “If we test all the COM objects thoroughly at the code level, everything will just work when we integrate the whole system!” he declared. The team scheduled very little system test time. Some months later, the team was still battling mysterious crashes and timing bugs.
Then I remembered a situation that a participant in one of my classes described. An executive in his company was pressuring him to reduce his test estimates, saying, “The developers will be doing extensive unit testing. Now how much can you cut your system testing?” Notice that the executive didn’t ask, “How much time do you think efficiency gains due to quality improvements will buy us?” Instead, he pointed to increased developer testing as an argument to reduce the system testing.
Notice also that the developers weren’t yet doing all that unit testing: it was planned for the future. Apparently just the promise of more unit testing is enough for some to decide less system testing is needed.
My new, more general conclusion is that, given no other changes in the system, better testing at one level tends to result in worse testing at another.
This is a problem. It means that the more information we have about the software from one perspective, the less we are likely to have from other perspectives. And that means overall risk tends to remain constant, even after significant improvements in an isolated part of the overall process. Yikes!
In my original paper, I described a difficult conversation in which a VP castigated a Test Manager, saying, “We’ve given you a well-stocked lab, you’ve hired a large team of experienced professionals, you’ve brought in training for them, and you’ve established good test practices. With all this investment in testing, how is it that our software is worse?”
Now I can imagine an executive saying to an XP coach, “We’ve given you a well-stocked bull pen, you’ve hired a large team of experienced XP professionals, you’ve brought in training for them, and you’ve established good development practices. With all this investment in development, how is it that our software is worse?”
How tragic.
This leads me to my next general conclusion: an isolated improvement in one aspect of a development process tends to be offset by declines in another, resulting in no overall improvement in the final result. So how do we improve results? By paying attention to the whole process and not just isolated aspects of it.
We can’t afford to use an increase in one kind of testing to justify skimping on another for the simple reason that we can’t substitute one kind of testing for another. Different types of tests answer different types of questions. Unit tests tell us very little about how the overall system works, just as system testing gives us precious little information about how well each code module or class fulfills its responsibilities. Both code-level and system-level tests are necessary to give us a complete picture of the system under test.
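To make the point concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the functions `parse_timeout`, `wait_budget_exceeded`, and `request_timed_out` are hypothetical and do not come from any project mentioned in this post. It shows two small units that each pass their own code-level checks, yet misbehave once wired together because one works in seconds and the other in milliseconds.

```python
# Hypothetical example: names and behavior are assumptions made for illustration.

def parse_timeout(config_value: str) -> int:
    """Return the configured timeout, in SECONDS."""
    return int(config_value)


def wait_budget_exceeded(elapsed_ms: int, timeout_ms: int) -> bool:
    """Return True when the elapsed time exceeds the timeout (both in MILLISECONDS)."""
    return elapsed_ms > timeout_ms


def request_timed_out(config_value: str, elapsed_ms: int) -> bool:
    """System-level path: wires the two units together.

    The bug lives here, in the wiring: seconds are passed where
    milliseconds are expected.
    """
    return wait_budget_exceeded(elapsed_ms, parse_timeout(config_value))


# Code-level checks: each unit fulfills its own responsibility.
assert parse_timeout("30") == 30
assert wait_budget_exceeded(31_000, 30_000) is True
assert wait_budget_exceeded(29_000, 30_000) is False

# System-level check: 5 seconds elapsed against a 30-second timeout
# should NOT count as a timeout, but the integrated path says it does.
system_result = request_timed_out("30", elapsed_ms=5_000)
print("timed out after 5s of a 30s budget?", system_result)  # prints True (the bug)
```

No amount of additional unit testing on either function would expose this problem, because each function is correct on its own terms. The reverse also holds: if the system-level check failed, it would not say which unit, or which piece of wiring, was at fault.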
Instead of slashing test efforts at any level, let’s focus on efficiency gains. How can we do the same level or even more testing in less time or using fewer resources? Will improvements in overall quality enable us to spend less time spinning our wheels? How can we leverage improvements in one kind of testing to improve the efficiency of another? What steps can we take to ensure setup scripts, test harnesses, fixtures, or data are reusable?
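One small way to pursue that kind of reuse is to keep test data in a single shared builder that tests at every level draw on. The sketch below is a Python illustration under assumed names: `Order`, `order_total_cents`, `render_receipt`, and `sample_order` are all hypothetical, and a real project would likely use its test framework's own fixture mechanism instead.

```python
# Hypothetical example: all names here are assumptions made for illustration.
from dataclasses import dataclass, field


@dataclass
class Order:
    customer: str
    # Each item is (name, unit_price_cents, quantity).
    items: list = field(default_factory=list)


def order_total_cents(order: Order) -> int:
    """Code-level responsibility: sum price times quantity over all items."""
    return sum(price * qty for _, price, qty in order.items)


def render_receipt(order: Order) -> str:
    """Higher-level behavior: produce the text a user would actually see."""
    lines = [f"{name} x{qty}" for name, _, qty in order.items]
    lines.append(f"TOTAL {order_total_cents(order)}")
    return "\n".join(lines)


def sample_order() -> Order:
    """Shared test data: one builder reused by tests at every level."""
    return Order("test-customer", [("widget", 250, 2), ("gadget", 1000, 1)])


# Code-level check using the shared data.
assert order_total_cents(sample_order()) == 1500

# Higher-level check reusing the same data, looking at user-visible output.
receipt = render_receipt(sample_order())
assert receipt.splitlines()[-1] == "TOTAL 1500"
```

Because both checks start from the same builder, effort spent making the data realistic pays off at both levels, rather than one level's improvement becoming an excuse to shrink the other.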
By focusing on efficiency, I’m convinced that we can leverage better testing in one area into better testing in other areas. And that means “Better Testing, Better Testing.”