Bugs Spread Disease

I have wasted countless hours of my life arguing to fix bugs in bug triage meetings.

Bug advocacy is a core skill for testers in traditional software development organizations that follow code-then-test practices. Over time, I got reasonably good at it. I could explain to both business and technical stakeholders not only the symptoms of the bug and steps to reproduce it, but also the corresponding impact to users. I could help my internal stakeholders see the connection between the risk that the bug represented and the core value of whatever system we were working on. In short, I became adept at making a business case to fix bugs.

It was awful. I hated every minute of it.

Every bug triage meeting was torture. They were long and contentious. I didn’t understand then, nor do I understand now, why an organization would bother spending so much time and money on testing if they weren’t going to do anything with the information that testing revealed.

These days I choose to do hands-on-the-keyboard work only in organizations that value clean code so highly that I don’t have to draw on those bug advocacy skills.

But even now, there is a mistaken notion in the software development community that bugs are inevitable, that you can’t fix them all, and that it makes sense to make a fix/defer decision on bugs based on some notion of ROI.

This line of thinking is a deathtrap.

It might be a slow, lingering death. It might, in fact, be such a slow death that it’s not apparent how each defer decision brings the death of the product, or the company, a little closer. But once we start down the apologist’s path of deferring “minor” bugs into mountainous defect backlogs, we’re inviting a sinister guest into our midst.

If you think I’m being overly dramatic, consider this: two of the companies where I spent the most time in bug triage meetings are now dead. So I think I know a deathtrap when I see it. Let me share their stories.

One of the companies sold software directly to consumers. The other was a paid service with a software component distributed through large OEMs. In both cases we “didn’t have time” to fix all the bugs. There were always higher priorities. We had to get features out. We had to make deadlines. We had to respond to customer demands.

In both cases the executive staff impressed on us that every day we delayed a release incurred a massive cost. They pushed us to be pragmatic. Reasonable. Make good business decisions about tradeoffs.

The executives were not wrong. Both companies were in rapidly changing markets where speed was critical. In the case of the company that sold to OEM partner companies, the deadlines were incredibly real. Miss the drop date by even a single day, and we wouldn’t be on the platform. That meant we wouldn’t see revenue on that platform for another 6 months.

Under such conditions it’s easy to see how we made the decisions we did.

But remember, both companies are dead.

It wasn’t the bugs that killed us directly. Rather, the bugs became a pervasive infection. They hampered us, slowing down our productivity. Eventually we were paralyzed, unable to make even tiny changes safely or quickly.

Whenever we considered the question of whether to defer or fix a bug, we looked at the obvious costs associated with releasing software with known bugs: annoyed customers, tech support, and the overhead associated with releasing patches for the most egregious cases.

We were right about those costs. So we weren’t surprised when the resulting interruptions and escalations forced us to split our efforts between new development and maintenance. We just hired more people to keep up with the increased demands and told ourselves that it was the cost of doing business in the software industry.

However, it was the hidden costs that drained us of life: hours lost arguing about bugs in bug triage meetings; hours lost stumbling over the same known issues again and again; hours lost fighting with a fragile and error-prone code base to make even small changes; hours lost cataloging and categorizing the backlog. It was demoralizing and immensely expensive.

All those lost hours leached away our capacity. We could see it happening but were powerless to stop it. By the time we realized how much trouble we were in, it was too late. The infection had taken root, and nothing short of a total overhaul of the entire code base would make things better.

The consumer software company eventually died because a more nimble competitor came along. We simply could not keep up with the changing market demands in a rapidly evolving field.

The company that sold through OEM partners died because we just couldn’t generate enough revenue. What little revenue we had disappeared as, one by one, the large OEMs dropped us. The OEMs’ internal staff had no faith in our software, so when their customers complained to tech support, the OEM tech support blamed our software.

There’s a lesson here about the real cost of tolerating bugs, of supporting practices that involve triaging defects. The cost of carrying a bug is far, far greater than most people realize. That trivial little “cosmetic” issue will cost a little time to fix. If you have to argue about whether or not to fix it, then track it, then revisit it, then prioritize it, then argue again about whether to fix it or defer it yet again, you will spend hours upon hours on it.

This is why I have no patience for the “bugs are inevitable; you can’t fix them all” attitude. Bugs kill. They do it slowly, painfully, but relentlessly.

Want your software to evade that horrific death?

Cancel all the bug triage meetings; spend the time instead preventing and fixing defects. Test early and often to find the bugs earlier. Fix bugs as soon as they’re identified; attend to your broken windows early.

And whatever you do, don’t accept the apologist’s excuse that bugs in production are inevitable, so the only pragmatic approach is to make a cost/benefit decision on whether or not to fix them. The cost is almost always far higher than anyone might guess.

I have been blogging even less than usual because all my writing time is going into a book that hasn’t been formally announced yet. I’ll have official news soon.

22 thoughts on “Bugs Spread Disease”

  1. I like many of your writings. It takes talent to write well. Thanks for sharing your good thoughts. Helen

  2. Great article. Thanks for sharing your thoughts.

    I’ve seen bugs that indicated that the business process was wrong. For example, an analyst tells you it should do X, so you code X, and then the users ask why it isn’t doing Y. Whether it is a bug in the code or a bug in the process, something should be fixed.

  3. It’s certainly true that technical debt can kill products, and then companies. How to measure it so that new features aren’t the only thing in new releases is an interesting question. It also depends on how much the new features change the original product. Fixing a bug is often less invasive than adding a feature.

  4. “Cancel all the bug triage meetings; spend the time instead preventing and fixing defects.”

    This implies that all bugs are worth fixing — and that the effort to fix them is more important than any new functionality. That doesn’t match my experiences. The process of bug triage (as I am used to it) answers important questions such as when (and whether) to fix an issue, and where it fits in relation to other work. How would you propose answering these questions?

  5. Pingback: Five Blogs – 4 Augustus 2012 « 5blogs

  6. As a developer, I have a different perspective on things from a tester. Despite this, I completely agree that all bugs should be fixed, and I live by that on the product my team and I work on by forgoing bug triage entirely and placing every bug on the board so it can be fixed.

    Bugs are like limescale: once they start to build up, the problem grows exponentially until important bugs can’t be distinguished from allegedly unimportant ones.

    The time it takes to fix a bug is nearly always less than the time it takes to allow it to endlessly loop around. As well as the customer impact, an old bug is discussed in meetings, is accidentally raised again and again by testers and then closed as a duplicate again and again by developers. The longer it hangs around, the more chance there is for it to cost more than it would have cost to fix it straight away.

  7. Yet another interesting and well-written article. Thank you for sharing your experiences!

    I once wrote a blog entry on the costs of fixing a bug, in particular how these costs increase if the bug is fixed ‘later’: http://www.open-closure.com/1/post/2011/09/the-costs-of-a-bug.html. Your article conveys two costs I hadn’t considered, frustration and administrative overhead. I’ve updated my post with a link to your article. Could you let me know what you think of the (development of) costs I have listed?


  8. I just learned of your blog via Jurgen Appelo, and with this one article I am now a committed reader! I’d also like to link to your blog from mine once it’s up and running (soon).

    One of the greatest things about this article is how much I can personally relate to it, having witnessed firsthand 5 such companies over the last 20 years. Two are dead, two are on their deathbeds, and the last one will be within the next year if they don’t wake up soon. I’ve recently been wondering if I’m the only one who realizes this, and it makes me want to bang my head on my desk some days. Thanks to your article I know I’m not alone in this perspective, and for that I am extremely grateful. Thanks! 🙂

  9. Prevention is key. All the companies I worked for were excellent at detection, but none were remotely close to implementing preventive programs. Why? Too darned busy fighting the fires of past buggy releases.

  10. I was reminded of the first essay in Frederick Brooks’s book “The Mythical Man-Month”, the essay titled “The Tar Pit”:
    “No scene from prehistory is quite so vivid as that of the mortal struggles of the great beasts in the tar pits. […] Large-system programming has over the past decade been such a tar pit” (and has continued to be a tar pit ever since).
    Deferring bug fixes is definitely one of the reasons that development efforts end up entangled in the tar pit, experiencing a slow and painful death.

  11. Pingback: Perspectives on Testing » The Seapine View

  12. Bravo! Elisabeth, thanks for another great post that I know I will be sharing.

    Gabe raises a great point that not all bugs are worth fixing. It’s true! IMO, Elisabeth’s suggestion to cancel the bug triage meeting goes hand-in-hand with empowering teams to make decisions. Tell them to fix the bugs worth fixing and they will make better choices and at a much lower cost than any bug triage committee of cross-functional stakeholders could achieve.

    And along with that, encourage QA and Dev (who are sitting next to each other, right?) to stop creating bug reports in the first place, and that will reduce the temptation to triage even further. Opening a bug report just so the person sitting next to you can close it is wasted effort. Instead, just fix the bug, wrap it in an automated test if appropriate, and move on. If a bug is not worth fixing, again the answer is to not file a bug report… there’s no point in documenting a bug that’s not worth fixing just so we can confirm in the future that it’s not worth fixing.

    We’ve found value in tracking (only) the bugs that are reported from clients (our external customers). Their satisfaction is important to us, and we want to measure, understand, and reduce the number of client-facing bugs that, despite our efforts, leak out to customers.

  13. In this, my first comment on your blog, I want to take the time to thank you for all the insights you’ve given me into the quality process. I’m halfway through your first book on Leanpub and look forward to the next one.

    I’m thinking of using the situations portrayed in your blog posts to enrich my interview questions (I get to interview at least 10 people a month). I’m also thinking of using those situations in focused mini testing labs. I want to develop our testers into test-obsessed individuals.

    As for bug triage meetings, it has been a long time since I was in one, but I do agree that they’re despicable! The blame slinging that occurs there, the tension you could cut with a knife. I don’t want to do that again any time soon. And since I’ve been delivering test automation scripts lately (in a semi-agile way), I don’t get that many bug reports, so I can easily manage 10-15 bug/enhancement reports.

    And finally, deciding to eliminate bug triage meetings sounds very Agile. Not creating unnecessary bug reports for issues that can be shown to a developer immediately and fixed quickly, or having testers who program and fix their own bugs (as Don suggests), should be the way to go. But that requires testers who are more focused on development than traditional functional testers.

  14. Pingback: Epidemia dos bugs: corrigir agora ou postergar? at Nazareno Neto

  15. Pingback: Software Testing Carnival #1 — Hexawise

  16. Really great idea. I also think meetings should be kept short and more and more testing effort should be introduced.

  17. While I admire the focus on quality, it may not be practical or feasible to develop defect-free software. (You can spend a lot of time and $ polishing the apple and it still may not taste any better!) As such, it’s only logical that criteria be established to determine which issues must be fixed and which may be deferred. The decision to defer the resolution of an issue should consider severity, probability of occurrence, dependencies, and risk. If clearly defined, management-approved criteria are used, “bug triage” can be an efficient process. I suspect it’s the absence of clearly defined criteria and process that results in wasted time debating whether an issue should be fixed or not.

  18. Pingback: Weekly Roundup: Jonathan Kohl at Agile Vancouver

  19. Thanks for the great post, Elisabeth.

    In one of my previous companies, they tried the strategy of primarily fixing all new bugs and ignoring old ones. That resulted in discussions that were even more nonsensical: “That bug is not a regression, so we will not fix it”. That appeared to put a cap on the technical debt (as measured by bug counts). Instead, it led to the trap of not addressing inherent architectural problems with the app, which in turn led to restrictions on innovation. This is a hydra that can definitely grow many heads.

  20. Pingback: Software Testing Carnival #3 — Hexawise

  21. I just discovered your blog. I have had the pleasure of being on a product team for many years at my company where my bugs are treated well, so to speak. They know that if I or my team found it, it is a true bug, and it is detailed in the report. Unfortunately, over the years we have had changes in management: the product team’s work was not respected, bugs were questioned, and distracting policies were implemented. But I have enjoyed the collaborative work when it was allowed to progress.
