Bugs Spread Disease

I have wasted countless hours of my life arguing to fix bugs in bug triage meetings.

Bug advocacy is a core skill for testers in traditional software development organizations that follow code-then-test practices. Over time, I got reasonably good at it. I could explain to both business and technical stakeholders not only the symptoms of the bug and steps to reproduce it, but also the corresponding impact to users. I could help my internal stakeholders see the connection between the risk that the bug represented and the core value of whatever system we were working on. In short, I became adept at making a business case to fix bugs.

It was awful. I hated every minute of it.

Every bug triage meeting was torture. The meetings were long and contentious. I didn’t understand then, nor do I understand now, why an organization would bother spending so much time and money on testing if it wasn’t going to do anything with the information that testing revealed.

These days I choose to do hands-on-the-keyboard work only in organizations that value clean code so highly that I don’t have to draw on those bug advocacy skills.

But even now, there is a mistaken notion in the software development community that bugs are inevitable, that you can’t fix them all, and that it makes sense to make a fix/defer decision on bugs based on some notion of ROI.

This line of thinking is a deathtrap.

It might be a slow, lingering death. It might, in fact, be such a slow death that it’s not apparent how each defer decision is bringing the death of the product, or the company, a little closer. But once we start down the apologist’s path of deferring “minor” bugs into mountainous defect backlogs, we’re inviting a sinister guest into our midst.

If you think I’m being overly dramatic, consider this: two of the companies where I spent the most time in bug triage meetings are now dead. So I think I know a deathtrap when I see it. Let me share their stories.

One of the companies sold software directly to consumers. The other was a paid service with a software component distributed through large OEMs. In both cases we “didn’t have time” to fix all the bugs. There were always higher priorities. We had to get features out. We had to make deadlines. We had to respond to customer demands.

In both cases the executive staff impressed on us that every day we delayed a release incurred a massive cost. They pushed us to be pragmatic. Reasonable. Make good business decisions about tradeoffs.

The executives were not wrong. Both companies were in rapidly changing markets where speed was critical. In the case of the company that distributed through OEMs, the deadlines were incredibly real. Miss the drop date by even a single day, and we wouldn’t be on the platform. That meant we wouldn’t see revenue on that platform for another six months.

Under such conditions it’s easy to see how we made the decisions we did.

But remember, both companies are dead.

It wasn’t the bugs that killed us directly. Rather, the bugs became a pervasive infection. They hampered us, slowing down our productivity. Eventually we were paralyzed, unable to make even tiny changes safely or quickly.

Whenever we considered the question of whether to defer or fix a bug, we looked at the obvious costs associated with releasing software with known bugs: annoyed customers, tech support, and the overhead associated with releasing patches for the most egregious cases.

We were right about those costs. So we weren’t surprised when the resulting interruptions and escalations forced us to split our efforts between new development and maintenance. We just hired more people to keep up with the increased demands and told ourselves that it was the cost of doing business in the software industry.

However, it was the hidden costs that drained us of life: hours lost arguing about bugs in bug triage meetings; hours lost stumbling over the same known issues again and again; hours lost fighting with a fragile and error-prone code base to make even small changes; hours lost cataloging and categorizing the backlog. It was demoralizing and immensely expensive.

All those lost hours drained our capacity. We could see it happening but were powerless to stop it. By the time we realized how much trouble we were in, it was too late. The infection had taken root, and nothing short of a total overhaul of the entire code base would make things better.

The consumer software company eventually died because a more nimble competitor came along. We simply could not keep up with the changing market demands in a rapidly evolving field.

The company that distributed through OEMs died because we just couldn’t generate enough revenue. What little revenue we had disappeared as, one by one, the large OEMs dropped us. The OEMs’ internal staff had no faith in our software, so when their customers complained to their tech support, the OEM tech support blamed our software.

There’s a lesson here about the real cost of tolerating bugs, of supporting practices that involve triaging defects. The cost of carrying a bug is far, far greater than most people realize. That trivial little “cosmetic” issue will cost a little time to fix. But if you have to argue about whether or not to fix it, then track it, then revisit it, then prioritize it, then argue again about whether to fix it or defer it yet again, you will spend hours upon hours on it.

This is why I have no patience for the “bugs are inevitable; you can’t fix them all” attitude. Bugs kill. They do it slowly, painfully, but relentlessly.

Want your software to evade that horrific death?

Cancel all the bug triage meetings; spend the time instead preventing and fixing defects. Test early and often to find the bugs earlier. Fix bugs as soon as they’re identified; attend to your broken windows early.

And whatever you do, don’t accept the apologist’s excuse that bugs in production are inevitable, so the only pragmatic approach is to make a cost/benefit decision on whether or not to fix them. The cost is almost always far higher than anyone might guess.

I have been blogging even less than usual because all my writing time is going into a book that hasn’t been formally announced yet. I’ll have official news soon.