Question from the Mailbox: What Metrics Do You Use in Agile?

A reader wrote me to ask:

I know what metrics to use to manage a traditional phased project. But what metrics do you use on Agile projects?

I started drafting my answer in a private email, but then I decided it was time to put it on the record.

This is a great question. Too many organizations transitioning to Agile don’t ask it. They continue using the same metrics they were using for traditional projects even though the process is fundamentally different.

The problem is that traditional measures don’t work in Agile. Sometimes they’re actively harmful.

  • Requirements coverage is meaningless if the definition of “Done” includes tested. The number will always be 100%. If it’s not 100%, you have a deeper problem that metrics will not help you solve.
  • Test pass/fail ratios are not useful either. If a test that was passing starts failing, the team stops and fixes the regression right away, so the ratio never tells you anything you don’t already know.
  • Counting the defects found internally is not particularly helpful. The only time the number matters is when it’s big, and then the useful response is to stop and figure out why there are so many bugs, not to keep counting them.
  • Traditional QA defect-ratio metrics, especially defect detection efficiency (DDE), can be actively harmful. It’s particularly horrific when testers are judged on their DDE. In every situation I’ve seen or heard about where that happened, the testers spent more time arguing about what was and was not a bug than actually helping move the project forward.
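
For context, defect detection efficiency is usually computed as something like this (exact formulations vary from organization to organization):

\[
\mathrm{DDE} = \frac{\text{defects found internally before release}}{\text{defects found internally} + \text{defects found in the field}} \times 100\%
\]

Every disputed bug report moves one of those counts, which is exactly why tying a tester’s evaluation to this number turns bug triage into a negotiation.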

So that’s what I don’t do. Before I launch into the metrics I do use, there are a few caveats I need to share.

First, I only use metrics to get a 50,000-foot view of what’s going on. They provide an indicator that there’s something to investigate, but they don’t give me enough understanding to make critical decisions. For those, I have to dig into the details.

Second, I do not use metrics to compare teams or individuals. Ever. This is important. The best way to screw up the usefulness of any process metric is to use it to judge people. This is a basic principle of metrics and has nothing to do with the development process. But apparently it’s something that needs repeating; I still see managers trying to use velocity or defect metrics to compare teams.

I even saw one manager institute the notion of “personal velocity” on an Agile team. He thought it was great: he wanted to know who the solid producers were, and he believed the metric encouraged greater individual responsibility. But he was blind to the side effect: people stopped helping each other because they knew their personal numbers would drop if they spent time on anyone else’s work.

Third, I am much more focused on qualitative measures than quantitative ones. Counting creates the illusion that we understand something just because we can quantify it, yet the numbers can be very misleading. Worse, all too often counting leads to perverse incentives. Even when managers are not assessing people based on process metrics, counting things changes behavior.

Given all that, here’s the list of what I do use:

  1. The core measure I use is velocity, or Running Tested Features as Ron Jeffries calls it. Note that the “Tested” part is important: velocity is only meaningful if Tested is part of the definition of Done.
  2. I sometimes look at cycle time: the time from when a story moves from To Do into In Progress until it’s Done. If the team is having difficulty producing Running Tested Features, measuring the cycle time of the simplest possible change is an enlightening diagnostic. If taking a tiny, self-contained, non-risky change all the way to Done takes more than an hour, it’s not at all surprising that the team can’t get real work completed, and at that point I start digging to find out what the impediments are. (There’s a rough sketch of computing velocity and cycle time after this list.)
  3. I use the Continuous Integration system to tell me about the health of the build. If the build seems to be red a lot, I sometimes track how long it stays red, or how long it has been green. (Think of a big poster: “Days since last build failure: 15”.) But any time you count build metrics, there is a risk of affecting checkin behavior: I’ve seen a team so scared of breaking the build that they stopped checking in altogether, which is clearly not ideal. (A sketch of the red/green bookkeeping also follows the list.)
  4. I do measure code coverage on the unit tests. I don’t have any illusion that code coverage tools can actually tell me anything about how good the testing was, but if there are large swaths of code with no unit tests, I view that as technical debt and start paying it down.
  5. I do pay attention to defects reported from the field. If there are enough of them, counting might be useful, but counting isn’t necessary. One of the more effective things I’ve seen was a support group that created visibility around field problems by posting the major ones on a big visible chart hung where everyone (testers, developers, executives, etc.) would see it.
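
If you want to see the arithmetic behind items 1 and 2, here is a minimal sketch. Every name in it (Story, started_at, done_at, tested) is invented for illustration; in practice the timestamps come from whatever tracker or card wall the team already uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Story:
    name: str
    started_at: Optional[datetime]  # when the card moved from To Do into In Progress
    done_at: Optional[datetime]     # when it met the full definition of Done
    tested: bool                    # "Tested" really is part of Done


def velocity(stories: list[Story], start: datetime, end: datetime) -> int:
    """Running Tested Features completed during the iteration [start, end)."""
    return sum(
        1
        for s in stories
        if s.done_at is not None and s.tested and start <= s.done_at < end
    )


def cycle_time(story: Story) -> Optional[timedelta]:
    """Elapsed time from In Progress to Done, or None if the story isn't finished yet."""
    if story.started_at is None or story.done_at is None:
        return None
    return story.done_at - story.started_at


# The diagnostic from item 2: push a deliberately tiny, self-contained, non-risky
# change all the way to Done and see how long it takes. (Dates are made up.)
tiny_change = Story(
    name="fix typo on login page",
    started_at=datetime(2011, 3, 1, 9, 0),
    done_at=datetime(2011, 3, 1, 14, 30),
    tested=True,
)
elapsed = cycle_time(tiny_change)
if elapsed is not None and elapsed > timedelta(hours=1):
    print(f"Simplest possible change took {elapsed}; go find the impediments.")
```

The only subtlety worth noting is in velocity(): a story that is done but not tested doesn’t count, which is the whole point of calling the metric Running Tested Features.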
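
For item 3, the CI server’s dashboard usually shows this already, but if you want the numbers yourself, here is a rough sketch that assumes you can export a list of build results. The Build record and its fields are invented for illustration and don’t correspond to any particular CI tool’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Build:
    finished_at: datetime
    passed: bool


def days_since_last_failure(builds: list[Build], now: datetime) -> float:
    """The number for the big poster. Assumes the list holds at least one build."""
    failures = [b.finished_at for b in builds if not b.passed]
    reference = max(failures) if failures else min(b.finished_at for b in builds)
    return (now - reference).total_seconds() / 86400


def total_time_red(builds: list[Build]) -> timedelta:
    """Sum the stretches from each build that turned the board red until the next
    build that turned it green again. (A stretch that is still red at the end of
    the list is not counted.)"""
    red = timedelta()
    went_red_at = None
    for build in sorted(builds, key=lambda b: b.finished_at):
        if not build.passed and went_red_at is None:
            went_red_at = build.finished_at
        elif build.passed and went_red_at is not None:
            red += build.finished_at - went_red_at
            went_red_at = None
    return red
```

The bookkeeping is the easy part; the hard part, as noted above, is publishing the numbers without scaring people away from checking in.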