Measuring performance

James Gregory has an entry up called Metrics, on metrics as used to measure the performance of your software product: is it as fast as we need, as stable as we need, as tested as we need? He then goes on to talk about measuring the performance of your coders:

There’s other metrics you should watch as well: the deficit between a programmer’s estimate on time to fix a bug and actual time taken is honestly the only way to start getting good time-estimates. Which is another important point: metrics don’t just let you improve your product now, they help your staff get better at their jobs.

Similarly, it’s imperfect, but trends from time-lines of bugs open, and rates of bugs opened per unit time show interesting trends. I’ve found these to be good indicators of project completeness in the past, but these metrics are dangerous: it’s easy to be misled by this stuff, so treat such metrics with suitably large salt-grains.
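
As a rough illustration of the estimate-versus-actual metric James describes, here is a minimal Python sketch. The record format (pairs of estimated and actual days) and the function names are my own assumptions for the sake of the example, not anything from James's post or from a real bug tracker.

```python
from statistics import median

def estimate_ratios(records):
    """Return actual/estimated ratios for (estimated_days, actual_days) pairs.

    Records with a non-positive estimate are skipped, since a ratio
    is meaningless for them.
    """
    return [actual / estimated for estimated, actual in records if estimated > 0]

def typical_overrun(records):
    """Median ratio: a crude multiplier to apply to future estimates."""
    return median(estimate_ratios(records))

# Hypothetical history for one programmer: (estimated days, actual days).
history = [(1, 2), (2, 3), (1, 4), (3, 3)]
print(typical_overrun(history))
```

The median is used rather than the mean so that one bug that took a month instead of a day doesn't swamp everything else. In keeping with the warning above, the resulting number is a prompt for a conversation about estimating, not a score.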

He goes on to note Joel Spolsky’s objections to measuring programmer performance like this, which are in Spolsky’s article The Econ 101 Management Method. James says:

The example [Spolsky] gave was in reference to his company’s bug-tracking software, and he submitted that once you can measure the bugs-closed per unit time value, programmers will start artificially inflating their count by lying to the bug-tracker and to you. I frankly find this position a bit offensive, but if such situations exist it reflects poorly on management rather than the engineers. You need to be accepting of metrics that show problems, because if people hide the problems, you can’t solve them, but similarly: if you can’t measure, or, don’t look for the problems, you can’t solve them either.

Spolsky has an argument from psychology too, which I’ll get to, but first I thought it was worth noting that while James, who is (let’s stipulate) a good manager, occasionally needs to work with, optimise around or fire bad engineers, we, as (let’s stipulate) good engineers, need to defend ourselves against bad managers. I can come up with all kinds of amusing or horrifying stories about managers who’ve gotten bees in their bonnets about various metrics or other ‘signs’ that a project is going well or badly, or that they’re being undermined, or that they’re about to get a bonus, or whatever. For every manager who is trying to notice patterns like “every time you tell me a bug fix is going to take a day, it actually takes a month”, there’s one who is roaming around triumphantly reminding people that all version control commit messages must have at least 100 words in them. [I should note that this example is hypothetical: no actual crazy managers were harmed in the making of this blog entry. That said, I’m sure there’s some manager out there doing this right now.]

“That’s a bad manager!” isn’t much consolation when you work under said manager.

Anyway, as far as it goes, I suppose Spolsky’s point is that programmer performance is no more likely to be captured by a single metric than product quality is in James’s post, where increases in speed need to be balanced against security, stability and maintainability metrics. If you do make the mistake of relying on a single metric for product quality, as James notes, you get a very fast piece of software which barely works. That’s one thing. But Spolsky is arguing that if you rely on a single metric for programmer quality, you not only get, say, a low bug count at the expense of tests, you also get worse programmers:

Intrinsic motivation is your own, natural desire to do things well… Extrinsic motivation is a motivation that comes from outside, like when you’re paid to achieve something specific.

Intrinsic motivation is much stronger than extrinsic motivation. People work much harder at things that they actually want to do. That’s not very controversial.

But when you offer people money to do things that they wanted to do, anyway, they suffer from something called the Overjustification Effect. “I must be writing bug-free code because I like the money I get for it,” they think, and the extrinsic motivation displaces the intrinsic motivation. Since extrinsic motivation is a much weaker effect, the net result is that you’ve actually reduced their desire to do a good job.

So Spolsky has a point beyond “you’ve got to be careful that you don’t mistake the metric for what it’s measuring”, although that’s part of his point. This is something that scientific method training hammers into you: you need metrics to help you decide what reality is doing, but the metric is not the same thing as reality; it is only ever an approximation to it, and is only good insofar as it is a useful approximation.

Spolsky’s other point is that regardless of whether you, the stipulated good manager, mistake the metric for reality, the people judged based on the metric will: they’ll even start mistaking the metric for their conscience. This, I think, is a point James hasn’t addressed yet.