So if you read my last post, you might be wondering: why not use velocity to compare teams? There are a number of problems with using velocity that way.
Each team estimates the relative points for a set of stories against other stories they have already seen. So team A looks at some stories and might say “that one is about in the middle so let’s call it a 3, those other ones are smaller so let’s call them 1 each, and that one is big so it is a 5”.
Then team B looks at some other stories and might say “that one is about in the middle so let’s call it an 8, those other ones are smaller so let’s call them 3 each, and that one is big so it is a 13”. So team B’s estimates will be larger and their velocity therefore higher, even though they’re not more “productive” or “efficient”.
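To put numbers on that, here is a minimal sketch in Python. The point scales are the ones quoted above; the sprint contents (one middle story, two small ones and one big one) and the helper name `velocity` are my own assumptions for illustration, not anything a real tool provides.

```python
# Point scales taken from the two teams' conversations above.
team_a_scale = {"small": 1, "middle": 3, "big": 5}
team_b_scale = {"small": 3, "middle": 8, "big": 13}

# Assumed for illustration: both teams finish the same mix of work in a sprint.
completed_stories = ["middle", "small", "small", "big"]


def velocity(scale, stories):
    """Sum the points a team assigned to the stories it completed."""
    return sum(scale[size] for size in stories)


velocity_a = velocity(team_a_scale, completed_stories)  # 3 + 1 + 1 + 5
velocity_b = velocity(team_b_scale, completed_stories)  # 8 + 3 + 3 + 13

print(f"Team A velocity: {velocity_a}")
print(f"Team B velocity: {velocity_b}")
```

Same amount of work, but team B's number is nearly three times team A's, purely because of the scale they happened to pick.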
No you can’t. Remember, these are relative sizes, not absolute. So if a team thinks one story is roughly five or six times bigger than another, they would be right in saying that the smaller story is a 1 and the bigger one is a 5, and equally right in saying that the smaller one is a 2 and the bigger one is a 13. What is important is that they are consistent with themselves over time; i.e. if another story comes along next sprint that is about the same size as the 1, then it is a 1 and not a 2.
A team can be inconsistent with themselves but not with another team. That’s because they are not making estimates against absolute standards, but their own view of the relative sizes between items. Think about it this way.
You ask two groups of people to rank and describe the sizes of a golf ball, a tennis ball and a basketball. The first describes them as “small, medium, large”. The second describes them as “tiny, small, medium”. Which is right? Both of them. Is the golf ball “tiny”? It is not if you compare it to a tennis ball. But it is if you compare it to a house.
No you can’t. Story size estimates are relative by definition. If they’re not then you’re not doing story size estimation, you’re doing something else. People keep trying to standardise this stuff and it doesn’t work. Let me know if you find a way to make it work. But that still won’t solve the problem, because…
Software teams are building different things. If two software teams were both building exactly the same thing at exactly the same time, you would fire one of the teams, and probably some other people for allowing this to happen. So even if you found some objective standards against which each team could estimate, the things they are estimating are different. And the contexts in which they are operating are likely to be different too.
One team might be building on a new architecture platform, the other might not. One team might be refactoring some technical debt as they go, the other might not. One team might have a product owner who changes their mind every five minutes, the other might not. You might be thinking “but that’s exactly it, I can use these drops in velocity to look for these problems and try to fix them!” That’s true, but it’s missing the point, because…
The team performs its own introspection at the end of each sprint and looks for problems and how to fix them. This may include looking at recent dips in velocity. The team doesn’t need a manager breathing down their neck telling them to pick up their game; they know when they have done well and when they haven’t.
It is trivially easy to game this system; the team can just pump up their estimates each sprint, which means their velocity increases each sprint. This is obviously ridiculous and makes the entire metric, and estimation process, worthless. I strongly believe, based on experience and anecdotal evidence, that teams will not game the system if you don’t give them any reasons to do so. So don’t give them any reasons to do so.
It doesn’t seem very useful. Should we just drop it altogether and not do any estimation? Perhaps. For now, assuming you are doing estimation, velocity really has only one use. A team can use their velocity to estimate how far they will get through a given backlog of work, based on how far through that backlog they are and the rate at which they are completing those backlog items.
That assumes that the backlog of work has either already been estimated, or an average story point estimate can and should be used (i.e. they haven’t left all the hard stuff for last). A team might also find velocity helpful to review in a retrospective to see how they went, but the team really should know that intuitively without looking at a velocity chart.
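The forecasting arithmetic is simple enough to sketch in a few lines of Python. This is only an illustration under my own assumptions: the function names, the use of a plain average of recent sprints, and all the numbers are made up, and a real team may well prefer a range over a single figure.

```python
import math


def estimate_backlog_points(story_count, average_points_per_story):
    """Size an unestimated backlog using an average story point estimate."""
    return story_count * average_points_per_story


def sprints_remaining(remaining_points, recent_velocities):
    """Forecast how many sprints are left at the team's recent average velocity."""
    if not recent_velocities:
        raise ValueError("need at least one completed sprint")
    average_velocity = sum(recent_velocities) / len(recent_velocities)
    if average_velocity <= 0:
        raise ValueError("average velocity must be positive")
    # Round up: a partly used sprint is still a sprint on the calendar.
    return math.ceil(remaining_points / average_velocity)


# Made-up numbers: 40 unestimated stories at roughly 3 points each,
# and a team that completed 18, 22 and 20 points in its last three sprints.
remaining = estimate_backlog_points(40, 3)
print(sprints_remaining(remaining, [18, 22, 20]))
```

Note that both inputs are the team's own numbers. Nothing here needs, or makes sense with, another team's velocity.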
I hope you’ve found this interesting and convincing. Question time – have you ever been asked to compare the velocity of two or more teams to find the “underperforming” ones? I’d like to read about it in the comments.