Points For Stories and the Perplexing Nature of Estimating Software

For four years, I’ve been using the point system for estimating software effort.  This post is an attempt to convey all the variables involved in estimating software effort.  I’ll also touch on measuring the effectiveness of a software project.

First, as a manager, I want to know if my software organization is being effective.  Furthermore, I want to be able to measure each team in my organization and do some comparison between them.  What is the metric by which I should measure them?  Also, how do I measure and communicate improvement or degradation in effectiveness?

Many people have wrestled with this, and I am just one of them.  I reached the conclusion that points cannot be used as a metric of effectiveness, and I’ll illustrate why.

A “point”, as used for software estimation, is a measure of effort.  It is a relative measure of effort, and its context is entirely within a single development team.  One team’s point is not the same size as another team’s point.  Because a point is relative within a single team, it can only be used to infer information about that team.

It is tempting to use a point as a measure of value delivered.  As a manager, I want to know how much value is being delivered because my goal is to maximize the value delivered within a given iteration.  I’m still left with the problem of how to measure value.  This post will not answer that question since it is a much broader topic, not isolated to software development.

Sometimes I would like a point to be a measure of value.  If it were, then I could just mandate more points delivered.  As a manager, I can do that.  I can mandate that the team deliver more points per sprint.  The team, in self-defense against a non-actionable request, will deliver more points in the sprint. . . but I will see the project progressing at the same rate.  Because points are relative and not rooted in anything concrete, their size is controlled by the team using them.
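A toy sketch of that inflation effect (all numbers are hypothetical, chosen only to illustrate the mechanism): if a manager mandates a higher velocity, the team simply re-estimates the same work at larger point values, and actual throughput never moves.

```python
# Hypothetical illustration of estimate inflation: when velocity is mandated,
# the point value per feature grows, but the features shipped per sprint do not.
features_per_sprint = 4  # the team's real throughput, which a mandate cannot change

for mandated_velocity in (20, 30, 40):
    # The team "complies" by sizing each feature so the points add up.
    points_per_feature = mandated_velocity / features_per_sprint
    print(f"mandate {mandated_velocity} pts -> "
          f"{points_per_feature} pts/feature, still {features_per_sprint} features")
```

Running this prints rising point totals against a constant four features per sprint, which is exactly the flat project progress described above.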

So I’m left without a measure of value.  All I have is estimates of effort.  

If I accept that, I will still want some way of knowing if the team is speeding up or slowing down over time.  This industry is plagued with teams that slow down.  As defect lists grow, the team seems to slow to a crawl.  If I’m rigorous about automation, I can keep my teams from suffering that fate (I’ve proven that).  However, I still want to know how much my teams are speeding up over time.  I know from past experience that agile engineering practices cause software development to accelerate over time rather than slow down; however, how can I quantify that?

I could turn to mapping points delivered per iteration.  If I measure that over many months, shouldn’t I see an upward trend?  That would seem only logical, right?  As the team finds better ways to do things. . . as the team forms solid standards around the project and refactors common functionality into standard components within the software. . . shouldn’t the team be able to deliver more software each iteration?  The answer is YES, but oddly enough, the number of points stays mostly flat over time.

How can the above be true if the team is actually speeding up?  It is because a point is a measure of effort.  As time goes by, the team finds ways to make delivering similar functionality easier.  Because the standards and componentization make things easier, the effort required to deliver a similar feature decreases, and the team estimates it lower than they had previously.  At the end of the iteration, the team has delivered more, but my graph of points stays flat.

This can be very frustrating, since I’m still grasping for my metric.  I need a graph.  I need a report.  I need statistics about how well my software teams are doing.  If a point is a measure of effort, and I have finally accepted that it is, then points delivered per iteration will stay flat over time.

The whole point of this post is that worrying about points delivered is futile in the big picture.  Because it is so consistent, points delivered per iteration is a great predictor, but it is not a metric that can be usefully optimized.  Points remaining to the next milestone is a better measure because that metric is at least expressed in the same relative effort units.  When effort decreases per unit of value, the points remaining to the next milestone also decrease.
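A minimal sketch of why points-remaining responds where velocity does not (the backlog, its feature names, and all estimates are hypothetical): re-estimating similar features downward shrinks the distance to the milestone, even though per-sprint velocity would stay flat.

```python
# Hypothetical milestone backlog, keyed by feature name, valued in points.
backlog = {"export": 5, "audit log": 5, "search": 8}

def points_remaining(estimates):
    """Total estimated effort left before the milestone."""
    return sum(estimates.values())

before = points_remaining(backlog)  # 18 points to the milestone

# A shared reporting component lands, so two similar features get cheaper
# and the team re-estimates them downward.
backlog["export"] = 2
backlog["audit log"] = 2

after = points_remaining(backlog)   # 12 points to the milestone
print(before, "->", after)
```

The drop from 18 to 12 is visible on a burndown toward the milestone, which is exactly the signal that a flat points-per-iteration chart hides.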

This whole scenario is a bit frustrating, but if you, dear reader, have found the magic metric for measuring project effectiveness, please let me know in the comments.