Using reference class forecasting to improve your estimates

Nov 14, 2016

I’d like to share with you the lessons we’ve learned trying to improve the accuracy of our estimation process. Many agile software businesses use story points to estimate the complexity of features. A story point is not meant to be a measure of the man-hours required to deliver a feature, but of its complexity relative to other features. In our case, story points are also part of our business model and must be tied to some time period in order to set sensible rates: TrikeApps uses a shared-risk model to charge for feature delivery, based on story points. In order to be financially successful, Trikelings need to be — on average, across a sprint — accurate with their estimates. In this post, I talk about the steps we’ve taken to improve our estimation accuracy without introducing enormous amounts of overhead, and about what has and hasn’t worked for us. If your business relies on accurate developer estimates to be profitable, there are some valuable lessons to be learned.

Lessons we’ve learned from past approaches

There is a significant body of literature suggesting that reference class forecasting, or taking the “outside view”, leads to much more accurate estimates. The planning fallacy is a common cognitive bias that leads us to produce increasingly optimistic estimates of effort as we are presented with more and more detail. To avoid it, we are much better off looking at similar, completed features and giving our feature the same estimate, rather than drilling down into the details of what needs to be done. Throughout all of our approaches, our developers were encouraged to use reference class forecasting while estimating.

We’ve made several attempts at improving estimation accuracy, with mixed results. We tried planning poker, but found that gaining consensus on every feature estimate each sprint was far too time-consuming. Many of our less-experienced developers were not comfortable challenging more experienced colleagues, and disengaged from the discussion. Furthermore, we found that the process made our estimates no more accurate than individuals estimating on their own. A subsequent attempt to have developers estimate stories in pairs yielded similar results. We also tried generating multiple possible implementation plans individually and then estimating them in pairs; this produced better estimates but bottlenecked our development process.

Our first attempts at using reference class forecasting were time consuming. To do it effectively, we needed a canonical set of stories to use for reference. We tried to build one by creating a set of tags describing the broad functional areas of our systems and tagging each of our completed stories. When estimating, we could then easily bring up a list of features we’d implemented in a similar functional area. The problem with this approach, beyond the significant amount of time and discipline it demanded, was that our canonical dataset made no allowance for stories that had been massive blowouts. It’s no good saying “my feature is like that two-point feature” if that two-point feature took the equivalent of five points to deliver.
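In hindsight, the missing filter is easy to describe. Here is a minimal sketch of it in Python (the Story record and the 25% tolerance are hypothetical, not lifted from our actual tooling): a completed story only qualifies as a reference if its actual effort landed close to its original estimate.

    from dataclasses import dataclass

    @dataclass
    class Story:
        title: str
        tags: set            # broad functional areas, e.g. {"billing"}
        estimate: int        # story points given before work started
        actual_points: float # point-equivalent of the effort actually spent

    def reference_candidates(completed, functional_area, tolerance=0.25):
        """Completed stories in the given functional area, excluding
        blowouts: a story qualifies as a reference only if its actual
        effort landed within `tolerance` of its original estimate."""
        return [
            s for s in completed
            if functional_area in s.tags
            and abs(s.actual_points - s.estimate) <= tolerance * s.estimate
        ]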

Less overhead but more cognitive load on developers

Our latest attempt feels much more natural and requires much less additional overhead. At the end of the sprint, our developers tag a feature as “canonical” if they would give it the same estimate should they need to do it again. We’ve built an estimation tool that asks for high-level details of a possible implementation and comes back with a canonical feature from the same project for comparison. The developer is asked whether the feature being presented is less complex, equally complex or more complex than the approach they’ve come up with. Once their answers have narrowed the possible estimates down to a single value (we use the Fibonacci sequence for possible point values), that value is selected as the estimate. At no point during this process does the developer actually see what the estimate has been set to, which discourages anchoring and second-guessing.
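Under the hood, the narrowing is just a binary search over the possible point values. Here is a minimal sketch of that loop in Python; the canonical_by_points mapping and the ask_comparison callback are hypothetical names rather than our tool’s real API, and the sketch assumes at least one canonical feature exists at every point value.

    import random

    FIBONACCI_POINTS = [1, 2, 3, 5, 8, 13]  # possible estimates, ascending

    def narrow_estimate(canonical_by_points, ask_comparison):
        """Binary-search the point scale by comparing against canonical
        features. The running estimate is never shown to the developer.

        canonical_by_points: dict mapping each point value to the list
            of canonical features from the same project with that value.
        ask_comparison: callback given a canonical feature; returns
            "less", "equal" or "more" depending on whether that feature
            is less, equally or more complex than the planned approach.
        """
        lo, hi = 0, len(FIBONACCI_POINTS) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            feature = random.choice(canonical_by_points[FIBONACCI_POINTS[mid]])
            answer = ask_comparison(feature)
            if answer == "equal":
                return FIBONACCI_POINTS[mid]
            if answer == "less":       # canonical feature is simpler,
                lo = mid + 1           # so ours must score higher
            else:                      # canonical feature is more complex,
                hi = max(mid - 1, lo)  # so ours must score lower
        return FIBONACCI_POINTS[lo]

Because each answer roughly halves the remaining range, a developer answers at most two or three comparisons before the six possible values collapse to one.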

This approach has some drawbacks — it takes a bit more cognitive effort to look at the canonical example and get a feel for its complexity, especially if it’s an older story — but on the whole it has started to yield better estimates and happier developers. Our developers estimate individually, but the developer who actually begins work on the feature has the power to change the estimate, provided no more than 20% of the original budget has been expended and the client agrees to the revised projected cost; tracking whether this rule results in measurably better outcomes is a work in progress.
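The revision rule itself is simple enough to state in code; here is a sketch with hypothetical names, not our billing system.

    def may_revise_estimate(original_budget, spent, client_agrees):
        """The developer starting the feature may change its estimate
        only while no more than 20% of the original budget has been
        expended, and only if the client agrees to the revised cost."""
        return spent <= 0.2 * original_budget and client_agrees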

What’s your method for estimating, and does it work for you?
