One thing to do is to show your estimates as distribution curves instead of averages. It can be helpful to choose estimates from a Fibonacci scale. Set the pessimistic estimate high enough that it’ll rarely be missed.
Edit: I change the units based on how far out I’m estimating. If I’m estimating the next six weeks, I’ll use hours. If I’m estimating a roadmap I’ll use weeks. The numbers get large quickly (1 2 3 5 8 13 21 34 55 89 144). It’s rare you’ll find yourself planning farther than 2 years out (104 weeks).
As you work, keep updating the estimated time remaining. If the initial estimate for a task is an 8 (between 5 and 13), keep reporting 8 until your estimate falls below 8, then report 5 (between 3 and 8). Avoid reporting intermediate values like 7.
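One way to read that reporting rule, as a minimal sketch (the function name and the round-down choice are my interpretation of "avoid reporting intermediate values"):

```python
# Fibonacci-bucketed reporting: only ever report a Fibonacci value,
# and switch to the next bucket only when the internal estimate of
# remaining work actually falls below the current one.

FIB = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]

def report(estimate):
    """Largest Fibonacci value not exceeding the remaining-work estimate.

    An internal estimate of 8..12 reports as 8; the moment it drops
    below 8 (say to 7), the report becomes 5, never 7 or 6.
    """
    candidates = [f for f in FIB if f <= estimate]
    return candidates[-1] if candidates else 0
```

So `report(8)` and `report(12)` both give 8, while `report(7)` drops straight to 5, matching the "keep reporting 8 until your estimate falls below 8, then report 5" rule.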
Part of the answer is don't give estimates when you can help it. I've had a lot of success just having a roadmap and telling people, this is what we're working on now, this is what's next in the pipeline, and here's where you can track our progress for yourself.
Another tip I have is to constrain estimates to board-level timescales like Q3 or H2 when you can (some business scenarios genuinely do require estimates).
A similar thing you can do is give your "estimate" not as a date but as a rough order of magnitude ("weeks, not days"). Or better yet, just state the factors that would go into an estimate without offering one.
Lastly I'd suggest just not shying away from commitments but making them about effort not outcome. "I have 3 developers working on this full time and it will be their sole focus until we ship. Nothing you say or do is going to get this feature launched any faster than it's going to be now."
Because other parts of the business work on different time scales, this doesn't always work. Two of the most common examples:
1) You work anywhere near money and commerce. The existence of Black Friday and the weeks of shopping frenzy that follow ensures you will always know exactly what date Thanksgiving falls on, and that everybody will need firm timelines around that date.
2) You work on a product that also gets marketing. There's a lead time of several months for a good marketing effort with coordinated press, and you really don't want to have all that lined up and then blow your timeline.
If you can help it, at all, learn estimation. Sure, don't share it if your management is incompetent at handling estimations, but the ability to predict a timeline with error bars is extremely useful. Practice by yourself. You'll be happy you did.
(And if your estimates are reasonably correct and you have decent management, you experience magic like "OK, then let's cut scope" or "Is there anybody who'd accelerate this if they were on your team". With a heavy nod to the fact that there are not enough managers who can pull off that magic - because they never understood estimation)
If you run estimates with an abstracted unit of measurement ("ideal developer days," "story points," "cups of coffee," "t-shirt sizes," etc.), then you get three important superpowers for this.
1) Your long-range timelines come with a specific, quantified error margin. It's no longer "we'll be done by December 15," it's "we'll be done by December 15 with 95% confidence, or January 15 with 99% confidence." And because those confidence intervals come from real data, they carry real weight.
2) your estimates have explicit conditions built in, most notably "based on our current understanding of the work." The door is already explicitly open to respond to feature requests, changes, or just new information with an estimate change.
3) Your estimate adjusts very quickly as new information arrives, so the conversations from #2 happen early rather than at the deadline.
You can make commitments under those circumstances a lot more easily.
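The confidence-dated timeline in point 1 can be produced by resampling historical throughput. A minimal Monte Carlo sketch, with all the numbers invented for illustration:

```python
import random

random.seed(0)

# Points completed in each past sprint (invented historical data).
past_sprints = [21, 13, 18, 25, 16, 20]
remaining_points = 120

def sprints_needed(trials=10_000):
    """Monte Carlo: resample past velocity until the backlog is done."""
    results = []
    for _ in range(trials):
        done, sprints = 0, 0
        while done < remaining_points:
            done += random.choice(past_sprints)  # a simulated sprint
            sprints += 1
        results.append(sprints)
    return sorted(results)

results = sprints_needed()
p50 = results[len(results) // 2]          # median outcome
p95 = results[int(0.95 * len(results))]   # pessimistic outcome
print(f"50% confidence: {p50} sprints, 95% confidence: {p95} sprints")
```

Multiply the sprint counts by your sprint length and you get exactly the "December 15 with 95% confidence, January 15 with 99%" style of statement.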
No idea why "story points" or "cups of coffee" or "shirt sizes" should have much relation to time. I mean... I get it, but... many places I've worked also go to pains to say "this isn't hours, we're just estimating relative complexity". But plenty of issues are extremely complex yet only take a few days, versus items which are less complex but larger (touching many files, screens, etc.).
With that rubric, 8 story points might take 3-4 days of focused concentration, and 3 story points might take 5-6 days of less focused but more brute-force work. Nowhere I've worked accepts that as legitimate; they want to redefine the language into something that approximates time. So... why not just estimate days or hours in the first place?
Both an estimate of "large shirt" and "30 hours" can have 'explicit conditions built in'. This will be 30 hours with my current understanding of the request. If that understanding changes, 30 hours will change. I don't think you need 'shirts' for that?
I can easily make commitments if the people I'm committing to are fine with a change in the dates. That's a big 'if', and not one that plays out positively most times.
This is the rub, because most places I've been at, the commitment ends up being treated as a deadline, because... that seems to be how people work. "Dec 15 with 90% confidence" becomes "dec 15" and other parties start making plans and decisions based on "dec 15" without any consultation or being looped in to the process, and when 'dec 15' has to become 'jan 10', many many people are impacted and generally upset.
Because humans are extremely bad at estimating time, which is borne out by studies many times over. A good overview is the original research on the planning fallacy by Kahneman and Tversky (yes, the same Kahneman, later a Nobel laureate, who wrote the HN favorite "Thinking, Fast and Slow"). The broad stroke is: the very best time estimators in the very best circumstances still underestimate their time needs by 33%. The norm is more like 80%. They propose a few time-estimation strategies to get around it, like "third party estimation" and "tripartite estimation". But the simplest approach (which emerged in later research) is to ask people to estimate the "size" of a task, and use statistical correlation to convert that to a number.
This last is hand-wavy unless you're familiar with the law of large numbers, the law that makes casinos profitable. A casino cannot (without cheating) determine the outcome of a single roulette spin. But it can predict with extremely high certainty the aggregate outcome of a thousand spins. It is the same with your estimates. You can't predict the correlation to time of a single story point. As you pointed out, sometimes something that looked complicated turns out to be easy and vice versa. But given a sufficient sample size (of estimates with a consistent correlation to time), you can predict with extreme accuracy the time for 1000 story points.
"Consistent correlation to time" is a bit of a PITA in a group, BTW. If you have developers do their own estimations individually, each one will have a different correlation to time, and you would need a very large sample size to overcome that much variation. This is why so many systems encourage team estimation: the consistency then depends on the team dynamic, which is much more stable even when adding or removing engineers. And if the same person or team always writes and estimates your tasks, their consistency serves the same purpose, since their story sizing will be consistent.
FWIW by sufficient sample size, I mean after about 3 sprints (of any duration) you can make reasonable predictions. After 6 sprints you'll have confusing outliers, and after about 9 sprints it will be clear with some numerical weight to it.
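The casino point can be made concrete with a tiny simulation (the hours-per-point conversion and the noise model here are invented purely for illustration):

```python
import random

random.seed(1)

def hours_for_story(points):
    """Each point converts to a noisy number of hours: mean 4, range 1-7."""
    return sum(random.uniform(1, 7) for _ in range(points))

# A single 3-point story can land almost anywhere in 3..21 hours:
# individually, story points predict nothing.
singles = [hours_for_story(3) for _ in range(5)]

# But 1000 points in aggregate cluster tightly around 4 * 1000 hours,
# exactly like a thousand roulette spins.
batches = [hours_for_story(1000) for _ in range(20)]
spread = (max(batches) - min(batches)) / (4 * 1000)
print(f"aggregate spread: {spread:.1%} of the expected total")
```

The per-story noise never goes away; it just averages out, which is why the correlation only has to be consistent, not precise.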
Which brings up question 2, "the commitment ends up being a deadline". This is a human nature thing, you're right! But the problem isn't a mismatch between human nature and your estimate. The mismatch is between human nature and the uncertainty of reality. How you push to improve this is contextual to your org. In hard situations I reverse the statement of my estimate, to "if we set dec 15 as the deadline, there's a 5% chance we won't make it. What's our fallback?" Asking that question a lot is helpful. But there's no magic bullet to making leadership - or worse, people who are afraid of leadership - plan appropriately for uncertainty. The best you can do is expose the uncertainty as clearly as possible, and give lots of lead time for the times when they still run into conflict between deadline, resources, and scope. After that, it's the manager's job to "manage" things and decide which variable they will alter to break the conflict.
Put another way: reality is uncertain. When that uncertainty leads to a conflict between deadline, scope, and available resources - because that will happen sometimes per point 1 - only someone with deadline, scope, or hiring authority can solve it. That's (usually) not within your purview as a lead engineer. The best you can do is to 1) call out the uncertainty as clearly as you can, as early as you can, and 2) signal that conflict as early as you can, so those managers have maximum leeway. Abstracted estimation makes that possible. Guesses and hopes don't.
Yes, you can do the same math if your stories are consistently sized, or if you have a sufficiently large data set. All you need is a consistent unit that is related to complexity, risk, and time. The less precise that relation, the larger sample size you need.
For many teams who get requests from external stakeholders in widely varied technical environments - ie consulting, often - estimation is functionally just a conversion process to a consistent unit. But you're absolutely right that for some teams the stories themselves are a good enough unit.
I think the demand for timeline commitments comes from a worry that otherwise the request will never be done, or that it will languish in the queue behind less important things.
Where possible, I try to frame my relationship with my stakeholders in terms of their priority order for their requests of me, and to demonstrate consistent progress on that stream. Thus, we have an understanding that the things they ask for do get done, and if they want a particular one done sooner, they can move it up in the stack rank.
Could not agree more with this, and it can scale to small teams. Establishing consistent “velocity” and demonstrating progress is way more important than estimating individual features.
You could start by not doing estimates? Instead, take an educated guess of the complexity of the problem and then ask how long the enterprise is willing to commit to implementing it.
Estimates are just a way of delegating responsibility downwards. The organisation is rarely asked to estimate how much a feature is worth, despite this being probably the more important factor.
Imagine a timeline from left to right. On the left is where there's the least knowledge about the project, and on the right the most (when the project is done). Estimates are made at the far left, when the developers know the least.
So as you move along the timeline and more knowledge is discovered, you need to reset expectations frequently.
If you need to actually make meaningful commitments, you need to track how long stories/projects/whatever actually take, and use the data to make a prediction. Subjective “estimates” can be an input, but they’re basically guesses.
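A minimal sketch of that tracking, assuming t-shirt sizes as the unit (the size labels and day counts are invented):

```python
from collections import defaultdict
from statistics import median

# Log how long each story size actually took, then forecast from the
# history rather than from gut feel.

history = defaultdict(list)  # size label -> actual days taken
for size, days in [("S", 1), ("S", 2), ("M", 3), ("M", 5), ("M", 4),
                   ("L", 8), ("L", 13), ("L", 9)]:
    history[size].append(days)

def forecast(size):
    """Typical (median) and pessimistic (max observed) days for a size."""
    data = history[size]
    return median(data), max(data)

mid, worst = forecast("M")
print(f"M stories: typically {mid} days, up to {worst} observed")
```

The subjective estimate only picks the bucket; the prediction itself comes from the recorded actuals, which is what makes the commitment meaningful.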