Hacker News

It's a neat math trick, but it seems more accurate to say this lets you calculate bad estimates on the back of a napkin. Unless you really think food carts carry $5,000 in change.

The quantitative work I do involves measuring latency, where the minimum, median, 90th-percentile, and 99th-percentile values are more meaningful than the mean or standard deviation. Programs typically have a best-case scenario (everything cached) and a long one-sided tail.
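A quick sketch of why those summary points beat the mean for latency, using hypothetical lognormal samples as a stand-in for real measurements (the distribution and parameters are illustrative assumptions, not anyone's actual data):

```python
import random
import statistics

# Hypothetical latency data with a long one-sided tail (lognormal);
# real numbers would come from actual measurements.
random.seed(0)
samples_ms = sorted(random.lognormvariate(mu=2.0, sigma=1.0) for _ in range(10_000))

def percentile(sorted_xs, p):
    """Nearest-rank percentile of an already-sorted sample."""
    idx = min(len(sorted_xs) - 1, int(p / 100 * len(sorted_xs)))
    return sorted_xs[idx]

print(f"min    {samples_ms[0]:8.2f} ms")
print(f"median {percentile(samples_ms, 50):8.2f} ms")
print(f"p90    {percentile(samples_ms, 90):8.2f} ms")
print(f"p99    {percentile(samples_ms, 99):8.2f} ms")
print(f"mean   {statistics.mean(samples_ms):8.2f} ms")
```

With a one-sided tail like this, the mean lands well above the median, so "average latency" describes an experience few requests actually have; the percentile points track the shape.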



Saying it's silly to think that food carts have $5,000 in change is unproductive: I was illustrating how the calculation works, not how the economics of food carts works, and the numbers I used were not intended to reflect reality. (1,000 customers in a day? Not likely; my guess is 200 for the busiest food carts.)

But it's good to have bad estimates; at least, it's better to have bad estimates than no estimates at all. I'm not saying that standard deviation is a substitute for more thorough analysis, just that standard deviation is an improvement over talking about the mean alone.

Another example: We'd like to hire you; the mean number of hours per week you'd work is 40.

Versus:

We'd like to hire you; the mean number of hours per week you'd work is 40, and the standard deviation is 15. So your bad estimate is that you'd have two 70-hour weeks each year. But it's better than no estimate.
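That back-of-napkin number can be checked, if you assume the weekly hours are normally distributed (an assumption the offer itself doesn't give you): 70 hours is exactly two standard deviations above the mean, and the normal upper tail beyond 2σ is about 2.3%.

```python
import math

# Hypothetical offer from the comment above: mean 40 h/week, stddev 15 h/week.
mean, sd = 40.0, 15.0
threshold = 70.0  # two standard deviations above the mean

z = (threshold - mean) / sd
# Upper-tail probability of a normal via the complementary error function.
p_over = 0.5 * math.erfc(z / math.sqrt(2))

weeks_per_year = 52 * p_over
print(f"z = {z:.1f}, P(week >= 70h) = {p_over:.4f}")
print(f"expected 70-hour weeks per year = {weeks_per_year:.1f}")
```

On this model the answer comes out nearer one such week per year than two, and a different distribution would give a different count entirely, which is exactly why it's a bad estimate; the point stands that it still beats no estimate.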


Sure, two points are better than one, but what's special about two? I'd rather have a graph. We have computers, so there's rarely a reason to compress the data that much.


We often have to compress the data down to a single decision or statistic: a yes/no on whether to accept the job offer, how much money to save before buying a house, or the probability that I'll die in the next 10 years.

I hate to quote XKCD, but it's like saying your favorite map projection is a globe (http://xkcd.com/977/). Yes, you've preserved all the data, but even with computers, your beloved graph will not make it all the way to the final decision.


Preserving all the data is the logical endpoint but that's not what I was suggesting. I'm just saying there's nothing special about keeping two points.

I'd rather not feed two points to my decision algorithm, whether it's machine learning or a human looking at the data. It makes more sense to make some attempt to preserve the shape of the graph unless you have strong reason to believe it's Gaussian, and even then the assumption should be checked.
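One cheap way to check that Gaussian assumption, sketched here on a hypothetical skewed sample (exponential, standing in for whatever your real data is): summarize the data as mean and standard deviation, predict a tail quantile under the normal model, and compare it with the empirical quantile.

```python
import random
import statistics

random.seed(1)
# Hypothetical skewed data; in practice these would be your measurements.
data = sorted(random.expovariate(1.0) for _ in range(10_000))

mu = statistics.mean(data)
sd = statistics.stdev(data)

# The normal model puts the 99th percentile at mean + 2.326 * stddev.
z99 = 2.326
normal_p99 = mu + z99 * sd
empirical_p99 = data[int(0.99 * len(data))]

print(f"normal-model p99: {normal_p99:.2f}")
print(f"empirical p99:    {empirical_p99:.2f}")
# A large gap is a hint that two numbers (mean, sd) are losing the shape.
```

For this skewed sample the empirical 99th percentile sits well above what the two-number summary predicts, which is exactly the kind of mismatch that should send you back to the full graph.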



