Hacker News

It's a neat math trick, but it seems more accurate to say this lets you calculate bad estimates on the back of a napkin. Unless you really think food carts carry $5,000 in change.

The quantitative work I do involves measuring latency, where the minimum, median, 90th-percentile, and 99th-percentile values are more meaningful than the mean or standard deviation. Programs typically have a best-case scenario (everything cached) and a long one-sided tail.
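A quick sketch of why those summary points beat the mean for latency, using hypothetical lognormal samples as a stand-in for real measurements (the distribution and parameters are illustrative assumptions, not anyone's actual data):

```python
import random
import statistics

# Hypothetical latency data with a long one-sided tail (lognormal);
# real numbers would come from actual measurements.
random.seed(0)
samples_ms = sorted(random.lognormvariate(mu=2.0, sigma=1.0) for _ in range(10_000))

def percentile(sorted_xs, p):
    """Nearest-rank percentile of an already-sorted sample."""
    idx = min(len(sorted_xs) - 1, int(p / 100 * len(sorted_xs)))
    return sorted_xs[idx]

print(f"min    {samples_ms[0]:8.2f} ms")
print(f"median {percentile(samples_ms, 50):8.2f} ms")
print(f"p90    {percentile(samples_ms, 90):8.2f} ms")
print(f"p99    {percentile(samples_ms, 99):8.2f} ms")
print(f"mean   {statistics.mean(samples_ms):8.2f} ms")
```

With a one-sided tail like this, the mean lands well above the median, so "average latency" describes an experience few requests actually have; the percentile points track the shape.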



Saying it's silly to think that food carts have $5,000 in change is unproductive: I was illustrating how the calculation works, not how the economics of food carts works, and the numbers I used were not intended to reflect reality. (1,000 customers in a day? Not likely; my guess is 200 for the busiest food carts.)

But it's good to have bad estimates; at least, it's better to have bad estimates than no estimates at all. I'm not saying that standard deviation is a substitute for more thorough analysis, just that standard deviation is an improvement over talking about the mean alone.

Another example: We'd like to hire you; the mean number of hours per week you'd work is 40.

Versus:

We'd like to hire you; the mean number of hours per week you'd work is 40, and the standard deviation is 15. So your bad estimate is that you'd have two 70-hour weeks each year. But it's better than no estimate.
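That back-of-napkin number can be checked, if you assume the weekly hours are normally distributed (an assumption the offer itself doesn't give you): 70 hours is exactly two standard deviations above the mean, and the normal upper tail beyond 2σ is about 2.3%.

```python
import math

# Hypothetical offer from the comment above: mean 40 h/week, stddev 15 h/week.
mean, sd = 40.0, 15.0
threshold = 70.0  # two standard deviations above the mean

z = (threshold - mean) / sd
# Upper-tail probability of a normal via the complementary error function.
p_over = 0.5 * math.erfc(z / math.sqrt(2))

weeks_per_year = 52 * p_over
print(f"z = {z:.1f}, P(week >= 70h) = {p_over:.4f}")
print(f"expected 70-hour weeks per year = {weeks_per_year:.1f}")
```

On this model the answer comes out nearer one such week per year than two, and a different distribution would give a different count entirely, which is exactly why it's a bad estimate; the point stands that it still beats no estimate.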


Sure, two points are better than one, but what's special about two? I'd rather have a graph. We have computers, so there's rarely a reason to compress the data that much.


We often have to compress the data down to a single decision or statistic: a yes/no on whether to accept the job offer, how much money to save before buying a house, or the probability that I'll die in the next 10 years.

I hate to quote XKCD, but it's like saying your favorite map projection is a globe (http://xkcd.com/977/). Yes, you've preserved all the data, but even with computers, your beloved graph will not make it all the way to the final decision.


Preserving all the data is the logical endpoint but that's not what I was suggesting. I'm just saying there's nothing special about keeping two points.

I'd rather not feed two points to my decision algorithm, whether it's machine learning or a human looking at the data. It makes more sense to make some attempt to preserve the shape of the graph unless you have strong reason to believe it's Gaussian, and even then the assumption should be checked.
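One cheap way to check that Gaussian assumption, sketched here on a hypothetical skewed sample (exponential, standing in for whatever your real data is): summarize the data as mean and standard deviation, predict a tail quantile under the normal model, and compare it with the empirical quantile.

```python
import random
import statistics

random.seed(1)
# Hypothetical skewed data; in practice these would be your measurements.
data = sorted(random.expovariate(1.0) for _ in range(10_000))

mu = statistics.mean(data)
sd = statistics.stdev(data)

# The normal model puts the 99th percentile at mean + 2.326 * stddev.
z99 = 2.326
normal_p99 = mu + z99 * sd
empirical_p99 = data[int(0.99 * len(data))]

print(f"normal-model p99: {normal_p99:.2f}")
print(f"empirical p99:    {empirical_p99:.2f}")
# A large gap is a hint that two numbers (mean, sd) are losing the shape.
```

For this skewed sample the empirical 99th percentile sits well above what the two-number summary predicts, which is exactly the kind of mismatch that should send you back to the full graph.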



