You're proving his point, which is that people's intuition is often dangerously wrong. The sample size is actually often nowhere near as important as your sampling method. For an extreme example, I can conduct a billion coin flips with a weighted coin, but it's going to tell me jack about the general behavior of coin flips. Avoiding sample bias is a hard problem, far harder than most people appreciate.
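A minimal simulation of the weighted-coin point, assuming a hypothetical coin that lands heads 70% of the time. Adding more flips only makes the estimate converge more tightly on that coin's bias. It never converges on the 50% you'd expect from coins in general, because the problem is the sample, not its size.

```python
import random


def fraction_heads(n_flips: int, p_heads: float, seed: int = 0) -> float:
    """Flip a coin that lands heads with probability p_heads, n_flips times,
    and return the observed fraction of heads."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    heads = sum(1 for _ in range(n_flips) if rng.random() < p_heads)
    return heads / n_flips


if __name__ == "__main__":
    # Hypothetical biased coin: 70% heads. Bigger samples just pin down 0.7
    # more precisely; they never move the estimate toward 0.5.
    for n in (100, 10_000, 1_000_000):
        print(f"{n:>9} flips -> {fraction_heads(n, p_heads=0.7):.4f} heads")
```
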
To complicate matters further, "accuracy" is an extremely tricky concept in statistical analysis, because error rates work very differently than they do in, say, physics. In physics, when you measure something you can be sure that your results are accurate, so long as you account for your instrument's margin of error. In stats, your confidence interval just tells you how likely it is that your results are completely wrong, or, even worse, wrong by a completely unknown amount. Every statistical inference you make has a chance of completely blowing up on you. That chance can be defined and reduced, but it can never be eliminated. There are also things like frequentist vs. Bayesian statistics, where the same data can be interpreted in completely different ways.
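For reference, here is a quick sketch of what a confidence interval does guarantee, assuming normally distributed data with a known standard deviation (every parameter below is hypothetical). The guarantee is about the procedure: across many repeated samples, roughly 95% of the 95% intervals contain the true mean and the rest miss it. Raising the confidence level (a larger `z`) shrinks the miss rate but never drives it to zero, which is the "defined and reduced, but never eliminated" part.

```python
import random
import statistics


def ci_coverage(true_mean: float = 10.0, sigma: float = 2.0, n: int = 30,
                trials: int = 10_000, z: float = 1.96, seed: int = 0) -> float:
    """Draw `trials` samples of size n from Normal(true_mean, sigma), build a
    z-interval (sigma treated as known) around each sample mean, and return
    the fraction of intervals that contain true_mean."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # z = 1.96 -> nominal 95% interval; z = 2.576 -> nominal 99% interval.
    for z in (1.96, 2.576):
        cov = ci_coverage(z=z)
        print(f"z={z}: coverage {cov:.3f}, miss rate {1 - cov:.3f}")
```
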
Zed may be a jerk sometimes, but on this topic he's dead right. Most programmers are far more confident about this stuff than they should be.
After doing some research, I think I've completely abused the notion of the confidence interval. Which I think just helps to prove Zed's point. This stuff be hard.