"False positive" has a precise meaning in statistical analysis. The second thing you're talking about is useful to know, but calling it the false positive rate is just wrong.
"Out of 1000 positive tests, 50 were false positives, 950 were true positives" - valid statement.
"The false positive rate was 50 out of 1000" - abuses a common technical term in a way that sounds valid on the face of it, but which is potentially VERY misleading.
We can't even calculate a valid false positive rate from the above data, since that requires taking the ratio vs. all tests and not just positive tests.
Here is a definition of a false positive from Wikipedia: "In medical testing, and more generally in binary classification, a false positive is an error in data reporting in which a test result improperly indicates presence of a condition, such as a disease (the result is positive), when in reality it is not."
I don't see anything here that definitively clarifies which of the two scenarios above it can exclusively be applied to.
That's what a "false positive" is, but Wikipedia also has a separate article on "false positive rate", which gives the formula:
FP / (FP + TN)
Where FP is number of false positives, and TN is number of true negatives. So it's a third option:
- Out of 1000 actually negative samples, 50 were tested as positive.
So in the case of 1000 samples, 949 correctly testing as negative, 50 incorrectly testing as positive, and 1 correctly testing as positive, the false positive rate is 50 / 999.
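That arithmetic can be checked directly. A minimal sketch (variable names are my own), using the counts from the example above:

```python
# Counts from the 1000-sample example above:
tn = 949  # true negatives: correctly tested negative
fp = 50   # false positives: incorrectly tested positive
tp = 1    # true positives: correctly tested positive

actual_negatives = fp + tn           # 999 samples are truly negative
fpr = fp / actual_negatives          # 50 / 999
print(f"{fpr:.5f}")                  # ≈ 0.05005
```

Note the denominator is 999 (the actually-negative samples), not 1000 (all tests): the one true positive doesn't belong in it.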
Right; this is the definition of the numerator (the number of false positives). The false positive RATE also has a denominator, which is defined as the total number of tests performed (the second case in the parent poster's question).
Dividing by the number of positive tests instead gives what's called the 'false discovery rate' which is pretty rarely used.
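The distinction is easy to see side by side. A small sketch (function names are my own) computing both rates from the same counts:

```python
def false_positive_rate(fp, tn):
    """FP / (FP + TN): positives among the actually-negative samples."""
    return fp / (fp + tn)

def false_discovery_rate(fp, tp):
    """FP / (FP + TP): wrong results among the positive tests."""
    return fp / (fp + tp)

# Using the 1000-sample example above (50 FP, 949 TN, 1 TP):
fp, tn, tp = 50, 949, 1
print(false_positive_rate(fp, tn))   # 50/999 ≈ 0.050
print(false_discovery_rate(fp, tp))  # 50/51  ≈ 0.980
```

Same 50 false positives, but the two rates differ by a factor of almost twenty, which is why conflating them is so misleading.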
The traditional way (that I'm aware of) of defining the false positive rate is derived from the conditional probability of a positive prediction given that the true underlying state is negative:
False Pos. Rate = P(predict + | state −)
                = P(predict + and state −) / P(state −)
                = P(predict + and state −) / (P(predict + and state −) + P(predict − and state −))
                ≈ (#FP / N) / (#FP / N + #TN / N)    (N = sample size)
                = #FP / (#FP + #TN)
The latter quantity is usually given as the definition of false positive rate. Roughly speaking, it's the ratio of how often you predict positive when the state is negative versus how often the state is negative.
The first way is wrong. To see why clearly: if you apply 1000 tests to a sample in which everyone is ill, there are 0 false positives, because a false positive requires the person to actually be healthy.
So to estimate the false positive rate mathematically, you should apply the test only to healthy people; then the proportion positive/total is an estimate of the false positive rate.
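That estimation procedure can be sketched as a simulation. This assumes a hypothetical test whose per-test false positive probability is 5% (my own made-up figure):

```python
import random

random.seed(42)

TRUE_FPR = 0.05      # assumed chance a healthy person tests positive
n_healthy = 100_000  # test only people known to be healthy

# Every positive result here is, by construction, a false positive.
positives = sum(random.random() < TRUE_FPR for _ in range(n_healthy))
estimated_fpr = positives / n_healthy
print(estimated_fpr)  # lands near 0.05
```

Because the tested population contains no ill people, positives/total converges to exactly the conditional probability P(predict + | state −) from the derivation above.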
- Out of 1000 tests 51 will be positive, 50 of which are incorrect.
- Out of 1000 positive tests, 50 will be wrong, 950 will be correct.
The two interpretations give vastly different results.
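Just how different becomes clear when you fill in plausible counts for each reading. A sketch, assuming (my own assumption) that the non-positive tests in the first scenario are all correct:

```python
# Interpretation 1: 1000 tests total, 51 positive, 50 of those wrong.
fp1, tp1, tn1 = 50, 1, 949
fdr1 = fp1 / (fp1 + tp1)   # false discovery rate: 50/51  ≈ 0.98
fpr1 = fp1 / (fp1 + tn1)   # false positive rate:  50/999 ≈ 0.05

# Interpretation 2: 1000 *positive* tests, 50 wrong, 950 correct.
fp2, tp2 = 50, 950
fdr2 = fp2 / (fp2 + tp2)   # false discovery rate: 50/1000 = 0.05
# fpr2 cannot be computed: we don't know how many negative results there were.

print(fdr1, fpr1, fdr2)
```

In the first scenario "50 out of 1000" happens to match the false positive rate; in the second it is the false discovery rate, and the false positive rate isn't even computable from the given numbers.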