"False positive" has a precise meaning in statistical analysis. The second thing you're talking about is useful to know, but calling it the false positive rate is just wrong.
"Out of 1000 positive tests, 50 were false positives, 950 were true positives" - valid statement.
"The false positive rate was 50 out of 1000" - abuses a common technical term in a way that sounds valid on the face of it, but which is potentially VERY misleading.
We can't even calculate a valid false positive rate from the above data, since that requires taking the ratio vs. all tests and not just positive tests.
Here is a definition of a false positive from Wikipedia: "In medical testing, and more generally in binary classification, a false positive is an error in data reporting in which a test result improperly indicates presence of a condition, such as a disease (the result is positive), when in reality it is not."
I don't see anything here that definitively clarifies which of the two scenarios above it can exclusively be applied to.
That's what a "false positive" is, but Wikipedia also has a separate article on "false positive rate", which gives the formula:
FP / (FP + TN)
Where FP is number of false positives, and TN is number of true negatives. So it's a third option:
- Out of 1000 actually negative samples, 50 were tested as positive.
So in the case of 1000 samples, 949 correctly testing as negative, 50 incorrectly testing as positive, and 1 correctly testing as positive, the false positive rate is 50 / 999.
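That arithmetic can be checked directly. A minimal sketch (variable names are my own), using the counts from the example above:

```python
# Counts from the 1000-sample example above:
tn = 949  # true negatives: correctly tested negative
fp = 50   # false positives: incorrectly tested positive
tp = 1    # true positives: correctly tested positive

actual_negatives = fp + tn           # 999 samples are truly negative
fpr = fp / actual_negatives          # 50 / 999
print(f"{fpr:.5f}")                  # ≈ 0.05005
```

Note the denominator is 999 (the actually-negative samples), not 1000 (all tests): the one true positive doesn't belong in it.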
Right; this is the definition of the numerator (the number of false positives). The false positive RATE also has a denominator, which is defined as the total number of tests performed (the second case in the parent poster's question).
Dividing by the number of positive tests instead gives what's called the 'false discovery rate' which is pretty rarely used.
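The distinction is easy to see side by side. A small sketch (function names are my own) computing both rates from the same counts:

```python
def false_positive_rate(fp, tn):
    """FP / (FP + TN): positives among the actually-negative samples."""
    return fp / (fp + tn)

def false_discovery_rate(fp, tp):
    """FP / (FP + TP): wrong results among the positive tests."""
    return fp / (fp + tp)

# Using the 1000-sample example above (50 FP, 949 TN, 1 TP):
fp, tn, tp = 50, 949, 1
print(false_positive_rate(fp, tn))   # 50/999 ≈ 0.050
print(false_discovery_rate(fp, tp))  # 50/51  ≈ 0.980
```

Same 50 false positives, but the two rates differ by a factor of almost twenty, which is why conflating them is so misleading.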
The traditional way (that I'm aware of) of defining the false positive rate is derived from the conditional probability of a positive prediction given that the true underlying state is negative:
False Pos. Rate = P(predict + | state −)
                = P(predict + and state −) / P(state −)
                = P(predict + and state −) / (P(predict + and state −) + P(predict − and state −))
                ≈ (#FP / N) / (#FP / N + #TN / N)    (N = sample size)
                = #FP / (#FP + #TN)
The latter quantity is usually given as the definition of false positive rate. Roughly speaking, it's the ratio of how often you predict positive when the state is negative versus how often the state is negative.
The first way is wrong. To see why clearly: if you apply 1000 tests to a sample in which everyone is ill, there are 0 false positives, because a false positive requires the person to actually be healthy.
So to estimate the false positive rate mathematically, you should apply the test only to healthy people; then the proportion positive/total is an estimate of the false positive rate.
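That estimation procedure can be sketched as a simulation. This assumes a hypothetical test whose per-test false positive probability is 5% (my own made-up figure):

```python
import random

random.seed(42)

TRUE_FPR = 0.05      # assumed chance a healthy person tests positive
n_healthy = 100_000  # test only people known to be healthy

# Every positive result here is, by construction, a false positive.
positives = sum(random.random() < TRUE_FPR for _ in range(n_healthy))
estimated_fpr = positives / n_healthy
print(estimated_fpr)  # lands near 0.05
```

Because the tested population contains no ill people, positives/total converges to exactly the conditional probability P(predict + | state −) from the derivation above.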
- Out of 1000 tests 51 will be positive, 50 of which are incorrect.
- Out of 1000 positive tests, 50 will be wrong, 950 will be correct.
The two interpretations give vastly different results.
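Just how different becomes clear when you fill in plausible counts for each reading. A sketch, assuming (my own assumption) that the non-positive tests in the first scenario are all correct:

```python
# Interpretation 1: 1000 tests total, 51 positive, 50 of those wrong.
fp1, tp1, tn1 = 50, 1, 949
fdr1 = fp1 / (fp1 + tp1)   # false discovery rate: 50/51  ≈ 0.98
fpr1 = fp1 / (fp1 + tn1)   # false positive rate:  50/999 ≈ 0.05

# Interpretation 2: 1000 *positive* tests, 50 wrong, 950 correct.
fp2, tp2 = 50, 950
fdr2 = fp2 / (fp2 + tp2)   # false discovery rate: 50/1000 = 0.05
# fpr2 cannot be computed: we don't know how many negative results there were.

print(fdr1, fpr1, fdr2)
```

In the first scenario "50 out of 1000" happens to match the false positive rate; in the second it is the false discovery rate, and the false positive rate isn't even computable from the given numbers.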