Bayesian Probability

[Originally posted Oct. 12, 2009]

Spent the early hours of this morning reading a great blog post by Jeff Atwood, which makes reference to Eliezer S. Yudkowsky’s Intuitive Explanation of Bayesian Probability. The timing was good, as I had just finished the (highly recommended) book The Drunkard’s Walk, which agreed with Jeff that most humans simply are not wired to deal with probability very well.

The Drunkard’s Walk: How Randomness Rules Our Lives (Vintage), by Leonard Mlodinow
ISBN: 0307275175
ISBN-13: 9780307275172

Yudkowsky poses the following canonical problem:

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

The frightening thing is that according to Atwood, only 15% of doctors get this right. And they’re off by a lot. That is, the average answer is in the range of 80% while the correct answer is 7.8%. Apparently, there is something about the way we think about the problem that makes 7.8% hard to accept, and Yudkowsky does a great job of walking you through the logic in painfully small steps.

To me, however, there is a pretty straightforward way to think about this (though it may only seem intuitive because I’ve been through this a few times).

What Do We Know & What Does It Imply?

We have three pieces of information:

1% of sample are TRUE (that is, have cancer)

80% of sample who are TRUE will test TRUE

9.6% of sample who are FALSE will test TRUE.

On the face of it, we should guess that the percentage of women who test TRUE who actually are TRUE (test positive and actually have cancer) is pretty small, based on two of the facts provided: only 1% of the women in the sample are TRUE (regardless of testing), and the test gives a false positive for 9.6% of the women who are FALSE, a much larger group.

So, my reasoning to solve this is:

1. Assume we have a sample of 1000 women (I use 1000 to reduce the amount I have to talk about fractional people, but I don’t use 10,000 as I get lost in the zeros).

2. We know that the reality is that of the 1,000 women, 10 will have cancer (1%).

990 = no cancer
10 = cancer

3. Of the 10 who have cancer, 8 will test positive (80%).
8 out of 1,000 women tested will test True and are True.

4. Of the 990 with no cancer 9.6% will also test positive = 990 * .096 = 95.04.
95.04 women out of 1,000 will test True but are False.

5. The total number testing true is 8 + 95.04 = 103.04.
Of these, 8 actually have cancer.

6. So the ratio of is positive (8) to tests positive (103.04) is 8:103.04, or 0.0776, or about 7.8%
(8 of the 103.04 = 8/103.04)
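
The six steps above can be sketched in a few lines of Python. This is only an illustration of the counting argument; the function name `posterior_from_counts` and its parameter names are my own assumptions, not anything from Yudkowsky's article.

```python
def posterior_from_counts(sample_size, base_rate, true_positive_rate, false_positive_rate):
    """Return (true positives, false positives, P(cancer | positive test)).

    Hypothetical helper that mirrors the numbered counting steps in the text.
    """
    with_cancer = sample_size * base_rate                    # step 2: 10 of 1,000
    without_cancer = sample_size - with_cancer               # step 2: 990 of 1,000
    true_positives = with_cancer * true_positive_rate        # step 3: 8
    false_positives = without_cancer * false_positive_rate   # step 4: 95.04
    all_positives = true_positives + false_positives         # step 5: 103.04
    # Step 6: share of positive tests that belong to women who have cancer.
    return true_positives, false_positives, true_positives / all_positives


tp, fp, posterior = posterior_from_counts(1000, 0.01, 0.80, 0.096)
print(f"{tp:.2f} true positives, {fp:.2f} false positives")
print(f"P(cancer | positive test) = {posterior:.1%}")
```

Changing `sample_size` does not change the final percentage, which is why picking 1,000 rather than 10,000 is purely a matter of convenience.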

Not Being Misled

The key to this and many problems like it is to realize that what you are trying to find is the relationship between those who Test positive vs. the reality of those who are positive, which is why you need all three numbers.
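
For readers who prefer a formula, the same relationship is Bayes' theorem. The notation is my own shorthand rather than anything from the original article: C means "has cancer" and + means "tests positive". Each of the three numbers from the problem shows up in it, which is a compact way of seeing why none of them can be dropped.

```latex
\[
\begin{aligned}
P(C \mid +) &= \frac{P(+ \mid C)\,P(C)}{P(+ \mid C)\,P(C) + P(+ \mid \neg C)\,P(\neg C)} \\
            &= \frac{0.80 \times 0.01}{0.80 \times 0.01 + 0.096 \times 0.99} \\
            &= \frac{0.008}{0.10304} \approx 0.078
\end{aligned}
\]
```

The numerator is the "tests positive and is positive" group (8 in 1,000), and the denominator is everyone who tests positive (103.04 in 1,000).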

Probability is Weird and Cancer Is Scary

True story: I have a friend who is the head of breast cancer surgery at a very respected hospital. I was at a dinner party where he mentioned that a “very small percentage of those who show up with a positive test actually have cancer.” This caused a lot of confusion, and that is because we (a) don’t deal with probability well and (b) don’t understand policy tradeoffs as a result.

From a policy point of view, it is too expensive (money, time, etc.) to test everyone with a biopsy, etc. The preliminary screening is sufficient to move your knowledge from the general 1% probability to the more specific 7.8%, which is enough to pursue. The fact that 20% of women with cancer will get a false negative doesn’t mean that it is better to test everyone, and the fact that out of every 1,000 tested 95 will falsely test positive (and be subjected to needless worry) also does not mean that it isn’t worth using the screening test.
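
To make the tradeoff concrete, here is a rough sketch that tallies all four outcomes per 1,000 women screened, using only the three numbers from the problem. The function name `screening_outcomes` and the dictionary keys are hypothetical labels of mine, and the figure for a negative result is derived from those same three numbers rather than quoted from any source.

```python
def screening_outcomes(sample_size=1000, base_rate=0.01,
                       true_positive_rate=0.80, false_positive_rate=0.096):
    """Tally the four possible screening outcomes for a sample (hypothetical helper)."""
    with_cancer = sample_size * base_rate
    without_cancer = sample_size - with_cancer
    true_positives = with_cancer * true_positive_rate
    false_positives = without_cancer * false_positive_rate
    return {
        "true_positives": true_positives,                     # cancer, test positive
        "false_negatives": with_cancer - true_positives,      # cancer, test negative
        "false_positives": false_positives,                   # no cancer, test positive
        "true_negatives": without_cancer - false_positives,   # no cancer, test negative
    }


counts = screening_outcomes()
for name, value in counts.items():
    print(f"{name}: {value:.2f} per 1,000")

# Probability of cancer after each possible test result.
p_after_positive = counts["true_positives"] / (
    counts["true_positives"] + counts["false_positives"])
p_after_negative = counts["false_negatives"] / (
    counts["false_negatives"] + counts["true_negatives"])
print(f"after a positive test: {p_after_positive:.1%}")
print(f"after a negative test: {p_after_negative:.2%}")
```

With these inputs a positive result moves the estimate from 1% to about 7.8%, while a negative result moves it down to roughly 0.2%. Either way the screening has told you something, even though neither result is conclusive.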

That said, it would be good if, before you took the test, you were told that a positive result means you have less than an 8% chance of actually being positive and that in any case, the test doesn’t change the reality of whether or not you are positive!

This kind of confusion leads to people not flying to countries where there has been a terrorist incident but happily driving across country.

It has been estimated that after the 9/11 attack, more people lost their lives by choosing to drive when they otherwise would have flown than were killed as passengers on the four hijacked flights.