I was surfing the web and stumbled across a fascinating example of the application of Bayesian statistics that I thought had some pedagogical power to it. The original post, which is self-admittedly excruciating, is here. In any case, here’s the data for the example:

1.) 1% of women over the age of 40 who participate in routine screening have breast cancer

2.) 80% of women with breast cancer will get positive mammographies

3.) 9.6% of women *without* breast cancer also get positive mammographies

Question: A woman in this age group has a positive mammography. What is the likelihood she has cancer?

Apparently most doctors get the answer to this question wrong. Perhaps surprisingly (depending upon how your brain naturally processes these statistics), the typical answer is that the likelihood the woman has breast cancer is somewhere between 70% and 80%. Before I show you the correct answer, I will note that only 15% of doctors get it right as worded above. If it is instead worded as follows, 46% of doctors get it correct:

1.) 100 out of 10,000 women over the age of 40 who participate in routine screening have breast cancer

2.) 80 of 100 women with breast cancer will get a positive mammography

3.) 950 of 9900 women *without* breast cancer will also get positive mammographies

Question: A woman in this age group has a positive mammography. What is the likelihood she has cancer?

The correct answer is 7.8%. How do you get that result? Well, the total number of women who get *positive* mammographies is 950 + 80 = 1030 (notice that it doesn't really matter how many women get a mammography at all, just how many of those who had one got a positive result). But only 80 of them actually turned out to have cancer. Therefore:

Pr(cancer | positive mammography) = 80/1030 ≈ 0.0777, or 7.8%
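The same answer falls out of Bayes' theorem applied directly to the percentage version of the data. A minimal sketch (the variable names are mine, not from the original post; the percentage inputs give 7.76% rather than exactly 80/1030 because 950/9900 is only approximately 9.6%):

```python
# Base-rate numbers from the post, stated as probabilities.
p_cancer = 0.01                # 1% of screened women have breast cancer
p_pos_given_cancer = 0.80      # 80% of women with cancer test positive
p_pos_given_no_cancer = 0.096  # 9.6% of women without cancer test positive

# Law of total probability: overall chance of a positive mammography.
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_no_cancer * (1 - p_cancer))

# Bayes' theorem: Pr(cancer | positive mammography).
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos

print(round(p_cancer_given_pos, 4))  # → 0.0776
```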

What's the key to Bayesian statistics? The key is prior knowledge. Bayesian probabilities can easily be updated when the given information changes, in a sense because there is some correlation or link between certain quantities. A typical probability is interpreted as a measure of how frequent an event is. So if you roll a pair of dice 10 times in a row and come up with a roll of four more often than any other roll, you might be tempted to think that four is the most likely roll on any pair of dice, which is patently false (theoretically a roll of seven is the most likely). This is the frequentist interpretation of probabilities. The Bayesian interpretation assigns probabilities to propositions that are uncertain, since it is in some sense a measure of the degree of certainty. There are plenty of instances where the two give the same result, but also plenty of cases where they do not.

In the analysis above it is important not to care about frequencies, just the exact data for the given situation. In the dicing example the number of rolls wouldn't make a difference; rather, a Bayesian analysis might look at the full situation and make an argument from that (in that sense I argue that the idea of counting microstates and macrostates in order to determine probabilities is Bayesian, since it has nothing to do with how frequently an event *occurs* but rather with how many possible combinations are available, that is to say, the amount of knowledge one has).
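The microstate-counting claim for the dice is easy to verify: seven is the most likely sum not because of any observed frequency, but because more of the 36 equally likely (die 1, die 2) microstates produce it. A quick sketch of that count:

```python
from collections import Counter

# Count how many of the 36 equally likely (die1, die2) microstates
# land in each macrostate (the sum of the two faces).
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))

print(counts[7], counts[4])  # → 6 3
```

Six microstates give a sum of seven versus only three for a sum of four, so seven is twice as probable before a single die is ever rolled.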

What does this have to do with Bell's inequalities? Well, in Wigner's derivation of Bell's inequalities he *clearly* uses the frequentist approach to probabilities. Are Bell's inequalities inherently frequentist, then? Not necessarily, since one could take even the Wigner form and assume that information about two of the systems independently depends on (or can be informationally updated from) a third. Plenty of authors have considered this point of view, but the details are beyond this current post.

**Note to my students:** Think you’ve found an elusive macroscopic violation of (A, not B) + (B, not C) ≥ (A, not C)? Post it here!
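Before you post, note why the search is hard: for any classical population in which each member has definite yes/no values for the properties A, B, and C, the inequality holds member by member, since anyone counted in (A, not C) is also counted in (A, not B) or in (B, not C). A brute-force check over all eight property combinations (my own illustration, not part of Wigner's derivation):

```python
from itertools import product

# Check the member-by-member inequality over all 8 definite-property
# combinations: count(A, not B) + count(B, not C) >= count(A, not C).
violations = [
    (a, b, c)
    for a, b, c in product([False, True], repeat=3)
    if int(a and not b) + int(b and not c) < int(a and not c)
]

print(violations)  # → [] : no classical assignment violates the inequality
```

Summing this per-member result over any macroscopic population gives the full inequality, which is why a genuine violation requires the quantum case.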