Rationally Speaking is a blog maintained by Prof. Massimo Pigliucci, a philosopher at the City University of New York. The blog reflects the Enlightenment figure Marquis de Condorcet's idea of what a public intellectual (yes, we know, that's such a bad word) ought to be: someone who devotes himself to "the tracking down of prejudices in the hiding places where priests, the schools, the government, and all long-established institutions had gathered and protected them." You're welcome. Please notice that the contents of this blog can be reprinted under the standard Creative Commons license.

## Thursday, November 29, 2012

### Odds again: Bayes made usable

by Ian Pollock

[Note: this post assumes basic familiarity with probability math, and also presupposes a subjectivist view in philosophy of probability.]

Readers of this blog, and of others a few Erdős numbers (Massimo numbers?) away from it, will by now be used to having Bayes’ theorem hammered into their heads all the time, as the Great Equation of Power and the Timeless Secret of the Universe.

I suspect that I am not the only one who has occasionally felt somewhat disingenuous when harping on Bayes. Even though I do actually think it’s the secret of the universe, memorizing the formula is liable to become little more than a signal of in-group identity (along the lines of being able to recite the Nicene Creed or the roster of Local Sports Team), unless people know what it means, and how to sometimes actually maybe possibly use it.

When I talk about “using” Bayes’ theorem, I have a different picture in mind than what you may think. I do not necessarily mean a textbook problem with all the needed information clearly specified and relevant numbers handed to you. What I tend to think of instead are problems like:
“The car in front of me just swerved halfway into my lane. How likely is the driver to be drunk?”
These underspecified problems are the meat of day-to-day probability judgments.

But let’s look at Bayes’ theorem as traditionally presented:

P(H|E) = P(H)•P(E|H) / ( P(E|H)•P(H) + P(E|¬H)•P(¬H) )

[Terminology: P(_) stands for “probability of _,” H stands for “hypothesis,” E stands for “evidence,” the vertical bar stands for “given,” e.g., P(E|H) is the “probability of E given that H is true”, and finally ¬ means “not.”]

This formula is hideous on at least two levels:

First, it has too many terms (some repeating) and too many operations. You end up performing 2 or 3 multiplications, 1 addition, 1 subtraction ( P(¬H) = 1 - P(H) ) and 1 division, in order to get the answer. This does not conduce to doing the arithmetic in your head in real time, unless you are unusually good at arithmetic and have good fluid memory (neither of which applies to me).

Second, and perhaps most importantly, it is conceptually opaque. You do not see the structure of reasoning when you look at Bayes’ theorem in that form; all you see is a porridge of symbols. The “prior” that Bayesians are always harping on about, P(H), appears three separate times, once in the numerator and twice in the denominator, all tangled up with P(E|H) and P(E|¬H) — the “evidence terms.” Granted, the denominator is really just an expansion of P(E), which makes it a bit less opaque. But you can rarely calculate P(E) without doing the expansion.

Notice that when we speak of using Bayes’ theorem we are speaking of modifying (1) prior judgment in the light of (2) evidence to arrive at (3) a new judgment. Ideally, we would like a formula that looks more like:

posterior = prior [operation] evidence

Well, here is Bayes’ theorem in odds form:

O(H|E) = O(H) * P(E|H) / P(E|¬H)

As you can see, it consists of only one division and one multiplication. And lo, O(H) is just the prior odds, and the ratio P(E|H)/P(E|¬H) corresponds to “evidential strength,” although the literature usually calls it a likelihood ratio or a Bayes factor.
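Where the odds form comes from takes only a line of algebra. A short derivation, using only the definitions above: write Bayes’ theorem once for H and once for ¬H, then divide the first by the second. The awkward denominator P(E) is shared by both and cancels, leaving the prior odds times the likelihood ratio.

```latex
\begin{aligned}
P(H \mid E) &= \frac{P(E \mid H)\,P(H)}{P(E)},
\qquad
P(\neg H \mid E) = \frac{P(E \mid \neg H)\,P(\neg H)}{P(E)} \\[6pt]
O(H \mid E) &= \frac{P(H \mid E)}{P(\neg H \mid E)}
= \frac{P(H)}{P(\neg H)} \cdot \frac{P(E \mid H)}{P(E \mid \neg H)}
= O(H) \cdot \frac{P(E \mid H)}{P(E \mid \neg H)}
\end{aligned}
```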

If you’re not used to how odds work, now would be a good time to check out my old article on them, in which for some inscrutable reason I didn’t get round to talking about their advantage in re: Bayes’ theorem. The rest of this article assumes you are moderately comfortable with odds talk.
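Odds and probabilities carry the same information, so switching between them is mechanical. A minimal Python sketch of the conversion, assuming odds of a:b in favor are stored as the single number a/b (a convention of this sketch, not of the post), with `fractions.Fraction` used to keep the arithmetic exact:

```python
from fractions import Fraction


def prob_to_odds(p: Fraction) -> Fraction:
    """Probability -> odds in favor, p / (1 - p); requires 0 <= p < 1."""
    return p / (1 - p)


def odds_to_prob(o: Fraction) -> Fraction:
    """Odds in favor -> probability, o / (1 + o)."""
    return o / (1 + o)


# 20% is odds of 1:4 in favor (4:1 against), stored here as 1/4.
print(prob_to_odds(Fraction(1, 5)))  # 1/4
# Odds of 3:2 in favor, stored as 3/2, correspond to a 60% probability.
print(odds_to_prob(Fraction(3, 2)))  # 3/5
```

The function names are illustrative only; any pair of one-line conversions like these will do.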

Let’s see how Bayes works with an example.

In the classic 1957 film “12 Angry Men” (one of my favorites), a young man is accused of killing his father. One of the pieces of evidence brought against him is the fact that he was identified by a store clerk as having recently purchased a switchblade knife with an unusual handle, and the same kind of knife had been found on the body (wiped of fingerprints). See a nice clip here of the jurors debating the relevance of this piece of evidence.

At first, the unusual character of the knife led the jurors to believe that it was, if not one of a kind, at least very rare. But they are led by the touch of Henry Fonda’s cool hand to modify that assessment and consider the knife a much more commonplace one than they had thought. One of the hawkish jurors then asks petulantly: “Maybe there are ten knives like that, so what?” So what indeed.

We are interested in estimating the odds that the boy is guilty, given that he had purchased a knife the same as the one found at the murder scene — O(guilty|knife). Let us assume that it is certain that the boy did indeed purchase the knife as the store clerk said (actually a very charitable interpretation in the prosecution’s favour).

The first thing we need to think about is our prior. This represents what we think the chance is that the boy committed the murder, before the knife evidence is considered at all. Different people will have different priors, but let us suppose that enough evidence had been presented at trial already to make you consider him 20% likely to be guilty, or odds of 1:4 in favor: O(guilty) = 1:4.

We still need to know two more things.

First, P(knife|guilty) — assuming the boy is guilty, how likely is the knife evidence?
Well, it is not beyond the realm of possibility that the boy could have stabbed his father and disposed of the knife altogether, so even if he is guilty, there is no guarantee of seeing the knife. However, since we know he did buy an identical knife, it is not very surprising to see it at the crime scene if he is guilty. Let us estimate this probability as P(knife|guilty) = 0.6.

We also need to know P(knife|¬guilty) — assuming the boy is innocent, how likely is the knife evidence?

If (as the jurors at first seem to assume) there is only one knife in the whole world that looks like the murder weapon, and we know that the boy bought it, then the only plausible way it could have been the murder weapon and yet the boy be innocent, is if somebody else acquired it from the boy, and then used it to kill the boy’s father. One can understand the hawkish jurors’ impatience with this “possibility.” It requires not only that the boy somehow lost possession of the knife, but that somebody else (coincidentally?) wanted to use it to kill his father in particular. This rates a very low probability, let us say 1000:1 against or P(knife|¬guilty) = 0.001.

Now we have everything we need to figure out the odds of the boy being guilty, given this evidence. We already have the prior — 4:1 against or 1:4 in favor. The “evidential strength” is just the ratio of P(knife|guilty)/P(knife|¬guilty) = 0.6/0.001 = 600. We just multiply the prior by the evidence:

O(guilty|knife) = (1:4)*600 = 150:1 in favor of guilt.
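The knife calculation can be checked mechanically. The sketch below uses the same rough guesses as the example, computes the posterior with the odds form, and confirms that the traditional probability form gives the same answer (150:1 odds is a probability of 150/151, about 99.3%). Variable names are illustrative:

```python
from fractions import Fraction

# The (rough) numbers from the example; Fraction keeps the arithmetic exact.
prior_odds = Fraction(1, 4)                # O(guilty) = 1:4
p_knife_if_guilty = Fraction(6, 10)        # P(knife|guilty) = 0.6
p_knife_if_innocent = Fraction(1, 1000)    # P(knife|¬guilty) = 0.001

# Odds form: one division for the evidence ratio, one multiplication.
likelihood_ratio = p_knife_if_guilty / p_knife_if_innocent   # 600
posterior_odds = prior_odds * likelihood_ratio               # 150, i.e. 150:1

# Cross-check with the traditional probability form of Bayes' theorem.
prior = prior_odds / (1 + prior_odds)      # 1/5, i.e. 20%
posterior = (prior * p_knife_if_guilty) / (
    prior * p_knife_if_guilty + p_knife_if_innocent * (1 - prior)
)
assert posterior == posterior_odds / (1 + posterior_odds)  # both give 150/151

print(posterior_odds, float(posterior))
```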

So far so good, although the three numbers involved can all be quibbled with. But here is where Henry Fonda’s duplicate knife becomes important. It does not really change the top part of the evidence ratio: P(knife|guilty) is about the same. But suddenly that factor of 1000 that was making the boy look so guilty is going to drop, because now we know that the killer had access to lots of identical knives, not just the defendant’s. Now it looks like P(knife|¬guilty) is just the fraction of all knives available in the victim’s neighborhood that look like the murder weapon. We can guess that this is something like 1 in 10. So the evidence ratio becomes 0.6/0.1 = 6, and we multiply by the prior to get

O(guilty|knife) = (1:4)*6 = 3:2 in favor of guilt.

Thus, what Fonda showed is that although the knife is evidence of the boy’s guilt, it is much weaker evidence than the jurors had been led to believe. We do not convict criminals at odds of 3:2, or at least, we ought not to.
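To see how far the verdict moves, the two scenarios can be put side by side. This is a sketch using the example’s own guesses; only P(knife|¬guilty) changes between the “one-of-a-kind knife” picture and the “1 in 10 local knives look alike” picture, and the helper name `posterior_odds` is hypothetical:

```python
from fractions import Fraction


def posterior_odds(prior_odds, p_e_if_h, p_e_if_not_h):
    """Odds-form Bayes: prior odds times the likelihood ratio."""
    return prior_odds * (p_e_if_h / p_e_if_not_h)


prior = Fraction(1, 4)               # O(guilty) = 1:4, as in the example
p_knife_if_guilty = Fraction(6, 10)  # unchanged by Fonda's duplicate knife

# P(knife|¬guilty) before and after the duplicate knife appears.
scenarios = {
    "one-of-a-kind knife": Fraction(1, 1000),
    "1 in 10 local knives look alike": Fraction(1, 10),
}

for label, p_if_innocent in scenarios.items():
    odds = posterior_odds(prior, p_knife_if_guilty, p_if_innocent)
    prob = odds / (1 + odds)
    print(f"{label}: odds {odds} in favor, P(guilty|knife) = {float(prob):.1%}")
```

The first line reports odds of 150 (about 99.3%), the second odds of 3/2, i.e. 3:2, which is only a 60% probability of guilt.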

To address one objection I anticipate: yes, many of the numbers above are very rough guesses. Wherever possible, they should be improved upon by more objective data. But in my defense, notice how mapping out the underlying structure of the reasoning directs inquiry to where it needs to go, rather than to irrelevancies. You can challenge the prior I chose of 4:1 against guilt, by saying that the other evidence presented at trial makes him look a lot more guilty than that. You can challenge the drop in the evidence ratio by checking exactly how many of these knives are sold in nearby shops. These are exactly the questions juries should be thinking about.

Meanwhile, other questions, when seen in a Bayesian light, are obviously non-starters. A bigoted juror in the movie makes much of the boy’s poor background, as if that ought to weigh heavily in favor of his guilt. Unfortunately, while his fellow jurors express their disgust at this man’s prejudice, they fail to notice the obvious silliness of the underlying logic in this case. For if the boy is more likely to commit a crime by virtue of living in a bad neighborhood, so too are all the other people in the neighborhood, leaving the boy’s relative chances of having committed this particular crime approximately the same as they would have been if he had lived in a good neighborhood. Likewise, it is not much good emphasizing the victim’s bad relationship with his son, when he had bad relations with innumerable others.

To recap what we did in our example: we had a prior judgment about how likely the boy was to be guilty, not considering the knife evidence. Then, we considered the evidential strength of the knife evidence, which can be summarized with the phrase: “how much more likely was the evidence if he was guilty, than if he was innocent?”

This way of thinking about uncertainty, while normatively correct, departs from how humans automatically reason about these things in two important ways.

First, it gives equal weight to evidence and to prior. This is important because people constantly forget all about their priors as soon as they see evidence confirming a hypothesis. “I just met Sally. She is very adventurous, a real adrenaline junkie. Is Sally more likely to be a skydiving instructor, or an accountant?” Most people will answer that Sally is probably a skydiving instructor, forgetting that although all skydiving instructors are surely adventurous, there are way more accountants than skydiving instructors (and some accountants are adventurous too). The skeptical community usually sums up the insight that priors matter as much as evidence, with Carl Sagan’s excellent slogan “extraordinary claims require extraordinary evidence,” although they sometimes display a woeful lack of inclination to generalize this principle beyond Bigfoot.
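The Sally example can be run through the odds form too, comparing the two rival hypotheses directly. The numbers below are purely hypothetical, chosen for illustration and not taken from any survey: 100 accountants for every skydiving instructor, every instructor adventurous, and 1 accountant in 10 adventurous.

```python
from fractions import Fraction

# Hypothetical numbers for illustration only (not real data):
prior_odds = Fraction(1, 100)                   # instructor : accountant = 1:100
p_adventurous_if_instructor = Fraction(1)       # "all skydiving instructors are surely adventurous"
p_adventurous_if_accountant = Fraction(1, 10)   # some accountants are adventurous too

# The evidence favors "instructor" tenfold, but the prior favors "accountant" a hundredfold.
posterior_odds = prior_odds * (p_adventurous_if_instructor / p_adventurous_if_accountant)
print(posterior_odds)  # 1/10: still 10:1 that Sally is an accountant
```

Under these assumptions the adventurousness evidence is genuinely strong (a factor of 10), yet Sally remains ten times more likely to be an accountant.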

Second, it emphasizes that what matters is not that evidence be consistent with some hypothesis, but that it be more likely if the hypothesis is true than if it is false. This has the side effect of emphasizing the non-binary nature of evidence. Amanda Knox acted oddly (for example, doing a handstand) after the murder of her roommate Meredith Kercher, about which the prosecution made much hay. The question we now know to ask is, “How much more likely is a person to act oddly after the murder of their friend if they are guilty, as opposed to if they are innocent?”

Um... a little more likely? Maybe twice as likely, at most? Possibly even less likely, as a guilty person might be more careful not to stand out... If this is evidence of guilt at all, it is extremely weak and ambiguous evidence, an evidence ratio of close to 1.

Most of us will not serve on many juries, but the same logic applies, rather famously, to medical tests of various kinds. If I go in for random screening against bowel cancer, and test positive, I am liable to assume that I almost certainly have the disease. However, the questions that really need to be asked at this point are: (a) what’s the base rate in the population (aka, prior) and (b) how much more likely is a positive test if I have the disease than if I don’t?

Wikipedia tells us that Fecal Occult Blood screening for bowel cancer has 67% sensitivity (67% of people with the disease test positive) and 91% specificity (so 9% of people without the disease test positive anyway). This means the evidential strength of a positive test is P(pos_test|cancer)/P(pos_test|¬cancer) = 67/9 ≈ 7. So whatever the prior odds were, multiply them by ~10. [1]

The base rate for bowel cancer looks to be about 54 per 100,000, or around 2000:1 against, so O(cancer|pos_test) = (1:2000)*10 = 1:200 in favor = 200:1 against. As you can see, a positive test is cause for concern, but not panic. You probably don’t have the disease. In fact, you didn’t even need to look up the incidence in this case: all you needed to do was realize that unless 1 in 10 people in your reference class have bowel cancer (surely not!), your odds of having it are less than 50:50.
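For comparison with the guerrilla arithmetic, here is a sketch that does the same calculation exactly, using the sensitivity, specificity and incidence figures quoted above (no other data is assumed):

```python
from fractions import Fraction

sensitivity = Fraction(67, 100)          # P(pos_test|cancer)
false_positive_rate = Fraction(9, 100)   # P(pos_test|¬cancer) = 1 - 0.91 specificity
base_rate = Fraction(54, 100_000)        # incidence figure quoted above

likelihood_ratio = sensitivity / false_positive_rate  # 67/9, about 7.4
prior_odds = base_rate / (1 - base_rate)              # about 1:1851
post_odds = prior_odds * likelihood_ratio
post_prob = post_odds / (1 + post_odds)

print(f"likelihood ratio ~ {float(likelihood_ratio):.2f}")
print(f"posterior odds ~ 1:{float(1 / post_odds):.0f}, probability ~ {float(post_prob):.2%}")
```

Done exactly, the posterior odds come out near 1:249 (about 0.4%), slightly more reassuring than the rounded 1:200, and the conclusion is the same.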

I hope that this reformulation of Bayes, mathematically trivial as it is, serves you as well as it now serves me. Even if you don’t actually calculate (hard to do in the messiness of the real world), knowing how it works is, I think, very epistemically salutary.
_______

[1] 7=10 in guerrilla arithmetic. We spit on your bourgeois Peano axioms.

1. Noam Chomsky, as part of a great interview (non-political, all scientific issues) in the Atlantic, indicated he doesn't have a lot of use for Bayesian thinking, in general, at least as it's used (or overused) today. The interview is very worth reading for dozens of reasons: http://www.theatlantic.com/technology/archive/2012/11/noam-chomsky-on-where-artificial-intelligence-went-wrong/261637/?single_page=true

1. Interesting interview! I hadn't really engaged with Chomsky in some time, since my university days when we would occasionally debate cognitive linguistics vs (Chomsky's) generative grammar.

Let me see if I can summarize his criticism here. Taking his example about weather, he is saying that one approach to predicting weather is to understand weather systems from the perspective of physics, and another is to predict them using statistical methods without bothering to know anything about the internal dynamics of the system. Moreover, the latter can be something of a blind alley, because you may end up achieving your ends (predicting weather) without actually understanding weather.

This is a fair point, but it's not clear to what extent it is a critique of Bayesian methods, as opposed to merely a critique of superficial research programs using those methods.

My own perspective is that Bayes theorem models normatively correct probabilistic reasoning, and so it's kind of obvious that you can apply that reasoning either to superficialities or to understanding the deep structure of weather. If you decide to do the former, there's no point blaming the methods - blame whoever chose the target.

Also note that no matter how good mere theoryless weather prediction gets, it will only be improved by having a better theory of how weather works (although this may be more computationally expensive than superficial heuristics).

2. I would partially agree with Chomsky. His concerns about Bayesianism are part of a larger concern about "greedy reductionism," it seems. (And, it's ironic ... or worse ... that the man who coined that phrase is himself sometimes, or more than sometimes, a greedy reductionist.)

I think your weather analogy is a fair analysis of at least part of his objections, though. To further it, it's like he is saying that one can perform Bayesian analysis too soon, before one has enough knowledge of those internal dynamics, and then become wedded to that analysis. If one then becomes locked into one's initial analysis, then one has problems.

Anyway, beyond that, here are my own thoughts on some of that Chomsky interview: http://wordsofsocraticgadfly.blogspot.com/2012/11/does-ai-engage-in-behavioralism-chomsky.html

2. Thank you for this. I always found Bayes' Theorem easier to remember if the denominator is simply P(E), knowing this can be expanded as necessary. Multiplying both sides by P(E) gives P(H|E)*P(E) = P(E|H)*P(H). Both sides of this equality are identical to P(H^E) (the probability of both the hypothesis and the evidence) -- the reverse derivation of the theorem. So I can always mentally reconstruct the equation from the identity, P(H^E) = P(H^E), if I forget the full form. That's the advantage of understanding a theorem rather than just learning by rote.

I never thought about it in terms of odds, so from now on I will have another shorthand form O(H|E) = O(H) * S(E), where S(E) is the strength of the evidence as you've defined it.

I also never thought about applying it to a jury trial. My only jury experience was a civil trial -- and it would have been difficult to apply in that context. We knew who the plaintiff's doctors were, and the medical records were all there; it was a matter of deciding which procedures were done improperly and the damages to attribute. But in a criminal trial, I see that we could start with a small prior probability 1/P that the suspect is guilty, where P is the population of individuals in reasonable proximity at the time of the crime (P could be anywhere from 1 to millions), and construct a product of terms Si for exhibits i = 0 through N: O(H|E) = S0*S1*...*SN/P.
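A sketch of the product formula above in Python (the function name and example numbers are hypothetical). Two caveats are built in: a prior probability of 1/P corresponds to odds of 1:(P-1), which is close to 1/P only for large P; and the strengths S_i can simply be multiplied only if the exhibits are conditionally independent given guilt and given innocence.

```python
from fractions import Fraction
from math import prod  # Python 3.8+


def combined_odds(population, strengths):
    """Posterior odds of guilt, starting from a uniform prior over `population`
    people (population > 1) and multiplying by each evidence strength
    S_i = P(E_i|H) / P(E_i|not H).

    Assumes the exhibits are conditionally independent given guilt and given
    innocence; otherwise their ratios cannot simply be multiplied.
    """
    prior_odds = Fraction(1, population - 1)  # probability 1/P is odds 1:(P-1)
    return prior_odds * prod(strengths)


# Hypothetical: 10,000 people nearby, three exhibits of strength 600, 20 and 5.
odds = combined_odds(10_000, [600, 20, 5])
print(odds, float(odds))  # roughly 6:1 in favor of guilt
```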

1. Yes indeed, being able to derive it is a huge help.

There have been attempts to get jurors to use Bayes theorem in their deliberations, mostly abortive. I think the idea has potential, but needs to be presented in a very beginner-friendly way, and probably overseen by an expert.

To put it mildly, introducing an unfamiliar equation to a bunch of jurors and hoping that (unlike most stats undergrads, say) they'll use it properly, is kinda optimistic.

2. I don't have any illusions that a full jury would ever resort to something as rational as Bayesian analysis. On the other hand, if I am ever on a criminal trial, I can always apply it to decide how I would vote, setting beforehand some lower limit on the odds that I would consider "beyond reasonable doubt" (100 to 1? 100000 to 1? It might depend on the severity of the penalties involved). That I could use the argument to sway other jurors seems doubtful, but it would give me enough justification to stick to my conclusion, even if that led to a deadlock. Better a hung jury than a wrongly-convicted human being.

3. Agreed.

>the odds that I would consider "beyond reasonable doubt" (100 to 1? 100000 to 1? It might depend on the severity of the penalties involved).

Every real justice system *implicitly* if not explicitly represents some tradeoff between incarceration of the guilty and acquittal of the innocent. Empirically, it looks based on some quick & dirty googling like the false imprisonment rate in the USA is in the ballpark of 1%, so there's your (descriptive, not normative) answer!

3. Another great post Ian, one can almost see your car applying Bayesian analysis before changing its program from collision avoidance mode into drunk driver avoidance mode. And promoting that occult test by crunching some numbers could save people considering colonoscopies a lot of money.

And you can even apply Bayes to anything you like, using extremely loose definitions of nearly every word except for 'that' and 'given', as in "What's the probability that gods exist given that humans created machines".

What really fascinates me is what happens when we also loosen the definition of 'given' when considering such questions, as in "What is the probability that 'x' is occurring given the past or future occurrence of 'y'?" You might say we have lots of numbers on the priors but none on the future events. But we do, in the form of bets, stock prices, polls, etc.

Wonder if belief data vs 'real' data could be factored into the equation? If the weather sucks, will work on this over the weekend. Or not: if the hypothesis sucks, will work on enjoying whatever weather we get.

1. "And you can even apply Bayes to anything you like, using extremely loose definitions of nearly every word except for 'that' and 'given', as in "What's the probability that gods exist given that humans created machines"."

Hmmm, how would you set the prior P(G) on that one? And the conditional probabilities, P(M|G) and P(M|!G)? (M is the evidence "humans created machines", and G is the hypothesis "gods exist".)

Personally, I would think that P(M|G) = P(M|!G), the hypothesis and the evidence being independent, so that the evidence isn't evidence at all.

2. Richard - I tried (my quasi-causal belief tweaking, not the god stuff)- it doesn't work. The theorem doesn't really care about causality or even 'events' whatever they mean. You might be able to get away with saying it doesn't even care about the objective truth or falsity about anything either, except maybe the defined relationships within.

Re the god | machines setup: for me it's simple; not sure about others. I don't believe in any kind of reality except personal ones, to the extent that the word 'exists' does not make sense except in the context of the phrase "exists for...". The existence of god for any individual is similar to the existence of fear, pride, coffee cups, and other people's beliefs about the believer. All are imaginary and represent beliefs based on available information. Your average skeptic reading this kind of blog attributes more value to the 'emotion' and 'physical object' than the spooky apparition that dominates the imagination of many others. I think it's all the same stuff, and it goes by a common name, and that is called information.

So the Bayes equation setup was flip, but I'll try to respond seriously.

The way I think gods may work (in a manner acceptable to others) is that if it could be somehow proved that there was some race of gods 'up there' with their own selfish interests that had nothing to do with human concerns etc.... From our common sense, we see that big things are often more powerful than small things, so let's assign the relative power of a god to the number of its human adherents. And focus on the top 5 gods of humans as agreed by 100% of human population having opinions on such matters.

Now P(G) is simply the probability of finding some silly fact about our world that can only have been caused by inter-god warfare. Whether there is one supreme thing functioning as a god to the gods is irrelevant for our purposes. Pretty sure Bohm or Cantor would back me up here.

P(M) is a toughie because, similar to P(G), you have to shoot for stuff we created for which there was no 'godly interest'. Hammer? Nope, goes to utility. Pop-tarts? Possibly, but too localized... Nuclear weapons? Nope, too powerful. Prevalent types of 'garbage', say flip-top rings from aluminum cans? Yep, this one works. They are everywhere, we made them, and I don't see the god-utility.

Now for the conditionals. Back to P(G)... Let's say we agree that we can prove that weather patterns exhibited some behavior due to what can only have been tweaking. What is the probability that this weather pattern was based on the existence of pop-top rings? And v.v.

I know this doesn't make sense, (at least not to me) but that is the sort of direction I would take if I had to get serious about proving this stuff via Bayes. For me, if 58% of the world's population say gods exist, then they exist, just like if 100% of people say that coffee cups exist, then they exist.

A much more interesting non-Bayesian way to go in proving the 'existence' of 'gods' would be to track common symbols as used throughout long periods of human history, say the Lemniscate of Bernoulli, cross, circle, numbers 7, 13, and 14 (the Asian 13), dig around, and see what you can come up with.

Oh well, back to Ladyman's stuff as told by the S.E.P, which best as I can tell - maps the phrase 'real unreal reality' to the more sensible-sounding 'ontic structural realism'.

Re the comment about stuff being independent, well like anything else, that's relative, nothing is independent of anything else if one entity can conceive both things.

3. "For me, if 58% of the world's population say gods exist, then they exist, just like if 100% of people say that coffee cups exist, then they exist."

Suppose 58% of the population denied that coffee cups exist -- would that change your mind about coffee cups? If 58% denied evolution, would evolution be made false? Empiricism, not consensus, is the proper way to build models of reality.

"Re the comment about stuff being independent, well like anything else, that's relative, nothing is independent of anything else if one entity can conceive both things."

If the chance of rain tomorrow is 60%, and I flip a coin, does the chance alter depending on whether my coin lands heads or tails? That is what I mean by independence. If you want to establish some dependence between machine-building and existence of gods, then that dependence is purely axiomatic, not empirical. The validity of the resulting P(G|M) is then conditioned on the validity of the axiom.

4. Richard - if the number of believers in coffee cups dropped from 100% to 58% I'd probably be more curious as to why that happened. Did 42% of them go off-grid and take up residence in places where utensils could not easily exist, say the sea? Did 42% of them start really thinking about these things and decide they could not exist without human observers? Did 42% of them start making judgments about the nature of all 'physical' stuff, then render it impossible to talk about it without assuming many other things, things more mathematical than physical? Not quite sure how your empiricism works. Are you saying that if you and I were both coffee beans who knew nothing about cups, once one of us sensed a coffee cup two aisles away in the grocery, there would be an easy way of advising the other about it? Empiricism is a useful tool for those who know how to use it and use agreed methods, and its use leads to consensus, read: belief. A 'proper' view of reality is one that views it as a social construct and little else.

5. Richard - re the independence stuff, sorry to drag us back to the gods stuff, which bugs Massimo (whom I forgot to congratulate, having seen that A for A book in Barnes and Noble today). Anyway, if some god responsible for local weather also had some interest in your coin flip, I guess there could be some linkage. But if we permit an atheistic world (all gods are focused on some intergalactic event, having noticed that earth stuff runs with or without their intervention) we can still establish a linkage, but not as easily. With a 60% chance of rain, and a 50ish% chance afforded by the coin flip, every object that (a) stands to lose by the rain and (b) is aware of the upcoming flip may try to game the outcome of the flip. Besides you, there is every possible reason to think the coin is aware, and that other elements in the 'local environment', whatever that means, are aware. If any activity whatsoever in this universe takes place as a result of knowledge of both (a) and (b), then the chances have been affected. Maybe only two more drops fell near your front door as a result of what I can only weakly describe as a 'cosmic concurrence', but what more people would attribute to combined consciousnesses at work. All explainable from a naturalistic point of view, just not anytime in the next 20 years, I think.

6. All of your premises are axiomatic, not empirical. The means we have for determining the chance of rain are empirical. We measure pressure, temperature, humidity, wind, and we study satellite photos, none of which are influenced by a coin toss, not even by a million coin tosses (unless we toss the million coins right into the clouds and that triggers precipitation by direct physical interaction). Even if your axiom happened to be true (in an alternate universe where magic is real), you could never establish a causal link, because one week your neighbor who is performing the same experiment throws heads and cancels your tails, and the next week you both throw tails... Not to mention the ten people across town and the hundred in the neighboring city all trying the same thing. You will never be able to separate all the variables to get a meaningful result. In other words, the two things would remain statistically independent even if your axioms were true.

7. Richard - 100 people tossing coins confuses the issue. I am talking about 'you' tossing a coin, and 'you' getting rain. Given limitations of time, I cannot go into what 'you' even means, but I am saying that other objects in the local environment may have an interest in the results, and game them. You can call this magic if you like; it's just stuff we don't understand yet well enough to communicate to others in ways we can all agree on. Some call this philosophy, others call it science, but both are established communication methods.

The idea behind an 'informational' approach is simple:

Given two objects (a sender, a receiver), we have some sort of constrained but fluid process behind it.

8. Unfortunately, if any linkage exists between the coin toss and the weather, you will never be able to extricate any statistical pattern from all of the noise. More to the point, you will never be able to do a controlled experiment to verify your original assertion, which was that existence of gods and the creation of machines by man are somehow linked. You are operating on conjecture not evidence.

4. Another point to remember is that the proof that "absence of evidence is evidence of absence" is Bayesian. See http://lesswrong.com/lw/ih/absence_of_evidence_is_evidence_of_absence/

1. Yes, this is very important. I still hear skeptics say that absence of evidence is not evidence of absence all the time, alas. It's pretty easy to correct by asking a simple question like "how do we know that dodos are extinct?"

2. If anything, it should be more particular to non-skeptics (e.g., the view that absence of evidence of psychic powers -- in spite of controlled tests for them -- is not evidence of absence of the same). It's a common misconception in the general population, and perhaps it does spill over into some skeptics. I think the view comes from misunderstanding or misinterpreting the true statement that "Absence of evidence is not *proof* of absence".

3. Absence of evidence means you didn't look for evidence, or you didn't use the right instruments. If you looked everywhere you'd expect to find something if it exists, and didn't find it, that's evidence of absence. Still, if I can't find my keys, that doesn't mean they don't exist.

4. No Max, if the conditional probability p(E|H) is greater than p(E|!H), then it is implied that evidence was sought. The ratio p(E|H)/p(E|!H) is a direct measure of the effectiveness of the search or experiment that *was* performed. We can't say, "Well, the ratio is 10 to 1, so we don't need to do the experiment, we'll just multiply the prior odds by 10". If we don't actually do the experiment, or search for the evidence, or we don't have the required tools to test the hypothesis, then the ratio is 1 to 1. A 1 to 1 ratio only represents ignorance of evidence, not absence of evidence.

"Still, if I can't find my keys, that doesn't mean they don't exist." Well no, of course not...unless: your prior probability that the keys don't exist was zero (i.e., you never had keys in the first place). And if you are already certain that you have keys (p = 1, odds infinite), then any nonzero ratio of conditional probabilities leaves those odds at infinity. But let's say you have amnesia and really don't know if you had keys or not, and your prior odds are 1 to 1. Let's say you do a search that has a probability of 60% of not finding your keys if they exist, and a 100% probability of not finding them if they do not. Then not finding your keys reduces the odds that they exist to 3 to 5. Evidence of absence? I think so!
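The key-search arithmetic above can be reproduced in a few lines. This sketch just encodes the commenter's stated assumptions (1:1 prior odds, a 60% chance of missing keys that exist, certainty of missing keys that don't); the variable names are illustrative:

```python
from fractions import Fraction

prior_odds = Fraction(1, 1)                # amnesia: keys exist vs. don't, 1:1
p_not_found_if_exist = Fraction(6, 10)     # the search misses existing keys 60% of the time
p_not_found_if_absent = Fraction(1)        # a search never finds keys that don't exist

# Not finding the keys is the evidence; its likelihood ratio is 0.6 / 1.0.
posterior_odds = prior_odds * (p_not_found_if_exist / p_not_found_if_absent)
print(posterior_odds)  # 3/5, i.e. odds of 3:5 that the keys exist
```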

5. Interesting post. I'm going to have to re-read it a couple of times to see if I can really get it. But looking at the history of the podcast, I will say that I've noted that Julia originally rejected the Anthropic Principle (sorry, I'm too lazy to look up the episode) on frequentist grounds, but later defended the Simulation Argument on Bayesian grounds. I would say this is a sad turn of events. In my mind, both of those arguments suffer from similar weaknesses, but Bayes somehow gives license where frequentism doesn't. It makes me wonder if Bayes leaves the barn door open, as I assume the good Reverend had hoped, to assertions of religious or quasi religious ideas. Or maybe it's the broader "subjectivist view" that is to blame. And, if I understand it correctly, the danger lies in the general population not understanding the difference. Can Bostrom's claim that there is a 20% chance that there is a creator god have the same epistemological validity as a meteorologist's claim that there is a 20% chance of rain tomorrow? Does either of those compare to the claim that there is a 1/6 chance of rolling a 3 on a die?

1. >It makes me wonder if Bayes leaves the barn door open, as I assume the good Reverend had hoped, to assertions of religious or quasi religious ideas.

The history here is interesting. Bayes never published his essay on probability, which was in itself not particularly related to his religious views. But it was published posthumously by his friend Richard Price, who *did* want to use it to prove a creator (I think using standard design arguments).

>Can Bostrom's claim that there is a 20% chance that there is a creator god have the same epistemological validity as a meteorologist's claim that there is a 20% chance of rain tomorrow? Does either of those compare to the claim that there is a 1/6 chance of rolling a 3 on a die?

Well, I have never heard that particular claim of Bostrom's, but I am skeptical about our ability to correctly draw out the implications of bizarre metaphysical theses, then reason on them, in the way Bostrom likes to. I have no problem believing about Bostrom that *he* thinks a creator god is 20% likely (or is he just being contrarian?), but no, that is not *my* assessment based on my background knowledge (maybe I should read his argument though).

Yes, I would say that the meteorologist's claim of 20% chance of rain is of exactly the same *type* as a claim of 1/6 chance of rolling a 3 - it reflects an uncertain state of knowledge on the part of the speaker. I am slightly more inclined to adopt the 1/6 chance than the meteorologist's 20% chance, however, because ceteris paribus I think dice-rollers are calibrated (know the extent of their own ignorance) slightly better than meteorologists.

You should never forget that when you hear a second- or third-hand reported probability, your evidence is never just "20% chance of rain"; your evidence is "this person says there is a 20% chance of rain."

As an aside, has anybody noticed how in the past 2-5 years jokes about how bad weather forecasts are have petered out? I think maybe this reflects much better forecasts.

2. Regarding "leaving the barn door open," what do you mean there exactly?

3. I'm 80% in agreement about better weather forecasts; the other 20% I'd attribute to fear of more drastic weather change in the last decade than in the previous one, a fear supported both by changes in the data and by the relative invariance of recent global warming predictions.

4. Ian, thanks for your response. A couple of points of clarification. Bostrom doesn't explicitly make the claim. I reduce his Simulation Argument to that because, as Julia noted at the end of the SA podcast, the SA is tantamount to Intelligent Design. But boy do I agree with your skepticism about bizarre metaphysical claims, and that is exactly where I suspect that Bayesian Analysis might be "leaving the barn door open." I'm out of my depth here, because I don't really grok the math, but "stating your priors" seems like it could, in practice, be the same as "pull a probability out of your ass." And the further you get from that stated prior, either in a chain of Bayesian calculations or just running too far with your conclusions, the more you are disconnected from reality. I've heard that people on LessWrong like to argue about the intentions of the post human super intelligence that is running our simulation. They wonder how one might court its favor and avoid pissing it off. That they feel comfortable doing this is proof to me that a barn door somewhere has been left open. And I don't think they are using frequentist probability to get there.

5. >…but "stating your priors" seems like it could, in practice, be the same as "pull a probability out of your ass.”

Ok, I think I see where you’re going with this. The first thing to note is that there are two schools of thought on priors. Subjective Bayesians (e.g., de Finetti) hold that there are no constraints on priors besides the laws of probability themselves. According to them, as long as your probability assignments obey the axioms of probability (usually cashed out with Cox’s theorem), they can take whatever values you please. So a subjective Bayesian could legitimately have a prior probability of 99% that Obama is a Cthulhu cultist, as long as they don’t become logically inconsistent (for example, by simultaneously claiming a 50% probability that he is not a Cthulhu cultist).

Objective Bayesians (e.g., Jaynes, myself) think that there are almost always some kind of rational constraints on priors – as a trivial example, if I am about to flip a biased coin, & that is all the information I have, it makes no sense for me to assign a prior of 70% to heads, since I have zero evidence that could distinguish heads from tails at this stage. The only rational prior in this situation is 50%, even though no evidence has been collected and we *know* the coin is biased.

However, I would wish to stress that even in situations where there is no obvious quantitative constraint on priors, it is still vital to use whatever qualitative, vague knowledge you have – even if it seems like “pulling something out of your ass.” The reason why that’s so vital is that *failing* to do so does, in fact, leave the barn door open to nonsense – this comic nicely shows why.

To sum up, extraordinary claims require extraordinary evidence, BUT without using priors (explicitly or implicitly), you can’t actually make a *distinction* between an extraordinary claim and an ordinary one. Unable to make that distinction, the probability that the sun has exploded is treated exactly the same as any other hypothesis - we “let the data speak for themselves” and end up believing in lunacy, as long as the lunacy has p<0.05. By contrast, a Bayesian is going to have a prior on the sun exploding – potentially just a wild guess “pulled out of their ass” (hey, I don’t know whether it should be 1 in a billion or 1 in a trillion) – that keeps their beliefs in line with common sense when they see the result of that experiment.
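To make the sun-exploding point concrete, here is a small sketch under loudly stated assumptions: a *hypothetical* detector that answers truthfully except with probability 1/36 (so a false "yes" has p of about 0.028, below the 0.05 threshold), and the two wild-guess priors mentioned above. The function name `posterior_probability` is made up for illustration.

```python
from fractions import Fraction


def posterior_probability(prior_prob, p_e_given_h, p_e_given_not_h):
    """Odds-form Bayes update, returning a probability rather than odds."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * (p_e_given_h / p_e_given_not_h)
    return posterior_odds / (1 + posterior_odds)


# Hypothetical detector: reports the truth except with probability 1/36.
p_yes_if_exploded = Fraction(35, 36)
p_yes_if_not = Fraction(1, 36)  # false-alarm rate of about 0.028, so "p < 0.05"

# Two priors "pulled out of the air" for the sun having exploded.
for prior in (Fraction(1, 10**9), Fraction(1, 10**12)):
    post = posterior_probability(prior, p_yes_if_exploded, p_yes_if_not)
    print(f"prior {float(prior):.0e} -> posterior {float(post):.1e}")
```

The detector's "yes" has a likelihood ratio of 35, so for priors this small the posterior is roughly 35 times the prior: still tiny under either guess, even though a significance test alone would have "rejected" the hypothesis that the sun is fine.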

Now, it is certainly possible to quote stupid priors, and many people have & will. But doing without them completely is a recipe for epistemic insanity.

> I've heard that people on LessWrong like to argue about the intentions of the post human super intelligence that is running our simulation. They wonder how one might court its favor and avoid pissing it off.

I’m not really interested in becoming the local Stout Defender of Less Wrong, but keep in mind that it is basically a group philosophy blog (despite Yudkowsky’s dislike of philosophy), and as you know, philosophers like to play with thought experiments and wild speculative hypotheticals as a way to clarify concepts and test theories. Without a link to this discussion you reference, it’s hard for me to comment, but don’t jump to the conclusion that anybody on LW is seriously claiming such things are actually the case. Like Newcomb’s problem (also discussed on LW a lot), it may just be a way of testing one of the local decision theories, for example.

6. How would you apply it to solve this problem?
A guy is accused of murdering someone.
Evidence: the victim is his wife, and she was stabbed to death.

Knowing nothing else, we'd just take all the cases of wives who were stabbed to death, and see what fraction of them were stabbed by their husband.

But how would you use the likelihood ratio here? What are the prior odds of guilt, knowing nothing about the guy? 2 in 7 billion? Is the probability that the victim is his wife greater if he killed her or if he didn't kill her? Weird question.

1. >Knowing nothing else, we'd just take all the cases of wives who were stabbed to death, and see what fraction of them were stabbed by their husband.

Sounds like a good start, although if the police know nothing else, it's not clear why he would have been arrested.

>But how would you use the likelihood ratio here? What are the prior odds of guilt, knowing nothing about the guy? 2 in 7 billion?

Well, if we treat the fact that he is the victim's husband as background information, we've got no evidence on which to update, so there is no need to use the likelihood ratio (or Bayes theorem). Just stick with your prior: prior odds =(# of murders carried out by spouses)/(# of murders NOT carried out by spouses).

You would only use Bayes theorem if you found out something additional to your background information - for example, that the killer left AB+ blood on the scene and the husband has AB+ blood.
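A sketch of the AB+ update, with made-up inputs: the 1:20 prior on the husband is purely hypothetical, and the 3% share of the population with AB+ blood is an assumed round figure for illustration. Lab error is ignored, so the likelihood under guilt is taken as exactly 1.

```python
from fractions import Fraction

prior_odds = Fraction(1, 20)    # hypothetical prior odds that the husband did it
freq_ab_pos = Fraction(3, 100)  # assumed share of the population with AB+ blood

# E = "the killer left AB+ blood at the scene"; the husband is AB+.
p_e_if_guilty = Fraction(1)     # if he did it, the blood is his (ignoring lab error)
p_e_if_innocent = freq_ab_pos   # otherwise the killer is someone else, AB+ with prob. 3%

posterior_odds = prior_odds * p_e_if_guilty / p_e_if_innocent
posterior_prob = posterior_odds / (1 + posterior_odds)
print(posterior_odds, float(posterior_prob))
```

With these numbers the blood match multiplies the odds by about 33, taking 1:20 to 5:3, or a probability of 5/8: a big shift, but nowhere near certainty.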

>Is the probability that the victim is his wife greater if he killed her or if he didn't kill her? Weird question.

Yeah, not sure I follow. ;)

2. I want to start from knowing as little as possible about the defendant, and treat every piece of information as evidence. So, the first piece of evidence is that the defendant is a man. The second piece of evidence is that the victim is his wife. The third piece of evidence is that she was stabbed to death.
If we treat it all as background info, then the solution is simple. But how would you use the likelihood ratio to update the probability of guilt as you learn each of the above three pieces of evidence?
You'd have to do weird calculations like P(victim=wife | guilty) / P(victim=wife | not guilty)

And in such calculations, does "guilty" refer to this specific crime, or to any similar crime?
Like, the prior odds that a random person has some cancer are the prevalence of this cancer in the population. The prior odds that a random husband murdered his wife are the prevalence of such murderers in the population. But the prior odds that a random person murdered this specific victim are not the prevalence of murderers, they're 1 in 7 billion.

3. >But how would you use the likelihood ratio to update the probability of guilt as you learn each of the above three pieces of evidence?
You'd have to do weird calculations like P(victim=wife | guilty) / P(victim=wife | not guilty)
...But the prior odds that a random person murdered this specific victim are not the prevalence of murderers, they're 1 in 7 billion.

If you really want to rewind to the state of extreme ignorance where all of humanity from a Mongolian goatherd to Dick Cheney is equally suspect, then you'll have to include a *lot* of updates (based on location, age, physical ability, mobility, relationship...) in order to get even a manageable suspect list, let alone focus in on the husband in particular. Thankfully, our common sense does a lot of that work for us.

But skipping ahead to the spousal relationship issue, I think the correct framing would be a prior on Joe Doaks' guilt as a "randomly chosen" citizen, followed by an update on the fact that Joe Doaks is married to the victim - it would look something like

O(Joe Doaks guilty) = 1:100,000 (say)
P(married to victim|guilty)/P(married to victim|not guilty)=(fraction of murders in which victim is spouse of killer)/(fraction of innocent people married to a murder victim).

4. The math does seem to work out:
O(guilty|married to victim) =
O(guilty)*P(married to victim|guilty)/P(married to victim|not guilty)

Then, calculate P from O: P=O/(O+1)

My simple solution was to directly calculate P(guilty|married to victim), which is the fraction of people married to a murder victim who were the killers.
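The two routes should agree, and a quick sketch shows they do. The 1:100,000 prior is Ian's "say" figure; the two fractions feeding the likelihood ratio (1 in 10 murders by a spouse, 1 in 100,000 innocents married to a murder victim) are hypothetical placeholders, not real statistics.

```python
from fractions import Fraction


def odds_to_prob(o):
    """Convert odds O to probability P = O / (O + 1)."""
    return o / (1 + o)


prior_odds = Fraction(1, 100_000)             # O(Joe Doaks guilty), "say"
p_married_if_guilty = Fraction(1, 10)         # hypothetical: murders where victim is killer's spouse
p_married_if_innocent = Fraction(1, 100_000)  # hypothetical: innocents married to a murder victim

# Odds route: prior odds times the likelihood ratio, then P = O/(O+1).
posterior_odds = prior_odds * p_married_if_guilty / p_married_if_innocent
p_odds_route = odds_to_prob(posterior_odds)

# Direct route: in a pool matching the prior odds (1 guilty per 100,000 innocent),
# what fraction of the people married to the victim is the killer?
guilty, innocent = 1, 100_000
married_guilty = guilty * p_married_if_guilty
married_innocent = innocent * p_married_if_innocent
p_direct = married_guilty / (married_guilty + married_innocent)

print(posterior_odds, p_odds_route, p_direct)
```

Both give 1/11 here: the odds form is just the direct count, reorganized so the prior and the evidence appear as separate factors.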

But I'm still confused about the precise meaning of P(guilty), maybe because there are two ways of looking at it.
First, there's Bayesian probability, where we start off with P(guilty)=prevalence of murderers in the population, and then update it based on cases with similar evidence.
But then there's the process of elimination, where we start off with P(guilty)=1 in 7 billion, and then narrow it down as we learn about the killer. Like, the fact that the victim was stabbed rules out anyone who couldn't have stabbed her.
How do you reconcile the two?

5. "P(guilty)=1 in 7 billion"

You wouldn't need to include the 6 billion people who were nowhere near North America at the time of the crime in your initial pool. In fact I would go much narrower. I would pick P(guilty) -- the probability of a randomly selected person being the perpetrator of a specific crime -- to be 1/N where N is the number of people who, a priori, had the opportunity to commit the crime, if the person is selected from that group (and 0 otherwise).

Take a different example: A murder takes place on a cruise ship, which has total crew and passengers = 2000.
The prior odds O(PERP(X)) that a random passenger or crew member X did it are 1 to 1999. Now 300 people were in the dining room at the time of the crime and could not have perpetrated it. D(X) = "in the dining room"; if X was not in the dining room (!D(X); I use the engineering notation "!" for not), then we have O(PERP(X)|!D(X)) = O(PERP(X)) * P(!D(X)|PERP(X))/P(!D(X)|!PERP(X)). The numerator P(!D(X)|PERP(X)) is 1 because we know the perp couldn't have been in the dining room. The denominator is 1699/1999, because 1699 of the 1999 people who aren't perps were not in the dining room. So the odds that X committed the crime, taking into account the 300 alibis, become 1 to 1699. And so on with each additional piece of evidence.

Obviously you could just eliminate the 300 from the start, and assign odds 1 to 1699 without doing the Bayesian calculation, but I went through the exercise to show that the numbers work out as we would expect.
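Richard's cruise-ship numbers can be reproduced in a few lines. This is only a sketch of his calculation; setting P(!D|PERP) to exactly 1 is his simplifying shortcut.

```python
from fractions import Fraction

total = 2000          # passengers plus crew
in_dining_room = 300  # people with an alibi

prior_odds = Fraction(1, total - 1)  # 1 to 1999 for a randomly chosen X
p_absent_if_perp = Fraction(1)       # the perp was not in the dining room (shortcut: exactly 1)
non_perps = total - 1
p_absent_if_not_perp = Fraction(non_perps - in_dining_room, non_perps)  # 1699/1999

posterior_odds = prior_odds * p_absent_if_perp / p_absent_if_not_perp
print(posterior_odds)

# Same as simply striking the 300 alibied people: 1 perp among the 1700 left.
print(Fraction(1, (total - in_dining_room) - 1))
```

Both lines print 1/1699, confirming that the Bayesian update and plain elimination agree.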

You could get more sophisticated and divide the people into classes. The victim's spouse is in a class by him/herself, having a higher conditional probability (based on opportunity and possible motives) than the people in neighboring cabins; those residing on the same deck have higher conditional probabilities than those residing on other decks, those who have been seen interacting with the victim have higher conditional probabilities than those who have not, and so on. Each potential suspect could have odds calculated based on the classes to which he/she belongs or does not belong.

Something cautions me about putting too much weight on the spousal victimhood statistics as evidence. I get a feeling it is too generic; every crime is different. Surely it is rational to include the information, but only if we know all other relevant facts are being presented. If the investigation was sloppy, or a dishonest prosecutor fails to disclose exculpatory evidence, then the statistical evidence could tilt the case against an innocent man the wrong way. And of course, a jury is only allowed to consider the evidence presented in the courtroom.

6. Yeah, Richard's approach is basically correct here.

I would only add that one must be *very* careful about assigning probabilities of exactly 0 and 1 to anything. It's okay to do it as a shortcut to make the math simpler, but unless you know with *logical certainty* that nobody in the dining room could have committed the murder, AND that there is absolutely no possibility of an outsider to the cruise ship having snuck on, you've got to keep these possibilities, however unlikely, in the back of your mind as a longshot bet.

>Something cautions me about putting too much weight on the spousal victimhood statistics as evidence. I get a feeling it is too generic; every crime is different.

Absolutely. For one thing, there is an art to choosing reference classes correctly - for example, maybe spouses in the defendant's particular cultural group are *less* likely to kill each other than random strangers, even though the statistics for spouses in general have the opposite tendency.

And unless the spousal stats are extreme (almost all murders are inter-spousal), you're going to need a lot more evidence in order to come near convicting Joe Doaks. At best, the slightly higher prior for a husband is a starting point for the police, directing inquiry to the likeliest of many unlikely suspects, in order that he may be ruled out.

7. Good points, Ian. I'd also point out that whenever new evidence raises the probability that someone from class X did it, it lowers the probability for persons outside of class X. Once you determine the weapon was a pistol and not a shotgun, not only must you raise the probability that Yosemite Sam (a member of the pistol-totin' class) did it; you have to lower the odds on Elmer Fudd (habitually a shotgun carrier). Doing one while failing to do the other will lead to inconsistency.
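Richard's consistency point falls out automatically if you update every suspect at once and renormalize. In this sketch the three-way suspect pool, the equal priors, and each suspect's probability of using a pistol are all hypothetical numbers chosen for illustration.

```python
from fractions import Fraction

# Hypothetical priors over the whole suspect pool (they must sum to 1).
priors = {
    "Yosemite Sam": Fraction(1, 3),
    "Elmer Fudd": Fraction(1, 3),
    "someone else": Fraction(1, 3),
}
# Hypothetical P(weapon was a pistol | this suspect did it).
p_pistol = {
    "Yosemite Sam": Fraction(9, 10),  # habitual pistol carrier
    "Elmer Fudd": Fraction(1, 10),    # habitual shotgun carrier
    "someone else": Fraction(1, 2),
}

# Bayes over all suspects: multiply prior by likelihood, then renormalize.
joint = {s: priors[s] * p_pistol[s] for s in priors}
p_evidence = sum(joint.values())
posteriors = {s: joint[s] / p_evidence for s in joint}

for suspect, p in posteriors.items():
    print(suspect, p)
```

Sam rises from 1/3 to 3/5 and Fudd falls to 1/15, while "someone else" stays at 1/3 because that likelihood happens to equal the average. Because the posteriors are normalized, raising one suspect necessarily lowers others.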

7. Thank you, Ian, I find this version of the theorem way clearer.

I was also able to start an argument using the odds form of Bayes' theorem, and I credited your article: "Betting on horses and the resurrection of Jesus (I)".

Thank you again!

8. Ian, I'd love to hear your thoughts on the Doomsday Arguments of Gott, Carter, Bostrom, et al. I think there is a perfectly good refutation of these arguments, but from what I've read, the proponents are aware of the refutation and don't accept it. There seems to be a specific misapplication of Bayesian inference at work. I can go into more detail later...

1. To get the ball rolling (pun intended), here are two thought experiments which at first glance seem to be equivalent but are actually quite different:

1. Suppose I have two urns, labeled A and B. Urn A contains 10 balls, numbered from 1 through 10. Urn B contains 100 balls, numbered from 1 through 100. I pick an urn at random (you can't see which), and draw a ball from it at random. The number on the ball is 7. What are the odds that it was drawn from Urn A?

O(UrnA|Number7) = O(UrnA) * (P(Number7|UrnA)/P(Number7|UrnB))
= 1 * (1/10) / (1/100) = 10 to 1 odds

2. Suppose I have 110 sealed envelopes. 10 envelopes (forming group A) are numbered uniquely with a number from 1 to 10 on the outside, with a card inside that has the letter A printed on it. 100 envelopes (group B) are numbered uniquely with a number from 1 to 100 on the outside, with a card inside that has the letter B printed on it. 110 people enter the room, and each is handed one of the envelopes on entering. Your envelope has the number 7 printed on it. What are the chances that the card inside your envelope has the letter A? Are the odds

O(GroupA|Number7) = O(GroupA) * (P(Number7|GroupA)/P(Number7|GroupB))
= 1 * (1/10) / (1/100) = 10 to 1 odds?

This answer is wrong, because the prior odds for me giving you a card from group A are not 1 to 1, they are 1 to 10 against. So the correct answer is that (after factoring in the conditionals) your envelope has an equal chance of being in group A or group B. As confirmation, consider that there is just one other person in the room whose envelope has "7" on it. One of you has the A envelope, and the other has the B envelope.

In the first experiment, I chose one of the two urns randomly (giving 1 to 1 odds for my prior) before I drew a ball. So there is a selection bias at work that makes the two experiments different.
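The difference between the two experiments can be checked by exact computation, plus a brute-force look at the envelopes themselves. This is a sketch of Richard's setup as described; the variable names are mine.

```python
from fractions import Fraction

size = {"A": 10, "B": 100}

# Experiment 1: pick an urn at random (1/2 each), then a ball uniformly from it.
p_urn = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
joint_urn = {u: p_urn[u] * Fraction(1, size[u]) for u in size}  # P(urn and ball 7)
odds_urn = joint_urn["A"] / joint_urn["B"]

# Experiment 2: you are handed one of the 110 envelopes uniformly at random,
# so the prior on your group is proportional to its size (10 vs 100).
p_group = {g: Fraction(size[g], 110) for g in size}
joint_env = {g: p_group[g] * Fraction(1, size[g]) for g in size}
odds_env = joint_env["A"] / joint_env["B"]

# Brute force: which envelopes carry the number 7 on the outside?
envelopes = [("A", n) for n in range(1, 11)] + [("B", n) for n in range(1, 101)]
sevens = [group for group, n in envelopes if n == 7]

print(odds_urn, odds_env, sevens)
```

The urn case gives odds of 10 to 1 for A; the envelope case gives 1 to 1, matching the observation that exactly one A envelope and one B envelope carry a 7.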

I think the Doomsday Argument proponents are trying to solve a problem of the second kind with a solution of the first kind. But they will reject this refutation. In Bostrom's words, "It can be showed (see e.g. http://www.anthropic-principle.com/preprints/alive.html) that this greater prior probability that you are in the bigger race would exactly counterbalance and cancel the probability shift that the DA says you should make when you discover that you were born early (i.e. that you have a birth rank that is compatible with you being in the small race). This would annul the DA, but it only works if we know that there are both long-lasting and short-lasting races out there, and an anthropic argument can be made against that assumption -- if there were so many long-lasting races, how come we are not in one of them and having a great birth rank; for most observers would then be in such races and with great birth ranks." (http://www.anthropic-principle.com/?q=anthropic_principle/faqs)

What gives? I think that what Bostrom suggests is, in effect, first removing the selection bias, then sneaking it back into the prior because, according to him, the evidence of your low birth number supports it. But if you do that, you can't multiply this new prior by the ratio of conditionals, since that ratio is already factored into the prior. In reality, this new "prior" is not really a prior at all.

Have I missed something, or is Bostrom off his rocker?

2. Great question. I have to think about it.

3. Richard, I think I am going to write a post on the DA in which I try to make sense of whether its logic works; stay tuned.

9. Ian, thanks, I am looking forward to it!