Rationally Speaking is a blog maintained by Prof. Massimo Pigliucci, a philosopher at the City University of New York. The blog reflects the Enlightenment figure Marquis de Condorcet's idea of what a public intellectual (yes, we know, that's such a bad word) ought to be: someone who devotes himself to "the tracking down of prejudices in the hiding places where priests, the schools, the government, and all long-established institutions had gathered and protected them." You're welcome. Please notice that the contents of this blog can be reprinted under the standard Creative Commons license.

## Saturday, May 24, 2008

### The probability of Gremlins

I’m reading philosopher Elliott Sober’s Evidence and Evolution: The Logic Behind the Science (I am in the midst of reviewing the book for Trends in Ecology & Evolution), and the first chapter presents a nice little example about probabilities and likelihoods that is pertinent to anyone interested in hypothesis testing and pseudoscience.

The context of Sober’s discussion is a chapter devoted to inference, and in particular to the advantages and disadvantages of Bayesian vs. frequentist statistical frameworks. There is a crucial and often unappreciated distinction between the probability of a given hypothesis being true if one observes certain data [P(H|D) in Bayesian terms] vs. the probability of observing certain data if a given hypothesis is true [P(D|H)]. Here is how Sober explains it (p. 10 of the book):

“Suppose you hear a noise coming from the attic of your house. The likelihood of this hypothesis [P(D|H)] is very high, since if there are gremlins bowling in the attic, there probably will be noise. But surely you don’t think that the noise makes it very probable that there are gremlins up there bowling,” because the probability of that hypothesis given the data, P(H|D), is very low indeed.

Notice that P(D|H) is referred to as the likelihood (of the data given the hypothesis), while P(H|D) is the (posterior) probability of the hypothesis given the data. A confusion between these two quantities underlies much pseudoscience. Sober’s own example is not far-fetched at all: people who believe in ghosts and haunted houses use exactly that sort of “reasoning.” Believers in UFOs fall into the same trap: if you observe strange lights in the night sky, the likelihood of that observation is high if the lights are due to an extraterrestrial spaceship, and the probability of the spaceship hypothesis would be high too if there were a good a priori chance (a high prior, in Bayesian jargon) that we really are visited by space aliens. But the probability of the UFO hypothesis being true just on the basis that you observed unidentified lights in the night sky is very, very low.
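To make the asymmetry concrete, here is a minimal Python sketch of Bayes' theorem for two exhaustive hypotheses, P(H|D) = P(D|H)P(H) / [P(D|H)P(H) + P(D|not-H)P(not-H)]. The numbers plugged in for the attic are purely hypothetical, chosen for illustration; they are not Sober's, and the function name `posterior` is our own.

```python
def posterior(prior_h: float, lik_h: float, lik_not_h: float) -> float:
    """Return P(H|D) for two exhaustive hypotheses, H and not-H."""
    joint_h = lik_h * prior_h                       # P(D|H) * P(H)
    evidence = joint_h + lik_not_h * (1 - prior_h)  # P(D), by total probability
    return joint_h / evidence


# Hypothetical inputs for the attic example (illustrative only):
prior_gremlins = 1e-9      # P(H): bowling gremlins are wildly implausible a priori
noise_if_gremlins = 0.99   # P(D|H): bowling gremlins would almost surely be noisy
noise_otherwise = 0.05     # P(D|not-H): pipes, squirrels, wind, ...

print(f"likelihood P(D|H) = {noise_if_gremlins}")
print(f"posterior  P(H|D) = {posterior(prior_gremlins, noise_if_gremlins, noise_otherwise):.2e}")
# likelihood P(D|H) = 0.99
# posterior  P(H|D) = 1.98e-08
```

With these made-up inputs the likelihood is 0.99 while the posterior is about two in a hundred million: the tiny prior swamps the high likelihood. Only if the prior itself were high (say 0.5) would the posterior climb above 0.9, which is exactly the point about the UFO believer's hidden assumption.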

An analogous application of the distinction between likelihoods and posterior probabilities can be made in the case of intelligent design “theory.” The observation of a complex biological structure such as the bacterial flagellum has a high likelihood if one assumes that there is a supernatural intelligent designer messing around with the universe. But the probability that there is an intelligent designer simply based on the fact that we observe complex biological structures is, again, vanishingly small.

If you’d like to play with an online Bayesian calculator, check this one here for simple situations (two hypotheses), or this one here for more complex ones (up to five competing hypotheses). Just remember, don’t confuse your likelihoods with your probabilities!
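For readers who would rather compute than click, the same calculation extends to several competing hypotheses by normalizing over all of them. This is a minimal sketch, not the code behind either online calculator; it assumes the hypotheses are mutually exclusive and exhaustive (so the priors sum to 1), and the UFO-flavored labels, priors, and likelihoods are hypothetical values for illustration only.

```python
def posteriors(priors: list[float], likelihoods: list[float]) -> list[float]:
    """Posterior P(H_i|D) for mutually exclusive, exhaustive hypotheses H_i."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per hypothesis")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(D|H_i) * P(H_i)
    evidence = sum(joint)                                  # P(D)
    return [j / evidence for j in joint]


# Hypothetical explanations for strange lights in the night sky:
labels = ["aircraft", "planet or satellite", "weather balloon", "alien spaceship"]
priors = [0.5, 0.3, 0.199999, 0.000001]  # assumed P(H_i)
likelihoods = [0.2, 0.3, 0.4, 0.9]       # assumed P(D|H_i); aliens "fit" best

for label, p in zip(labels, posteriors(priors, likelihoods)):
    print(f"{label:20s} {p:.2e}")
```

Even though the spaceship hypothesis has the highest likelihood in this made-up table, its posterior stays around three in a million, because its prior is so small.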

1. Scientists use another important probability calculation: the probability that the data would occur by chance, P(D | not (H1 or H2 or ... or Hn)), where H1 through Hn are the hypotheses in a list that includes all non-chance explanations.

This statistic is a useful way to estimate the prior probability for a Bayesian inference. Counter-intuitively, we can set the prior probability of a specific hypothesis to this "negative" probability, so that the posterior probability from one data point (or one set of known a priori data) is 0.50. This technique requires us either to evaluate the known data one by one (which allows us to rigorously examine the simplicity of the hypothesis) or to match a second set of new data with the same hypothesis.

3. This reminds me of William Lane Craig on the resurrection. He argued that, although it was indeed highly improbable as a natural event, it was entirely probable as a supernatural one.

4. An example I gave when teaching (and which would be great if it's true) is the Vatican's investigation of putative miracles (for would-be saints). What I don't know is how scientific the researchers are, but supposing that they are serious, it may be that their judgments are bona fide scientific evaluations... with the caveat that they assign a very high a priori probability to the hypothesis of a loving God helping a sick person who unexpectedly recovered.

5. barefoot bum,

That option is available if you post while logged in with a Google/Blogger account. It encourages people to subscribe to the system...

6. Massimo,
I'm logged in and I don't see it. All I can do is subscribe to follow-up comments once I post. I don't think that's what barefoot bum was asking for.

7. Great post, and thanks for keeping it at a level where occasional readers can still enjoy it.

I suppose this distinction lies at the heart of divine "manifestations," whereby a crying statue of Mary would be a reliable sign of God if it had been proven that there is in fact a God, whereas it is statistically meaningless in the current context: God has not been proven to exist, and the statue does not significantly increase the probability that God exists.
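The prior-setting trick in the first comment can be checked numerically. A minimal sketch, assuming just two hypotheses (a specific hypothesis H versus chance) and assuming, as our own addition rather than the commenter's, that H predicts the data with certainty, P(D|H) = 1. Setting the prior P(H) equal to the chance probability c = P(D|chance) then gives a posterior of c / (c + c(1 - c)) = 1/(2 - c), which approaches 0.50 as c gets small. The function and parameter names are hypothetical.

```python
def posterior(prior_h: float, lik_h: float, lik_chance: float) -> float:
    """P(H|D) when the only alternative to H is chance."""
    joint_h = lik_h * prior_h
    return joint_h / (joint_h + lik_chance * (1 - prior_h))


# Prior set equal to the chance probability c, with P(D|H) assumed to be 1.
for c in [0.05, 0.01, 0.001]:
    post = posterior(prior_h=c, lik_h=1.0, lik_chance=c)
    print(f"c = {c}: posterior = {post:.4f}, closed form 1/(2 - c) = {1 / (2 - c):.4f}")
```

If H predicts the data less than perfectly, the posterior under this choice of prior falls below 0.50, so the "0.50" figure holds only approximately and only under that extra assumption.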