Last updated: 2017-04-17
Code version: 335d169
You should know what a p value is.
A key problem with p values, when testing null hypotheses, is that they can be difficult to calibrate. That is, it is hard to answer the question “If I get a p value of 0.01 (or any other number), how strong is the evidence against the null hypothesis?”
Here we just give a simple (but artificial) example of a test in which a p value of 0.01 actually corresponds to evidence for the null, even though 0.01 is usually considered to be strong evidence against the null. (This example is modified from the book Bayesian Analysis, by J Berger, p25.)
Suppose $x \in \{1,2,3\}$ and $\theta \in \{0,1\}$ with

| $x$ | 1 | 2 | 3 |
|---|---|---|---|
| $p(x \mid \theta=0)$ | 0.005 | 0.005 | 0.99 |
| $p(x \mid \theta=1)$ | 0.009 | 0.001 | 0 |

Note that the likelihood ratios for $H_1$ ($\theta=1$) vs $H_0$ ($\theta=0$) at $x=1,2,3$ are 9/5, 1/5 and 0 respectively. So as $x$ increases the evidence against $H_0$ decreases.
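
As a quick check of these ratios, here is a minimal Python sketch that recomputes them from the table. The names `p_h0`, `p_h1` and `lr` are just illustrative choices, and exact fractions are used so that the results come out as 9/5 and 1/5 rather than rounded decimals.

```python
from fractions import Fraction

# Probabilities copied from the table, kept as exact fractions.
# (The variable names are illustrative, not from the original text.)
p_h0 = {1: Fraction(5, 1000), 2: Fraction(5, 1000), 3: Fraction(990, 1000)}
p_h1 = {1: Fraction(9, 1000), 2: Fraction(1, 1000), 3: Fraction(0)}

# Likelihood ratio for H1 vs H0 at each possible observation x.
lr = {x: p_h1[x] / p_h0[x] for x in p_h0}

for x in sorted(lr):
    print(f"x={x}: LR = {lr[x]}")
```

The ratios decrease strictly as x goes from 1 to 3, which is the ordering of "evidence against the null" used in the rest of the example.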
Now, let us suppose that we observe $x=2$. Then by definition the p value for this observation is
$$p := \Pr(\text{evidence as strong or stronger against } H_0 \text{ than } x=2 \mid H_0).$$
Here “evidence as strong or stronger against $H_0$ than $x=2$” is $x \in \{1,2\}$. And the probability of this under $H_0$ is $\Pr(x \in \{1,2\} \mid H_0) = 0.005+0.005 = 0.01$.
So the p value for $x=2$ is 0.01.
And yet, the observation $x=2$ is 5 times more probable under $H_0$ than under $H_1$! So $x=2$ has p value 0.01 but is actually evidence for $H_0$.
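
The two numbers being contrasted here can be reproduced with a short Python sketch. It assumes, as in the text, that observations are ranked as "evidence against the null" by their likelihood ratio for the alternative vs the null; the helper names `likelihood_ratio` and `p_value` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction

# Probabilities from the table, as exact fractions (names are illustrative).
p_h0 = {1: Fraction(5, 1000), 2: Fraction(5, 1000), 3: Fraction(990, 1000)}
p_h1 = {1: Fraction(9, 1000), 2: Fraction(1, 1000), 3: Fraction(0)}


def likelihood_ratio(x):
    """Likelihood ratio for H1 vs H0 at observation x."""
    return p_h1[x] / p_h0[x]


def p_value(x_obs):
    """Probability under H0 of evidence as strong or stronger against H0.

    Assumption: 'as strong or stronger' means a likelihood ratio
    at least as large as the one at x_obs.
    """
    threshold = likelihood_ratio(x_obs)
    return sum(p_h0[x] for x in p_h0 if likelihood_ratio(x) >= threshold)


x_obs = 2
print("p value:", p_value(x_obs))                       # 1/100
print("p(x|H0) / p(x|H1):", p_h0[x_obs] / p_h1[x_obs])  # 5
```

So the same observation gives a p value of 1/100 while being 5 times more probable under the null than under the alternative.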
This example is obviously contrived to make a point, so it only demonstrates that it is possible to contrive a situation where $p=0.01$ corresponds to evidence for $H_0$.
However, given this it seems natural to ask: in “typical” situations, does $p=0.01$ correspond to evidence for or against $H_0$? Of course, the answer to this depends on what one views as “typical”. For a start towards answering this question see here.
