**Last updated:** 2017-04-24

**Code version:** ebc0b7a

You should know what a \(p\) value is.

A key problem with \(p\) values, when testing null hypotheses, is that they can be difficult to calibrate. That is, it is hard to answer the question “If I get a \(p\) value of 0.01 (or any other number), how strong is the evidence against the null hypothesis?”

Here we just give a simple (but artificial) example of a test in which a \(p\) value of 0.01 actually corresponds to evidence *for* the null, even though 0.01 is usually considered to be strong evidence *against* the null. (This example is modified from the book Bayesian Analysis, by J Berger, p25.)

Suppose \(x \in \{1,2,3\}\) and \(\theta \in \{0,1\}\) with

| \(x\) | 1 | 2 | 3 |
|---|---|---|---|
| \(p(x \mid \theta=0)\) | 0.005 | 0.005 | 0.99 |
| \(p(x \mid \theta=1)\) | 0.999 | 0.001 | 0 |

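Write \(H_0\) for the null hypothesis \(\theta=0\) and \(H_1\) for the alternative \(\theta=1\). Note that the likelihood ratios for \(H_1\) vs \(H_0\), that is \(p(x | \theta=1)/p(x | \theta=0)\), for \(x=1,2,3\) are \(999/5, 1/5\) and \(0\) respectively. So as \(x\) increases the evidence against \(H_0\) decreases.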
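As a quick check of these ratios, here is a minimal sketch in Python (chosen only for illustration; the variable names are our own). It copies the two rows of the table and divides them entry by entry, using exact fractions so that the results match the values quoted in the text.

```python
from fractions import Fraction

# p(x | theta) for x = 1, 2, 3, copied from the table in the text.
p_x_given_h0 = {1: Fraction(5, 1000), 2: Fraction(5, 1000), 3: Fraction(990, 1000)}
p_x_given_h1 = {1: Fraction(999, 1000), 2: Fraction(1, 1000), 3: Fraction(0)}

# Likelihood ratio for H1 vs H0 at each possible observation x.
likelihood_ratio = {x: p_x_given_h1[x] / p_x_given_h0[x] for x in p_x_given_h0}

for x, lr in likelihood_ratio.items():
    print(f"x = {x}: LR(H1 vs H0) = {lr} = {float(lr)}")
```

The ratios fall as \(x\) goes from 1 to 3, which is the ordering of evidence against \(H_0\) used in the rest of the example.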

Now, let us suppose that we observe \(x=2\). Then by definition the \(p\) value for this observation is \[p:= \Pr(\text{we would see evidence as strong or stronger against $H_0$ than $x=2$} | \theta=0).\]

Here “evidence as strong or stronger against \(H_0\) than \(x=2\)” is \(x \in \{1,2\}\). And the probability of this under \(H_0\) is \[\Pr(x \in \{1,2\} | H_0) = 0.005+0.005 = 0.01.\]

So the \(p\) value for \(x=2\) is 0.01.

And yet, the observation \(x=2\) is 5 times more probable under \(H_0\) than under \(H_1\)! So \(x=2\) has \(p\) value 0.01 but is actually evidence *for* \(H_0\).
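The two numbers being contrasted here can be reproduced with a short sketch, again in Python and again with names of our own choosing. It assumes, as the text does, that outcomes are ranked by their likelihood ratio for \(H_1\) vs \(H_0\), so the \(p\) value is the \(H_0\) probability of all outcomes whose ratio is at least as large as that of the observed one.

```python
from fractions import Fraction

# p(x | theta) for x = 1, 2, 3, copied from the table in the text.
p_h0 = {1: Fraction(5, 1000), 2: Fraction(5, 1000), 3: Fraction(990, 1000)}
p_h1 = {1: Fraction(999, 1000), 2: Fraction(1, 1000), 3: Fraction(0)}


def p_value(x_obs):
    """Probability under H0 of outcomes at least as extreme as x_obs.

    "Extreme" is measured by the likelihood ratio for H1 vs H0
    (an assumption matching the ordering used in the text).
    """
    lr = {x: p_h1[x] / p_h0[x] for x in p_h0}
    return sum(p_h0[x] for x in p_h0 if lr[x] >= lr[x_obs])


x_obs = 2
pval = p_value(x_obs)
# How many times more probable x_obs is under H0 than under H1.
ratio_h0_to_h1 = p_h0[x_obs] / p_h1[x_obs]

print(f"p value for x = {x_obs}: {float(pval)}")
print(f"p(x | H0) / p(x | H1) at x = {x_obs}: {float(ratio_h0_to_h1)}")
```

The same function can be called with `x_obs` set to 1 or 3 to see how the \(p\) value and the likelihood ratio move together or apart for the other outcomes.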

This example is obviously contrived to make a point, so it demonstrates only that it is *possible* to construct a situation where \(p=0.01\) corresponds to evidence *for* \(H_0\).

However, given this it seems natural to ask: in “typical” situations, does \(p=0.01\) correspond to evidence for or against \(H_0\)? Of course, the answer to this depends on what one views as “typical”. For a start towards answering this question see here.

`sessionInfo()`

```
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 highr_0.6 git2r_0.18.0
[4] BiocInstaller_1.24.0 workflowr_0.4.0 bitops_1.0-6
[7] iterators_1.0.8 tools_3.3.2 digest_0.6.12
[10] evaluate_0.10 lattice_0.20-34 Matrix_1.2-8
[13] foreach_1.4.3 graph_1.52.0 BiocCheck_1.10.1
[16] yaml_2.1.14 parallel_3.3.2 httr_1.2.1
[19] stringr_1.2.0 knitr_1.15.1 REBayes_0.73
[22] stats4_3.3.2 rprojroot_1.2 grid_3.3.2
[25] getopt_1.20.0 optparse_1.3.2 Biobase_2.34.0
[28] R6_2.2.0 XML_3.98-1.5 RBGL_1.50.0
[31] rmarkdown_1.4 ashr_2.1-10 magrittr_1.5
[34] backports_1.0.5 codetools_0.2-15 htmltools_0.3.5
[37] biocViews_1.42.0 MASS_7.3-45 BiocGenerics_0.20.0
[40] RUnit_0.4.31 assertthat_0.2.0 stringi_1.1.2
[43] RCurl_1.95-4.8 pscl_1.4.9 doParallel_1.0.10
[46] truncnorm_1.0-7 SQUAREM_2016.8-2
```
