Last updated: 2019-05-08

Knit directory: MSTPsummerstatistics/

This reproducible R Markdown analysis was created with workflowr (version 1.3.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Unstaged changes:
    Modified:   analysis/Bayes.Rmd
    Modified:   analysis/CLT.Rmd
    Modified:   analysis/HMM.Rmd
    Modified:   analysis/introR.Rmd
    Modified:   analysis/markov.Rmd
    Modified:   analysis/multipleTesting.Rmd
    Modified:   analysis/powerAnalyses.Rmd
    Modified:   analysis/syllabus.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
html 2ec7944 Anthony Hung 2019-05-06 Build site.
Rmd d45dca4 Anthony Hung 2019-05-06 Republish
html d45dca4 Anthony Hung 2019-05-06 Republish
html ee75486 Anthony Hung 2019-05-04 Build site.
html 5ea5f30 Anthony Hung 2019-04-29 Build site.
Rmd 22ae3cd Anthony Hung 2019-04-29 Add HMM file
html e746cf5 Anthony Hung 2019-04-28 Build site.
Rmd 133df4a Anthony Hung 2019-04-28 introR
html 22b3720 Anthony Hung 2019-04-26 Build site.
html ddb3114 Anthony Hung 2019-04-26 Build site.
html 413d065 Anthony Hung 2019-04-26 Build site.
html 6b98d6c Anthony Hung 2019-04-26 Build site.
Rmd 9f13e70 Anthony Hung 2019-04-25 finish CLT
html 9f13e70 Anthony Hung 2019-04-25 finish CLT

Introduction

Multiple testing describes situations in which many hypotheses are investigated simultaneously in a given dataset. Correct statistical treatment of multiple hypotheses is paramount, as mistakes can easily lead to misinterpretation of results and many false positives. Our objectives today are to review the framework behind hypothesis testing for a single hypothesis, see why this framework falls apart under multiple testing, and survey different methods that have been proposed to correct for multiple testing.

Hypothesis testing

The basic idea in hypothesis testing is to use data or observations to choose between two possible realities: a null hypothesis (\(H_0\)) or an alternative hypothesis (\(H_A\)). We summarize the evidence against \(H_0\) with a p-value and reject \(H_0\) when the p-value falls below a pre-specified significance level \(\alpha\).
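
For concreteness, here is a minimal sketch in R of a single hypothesis test on simulated data (the data and the choice of a one-sample t-test are illustrative, not taken from the original analysis):

    # Test H0: the population mean of x is 0, against HA: it is not
    set.seed(1)
    x <- rnorm(30, mean = 0.5, sd = 1)  # 30 simulated observations, true mean 0.5
    test <- t.test(x, mu = 0)           # one-sample t-test
    test$p.value                        # reject H0 at level alpha if this is below alpha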

The issue of multiple testing

Many scientific fields are entering an age of “Big Data,” in which the ability to collect and work with a large number of measurements makes it possible to test many hypotheses at the same time. However, as scientists test many more hypotheses, the standard view of hypothesis testing falls apart.

To illustrate this, consider the xkcd comic (https://xkcd.com/882/). Obviously, something is not right with the conclusions of the study, since we all have an intuition that green jelly beans do not have any true association with skin conditions. To better understand why multiple testing can easily lead to false positive associations unless treated adequately, let us walk through the calculation of the probability of making a type 1 error given the number of tests being performed.

Case 1: Performing 1 test

Let us say we are performing the study in the comic and testing for a link between purple jelly beans and acne at a significance level \(\alpha = 0.05\). What is the probability that we make a type 1 error?

Since \(\alpha\) is equal to our type 1 error rate (the probability of rejecting the null hypothesis given that the null hypothesis is true), this probability is simply 0.05.

Case 2: Performing 20 tests

Now, let us test for an association between 20 different colors of jelly beans and acne at a significance level of \(\alpha = 0.05\) for each individual test. What is the probability that we make at least one type 1 error now?

Here, we are interested in P(making at least one type 1 error), which is the same as 1 − P(making no type 1 errors at all). For each individual test, P(not making a type 1 error) is \(1-\alpha = 0.95\). If we assume that the separate tests are independent, then the probability of making at least one type 1 error among our 20 tests is \(1-(1-0.05)^{20} = 0.6415\).
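
This calculation is easy to reproduce in R (a small illustrative sketch, not code from the original analysis):

    # P(at least one type 1 error) among m independent tests, each at level alpha
    alpha <- 0.05
    m <- 1:20
    fwer <- 1 - (1 - alpha)^m
    fwer[1]   # 0.05      (Case 1: a single test)
    fwer[20]  # 0.6415141 (Case 2: twenty tests)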

Clearly, the probability of making at least one type 1 error grows quickly with the number of tests: even when every null hypothesis is true, performing 20 tests makes it more likely than not that we will report at least one false positive, just as in the comic. The procedures below are designed to correct for this.

Bonferroni Correction
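
The Bonferroni correction controls the family-wise error rate (the probability of making at least one type 1 error across all tests) by testing each of the \(m\) hypotheses at level \(\alpha/m\), or equivalently by multiplying each p-value by \(m\) before comparing it to \(\alpha\). A small sketch using base R's p.adjust, with hypothetical p-values made up for illustration:

    p_values <- c(0.001, 0.008, 0.039, 0.041, 0.270)  # hypothetical raw p-values
    p.adjust(p_values, method = "bonferroni")         # each p multiplied by m = 5, capped at 1

With \(\alpha = 0.05\), only the two smallest p-values survive the correction, even though four of the five raw p-values fall below 0.05.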

Holm’s Procedure
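
Holm’s procedure is a step-down refinement of the Bonferroni correction that controls the family-wise error rate while rejecting at least as many hypotheses: sort the p-values from smallest to largest and compare the \(i\)-th smallest to \(\alpha/(m-i+1)\), stopping at the first test that fails to reject. Continuing the hypothetical p-values from the Bonferroni sketch:

    p.adjust(p_values, method = "holm")  # step-down adjusted p-values, never larger than Bonferroni's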

q-values and False Discovery Rates
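
Rather than controlling the probability of any false positive at all, false discovery rate (FDR) methods control the expected proportion of false positives among the tests called significant, which is often a better fit when thousands of tests are performed. The Benjamini-Hochberg adjustment is available through p.adjust, and a test's q-value can be interpreted as the smallest FDR at which that test would be called significant. Continuing the hypothetical example (the qvalue call is left as a comment because that Bioconductor package is not part of base R):

    p.adjust(p_values, method = "BH")  # Benjamini-Hochberg FDR-adjusted p-values
    # Storey's qvalue package estimates q-values directly, e.g.:
    # library(qvalue); qvalue(p_values)$qvalues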