Last updated: 2019-05-08
Knit directory: MSTPsummerstatistics/
These are the previous versions of the R Markdown and HTML files.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | 2ec7944 | Anthony Hung | 2019-05-06 | Build site. |
Rmd | d45dca4 | Anthony Hung | 2019-05-06 | Republish |
html | d45dca4 | Anthony Hung | 2019-05-06 | Republish |
html | ee75486 | Anthony Hung | 2019-05-04 | Build site. |
html | 5ea5f30 | Anthony Hung | 2019-04-29 | Build site. |
Rmd | 22ae3cd | Anthony Hung | 2019-04-29 | Add HMM file |
html | e746cf5 | Anthony Hung | 2019-04-28 | Build site. |
Rmd | 133df4a | Anthony Hung | 2019-04-28 | introR |
html | 22b3720 | Anthony Hung | 2019-04-26 | Build site. |
html | ddb3114 | Anthony Hung | 2019-04-26 | Build site. |
html | 413d065 | Anthony Hung | 2019-04-26 | Build site. |
html | 6b98d6c | Anthony Hung | 2019-04-26 | Build site. |
Rmd | 9f13e70 | Anthony Hung | 2019-04-25 | finish CLT |
html | 9f13e70 | Anthony Hung | 2019-04-25 | finish CLT |
Multiple testing describes situations where many hypotheses are simultaneously investigated from a given dataset. Correct treatment of statistics when working with multiple hypotheses is paramount, as mistakes can easily lead to false interpretations of results and many false positives. Our objectives today are to review the framework behind testing a single hypothesis, to see why this framework falls apart in multiple testing, and to survey different methods that have been proposed to correct for multiple testing.
The basic idea in hypothesis testing is to use data or observations to choose between two possible realities: a null hypothesis or an alternative hypothesis.
Many scientific fields are entering an age of “Big Data,” in which the ability to collect and work with data from a large number of measurements makes it possible to test many hypotheses at the same time. However, as scientists test many more hypotheses, the standard view of hypothesis testing falls apart.
To illustrate this, consider the xkcd comic (https://xkcd.com/882/). Obviously, something is not right with the conclusions of the study, since we all have an intuition that green jelly beans do not have any true association with skin conditions. To better understand why multiple testing can easily lead to false positive associations unless adequately treated, let us walk through the calculations for the probability of making a Type 1 error given the number of tests you are performing.
Let us say we are performing the study in the comic and testing for a link between purple jelly beans and acne at a significance level \(\alpha = 0.05\). What is the probability that we make a Type 1 error?
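As a worked answer, assuming the null hypothesis is true (purple jelly beans have no real link with acne), the probability follows directly from the definition of the significance level:

\[
P(\text{Type 1 error}) = P(\text{reject } H_0 \mid H_0 \text{ is true}) = \alpha = 0.05
\]

Here \(H_0\) is shorthand introduced for the null hypothesis of no association.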
Now, let us test for an association between 20 different colors of jelly beans and acne at a significance level of \(\alpha = 0.05\) for each individual test. What is the probability that we make at least one Type 1 error now?
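One way to work this out is sketched here under two assumptions that are made for the calculation rather than taken from the comic: all 20 null hypotheses are true, and the 20 tests are independent. Each test then avoids a Type 1 error with probability \(1 - \alpha\), so:

\[
P(\text{at least one Type 1 error}) = 1 - P(\text{no Type 1 errors}) = 1 - (1 - \alpha)^{20} = 1 - 0.95^{20} \approx 0.64
\]

Under these assumptions, a false positive somewhere among the 20 tests is more likely than not, even though each individual test still has only a 5% chance of one.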
Clearly,