Last updated: 2020-06-11
Knit directory: MSTPsummerstatistics/
This reproducible R Markdown analysis was created with workflowr (version 1.5.0).
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | 9bb0ed6 | Anthony Hung | 2020-05-12 | Build site. |
Rmd | 6d7daf1 | Anthony Hung | 2020-05-10 | change syllabus |
html | 29c91df | Anthony Hung | 2020-05-10 | Build site. |
html | a6d0787 | Anthony Hung | 2020-05-09 | Build site. |
html | e18c369 | Anthony Hung | 2020-05-02 | Build site. |
html | 0e6b6d0 | Anthony Hung | 2020-04-30 | Build site. |
html | 5cbe42c | Anthony Hung | 2020-04-23 | Build site. |
Rmd | 6c6f3e6 | Anthony Hung | 2020-03-30 | Knit analysis files |
html | 6c6f3e6 | Anthony Hung | 2020-03-30 | Knit analysis files |
html | f15db48 | Anthony Hung | 2020-03-30 | Build site. |
html | 310d040 | Anthony Hung | 2020-02-20 | Build site. |
html | 5a37a3e | Anthony Hung | 2020-02-14 | Build site. |
html | 96722bd | Anthony Hung | 2019-08-07 | Build site. |
html | 15ca1f1 | Anthony Hung | 2019-07-18 | Build site. |
html | a3aa9e0 | Anthony Hung | 2019-07-18 | Build site. |
Rmd | 3283a68 | Anthony Hung | 2019-07-18 | Edits for asethetic code/evaluation |
html | ceb577e | Anthony Hung | 2019-07-12 | Build site. |
html | 397882b | Anthony Hung | 2019-05-30 | Build site. |
html | 6d3e1c8 | Anthony Hung | 2019-05-28 | Build site. |
html | c117ef1 | Anthony Hung | 2019-05-28 | Build site. |
html | b291d24 | Anthony Hung | 2019-05-24 | Build site. |
Rmd | a321d7b | Anthony Hung | 2019-05-24 | commit before republish |
html | a321d7b | Anthony Hung | 2019-05-24 | commit before republish |
html | c4bdfdc | Anthony Hung | 2019-05-22 | Build site. |
Rmd | dd1e411 | Anthony Hung | 2019-05-22 | before republishing syllabus |
html | dd1e411 | Anthony Hung | 2019-05-22 | before republishing syllabus |
html | 096760a | Anthony Hung | 2019-05-19 | Build site. |
html | da98ae8 | Anthony Hung | 2019-05-18 | Build site. |
Rmd | 239723e | Anthony Hung | 2019-05-08 | Update learning objectives |
html | 239723e | Anthony Hung | 2019-05-08 | Update learning objectives |
html | 2ec7944 | Anthony Hung | 2019-05-06 | Build site. |
Rmd | d45dca4 | Anthony Hung | 2019-05-06 | Republish |
Rmd | 68addd7 | Anthony Hung | 2019-05-05 | Publish |
Rmd | ee75486 | Anthony Hung | 2019-05-05 | Build site. |
html | 5ea5f30 | Anthony Hung | 2019-04-30 | Build site. |
Rmd | 22ae3cd | Anthony Hung | 2019-04-30 | Add HMM file |
html | e746cf5 | Anthony Hung | 2019-04-29 | Build site. |
Rmd | 133df4a | Anthony Hung | 2019-04-29 | introR |
html | 133df4a | Anthony Hung | 2019-04-29 | introR |
html | 22b3720 | Anthony Hung | 2019-04-26 | Build site. |
html | ddb3114 | Anthony Hung | 2019-04-26 | Build site. |
html | 413d065 | Anthony Hung | 2019-04-26 | Build site. |
html | 6b98d6c | Anthony Hung | 2019-04-26 | Build site. |
html | 602e0f9 | Anthony Hung | 2019-04-25 | Build site. |
Rmd | ecc06a5 | Anthony Hung | 2019-04-24 | Add syllabus: |
html | ecc06a5 | Anthony Hung | 2019-04-24 | Add syllabus: |
Rmd | 2459910 | Anthony Hung | 2018-09-28 | Update all webpages |
html | 2459910 | Anthony Hung | 2018-09-28 | Update all webpages |
A thorough understanding of statistics is essential for both experimental design and data analysis in biomedical research. Too often, time and resources are wasted due to a poor understanding of sample size and power calculations, and the reliability of scientific reports has repeatedly been scrutinized in recent years due to questionable, if not fraudulent, application of statistical tests.
In an era of increasingly accessible computational tools, big data, and an emphasis on open-access databases, the need for rigorous training in statistical methods has become more important than ever. As a requirement for admission into Pritzker, all MSTP students must have taken a statistics or biomathematics course in college. Many of us agree that further training in statistics would be a valuable resource and would help us feel more confident about the work we are doing.
Building on a basic, college-level understanding of statistics, this course aims to have students review and teach essential statistical theory and methods. This will then allow students to use their statistical toolbox for a project that suits their particular research interests and goals.
Statistical Theory
Students will build a foundation of theoretical and applied understanding of probability and statistics. This will provide a deeper understanding of statistical tests specific to particular research areas. By the end of the course, students should have a working knowledge of key concepts, including, but not limited to, probability distributions, hypothesis testing, Bayesian inference, Markov models, and multiple testing corrections. See the course schedule for topics covered in the course. Students should be able to apply these concepts to perform data analysis using their own data, or explain how statistical tests are used in the scientific literature.
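As a rough illustration of how two of these concepts fit together, the sketch below runs several hypothesis tests on simulated data and then applies multiple testing corrections. It is not taken from the course materials: the number of tests, the sample sizes, and all variable names are assumptions chosen for the example. It uses only base R functions (t.test and p.adjust).

```r
# Hypothetical example: 20 simulated experiments, each comparing two groups.
set.seed(1)  # fixed seed so the simulation is repeatable

n_tests <- 20

# Both groups in every comparison are drawn from the same distribution,
# so any "significant" result here is a false positive.
p_values <- replicate(n_tests, t.test(rnorm(10), rnorm(10))$p.value)

# Adjust for multiple testing with two standard corrections from base R.
p_bonferroni <- p.adjust(p_values, method = "bonferroni")
p_bh <- p.adjust(p_values, method = "BH")  # Benjamini-Hochberg (FDR)

# Count how many tests fall below 0.05 before and after correction.
c(raw = sum(p_values < 0.05),
  bonferroni = sum(p_bonferroni < 0.05),
  bh = sum(p_bh < 0.05))
```

Because every null hypothesis is true in this simulation, comparing the three counts gives a feel for how each correction limits false positives.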
Statistical Methods
This course also requires students to build the computational skills needed to perform data analysis. The programming language taught in this course is R. Students, regardless of previous experience, will be expected to learn the fundamentals of R and RStudio, along with basic exposure to the Unix terminal shell and Git/GitHub. By the end of this course, students should be familiar with the following in R: installation, variable assignment, variable types, operators and functions, data structures, reading and writing data, conditional branching, looping, loading and installing packages, writing functions, and, of course, basic statistical testing and plotting. These concepts will be taught again in the Basic Computing in R track at the BSD QBio5 MBL boot camp. Students are expected by then to be proficient enough in R to choose the Advanced Computing in R track instead.
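To give a sense of the level involved, here is a minimal sketch touching several of the fundamentals listed above: variable assignment, variable types, data structures, writing functions, conditional branching, looping, and a basic statistical test. The data values, the cutoff, and the names (mice, describe_weight, and so on) are made up for illustration and are not part of the course tutorials.

```r
# Variable assignment and basic types
n_mice <- 6L                  # integer
dose <- 2.5                   # numeric (double)
strain <- "C57BL/6"           # character
treated <- TRUE               # logical

# Data structures: a numeric vector and a data frame (values are made up)
weights <- c(21.3, 22.8, 19.9, 24.1, 23.5, 20.7)
mice <- data.frame(
  group = rep(c("control", "treated"), each = 3),
  weight = weights
)

# Writing a function that uses conditional branching
describe_weight <- function(w, cutoff = 22) {
  if (w > cutoff) "heavy" else "light"
}

# Looping: label each weight in turn
labels <- character(length(weights))
for (i in seq_along(weights)) {
  labels[i] <- describe_weight(weights[i])
}
mice$label <- labels

# Basic statistical testing: two-sample t-test of weight by group
result <- t.test(weight ~ group, data = mice)
result$p.value
```

In practice a vectorized call such as ifelse(weights > 22, "heavy", "light") would replace the explicit loop. The loop is kept here only because looping is one of the listed skills.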
Establish a groundwork in theoretical and applied understanding of probability and distributions
Introduce the theoretical basis behind commonly used statistical tests and assumptions inherent to each test
Obtain a working knowledge of how to use R to streamline robust and reproducible statistical analyses
Learn how to integrate statistical thinking into experimental work throughout the scientific process from study design to data analysis
Connect students with resources to seek out statistical advising in the future
Perform a data analysis project using real data
Following the 4-5 weeks of lectures, each student/group will be expected to prepare a 10-minute presentation to be delivered in the 10th week of classes.
A rubric for the presentation can be found here.
The grades for this course are based on the following criteria:
Participation (30%):
Students are expected to attend all sessions listed in the course schedule. In cases of expected scheduling conflicts, please notify the instructors at least 24 hours in advance for an excused absence.
Homework (30%):
A quiz will be given after each lecture to assess understanding of the material. Each quiz is due by the time the next lecture begins (except for the last lecture, whose quiz is due 48 hours after the lecture ends). Because these quizzes are meant to help students review and reinforce concepts, they will be graded based on completion and effort. Tutorials and other resources will be given for students to learn R. These tutorials must be completed by the end of the course. Grading will depend on timely submission of an R script that contains the code used for the tutorial.
Final Project (40%):
Grading for the final project will be based on completion, effort, and a demonstrated understanding of statistical concepts. Students will need to submit all scripts/code, slides, papers, and/or other materials used in their final project before the day of their presentation.
Week 1: Course Introduction; the Central Limit Theorem; Intro to R; Probability and distributions
Week 2: Bayesian inference; Naive Bayes classifier; Review of basic statistical tests
Week 3: Review of basic statistical tests continued; Regression
Week 4: Data visualization; Resampling methods; Power analysis
Weeks 6-9: Breakouts within specific programs (No formal lecture. Independent time to complete final project.)
Work with PIs or senior grad students in same research area
Handle real data and work hands-on with analysis tools used by people in the field
Prepare a presentation with results of statistical analyses
Week 7: Final project proposed to instructors
Week 10: Project presentations
In this course, students are expected to produce original work. This means that all sources used in written work (including articles, books, chalk posts) should be properly cited. The University policy on academic honesty is central to the ideals that undergird this course. Students are expected to be independently familiar with the policy and to recognize that their work in the course is to be their own original work that truthfully represents the time and effort applied. Violations of the policy are taken seriously and will be handled in a manner that fully represents the extent of the policy and that befits the seriousness of its violation.
If you have specific physical, psychiatric or learning disabilities and require accommodations, please let the instructors know early in the course so that your learning needs may be appropriately met. You will also need to meet with the Office of Student Disability Services located at 5501 S Ellis Avenue.
As course instructors, we understand that everyone has had different experiences and backgrounds in statistics. Thus, we are aware that a concept taught in lecture may be review for one and new to another. What you get out of this course ultimately depends on how much effort you put in. We know that laboratory rotations will take most of your time, but we hope that you challenge yourself to learn something new in this course. We want this course to be helpful for you, so please give us feedback throughout the course.
We also encourage you to learn basic computing in R because you will encounter lessons in R again at BSD QBio. We do not want your learning in this class and the boot camp to be redundant, and therefore hope this course becomes a stepping stone for you to learn more advanced computing at the boot camp. We hope this sounds fair, and do not expect the tutorials to take too long.
Finally, we hope that the final project on a topic of your choice will be something that is relevant to your graduate studies and something you are excited about. Please feel free to email us for office hours if you need any help selecting a topic for your final project. Overall, we are excited for the opportunity to work with and get to better know you all. We hope you are as well!
~Anthony and Allen
Thank you very much to Frank Wen and Katie Lee, who started the course and gave helpful advice on course material and structure.