Last updated: 2019-07-12

MSTP Summer Journal Club: Statistical Theory & Methods

Overview

A thorough understanding of statistics is essential for both experimental design and data analysis in biomedical research. Too often, time and resources are wasted due to a poor understanding of sample size and power calculations, and the reliability of scientific reports has repeatedly been scrutinized in recent years due to questionable, if not fraudulent, application of statistical tests.

In an era of increasingly accessible computational tools, big data, and an emphasis on open-access databases, the need for rigorous training in statistical methods has become more important than ever. As a requirement for admission into Pritzker, all MSTP students must have taken a statistics or biomathematics course in college. Many of us agree that further training in statistics would be a valuable resource and would help us feel more confident about the work we are doing.

Building off of a basic, college-level understanding of statistics, this course aims to have students review and teach essential statistical theory and methods. This will then allow students to use their statistical toolbox for a project that suits their particular research interests and goals.

Objectives

Statistical Theory

Students will build a foundation of theoretical and applied understanding of probability and statistics. This will provide a deeper understanding of statistical tests specific to particular research areas. By the end of the course, students should have a working knowledge of key concepts including, but not limited to, probability distributions, hypothesis testing, Bayesian inference, Markov models, and multiple testing corrections. See the course schedule for the topics covered. Students should be able to apply these concepts to analyze their own data or to explain how statistical tests are used in the scientific literature.

Statistical Methods

This course also requires students to build the computational skills needed to perform data analysis. The course teaches the programming language R. Students, regardless of previous experience, will be expected to learn the fundamentals of R and RStudio, along with basic exposure to the Unix terminal shell and Git/GitHub. By the end of this course, students should be familiar with the following in R: installation, variable assignment, variable types, operators and functions, data structures, reading and writing data, conditional branching, looping, loading and installing packages, writing functions, and, of course, basic statistical testing and plotting. These concepts will be taught again in the Basic Computing in R track at the BSD QBio5 MBL boot camp. Students are expected by then to be proficient enough in R to choose the Advanced Computing in R track instead.
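To give a sense of what these skills look like in practice, here is a minimal sketch in base R that touches several of the listed topics. It is only an illustration and not course material: the data values, the variable names, and the helper function describe_weight are all made up for this example.

```r
# Variable assignment and basic variable types (all values are made up)
n_mice  <- 6L          # integer
dose_mg <- 2.5         # numeric (double)
treated <- TRUE        # logical

# Data structures: vectors combined into a data frame
weights <- c(21.3, 22.8, 19.9, 24.1, 23.5, 20.7)
group   <- rep(c("control", "treated"), each = 3)
mice    <- data.frame(group = group, weight = weights)

# Writing a function that uses conditional branching
# (describe_weight and its cutoff are hypothetical, for illustration only)
describe_weight <- function(w, cutoff = 22) {
  if (w >= cutoff) {
    "heavy"
  } else {
    "light"
  }
}

# Looping over the rows of the data frame
labels <- character(nrow(mice))
for (i in seq_len(nrow(mice))) {
  labels[i] <- describe_weight(mice$weight[i])
}
mice$label <- labels

# Writing data to disk and reading it back
path <- tempfile(fileext = ".csv")
write.csv(mice, path, row.names = FALSE)
mice_from_disk <- read.csv(path)

# Basic statistical testing and plotting
result <- t.test(weight ~ group, data = mice_from_disk)
boxplot(weight ~ group, data = mice_from_disk, ylab = "Weight (g)")
```

Running the script and then printing result shows the output of a two-sample t-test comparing the two made-up groups. With only three observations per group, the numbers are not meant to be interpreted.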

Learning Objectives

  • Establish a groundwork in theoretical and applied understanding of probability and distributions

  • Introduce the theoretical basis behind commonly used statistical tests and assumptions inherent to each test

  • Obtain a working knowledge of how to use R to streamline robust and reproducible statistical analyses

  • Learn how to integrate statistical thinking into experimental work throughout the scientific process from study design to data analysis

  • Connect students with resources to seek out statistical advising in the future

  • Perform a data analysis project using real data

Final Project

Following the 4-5 weeks of lectures, each student will be expected to prepare a 10-minute presentation to be delivered in the 10th week of classes. Two options are available for the presentation:

  1. Complete a data analysis project using data from a rotation project or data previously collected by members of a rotation lab. In addition to carrying out the project, each student will create a 10-minute presentation, to be given in the 10th week, describing the data that were collected and the statistical methods used to analyze them.

  2. Select a primary research paper in a field of interest and create a 10-minute presentation, to be given in the 10th week, describing the statistical methods used in the paper and the conclusions drawn from those statistical tests.

Students should propose a topic for their presentation and send the proposed topic to the instructors via email by the 7th week of classes. A rubric for the presentation can be found here.

Grading

This course is graded Pass/Fail (P/F), with Pass being a final grade of 70% or higher. Grades are based on the following criteria:

Participation (30%):

Students are expected to attend all sessions listed in the course schedule. In cases of expected scheduling conflicts, please notify the instructors at least 24 hours in advance for an excused absence.

Homework (40%):

A quiz will be given after each lecture to assess understanding of the material. Each quiz is due by the time the next lecture begins (except for the last lecture, whose quiz is due 48 hours after the lecture ends). Because these quizzes are meant to help students review and reinforce concepts, they will be graded based on completion and effort. Tutorials and other resources will be provided for students to learn R. These tutorials must be completed by the end of the course. Grading will depend on timely submission of an R script that contains the code used for the tutorial.

Final Project (40%):

Grading for the final project will be based on completion, effort, and a demonstrated understanding of statistical concepts. Students will need to submit all scripts/code, slides, papers, and/or other materials used in their final project before the day of their presentation.

Outline of Course

Week 1: Course Introduction; the Central Limit Theorem; Intro to R; Probability and distributions

Week 2: Bayesian inference

Week 3: Markov Chains; Hidden Markov Models; Review of basic statistical tests

Week 4: Review of basic statistical tests (continued); Linear Regression

Week 5: Multiple Testing; Power Analyses

Weeks 6-9: Breakouts within specific programs (No formal lecture. Independent time to complete final project.)

  • Work with PIs or senior grad students in same research area

  • Handle real data and work hands-on with analysis tools used by people in the field

  • Prepare a presentation with results of statistical analyses

  • Week 7: Final project proposed to instructors

Week 10: Project presentations

Academic Integrity

In this course, students are expected to produce original work. This means that all sources used in written work (including articles, books, chalk posts) should be properly cited. The University policy on academic honesty is central to the ideals that undergird this course. Students are expected to be independently familiar with the policy and to recognize that their work in the course is to be their own original work that truthfully represents the time and effort applied. Violations of the policy are taken seriously and will be handled in a manner that fully represents the extent of the policy and that befits the seriousness of its violation.

Disability Accommodations

If you have specific physical, psychiatric or learning disabilities and require accommodations, please let the instructors know early in the course so that your learning needs may be appropriately met. You will also need to meet with the Office of Student Disability Services located at 5501 S Ellis Avenue.

Final Note

As course instructors, we understand that everyone has had different experiences and backgrounds in statistics. Thus, we are aware that a concept taught in lecture may be review for one and new to another. What you get out of this course ultimately depends on how much effort you put in. We know that laboratory rotations will take most of your time, but we hope that you challenge yourself to learn something new in this course. We want this course to be helpful for you, so please give us feedback throughout the course.

We also encourage you to learn basic computing in R because you will encounter lessons in R again at BSD QBio. We do not want your learning in this class and the boot camp to be redundant, and therefore hope this course becomes a stepping stone for you to learn more advanced computing at the boot camp. We hope this sounds fair, and do not expect the tutorials to take too long.

Finally, we hope that the final project on a topic of your choice will be something that is relevant to your graduate studies and something you are excited about. Please feel free to email us for office hours if you need any help selecting a topic for your final project. Overall, we are excited for the opportunity to work with and get to better know you all. We hope you are as well!

~Anthony and Allen

Acknowledgements

Thank you very much to Frank Wen and Katie Lee, who started the course and gave helpful advice on course material and structure.