class: left, middle, inverse, title-slide

.title[
# Introduction to EPIB 704
]
.author[
### Mabel Carabali
]
.institute[
### EBOH, McGill University
]
.institute[
### EPIB 704
]
.date[
### (updated: 2025-08-28)
]

---
class: middle

### Outline

1. Presentations
2. Course description
3. Reproducible research
4. Descriptive epidemiology

---
class: middle

## Who are we?

.pull-left[
### Mabel

- Social and Infectious Diseases epidemiologist
- _**"Recovered"**_ clinician; **ES(T)L**
- _**Twitter/FB/IG-less**_
- [Website](https://socialepimethods.lab.mcgill.ca/)
]

--

.pull-right[
### Gabrielle Jacob

### Julia Brillinger
]

---
class: middle

# Who are you?

## What are the things you <span style="color:darkmagenta">dislike the most about EPI?</span> 🤢

---
class: middle

### [How good are we at getting things done?](https://bigthink.com/neuropsych/task-completion-study/)

<br>

--

<img src="images/finishthisweek.jpg" width="70%" style="display: block; margin: auto;" />

---
class: middle

## Course description - more specifically

- Emphasis on epidemiologic theory and the estimation of epidemiologic effect measures and uncertainty in different study designs
- Emphasis on causal inference concepts and their distinction from, yet simultaneous dependence on, statistical models
- Emphasizes the limitations of the mainstream null hypothesis significance testing (NHST) paradigm
- Encourages the recognition of the value of an estimation (Bayesian) paradigm for optimal inferences

---
class: middle

## Course description (3)

- Attention will also be given to exploratory data analysis (tabular and graphical), data interpretation, critical examination of assumptions and reproducible research
- Focus mainly on cumulative incidence measures for categorical outcomes, with attention to model checking and screening for confounders and effect measure modifiers
- Regression models covered will include linear, logistic, Poisson, survival, & meta-analytical (hierarchical) models
- Experimental and
quasi-experimental designs will be discussed
- Miscellaneous topics will include attributable fractions, selection bias, sensitivity analyses, bootstrapping, matched data, missing data, and misclassified data

---
class: middle

## Objectives:

1) Identify common threats to validity (biases),

2) Identify and interpret their effects on the measurement of associations, and

3) Identify opportunities to investigate, correct, and address these biases.

---
class: middle

## Learning Outcomes

By the end of this course you should be able to:

1. Contrast advantages and disadvantages of Frequentist and Bayesian statistical inference
2. Recommend analytical approaches for each study design
3. Identify the presence of potential threats to the validity of a statistical analysis
4. Design a basic statistical analysis plan to answer a given research question, accounting for threats to validity

---
class: middle

## Instructional Methods

- “Hands-on” approach whereby the key statistical concepts covered will be illustrated by computer coding, the new calculus for modern epidemiologic methods
- Counting has historically been the essential background to epidemiologic research and remains so in the `\(21^{st}\)` century but is now often best accomplished by simulation and sampling of posterior probability distributions
- Examples of the required computing code will be provided extensively.

---
class: middle

## Reference textbooks

1) **_What if?_** by Miguel Hernán and James Robins, available [here](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/) (Chapters 1-11)

2) **_Modern Epidemiology_** by Timothy L. Lash, Tyler J. VanderWeele, Sebastien Haneuse, Kenneth J.
Rothman, available at the McGill Library [here](https://mcgill.worldcat.org/title/modern-epidemiology/)

3) **_Regression and Other Stories_** by Andrew Gelman, Jennifer Hill, and Aki Vehtari, available [here](https://users.aalto.fi/~ave/ROS.pdf)

--

For an optional deeper Bayesian dive, consider the excellent

4) **_Statistical rethinking: a Bayesian course with examples in `R` and `Stan`_** by Richard McElreath, available at the McGill Library [here](https://mcgill.worldcat.org/title/statistical-rethinking-a-bayesian-course-with-examples-in-r-and-stan/oclc/1145123627&referer=brief_results) (Chapters 1-10)
  - Associated website: https://xcelab.net/rm/statistical-rethinking/ and YouTube lectures [here](https://youtube.com/playlist?list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus)

Other readings will be assigned each week from published journal articles.

---
class: middle

## Computing language

The computing language of choice for the course is `R`. There are several reasons for this choice, including its open-source licence and its rich online community, which means help is often only a `Google` away. We believe `R` has become the _lingua franca_ for much of the epidemiology/biostatistical universe.

Of course, other languages including `Julia` or `Python` will give the same results but, unfortunately, we can’t supply the necessary support for those languages. `Stata` is a popular proprietary software package, but we consider that the scripting and reproducibility offered by `R-Markdown` provide additional advantages.

---
class: middle

## R - Related resources

We understand that not everyone is familiar with `R` as a computing language and therefore **<span style="color:darkred">strongly recommend </span>** starting, reviewing and/or strengthening your knowledge of the software.

**In-person:** McGill's [Computational and Data Systems Initiative R Summer Camp](https://www.mcgill.ca/cdsi/training/summer-camps/r-math-summer-camp/r-summer-camp): August 19 to 23, 2024.
Other resources from the [CDSI here](https://www.mcgill.ca/cdsi/training/workshops)

---
class: middle

## R - Related resources

#### Online

- R-Studio Education: Independent learning (https://education.rstudio.com/learn/)
- R-Studio/posit: Practice-based learning (https://posit.co/products/enterprise/academy/)
- R-related Books: https://www.rstudio.com/resources/books/
- Princeton site to explore `R`: https://exploringr.princeton.edu/self-learning-resources-for-r/
- R-resources list: https://thatdatatho.com/r-resources-beginner-advanced/

---
class: middle

## Course Structure

- Lectures (mostly discussions!)
- Assignments (submitted through _mycourses_)
- Rapid Reviews

---
class: middle

## Assessment/Evaluation

| | **%** | **Rubrics** |
|:---------------------------|---------------------:|:-----------------------------------:|
|Attendance & Participation | 5% | Presence, attention, active participation |
|Homework Assignments (5 x 15% each)| 75% | Accuracy, demonstrated skills, timely delivery |
|Rapid Review (RR) | 15% | Critical appraisal, time management |
|RR Participation | 5% | Meaningful contribution on RRs from peers |
| **Total** | **100%**| |

---
class: middle

## Attendance & Participation

We expect to see everyone **<span style="color:darkred">in class in-person </span>**.

- Since life happens: _Zoom_ attendance can be an option **under special circumstances ONLY**.

Attendance will be graded out of 5 points, making up 5% of the total grade:

- Attend class, pay attention to the lectures and your classmates' presentations, and participate actively in the in-class discussions.

Doing the suggested lecture **readings BEFORE the lectures** will ensure an optimal learning experience.
---
class: middle

## Attendance & Participation

_To safeguard the well-being of everyone, **if you present with respiratory-like symptoms or any signs or symptoms of communicable conditions** that may endanger your health, the health of classmates or instructors, **please consult a medical professional and inform the instructors as early as possible** to make the required accommodations._

Please see McGill's guidelines for attendance: [Guidelines and Policies](https://www.mcgill.ca/study/2023-2024/university_regulations_and_resources)

---
class: middle

## Rapid Review

Every enrolled student must sign up for one _rapid review_, which is a 10-minute presentation of a **<span style="color:darkred"> methodological critique </span>** of any new paper (**publication date since July 2025**) published in the biomedical literature (i.e., indexed in PubMed).

- Presenters should comment on study design, validity, precision, causal interpretation, modeling strategy and other concepts from the course.

--

The comments can be praise or critique but **must** demonstrate thoughtful application of the principles discussed in the lectures as applied to new work in the field.

**<span style="color:darkred">The rapid review will be graded out of 15 points, making up 15% of the total grade.</span>**

---
class: middle

## Guidelines for the _rapid review_

- **Selection of the document:** Identify the published manuscript considering the date of publication, your personal/thesis interest and the methodological approach. **[5/15 total points]**

--

Keep the **focus on methods**: consider exploring new methodological approaches or novel uses of existing epidemiological methods.

- E-pubs, online-ahead-of-print publications, and _pre-prints_ are allowed.
- Please provide a PDF of your target paper **at least two weeks before the presentation** to be shared with the class.
---
class: middle

## Guidelines for the _rapid review_

- **All students** must participate in the review of the paper through the [Perusall platform](https://www.perusall.com/).
  + You should have received an email to access the EPIB 704 Perusall platform.
  + You can also access it through this [link](https://app.perusall.com/courses/fall-2025-epib-704-001-doctoral-level-epib-methods-1/_/dashboard/assignments/PD37EMfwgzMm9EJAQ)
- This activity will have an individual grade of 5 points, making up **<span style="color:red"> 5% of the total grade</span>**.

---
class: middle

## Guidelines for the _rapid review_ (2)

- **Critical appraisal:** Please read and critically appraise the selected manuscript considering key aspects of the design and analytic plan. To illustrate your understanding of the selected document, **<span style="color:darkred">you must re-write the abstract</span>**, as follows:
  - Write a 250-word (maximum) structured abstract: Background, Objectives, Methods, Results, and Conclusion.
  - Provide a **50-word** (maximum) section on **strengths and limitations**.

--

- Finally, provide a brief statement (50 words) indicating whether the methodological approach used in the reviewed paper is something that you will do, something that you would like to do, or something that you would definitely not do, and why.
- Submit the **written document (max 350 words total)** on the day of your review through the rapid review section of [myCourses](https://mycourses2.mcgill.ca/). **[5/15 total points]**

---
class: middle

## Guidelines for the _rapid review_ (3)

- **In-Class presentation format:** In a maximum total time of **10 minutes**, present the results of your review to the class **[5/15 total points]**, considering the following:
  - PowerPoint slides or any visual aids are welcome, but there **should not be more than 5 slides/pages**.
  - After each presentation there will be a _**brief period for Q&As (approximately 5 minutes)**_.
  - Interactive discussions are highly encouraged, but a **maximum of 15 minutes** is allocated for each presentation and discussion in every class.

---
class: middle

### Dates assignment

Select an article and a date (last date Nov 28) `\(^1\)` and **inform the instructor by September 9, 2025**. First come, first served.

| Date | Name |Article/Manuscript |
|:-------|:-------------------------|:------------------------------|
| 09 Sep | TAs | TBD |
| 11 Sep | | |
| 16 Sep | | |
| 18 Sep | | |
| 23 Sep | | |
| 25 Sep | | |
| 30 Sep | | |
| 02 Oct | | |
| 07 Oct | | |
| 09 Oct | | |
| 21 Oct | | |
| 23 Oct | | |
| 28 Oct | | |
| 30 Oct | | |
| 11 Nov | | |
| 13 Nov | | |
| 18 Nov | | |
| 20 Nov | | |
| 25 Nov | | |

---
class: middle

## Homework Assignments

The homework will make up 75% of the total grade. There will be 5 homework assignments (each worth 15%) throughout the 13 weeks of the semester, which will be graded by the TAs. The schedule of dates on which these will be assigned, handed in, and handed back is as follows:

|**HW #** | **Assigned** | **Due** | **Returned** |
|:----|:-----------:|:------------:|:---------------:|
|**1**| Sept 11 | Sept 18 | Sept 25 |
|**2**| Sept 25 | Oct 02 | Oct 09 |
|**3**| Oct 09 | Oct 23 | Oct 30 |
|**4**| Oct 30 | Nov 06 | Nov 13 |
|**5**| Nov 06 `\(^*\)` | Nov 20 | Nov 27 |

`\(^*\)` Last assignment is given 15 working days in advance. See Policy on Assessment of Student Learning [(PASL)](https://www.mcgill.ca/assessment-for-learning/pasl), subsection 6.7 in Point 6, [COMMUNICATION OF COURSE ASSESSMENT TASKS](https://www.mcgill.ca/secretariat/files/secretariat/assessment_of_student_learning.pdf).

---
class: middle

### Homework Grading

Submitted assignments should be your **individual effort**, even if you consult with other students about your strategy for obtaining these solutions (see plagiarism note below).

To reinforce the concept of scientific **reproducibility**, all assignments should be submitted as `R-Markdown` files.
Data files for the assignments are available on GitHub and may be installed directly into `R`.

Late assignments carry a very severe penalty of **20% off per day late**. This is primarily to protect the time of the TAs.

_Pleas for mercy for extenuating circumstances will be accepted only with written documentation. Taking extra time to do a better job is probably not a worthwhile strategy, because the late penalty is so costly._

---
class: middle

### Some Admin & Logistics

- Important dates (**McGill**): [Add/Drop class: Sep 9th, 2025](https://www.mcgill.ca/importantdates/key-dates)
- Key dates: Assignments and Rapid Reviews
- Communication channels: [_mycourses_](https://mycourses2.mcgill.ca/d2l/loginh/) & emails: be mindful of schedules and office hours
  - We all have other activities outside the classroom
- [Course Evaluations](https://www.mcgill.ca/mercury/students/information): Feedback throughout the course is MORE THAN WELCOME!

Read the syllabus! :)

---
class: middle

### Some Admin & Logistics

- [Academic Integrity](https://www.mcgill.ca/students/srr/honest): Plagiarism, misconduct, and any form of discrimination, aggressive behavior or harassment towards ANY member of the community are **UNACCEPTABLE**
- [Policy for the academic accommodation of religious observances](https://www.mcgill.ca/secretariat/files/secretariat/religious_holy_days_policy_on_accomodation_of.pdf)
- [Guidelines for the Academic Accommodation of Pregnant Students and Students Caring for Dependants](https://www.mcgill.ca/students/srr/files/students.srr/guidelines_for_the_academic_accommodation_of_pregnant_students_and_students_caring_for_dependants.pdf)
- [Policy concerning the rights to academic accommodations for students with disabilities](https://www.mcgill.ca/secretariat/files/secretariat/students_with_disabilities_policy_concerning_the_right_to_academic_accommodations_for.pdf)
- [Charter of Students’
Rights](https://www.mcgill.ca/secretariat/files/secretariat/charter_of_student_rights_last_approved_october_262017.pdf)
- Other resources at [Student Wellness Hub](https://www.mcgill.ca/resources-services-students#): FEEL FREE TO ASK!

---

# Where are we on our Epi knowledge?

### Baseline Assessment [Quiz](https://forms.office.com/r/e6wSzVPYtk)

<img src="images/baseline_2025.png" width="50%" style="display: block; margin: auto;" />

---
class: middle

[Some medical research bingo we will cover](https://twitter.com/MaartenvSmeden/status/1504422629255561219)

<br>

<img src="images/bingo.jpeg" width="50%" style="display: block; margin: auto;" />

---
class: middle

## Descriptive Epidemiology

- What is Descriptive Epidemiology?

--

- How much bias is _acceptable_?

--

- Descriptive vs. Causal

.footnote[.small[
_Matthew P Fox, Eleanor J Murray, Catherine R Lesko, Shawnita Sealy-Jefferson, On the Need to Revitalize Descriptive Epidemiology, American Journal of Epidemiology, Volume 191, Issue 7, July 2022, Pages 1174–1179, https://doi.org/10.1093/aje/kwac056_
]]

---
class: middle

## Not all research needs to be causal,

--

<br>

# but everything should be robust

---

## Framework for descriptive Epidemiology

1. Research Question and Background
2. Population and Data Source
3. Outcome Ascertainment/Covariates
4. Analysis Plan: Targeted Measure of Occurrence
5. Results (Presentation of)
6. **Bias**, **Limitations** and **Interpretations**

---
class: middle

### A little bit of Bayes

<br>

<img src="images/seashell_2x.png" width="40%" style="display: block; margin: auto;" />

[Source: https://xkcd.com/1236/]

---
class: middle

### QUESTIONS?

## COMMENTS?

# RECOMMENDATIONS?

---
class: middle

# Other resources

## Reproducible research

Can you reproduce a project that you completed 3 weeks ago? How about 3 months ago? 3 years ago?

If your data changed, even minimally, could you easily redo your analyses?

How long would it take you to update your manuscript?
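
One way to begin answering these questions is to make each analysis script self-describing. The sketch below is illustrative only: the simulated data, the seed value, and the object names (`sim_data`, `risk`, `risk_difference`) are assumptions for demonstration and are not part of the course materials.

```r
# Minimal reproducibility sketch (hypothetical data and object names)
set.seed(704)  # fix the random seed so the simulation repeats exactly

# Simulate a small cohort instead of relying on an external file
n <- 200
sim_data <- data.frame(exposure = rbinom(n, size = 1, prob = 0.4))
sim_data$outcome <- rbinom(
  n, size = 1,
  prob = ifelse(sim_data$exposure == 1, 0.30, 0.15)
)

# Risk (cumulative incidence) in each exposure group, then the risk difference
risk <- tapply(sim_data$outcome, sim_data$exposure, mean)
risk_difference <- unname(risk["1"] - risk["0"])
risk_difference

# Record the R version and attached packages alongside the results
sessionInfo()
```

Re-running the script with the same seed should return the same `risk_difference`; changing the data-generating lines and re-running updates every downstream number without manual edits.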
---
class: middle

## Getting set up and starting a project in RStudio

1. Put all the files you received into a folder and give it a name (e.g., `704_course`)
2. Choose `File > New Project`
3. Associate the project with the course folder (e.g., `704_course`)
4. From now on, you can open your project by clicking directly on the `.Rproj` file in that folder (or using the drop-down menu on the top right)
5. Now create an R Notebook file (`File > New File > R Notebook`)

---
class: middle

## A note on computer code and syntax

- There are two main "dialects" of R: base R and tidyverse
- I generally use tidyverse syntax
- There are three main differences:
  - The pipe `%>%`
  - `dplyr` verbs (e.g., `select()`, `filter()`, `mutate()`)
  - graphing with `ggplot2` (although it predates the Tidyverse)
- However, a case can be made that base R is [kinder, gentler and more efficient](https://github.com/matloff/TidyverseSkeptic#case-study--tapply-i)
- This course is not about learning computer syntax, so plenty of sample code will be provided
- It is about understanding what code to use and understanding its output

---
class: middle

## Tidyverse examples

- Instead of `tidy(lm(y ~ x))`, type `lm(y ~ x) %>% tidy()`
- For data management, you can easily filter and select columns:

--

``` r
library(dplyr)   # filter(), select(), mutate(), %>%
library(tidyr)   # drop_na()

data_subset <- data %>%
  filter(female == 1) %>%            # keep female respondents
  select(income, education) %>%      # keep two variables
  mutate(educ_2 = education^2) %>%   # create education squared
  drop_na()                          # listwise delete missings
```

???

It's not less code but it's clearer code!

---
class: middle

## A few useful keyboard shortcuts

Object/Action | Windows | MacOS
:-----|:-------|:-----
`<-` | `Alt + -` | `Option + -`
`%>%` | `Ctrl + Shift + M` | `Cmd + Shift + M`
New code block | `Ctrl + Alt + I` | `Cmd + Option + I`
Run highlight | `Ctrl + Enter` | `Cmd + Enter`
Clear console | `Ctrl + L` | `Ctrl + L`

---
class: middle

## R can help reproducibility

Can integrate code and text.
It uses R Markdown, a simple, intuitive language whose output can be rendered to different formats:

- HTML
- PDF
- LaTeX
- Word

---
class: middle

## What is R Markdown?

1. ["An authoring framework for data science."](https://rmarkdown.rstudio.com/lesson-1.html)
1. [A document format (`.Rmd`).](https://bookdown.org/yihui/rmarkdown/)
1. [An R package named `rmarkdown`.](https://rmarkdown.rstudio.com/docs/)
1. ["A file format for making dynamic documents with R."](https://rmarkdown.rstudio.com/articles_intro.html)
1. ["A tool for integrating text, code, and results."](https://r4ds.had.co.nz/communicate-intro.html)
1. ["A computational document."](http://radar.oreilly.com/2011/07/wolframs-computational-documen.html)
1. Wizardry.

---
class: middle

## R Markdown

`R Markdown` depends on `knitr` and `Pandoc`

`knitr` executes the computer code embedded in Markdown, and converts R Markdown to Markdown

`Pandoc` renders Markdown to the output format you want (such as PDF, HTML, Word, etc.)

<br>
<br>

- [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/)
- `RStudio` has a Markdown quick reference drop-down help menu

---
class: middle

## Basic R-Markdown anatomy

1. The **metadata (YAML)**
1. The **text**
1. The **code**
1. 
The **output**

---
class: middle

## Gentle tutorial

[R Markdown for writing reproducible scientific papers](https://uopsych-r-bootcamp-2020.netlify.app/slides/09-rmarkdown_handout.html)

---
class: middle

## R Chunk

.pull-left[

````
```{r echo=TRUE, results='hide'}
glimpse(mockdata)
```
````

]

.pull-right[

.small[default options]

```
## List of 54
##  $ eval         : logi TRUE
##  $ echo         : logi FALSE
##  $ results      : chr "markup"
##  $ tidy         : logi FALSE
##  $ tidy.opts    : NULL
##  $ collapse     : logi FALSE
##  $ prompt       : logi FALSE
##  $ comment      : chr "##"
##  $ highlight    : logi TRUE
##  $ size         : chr "normalsize"
##  $ background   : chr "#F7F7F7"
##  $ strip.white  : 'AsIs' logi TRUE
##  $ cache        : logi FALSE
##  $ cache.path   : chr "L1_EPIB704_25_cache/html/"
##  $ cache.vars   : NULL
##  $ cache.lazy   : logi TRUE
##  $ dependson    : NULL
##  $ autodep      : logi FALSE
##  $ cache.rebuild: logi FALSE
##  $ fig.keep     : chr "high"
##  $ fig.show     : chr "asis"
##  $ fig.align    : chr "center"
##  $ fig.path     : chr "L1_EPIB704_25_files/figure-html/"
##  $ dev          : chr "svg"
##  $ dev.args     : NULL
##  $ dpi          : num 72
##  $ fig.ext      : NULL
##  $ fig.width    : num 6
##  $ fig.height   : num 7
##  $ fig.env      : chr "figure"
##  $ fig.cap      : NULL
##  $ fig.scap     : NULL
##  $ fig.lp       : chr "fig:"
##  $ fig.subcap   : NULL
##  $ fig.pos      : chr ""
##  $ out.width    : chr "80%"
##  $ out.height   : NULL
##  $ out.extra    : NULL
##  $ fig.retina   : num 3
##  $ external     : logi TRUE
##  $ sanitize     : logi FALSE
##  $ interval     : num 1
##  $ aniopts      : chr "controls,loop"
##  $ warning      : logi FALSE
##  $ error        : logi FALSE
##  $ message      : logi FALSE
##  $ render       : NULL
##  $ ref.label    : NULL
##  $ child        : NULL
##  $ engine       : chr "R"
##  $ split        : logi FALSE
##  $ include      : logi TRUE
##  $ purl         : logi TRUE
##  $ fig.asp      : num 0.618
```

]

---

## Visualizations

A great way to understand and share insights into your data

R is great for visualizations, using either base R or the tidyverse **ggplot2** package

.pull-left[

``` r
library(ggplot2)    # ggplot(), aes(), geom_point()
library(gapminder)  # example data set
p <- ggplot(data = gapminder,
            mapping = aes(x =
gdpPercap, y = lifeExp)) +
  geom_point()
```

]

.pull-right[

``` r
p
```

<img src="L1_EPIB704_25_files/figure-html/unnamed-chunk-8-1.svg" width="80%" style="display: block; margin: auto;" />

]

---

## Build plots with layers

``` r
p + geom_smooth()
```

<img src="L1_EPIB704_25_files/figure-html/unnamed-chunk-9-1.svg" width="80%" style="display: block; margin: auto;" />

---

## Increasing complexity

.pull-left[

``` r
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp,
                          color = continent, fill = continent)) +
  geom_point() +
  geom_smooth(method = "loess") +
  scale_x_log10()
```

]

.pull-right[

``` r
p
```

<img src="L1_EPIB704_25_files/figure-html/unnamed-chunk-11-1.svg" width="100%" style="display: block; margin: auto;" />

]
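
---

## The same layering idea in base R

The slides above build the plot with **ggplot2**. For comparison, here is a rough base R sketch of the same "points, then smoothers, then scales" layering. It uses the built-in `mtcars` data rather than `gapminder` (an assumption made so the example runs without extra packages), and the colour choices and object names are purely illustrative.

```r
# Base R sketch of a layered scatterplot (built-in mtcars data, not gapminder)
groups <- factor(mtcars$cyl)                      # grouping variable: cylinders
cols <- c("darkorange", "steelblue", "darkgreen") # one colour per group

# Layer 1: points coloured by group, with a log-scaled x axis
plot(mtcars$hp, mtcars$mpg, log = "x",
     col = cols[as.integer(groups)], pch = 19,
     xlab = "Horsepower (log scale)", ylab = "Miles per gallon")

# Layer 2: one lowess smoother per group
for (g in levels(groups)) {
  d <- mtcars[groups == g, ]
  smooth <- lowess(d$hp, d$mpg)
  lines(smooth, col = cols[which(levels(groups) == g)], lwd = 2)
}

# Layer 3: legend identifying the groups
legend("topright", legend = levels(groups), col = cols,
       pch = 19, title = "Cylinders")
```

Each base R call draws onto the existing device, whereas ggplot2 stores the layers in the object `p` and draws only when `p` is printed.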