class: left, middle, inverse, title-slide

.title[
# Introduction to EPIB 704
]
.author[
### Mabel Carabali
]
.institute[
### EBOH, McGill University
]
.institute[
### EPIB 704
]
.date[
### (updated: 2024-09-06)
]

---
class: middle

### Outline

1. Presentations
2. Course description
3. Reproducible research
4. Descriptive epidemiology

---
class: middle

## Who are we?

.pull-left[
### Mabel
- Social and Infectious Diseases epidemiologist
- _**"Recovered"**_ clinician; **ES(T)L**
- _**Twitter/FB/IG-less**_
- [Website](https://socialepimethods.lab.mcgill.ca/)
]

--

.pull-right[
### Rina

### Edgar
]

---
class: middle

## Who are you?

---
## Course description

[Some medical research bingo we will cover](https://twitter.com/MaartenvSmeden/status/1504422629255561219)

<br>
<img src="images/bingo.jpeg" width="50%" style="display: block; margin: auto;" />

---
class: middle

## Course description - more specifically

- Emphasis on epidemiologic theory and the estimation of epidemiologic effect measures and uncertainty in different study designs
- Emphasis on causal inference concepts and their distinction from, yet simultaneous dependence on, statistical models
- Emphasizes the limitations of the mainstream null hypothesis significance testing (NHST) paradigm
- Encourages recognition of the value of an estimation (Bayesian) paradigm for optimal inferences

---
class: middle

## Course description (2)

- "Hands-on" approach whereby the various key statistical concepts covered will be illustrated by computer coding, the new calculus for modern epidemiologic methods
- Counting has historically been the essential background to epidemiologic research and remains so in the `\(21^{st}\)` century, but is now often best accomplished by simulation and sampling of posterior probability distributions
- Examples of the required computing code will be extensively provided.
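---
class: middle

### Counting by simulation: a minimal sketch

To make "simulation and sampling of posterior probability distributions" concrete, here is a small sketch in base `R`. The counts (12 cases among 80 people at risk) and the flat Beta(1, 1) prior are hypothetical choices made for illustration only; they are not course data or a prescribed course method.

```r
# Hypothetical data (assumption, not course data): 12 cases among 80 at risk
set.seed(704)
cases     <- 12
n_at_risk <- 80

# A flat Beta(1, 1) prior combined with a binomial likelihood
# gives a Beta(1 + cases, 1 + non-cases) posterior for the cumulative incidence
posterior_draws <- rbeta(10000,
                         shape1 = 1 + cases,
                         shape2 = 1 + n_at_risk - cases)

# Summarise the posterior by "counting" the simulated draws
posterior_mean <- mean(posterior_draws)
cred_int       <- quantile(posterior_draws, probs = c(0.025, 0.975))
prob_above_10  <- mean(posterior_draws > 0.10)  # Pr(cumulative incidence > 10%)
```

Under these assumptions the exact posterior is Beta(13, 69), whose mean is 13/82 (about 0.159), so `posterior_mean` should land close to that value; any question about the posterior (an interval, a tail probability) reduces to counting draws.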
---
class: middle

## Course description (3)

- Attention will also be given to exploratory data analysis (tabular and graphical), data interpretation, critical examination of assumptions and reproducible research
- Focus mainly on cumulative incidence measures for categorical outcomes, with attention to model checking and screening for confounders and effect measure modifiers
- Regression models covered will include linear, logistic, Poisson, survival, & meta-analytical (hierarchical) models
- Experimental and quasi-experimental designs will be discussed
- Miscellaneous topics will include attributable fractions, selection bias, sensitivity analyses, bootstrapping, matched data, missing data, and misclassified data

---
class: middle

## Reference textbooks

1) **_What if?_** by Miguel Hernán and James Robins, available [here](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/) (Chapters 1-11)

2) **_Modern Epidemiology_** by Timothy L. Lash, Tyler J. VanderWeele, Sebastien Haneuse, Kenneth J. Rothman, available at the McGill Library [here](https://mcgill.worldcat.org/title/modern-epidemiology/)

3) **_Regression and Other Stories_** by Andrew Gelman, Jennifer Hill, and Aki Vehtari, available [here](https://users.aalto.fi/~ave/ROS.pdf)

For an optional deeper Bayesian dive, consider the excellent

4) **_Statistical rethinking: a Bayesian course with examples in `R` and `Stan`_** by Richard McElreath, available at the McGill Library [here](https://mcgill.worldcat.org/title/statistical-rethinking-a-bayesian-course-with-examples-in-r-and-stan/oclc/1145123627&referer=brief_results) (Chapters 1-10)
   - Associated website: https://xcelab.net/rm/statistical-rethinking/ and YouTube lectures [here](https://youtube.com/playlist?list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus)

Other readings will be assigned each week from published journal articles

---
class: middle

## Computing language

The computing language of choice for the course is `R`.
There are several reasons for this choice, including its open-source nature and rich online community, which means help is often only a `Google` search away. We believe `R` has become the _lingua franca_ for much of the epidemiology/biostatistical universe.

Of course, other languages including `Julia` or `Python` will give the same results but, unfortunately, we can't supply the necessary support for those languages. `Stata` is a popular proprietary software, but we consider that the scripting and reproducibility offered by `R-Markdown` provide additional advantages.

---
class: middle

## R - Related resources

We understand that not everyone is familiar with `R` as a computing language and therefore **<span style="color:darkred">strongly recommend </span>** starting, reviewing, and/or strengthening your knowledge of the software. This will make your experience much more pleasant and will greatly contribute to the learning experience during this course and throughout your PhD training.

Here we provide a list of resources to access before and throughout the academic term.

**In-person:** McGill's [Computational and Data Systems Initiative R Summer Camp](https://www.mcgill.ca/cdsi/training/r-math-summer-camp): August 19 to 23, 2024. Registration [here](https://mcgill-cdsi.libcal.com/event/3726537).

#### Online

- R-Studio Education: Independent learning (https://education.rstudio.com/learn/)
- R-Studio/posit: Practice-based learning (https://posit.co/products/enterprise/academy/)
- R-related Books: https://www.rstudio.com/resources/books/
- Princeton site to explore `R`: https://exploringr.princeton.edu/self-learning-resources-for-r/
- R-resources list: https://thatdatatho.com/r-resources-beginner-advanced/

---
class: middle

## Course Structure

- Lectures
- Assignments (submitted through _mycourses_)
- Rapid Reviews

---
class: middle

## Assessment/Evaluation

| | **%** | **Rubrics** |
|:--------------------------|--------------:|:----------------------------:|
|Attendance & Participation | 10% | Presence, attention, active participation |
|Rapid Review | 18% | Critical appraisal, time management |
|Homework Assignments (6x12%) | 72% | Accuracy, demonstrated skills, timely delivery |
| | **100%** | |

---
class: middle

## Attendance & Participation

We expect to see everyone **<span style="color:darkred">in class in-person </span>**. We consider that this will facilitate the learning experience and will provide everyone the opportunity to interact as colleagues. We understand that life happens and therefore will enable _Zoom_ attendance **only under special circumstances**.

Attendance will be graded for a total of 10 points to make up the 10% of the total grade shown in the assessment table. We expect you to attend class, pay attention to the lectures and your classmates' presentations, and participate actively in the in-class discussion, which is another reason to **<span style="color:darkred">favor and encourage in-person attendance </span>**. Doing the suggested lecture **readings BEFORE the lectures** will ensure an optimal learning experience.

_To safeguard the well-being of everyone, **if you present respiratory-like symptoms or any signs or symptoms of communicable conditions** that may endanger your health, the health of classmates or instructors, **please consult a medical professional and inform the instructors as early as possible** to make the required accommodations._

Please see McGill's guidelines for attendance: [Guidelines and Policies](https://www.mcgill.ca/study/2023-2024/university_regulations_and_resources)

---
class: middle

## Rapid Review

Every enrolled student must sign up for one _rapid review_, which is a 10-minute presentation of a **<span style="color:darkred"> methodological critique </span>** about any new paper (**publication date since July 2024**) published in the biomedical literature (i.e., indexed in PubMed).

- Presenters should comment on study design, validity, precision, causal interpretation, modeling strategy and other concepts from the course. The comments can be praise or critique but **must** demonstrate thoughtful application of the principles discussed in the lectures as applied to new work in the field.

**<span style="color:darkred">The rapid review will be graded out of 18 points to make up 18% of the total grade.</span>**

---
class: middle

## Guidelines for the _rapid review_

- **Selection of the document:** Identify the published manuscript considering the date of publication, your personal/thesis interest and the methodological approach. **[6/18 total points]** Keep the **focus on methods**: consider exploring new methodological approaches or novel uses of existing epidemiological methods.
- E-pubs (online ahead of print) and _pre-prints_ are allowed.
- Please provide a pdf of your target paper **at least two weeks before the presentation** to be shared with the class.
- **All students** must participate in the review of the paper through the [Perusall platform](https://www.perusall.com/).
  **<span style="color:darkred">Details below.</span>**
    + You should have received an email to access the EPIB 704 Perusall platform. However, you can also access it through this [link](https://app.perusall.com/courses/fall-2024-epib-704-001-doctoral-level-epib-methods-1/_/dashboard/assignments/zzhAfb9GFiHKSpqjY)

---
class: middle

## Guidelines for the _rapid review_ (2)

- **Critical appraisal:** Please read and critically appraise the selected manuscript considering key aspects of the design and analytic plan. To illustrate your understanding of the selected document, **<span style="color:darkred">you must re-write the abstract</span>**, as follows:
  - Write a 300-word (maximum) structured abstract: Background, Objectives, Methods, Results, and Conclusion.
  - Provide a **50-word** (maximum) section of **strengths and limitations**.
  - Finally, provide a brief statement (50 words) indicating whether the methodological approach used in the reviewed paper is something that you will do, something that you would like to do, or something that you would definitely not do, and why.
  - Submit the **written document (max 500 words total)** on the day of your review through the [myCourses](https://mycourses2.mcgill.ca/) rapid review section. **[6/18 total points]**

---
class: middle

## Guidelines for the _rapid review_ (3)

- **In-Class presentation format:** In a maximum total time of **10 minutes**, present the results of your review to the class **[6/18 total points]**, considering the following:
  - PowerPoint slides or any visual aids are welcome, but there **should not be more than 5 slides/pages**.
  - After each presentation there will be a _**brief period for Q&As (approximately 5 minutes)**_.
  - Interactive discussions are highly encouraged, but a **maximum of 15 minutes** is allocated for each presentation and discussion in every class.
---
class: middle

### Dates assignment

Select an article and a date (last date Nov 28) `\(^1\)` and **inform the instructors by September 19, 2024**. First come, first served.

| Date | Name |Article/Manuscript |
|:-------|:-------------------------|:------------------------------|
| 24 Sep | TAs | TBD |
| 26 Sep | | |
| 01 Oct | | |
| 03 Oct | | |
| 08 Oct | | |
| 10 Oct | | |
| 22 Oct | | |
| 24 Oct | | |
| 29 Oct | | |
| 31 Oct | | |
| 05 Nov | | |
| 07 Nov | | |
| 12 Nov | | |
| 14 Nov | | |
| 19 Nov | | |
| 21 Nov | | |
| 26 Nov | | |
| 28 Nov | | |

---
class: middle

## Homework Assignments

The homework will make up 72% of the total grade. There will be 6 homework assignments (each worth 12%) throughout the 13 weeks of the semester, which will be graded by the TAs. The schedule for when these will be assigned, due, and returned is as follows:

|HW # | Assigned | Due | Returned |
|:----|:-----------:|:------------:|:---------------:|
|**1**| Sept 10 | Sept 17 | Sept 24 |
|**2**| Sept 24 | Oct 01 | Oct 08 |
|**3**| Oct 08 | Oct 22 | Oct 29 |
|**4**| Oct 24 | Oct 31 | Nov 07 |
|**5**| Nov 07 | Nov 14 | Nov 21 |
|**6**| Nov 21 | Nov 28 | Dec 3 (end of term) |

---
class: middle

### Homework Grading

Some collaboration in homework assignments may be beneficial, but please use good judgment in preventing your collaboration from becoming detrimental to your learning of the material. Submitted assignments should be your **individual effort**, even if you consult with other students about your strategy for obtaining these solutions (see plagiarism note below).

To reinforce the concept of scientific **reproducibility**, all assignments should be submitted as `R-Markdown` files. Data files for the assignments are available on GitHub and may be installed directly into `R`.

Late assignments carry a very severe penalty of **10% off per day late**. This is primarily to protect the time of the TAs.
_Pleas of mercy for extenuating circumstances will be accepted only with written documentation. Taking extra time to do a better job is probably not a worthwhile strategy, because the late penalty is so costly._

---
class: middle

### Some Admin & Logistics

- Important dates (**McGill**): [Add/Drop class: Sep 10th, 2024](https://www.mcgill.ca/importantdates/key-dates)
- Key dates: Assignments and Rapid Reviews
- Communication channels: [_mycourses_](https://mycourses2.mcgill.ca/d2l/loginh/) & emails: be mindful of time schedules and office hours; we all have other activities outside the classroom
- [Course Evaluations](https://www.mcgill.ca/mercury/students/information): Feedback throughout the course is MORE THAN WELCOME!
- [Academic Integrity](https://www.mcgill.ca/students/srr/honest): Plagiarism, misconduct, & any form of discrimination, aggressive behavior or harassment towards ANY member of the community is **UNACCEPTABLE**
- Other resources at [Student Wellness Hub](https://www.mcgill.ca/resources-services-students#): FEEL FREE TO ASK!

---
class: middle

## A little bit of Bayes

<br>
<img src="images/detector1.png" width="45%" style="display: block; margin: auto;" />

---
class: middle

## A little bit of Bayes

<br>
<img src="images/detector2.png" width="50%" style="display: block; margin: auto;" />

---
class: middle

## Descriptive Epidemiology

- What is Descriptive Epidemiology?

--

- How much bias is _acceptable_?

--

- Descriptive vs. Causal

.footnote[.small[
_Matthew P Fox, Eleanor J Murray, Catherine R Lesko, Shawnita Sealy-Jefferson, On the Need to Revitalize Descriptive Epidemiology, American Journal of Epidemiology, Volume 191, Issue 7, July 2022, Pages 1174-1179, https://doi.org/10.1093/aje/kwac056_
]]

---
class: middle

## Not all research needs to be causal,

--

<br>

# but everything should be robust

---
## Framework for descriptive Epidemiology

1. Research Question and Background
2. Population and Data Source
3. Outcome Ascertainment/Covariates
4. Analysis Plan: Targeted Measure of Occurrence
5. Results (Presentation of)
6. **Bias**, **Limitations** and **Interpretations**

---
# Where are we on our Epi knowledge?

### Baseline Assessment

<img src="images/baseline.png" width="40%" style="display: block; margin: auto;" />

---
class: middle

### QUESTIONS?

## COMMENTS?

# RECOMMENDATIONS?

---
class: middle

# Other resources

## Reproducible research

Can you reproduce a project that you completed 3 weeks ago? How about 3 months ago? 3 years ago?

What if your data changed, even minimally? Could you easily redo your analyses?

How long would it take you to update your manuscript?

---
class: middle

## Getting set up and starting a project in RStudio

1. Put all the files you received into a folder and give it a name (e.g., `704_course`)
2. Choose `File > New Project`
3. Associate the project with the course folder (e.g., `704_course`)
4. From now on, you can open your project by clicking directly on the `.Rproj` file in that folder (or using the drop-down menu on the top right)
5. Now create an R Notebook file (`File > New File > R Notebook`)

---
class: middle

## A note on computer code and syntax

- There are two main "dialects" of R: base R and tidyverse
- I generally use tidyverse syntax
- There are three main differences:
  - The pipe `%>%`
  - `dplyr` verbs (e.g., `select()`, `filter()`, `mutate()`)
  - graphing with `ggplot2` (but this antedated the Tidyverse)
- However, a case can be made that base R is [kinder, gentler and more efficient](https://github.com/matloff/TidyverseSkeptic#case-study--tapply-i)
- This course is not about learning computer syntax, so plenty of sample code will be provided
- It is about understanding what code to use and understanding its output

---
class: middle

## Tidyverse examples

- Instead of `tidy(lm(y ~ x))`, type `lm(y ~ x) %>% tidy()`
- For data management, you can easily filter and select columns:

--

```r
library(tidyverse)                     # provides %>%, dplyr verbs and drop_na()

data_subset <- data %>%
  filter(female == 1) %>%              # keep female respondents (assuming female is coded 1)
  select(income, education) %>%        # keep two variables
  mutate(educ_2 = education^2) %>%     # create education squared
  drop_na()                            # listwise delete missings
```

???
It's not less code but it's clearer code!

---
class: middle

## A few useful keyboard shortcuts

Object/Action | Windows | MacOS
:-----|:-------|:-----|
`<-` | `Alt + -` | `Option + -`
`%>%` | `Ctrl + Shift + M` | `Cmd + Shift + M`
New code block | `Ctrl + Alt + I` | `Cmd + Option + I`
Run highlight | `Ctrl + Enter` | `Cmd + Enter`
Clear console | `Ctrl + L` | `Ctrl + L`

---
class: middle

## R can help reproducibility

R can integrate code and text. It uses R Markdown, a simple, intuitive language that can be output to different formats:

- HTML
- PDF
- LaTeX
- Word

---
class: middle

## What is R Markdown?

1. ["An authoring framework for data science."](https://rmarkdown.rstudio.com/lesson-1.html) (✔️)
1. [A document format (`.Rmd`).](https://bookdown.org/yihui/rmarkdown/) (✔️)
1. [An R package named `rmarkdown`.](https://rmarkdown.rstudio.com/docs/) (✔️)
1. ["A file format for making dynamic documents with R."](https://rmarkdown.rstudio.com/articles_intro.html) (✔️)
1. ["A tool for integrating text, code, and results."](https://r4ds.had.co.nz/communicate-intro.html) (✔️)
1. ["A computational document."](http://radar.oreilly.com/2011/07/wolframs-computational-documen.html) (✔️)
1. Wizardry. (🧙‍♂️)

---
class: middle

## R Markdown

`R Markdown` depends on `knitr` and `Pandoc`

`knitr` executes the computer code embedded in Markdown, and converts R Markdown to Markdown

`Pandoc` renders Markdown to the output format you want (such as PDF, HTML, Word, etc)

<br>
<br>

- [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/)
- `RStudio` has a Markdown quick reference drop-down help menu

---
class: middle

## Basic R-Markdown anatomy

1. The **metadata (YAML)**
1. The **text**
1. The **code**
1. The **output**

---
class: middle

## Gentle tutorial

[R Markdown for writing reproducible scientific papers](https://uopsych-r-bootcamp-2020.netlify.app/slides/09-rmarkdown_handout.html)

---
class: middle

## R Chunk

.pull-left[
````
```{r echo=TRUE, results='hide'}
glimpse(mockdata)
```
````
]

.pull-right[
.small[default options]

```
## List of 54
##  $ eval         : logi TRUE
##  $ echo         : logi FALSE
##  $ results      : chr "markup"
##  $ tidy         : logi FALSE
##  $ tidy.opts    : NULL
##  $ collapse     : logi FALSE
##  $ prompt       : logi FALSE
##  $ comment      : chr "##"
##  $ highlight    : logi TRUE
##  $ size         : chr "normalsize"
##  $ background   : chr "#F7F7F7"
##  $ strip.white  : 'AsIs' logi TRUE
##  $ cache        : logi FALSE
##  $ cache.path   : chr "L1_EPIB704_v0_cache/html/"
##  $ cache.vars   : NULL
##  $ cache.lazy   : logi TRUE
##  $ dependson    : NULL
##  $ autodep      : logi FALSE
##  $ cache.rebuild: logi FALSE
##  $ fig.keep     : chr "high"
##  $ fig.show     : chr "asis"
##  $ fig.align    : chr "center"
##  $ fig.path     : chr "L1_EPIB704_v0_files/figure-html/"
##  $ dev          : chr "svg"
##  $ dev.args     : NULL
##  $ dpi          : num 72
##  $ fig.ext      : NULL
##  $ fig.width    : num 6
##  $ fig.height   : num 7
##  $ fig.env      : chr "figure"
##  $ fig.cap      : NULL
##  $ fig.scap     : NULL
##  $ fig.lp       : chr "fig:"
##  $ fig.subcap   : NULL
##  $ fig.pos      : chr ""
##  $ out.width    : chr "80%"
##  $ out.height   : NULL
##  $ out.extra    : NULL
##  $ fig.retina   : num 3
##  $ external     : logi TRUE
##  $ sanitize     : logi FALSE
##  $ interval     : num 1
##  $ aniopts      : chr "controls,loop"
##  $ warning      : logi FALSE
##  $ error        : logi FALSE
##  $ message      : logi FALSE
##  $ render       : NULL
##  $ ref.label    : NULL
##  $ child        : NULL
##  $ engine       : chr "R"
##  $ split        : logi FALSE
##  $ include      : logi TRUE
##  $ purl         : logi TRUE
##  $ fig.asp      : num 0.618
```
]

---
## Visualizations

A great way to understand and share insights into your data.

R is great for visualizations, using either base R or the tidyverse with the **ggplot2** package.

.pull-left[
```r
library(ggplot2)    # plotting functions
library(gapminder)  # example data set

# Scatter plot of life expectancy against GDP per capita
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
```
]

.pull-right[
```r
p
```

<img src="L1_EPIB704_v0_files/figure-html/unnamed-chunk-8-1.svg" width="80%" style="display: block; margin: auto;" />
]

---
## Build plots with layers

```r
p + geom_smooth()
```

<img src="L1_EPIB704_v0_files/figure-html/unnamed-chunk-9-1.svg" width="80%" style="display: block; margin: auto;" />

---
## Increasing complexity

.pull-left[
```r
# Colour by continent, add a loess smoother, and use a log10 x-axis
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp,
                          color = continent, fill = continent)) +
  geom_point() +
  geom_smooth(method = "loess") +
  scale_x_log10()
```
]

.pull-right[
```r
p
```

<img src="L1_EPIB704_v0_files/figure-html/unnamed-chunk-11-1.svg" width="100%" style="display: block; margin: auto;" />
]
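---
## The same layering idea in base R (sketch)

The layered plots above use **ggplot2**; since base R is mentioned as an alternative, here is a rough base R sketch of the same idea (points first, then a smoother, on a log x-axis). The data are simulated and purely hypothetical stand-ins for `gapminder`, and `lowess()` is swapped in as a simple smoother; it is not the same fit that `geom_smooth()` produces.

```r
# Simulated, hypothetical data standing in for gapminder (assumption, not course data)
set.seed(1)
gdp  <- exp(runif(200, log(300), log(50000)))     # GDP per capita, log-uniform
life <- 20 + 6 * log(gdp) + rnorm(200, sd = 4)    # life expectancy with noise

pdf(NULL)                                         # null device: nothing is written to disk
plot(gdp, life, log = "x",                        # layer 1: points on a log-scaled x-axis
     xlab = "GDP per capita", ylab = "Life expectancy")
smooth <- lowess(log10(gdp), life)                # layer 2: smoother fitted on the log10 scale
lines(10^smooth$x, smooth$y, col = "darkred", lwd = 2)
invisible(dev.off())
```

In an interactive session, drop the `pdf(NULL)` and `dev.off()` lines to see the plot in the RStudio plots pane.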