class: center, middle, inverse, title-slide .title[ # Data Literacy: Introduction to R ] .subtitle[ ## 101 Starting with R ] .author[ ### Veronika Batzdorfer ] .date[ ### 2025-05-23 ] --- layout: true --- ## About this course By the end of this course you should be able... - to use *RStudio* - to *import*, *wrangle*, and *explore* your data - to create/conduct basic *visualizations* and *analyses* of your data with `R` - to generate reproducible research *reports* using `R Markdown` - to apply basic *debugging* of your code - to call *LLMs* in `R` -- **Note**: *Not a statistics workshop.* --- ## The whole research cycle with `R` .center[ <img src="data:image/png;base64,#https://d33wubrfki0l68.cloudfront.net/795c039ba2520455d833b4034befc8cf360a70ba/558a5/diagrams/data-science-explore.png" width="70%" style="display: block; margin: auto;" /> ] .small[ Source: [*R for Data Science*](http://r4ds.had.co.nz/) ] -- .small[ - **Import**: read in formats (e.g., .csv, .xls, .sav, .dta) - **Tidy**: clean data (1 row = 1 case, 1 column = 1 variable), rename & recode variables - **Transform**: prepare data for analysis (e.g., by aggregating and/or filtering) - **Visualize**: explore/analyze data - **Model**: analyze the data by creating models (e.g, linear regression model) - **Communicate**: present the results (to others) ] Source: [*Packages*](https://github.com/qinwf/awesome-R#data-packages) --- ## My jou`R`ney .pull-left[ <img src="data:image/png;base64,#../img/interest.png" width="200%" style="display: block; margin: auto;" /> ] <br> .small[.pull-right[ - ITZ (Institut für Technikzukünfte) working at the EU-Projekt **SoMe4Dem** (Social Media for Democracy) ] ] .small[.pull-right[ - Looking for a Bachelor/Master's thesis, let me know ] ] <br> Contact:<br> [veronika.batzdorfer@kit.edu](mailto:veronika.batzdorfer@kit.edu) | [KIT website](https://sociology.itz.kit.edu/21_138.php) --- ## About you - What's your name? - Where do you work/study? What are you working on/studying? --- ## Workshop Structure & Materials - The workshop consists of a combination of lectures and hands-on exercises - Slides & materials are available at .center[`https://github.com/nika-akin/data-analysis-with-R`] --- ## Course schedule - Day 1 <table class="table" style="color: black; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: darkgreen !important;"> Friday </td> <td style="text-align:left;color: darkgreen !important;"> 12:00 - 13:00 </td> <td style="text-align:left;font-weight: bold;"> Onboarding & Getting Started with R </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Friday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 13:00 - 13:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Friday </td> <td style="text-align:left;color: darkgreen !important;"> 13:15 - 14:00 </td> <td style="text-align:left;font-weight: bold;"> Data Types & Loading </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Friday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 14:00 - 15:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Friday </td> <td style="text-align:left;color: darkgreen !important;"> 15:00 - 16:00 </td> <td style="text-align:left;font-weight: bold;"> Data Wrangling </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Friday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 16:00 - 16:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Friday </td> <td style="text-align:left;color: darkgreen !important;"> 16:15 - 17:00 </td> <td style="text-align:left;font-weight: bold;"> Data Workflows & Github </td> </tr> </tbody> </table> --- ## Course schedule - Day 2 <table class="table" style="color: black; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: darkgreen !important;"> Saturday </td> <td style="text-align:left;color: darkgreen !important;"> 12:00 - 13:00 </td> <td style="text-align:left;font-weight: bold;"> Data Wrangling 2.0 </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Saturday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 13:00 - 13:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Saturday </td> <td style="text-align:left;color: darkgreen !important;"> 13:15 - 14:00 </td> <td style="text-align:left;font-weight: bold;"> Exploratory Analyses </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Saturday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 14:00 - 15:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Saturday </td> <td style="text-align:left;color: darkgreen !important;"> 15:00 - 16:00 </td> <td style="text-align:left;font-weight: bold;"> Data Visualization </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Saturday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 16:00 - 16:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Saturday </td> <td style="text-align:left;color: darkgreen !important;"> 16:15 - 17:00 </td> <td style="text-align:left;font-weight: bold;"> Relational Data </td> </tr> </tbody> </table> --- ## Course schedule - Day 3 <table class="table" style="color: black; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: darkgreen !important;"> Monday </td> <td style="text-align:left;color: darkgreen !important;"> 12:00 - 13:00 </td> <td style="text-align:left;font-weight: bold;"> Recap </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Monday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 13:00 - 13:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Monday </td> <td style="text-align:left;color: darkgreen !important;"> 13:15 - 14:00 </td> <td style="text-align:left;font-weight: bold;"> Confirmatory Analyses </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Monday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 14:00 - 15:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Monday </td> <td style="text-align:left;color: darkgreen !important;"> 15:00 - 16:00 </td> <td style="text-align:left;font-weight: bold;"> Reproducible Reporting with R Markdown </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Monday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 16:00 - 16:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Monday </td> <td style="text-align:left;color: darkgreen !important;"> 16:15 - 17:00 </td> <td style="text-align:left;font-weight: bold;"> R Markdown revisited </td> </tr> </tbody> </table> --- ## Course schedule - Day 4 <table class="table" style="color: black; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: darkgreen !important;"> Tuesday </td> <td style="text-align:left;color: darkgreen !important;"> 12:00 - 13:00 </td> <td style="text-align:left;font-weight: bold;"> Run LLMs locally </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Tuesday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 13:00 - 13:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Tuesday </td> <td style="text-align:left;color: darkgreen !important;"> 13:15 - 14:00 </td> <td style="text-align:left;font-weight: bold;"> LLMs in RStudio </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Tuesday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 14:00 - 15:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Tuesday </td> <td style="text-align:left;color: darkgreen !important;"> 15:00 - 16:00 </td> <td style="text-align:left;font-weight: bold;"> Group data exploration </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Tuesday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 16:00 - 16:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Tuesday </td> <td style="text-align:left;color: darkgreen !important;"> 16:15 - 17:00 </td> <td style="text-align:left;font-weight: bold;"> Group data exploration </td> </tr> </tbody> </table> --- ## Course schedule - Day 5 <table class="table" style="color: black; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: darkgreen !important;"> Wednesday </td> <td style="text-align:left;color: darkgreen !important;"> 12:00 - 13:00 </td> <td style="text-align:left;font-weight: bold;"> Group data challenge </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Wednesday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 13:00 - 13:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Wednesday </td> <td style="text-align:left;color: darkgreen !important;"> 13:15 - 14:00 </td> <td style="text-align:left;font-weight: bold;"> Group data challenge </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Wednesday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 14:00 - 15:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Wednesday </td> <td style="text-align:left;color: darkgreen !important;"> 15:00 - 16:00 </td> <td style="text-align:left;font-weight: bold;"> Group presentations </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> Wednesday </td> <td style="text-align:left;color: darkgreen !important;color: gray !important;"> 16:00 - 16:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: darkgreen !important;"> Wednesday </td> <td style="text-align:left;color: darkgreen !important;"> 16:15 - 17:00 </td> <td style="text-align:left;font-weight: bold;"> Wrap-up </td> </tr> </tbody> </table> --- ## Follow-up Coordination ### Discussion of Report Proposals - <span style="background-color: yellow;">(**Anfang August** )</span> - Make a Zoom Appointment --- ## Delieverable - Form Groups [~ 2-3 people] - Conduct Analysis and Documentation involving: - Finding **Hypotheses** - Pre-process/Wrangle Data - Explorative/Confirmatory Analysis - Discuss Results and Analysis - Report your <span style="background-color: yellow;">Code and Output</span> - Present your research idea and workflow [10 min] - Write a Report on your Case Study in <span style="background-color: yellow;">R Markdown </span> - Each group submits: RMarkdown code + HTML file to this github repository `assignments` - <span style="background-color: yellow;">Deadline: 20-08-2025</span> -- - Data sources: [Tidy Tuesday Challenge on github](https://github.com/rfordatascience/tidytuesday) --- ## TidyTuesday Examples - [David Robinson](https://www.rscreencasts.com/?ref=reddit) <img src="data:image/png;base64,#../img/shiny.png" width="2177" style="display: block; margin: auto;" /> --- ## Evaluation <img src="data:image/png;base64,#../img/evaluation.png" width="45%" style="display: block; margin: auto;" /> Or: .center[`https://onlineumfrage.kit.edu/evasys/online.php?p=JGCAJ `] --- ## Why use `R`? - it is **free** and **open-source** -- - it is **modular** through the use of packages - You want to do X with `R`? There's ~~an app~~ a package for that! - The universe of packages for `R` keeps on expanding -- - it can be used for all steps in the research process: data collection, processing, exploration, analysis, and reporting/publishing -- - it offers extremely powerful and versatile options for **data visualization** -- - it has a **great community** with groups like [*R-Ladies*](https://rladies.org/), the [*R Consortium*](https://www.r-consortium.org/), [*rOpenSci*](https://ropensci.org/), and many local [`R` user groups](https://jumpingrivers.github.io/meetingsR/r-user-groups.html) worldwide - also check out the [*#rstats* hashtag on Twitter](https://twitter.com/search?q=%23rstats&src=typed_query) -- - it is becoming **increasingly popular** in [academic publications](http://r4stats.com/articles/popularity/), [programming communities like Stack Overflow](https://stackoverflow.blog/2017/10/10/impressive-growth-r/) --- ## Fun with `R` You can use `R` to... - [create memes](https://github.com/sctyner/memer) - there's also a whole [genre of `R` memes](https://github.com/favstats/rstatsmemes) -- - statistical models like [Bayesian: `Stan`](https://mc-stan.org/tools/index.html) -- - query Large Language Models e.g. [`tidychatmodels`](https://albert-rapp.de/posts/20_tidychatmodels/20_tidychatmodels) or local models [httr2](https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/LLMs_2_ollama.md) --- ## The versatility of `R` .small[ - statistical analysis & machine learning (e.g., with [tidymodels](https://www.tidymodels.org/)) - text mining and natural language processing (e.g., with [quanteda](https://quanteda.io/)) - collecting data, for example: - surveys (or diary studies) with [formr](https://formr.org/) - web scraping with [rvest](https://rvest.tidyverse.org/) - all sorts of visualizations, including: - animated plots with [gganimate](https://gganimate.com/) - interactive visualizations with [plotly](https://plotly.com/r/) - (interactive) maps with [tmap](https://r-tmap.github.io/tmap/) - interactive web applications with [Shiny](https://shiny.rstudio.com/) - various sorts of publications, such as... - reproducible reports and publications with [R Markdown](https://rmarkdown.rstudio.com/) - websites with [blogdown](https://bookdown.org/yihui/blogdown/) - books with [bookdown](https://bookdown.org/) - presentations with [xaringan](https://github.com/yihui/xaringan) ] --- ## Installing `R` You can download `R` via the [`R` POSIT-Project website](https://posit.co/download/rstudio-desktop/). The exact installation process depends on your operating system (OS). The *R Cookbook* provides a [detailed explanation of the installation process for *Windows*, *macOS*, and *Linux/Unix*](https://rc2e.com/gettingstarted#recipe-id001). If you want or need to update your version of `R`, you can do this the same way as for the first-time installation. If you use *Windows*, you can also use the [`installr` package](https://github.com/talgalili/installr) to update `R` (we will talk about packages in a bit). --- ## Graphical user interfaces (GUI) for `R` `R` comes with a basic GUI (on *Windows* you can access it by opening the `Rgui.exe` file). <img src="data:image/png;base64,#../img/base_R_gui.PNG" width="95%" style="display: block; margin: auto;" /> --- ## Integrated development interfaces (IDE) for `R` Using an IDE provides several advantages, such as: - syntax highlighting - auto-completion - better overview of files, libraries, created objects/output --- ## *RStudio* *RStudio/POSIT* is the most widely used IDE<sup>1</sup> for `R`. - easy integration with version control via `Git` (see [*Happy Git and GitHub for the useR*](https://happygitwithr.com/)) - interfaces to [`Python` via the `reticulate` package](https://rstudio.github.io/reticulate/) and [`SQL`, e.g., via the `dbplyr` package](https://irene.rbind.io/post/using-sql-in-rstudio/) .small[ [1] Another popular option is [*Visual Studio*](https://visualstudio.microsoft.com/) from *Microsoft* (for which an [`R` extension](https://marketplace.visualstudio.com/items?itemName=Ikuyadeu.r) is available). ] --- ## *RStudio* interface <img src="data:image/png;base64,#../img/rstudio_1st_explained.png" width="180%" style="display: block; margin: auto;" /> .smaller[ *Note*: You will only see the `Git` tab if you have [`Git`](https://git-scm.com/) installed and set up. In new(er) versions of *RStudio*, the pane including the *Files* tab also includes a *Presentation* tab (which is meant to be used for [`Quarto`](https://quarto.org/) files.) ] --- ## The `R` console in *RStudio* The console is the interactive input-output window of *RStudio*. You can enter commands here and press <kbd>Enter</kbd> to execute them. Typically, the output the the commands you enter into the console will also be displayed here. --- ## The `R` console in *RStudio* If you see the <span style="background-color: yellow;">**`>`**</span> in the console, it means ready to receive commands. -- If you see a <span style="background-color: yellow;">**`+`**</span> at the beginning of the console input line, this means that the command is incomplete. A common reason for this is a missing `)` or `"`. -- Once you have executed at least one command in the console you can cycle through previous ones using <span style="background-color: yellow;">↑</span> and <span style="background-color: yellow;">↓</span> on your keyboard. --- ## `R` as a calculator The simplest thing you can do with the `R` console is to use it as a calculator. ``` r 3+2 ``` ``` ## [1] 5 ``` ``` r 2^3 ``` ``` ## [1] 8 ``` ``` r 1/3 ``` ``` ## [1] 0.3333333 ``` *Note:* In the console, you won't see the `##` in the output. The `[1]` before the result indicates that this is the first output value of the command (more complex commands can have more than one output value). --- ## `R` as a calculator ``` r 100^3 ``` ``` ## [1] 1e+06 ``` ``` r 1/2500 ``` ``` ## [1] 4e-04 ``` .small[ For printing very small and very large numbers, [`R` uses scientific notation](https://www.datakwery.com/post/2020-07-11-scientific-notation-in-r/). If you want to avoid this (e.g., for the console output), you can use the command `options(scipen=10)`. *Note*: You may have to use a higher number and this setting will only be active for the current session. ] ``` r options(scipen=10) 100^3 ``` ``` ## [1] 1000000 ``` ``` r 1/2500 ``` ``` ## [1] 0.0004 ``` --- ## Objects in `R` `R` is object-oriented. The simplest example of assignment in `R` is the assignment of a single value to an object. ``` r x <- 10 y <- "This is a character string" x ``` ``` ## [1] 10 ``` ``` r y ``` ``` ## [1] "This is a character string" ``` --- ## `R` objects in *RStudio* Once one or more objects have been assigned values they also appear in the `Environment` tab in *RStudio*. <img src="data:image/png;base64,#../img/rstudio_environment_objects.png" width="100%" style="display: block; margin: auto;" /> --- ## `R` workspace The `Environment` tab in *RStudio* shows the content of your current working environment (also called <span style="background-color: yellow;">workspace</span>) which includes any used-defined objects. The contents of the current environment are stored in the working memory (RAM) of your computer until you exit `R` (or *RStudio*). -- *Note*: The fact that `R` objects are stored in your computer's RAM can become problematic if you work with "big data". However, there are solutions for working with larger-than-RAM data in `R` (such as [`disk.frame`](https://diskframe.com/)). --- ## `R`s memory use The `Environment` tab includes a small icon that displays the system's overall memory use and the amount of RAM used by `R` <img src="data:image/png;base64,#../img/env_mem_use.png" width="75%" style="display: block; margin: auto;" /> --- ## `R`s memory use *Memory Usage Report* to get more information about current working memory use. <img src="data:image/png;base64,#../img/mem_use_report.png" width="85%" style="display: block; margin: auto;" /> --- ## The two workhorses 🐴 of `R`: Functions and packages 📦 If you want to do anything in `R`, you need to use functions, and functions are provided through packages. We will go through the basics of functions and packages in `R` in the following. --- ## Functions - a function takes an input -> does something with it -> produces some sort of output. - arguments in parentheses (e.g. a value or object) as a single argument ``` r sqrt(9) ``` ``` ## [1] 3 ``` ``` r x <- 9 sqrt(x) ``` ``` ## [1] 3 ``` --- ## Functions The output of a function can be assigned to an object. ``` r x <- sqrt(9) x ``` ``` ## [1] 3 ``` --- ## Functions Most functions in `R` have more than one argument. ``` r y <- "This is a character string" gsub(pattern = "i", replacement = "X", y) ``` ``` ## [1] "ThXs Xs a character strXng" ``` ``` r # replace the character i with the character X in the object y (in this case a character string) ``` *Note*: Technically, functions are also objects in `R`. --- ## Functions If you want to know how to use a function, you can consult its help file. You can do that via the `?` command followed by the function name: ``` r ?gsub ``` In *RStudio*, this will open a file in the `Help` tab. <img src="data:image/png;base64,#../img/help_example.png" width="55%" style="display: block; margin: auto;" /> --- ## Functions Functions can have required and **optional** arguments. You can easily identify required and optional arguments in the `Usage` section of the help file for a function: If the argument is in the format `argument = value` it is optional. If only the argument name is provided `function(argument_1)`, this means that this argument is required. <img src="data:image/png;base64,#../img/help_example.png" width="55%" style="display: block; margin: auto;" /> --- ## Functions Function arguments: (a) <span style="background-color: yellow;"> specified order</span> or (b) by <span style="background-color: yellow;" > referencing </span> them by name (then the order can change). ``` r y <- "This is a character string" gsub("i", "X", y) ``` ``` ## [1] "ThXs Xs a character strXng" ``` ``` r gsub(y, replacement = "X", pattern = "i") ``` ``` ## [1] "ThXs Xs a character strXng" ``` Typing the argument names is more work but it increases the comprehensibility of your code for human readers. --- ## Functions For the <span style="background-color: orange;" >"inner workings" </span> of a function (or maybe use code from existing functions for writing your own functions), you can print the function body: ``` r gsub ``` ``` ## function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE, ## fixed = FALSE, useBytes = FALSE) ## { ## if (is.factor(x) && length(levels(x)) < length(x)) { ## gsub(pattern, replacement, levels(x), ignore.case, perl, ## fixed, useBytes)[x] ## } ## else { ## if (!is.character(x)) ## x <- as.character(x) ## .Internal(gsub(as.character(pattern), as.character(replacement), ## x, ignore.case, perl, fixed, useBytes)) ## } ## } ## <bytecode: 0x000002835975f0c0> ## <environment: namespace:base> ``` --- class: center, middle # [Exercise](https://rawcdn.githack.com/nika-akin/r-intro/9d05476f895e390df08662eecbefd4137f67acf4/exercises/Exercise_1_1_1_First_Steps.html) time 🏋️♀️💪🏃🚴 ## [Solutions](https://rawcdn.githack.com/nika-akin/r-intro/9d05476f895e390df08662eecbefd4137f67acf4/solutions/Exercise_1_1_1_First_Steps.html) --- ## `R` packages They essentially are collections of functions (and sometimes also data sets). The basic `R` system as well as a huge number of additional packages that extend its functionalities are available via [*The Comprehensive R Archive Network* (CRAN)](https://cran.r-project.org/). >CRAN is a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R ([CRAN website](https://cran.r-project.org/)). --- ## `base R` When we talk about <span style="background-color: yellow;" >`base R` </span> we refer to the packages that come with a new installation of `R` via *CRAN*. There also is a package called `base` included with this but the `base R` system includes a number of other packages as well: `utils`, `stats`, `datasets`, `graphics`, `grDevices`, `grid`, `methods`, `tools`, `parallel`, `compiler`, `splines`, `tcltk`, `stats4`. --- ## Finding packages *CRAN* provides an [alphabetically sorted list with all available packages](https://cran.r-project.org/web/packages/available_packages_by_name.html). You can search for your keywords of interest in that list, but that is not the most convenient option. Two helpful resources for finding `R` packages: - [CRAN Task Views](https://cran.r-project.org/) provide curated lists of recommended packages for specific tasks/areas/topics - [METACRAN](https://www.r-pkg.org/) allows you to search and browse all packages on *CRAN* --- ## Installing packages from *CRAN* in `R` Installing packages from *CRAN* in `R` is very straightforward. ``` r # Install a package install.packages("correlation") # single or double quotation marks # Install multiple packages at once install.packages(c("correlation", "effectsize")) ``` `R` packages are installed in specific directories on your computer. **NB**: If you have multiple versions of `R` installed, there are directories for each version To find where packages are installed on your machine for the version of `R` you are currently using you can use the following command: ``` r .libPaths() ``` --- ## Loading packages Once installed, you need to load it to be able to use the functions (and/or datasets) it contains in your `R` session. ``` r library(correlation) # no quotation marks needed ``` **NB**: While you only need to install packages once, you need to load them at the start of each session to be able to use them. --- ## Other sources for `R` packages Another important source of `R` packages, especially those that are still in early development, is [*GitHub*](https://github.com/). To be able to install packages hosted on *GitHub* you need to use functions from the [`devtools`](https://devtools.r-lib.org/) or the [`remotes`](https://remotes.r-lib.org/) package (which you need to install first as they do not come with `base R`). .small[ ``` r # Option 1 library(devtools) install_github("Felixmil/rollR") # last part of the GitHub URL (user name + repository name) # Option 2 library(remotes) install_github("Felixmil/rollR") # last part of the GitHub URL (user name + repository name) ``` ] *Note*: To be able to install packages from *GitHub* on *Windows* machines, you will need to install [`Rtools`](https://cran.r-project.org/bin/windows/Rtools/) first (make sure to choose the correct version!). --- ## Packages about packages There are a few packages that facilitate the installation and loading of `R` packages (from various sources). Two popular ones are: - [easypackages](https://cran.r-project.org/web/packages/easypackages/index.html) - [pacman](https://github.com/trinker/pacman) --- ## Installed packages You can get information about the packages you have installed on your system with the following function: ``` r installed.packages() ``` --- ## Managing packages with the GUI You can also use the `Packages` tab in the *RStudio* GUI to install, load, update, and uninstall packages. <img src="data:image/png;base64,#../img/rstudio_pkg_tab.png" width="50%" style="display: block; margin: auto;" /> --- ## `R` scripts While the console is useful for trying things out, you should not use it for your actual data analysis. For this you should use `R` scripts that allow you to store and document your code. `R` scripts are similar to syntax files for *SPSS* or do-files for *Stata*. `R` scripts have the file extension <span style="background-color: yellow;" > `.R`</span>. --- ## `R` scripts `File` -> `New File` -> `R Script` --- ## *RStudio* interface: Scripts When you open or create a script in *RStudio* this will be displayed in a fourth pane (which will have multiple tabs if you open/create more than one `R` script or other types of source files). <img src="data:image/png;base64,#../img/r_studio_script.PNG" width="90%" style="display: block; margin: auto;" /> --- ## Working with `R` scripts You can write your code in an `R` script just like you do in the console. If you want to execute a single command from your script in *RStudio*, you can do so by placing your cursor somewhere in command (or directly after it) and clicking the `Run` button in the menu or by using the keyboard shortcut <kbd>Ctrl + Return</kbd> (*Windows* & *Linux*)/<kbd>Cmd + Enter</kbd> (*Mac*). This also works if you select multiple lines of code/commands. You can also run all commands in your script by selecting `Run all` from the dropdown menu next to the `Run` button or via the keyboard shortcut <kbd>Ctrl + Alt + R</kbd> (*Windows* & *Linux*)/<kbd>Cmd + Option + R</kbd> (*Mac*). --- ## Working with `R` scripts You can save your script in *RStudio* via `File` -> `Save` or `Save As...`, by clicking the small blue floppy disk icon, or through the keyboard shortcut <kbd>Ctrl + S</kbd> (*Windows* & *Linux*)/<kbd>Cmd + S</kbd> (*Mac*). --- ## Commenting `R` scripts To properly document your code (for your future self as well as other people who may use your code) it is good practice to use comments. In `R` scripts, you can create a comment by starting a line with a `#`. In *RStudio*, to comment or uncomment one or more lines in a script you can also select them and use the keyboard shortcut <kbd>Ctrl + Shift + C</kbd> (*Windows* & *Linux*)/<kbd>Cmd + Shift + C</kbd> (*Mac*). ``` r # this is a comment library(tidyverse) ``` --- ## Setup and workflows for `R` and *RStudio* In this session, we will only cover the basics that are necessary for establishing such workflows. If you are interested in some further information on setting up and maintaining your installation of `R` and *RStudio* as well as the optimization of workflows, and troubleshooting, you can check out the [appendix slides with additional materials]. .small[ *Note*: Most of the recommendations in the following (as well as in the additional materials) are largely based on the freely available online book [What They Forgot to Teach You About R](https://rstats.wtf/). ] --- ## Working directory The working directory is where `R` will look for and save files by default. Where are we? ``` r getwd() ``` In *RStudio*, the current working directory is also displayed at the top of the `Console` tab. --- ## Setting the working directory via the *RStudio* GUI The *RStudio* menu `Session` -> `Set Working Directory` which provides different options: - "To Project Directory": can be used if you have an `.Rproj` file - "To Source File Location": sets the working directory to the location of the currently active source file (typically an `R` script) - "To FilesPane Location": sets the working directory to the directory that is currently visible in the `Files` tab - "Choose Directory": opens a file browser window that lets you choose a directory --- ## Setting the working directory using functions To increase the reproducibility of your work, using functions in scripts is generally the better approach than using the *RStudio* GUI. You can set a working directory with the following command (of course, you need to replace the file path with the correct one for your system): ``` r setwd("C:/Users/user/Documents/analysis") ``` --- ## Interlude: File paths `R` uses `Unix`-style file paths with `/`, while *Windows* uses `\` in file paths. However `\\` also works in `R` for both OS types. There is a [Stackoverflow post](https://stackoverflow.com/questions/17605563/efficiently-convert-backslash-to-forward-slash-in-r) discussing several ways of dealing with that. A helpful tool in this context for *Windows* users is [*Path Copy Copy*](https://pathcopycopy.github.io/) which is an add-on for the file explorer that lets you copy file paths in different formats. --- ## Interlude: File paths There are absolute (example: "C:/Users/user/Documents/example.R") and relative file paths (example: "./r-scripts/example.R"). -- [**Relative file**](https://martinctc.github.io/blog/rstudio-projects-and-working-directories-a-beginner's-guide/) paths are relative to the current working directory. Common shorthand options for relative file paths are `.` for the current (working) directory, `..` for one folder level up (parent folder), and `~` for the home directory (which is the default working directory in `R`). --- ## General settings for *RStudio* By default, `R` stores your workspace and command history when closing a session (and also restores the former upon startup). While this can be helpful, this creates files that you probably will not use, and can also be a barrier for adopting reproducible workflows. To avoid that, there are some general settings in *RStudio* that you might want to change via `Tools` -> `Global Options` -> `General`. --- ## Recommended settings for *RStudio* <img src="data:image/png;base64,#../img/rstudio_general_settings_highlighted.png" width="75%" style="display: block; margin: auto;" /> --- ## Basic workflow and setup recommendations - use `R` scripts to store your code - save/export important output in appropriate file formats (more on that in the following session on *Data Import & Export*) - (try to) use relative file paths in your scripts - eventually consider adopting a [project-oriented workflow](https://rstats.wtf/project-oriented-workflow.html) --- ## Troubleshooting 101 In case you get an error message or if your `R` session crashes, there are a couple of things you can do/try out: - abort `R` process: via `Session` -> `Terminate R` in the *RStudio* menu or the stop sign icon in the upper right corner of the console - Restart `R` (*RStudio* menu: `Session` -> `Restart R`) or *RStudio* - re-install packages --- ## Common sources of errors - typos (**NB**: `R` is case-sensitive) - missing or unmatched `(`, `'`, or `"` (often at the end of a command) - incorrect file paths - `\` instead of `/` in file paths (e.g., when copied from the *Windows* explorer) - packages not installed or loaded - code (chunks) executed in the wrong order <img src="data:image/png;base64,#https://github.com/allisonhorst/stats-illustrations/blob/main/rstats-artwork/breakr.gif?raw=true" width="20%" style="display: block; margin: auto;" /> .center[ <small><small>GIF by [Allison Horst](https://github.com/allisonhorst/stats-illustrations)</small></small> ] --- class: center, middle # [Exercise](https://rawcdn.githack.com/nika-akin/r-intro/9d05476f895e390df08662eecbefd4137f67acf4/exercises/Exercise_1_1_2_Packages_Scripts.html) time 🏋️♀️💪🏃🚴 ## [Solutions](https://rawcdn.githack.com/nika-akin/r-intro/9d05476f895e390df08662eecbefd4137f67acf4/solutions/Exercise_1_1_2_Packages_Scripts.html) --- ## Some house keeping ``` r testvar # not ideal 01.test_var # not good coxph_plot # ok # specify functions from packages directly dplyr::mutate() ``` --- ## Cheatsheets *RStudio* offers a good collection of [cheatsheets for R](https://www.rstudio.com/resources/cheatsheets/). The following ones are of particular interest for this workshop: - [RStudio IDE Cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/rstudio-ide.pdf) - [Data Import Cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-import.pdf) - [Data Transformation with `dplyr` Cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf) - [Data Visualization with `ggplot2` Cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-visualization.pdf) - [R Markdown Cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/rmarkdown.pdf)