class: center, middle, inverse, title-slide

.title[
# Data Literacy: Introduction to R
]
.subtitle[
## Workflow & GitHub
]
.author[
### Veronika Batzdorfer
]
.date[
### 2026-05-08
]

---
layout: true

---

## Space of names ♠️

With more than 20,000 packages on CRAN, different developers sometimes choose the same function name. When you load a package, its functions enter the search path and can mask functions from packages loaded earlier.

---

## Masking 😷

If you load a package and some of its functions have the same names as functions from packages you loaded earlier in your current `R` session, the new package "masks" those functions.

``` r
library(dplyr)
```

From now on, `filter()` refers to `dplyr::filter()` instead of `stats::filter()`.

---

## Avoiding/resolving namespace conflicts

The order in which packages are loaded (in a session) matters because masking happens consecutively: the package loaded last wins.

You can, however, still access masked functions.

``` r
stats::filter() # package_name::function_name()
```

This is also a way of accessing functions from an installed package without loading it with the `library()` command.

---

## Avoiding/resolving namespace conflicts

A helpful tool for detecting as well as resolving namespace conflicts is the [`conflicted` package](https://www.tidyverse.org/blog/2018/06/conflicted/).

---

## Updating packages

You can update all packages you have installed with the following command:

``` r
update.packages()
```

To update only specific packages, name them explicitly:

``` r
update.packages(oldPkgs = c("correlation", "effectsize"))
```

---

## Uninstalling packages

If you want to uninstall one or more packages, you can do so as follows:

``` r
# uninstall one package
remove.packages("correlation")

# uninstall multiple packages
remove.packages(c("correlation", "effectsize"))
```

---

## Advanced commenting of `R` scripts

In RStudio, end a comment with four (or more) dashes `----` to create a collapsible section:

``` r
# 1. Load packages ----
library(tidyverse)

# 2. Import data ----
dat <- read_csv("data.csv")

# 3. Clean data ----
dat <- dat %>% filter(!is.na(id))
```

---

## Advanced commenting of `R` scripts

<img src="data:image/png;base64,#../img/script_sections_expanded_highlighted.png" width="100%" style="display: block; margin: auto;" />

---

## *RStudio* projects

An RStudio Project (`.Rproj` file) is the safest way to organize work:

- It sets the working directory automatically to the project folder.
- It keeps scripts, data, and output together.
- It lets you use relative file paths (e.g., `"data/survey.csv"` instead of `"/Users/veronika/.../survey.csv"`).

Create one: File → New Project → New Directory → New Project.

---

## Recommended project structure

Keep a consistent folder structure:

``` text
my_project/
├── my_project.Rproj
├── .gitignore
├── README.md
├── R/
│   ├── 01_import.R
│   ├── 02_clean.R
│   └── 03_analyze.R
├── data/
│   ├── raw/
│   └── processed/
└── output/
    ├── figures/
    └── tables/
```

---

## `R` workspace

In addition to the `Environment` tab in *RStudio*, you can view the content of your workspace via the .highlight[`ls()`] command.

You can remove objects from your workspace with the .highlight[`rm()`] function.

Unless you change the default settings in *RStudio* as suggested in the "Getting Started" session, `R` will offer to save the workspace in a file called `.RData` in your current working directory when you quit, and it will restore that workspace when you restart.

---

## Starting with Git & GitHub

- .highlight[Git]: version control system for code and projects; it records the history as snapshots (commits)
- .highlight[GitHub]: platform for hosting Git repositories and for collaboration

---

## Why use GitHub with R?
- Track and merge changes in R scripts or projects with multiple collaborators
- Maintain a version history and a backup
- Share and publish work early (e.g., R Markdown documents)

---

<img src="data:image/png;base64,#../img/github.jpg" width="95%" style="display: block; margin: auto;" />

[Cheat sheet](https://education.github.com/git-cheat-sheet-education.pdf)

[Documentation](https://docs.github.com/de)

.small[[Source](https://github.com/gesis-css-python/materials/blob/main/1-css/1-2-workflow.ipynb) ]

---

## Setup

- [Create a GitHub account](https://github.com/)
- [GitHub Desktop](https://desktop.github.com): user-friendly interface to manage repositories
- [Install Git](https://git-scm.com/)
- If RStudio does not find Git straight away, configure it: go to **Tools** > **Global Options** > **Git/SVN** and make sure that the **Git executable** box points to your Git executable.

---

## Integration with R

##### Step 1: Clone a Repository with GitHub Desktop

- Open **GitHub Desktop**.
- Click "File" > "Clone Repository".
- Choose a GitHub repo (or paste its URL), select a local path, and click "Clone".

--

##### Step 2: Open the Project in RStudio

- In RStudio: File > Open Project, navigate to the cloned folder .highlight[(.Rproj file)].
- Now you're working in an RStudio Project linked to GitHub.

---

## Workflow Overview

- Pull – Get the latest changes from GitHub.

``` bash
git pull
```

--

- Edit and Save your R scripts or R Markdown.

--

- Commit – Save changes with a message (e.g., "added analysis plot").

``` bash
git add .  # stage the changed files first
git commit -m "version control- go for launch"
```

--

- Push – Send your changes back to GitHub.
``` bash
git push
```

---

## Workflow in RStudio

(Use the Git tab in RStudio or GitHub Desktop to commit and push.)

--

<img src="data:image/png;base64,#../img/git_r.png" width="90%" style="display: block; margin: auto;" />

---

## Tips for commit messages

| Bad      | Good                                               |
| -------- | -------------------------------------------------- |
| `fix`    | `Fix missing value recoding in org_size`           |
| `update` | `Add mean trust score and reverse-coded items`     |
| `asdf`   | `Refactor gapminder analysis into separate script` |

---

## Ignoring files with `.gitignore`

Not everything belongs on GitHub. Large datasets, passwords, or temporary output should stay local.

Create a file named `.gitignore` in your project root:

``` text
# R specific
.Rproj.user
.Rhistory
.RData

# Large data files
data/raw/*.csv
data/raw/*.dta

# Output we can regenerate
output/figures/*.png
output/tables/*.html
```

.small[
Rule of thumb: version-control code, not data or generated output.
]

---

## Tips for Git

- Commit early, commit often: one logical change per commit.
- Pull before you push: always sync before uploading.
- Use branches for experiments (`git checkout -b new-analysis`).
- Never commit sensitive information (passwords, API keys, personal data).

---

## Your turn

### Resolve a namespace conflict

- Load both `dplyr` and `plyr` in a fresh R session.
- What happens to `summarise` / `summarize`?
- Use `conflict_prefer()` from the `conflicted` package to enforce the dplyr version and write a line of code that proves it works.
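---

## Warm-up: masking with base `R` only

Before tackling the exercise, it may help to watch masking happen without installing anything. The sketch below is not the exercise solution. It assumes a fresh session and uses a hypothetical stand-in function named `filter`, defined in the workspace, to play the role of a package function that masks `stats::filter()`.

``` r
# Assumption: fresh R session, run at top level; only base R is needed.

# Hypothetical stand-in: a workspace function called `filter`.
# The workspace comes first on the search path, so it masks stats::filter().
filter <- function(x, ...) "masked version"

# The bare name now reaches the stand-in ...
filter(1:5)

# ... but the original is still available via package_name::function_name()
smoothed <- stats::filter(1:5, rep(1 / 3, 3))
class(smoothed)

# List every place on the search path that defines `filter`
find("filter")

# Removing the stand-in un-masks the original again
rm(filter)
identical(filter, stats::filter)
```

The last line is one way to check which version a bare function name resolves to; a similar check may be useful for the exercise above.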