Data Literacy: Introduction to R

class: center, middle, inverse, title-slide

.title[
# Data Literacy: Introduction to R
]
.subtitle[
## Workflow & Github
]
.author[
### Veronika Batzdorfer
]
.date[
### 2026-05-08
]

---

layout: true

---

## Space of names ♠️

With more than 20,000 packages on CRAN, different developers sometimes choose the same function name. When you load a package, its functions enter the search path and can mask functions from packages loaded earlier.

---

## Masking 😷

If a package you load contains functions with the same names as functions from a package loaded earlier in your current `R` session, the newer functions "mask" the older ones.

``` r
library(dplyr)
```

From now on, `filter()` refers to `dplyr::filter()` rather than `stats::filter()`.
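To see which version a name currently refers to, you can ask `R` where it finds it. A minimal sketch, assuming `dplyr` is installed and the session is otherwise fresh (other attached packages would change the output):

``` r
library(dplyr)

# find() lists the attached packages that contain "filter", in lookup order
find("filter")
#> [1] "package:dplyr" "package:stats"

# The unqualified name uses the first match on the search path
identical(filter, dplyr::filter)
#> [1] TRUE
```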

---

## Avoiding/resolving namespace conflicts

The order in which you load packages matters: each package you load masks same-named functions from packages loaded before it. You can, however, still access masked functions by prefixing them with their package name:

``` r
stats::filter(1:10, rep(1/3, 3))  # package_name::function_name(): 3-point moving average
```

The same syntax lets you call a function from an installed package without attaching the whole package with `library()`.
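For example, the `tools` package ships with `R` but is not attached in a standard session; `::` lets you call one of its functions anyway. A small illustration (the file name is just an example string):

``` r
# tools is installed with R but is not on the search path
"package:tools" %in% search()
#> [1] FALSE

# Call one of its functions directly
tools::file_ext("data/survey.csv")
#> [1] "csv"

# Still not attached: :: only loads the package's namespace
"package:tools" %in% search()
#> [1] FALSE
```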

---

## Avoiding/resolving namespace conflicts

A helpful tool for detecting as well as resolving namespace conflicts is the [`conflicted` package](https://www.tidyverse.org/blog/2018/06/conflicted/).
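A sketch based on the package's documented usage, assuming `conflicted` and `dplyr` are installed. Once `conflicted` is loaded, an ambiguous call stops with an error instead of silently using the last-loaded version, until you declare a preference:

``` r
library(conflicted)
library(dplyr)

# filter() exists in both dplyr and stats, so conflicted raises an error here
try(filter(mtcars, cyl == 4))

# Declare which version should win for the rest of the session
conflict_prefer("filter", "dplyr")

four_cyl <- filter(mtcars, cyl == 4)
nrow(four_cyl)
#> [1] 11
```

Recent versions of `conflicted` also offer `conflicts_prefer(dplyr::filter)` for the same purpose.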

---

## Updating packages

You can update all packages you have installed with the following command:

``` r
update.packages()
```

To update only specific packages, pass their names to the `oldPkgs` argument:

``` r
update.packages(oldPkgs = c("correlation", "effectsize"))
```
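Before updating, you can check which packages are out of date. A minimal sketch; it needs an internet connection, and the CRAN mirror URL below is one common choice (an assumption you can replace):

``` r
repos <- "https://cloud.r-project.org"

# Installed packages that have a newer version on CRAN (NULL if none)
outdated <- old.packages(repos = repos)
if (!is.null(outdated)) outdated[, c("Installed", "ReposVer")]

# Update all of them without a prompt for each package
# update.packages(repos = repos, ask = FALSE)
```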

---

## Uninstalling packages

If you want to uninstall one or more packages, you can do so as follows:

``` r
# uninstall one package
remove.packages("correlation")

# uninstall multiple packages
remove.packages(c("correlation", "effectsize"))
```
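To confirm that a package is (no longer) installed, you can look up its installation folder: `system.file()` returns an empty string if the package cannot be found. The helper name `is_installed()` is hypothetical, and the check is most reliable in a fresh session:

``` r
# Hypothetical helper: TRUE if the package is found in any library
is_installed <- function(pkg) nzchar(system.file(package = pkg))

is_installed("stats")
#> [1] TRUE
is_installed("correlation")
```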

---

## Advanced commenting of `R` scripts

In RStudio, a comment that ends with four or more dashes (`----`) starts a named code section that you can fold and jump to via the document outline (trailing `====` or `####` work as well):

``` r
# 1. Load packages ----
library(tidyverse)

# 2. Import data ----
dat <- read_csv("data.csv")

# 3. Clean data ----
dat <- dat %>% filter(!is.na(id))
```

---

## *RStudio* projects

An RStudio Project (`.Rproj` file) is a convenient way to keep all the work for one project in one place:

- It automatically sets the working directory to the project folder.
- It keeps scripts, data, and output together.
- It lets you use relative file paths (e.g., `"data/survey.csv"` instead of `"/Users/veronika/.../survey.csv"`).

Create one: File → New Project → New Directory → New Project.
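With the project open, relative paths are resolved from the project folder. A short illustration; `data/raw/survey.csv` is a hypothetical file:

``` r
# Inside an RStudio project, this is the project folder
getwd()

# file.path() joins folder and file names with "/" (hypothetical file)
survey_path <- file.path("data", "raw", "survey.csv")
survey_path
#> [1] "data/raw/survey.csv"

# Check that the file is there before reading it
file.exists(survey_path)
```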

---

## Recommended project structure

Keep a consistent folder structure:

```
my_project/
├── my_project.Rproj
├── .gitignore
├── README.md
├── R/
│   ├── 01_import.R
│   ├── 02_clean.R
│   └── 03_analyze.R
├── data/
│   ├── raw/
│   └── processed/
└── output/
    ├── figures/
    └── tables/
```
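You can also create these folders from `R`. A minimal sketch, run from the project root; the folder names simply mirror the tree above:

``` r
dirs <- c("R", "data/raw", "data/processed", "output/figures", "output/tables")

# recursive = TRUE also creates parent folders such as data/ and output/;
# showWarnings = FALSE stays quiet if a folder already exists
for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

all(dir.exists(dirs))
#> [1] TRUE
```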
    
    
---

## `R` workspace

In addition to the `Environment` tab in *RStudio*, you can view the content of your workspace via the .highlight[`ls()`] command.

You can remove objects from your workspace with the .highlight[`rm()`] function.

Unless you changed the default settings in *RStudio* as suggested in the "Getting Started" session, `R` will offer to save the workspace to a file called `.RData` in your current working directory when you quit, and will restore it when you restart.
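A quick illustration in a fresh session (the object names are arbitrary):

``` r
x <- 1:10
avg <- mean(x)

ls()
#> [1] "avg" "x"

rm(x)            # remove one object
ls()
#> [1] "avg"

rm(list = ls())  # remove everything (use with care)
```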

---

## Starting with Git & GitHub

- .highlight[Git]: a version control system that records the history of a project as a series of snapshots (commits)

- .highlight[GitHub]: a platform for hosting Git repositories and collaborating on them

---

## Why use GitHub with R?

- track and merge changes to R scripts or projects with multiple collaborators

- keep a version history and an online backup

- share work early (e.g., rendered R Markdown reports)

---

## Resources

- [Git cheat sheet](https://education.github.com/git-cheat-sheet-education.pdf)
- [GitHub documentation](https://docs.github.com/de)

.small[[Source](https://github.com/gesis-css-python/materials/blob/main/1-css/1-2-workflow.ipynb)]

---

## Setup

- [create a GitHub account](https://github.com/)

- [GitHub Desktop](https://desktop.github.com): user-friendly interface for managing repositories

- [install Git](https://git-scm.com/)

- If RStudio does not find Git automatically, go to **Tools** > **Global Options** > **Git/SVN** and make sure the *Git executable* field points to your Git installation.
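To check from `R` whether Git is installed and on your system PATH, a small sketch using base functions:

``` r
# Full path to the git executable, or "" if it is not on the PATH
git_path <- Sys.which("git")
git_path

# If found, print the installed Git version
if (nzchar(git_path)) system2("git", "--version")
```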

---

## Integration with R

##### Step 1: Clone a Repository with GitHub Desktop

- Open **GitHub Desktop**.

- Click "File" > "Clone Repository".

- Choose a repository from your GitHub account (or paste its URL), select a local path, and click "Clone".
--

##### Step 2: Open the Project in RStudio

- In RStudio: File > Open Project, navigate to the cloned folder and select the .highlight[`.Rproj` file]. If the repository has none, use File > New Project > Existing Directory instead.

- Now you're working in an RStudio Project linked to GitHub.
    
---

## Workflow Overview

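- Pull – Get the latest changes from GitHub.

``` bash
git pull
```

- Edit and save your R scripts or R Markdown files.
--

- Stage – Select the changed files that should go into the next commit.

``` bash
git add R/03_analyze.R   # or: git add . (stages all changes in this folder)
```

- Commit – Save a snapshot of the staged changes with a message (e.g., "Add analysis plot").

``` bash
git commit -m "Add analysis plot"
```

- Push – Send your commits to GitHub.

``` bash
git push
```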

---

## Workflow in RStudio

Instead of typing Git commands, you can use the **Git** tab in RStudio (or GitHub Desktop) to stage, commit, pull, and push.

---

## Tips for commit messages

| Bad      | Good                                               |
| -------- | -------------------------------------------------- |
| `fix`    | `Fix missing value recoding in org_size`           |
| `update` | `Add mean trust score and reverse-coded items`     |
| `asdf`   | `Refactor gapminder analysis into separate script` |

---

## Ignoring files with `.gitignore`

Not everything belongs on GitHub: large datasets, passwords, and temporary output should stay local.
Create a file named `.gitignore` in your project root and list the patterns Git should ignore:

```
# R specific
.Rproj.user
.Rhistory
.RData

# Large data files
data/raw/*.csv
data/raw/*.dta

# Output we can regenerate
output/figures/*.png
output/tables/*.html
```

.small[
Rule of thumb: version-control code, not data or generated output.
]
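Because `.gitignore` is a plain text file, you can also extend it from `R`. A minimal sketch, run from the project root; the patterns are examples taken from the list above:

``` r
patterns <- c(".Rproj.user", ".Rhistory", ".RData", "data/raw/*.csv")

# Read the current rules, if the file exists yet
existing <- if (file.exists(".gitignore")) readLines(".gitignore") else character()

# Append only the missing patterns and rewrite the file
writeLines(c(existing, setdiff(patterns, existing)), ".gitignore")
```

If you use the `usethis` package, `usethis::use_git_ignore()` does the same job.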

---

## Tips for Git

- Commit early, commit often: one logical change per commit.
- Pull before you push: sync with GitHub before uploading your changes.
- Use branches for experiments (`git checkout -b new-analysis`).
- Never commit sensitive information (passwords, API keys, personal data).

---

## Your turn

### Resolve a namespace conflict

- Load both `dplyr` and `plyr` in a fresh R session.
- What happens to `summarise()` / `summarize()`?
- Use `conflict_prefer()` from the `conflicted` package to enforce the `dplyr` version, and write a line of code that shows it works.