Practical R Code Snippets

Common Data Visualization & Analysis Solutions

A collection of solutions for common data visualization and analysis situations

Based on: github.com/zilinskyjan/R-snippets

Overview

Data Visualization

  • Legend management
  • Label wrapping
  • Title positioning
  • Gridlines & padding
  • Transparent colors
  • Date formatting

Data Manipulation

  • Subsetting in ggplot
  • Prediction values
  • Summary statistics
  • Filtering by sample size

Part 1: Data Visualization with ggplot2

Fix Legend Order Mismatches

Problem: Legend colors don’t match your data order

Solution:

guides(fill = guide_legend(reverse = TRUE))
guides(color = guide_legend(reverse = TRUE))

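When to use: The categories are drawn in one order (for example, in a stacked or horizontal bar chart) but the legend lists them in the reverse order. Reverse the guide for whichever aesthetic (fill or color) the plot actually maps.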
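
A minimal sketch of the guide call inside a complete plot, assuming ggplot2 is installed. The mtcars data and the choice of variables are illustrative and not part of the original snippet.

```r
library(ggplot2)

# Horizontal stacked bars with a bottom legend: the first fill level is drawn
# furthest from the axis in the stack but first in the legend, so the two can
# read in opposite directions until the legend is reversed.
p <- ggplot(mtcars, aes(y = factor(cyl), fill = factor(gear))) +
  geom_bar() +
  guides(fill = guide_legend(reverse = TRUE)) +  # flips the legend keys only
  theme(legend.position = "bottom")

p
```

The data and the stacking order are untouched; only the order of the legend keys changes.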

Wrapping Long Labels

Sub-plots from facet_wrap()

Problem: Long facet labels overlap or look messy

Solution:

facet_wrap(~ var, labeller = label_wrap_gen(width = 30))

# Make text smaller
theme(strip.text = element_text(size = 9))

Legend Labels

Problem: Legend labels are too long

Solution:

scale_fill_manual(
  name = "Legend Title",
  values = c("A" = "#1b9e77", "B" = "#d95f02", "C" = "#7570b3"),
  labels = function(x) stringr::str_wrap(x, width = 20)  # str_wrap() comes from stringr
)

Axis Labels

Problem: Category names overlap on axes

Solution:

scale_x_discrete(labels = scales::label_wrap(20))
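
To see what the labeller does before putting it in a plot, it can be called on a character vector directly. A small sketch, assuming the scales package is installed; the label strings are made up for illustration.

```r
library(scales)

long_labels <- c(
  "Strongly agree with the statement",
  "Neither agree nor disagree",
  "No"
)

# label_wrap() returns a function; calling it inserts "\n" at word boundaries
wrap20 <- label_wrap(20)
wrapped <- wrap20(long_labels)

# Print each wrapped label, separated by a marker line
cat(wrapped, sep = "\n---\n")
```

Short labels pass through unchanged, so the same call is safe on mixed-length categories.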

Alternative (last resort, rotate the labels):

theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))

Bonus: Format numbers nicely:

scale_y_continuous(labels = scales::comma)

Better Title Positioning

Problem: Default title positioning leaves awkward spacing

Solution:

theme(plot.title.position = "plot")

When to use: You want the title and subtitle left-aligned with the edge of the whole plot (including the axis labels) rather than with the panel, which is the default ("panel").

Getting Rid of Awkward Padding

Problem: Unwanted padding around your plot

Solution:

scale_x_continuous(expand = c(0, 0), limits = c(0, 42))

The key part is expand = c(0, 0), which removes the default padding that ggplot2 adds beyond the axis limits.
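
Often only one side needs to lose its padding, for example so bars sit directly on the axis. A sketch using ggplot2's expansion() helper, assuming ggplot2 3.3.0 or later; the dataset and the 5% headroom are illustrative choices.

```r
library(ggplot2)

# Bars start exactly at zero (no gap below), with 5% headroom at the top.
# expansion(mult = c(lower, upper)) sets the padding on each side separately.
p <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)))

p
```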

Getting Rid of (Some) Gridlines

Problem: Too many gridlines clutter the plot

Solution:

theme_minimal() +
theme(
  panel.grid.major.y = element_blank(),
  panel.grid.minor = element_blank(),
  plot.title = element_text(size = 14, face = "bold"),
  axis.text = element_text(size = 11)
)

Control Legend Visibility

Problem: Unwanted legend entries from specific geoms

Solution:

geom_text(show.legend = FALSE)
geom_point(show.legend = FALSE)
# etc.

When to use: You need to drop the legend entries contributed by one geom while keeping the rest of the legend.

Transparent Colors

Problem: Overlapping points or areas obscure underlying data

Solution:

scales::alpha("blue", 0.5)  # 50% transparent blue

When to use: Overlapping points, areas, or when you need to see underlying data patterns.
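
scales::alpha() simply returns a colour string, so it can go anywhere a colour is accepted. A short sketch, assuming ggplot2 and scales are installed; the variable name semi_blue is hypothetical.

```r
library(ggplot2)

# The result is a hex colour with an alpha channel appended
semi_blue <- scales::alpha("blue", 0.5)
print(semi_blue)

# Use it as a fixed point colour so overlapping points stay visible
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = semi_blue, size = 4)

p
```

For a single layer, the geom's own alpha argument (for example geom_point(alpha = 0.5)) gives a similar effect; scales::alpha() is handy when the transparent colour is needed in a manual scale or a theme element.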

Custom Date Axis Formatting

Problem: Default date formatting doesn’t meet your needs

Solution:

scale_x_date(
  name = NULL,  
  breaks = scales::breaks_width("2 years"),
  labels = scales::label_date("'%y") 
)

When to use: You need specific date breaks and label formats.
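
A sketch of the same scale applied to real dates, using the economics data that ships with ggplot2. The five-year break width is an illustrative choice for this longer series.

```r
library(ggplot2)

# economics: monthly US data with a Date column named `date`
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  scale_x_date(
    name = NULL,                               # drop the axis title
    breaks = scales::breaks_width("5 years"),  # one tick every five years
    labels = scales::label_date("'%y")         # apostrophe plus two-digit year
  )

p
```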

Part 2: Data Manipulation

Subset Data Within ggplot

Problem: Need different data filtering for specific plot layers

Solution:

ggplot(data, aes(x = var1, y = var2)) +
  stat_smooth(
    method = "loess",
    se = FALSE,
    data = . %>% filter(!is.na(var2))  # Remove NAs for this layer only
  )

When to use: You need different data filtering for specific plot layers without modifying your main dataset.
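
The `. %>% filter()` form works because a layer's data argument accepts a function, which ggplot2 applies to the plot's data. A sketch of the same idea with a plain function and no pipe; the wt > 4 cutoff is arbitrary and only for illustration.

```r
library(ggplot2)

# Layer 1 draws every row. Layer 2 receives a function that is applied to the
# plot data, so only the heavy cars are drawn again in red.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_point(
    data = function(d) subset(d, wt > 4),
    colour = "red", size = 3
  )

p
```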

Generate Prediction Values for Effects Plots

Problem: Need smooth predictions across a range of values

Solution:

mod <- lm(y ~ var1 + var2, data = D)
ggeffects::ggpredict(mod, terms = "var1 [1:10 by=0.1]")

When to use: Creating smooth effect plots or predictions across a range of values.
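
For intuition, roughly the same grid of predictions can be built in base R with predict(). This is an approximation for a simple linear model with numeric covariates, not how ggeffects is implemented; the data are simulated and var2 is held at its mean by assumption.

```r
# Simulated data; D, y, var1 and var2 mirror the names used above
set.seed(1)
D <- data.frame(var1 = runif(200, 1, 10), var2 = rnorm(200))
D$y <- 2 + 0.5 * D$var1 - 0.3 * D$var2 + rnorm(200)

mod <- lm(y ~ var1 + var2, data = D)

# Vary var1 over a fine grid and hold var2 fixed at its mean
grid <- data.frame(var1 = seq(1, 10, by = 0.1), var2 = mean(D$var2))

# Predicted values with 95% confidence bounds (columns fit, lwr, upr)
pred <- cbind(grid, predict(mod, newdata = grid, interval = "confidence"))
head(pred)
```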

Summary Statistics with Confidence Intervals

Option 1: Large Sample Size

When to use: Large sample sizes (normal approximation)

D %>% 
  group_by(var) %>%
  summarise(
    M = mean(resp, na.rm = TRUE),
    sd = sd(resp, na.rm = TRUE),
    n = sum(!is.na(resp)),
    se = sd / sqrt(n)
  ) %>%
  ggplot(aes(x = M, y = var)) +  # var is the grouping column kept by summarise()
  geom_col() +
  geom_errorbar(
    aes(xmin = M - 1.96*se, xmax = M + 1.96*se)
  )

Option 1 (continued): Small Sample Size

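When to use: Small sample sizes (t-distribution). Swap this geom_errorbar() call into the large-sample pipeline; it relies on the M, se, and n columns computed there.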

geom_errorbar(
  aes(
    xmin = M + qt(0.025, df = n-1) * se, 
    xmax = M + qt(0.975, df = n-1) * se
  )
)

Note: qt(0.025, df = n - 1) is negative, so M + qt(0.025, df = n - 1) * se gives the lower bound.
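
A quick base-R check that the manual t-based bounds agree with what t.test() reports. The ten observations are simulated for illustration.

```r
# Ten simulated observations (values are illustrative only)
set.seed(42)
x <- rnorm(10, mean = 5, sd = 2)

n  <- length(x)
M  <- mean(x)
se <- sd(x) / sqrt(n)

# Manual t-based 95% interval, same formula as the geom_errorbar() call
lower <- M + qt(0.025, df = n - 1) * se
upper <- M + qt(0.975, df = n - 1) * se

# t.test() computes the same interval
ci <- t.test(x)$conf.int

c(lower = lower, upper = upper)
all.equal(c(lower, upper), as.numeric(ci))
```

With df = 9 the multiplier qt(0.975, df = 9) is about 2.26 rather than 1.96, which is why the normal approximation is too narrow for small groups.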

Option 2: Using t.test()

When to use: Quick confidence intervals. Less robust: t.test() stops with an error if a group has fewer than two non-missing values or if all of its values are identical.

D %>% 
  group_by(var) %>%
  summarise(
    M = mean(outcome, na.rm = TRUE),
    M_hi = t.test(outcome)$conf.int[2],
    M_lo = t.test(outcome)$conf.int[1]
  )
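
One way to keep a grouped summary from failing on such groups is to wrap t.test() in a small guard. The helper below, safe_ci(), is hypothetical (it is not part of any package) and is only a sketch of the idea.

```r
# Hypothetical helper: returns c(NA, NA) instead of raising an error when
# t.test() cannot compute an interval.
safe_ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]
  # t.test() needs at least two observations and non-zero variance
  if (length(x) < 2 || sd(x) == 0) {
    return(c(NA_real_, NA_real_))
  }
  as.numeric(t.test(x, conf.level = level)$conf.int)
}

safe_ci(c(2.1, 3.4, 2.9, NA, 3.8))  # usual interval, NA dropped
safe_ci(c(5, 5, 5))                 # constant data
safe_ci(7)                          # single observation
```

Inside summarise() it would be used as M_lo = safe_ci(outcome)[1] and M_hi = safe_ci(outcome)[2].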

Option 3: Empirical 95% Range

When to use: Want the empirical 95% range (not a confidence interval)

data %>%
  group_by(group_var) %>%
  summarize(
    mean_val = mean(x, na.rm = TRUE),
    lower = quantile(x, probs = 0.025, na.rm = TRUE),
    upper = quantile(x, probs = 0.975, na.rm = TRUE),
    n_obs = sum(!is.na(x))
  )
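
The distinction matters in practice: the empirical range describes where the observations fall, while a confidence interval describes uncertainty about the mean. A base-R sketch on simulated data makes the difference in width visible.

```r
set.seed(123)
x <- rnorm(1000)  # simulated data, illustrative only

# Empirical 95% range: where the middle 95% of observations fall
emp_range <- quantile(x, probs = c(0.025, 0.975))

# 95% confidence interval: uncertainty about the mean
ci_mean <- t.test(x)$conf.int

diff(emp_range)  # width of the empirical range
diff(ci_mean)    # far narrower, and it shrinks further as n grows
```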

Filter by Minimum Observations

Problem: Need to calculate statistics only for groups with sufficient sample sizes

Solution:

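analysis_data <- raw_data %>%
  group_by(group1, group2) %>%
  summarize(
    # Number of rows where both variables are observed
    n_complete = sum(!is.na(var1) & !is.na(var2)),
    # Only compute the correlation when more than 5 complete pairs exist
    correlation = if (n_complete > 5) {
      cor(var1, var2, use = "complete.obs")
    } else {
      NA_real_
    },
    .groups = "drop"
  )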

When to use: Avoid unreliable estimates from small groups.
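
An alternative is to drop the small groups before summarising instead of returning NA for them. A sketch with toy data, assuming dplyr is installed; the single grouping column and the threshold variable min_n are illustrative assumptions.

```r
library(dplyr)

# Toy data: group "a" has 8 complete pairs, group "b" only 3
raw_data <- tibble(
  group1 = rep(c("a", "b"), times = c(8, 3)),
  var1   = c(1:8, 1:3),
  var2   = c(2, 4, 5, 4, 6, 8, 7, 9, 1, 3, 2)
)

min_n <- 5  # assumed threshold; pick one that suits the analysis

analysis_data <- raw_data %>%
  group_by(group1) %>%
  # Keep only groups with more than min_n complete pairs
  filter(sum(!is.na(var1) & !is.na(var2)) > min_n) %>%
  summarize(
    n_complete = sum(!is.na(var1) & !is.na(var2)),
    correlation = cor(var1, var2, use = "complete.obs"),
    .groups = "drop"
  )

analysis_data
```

Dropping the groups keeps the output compact; keeping them with NA makes it explicit which groups were excluded.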

Part 3: External Data Access

Harvard Dataverse Integration

Problem: Need to access publicly available research datasets

Solution:

# Set up connection to Harvard Dataverse
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")

# Download dataset directly into R
dataset <- dataverse::get_dataframe_by_name(
  "filename.tab",
  "doi:10.7910/DVN/XXXXXX"  # Replace with actual DOI
)

Resources: CRAN dataverse package vignette

Summary

Visualization Tips

  • Control legends, labels, titles
  • Manage spacing and gridlines
  • Use transparency effectively
  • Format dates properly

Analysis Tips

  • Filter data within plots
  • Generate predictions
  • Calculate CIs correctly
  • Filter by sample size
