Base R visualization
tidyverse/ggplot2 visualization
There's more that we won't cover: lattice plots, for example.
amount_topcountries <- stackoverflow_survey_single_response %>%
  filter(!is.na(country)) %>%
  group_by(country) %>%
  summarise(Amount = n()) %>%
  arrange(desc(Amount)) %>%
  head(20) %>%                                   # Top 20 first
  mutate(country = reorder(country, Amount)) %>% # Reorder country factor
  ungroup()
amount_topcountries %>% head()
## # A tibble: 6 × 2
##   country                                              Amount
##   <fct>                                                 <int>
## 1 United States of America                              11095
## 2 Germany                                                4947
## 3 India                                                  4231
## 4 United Kingdom of Great Britain and Northern Ireland   3224
## 5 Ukraine                                                2672
## 6 France                                                 2110
The graphics package is already included in R.
#-- Adjust plot margins --
par(mar = c(10, 4, 4, 2))
barplot(
  height = amount_topcountries$Amount,
  names.arg = amount_topcountries$country,
  las = 2,            # Rotate labels for readability
  main = "Top 20 Countries by Count",
  ylab = "Number of Records",
  cex.names = 0.9,    # Resize labels
  border = NA         # Remove border
)
#-- Reset margins after the plot
par(mar = c(5, 4, 4, 2))
The most basic function to plot in R is plot().
options(scipen = 999) # prevent exponential (scientific) notation
plot_df <- stackoverflow_survey_single_response %>%
  filter(!is.na(converted_comp_yearly), !is.na(years_code))
plot(plot_df$years_code, plot_df$converted_comp_yearly)
plot(
  jitter(df$years_coding, 2),
  jitter(df$participation_frequency, 2),
  pch = 1,
  main = "Relationship between Experience and Participation Frequency on SO",
  xlab = "Years of Experience",
  ylab = "SO Participation Frequency"
)
base R graphics?
Using similar procedures, we can add more elements to our plot or edit existing ones:
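As a minimal sketch of what adding elements can look like, the following example draws a scatter plot and then adds a regression line, a reference line, and a legend. The data are simulated for illustration only (x, y, and fit are invented names and not part of the survey data).

```r
# Simulated data, for illustration only (not the survey data)
set.seed(42)
x <- runif(100, min = 0, max = 40)
y <- 2 + 0.1 * x + rnorm(100)

# Initial plotting call
plot(x, y, pch = 1, xlab = "x (simulated)", ylab = "y (simulated)")

# Add elements to the existing plot
fit <- lm(y ~ x)
abline(fit, col = "red", lwd = 2)   # regression line
abline(v = mean(x), lty = 2)        # dashed vertical line at the mean of x
legend(
  "topleft",
  legend = c("Linear fit", "Mean of x"),
  col = c("red", "black"),
  lty = c(1, 2),
  bty = "n"                         # no box around the legend
)
```

Each call after plot() draws on top of what is already on the device, which is why the order of the calls matters.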
We can also create different plot types, such as a boxplot:
boxplot(df$years_coding ~ df$participation_frequency)
par() and dev.off() functions for plots
par() stands for graphical parameters and is called before the actual plotting function. It prepares the graphics device in R. The most commonly used options tell the device that 2, 3, 4, or x plots have to be printed.
We can, e.g., use mfrow for specifying how many rows (the first value in the vector) and columns (the second value in the vector) we aim to plot.
par(mfrow = c(2, 2))
One caveat of using this function is that we actively have to turn off the device before generating another independent plot.
dev.off()
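A minimal sketch of this workflow, using the built-in mtcars data for illustration (not the survey data). As an assumption that goes beyond the text above, it stores the previous settings in old_par (a hypothetical name) and restores them with par(old_par) instead of closing the device; dev.off() would reset the layout as well.

```r
# par() returns the previous values of the parameters it changes
old_par <- par(mfrow = c(2, 2))   # 2 rows, 2 columns

# Four plots fill the grid row by row
hist(mtcars$mpg, main = "Histogram")
boxplot(mpg ~ cyl, data = mtcars, main = "Boxplot")
plot(mtcars$wt, mtcars$mpg, main = "Scatter plot")
barplot(table(mtcars$gear), main = "Bar plot")

# Restore the previous layout without closing the device
par(old_par)
```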
It's nice that R provides such pleasant plotting opportunities. However, to include them in our papers, we need to export them. As said in the beginning, numerous export formats are available in R.
Alternatively, you can also export plots with the commands png(), pdf(), or jpeg(), for example. For this purpose, you have to wrap the plot call between one of those functions and a dev.off() call.
png("Plot.png")
hist(df$years_coding)
dev.off()

pdf("Plot.pdf")
hist(df$years_coding)
dev.off()

jpeg("Plot.jpeg")
hist(df$years_coding)
dev.off()
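The device functions also accept size arguments. The following sketch uses the built-in mtcars data and a temporary file (out_file is a hypothetical name) so that nothing in the working directory is overwritten.

```r
# Temporary output file, for illustration only
out_file <- tempfile(fileext = ".pdf")

pdf(out_file, width = 7, height = 5)   # size in inches
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
dev.off()                              # closes the device and writes the file

file.exists(out_file)
```

For pdf(), width and height are given in inches. The raster devices png() and jpeg() use pixels by default and additionally take a res argument for the resolution.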
ggplot2?
ggplot2 is another R package for creating plots and is part of the tidyverse. It uses the grammar of graphics. Some things to note about ggplot2:
plot_call + layer_1 + layer_2 + ... + layer_n
base R
ggplot(df, aes(x = age_group)) +
  geom_bar()
base R
ggplot(df, aes(x = as.factor(main_branch), y = years_coding)) +
  geom_boxplot()
According to Wickham (2010, 8)* a layered plot consists of the following components:
* http://dx.doi.org/10.1198/jcgs.2009.07098
plot_call + data + aesthetics + geometries + scales + facets
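To make this schema more concrete, here is a minimal sketch that labels each component in a single plot. It uses the built-in mtcars data instead of the survey data, so all variable names are stand-ins chosen for illustration.

```r
library(ggplot2)

# Built-in mtcars data, used here only for illustration
p <- ggplot(                                        # plot call
  data = mtcars,                                    # data
  mapping = aes(x = wt, y = mpg)                    # aesthetics
) +
  geom_point() +                                    # geometry
  scale_x_continuous(name = "Weight (1000 lbs)") +  # scale
  facet_wrap(~cyl)                                  # facets

p
```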
You can use one single data frame to create a plot in ggplot2. This creates a smooth workflow from data wrangling to the final presentation of the results.
Source: http://r4ds.had.co.nz
ggplot2 prefers data in long format (NB: of course, only if this is possible and makes sense for the data set at hand).
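A small sketch of why the long format is convenient, assuming a hypothetical wide data set with one column per measurement wave (the data frame and its column names are invented for illustration). tidyr::pivot_longer() reshapes it so that the wave becomes a variable that can be mapped to an aesthetic.

```r
library(ggplot2)
library(tidyr)

# Hypothetical wide data: one column per measurement wave
wide <- data.frame(
  id     = 1:3,
  wave_1 = c(2.1, 3.4, 1.8),
  wave_2 = c(2.5, 3.1, 2.2),
  wave_3 = c(3.0, 3.3, 2.9)
)

# Long format: one row per id and wave
long <- pivot_longer(
  wide,
  cols = starts_with("wave_"),
  names_to = "wave",
  values_to = "score"
)

# In long format, "wave" can be mapped to the x-axis directly
p_long <- ggplot(long, aes(x = wave, y = score, group = id, color = factor(id))) +
  geom_line() +
  geom_point()

p_long
```

In the wide version, each wave would need its own layer. In the long version, a single mapping covers all waves.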
The architecture of building plots in ggplot2 is similar to standard R graphics. There is an initial plotting call, and subsequently, more elements are added to the plot.
However, in base R, it is sometimes tricky to find out how to add (or remove) certain plot elements. For example, think of removing the axis ticks in the scatter plot.
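As a rough comparison, the sketch below removes the axis ticks once in base R and once in ggplot2, using the built-in mtcars data for illustration. It is one possible way to do this in each system, not the only one.

```r
library(ggplot2)

# Base R: suppress both axes, then redraw them without tick marks
plot(mtcars$wt, mtcars$mpg, xaxt = "n", yaxt = "n")
axis(side = 1, tick = FALSE)   # x-axis labels only, no tick marks or axis line
axis(side = 2, tick = FALSE)   # y-axis labels only, no tick marks or axis line

# ggplot2: the ticks are a named theme element that can be switched off
p_no_ticks <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme(axis.ticks = element_blank())

p_no_ticks
```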
We will systematically explore which elements are used in ggplot2 in this session.
We do not want to give a lecture on the theory behind data visualization (if you want that, we suggest having a look at the excellent book Fundamentals of Data Visualization by Claus O. Wilke).
Creating plots is all about practice... and 'borrowing' code from others.
Three components are important:
Now, let's start from the beginning and have a closer look at the grammar of graphics.
ggplot() is the most basic command to create a plot:
ggplot()
But it doesn't show anything...
ggplot(data = df)
Still nothing there...
aesthetics!
ggplot() requires information about the variables to plot.
ggplot(data = df) +
  aes(x = years_coding, y = yearly_compensation)
That's a little bit better, right?
geoms!
Finally, ggplot() needs information on how to plot the variables.
ggplot(data = df) +
  aes(x = years_coding, y = yearly_compensation) +
  geom_point()
A scatter plot!
geom
We can also add more than one geom.
ggplot(data = df) +
  aes(x = years_coding, y = yearly_compensation) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE)
A regression line! (without confidence intervals; the regression behind this operation is run automatically)
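The following sketch repeats these steps on simulated data so that it can be run without the survey data set. The data frame sim and its values are invented for illustration; only the column names mirror the ones used above. The last line fits the same linear model by hand, which is the model that geom_smooth(method = "lm") fits with its default formula y ~ x.

```r
library(ggplot2)

# Simulated stand-in for the survey data (values are invented)
set.seed(1)
sim <- data.frame(years_coding = sample(0:40, 200, replace = TRUE))
sim$yearly_compensation <- 30000 + 1500 * sim$years_coding +
  rnorm(200, sd = 15000)

p <- ggplot(data = sim) +                           # data
  aes(x = years_coding, y = yearly_compensation) +  # aesthetics
  geom_jitter() +                                   # first geom: jittered points
  geom_smooth(method = "lm", se = FALSE)            # second geom: linear fit

p

# The same linear model, fitted explicitly
coef(lm(yearly_compensation ~ years_coding, data = sim))
```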
aesthetics
We can draw separate regression lines for different groups in our data.
df %>%
  filter(!is.na(ai_threat)) %>%
  ggplot(aes(
    x = years_coding,
    y = participation_frequency,
    group = main_branch
  )) +
  geom_smooth(method = "lm", se = FALSE)
aesthetics
We can also add different colors for the different groups.
df %>%
  filter(!is.na(ai_threat)) %>%
  ggplot(aes(
    x = years_coding,
    y = participation_frequency,
    group = main_branch,
    color = main_branch
  )) +
  geom_smooth(method = "lm", se = FALSE)
The legend is drawn automatically, that's handy! We can also change the colors that are used in the plot:
df %>%
  filter(!is.na(ai_threat)) %>%
  ggplot(aes(
    x = years_coding,
    y = participation_frequency,
    group = main_branch,
    color = main_branch
  )) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_brewer(palette = "Dark2")
color and fill
Notably, there are two components of the plot or geom associated with colors: color and fill. Generally, color refers to the geometry borders, such as a line. fill refers to a geometry area, such as a polygon. Keep this in mind when using scale_color_brewer() or scale_fill_brewer() in your plots.
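A minimal sketch of the difference, using the built-in mtcars data for illustration: the bar areas are mapped to fill, while color is set to a fixed value for the bar outlines.

```r
library(ggplot2)

# Built-in mtcars data, used here only for illustration
p_fill <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar(color = "black") +             # color: fixed black bar outlines
  scale_fill_brewer(palette = "Dark2") +  # fill: palette for the bar areas
  labs(x = "Cylinders", fill = "Cylinders")

p_fill
```

Because only fill is mapped to a variable here, scale_color_brewer() would leave this plot unchanged.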
themes
One particular strength of ggplot2 lies in its immense theming capabilities. Built-in theme functions, in ggplot2 itself or in extension packages, make theming a plot fairly easy, e.g.,
theme_bw()
theme_apa() (from the papaja package)
theme_void()
See: https://ggplot2.tidyverse.org/reference/ggtheme.html
df %>%
  filter(!is.na(ai_threat)) %>%
  ggplot(aes(x = years_coding, y = participation_frequency)) +
  geom_smooth(color = "black", method = "lm", se = FALSE) +
  facet_wrap(~main_branch, ncol = 3, nrow = 2) +
  papaja::theme_apa()
theme() function in general
The most direct interface for manipulating your theme is the theme() function. Here you can change the appearance of:
df %>%
  filter(!is.na(ai_threat)) %>%
  ggplot(aes(x = years_coding, y = participation_frequency)) +
  geom_smooth(color = "black", method = "lm", se = FALSE) +
  facet_wrap(~main_branch, ncol = 3, nrow = 2) +
  theme_bw() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    strip.background = element_rect(fill = "white")
  )

df %>%
  filter(!is.na(ai_threat)) %>%
  ggplot(aes(x = years_coding, y = participation_frequency)) +
  geom_smooth(color = "black", method = "lm", se = FALSE) +
  facet_wrap(~main_branch, ncol = 3, nrow = 2) +
  theme_bw() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    strip.background = element_rect(fill = "white")
  ) +
  ylab("Participation Frequency") +
  xlab("Year of Coding Experience")
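If the same theme adjustments are needed in several plots, one option is to store them in an object and add that object to each plot. The sketch below assumes the built-in mtcars data, and my_theme is a hypothetical name.

```r
library(ggplot2)

# A complete theme plus a few tweaks, stored for re-use
my_theme <- theme_bw() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    strip.background = element_rect(fill = "white")
  )

p_theme <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~cyl) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  my_theme

p_theme
```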
Working with combined aesthetics and different data inputs can become challenging.
In particular, plotting similar aesthetics that interfere with the automatic procedures can create conflicts.
Some 'favorites' include:
geoms
ggplot plots are 'simple' objects
In contrast to standard R plots, ggplot2 outputs are standard objects like any other object in R (they are lists). So there is no graphics device involved from which we have to record our plot to re-use it later. We can just use it directly.
my_fancy_plot <- ggplot(data = df, aes(x = years_coding, y = participation_frequency)) +
  geom_point()
my_fancy_plot <- my_fancy_plot + geom_smooth()
Additionally, there is no need to call dev.off().
As of today, there are a lot of packages that help to combine ggplot2 plots fairly easily. For example, the cowplot package provides a really flexible framework.
Yet, fiddling with this package can become quite complicated. A very easy-to-use package for combining ggplot2 plots is the patchwork package.
library(patchwork)
my_barplot <- ggplot(df, aes(x = years_coding)) +
  geom_bar()
my_boxplot <- ggplot(df, aes(y = years_pro_coding)) +
  geom_boxplot()
my_barplot | my_boxplot
my_barplot / my_boxplot
You can also annotate plots with titles, subtitles, captions, and tags.
You can nest plots and introduce more complex layouts.
If you're interested in this, you should check out the patchwork repository on GitHub as everything is really well-documented there.
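A short sketch of both ideas, nesting and annotation, using the built-in mtcars data for illustration (the plot objects p1 to p3 and the title text are invented).

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
p3 <- ggplot(mtcars, aes(x = factor(gear), y = mpg)) + geom_boxplot()

# Nested layout: two plots side by side on top, one plot below
combined <- (p1 | p2) / p3 +
  plot_annotation(
    title = "Three views of mtcars",
    caption = "Data: built-in mtcars data set",
    tag_levels = "A"   # tags the panels A, B, C
  )

combined
```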
Exporting ggplot2 graphics is fairly easy with the ggsave() function. It automatically detects the file format from the file extension. You can also define the plot height, width, and dpi, which is particularly useful for producing high-quality graphics for publications.
nice_plot <- ggplot(df, aes(x = years_coding)) +
  geom_bar()
ggsave("nice_plot.png", nice_plot, dpi = 300)
Or:
ggsave("nice_plot.tiff", nice_plot, dpi = 300)
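A sketch with explicit size settings, assuming the built-in mtcars data and a temporary output file (out_file is a hypothetical name) so that nothing in the working directory is overwritten.

```r
library(ggplot2)

nice_plot <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Temporary output file, for illustration only
out_file <- tempfile(fileext = ".png")

ggsave(
  filename = out_file,
  plot = nice_plot,
  width = 16, height = 10, units = "cm",  # physical size of the figure
  dpi = 300                               # resolution for raster formats
)

file.exists(out_file)
```

Setting width, height, and units explicitly makes the output independent of the size of the plot window at the time of saving.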
In the session on Exploratory Data Analysis (EDA), we have said that visualization should be part of EDA. We can use ggplot2 for this, but there are also many packages out there that offer helpful visualization functions. We will look at two of those, visdat (for visualizing missing data patterns) and GGally (for visualizing correlations), in the following. Many of these packages build on ggplot2, and their output can, hence, be further customized or extended using ggplot2 or its extension packages.
library(visdat)
vis_miss(df[, 18:23])
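For GGally, a minimal sketch could look as follows. It uses the built-in iris data rather than the survey data, and ggpairs() is only one of several GGally functions that could be used for this purpose.

```r
library(ggplot2)
library(GGally)

# Built-in iris data, used here only for illustration
pairs_plot <- ggpairs(
  iris,
  columns = 1:4,                  # the four numeric columns
  mapping = aes(color = Species)  # color by group
)

# The result can be extended with ggplot2 functions such as themes
pairs_plot + theme_bw()
```

The resulting matrix shows scatter plots, densities, and correlation coefficients for each pair of variables.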
library(scales)
df %>%
  ggplot(
    aes(
      x = age_group,
      fill = age_group   # Fill bars with color based on age group
    )
  ) +
  geom_bar(              # Create a bar plot
    aes(
      # Compute relative frequencies (proportions);
      # after_stat() replaces the deprecated ..count.. notation
      y = after_stat(count / sum(count))
    )
  ) +
  scale_y_continuous(
    labels = percent     # Format y-axis labels as percentages
  ) +
  ylab("Relative Frequencies") +
  theme_classic() +
  theme(legend.position = "none") # Hide the legend

df %>%
  filter(!is.na(ai_complexity)) %>%
  ggplot(aes(x = ai_complexity, fill = ai_complexity)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = percent, expand = expansion(mult = c(0, 0.1))) +
  ylab("Relative Frequencies") +
  xlab("") +
  theme_classic() +
  theme(legend.position = "none")
survey <- stackoverflow_survey_single_response %>%
  mutate(
    so_comm = recode(
      as.character(so_comm),
      `1` = "Neutral",
      `2` = "No, not at all",
      `3` = "No, not really",
      `4` = "Not sure",
      `5` = "Yes, definitely",
      `6` = "Yes, somewhat"
    ),
    so_part_freq = recode(
      as.character(so_part_freq),
      `1` = "A few times per month or weekly",
      `2` = "A few times per week",
      `3` = "Daily or almost daily",
      `4` = "I have never participated in Q&A on Stack Overflow",
      `5` = "Less than once per month or monthly",
      `6` = "Multiple times per day"
    )
  )
library(scales)
library(ggrepel)
survey %>%
  select(country, so_comm, so_part_freq) %>%
  filter(!is.na(country), !is.na(so_comm), !is.na(so_part_freq)) %>%
  group_by(country) %>%
  summarise(
    # %in% returns TRUE/FALSE for each row within a country;
    # the mean of this logical vector is the share of TRUE values
    ConsiderMember = mean(so_comm %in% c("Yes, definitely", "Yes, somewhat"), na.rm = TRUE),
    particip = mean(so_part_freq %in% c(
      "Multiple times per day",
      "Daily or almost daily",
      "A few times per week",
      "A few times per month or weekly"
    ), na.rm = TRUE),
    n = n()
  ) %>%
  filter(n > 700) %>%
  ggplot(aes(particip, ConsiderMember, label = country, col = ConsiderMember)) +
  geom_text_repel(size = 3, point.padding = 0.25) +
  geom_point(aes(size = n), alpha = 1) +
  scale_y_continuous(labels = percent_format()) +
  scale_x_continuous(labels = percent_format()) +
  scale_size_continuous(labels = comma_format()) +
  scale_color_gradientn(colors = viridis::viridis(50)) +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(
    x = "% who participated at least weekly",
    y = "% who consider themselves as SO Community",
    title = "Community Membership by Country and Stack Overflow Participation"
  )
ggplot2 - Elegant Graphics for Data Analysis by Hadley Wickham
Chapter 3 in R for Data Science
Fundamentals of Data Visualization by Claus O. Wilke
Data Visualization - A Practical Introduction by Kieran Healy