1) Trends and functions
Functions for
table and figure replication
Google trend terms
Positive Google trend
terms
2) Original Data
Original data plot
Original data
ordinary least square regression
3) Gathered Data
Gathered data plot
Gathered data
ordinary least square regression
4) Topic Model
Topic Model
Media coverage and
reporting search trends
5) Extension
Extension
Proportion of
fear vs positive search terms
Plotting
proportion of fear vs positive search terms by presidency
Displaying
fear and positive searches by state between 2004 and 2020
library(tidyverse)
library(zoo)
library(stm)
library(maps)
library(stringr)
The current Google trends only includes the year-month. This functions adds a day component to make it year-month-date while also converting the string to a date to allow later partitions to be more accurate.
add_date <- function(df) {
# formats the data to have the date as a date that includes day
df <- df |>
mutate(ymd=as.Date(paste(ym, "-20", sep=""))) |>
select(-ym)
}
This function further simplifies the partitioning by adding a column that distinguishes which president is serving on that date.
add_pres <- function(df) {
# variables to partition the presidential terms
bush_end <- as.Date("2009-01-20")
trump_start <- as.Date("2017-01-20")
# add the corresponding term to that month
df$president <-
ifelse(df$ymd >= bush_end & df$ymd < trump_start, "Obama",
ifelse(df$ymd < bush_end, "Bush", "Trump"))
return(df)
}
Creates three new columns, one for each president, that holds dummy variables (i.e. 0 or 1) to allow the linear model to run effectively.
zero_or_one <- function(df) {
# adds three columns, one for each president and subsequently add a 1 when it is their presidency
df$bush <-
ifelse(df$president == "Bush", 1, 0)
df$obama <-
ifelse(df$president == "Obama", 1, 0)
df$trump <-
ifelse(df$president == "Trump", 1, 0)
return(df)
}
Recreation of the point plot with linear smoothing for easier understanding.The plot separates the points and lines by presidency.
plot_4 <- function(df, start, end, title) {
# recreated figure 4 from the paper
df |>
ggplot(aes(x = ymd, y = trends, color = president)) +
geom_point(shape = 1) +
geom_smooth(method = "lm", se = FALSE) +
ylim(start, end) +
xlab("Year") +
ylab("Google Trends") +
ggtitle(title)
}
Report: report immigrant+report immigration+report
illegals+report illegal alien+report to ice
Crime: immigrant crime+immigrant criminal+immigrant
murder+immigrant kill
Welfare: immigrant welfare+immigrant cost+immigrant
benefits
The original data used in the paper is loaded and manipulated to include the correct date, presidential terms, and dummy variables.
# loading original data
report_og <- read.csv("google-trends data/original research data/google_trends_report.csv")
crime_og <- read.csv("google-trends data/original research data/google_trends_crime.csv")
welfare_og <- read.csv("google-trends data/original research data/google_trends_welfare.csv")
# inserting proper column names
colnames(report_og) <- c("y", "m" ,"trends")
colnames(crime_og) <- c("y", "m","trends")
colnames(welfare_og) <- c("y", "m","trends")
# converting to reproducible format
report_og$ym <- with(report_og, sprintf("%d-%02d", y, m))
crime_og$ym <- with(crime_og, sprintf("%d-%02d", y, m))
welfare_og$ym <- with(welfare_og, sprintf("%d-%02d", y, m))
# keep only necessary columns
report_og <- report_og |> select(ym, trends)
crime_og <- crime_og |> select(ym, trends)
welfare_og <- welfare_og |> select(ym, trends)
# uses the add_date function
report_og <- add_date(report_og)
crime_og <- add_date(crime_og)
welfare_og <- add_date(welfare_og)
# uses the add_pres function
report_og <- add_pres(report_og)
crime_og <- add_pres(crime_og)
welfare_og <- add_pres(welfare_og)
# uses the zero_or_one function, which is the final desired form
report_og <- zero_or_one(report_og)
crime_og <- zero_or_one(crime_og)
welfare_og <- zero_or_one(welfare_og)
plot_4(report_og, 0, 100, "Report")
plot_4(crime_og, 0, 100, "Crime")
plot_4(welfare_og, 0, 100, "Welfare")
Figure 1. Original Immigration searches by
administration
For all three sets of anti-immigrant terms, searches increased on Google
after Trump’s inauguration speech. There was also a spike in all three
sets of terms in the months immediately after the inauguration.
fit.report_og <- lm(trends~ymd+bush+trump+obama, data=report_og |> filter(trends > 0))
fit.crime_og <- lm(trends~ymd+bush+trump+obama, data=crime_og |> filter(trends > 0))
fit.welfare_og <- lm(trends~ymd+bush+trump+obama, data=welfare_og |> filter(trends > 0))
summary(fit.report_og)
##
## Call:
## lm(formula = trends ~ ymd + bush + trump + obama, data = filter(report_og,
## trends > 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.050 -5.662 -1.293 4.796 38.482
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.607670 16.108626 4.507 1.16e-05 ***
## ymd -0.001789 0.001024 -1.748 0.0821 .
## bush 0.785389 2.898559 0.271 0.7867
## trump 19.716343 2.787848 7.072 2.99e-11 ***
## obama NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.601 on 186 degrees of freedom
## Multiple R-squared: 0.2863, Adjusted R-squared: 0.2748
## F-statistic: 24.87 on 3 and 186 DF, p-value: 1.411e-13
summary(fit.crime_og)
##
## Call:
## lm(formula = trends ~ ymd + bush + trump + obama, data = filter(crime_og,
## trends > 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.526 -7.106 -3.603 3.434 62.727
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.024897 23.183622 -0.044 0.9648
## ymd 0.001045 0.001473 0.710 0.4789
## bush 8.501543 4.163079 2.042 0.0426 *
## trump 20.300850 4.006843 5.067 9.81e-07 ***
## obama NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.78 on 184 degrees of freedom
## Multiple R-squared: 0.266, Adjusted R-squared: 0.254
## F-statistic: 22.23 on 3 and 184 DF, p-value: 2.502e-12
summary(fit.welfare_og)
##
## Call:
## lm(formula = trends ~ ymd + bush + trump + obama, data = filter(welfare_og,
## trends > 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.966 -7.986 -1.646 4.774 56.953
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.587926 20.734642 1.379 0.1696
## ymd -0.000178 0.001318 -0.135 0.8927
## bush 7.819065 3.730956 2.096 0.0375 *
## trump 18.560493 3.588452 5.172 5.95e-07 ***
## obama NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.36 on 186 degrees of freedom
## Multiple R-squared: 0.237, Adjusted R-squared: 0.2247
## F-statistic: 19.26 on 3 and 186 DF, p-value: 6.428e-11
Table 1. Original immigration searches by
presidential administration
After accounting for the linear time trend, there was still a clear
increase in immigrant reporting searches after Trump’s inauguration. The
regression is OLS on monthly search interest.
The gathered is loaded and manipulated to include the correct date, presidential terms, and dummy variables
# loading gathered data
report <- read.csv("google-trends data/replicate data/report.csv")
crime <- read.csv("google-trends data/replicate data/crime.csv")
welfare <- read.csv("google-trends data/replicate data/welfare.csv")
# inserting proper column names
colnames(report) <- c("ym" ,"trends")
colnames(crime) <- c("ym","trends")
colnames(welfare) <- c("ym","trends")
# uses the add_date function
report <- add_date(report)
crime <- add_date(crime)
welfare <- add_date(welfare)
# uses the add_pres function
report <- add_pres(report)
crime <- add_pres(crime)
welfare <- add_pres(welfare)
# uses the zero_or_one function, which is the final desired form
report <- zero_or_one(report)
crime <- zero_or_one(crime)
welfare <- zero_or_one(welfare)
plot_4(report, 0, 100, "Report")
plot_4(crime, 0, 100, "Crime")
plot_4(welfare, 0, 100, "Title")
Figure 2. Gathered data immigration searches by
administration
The overall trends of the later administrations are similar to the
original data. However, the bush administration is visibly different.
After looking into the reason it turn out that the original data has
several point that display searches where the gathered data has a zero
value. We came to the conclusion that the two most probable reasons are
missing values in the most recent searches of those terms on Google or
erroneous gathered date. Even with these mismatching values the overall
trend is similar to the original data.
fit.report <- lm(trends~ymd+bush+trump+obama, data=report |> filter(trends > 0))
fit.crime <- lm(trends~ymd+bush+trump+obama, data=crime |> filter(trends > 0))
fit.welfare <- lm(trends~ymd+bush+trump+obama, data=welfare |> filter(trends > 0))
summary(fit.report)
##
## Call:
## lm(formula = trends ~ ymd + bush + trump + obama, data = filter(report,
## trends > 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.306 -5.418 -1.076 4.252 38.740
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.7739107 15.3446130 5.916 1.59e-08 ***
## ymd -0.0027952 0.0009751 -2.867 0.00463 **
## bush 1.7209308 2.7105835 0.635 0.52629
## trump 18.6115465 2.6314099 7.073 3.11e-11 ***
## obama NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.991 on 183 degrees of freedom
## Multiple R-squared: 0.2855, Adjusted R-squared: 0.2738
## F-statistic: 24.37 on 3 and 183 DF, p-value: 2.569e-13
summary(fit.crime)
##
## Call:
## lm(formula = trends ~ ymd + bush + trump + obama, data = filter(crime,
## trends > 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.848 -6.442 -3.853 3.012 63.302
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.887e+00 2.684e+01 0.108 0.9145
## ymd 8.249e-04 1.700e-03 0.485 0.6282
## bush 9.877e+00 4.661e+00 2.119 0.0357 *
## trump 1.961e+01 4.308e+00 4.552 1.1e-05 ***
## obama NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.97 on 148 degrees of freedom
## Multiple R-squared: 0.29, Adjusted R-squared: 0.2757
## F-statistic: 20.15 on 3 and 148 DF, p-value: 5.23e-11
summary(fit.welfare)
##
## Call:
## lm(formula = trends ~ ymd + bush + trump + obama, data = filter(welfare,
## trends > 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.880 -7.361 -1.873 4.095 54.921
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.556981 22.482828 2.115 0.03595 *
## ymd -0.001323 0.001429 -0.926 0.35581
## bush 12.125788 3.925027 3.089 0.00236 **
## trump 20.303916 3.781089 5.370 2.71e-07 ***
## obama NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.59 on 161 degrees of freedom
## Multiple R-squared: 0.3006, Adjusted R-squared: 0.2875
## F-statistic: 23.06 on 3 and 161 DF, p-value: 1.799e-12
Table 2. Gathered data immigration searches by
presidential administration
The values from the dependent variables in the self gathered data has
the trends seen in the original data. However, the difference in some of
the data points mentioned in Figure 2 create a variation in the
trends from the gathered data.
A structural topic model was used to measure changes in media
coverage of immigrant-related issues before and after the 2016 election.
The measure of the coverage was done on news networks CNN, FOX, and
MSNBC. The topic model gather data relating to 30 different topics, news
networks used, duration of the coverage, transcript, etc.
–For more clarification on this data please refer to page 9 of the
research paper
The function replicates the topic model plots used in the original paper
topic_plot <- function(df, lable) {
df |> ungroup() |>
group_by(channel, time) |>
ggplot(aes(x = yearmonth, y = summation, color = channel, group = interaction(channel, time))) +
geom_vline(xintercept = as.numeric(date1), linetype = "dashed", color = "black") +
geom_vline(xintercept = as.numeric(date2), linetype = "dashed", color = "black") +
geom_point() + geom_smooth( se=FALSE) + scale_color_manual(values = my_colors) +
theme(axis.title.x = element_blank(), legend.position = "bottom", legend.title = element_blank()) +
ylab(lable)
}
Loading, mutating, correctly formatting the date and summarizing the topic model data for plotting
# Loads the topic model
load("TopicModel.RData")
document_topics <- make.dt(immigrFit, meta = out$meta)
topic_terms <- t(exp(immigrFit$beta$logbeta[[1]]))
rownames(topic_terms) <- out$vocab
colnames(topic_terms) <- sprintf("Topic%d", 1:ncol(topic_terms))
# converts the date to appropriate format
document_topics <- document_topics |>
mutate(date = as.Date(date, format = "%Y-%m-%d"),yearmonth = ym(format(date, "%Y-%m")))
# grouping and summarizing based on date
time_line <- document_topics |>
select(time, yearmonth) |>
group_by(time) |>
summarize(yearmonth = min(yearmonth))
# variables holding the date for partitioning
date1 <- as.Date("2015-06-01")
date2 <- as.Date("2017-01-01")
# variable to ensure proper coloring of plot
my_colors <- c("magenta", "red3", "blue")
Separating the topic model data based on topic(i.e Topics 1 and 3 were used for crime topics while Topic 13 was used for welfare topics)
# grouping and filtering by news channel
plot_2_table <- document_topics |>
select(channel, duration, yearmonth, time) |> group_by(yearmonth, channel, time) |>
summarize(summation=sum(duration)) |> arrange(summation)
# grouping and filtering by news channel and crime rhetoric
plot_3a_table <- document_topics |>
select(Topic1, Topic3, channel, duration, yearmonth, time) |> group_by(yearmonth, channel, time) |>
summarize(summation=sum(duration*Topic1 + duration*Topic3)) |> arrange(summation)
# grouping and filtering by news channel and welfare rhetoric
plot_3b_table <- document_topics |>
select(Topic13, channel, duration, yearmonth, time) |> group_by(yearmonth, channel, time) |>
summarize(summation=sum(duration*Topic13)) |> arrange(summation)
topic_plot(plot_2_table, "Num Monthly\nImmigration Segs")
topic_plot(plot_3a_table, "Immiger + Crime\nNews Coverage")
topic_plot(plot_3b_table , "Immiger Welfare\nNew Coverage")
The dotted black lines separate key moments in Trumps administration, the first when the Trump campaign began and the second is the start of his presidency
Figure 3. Immigration news segments
Their were similar immigration coverage over the course of the Trump
campaign and presidency in both MSNBC and FOX, However, Fox always held
larger amount of immigration news coverage during the Trump campaign and
administration. In fact, shortly after the Trump took office in January
of 2017 the immigration news more than double on the conservative news
channel. This supports the hypothesis found in the paper, particularly
No. 3 which indicates “People will have more interest in reporting
immigrants when exposed to fear cues about immigrants.”
Selecting, filtering, and summarizing the daily search data and then joining it with the topic model for further examination
# importing daily trends data
daily <- read.csv("google-trends data/replicate data/gt_report_daily.csv") |> select(date, search, search_adj) |> mutate(date=as.Date(date))
# filtering and selecting only the most influential topics with regard to reporting, crime, and welfare
filtered_topics <- document_topics |>
select(date, channel, duration, Topic1, Topic3, Topic13) |> group_by(date) |>
summarize(duration=sum(duration), Topic1and3=sum(Topic1 + Topic3), Topic13=sum(Topic13))
# obtaining the searches happening during the particular topic new coverage
filtered_topics <- filtered_topics |> inner_join(daily, join_by(date))
Creating a liner ordinary least square regression model for the daily searches and topics during the Trump administration
# dates for partitioning
bush_end <- as.Date("2009-01-20")
trump_start <- as.Date("2017-01-20")
# include presidencies on row date
filtered_topics$president <-
ifelse(filtered_topics$date >= bush_end & filtered_topics$date < trump_start, "obama",
ifelse(filtered_topics$date < bush_end, "bush", "trump"))
# create true/false columns, one for each presidency, for the linear model
filtered_topics$trump <-
ifelse(filtered_topics$president == "trump", TRUE, FALSE)
filtered_topics <- filtered_topics %>%
mutate(month = month(date, label = TRUE, abbr = FALSE))
filtered_topics <- filtered_topics %>%
mutate(day_of_week = wday(date, label = TRUE, abbr = FALSE))
# create the linear model and fit the data
fit.coverage <- lm(search_adj~duration+Topic1and3+Topic13+trump+date+month+day_of_week, data=filtered_topics)
summary(fit.coverage)
##
## Call:
## lm(formula = search_adj ~ duration + Topic1and3 + Topic13 + trump +
## date + month + day_of_week, data = filtered_topics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -45.814 -11.965 -1.945 9.044 135.937
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 109.584471 23.731562 4.618 4.13e-06 ***
## duration 0.045549 0.019122 2.382 0.017309 *
## Topic1and3 0.733940 0.263840 2.782 0.005457 **
## Topic13 2.266620 0.673892 3.363 0.000784 ***
## trumpTRUE 19.537663 1.694791 11.528 < 2e-16 ***
## date -0.004388 0.001427 -3.074 0.002141 **
## month.L -10.045117 1.504811 -6.675 3.18e-11 ***
## month.Q 8.494530 1.451830 5.851 5.70e-09 ***
## month.C -2.946282 1.466928 -2.008 0.044727 *
## month^4 2.864202 1.467697 1.951 0.051137 .
## month^5 1.956103 1.465239 1.335 0.182026
## month^6 -4.992703 1.473888 -3.387 0.000719 ***
## month^7 1.426701 1.457228 0.979 0.327673
## month^8 -0.334775 1.453648 -0.230 0.817882
## month^9 -1.070189 1.447434 -0.739 0.459769
## month^10 -0.994749 1.420718 -0.700 0.483900
## month^11 1.614120 1.409658 1.145 0.252328
## day_of_week.L 1.293711 1.103600 1.172 0.241230
## day_of_week.Q -6.798711 1.108536 -6.133 1.04e-09 ***
## day_of_week.C -0.622007 1.102206 -0.564 0.572593
## day_of_week^4 -1.020480 1.101801 -0.926 0.354457
## day_of_week^5 2.492456 1.101831 2.262 0.023798 *
## day_of_week^6 -1.721800 1.099503 -1.566 0.117511
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.75 on 2007 degrees of freedom
## Multiple R-squared: 0.2643, Adjusted R-squared: 0.2563
## F-statistic: 32.78 on 22 and 2007 DF, p-value: < 2.2e-16
Table 3. Media coverage and reporting
searches
The reproduced regression is not the same one seen in the paper.
Although, it does show similar correlations and trend, which show that
the coverage of unauthorized immigrant crimes and welfare usage is
positively correlated with higher searches for immigrant reporting.
Positive(i.e. positive terms): immigrant success + immigrant contribution + benefits of immigration + help immigrant
vs <- read.csv("google-trends data/FvsS/fearvssuccess.csv")
map <- read.csv("google-trends data/FvsS/fearvssuccess_geo.csv")
# changing column names to be easier to work with
colnames(vs) <- c("ym" ,"fear", "success")
vs <- vs %>%
mutate(ymd = ymd(paste0(ym, "-01")))
ggplot(data = vs, aes(x = ymd)) +
geom_line(aes(y = fear, color = "fear")) +
geom_line(aes(y = success, color = "positive")) +
labs(y = "Values", color = "Legend") +
theme_minimal() +
xlab("Year") +
ylab("Number of Searches") +
ggtitle("Fear and Positive search terms over time")
Figure 4. Fear and positive search terms over
time
The plot shows how fear based searches compare to positive based
searches over time. It displays non drastic changes regardless of the
time, with the only seeming outlier happening in June 2018. This is when
the Trump administration’s “zero tolerance” policy, which resulted in
the separation of children from their parents at the U.S.-Mexico border,
was a major news story. This policy led to widespread public outrage and
intense media coverage. In response to the public outcry, President
Trump signed an executive order on June 20, 2018, to end family
separations.
# obtaining proportion of searches without the visible oultier from Figure 4
prop_no_outlier <- vs |> add_date() |> add_pres() |> zero_or_one() |>
mutate(proportion=(fear/(success+fear))) |>
filter(fear > 0 & success > 0) |>
group_by(president) |> summarise(mean=mean(proportion))
# obtaining proportion of searches
prop_outlier <- vs |> add_date() |> add_pres() |> zero_or_one() |>
mutate(proportion=(fear/(success+fear))) |>
filter(fear > 0 & success > 0 & ymd != "2018-06-20") |>
group_by(president) |>
summarise(mean=mean(proportion))
# plot with outlier
ggplot(prop_outlier, aes(x = president, y = mean, fill = president)) +
geom_bar(stat = "identity", width = 0.5) +
ggtitle("Proportion of searches during administrations with oultier") +
xlab("President") +
ylab("Proportion Mean Value")
# plot without outlier
ggplot(prop_no_outlier, aes(x = president, y = mean, fill = president)) +
geom_bar(stat = "identity", width = 0.5) +
ggtitle("Proportion of searches during administrations without outlier") +
xlab("President") +
ylab("Proportion Mean Value")
Figure 5. Proportion of fear vs positive
searches during each presidency
The proportion of fear vs positive searches on Google seem to be 2 to 1
regardless of the presidency. In fact, even when removing the only
visible outlier in the trend the proportions are about the same.
Obama,the only liberal president viewed here, actually had a slightly
higher proportion of searches, not enough to be significant, but enough
to show that immigrant search proportion tends to stay balanced
regardless of the rhetoric.
# Pre-processing map data which includes metro area
map <- map %>%
mutate(
state = str_extract(Metro, "[A-Z]{2}$"),
fear = as.numeric(str_replace(fear, "%", "")),
success = as.numeric(str_replace(success, "%", ""))
)
# state abbreviation to correlate with the ones given and mutate into full state names for accurate plotting
state_abbreviations <- data.frame(
abbreviation = c(
"AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
"HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
"MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
"NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
"SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"
),
state = tolower(c(
"Alabama", "Alaska", "Arizona", "Arkansas", "California",
"Colorado", "Connecticut", "Delaware", "Florida", "Georgia",
"Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas",
"Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts",
"Michigan", "Minnesota", "Mississippi", "Missouri", "Montana",
"Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico",
"New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma",
"Oregon", "Pennsylvania", "Rhode Island", "South Carolina",
"South Dakota", "Tennessee", "Texas", "Utah", "Vermont",
"Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"
))
)
# mutating data
map2 <- map %>%
left_join(state_abbreviations, by = c("state" = "abbreviation")) %>%
rename(region = state.y)
# obtaining and converting necessary map data
states_map <- map_data("state")
map_data <- left_join(states_map, map2, by = "region") |> mutate(success = ifelse(is.na(success), 0, success))
# plotting fear searches
ggplot(map_data, aes(x = long, y = lat, group = group, fill = fear)) +
geom_polygon(color = "black") +
scale_fill_gradient(name = "Fear (%)", low = "lightblue", high = "darkblue", na.value = "grey90") +
coord_fixed(1.3) +
ggtitle("Fear Percentage by State") +
theme(axis.title = element_blank(), axis.text = element_blank(),, axis.ticks = element_blank(), panel.background = element_blank())
# plotting positive searches
ggplot(map_data, aes(x = long, y = lat, group = group, fill = success)) +
geom_polygon(color = "black") +
scale_fill_gradient(name = "Positive (%)", low = "lightblue", high = "darkblue", na.value = "grey90") +
coord_fixed(1.3) +
ggtitle("Positive Percentage by State") +
theme(axis.title = element_blank(), axis.text = element_blank(),, axis.ticks = element_blank(), panel.background = element_blank())
Figure 6. Search percentages by state
The plots display the percentage of fear and positive searches made by
each state. The positive search percentages are simply an inversion on
the fear searches. However, the second plot was made to more clearly see
the correlation of searches by state. Upon first inspection nothing can
be definitively seen in the plot other than more conservative states
have higher fear based searches but not all. Their are a couple of
liberal states that are also seen with higher fear searches. A deeper
dive would have to be done to glean more information from the plot. For
example, a look at the political divide in each state, population of
immigrants by state, or perhaps a better view at the distribution over
time instead of average searcvhes of a set of time.