Anti-Immigrant Rhetoric and ICE Reporting Interest: Evidence from a Large-Scale Study of Web Search Data Replication and Extension

1) Trends and functions
Functions for table and figure replication
Google trend terms
Positive Google trend terms

2) Original Data
Original data plot
Original data ordinary least square regression

3) Gathered Data
Gathered data plot
Gathered data ordinary least square regression

4) Topic Model
Topic Model
Media coverage and reporting search trends

5) Extension
Extension
Proportion of fear vs positive search terms
Plotting proportion of fear vs positive search terms by presidency
Displaying fear and positive searches by state between 2004 and 2020

Loading libraries

library(tidyverse)
library(zoo)
library(stm)
library(maps)
library(stringr)

Functions for table and figure replication

Date manipulation

The current Google trends only includes the year-month. This functions adds a day component to make it year-month-date while also converting the string to a date to allow later partitions to be more accurate.

add_date <- function(df) {
  # formats the data to have the date as a date that includes day
  df <- df |>  
    mutate(ymd=as.Date(paste(ym, "-20", sep=""))) |> 
    select(-ym)
  }

Inserting presidential terms

This function further simplifies the partitioning by adding a column that distinguishes which president is serving on that date.

add_pres <- function(df) {
  # variables to partition the presidential terms 
  bush_end <- as.Date("2009-01-20")
  trump_start <- as.Date("2017-01-20")
  
  # add the corresponding term to that month
  df$president <- 
    ifelse(df$ymd >= bush_end & df$ymd < trump_start, "Obama", 
           ifelse(df$ymd < bush_end, "Bush", "Trump")) 
  
  return(df)
}

Categorizing presidential terms

Creates three new columns, one for each president, that holds dummy variables (i.e. 0 or 1) to allow the linear model to run effectively.

zero_or_one <- function(df) {
  # adds three columns, one for each president and subsequently add a 1 when it is their presidency  
  df$bush <- 
    ifelse(df$president == "Bush", 1, 0) 
  df$obama <- 
    ifelse(df$president == "Obama", 1, 0) 
  df$trump <- 
    ifelse(df$president == "Trump", 1, 0) 
  
  return(df)
}

Plot 4 recreation

Recreation of the point plot with linear smoothing for easier understanding.The plot separates the points and lines by presidency.

plot_4 <- function(df, start, end, title) {
  # recreated figure 4 from the paper 
  df |>
    ggplot(aes(x = ymd, y = trends, color = president)) +
    geom_point(shape = 1) +
    geom_smooth(method = "lm", se = FALSE) + 
    ylim(start, end) +
    xlab("Year") +
    ylab("Google Trends") +
    ggtitle(title)
}

Google trend terms

Report: report immigrant+report immigration+report illegals+report illegal alien+report to ice
Crime: immigrant crime+immigrant criminal+immigrant murder+immigrant kill
Welfare: immigrant welfare+immigrant cost+immigrant benefits


Loading original data

The original data used in the paper is loaded and manipulated to include the correct date, presidential terms, and dummy variables.

# loading original data
report_og <- read.csv("google-trends data/original research data/google_trends_report.csv")
crime_og <- read.csv("google-trends data/original research data/google_trends_crime.csv")
welfare_og <- read.csv("google-trends data/original research data/google_trends_welfare.csv")

# inserting proper column names
colnames(report_og) <- c("y", "m" ,"trends")
colnames(crime_og) <- c("y", "m","trends")
colnames(welfare_og) <- c("y", "m","trends")

# converting to reproducible format
report_og$ym <- with(report_og, sprintf("%d-%02d", y, m))
crime_og$ym <- with(crime_og, sprintf("%d-%02d", y, m))
welfare_og$ym <- with(welfare_og, sprintf("%d-%02d", y, m))

# keep only necessary columns
report_og <- report_og |> select(ym, trends)
crime_og <- crime_og |> select(ym, trends)
welfare_og <- welfare_og |> select(ym, trends)

# uses the add_date function
report_og <- add_date(report_og) 
crime_og <- add_date(crime_og) 
welfare_og <- add_date(welfare_og) 

# uses the add_pres function
report_og <- add_pres(report_og)
crime_og <- add_pres(crime_og) 
welfare_og <- add_pres(welfare_og)

# uses the zero_or_one function, which is the final desired form
report_og <- zero_or_one(report_og)
crime_og <- zero_or_one(crime_og) 
welfare_og <- zero_or_one(welfare_og)

Original plot

plot_4(report_og, 0, 100, "Report")

plot_4(crime_og, 0, 100, "Crime")

plot_4(welfare_og, 0, 100, "Welfare")

Figure 1. Original Immigration searches by administration
For all three sets of anti-immigrant terms, searches increased on Google after Trump’s inauguration speech. There was also a spike in all three sets of terms in the months immediately after the inauguration.


Original data ordinary least square regression

fit.report_og <- lm(trends~ymd+bush+trump+obama, data=report_og |> filter(trends > 0))
fit.crime_og <- lm(trends~ymd+bush+trump+obama, data=crime_og |> filter(trends > 0))
fit.welfare_og <- lm(trends~ymd+bush+trump+obama, data=welfare_og |> filter(trends > 0))
summary(fit.report_og)
## 
## Call:
## lm(formula = trends ~ ymd + bush + trump + obama, data = filter(report_og, 
##     trends > 0))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.050  -5.662  -1.293   4.796  38.482 
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 72.607670  16.108626   4.507 1.16e-05 ***
## ymd         -0.001789   0.001024  -1.748   0.0821 .  
## bush         0.785389   2.898559   0.271   0.7867    
## trump       19.716343   2.787848   7.072 2.99e-11 ***
## obama              NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.601 on 186 degrees of freedom
## Multiple R-squared:  0.2863, Adjusted R-squared:  0.2748 
## F-statistic: 24.87 on 3 and 186 DF,  p-value: 1.411e-13
summary(fit.crime_og)
## 
## Call:
## lm(formula = trends ~ ymd + bush + trump + obama, data = filter(crime_og, 
##     trends > 0))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.526  -7.106  -3.603   3.434  62.727 
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.024897  23.183622  -0.044   0.9648    
## ymd          0.001045   0.001473   0.710   0.4789    
## bush         8.501543   4.163079   2.042   0.0426 *  
## trump       20.300850   4.006843   5.067 9.81e-07 ***
## obama              NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.78 on 184 degrees of freedom
## Multiple R-squared:  0.266,  Adjusted R-squared:  0.254 
## F-statistic: 22.23 on 3 and 184 DF,  p-value: 2.502e-12
summary(fit.welfare_og)
## 
## Call:
## lm(formula = trends ~ ymd + bush + trump + obama, data = filter(welfare_og, 
##     trends > 0))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.966  -7.986  -1.646   4.774  56.953 
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 28.587926  20.734642   1.379   0.1696    
## ymd         -0.000178   0.001318  -0.135   0.8927    
## bush         7.819065   3.730956   2.096   0.0375 *  
## trump       18.560493   3.588452   5.172 5.95e-07 ***
## obama              NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.36 on 186 degrees of freedom
## Multiple R-squared:  0.237,  Adjusted R-squared:  0.2247 
## F-statistic: 19.26 on 3 and 186 DF,  p-value: 6.428e-11

Table 1. Original immigration searches by presidential administration
After accounting for the linear time trend, there was still a clear increase in immigrant reporting searches after Trump’s inauguration. The regression is OLS on monthly search interest.


Gathered data plot

plot_4(report, 0, 100, "Report")

plot_4(crime, 0, 100, "Crime")

plot_4(welfare, 0, 100, "Title")

Figure 2. Gathered data immigration searches by administration
The overall trends of the later administrations are similar to the original data. However, the bush administration is visibly different. After looking into the reason it turn out that the original data has several point that display searches where the gathered data has a zero value. We came to the conclusion that the two most probable reasons are missing values in the most recent searches of those terms on Google or erroneous gathered date. Even with these mismatching values the overall trend is similar to the original data.


Gathered data ordinary least square regression

fit.report <- lm(trends~ymd+bush+trump+obama, data=report |> filter(trends > 0))
fit.crime <- lm(trends~ymd+bush+trump+obama, data=crime |> filter(trends > 0))
fit.welfare <- lm(trends~ymd+bush+trump+obama, data=welfare |> filter(trends > 0))
summary(fit.report)
## 
## Call:
## lm(formula = trends ~ ymd + bush + trump + obama, data = filter(report, 
##     trends > 0))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.306  -5.418  -1.076   4.252  38.740 
## 
## Coefficients: (1 not defined because of singularities)
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 90.7739107 15.3446130   5.916 1.59e-08 ***
## ymd         -0.0027952  0.0009751  -2.867  0.00463 ** 
## bush         1.7209308  2.7105835   0.635  0.52629    
## trump       18.6115465  2.6314099   7.073 3.11e-11 ***
## obama               NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.991 on 183 degrees of freedom
## Multiple R-squared:  0.2855, Adjusted R-squared:  0.2738 
## F-statistic: 24.37 on 3 and 183 DF,  p-value: 2.569e-13
summary(fit.crime)
## 
## Call:
## lm(formula = trends ~ ymd + bush + trump + obama, data = filter(crime, 
##     trends > 0))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.848  -6.442  -3.853   3.012  63.302 
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.887e+00  2.684e+01   0.108   0.9145    
## ymd         8.249e-04  1.700e-03   0.485   0.6282    
## bush        9.877e+00  4.661e+00   2.119   0.0357 *  
## trump       1.961e+01  4.308e+00   4.552  1.1e-05 ***
## obama              NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.97 on 148 degrees of freedom
## Multiple R-squared:   0.29,  Adjusted R-squared:  0.2757 
## F-statistic: 20.15 on 3 and 148 DF,  p-value: 5.23e-11
summary(fit.welfare)
## 
## Call:
## lm(formula = trends ~ ymd + bush + trump + obama, data = filter(welfare, 
##     trends > 0))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.880  -7.361  -1.873   4.095  54.921 
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 47.556981  22.482828   2.115  0.03595 *  
## ymd         -0.001323   0.001429  -0.926  0.35581    
## bush        12.125788   3.925027   3.089  0.00236 ** 
## trump       20.303916   3.781089   5.370 2.71e-07 ***
## obama              NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.59 on 161 degrees of freedom
## Multiple R-squared:  0.3006, Adjusted R-squared:  0.2875 
## F-statistic: 23.06 on 3 and 161 DF,  p-value: 1.799e-12

Table 2. Gathered data immigration searches by presidential administration
The values from the dependent variables in the self gathered data has the trends seen in the original data. However, the difference in some of the data points mentioned in Figure 2 create a variation in the trends from the gathered data.


Topic Model

A structural topic model was used to measure changes in media coverage of immigrant-related issues before and after the 2016 election. The measure of the coverage was done on news networks CNN, FOX, and MSNBC. The topic model gather data relating to 30 different topics, news networks used, duration of the coverage, transcript, etc.
–For more clarification on this data please refer to page 9 of the research paper

Topic model plot function

The function replicates the topic model plots used in the original paper

topic_plot <- function(df, lable) {
df |> ungroup() |> 
  group_by(channel, time) |> 
  ggplot(aes(x = yearmonth, y = summation, color = channel, group = interaction(channel, time))) + 
  geom_vline(xintercept = as.numeric(date1), linetype = "dashed", color = "black") +
  geom_vline(xintercept = as.numeric(date2), linetype = "dashed", color = "black") +
  geom_point() + geom_smooth( se=FALSE) + scale_color_manual(values = my_colors) +
  theme(axis.title.x = element_blank(), legend.position = "bottom", legend.title = element_blank()) +
  ylab(lable) 
}

Preparing topic model data

Loading, mutating, correctly formatting the date and summarizing the topic model data for plotting

# Loads the topic model
load("TopicModel.RData")
document_topics <- make.dt(immigrFit, meta = out$meta)
topic_terms <- t(exp(immigrFit$beta$logbeta[[1]]))
rownames(topic_terms) <- out$vocab
colnames(topic_terms) <- sprintf("Topic%d", 1:ncol(topic_terms))
# converts the date to appropriate format
document_topics <- document_topics |> 
   mutate(date = as.Date(date, format = "%Y-%m-%d"),yearmonth = ym(format(date, "%Y-%m")))

# grouping and summarizing based on date
time_line <- document_topics |> 
  select(time, yearmonth) |> 
  group_by(time) |> 
  summarize(yearmonth = min(yearmonth))

# variables holding the date for partitioning
date1 <- as.Date("2015-06-01")
date2 <- as.Date("2017-01-01")

# variable to ensure proper coloring of plot
my_colors <- c("magenta", "red3", "blue")

Preparing data for topic model plot

Separating the topic model data based on topic(i.e Topics 1 and 3 were used for crime topics while Topic 13 was used for welfare topics)

# grouping and filtering by news channel
plot_2_table <- document_topics |> 
  select(channel, duration, yearmonth, time) |> group_by(yearmonth, channel, time) |>
  summarize(summation=sum(duration)) |> arrange(summation) 
 
# grouping and filtering by news channel and crime rhetoric
plot_3a_table <- document_topics |> 
  select(Topic1, Topic3, channel, duration, yearmonth, time) |> group_by(yearmonth, channel, time) |>
  summarize(summation=sum(duration*Topic1 + duration*Topic3)) |> arrange(summation) 

# grouping and filtering by news channel and welfare rhetoric
plot_3b_table <- document_topics |> 
  select(Topic13, channel, duration, yearmonth, time) |> group_by(yearmonth, channel, time) |>
  summarize(summation=sum(duration*Topic13)) |> arrange(summation) 
topic_plot(plot_2_table, "Num Monthly\nImmigration Segs")

topic_plot(plot_3a_table, "Immiger + Crime\nNews Coverage")

topic_plot(plot_3b_table , "Immiger Welfare\nNew Coverage")

The dotted black lines separate key moments in Trumps administration, the first when the Trump campaign began and the second is the start of his presidency

Figure 3. Immigration news segments
Their were similar immigration coverage over the course of the Trump campaign and presidency in both MSNBC and FOX, However, Fox always held larger amount of immigration news coverage during the Trump campaign and administration. In fact, shortly after the Trump took office in January of 2017 the immigration news more than double on the conservative news channel. This supports the hypothesis found in the paper, particularly No. 3 which indicates “People will have more interest in reporting immigrants when exposed to fear cues about immigrants.”


Extension:

1) How searches regarding immigrant success stories and immigrant aid changed over the same time
2) How fear vs positive searches vary thorugh the different states

Positive Google trend terms

Positive(i.e. positive terms): immigrant success + immigrant contribution + benefits of immigration + help immigrant


Proportion of fear vs positive search terms

vs <- read.csv("google-trends data/FvsS/fearvssuccess.csv")
map <- read.csv("google-trends data/FvsS/fearvssuccess_geo.csv")

# changing column names to be easier to work with
colnames(vs) <- c("ym" ,"fear", "success")
vs <- vs %>%
  mutate(ymd = ymd(paste0(ym, "-01")))

ggplot(data = vs, aes(x = ymd)) +
  geom_line(aes(y = fear, color = "fear")) +
  geom_line(aes(y = success, color = "positive")) +
  labs(y = "Values", color = "Legend") +
  theme_minimal() +
  xlab("Year") +
  ylab("Number of Searches") +
  ggtitle("Fear and Positive search terms over time")

Figure 4. Fear and positive search terms over time
The plot shows how fear based searches compare to positive based searches over time. It displays non drastic changes regardless of the time, with the only seeming outlier happening in June 2018. This is when the Trump administration’s “zero tolerance” policy, which resulted in the separation of children from their parents at the U.S.-Mexico border, was a major news story. This policy led to widespread public outrage and intense media coverage. In response to the public outcry, President Trump signed an executive order on June 20, 2018, to end family separations.


Plotting proportion of fear vs positive search terms by presidency

# obtaining proportion of searches without the visible oultier from Figure 4
prop_no_outlier <- vs |> add_date() |> add_pres() |> zero_or_one() |> 
  mutate(proportion=(fear/(success+fear))) |> 
           filter(fear > 0 & success > 0) |>
           group_by(president) |> summarise(mean=mean(proportion))
# obtaining proportion of searches
prop_outlier <- vs |> add_date() |> add_pres() |> zero_or_one() |> 
  mutate(proportion=(fear/(success+fear))) |> 
           filter(fear > 0 & success > 0 & ymd != "2018-06-20") |> 
           group_by(president) |> 
           summarise(mean=mean(proportion))
# plot with outlier
ggplot(prop_outlier, aes(x = president, y = mean, fill = president)) +
  geom_bar(stat = "identity", width = 0.5) +
  ggtitle("Proportion of searches during administrations with oultier") +
    xlab("President") +
    ylab("Proportion Mean Value")

# plot without outlier
ggplot(prop_no_outlier, aes(x = president, y = mean, fill = president)) +
  geom_bar(stat = "identity", width = 0.5) +
    ggtitle("Proportion of searches during administrations without outlier") +
    xlab("President") +
    ylab("Proportion Mean Value")

Figure 5. Proportion of fear vs positive searches during each presidency
The proportion of fear vs positive searches on Google seem to be 2 to 1 regardless of the presidency. In fact, even when removing the only visible outlier in the trend the proportions are about the same. Obama,the only liberal president viewed here, actually had a slightly higher proportion of searches, not enough to be significant, but enough to show that immigrant search proportion tends to stay balanced regardless of the rhetoric.


Displaying fear and positive searches by state between 2004 and 2020

# Pre-processing map data which includes metro area
map <- map %>%
  mutate(
    state = str_extract(Metro, "[A-Z]{2}$"),
    fear = as.numeric(str_replace(fear, "%", "")),
    success = as.numeric(str_replace(success, "%", ""))
  )


# state abbreviation to correlate with the ones given and mutate into full state names for accurate plotting
state_abbreviations <- data.frame(
  abbreviation = c(
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"
  ),
  state = tolower(c(
    "Alabama", "Alaska", "Arizona", "Arkansas", "California",
    "Colorado", "Connecticut", "Delaware", "Florida", "Georgia",
    "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas",
    "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts",
    "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana",
    "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico",
    "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma",
    "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", 
    "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", 
    "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"
  ))
)

# mutating data
map2 <- map %>%
  left_join(state_abbreviations, by = c("state" = "abbreviation")) %>%
  rename(region = state.y)

# obtaining and converting necessary map data
states_map <- map_data("state")
map_data <- left_join(states_map, map2, by = "region") |>  mutate(success = ifelse(is.na(success), 0, success))

# plotting fear searches
ggplot(map_data, aes(x = long, y = lat, group = group, fill = fear)) +
  geom_polygon(color = "black") +
  scale_fill_gradient(name = "Fear (%)", low = "lightblue", high = "darkblue", na.value = "grey90") +
  coord_fixed(1.3) +
  ggtitle("Fear Percentage by State") +
  theme(axis.title = element_blank(), axis.text =  element_blank(),, axis.ticks =  element_blank(), panel.background = element_blank())

# plotting positive searches
ggplot(map_data, aes(x = long, y = lat, group = group, fill = success)) +
  geom_polygon(color = "black") +
  scale_fill_gradient(name = "Positive (%)", low = "lightblue", high = "darkblue", na.value = "grey90") +
  coord_fixed(1.3) +
  ggtitle("Positive Percentage by State") +
  theme(axis.title = element_blank(), axis.text =  element_blank(),, axis.ticks =  element_blank(), panel.background = element_blank())

Figure 6. Search percentages by state
The plots display the percentage of fear and positive searches made by each state. The positive search percentages are simply an inversion on the fear searches. However, the second plot was made to more clearly see the correlation of searches by state. Upon first inspection nothing can be definitively seen in the plot other than more conservative states have higher fear based searches but not all. Their are a couple of liberal states that are also seen with higher fear searches. A deeper dive would have to be done to glean more information from the plot. For example, a look at the political divide in each state, population of immigrants by state, or perhaps a better view at the distribution over time instead of average searcvhes of a set of time.