Last updated: 2021-05-12

Checks: 7 0

Knit directory: thesis/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20210321) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 7fd4ff2. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/
    Ignored:    data/DB/
    Ignored:    data/raster/
    Ignored:    data/raw/
    Ignored:    data/vector/
    Ignored:    docker_command.txt
    Ignored:    output/acc/
    Ignored:    output/bayes/
    Ignored:    output/ffs/
    Ignored:    output/models/
    Ignored:    output/plots/
    Ignored:    output/test-results/
    Ignored:    renv/library/
    Ignored:    renv/staging/
    Ignored:    report/presentation/

Untracked files:
    Untracked:  analysis/assets/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/eda-response.Rmd) and HTML (docs/eda-response.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 7fd4ff2 Darius Görgen 2021-05-12 workflowr::wflow_publish(files = list.files(“.”, “Rmd$”))
Rmd de8ce5a Darius Görgen 2021-04-05 add content

1 Units of analysis

Two different units of analysis are tested against their capability to capture environmental processes contributing to the occurrences of violent conflict. The first unit of analysis represents NUTS-3 Level of administrative units over the African continent.

adm = st_read("../data/vector/states_mask.gpkg", quiet = T)
adm = adm[,c("adm1_code", "adm1_code", "name", "name_de", "name_en", "type_en",
             "latitude", "longitude", "sov_a3", "adm0_a3", "geonunit")]
adm
Simple feature collection with 847 features and 11 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -17.52892 ymin: -34.82195 xmax: 51.41681 ymax: 37.3452
CRS:            4326
First 10 features:
   adm1_code adm1_code.1                                        name
1   ETH-3129    ETH-3129 Southern Nations, Nationalities and Peoples
2    SDS-892     SDS-892                           Eastern Equatoria
3   ETH-3134    ETH-3134                                      Somali
4   SOM-2413    SOM-2413                                      Nugaal
5   SOM-2412    SOM-2412                                       Mudug
6   SOM-2408    SOM-2408                                   Galguduud
7   SOM-2409    SOM-2409                                     Hiiraan
8   SOM-2312    SOM-2312                                      Bakool
9   SOM-2314    SOM-2314                                        Gedo
10   KEN-890     KEN-890                                 Rift Valley
                         name_de              name_en              type_en
1  Region der südlichen Nationen     Southern Nations Administrative State
2              Eastern Equatoria    Eastern Equatoria                State
3                         Somali        Somali Region Administrative State
4                         Nugaal                Nugal               Region
5                          Mudug                Mudug               Region
6                      Galguduud            Galguduud               Region
7                        Hiiraan                Hiran               Region
8                         Bakool               Bakool               Region
9                           Gedo                 Gedo               Region
10                   Rift Valley Rift Valley Province             Province
   latitude longitude sov_a3 adm0_a3    geonunit                           geom
1   6.21467   36.4494    ETH     ETH    Ethiopia MULTIPOLYGON (((35.32077 5....
2   4.79636   33.5351    SDS     SDS South Sudan MULTIPOLYGON (((34.96796 6....
3   7.09365   43.5345    ETH     ETH    Ethiopia MULTIPOLYGON (((47.97917 7....
4   8.46102   48.8490    SOM     SOM    Puntland MULTIPOLYGON (((47.53239 7....
5   5.51036   48.1009    SOM     SOM     Somalia MULTIPOLYGON (((46.46696 6....
6   5.09616   46.6795    SOM     SOM     Somalia MULTIPOLYGON (((45.51848 5....
7   4.36813   45.4984    SOM     SOM     Somalia MULTIPOLYGON (((44.59681 4....
8   4.40611   43.8350    SOM     SOM     Somalia MULTIPOLYGON (((42.87019 4....
9   3.05384   42.0524    SOM     SOM     Somalia MULTIPOLYGON (((41.88502 3....
10  1.15655   36.0169    KEN     KEN       Kenya MULTIPOLYGON (((33.97708 4....

As we can see from above output, we end up with 847 features in total covering the African continent. Let’s plot a map which’s color is based on the country a certain polygon belongs to. The results should look quite familiar:

adm %<>%
  st_make_valid() %>%
  st_cast("MULTIPOLYGON") %>%
  mutate(area = st_area(geom))

adm %>%
  group_by(sov_a3) %>%
  summarise(geom = st_union(geom), area = sum(area)) %>%
  st_centroid() -> adm_point

tmap_options(max.categories = 50)

tm_shape(adm) +
  tm_polygons("sov_a3", border.col = "white", lwd = .5) +
  tm_shape(adm_point) +
  tm_text("sov_a3", size = "area") +
  tm_legend(show = FALSE)

In comparison to administrative units we test the capability of sub-basin watersheds to capture the pattern of violent conflicts. The selection of the appropriate level was mainly driven to obtain a comparable number of units in order to not include a bias by diverging sizes. Thus the fifth level of the Hydrosheds datasets was chosen.

bas = st_read("../data/vector/basins_mask.gpkg", quiet = TRUE)
bas
Simple feature collection with 1013 features and 13 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -17.52892 ymin: -34.82195 xmax: 51.41681 ymax: 37.3452
CRS:            4326
First 10 features:
     HYBAS_ID NEXT_DOWN  NEXT_SINK   MAIN_BAS DIST_SINK DIST_MAIN SUB_AREA
1  1050000010         0 1050000010 1050000010         0         0  45567.0
2  1050001370         0 1050001370 1050001370         0         0  11678.4
3  1050001380         0 1050001380 1050001380         0         0   5524.6
4  1050001510         0 1050001510 1050001510         0         0  42400.5
5  1050001520         0 1050001520 1050001520         0         0  16198.5
6  1050002290         0 1050002290 1050002290         0         0   4584.0
7  1050002300         0 1050002300 1050002300         0         0  11221.8
8  1050002760         0 1050002760 1050002760         0         0  63986.6
9  1050002770         0 1050002770 1050002770         0         0  35181.7
10 1050003990         0 1050003990 1050003990         0         0  62992.8
   UP_AREA PFAF_ID ENDO COAST ORDER SORT                           geom
1  45567.0   11101    0     1     0    1 MULTIPOLYGON (((32.275 30.1...
2  11678.5   11102    0     0     1    2 MULTIPOLYGON (((35.14167 22...
3   5524.6   11103    0     1     0    3 MULTIPOLYGON (((35.92083 22...
4  42400.5   11104    0     0     1    4 MULTIPOLYGON (((35.24167 21...
5  16198.5   11105    0     1     0    5 MULTIPOLYGON (((37.16129 21...
6   4584.0   11106    0     0     1    6 MULTIPOLYGON (((36.6 18.787...
7  11221.8   11107    0     1     0    7 MULTIPOLYGON (((37.00417 18...
8  63986.6   11108    0     0     1    8 MULTIPOLYGON (((38.4625 16....
9  35181.7   11109    0     1     0    9 MULTIPOLYGON (((38.64583 15...
10 62992.8   11211    0     1     0   10 MULTIPOLYGON (((42.94285 10...

The total number of units is 1013 which is roughly comparable to the number of administrative units. On a map we get a less familiar pattern of the African continent, however, we can distinguish between the main watersheds.

bas_group = st_read("../data/raw/hydrosheds/hybas_af_lev03_v1c.shp", quiet = TRUE)
bas_group %<>% 
  mutate(HYBAS_ID = as.factor(HYBAS_ID))

bas %<>%
  mutate(area = st_area(geom))

tm_shape(bas_group) +
  tm_polygons("HYBAS_ID", lwd = 0) +
  tm_shape(bas) +
  tm_borders("white", lwd = .3) +
  tm_legend(show = FALSE)

The next plot shows a comparison of the distribution of area size between the two different units of analysis. The average area size of the sub-basin watersheds is slightly higher compared to the administrative units. However, for both types of units the majority of individuals lie in a comparable range. For administrative units there are both more small as well as very large units. Overall, the size of the units remains comparable between the two.

data = data.frame(group = c(rep("ADM", nrow(adm)), rep("BAS", nrow(bas))), 
                  area = c(as.numeric(adm$area / 1000000), as.numeric(bas$area / 1000000)))

ggplot(data) +
  geom_boxplot(aes(y=area, group=group, fill=group)) +
  labs(fill = "unit of analysis", y="Area [km²]") +
  theme_classic() +
  scale_y_continuous(labels = scales::comma) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

2 Conflict data

We use data from the Upsala conflict data program to retrieve the response variable for this project. The database is an event database, meaning that every reported violent conflict with at least one death is included as an single event associated with information on involved actors, time and location as well as estimated deaths by party involved. Since the data is mainly collected automatically sometimes inaccuracies concerning the exact location or the number of deaths are introduced. However, the data creators include information on the accuracy and thus the reliablity for each logged event. We downloaded the database for the African continent and filtered it to events occurring between 2000 and 2019 and additionally only included events for which the quality of the geographic localization was at least accurate on the sub-national level and the accuracy of the time was at least on a monthly scale. Additionally, the data is provided with three different classes of conflict which were extracted as well. These are state-based violence, non-state violence and one-sided violence.

load("../data/raw/ged/ged201.RData")
ged201$date_start = as.Date(ged201$date_start, "%Y/%m/%d")
ged201 = st_as_sf(ged201, coords =  c("longitude", "latitude"), crs = "EPSG:4326")
ged201 = ged201[which(st_intersects(st_union(adm), ged201, sparse = FALSE)), ]

ged201 %>%
  as_tibble() %>%
  mutate(date_start =  as.Date(date_start, "%Y/%m/%d")) %>%
  filter(date_start > as.Date("1999-12-31"), 
         where_prec %in% c(1,2,3),
         date_prec != 5) %>%
  select(type_of_violence, date_start, starts_with("deaths_"), country, geometry) -> response_raw
response_raw
# A tibble: 25,652 x 8
   type_of_violence date_start deaths_a deaths_b deaths_civilians deaths_unknown
              <dbl> <date>        <dbl>    <dbl>            <dbl>          <dbl>
 1                1 2000-03-14        1        0                0              0
 2                1 2000-05-05        0       14                0              0
 3                1 2000-12-24        1        0                0              1
 4                1 2001-01-21        0        4                0              0
 5                1 2001-02-08       13        0                0              0
 6                1 2001-02-19        0       18                0              0
 7                1 2001-02-23        0       19                0              0
 8                1 2001-02-23        0        8                0              0
 9                1 2001-04-06        1        0                0              0
10                1 2001-10-21        0        8                0              0
# … with 25,642 more rows, and 2 more variables: country <chr>, geometry <POINT
#   [°]>

In total we obtained 86,377 distinct events coupled with the filtering explained above. The diagram below indicates the distribution of these events over the time period on a monthly scale.

response_raw %>%
  mutate(month = zoo::as.yearmon(date_start),
         type_of_violence = factor(type_of_violence, labels = c("state-based", "non-state", "one-sided"))) %>%
  group_by(month, type_of_violence) %>%
  summarise(count = n()) %>%
  complete(month, type_of_violence, fill = list(count = 0)) -> data 

ggplot(data) +
  geom_bar(aes(y=count, x=month, fill = type_of_violence), stat = "identity") +
  theme_classic() +
  labs(y="Number of events", x="Time", fill="Type of violence") +
  theme(legend.position="bottom")

We can assume an increasing trend in the number of events over the time period. Also, the percentage of non-state violence increases over time, mainly to the cost of one-sided violence. We also can observe some kind of seasonal cycle in the number of conflicts. In order to verify these assumptions we can conduct a decomposition of the time-series. We firstly do so for the total number of events and then we conduct the same analysis by type.

response_raw %>%
  mutate(month = zoo::as.yearmon(date_start),
         type_of_violence = factor(type_of_violence, labels = c("state-based", "non-state", "one-sided"))) %>%
  group_by(month) %>%
  summarise(count = n())  -> ts_data
ts_data = ts(ts_data$count, start = c(2000,1), frequency = 12)
dec = decompose(ts_data)

data = data.frame(obsv = as.numeric(dec$x),
                  seasonal = as.numeric(dec$seasonal),
                  trend =  as.numeric(dec$trend),
                  random = as.numeric(dec$random),
                  date = seq(as.Date("2000-01-01"), as.Date("2019-12-31"), by = "month"))

data %>%
  as_tibble() %>%
  gather(component, value, -date) %>%
  mutate(component = factor(component, levels = c("obsv", "trend", "seasonal", "random"))) %>%
  ggplot() +
  geom_line(aes(x=date, y=value), color = "royalblue3") +
  facet_wrap(~component, nrow = 4, scales = "free_y") +
  theme_classic() +
  labs(y = "Value", x = "Time")

Concerning the decomposition of all types of violence combined we observe an unsteady trend pattern from 2000 to 2010, but a substantial increase afterwards. In 2014 the trend stabilizes at a high level. The seasonality of the data is quite strong, with a peak in January and a period with low occurrences between August and December.

response_raw %>%
  mutate(month = zoo::as.yearmon(date_start),
         type_of_violence = factor(type_of_violence, labels = c("state-based", "non-state", "one-sided"))) %>%
  group_by(month, type_of_violence) %>%
  summarise(count = n()) %>%
  complete(month, type_of_violence, fill = list(count = 0))-> ts_data2

dec_types = lapply(c("state-based", "non-state", "one-sided"), function(x){
  ts_data2 %>%
    filter(type_of_violence == x) %>%
    pull(count) %>%
    ts(start = c(2000,1), frequency = 12) %>%
    decompose() 
})

names(dec_types) = c("state-based", "non-state", "one-sided")

plt_data = lapply(c("state-based", "non-state", "one-sided"), function(x){
  tmp  = dec_types[[x]]
  tmp = data.frame(type = x,
                   obsv = as.numeric(tmp$x),
                   seasonal = as.numeric(tmp$seasonal),
                   trend =  as.numeric(tmp$trend),
                   random = as.numeric(tmp$random),
                   date = seq(as.Date("2000-01-01"), as.Date("2019-12-31"), by = "month"))
  tmp
})

plt_data = do.call(rbind, plt_data)

plt_data %>%
  as_tibble() %>%
  gather(component, value, -type, -date) %>%
  mutate(component = factor(component, levels = c("obsv", "trend", "seasonal", "random")),
         type = factor(type, levels = c("state-based", "non-state", "one-sided"))) %>%
  ggplot() +
  geom_line(aes(x=date, y=value, color=type)) +
  facet_wrap(~component, nrow = 4, scales = "free_y") +
  theme_classic() +
  labs(y = "Value", x = "Time", color = "Type of violence") +
  theme(legend.position="bottom")

When we analyse the decomposition based on types of conflicts, we see that state-based violence is dominating the overall pattern. By far the most conflicts belong to that class and the seasonal and trend pattern of all classes combined mainly mirrors the dynamic found for state-based violence. The patterns for non-state and one-sided violence are quite distinct from the former class, however, the two are very much alike. The seasonal pattern shows that the lowest number of conflicts are found from September through December while the peak number of conflicts is found in January. Concerning the trend, one-sided violence overall is rather stable over the time period. After 2005 there is a steady increase in non-state violence which increases substantially in slope with the year 2017.

Aside from the count of events, we can analyze the number of casualties which were suffered when the events occurred. As pointed out before, the casualties are ordered by participating groups and additionally deaths of civilians and unknown deaths are included in the database. Here we will focus on the total number of deaths, irrespective of the group affiliation.

response_raw %>%
  mutate(month = zoo::as.yearmon(date_start),
         type_of_violence = factor(type_of_violence, labels = c("state-based", "non-state", "one-sided"))) %>%
  group_by(month, type_of_violence) %>%
  summarise(deaths = sum(deaths_civilians, deaths_a, deaths_b, deaths_unknown)) %>%
  complete(month, type_of_violence, fill = list(deaths = 0))-> deaths_dat

deaths_dat %>%
  group_by(month) %>%
  #summarise(deaths = sum(deaths)) %>%
  ggplot() +
  geom_histogram(aes(x=deaths, group = type_of_violence, color = type_of_violence, fill = type_of_violence), 
                 alpha=0.5, position="stack", binwidth = 50)+
  theme_classic() +
  labs(y="Count", x="Number of casualties", color = "Type of violence", fill = "Type of violence") +
  theme(legend.position = "bottom")

The histogram of the number of casualties reveals that one-sided and state-based violence generally are associated with numbers of casualties below 1000. State-based violence on the other hand clearly is the most frequent violence class with casualties higher 1000.

deaths_dat %>%
  ggplot() +
  geom_bar(aes(y=deaths, x=month, fill = type_of_violence), stat = "identity") +
  theme_classic() +
  labs(y="Number of casulties", x="Time", fill="Type of violence") +
  theme(legend.position="bottom")

Concerning the dynamic of casualties over time, we can observe relatively high numbers of casualties at the beginning of the time series with a low episode around from 2005 to 2010. Starting with 2011, the casualties seem to increase and the number of casualties reaches its highest level of the complete time series between 2014 and 2015. There is a small decrease towards the year 2018.

deaths_dat %>%
  group_by(month) %>%
  summarise(deaths = sum(deaths)) %>%
  pull(deaths) %>%
  ts(start = c(2000,1), frequency = 12) %>%
  decompose() -> dec

data = data.frame(obsv = as.numeric(dec$x),
                  seasonal = as.numeric(dec$seasonal),
                  trend =  as.numeric(dec$trend),
                  random = as.numeric(dec$random),
                  date = seq(as.Date("2000-01-01"), as.Date("2019-12-31"), by = "month"))

data %>%
  as_tibble() %>%
  gather(component, value, -date) %>%
  mutate(component = factor(component, levels = c("obsv", "trend", "seasonal", "random"))) %>%
  ggplot() +
  geom_line(aes(x=date, y=value), color = "royalblue3") +
  facet_wrap(~component, nrow = 4, scales = "free_y") +
  theme_classic() +
  labs(y = "Value", x = "Time")

The general trend irrespective of the violent type shows a flat dynamic from 2005 to 2010. After that year the slope of the frend increases substantially resulting in the highest peak between 2014 and 2015 which remains on a relativly high level even towards 2018. The seasonality of casualties is quite similar to the total number of events, namely with a bimodal peak in March and May. The month with the lowest numbers of casualties is November.

deaths_dec = lapply(c("state-based", "non-state", "one-sided"), function(x){
  deaths_dat %>%
    filter(type_of_violence == x) %>%
    pull(deaths) %>%
    ts(start = c(2000,1), frequency = 12) %>%
    decompose() 
})

names(deaths_dec) = c("state-based", "non-state", "one-sided")

plt_data = lapply(c("state-based", "non-state", "one-sided"), function(x){
  tmp  = deaths_dec[[x]]
  tmp = data.frame(type = x,
                   obsv = as.numeric(tmp$x),
                   seasonal = as.numeric(tmp$seasonal),
                   trend =  as.numeric(tmp$trend),
                   random = as.numeric(tmp$random),
                   date = seq(as.Date("2000-01-01"), as.Date("2019-12-31"), by = "month"))
  tmp
})

plt_data = do.call(rbind, plt_data)

plt_data %>%
  as_tibble() %>%
  gather(component, value, -type, -date) %>%
  mutate(component = factor(component, levels = c("obsv", "trend", "seasonal", "random")),
         type = factor(type, levels = c("state-based", "non-state", "one-sided"))) %>%
  ggplot() +
  geom_line(aes(x=date, y=value, color=type)) +
  facet_wrap(~component, nrow = 4, scales = "free_y") +
  theme_classic() +
  labs(y = "Value", x = "Time", color = "Type of violence") +
  theme(legend.position="bottom")

Unsurprisingly, the general time-series is dominated by the signal of state-based violence as well. However, the seasonality of casualties for the different types of violence are more alike than compared to the total number of events. The local peak during the years 2008 to 2009 is not observed in non-state and one-sided violence. The same is true for non-state violence where there is no substantial increase in casualties in 2014. The increase is observed with one-sided violence, however, the trend flips shortly after 2015 showing a greater stability than compared with state-based violence.

3 Combining response with units of analysis

Until now the analysis of violent conflict took a perspective on the African continent as a whole. Since the overall aim of this project is to establish a machine learning model to predict violent conflict for smaller spatial units, a spatio-temporal aggregation of the conflict data is needed. To achieve this, in the first step we declare a spatio-temporal objects based on the spatial information of the units of analysis. We will need the package spacetime for this.

library(spacetime)
date_vec = seq(as.Date("2000-01-01"), as.Date("2019-12-31"), by = "month")

adm_st = STF(as_Spatial(adm), date_vec)
bas_st = STF(as_Spatial(bas), date_vec)
str(bas_st@time)
An 'xts' object on 2000-01-01/2019-12-01 containing:
  Data: int [1:240, 1] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr "timeIndex"
  Indexed by objects of class: [Date] TZ: UTC
  xts Attributes:  
 NULL

As evident from the above output these objects now contain a time slot where the information on the time dimension is stored. We specified a monthly resolution from January 2000 to December 2019. The next step is to aggregate the conflict data based on their location in space and time and add the data to the objects declared above. Note, that we are aggregating the total sum of deaths irrespective of the classification but we want to keep information on the type of violence. Therefor the raw data is in need for some restructuring.

response_raw %>% 
  mutate(deaths = deaths_civilians + deaths_a + deaths_b + deaths_unknown) %>%
  select(type_of_violence, date_start, deaths, geometry) -> agg_data 
classes = c("sb", "ns", "os", "all")

for(i in 1:4){ # loop through types of conflict + all types
  print(i)
  if(i != 4){ # filter if appropriate
    points = as_Spatial(st_as_sf(agg_data[agg_data$type_of_violence == i, ]))
  } else { # else take all conflicts
    points = as_Spatial(st_as_sf(agg_data)) 
  }
  # declaration of a space-time data frame
  points_st = STIDF(points, points$date_start, data = data.frame(casualties = points$deaths))
  # aggregate the sum of deahts
  res_adm = aggregate(points_st, adm_st, sum, na.rm = T)
  saveRDS(res_adm, paste0("../data/vector/ged/st_adm_", classes[i],".rds"))
  rm(res_adm)
  gc()
  print("Done with ADM.")
  res_bas = aggregate(points_st, bas_st, sum, na.rm = T)
  saveRDS(res_bas, paste0("../data/vector/ged/st_bas_", classes[i],".rds"))
  rm(res_bas)
  gc()
  print("Done with BAS.")
}

3.1 Read and write to GPKG.

bas_data = list.files("../data/vector/ged/", pattern ="bas", full.names = T)
adm_data = list.files("../data/vector/ged/", pattern ="adm", full.names = T)

for(x in bas_data){
  object = readRDS(x)
  object@data[is.na(object@data)] = 0 # set NA to 0 for no conflict
  object = as(object, "Spatial")
  object = st_as_sf(object)
  st_crs(object) = st_crs(4326)
  type = str_remove(str_split(basename(x), "_")[[1]][3], ".rds")
  names(object)[1:(ncol(object)-1)] = format(as.Date(names(object)[1:(ncol(object)-1)]), "%b-%Y")
  st_write(object, paste0("../data/vector/response_", type, "_basins.gpkg"), delete_dsn = T)
}

for(x in adm_data){
  object = readRDS(x)
  object@data[is.na(object@data)] = 0 # set NA to 0 for no conflict
  object = as(object, "Spatial")
  object = st_as_sf(object)
  st_crs(object) = st_crs(4326)
  type = str_remove(str_split(basename(x), "_")[[1]][3], ".rds")
  names(object)[1:(ncol(object)-1)] = format(as.Date(names(object)[1:(ncol(object)-1)]), "%b-%Y")
  st_write(object, paste0("../data/vector/response_", type, "_states.gpkg"), delete_dsn = T)
}
files = list.files("../data/vector/", pattern = "response", full.names = T)

results = list()
for(i in c("basins", "states")){
  tmp_files  = files[grep(i, files)]
  levels = sapply(tmp_files, function(x) str_split(basename(x), "_")[[1]][2], USE.NAMES = F)
  data = do.call(rbind, lapply(1:length(tmp_files), function(x){
    tmp_files[x] %>%
      st_read(quiet=T) %>%
      st_drop_geometry %>%
      mutate(id = 1:n()) %>%
      gather("time", "value", -id) %>%
      mutate(type = levels[x])
  })
  )
  results[[i]] = data
}

3.2 Unbalanced Data set

results$basins %>%
  as_tibble() %>%
  mutate(conflict = if_else(value>0, "yes", "no")) %>%
  count(conflict) -> n_bas

results$states %>%
  as_tibble() %>%
  mutate(conflict = if_else(value>0, "yes", "no")) %>%
  count(conflict) -> n_adm

dat = data.frame(unit = c("states", "basins"), perc = c(n_adm$n[n_adm$conflict=="yes"]/n_adm$n[n_adm$conflict=="no"]*100,
                                                        n_bas$n[n_bas$conflict=="yes"]/n_bas$n[n_bas$conflict=="no"]*100))

ggplot(dat) +
  geom_bar(aes(x=unit, y=perc, fill=unit), color = "white", stat="identity", width = .5) +
  theme_classic() +
  labs(x="", y="Percentage of conflict district-months", fill="Unit of analysis")

results$basins$unit = "basin"
results$states$unit = "states"

data = as_tibble(do.call(rbind, results))

data %>%
  #filter(type == "all") %>%
  mutate(conflict = if_else(value>0, "yes", "no")) %>%
  group_by(id, unit, type) %>%
  summarise(conflict = sum(conflict == "yes"), deaths = sum(value)) %>%
  ggplot() +
  geom_point(aes(x=conflict, y=deaths, color=unit), alpha=.4) +
  facet_wrap(~type) +
  theme_classic() +
  labs(y="Number of casualties", x="Number of district-months with conflict",
       color="Unit of analysis")

data %>%
  #filter(type == "all") %>%
  mutate(conflict = if_else(value>0, "yes", "no")) %>%
  group_by(unit, type, time) %>%
  summarise(conflict = sum(conflict == "yes"), deaths = sum(value)) %>%
  ggplot() +
  geom_line(aes(x=time, y=conflict, color=unit, group=unit), alpha=.6) +
  facet_wrap(~type) +
  theme_classic() +
  labs(y="Number of districts with conflict", x="Time",
       color="Unit of analysis") +
  theme(axis.ticks.x=element_blank(),
        axis.text.x=element_blank())

4 Spatial Pattern

data %>%
  #filter(type == "all") %>%
  mutate(conflict = if_else(value>0, "yes", "no")) %>%
  group_by(id, unit, type) %>%
  summarise(conflict = sum(conflict == "yes"), deaths = sum(value)) %>%
  ungroup() %>%
  group_split(unit) -> g_data
names(g_data) = c("basin", "states")

adm$id = 1:nrow(adm)
bas$id = 1:nrow(bas)

adm = left_join(adm, g_data$states)
bas = left_join(bas, g_data$basin)
tm_shape(adm) +
  tm_polygons(col = "deaths", palette = "-RdYlBu", breaks = c(0,10,50,100,500,1000,2000,4000,6000,8000,10000,+Inf), border.col = "gray", lwd = .5) +
  tm_facets("type")

tm_shape(adm) +
  tm_polygons(col = "conflict", palette = "-RdYlBu", breaks = c(0,5,10,25,50,75,100,150,+Inf), border.col = "gray", lwd = .5) +
  tm_facets("type")

tm_shape(bas) +
  tm_polygons(col = "deaths", palette = "-RdYlBu", breaks = c(0,10,50,100,500,1000,2000,4000,6000,8000,10000,+Inf), border.col = "gray", lwd = .5) +
  tm_facets("type")

tm_shape(bas) +
  tm_polygons(col = "conflict", palette = "-RdYlBu", breaks = c(0,5,10,25,50,75,100,150,+Inf), border.col = "gray", lwd = .5) +
  tm_facets("type")


sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] spacetime_1.2-3    lubridate_1.7.9.2  rgdal_1.5-18       countrycode_1.2.0 
 [5] welchADF_0.3.2     rstatix_0.6.0      ggpubr_0.4.0       scales_1.1.1      
 [9] RColorBrewer_1.1-2 latex2exp_0.4.0    cubelyr_1.0.0      gridExtra_2.3     
[13] ggtext_0.1.1       magrittr_2.0.1     tmap_3.2           sf_0.9-7          
[17] raster_3.4-5       sp_1.4-4           forcats_0.5.0      stringr_1.4.0     
[21] purrr_0.3.4        readr_1.4.0        tidyr_1.1.2        tibble_3.0.6      
[25] tidyverse_1.3.0    huwiwidown_0.0.1   kableExtra_1.3.1   knitr_1.31        
[29] rmarkdown_2.7.3    bookdown_0.21      ggplot2_3.3.3      dplyr_1.0.2       
[33] devtools_2.3.2     usethis_2.0.0     

loaded via a namespace (and not attached):
  [1] readxl_1.3.1       backports_1.2.0    workflowr_1.6.2   
  [4] lwgeom_0.2-5       splines_3.6.3      crosstalk_1.1.0.1 
  [7] leaflet_2.0.3      digest_0.6.27      htmltools_0.5.1.1 
 [10] fansi_0.4.2        memoise_1.1.0      openxlsx_4.2.3    
 [13] remotes_2.2.0      modelr_0.1.8       xts_0.12.1        
 [16] prettyunits_1.1.1  colorspace_2.0-0   rvest_0.3.6       
 [19] haven_2.3.1        xfun_0.21          leafem_0.1.3      
 [22] callr_3.5.1        crayon_1.4.0       jsonlite_1.7.2    
 [25] lme4_1.1-26        zoo_1.8-8          glue_1.4.2        
 [28] stars_0.4-3        gtable_0.3.0       webshot_0.5.2     
 [31] car_3.0-10         pkgbuild_1.2.0     abind_1.4-5       
 [34] DBI_1.1.0          Rcpp_1.0.5         viridisLite_0.3.0 
 [37] gridtext_0.1.4     units_0.6-7        foreign_0.8-71    
 [40] intervals_0.15.2   htmlwidgets_1.5.3  httr_1.4.2        
 [43] ellipsis_0.3.1     farver_2.0.3       pkgconfig_2.0.3   
 [46] XML_3.99-0.3       dbplyr_2.0.0       utf8_1.1.4        
 [49] labeling_0.4.2     tidyselect_1.1.0   rlang_0.4.10      
 [52] later_1.1.0.1      tmaptools_3.1      munsell_0.5.0     
 [55] cellranger_1.1.0   tools_3.6.3        cli_2.3.0         
 [58] generics_0.1.0     broom_0.7.2        evaluate_0.14     
 [61] yaml_2.2.1         processx_3.4.5     leafsync_0.1.0    
 [64] fs_1.5.0           zip_2.1.1          nlme_3.1-150      
 [67] whisker_0.4        xml2_1.3.2         compiler_3.6.3    
 [70] rstudioapi_0.13    curl_4.3           png_0.1-7         
 [73] e1071_1.7-4        testthat_3.0.1     ggsignif_0.6.0    
 [76] reprex_0.3.0       statmod_1.4.35     stringi_1.5.3     
 [79] highr_0.8          ps_1.5.0           desc_1.2.0        
 [82] lattice_0.20-41    Matrix_1.2-18      nloptr_1.2.2.2    
 [85] classInt_0.4-3     vctrs_0.3.6        pillar_1.4.7      
 [88] lifecycle_0.2.0    data.table_1.13.2  httpuv_1.5.5      
 [91] R6_2.5.0           promises_1.1.1     KernSmooth_2.23-18
 [94] rio_0.5.16         sessioninfo_1.1.1  codetools_0.2-16  
 [97] dichromat_2.0-0    boot_1.3-25        MASS_7.3-53       
[100] assertthat_0.2.1   pkgload_1.1.0      rprojroot_2.0.2   
[103] withr_2.4.1        parallel_3.6.3     hms_1.0.0         
[106] grid_3.6.3         minqa_1.2.4        class_7.3-17      
[109] carData_3.0-4      git2r_0.27.1       base64enc_0.1-3