prepare-data

Last updated: 2021-05-20

Checks: 7 passed, 0 failed

Knit directory: booksn_dispersantes/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20210428)

The command set.seed(20210428) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: b3d9091

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version b3d9091. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/.DS_Store

Unstaged changes:
    Modified:   analysis/diversity.Rmd
    Modified:   data/passerine.csv
    Modified:   data/passerine_abbrev.csv
    Modified:   output/compose_plot.pdf
    Modified:   output/diversity/alfa_diversity.csv
    Modified:   output/diversity/diversity_habitat.pdf
    Modified:   output/diversity/richness.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/prepare-data.Rmd) and HTML (docs/prepare-data.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	b3d9091	Antonio J Perez-Luque	2021-05-20	filter Delichon urbicum
html	684a3f9	Antonio J Perez-Luque	2021-05-07	Build site.
html	1f2c2c2	Antonio J Perez-Luque	2021-05-03	Build site.
html	044e66b	Antonio J Perez-Luque	2021-04-30	Build site.
Rmd	62b09da	Antonio J Perez-Luque	2021-04-30	generate abreviations uniques
html	8e49372	Antonio J Perez-Luque	2021-04-30	Build site.
Rmd	d07811a	Antonio J Perez-Luque	2021-04-30	generate abreviations
html	a8dd293	Antonio J Perez-Luque	2021-04-30	Build site.
Rmd	8feb234	Antonio J Perez-Luque	2021-04-30	fix taxonomic error Cuculus
html	1cb0658	Antonio J Perez-Luque	2021-04-29	Build site.
Rmd	bf02c97	Antonio J Perez-Luque	2021-04-29	fix taxonomic erros
html	ec67c0c	Antonio J Perez-Luque	2021-04-29	Build site.
Rmd	03e8fc6	Antonio J Perez-Luque	2021-04-29	prepare and cleaning data; add exported data
html	5f6b690	Antonio J Perez-Luque	2021-04-29	Build site.
Rmd	a6cd9dc	Antonio J Perez-Luque	2021-04-29	cleaning old censuses
html	7281d42	Antonio J Perez-Luque	2021-04-28	Build site.
Rmd	1ed7427	Antonio J Perez-Luque	2021-04-28	Add my first analysis

knitr::opts_chunk$set(echo = TRUE, 
                      warning = FALSE, 
                      message = FALSE)

Data sources

Data come from two sources:

Old bird censuses provided by R. Zamora, covering three locations: an oak woodland (1700 masl), a juniper-scrubland (2230 masl) and summit environments (3200 masl). They span 1981 to 1985.
OBSNEV bird censuses provided by OBSNEV, carried out along several transects distributed across Sierra Nevada. They span 2008 to 2020. The data were downloaded from the new OBSNEV information system (i.e. PostgreSQL db01.obsnev.es).

All raw data are stored in the folder data/data_raw

Prepare Old bird censuses

Notes for Old bird censuses

File RObledal año 1981 RZAves_SN_10ha.xls:
- habitat: oak woodland (Q. pyrenaica)
- elevation: 1700 masl
- year: 1981
- variable: bird abundance aggregated monthly (ind / 10 ha)
- sample size: n = 3 (May, June, July)
- notes: these are monthly aggregates, not the original bird censuses
File Enebral año 1985 RZAves_SN_10ha.xls:
- habitat: juniper-scrubland
- elevation: 2230 masl
- year: 1985
- variable: bird abundance aggregated monthly (ind / 10 ha)
- sample size: n = 3 (May, June, July)
- notes: these are monthly aggregates, not the original bird censuses
File Aves_SN_meses_reproduccion.xlsx:
- habitat: several habitats; we selected juniper-scrubland and summit environments
- elevation: 2230 and 3200 masl
- year: 1984 (juniper); 1982 (summit environments)
- variable: raw bird abundance: ind / 10.2 ha for juniper and ind / 20 ha for summits (both rescaled to ind / 10 ha in the code below)
- sample size: n = 9 transects (juniper) and 6 transects (summits) during May-July
- notes: original bird censuses
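The plot-size conversion mentioned in the notes can be sketched as a small base-R helper. This is a hypothetical function (to_density_10ha is not part of the original analysis); it applies the same rescaling that the code below performs with den*(10/10.2) for the juniper plot and den*(10/20) for the summit plot.

```r
# Hypothetical helper, not part of the original workflow: rescale a raw count
# recorded over a plot of `area_ha` hectares to individuals per 10 ha,
# rounded to two decimals like the analysis code.
to_density_10ha <- function(count, area_ha) {
  stopifnot(is.numeric(count), is.numeric(area_ha), all(area_ha > 0))
  round(count * 10 / area_ha, 2)
}

# Juniper plot (10.2 ha) and summit plot (20 ha)
to_density_10ha(c(1, 4), 10.2)  # 0.98 3.92
to_density_10ha(5, 20)          # 2.5
```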

library("tidyverse")
library("here")
library("readxl")
library("DT")

robledal1981 <- read_excel(here::here("data/data_raw/RObledal año 1981 RZAves_SN_10ha.xls")) %>% 
  pivot_longer(cols= mayo_1981:julio_1981, names_to="fecha") %>% 
  separate(fecha, into = c("mes", "year"), sep="_", remove = FALSE) %>% 
  rename("especie" = Aves, "den" = value) %>% 
  mutate(year = as.numeric(year), 
         habitat = "robledal", 
         cota = 1700,
         mes = case_when(
           mes == "mayo" ~ as.numeric(5), 
           mes == "junio" ~ as.numeric(6),
           mes == "julio" ~ as.numeric(7)), 
         fecha = format(as.Date(paste(year, mes, "01", sep="-")), format="%Y-%m-%d")) 

head(robledal1981)

# A tibble: 6 x 7
  especie              fecha        mes  year   den habitat   cota
  <chr>                <chr>      <dbl> <dbl> <dbl> <chr>    <dbl>
1 Phylloscopus bonelli 1981-05-01     5  1981   8   robledal  1700
2 Phylloscopus bonelli 1981-06-01     6  1981  13.8 robledal  1700
3 Phylloscopus bonelli 1981-07-01     7  1981  14.6 robledal  1700
4 Sylvia atricapilla   1981-05-01     5  1981   4.4 robledal  1700
5 Sylvia atricapilla   1981-06-01     6  1981   4.6 robledal  1700
6 Sylvia atricapilla   1981-07-01     7  1981   6.8 robledal  1700
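The pivot_longer() / separate() / case_when() steps turn wide column names such as mayo_1981 into a month number, a year and a first-of-month date. A base-R sketch of that parsing, assuming column names always follow the pattern Spanish month name, underscore, year. The helper parse_month_col is hypothetical and, unlike the case_when() above (which only maps May to July), it recognises all twelve months and stops on an unknown name.

```r
# Hypothetical helper: parse names like "mayo_1981" (Spanish month + "_" + year)
# into month number, year and a first-of-month date string.
parse_month_col <- function(x) {
  meses <- c("enero", "febrero", "marzo", "abril", "mayo", "junio", "julio",
             "agosto", "septiembre", "octubre", "noviembre", "diciembre")
  parts <- do.call(rbind, strsplit(x, "_", fixed = TRUE))
  mes <- match(parts[, 1], meses)
  # Fail early instead of producing an invalid date string
  if (anyNA(mes)) {
    stop("unrecognised month name in: ", paste(x[is.na(mes)], collapse = ", "))
  }
  year <- as.integer(parts[, 2])
  fecha <- format(as.Date(sprintf("%d-%02d-01", year, mes)), "%Y-%m-%d")
  data.frame(fecha = fecha, mes = mes, year = year, stringsAsFactors = FALSE)
}

parse_month_col(c("mayo_1981", "junio_1981", "julio_1981"))
```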

enebral1985 <- read_excel(here::here("data/data_raw/Enebral año 1985 RZAves_SN_10ha.xls")) %>% 
  pivot_longer(cols= mayo_1985:julio_1985, names_to="fecha") %>% 
  separate(fecha, into = c("mes", "year"), sep="_") %>% 
  rename("especie" = Aves, "den" = value) %>% 
  mutate(year = as.numeric(year), 
         habitat = "enebral", 
         cota = 2230, 
         mes = case_when(
           mes == "mayo" ~ as.numeric(5), 
           mes == "junio" ~ as.numeric(6),
           mes == "julio" ~ as.numeric(7)),
         fecha = format(as.Date(paste(year, mes, "01", sep="-")), format="%Y-%m-%d")) 

head(enebral1985)

# A tibble: 6 x 7
  especie               mes  year   den habitat  cota fecha     
  <chr>               <dbl> <dbl> <dbl> <chr>   <dbl> <chr>     
1 Carduelis carduelis     5  1985   0   enebral  2230 1985-05-01
2 Carduelis carduelis     6  1985   0   enebral  2230 1985-06-01
3 Carduelis carduelis     7  1985   0.1 enebral  2230 1985-07-01
4 Alauda arvensis         5  1985   5.1 enebral  2230 1985-05-01
5 Alauda arvensis         6  1985   5.4 enebral  2230 1985-06-01
6 Alauda arvensis         7  1985   5.7 enebral  2230 1985-07-01

enebral1984 <- read_excel(here::here("data/data_raw/Aves_SN_meses_reproduccion.xlsx"),
                           sheet = "2230") %>% 
  rename("especie" = Ave, "den" = `Número`) %>% 
  mutate(den = round(den*(10/10.2),2),
         habitat = "enebral", 
         cota = 2230, 
         mes = lubridate::month(Fecha), 
         year = lubridate::year(Fecha), 
         Fecha = strftime(Fecha, format="%Y-%m-%d")) %>% 
  rename(fecha = Fecha)

head(enebral1984)

# A tibble: 6 x 7
  especie             fecha        den habitat  cota   mes  year
  <chr>               <chr>      <dbl> <chr>   <dbl> <dbl> <dbl>
1 Oenanthe oenanthe   1984-05-05 15.7  enebral  2230     5  1984
2 Alauda arvensis     1984-05-05  3.92 enebral  2230     5  1984
3 Emberiza cia        1984-05-05 15.7  enebral  2230     5  1984
4 Phoenicurus ochuros 1984-05-05  0.98 enebral  2230     5  1984
5 Anthus campestris   1984-05-05  0.98 enebral  2230     5  1984
6 Alectoris rufa      1984-05-05  0.98 enebral  2230     5  1984

cumbres1982 <- read_excel(here::here("data/data_raw/Aves_SN_meses_reproduccion.xlsx"),
                           sheet = "3200") %>% 
  rename("especie" = Ave, "den" = `Número`) %>% 
  mutate(den = round(den*(10/20),2),
         habitat = "cumbres", 
         cota = 3200, 
         mes = lubridate::month(Fecha), 
         year = lubridate::year(Fecha),
         Fecha = strftime(Fecha, format="%Y-%m-%d")) %>% 
  rename(fecha = Fecha)

head(cumbres1982)

# A tibble: 6 x 7
  especie             fecha        den habitat  cota   mes  year
  <chr>               <chr>      <dbl> <chr>   <dbl> <dbl> <dbl>
1 Oenanthe oenanthe   1982-06-06   0.5 cumbres  3200     6  1982
2 Phoenicurus ochuros 1982-06-06   3   cumbres  3200     6  1982
3 Prunella collaris   1982-06-06   2.5 cumbres  3200     6  1982
4 Oenanthe oenanthe   1982-06-07   0.5 cumbres  3200     6  1982
5 Phoenicurus ochuros 1982-06-07   1   cumbres  3200     6  1982
6 Prunella collaris   1982-06-07   4   cumbres  3200     6  1982

Bind old data

old_census <- bind_rows(cumbres1982, enebral1984, enebral1985, robledal1981)
datatable(old_census)

Prepare OBSNEV bird censuses

Raw data were downloaded from OBSNEV information system. The downloaded tables were: contactos_paseriformes.csv; dicc_especies.csv; geo.csv; visitas.csv.
The sampling protocol for passerines has ID 5 in the database (protocolo == 5).
Select the three locations: “Cortijo del Hornillo” (oak), “Campos de Otero” (juniper) and “Aguas Verdes” (summits).
Keep only data from May, June and July.
Remove contacts recorded at more than 30 m (desplazamiento > 30).
Keep only records identified to species level (nivel > 6).
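The selection rules above can be sketched on a toy table in base R. The records are invented for illustration and select_obsnev is a hypothetical helper. Note that it keeps desplazamiento <= 30, which matches the analysis filter desplazamiento < 31 only when distances are recorded in whole metres.

```r
# Invented toy records (not OBSNEV data) with the columns used by the filters
toy <- data.frame(
  nombre = c("Aguas Verdes", "Aguas Verdes", "Campos de Otero",
             "Other transect", "Cortijo del Hornillo", "Campos de Otero"),
  mes = c(6, 6, 8, 5, 7, 6),
  desplazamiento = c(15, 45, 5, 10, 30, 10),
  nivel = c(7, 7, 7, 7, 7, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper applying the selection rules and labelling habitats
select_obsnev <- function(d) {
  habitat_map <- c("Cortijo del Hornillo" = "robledal",
                   "Campos de Otero" = "enebral",
                   "Aguas Verdes" = "cumbres")
  keep <- d$nombre %in% names(habitat_map) &  # the three study transects
    d$mes %in% c(5, 6, 7) &                   # May to July
    d$desplazamiento <= 30 &                  # contacts within 30 m
    d$nivel > 6                               # identified to species level
  d <- d[keep, , drop = FALSE]
  d$habitat <- unname(habitat_map[d$nombre])
  d
}

select_obsnev(toy)
```

Only the first and fifth toy records pass every rule; the others fail on distance, month, transect name or taxonomic level respectively.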

contactos <- read_csv(here::here("data/data_raw/contactos_paseriformes.csv")) %>% 
  dplyr::select(-fcreacion, -fmodificacion)

dicc_sp <- read_csv(here::here("data/data_raw/dicc_especies.csv")) %>% 
  dplyr::select(idesp, nombre_cientifico, nivel)

# The passerine protocol has ID 5
dicc_visita <- read_csv(here::here("data/data_raw/visitas.csv"), 
                        col_types = 
                          cols(.default ="?", 
                               idgeo = col_character(), 
                               fvisita = col_datetime(format="%Y-%m-%d %H:%M:%S"))) %>%
  filter(protocolo == 5) 
  
dicc_geo <- read_csv(here::here("data/data_raw/geo.csv"), 
                     col_types = cols(.default ="?", 
                               longitud_m = col_double()))

visita_geo <- 
  dicc_visita %>% inner_join(dicc_geo, by = "idgeo") %>% 
  dplyr::select(idvisitas, fvisita, nombre, longitud_m) 

dfraw <- contactos %>% 
  inner_join(visita_geo, by = "idvisitas") %>% 
  inner_join(dicc_sp, by = "idesp") %>% 
  mutate(year = lubridate::year(fvisita), 
         mes = lubridate::month(fvisita))
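The inner joins above link every contact to its visit, its transect (name and length) and its species name and taxonomic level. A toy base-R sketch of the same chain using merge(), whose default all = FALSE likewise keeps only rows with a match in both tables. All tables and values are invented for illustration.

```r
# Invented miniature versions of the downloaded tables
contactos <- data.frame(idvisitas = c(1, 1, 2), idesp = c(10, 11, 10),
                        numero = c(1, 2, 1))
visitas <- data.frame(idvisitas = c(1, 2), idgeo = c("A", "B"),
                      protocolo = c(5, 3), stringsAsFactors = FALSE)
geo <- data.frame(idgeo = c("A", "B"),
                  nombre = c("Aguas Verdes", "Other transect"),
                  longitud_m = c(1000, 800), stringsAsFactors = FALSE)
especies <- data.frame(idesp = c(10, 11),
                       nombre_cientifico = c("Oenanthe oenanthe", "Prunella collaris"),
                       nivel = c(7, 7), stringsAsFactors = FALSE)

# Keep passerine visits only (protocol 5), then attach transect information
visita_geo <- merge(visitas[visitas$protocolo == 5, ], geo, by = "idgeo")

# Inner joins: contacts -> visits/transects -> species dictionary
dfraw <- merge(merge(contactos, visita_geo, by = "idvisitas"),
               especies, by = "idesp")
dfraw
```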

df <- dfraw %>% 
  filter(nombre %in% c("Cortijo del Hornillo", "Campos de Otero", "Aguas Verdes")) %>% 
  mutate(habitat = recode(nombre, 
                          "Campos de Otero" = "enebral",
                          "Cortijo del Hornillo" = "robledal",
                          "Aguas Verdes" = "cumbres")) %>%
  mutate(cota = case_when(
    habitat == "enebral" ~ 2230, 
    habitat == "robledal" ~ 1700,
    habitat == "cumbres" ~ 3200
  )) %>% 
  mutate(year = lubridate::year(fvisita), 
         mes = lubridate::month(fvisita),
         fecha = strftime(fvisita, format="%Y-%m-%d")) %>% 
  filter(mes %in% c(5,6,7)) %>% 
  filter(desplazamiento < 31) %>% 
  filter(nivel > 6) 

head(df)

# A tibble: 6 x 17
     id idvisitas idesp numero distancia desplazamiento observaciones
  <dbl>     <dbl> <dbl>  <dbl>     <dbl>          <dbl> <chr>        
1  2010      1823 24641      1       260             15 NULL         
2  2011      1823 24592      1       255             15 NULL         
3  2012      1823 24587      1       280              5 NULL         
4  2013      1823 24587      1       295              5 NULL         
5  2014      1823 24587      2       370              5 NULL         
6  2015      1823 24587      1       411              5 NULL         
# … with 10 more variables: fvisita <dttm>, nombre <chr>, longitud_m <dbl>,
#   nombre_cientifico <chr>, nivel <dbl>, year <dbl>, mes <dbl>, habitat <chr>,
#   cota <dbl>, fecha <chr>

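Aggregate the data to obtain the total number of individuals of each species recorded during the same visit to the same transect, and convert that total into a density (ind / 10 ha) using the transect length (longitud_m).
Remove “Prunus avium” (a plant, the wild cherry), which is a coding error in the species list.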

dfab <- df %>% 
  filter(nombre_cientifico != "Prunus avium") %>% 
  group_by(nombre_cientifico, fecha, year, mes, longitud_m, nombre, habitat, nivel, cota) %>%
  summarise(total_ind = sum(numero)) %>%  
  mutate(den = round((total_ind * 10000 * 10 / (longitud_m * 60)),2)) %>% 
  ungroup() %>% 
  rename(especie = nombre_cientifico) %>% 
  dplyr::select(-nivel, -longitud_m, -nombre, -total_ind)
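The density expression total_ind * 10000 * 10 / (longitud_m * 60) divides the count by the surveyed area in hectares (longitud_m * 60 / 10000) and multiplies by 10 to express it per 10 ha. Reading the 60 as a 60 m wide strip (30 m on each side of the transect, consistent with the 30 m filter) is an interpretation, not something stated in the source. A base-R sketch of the aggregation and conversion on invented records:

```r
# Invented contacts from one visit to a 1000 m transect
contacts <- data.frame(
  especie = c("Oenanthe oenanthe", "Oenanthe oenanthe", "Prunella collaris"),
  fecha = "2019-06-10",
  longitud_m = 1000,
  numero = c(1, 2, 1),
  stringsAsFactors = FALSE
)

# Sum individuals per species within the same visit (date) and transect
agg <- aggregate(numero ~ especie + fecha + longitud_m, data = contacts, FUN = sum)
names(agg)[names(agg) == "numero"] <- "total_ind"

# Same expression as the dplyr code: individuals per 10 ha, assuming a
# surveyed area of longitud_m * 60 square metres
agg$den <- round(agg$total_ind * 10000 * 10 / (agg$longitud_m * 60), 2)
agg
```

Here 3 individuals on a 1000 m transect give 3 * 100000 / 60000 = 5 ind / 10 ha.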

Join the old and new data, recode species names and export the result as birds.csv. Use this file with caution: it still includes some species that specialists advised removing.
Species names are recoded as follows:
- Acanthis cannabina to Carduelis cannabina;
- Parus caeruleus to Cyanistes caeruleus;
- Parus ater to Periparus ater;
- Phoenicurus ochuros to Phoenicurus ochruros;
- Cuculos canorus to Cuculus canorus
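The same recoding can be expressed as a lookup table, which keeps all corrections in one place. A base-R sketch; recode_species is a hypothetical helper, not part of the analysis.

```r
# Hypothetical helper: apply the name corrections listed above;
# names not in the table pass through unchanged
recode_species <- function(x) {
  fixes <- c("Acanthis cannabina"  = "Carduelis cannabina",
             "Parus caeruleus"     = "Cyanistes caeruleus",
             "Parus ater"          = "Periparus ater",
             "Phoenicurus ochuros" = "Phoenicurus ochruros",
             "Cuculos canorus"     = "Cuculus canorus")
  hit <- x %in% names(fixes)
  x[hit] <- unname(fixes[x[hit]])
  x
}

recode_species(c("Parus ater", "Alauda arvensis", "Cuculos canorus"))
# "Periparus ater" "Alauda arvensis" "Cuculus canorus"
```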

birds <- bind_rows(old_census, dfab) %>% 
  mutate(especie = case_when(
    especie == "Acanthis cannabina" ~ "Carduelis cannabina",
    especie == "Parus caeruleus" ~ "Cyanistes caeruleus",
    especie == "Parus ater" ~ "Periparus ater", 
    especie == "Phoenicurus ochuros" ~ "Phoenicurus ochruros", 
    especie == "Cuculos canorus" ~ "Cuculus canorus",
    TRUE ~ especie
  ))
datatable(birds)

write_csv(birds, here::here("data/birds.csv"))

Generate an abbreviation for each species, composed of the first three characters of the genus and the first three characters of the specific epithet, separated by a dot (e.g. Phylloscopus bonelli becomes Phy.bon).
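A base-R sketch of the abbreviation rule, plus a check for collisions (two species sharing an abbreviation). make_abbrev and abbrev_collisions are hypothetical helpers; they split on a single space, which matches the separate() call used in the analysis for two-word names. The collision example uses two Sylvia names chosen for illustration, not taken from this dataset.

```r
# Hypothetical helper: first three characters of genus and specific epithet
make_abbrev <- function(species) {
  parts <- strsplit(species, " ", fixed = TRUE)
  vapply(parts,
         function(p) paste(substr(p[1], 1, 3), substr(p[2], 1, 3), sep = "."),
         character(1))
}

# Hypothetical helper: abbreviations shared by more than one species
abbrev_collisions <- function(species) {
  species <- unique(species)
  abb <- make_abbrev(species)
  unique(abb[duplicated(abb)])
}

make_abbrev(c("Phylloscopus bonelli", "Sylvia atricapilla"))
# "Phy.bon" "Syl.atr"
abbrev_collisions(c("Sylvia melanocephala", "Sylvia melanothorax", "Emberiza cia"))
# "Syl.mel"
```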
Filter out species: after consulting specialists, the following species are removed:

remove_sp <- c("Acrocephalus dumetorum", "Alectoris rufa", "Columba palumbus",
               "Corvus corax", "Ficedula albicollis", "Himantopus himantopus",
               "Ixobrychus sturmii", "Luscinia svecica", "Monticola solitarius",
               "Oceanodroma leucorhoa", "Prunus avium", "Puffinus yelkouan",
               "Pyrrhocorax pyrrhocorax", "Ptyonoprogne rupestris", "Delichon urbicum")

passerine  <- birds %>% 
  filter(!especie %in% remove_sp) %>% 
  separate(especie, c("genus", "sp")) %>% 
  mutate(g.ab = substr(genus, 1, 3), 
         s.ab = substr(sp, 1, 3)) %>% 
  unite("sp.abb", g.ab:s.ab, sep=".") %>% 
  unite("especie", genus:sp, sep=" ")

datatable(passerine)

passerine_abbreviations <- passerine %>% 
  dplyr::select(especie, sp.abb) %>% 
  unique()

write_csv(passerine, here::here("data/passerine.csv"))
write_csv(passerine_abbreviations, here::here("data/passerine_abbrev.csv"))

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DT_0.17         readxl_1.3.1    here_1.0.1      forcats_0.5.1  
 [5] stringr_1.4.0   dplyr_1.0.4     purrr_0.3.4     readr_1.4.0    
 [9] tidyr_1.1.2     tibble_3.0.6    ggplot2_3.3.3   tidyverse_1.3.0
[13] workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        lubridate_1.7.10  assertthat_0.2.1  rprojroot_2.0.2  
 [5] digest_0.6.27     utf8_1.1.4        R6_2.5.0          cellranger_1.1.0 
 [9] backports_1.2.1   reprex_1.0.0      evaluate_0.14     httr_1.4.2       
[13] pillar_1.4.7      rlang_0.4.10      rstudioapi_0.13   whisker_0.4      
[17] jquerylib_0.1.3   rmarkdown_2.6.6   htmlwidgets_1.5.3 munsell_0.5.0    
[21] broom_0.7.4       compiler_4.0.2    httpuv_1.5.5      modelr_0.1.8     
[25] xfun_0.20         pkgconfig_2.0.3   htmltools_0.5.1.1 tidyselect_1.1.0 
[29] fansi_0.4.2       crayon_1.4.1      dbplyr_2.1.0      withr_2.4.1      
[33] later_1.1.0.1     grid_4.0.2        jsonlite_1.7.2    gtable_0.3.0     
[37] lifecycle_1.0.0   DBI_1.1.1         git2r_0.28.0      magrittr_2.0.1   
[41] scales_1.1.1      cli_2.3.0         stringi_1.5.3     fs_1.5.0         
[45] promises_1.2.0.1  xml2_1.3.2        bslib_0.2.4       ellipsis_0.3.1   
[49] generics_0.1.0    vctrs_0.3.6       tools_4.0.2       glue_1.4.2       
[53] hms_1.0.0         crosstalk_1.1.1   yaml_2.2.1        colorspace_2.0-0 
[57] rvest_0.3.6       knitr_1.31        haven_2.3.1       sass_0.3.1

prepare-data

Antonio J Perez-Luque

2021-04-28
