Tutorial 3: Preparing data for a density map

Last updated: 2021-11-22

Knit directory: ebird_light_pollution/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

Seed: set.seed(20211122)

The command set.seed(20211122) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Repository version: 0fa01e5

The results in this page were generated with repository version 0fa01e5. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/3_preparing_data_for_density_map.Rmd) and HTML (docs/3_preparing_data_for_density_map.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	0fa01e5	markravinet	2021-11-22	add tutorial 3

Introduction

In the last tutorial, we learned how to extract spatial data for plotting sightings of bird species. Now we revisit how we first extracted data from eBird and repeat the process, this time with predicting population densities in mind.

Step 1: Setting up

For this tutorial we need only a single R package, auk. We load it in the standard way:

library(auk)

Next, we set the eBird data path, exactly as before:

# set ebd path
auk_set_ebd_path("./ebd_GB_relSep-2020/", overwrite = TRUE)

We then need to tell auk the name of our eBird data file.

# set input file - this is the ebd data
ebd_file <- "./ebd_GB_relSep-2020/ebd_GB_relSep-2020.txt"

This next part is where we diverge from our original tutorial. We should also provide auk with a path to the eBird sample data (eBird calls this the sampling event data). This is also in our eBird directory, but last time we didn’t bother with it.

# set the ebd_sample data path - this is for the sample data
ebd_sample <- "./ebd_GB_relSep-2020/ebd_sampling_relSep-2020_filtered.txt"
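Before going further, it can help to confirm that both paths point at real files, since a typo here would otherwise only surface later when the filter is applied. The following is a minimal sketch in base R and is not part of auk; the names `paths` and `found` are illustrative, and the two paths are repeated from above only so that the snippet runs on its own.

```r
# paths repeated from above so this snippet is self-contained
ebd_file <- "./ebd_GB_relSep-2020/ebd_GB_relSep-2020.txt"
ebd_sample <- "./ebd_GB_relSep-2020/ebd_sampling_relSep-2020_filtered.txt"

# named vector of the two input files (the names are illustrative only)
paths <- c(ebd = ebd_file, sampling = ebd_sample)

# file.exists() returns one TRUE/FALSE value per path
found <- file.exists(paths)

# report any path that could not be found, without stopping the script
for (nm in names(paths)[!found]) {
  message("Could not find the ", nm, " file at: ", paths[[nm]])
}
```

If nothing is printed, both files were found relative to the current working directory.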

What is the sample data? Basically, it is a list of the checklists that eBird users submit when they record their sightings, with one entry per checklist. For example, imagine you go on a birding walk through University Park. If you use eBird, you fill in a checklist that records the species you see. Crucially, if you report every species you detected, the checklist also tells us which species you did not see. Thus the sample data allows us to generate presence/absence data.
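To make the presence/absence idea concrete, here is a small sketch in base R using made-up data. The two data frames, their column names and the checklist IDs are all assumptions for illustration and do not come from the real eBird files. On real data, auk provides `auk_zerofill()` for this job; the sketch only shows the underlying logic.

```r
# toy sample data: one row per submitted checklist (IDs are made up)
sampling <- data.frame(
  checklist_id = c("S1", "S2", "S3", "S4"),
  all_species_reported = c(TRUE, TRUE, FALSE, TRUE)
)

# toy observation data: one row per checklist on which a house sparrow was recorded
observations <- data.frame(
  checklist_id = c("S1", "S3"),
  common_name = "House Sparrow"
)

# keep only complete checklists, since absence can only be inferred from those
complete <- sampling[sampling$all_species_reported, , drop = FALSE]

# a species is "present" if the checklist appears in the observations,
# and "absent" (not detected) otherwise
complete$species_observed <- complete$checklist_id %in% observations$checklist_id

complete
```

Checklist S3 is dropped because it is incomplete, even though it contains a sighting. Of the remaining checklists, only S1 records a house sparrow, so S2 and S4 count as non-detections.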

Step 2: Set `auk` filters

Now that we’ve told auk where the data is and the paths for the data files, we’re ready to define a filter. We do this in exactly the same way as we did before, except that this time we also tell auk_ebd where our sample data is.

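# read the ebd data file and define some filters to extract species
# file_sampling points auk at the sample data so both files are filtered together
my_filter <- auk_ebd(ebd_file, file_sampling = ebd_sample) %>%
  auk_species("House sparrow") %>%  # species of interest
  auk_country("GB") %>%  # country code
  auk_complete() %>%  # complete checklists only
  auk_date(c("2017-01-01", "2018-01-01"))  # date range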

We’ve also done something else a bit different here: we extracted house sparrow data for 2017, using a date range that runs from 1 January 2017 to 1 January 2018. This is just to demonstrate that the eBird dataset contains temporal data which, as we will see later, could be very informative for our analyses.
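It is worth being clear about which dates a range like this admits. The sketch below uses base R on a handful of made-up dates and assumes that both endpoints of the range are inclusive. That is an assumption about how the filter behaves, so check it against the `auk_date` documentation before relying on it. The names `dates`, `date_range`, `in_range` and `in_2017` are illustrative.

```r
# toy checklist dates around the edges of the range (made up for illustration)
dates <- as.Date(c("2016-12-31", "2017-01-01", "2017-06-15",
                   "2017-12-31", "2018-01-01", "2018-01-02"))

# the same range as in the filter above
date_range <- as.Date(c("2017-01-01", "2018-01-01"))

# ASSUMPTION: both endpoints are treated as inclusive
in_range <- dates >= date_range[1] & dates <= date_range[2]
dates[in_range]

# a stricter check that keeps calendar year 2017 only
in_2017 <- format(dates, "%Y") == "2017"
dates[in_2017]
```

Under that assumption the range also admits 1 January 2018. Ending the range at "2017-12-31" would restrict it to the calendar year.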

One other addition is the auk_complete function. This ensures that only complete checklists, those on which the observer reported every species they detected and could identify, are included in the sample data. It removes any incomplete checklists, which could compromise our presence/absence estimates because a species missing from such a checklist was not necessarily absent.

Step 3: Apply `auk` filters

Now that we’ve set the filters we need to extract information on UK house sparrows in 2017, we just need to apply them!

output_ebd <- "house_sparrow_test_2017.txt"
output_sampling <- "house_sparrow_test_sampling_2017.txt"

auk_filter(my_filter, file = output_ebd, file_sampling = output_sampling, overwrite = TRUE)

Note that we set the output data and sampling file paths first. We also used a suffix of _2017 to denote the year, which will be useful in the future.
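One way the year suffix could pay off later is by generating the file names from the year itself, so that each year gets its own pair of output files. The sketch below is one possible approach in base R; `make_output_names` is a hypothetical helper, not an auk function, and the extra years are chosen purely for illustration.

```r
# build the pair of output file names for a given year
# (make_output_names is a hypothetical helper, not part of auk)
make_output_names <- function(year) {
  list(
    ebd = sprintf("house_sparrow_test_%d.txt", year),
    sampling = sprintf("house_sparrow_test_sampling_%d.txt", year)
  )
}

# names for 2017, matching the ones set by hand above
out_2017 <- make_output_names(2017L)
out_2017$ebd
out_2017$sampling

# the same helper gives distinct names for other years (illustrative years)
all_names <- lapply(2015:2017, make_output_names)
```

Each element of `all_names` could then be passed to `auk_filter` alongside a filter defined for the matching year.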

That’s it for now! You can try plotting this again as you did in tutorial 2 if you want. Otherwise, in the next tutorial we’ll proceed to drawing a density map.

sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] auk_0.5.1       workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7       whisker_0.4      knitr_1.36       magrittr_2.0.1  
 [5] R6_2.5.1         rlang_0.4.12     fastmap_1.1.0    fansi_0.5.0     
 [9] stringr_1.4.0    tools_4.1.2      xfun_0.28        utf8_1.2.2      
[13] git2r_0.28.0     jquerylib_0.1.4  htmltools_0.5.2  ellipsis_0.3.2  
[17] rprojroot_2.0.2  yaml_2.2.1       digest_0.6.28    tibble_3.1.5    
[21] lifecycle_1.0.1  crayon_1.4.2     later_1.3.0      vctrs_0.3.8     
[25] promises_1.2.0.1 fs_1.5.0         glue_1.5.0       evaluate_0.14   
[29] rmarkdown_2.11   stringi_1.7.5    compiler_4.1.2   pillar_1.6.4    
[33] httpuv_1.6.3     pkgconfig_2.0.3
