Last updated: 2021-03-12


Knit directory: mapme.protectedareas/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20210305) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 9cf92e0. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/
    Ignored:    analysis/README.html
    Ignored:    contributing.html
    Ignored:    mapme.protectedareas.Rproj
    Ignored:    renv/library/
    Ignored:    renv/local/
    Ignored:    renv/staging/

Untracked files:
    Untracked:  analysis/README.Rmd
    Untracked:  analysis/contribute.Rmd

Unstaged changes:
    Modified:   .DS_Store
    Modified:   analysis/index.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.


This file explains how to use this repository and how to make good contributions. We also use a code of conduct for open source projects adapted from the Contributor Covenant homepage. You can find this code of conduct in the file CoC.md.

How can I use this repository?

Put here information on the usage (later).

How can I contribute?

  1. Reporting Bugs & Suggesting Enhancements
  2. Contributing Code
  3. Issues and Prioritization

Reporting Bugs And Suggesting Enhancements

Add here information on where and how to report bugs and add suggestions (later)

Remarks on Contributing Code

To ensure reproducibility, proper documentation and open-source code are key. In addition to making all routines public, we place a strong focus on documenting the methods used. Contributors should take the following steps to make a contribution to this repository.

  1. Read this contrib file. It contains most of the relevant information to get started.
  2. Create a new issue in this repository that describes your intended contribution and wait for a reaction from the community.
  3. If you are a contributor to this repository, open a new branch to work on your issue. Give the branch a name that makes the intended contribution easy to understand. Example branch name: wwf_teow_preprocessing, to add preprocessing routines for the WWF/TEOW dataset.
  4. If you are not a contributor (yet), you might want to fork this repository first and later create an upstream pull request.
  5. This repository follows the tidyverse style guide for writing R code. Please have a look at it if you are not familiar with it.
  6. This repository tries to follow the recommendations of Ropenscilabs for developing spatial software in R. Please have a look at them if you are not familiar with them. The two main libraries to be used for geo-processing are sf and terra. Please try to write your functions using these packages instead of the older packages sp and raster.
  7. Before writing new code, you might want to have a look at the existing code base to see whether you can take advantage of existing processing or analysis routines. The easiest way is probably to look at the documentation pages (<-insert link from index later here->) and the routines used there before going into the code itself. You are encouraged to reuse code from the code folder as much as possible to reduce maintenance needs and complexity.
  8. We use R scripts to process our data and R Markdown scripts (Rmd) to document the data, methods, pre-processing routines and analysis scripts used. We have some basic recommendations and minimum standards for code development, which are shown below.
  9. Develop your code with sample data that is quick to process and easy to reproduce. There are also some recommendations on sample data below.

Repository Structure

This repository is structured as a reproducible project using the workflowr package. For more information, please visit the workflowr website. In addition, this repository uses the renv package so that others can work with exactly the same package versions that were used in this project. For more information, please visit the renv website.

mapme.protectedareas/

.
├── .gitignore       # specifies which files Git should ignore
├── .Rprofile        # specifies which libraries and settings are used at start
├── _workflowr.yml   # YAML file needed to control workflowr
├── analysis/
│   ├── about.Rmd
│   ├── index.Rmd
│   ├── license.Rmd
│   ├── *.Rmd        # R Markdown files to document pre-processing and analysis routines
│   └── _site.yml
├── code/
│   └── *.R          # R functions and scripts utilized in the Rmd files
├── data/            # raw imputable sample data for the routine
├── docs/            # HTML builds of the Rmds in analysis/
├── output/          # intermediate and final data output from sample data
├── README.md        # general information on the project
└── renv/            # renv directory to lock used package versions

Preprocessing Datasets

This is a step-by-step guide to creating a new pre-processing routine, using the TEOW ecoregions dataset from WWF as an example. Pre-processing routines in this repository are organized by thematic dataset and data source. This modular structure makes it easy to add or remove data sources from the pre-processing routine and eventually chain them together. It also makes the code easier to debug if the routine is chained. An exception to this structure are routines that access already pre-processed datasets, in our case the API access to the DOPA/JRC REST Services, which provide tabular information for several thematic datasets pre-processed by JRC on the basis of PAs.

Please try to develop your routine using a small subset of sample data that takes little processing time for the documentation and has low storage requirements. For PAs we recommend using, for example, 1 or 2 PAs from your area of interest. We currently use the wdpar package to automatically download and preprocess sample PAs from the WDPA. You can also save the sample data in the data folder. This is recommended if the dataset you use has no permanent storage location on the internet. Nevertheless, you are encouraged to include the download in the routine as well and save the sample data in the temporary folder created by the R session.
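A download of sample PAs along these lines might look like the sketch below. The wrapper fetch_sample_pas(), its default country and the number of PAs kept are illustrative assumptions and not part of this repository; only wdpa_fetch() and wdpa_clean() come from the wdpar package.

```r
# Hypothetical wrapper around wdpar; the function name and the default
# country are illustrative assumptions, not part of this repository.
fetch_sample_pas <- function(country = "Liechtenstein", n = 2) {
  if (!requireNamespace("wdpar", quietly = TRUE)) {
    stop("Package 'wdpar' is required to download sample PAs.")
  }
  # Download into the temporary folder of the current R session.
  raw <- wdpar::wdpa_fetch(country, wait = TRUE, download_dir = tempdir())
  # Keep only the first n PAs so that cleaning stays fast.
  sample_raw <- raw[seq_len(min(n, nrow(raw))), ]
  wdpar::wdpa_clean(sample_raw)
}

# Not run here because it needs an internet connection:
# sample_pas <- fetch_sample_pas("Liechtenstein", n = 2)
```

Subsetting before cleaning keeps the processing time of the documentation low, which is the main point of working with sample data.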

To create a new pre-processing routine you should

  1. Create a new issue for the data source that should be processed. The title could be, for example: “Pre-process TEOW ecoregion data from WWF to create tabular data for PA polygon database”
  2. Create a new branch in the repository as discussed above. The branch name should indicate what is done, e.g. “preprocess_teow”
  3. Have a look at existing processing routines as mentioned above to see if you can recycle code or use existing routines.
  4. Create an Rmd file in the analysis folder (see structure above). Best practice is to use the wflow_open() command, which creates a new Rmd file. Name this file after the pre-processed data source and variable, e.g. wwf_teow.Rmd for TEOW Ecoregions from the World Wide Fund for Nature (WWF) or gfw_forests.Rmd for different variables from the Global Forest Watch (GFW) such as forest cover, forest cover loss or emissions from forest cover loss. The full command to create a new script in our example would therefore be: wflow_open("analysis/wwf_teow.Rmd")
  5. Create an R script for larger functional code chunks in the code folder (if you need to create new routines). This file can be sourced in the Rmd (details and naming convention below).
  6. Create a reference to the new routine and its documentation by adding a link to the rendered HTML file in index.Rmd (details below).
  7. Add relevant meta-data to the Rmd file (details below).
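The file-related steps above can be sketched roughly as follows. The helper routine_paths() is hypothetical and only derives file names from the conventions described in this document; the workflowr calls are commented out because they need an initialized workflowr project.

```r
# Hypothetical helper: derive the file names for a new routine from its short name.
routine_paths <- function(name) {
  list(
    rmd = file.path("analysis", paste0(name, ".Rmd")),
    script = file.path("code", paste0("preprocessing_", name, ".R")),
    # Markdown link to the rendered page, as it could be added to index.Rmd.
    index_link = sprintf("[%s](%s.html)", name, name)
  )
}

paths <- routine_paths("wwf_teow")
print(paths$rmd)

# Inside the project, the documentation file would then be created and built with:
# workflowr::wflow_open(paths$rmd)
# workflowr::wflow_build(paths$rmd)
```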

Separate R scripts with source code

You should try to develop R functions that are kept in a separate R script and then sourced in the Rmd files. Those functions and R scripts should be placed in the code folder (see structure above). This is especially relevant for code that can be reused in several pre-processing routines, such as chained pre-processing steps, e.g. reproject -> rasterize -> stack -> zonal.

You can choose between two naming conventions when saving the script:

  • You can create a name that links the R file to the name of your Rmd file. This could be, for example, preprocessing_wwf_teow.R. Use this if your code is most probably specific to the dataset that you process.
  • If you create a new processing routine that is more generic and can be used in several other projects as well, try to find a name that best describes this routine. For the example chain above, such a name could be rasterize_and_zonalstats.R. If you create such a generic routine, make sure to also add a link and a short description of the new routine to the index.Rmd file as described in the subsequent section.
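As a minimal sketch of this pattern, the example below writes a small script and sources it. The function share_percent() and its logic are purely illustrative, and the script is written to a temporary folder only so that the example runs on its own. In the repository, the file would live in the code folder and the Rmd would call source("code/preprocessing_wwf_teow.R"), assuming the project root is the knit directory.

```r
# Stand-in for the code/ folder of the project, created in a temporary location.
script_dir <- file.path(tempdir(), "code")
dir.create(script_dir, showWarnings = FALSE, recursive = TRUE)
script_path <- file.path(script_dir, "preprocessing_wwf_teow.R")

# Hypothetical content of the script: one small, reusable function.
writeLines(c(
  "# Share of a total area that is covered by an overlap, in percent.",
  "share_percent <- function(area_overlap, area_total) {",
  "  stopifnot(all(area_total > 0))",
  "  100 * area_overlap / area_total",
  "}"
), script_path)

# In the Rmd file, the script is sourced once and the function is then reused.
source(script_path)
shares <- share_percent(area_overlap = c(2, 5), area_total = c(8, 10))
print(shares)
```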

Contents of the Rmd file

To document the processed data and the authors of the script well, we ask for some minimal information in the Rmd files, consisting of

  • Author Name (and optionally contact details, link to github account or other)
  • Purpose of the script
    • what is shown by the dataset(s) and what information is to be derived,
    • what is done in the script and which processing scripts from the code folder are used
    • what data comes out in the end.
  • Meta-data for the processed dataset(s)
    • Name of the dataset(s) and source(s) (scientific citation if available)
    • Version number of the data (if the data is updated and has this information)
    • geographical extent
    • spatial resolution
    • temporal resolution
    • link to the main meta-data document from the data-source
    • download-link
    • when was the data downloaded
  • Detailed description of the data processing
  • Time necessary to process the sample data and details on the machine used to process it.
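The last item can be recorded directly in the Rmd file. The sketch below uses only base R; process_sample() is a hypothetical stand-in for the actual pre-processing routine, and the fields collected in run_info are one possible selection rather than a required format.

```r
# Stand-in for a real pre-processing step; replace with the sourced routine.
process_sample <- function(n = 1e5) {
  sum(sqrt(seq_len(n)))
}

# Measure how long the sample data takes to process.
timing <- system.time(result <- process_sample())

# Elapsed wall-clock time in seconds, plus basic machine details for the Rmd.
run_info <- list(
  elapsed_seconds = unname(timing["elapsed"]),
  r_version = R.version.string,
  platform = R.version$platform,
  os = Sys.info()[["sysname"]]
)
str(run_info)
```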

Analysis

This will be added later.

Issues and Prioritization

This repository offers common labels such as bug or enhancement. In addition, it has labels that form a prioritization scheme showing which issues should be addressed first. To that end we use a simplified method called MoSCoW, which categorizes tasks into Must, Should, Could and Won't. Issues in the Must category are the most relevant and should be addressed first. After addressing all of these issues, we move on to the Should issues and, if time is left, to the Could issues. In addition, there is a label called Fast Lane, which marks issues that should be addressed first within their given category. So an issue labelled Should and Fast Lane should be addressed quickly after all of the Musts are processed.
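The resulting order can be illustrated with a small base R sketch. The issue titles and the data frame layout are made up for this example; only the category names and the Fast Lane label come from the scheme described above.

```r
# Hypothetical issues with their MoSCoW category and Fast Lane flag.
issues <- data.frame(
  title = c("Fix CRS bug", "Add GFW routine", "Polish docs", "Speed up zonal stats"),
  category = c("Should", "Must", "Could", "Should"),
  fast_lane = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Rank categories Must < Should < Could < Won't; Fast Lane comes first
# within a category because !TRUE sorts before !FALSE.
moscow_levels <- c("Must", "Should", "Could", "Won't")
category_rank <- match(issues$category, moscow_levels)
ordered_issues <- issues[order(category_rank, !issues$fast_lane), ]
print(ordered_issues$title)
```

Here the Must issue comes first, followed by the Should issue marked Fast Lane, the remaining Should issue and finally the Could issue.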


sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6      rstudioapi_0.13 knitr_1.30      magrittr_2.0.1 
 [5] R6_2.5.0        rlang_0.4.10    stringr_1.4.0   tools_4.0.3    
 [9] xfun_0.19       git2r_0.27.1    htmltools_0.5.0 ellipsis_0.3.1 
[13] rprojroot_2.0.2 yaml_2.2.1      digest_0.6.27   tibble_3.0.4   
[17] lifecycle_1.0.0 crayon_1.3.4    later_1.1.0.1   vctrs_0.3.6    
[21] promises_1.1.1  fs_1.5.0        glue_1.4.2      evaluate_0.14  
[25] rmarkdown_2.5   stringi_1.5.3   compiler_4.0.3  pillar_1.4.7   
[29] httpuv_1.5.4    renv_0.13.0     pkgconfig_2.0.3