Last updated: 2021-06-24

Knit directory: globalIRmap/

The overall workflow of the project is detailed in the Workflow tab of this website. The documentation that follows is specifically for the portions of the analysis conducted in R, which encompasses all statistical analysis and figure-making (aside from maps).

To reproduce this analysis, please contact the author to obtain the required raw and pre-processed data, which amount to between 60 and 500 GB depending on how comprehensive a reproduction is wanted.

R analysis framework

This analysis relies as much as possible on good enough practices in scientific computing, which users are encouraged to read.

Structure: the overall project directory is structured with the following sub-directories:
|– bin (compiled code/external packages)
|– data (raw data, not to be altered)
|– results (results of the analysis, mostly reproducible through code execution but also includes manually modified results)
|– src (code written for the project)
All scripts rely on this structure.
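
Because all scripts rely on this layout, it can help to verify it before running anything. The following is a minimal sketch, not part of the project code: the helper name check_project_dirs is hypothetical, and it only assumes the four sub-directory names listed above.

```r
# Sub-directories listed in the project structure above
expected_dirs <- c("bin", "data", "results", "src")

# Hypothetical helper: report which expected sub-directories exist under 'root'
check_project_dirs <- function(root = ".") {
  present <- dir.exists(file.path(root, expected_dirs))
  names(present) <- expected_dirs
  present
}

# Named logical vector, TRUE where the sub-directory was found
status <- check_project_dirs(".")
print(status)
if (!all(status)) {
  message("Missing sub-directories: ",
          paste(names(status)[!status], collapse = ", "))
}
```

Calling check_project_dirs() with the path of the project root instead of "." checks a project located elsewhere.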

Documentation: this project is organized as an R package, providing documented functions to reproduce and extend the analysis reported in the publication. Note that this package has been written explicitly for this project and may not be suitable for general use. See guidelines below to install the package.

R Workflow: this project is set up with a drake workflow, ensuring reproducibility. In the drake philosophy, every action is a function, and every R object is a “target” with dependencies. Intermediate targets/objects are stored in a .drake directory.
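
To illustrate the idea of functions and targets, here is a toy drake plan. It is only a sketch: the function square and the targets numbers, squared and total are invented for illustration and do not appear in this project, and the sketch calls make() interactively rather than the r_make() entry point used later in this documentation.

```r
library(drake)

# Every action is a function
square <- function(x) x^2

# Every R object is a target; drake infers that 'squared' depends on
# 'numbers' and 'square', and that 'total' depends on 'squared'
plan <- drake_plan(
  numbers = 1:5,
  squared = square(numbers),
  total   = sum(squared)
)

make(plan)              # builds the targets and stores them in ./.drake
result <- readd(total)  # reads a stored target back from the cache
print(result)           # 55
outdated(plan)          # lists targets that would be rebuilt by make()
```

If square() is edited and make(plan) is run again, only squared and total are rebuilt, since numbers does not depend on the function.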

Dependency management: the R library of this project is managed by renv. This makes sure that the exact same package versions are used when recreating the project. When calling renv::restore(), all required packages will be installed with their specific version. Please note that this project was built with R version 4.0.3 on a Windows 10 operating system. The renv packages from this project are not compatible with R versions prior to version 3.6.0.
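
Before restoring the library, the running R version can be compared with the versions stated above. This is a small sketch using base R only; the variable names are illustrative.

```r
# Versions stated in the paragraph above
built_with  <- package_version("4.0.3")
minimum_ver <- package_version("3.6.0")
current_ver <- getRversion()

if (current_ver < minimum_ver) {
  stop("R ", as.character(current_ver), " is older than ",
       as.character(minimum_ver), "; the renv packages will not install.")
} else if (current_ver != built_with) {
  message("Running R ", as.character(current_ver),
          "; the project was built with R ", as.character(built_with), ".")
}
```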

Syntax: this analysis relies on the data.table syntax, which provides a high-performance version of data.frame. It is concise, faster, and more memory efficient than conventional data.frames and the tidyverse syntax.
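
For readers new to data.table, the general form is DT[i, j, by]: subset rows with i, compute j, grouped by by. The example below is a sketch on made-up data; the column names gauge, flow and dry are illustrative and not taken from the project.

```r
library(data.table)

# Toy data: daily flow at two hypothetical gauges
dt <- data.table(
  gauge = c("g1", "g1", "g1", "g2", "g2", "g2"),
  flow  = c(0, 1.5, 3.0, 0, 0, 2.0)
)

# := adds a column by reference, without copying the table
dt[, dry := flow == 0]

# j with by: one summary row per gauge
summary_dt <- dt[, .(mean_flow = mean(flow), frac_dry = mean(dry)), by = gauge]
print(summary_dt)

# i, j and by combined: number of days with flow at each gauge
dt[flow > 0, .N, by = gauge]
```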

Machine learning model development: for random forest model development, this project relies on the mlr3 package and ecosystem (see the mlr3 book for learning its usage), which provides an object-oriented framework for machine learning.
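
As a rough orientation to the mlr3 object model (task, learner, resampling, measure), the sketch below fits a random forest on the built-in iris task. It is not the project's model: the choice of the ranger learner from mlr3learners, the hyperparameters, and the 3-fold cross-validation are all assumptions made for illustration.

```r
library(mlr3)
library(mlr3learners)  # provides the "classif.ranger" learner (requires ranger)

set.seed(1)

# Task: the data plus the target column (built-in example task)
task <- tsk("iris")

# Learner: random forest with an illustrative number of trees
learner <- lrn("classif.ranger", num.trees = 100)

# Resampling strategy: 3-fold cross-validation
resampling <- rsmp("cv", folds = 3)

# Run the resampling and aggregate classification accuracy over the folds
rr  <- resample(task, learner, resampling)
acc <- rr$aggregate(msr("classif.acc"))
print(acc)
```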

Download the repository

The following instructions copy (i.e., clone) this repository to your local machine. They are for Windows only; please contact the author for guidance on other platforms.
In Git Bash, the following commands illustrate the procedure to make a local copy of the GitHub repository in a newly created directory at C:/test_globalIRmap:

Mathis@DESKTOP MINGW64 /c/temp
$ cd /c/
$ mkdir test_globalIRmap
$ cd /c/test_globalIRmap
$ mkdir /c/test_globalIRmap/src
$ cd /c/test_globalIRmap/src
Mathis@DESKTOP MINGW64 /c/test_globalIRmap
$ git clone
Cloning into 'globalIRmap'...
remote: Enumerating objects: 116, done.
remote: Counting objects: 100% (116/116), done.
remote: Compressing objects: 100% (89/89), done.
remote: Total 7363 (delta 48), reused 75 (delta 19), pack-reused 7247
Receiving objects: 100% (7363/7363), 1.91 GiB | 3.78 MiB/s, done.
Resolving deltas: 100% (925/925), done.

In RStudio for Windows, the following procedure can be used:

Get started in R

Then open this project in R and run:

renv::restore() # respond y, restores all R packages with their specific version
remotes::install_github('messamat/globalIRmap') #install project package so that the help documentation can be accessed for project functions (e.g., ?format_gaugestats)
r_make() # recreates the analysis

#If you were provided intermediate targets (i.e., a /.drake directory)
#you can load individual targets in the environment even if they are expired
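
A possible way to do this with drake is sketched below. It assumes the working directory is the project root containing the .drake directory; the target is picked from the cache listing rather than named, because target names are not given here.

```r
library(drake)

targets <- character(0)
if (dir.exists(".drake")) {
  # Names of all targets stored in the cache
  targets <- cached()
}

if (length(targets) > 0) {
  first_target <- targets[1]
  # readd() returns the stored value, even if the target is outdated
  obj <- readd(first_target, character_only = TRUE)
  # loadd() assigns the target into the current environment under its own name
  loadd(list = first_target)
  str(obj)
} else {
  message("No drake cache found in the working directory.")
}
```

In an interactive session, loadd(name_of_target) with an unquoted target name does the same for a known target.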

Other practical notes

Notes and resources

