Last updated: 2021-06-29

Knit directory: globalIRmap/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 025ac83. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    R/.Rhistory
    Ignored:    analysis/.Rhistory
    Ignored:    renv/library/
    Ignored:    renv/staging/

Untracked files:
    Untracked:  .drake/
    Untracked:  .gitignore
    Untracked:  figtabres.docx
    Untracked:  schema.ini

Unstaged changes:
    Modified:   analysis/methods_gettingstarted.Rmd
    Modified:   analysis/methods_refdisdat.Rmd
    Modified:   analysis/methods_riveratlas.Rmd
    Modified:   analysis/methods_workflow.Rmd
    Modified:   log/drake.log
    Deleted:    output/README.md

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/methods_workflow.Rmd) and HTML (docs/methods_workflow.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
html 6e12f71 messamat 2021-06-14 Build site.
Rmd 5e433c3 messamat 2021-06-14 Publish new pages
html 9b48bde messamat 2021-01-06 Build site.
Rmd f1d9dcf messamat 2021-01-06 Start building up workflowr website, start incorporating mandrake (but wait as very unstable still), plan gauge selection documentation
html f1d9dcf messamat 2021-01-06 Start building up workflowr website, start incorporating mandrake (but wait as very unstable still), plan gauge selection documentation

Overall workflow

This study leverages the respective strengths of R (for data wrangling, statistics, and figure-making) and Python (for spatial analysis and mapping). As a result, reproducing it requires going back and forth between these two languages and platforms. At the broadest level, the main steps of this analysis were the following:
1. Python: pre-process and format global river network environmental attributes (for more information, see this tab on this website and the corresponding GitHub repository, globalIRmap_HydroATLAS_py).
2. Python: compile and pre-process the global river network; download and spatially pre-process streamflow gauging stations (the reference data for model training and testing; for more information, see this tab), national hydrographic datasets, and on-the-ground visual observations of flow intermittence. See the globalIRmap_py GitHub repository.
3. R: QA/QC streamflow gauging station records; develop and validate random forest models; compare predictions to hydrographic datasets and on-the-ground observations; generate tables, non-spatial figures, and tabular predictions. See the globalIRmap GitHub repository.
4. Python: join tabular predictions of flow intermittence to the global river network, and join predictions to on-the-ground observations of flow intermittence. See the globalIRmap_py GitHub repository.
5. ArcMap: create maps.

Below, we briefly explain how each of these steps was implemented. Note, however, that fully reproducing the analysis requires additional data that are not currently publicly available. Please contact and/or bernhard.lehner(at)mcgill.ca for additional information should you want to reproduce the results from this study. In addition, please note that processing these data takes weeks of continuous computing on a normal workstation.

1. Pre-processing and formatting global river network environmental attributes

GitHub repository structure for globalIRmap_HydroATLAS_py

Set-up

utility_functions.py:
- imports key modules.
- defines utility functions used throughout the analysis.
- defines the basic folder structure of the analysis.

runUplandWeighting.py:
- defines functions for routing data along the river network.

Download data

Downloading data requires creating a file called “configs.json” with login information for Earthdata and ALOS. For guidance on formatting the JSON configuration file, see here.
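This page does not document the keys that configs.json is expected to contain. The sketch below assumes a hypothetical layout with one section per service, each holding a username and a password. It shows how such a file could be created and read with Python's standard json module. The section and field names (and the helper functions) are placeholders and may not match what the download scripts read. Because the file holds credentials, it is best kept out of version control.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical structure: the keys actually expected by the download scripts
# are not documented here, so adapt the section and field names as needed.
TEMPLATE = {
    "earthdata": {"username": "YOUR_EARTHDATA_USERNAME", "password": "YOUR_EARTHDATA_PASSWORD"},
    "alos": {"username": "YOUR_ALOS_USERNAME", "password": "YOUR_ALOS_PASSWORD"},
}


def write_template(path):
    """Write a placeholder configs.json to be filled in by hand."""
    Path(path).write_text(json.dumps(TEMPLATE, indent=2), encoding="utf-8")


def load_credentials(path, service):
    """Return (username, password) for one service section of configs.json."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    if service not in config:
        raise KeyError(f"no '{service}' section in {path}")
    entry = config[service]
    missing = {"username", "password"} - set(entry)
    if missing:
        raise ValueError(f"'{service}' section is missing: {sorted(missing)}")
    return entry["username"], entry["password"]


if __name__ == "__main__":
    # Demo in a throwaway directory so no real credentials are touched
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "configs.json"
        write_template(cfg)
        print(load_credentials(cfg, "earthdata"))
```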

Pre-format data

  • format_HydroSHEDS.py: create
    • a coastal band raster (~ 10 pixels inland at ~450 m resolution)
    • HydroSHEDS regions of contiguous land surfaces in raster and polygon format
  • format_MODISmosaic.py: extract and mosaic MODIS ocean mask.
  • format_GLAD.py: format the surface water dynamics dataset by removing ocean pixels and aggregating data from 30 m resolution to 15 sec (~450 m) resolution (i.e., computing statistics such as the percentage area of seasonal surface water).
  • format_WorldClim2.py: resample WorldClim2 rasters (30 sec native resolution) to HydroSHEDS resolution (15 sec) and fill gaps.
  • format_GAIandCMIv2.py:
    • compute Climate Moisture Index (based on WorldClimv2 precipitation and GAIv2 potential evapotranspiration data)
    • resample GAI and CMI rasters to HydroSHEDS resolution (15 sec)
  • format_SoilGrids250m.py: mosaic tiles, compute aggregate texture values for the 0-100 cm depth interval, and reproject and aggregate rasters from 250 m to HydroSHEDS resolution (15 sec).
  • format_worldpop.py: aggregate country population rasters from 3 sec to 15 sec resolution and mosaic them, associate each population pixel with a river reach (with long-term mean annual flow > 0.1 m3/s), and compute the population closest to each reach.
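This page does not spell out the exact CMI formula used in format_GAIandCMIv2.py. The sketch below assumes the common Willmott and Feddema (1992) formulation, in which CMI ranges from -1 (arid) to +1 (humid). It is computed for one pixel from precipitation P and potential evapotranspiration PET, given in the same units: CMI = P/PET - 1 when P < PET, and CMI = 1 - PET/P otherwise. The function name and the sample values are hypothetical.

```python
import math


def climate_moisture_index(precip, pet):
    """Climate Moisture Index for one pixel (assumed Willmott & Feddema form).

    precip, pet: precipitation and potential evapotranspiration in the same
    units (e.g. mm/month). Returns a value in [-1, 1], or NaN when both are 0.
    """
    if precip < 0 or pet < 0:
        raise ValueError("precipitation and PET must be non-negative")
    if precip == 0 and pet == 0:
        return math.nan  # index undefined without any water flux
    if precip < pet:
        return precip / pet - 1.0  # water deficit: between -1 and 0
    return 1.0 - pet / precip      # water surplus: between 0 and 1


# Example: three months of (P, PET) in mm for one hypothetical pixel
months = [(10.0, 120.0), (80.0, 80.0), (150.0, 50.0)]
print([round(climate_moisture_index(p, e), 3) for p, e in months])
# [-0.917, 0.0, 0.667]
```

Splitting the formula at P = PET keeps the index bounded and symmetric: a complete deficit (P = 0) gives -1, and a large surplus approaches +1.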

Associate hydro-environmental attributes to RiverATLAS river reaches

  • runUplandWeighting_batch.py: route hydro-environmental characteristics along the global river network to yield rasters of the average value of a given hydro-environmental characteristic (e.g., global aridity index) across the entire upstream area of each pixel. Compute such rasters for WorldClim, GAI, CMI, SoilGrids textures from 0 to 100 cm, and surface water dynamics.
  • runHydroATLASStatistics.py: create statistics tables of hydro-environmental attributes for every river reach in RiverATLAS. This code requires a fair amount of manual adjustment of local paths and must point to a local master spreadsheet listing the parameters of all statistics to compute. Please contact for more information and for an example of such a table.
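The upstream averaging in runUplandWeighting_batch.py can be pictured with a small plain-Python sketch. This is not the repository's raster-based code. In the sketch, each cell points to the cell it drains into. The upstream average of an attribute is the accumulated sum of value × weight divided by the accumulated weight, both taken over the cell itself and everything upstream. Cells are processed headwaters-first (a topological order), so each cell's totals are complete before they are passed downstream. The dictionary representation, the function name, and the optional weight are assumptions for illustration; cell area is a natural weight, since 15 sec cells shrink toward the poles.

```python
from collections import deque


def upstream_mean(downstream, value, weight=None):
    """Weighted mean of `value` over each cell and everything draining into it.

    downstream: dict cell -> cell it drains into (None at an outlet)
    value:      dict cell -> attribute value (e.g. aridity index)
    weight:     dict cell -> weight such as cell area; defaults to 1 per cell
    """
    if weight is None:
        weight = {c: 1.0 for c in downstream}
    # number of cells draining directly into each cell
    inflows = {c: 0 for c in downstream}
    for c, d in downstream.items():
        if d is not None:
            inflows[d] += 1
    # start each accumulator with the cell's own contribution
    acc_vw = {c: value[c] * weight[c] for c in downstream}
    acc_w = {c: weight[c] for c in downstream}
    # headwater cells (no inflow) are already complete and go first
    ready = deque(c for c, n in inflows.items() if n == 0)
    while ready:
        c = ready.popleft()
        d = downstream[c]
        if d is None:
            continue
        acc_vw[d] += acc_vw[c]
        acc_w[d] += acc_w[c]
        inflows[d] -= 1
        if inflows[d] == 0:  # all upstream contributions received
            ready.append(d)
    return {c: acc_vw[c] / acc_w[c] for c in downstream}


# Two headwater cells (a, b) drain into c, which drains into outlet d.
flow = {"a": "c", "b": "c", "c": "d", "d": None}
aridity = {"a": 1.0, "b": 3.0, "c": 5.0, "d": 7.0}
print(upstream_mean(flow, aridity))
# {'a': 1.0, 'b': 3.0, 'c': 3.0, 'd': 4.0}
```

On a raster, the same quantity is often obtained by dividing a value-weighted flow accumulation by an area-weighted flow accumulation.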

Workflow summary

Execute:
1. scripts for downloading data in any order
2. format_MODISmosaic.py
3. format_HydroSHEDS.py
4. format_WorldClim2.py
5. other formatting scripts in any order
6. runUplandWeighting_batch.py
7. runHydroATLASStatistics.py
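The execution order above could be scripted as a simple driver. The sketch below is an assumption and is not part of the repository. It groups scripts into stages that run in sequence and treats the scripts inside a stage as order-independent. The download stage is left empty because those script names are not listed here. It also assumes every script can be launched with the current Python interpreter; for ArcGIS-dependent scripts, that interpreter would need arcpy. runHydroATLASStatistics.py requires manual path adjustments, so the default is a dry run that only returns the plan.

```python
import subprocess
import sys
from pathlib import Path

# Stages run in sequence; scripts inside one stage may run in any order.
STAGES = [
    [],  # download scripts (any order): names not listed on this page
    ["format_MODISmosaic.py"],
    ["format_HydroSHEDS.py"],
    ["format_WorldClim2.py"],
    [   # remaining formatting scripts (any order)
        "format_GLAD.py",
        "format_GAIandCMIv2.py",
        "format_SoilGrids250m.py",
        "format_worldpop.py",
    ],
    ["runUplandWeighting_batch.py"],
    ["runHydroATLASStatistics.py"],
]


def run_pipeline(stages, script_dir, dry_run=True):
    """Run (or, by default, only list) the scripts stage by stage.

    Returns the script names in execution order. When dry_run is False,
    stops at the first script that exits with a non-zero status.
    """
    plan = []
    for stage in stages:
        for name in stage:
            plan.append(name)
            if not dry_run:
                script = Path(script_dir) / name
                # check=True raises CalledProcessError on failure
                subprocess.run([sys.executable, str(script)], check=True, cwd=script_dir)
    return plan


if __name__ == "__main__":
    for step, name in enumerate(run_pipeline(STAGES, "."), start=1):
        print(step, name)
```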

2. Pre-processing and formatting spatial datasets aside from hydro-environmental attributes

GitHub repository structure for globalIRmap_py

Set-up

utility_functions.py:
- imports key modules.
- defines utility functions used throughout the analysis.
- defines the basic folder structure of the analysis.

setup_localIRformatting.py:
- defines the folder structure for formatting data to compare modeled estimates of global flow intermittence to national hydrographic datasets (Comparison_databases) and to in-situ/field-based observations of flow intermittence (Insitu_databases).
- defines functions used in formatting data for these comparisons.

Download data

  • download_GSIM.py: download the Global Streamflow Indices and Metadata (GSIM) archive from PANGAEA.
  • download_format_IRdata.py: download and format national hydrographic datasets, and download on-the-ground observations of river intermittence.
    • U.S.A.: download National Hydrography Plus (NHDPlus) medium and high resolution, add attributes (drainage area, mean annual flow), export attribute table for analysis in R, divide datasets into subsets by drainage area and discharge size classes, subselect HydroATLAS basins that overlap the NHDPlus.
    • France (data were given by Ton Snelder): divide dataset into subsets by drainage area and stream order size classes, subselect HydroATLAS basins that overlap France.
    • Brazil: download national hydrographic dataset, identify first order streams through network analysis.
    • Australia: download Australian Geofabric, divide dataset into subsets by drainage area size classes, subselect HydroATLAS basins that overlap Australia.
    • Observatoire National Des Etiages (ONDE, France): download ONDE dataset for 2012-2019, download French “Carthage” hydrographic network for formatting.
    • Pacific Northwest PROSPER: download PROSPER dataset of flow state observations, download continuous parameter grids (CPG) of topography data in the Pacific Northwest for formatting.
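This page does not detail the network analysis used to identify first-order streams in the Brazilian dataset. One standard approach is sketched below. It assumes that each reach record stores the ID of the reach it flows into. The sketch computes Strahler stream order: headwater reaches with no upstream reach are order 1, and at a confluence the order increases by one only where two or more tributaries share the highest incoming order. First-order streams are then the reaches with order 1. The function names and the dictionary input are hypothetical. The sketch assumes a tree-shaped network, so braided channels or loops would need extra handling.

```python
from collections import deque


def strahler_order(downstream):
    """Strahler order for a tree-shaped network.

    downstream: dict reach_id -> id of the reach it flows into (None at outlet).
    """
    upstream = {r: [] for r in downstream}
    for r, d in downstream.items():
        if d is not None:
            upstream[d].append(r)
    pending = {r: len(ups) for r, ups in upstream.items()}
    order = {}
    ready = deque(r for r, n in pending.items() if n == 0)  # headwaters
    while ready:
        r = ready.popleft()
        incoming = [order[u] for u in upstream[r]]
        if not incoming:
            order[r] = 1  # headwater reach
        else:
            top = max(incoming)
            # order increases only where two or more tributaries share the max
            order[r] = top + 1 if incoming.count(top) >= 2 else top
        d = downstream[r]
        if d is not None:
            pending[d] -= 1
            if pending[d] == 0:  # all tributaries of d have an order
                ready.append(d)
    return order


def first_order_reaches(downstream):
    """IDs of reaches with Strahler order 1 (no upstream reach)."""
    return {r for r, o in strahler_order(downstream).items() if o == 1}


net = {"a": "c", "b": "c", "c": "e", "d": "e", "e": None}
print(sorted(first_order_reaches(net)))  # ['a', 'b', 'd']
```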

Format data

  • format_RiverATLAS.py: format RiverATLAS river network.
    • Intersect RiverATLAS reaches with lakes
    • Spatially associate RiverATLAS reaches with HydroBASINS level 05
    • Export the attribute table of RiverATLAS, including both the attributes included in RiverATLAS 1.0 and the new hydro-environmental attributes computed for this study.
  • format_stations.py: subselect, format, and spatially join gauging stations with RiverATLAS river network.
    • Join GRDC stations to nearest river reach in RiverATLAS.
    • Manually check and correct the location of all GRDC stations that are more than 50 m from their associated river reach or whose reported drainage area differs by more than 10% from that of the reach.
    • Subset GSIM stations according to the criteria outlined in this website’s tab and the article’s supplementary information.
    • Snap GSIM stations to the nearest RiverATLAS river reach within 200 m
    • Manually check and correct the location of every GSIM station.
    • Flag all GRDC and GSIM stations within 3 km from a coastline
  • format_FROndeEaudata.py: format on-the-ground visual observations of flow intermittence from the Observatoire National Des Etiages (ONDE) across mainland France (for more explanations of the processing approach, see Section VIB in the Supplementary Information of the article).
    • Spatially join every point (site) in the ONDE network to the nearest river reach in the Carthage river network
    • Manually check and correct the location of all sites (based on site name, ID attribute, initial location)
    • Automatically join river reaches in the Carthage network with ONDE sites to RiverATLAS river network (see detailed process in Supplementary Information of the article)
    • Manually check and correct the location of each ONDE site/association between ONDE sites and RiverATLAS network.
    • Extract site attributes: how far down the RiverATLAS reach the site is as a percentage of the reach length, drainage area and discharge at the ONDE site and at the pourpoint (downstream end) of the corresponding RiverATLAS reach.
  • format_PNWdata.py:
    • Subset points to keep the same ones as those used by Jaeger et al. (2019), to which we added valid observations from before 2004.
    • Spatially join observation points to NHDPlus high resolution
    • Extract drainage area for each observation point
    • Spatially join points to closest RiverATLAS river reaches
    • Remove points that are over 500 m away from a RiverATLAS reach, that have a drainage area < 10 km2, or whose drainage area differs considerably from that of the nearest river reach.
    • Manually check and correct the location of most sites (see criteria in Supplementary Information).
    • Extract site attributes: how far down the RiverATLAS reach the site is as a percentage of the reach length, drainage area and discharge at the site and at the pourpoint (downstream end) of the corresponding RiverATLAS reach.
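Several steps above rest on projecting a point onto a reach polyline. These include snapping stations or sites to the nearest reach, flagging those more than 50 m away or with a drainage-area mismatch above 10%, and expressing how far down a reach a site lies as a percentage of reach length. The sketch below is a planar, pure-Python illustration, not the repository's code. It assumes coordinates in a projected CRS with metre units and reach vertices ordered from upstream to downstream. Measuring the drainage-area difference relative to the reach's drainage area is also an assumption, and both function names are hypothetical.

```python
import math


def locate_on_reach(point, reach):
    """Closest position of `point` on a reach polyline.

    point: (x, y); reach: list of (x, y) vertices ordered upstream to
    downstream, in a projected CRS with metre units.
    Returns (distance_m, percent_along), where percent_along is measured
    from the upstream end as a percentage of total reach length.
    """
    px, py = point
    best_dist, best_along, travelled = math.inf, 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(reach, reach[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0:
            continue  # skip duplicate vertices
        # orthogonal projection onto the segment, clamped to its end points
        t = ((px - x1) * dx + (py - y1) * dy) / seg_len ** 2
        t = max(0.0, min(1.0, t))
        dist = math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))
        if dist < best_dist:
            best_dist, best_along = dist, travelled + t * seg_len
        travelled += seg_len
    if travelled == 0:
        raise ValueError("reach has zero length")
    return best_dist, 100.0 * best_along / travelled


def needs_manual_check(dist_m, site_area_km2, reach_area_km2,
                       max_dist_m=50.0, max_rel_diff=0.10):
    """Flag a snapped site whose distance or drainage-area mismatch is too large."""
    rel_diff = abs(site_area_km2 - reach_area_km2) / reach_area_km2
    return dist_m > max_dist_m or rel_diff > max_rel_diff


reach = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
dist, pct = locate_on_reach((50.0, 10.0), reach)
print(dist, pct)                               # 10.0 25.0
print(needs_manual_check(dist, 105.0, 100.0))  # False
```

The thresholds are parameters so that the same check can use the 50 m GRDC criterion, the 200 m GSIM snapping distance, or the 500 m PNW cut-off.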

Workflow summary

Execute:
1. scripts for downloading data in any order
2. format_RiverATLAS.py
3. format_stations.py
4. format_FROndeEaudata.py
5. format_PNWdata.py