Last updated: 2021-01-08

Checks: 7 passed, 0 failed

Knit directory: globalIRmap/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20200414) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 10e5aa9. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .drake/config/
    Ignored:    .drake/data/
    Ignored:    .drake/drake/
    Ignored:    .drake/keys/
    Ignored:    .drake/scratch/
    Ignored:    renv/library/
    Ignored:    renv/staging/

Untracked files:
    Untracked:  .Rbuildignore
    Untracked:  Compare_models_20201026.Rmd
    Untracked:  Rplots.pdf
    Untracked:  figtabres.docx
    Untracked:  figtabres_20201220_1.docx
    Untracked:  log/
    Untracked:  schema.ini
    Untracked:  shinyapp/globalIRmap_gaugesel/rsconnect/
    Untracked:  tabs_quick.Rmd
    Untracked:  tabs_quick.docx
    Untracked:  tabs_quick.html
    Untracked:  test.html

Unstaged changes:
    Modified:   IntermittentAnalysis_MasterScript_reproduced.R
    Modified:   R/IRmapping_functions.R
    Modified:   R/IRmapping_plan.R
    Modified:   _drake.R
    Modified:   analysis/_site.yml
    Modified:   drake_cache.csv
    Modified:   globalIRmap.Rproj
    Modified:   shinyapp/globalIRmap_gaugesel/app.R
    Modified:   shinyapp/globalIRmap_gaugesel/www/shinygdat.fst

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/methods_refdisdat.Rmd) and HTML (docs/methods_refdisdat.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author    Date        Message
Rmd   10e5aa9  messamat  2021-01-08  wflow_publish("analysis/methods_refdisdat.Rmd")
html  844dd6b  messamat  2021-01-08  Build site.
Rmd   5d228a8  messamat  2021-01-08  wflow_publish("analysis/methods_refdisdat.Rmd")
html  8a77149  messamat  2021-01-08  Build site.
Rmd   97a066a  messamat  2021-01-08  wflow_publish("analysis/methods_refdisdat.Rmd")
html  47239d1  messamat  2021-01-08  Build site.
Rmd   0b9b22a  messamat  2021-01-08  wflow_publish("analysis/methods_refdisdat.Rmd")
html  24c68da  messamat  2021-01-08  Build site.
Rmd   3647cde  messamat  2021-01-08  wflow_publish("analysis/methods_refdisdat.Rmd")
Rmd   1da3ce4  messamat  2021-01-08  Continue developing app
html  f7011f5  messamat  2021-01-06  Build site.
Rmd   bb0df37  messamat  2021-01-06  Start formatting gauge selection for better display
html  9b48bde  messamat  2021-01-06  Build site.
Rmd   f1d9dcf  messamat  2021-01-06  Start building up workflowr website, start incorporating mandrake (but wait as very unstable still), plan gauge selection documentation
html  f1d9dcf  messamat  2021-01-06  Start building up workflowr website, start incorporating mandrake (but wait as very unstable still), plan gauge selection documentation

Source datasets

Two streamflow gauging station datasets served as the source of training and cross-validation data for the study models: the World Meteorological Organization Global Runoff Data Centre (GRDC) database (n ≈ 10,000) and a complementary subset of the Global Streamflow Indices and Metadata archive (GSIM, n ≈ 25,000), a compilation of twelve free-to-access national and international streamflow gauging station databases.
Whereas the GRDC offers daily discharge values for most stations, GSIM contains only time-series summary indices computed at yearly, seasonal and monthly resolutions (calculated from daily records whose open-access release is restricted for some of the compiled data sources). We therefore used the GRDC database as the core of our training/testing set and complemented it with a subset of GSIM streamflow gauging stations.

A GSIM station was included only if it met all of the following criteria:
1. it was not already part of the GRDC database;
2. it included auxiliary information on the drainage area of the monitored reach (needed to reliably associate it with RiverATLAS); and
3. it was located either (a) on an intermittent river or ephemeral stream (IRES) or (b) in a river basin that did not already contain a GRDC station (assessed based on level-5 sub-basins of the global BasinATLAS database⁵², average sub-basin area = 2.9 × 10⁴ km²).
We applied these GSIM selection criteria to balance the relative number of non-perennial vs. perennial records and the spatial distribution of stations in the model training dataset.
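In short, a GSIM station is kept if it is not a GRDC duplicate, reports a drainage area, and is either on an IRES or in a level-5 sub-basin that holds no GRDC station yet. The Python sketch below illustrates only that boolean logic. The `GsimStation` record, its field names and the `grdc_basins` set (IDs of level-5 sub-basins containing at least one GRDC station) are hypothetical stand-ins, not the project's actual data structures; the project itself is implemented in R.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class GsimStation:
    """Hypothetical, simplified record for one GSIM gauging station."""
    station_id: str
    in_grdc: bool                        # hypothetical flag: station also exists in GRDC
    drainage_area_km2: Optional[float]   # None when no drainage area is reported
    on_ires: bool                        # hypothetical flag: station sits on an IRES
    basin_lev5_id: int                   # BasinATLAS level-5 sub-basin containing the station


def select_gsim(stations: Iterable[GsimStation], grdc_basins: Set[int]) -> List[GsimStation]:
    """Keep the GSIM stations that satisfy all of the inclusion criteria."""
    selected = []
    for s in stations:
        if s.in_grdc:                      # already part of the GRDC database
            continue
        if s.drainage_area_km2 is None:    # cannot be reliably matched to RiverATLAS
            continue
        # Either on an IRES, or in a sub-basin without any GRDC station.
        if s.on_ires or s.basin_lev5_id not in grdc_basins:
            selected.append(s)
    return selected


# Usage with made-up stations; only sub-basin 1 already contains a GRDC station.
stations = [
    GsimStation("A", False, 120.0, False, 1),  # not on an IRES, basin already has GRDC
    GsimStation("B", False, 80.0, True, 1),    # on an IRES
    GsimStation("C", False, None, True, 2),    # no drainage area reported
    GsimStation("D", True, 50.0, True, 3),     # duplicate of a GRDC station
    GsimStation("E", False, 300.0, False, 4),  # basin without any GRDC station
]
print([s.station_id for s in select_gsim(stations, grdc_basins={1})])  # ['B', 'E']
```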

Processing and selecting gauging stations data

Each station in the combined dataset was geographically associated with a reach in the RiverATLAS stream network and every discharge time series was quality-checked through statistical and manual outlier detection (see Supplementary Information B1 for details on these procedures).

Non-perennial gauging stations were only included in the dataset if they were free of anomalous zero-flow values (e.g. from instrument malfunction, gauge freezing or tidal flow reversal). We also excluded stations whose streamflow was potentially dominated by reservoir outflow regulation (i.e. with a degree of regulation > 50% or whose discharge time series exhibited an obvious alteration from natural flow permanence, see Supplementary Information B1), as flow-regulating structures may change the flow class of a river from perennial to non-perennial or vice versa, depending on their mode and rules of operation.

We further narrowed our selection by retaining only gauging stations with streamflow time series spanning at least 10 years. Years with more than 20 days of missing records were excluded, both when calculating this criterion and in all subsequent analyses.

Finally, we classified stations as non-perennial if their recorded discharge dropped to zero at least one day per year on average over the years of record, and as perennial otherwise. Non-perennial stations whose record nonetheless contained a period of 20 consecutive valid years (i.e. years with ≤ 20 missing days) without a single zero-flow day were also excluded: they were deemed either to have experienced a shift in flow intermittence class (regardless of the direction of the shift) or to have ceased to flow due to exceptional drought conditions.
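The record-length, classification and class-shift rules above can be sketched as follows, assuming each station's daily record has already been summarised per calendar year into missing days and zero-flow days (the `YearSummary` record is hypothetical). Two interpretations are assumptions, not taken from the source: "at least 10 years" is read as at least 10 valid years, and "20 consecutive valid years" is read as 20 consecutive calendar years that are all valid, so a discarded year breaks the run. The anomalous-zero and regulation filters are not modelled here.

```python
from dataclasses import dataclass
from typing import List

MAX_MISSING_DAYS = 20   # years with more missing days are discarded
MIN_VALID_YEARS = 10    # minimum record length, counted in valid years (assumption)
SHIFT_RUN_YEARS = 20    # zero-free run length that triggers exclusion


@dataclass
class YearSummary:
    """Hypothetical per-calendar-year summary of a daily discharge record."""
    year: int
    missing_days: int
    zero_flow_days: int


def valid_years(record: List[YearSummary]) -> List[YearSummary]:
    """Drop years with more than MAX_MISSING_DAYS missing days; sort by year."""
    return sorted((y for y in record if y.missing_days <= MAX_MISSING_DAYS),
                  key=lambda y: y.year)


def longest_zero_free_run(years: List[YearSummary]) -> int:
    """Longest run of consecutive calendar years, all valid, without a zero-flow day.

    Expects only valid years sorted by year, so a gap in year numbers
    (i.e. a discarded year) breaks the run.
    """
    best = run = 0
    prev_year = None
    for y in years:
        if y.zero_flow_days > 0:
            run = 0
        elif prev_year is not None and y.year == prev_year + 1:
            run += 1
        else:
            run = 1
        prev_year = y.year
        best = max(best, run)
    return best


def classify_station(record: List[YearSummary]) -> str:
    years = valid_years(record)
    if len(years) < MIN_VALID_YEARS:
        return "excluded: record too short"
    # Average number of zero-flow days per valid year.
    mean_zero_days = sum(y.zero_flow_days for y in years) / len(years)
    if mean_zero_days < 1:
        return "perennial"
    if longest_zero_free_run(years) >= SHIFT_RUN_YEARS:
        return "excluded: possible class shift"
    return "non-perennial"


# Usage: 5 years with 20 zero-flow days each, then 25 years without any zero-flow day.
record = ([YearSummary(1950 + i, 0, 20) for i in range(5)]
          + [YearSummary(1955 + i, 0, 0) for i in range(25)])
print(classify_station(record))  # excluded: possible class shift
```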

Based on these selection criteria, the training dataset contained data for 3,967 perennial and 1,388 non-perennial river reaches across all continents except Antarctica, with on average 45 and 34 years of daily streamflow data, respectively.

Map of gauging stations included or excluded from the analysis

Loading the map can take several minutes; please wait and read the instructions below.

What does this show?

This map shows all gauging stations worldwide that were reliably matched to the RiverATLAS global digital river network.

What do the colors mean?

  • Blue-colored points show perennial stations (< 1 zero-flow day per year on average across the record)

  • Red-colored points show non-perennial stations (≥ 1 zero-flow day per year on average across the record)

  • Semi-transparent points show stations that were retained for the training and testing of predictive models.

  • Opaque points show stations that were excluded from further analysis for one of the reasons explained in the previous sections.

How do I use this map?

  • To pan, hold the left button and move the mouse.
  • To zoom in or out, use your mouse wheel (or the equivalent on a touchpad) or press the + and - symbols on the upper left of the map.
  • To add or remove a layer, hover your mouse over the stacked squares in the lower right of the map and check or uncheck the corresponding box. Any combination of blue-red and opaque-transparent points can be added or removed.
  • To get information on a station, hover with your mouse over the corresponding point:
    • the first line shows which dataset this record comes from (GRDC or GSIM) and the corresponding unique identifier for this station.
    • the second line shows whether this station was included in the analysis (“kept”), manually “inspected” for erroneous daily discharge values, or altogether “removed” from further analysis.
    • the third line gives a brief explanation as to why the station was removed.
  • To see a hydrograph for a station, left-click the corresponding point. This feature is only available for stations with at least 10 years of valid data. Two types of graphs may be displayed:
    • For a GRDC station: the graph shows the time series of daily streamflow values for that station, excluding calendar years with more than 20 missing daily records. The y-axis is square-root transformed. Individual points show daily values, blue lines link daily values (which may result in unusual patterns due to missing years), green points are non-zero daily values statistically flagged as potential outliers (see flagging process in Supplementary Information B1b), red points are zero-flow daily values, and black points are zero-flow values flagged as potential outliers.
    • For a GSIM station: daily streamflow records from GSIM stations are unavailable. Therefore, the graph shows the monthly mean discharge (blue points), the ±2 SD range (light blue background shading), and the minimum and maximum monthly discharge (black points), excluding calendar years with more than 20 missing daily records. The y-axis is square-root transformed. Red points show minimum monthly discharge values equal to 0; purple points show months for which all daily discharge values are equal to 0.
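The point colours described in the two hydrograph legends can be expressed as small lookup rules. The Python sketch below is illustrative only; the function names and inputs are hypothetical, and three details are assumptions rather than documented behaviour: unflagged non-zero GRDC values get a placeholder "default" colour because the legend assigns them none, a GSIM month is treated as entirely zero-flow when its maximum is 0, and the lower edge of the ±2 SD band is clamped at 0 so it stays defined on a square-root axis. When a month is entirely zero-flow, purple is given precedence over red.

```python
from typing import Optional, Tuple


def grdc_point_colour(q: float, flagged_outlier: bool) -> str:
    """Colour of one GRDC daily value, following the legend above."""
    if q == 0:
        return "black" if flagged_outlier else "red"
    # Unflagged non-zero values have no legend colour; "default" is a placeholder.
    return "green" if flagged_outlier else "default"


def gsim_month_summary(mean: float, sd: float, monthly_min: float,
                       monthly_max: float) -> Tuple[Tuple[float, float], Optional[str]]:
    """Return the ±2 SD band and the flag colour (if any) for one GSIM month."""
    # Lower edge clamped at 0 (assumption) so it can be drawn on a square-root axis.
    band = (max(mean - 2 * sd, 0.0), mean + 2 * sd)
    if monthly_max == 0:        # maximum is zero, so every daily value was zero
        colour = "purple"
    elif monthly_min == 0:      # minimum monthly discharge equal to 0
        colour = "red"
    else:
        colour = None
    return band, colour


print(grdc_point_colour(0.0, False))           # red
print(gsim_month_summary(1.0, 2.0, 0.0, 5.0))  # ((0.0, 5.0), 'red')
```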

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6     rstudioapi_0.11  whisker_0.4      knitr_1.29      
 [5] magrittr_1.5     R6_2.4.1         rlang_0.4.7      highr_0.8       
 [9] stringr_1.4.0    tools_4.0.2      xfun_0.17        git2r_0.27.1    
[13] htmltools_0.5.0  ellipsis_0.3.0   rprojroot_1.3-2  yaml_2.2.1      
[17] digest_0.6.25    tibble_3.0.1     lifecycle_0.2.0  crayon_1.3.4    
[21] later_1.0.0      vctrs_0.3.4      promises_1.1.0   fs_1.5.0        
[25] glue_1.4.0       evaluate_0.14    rmarkdown_2.3    stringi_1.4.6   
[29] compiler_4.0.2   pillar_1.4.4     backports_1.1.10 httpuv_1.5.4    
[33] renv_0.9.3       pkgconfig_2.0.3