Last updated: 2021-01-08

Checks: 7 passed, 0 failed

Knit directory: globalIRmap/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20200414) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
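
As a small illustration of why the seed matters, the sketch below repeats the same random draw after resetting the seed reported above. The `sample()` call is only a stand-in for whatever random step an analysis might contain; it is not taken from this analysis.

```r
# Resetting the seed before a random operation reproduces the same result.
set.seed(20200414)
first_draw <- sample(100, size = 5)

set.seed(20200414)
second_draw <- sample(100, size = 5)

# Both draws start from the same seed, so they are identical.
same_draw <- identical(first_draw, second_draw)
print(same_draw)
```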

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 0b9b22a. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .drake/config/
    Ignored:    .drake/data/
    Ignored:    .drake/drake/
    Ignored:    .drake/keys/
    Ignored:    .drake/scratch/
    Ignored:    renv/library/
    Ignored:    renv/staging/

Untracked files:
    Untracked:  .Rbuildignore
    Untracked:  Compare_models_20201026.Rmd
    Untracked:  Rplots.pdf
    Untracked:  figtabres.docx
    Untracked:  figtabres_20201220_1.docx
    Untracked:  log/
    Untracked:  schema.ini
    Untracked:  shinyapp/globalIRmap_gaugesel/rsconnect/
    Untracked:  tabs_quick.Rmd
    Untracked:  tabs_quick.docx
    Untracked:  tabs_quick.html
    Untracked:  test.html

Unstaged changes:
    Modified:   IntermittentAnalysis_MasterScript_reproduced.R
    Modified:   R/IRmapping_functions.R
    Modified:   R/IRmapping_plan.R
    Modified:   _drake.R
    Modified:   drake_cache.csv
    Modified:   globalIRmap.Rproj
    Modified:   shinyapp/globalIRmap_gaugesel/app.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/methods_refdisdat.Rmd) and HTML (docs/methods_refdisdat.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author    Date        Message
Rmd   0b9b22a  messamat  2021-01-08  wflow_publish("analysis/methods_refdisdat.Rmd")
html  24c68da  messamat  2021-01-08  Build site.
Rmd   3647cde  messamat  2021-01-08  wflow_publish("analysis/methods_refdisdat.Rmd")
Rmd   1da3ce4  messamat  2021-01-08  Continue developing app
html  f7011f5  messamat  2021-01-06  Build site.
Rmd   bb0df37  messamat  2021-01-06  Start formatting gauge selection for better display
html  9b48bde  messamat  2021-01-06  Build site.
Rmd   f1d9dcf  messamat  2021-01-06  Start building up workflowr website, start incorporating mandrake (but wait as very unstable still), plan gauge selection documentation
html  f1d9dcf  messamat  2021-01-06  Start building up workflowr website, start incorporating mandrake (but wait as very unstable still), plan gauge selection documentation

Source datasets

Two streamflow gauging station datasets were used as the source of training and cross-validation data for study models — the World Meteorological Organization Global Runoff Data Centre (GRDC) database (n ≈ 10,000) and a complementary subset of the Global Streamflow Indices and Metadata archive (GSIM, n ≈ 25,000), a compilation of twelve free-to-access national and international streamflow gauging station databases.
Whereas the GRDC offers daily water discharge values for most stations, GSIM only contains time series summary indices computed at the yearly, seasonal and monthly resolution (calculated from daily records whose open-access release is restricted for some of the compiled data sources). Therefore, we used the GRDC database as the core of our training/testing set and complemented it with a subset of streamflow gauging stations from GSIM.

A GSIM station was included only if:
i) it was not already part of the GRDC database,
ii) it included auxiliary information on the drainage area of the monitored reach (for reliably associating it to RiverATLAS), and if it was located either
iii) on an IRES or
iv) in a river basin which did not already contain a GRDC station (assessed based on level 5 sub-basins of the global BasinATLAS database [52], average sub-basin area = 2.9 × 10⁴ km²).
We applied the described GSIM selection criteria to balance the relative amount of non-perennial vs. perennial records and the spatial distribution of stations in the model training dataset.
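
The four GSIM inclusion criteria above can be read as a single logical filter. The following is a minimal sketch of that reading, not the project's actual code: the function name `select_gsim` and the column names `in_grdc`, `drainage_area_km2`, `on_ires` and `basin_has_grdc` are hypothetical, and the toy table is invented purely for illustration.

```r
# Hypothetical sketch of the GSIM inclusion rule; all names are assumptions.
select_gsim <- function(gsim) {
  keep <- !gsim$in_grdc &                    # (i) not already in GRDC
    !is.na(gsim$drainage_area_km2) &         # (ii) drainage area is reported
    (gsim$on_ires | !gsim$basin_has_grdc)    # (iii) on an IRES, or (iv) basin has no GRDC station
  gsim[keep, , drop = FALSE]
}

# Invented toy stations, one per outcome of the rule.
gsim <- data.frame(
  id                = c("a", "b", "c", "d", "e"),
  in_grdc           = c(TRUE, FALSE, FALSE, FALSE, FALSE),
  drainage_area_km2 = c(100, NA, 250, 300, 50),
  on_ires           = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  basin_has_grdc    = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors  = FALSE
)

selected <- select_gsim(gsim)
print(selected$id)
```

In this toy table, station "a" is dropped because it is already in GRDC, "b" because it lacks a drainage area, and "e" because it is neither on an IRES nor in a basin without a GRDC station; "c" and "d" are retained.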

Each station in the combined dataset was geographically associated with a reach in the RiverATLAS stream network, and every discharge time series was quality-checked through statistical and manual outlier detection (see Supplementary Information B1 for details on these procedures). Non-perennial gauging stations were only included in the dataset if they were free of anomalous zero-flow values (e.g. from instrument malfunction, gauge freezing, tidal flow reversal). We also excluded stations whose streamflow was potentially dominated by reservoir outflow regulation (i.e. with a degree of regulation > 50% or whose discharge time series exhibited an obvious alteration from natural flow permanence, see Supplementary Information B1), as flow-regulating structures may change the flow class of a river either from perennial to non-perennial or vice versa, depending on their mode and rules of operation. We further narrowed our selection by retaining only gauging stations with streamflow time series spanning at least 10 years; years with more than 20 days of missing records were excluded both from the calculation of this criterion and from subsequent analysis.

Finally, we classified stations as non-perennial if their recorded discharge dropped to zero at least one day per year on average over the years of record, and as perennial otherwise. Stations that met this non-perennial threshold but whose record contained, anywhere, 20 consecutive valid years (those with ≤ 20 missing days) without a single zero-flow day were deemed either to have experienced a shift in flow intermittence class (regardless of the direction of the shift) or to have ceased to flow only because of exceptional drought conditions, and were also excluded.
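
The record-length and classification rules above can be sketched as a small function operating on per-year summaries. This is an illustrative reading under stated assumptions, not the project's implementation: the names `classify_station` and `make_years`, the columns `year`, `n_missing` and `n_zero`, and the returned labels are all hypothetical, and "consecutive" is interpreted here as consecutive within the sequence of valid years.

```r
# Hypothetical sketch; function, column and label names are assumptions.
classify_station <- function(yearly, min_years = 10, max_missing = 20,
                             shift_run_years = 20) {
  # Keep only valid years (at most `max_missing` days without a record).
  valid <- yearly[yearly$n_missing <= max_missing, , drop = FALSE]
  valid <- valid[order(valid$year), , drop = FALSE]
  if (nrow(valid) < min_years) return("excluded: record too short")

  # Perennial unless flow ceased at least one day per year on average.
  if (mean(valid$n_zero) < 1) return("perennial")

  # Longest run of valid years without any zero-flow day.
  runs <- rle(valid$n_zero == 0)
  longest_flowing_run <- max(c(0, runs$lengths[runs$values]))
  if (longest_flowing_run >= shift_run_years) {
    return("excluded: possible class shift or exceptional drought")
  }
  "non-perennial"
}

# Helper building invented yearly summaries for the examples below.
make_years <- function(n_zero, n_missing = 0, start = 1980) {
  data.frame(year = seq(start, length.out = length(n_zero)),
             n_missing = n_missing, n_zero = n_zero)
}

always_flowing <- classify_station(make_years(rep(0, 12)))
dries_yearly   <- classify_station(make_years(rep(30, 12)))
class_shift    <- classify_station(make_years(c(rep(100, 5), rep(0, 20))))
too_short      <- classify_station(
  make_years(rep(0, 12), n_missing = c(rep(30, 5), rep(0, 7)))
)
```

The third example averages 20 zero-flow days per year yet contains 20 consecutive valid years with none, so it is excluded rather than labelled non-perennial; the fourth has only 7 valid years and fails the 10-year requirement.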

Based on these selection criteria, the training dataset contained data for 3,967 perennial river reaches and for 1,388 non-perennial reaches, with 45 and 34 years of daily streamflow data on average, respectively, across all continents (except Antarctica).

Map of selected and discarded gauges


sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6     rstudioapi_0.11  whisker_0.4      knitr_1.29      
 [5] magrittr_1.5     R6_2.4.1         rlang_0.4.7      highr_0.8       
 [9] stringr_1.4.0    tools_4.0.2      xfun_0.17        git2r_0.27.1    
[13] htmltools_0.5.0  ellipsis_0.3.0   rprojroot_1.3-2  yaml_2.2.1      
[17] digest_0.6.25    tibble_3.0.1     lifecycle_0.2.0  crayon_1.3.4    
[21] later_1.0.0      vctrs_0.3.4      promises_1.1.0   fs_1.5.0        
[25] glue_1.4.0       evaluate_0.14    rmarkdown_2.3    stringi_1.4.6   
[29] compiler_4.0.2   pillar_1.4.4     backports_1.1.10 httpuv_1.5.4    
[33] renv_0.9.3       pkgconfig_2.0.3