Last updated: 2021-01-06
Checks: 7 passed, 0 failed
Knit directory: globalIRmap/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20200414) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version bb0df37. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: .drake/config/
Ignored: .drake/data/
Ignored: .drake/drake/
Ignored: .drake/keys/
Ignored: .drake/scratch/
Ignored: renv/library/
Ignored: renv/staging/
Untracked files:
Untracked: .Rbuildignore
Untracked: Compare_models_20201026.Rmd
Untracked: figtabres.docx
Untracked: figtabres_20201220_1.docx
Untracked: log/
Untracked: schema.ini
Untracked: tabs_quick.Rmd
Untracked: tabs_quick.docx
Untracked: tabs_quick.html
Untracked: test.html
Unstaged changes:
Modified: IntermittentAnalysis_MasterScript_reproduced.R
Modified: R/IRmapping_functions.R
Modified: _drake.R
Modified: globalIRmap.Rproj
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/methods_refdisdat.Rmd) and HTML (docs/methods_refdisdat.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | bb0df37 | messamat | 2021-01-06 | Start formatting gauge selection for better display |
html | 9b48bde | messamat | 2021-01-06 | Build site. |
Rmd | f1d9dcf | messamat | 2021-01-06 | Start building up workflowr website, start incorporating mandrake (but wait as very unstable still), plan gauge selection documentation |
html | f1d9dcf | messamat | 2021-01-06 | Start building up workflowr website, start incorporating mandrake (but wait as very unstable still), plan gauge selection documentation |
Two streamflow gauging station datasets were used as the source of training and cross-validation data for study models — the World Meteorological Organization Global Runoff Data Centre (GRDC) database (n ≈ 10,000) and a complementary subset of the Global Streamflow Indices and Metadata archive (GSIM, n ≈ 25,000), a compilation of twelve free-to-access national and international streamflow gauging station databases.
Whereas the GRDC offers daily water discharge values for most stations, GSIM only contains time series summary indices computed at the yearly, seasonal and monthly resolution (calculated from daily records whose open-access release is restricted for some of the compiled data sources). Therefore, we used the GRDC database as the core of our training/testing set and complemented it with a subset of streamflow gauging stations from GSIM.
A GSIM station was included only if:
i) it was not already part of the GRDC database,
ii) it included auxiliary information on the drainage area of the monitored reach (for reliably associating it to RiverATLAS), and if it was located either
iii) on an IRES or
iv) in a river basin which did not already contain a GRDC station (assessed based on level 5 sub-basins of the global BasinATLAS database (ref. 52), average sub-basin area = 2.9 × 10⁴ km²).
We applied the described GSIM selection criteria to balance the relative amount of non-perennial vs. perennial records and the spatial distribution of stations in the model training dataset.
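As a rough illustration of how criteria i) to iv) combine, the base-R sketch below filters a made-up station table. The column names (`in_grdc`, `drainage_area`, `is_ires`, `basin_has_grdc`), the station IDs and the helper `select_gsim()` are all hypothetical: they are not GSIM fields or functions from the project code, which relies on data.table.

```r
# Hypothetical station table; every column name and ID is illustrative only.
gsim <- data.frame(
  gsim_no        = c("AA_0000001", "AA_0000002", "AA_0000003",
                     "AA_0000004", "AA_0000005"),
  in_grdc        = c(TRUE, FALSE, FALSE, FALSE, FALSE), # already in GRDC?
  drainage_area  = c(120, NA, 450, 80, 300),            # km2, NA if missing
  is_ires        = c(TRUE, TRUE, TRUE, FALSE, FALSE),   # non-perennial reach?
  basin_has_grdc = c(TRUE, TRUE, TRUE, FALSE, TRUE),    # level-5 sub-basin
                                                        # already has a GRDC gauge?
  stringsAsFactors = FALSE
)

# Keep a station only if (i) and (ii) hold, and either (iii) or (iv) holds.
select_gsim <- function(stations) {
  keep <- !stations$in_grdc &                        # (i) not a GRDC station
    !is.na(stations$drainage_area) &                 # (ii) drainage area known
    (stations$is_ires | !stations$basin_has_grdc)    # (iii) IRES or (iv) new basin
  stations[keep, ]
}

selected <- select_gsim(gsim)
print(selected$gsim_no)
```

In this toy table, only the third station (an IRES with a known drainage area) and the fourth (a perennial station in a sub-basin without a GRDC gauge) pass the filter.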
Each station in the combined dataset was geographically associated with a reach in the RiverATLAS stream network and every discharge time series was quality-checked through statistical and manual outlier detection (see Supplementary Information B1 for details on these procedures). Non-perennial gauging stations were only included in the dataset if they were free of anomalous zero-flow values (e.g. from instrument malfunction, gauge freezing, tidal flow reversal). We also excluded stations whose streamflow was potentially dominated by reservoir outflow regulation (i.e. with a degree of regulation > 50% or whose discharge time series exhibited an obvious alteration from natural flow permanence, see Supplementary Information B1), as flow regulating structures may change the flow class of a river either from perennial to non-perennial or vice-versa depending on their mode and rules of operation. We further narrowed our selection by adding only gauging stations with streamflow time series spanning at least 10 years — excluding years with more than 20 days of missing records for the calculation of this criterion and in subsequent analysis. Finally, we classified stations as non-perennial if their recorded discharge dropped to zero at least one day per year on average over the years of record, and as perennial otherwise. Stations with at least one zero-flow day per year on average (i.e. non-perennial) but without a zero-flow day during 20 consecutive valid years of data (those with ≤ 20 missing days), anywhere in their record, were deemed either to have experienced a shift in flow intermittence class (regardless of the direction of the shift) or to have ceased to flow due to exceptional conditions of drought and were also excluded.
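The record-length and classification rules above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the project's implementation: `classify_station()` is a hypothetical helper that takes, for one station, the number of missing days and the number of zero-flow days in each year of record.

```r
# Hypothetical helper: classify one station from yearly summaries.
# missing_days: number of missing daily records in each year
# zero_days:    number of zero-flow days in each year
classify_station <- function(missing_days, zero_days,
                             max_missing = 20, min_years = 10) {
  stopifnot(length(missing_days) == length(zero_days))
  # A year is valid if it has at most `max_missing` missing days.
  valid <- missing_days <= max_missing
  # Stations with fewer than `min_years` valid years are not classified.
  if (sum(valid) < min_years) return(NA_character_)
  # Non-perennial: at least one zero-flow day per year on average.
  if (mean(zero_days[valid]) >= 1) "non-perennial" else "perennial"
}

# 12 complete years, zero flow every other year (1.5 days per year on average)
a <- classify_station(rep(0, 12), rep(c(0, 3), 6))
# 12 complete years, a single year with 5 zero-flow days (5/12 days per year)
b <- classify_station(rep(0, 12), c(5, rep(0, 11)))
# Only 8 valid years: too short to classify
c_short <- classify_station(c(rep(0, 8), rep(30, 4)), rep(2, 12))
print(c(a, b, c_short))
```

Invalid years are dropped both when counting the length of record and when averaging zero-flow days, which mirrors the statement that such years are excluded "for the calculation of this criterion and in subsequent analysis".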
Based on these selection criteria, the training dataset contained data for 3,967 perennial river reaches and for 1,388 non-perennial reaches, with 45 and 34 years of daily streamflow data on average, respectively, across all continents (except Antarctica).
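The exclusion of stations that average at least one zero-flow day per year but show no zero-flow day over some 20 consecutive valid years amounts to a moving-window check. The sketch below is one possible way to express it in base R; `stable_intermittence()` is a hypothetical helper, and treating records shorter than the window as stable is an assumption made here for simplicity.

```r
# Hypothetical helper: TRUE if every run of `window` consecutive valid years
# contains at least one year with a zero-flow day.
# zero_days: zero-flow days per valid year, in chronological order.
stable_intermittence <- function(zero_days, window = 20) {
  n <- length(zero_days)
  # Assumption: with fewer than `window` years no full window can fail.
  if (n < window) return(TRUE)
  has_zero <- zero_days > 0
  # Count zero-flow years in each window using cumulative sums.
  cs <- c(0, cumsum(has_zero))
  window_counts <- cs[(window + 1):(n + 1)] - cs[1:(n - window + 1)]
  all(window_counts > 0)
}

# 25 years without zero flow, then 15 years with 30 zero-flow days each:
# non-perennial on average, but the first 20-year window has no zero flow.
shifted <- c(rep(0, 25), rep(30, 15))
# Zero flow every third year across 30 years: stable intermittence.
steady <- rep(c(0, 0, 5), 10)

print(c(mean(shifted) >= 1, stable_intermittence(shifted)))
print(c(mean(steady) >= 1, stable_intermittence(steady)))
```

The first series would be excluded (it meets the averaging rule but fails the window check), whereas the second would be kept as non-perennial.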
Create data.table with gauge id, characteristics, X, Y, reason for removal, and embedded image
in_GRDCgaugestats <- GRDCgaugestats
in_GSIMgaugestats <- GSIMgaugestats
yearthresh <- 1800
in_gaugep <- gaugep
analyzemerge_gaugeir <- function(in_GRDCgaugestats, in_GSIMgaugestats, yearthresh,
in_gaugep, inp_resdir, plotseries = FALSE) {
### Analyze GSIM data ########################################################
GSIMstatsdt <- rbindlist(in_GSIMgaugestats)
#------ Remove stations with unstable intermittent flow regime
#Remove those which have at least one zero-flow day per year on average but
#no zero-flow day within some 20-year window
GSIMtoremove_unstableIR <- GSIMstatsdt[
(mDur_o1800 >= 1) & (!movinginter_o1800), gsim_no]
#------ Remove stations based on examination of plots and data series
#Outliers from examining plots of ir time series (those that were commented out were initially considered)
GSIMtoremove_irartifacts <- list(
c('AR_0000014', 'removed', "large gaps in data, changed flow permanence"),
c('AT_0000021', 'removed', "single flow intermittency event, probably gap in data"),
c('AT_0000026', 'removed', "abrupt decrease to 0 flow, probably gaps in data"),
c('AT_0000038', 'removed', "large gaps in data, single flow intermittency event at the end"),
c('AT_0000059', 'removed', "abrupt decrease to 0 flow, probably gaps in data"),
c('BR_0000286', 'removed', "Tapajos river. Impossible that it dries out."),
c('BR_0000557', 'inspected', "Confirmed dry channel visually on satellite imagery"),
c('BR_0000581', 'removed', "Single flow intermittency event at the end"),
c('BR_0000662', 'removed', "flow regulated. unsure when it started, large data gap"),
c('BR_0000664', 'inspected', "change of regime in last few years due to regulation, but originally IRES"),
c('BR_0000706', 'removed', "too many data gaps to determine long-term flow permanence"),
c('BR_0000717', 'removed', "flow regulated, changed from perennial to IRES"),
c('BR_0000778', 'removed', "too many data gaps to determine long-term flow permanence"),
c('BR_0000786', 'removed', "only one flow intermittency event"),
c('BR_0000862', 'removed', "only one flow intermittency event, too many data gaps"),
c('BR_0001011', 'removed', "only one flow intermittency event, too many data gaps"),
c('BR_0001104', 'removed', "only one flow intermittency event at the end"),
c('BR_0001116', 'removed', "seems to have changed flow permanence, many data gaps"),
c('BR_0001133', 'removed', "changed flow permanence"),
c('CA_0001057', 'removed', "only one flow intermittency event in 28 years"),
c('CA_0003473', 'inspected', "only one flow intermittency event but outlet of natural lake"),
c('CA_0003488', 'removed', "only one flow intermittency event"),
c('CA_0003526', 'removed', "only one flow intermittency event over the winter"),
c('CA_0003544', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('CN_0000004', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('CN_0000009', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('CN_0000010', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('CN_0000012', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('CN_0000013', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('CN_0000021', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('CN_0000022', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('CN_0000026', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('CN_0000029', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('CN_0000038', 'removed', "only one flow intermittency event, temporary dewatering by dam"),
c('CN_0000043', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('CN_0000047', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('CN_0000062', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('ES_0000078', 'removed', "changed flow permanence, first 20 years with 0 flow"),
c('ES_0000087', 'removed', "0 flow seems driven by exceptional drought, not throughout record"),
c('ES_0000388', 'removed', "too many data gaps to determine long-term flow permanence; and looks regulated"),
c('ES_0000444', 'removed', "only one flow intermittency event"),
c('ES_0000525', 'removed', "very discontinued record. Seems to have changed flow permanence"),
c('ES_0000581', 'removed', "appears unreliable"),
c('ES_0000676', 'inspected', "flow regulated, but originally IRES"),
c('ES_0000729', 'removed', "changed flow permanence from perennial to non-perennial"),
c('ES_0000784', 'removed', "same as ES_0000785"),
c('ES_0000785', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('ES_0000786', 'removed', "abrupt decreases to 0 flow, probably gaps in data"),
c('ES_0000794', 'removed', "same as ES_0000832"),
c('ES_0000816', 'removed', "abrupt decreases to 0 flow"),
c('ES_0000818', 'removed', "same as ES_0000785 and ES_0000784, which are not intermittent"),
c('ES_0000856', 'removed', "only one flow intermittency event"),
c('ES_0000892', 'removed', "appears perennial with only one period of flow intermittency, data gaps"),
c('ES_0000906', 'removed', "abrupt decreases to 0 flow"),
c('ES_0000910', 'removed', "only one flow intermittency event"),
c('ES_0000958', 'removed', "only one flow intermittency event, abrupt decrease"),
c('ES_0000986', 'removed', "only one flow intermittency event, abrupt decrease"),
c('ES_0000996', 'removed', "changed flow permanence from perennial to non-perennial"),
c('ES_0001020', 'removed', "too many data gaps to determine long-term flow permanence"),
c('ES_0001052', 'removed', "regulated, changed flow permanence"),
c('ES_0001082', 'removed', "regulated, changed flow permanence"),
c('ES_0001085', 'removed', "only one flow intermittency event at the end"),
c('ES_0001116', 'removed', "only one year of flow intermittency events at the end"),
c('ES_0001162', 'removed', "abrupt decreases to 0 flow"),
c('FI_0000107', 'removed', "lake inlet, maybe just become standing water when high water level"),
c('FI_0000156', 'removed', "no 0 flow for first 60% of record"),
c('FR_0000052', 'removed', "no 0 flow for first 20 years of record"),
c('HU_0000017', 'removed', "only one 0 flow occurrence in 25 years"),
c('IN_0000014', 'removed', "no 0 flow occurrence in first 25 years"),
c('IN_0000023', 'removed', "no 0 flow occurrence in first 20 years"),
c('IN_0000045', 'removed', "no 0 flow occurrence in first 20 years"),
c('IN_0000046', 'removed', "changed flow permanence from perennial to non-perennial"),
c('IN_0000050', 'removed', "changed flow permanence from perennial to non-perennial"),
c('IN_0000063', 'removed', "changed flow permanence from perennial to non-perennial"),
c('IN_0000064', 'removed', "changed flow permanence from perennial to non-perennial"),
c('IN_0000105', 'removed', "regulated -- no records pre-1968 time of dam building"),
c('IN_0000113', 'removed', "only one 0 flow occurrence"),
c('IN_0000124', 'removed', "changed flow permanence from perennial to non-perennial"),
c('IN_0000125', 'removed', "changed flow permanence from perennial to non-perennial"),
c('IN_0000134', 'removed', "changed flow permanence from perennial to non-perennial"),
c('IN_0000159', 'removed', "regulated -- no records pre-1968 time of dam building"),
c('IN_0000170', 'removed', "only one 0 flow occurrence"),
c('IN_0000190', 'inspected', "looked regulated but isn't in the end"),
c('IN_0000255', 'removed', "changed flow permanence from perennial to non-perennial"),
c('IN_0000280', 'removed', "changed flow permanence from perennial to non-perennial"),
c('IN_0000309', 'inspected', "before building of reservoir in 1988"),
c('IN_0000312', 'removed', "changed flow permanence from perennial to non-perennial"),
c('MZ_0000010', 'removed', "changed flow permanence from perennial to non-perennial"),
c('NA_0000050', 'removed', "unreliable record, data gaps and interpolated values"),
c('NO_0000018', 'removed', "changed flow permanence from perennial to non-perennial"),
c('NO_0000028', 'removed', "changed flow permanence from perennial to non-perennial"),
c('NO_0000030', 'removed', "only one 0 flow occurrence"),
c('NO_0000044', 'removed', "0 flow values only at the beginning, probably data gaps"),
c('NO_0000090', 'removed', "0 flow values at the beginning probably data gaps, otherwise only one 0 flow event"),
c('RU_0000189', 'removed', "only one 0 flow occurrence"),
c('RU_0000265', 'removed', "changed flow permanence from perennial to non-perennial"),
c('RU_0000358', 'removed', "changed flow permanence from perennial to non-perennial"),
c('SE_0000058', 'removed', "downstream of dam, changed flow permanence. and experiences ice"),
c('TZ_0000018', 'removed', "changed flow permanence from perennial to non-perennial"),
c('US_0005106', 'removed', "probably changed flow permanence from perennial to non-perennial"),
c('US_0005874', 'removed', "changed flow permanence from perennial to non-perennial"),
c('US_0005876', 'removed', "probably changed flow permanence from perennial to non-perennial"),
#'US_0001855', #maybe rounded values --- checked
#'US_0001861', #maybe rounded values --- checked
#'US_0001868', #maybe rounded values --- checked
c('US_0002247', 'removed', "regulated, GRDC 4149415 upstream not intermittent"),
c('US_0002248', 'removed', "regulated, just downstream of US_0002247"),
c('US_0002791', 'removed', "on usgs website: close proximity to the Ohio River. During periods of high water on the Ohio River, the computed discharge at this site may be incorrectly displayed due to the backwater effect created."),
#'US_0003591', #maybe rounded values --- checked
# 'US_0003774', #maybe rounded values --- checked
# 'US_0003836', #maybe rounded values --- checked
# 'US_0004023', #maybe rounded values --- checked
# 'US_0004216', #maybe rounded values --- checked
# 'US_0004232', #maybe rounded values --- checked
# 'US_0004658', #maybe rounded values --- checked
c('US_0004773', 'removed', "regulated, no records prior to reservoir building"),
c('US_0005099', 'inspected', "confirmed dry bed on satellite imagery"),
# 'US_0005161', #maybe rounded values --- checked
# 'US_0005177', #maybe rounded values --- checked
c('US_0005303', 'inspected', "looks fine on usgs website and imagery. just small"),
c('US_0005596', 'inspected', "looks fine on usgs website and imagery. just small"),
c('US_0005597', 'inspected', "looks fine on usgs website and imagery. just small"),
# 'US_0005622', #maybe rounded values --- checked
# 'US_0005623', #maybe rounded values --- checked
# 'US_0005684', #maybe rounded values --- checked
# 'US_0005687', #maybe rounded values --- checked
c('US_0005732', 'removed', "regulated by Lake Arcadia reservoir, changed flow permanence"),
#'US_0005859', #maybe rounded values --- checked
#'US_0005879', #maybe rounded values --- checked
# 'US_0006073', #maybe rounded values --- checked
c('US_0006103', 'removed', "regulated, changed flow permanence"),
# 'US_0006109', #maybe rounded values --- check
# 'US_0006154', #maybe rounded values --- check
c('US_0006155', 'removed', "regulated, did not change flow permanence but same as 0006156"),
c('US_0006156', 'inspected', "regulated, but did not change flow permanence"),
c('US_0006206', 'inspected', "looks fine on usgs website and imagery"),
#'US_0006301', #maybe rounded values --- checked
#'US_0006327', #maybe rounded values --- checked
#'US_0006387', #maybe rounded values --- checked
c('US_0006396', 'removed', "erroneous values pre-1960"),
c('US_0006440', 'removed', "probably changed flow permanence from perennial to non-perennial"),
c('US_0006537', 'removed', "changed flow permanence from perennial to non-perennial"),
c('US_0006574', 'inspected', "suspected regulation, but doesn't seem to be the case"),
#'US_0006975', #maybe rounded values --- checked
#'US_0006984', #maybe rounded values --- checked
#'US_0006985', #maybe rounded values --- checked
#'US_0006986', #maybe rounded values --- checked
c('US_0008607', 'inspected', "regulated since 1978, changed flow permanence from perennial to non-perennial"),
c('US_0008687', 'inspected', "regulated since 1935, changed flow permanence from perennial to non-perennial"),
# 'US_0008726', #maybe rounded values --- checked
# 'US_0008779', #maybe rounded values --- checked
c('ZA_0000074', 'removed', "abrupt decrease to 0 flow, probably gaps in data"),
c('ZA_0000268', 'removed', "probably changed flow permanence"),
c('ZA_0000084', 'removed', "probably changed flow permanence"),
c('ZA_0000270', 'removed', "changed flow permanence")
) %>%
do.call(rbind, .) %>%
as.data.table %>%
setnames(c('gsim_no', 'flag', 'comment'))
# readformatGSIMmon(
# GSIMstatsdt[gsim_no == 'US_0000546', path]) %>%
# .[MIN != 0, min(MIN)]
# sort(GSIMtoremove_unstableIR)
# 'ES_0000806' %in% GSIMstatsdt$gsim_no
# GSIMstatsdt[gsim_no == 'ES_0000818', ]
#----- Check flags in winter IR for GSIM
wintergaugesall_GSIM <- plot_winterir(
dt = GSIMstatsdt, dbname = 'gsim', inp_resdir = inp_resdir,
yearthresh = 1800, plotseries = plotseries)
#Check suspicious canadian ones
GSIMwintermeta <- in_gaugep[in_gaugep$gsim_no %in% wintergaugesall_GSIM$gsim_no,]
canadians_toinspect <- in_gaugep[in_gaugep$gsim_no %in%
paste0('CA_000', c(3469, 3473, 3526, 3544, 6082, 6122)),]$reference_no
# if (!dir.exists(hy_dir())) download_hydat()
# cancheck <- lapply(canadians_toinspect, function(refno) {
# merge(hy_daily(station_number = refno),
# hy_stn_regulation(station_number = refno),
# by='STATION_NUMBER') %>%
# setDT
# }) %>%
# rbindlist
# cancheck[REGULATED==T, unique(STATION_NUMBER)] #No regulated station
# cancheck[Value==0, .N, by=.(STATION_NUMBER, Symbol)]
#E - Estimate: no measured data available for the day or missing period,
# and the water level or streamflow value was estimated by an indirect method
#A - Partial Day: daily mean value was estimated despite gaps of more than
# 120 minutes in the data string or missing data not significant enough to
# warrant the use of the E symbol.
#B - Ice conditions: value was estimated with consideration for the presence
# of ice in the stream. Ice conditions alter the open water relationship
# between water levels and streamflow.
#D - Dry: stream or lake is "dry" or that there is no water at the gauge.
# This symbol is used for water level data only.
#R - Revised: The symbol R indicates that a revision, correction or addition
# has been made to the historical discharge database after January 1, 1989.
# ggplot(cancheck[Value > 0, ], aes(x=Date, y=Value, color=Symbol)) +
# geom_vline(data=cancheck[is.na(Value),], aes(xintercept = Date), color='grey', alpha=1/4) +
# geom_point(alpha=1/6) +
# geom_point(data=cancheck[Value==0,]) +
# facet_wrap(~STATION_NUMBER, scales='free') +
# theme_classic()
#Remove 06NB002, 06AF001 — CA_0003544, CA_0003473
#Check others: 'CN_0000047', 'NO_0000018', 'RU_0000089',
#'RU_0000391', 'RU_0000393', 'RU_00000395', 'RU_0000436',
#'RU_0000470', 'US_0008687'
check <- readformatGSIMmon(GSIMstatsdt[gsim_no == 'US_0008687',path])
#Remove
GSIMtoremove_winterIR <- list(
c('CA_0003473', 'removed', "sudden peak — unsure about estimated discharge under ice conditions"),
c('CN_0000047', 'removed', "Anomalous change from near 0 discharge to 150 m3/s, no explanation"),
c('RU_0000391', 'removed', "Stopped recording during the winter the last ~10 years. Maybe questionable winter data"),
c('RU_0000393', 'removed', "Didn't record during the winter for the first 20 years. Maybe questionable winter data"),
c('RU_0000436', 'removed', "Didn't record during the winter for the first 20 years. Maybe questionable winter data"),
c('RU_0000470', 'removed', "Didn't record during the winter for the first 20 years. Maybe questionable winter data")
) %>%
do.call(rbind, .) %>%
as.data.table %>%
setnames(c('gsim_no', 'flag', 'comment'))
#----- Check flags in coastal IR for GSIM
GSImcoastalirall <- plot_coastalir(in_gaugep = in_gaugep, dt = GSIMstatsdt,
dbname = 'gsim', inp_resdir = inp_resdir,
yearthresh = 1800, plotseries = plotseries)
#Already removed suspect ones
### Analyze GRDC data ####################################
GRDCstatsdt <- rbindlist(in_GRDCgaugestats)
#Remove all gauges with 0 values that have at least 99% of integer values as not reliable (see GRDC_NO 6140700 as example)
GRDCtoremove_allinteger <- GRDCstatsdt[integerperc_o1800 >= 0.99 &
intermittent_o1800 == 1, GRDC_NO]
#Remove those which have at least one day per year of zero-flow day but instances
#of no zero-flow day within a 20-year window — except for three gauges that have a slight shift in values but are really IRES
GRDCtoremove_unstableIR <- GRDCstatsdt[
(mDur_o1800 >= 1) & (!movinginter_o1800) &
!(GRDC_NO %in% c(1160115, 1160245, 4146400)), GRDC_NO]
#Outliers from examining plots of ir time series (those that were commented out were initially considered)
GRDCtoremove_o1800_irartifacts <- list(
c(1134300, 'removed', 'changed flow permanence from perennial to non-perennial, large data gaps'),
c(1159830, 'removed', 'only one occurrence of 0 flow values'),
c(1160785, 'removed', 'changed flow permanence from perennial to non-perennial'),
c(1160800, 'removed', 'changed flow permanence from perennial to non-perennial'),
c(1160850, 'removed', 'changed flow permanence from perennial to non-perennial'),
c(1160881, 'removed', 'changed flow permanence from perennial to non-perennial'),
c(1196141, 'removed', "doesn't look reliable, hard to assess long term flow permanence"),
c(1199410, 'removed', 'changed flow permanence from perennial to non-perennial'),
c(1259500, 'removed', 'changed flow permanence from perennial to non-perennial'),
c(1428500, 'removed', 'changed flow permanence from perennial to non-perennial'),
c(1434200, 'removed', 'almost all integers'),
c(1434300, 'removed', 'almost all integers'),
c(1491815, 'removed', 'changed flow permanence from perennial to non-perennial'),
c(1591110, 'removed', "doesn't look reliable, changed flow permanence from perennial to non-perennial"),
c(3652050, 'removed', 'changed flow permanence from perennial to non-perennial'),
c(3652200, 'removed', 'changed flow permanence from perennial to non-perennial'),
c(4208372, 'removed', 'abrupt decreases to 0 flow, probably data gaps'),
c(4208655, 'removed', 'insufficient data to tell flow permanence'),
c(4208855, 'removed', 'insufficient data to tell flow permanence'),
c(4208857, 'removed', 'only 1 occurrence of 0 flow values'),
c(4213566, 'removed', 'only 1 occurrence of 0 flow values'),
c(4214297, 'removed', 'only 1 occurrence of 0 flow values'),
c(4214298, 'removed', 'only 1 occurrence of 0 flow values'),
c(4769200, 'removed', 'only 1 occurrence of 0 flow values'),
c(5204170, 'removed', 'changed flow permanence'),
c(5405095, 'removed', 'changed flow permanence'),
c(5708200, 'removed', 'changed flow permanence')
) %>%
do.call(rbind, .) %>%
as.data.table %>%
setnames(c('GRDC_NO', 'flag', 'comment'))
GRDCtoremove_o1961_irartifacts <- list(
c(1134500, 'removed', 'only 1 occurrence of 0 flow values'),
c(1159302, 'removed', 'abrupt decreases to 0 flow values'),
c(1159302, 'removed', 'unreliable record, isolated 0s, sudden jumps and capped at 77'),
c(1159320, 'inspected', "0 values for the first 14 years but still apparently originally IRES"),
c(1159325, 'inspected', "0 values for most record. probably episodic and due to series of agricultural ponds"),
c(1159510, 'inspected', "0 values for most record, on same segment as 1159511 but seems unreliable"),
c(1159520, 'inspected', "values seem capped after 1968, otherwise seem fine. Could just be rating curve"),
c(1160101, 'removed', 'abrupt decreases to 0 flow values'),
c(1160340, 'removed', 'abrupt decreases to 0 flow values'),
c(1160378, 'inspected', 'appears IRES before regulation, reservoir fully dry on satellite imagery, so keep as intermittent even when not regulated'),
c(1160420, 'removed', 'decrease to 0 appears a bit abrupt but ok'),
c(1160435, 'removed', 'unreliable record, abrupt decreases to 0, capped'),
c(1160470, 'removed', 'unreliable record, probably change of rating curve in 1947, mostly missing data until 1980 but truly intermittent based on imagery'),
c(1160540, 'inspected', 'only 0 - nodata for first 15 years. seemingly good data post 1979 and IRES'),
c(1160635, 'inspected', '0 values post-1985 seem abrupt but enough values otherwise to make it intermittent'),
c(1160670, 'removed', 'regulated'),
c(1160675, 'removed', 'some outliers but otherwise most 0 values seem believable'),
c(1160780, 'removed', 'unreliable record, abrupt decreases to 0 flow values, large data gaps'),
c(1160795, 'removed', 'abrupt decreases to 0 flow values'),
c(1160840, 'removed', 'only 2 zero flow values are believable, others are outliers'),
c(1160850, 'removed', 'changed flow permanence'),
c(1160880, 'removed', 'unreliable record. Tugela river, perennial'),
c(1160900, 'removed', 'most 0 values look like outliers, abrupt decreases'),
c(1160911, 'removed', 'most 0 values look like outliers, abrupt decreases'),
c(1160971, 'removed', 'most 0 values look like outliers, abrupt decreases'),
c(1160975, 'removed', 'most 0 values look like outliers, abrupt decreases'),
c(1196102, 'removed', 'unreliable record, large data gaps, hard to tell original flow permanence'),
c(1196160, 'inspected', 'some outlying 0 flow values but most are good'),
c(1197500, 'removed', 'only one flow intermittency event, abrupt decrease to 0'),
c(1197540, 'removed', 'abrupt decreases to 0'),
c(1197591, 'removed', 'abrupt decreases to 0'),
c(1197700, 'removed', 'abrupt decreases to 0'),
c(1197740, 'removed', 'some outlying 0 flow values but most are good'),
c(1199100, 'removed', 'most 0 values look like outliers'),
c(1199200, 'removed', 'abrupt decreases to 0'),
c(1199410, 'inspected', 'some outlying 0 flow values but most are good'),
c(1259800, 'removed', '0 values come from integer-based part of the record'),
c(1286690, 'removed', 'changed flow permanence, record too short to determine original flow permanence'),
c(1259800, 'removed', 'changed flow permanence, only one 0 flow value post 1963'),
c(1428400, 'removed', '0 values come from integer-based part of the record'),
c(1434810, 'removed', '0 values come from integer-based part of the record'),
c(1491870, 'removed', 'some outlying 0 flow values but most are good'),
c(1494100, 'removed', 'abrupt decreases to 0'),
c(1494100, 'inspected', 'regulated but naturally intermittent'),
c(1591730, 'removed', 'abrupt decreases to 0'),
c(1733600, 'removed', '0 values come from integer-based part of the record and outliers'),
c(1837410, 'removed', 'abrupt decreases to 0'),
c(1837430, 'inspected', 'nearly same as 1837410. Naturally intermittent before dam'),
c(1897550, 'removed', 'abrupt decreases to 0'),
c(1898501, 'removed', 'abrupt decreases to 0'),
c(1992400, 'removed', 'most 0 values look like outliers'),
c(2588500, 'removed', 'abrupt decreases to 0'),
c(2588551, 'removed', 'abrupt decreases to 0'),
c(2588630, 'removed', 'abrupt decreases to 0'),
c(2588640, 'removed', 'abrupt decreases to 0'),
c(2588708, 'removed', 'abrupt decreases to 0'),
c(2588820, 'removed', 'abrupt decreases to 0'),
c(2589230, 'removed', 'abrupt decreases to 0'),
c(2589370, 'removed', 'abrupt decreases to 0'),
c(2591801, 'removed', 'abrupt decreases to 0'),
c(2694450, 'removed', 'abrupt decreases to 0'),
c(2969081, 'removed', 'abrupt decreases to 0'),
c(2999920, 'removed', '0 values come from integer-based part of the record'),
c(3650460, 'inspected', 'some outlying 0 flow values but most are good'),
c(3650470, 'removed', '0 values come from integer-based part of the record'),
c(3650610, 'removed', 'integers pre-1960s but still intermittent after'),
c(3650640, 'removed', 'abrupt decreases to 0'),
c(3650649, 'inspected', 'change of flow regime due to dam building but intermittent before'),
c(3650690, 'inspected', 'abrupt decreases to 0'),
c(3650928, 'removed', 'most 0 values look like outliers'),
c(3652135, 'removed', 'only one valid 0-flow event'),
###################### TO CONTINUE FORMATTING FOR DISPLAY ######################################
3844460, #Remove
#4101451, #station downstream also has 0s
4103700, #zero values are outliers
#4113304, #Not super representative of current regime
#4150605, #just downstream of lwesville dam in Dallas. previously intermittent as well but will be removed anyways as >50% dor
#4151513, #Looks regulated but will be removed as > 50% regulated #################### good example for showing effect of regulation
4208195, #0s stem from interpolation
#4213540, #one outlier, rest is good although occurred only once in 1965..
#4213675, #looks funny but just downstream of natural lake
4213905, #Regulated. hence the intermittency
4214075, ##Remove
4234300, #remove. regulated
4351710, #Remove. 0s due to outlier and integer-based values
4355500, #Remove. outlier values. regulated
4357510, #Remove
4773050, #look erroneous (going from 140L/s to 0 in one day. must be a rating curve issue)
5101020, #Remove
5101101, #Remove
5101130, #Remove
5101201, #Remove
#5101290, #values before 2000 aren't great. But 0 values post 2000 are believable
5101305, #most 0 values are wrong
5101380, #Remove
5109200, #Remove, 0 values at beginning, interpolations used, not reliable
5109230, #Remove
5202140, #Remove
5202145, #Most 0s look erroneous
5202228, #Regulated? Remove
5302251, #Remove
5302261, #Remove
#5405046, #canal through adelaide - sturt river
#5405095, #looks weird but intermittent it is
5608100, #Remove
5803160, #Remove
5864500, #Remove. rounded to 10L/s
5870100, #remove, rounded to 100L/S
6119100, #Remove, rounded to 10L/s
#6128220, #Weird. occurred only once but seems believable
6442300, ##Remove, perfect example of what an integer-based record involves
6444250, #Remove
6444350, #Remove
6444400, #Check, abrupt 0 values
6935570 #Remove, rounded to 10L/s
) %>%
do.call(rbind, .) %>%
as.data.table %>%
setnames(c('gsim_no', 'flag', 'comment'))
#plotGRDCtimeseries(GRDCstatsdt[GRDC_NO == 4101450,])
#### Check intermittent record
# checkno <- 6444400 #GRDC_NO
# check <- checkGRDCzeroes( #Check area around 0 values
# GRDCstatsdt, in_GRDC_NO=checkno, period=15, yearthresh=1800,
# maxgap=20, in_scales='free', labelvals = F)
# checkno %in% GRDCtoremove_allinteger #Check whether all integers
# in_gaugep[in_gaugep$GRDC_NO==checkno & !is.na(in_gaugep$GRDC_NO), "dor_pc_pva"] #check DOR
# GRDCstatsdt[GRDC_NO == checkno, integerperc_o1800] #Check % integers
#Outliers from examining plots of perennial time series (those that were commented out were initially considered)
#Try to find those:
# whose low flow plateaus could be 0s
# whose perennial character is dam-driven or maybe irrigation driven (changed from IR to perennial but hard to find)
# whose missing data are actually 0s
# whose quality is too low to be reliable
GRDCtoremove_o1800_pereartifacts <- c(
1159800, #look regulated
#1160302, #looked regulated -- but nothing obvious
4101200, #low flows may be zeros as they plateau unless all integers
4118850, #low flows plateau
4125903, #too regulated
4126351, #looks too regulated (as far as the record goes)
4148850 #unsure, missing data have lots of zeros
)
GRDCtoremove_o1961_pereartifacts <- c(
1160331, #remove- Lower plateaus are likely overestimated 0 values
#1593100 #checked - clearly not intermittent but really bad quality
1593751, #remove - missing values seem to contain intermittency
1599111, #remove - maybe going down to 0 in missing data
1160788, #remove - looks like plateauing at 0 but rating curve is off
1899100, #remove-maybe intermittent now. intermittent last year of record (missing gap)
3628200, #remove- lower values may be 0s but rating curve is off
3652030, #remove- 0s in missing years and low flows in other years may also be 0s
4115225, #remove - looks like it became regulated and may have otherwise become intermittent
4151801, #remove - regulated and may otherwise go dry Rio Grande
4152651, #remove - regulated by blue mesa reservoir, may have been intermittent otherwise ######### good example
4208610, #remove - too much missing data but if not would be intermittent ###################good example of that
4213055, #remove- too much missing data but if not would be intermittent
4213802, #remove - identical to 4213801
4214320, #remove - lower values may be 0s and in missing years
#4231620, #check -- maybe regulated but would probably otherwise be perennial
4362100, #remove = lower values may be 0s and in missing years
5606090, #remove - lower values may be 0s
56064140, #remove - lower values may be 0s
6123630, #remove- lower values may be 0s
6233410, #remove - looks erroneous
6335020, #remove - looks identical to 633060
6335050, #remove - looks identical to 6335060
6337503, #remove -looks heavily regulated. cannot tell whether may have been intermittent before
6442100, #remove - identical to 6442600
6935146, #remove - looks identical to 6935145
6972350, #remove
6935600 #remove - identical to 6935145
)
#---------- Check flags in winter IR
plot_winterir(dt = GRDCstatsdt, dbname = 'grdc', inp_resdir = inp_resdir,
yearthresh = 1800, plotseries = plotseries)
#Checked for seemingly anomalous 0s. Sudden decreases.
#Check for flags, check satellite imagery, station name, check for construction of reservoir
#If no way to explain, remove or if caused by reservoir/dam that is not in GranD
#4220310 and 4243610, just downstream of dams that are in GranD — should be taken in account
GRDCtoremove_winterIR <- c('2588640', #Sudden shift
'2589230',#Sudden shift
'4213540',#Sudden shift
'4214075',#Sudden shift
'6401800' #Just downstream of a reservoir that is not in GranD
)
#------ Check time series of stations within 3 km of seawater
GRDCcoastalirall <- plot_coastalir(in_gaugep = in_gaugep, dt = GRDCstatsdt,
dbname = 'grdc', inp_resdir = inp_resdir,
yearthresh = 1800, plotseries = plotseries)
GRDCcoastalirall[, unique(readformatGRDC(path)$Flag), by=GRDC_NO]
#Nothing obviously suspect beyond those that had already been flagged
#Inspect statistics for 4208857, 4213531 as no flow days occurred only one year
# ID = '6976300'
# GRDCstatsdt[GRDC_NO == ID,]
# check <- readformatGRDC(GRDCstatsdt[GRDC_NO == ID,path])
# unique(check$Flag)
#
# plotGRDCtimeseries(GRDCstatsdt[GRDC_NO == ID,], outpath=NULL)
#------ Remove stations with unstable intermittent flow regime
#Before cleaning
GRDCtoremove_all <- unique(c(GRDCtoremove_allinteger,
GRDCtoremove_o1800_irartifacts,
GRDCtoremove_o1800_pereartifacts,
GRDCtoremove_o1961_irartifacts,
GRDCtoremove_o1961_pereartifacts,
GRDCtoremove_winterIR,
GRDCtoremove_unstableIR))
GRDCstatsdt[intermittent_o1800 == 1 & totalYears_kept_o1800 >= 10, .N]
GRDCstatsdt[intermittent_o1800 == 1 & totalYears_kept_o1800 >= 10 &
!(GRDC_NO %in% GRDCtoremove_all), .N]
### Check changes in GSIM discharge data availability and flow regime over time ####
GSIMstatsdt_clean <- GSIMstatsdt[!(gsim_no %in% c(GSIMtoremove_o1961_irartifacts,
GSIMtoremove_o1800_irartifacts,
GSIMtoremove_coastalIR,
GSIMtoremove_winterIR,
GSIMtoremove_unstableIR)),]
mvars <- c('intermittent_o1800',
'intermittent_o1961',
'intermittent_o1971')
alluv_formatGSIM <- melt(GSIMstatsdt_clean,
id.vars = c('gsim_no',
paste0('totalYears_kept_o',
c(1800, 1961, 1971))),
measure.vars = mvars) %>%
.[totalYears_kept_o1800 < 10 & variable %in% mvars, value := NA] %>%
.[totalYears_kept_o1961 < 10 & variable %in% mvars[2:3], value := NA] %>%
.[totalYears_kept_o1971 < 10 & variable %in% mvars[3], value := NA] %>%
.[, count := .N, by=.(variable, value)]
### Check changes in GRDC discharge data availability and flow regime over time ####
GRDCstatsdt_clean <- GRDCstatsdt[!(GRDC_NO %in% GRDCtoremove_all),]
alluv_formatGRDC <- melt(GRDCstatsdt_clean,
id.vars = c('GRDC_NO',
paste0('totalYears_kept_o',
c(1800,1961, 1971))),
measure.vars = mvars) %>%
.[totalYears_kept_o1800 < 10 & variable %in% mvars, value := NA] %>%
.[totalYears_kept_o1961 < 10 & variable %in% mvars[2:3], value := NA] %>%
.[totalYears_kept_o1971 < 10 & variable %in% mvars[3], value := NA] %>%
.[, count := .N, by=.(variable, value)]
### Analyze change in number of gauges with different intermittency criteria
irsensi_format <- melt(rbind(GRDCstatsdt_clean, GSIMstatsdt_clean,
use.names=TRUE, fill=T)[totalYears_kept_o1961 >= 10,],
id.vars = c('GRDC_NO', 'gsim_no'),
measure.vars = paste0('mDur_o', c(1800, 1961, 1971))) %>%
.[!is.na(value) & value >0,] %>%
setorder(variable, -value) %>%
.[, cumcount := seq(.N), by=.(variable, is.na(GRDC_NO))]
ggirsensi <- ggplot(irsensi_format, aes(x=value, y=cumcount,
color=variable, linetype=is.na(GRDC_NO))) +
geom_line(size=1.1) +
coord_cartesian(expand=0, clip='off') +
scale_x_sqrt(breaks=c(1, 5, 10, 30, 90, 180, 365),
labels=c(1, 5, 10, 30, 90, 180, 365)) +
geom_vline(xintercept=c(1, 5)) +
annotate(geom='text', x=c(1.7,6.5), y=150, angle=90,
label=c(sum(irsensi_format[value==1 & variable=='mDur_o1800',
max(cumcount), by=is.na(GRDC_NO)]$V1),
sum(irsensi_format[value==5 & variable=='mDur_o1800',
max(cumcount), by=is.na(GRDC_NO)]$V1))) +
theme_classic()
plots <- grid.arrange(
ggalluvium_gaugecount(dtformat = alluv_formatGRDC, alluvvar = 'GRDC_NO'),
ggalluvium_gaugecount(dtformat = alluv_formatGSIM, alluvvar = 'gsim_no'),
ggirsensi
)
### Bind GRDC and GSIM records ####################################
databound <- rbind(GRDCstatsdt_clean,
GSIMstatsdt_clean,
use.names=TRUE, fill=T)
return(list(plots=plots, data=databound))
}
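The minimum-record rule applied when building `alluv_formatGRDC` and `alluv_formatGSIM` can be hard to read in piped form. Below is a minimal sketch on a toy table, assuming three hypothetical gauges (`a`, `b`, `c`) and a hypothetical `gauge_id` column; only the `totalYears_kept_o*` and `intermittent_o*` column names come from the code above. It shows how a period with fewer than 10 years of record masks the class for that period and for every later, nested period.

```r
library(data.table)

# Hypothetical toy table: one row per gauge (values are made up)
toy <- data.table(
  gauge_id = c('a', 'b', 'c'),
  totalYears_kept_o1800 = c(40, 12, 8),
  totalYears_kept_o1961 = c(25, 9, 8),
  totalYears_kept_o1971 = c(15, 9, 5),
  intermittent_o1800 = c(1, 0, 1),
  intermittent_o1961 = c(1, 0, 1),
  intermittent_o1971 = c(0, 0, 1)
)

mvars <- paste0('intermittent_o', c(1800, 1961, 1971))

# Long format: one row per gauge and period
long <- melt(toy,
             id.vars = c('gauge_id',
                         paste0('totalYears_kept_o', c(1800, 1961, 1971))),
             measure.vars = mvars)

# Cascading mask, same logic as in the function above:
# too few years in a period hides that period and all later ones
long[totalYears_kept_o1800 < 10 & variable %in% mvars, value := NA]
long[totalYears_kept_o1961 < 10 & variable %in% mvars[2:3], value := NA]
long[totalYears_kept_o1971 < 10 & variable %in% mvars[3], value := NA]

# Number of gauges per period and class (NA forms its own group)
long[, count := .N, by = .(variable, value)]

print(long[, .(gauge_id, variable, value, count)])
```

In this toy case, gauge `c` is masked in all three periods, gauge `b` keeps only its post-1800 class, and gauge `a` keeps all three.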
Create a map with all gauges and their characteristics
#Render image based on directory of graphs in github
server <- function(input, output, session) {
  # Send a pre-rendered image, and don't delete the image after sending it
  output$preImage <- renderImage({
    # When input$n is 3, filename is ./images/image3.jpeg
    filename <- normalizePath(file.path('./images',
                                        paste('image', input$n, '.jpeg', sep='')))
    # Return a list containing the filename and alt text
    list(src = filename,
         alt = paste("Image number", input$n))
  }, deleteFile = FALSE)
}
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] grid stats graphics grDevices datasets utils methods
[8] base
other attached packages:
[1] visNetwork_2.0.9 tidyhydat_0.5.0
[3] raster_3.3-13 quantreg_5.67
[5] SparseM_1.78 profvis_0.3.6
[7] tictoc_1.0 stringr_1.4.0
[9] sf_0.9-6 scales_1.1.0
[11] rnaturalearthdata_0.1.0 rnaturalearth_0.1.0
[13] rgeos_0.5-5 rgdal_1.5-16
[15] sp_1.4-2 reprex_0.3.0
[17] rbin_0.2.0 ranger_0.12.1
[19] qs_0.23.2 poweRlaw_0.70.6
[21] plyr_1.8.6 patchwork_1.1.0
[23] paradox_0.4.0 olsrr_0.5.3
[25] mvtnorm_1.1-1 mlr3viz_0.3.0
[27] mlr3tuning_0.3.0 mlr3pipelines_0.3.0
[29] mlr3learners_0.3.0 mlr3_0.6.0
[31] mgcv_1.8-33 nlme_3.1-147
[33] maps_3.3.0 lubridate_1.7.9
[35] kableExtra_1.2.1 gstat_2.0-6
[37] gridExtra_2.3 ggpubr_0.4.0
[39] ggnewscale_0.4.3 ggalluvial_0.12.2
[41] ggplot2_3.3.0 gdalUtils_2.0.3.2
[43] furrr_0.1.0 future.callr_0.5.0
[45] future.apply_1.6.0 future_1.18.0
[47] drake_7.12.5 dplyr_1.0.2
[49] data.table_1.13.2 cowplot_1.0.0
[51] mandrake_1.0.0 facetscales_0.1.0.9000
[53] bigstatsr_1.2.3 edarf_1.1.1
[55] mlr3learners.partykit_0.2.1.9000 mlr3spatiotempcv_0.0.0.9004
[57] ggplotify_0.0.5 rprojroot_1.3-2
[59] workflowr_1.6.2
loaded via a namespace (and not attached):
[1] R.utils_2.9.2 tidyselect_1.1.0 htmlwidgets_1.5.1
[4] mlr3misc_0.5.0 munsell_0.5.0 mmpf_0.0.5
[7] base64url_1.4 units_0.6-7 codetools_0.2-16
[10] bbotk_0.2.1 withr_2.2.0 colorspace_1.4-1
[13] filelock_1.0.2 knitr_1.29 uuid_0.1-4
[16] rstudioapi_0.11 ggsignif_0.6.0 listenv_0.8.0
[19] git2r_0.27.1 lgr_0.3.4 txtq_0.2.3
[22] vctrs_0.3.4 generics_0.0.2 xfun_0.17
[25] R6_2.4.1 bigassertr_0.1.3 doParallel_1.0.15
[28] gridGraphics_0.5-1 assertthat_0.2.1 promises_1.1.0
[31] gtable_0.3.0 globals_0.12.5 conquer_1.0.2
[34] processx_3.4.2 goftest_1.2-2 MatrixModels_0.4-1
[37] rlang_0.4.7 flock_0.7 splines_4.0.2
[40] rstatix_0.6.0 broom_0.5.6 checkmate_2.0.0
[43] BiocManager_1.30.10 yaml_2.2.1 bigparallelr_0.2.3
[46] abind_1.4-5 backports_1.1.10 httpuv_1.5.4
[49] tools_4.0.2 ellipsis_0.3.0 Rcpp_1.0.4.6
[52] progress_1.2.2 classInt_0.4-3 purrr_0.3.4
[55] ps_1.3.3 prettyunits_1.1.1 zoo_1.8-8
[58] haven_2.3.1 fs_1.5.0 magrittr_1.5
[61] openxlsx_4.1.5 spacetime_1.2-3 whisker_0.4
[64] storr_1.2.1 matrixStats_0.56.0 stringfish_0.14.2
[67] hms_0.5.3 evaluate_0.14 rio_0.5.16
[70] readxl_1.3.1 compiler_4.0.2 tibble_3.0.1
[73] KernSmooth_2.23-16 crayon_1.3.4 R.oo_1.23.0
[76] htmltools_0.4.0 later_1.0.0 tidyr_1.0.3
[79] RcppParallel_5.0.2 DBI_1.1.0 RApiSerialize_0.1.0
[82] Matrix_1.2-18 car_3.0-9 cli_2.0.2
[85] R.methodsS3_1.8.0 parallel_4.0.2 igraph_1.2.5
[88] forcats_0.5.0 pkgconfig_2.0.3 rvcheck_0.1.8
[91] foreign_0.8-78 xml2_1.3.2 roxygen2_7.1.1
[94] foreach_1.5.0 webshot_0.5.2 rvest_0.3.6
[97] callr_3.4.3 digest_0.6.25 pracma_2.2.9
[100] rmarkdown_2.3 cellranger_1.1.0 intervals_0.15.2
[103] nortest_1.0-4 curl_4.3 jsonlite_1.6.1
[106] lifecycle_0.2.0 carData_3.0-4 viridisLite_0.3.0
[109] fansi_0.4.1 pillar_1.4.4 lattice_0.20-41
[112] httr_1.4.2 glue_1.4.0 xts_0.12-0
[115] zip_2.1.1 FNN_1.1.3 iterators_1.0.12
[118] class_7.3-16 stringi_1.4.6 renv_0.9.3
[121] e1071_1.7-3