Last updated: 2019-03-19
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(20190319)
The command set.seed(20190319)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 64ac134
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: vignettes/data/sst_ALL_clim_only.Rdata
Ignored: vignettes/data/sst_ALL_event_aov_tukey.Rdata
Ignored: vignettes/data/sst_ALL_flat.Rdata
Ignored: vignettes/data/sst_ALL_miss.Rdata
Ignored: vignettes/data/sst_ALL_miss_cat_chi.Rdata
Ignored: vignettes/data/sst_ALL_miss_clim_event_cat.Rdata
Ignored: vignettes/data/sst_ALL_miss_clim_only.Rdata
Ignored: vignettes/data/sst_ALL_repl.Rdata
Ignored: vignettes/data/sst_ALL_smooth.Rdata
Ignored: vignettes/data/sst_ALL_smooth_aov_tukey.Rdata
Ignored: vignettes/data/sst_ALL_smooth_event.Rdata
Ignored: vignettes/data/sst_ALL_trend.Rdata
Ignored: vignettes/data/sst_ALL_trend_clim_event_cat.Rdata
Untracked files:
Untracked: analysis/bibliography.bib
Untracked: analysis/data/
Untracked: code/functions.R
Untracked: code/workflow.R
Untracked: data/sst_WA.csv
Untracked: docs/figure/
Unstaged changes:
Modified: NEWS.md
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 64ac134 | robwschlegel | 2019-03-19 | Publish analysis files |
A time series of temperatures taken at a specific locality is a sample of the thermal environment. Because it is a sample, which is used to approximate the true state of the environment, the representivity of the data is subject to certain limitations that have an influence on their accuracy and precision—in other words, on how well the data represent reality. Here we will not concern ourselves with matters of precision and accuracy; rather, we shall deal with ways to produce climatologies from a time series of daily thermal measurements.
A climatological mean is a multi-year average over a base period of typically 30 years duration. Sometimes the climatology is also called a ‘climate normal.’ It predicts the temperatures over an annual cycle that will most likely be experienced at a particular place and time.
Another concept that we use here instead of a climatology is a baseline…
Central to the functioning of the heatwaveR detect algorithm is the climatology, which is the reference against which the extremes are calculated. For historical reasons—which also turn out to make good practical sense—it has become standard practice to base a climatology on 30 years’ worth of data, updated at the start of each new decade (the most recent being 1981-2010). For the heatwaveR package we recommend the same. Because heat wave detection is based on daily data, a daily climatology is required. That is, the mean value for each of the days in a 365 (non-leap) or 366 (leap) day long year. In the case of gridded data, we also require that the climatology is at the same spatial resolution as the SST time series within which the extreme events are sought. In the case of the dOISST v2. data, it is independently calculated at each of the 1/4° grid cells, and for station data it must be representative of that station and preferably also obtained with the same instrument that yielded the seawater temperature time series entering the event detection algorithm (in fact, the algorithm will by default use the daily time series to derive the climatology).
In a daily climatology, there will be one mean temperature for each day of the year (365 or 366 days, depending on how leap years are dealt with). How the daily mean is calculated can vary somewhat from application to application. The detect()
function applies a window centred on a sliding Julian day for the pooling of values that will form the basis for calculation of the climatology and threshold percentile. We use a default of 5 days, which gives a window width of 11 days centred on the 6th day of the series of 11 days. For example, the mean temperature for Julian day 6 will be comprised of all temperatures measured from Julian days 1 to 11, across 30 years, i.e. 330 temperature measurements are summarised in this climatological daily mean. A sliding window centred on Julian day 7 will comprise all temperatures measured on Julian days 2 to 12 over the 30 years; etc. Currently we do not provide the option to calculate a weighted mean; rather, each day receives the same weighting as any other day, irrespective of how far the day is from the one on which the sliding window is centred. Climatologies produced with sliding windows of width 1, 11, 31, and 61 days are presented in Figure 1. In the figure we see that the wider the sliding window becomes the less day-to-day noise is retained in the climatology.
library(tidyverse) # misc. data processing conveniences
library(lubridate) # for working with dates
library(heatwaveR)
# Using a built-in data set
# smoothPercentile = FALSE
res.1.F <- ts2clm(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"),
windowHalfWidth = 0, smoothPercentile = F, clmOnly = TRUE)
res.11.F <- ts2clm(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"),
smoothPercentile = F, clmOnly = TRUE)
res.31.F <- ts2clm(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"),
windowHalfWidth = 15, smoothPercentile = F, clmOnly = TRUE)
res.61.F <- ts2clm(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"),
windowHalfWidth = 30, smoothPercentile = F, clmOnly = TRUE)
res.1.F <- res.1.F %>%
mutate(width = "1")
res.11.F <- res.11.F %>%
mutate(width = "11")
res.31.F <- res.31.F %>%
mutate(width = "31")
res.61.F <- res.61.F %>%
mutate(width = "61")
no_smooth <- rbind(res.1.F, res.11.F, res.31.F, res.61.F)
plt1 <- ggplot(data = no_smooth,
aes(x = doy, y = seas)) +
geom_line(aes(colour = width, size = width)) +
geom_line(aes(y = thresh, colour = width, size = width)) +
scale_colour_manual(name = "Width",
values = c("grey50", "orange", "purple", "black")) +
scale_size_manual(values = c(0.4, 0.4, 0.4, 0.8), guide = FALSE) +
labs(x = "Julian day", y = "SST (°C)", title = "Annual cycle climatology and threshold",
subtitle = "1983-01-01 to 2012-12-31 reference period; smooth off") +
theme_publ()
ggsave(plt1, file = "climat_1.png", width = 6, height = 2.8, dpi = 120)
To produce smoother climatologies, we can additionally enable a moving average smoother. The result is shown in Figure 3.
#smoothPercentile = TRUE
res.11.T <- ts2clm(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"), clmOnly = TRUE)
res.31.T <- ts2clm(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"), clmOnly = TRUE, windowHalfWidth = 15)
res.61.T <- ts2clm(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"), clmOnly = TRUE, windowHalfWidth = 30)
res.11.T <- res.11.T %>%
mutate(width = "11")
res.31.T <- res.31.T %>%
mutate(width = "31")
res.61.T <- res.61.T %>%
mutate(width = "61")
smooth <- rbind(res.11.T, res.31.T, res.61.T)
plt2 <- ggplot(data = smooth,
aes(x = doy, y = seas)) +
geom_line(aes(colour = width, size = width)) +
geom_line(aes(y = thresh, colour = width, size = width)) +
scale_colour_manual(name = "Width",
values = c("orange", "purple", "black")) +
scale_size_manual(values = c(0.4, 0.4, 0.8), guide = FALSE) +
labs(x = "Julian day", y = "SST (°C)", title = "Annual cycle climatology and threshold",
subtitle = "1983-01-01 to 2012-12-31 reference period; smooth on") +
theme_publ()
ggsave(plt2, file = "climat_2.png", width = 6, height = 2.8, dpi = 120)
library(fda)
clm <- res.1.F %>%
as_tibble
b7 <- create.fourier.basis(rangeval = range(clm$doy), nbasis = 7)
# plot(b7)
# b7.val <- as_tibble(eval.basis(clm$doy, basisobj = b7))
# b7.val <- cbind(doy = clm$doy, b7.val) %>%
# gather(key = "basis", value = "value", -doy) %>%
# as_tibble()
# create smooth seasonal climatology
b7.seas.smth <- smooth.basis(argvals = clm$doy, y = clm$seas, fdParobj = b7)$fd
clm.seas.smth <- eval.fd(clm$doy, b7.seas.smth)
# create smooth threshold
b7.thresh.smth <- smooth.basis(argvals = clm$doy, y = clm$thresh, fdParobj = b7)$fd
clm.thresh.smth <- eval.fd(clm$doy, b7.thresh.smth)
temps <- as_tibble(data.frame(doy = clm$doy,
seas.unsmth_1 = clm$seas,
seas.smth_11 = res.11.T$seas,
clim.fourier_7 = clm.seas.smth,
thresh.unsmth_1 = clm$thresh,
thresh.smth_11 = res.11.T$thresh,
thresh.fourier_7 = clm.thresh.smth))
library(forcats)
plt3 <- temps %>%
gather(key = "type", value = "temperature", -doy) %>%
ggplot(aes(x = doy, y = temperature)) +
geom_line(aes(colour = fct_inorder(type), size = fct_inorder(type))) +
scale_colour_manual(name = "Smooth",
values = c("grey50", "purple", "black", "grey50", "purple", "black"),
labels = c("1, unsmooth", "11, smooth", "Fourier"),
breaks = c("seas.unsmth_1", "seas.smth_11", "clim.fourier_7")) +
scale_size_manual(values = c(0.4, 0.4, 0.8, 0.4, 0.4, 0.8), guide = FALSE) +
labs(x = "Julian day", y = "SST (°C)", title = "Annual cycle climatology and threshold",
subtitle = "1983-01-01 to 2012-12-31 reference period") +
theme_publ()
ggsave(plt3, file = "climat_3.png", width = 6, height = 2.8, dpi = 120)
# plot(b7.seas.smth[1:7])
# b7.PCA = pca.fd(b7.seas.smth, nharm=6)
# library(TSA)
# # compute the Fourier Transform
# p <- periodogram(y)
# dd <- data.frame(freq = p$freq, spec = p$spec)
# order <- dd[order(-dd$spec), ]
# top6 <- head(order, 6)
# # display the 6 highest "power" frequencies
# top6
# # convert frequency to time periods
# time <- 1/top6$f
# time
# fft(y)
# spectrum(y)
# require(graphics)
# spec.y <- spec.pgram(y)
# plot(spec.y, plot.type = "coherency")
# plot(spec.y, plot.type = "phase")
# library(GeneCycle)
# GeneCycle::periodogram(y)
Should one want to use a different baseline to calculate the climatology against which the events should be detected, this is now possible.
# Again, using the built-in time series
# The Hobday et al. (2016) default definition applied above is available in
# `res.11.T`; we repeat it her and return the original time series with the
# climatology so as to have data in the specified format as required by the
# detect function, `detect_event()`:
res.11.T.full <- ts2clm(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"), clmOnly = FALSE)
# Calculate an anomaly:
daily.dat <- res.11.T.full %>%
mutate(anom = temp - mean(temp, na.rm = TRUE),
num = seq(1:length(temp)))
# Let us remove the long-term non-linear trend from the WA anomaly data using a
# generalised additive model; first, define the GAM:
library(mgcv)
library(broom)
mod.gam <- gam(temp ~ s(num, k = 20), data = daily.dat)
daily.dat2 <- augment(mod.gam)
daily.dat2 <- cbind(daily.dat2, daily.dat[, c("doy", "t", "anom")])
raw.clim <- ts2clm(daily.dat2, x = t, y = anom,
climatologyPeriod = c("1983-01-01", "2012-12-31"), clmOnly = FALSE)
detrended.clim <- ts2clm(daily.dat2, x = t, y = .resid,
climatologyPeriod = c("1983-01-01", "2012-12-31"), clmOnly = FALSE)
# Plot the two climatologies (raw vs. detrended):
plt4 <- ggplot(data = filter(raw.clim, t < "1983-01-01"), aes(x = doy, y = seas)) +
geom_line(colour = "black") +
geom_line(data = filter(detrended.clim, t < "1983-01-01"),
colour = "red", alpha = 0.6) +
theme_publ()
ggsave(plt4, file = "climat_4.png", width = 12, height = 6, dpi = 120)
# ... there is a tiny, almost imperceptable difference between the two...
# Calculate events in the original WA SST (anomaly) relative to a climatology
# created from the detrended WA SST data:
raw.res <- detect_event(raw.clim, x = t, y = anom,
seasClim = seas,
threshClim = thresh)
detrended.res <- detect_event(raw.clim, x = t, y = anom,
seasClim = detrended.clim$seas,
threshClim = detrended.clim$thresh)
raw.res
detrended.res
The events that get detected are somewhat different now. Notice that in detrended.clim
, seas
and thresh
are unfortunately the ones that were calculated together with raw (anomaly) time series, not the seas
and thresh
that were actually used and included against climSeas and climThresh. We will fix that.
If one would rather supply a pre-calculated climatology, and not calculate one from a baseline, this is also now possible.
# Pull out the climatologies from the built in time series
# ts1_clim <- detect_evets2clm(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"))$clim %>%
# dplyr::select(doy, seas, thresh, var_clim_year) %>%
# base::unique() %>%
# dplyr::arrange(doy)
# ts2_clim <- detect_evets2clm(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"))$clim %>%
# dplyr::select(doy, seas, thresh, var_clim_year) %>%
# base::unique() %>%
# dplyr::arrange(doy)
# ts3_clim <- detect_evets2clm(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"))$clim %>%
# dplyr::select(doy, seas, thresh, var_clim_year) %>%
# base::unique() %>%
# dplyr::arrange(doy)
#
# # Calculate events in one time series with a climatology from another
# res_clim_diff <- detect_event(ts1, alt_clim = TRUE, alt_clim_data = ts2_clim)
Note how the climatologies are the same below:
# head(ts2_clim)
# head(res_clim_diff$clim)
Should the provided climatology be to low, the function will return a warning letting the user know this.
# # Calculate events in one time series with a climatology from another
# res_clim_diff <- detect_event(ts1, alt_clim = TRUE, alt_clim_data = ts3_clim)
sessionInfo()
This reproducible R Markdown analysis was created with workflowr 1.1.1