Last updated: 2019-03-19
workflowr checks:

✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
✔ Seed: set.seed(20190319)
The command set.seed(20190319) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 64ac134
Great! You are using Git for version control. Be sure that all relevant files for the analysis were committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: vignettes/data/sst_ALL_clim_only.Rdata
Ignored: vignettes/data/sst_ALL_event_aov_tukey.Rdata
Ignored: vignettes/data/sst_ALL_flat.Rdata
Ignored: vignettes/data/sst_ALL_miss.Rdata
Ignored: vignettes/data/sst_ALL_miss_cat_chi.Rdata
Ignored: vignettes/data/sst_ALL_miss_clim_event_cat.Rdata
Ignored: vignettes/data/sst_ALL_miss_clim_only.Rdata
Ignored: vignettes/data/sst_ALL_repl.Rdata
Ignored: vignettes/data/sst_ALL_smooth.Rdata
Ignored: vignettes/data/sst_ALL_smooth_aov_tukey.Rdata
Ignored: vignettes/data/sst_ALL_smooth_event.Rdata
Ignored: vignettes/data/sst_ALL_trend.Rdata
Ignored: vignettes/data/sst_ALL_trend_clim_event_cat.Rdata
Untracked files:
Untracked: analysis/bibliography.bib
Untracked: analysis/data/
Untracked: code/functions.R
Untracked: code/workflow.R
Untracked: data/sst_WA.csv
Untracked: docs/figure/
Unstaged changes:
Modified: NEWS.md
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
| File | Version | Author | Date | Message |
|------|---------|--------|------|---------|
| Rmd | 64ac134 | robwschlegel | 2019-03-19 | Publish analysis files |
How does one create the best climatology? And what if the time series is shorter than the recommended 30 years?
Let’s first look at all the South African Coastal Temperature Network (SACTN) data longer than 30 years. Can each site’s annual signal be represented by a sine/cosine curve derived from a Fourier analysis? First I find all the SACTN sites with approximately 30 years or more of data.
# load("~/SACTNraw/data/4_products/SACTN_daily_v4.2.RData")
load("/Volumes/GoogleDrive/My Drive/SACTN/SACTNraw/data/4_products/SACTN_daily_v4.2.RData")
daily.in <- SACTN_daily_v4.2
rm(SACTN_daily_v4.2)

# Sites with at least ~30 years of daily records
long_sites <- daily.in %>%
  group_by(index) %>%
  summarise(n = n()) %>%
  filter(n >= 365 * 30) %>%
  ungroup()
Now, for each site that has ≥30 years of data, I create a daily climatology by taking the mean for each Julian day, using all data across the full duration of each time series.
daily <- daily.in %>%
  filter(index %in% long_sites$index) %>%
  mutate(index = as.character(index)) %>%
  # droplevels() %>%
  group_by(index) %>%
  mutate(doy = yday(date)) %>%
  group_by(index, doy) %>%
  summarise(temp = mean(temp, na.rm = TRUE)) %>%
  ungroup()
I make a function that will fit a Fourier series with seven basis functions (3 sine, 3 cosine and the constant) to the data. This is the same approach I used to create the smooth climatology in “Climatologies and baseline” elsewhere on this site, and it is also the one favoured in the creation of the daily Optimally Interpolated Sea Surface Temperature climatology (Banzon et al. 2014). Here it is wrapped in a function so I can easily apply it to each time series.
fFun <- function(data) {
  require(fda)
  # 7-term Fourier basis: the constant plus 3 sine/cosine pairs
  b7 <- create.fourier.basis(rangeval = range(data$doy), nbasis = 7)
  # Create the smooth seasonal climatology and evaluate it at each doy
  b7.smth <- smooth.basis(argvals = data$doy, y = data$temp, fdParobj = b7)$fd
  data.smth <- eval.fd(data$doy, b7.smth)
  return(data.smth)
}
# Currently fFun fails when the 1st of January is an NA,
# so exclude any site whose daily climatology contains NAs
excl.sites <- unique(daily$index[is.na(daily$temp)])

daily.nest <- daily %>%
  filter(!index %in% excl.sites) %>%
  nest(-index)

daily.smth <- daily.nest %>%
  mutate(smoothed = map(data, fFun)) %>%
  unnest(data, smoothed)
plt1 <- daily.smth %>%
  ggplot(aes(x = doy, y = temp)) +
  geom_point(shape = "+", colour = "red", alpha = 0.6, size = 1) +
  geom_line(aes(y = smoothed, group = index), size = 0.6, colour = "black") +
  facet_wrap(~index) +
  labs(x = "Julian day", y = "Temperature (°C)", title = "SACTN: daily climatology",
       subtitle = "Fourier series: 7 basis functions")
ggsave(plt1, file = "fig/SACTN_Fourier7.png", width = 12, height = 9, dpi = 120)
There are instances when we may need to create climatologies without access to time series of 30 years or longer. To look at this problem, I created a quick resampling analysis to see what effect time series length has on the resultant climatology. In this instance I used a Fourier analysis (see Banzon et al., 2014) to calculate six sine/cosine modes within an annual cycle, and used these to reconstruct a smooth climatology. This technique is quite forgiving of SST seasonalities that depart from typical sinusoidal profiles, such as those we find in upwelling areas (I have tested this on our coastal seawater temperatures).
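As a side note, the fitting step can be sketched in base R without the fda package: an intercept plus three annual harmonics (three sine and three cosine terms, i.e. the same seven coefficients as the 7-term Fourier basis) fit by ordinary least squares, with the fitted values serving as the smooth climatology. The daily data below are synthetic, for illustration only.

```r
# Minimal base-R sketch of the Fourier reconstruction: an intercept
# plus three annual harmonics (3 sine + 3 cosine terms) fit with lm().
# The daily "temperatures" below are synthetic, for illustration only.
set.seed(1)
doy <- 1:365
temp <- 20 + 3 * sin(2 * pi * doy / 365) + rnorm(365, sd = 0.5)

# Design matrix: one sine and one cosine column per harmonic
X <- do.call(cbind, lapply(1:3, function(j)
  cbind(sin(2 * pi * j * doy / 365), cos(2 * pi * j * doy / 365))))

fit <- lm(temp ~ X)         # 7 coefficients in total
clim_smooth <- fitted(fit)  # the smooth daily climatology
```

The least-squares fit recovers the same smooth seasonal signal as the basis-function approach, just without the fda machinery.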
I used the Western Australian data that are packaged with the Python and R marine heat wave scripts; later I’ll test this more widely on other time series. The WA time series is 32 years long. I treat all temperatures over the duration of the time series as a pool of data, separately for each day-of-year (doy); i.e. doy 1 has 32 temperatures, doy 2 also has 32 temperatures, etc. I assume that the temperature on doy 1 of 1982 is independent of that on doy 1 in every other year (etc.), so the sample of doy 1 temperatures is independent and identically distributed (i.i.d.). I further assume that there’s no autocorrelation from day to day (definitely erroneous, but I don’t think it matters here).
For each day’s pool of samples I then randomly take 10 samples and find their mean. This is the same as averaging over 10 years to obtain the mean for each of the 365 days (one year). Note that I did not apply the 11-day sliding window as per the heat wave function (windowHalfWidth), and rather opted to treat each doy independently as per Banzon et al. (2014). I repeat this 100 times, so I effectively have estimates of 100 annual cycles. I then do the same with 20 samples and 30 samples taken from the pool of available samples for each doy.
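For a single doy, the resampling step amounts to the following base-R sketch; the pool of 32 values here is synthetic, and the real analysis below does this for every doy at once with sample_n().

```r
# One doy's pool of 32 "annual" temperatures (synthetic, illustrative)
set.seed(20190319)
pool <- rnorm(32, mean = 21, sd = 1)

# One pseudo-climatological mean for a notional 10-year series
mean(sample(pool, size = 10, replace = TRUE))

# 100 replicate means each for notional 10-, 20- and 30-year series
reps <- sapply(c(10, 20, 30), function(n)
  replicate(100, mean(sample(pool, size = n, replace = TRUE))))
colnames(reps) <- c("n10", "n20", "n30")

# The spread of the resampled means shrinks as the series lengthens
apply(reps, 2, sd)
```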
In total I now have 300 annual cycles: 100 of them representing a 10-year-long time series, 100 representing a 20-year time series, and 100 a 30-year time series. To each of them, separately, I then fit the smooth climatology assembled from their Fourier components.
The attached plots show the outcome. The three panels represent 10-, 20- and 30-year-long time series. In Figure 2, each of the 100 daily time series is plotted as a line graph. In Figure 3, the red crosses show, for each doy, each of the 100 mean temperatures that I obtained. The black lines are of course the smooth climatologies; there are also 100 of these lines on each of the three panels.
So, using resampling and a Fourier analysis we can get away with using shorter time series. Later I can see what happens if I shrink the time series even further, down to only five years. I will also test how well the internal climatology functions of the Python and R heatwave scripts (windowHalfWidth and smoothPercentile) are able to produce reliable climatologies from short time series.
library(heatwaveR)
sst.in <- as_tibble(heatwaveR::sst_WA)

# Nest the time series by day-of-year so that each doy
# holds its full pool of annual values
sst <- sst.in %>%
  mutate(doy = yday(as.Date(t))) %>%
  nest(-doy)

meanFun <- function(data) {
  m <- mean(data$temp, na.rm = TRUE)
  return(m)
}
# For each doy, draw 10, 20 and 30 values with replacement
# from the pool and compute their means
sstRepl <- function(sst) {
  sst.sampled <- sst %>%
    mutate(sample_10 = map(data, sample_n, 10, replace = TRUE),
           sample_20 = map(data, sample_n, 20, replace = TRUE),
           sample_30 = map(data, sample_n, 30, replace = TRUE)) %>%
    mutate(sample_10_m = map(sample_10, meanFun),
           sample_20_m = map(sample_20, meanFun),
           sample_30_m = map(sample_30, meanFun)) %>%
    select(-data, -sample_10, -sample_20, -sample_30) %>%
    unnest(sample_10_m, sample_20_m, sample_30_m)
  return(sst.sampled)
}
library(purrr)

# Repeat the resampling 100 times and stack the results
sst.repl <- purrr::rerun(100, sstRepl(sst)) %>%
  map_df(as.data.frame, .id = "rep") %>%
  gather(key = "ts.len", value = "temp", -rep, -doy)

plt2 <- ggplot(data = sst.repl, aes(x = doy, y = temp)) +
  geom_line(aes(group = rep), alpha = 0.3, size = 0.1) +
  facet_wrap(~ts.len, nrow = 3) +
  labs(x = "Julian day", y = "SST (°C)", title = "Raw daily climatology",
       subtitle = "100 daily climatological means")
ggsave(plt2, file = "fig/WA_100_simul.png", width = 12, height = 6, dpi = 120)
sst.repl.nest <- sst.repl %>%
  nest(-rep, -ts.len)

sst.repl.smth <- sst.repl.nest %>%
  mutate(smoothed = map(data, fFun)) %>%
  unnest(data, smoothed)

plt3 <- ggplot(data = sst.repl.smth, aes(x = doy, y = smoothed)) +
  geom_point(aes(y = temp), shape = "+", colour = "red", alpha = 0.1, size = 1) +
  geom_line(aes(group = rep), alpha = 0.3, size = 0.1) +
  facet_wrap(~ts.len, nrow = 3) +
  labs(x = "Julian day", y = "SST (°C)", title = "WA SST: daily climatology",
       subtitle = "Fourier series: 7 basis functions")
ggsave(plt3, file = "fig/WA_Fourier7.png", width = 12, height = 6, dpi = 120)
The heatwaveR climatology tool creates the mean and the 90th percentile (threshold) for all the data within a sliding window of a certain width (by default 11 days, centred on each day-of-year), and then applies a running mean smoother to both the mean and the threshold. Using the approach I created for the Fourier series, let’s do the same here.
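To make the window logic concrete, here is a base-R sketch of that idea on synthetic data, assuming the defaults as I describe them above (windowHalfWidth = 5, i.e. an 11-day window, and a 31-day running mean smoother); this is an illustration of the technique, not the heatwaveR implementation itself.

```r
# Synthetic data: 10 years of daily temperatures on a 365-day calendar
set.seed(1)
n_years <- 10
doy <- rep(1:365, n_years)
temp <- 20 + 3 * sin(2 * pi * doy / 365) + rnorm(length(doy), sd = 1)

# Circular doy indices for a window of +/- `half` days around doy `d`
win_doys <- function(d, half = 5) ((d - half - 1):(d + half - 1)) %% 365 + 1

# Mean and 90th percentile over all values falling in each doy's window
clim <- t(sapply(1:365, function(d) {
  vals <- temp[doy %in% win_doys(d)]
  c(seas = mean(vals), thresh = quantile(vals, 0.9, names = FALSE))
}))

# 31-day circular running mean smoother, applied to both columns
run_mean <- function(x, width = 31) {
  half <- (width - 1) / 2
  sapply(seq_along(x), function(i)
    mean(x[((i - half - 1):(i + half - 1)) %% length(x) + 1]))
}
clim_smooth <- apply(clim, 2, run_mean)
```

The windows wrap around the year boundary, so doy 1 borrows values from late December just as ts2clm() does.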
# As before, but keep the raw resampled values rather than their means
sstRepl2 <- function(sst) {
  sst.sampled <- sst %>%
    mutate(sample_10 = map(data, sample_n, 10, replace = TRUE),
           sample_20 = map(data, sample_n, 20, replace = TRUE),
           sample_30 = map(data, sample_n, 30, replace = TRUE))
  return(sst.sampled)
}

sst.repl2 <- purrr::rerun(100, sstRepl2(sst)) %>%
  map_df(as.data.frame, .id = "rep")
# Reassign consecutive years to the resampled values so that each
# replicate looks like a real multi-year time series to ts2clm()
parseDates <- function(data, rep_col, len) {
  parsed <- data %>%
    mutate(id = rep_col,
           y = year(t),
           m = month(t),
           d = day(t)) %>%
    group_by(rep, doy) %>%
    mutate(y = seq(1983, by = 1, len = len)) %>%
    mutate(t = ymd(paste(y, m, d, sep = "-"))) %>%
    select(-y, -m, -d) %>%
    na.omit() # drop the invalid dates created in non-leap years
  return(parsed)
}
sample_10 <- sst.repl2 %>%
  unnest(sample_10) %>%
  parseDates("sample_10", len = 10) %>%
  group_by(id, rep) %>%
  nest() %>%
  mutate(smoothed = map(data, function(x)
    ts2clm(x, climatologyPeriod = c("1983-01-01", "1992-12-31")))) %>%
  unnest(smoothed)

sample_20 <- sst.repl2 %>%
  unnest(sample_20) %>%
  parseDates("sample_20", len = 20) %>%
  group_by(id, rep) %>%
  nest() %>%
  mutate(smoothed = map(data, function(x)
    ts2clm(x, climatologyPeriod = c("1983-01-01", "2002-12-31")))) %>%
  unnest(smoothed)

sample_30 <- sst.repl2 %>%
  unnest(sample_30) %>%
  parseDates("sample_30", len = 30) %>%
  group_by(id, rep) %>%
  nest() %>%
  mutate(smoothed = map(data, function(x)
    ts2clm(x, climatologyPeriod = c("1983-01-01", "2012-12-31")))) %>%
  unnest(smoothed)

sst.repl2.smth <- bind_rows(sample_10, sample_20, sample_30)
plt4 <- sst.repl2.smth %>%
  ggplot(aes(x = doy, y = temp)) +
  # geom_point(shape = "+", colour = "red", alpha = 0.1, size = 1) +
  geom_line(data = filter(sst.repl2.smth, t <= "1984-01-01"),
            aes(y = seas, group = rep), alpha = 0.3, size = 0.1) +
  facet_wrap(~id, nrow = 3) +
  labs(x = "Julian day", y = "SST (°C)", title = "WA SST: daily climatology",
       subtitle = "MHW climatology, default settings")
ggsave(plt4, file = "fig/rand_mat_smoothed_2.png", width = 12, height = 6, dpi = 120)
Banzon, Viva F., Richard W. Reynolds, Diane Stokes, and Yan Xue. 2014. “A 1/4°-Spatial-resolution daily sea surface temperature climatology based on a blended satellite and in situ analysis.” Journal of Climate 27 (21): 8221–8. doi:10.1175/JCLI-D-14-00293.1.
sessionInfo()
This reproducible R Markdown analysis was created with workflowr 1.1.1