Last updated: 2019-08-07
workflowr checks:
✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
✔ Seed: set.seed(20190513)
The command set.seed(20190513) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 9d81722
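Note that all relevant files for the analysis need to be committed to Git before the results are generated (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated: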
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: data/ALL_anom.Rda
Ignored: data/ALL_clim.Rda
Ignored: data/ERA5_lhf.Rda
Ignored: data/ERA5_lwr.Rda
Ignored: data/ERA5_qnet.Rda
Ignored: data/ERA5_qnet_anom.Rda
Ignored: data/ERA5_qnet_clim.Rda
Ignored: data/ERA5_shf.Rda
Ignored: data/ERA5_swr.Rda
Ignored: data/ERA5_t2m.Rda
Ignored: data/ERA5_t2m_anom.Rda
Ignored: data/ERA5_t2m_clim.Rda
Ignored: data/ERA5_u.Rda
Ignored: data/ERA5_u_anom.Rda
Ignored: data/ERA5_u_clim.Rda
Ignored: data/ERA5_v.Rda
Ignored: data/ERA5_v_anom.Rda
Ignored: data/ERA5_v_clim.Rda
Ignored: data/GLORYS_mld.Rda
Ignored: data/GLORYS_mld_anom.Rda
Ignored: data/GLORYS_mld_clim.Rda
Ignored: data/GLORYS_u.Rda
Ignored: data/GLORYS_u_anom.Rda
Ignored: data/GLORYS_u_clim.Rda
Ignored: data/GLORYS_v.Rda
Ignored: data/GLORYS_v_anom.Rda
Ignored: data/GLORYS_v_clim.Rda
Ignored: data/NAPA_clim_U.Rda
Ignored: data/NAPA_clim_V.Rda
Ignored: data/NAPA_clim_W.Rda
Ignored: data/NAPA_clim_emp_ice.Rda
Ignored: data/NAPA_clim_emp_oce.Rda
Ignored: data/NAPA_clim_fmmflx.Rda
Ignored: data/NAPA_clim_mldkz5.Rda
Ignored: data/NAPA_clim_mldr10_1.Rda
Ignored: data/NAPA_clim_qemp_oce.Rda
Ignored: data/NAPA_clim_qla_oce.Rda
Ignored: data/NAPA_clim_qns.Rda
Ignored: data/NAPA_clim_qsb_oce.Rda
Ignored: data/NAPA_clim_qt.Rda
Ignored: data/NAPA_clim_runoffs.Rda
Ignored: data/NAPA_clim_ssh.Rda
Ignored: data/NAPA_clim_sss.Rda
Ignored: data/NAPA_clim_sst.Rda
Ignored: data/NAPA_clim_taum.Rda
Ignored: data/NAPA_clim_vars.Rda
Ignored: data/NAPA_clim_vecs.Rda
Ignored: data/OAFlux.Rda
Ignored: data/OISST_sst.Rda
Ignored: data/OISST_sst_anom.Rda
Ignored: data/OISST_sst_clim.Rda
Ignored: data/node_mean_all_anom.Rda
Ignored: data/packet_all.Rda
Ignored: data/packet_all_anom.Rda
Ignored: data/packet_nolab.Rda
Ignored: data/packet_nolab14.Rda
Ignored: data/packet_nolabgsl.Rda
Ignored: data/packet_nolabmod.Rda
Ignored: data/som_all.Rda
Ignored: data/som_all_anom.Rda
Ignored: data/som_nolab.Rda
Ignored: data/som_nolab14.Rda
Ignored: data/som_nolab_16.Rda
Ignored: data/som_nolab_9.Rda
Ignored: data/som_nolabgsl.Rda
Ignored: data/som_nolabmod.Rda
Ignored: data/synoptic_states.Rda
Ignored: data/synoptic_vec_states.Rda
Unstaged changes:
Modified: code/workflow.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | ed626bf | robwschlegel | 2019-08-07 | Ran a bunch of figures and had a meeting with Eric. More changes coming to GLORYS data tomorrow before settling on one of the experimental SOMs |
html | f66aa38 | robwschlegel | 2019-08-01 | Build site. |
Rmd | 5e12d9e | robwschlegel | 2019-08-01 | Re-publish entire site. |
Rmd | 9a9fa7d | robwschlegel | 2019-08-01 | A more in depth dive into the potential criteria to meet for the SOM model |
Rmd | 240a7a0 | robwschlegel | 2019-07-31 | Ran the base SOM results |
html | aa82e6e | robwschlegel | 2019-07-31 | Build site. |
Rmd | 498909b | robwschlegel | 2019-07-31 | Re-publish entire site. |
html | 35987b4 | robwschlegel | 2019-07-09 | Build site. |
Rmd | 34efa43 | robwschlegel | 2019-07-09 | Added some thinking to the SOM vignette. |
html | e2f6f42 | robwschlegel | 2019-07-09 | Build site. |
Rmd | 609cca8 | robwschlegel | 2019-07-09 | Added some thinking to the SOM vignette. |
html | 81e961d | robwschlegel | 2019-07-09 | Build site. |
Rmd | 7ff9b8b | robwschlegel | 2019-06-17 | More work on the talk |
Rmd | b25762e | robwschlegel | 2019-06-12 | More work on figures |
Rmd | 413bb8b | robwschlegel | 2019-06-12 | Working on pixel interpolation |
html | c23c50b | robwschlegel | 2019-06-10 | Build site. |
html | 028d3cc | robwschlegel | 2019-06-10 | Build site. |
Rmd | c6b3c7b | robwschlegel | 2019-06-10 | Re-publish entire site. |
Rmd | 1b53eeb | robwschlegel | 2019-06-10 | SOM packet pipeline testing |
Rmd | 4504e12 | robwschlegel | 2019-06-07 | Working on joining in vector data |
html | c61a15f | robwschlegel | 2019-06-06 | Build site. |
Rmd | 44ac335 | robwschlegel | 2019-06-06 | Working on inclusion of vectors into SOM pipeline |
html | 6dd6da8 | robwschlegel | 2019-06-06 | Build site. |
Rmd | 07137d9 | robwschlegel | 2019-06-06 | Site wide update, including newly functioning SOM pipeline. |
Rmd | 990693a | robwschlegel | 2019-06-05 | First SOM result visuals |
Rmd | 25e7e9a | robwschlegel | 2019-06-05 | SOM pipeline nearly finished |
Rmd | 4838cc8 | robwschlegel | 2019-06-04 | Working on SOM functions |
Rmd | 94ce8f6 | robwschlegel | 2019-06-04 | Functions for creating data packets are up and running |
Rmd | 65301ed | robwschlegel | 2019-05-30 | Push before getting rid of some testing structure |
html | c09b4f7 | robwschlegel | 2019-05-24 | Build site. |
Rmd | 5dc8bd9 | robwschlegel | 2019-05-24 | Finished initial creation of SST prep vignette. |
html | a29be6b | robwschlegel | 2019-05-13 | Build site. |
html | ea61999 | robwschlegel | 2019-05-13 | Build site. |
Rmd | f8f28b1 | robwschlegel | 2019-05-13 | Skeleton files |
This vignette contains the code used to perform the self-organising map (SOM) analysis on the mean synoptic states created in the Variable preparation vignette. We’ll start by creating custom packets that meet certain experimental criteria before then feeding them into a SOM. We will finish up by creating some cursory visuals of the results. The full summary of the results may be seen in the Node summary vignette.
# Install from GitHub
# .libPaths(c("~/R-packages", .libPaths()))
# devtools::install_github("fabrice-rossi/yasomi")
# Packages used in this vignette
library(jsonlite, lib.loc = "../R-packages/")
library(tidyverse) # Base suite of functions
library(lubridate) # For convenient date manipulation
library(yasomi, lib.loc = "../R-packages/") # The SOM package of choice because it supports PCA-based initialisation (PCI)
library(data.table) # For working with massive dataframes
# Set number of cores
doMC::registerDoMC(cores = 50)
# Disable scientific notation for numeric values
# I just find it annoying
options(scipen = 999)
# Individual regions
NWA_coords <- readRDS("data/NWA_coords_cabot.Rda")
# Corners of the study area
NWA_corners <- readRDS("data/NWA_corners.Rda")
# The base map
map_base <- ggplot2::fortify(maps::map(fill = TRUE, col = "grey80", plot = FALSE)) %>%
dplyr::rename(lon = long) %>%
mutate(group = ifelse(lon > 180, group+9999, group),
lon = ifelse(lon > 180, lon-360, lon)) %>%
select(-region, -subregion)
# MHW results
OISST_region_MHW <- readRDS("data/OISST_region_MHW.Rda")
# MHW Events
OISST_MHW_event <- OISST_region_MHW %>%
select(-cats) %>%
unnest(events) %>%
filter(row_number() %% 2 == 0) %>%
unnest(events)
# MHW Categories
suppressWarnings( # Don't need warning about different names for events
OISST_MHW_cats <- OISST_region_MHW %>%
select(-events) %>%
unnest(cats)
)
In this last stage before running our SOM analyses we will create data packets that can be fed directly into the SOM algorithm. These data packets will vary based on the exclusion of certain regions in the study area. In the first run of this analysis on the NAPA model data it was found that the inclusion of the Labrador Sea complicated the results quite a bit. It is also unclear whether or not the Gulf of St Lawrence region should be included in the analysis. While creating whatever packets we desire we will also be converting them into the super-wide matrix format that the SOM model desires.
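As a rough illustration of the super-wide format, the sketch below casts a toy long-format data frame into one row per event and one column per pixel, using only base R. The toy values and the object names (toy_long, toy_wide, toy_packet) are hypothetical. The "BBB" separator, the event_ID and coords names, and the rule for dropping pixels with missing data mirror the create_packet() function defined further down. This is a minimal sketch under those assumptions, not the pipeline itself.

```r
# Toy long-format data: two events, two pixels, one variable (hypothetical values)
toy_long <- data.frame(
  region   = c("gsl", "gsl", "ls", "ls"),
  event_no = c(1, 1, 2, 2),
  lon      = c(-70.5, -69.5, -70.5, -69.5),
  lat      = c(42.5, 42.5, 42.5, 42.5),
  sst_anom = c(1.2, 0.8, NA, 1.5)
)

# Build the row (event) and column (pixel + variable) identifiers
toy_long$event_ID <- paste(toy_long$region, toy_long$event_no, sep = "BBB")
toy_long$coords   <- paste(toy_long$lon, toy_long$lat, "sst_anom", sep = "BBB")

# Cast to a matrix with one row per event and one column per pixel
toy_wide <- tapply(toy_long$sst_anom,
                   list(toy_long$event_ID, toy_long$coords),
                   FUN = function(x) x[1])

# Drop any pixel (column) that is missing in at least one event
toy_packet <- toy_wide[, colSums(is.na(toy_wide)) < 1, drop = FALSE]
```

Here one of the two pixels is missing for the second event, so only a single column survives. With real data the same rule removes any pixel that is not available for every event.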
Up first we must simply load and unnest the synoptic state packets made previously.
# Load the synoptic states data packet
system.time(
synoptic_states <- readRDS("data/synoptic_states.Rda")
) # 3 seconds
# Unnest the synoptic data
system.time(
synoptic_states_unnest <- synoptic_states %>%
select(region, event_no, synoptic) %>%
unnest()
) # 8 seconds
With all of our data ready we may now trim them as we see fit before saving them for the SOM.
# The study area size when the Labrador region is excluded
NWA_coords_nolab <- NWA_coords %>%
filter(region != "ls")
# The study area size when the Labrador and GSL regions are excluded
NWA_coords_nolabgsl <- NWA_coords %>%
filter(!region %in% c("ls", "gsl"))
# Test visuals of reduced study areas
# synoptic_states[1,] %>%
# unnest() %>%
# filter(lat <= round(max(NWA_coords_nolabgsl$lat))+0.5) %>%
# ggplot(aes(x = lon, y = lat)) +
# geom_raster(aes(fill = sst_anom)) +
# geom_polygon(data = NWA_coords_nolabgsl, aes(colour = region), fill = NA)
# Function for casting wide the custom packets
create_packet <- function(df){
# Cast the data to a single row
res <- data.table::data.table(df) %>%
reshape2::melt(id = c("region", "event_no", "lon", "lat"),
measure = c(colnames(.)[-c(1:4)]),
variable.name = "var", value.name = "val") %>%
dplyr::arrange(var, lon, lat) %>%
unite(coords, c(lon, lat, var), sep = "BBB") %>%
unite(event_ID, c(region, event_no), sep = "BBB") %>%
reshape2::dcast(event_ID ~ coords, value.var = "val")
# Remove columns (pixels) with missing data
res_fix <- res[,colSums(is.na(res))<1]
return(res_fix)
}
# Packet for entire study region
system.time(
packet_all <- create_packet(synoptic_states_unnest)
) # 185 seconds
# saveRDS(packet_all, "data/packet_all.Rda")
# Exclude Labrador region
system.time(
packet_nolab <- synoptic_states_unnest %>%
filter(region != "ls",
lat <= round(max(NWA_coords_nolab$lat))+0.5) %>%
create_packet()
) # 142 seconds
# saveRDS(packet_nolab, "data/packet_nolab.Rda")
# Exclude Labrador and Gulf of St Lawrence regions
system.time(
packet_nolabgsl <- synoptic_states_unnest %>%
filter(!region %in% c("ls", "gsl"),
lat <= round(max(NWA_coords_nolabgsl$lat))+0.5) %>%
create_packet()
) # 106 seconds
# saveRDS(packet_nolabgsl, "data/packet_nolabgsl.Rda")
# Exclude Labrador region and moderate events
system.time(
packet_nolabmod <- synoptic_states_unnest %>%
filter(region != "ls",
lat <= round(max(NWA_coords_nolab$lat))+0.5) %>%
left_join(select(OISST_MHW_cats, region, event_no, category), by = c("region", "event_no")) %>%
filter(category != "I Moderate") %>%
select(-category) %>%
create_packet()
) # 15 seconds
# saveRDS(packet_nolabmod, "data/packet_nolabmod.Rda")
# Exclude Labrador region and events shorter than 14 days
system.time(
packet_nolab14 <- synoptic_states_unnest %>%
filter(region != "ls",
lat <= round(max(NWA_coords_nolab$lat))+0.5) %>%
left_join(select(OISST_MHW_cats, region, event_no, duration), by = c("region", "event_no")) %>%
filter(duration >= 14) %>%
select(-duration) %>%
create_packet()
) # 40 seconds
# saveRDS(packet_nolab14, "data/packet_nolab14.Rda")
Now that we have our data packets to feed the SOM, we need a function that will ingest them and produce results for us. The function below has been greatly expanded from the previous version of this project and now performs all of the SOM-related work in one go. This allowed me to remove a couple hundred lines of code and text from this vignette.
# Function for calculating SOMs using PCI
# This outputs the mean values for each SOM node as well
# NB: 4x4 produced one empty cell and one cell with only one event
# So the default size has been reduced to 4x3
som_model_PCI <- function(data_packet, xdim = 4, ydim = 3){
# Create a scaled matrix for the SOM
# Drop the first column as this is the reference ID of the event per row
data_packet_matrix <- as.matrix(scale(data_packet[,-1]))
# Create the grid that the SOM will use to determine the number of nodes
som_grid <- somgrid(xdim = xdim, ydim = ydim, topo = "hexagonal")
# Run the SOM with PCI
som_model <- batchsom(data_packet_matrix,
somgrid = som_grid,
init = "pca",
max.iter = 100)
# Create a data.frame of info
node_info <- data.frame(event_ID = data_packet[,"event_ID"],
node = som_model$classif) %>%
separate(event_ID, into = c("region", "event_no"), sep = "BBB") %>%
group_by(node) %>%
mutate(count = n()) %>%
ungroup() %>%
mutate(event_no = as.numeric(as.character(event_no))) %>%
left_join(select(OISST_MHW_cats, region, event_no, category, peak_date),
by = c("region", "event_no")) %>%
mutate(month_peak = lubridate::month(peak_date, label = TRUE),
season_peak = case_when(month_peak %in% c("Jan", "Feb", "Mar") ~ "Winter",
month_peak %in% c("Apr", "May", "Jun") ~ "Spring",
month_peak %in% c("Jul", "Aug", "Sep") ~ "Summer",
month_peak %in% c("Oct", "Nov", "Dec") ~ "Autumn")) %>%
select(-peak_date, -month_peak)
# Determine which event goes in which node and melt
data_packet_long <- cbind(node = som_model$classif, data_packet) %>%
separate(event_ID, into = c("region", "event_no"), sep = "BBB") %>%
data.table() %>%
reshape2::melt(id = c("node", "region", "event_no"),
measure = c(colnames(.)[-c(1:3)]),
variable.name = "variable", value.name = "value")
# Create the mean values that serve as the unscaled results from the SOM
node_data <- data_packet_long[, .(val = mean(value, na.rm = TRUE)),
by = .(node, variable)] %>%
separate(variable, into = c("lon", "lat", "var"), sep = "BBB") %>%
dplyr::arrange(node, var, lon, lat) %>%
mutate(lon = as.numeric(lon),
lat = as.numeric(lat),
val = round(val, 4))
## ANOSIM for goodness of fit for node count
node_data_wide <- node_data %>%
unite(coords, c(lon, lat, var), sep = "BBB") %>%
data.table() %>%
dcast(node~coords, value.var = "val")
# Calculate similarity
som_anosim <- vegan::anosim(as.matrix(node_data_wide[,-1]),
node_data_wide$node, distance = "euclidean")$signif
# Combine and exit
res <- list(data = node_data, info = node_info, ANOSIM = paste0("p = ",som_anosim))
return(res)
}
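The step that turns node assignments back into unscaled node means can be illustrated on a toy matrix. The sketch below uses stats::kmeans() purely as a stand-in for the SOM so that it runs without yasomi; it is not the clustering used in this vignette, and the toy values and object names (toy_packet, toy_nodes, toy_node_means) are hypothetical. What it shows is the pattern from the function above: cluster on the scaled matrix, then average the original (unscaled) values within each node.

```r
set.seed(20190513)

# Toy packet: 20 "events" (rows) by 6 "pixels" (columns), hypothetical values
toy_packet <- matrix(rnorm(120), nrow = 20,
                     dimnames = list(paste0("event", 1:20),
                                     paste0("pixel", 1:6)))

# Scale each column, as is done before running the SOM
toy_scaled <- scale(toy_packet)

# Stand-in for the SOM: k-means with 4 clusters playing the role of 4 nodes
toy_nodes <- kmeans(toy_scaled, centers = 4, nstart = 5)$cluster

# Average the unscaled values of the events assigned to each node
toy_node_means <- aggregate(as.data.frame(toy_packet),
                            by = list(node = toy_nodes),
                            FUN = mean)
```

Averaging the unscaled values, rather than reading the scaled node weights, keeps the node summaries in the original units of each variable.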
With the function sorted, we now feed it the data packets.
# The SOM on the entire study area
packet_all <- readRDS("data/packet_all.Rda")
system.time(som_all <- som_model_PCI(packet_all)) # 136 seconds
# som_all$ANOSIM # p = 0.001
saveRDS(som_all, file = "data/som_all.Rda")
# The SOM excluding the Labrador Sea region
packet_nolab <- readRDS("data/packet_nolab.Rda")
system.time(som_nolab <- som_model_PCI(packet_nolab)) # 72 seconds
# som_nolab$ANOSIM # p = 0.001
saveRDS(som_nolab, file = "data/som_nolab.Rda")
# The SOM excluding the Labrador Sea and Gulf of St Lawrence regions
packet_nolabgsl <- readRDS("data/packet_nolabgsl.Rda")
system.time(som_nolabgsl <- som_model_PCI(packet_nolabgsl)) # 58 seconds
# som_nolabgsl$ANOSIM # p = 0.001
saveRDS(som_nolabgsl, file = "data/som_nolabgsl.Rda")
# We see below that the results are crisper when we leave the Gulf of St Lawrence in,
# so we will proceed with the rest of the experiments only excluding the Labrador Shelf
# A 9 node SOM
system.time(som_nolab_9 <- som_model_PCI(packet_nolab, xdim = 3, ydim = 3)) # 56 seconds
# som_nolab_9$ANOSIM # p = 0.001
saveRDS(som_nolab_9, file = "data/som_nolab_9.Rda")
# The 9 node results are perhaps easier to make sense of than 12 nodes, but it's not certain
# A 16 node SOM
system.time(som_nolab_16 <- som_model_PCI(packet_nolab, xdim = 4, ydim = 4)) # 91 seconds
# som_nolab_16$ANOSIM # p = 0.001
saveRDS(som_nolab_16, file = "data/som_nolab_16.Rda")
# 16 nodes seems unnecessary...
# A SOM without moderate events
system.time(som_nolabmod <- som_model_PCI(packet_nolabmod, xdim = 2, ydim = 2)) # 12 seconds
# som_nolabmod$ANOSIM # p = 0.042
saveRDS(som_nolabmod, file = "data/som_nolabmod.Rda")
# There are fewer than 40 category "II Strong" and larger MHWs so using more than 4 nodes wouldn't be appropriate
# These results are definitely too sparse to use for a publication
# A SOM without events shorter than 14 days
system.time(som_nolab14 <- som_model_PCI(packet_nolab14, xdim = 3, ydim = 3)) # 12 seconds
# som_nolab14$ANOSIM # p = 0.001
saveRDS(som_nolab14, file = "data/som_nolab14.Rda")
As simple as that, we now have a range of results from our SOM experiments. Up next, in the Node summary vignette, we will show the results with a range of visuals.
“Finally, Shearman and Lentz (2010) showed that century-long ocean warming trends observed along the entire northeast U.S. coast are not related to local atmospheric forcing but driven by atmospheric warming of source waters in the Labrador Sea and the Arctic, which are advected into the region.” (Richaud et al., 2016)
Downwelling
Net heat flux (OAFlux) doesn’t line up perfectly with the seasonal SST signal, but it is very close, with heat flux tending to lead SST by 2 – 3 months (Richaud et al., 2016). It is therefore likely one of the primary drivers of SST and should be strongly considered when constructing SOMs.
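One way to check such a lead on real data would be a lagged cross-correlation. The sketch below uses synthetic, noise-free monthly sine waves in which the heat flux series is constructed to peak three months before SST; the series and object names (toy_sst, toy_qnet, toy_ccf, lead_lag) are hypothetical and are not taken from OAFlux or OISST. It only illustrates how stats::ccf() reports which series leads.

```r
# Ten years of synthetic monthly data (hypothetical, noise-free)
months <- 1:120
toy_sst <- sin(2 * pi * months / 12)

# Construct a heat flux signal that peaks 3 months before SST
toy_qnet <- sin(2 * pi * (months + 3) / 12)

# Cross-correlation: the lag k value pairs toy_qnet[t + k] with toy_sst[t]
toy_ccf <- ccf(toy_qnet, toy_sst, lag.max = 6, plot = FALSE)

# A negative lag at the maximum means the first series (heat flux) leads
lead_lag <- toy_ccf$lag[which.max(toy_ccf$acf)]
```

With real anomalies the peak would be broader and noisier, so the lag at the maximum should be read together with the shape of the whole correlation curve.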
There is almost no seasonal cycle for slope waters in any of the regions (Richaud et al., 2016).
It would be interesting to see if the SOM outputs differ in any meaningful ways when only data from the first half of the study time period are used compared against the second half.
The output of the SOMs could likely be conveyed more meaningfully from the point of view of the regions. By this I mean taking the summary of the nodes, converting it into a table, and then using that table to build a set of summaries focused on each region. Some sort of interactive visual may be useful for this. Showing the percentage that each region has in each node would be a good start. This would allow for a more meaningful explanation of which drivers affect which regions during which seasons and over which years.
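A region-by-node percentage table of this kind could be sketched as follows. The toy data frame stands in for the info element returned by som_model_PCI(); its values, the region codes "regA" and "regB", and the object names (toy_info, region_node_pct) are hypothetical. The sketch assumes the percentages are meant per region, i.e. each row sums to 100; using margin = 2 instead would give the share of each node made up by each region.

```r
# Toy node assignments per event (hypothetical values)
toy_info <- data.frame(
  region = c("gsl", "gsl", "gsl", "regA", "regA",
             "regB", "regB", "regB", "regB"),
  node   = c(1, 1, 2, 2, 3, 1, 3, 3, 3)
)

# Count events per region and node, then convert each row to percentages
region_node_counts <- table(region = toy_info$region, node = toy_info$node)
region_node_pct <- round(100 * prop.table(region_node_counts, margin = 1), 1)

region_node_pct
```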
Once this summary is worked out, the same analysis could then be run 1, 2, 3, etc. months in the past to see what the same information format provides w.r.t. a sort of predictive capacity. All of this can then be used to check other data products with a more focused lens in order to maximise the utility of the output.
Richaud, B., Kwon, Y.-O., Joyce, T. M., Fratantoni, P. S., and Lentz, S. J. (2016). Surface and bottom temperature and salinity climatology along the continental shelf off the Canadian and U.S. East Coasts. Continental Shelf Research 124, 165–181.
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 data.table_1.11.6 yasomi_0.3
[4] proxy_0.4-22 e1071_1.7-0 lubridate_1.7.4
[7] forcats_0.3.0 stringr_1.3.1 dplyr_0.7.6
[10] purrr_0.2.5 readr_1.1.1 tidyr_0.8.1
[13] tibble_1.4.2 ggplot2_3.0.0 tidyverse_1.2.1
[16] jsonlite_1.6
loaded via a namespace (and not attached):
[1] tidyselect_0.2.4 haven_1.1.2 lattice_0.20-35
[4] colorspace_1.3-2 htmltools_0.3.6 yaml_2.2.0
[7] rlang_0.2.2 R.oo_1.22.0 pillar_1.3.0
[10] glue_1.3.0 withr_2.1.2 R.utils_2.7.0
[13] doMC_1.3.5 modelr_0.1.2 readxl_1.1.0
[16] foreach_1.4.4 bindr_0.1.1 plyr_1.8.4
[19] munsell_0.5.0 gtable_0.2.0 workflowr_1.1.1
[22] cellranger_1.1.0 rvest_0.3.2 R.methodsS3_1.7.1
[25] codetools_0.2-15 evaluate_0.11 knitr_1.20
[28] parallel_3.6.1 class_7.3-14 broom_0.5.0
[31] Rcpp_0.12.18 backports_1.1.2 scales_1.0.0
[34] hms_0.4.2 digest_0.6.16 stringi_1.2.4
[37] grid_3.6.1 rprojroot_1.3-2 cli_1.0.0
[40] tools_3.6.1 maps_3.3.0 magrittr_1.5
[43] lazyeval_0.2.1 crayon_1.3.4 whisker_0.3-2
[46] pkgconfig_2.0.2 xml2_1.2.0 iterators_1.0.10
[49] assertthat_0.2.0 rmarkdown_1.10 httr_1.3.1
[52] rstudioapi_0.7 R6_2.2.2 nlme_3.1-137
[55] git2r_0.23.0 compiler_3.6.1
This reproducible R Markdown analysis was created with workflowr 1.1.1