Last updated: 2019-08-15

workflowr checks:
  • R Markdown file: up-to-date

    Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

  • Environment: empty

    Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

  • Seed: set.seed(20190513)

    The command set.seed(20190513) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

  • Session information: recorded

    Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

  • Repository version: 78f4977

    Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

    Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
    
    Ignored files:
        Ignored:    .Rhistory
        Ignored:    .Rproj.user/
        Ignored:    data/ALL_anom.Rda
        Ignored:    data/ALL_clim.Rda
        Ignored:    data/ERA5_lhf.Rda
        Ignored:    data/ERA5_lwr.Rda
        Ignored:    data/ERA5_qnet.Rda
        Ignored:    data/ERA5_qnet_anom.Rda
        Ignored:    data/ERA5_qnet_clim.Rda
        Ignored:    data/ERA5_shf.Rda
        Ignored:    data/ERA5_swr.Rda
        Ignored:    data/ERA5_t2m.Rda
        Ignored:    data/ERA5_t2m_anom.Rda
        Ignored:    data/ERA5_t2m_clim.Rda
        Ignored:    data/ERA5_u.Rda
        Ignored:    data/ERA5_u_anom.Rda
        Ignored:    data/ERA5_u_clim.Rda
        Ignored:    data/ERA5_v.Rda
        Ignored:    data/ERA5_v_anom.Rda
        Ignored:    data/ERA5_v_clim.Rda
        Ignored:    data/GLORYS_mld.Rda
        Ignored:    data/GLORYS_mld_anom.Rda
        Ignored:    data/GLORYS_mld_clim.Rda
        Ignored:    data/GLORYS_u.Rda
        Ignored:    data/GLORYS_u_anom.Rda
        Ignored:    data/GLORYS_u_clim.Rda
        Ignored:    data/GLORYS_v.Rda
        Ignored:    data/GLORYS_v_anom.Rda
        Ignored:    data/GLORYS_v_clim.Rda
        Ignored:    data/NAPA_clim_U.Rda
        Ignored:    data/NAPA_clim_V.Rda
        Ignored:    data/NAPA_clim_W.Rda
        Ignored:    data/NAPA_clim_emp_ice.Rda
        Ignored:    data/NAPA_clim_emp_oce.Rda
        Ignored:    data/NAPA_clim_fmmflx.Rda
        Ignored:    data/NAPA_clim_mldkz5.Rda
        Ignored:    data/NAPA_clim_mldr10_1.Rda
        Ignored:    data/NAPA_clim_qemp_oce.Rda
        Ignored:    data/NAPA_clim_qla_oce.Rda
        Ignored:    data/NAPA_clim_qns.Rda
        Ignored:    data/NAPA_clim_qsb_oce.Rda
        Ignored:    data/NAPA_clim_qt.Rda
        Ignored:    data/NAPA_clim_runoffs.Rda
        Ignored:    data/NAPA_clim_ssh.Rda
        Ignored:    data/NAPA_clim_sss.Rda
        Ignored:    data/NAPA_clim_sst.Rda
        Ignored:    data/NAPA_clim_taum.Rda
        Ignored:    data/NAPA_clim_vars.Rda
        Ignored:    data/NAPA_clim_vecs.Rda
        Ignored:    data/OAFlux.Rda
        Ignored:    data/OISST_sst.Rda
        Ignored:    data/OISST_sst_anom.Rda
        Ignored:    data/OISST_sst_clim.Rda
        Ignored:    data/node_mean_all_anom.Rda
        Ignored:    data/packet_all.Rda
        Ignored:    data/packet_all_anom.Rda
        Ignored:    data/packet_nolab.Rda
        Ignored:    data/packet_nolab14.Rda
        Ignored:    data/packet_nolabgsl.Rda
        Ignored:    data/packet_nolabmod.Rda
        Ignored:    data/som_all.Rda
        Ignored:    data/som_all_anom.Rda
        Ignored:    data/som_nolab.Rda
        Ignored:    data/som_nolab14.Rda
        Ignored:    data/som_nolab_16.Rda
        Ignored:    data/som_nolab_9.Rda
        Ignored:    data/som_nolabgsl.Rda
        Ignored:    data/som_nolabmod.Rda
        Ignored:    data/synoptic_states.Rda
        Ignored:    data/synoptic_vec_states.Rda
    
    Untracked files:
        Untracked:  output/SOM/no_ls/assets
        Untracked:  output/SOM/no_ls/no_ls
    
    Unstaged changes:
        Modified:   code/functions.R
        Modified:   docs/assets/no_ls/fig_2.pdf
        Modified:   docs/assets/no_ls/fig_3.pdf
        Modified:   docs/assets/no_ls/fig_4.pdf
        Modified:   docs/assets/no_ls/fig_5.pdf
        Modified:   docs/assets/no_ls/fig_6.pdf
        Modified:   docs/assets/no_ls/fig_7.pdf
        Modified:   docs/assets/no_ls/node_10_panels.pdf
        Modified:   docs/assets/no_ls/node_11_panels.pdf
        Modified:   docs/assets/no_ls/node_12_panels.pdf
        Modified:   docs/assets/no_ls/node_1_panels.pdf
        Modified:   docs/assets/no_ls/node_2_panels.pdf
        Modified:   docs/assets/no_ls/node_3_panels.pdf
        Modified:   docs/assets/no_ls/node_4_panels.pdf
        Modified:   docs/assets/no_ls/node_5_panels.pdf
        Modified:   docs/assets/no_ls/node_6_panels.pdf
        Modified:   docs/assets/no_ls/node_7_panels.pdf
        Modified:   docs/assets/no_ls/node_8_panels.pdf
        Modified:   docs/assets/no_ls/node_9_panels.pdf
        Modified:   docs/assets/no_ls_14/fig_2.pdf
        Modified:   docs/assets/no_ls_14/fig_3.pdf
        Modified:   docs/assets/no_ls_14/fig_4.pdf
        Modified:   docs/assets/no_ls_14/fig_5.pdf
        Modified:   docs/assets/no_ls_14/fig_6.pdf
        Modified:   docs/assets/no_ls_14/fig_7.pdf
        Modified:   docs/assets/no_ls_14/node_1_panels.pdf
        Modified:   docs/assets/no_ls_14/node_2_panels.pdf
        Modified:   docs/assets/no_ls_14/node_3_panels.pdf
        Modified:   docs/assets/no_ls_14/node_4_panels.pdf
        Modified:   docs/assets/no_ls_14/node_5_panels.pdf
        Modified:   docs/assets/no_ls_14/node_6_panels.pdf
        Modified:   docs/assets/no_ls_14/node_7_panels.pdf
        Modified:   docs/assets/no_ls_14/node_8_panels.pdf
        Modified:   docs/assets/no_ls_14/node_9_panels.pdf
        Modified:   docs/assets/no_ls_3x3/fig_2.pdf
        Modified:   docs/assets/no_ls_3x3/fig_3.pdf
        Modified:   docs/assets/no_ls_3x3/fig_4.pdf
        Modified:   docs/assets/no_ls_3x3/fig_5.pdf
        Modified:   docs/assets/no_ls_3x3/fig_6.pdf
        Modified:   docs/assets/no_ls_3x3/fig_7.pdf
        Modified:   docs/assets/no_ls_3x3/node_1_panels.pdf
        Modified:   docs/assets/no_ls_3x3/node_2_panels.pdf
        Modified:   docs/assets/no_ls_3x3/node_3_panels.pdf
        Modified:   docs/assets/no_ls_3x3/node_4_panels.pdf
        Modified:   docs/assets/no_ls_3x3/node_5_panels.pdf
        Modified:   docs/assets/no_ls_3x3/node_6_panels.pdf
        Modified:   docs/assets/no_ls_3x3/node_7_panels.pdf
        Modified:   docs/assets/no_ls_3x3/node_8_panels.pdf
        Modified:   docs/assets/no_ls_3x3/node_9_panels.pdf
        Modified:   output/SOM/all/fig_2.pdf
        Modified:   output/SOM/all/fig_3.pdf
        Modified:   output/SOM/all/fig_4.pdf
        Modified:   output/SOM/all/fig_5.pdf
        Modified:   output/SOM/all/fig_6.pdf
        Modified:   output/SOM/all/fig_7.pdf
        Modified:   output/SOM/all/node_10_panels.pdf
        Modified:   output/SOM/all/node_11_panels.pdf
        Modified:   output/SOM/all/node_12_panels.pdf
        Modified:   output/SOM/all/node_1_panels.pdf
        Modified:   output/SOM/all/node_2_panels.pdf
        Modified:   output/SOM/all/node_3_panels.pdf
        Modified:   output/SOM/all/node_4_panels.pdf
        Modified:   output/SOM/all/node_5_panels.pdf
        Modified:   output/SOM/all/node_6_panels.pdf
        Modified:   output/SOM/all/node_7_panels.pdf
        Modified:   output/SOM/all/node_8_panels.pdf
        Modified:   output/SOM/all/node_9_panels.pdf
        Modified:   output/SOM/no_ls/fig_2.pdf
        Modified:   output/SOM/no_ls/fig_3.pdf
        Modified:   output/SOM/no_ls/fig_4.pdf
        Modified:   output/SOM/no_ls/fig_5.pdf
        Modified:   output/SOM/no_ls/fig_6.pdf
        Modified:   output/SOM/no_ls/fig_7.pdf
        Modified:   output/SOM/no_ls/node_10_panels.pdf
        Modified:   output/SOM/no_ls/node_11_panels.pdf
        Modified:   output/SOM/no_ls/node_12_panels.pdf
        Modified:   output/SOM/no_ls/node_1_panels.pdf
        Modified:   output/SOM/no_ls/node_2_panels.pdf
        Modified:   output/SOM/no_ls/node_3_panels.pdf
        Modified:   output/SOM/no_ls/node_4_panels.pdf
        Modified:   output/SOM/no_ls/node_5_panels.pdf
        Modified:   output/SOM/no_ls/node_6_panels.pdf
        Modified:   output/SOM/no_ls/node_7_panels.pdf
        Modified:   output/SOM/no_ls/node_8_panels.pdf
        Modified:   output/SOM/no_ls/node_9_panels.pdf
        Modified:   output/SOM/no_ls_14/fig_2.pdf
        Modified:   output/SOM/no_ls_14/fig_3.pdf
        Modified:   output/SOM/no_ls_14/fig_4.pdf
        Modified:   output/SOM/no_ls_14/fig_5.pdf
        Modified:   output/SOM/no_ls_14/fig_6.pdf
        Modified:   output/SOM/no_ls_14/fig_7.pdf
        Modified:   output/SOM/no_ls_14/node_1_panels.pdf
        Modified:   output/SOM/no_ls_14/node_2_panels.pdf
        Modified:   output/SOM/no_ls_14/node_3_panels.pdf
        Modified:   output/SOM/no_ls_14/node_4_panels.pdf
        Modified:   output/SOM/no_ls_14/node_5_panels.pdf
        Modified:   output/SOM/no_ls_14/node_6_panels.pdf
        Modified:   output/SOM/no_ls_14/node_7_panels.pdf
        Modified:   output/SOM/no_ls_14/node_8_panels.pdf
        Modified:   output/SOM/no_ls_14/node_9_panels.pdf
        Modified:   output/SOM/no_ls_3x3/fig_2.pdf
        Modified:   output/SOM/no_ls_3x3/fig_3.pdf
        Modified:   output/SOM/no_ls_3x3/fig_4.pdf
        Modified:   output/SOM/no_ls_3x3/fig_5.pdf
        Modified:   output/SOM/no_ls_3x3/fig_6.pdf
        Modified:   output/SOM/no_ls_3x3/fig_7.pdf
        Modified:   output/SOM/no_ls_3x3/node_1_panels.pdf
        Modified:   output/SOM/no_ls_3x3/node_2_panels.pdf
        Modified:   output/SOM/no_ls_3x3/node_3_panels.pdf
        Modified:   output/SOM/no_ls_3x3/node_4_panels.pdf
        Modified:   output/SOM/no_ls_3x3/node_5_panels.pdf
        Modified:   output/SOM/no_ls_3x3/node_6_panels.pdf
        Modified:   output/SOM/no_ls_3x3/node_7_panels.pdf
        Modified:   output/SOM/no_ls_3x3/node_8_panels.pdf
        Modified:   output/SOM/no_ls_3x3/node_9_panels.pdf
        Modified:   output/SOM/no_ls_4x4/fig_2.pdf
        Modified:   output/SOM/no_ls_4x4/fig_3.pdf
        Modified:   output/SOM/no_ls_4x4/fig_4.pdf
        Modified:   output/SOM/no_ls_4x4/fig_5.pdf
        Modified:   output/SOM/no_ls_4x4/fig_6.pdf
        Modified:   output/SOM/no_ls_4x4/fig_7.pdf
        Modified:   output/SOM/no_ls_4x4/node_10_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_11_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_12_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_13_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_14_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_15_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_16_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_1_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_2_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_3_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_4_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_5_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_6_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_7_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_8_panels.pdf
        Modified:   output/SOM/no_ls_4x4/node_9_panels.pdf
        Modified:   output/SOM/no_ls_gsl/fig_2.pdf
        Modified:   output/SOM/no_ls_gsl/fig_3.pdf
        Modified:   output/SOM/no_ls_gsl/fig_4.pdf
        Modified:   output/SOM/no_ls_gsl/fig_5.pdf
        Modified:   output/SOM/no_ls_gsl/fig_6.pdf
        Modified:   output/SOM/no_ls_gsl/fig_7.pdf
        Modified:   output/SOM/no_ls_gsl/node_10_panels.pdf
        Modified:   output/SOM/no_ls_gsl/node_11_panels.pdf
        Modified:   output/SOM/no_ls_gsl/node_12_panels.pdf
        Modified:   output/SOM/no_ls_gsl/node_1_panels.pdf
        Modified:   output/SOM/no_ls_gsl/node_2_panels.pdf
        Modified:   output/SOM/no_ls_gsl/node_3_panels.pdf
        Modified:   output/SOM/no_ls_gsl/node_4_panels.pdf
        Modified:   output/SOM/no_ls_gsl/node_5_panels.pdf
        Modified:   output/SOM/no_ls_gsl/node_6_panels.pdf
        Modified:   output/SOM/no_ls_gsl/node_7_panels.pdf
        Modified:   output/SOM/no_ls_gsl/node_8_panels.pdf
        Modified:   output/SOM/no_ls_gsl/node_9_panels.pdf
        Modified:   output/SOM/no_ls_mod/fig_2.pdf
        Modified:   output/SOM/no_ls_mod/fig_3.pdf
        Modified:   output/SOM/no_ls_mod/fig_4.pdf
        Modified:   output/SOM/no_ls_mod/fig_5.pdf
        Modified:   output/SOM/no_ls_mod/fig_6.pdf
        Modified:   output/SOM/no_ls_mod/fig_7.pdf
        Modified:   output/SOM/no_ls_mod/node_1_panels.pdf
        Modified:   output/SOM/no_ls_mod/node_2_panels.pdf
        Modified:   output/SOM/no_ls_mod/node_3_panels.pdf
        Modified:   output/SOM/no_ls_mod/node_4_panels.pdf
    
    
    Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
Past versions:
    File Version Author Date Message
    Rmd 78f4977 robwschlegel 2019-08-15 Re-publish entire site.
    Rmd 07fe2a2 robwschlegel 2019-08-14 Nearly through the node summaries for the three SOM experiments
    Rmd a61b420 robwschlegel 2019-08-13 Working on SOM write-up
    html 20ae166 robwschlegel 2019-08-11 Build site.
    html 19bea26 robwschlegel 2019-08-11 Build site.
    html 2652a3a robwschlegel 2019-08-11 Build site.
    Rmd adc762b robwschlegel 2019-08-08 Re-worked the GLORYS data and propogated update through to SOM analysis figures for all experiments
    html f0d2efb robwschlegel 2019-08-07 Build site.
    Rmd ed626bf robwschlegel 2019-08-07 Ran a bunch of figures and had a meeting with Eric. More changes coming to GLORYS data tomorrow before settling on one of the experimental SOMs
    html f66aa38 robwschlegel 2019-08-01 Build site.
    Rmd 5e12d9e robwschlegel 2019-08-01 Re-publish entire site.
    Rmd 9a9fa7d robwschlegel 2019-08-01 A more in depth dive into the potential criteria to meet for the SOM model
    Rmd 240a7a0 robwschlegel 2019-07-31 Ran the base SOM results
    html aa82e6e robwschlegel 2019-07-31 Build site.
    Rmd 498909b robwschlegel 2019-07-31 Re-publish entire site.
    html 35987b4 robwschlegel 2019-07-09 Build site.
    Rmd 34efa43 robwschlegel 2019-07-09 Added some thinking to the SOM vignette.
    html e2f6f42 robwschlegel 2019-07-09 Build site.
    Rmd 609cca8 robwschlegel 2019-07-09 Added some thinking to the SOM vignette.
    html 81e961d robwschlegel 2019-07-09 Build site.
    Rmd 7ff9b8b robwschlegel 2019-06-17 More work on the talk
    Rmd b25762e robwschlegel 2019-06-12 More work on figures
    Rmd 413bb8b robwschlegel 2019-06-12 Working on pixel interpolation
    html c23c50b robwschlegel 2019-06-10 Build site.
    html 028d3cc robwschlegel 2019-06-10 Build site.
    Rmd c6b3c7b robwschlegel 2019-06-10 Re-publish entire site.
    Rmd 1b53eeb robwschlegel 2019-06-10 SOM packet pipeline testing
    Rmd 4504e12 robwschlegel 2019-06-07 Working on joining in vector data
    html c61a15f robwschlegel 2019-06-06 Build site.
    Rmd 44ac335 robwschlegel 2019-06-06 Working on inclusion of vectors into SOM pipeline
    html 6dd6da8 robwschlegel 2019-06-06 Build site.
    Rmd 07137d9 robwschlegel 2019-06-06 Site wide update, including newly functioning SOM pipeline.
    Rmd 990693a robwschlegel 2019-06-05 First SOM result visuals
    Rmd 25e7e9a robwschlegel 2019-06-05 SOM pipeline nearly finished
    Rmd 4838cc8 robwschlegel 2019-06-04 Working on SOM functions
    Rmd 94ce8f6 robwschlegel 2019-06-04 Functions for creating data packets are up and running
    Rmd 65301ed robwschlegel 2019-05-30 Push before getting rid of some testing structure
    html c09b4f7 robwschlegel 2019-05-24 Build site.
    Rmd 5dc8bd9 robwschlegel 2019-05-24 Finished initial creation of SST prep vignette.
    html a29be6b robwschlegel 2019-05-13 Build site.
    html ea61999 robwschlegel 2019-05-13 Build site.
    Rmd f8f28b1 robwschlegel 2019-05-13 Skeleton files

Introduction

This vignette contains the code used to perform the self-organising map (SOM) analysis on the mean synoptic states created in the Variable preparation vignette. We start by creating custom data packets that meet certain experimental criteria and then feed them into a SOM. We finish by creating some cursory visuals of the results. The full summary of the results may be seen in the Node summary vignette.

# Install from GitHub
# .libPaths(c("~/R-packages", .libPaths()))
# devtools::install_github("fabrice-rossi/yasomi")

# Packages used in this vignette
library(jsonlite, lib.loc = "../R-packages/")
library(tidyverse) # Base suite of functions
library(lubridate) # For convenient date manipulation
library(yasomi, lib.loc = "../R-packages/") # The SOM package of choice because it supports PCA-based initialisation (PCI)
library(data.table) # For working with massive dataframes

# Set number of cores
doMC::registerDoMC(cores = 50)

# Disable scientific notation for numeric values
  # I just find it annoying
options(scipen = 999)

# Individual regions
NWA_coords <- readRDS("data/NWA_coords_cabot.Rda")

# Corners of the study area
NWA_corners <- readRDS("data/NWA_corners.Rda")

# The base map
map_base <- ggplot2::fortify(maps::map(fill = TRUE, col = "grey80", plot = FALSE)) %>%
  dplyr::rename(lon = long) %>%
  mutate(group = ifelse(lon > 180, group+9999, group),
         lon = ifelse(lon > 180, lon-360, lon)) %>% 
  select(-region, -subregion)

# MHW results
OISST_region_MHW <- readRDS("data/OISST_region_MHW.Rda")

# MHW Events
OISST_MHW_event <- OISST_region_MHW %>%
  select(-cats) %>%
  unnest(events) %>%
  filter(row_number() %% 2 == 0) %>%
  unnest(events)

# MHW Categories
suppressWarnings( # Don't need warning about different names for events
OISST_MHW_cats <- OISST_region_MHW %>%
  select(-events) %>%
  unnest(cats) 
)

Tailored data packets

In this last stage before running our SOM analyses we create data packets that can be fed directly into the SOM algorithm. The packets differ in which regions of the study area they exclude. In the first run of this analysis on the NAPA model data, including the Labrador Sea complicated the results quite a bit. It is also unclear whether the Gulf of St Lawrence region should be included in the analysis. While building each packet we also convert it into the super-wide matrix format that the SOM model requires: one row per event and one column per pixel and variable combination.

Unnest synoptic state packets

Up first we must simply load and unnest the synoptic state packets made previously.

# Load the synoptic states data packet
system.time(
synoptic_states <- readRDS("data/synoptic_states.Rda")
) # 3 seconds

# Unnest the synoptic data
system.time(
synoptic_states_unnest <- synoptic_states %>% 
  select(region, event_no, synoptic) %>% 
  unnest(synoptic) # Name the list-column explicitly; newer tidyr versions warn otherwise
) # 8 seconds
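
For readers unfamiliar with list-columns, the unnesting step can be pictured with a toy example. The sketch below uses base R only so that it runs without any packages. `toy_states` and every value in it are made up for illustration; it is not the real `synoptic_states` object, only an object of a similar shape.

```r
# Toy stand-in for synoptic_states: one row per event, with a list-column
# holding a small data frame of pixels for that event (all values are made up)
toy_states <- data.frame(region = c("gsl", "ls"), event_no = c(1, 1),
                         stringsAsFactors = FALSE)
toy_states$synoptic <- list(
  data.frame(lon = c(-60.5, -60.0), lat = c(48.0, 48.0), sst_anom = c(1.2, 0.8)),
  data.frame(lon = c(-55.0, -54.5), lat = c(55.0, 55.0), sst_anom = c(2.1, 1.9))
)

# Base R equivalent of unnesting: repeat the identifying columns for every
# row of the nested data frame, then stack the pieces on top of each other
toy_unnest <- do.call(rbind, lapply(seq_len(nrow(toy_states)), function(i) {
  data.frame(region = toy_states$region[i],
             event_no = toy_states$event_no[i],
             toy_states$synoptic[[i]],
             stringsAsFactors = FALSE)
}))

print(toy_unnest)
```

Each event contributes as many rows as it has pixels, which is why the real unnested object is so much larger than the nested one.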

Custom packets

With all of our data ready we may now trim them as we see fit before saving them for the SOM.

# The study area size when the Labrador region is excluded
NWA_coords_nolab <- NWA_coords %>% 
  filter(region != "ls")

# The study area size when the Labrador and GSL regions are excluded
NWA_coords_nolabgsl <- NWA_coords %>% 
  filter(!region %in% c("ls", "gsl"))

# Test visuals of reduced study areas
# synoptic_states[1,] %>% 
#   unnest() %>% 
#   filter(lat <= round(max(NWA_coords_nolabgsl$lat))+0.5) %>% 
#   ggplot(aes(x = lon, y = lat)) +
#   geom_raster(aes(fill = sst_anom)) +
#   geom_polygon(data = NWA_coords_nolabgsl, aes(colour = region), fill = NA)

# Function for casting wide the custom packets
wide_packet_func <- function(df){
  
  # Cast the data to a single row
  res <- data.table::data.table(df) %>% 
    select(-t) %>% 
    reshape2::melt(id = c("region", "event_no", "lon", "lat"),
                   measure = c(colnames(.)[-c(1:4)]), 
                   variable.name = "var", value.name = "val") %>% 
    dplyr::arrange(var, lon, lat) %>%
    unite(coords, c(lon, lat, var), sep = "BBB") %>%
    unite(event_ID, c(region, event_no), sep = "BBB") %>%
    reshape2::dcast(event_ID ~ coords, value.var = "val")
  
  # Remove columns (pixels) with missing data
  res_fix <- res[,colSums(is.na(res))<1]
  
  # Remove columns (pixels) with no variance
  # This may occur in pixels where there is no variance in MLD anomaly
  no_var <- data.frame(min = sapply(res_fix[,-1], min),
                       max = sapply(res_fix[,-1], max)) %>% 
    mutate(col_name = row.names(.)) %>% 
    filter(min == max)
  res_filter <- res_fix[,!(colnames(res_fix) %in% no_var$col_name)]
  
  # Exit
  return(res_filter)
}

# Packet for entire study region
system.time(
  packet_all <- wide_packet_func(synoptic_states_unnest)
) # 182 seconds
saveRDS(packet_all, "data/packet_all.Rda")

# Exclude Labrador region
system.time(
  packet_nolab <- synoptic_states_unnest %>% 
    filter(region != "ls",
           lat <= round(max(NWA_coords_nolab$lat))+0.5) %>% 
    wide_packet_func()
) # 103 seconds
saveRDS(packet_nolab, "data/packet_nolab.Rda")

# Exclude Labrador and Gulf of St Lawrence regions
system.time(
  packet_nolabgsl <- synoptic_states_unnest %>% 
    filter(!region %in% c("ls", "gsl"),
           lat <= round(max(NWA_coords_nolabgsl$lat))+0.5) %>% 
    wide_packet_func()
) # 82 seconds
saveRDS(packet_nolabgsl, "data/packet_nolabgsl.Rda")

# Exclude Labrador region and moderate events
system.time(
  packet_nolabmod <- synoptic_states_unnest %>% 
    filter(region != "ls",
           lat <= round(max(NWA_coords_nolab$lat))+0.5) %>% 
    left_join(select(OISST_MHW_cats, region, event_no, category), by = c("region", "event_no")) %>% 
    filter(category != "I Moderate") %>% 
    select(-category) %>% 
    wide_packet_func()
) # 14 seconds
saveRDS(packet_nolabmod, "data/packet_nolabmod.Rda")

# Exclude Labrador region and events shorter than 14 days
system.time(
  packet_nolab14 <- synoptic_states_unnest %>% 
    filter(region != "ls",
           lat <= round(max(NWA_coords_nolab$lat))+0.5) %>% 
    left_join(select(OISST_MHW_cats, region, event_no, duration), by = c("region", "event_no")) %>% 
    filter(duration >= 14) %>% 
    select(-duration) %>% 
    wide_packet_func()
) # 40 seconds
saveRDS(packet_nolab14, "data/packet_nolab14.Rda")

# A packet for events with a cumulative intensity of at least 100
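
The two column filters inside `wide_packet_func()` matter because the SOM function later passes the packet through `scale()`. A column with no variance has a standard deviation of zero, so scaling it produces `NaN`. The following is a minimal base R sketch of that behaviour and of the same two filters. `toy_wide` and its values are made up; only the `BBB` naming convention is borrowed from the function above.

```r
# Toy wide packet: one row per event, one column per pixel-variable pair
# (column names mimic the "lonBBBlatBBBvar" convention; values are made up)
toy_wide <- data.frame(
  event_ID = c("gslBBB1", "gslBBB2", "lsBBB1"),
  `-60BBB45BBBsst_anom` = c(1.0, 2.0, 3.0),
  `-60BBB45BBBmld_anom` = c(0.0, 0.0, 0.0),   # no variance
  `-59.5BBB45BBBsst_anom` = c(0.5, NA, 1.5),  # missing value
  check.names = FALSE
)

# Scaling a constant column divides zero by a standard deviation of zero
scaled_raw <- scale(toy_wide[, -1])
print(colSums(is.nan(scaled_raw)))

# The same two filters as wide_packet_func(), written with base R only
keep_complete <- colSums(is.na(toy_wide)) < 1
toy_fix <- toy_wide[, keep_complete]
is_constant <- sapply(toy_fix[, -1, drop = FALSE], function(x) min(x) == max(x))
toy_clean <- toy_fix[, c("event_ID", names(is_constant)[!is_constant]), drop = FALSE]

print(colnames(toy_clean))
```

Only the complete, varying column survives alongside `event_ID`, which is the shape the SOM function expects.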

Run SOM models

Now that we have our data packets to feed the SOM, we need a function that will ingest them and produce results for us. The function below has been greatly expanded from the previous version of this project and now performs all of the SOM-related work in one go. This allowed me to remove a couple of hundred lines of code and text from this vignette.

# Function for calculating SOMs using principal component initialisation (PCI)
# This also outputs the mean values for each SOM node
som_model_PCI <- function(data_packet, xdim = 4, ydim = 3){
  # Create a scaled matrix for the SOM
  # Drop the first column as this is the reference ID of the event per row
  data_packet_matrix <- as.matrix(scale(data_packet[,-1]))

  # Create the grid that the SOM will use to determine the number of nodes
  som_grid <- somgrid(xdim = xdim, ydim = ydim, topo = "hexagonal")

  # Run the SOM with PCI
  som_model <- batchsom(data_packet_matrix,
                        somgrid = som_grid,
                        init = "pca",
                        max.iter = 100)
  
  # Create a data.frame of info
  node_info <- data.frame(event_ID = data_packet[,"event_ID"],
                           node = som_model$classif) %>% 
    separate(event_ID, into = c("region", "event_no"), sep = "BBB") %>% 
    group_by(node) %>% 
    mutate(count = n()) %>% 
    ungroup() %>% 
    mutate(event_no = as.numeric(as.character(event_no))) %>%
    left_join(select(OISST_MHW_cats, region, event_no, category, peak_date), 
              by = c("region", "event_no")) %>% 
    mutate(month_peak = lubridate::month(peak_date, label = TRUE),
           season_peak = case_when(month_peak %in% c("Jan", "Feb", "Mar") ~ "Winter",
                                   month_peak %in% c("Apr", "May", "Jun") ~ "Spring",
                                   month_peak %in% c("Jul", "Aug", "Sep") ~ "Summer",
                                   month_peak %in% c("Oct", "Nov", "Dec") ~ "Autumn")) %>% 
    select(-peak_date, -month_peak)
  
  # Determine which event goes in which node and melt
  data_packet_long <- cbind(node = som_model$classif, data_packet) %>% 
    separate(event_ID, into = c("region", "event_no"), sep = "BBB") %>% 
    data.table() %>% 
    reshape2::melt(id = c("node", "region", "event_no"),
                   measure = c(colnames(.)[-c(1:3)]), 
                   variable.name = "variable", value.name = "value")
  
  # Create the mean values that serve as the unscaled results from the SOM
  node_data <- data_packet_long[, .(val = mean(value, na.rm = TRUE)),
                                   by = .(node, variable)] %>% 
    separate(variable, into = c("lon", "lat", "var"), sep = "BBB") %>%
    dplyr::arrange(node, var, lon, lat) %>% 
    mutate(lon = as.numeric(lon),
           lat = as.numeric(lat),
           val = round(val, 4))
  
  ## ANOSIM for goodness of fit for node count
  node_data_wide <- node_data %>%
    unite(coords, c(lon, lat, var), sep = "BBB") %>% 
    data.table() %>% 
    dcast(node~coords, value.var = "val")

  # Calculate similarity
  som_anosim <- vegan::anosim(as.matrix(node_data_wide[,-1]), 
                              node_data_wide$node, distance = "euclidean")$signif
  
  # Combine and exit
  res <- list(data = node_data, info = node_info, ANOSIM = paste0("p = ",round(som_anosim, 4)))
  return(res)
}
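
The `data` element returned by the function holds the mean of the unscaled values within each node. As a rough illustration of that averaging step only, here is a base R sketch on a toy packet. `toy_packet`, `toy_classif` (standing in for `som_model$classif`) and all values are made up, and `aggregate()` is swapped in for the data.table grouping used above.

```r
# Toy packet: four events (rows) and two pixel columns, unscaled values
toy_packet <- data.frame(
  event_ID = c("gslBBB1", "gslBBB2", "lsBBB1", "lsBBB2"),
  pix_a = c(1, 3, 10, 20),
  pix_b = c(0, 2, -4, -6)
)

# Hypothetical node assignment, standing in for som_model$classif
toy_classif <- c(1, 1, 2, 2)

# Mean of every pixel column within each node
toy_node_means <- aggregate(toy_packet[, -1], by = list(node = toy_classif), FUN = mean)
print(toy_node_means)
```

The SOM itself is fitted on scaled data, but averaging the original values per node keeps the node summaries in physical units.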

With the function sorted, we now feed it the data packets.

# The SOM on the entire study area
packet_all <- readRDS("data/packet_all.Rda")
system.time(som_all <- som_model_PCI(packet_all)) # 132 seconds
# som_all$ANOSIM # p = 0.001
saveRDS(som_all, file = "data/som_all.Rda")

# The SOM excluding the Labrador Sea region
packet_nolab <- readRDS("data/packet_nolab.Rda")
system.time(som_nolab <- som_model_PCI(packet_nolab)) # 63 seconds
# som_nolab$ANOSIM # p = 0.001
saveRDS(som_nolab, file = "data/som_nolab.Rda")

# The SOM excluding the Labrador Sea and Gulf of St Lawrence regions
packet_nolabgsl <- readRDS("data/packet_nolabgsl.Rda")
system.time(som_nolabgsl <- som_model_PCI(packet_nolabgsl)) # 52 seconds
# som_nolabgsl$ANOSIM # p = 0.001
saveRDS(som_nolabgsl, file = "data/som_nolabgsl.Rda")

# We see below that the results are crisper when we leave the Gulf of St Lawrence in,
# so we will proceed with the rest of the experiments only excluding the Labrador Shelf

# A 9 node SOM
system.time(som_nolab_9 <- som_model_PCI(packet_nolab, xdim = 3,  ydim = 3)) # 53 seconds
# som_nolab_9$ANOSIM # p = 0.001
saveRDS(som_nolab_9, file = "data/som_nolab_9.Rda")

# The 9 node results are perhaps easier to make sense of than 12 nodes, but it's not certain

# A 16 node SOM
system.time(som_nolab_16 <- som_model_PCI(packet_nolab, xdim = 4, ydim = 4)) # 86 seconds
# som_nolab_16$ANOSIM # p = 0.001
saveRDS(som_nolab_16, file = "data/som_nolab_16.Rda")

# 16 nodes seems unnecessary...

# A SOM without moderate events
packet_nolabmod <- readRDS("data/packet_nolabmod.Rda")
system.time(som_nolabmod <- som_model_PCI(packet_nolabmod, xdim = 2, ydim = 2)) # 12 seconds
# som_nolabmod$ANOSIM # p = 0.0417
saveRDS(som_nolabmod, file = "data/som_nolabmod.Rda")

# There are fewer than 40 category "II Strong" and larger MHWs, so using more than 4 nodes wouldn't be appropriate
# These results are definitely too sparse to use for a publication

# A SOM without events shorter than 14 days
packet_nolab14 <- readRDS("data/packet_nolab14.Rda")
system.time(som_nolab14 <- som_model_PCI(packet_nolab14, xdim = 3, ydim = 3)) # 12 seconds
# som_nolab14$ANOSIM # p = 0.001
saveRDS(som_nolab14, file = "data/som_nolab14.Rda")

As simple as that, we now have a range of results from our SOM experiments. Up next, in the Node summary vignette, we will show the results with a range of visuals.
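
The choice of grid size in these experiments is partly an arithmetic constraint: more nodes means fewer events per node. The sketch below makes that explicit for the packet without moderate events. It assumes 40 events, which is the upper bound given in the code comment rather than the exact count, and it assumes events spread evenly across nodes, which a real SOM will not do.

```r
# Upper bound on events in the no-moderate packet, taken from the code comment
n_events <- 40

# Grid sizes tried in this vignette
grids <- data.frame(xdim = c(2, 3, 4, 4), ydim = c(2, 3, 3, 4))
grids$nodes <- grids$xdim * grids$ydim

# Average events per node if events were spread evenly (an idealisation)
grids$events_per_node <- round(n_events / grids$nodes, 1)
print(grids)
```

With anything larger than a 2 x 2 grid the average falls below five events per node, which is consistent with the decision to use only four nodes for that packet.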

Musings

Possible mechanisms

“Finally, Shearman and Lentz (2010) showed that century-long ocean warming trends observed along the entire northeast U.S. coast are not related to local atmospheric forcing but driven by atmospheric warming of source waters in the Labrador Sea and the Arctic, which are advected into the region.” (Richaud et al., 2016)

Downwelling

Net heat flux (OAFlux) doesn’t line up perfectly with the seasonal SST signal, but it is very close, with heat flux tending to lead SST by 2–3 months (Richaud et al., 2016). It is therefore likely one of the primary drivers of SST and should be strongly considered when constructing SOMs.
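
One way to check a lead of this kind in our own data would be a lagged cross-correlation. The sketch below shows the idea on purely synthetic series, not on OAFlux or SST data. The names `qnet` and `sst` and the 3-month delay are assumptions built into the toy example.

```r
# Synthetic monthly series, ten years long (not real OAFlux or SST data)
month <- 1:120
qnet <- sin(2 * pi * month / 12)        # idealised seasonal heat flux
sst <- sin(2 * pi * (month - 3) / 12)   # same cycle, delayed by 3 months

# Cross-correlation at lags of -6 to +6 months
# ccf(x, y) at lag k estimates the correlation between x[t + k] and y[t]
cc <- ccf(qnet, sst, lag.max = 6, plot = FALSE)
best_lag <- cc$lag[which.max(cc$acf)]

# A negative lag means the first series (heat flux) leads the second (SST)
print(best_lag)
```

On real data the seasonal cycle would dominate in the same way, so the same check on anomalies would be the more informative test.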

There is almost no seasonal cycle for slope waters in any of the regions (Richaud et al., 2016).

More ideas

It would be interesting to see if the SOM outputs differ in any meaningful ways when only data from the first half of the study time period are used compared against the second half.

The output of the SOMs could likely be conveyed more meaningfully from the point of view of the regions. That is, take the summary of the nodes, convert it into a table, and then use that table to build a set of summaries focused on each region. Some sort of interactive visual may be useful for this. Showing the percentage that each region has in each node would be a good start. This would allow for a more meaningful explanation of which drivers affect which regions during which seasons and over which years.
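
As a starting point for the region-by-node percentages, something like the following base R sketch could work. It reads "the percentage that each region has in each node" as the share of a region's events that fall in each node, which is one possible interpretation. `toy_info` is a made-up stand-in for the `info` element of a SOM result.

```r
# Toy stand-in for the `info` element of a SOM result (values are made up)
toy_info <- data.frame(
  region = c("gsl", "gsl", "gsl", "ls", "ls", "ls", "ls"),
  node = c(1, 1, 2, 1, 2, 2, 2)
)

# Count events per region and node, then convert to row percentages so that
# each region sums to 100
region_node_n <- table(toy_info$region, toy_info$node, dnn = c("region", "node"))
region_node_pct <- round(100 * prop.table(region_node_n, margin = 1), 1)
print(region_node_pct)
```

Using `margin = 2` instead would give the share of each node made up by each region, which answers a slightly different question.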

Once this summary is worked out, the same analysis could then be run at lags of 1, 2, 3, etc. months into the past to see what the same information format provides in terms of predictive capacity. All of this can then be used to check other data products with a more focused lens in order to maximise the utility of the output.

References

Richaud, B., Kwon, Y.-O., Joyce, T. M., Fratantoni, P. S., and Lentz, S. J. (2016). Surface and bottom temperature and salinity climatology along the continental shelf off the Canadian and U.S. East Coasts. Continental Shelf Research 124, 165–181.

Session information

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2    data.table_1.11.6 yasomi_0.3       
 [4] proxy_0.4-22      e1071_1.7-0       lubridate_1.7.4  
 [7] forcats_0.3.0     stringr_1.3.1     dplyr_0.7.6      
[10] purrr_0.2.5       readr_1.1.1       tidyr_0.8.1      
[13] tibble_1.4.2      ggplot2_3.0.0     tidyverse_1.2.1  
[16] jsonlite_1.6     

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.4  haven_1.1.2       lattice_0.20-35  
 [4] colorspace_1.3-2  htmltools_0.3.6   yaml_2.2.0       
 [7] rlang_0.2.2       R.oo_1.22.0       pillar_1.3.0     
[10] glue_1.3.0        withr_2.1.2       R.utils_2.7.0    
[13] doMC_1.3.5        modelr_0.1.2      readxl_1.1.0     
[16] foreach_1.4.4     bindr_0.1.1       plyr_1.8.4       
[19] munsell_0.5.0     gtable_0.2.0      workflowr_1.1.1  
[22] cellranger_1.1.0  rvest_0.3.2       R.methodsS3_1.7.1
[25] codetools_0.2-15  evaluate_0.11     knitr_1.20       
[28] parallel_3.6.1    class_7.3-14      broom_0.5.0      
[31] Rcpp_0.12.18      backports_1.1.2   scales_1.0.0     
[34] hms_0.4.2         digest_0.6.16     stringi_1.2.4    
[37] grid_3.6.1        rprojroot_1.3-2   cli_1.0.0        
[40] tools_3.6.1       maps_3.3.0        magrittr_1.5     
[43] lazyeval_0.2.1    crayon_1.3.4      whisker_0.3-2    
[46] pkgconfig_2.0.2   xml2_1.2.0        iterators_1.0.10 
[49] assertthat_0.2.0  rmarkdown_1.10    httr_1.3.1       
[52] rstudioapi_0.7    R6_2.2.2          nlme_3.1-137     
[55] git2r_0.23.0      compiler_3.6.1   

This reproducible R Markdown analysis was created with workflowr 1.1.1