Last updated: 2024-04-17

Checks: 7 0

Knit directory: bgc_argo_r_argodata/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20211008)

The command set.seed(20211008) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 550ab72

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 550ab72. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/

Unstaged changes:
    Modified:   analysis/_site.yml
    Modified:   analysis/coverage_maps_North_Atlantic.Rmd
    Modified:   analysis/load_broullon_DIC_TA_clim.Rmd
    Modified:   analysis/temp_core_SO_cluster_analysis.Rmd
    Modified:   code/Workflowr_project_managment.R
    Modified:   code/start_background_job.R
    Modified:   code/start_background_job_load.R
    Modified:   code/start_background_job_partial.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/MHWs_categorisation.Rmd) and HTML (docs/MHWs_categorisation.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	550ab72	mlarriere	2024-04-17	Cluster analysis test
html	4a8157c	mlarriere	2024-04-16	Build site.
Rmd	5acdc98	mlarriere	2024-04-16	Some explanations on categorisation and beginning of cluster analysis
html	ad0d737	mlarriere	2024-04-16	Build site.
Rmd	f22c4b5	mlarriere	2024-04-16	Some explanations on categorisation and beginning of cluster analysis
Rmd	2c60853	mlarriere	2024-04-15	Some explanations on categorisation and beginning of cluster analysis
html	9910de2	mlarriere	2024-04-14	Build site.
html	50596ee	mlarriere	2024-04-14	Build site.
Rmd	ac8de8b	mlarriere	2024-04-14	Figures MHWs
html	718d84b	mlarriere	2024-04-14	Build site.
Rmd	a2e118c	mlarriere	2024-04-14	Figures MHWs
html	c789e3f	mlarriere	2024-04-13	Build site.
Rmd	acdc838	mlarriere	2024-04-13	Figures MHWs
html	bbbc1bb	mlarriere	2024-04-13	Build site.
Rmd	b080fc9	mlarriere	2024-04-13	Figures MHWs
html	83d1fa3	mlarriere	2024-04-13	Build site.
Rmd	c8b3f67	mlarriere	2024-04-13	Figures MHWs
html	1ab66f1	mlarriere	2024-04-13	Build site.
Rmd	25928b0	mlarriere	2024-04-13	Figures MHWs
html	5475def	mlarriere	2024-04-13	Build site.
Rmd	e2b710d	mlarriere	2024-04-13	build North Atlanatic section
html	c076fba	mlarriere	2024-04-12	Build site.
Rmd	a52cee0	mlarriere	2024-04-08	adding subsection MHWs categorisation
html	efc6aab	mlarriere	2024-04-08	Build site.
Rmd	a9ea2a8	mlarriere	2024-04-08	test MHWs cat

Task

Focusing on 2023 Only core Argo - focus on temperature anomalies

Dependencies

temp_core_va.rds - temperature of core argo floats after vertical alignment.

core_fileid.rds - not used yet

temp_anomaly_va.rds - file containing the temperature anomalies (temp core - climatology).

2023_mhw_raw.csv - CSV file containing the categorization of surface marine heatwaves, in 2023 and in a 0.25°x0.25° grid.

Outputs

2023_surface_mhws_1x1.rds - file containing the categorization of surface marine heatwaves in a 1°x1° grid, in 2023.

path_emlr_utilities <- "/nfs/kryo/work/jenmueller/emlr_cant/utilities/files/"
path_basin_mask <- "/nfs/kryo/work/datasets/gridded/ocean/interior/reccap2/supplementary/"
path_argo <- '/nfs/kryo/work/datasets/ungridded/3d/ocean/floats/bgc_argo'
path_argo_core <- '/nfs/kryo/work/datasets/ungridded/3d/ocean/floats/core_argo_r_argodata_2024-03-13'

path_argo_core_preprocessed <- paste0(path_argo_core, "/preprocessed_core_data")
path_argo_preprocessed <- paste0(path_argo, "/preprocessed_bgc_data")

path_mhw<- '/net/kryo/work/datasets/gridded/ocean/2d/obs/mhw'

Load data

Load biomes

Load Argo floats data

MHWs category

Upscaling Surface Marine Heat Waves

#Aggregate the dataset into 1°x1° grid
argo_grid<-sst_natlantic_2023 %>% 
  ungroup() %>% 
  select(lat, lon)
  
mhw_1x1_natlantic_2023 <- mhw_raw_2023_north_atlantic %>% #----------->long process (few min)
  mutate(lat_upscale = floor(lat) + 0.5, #rounding to the lower nearest value and add an offset of 0.5 to match with Argo data grid
         lon_upscale = floor(lon) + 0.5,
         row_id = row_number() #row identifier - to facilitate comparison with original dataset
         ) %>%  
  group_by(lat = lat_upscale, #group the data by lon, lat and time
           lon = lon_upscale, 
           time) %>% 
 summarise(avg_intensity = mean(intensity),  # average intensity
           most_freq_category = names(sort(table(category), decreasing = TRUE))[1],  #Most frequent category
           row_id = first(row_id)
           ) %>%
  ungroup() %>% 
  arrange(row_id)

test<- mhw_raw_2023_north_atlantic %>% filter(time=="2023-01-01", lat<1, lat>0, lon<1, lon>0)
print(mean(test$intensity))
test1<- mhw_1x1_natlantic_2023 %>% filter(time=="2023-01-01", lat<1, lat>0, lon<1, lon>0)
print(test1)

#Adding biomes value to the surface MHWs
mhw_1x1_natlantic_2023_biomes<-left_join(mhw_1x1_natlantic_2023, biomes_subset, by=c("lat", "lon")) %>% 
  filter(!is.na(biome_value)) 

mhw_1x1_natlantic_2023_biomes$month <- format( as.Date(mhw_1x1_natlantic_2023_biomes$time), "%m")#adding month attribute

#Write upscaled MHWs dataset (1°x1°)
mhw_1x1_natlantic_2023_biomes %>% write_rds(file = paste0(path_argo_core_preprocessed,"/", "2023_surface_mhws_1x1.rds"))

Surface intensity maps

#Read upscale data
mhw_1x1_natlantic_2023_biomes <- read_rds(file = paste0(path_argo_core_preprocessed,"/", "2023_surface_mhws_1x1.rds"))

# original_plot <- base_map +
#   geom_point(data=mhw_raw_2023_north_atlantic, aes(x = lon, y = lat, color = intensity), size=0.1) +
#   scale_color_gradient(low = "blue", high = "red") +
#   labs(title = "MHW intensity - Original Data") +
#   theme_minimal()
# print(original_plot)

mhw_biomes_plot <- base_map +
  geom_point(data= mhw_1x1_natlantic_2023_biomes, aes(x = lon, y = lat, color = avg_intensity)) +
  scale_color_gradient(low = "blue", high = "red") +
  labs(title = "MHW intensity - Upscaled Data") +
  theme_minimal()
print(mhw_biomes_plot)

Version	Author	Date
718d84b	mlarriere	2024-04-14
c789e3f	mlarriere	2024-04-13
bbbc1bb	mlarriere	2024-04-13
83d1fa3	mlarriere	2024-04-13
1ab66f1	mlarriere	2024-04-13
5475def	mlarriere	2024-04-13

Surface category maps per season

mhw_season_plot_comparison<-function(original_dataset, upscale_dataset, season_of_interest, name_season){
  
  #Select data in season_of_interest
  season_original<-original_dataset %>% 
  filter(month %in% season_of_interest)
  
  season_1x1<-upscale_dataset %>% 
  filter(month %in% season_of_interest)
  
  #Plot
  plot_original<- base_map + 
  geom_point(data = season_original, aes(x = lon, y = lat, color = factor(category))) +
  scale_color_manual(values = c("I Moderate" = "darkgoldenrod1", 
                                "II Strong" = "darkorange",
                                "III Severe" = "darkred",
                                "IV Extreme" = "#21152B"), name = "Category")+
  guides(color = guide_legend(override.aes = list(shape = 15, size = 10)))+
  labs(title = paste0("Surface MHWs in ", name_season, " 2023 - resolution 0.25°x0.25°")) 
  
  plot_upscale<- base_map + 
  geom_point(data = season_1x1, aes(x = lon, y = lat, color = factor(most_freq_category)), size = 0.3) +
  scale_color_manual(values = c("I Moderate" = "darkgoldenrod1", 
                                "II Strong" = "darkorange",
                                "III Severe" = "darkred",
                                "IV Extreme" = "#21152B"), name = "Category")+
  guides(color = guide_legend(override.aes = list(shape = 15, size = 10)))+
  labs(title = paste0("Surface MHWs in ", name_season, " 2023 - resolution 1°x1°"))


  combined_plot <- grid.arrange(plot_original, plot_upscale, ncol = 2)
  print(combined_plot)
}

mhw_season_plot_comparison(mhw_raw_2023_north_atlantic, mhw_1x1_natlantic_2023_biomes, c("03", "04", "05"), "spring")
mhw_season_plot_comparison(mhw_raw_2023_north_atlantic, mhw_1x1_natlantic_2023_biomes, c("06", "07", "08"), "summer")
mhw_season_plot_comparison(mhw_raw_2023_north_atlantic, mhw_1x1_natlantic_2023_biomes, c("09", "10", "11", "12"), "autumn")
mhw_season_plot_comparison(mhw_raw_2023_north_atlantic, mhw_1x1_natlantic_2023_biomes, c("01", "02"), "winter")

mhw_season_plot<-function(upscale_dataset, months){
  months<-unique(format(as.Date(upscale_dataset$time), "%m"))
  print(months)
  #Plot
  plot_upscale<- base_map + 
  geom_point(data = upscale_dataset, aes(x = lon, y = lat, color = factor(most_freq_category)), size = 0.3) +
  scale_color_manual(values = c("I Moderate" = "darkgoldenrod1", 
                                "II Strong" = "darkorange",
                                "III Severe" = "darkred",
                                "IV Extreme" = "#21152B"), name = "Category")+
  guides(color = guide_legend(override.aes = list(shape = 15, size = 10)))+
  labs(title = paste0("Surface MHWs in 2023"),
       subtitle = paste0("Months: ", paste(months, collapse = ", ") ,"\nresolution 1°x1°"))
  return(plot_upscale)
}

#Select data in season_of_interest
spring_1x1<-mhw_1x1_natlantic_2023_biomes %>% 
  filter(month %in% c("03", "04", "05"))
summer_1x1<-mhw_1x1_natlantic_2023_biomes %>% 
  filter(month %in% c("06", "07", "08"))
autumn_1x1<-mhw_1x1_natlantic_2023_biomes %>% 
  filter(month %in% c("09", "10", "11", "12"))
winter_1x1<-mhw_1x1_natlantic_2023_biomes %>% 
  filter(month %in% c("01", "02"))

#Plot MHWs
spring_plot_1x1<-mhw_season_plot(spring_1x1, "spring")

[1] "03" "04" "05"

summer_plot_1x1<-mhw_season_plot(summer_1x1, "summer")

[1] "06" "07" "08"

autumn_plot_1x1<-mhw_season_plot(autumn_1x1, "autumn")

[1] "09" "10" "11" "12"

winter_plot_1x1<-mhw_season_plot(winter_1x1, "winter")

[1] "01" "02"

combined_plot <- grid.arrange(winter_plot_1x1, spring_plot_1x1, summer_plot_1x1, autumn_plot_1x1, ncol = 2, nrow=2)

Version	Author	Date
ad0d737	mlarriere	2024-04-16
718d84b	mlarriere	2024-04-14
c789e3f	mlarriere	2024-04-13
83d1fa3	mlarriere	2024-04-13
1ab66f1	mlarriere	2024-04-13
5475def	mlarriere	2024-04-13

print(combined_plot)

TableGrob (2 x 2) "arrange": 4 grobs
  z     cells    name           grob
1 1 (1-1,1-1) arrange gtable[layout]
2 2 (1-1,2-2) arrange gtable[layout]
3 3 (2-2,1-1) arrange gtable[layout]
4 4 (2-2,2-2) arrange gtable[layout]

Argo floats

We classify marine heat waves according to their intensity and propagation over the water column:

When the temperature anomaly is equal to or greater than 1°C, we consider the anomaly to be big, otherwise it is considered as small.

Then we look at the vertical propagation of the MHWS: - Shallow MHWs: When a big anomaly is detected between 0 and 100 meters depth.

- Medium MHWs: When a big anomaly is detected between 0 and 200 meters depth.

- Deep MHWs: When a big anomaly is detected between 0 and 600 meters depth.

#reorganize dataset
argo_data_2023 <- sst_with_biomes_2023[, c("file_id",  
                                     "depth", "lat", "lon", "date", 
                                     "profile_range", "biome_value", "temp")] 

#Time in the same format in the 2 dataframes
argo_data_2023 <- argo_data_2023 %>%
  mutate(time = as.Date(date))
mhw_1x1_natlantic_2023_biomes <- mhw_1x1_natlantic_2023_biomes %>%
  mutate(time = as.Date(time))

#Associating mhws category to the argo profiles
argo_mhws_categ<-left_join(mhw_1x1_natlantic_2023_biomes, argo_data_2023, by=c("lat", "lon", "time")) %>% 
  filter(!is.na(file_id)) %>%  #select only locations where there is an argo float
  rename(biome_value=biome_value.x) %>% 
  select(file_id, 
         depth, lat, lon, time, 
         avg_intensity, most_freq_category, 
         biome_value, temp) #Cleaning dataset

#Adding temperature anomaly
temp_anomaly_600m_2023 <- temp_anomaly_va_biomes_2023 %>%
  filter(profile_range==1) %>%  #only profile until 600m
  mutate(time = as.Date(date)) 

temp_anomaly_filtered_2023<- temp_anomaly_600m_2023[, c("file_id",  
                                     "depth", "lat", "lon", "time", 
                                     "biome_value", "anomaly")] 

#Datasets for each surface MHWs category
mhws_surface_categorisation<- function(argo_categ_dataset, anomaly_dataset, category){
  argo_surf_cat<-filter(argo_categ_dataset, most_freq_category==category)
  argo_anomaly_cat<- left_join(anomaly_dataset, argo_surf_cat, by=c("file_id","lat", "lon", "depth", "time", "biome_value")) %>% 
  filter(!is.na(anomaly), !is.na(most_freq_category))
  
  return(argo_anomaly_cat)
}

argo_anomaly_moderate<- mhws_surface_categorisation(argo_mhws_categ,temp_anomaly_filtered_2023, category='I Moderate')
argo_anomaly_strong<- mhws_surface_categorisation(argo_mhws_categ,temp_anomaly_filtered_2023, category='II Strong')
argo_anomaly_severe<- mhws_surface_categorisation(argo_mhws_categ,temp_anomaly_filtered_2023, category='III Severe')
argo_anomaly_extreme<- mhws_surface_categorisation(argo_mhws_categ,temp_anomaly_filtered_2023, category='IV Extreme')

distribution_anomaly <- function(argo_anomaly){
  #from oct to dec (most extreme floats) and only the first 100m
  argo_anomaly<- argo_anomaly %>% 
    filter(depth<=100, format(time, "%m")==c("10","11","12"))
  # Calculate mean and 99th percentile of anomaly values
  mean_anomaly <- mean(argo_anomaly$anomaly)
  n_floats<-length(unique(argo_anomaly$file_id))

  # Create histogram of anomaly values
  p <- ggplot(argo_anomaly, aes(x = anomaly)) +
    geom_histogram(binwidth = 0.2, fill = "skyblue", color = "black") +
    geom_vline(aes(xintercept = mean_anomaly, linetype = "Mean"), color = "red") +
    labs(title = "Distribution of Anomaly",
         subtitle=paste0("Surface MHWs = ", unique(argo_anomaly$most_freq_category), 
                                                   "\nDepth: 0-100m, Oct-Nov-Dec 2023 \nNumber of floats: ", n_floats),
         x = "Anomaly",
         y = "Frequency",
         linetype = "Statistic") +
    scale_linetype_manual(values = c("Mean" = "dashed"), 
                          name = "Statistic", 
                          labels = c(paste0("Mean = ", round(mean_anomaly, 2)))) +   
    scale_color_manual(values = "red", name = "Statistic") +
    theme_minimal() +
    theme(legend.position = "top",
          legend.box.background = element_rect(color = "black", size = 0.5))

  p
}


distribution_moderate<-distribution_anomaly(argo_anomaly_moderate)
distribution_strong<-distribution_anomaly(argo_anomaly_strong)
distribution_severe<-distribution_anomaly(argo_anomaly_severe)
distribution_extreme<-distribution_anomaly(argo_anomaly_extreme)

combined_stat<-  grid.arrange(distribution_moderate, distribution_strong, distribution_severe,distribution_extreme, ncol = 2, nrow=2)

Version	Author	Date
ad0d737	mlarriere	2024-04-16
50596ee	mlarriere	2024-04-14
bbbc1bb	mlarriere	2024-04-13
5475def	mlarriere	2024-04-13

print(combined_stat)

TableGrob (2 x 2) "arrange": 4 grobs
  z     cells    name           grob
1 1 (1-1,1-1) arrange gtable[layout]
2 2 (1-1,2-2) arrange gtable[layout]
3 3 (2-2,1-1) arrange gtable[layout]
4 4 (2-2,2-2) arrange gtable[layout]

# Defining the anomaly class as a function of anomaly value and depth
threshold <- argo_anomaly_moderate %>%
  filter(anomaly >= 1) %>% # Look only at positive anomaly values 
    group_by(depth) %>%
    summarise(depth_threshold = max(depth))
  
argo_anomaly_dataset <- argo_anomaly_moderate %>%
  mutate(anomaly_class = ifelse(anomaly >= 1 & depth <= threshold$depth_threshold, "Big", "Small")) %>%
  mutate(anomaly_class = factor(anomaly_class, levels = c("Big", "Small")))

# Group the data by float identifiers
max_depth_by_float <- argo_anomaly_moderate %>%
  filter(anomaly >= 1) %>%
  group_by(file_id, lat, lon) %>%
  summarise(max_depth = max(depth))

# Define mhw_class based on the maximum depth
max_depth_by_float <- max_depth_by_float %>%
  mutate(mhw_class = case_when(
    max_depth <= 100 ~ "Shallow",
    max_depth <= 200 ~ "Medium",
    TRUE ~ "Deep"
  ))

#Combining datasets
final_mhw_argo_cat<-left_join(max_depth_by_float, argo_anomaly_dataset, by=c("file_id", "lat","lon") )

 
plot_anomaly_categorisation <- function(anomaly_data) {
  anomaly_classes <- anomaly_data %>%
    distinct(depth, anomaly_class)

    ggplot(anomaly_data) +
    geom_rect(data = anomaly_data %>% distinct(depth, anomaly_class),
              aes(xmin = -Inf, xmax = Inf, ymin = lag(depth), ymax = depth, fill = anomaly_class),
              inherit.aes = FALSE) +
    geom_path(aes(x = anomaly , y = depth)) +
    geom_vline(xintercept = 0) +
    scale_fill_manual(values = c("Big" = "lightblue", "Small" = "lightyellow"), 
                                            name = "Anomaly Class",  labels = c("Big Anomaly", "Small Anomaly")
) +
    scale_y_reverse() +
    coord_cartesian(xlim = c(-4.5, 6)) +
    scale_x_continuous(breaks = c(-4, -2, 0, 2, 4, 6)) +
    labs(title = paste0('Anomaly profile in ', month.name[unique(month(anomaly_data$time))], ' 2023'),
         subtitle = paste0('Location: (', unique(anomaly_data$lat), ',', unique(anomaly_data$lon),')\n',
                          'Type of surface MHW: ', unique(anomaly_data$most_freq_category),"\n",
                           'Type of argo MHW: ', unique(anomaly_data$mhw_class)),
         x = "Temperature anomaly (°C)", y = 'depth (m)')
}

shallow<-plot_anomaly_categorisation(final_mhw_argo_cat %>% filter(file_id==20))
medium<-plot_anomaly_categorisation(final_mhw_argo_cat %>% filter(file_id==10))
deep<-plot_anomaly_categorisation(final_mhw_argo_cat %>% filter(file_id==6396))

combined_plot <- grid.arrange(shallow, medium, deep, ncol = 3)

Version	Author	Date
bbbc1bb	mlarriere	2024-04-13
5475def	mlarriere	2024-04-13

print(combined_plot)

TableGrob (1 x 3) "arrange": 3 grobs
  z     cells    name           grob
1 1 (1-1,1-1) arrange gtable[layout]
2 2 (1-1,2-2) arrange gtable[layout]
3 3 (1-1,3-3) arrange gtable[layout]

Argo floats clustering

To define the optimal number of cluster we use the elbow method. Then we apply a K-mean clustering on the final_mhw_argo_cat dataset.

Elbow Method

Kmean clustering per biomes in 2023

# code adapted from cluster_analysis_extreme.Rmd
plot_anomaly_clusters <- function(data_input, inum_clusters = elbow_optimal, 
                                  opt_max_iterations = 8, opt_n_start = 15) {
  #-------Preprocessing
  # wide table with each depth becoming a column
  anomaly_va_wide <- data_input %>%
    select(file_id, depth, anomaly) %>%
    pivot_wider(names_from = depth, values_from = anomaly)
    
  # Drop any rows with missing values N/A caused by gaps in climatology data
  anomaly_va_wide <- anomaly_va_wide %>% 
    drop_na()
    
  # Table for cluster analysis
  points <- anomaly_va_wide %>%
    column_to_rownames(var = "file_id")
  
  #-------Clustering
  kclusts <- tibble(k = inum_clusters) %>%
    mutate(
      kclust = map(k, ~ kmeans(points, .x, iter.max = opt_max_iterations, nstart = opt_n_start)),
      tidied = map(kclust, tidy),
      glanced = map(kclust, glance),
      augmented = map(kclust, augment, points)
    )


  profile_id <- kclusts %>%
    unnest(cols = c(augmented)) %>%
    select(file_id = .rownames,
           cluster = .cluster) %>%
    mutate(file_id = as.numeric(file_id),
           cluster = as.character(cluster))


  data_output <- full_join(data_input, profile_id)
  data_output <- data_output %>%
    filter(!is.na(cluster))

  # Summarize data for plotting
  anomaly_cluster_mean <- data_output %>%
    group_by(cluster, depth) %>%
    summarise(
      count_cluster = n(),
      anomaly_mean = mean(anomaly, na.rm = TRUE),
      anomaly_sd = sd(anomaly, na.rm = TRUE)
    ) %>%
    ungroup()

  #-----Plotting
  #defining colors
  num_colors <- min(5, inum_clusters)  # 5 colors maximum (max number of cluster=5)
  cluster_palette <- paletteer::paletteer_d("colorBlindness::Blue2DarkRed12Steps") 
  cluster_colors<-cluster_palette[1:num_colors]

  ggplot(anomaly_cluster_mean, aes(x = anomaly_mean, y = depth, color = factor(cluster))) +
    geom_path() +
    geom_ribbon(aes(xmax = anomaly_mean + anomaly_sd,
                    xmin = anomaly_mean - anomaly_sd,
                    y = depth, fill = factor(cluster)), alpha = 0.3, color=NA) +
    geom_vline(xintercept = 0) +
    scale_y_reverse() +
    labs(title = paste0('Anomaly Profiles by Cluster -- 2023'),
         subtitle = paste0('Months: ', paste(substr(month.abb[unique(month(data_input$time))], 1, 3)
                                             [order(unique(month(data_input$time)))], collapse = ", "), 
                           "\nBiomes: ", paste(unique(data_input$biome_value), collapse = ", ")),
         x = "Mean Anomaly", y = "Depth (m)") +
    scale_fill_manual(name = "Cluster", values = cluster_colors) +
    scale_color_manual(name = "Cluster", values = cluster_colors) +
    theme_minimal() +
    theme(legend.position = "bottom") +
    facet_wrap(~cluster, nrow = 1)  #1 subplot per cluster
}

#Clustered anomaly profiles
cluster_SPSS<-plot_anomaly_clusters(final_mhw_argo_cat %>% 
                                      group_by(file_id, lat, lon) %>%
                                      filter(biome_value==3))

cluster_STSS<-plot_anomaly_clusters(final_mhw_argo_cat %>% 
                                      group_by(file_id, lat, lon) %>%
                                      filter(biome_value==2))
cluster_STPS<-plot_anomaly_clusters(final_mhw_argo_cat %>% 
                                      group_by(file_id, lat, lon) %>%
                                      filter(biome_value==1))

cluster_subplots_biome<- grid.arrange(cluster_SPSS, cluster_STSS, cluster_STPS, ncol=3, nrow=1)

Version	Author	Date
4a8157c	mlarriere	2024-04-16
ad0d737	mlarriere	2024-04-16

print(cluster_subplots_biome)

TableGrob (1 x 3) "arrange": 3 grobs
  z     cells    name           grob
1 1 (1-1,1-1) arrange gtable[layout]
2 2 (1-1,2-2) arrange gtable[layout]
3 3 (1-1,3-3) arrange gtable[layout]

Kmean clustering per seasons in 2023

cluster_winter<-plot_anomaly_clusters(final_mhw_argo_cat %>% 
                                        group_by(file_id, lat, lon, biome_value) %>% 
                                        filter(format(time, "%m") %in% c("01","02")))
cluster_spring<-plot_anomaly_clusters(final_mhw_argo_cat %>% 
                                        group_by(file_id, lat, lon) %>% 
                                        filter(format(time, "%m") %in% c("03","04", "05")))
cluster_summer<-plot_anomaly_clusters(final_mhw_argo_cat %>% 
                                        group_by(file_id, lat, lon) %>% 
                                        filter(format(time, "%m") %in% c("06","07", "08")))
cluster_autumn<-plot_anomaly_clusters(final_mhw_argo_cat %>% 
                                        group_by(file_id, lat, lon) %>% 
                                        filter(format(time, "%m") %in% c("09","10", "11", "12")))


cluster_subplots<- grid.arrange(cluster_winter, cluster_spring, cluster_summer, cluster_autumn, ncol=2, nrow=2)

print(cluster_subplots)

TableGrob (2 x 2) "arrange": 4 grobs
  z     cells    name           grob
1 1 (1-1,1-1) arrange gtable[layout]
2 2 (1-1,2-2) arrange gtable[layout]
3 3 (2-2,1-1) arrange gtable[layout]
4 4 (2-2,2-2) arrange gtable[layout]

#Track 1 argo profile in lat, lon ,depth

test <- final_mhw_argo_cat %>% 
  filter(file_id==10) %>% 
  mutate(point_order = row_number())

# Convert most_freq_category to factor
test$most_freq_category <- factor(test$most_freq_category)
test <- test %>%
  arrange(desc(depth))

test$depth_reversed <- test$depth * -1

# Create a color palette based on the levels of most_freq_category
color_palette <- c("I Moderate" = "darkgoldenrod1", 
                   "II Strong" = "darkorange",
                   "III Severe" = "darkred",
                   "IV Extreme" = "#21152B")
# Create the 3D scatter plot
scatterplot3d(x = test$lon, 
              y = test$lat, 
              z = test$depth_reversed, 
              color = color_palette[as.numeric(test$most_freq_category)], 
              pch = 16,
              type = "p", 
              main = "3D Scatter Plot of 1 Argo Float",
              xlab = "Longitude", 
              ylab = "Latitude", 
              zlab = "Depth"
)

# Add legend
legend("topright", 
       legend = levels(test$most_freq_category), 
       pch = 16, 
       col = color_palette,
       title = "Most Frequency Category")

#reorganize dataset
argo_data_2023 <- sst_with_biomes_2023[, c("file_id",  
                                     "depth", "lat", "lon", "date", 
                                     "profile_range", "biome_value", "temp")] 
selected_latitude <- 60.5

# Filter data frames
mhw_subset <- mhw_1x1_natlantic_2023_biomes %>%
  filter(lat == selected_latitude, month=="02")# most_freq_category ==c ("IV Extreme",  "III Severe"))

# argo_subset <- argo_data_2023 %>%
#   filter(lat == selected_latitude)
# argo_subset$depth_reversed <- argo_subset$depth * -1

anomaly_subset <- temp_anomaly_va_biomes_2023 %>%
  filter(lat == selected_latitude)

# Define color palette for MHWs category
color_palette <- c("I Moderate" = "darkgoldenrod1", 
                   "II Strong" = "darkorange",
                   "III Severe" = "darkred",
                   "IV Extreme" = "#21152B")

plot_mhw <- ggplot() +
  geom_point(data = anomaly_subset, aes(x = lon, y = depth, fill = anomaly), shape=21, size = 2, color="transparent") +
  geom_point(data = mhw_subset, aes(x = lon, y = 0, color = most_freq_category), size = 1) +  
  scale_fill_continuous(low = "darkblue", high = "lightgreen", name = "Anomaly") +  
  scale_color_manual(name = "MHWs category", values = color_palette) +  
  labs(x = "Longitude", y = "Depth", color = "MHWs category") +  
  ggtitle(paste0("Argo temperature anomaly for latitude=", selected_latitude, "°, in december 2023")) +
  theme_minimal() +
  scale_y_reverse() +  
  guides(fill = guide_legend(title = "Anomaly \n climatology-obs")) 

plot_mhw

test_anomaly<-temp_anomaly_va_biomes_2023 %>% filter(file_id==6879)
test_mhw<-mhw_subset%>% filter(lon==-59.5)

plot_anomaly_profiles <- function(data, max_depth) {
  anomaly_overall_mean <- data %>%
    filter(depth <= max_depth) %>%
    group_by(depth) %>% 
    summarise(temp_count = n(),
              temp_anomaly_mean = mean(anomaly, na.rm = TRUE),
              temp_anomaly_sd = sd(anomaly, na.rm = TRUE))

  ggplot(anomaly_overall_mean) +
    geom_path(aes(x = temp_anomaly_mean, y = depth)) +
    geom_ribbon(aes(xmax = temp_anomaly_mean + temp_anomaly_sd,
                    xmin = temp_anomaly_mean - temp_anomaly_sd,
                    y = depth), alpha = 0.2) +
    geom_vline(xintercept = 0) +
    scale_y_reverse() +
    coord_cartesian(xlim = c(-4.5, 6)) +
    scale_x_continuous(breaks = c(-4, -2, 0, 2, 4, 6)) +
    labs(title = paste0('Mean anomaly profile (lat:60.5°, lon: -59.5) in dec 2023'),
         x = "temperature anomaly (°C)", y = 'depth (m)')
}

plot_anomaly_profiles(test_anomaly, "600")


#Temperature VS Depth
# look at the anomaly under catgoery 
temp_anomaly_filtered_2023 <- temp_anomaly_va_biomes_2023 %>%
  mutate(time = as.Date(date)) %>% 
  filter(depth==5, lat<35, lat>23, lon<(-20), lon>-40)

mean(temp_anomaly_filtered_2023$anomaly)
base_map <- base_map + lims(x= c(-50, -10), y = c(10, 40))

#Select data in season_of_interest
sept<-mhw_1x1_natlantic_2023_biomes %>% 
  filter(month %in% c("09"), lat<35, lat>23, lon<(-20), lon>-40)
oct<-mhw_1x1_natlantic_2023_biomes %>% 
  filter(month %in% c("10"), lat<35, lat>23, lon<(-20), lon>-40)
nov<-mhw_1x1_natlantic_2023_biomes %>% 
  filter(month %in% c("11"),lat<35, lat>23, lon<(-20), lon>-40)
dec<-mhw_1x1_natlantic_2023_biomes %>% 
  filter(month %in% c("12"),lat<35, lat>23, lon<(-20), lon>-40)

#Plot MHWs
sept_plot_1x1<-mhw_season_plot(sept, "sept")
oct_plot_1x1<-mhw_season_plot(oct, "oct")
nov_plot_1x1<-mhw_season_plot(nov, "nov")
dec_plot_1x1<-mhw_season_plot(dec, "dec")

combined_plot <- grid.arrange(sept_plot_1x1, oct_plot_1x1, nov_plot_1x1, dec_plot_1x1, ncol = 2, nrow=2)
print(combined_plot)

#Argo float per season/group by month
argo_winter<-filter(argo_mhws_categ,  month(argo_mhws_categ$time) %in% c("1","2"))
argo_spring<-filter(argo_mhws_categ,  month(argo_mhws_categ$time) %in% c("3","4","5"))
argo_summer<-filter(argo_mhws_categ,  month(argo_mhws_categ$time) %in% c("6","7","8"))
argo_autumn<-filter(argo_mhws_categ,  month(argo_mhws_categ$time) %in% c("9","10","11", "12"))

argo_categorisation<-function(argo_dataset, chosen_months){
   categorisation_map <- base_map +
            geom_point(data = argo_dataset, aes(x = lon, y = lat, color = factor(most_freq_category))) +
            scale_color_manual(values = c("I Moderate" = "darkgoldenrod1", 
                                "II Strong" = "darkorange",
                                "III Severe" = "darkred",
                                "IV Extreme" = "#21152B"), name = "Category")+
  guides(color = guide_legend(override.aes = list(shape = 15, size = 10)))+
  labs(title = paste0("MHWs Categorization of Argo profiles in 2023"), 
       subtitle = paste0("months: ", chosen_months))
   return(categorisation_map)
}

#Plot Argo distrib
spring_argo<-argo_categorisation(argo_spring, "03, 04, 05")
summer_argo<-argo_categorisation(argo_summer, "06, 07, 08")
autumn_argo<-argo_categorisation(argo_autumn, "09, 10, 11, 12")
winter_argo<-argo_categorisation(argo_winter, "01, 02")

combined_plot <- grid.arrange(winter_argo, spring_argo, summer_argo, autumn_argo, ncol = 2, nrow = 2)
print(combined_plot)

Area of interest

#Defining are of interest spatially and temporally
month_interest=c("9", "10", "11", "12")
area_extent <- c(-45, 0, 20, 40)

argo_mhws_categ_focus<-argo_mhws_categ %>% 
  filter(month(time) %in% month_interest, lat>20, lat<40, lon>-45, lon<0, most_freq_category %in% c("III Severe", "IV Extreme"))

extreme_float<-argo_mhws_categ_focus %>% 
  filter(most_freq_category=="IV Extreme")

print(unique(extreme_float$file_id))


base_map <- base_map + lims(x= c(-50, 5), y = c(10, 50))

argo_focus_map<-argo_categorisation(argo_mhws_categ_focus, "09, 10, 11, 12")

3D maps

# devtools::install_github("tylermorganwall/rayshader")
# install.packages("rnaturalearth")
# library(scatterplot3d)
surface_argo5m<-temp_anomaly_va_biomes_2023 %>% 
  filter(depth==5)

library(marmap)
library(rayshader)
library(ggplot2)

#Bathymetric data
north_atlantic_bathymetry <- getNOAA.bathy(lon1 = 30, lon2 = -100,
                                           lat1 = 0, lat2 = 80, resolution = 10)

#Creating a custom palette of blues
blues <- c("lightsteelblue4", "lightsteelblue3","lightsteelblue2", "lightsteelblue1")

# Plotting the bathymetry with different colors for land and sea
plot(north_atlantic_bathymetry, image = TRUE, land = TRUE, lwd = 0.1,
bpal = list(c(0, max(north_atlantic_bathymetry), "grey"),
c(min(north_atlantic_bathymetry),0,blues)))

# Making the coastline more visible
plot(north_atlantic_bathymetry, deep = 0, shallow = 0, step = 0,
lwd = 0.4, add = TRUE)
scaleBathy(north_atlantic_bathymetry, deg = 2, x = "bottomleft", inset = 5) #adding scale

# points(surface_argo5m$lon, surface_argo5m$lat, pch = 19, cex = 0.3, asp = 1)

sessionInfo()

R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: openSUSE Leap 15.5

Matrix products: default
BLAS:   /usr/local/R-4.2.2/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R-4.2.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] broom_1.0.5          paletteer_1.6.0      cluster_2.1.6       
 [4] gridExtra_2.3        scatterplot3d_0.3-44 viridis_0.6.2       
 [7] viridisLite_0.4.1    ggOceanMaps_1.3.4    ggspatial_1.1.7     
[10] oce_1.7-10           gsw_1.1-1            lubridate_1.9.0     
[13] timechange_0.1.1     forcats_0.5.2        stringr_1.5.0       
[16] dplyr_1.1.3          purrr_1.0.2          readr_2.1.3         
[19] tidyr_1.3.0          tibble_3.2.1         ggplot2_3.4.4       
[22] tidyverse_1.3.2      workflowr_1.7.0     

loaded via a namespace (and not attached):
 [1] googledrive_2.0.0   colorspace_2.0-3    ellipsis_0.3.2     
 [4] class_7.3-20        rprojroot_2.0.3     fs_1.5.2           
 [7] rstudioapi_0.15.0   proxy_0.4-27        farver_2.1.1       
[10] fansi_1.0.3         xml2_1.3.3          codetools_0.2-18   
[13] cachem_1.0.6        knitr_1.41          jsonlite_1.8.3     
[16] dbplyr_2.2.1        rgeos_0.5-9         compiler_4.2.2     
[19] httr_1.4.4          backports_1.4.1     assertthat_0.2.1   
[22] fastmap_1.1.0       gargle_1.2.1        cli_3.6.1          
[25] later_1.3.0         htmltools_0.5.8.1   tools_4.2.2        
[28] gtable_0.3.1        glue_1.6.2          maps_3.4.1         
[31] Rcpp_1.0.10         cellranger_1.1.0    jquerylib_0.1.4    
[34] RNetCDF_2.6-1       raster_3.6-11       vctrs_0.6.4        
[37] lwgeom_0.2-10       xfun_0.35           ps_1.7.2           
[40] rvest_1.0.3         lifecycle_1.0.3     ncmeta_0.3.5       
[43] googlesheets4_1.0.1 terra_1.7-65        getPass_0.2-2      
[46] scales_1.2.1        hms_1.1.2           promises_1.2.0.1   
[49] parallel_4.2.2      rematch2_2.1.2      prismatic_1.1.2    
[52] yaml_2.3.6          sass_0.4.4          stringi_1.7.8      
[55] highr_0.9           e1071_1.7-12        rlang_1.1.1        
[58] pkgconfig_2.0.3     evaluate_0.18       lattice_0.20-45    
[61] sf_1.0-9            labeling_0.4.2      processx_3.8.0     
[64] tidyselect_1.2.0    magrittr_2.0.3      R6_2.5.1           
[67] generics_0.1.3      DBI_1.2.2           pillar_1.9.0       
[70] haven_2.5.1         whisker_0.4         withr_2.5.0        
[73] units_0.8-0         stars_0.6-0         abind_1.4-5        
[76] sp_1.5-1            modelr_0.1.10       crayon_1.5.2       
[79] KernSmooth_2.23-20  utf8_1.2.2          tzdb_0.3.0         
[82] rmarkdown_2.18      grid_4.2.2          readxl_1.4.1       
[85] callr_3.7.3         git2r_0.30.1        reprex_2.0.2       
[88] digest_0.6.30       classInt_0.4-8      httpuv_1.6.6       
[91] munsell_0.5.0       bslib_0.4.1

Marine Heat Waves categorisation

Marguerite Larriere & Jens Daniel Müller

17 April, 2024