Last updated: 2023-10-10

Checks: 7 passed, 0 failed

Knit directory: bgc_argo_r_argodata/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20211008) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f5edbe3. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    output/

Untracked files:
    Untracked:  code/Copy_VA.Rmd
    Untracked:  code/preprocess_code.Rmd
    Untracked:  code/start_background_job_bgc_load.R
    Untracked:  code/start_background_job_core_load.R

Unstaged changes:
    Modified:   analysis/argo_oxygen.Rmd
    Modified:   analysis/argo_ph.Rmd
    Modified:   analysis/argo_temp.Rmd
    Modified:   analysis/argo_temp_core.Rmd
    Modified:   analysis/extreme_temp_core.Rmd
    Modified:   analysis/oceanSODA_argo_temp.Rmd
    Modified:   analysis/variability_pH.Rmd
    Modified:   analysis/variability_temp.Rmd
    Modified:   code/Workflowr_project_managment.R
    Modified:   code/start_background_job.R
    Modified:   code/vertical_alignment.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/load_argo_core.Rmd) and HTML (docs/load_argo_core.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd f5edbe3 ds2n19 2023-10-10 Create simplified data sets and run for 2013 - 2023
html 7394ba8 ds2n19 2023-10-10 Build site.
Rmd a5002da ds2n19 2023-10-10 Create simplified data sets and run for 2013 - 2023
html 26f85f0 ds2n19 2023-10-06 Build site.
Rmd 2bd702c ds2n19 2023-10-06 Changed core Argo location folders and run for 2022
html c3381d0 ds2n19 2023-10-06 Build site.
Rmd bc8d46d ds2n19 2023-10-06 Changed core Argo location folders and run for 2022
html 8d5c853 pasqualina-vonlanthendinenna 2022-08-30 Build site.
Rmd 9bde106 pasqualina-vonlanthendinenna 2022-08-30 added 6 months of core data (still have to fix the dates
html 7b3d8c5 pasqualina-vonlanthendinenna 2022-08-29 Build site.
Rmd 8e81570 pasqualina-vonlanthendinenna 2022-08-29 load and add in core-argo data (1 month)

Task

Load core-Argo temperature data for comparison with BGC-Argo temperature data

Set load options

Determine whether files are refreshed from the DAC or the existing cache directory is used, whether the metadata, temperature and salinity yearly files are renewed, and whether the consolidated all-year files are created from the individual yearly files.

# opt_refresh_cache
#   0 = do not refresh the cache.
#   1 = refresh the cache (any non-zero value forces a refresh).
opt_refresh_cache = 0

# opt_refresh_years_temp, opt_refresh_years_psal, opt_refresh_years_metadata
#   0 = do not refresh the yearly files (any value other than 1 skips the annual refresh).
#   1 = refresh the yearly files for the given parameter.
#   The years to be refreshed are set by opt_min_year and opt_max_year.
opt_refresh_years_temp = 1
opt_refresh_years_psal = 1
opt_refresh_years_metadata = 1
opt_min_year = 2013
opt_max_year = 2023

# opt_consolidate_temp, opt_consolidate_psal, opt_consolidate_metadata
# The yearly files must have already been created!
#   0 = do not build the consolidated file from previously written yearly files (any value other than 1 skips consolidation).
#   1 = build the consolidated file from previously written yearly files for the given parameter.
#   The years to be included in the consolidation are set by opt_min_year and opt_max_year.
opt_consolidate_temp = 0
opt_consolidate_psal = 0
opt_consolidate_metadata = 0

# opt_A_AB_files
# The consolidated temp file must have already been created!
#   0 = do not build the temp_A and temp_AB files from the previously written consolidated file (any value other than 1 skips the A and AB files).
#   1 = build the temp_A and temp_AB files from the previously written consolidated file.
opt_A_AB_files = 0
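These flags encode an implicit ordering: consolidation reads the yearly files, and the A/AB step reads the consolidated temperature file. A minimal defensive sketch (check_load_options is a hypothetical helper, not part of the original workflow; the file names match those written further below) that fails fast when a requested step's inputs are missing:

# hypothetical guard, not in the original workflow: verify that files written
# by earlier steps exist before a later step is allowed to run
check_load_options <- function() {
  if (opt_consolidate_temp == 1 && opt_refresh_years_temp != 1) {
    yearly_files <- paste0(path_argo_core_preprocessed, "/",
                           opt_min_year:opt_max_year, "_core_data_temp.rds")
    stopifnot("yearly temp files missing" = all(file.exists(yearly_files)))
  }
  if (opt_A_AB_files == 1 && opt_consolidate_temp != 1) {
    consolidated_file <- paste0(path_argo_core_preprocessed, "/core_data_temp.rds")
    stopifnot("consolidated temp file missing" = file.exists(consolidated_file))
  }
}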

Set cache directory

Directory where the core-Argo profile files are stored

# set cache directory
argo_set_cache_dir(cache_dir = path_argo_core)

# check cache directory
argo_cache_dir()
[1] "/nfs/kryo/work/datasets/ungridded/3d/ocean/floats/core_argo_r_argodata"
# check argo mirror
argo_mirror()
[1] "https://data-argo.ifremer.fr"
# age argument: the maximum age, in hours, of cached files before they are
# re-downloaded (Inf means always use the cached file, -Inf means always
# download from the server)
# e.g. max_global_cache_age = 5 updates files that have been cached for more
# than 5 hours; max_global_cache_age = 0.5 updates files cached for more than
# 30 minutes
if (opt_refresh_cache == 0){
  argo_update_global(max_global_cache_age = Inf)  
  argo_update_data(max_data_cache_age = Inf)
} else {
  argo_update_global(max_global_cache_age = -Inf)  
  argo_update_data(max_data_cache_age = -Inf)
}
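Between the two extremes used above, a finite age gives a time-based refresh policy; a small sketch (the 24-hour threshold is an arbitrary value for illustration):

# hypothetical middle ground: re-download the global index and profile files
# only when the cached copy is more than a day old
argo_update_global(max_global_cache_age = 24)
argo_update_data(max_data_cache_age = 24)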

Load core data one year at a time - temperature, salinity and metadata

#------------------------------------------------------------------------------
# Important - files are loaded for the given year, processed, and written to disk.
#------------------------------------------------------------------------------

for (target_year in opt_min_year:opt_max_year) {


  # any yearly refresh is based on the initial profile index, core_index
  if (opt_refresh_years_temp == 1 || opt_refresh_years_psal == 1 ||
      opt_refresh_years_metadata == 1) {
    core_index <- argo_global_prof() %>% 
      argo_filter_data_mode(data_mode = 'delayed') %>% 
      argo_filter_date(date_min = paste0(target_year, "-01-01"),
                       date_max = paste0(target_year, "-12-31"))
  }

  # if temp or psal is being updated, get the profile data
  if (opt_refresh_years_temp == 1 || opt_refresh_years_psal == 1) {
    # read in the profiles (takes a while)
    core_data_yr <- argo_prof_levels(
      path = core_index,
      vars =
        c(
          'PRES_ADJUSTED',
          'PRES_ADJUSTED_QC',
          'PSAL_ADJUSTED',
          'PSAL_ADJUSTED_QC',
          'TEMP_ADJUSTED',
          'TEMP_ADJUSTED_QC'),
      quiet = TRUE
    )
    
    # keep only the primary profile of each file, i.e. n_prof == 1
    core_data_yr <- core_data_yr %>%
      filter(n_prof == 1)

  } 

  # if updating metadata, get the profile metadata based on core_index
  if (opt_refresh_years_metadata == 1) {
    # read associated metadata
    core_metadata_yr <- argo_prof_prof(path = core_index)
    
    # keep only the primary profile of each file, i.e. n_prof == 1
    core_metadata_yr <- core_metadata_yr %>%
      filter(n_prof == 1)

  }

  # ------------------------------------------------------------------------------
  # Process temperature file
  # ------------------------------------------------------------------------------
  if (opt_refresh_years_temp == 1) {
    # base temperature data where the QC flag is good (1) or estimated (8)
    core_data_temp_yr <- core_data_yr %>%
      filter(
        pres_adjusted_qc %in% c(1, 8) &
        temp_adjusted_qc %in% c(1, 8)
      ) %>%
      select(
        file,
        n_levels,
        pres_adjusted,
        temp_adjusted
      )
    
    # join to the index to incorporate date, lat and lon
    core_data_temp_yr <- left_join(core_data_temp_yr, core_index, by = "file")
    core_data_temp_yr <- core_data_temp_yr %>%
      select(    
        file,
        date,
        latitude,
        longitude,
        n_levels,
        pres_adjusted,
        temp_adjusted
    )
    
    # resolve lat and lon to 1° grid centres and derive depth using TEOS-10
    core_merge_temp_yr <- core_data_temp_yr %>%
      rename(lon = longitude,
             lat = latitude) %>%
      mutate(lon = if_else(lon < 20, lon + 360, lon)) %>%
      mutate(
        lat = cut(lat, seq(-90, 90, 1), seq(-89.5, 89.5, 1)),
        lat = as.numeric(as.character(lat)),
        lon = cut(lon, seq(20, 380, 1), seq(20.5, 379.5, 1)),
        lon = as.numeric(as.character(lon))
      ) %>%
      mutate(depth = gsw_z_from_p(pres_adjusted, latitude =  lat)*-1.0,
             .before = pres_adjusted)
  
    # write this year's file
    core_merge_temp_yr %>% 
      write_rds(file = paste0(path_argo_core_preprocessed, "/", target_year, "_core_data_temp.rds"))

  }
  
  # ------------------------------------------------------------------------------
  # Process salinity file
  # ------------------------------------------------------------------------------
  if (opt_refresh_years_psal == 1) {
    # base salinity data where the QC flag is good (1) or estimated (8)
    core_data_psal_yr <- core_data_yr %>%
      filter(
        pres_adjusted_qc %in% c(1, 8) &
        psal_adjusted_qc %in% c(1, 8)
      ) %>%
      select(
        file,
        n_levels,
        pres_adjusted,
        psal_adjusted
      )

    # join to the index to incorporate date, lat and lon
    core_data_psal_yr <- left_join(core_data_psal_yr, core_index, by = "file")
    core_data_psal_yr <- core_data_psal_yr %>%
      select(    
        file,
        date,
        latitude,
        longitude,
        n_levels,
        pres_adjusted,
        psal_adjusted
    )

    # resolve lat and lon to 1° grid centres and derive depth using TEOS-10
    core_merge_psal_yr <- core_data_psal_yr %>%
      rename(lon = longitude,
             lat = latitude) %>%
      mutate(lon = if_else(lon < 20, lon + 360, lon)) %>%
      mutate(
        lat = cut(lat, seq(-90, 90, 1), seq(-89.5, 89.5, 1)),
        lat = as.numeric(as.character(lat)),
        lon = cut(lon, seq(20, 380, 1), seq(20.5, 379.5, 1)),
        lon = as.numeric(as.character(lon))
      ) %>%
      mutate(depth = gsw_z_from_p(pres_adjusted, latitude =  lat)*-1.0,
             .before = pres_adjusted)
    
    # write this year's file
    core_merge_psal_yr %>% 
      write_rds(file = paste0(path_argo_core_preprocessed, "/", target_year, "_core_data_psal.rds"))

  }


  # ------------------------------------------------------------------------------
  # Process metadata file
  # ------------------------------------------------------------------------------
  if (opt_refresh_years_metadata == 1) {
    # resolve lat and lon so that they are harmonised with the data files
    core_metadata_yr <- core_metadata_yr %>%
      rename(lon = longitude,
             lat = latitude) %>%
      mutate(lon = if_else(lon < 20, lon + 360, lon)) %>%
      mutate( 
        lat = cut(lat, seq(-90, 90, 1), seq(-89.5, 89.5, 1)),
        lat = as.numeric(as.character(lat)),
        lon = cut(lon, seq(20, 380, 1), seq(20.5, 379.5, 1)),
        lon = as.numeric(as.character(lon))
      )
  
    # Select just the columns we are interested in
    core_metadata_yr <- core_metadata_yr %>%
      select(
        file,
        date,
        lat,
        lon,
        platform_number, 
        cycle_number,
        position_qc,
        profile_pres_qc,
        profile_temp_qc,
        profile_psal_qc
      )

    # write this year's file
    core_metadata_yr %>% 
      write_rds(file = paste0(path_argo_core_preprocessed, "/", target_year, "_core_metadata.rds"))

  }
  
}
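The 1° re-gridding and depth conversion inside the loop can be sanity-checked in isolation. A minimal sketch with a made-up observation: a profile at 23.7°E, 45.2°S should land on the grid centre (23.5, -45.5), and 1000 dbar at that latitude converts to roughly 989 m of depth.

# hypothetical single observation, for illustration only
demo <- tibble(lon = 23.7, lat = -45.2, pres_adjusted = 1000)

demo %>%
  mutate(lon = if_else(lon < 20, lon + 360, lon)) %>%
  mutate(
    lat = as.numeric(as.character(cut(lat, seq(-90, 90, 1), seq(-89.5, 89.5, 1)))),
    lon = as.numeric(as.character(cut(lon, seq(20, 380, 1), seq(20.5, 379.5, 1))))
  ) %>%
  mutate(depth = gsw_z_from_p(pres_adjusted, latitude = lat) * -1.0)
# expected: lon 23.5, lat -45.5, depth ~989 m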

Build consolidated all-year files from the series of yearly files - temperature, salinity and metadata

# ------------------------------------------------------------------------------
# Process temperature file
# ------------------------------------------------------------------------------
if (opt_consolidate_temp == 1){
  consolidated_created = 0
  
  for (target_year in opt_min_year:opt_max_year) {

    # read the yearly file based on target_year
    core_data_temp_yr <-
    read_rds(file = paste0(path_argo_core_preprocessed, "/", target_year, "_core_data_temp.rds"))

    # combine into a consolidated all-years file
    if (consolidated_created == 0) {
      core_data_temp <- core_data_temp_yr
      consolidated_created = 1
    } else {
      core_data_temp <- rbind(core_data_temp, core_data_temp_yr)
    }
  }
  
  # write consolidated files  
  core_data_temp %>% 
    write_rds(file = paste0(path_argo_core_preprocessed, "/core_data_temp.rds"))

  # remove objects to free memory
  rm(core_data_temp)
  rm(core_data_temp_yr)
  
}

# ------------------------------------------------------------------------------
# Process salinity file
# ------------------------------------------------------------------------------
if (opt_consolidate_psal == 1){
  consolidated_created = 0
  
  for (target_year in opt_min_year:opt_max_year) {

    # read the yearly file based on target_year
    core_data_psal_yr <-
    read_rds(file = paste0(path_argo_core_preprocessed, "/", target_year, "_core_data_psal.rds"))

    # combine into a consolidated all-years file
    if (consolidated_created == 0) {
      core_data_psal <- core_data_psal_yr
      consolidated_created = 1
    } else {
      core_data_psal <- rbind(core_data_psal, core_data_psal_yr)
    }
  }
  
  # write consolidated files  
  core_data_psal %>% 
    write_rds(file = paste0(path_argo_core_preprocessed, "/core_data_psal.rds"))

  # remove objects to free memory
  rm(core_data_psal)
  rm(core_data_psal_yr)
  
}

# ------------------------------------------------------------------------------
# Process metadata file
# ------------------------------------------------------------------------------
if (opt_consolidate_metadata == 1){
  consolidated_created = 0
  
  for (target_year in opt_min_year:opt_max_year) {

    # read the yearly file based on target_year
    core_metadata_yr <-
    read_rds(file = paste0(path_argo_core_preprocessed, "/", target_year, "_core_metadata.rds"))

    # combine into a consolidated all-years file
    if (consolidated_created == 0) {
      core_metadata <- core_metadata_yr
      consolidated_created = 1
    } else {
      core_metadata <- rbind(core_metadata, core_metadata_yr)
    }
  }
  
  # write consolidated files  
  core_metadata %>% 
    write_rds(file = paste0(path_argo_core_preprocessed, "/core_metadata.rds"))

  # remove objects to free memory
  rm(core_metadata)
  rm(core_metadata_yr)
  
}
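The three consolidation loops share one shape: read each yearly file and append it to a running data frame. An equivalent, more compact formulation is sketched below (consolidate_years is a hypothetical helper; purrr and readr are already attached via the tidyverse):

# hypothetical helper, not in the original: read the yearly files for one
# file suffix and bind them into a single consolidated data frame
consolidate_years <- function(suffix) {
  paste0(path_argo_core_preprocessed, "/",
         opt_min_year:opt_max_year, "_", suffix, ".rds") %>%
    purrr::map(read_rds) %>%
    dplyr::bind_rows()
}

# e.g. the temperature consolidation above becomes
# consolidate_years("core_data_temp") %>%
#   write_rds(file = paste0(path_argo_core_preprocessed, "/core_data_temp.rds"))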

A- and AB-flag focused temperature files

if (opt_A_AB_files == 1){

  # read the consolidated temperature and metadata files and merge them
  core_data_temp <-
    read_rds(file = paste0(path_argo_core_preprocessed, "/core_data_temp.rds"))
  core_metadata <-
    read_rds(file = paste0(path_argo_core_preprocessed, "/core_metadata.rds"))

  core_merge <- left_join(core_data_temp, core_metadata)
  rm(core_data_temp)
  rm(core_metadata)

  # select just A profiles into core_temp_flag_A
  # (temp_adjusted_qc is not carried through the yearly files above, so only
  # the profile-level flag is kept here)
  core_temp_flag_A <- core_merge %>% 
    filter(profile_temp_qc == 'A') %>% 
    select(lat, 
           lon, 
           date, 
           depth, 
           temp_adjusted, 
#           platform_number, 
#           cycle_number,
           profile_temp_qc)

  # write core_temp_flag_A
  core_temp_flag_A %>% 
    write_rds(file = paste0(path_argo_core_preprocessed, "/core_temp_flag_A.rds"))
  rm(core_temp_flag_A)

  # select A and B profiles into core_temp_flag_AB
  core_temp_flag_AB <- core_merge %>% 
    filter(profile_temp_qc == 'A' | profile_temp_qc == 'B') %>% 
    select(lat, 
           lon, 
           date, 
           depth, 
           temp_adjusted, 
#           platform_number, 
#           cycle_number,
           profile_temp_qc)

  # write core_temp_flag_AB
  core_temp_flag_AB %>% 
    write_rds(file = paste0(path_argo_core_preprocessed, "/core_temp_flag_AB.rds"))
  rm(list = ls(pattern = 'core_'))
  
}
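Once written, a quick read-back confirms the flag filtering behaved as intended; a small sketch:

# hypothetical spot check: the A file should contain only profile_temp_qc == 'A',
# the AB file only 'A' or 'B'
read_rds(file = paste0(path_argo_core_preprocessed, "/core_temp_flag_A.rds")) %>%
  count(profile_temp_qc)
read_rds(file = paste0(path_argo_core_preprocessed, "/core_temp_flag_AB.rds")) %>%
  count(profile_temp_qc)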

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: openSUSE Leap 15.4

Matrix products: default
BLAS:   /usr/local/R-4.2.2/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R-4.2.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] oce_1.7-10       gsw_1.1-1        sf_1.0-9         lubridate_1.9.0 
 [5] timechange_0.1.1 argodata_0.1.0   forcats_0.5.2    stringr_1.4.1   
 [9] dplyr_1.0.10     purrr_0.3.5      readr_2.1.3      tidyr_1.2.1     
[13] tibble_3.1.8     ggplot2_3.4.0    tidyverse_1.3.2 

loaded via a namespace (and not attached):
 [1] httr_1.4.4          sass_0.4.4          bit64_4.0.5        
 [4] vroom_1.6.0         jsonlite_1.8.3      modelr_0.1.10      
 [7] bslib_0.4.1         assertthat_0.2.1    googlesheets4_1.0.1
[10] cellranger_1.1.0    progress_1.2.2      yaml_2.3.6         
[13] pillar_1.8.1        backports_1.4.1     glue_1.6.2         
[16] digest_0.6.30       promises_1.2.0.1    rvest_1.0.3        
[19] colorspace_2.0-3    htmltools_0.5.3     httpuv_1.6.6       
[22] pkgconfig_2.0.3     broom_1.0.1         haven_2.5.1        
[25] scales_1.2.1        whisker_0.4         later_1.3.0        
[28] tzdb_0.3.0          git2r_0.30.1        proxy_0.4-27       
[31] googledrive_2.0.0   generics_0.1.3      ellipsis_0.3.2     
[34] cachem_1.0.6        withr_2.5.0         cli_3.4.1          
[37] magrittr_2.0.3      crayon_1.5.2        readxl_1.4.1       
[40] evaluate_0.18       fs_1.5.2            fansi_1.0.3        
[43] xml2_1.3.3          class_7.3-20        prettyunits_1.1.1  
[46] tools_4.2.2         hms_1.1.2           gargle_1.2.1       
[49] lifecycle_1.0.3     munsell_0.5.0       reprex_2.0.2       
[52] compiler_4.2.2      jquerylib_0.1.4     e1071_1.7-12       
[55] RNetCDF_2.6-1       rlang_1.1.1         classInt_0.4-8     
[58] units_0.8-0         grid_4.2.2          rstudioapi_0.14    
[61] rmarkdown_2.18      gtable_0.3.1        DBI_1.1.3          
[64] R6_2.5.1            knitr_1.41          bit_4.0.5          
[67] fastmap_1.1.0       utf8_1.2.2          workflowr_1.7.0    
[70] rprojroot_2.0.3     KernSmooth_2.23-20  stringi_1.7.8      
[73] parallel_4.2.2      Rcpp_1.0.10         vctrs_0.5.1        
[76] dbplyr_2.2.1        tidyselect_1.2.0    xfun_0.35