Last updated: 2021-10-19

Knit directory: bgc_argo_r_argodata/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20211008) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f460b9a. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Unstaged changes:
    Modified:   code/Workflowr_project_managment.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/loading_data.Rmd) and HTML (docs/loading_data.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd f460b9a jens-daniel-mueller 2021-10-19 code review jens
html c5e1577 pasqualina-vonlanthendinenna 2021-10-18 Build site.
Rmd 8cc0106 pasqualina-vonlanthendinenna 2021-10-18 added description in loading_data
html 305873e pasqualina-vonlanthendinenna 2021-10-18 Build site.
Rmd 6b484e7 pasqualina-vonlanthendinenna 2021-10-18 added description in loading_data
html 83724a0 pasqualina-vonlanthendinenna 2021-10-14 Build site.
html 4840e49 pasqualina-vonlanthendinenna 2021-10-12 Build site.
Rmd 2fb35f7 pasqualina-vonlanthendinenna 2021-10-12 added reading data in page
html ff925ab pasqualina-vonlanthendinenna 2021-10-11 Build site.
Rmd 7e0cf34 pasqualina-vonlanthendinenna 2021-10-11 added reading data in page

Task

Use the argodata package to load BGC-Argo data from the server and store it in a data frame together with the corresponding metadata.

Set cache directory

The cache directory stores previously downloaded files so that they can be accessed more quickly. By default, cached files are used indefinitely because refreshing them takes considerable time. If you use a persistent cache, update the index files regularly with argo_update_global(); data files are also updated occasionally and can be refreshed with argo_update_data().

# set cache directory
argo_set_cache_dir(cache_dir = path_argo)

# check cache directory
argo_cache_dir()
[1] "/nfs/kryo/work/updata/bgc_argo_r_argodata"
# check argo mirror
argo_mirror()
[1] "https://data-argo.ifremer.fr"
argo_update_global(max_global_cache_age = Inf)  # age argument: age of the cached files to update in hours (Inf means always use the cached file, and -Inf means always download from the server) 
# ex: max_global_cache_age = 5 updates files that have been in the cache for more than 5 hours, max_global_cache_age = 0.5 updates files that have been in the cache for more than 30 minutes, etc.
argo_update_data(max_data_cache_age = Inf)
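
The age arguments above can be read as a simple threshold on the age of each cached file. The sketch below illustrates that rule on made-up file ages. `needs_refresh()` is a hypothetical helper written for this illustration, not an argodata function, and it only mirrors the behaviour described in the comments above.

```r
# Hypothetical helper (not part of argodata): TRUE where a cached file is
# older than the allowed maximum age and would therefore be downloaded again.
needs_refresh <- function(file_age_hours, max_cache_age_hours) {
  file_age_hours > max_cache_age_hours
}

# Made-up ages (in hours) of three cached files
file_age_hours <- c(0.25, 3, 48)

needs_refresh(file_age_hours, Inf)   # never refresh: always use the cache
needs_refresh(file_age_hours, 5)     # refresh files cached for more than 5 hours
needs_refresh(file_age_hours, 0.5)   # refresh files cached for more than 30 minutes
needs_refresh(file_age_hours, -Inf)  # always refresh: download everything again
```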

Load index files

Load the synthetic (core and BGC merged) index file, which is read from the ifremer server by default. Keep only delayed-mode data (quality checked by PIs) with profile dates between 2013-01-01 and 2015-12-31.

bgc_subset <- argo_global_synthetic_prof() %>%
  argo_filter_data_mode(data_mode = 'delayed') %>%
  argo_filter_date(date_min = '2013-01-01',
                   date_max = '2015-12-31')
Loading argo_global_synthetic_prof()
# check the dates 
# max(bgc_subset$date, na.rm = TRUE)
# min(bgc_subset$date, na.rm = TRUE)

# checking alternative functions
argo_global_meta(download = NULL, quiet = FALSE)
Loading argo_global_meta()
# A tibble: 16,781 × 4
   file                     profiler_type institution date_update        
   <chr>                            <dbl> <chr>       <dttm>             
 1 aoml/13857/13857_meta.nc           845 AO          2018-10-11 20:00:14
 2 aoml/13858/13858_meta.nc           845 AO          2018-10-11 20:00:15
 3 aoml/13859/13859_meta.nc           845 AO          2018-10-11 20:00:25
 4 aoml/15819/15819_meta.nc           845 AO          2018-10-11 20:00:16
 5 aoml/15820/15820_meta.nc           845 AO          2018-10-11 20:00:18
 6 aoml/15851/15851_meta.nc           845 AO          2018-10-11 20:00:26
 7 aoml/15852/15852_meta.nc           845 AO          2018-10-11 20:00:28
 8 aoml/15853/15853_meta.nc           845 AO          2018-10-11 20:00:29
 9 aoml/15854/15854_meta.nc           845 AO          2018-10-11 20:00:30
10 aoml/15855/15855_meta.nc           845 AO          2018-10-11 20:00:34
# … with 16,771 more rows
argo_global_prof(download = NULL, quiet = FALSE)
Loading argo_global_prof()
# A tibble: 2,523,178 × 8
   file   date                latitude longitude ocean profiler_type institution
   <chr>  <dttm>                 <dbl>     <dbl> <chr>         <dbl> <chr>      
 1 aoml/… 1997-07-29 20:03:00    0.267     -16.0 A               845 AO         
 2 aoml/… 1997-08-09 19:21:12    0.072     -17.7 A               845 AO         
 3 aoml/… 1997-08-20 18:45:45    0.543     -19.6 A               845 AO         
 4 aoml/… 1997-08-31 19:39:05    1.26      -20.5 A               845 AO         
 5 aoml/… 1997-09-11 18:58:08    0.72      -20.8 A               845 AO         
 6 aoml/… 1997-09-22 19:57:02    1.76      -21.6 A               845 AO         
 7 aoml/… 1997-10-03 19:15:49    2.60      -21.6 A               845 AO         
 8 aoml/… 1997-10-14 18:39:35    1.76      -21.6 A               845 AO         
 9 aoml/… 1997-10-25 19:32:34    1.80      -21.8 A               845 AO         
10 aoml/… 1997-11-05 18:51:42    1.64      -21.4 A               845 AO         
# … with 2,523,168 more rows, and 1 more variable: date_update <dttm>
argo_global_tech(download = NULL, quiet = FALSE)
Loading argo_global_tech()
# A tibble: 16,245 × 3
   file                     institution date_update        
   <chr>                    <chr>       <dttm>             
 1 aoml/13857/13857_tech.nc AO          2021-04-28 20:03:35
 2 aoml/13858/13858_tech.nc AO          2021-04-28 20:03:37
 3 aoml/13859/13859_tech.nc AO          2018-10-11 20:00:25
 4 aoml/15819/15819_tech.nc AO          2021-04-28 20:03:42
 5 aoml/15820/15820_tech.nc AO          2021-04-28 20:03:46
 6 aoml/15851/15851_tech.nc AO          2018-10-11 20:00:26
 7 aoml/15852/15852_tech.nc AO          2021-04-28 20:03:54
 8 aoml/15853/15853_tech.nc AO          2021-04-28 20:03:58
 9 aoml/15854/15854_tech.nc AO          2021-04-28 20:03:59
10 aoml/15855/15855_tech.nc AO          2021-04-28 20:04:05
# … with 16,235 more rows
argo_global_traj(download = NULL, quiet = FALSE)
Loading argo_global_traj()
# A tibble: 17,842 × 8
   file      latitude_max latitude_min longitude_max longitude_min profiler_type
   <chr>            <dbl>        <dbl>         <dbl>         <dbl>         <dbl>
 1 aoml/138…         6.93        0.008        -15.0         -33.8            845
 2 aoml/138…         5.21       -0.363         -9.50        -17.8            845
 3 aoml/138…         5.93       -0.939        -18.6         -33.7            845
 4 aoml/158…        -2.66       -9.19         -16.4         -40.0            845
 5 aoml/158…        -1.98       -7.14          -9.90        -35.8            845
 6 aoml/158…        -2.73       -6.22           3.33        -21.1            845
 7 aoml/158…        -5.04       -8.44          -1.49        -18.9            845
 8 aoml/158…        -4.70       -8.25          -4.40        -18.0            845
 9 aoml/158…        -4.81       -7.20          -6.41        -12.7            845
10 aoml/158…        -1.79       -6.00           7            -5.45           845
# … with 17,832 more rows, and 2 more variables: institution <chr>,
#   date_update <dttm>
argo_global_bio_traj(download = NULL, quiet = FALSE)
Loading argo_global_bio_traj()
# A tibble: 657 × 9
   file      latitude_max latitude_min longitude_max longitude_min profiler_type
   <chr>            <dbl>        <dbl>         <dbl>         <dbl>         <dbl>
 1 bodc/390…           NA           NA            NA            NA           836
 2 bodc/390…           NA           NA            NA            NA           836
 3 bodc/390…           NA           NA            NA            NA           836
 4 bodc/390…           NA           NA            NA            NA           836
 5 bodc/390…           NA           NA            NA            NA           836
 6 bodc/690…           NA           NA            NA            NA           836
 7 bodc/690…           NA           NA            NA            NA           836
 8 bodc/690…           NA           NA            NA            NA           836
 9 bodc/690…           NA           NA            NA            NA           836
10 bodc/690…           NA           NA            NA            NA           836
# … with 647 more rows, and 3 more variables: institution <chr>,
#   parameters <chr>, date_update <dttm>
argo_global_bio_prof(download = NULL, quiet = FALSE)
Loading argo_global_bio_prof()
# A tibble: 247,164 × 10
   file   date                latitude longitude ocean profiler_type institution
   <chr>  <dttm>                 <dbl>     <dbl> <chr>         <dbl> <chr>      
 1 aoml/… 2006-10-22 02:16:24    -40.3      73.4 I               846 AO         
 2 aoml/… 2006-11-01 06:44:23    -40.4      73.5 I               846 AO         
 3 aoml/… 2006-11-11 10:12:22    -40.5      73.3 I               846 AO         
 4 aoml/… 2006-11-21 07:50:21    -40.1      73.1 I               846 AO         
 5 aoml/… 2006-12-01 18:33:00    -39.6      73.2 I               846 AO         
 6 aoml/… 2006-12-11 22:27:04    -39.4      73.5 I               846 AO         
 7 aoml/… 2006-12-22 05:20:18    -39.8      74.0 I               846 AO         
 8 aoml/… 2007-01-01 06:34:17    -40.0      74.2 I               846 AO         
 9 aoml/… 2007-01-11 12:14:56    -39.4      75.3 I               846 AO         
10 aoml/… 2007-01-21 04:22:15    -38.5      76.0 I               846 AO         
# … with 247,154 more rows, and 3 more variables: parameters <chr>,
#   parameter_data_mode <chr>, date_update <dttm>
argo_global_synthetic_prof(download = NULL, quiet = FALSE)
# A tibble: 246,717 × 10
   file   date                latitude longitude ocean profiler_type institution
   <chr>  <dttm>                 <dbl>     <dbl> <chr>         <dbl> <chr>      
 1 aoml/… 2006-10-22 02:16:24    -40.3      73.4 I               846 AO         
 2 aoml/… 2006-11-01 06:44:23    -40.4      73.5 I               846 AO         
 3 aoml/… 2006-11-11 10:12:22    -40.5      73.3 I               846 AO         
 4 aoml/… 2006-11-21 07:50:21    -40.1      73.1 I               846 AO         
 5 aoml/… 2006-12-01 18:33:00    -39.6      73.2 I               846 AO         
 6 aoml/… 2006-12-11 22:27:04    -39.4      73.5 I               846 AO         
 7 aoml/… 2006-12-22 05:20:18    -39.8      74.0 I               846 AO         
 8 aoml/… 2007-01-01 06:34:17    -40.0      74.2 I               846 AO         
 9 aoml/… 2007-01-11 12:14:56    -39.4      75.3 I               846 AO         
10 aoml/… 2007-01-21 04:22:15    -38.5      76.0 I               846 AO         
# … with 246,707 more rows, and 3 more variables: parameters <chr>,
#   parameter_data_mode <chr>, date_update <dttm>
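
The delayed-mode and date filters applied to the synthetic index at the start of this section can be pictured on a toy index in base R. All rows below are invented, and the explicit `data_mode` column is an assumption made so that the filter logic is visible; how argodata determines the data mode internally is not shown here.

```r
# Toy index with invented rows (not real Argo profiles)
toy_index <- data.frame(
  file = c("toy/p1.nc", "toy/p2.nc", "toy/p3.nc", "toy/p4.nc"),
  date = as.POSIXct(
    c("2012-06-01", "2014-03-15", "2014-07-01", "2016-01-10"),
    tz = "UTC"
  ),
  data_mode = c("delayed", "realtime", "delayed", "delayed"),  # assumed column
  stringsAsFactors = FALSE
)

# Same date window as in the index subset above
date_min <- as.POSIXct("2013-01-01", tz = "UTC")
date_max <- as.POSIXct("2015-12-31", tz = "UTC")

# Keep delayed-mode profiles that fall inside the date window
keep <- toy_index$data_mode == "delayed" &
  toy_index$date >= date_min &
  toy_index$date <= date_max

toy_subset <- toy_index[keep, ]
toy_subset$file
```

Only the third toy profile passes both filters: the first and fourth fall outside the date window, and the second is not delayed-mode.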

Read data

Read in the adjusted BGC and core variables for the profiles listed in the index subset above, together with their quality control flags and error estimates. This step can take a while.

bgc_data <- argo_prof_levels(
  path = bgc_subset,
  vars =
    c(
      'PRES_ADJUSTED',
      'PRES_ADJUSTED_QC',
      'PRES_ADJUSTED_ERROR',
      'PSAL_ADJUSTED',
      'PSAL_ADJUSTED_QC',
      'PSAL_ADJUSTED_ERROR',
      'TEMP_ADJUSTED',
      'TEMP_ADJUSTED_QC',
      'TEMP_ADJUSTED_ERROR',
      'DOXY_ADJUSTED',
      'DOXY_ADJUSTED_QC',
      'DOXY_ADJUSTED_ERROR',
      'NITRATE_ADJUSTED',
      'NITRATE_ADJUSTED_QC',
      'NITRATE_ADJUSTED_ERROR',
      'PH_IN_SITU_TOTAL_ADJUSTED',
      'PH_IN_SITU_TOTAL_ADJUSTED_QC',
      'PH_IN_SITU_TOTAL_ADJUSTED_ERROR'
    ),
  quiet = TRUE
) 
# read in the profiles (takes a while)

The data are read from the cached files stored in the directory set with argo_set_cache_dir() (in this case, /nfs/kryo/work/updata/bgc_argo_r_argodata). To download the data directly from the ifremer server instead, set max_global_cache_age and max_data_cache_age to -Inf, which forces a new download.

Read metadata

Read in the corresponding metadata:

bgc_metadata <- argo_prof_prof(path = bgc_subset)
Extracting from 54117 files

Join data

Join the metadata and the data into one dataset.

# full join keeps all rows from both tables; the join keys are detected automatically
bgc_merge <-
  full_join(bgc_data, bgc_metadata)
Joining, by = c("file", "n_prof")
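
To make the effect of the full join concrete, here is a minimal base R sketch on invented tables. It uses `merge()` with `all = TRUE` as a stand-in for `full_join()`, and the join keys are written out explicitly; the table contents and the names `toy_data`, `toy_metadata` and `toy_joined` are made up for illustration.

```r
# Toy level data: one row per profile and sampling level (values invented)
toy_data <- data.frame(
  file = c("toy/p1.nc", "toy/p1.nc", "toy/p2.nc"),
  n_prof = c(1L, 1L, 1L),
  pres_adjusted = c(5, 10, 5),
  stringsAsFactors = FALSE
)

# Toy profile metadata: one row per profile (values invented)
toy_metadata <- data.frame(
  file = c("toy/p1.nc", "toy/p2.nc", "toy/p3.nc"),
  n_prof = c(1L, 1L, 1L),
  platform_number = c("1000001", "1000002", "1000003"),
  stringsAsFactors = FALSE
)

# all = TRUE keeps unmatched rows from both sides, like a full join
toy_joined <- merge(toy_data, toy_metadata,
                    by = c("file", "n_prof"), all = TRUE)
toy_joined
```

The metadata are repeated for every level of a matching profile, and the profile that has metadata but no level data is kept with `NA` in the data column.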

Write data

path_argo_preprocessed <- paste0(path_argo,"/preprocessed_bgc_data")

bgc_subset %>%
  write_rds(file = paste0(path_argo_preprocessed, "/bgc_subset.rds"))

bgc_metadata %>%
  write_rds(file = paste0(path_argo_preprocessed, "/bgc_metadata.rds"))

bgc_data %>%
  write_rds(file = paste0(path_argo_preprocessed, "/bgc_data.rds"))

bgc_merge %>%
  write_rds(file = paste0(path_argo_preprocessed, "/bgc_merge.rds"))
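
The write step above assumes that the target directory already exists. The sketch below shows a round trip with the base R functions `saveRDS()` and `readRDS()`, which `write_rds()` wraps. It writes to a temporary directory instead of `path_argo`, and the table and file names are invented for illustration.

```r
# Small invented table standing in for one of the objects written above
toy_table <- data.frame(
  file = c("toy/p1.nc", "toy/p2.nc"),
  n_prof = c(1L, 1L),
  stringsAsFactors = FALSE
)

# Build the output path and create the directory if it is missing
out_dir <- file.path(tempdir(), "preprocessed_bgc_data")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
out_file <- file.path(out_dir, "toy_table.rds")

# Write the object, then read it back and compare with the original
saveRDS(toy_table, file = out_file)
toy_table_reloaded <- readRDS(out_file)
all.equal(toy_table, toy_table_reloaded)
```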

Data set description

Columns

The resulting dataframe contains:

  • the file name (‘file’ column)

  • the sampling level (‘n_level’ column)

  • the index of the profile within the file (‘n_prof’ column)

  • adjusted values for pressure (PRES), salinity (PSAL), temperature (TEMP), dissolved oxygen (DOXY), pH (PH_IN_SITU_TOTAL), and nitrate (NITRATE) (‘parameter_adjusted’ columns)

  • a quality control flag associated with these adjusted values (‘parameter_adjusted_qc’ columns)

  • an error estimate on the adjustment of the measurement (‘parameter_adjusted_error’ columns)

  • WMO float identifier (‘platform_number’ column)

  • name of the project in charge of the float (‘project_name’ column)

  • name of principal investigator in charge of the float (‘pi_name’ column)

  • float cycle number (‘cycle_number’ column; cycle number 0 is the launch cycle and may not be complete, cycle number 1 is the first complete cycle)

  • descending (D) or ascending (A) profile (‘direction’ column)

  • code for the data centre in charge of the float data management (‘data_centre’ column)

  • the type of float (‘platform_type’ column)

  • firmware version of the float (‘firmware_version’ column)

  • instrument type from the WMO code table 1770 (‘wmo_inst_type’ column)

  • the date and time at which the measurement was taken (‘date’ column)

  • a quality control flag for the date and time value (‘date_qc’ column)

  • the date and time of the profile location (‘date_location’ column)

  • latitude in degrees N (‘latitude’ column)

  • longitude in degrees E (‘longitude’ column)

  • quality control flag on the position (‘position_qc’ column)

  • name of the system in charge of positioning the float locations (‘positioning_system’ column)

  • unique number of the mission to which this float belongs (‘config_mission_number’ column)

  • and a quality control flag on the profile of each parameter (‘profile_parameter_qc’ columns)

QC flags

QC flags for values (‘parameter_adjusted_qc’ columns) are between 1 and 9, where:
1 is ‘good’ data,
2 is ‘probably good’ data,
3 is ‘probably bad’ data,
4 is ‘bad’ data,
5 is ‘value changed’,
8 is ‘estimated value’,
9 is ‘missing value’,
(6 and 7 are not used).
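
As an illustration of how these flags might be used, the sketch below masks values whose flag is not one of 1, 2, 5 or 8. The values are invented, and it is an assumption of this sketch that the QC flags are stored as character strings in a lowercase ‘doxy_adjusted_qc’ column; check the column type in your own data before applying it.

```r
# Invented adjusted oxygen values with their per-level QC flags
toy_levels <- data.frame(
  doxy_adjusted    = c(210.5, 208.1, 150.0, 199.7, NA),
  doxy_adjusted_qc = c("1", "2", "4", "8", "9"),  # assumed to be character
  stringsAsFactors = FALSE
)

# Flags kept in this sketch: good, probably good, value changed, estimated
good_flags <- c("1", "2", "5", "8")

# Keep a value only where its flag is in the accepted set, else set it to NA
is_good <- toy_levels$doxy_adjusted_qc %in% good_flags
toy_levels$doxy_good <- ifelse(is_good, toy_levels$doxy_adjusted, NA_real_)
toy_levels
```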

Profile QC flags (‘profile_parameter_qc’ columns) are QC codes attributed to the entire profile. They indicate the percentage of depth levels at which the value is considered good data (QC flags 1, 2, 5, and 8):
‘A’ means 100% of profile levels contain good data,
‘B’ means 75-<100% of profile levels contain good data,
‘C’ means 50-<75% of profile levels contain good data,
‘D’ means 25-<50% of profile levels contain good data,
‘E’ means >0-<25% of profile levels contain good data,
‘F’ means 0% of profile levels contain good data.
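
The profile flag can be derived from the per-level flags, as in the following sketch. `profile_qc_letter()` is a hypothetical helper, not an argodata function. It assumes half-open class boundaries (for example, ‘E’ covers more than 0% and less than 25%), treats flags 1, 2, 5 and 8 as good, and counts every supplied level in the denominator, which is a simplification.

```r
# Hypothetical helper (not part of argodata): derive a profile QC letter
# from a character vector of per-level QC flags.
profile_qc_letter <- function(level_flags) {
  stopifnot(length(level_flags) > 0)
  good_flags <- c("1", "2", "5", "8")
  # Percentage of levels whose flag counts as good data
  pct_good <- 100 * mean(level_flags %in% good_flags)
  if (pct_good == 100) {
    "A"
  } else if (pct_good >= 75) {
    "B"
  } else if (pct_good >= 50) {
    "C"
  } else if (pct_good >= 25) {
    "D"
  } else if (pct_good > 0) {
    "E"
  } else {
    "F"
  }
}

# Invented examples
profile_qc_letter(c("1", "1", "2", "8"))  # all four levels are good
profile_qc_letter(c("1", "1", "4", "4"))  # two of four levels are good
profile_qc_letter(c("4", "3"))            # no level is good
```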


sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: openSUSE Leap 15.2

Matrix products: default
BLAS:   /usr/local/R-4.0.3/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R-4.0.3/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.7.9     argodata_0.0.0.9000 forcats_0.5.0      
 [4] stringr_1.4.0       dplyr_1.0.5         purrr_0.3.4        
 [7] readr_1.4.0         tidyr_1.1.3         tibble_3.1.3       
[10] ggplot2_3.3.5       tidyverse_1.3.0     workflowr_1.6.2    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5        prettyunits_1.1.1 assertthat_0.2.1  rprojroot_2.0.2  
 [5] digest_0.6.27     utf8_1.1.4        R6_2.5.0          cellranger_1.1.0 
 [9] backports_1.1.10  reprex_0.3.0      evaluate_0.14     httr_1.4.2       
[13] pillar_1.6.2      progress_1.2.2    rlang_0.4.11      readxl_1.3.1     
[17] rstudioapi_0.13   whisker_0.4       jquerylib_0.1.4   blob_1.2.1       
[21] rmarkdown_2.10    bit_4.0.4         munsell_0.5.0     broom_0.7.9      
[25] compiler_4.0.3    httpuv_1.5.4      modelr_0.1.8      xfun_0.25        
[29] pkgconfig_2.0.3   htmltools_0.5.1.1 tidyselect_1.1.0  fansi_0.4.1      
[33] tzdb_0.1.2        crayon_1.3.4      dbplyr_1.4.4      withr_2.3.0      
[37] later_1.2.0       grid_4.0.3        jsonlite_1.7.1    gtable_0.3.0     
[41] lifecycle_1.0.0   DBI_1.1.0         git2r_0.27.1      magrittr_1.5     
[45] scales_1.1.1      vroom_1.5.5       cli_3.0.1         stringi_1.5.3    
[49] fs_1.5.0          promises_1.1.1    xml2_1.3.2        bslib_0.2.5.1    
[53] ellipsis_0.3.2    generics_0.1.0    vctrs_0.3.8       tools_4.0.3      
[57] bit64_4.0.5       glue_1.4.2        RNetCDF_2.4-2     hms_0.5.3        
[61] parallel_4.0.3    yaml_2.2.1        colorspace_2.0-2  rvest_0.3.6      
[65] knitr_1.33        haven_2.3.1       sass_0.4.0