Last updated: 2021-12-16

Checks: 7 0

Knit directory: emlr_obs_preprocessing/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200707)

The command set.seed(20200707) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: aca9273

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version aca9273. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/
    Ignored:    output/

Untracked files:
    Untracked:  code/IO_1990_own_crossover_analysis_backup.R
    Untracked:  code/read_GLODAPv2_2020.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/read_GLODAPv2_2021.Rmd) and HTML (docs/read_GLODAPv2_2021.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	aca9273	jens-daniel-mueller	2021-12-16	added maps per expocode
html	6aa4b75	jens-daniel-mueller	2021-12-16	Build site.
Rmd	3511fa7	jens-daniel-mueller	2021-12-16	f == 9 analysis added
html	163f976	jens-daniel-mueller	2021-12-16	Build site.
Rmd	7fa3a99	jens-daniel-mueller	2021-12-16	added cumulative data contribution as threshold
html	be0850d	jens-daniel-mueller	2021-12-16	Build site.
Rmd	8db3760	jens-daniel-mueller	2021-12-16	plot maps of f and qc data loss
html	61d5f49	jens-daniel-mueller	2021-12-15	Build site.
Rmd	be2f94e	jens-daniel-mueller	2021-12-15	analyse IO 1990 CRM data from Millero 1998 - TA only
html	d454df1	jens-daniel-mueller	2021-12-15	Build site.
Rmd	7802f47	jens-daniel-mueller	2021-12-15	analyse IO 1990 CRM data from Millero 1998
html	ce6cdae	jens-daniel-mueller	2021-12-15	Build site.
Rmd	acff553	jens-daniel-mueller	2021-12-15	plot qc data loss by cruise size
html	7ace7ab	jens-daniel-mueller	2021-12-15	Build site.
Rmd	554383a	jens-daniel-mueller	2021-12-15	plot qc data loss by cruise size
html	faa6b3c	jens-daniel-mueller	2021-12-15	Build site.
Rmd	be8751d	jens-daniel-mueller	2021-12-15	started data loss assesment
html	70923f2	jens-daniel-mueller	2021-12-14	Build site.
Rmd	1acf7ff	jens-daniel-mueller	2021-12-14	checked P18 nitrate data - quadratic fit
html	b68b58e	jens-daniel-mueller	2021-12-13	Build site.
Rmd	4c002c1	jens-daniel-mueller	2021-12-13	checked P18 nitrate data
html	de20732	jens-daniel-mueller	2021-12-08	Build site.
Rmd	badaed2	jens-daniel-mueller	2021-12-08	plotted f maps
html	daa43b9	jens-daniel-mueller	2021-12-06	Build site.
Rmd	b578bd9	jens-daniel-mueller	2021-12-06	plotted qc maps
html	2b22ffe	jens-daniel-mueller	2021-11-24	Build site.
Rmd	1b7ec1f	jens-daniel-mueller	2021-11-24	revised combined IO NS and EW analysis
html	0ef46e8	jens-daniel-mueller	2021-11-23	Build site.
Rmd	7fb15cf	jens-daniel-mueller	2021-11-23	combined IO NS and EW analysis
html	f2871b9	jens-daniel-mueller	2021-11-20	Build site.
Rmd	46c1246	jens-daniel-mueller	2021-11-19	rerun with GLODAP cast column
html	375d7c7	jens-daniel-mueller	2021-11-18	Build site.
Rmd	1839007	jens-daniel-mueller	2021-11-18	delta EW crossover values determined
html	f30883c	jens-daniel-mueller	2021-11-18	Build site.
Rmd	7acd48c	jens-daniel-mueller	2021-11-18	delta crossover values determined
html	2e6c3f1	jens-daniel-mueller	2021-11-18	Build site.
Rmd	49ca05c	jens-daniel-mueller	2021-11-18	delta crossover values determined
html	16dab59	jens-daniel-mueller	2021-11-18	Build site.
Rmd	620b6f4	jens-daniel-mueller	2021-11-18	delta crossover values determined
html	42965b9	jens-daniel-mueller	2021-11-18	Build site.
Rmd	69dbb5f	jens-daniel-mueller	2021-11-18	crossing checks
html	c9363ce	jens-daniel-mueller	2021-11-18	Build site.
Rmd	6bc79d6	jens-daniel-mueller	2021-11-18	crossing checks
html	0908ee5	jens-daniel-mueller	2021-11-15	Build site.
html	6d6a23e	jens-daniel-mueller	2021-11-01	Build site.
Rmd	2f36786	jens-daniel-mueller	2021-11-01	preprocess adjustment table, create new basinmaps
html	2a50fa9	jens-daniel-mueller	2021-10-28	Build site.
Rmd	67de9ab	jens-daniel-mueller	2021-10-28	preprocess tracers
html	a96bf9e	jens-daniel-mueller	2021-10-27	Build site.
Rmd	d99b131	jens-daniel-mueller	2021-10-27	added time series plots
html	fde6c32	jens-daniel-mueller	2021-10-27	Build site.
Rmd	db93d9f	jens-daniel-mueller	2021-10-27	added time series plots
html	7db7e6a	jens-daniel-mueller	2021-10-27	Build site.
Rmd	d6fb0dc	jens-daniel-mueller	2021-10-27	added time series plots
html	68d67e7	jens-daniel-mueller	2021-10-27	Build site.
Rmd	b4ea199	jens-daniel-mueller	2021-10-27	added time series plots
html	7987bb7	jens-daniel-mueller	2021-10-21	Build site.
Rmd	b64c54d	jens-daniel-mueller	2021-10-21	added inventory layer depth
html	8d1aaf8	jens-daniel-mueller	2021-10-20	Build site.
Rmd	5bce752	jens-daniel-mueller	2021-10-20	corrected qc flag in glodap
html	dc8d958	jens-daniel-mueller	2021-10-20	Build site.
Rmd	b2ccc04	jens-daniel-mueller	2021-10-20	corrected qc flag in glodap
html	2438c5a	jens-daniel-mueller	2021-08-30	Build site.
Rmd	4296433	jens-daniel-mueller	2021-08-30	rerun GLODAP preprocessing with officially released file
html	e49875a	jens-daniel-mueller	2021-07-07	Build site.
html	6312bd4	jens-daniel-mueller	2021-07-07	Build site.
Rmd	4905409	jens-daniel-mueller	2021-07-07	rerun with new setup_obs.Rmd file
html	58bc706	jens-daniel-mueller	2021-07-06	Build site.
Rmd	0db89e1	jens-daniel-mueller	2021-07-06	rerun with revised variable names
html	f600971	jens-daniel-mueller	2021-07-02	Build site.
html	98599d8	jens-daniel-mueller	2021-06-27	Build site.
Rmd	4f9c370	jens-daniel-mueller	2021-06-27	update to latest GLODAP pre-release
html	265c4ef	jens-daniel-mueller	2021-06-04	Build site.
html	c79346a	jens-daniel-mueller	2021-06-03	Build site.
html	9d8353f	jens-daniel-mueller	2021-05-31	Build site.
Rmd	b948168	jens-daniel-mueller	2021-05-31	ingest GLODAPv2_2021 beta data

path_glodapv2_2021  <- "/nfs/kryo/work/updata/glodapv2_2021/"
path_glodapv2_CRM  <- "/nfs/kryo/work/updata/glodapv2_CRM/"
path_preprocessing  <- paste(path_root, "/observations/preprocessing/", sep = "")

1 Read files

Main data source for this project is GLODAPv2.2021_Merged_Master_File.csv downloaded from https://www.ncei.noaa.gov/data/oceans/ncei/ocads/data/0237935/GLODAPv2.2021_Merged_Master_File.csv on Aug 30, 2021.

GLODAP <-
  read_csv(
    paste(
      path_glodapv2_2021,
      "GLODAPv2.2021_Merged_Master_File_20210830.csv",
      sep = ""
    ),
    na = "-9999",
    col_types = cols(.default = col_double())
  )


GLODAP <- GLODAP %>%
  rename_with(~str_remove(., 'G2'))

GLODAP_adjustments <-
  read_csv(
    paste(
      path_glodapv2_2021,
      "GLODAPv2.2021_adjustments_last_updated_on_2021_05_10.csv",
      sep = ""
    ),
    na = c("-666", "-777", "-888", "-999"),
    skip = 2
  )

GLODAP_expocodes <-
  read_tsv(
    paste(
      path_glodapv2_2021,
      "EXPOCODES.txt",
      sep = ""
    ),
    col_names = c("cruise", "cruise_expocode")
  )

IO_CRM_meas <-
  read_csv(
    paste(
      path_glodapv2_CRM,
      "/Millero_1998_Tab2.csv",
      sep = ""
    )
  )

CRM_ref <-
  read_csv(
    paste(
      path_glodapv2_CRM,
      "/Dickson_CRM_reference_values_20211215.csv",
      sep = ""
    )
  )

2 Data preparation

2.1 Correct qc flag

From an email conversation with Nico Lange

Yes, we are aware of these faulty(!) calculated TA data (using DIC and fCO2). It is linked to v2.2020 where we’ve added fCO2 to the “missing carbon calculation matrix”. Overall, including fCO2 in these calculations has worked great to fill some missing carbon gaps. However, for this cruise in particular the fCO2 values have most likely been converted wrongly to 20°C and are thus off! The problem of this all is that we haven’t really done a 2nd QC on the fCO2 values neither have we defined the corresponding “G2fCO2qc” variable, hence for the sake of consistency we kept all fCO2 values in. Again and unfortunately, in this particular case it led to the bad calculations of TA data…. We plan to do a full 2nd QC on all (!) fCO2 data for v3.

But you have indeed found a flaw in our merging script, as the corresponding calculated TA values should not have received a 2nd QC flag of 1! I missed out on adding a line to our merging script to accommodate for the non-existence of 2nd fCO2 flags in the carbon calculation matrix.

So long story short: Thank you very much for finding this flaw and letting me know of it!

and

Yes, the all calculated TA data from cruise 695 should have a talkqc of 0 (as they are based upon un QC’d fCO2 data…).

And no (thanks to your hint and questions), I figured that this wrongly assigned 2nd QC flag is a problem for all calculated carbon data, which used fCO2 for the calculations. However, luckily this is not really often the case.

You can check if thats the case by looking at which other carbon parameters are measured, i.e. by checking their primary flags (e.g. G2talkf, G2tco2f and G2phts25p0f and G2fco2f). If only two are measured and one of them is fCO2, it means that the other carbon parameters (the ones with a primary flag of 0) are calculated using fCO2. Hence, for these instances no 2nd QC is done and the corresponding qc flag should be 0 and not 1.

GLODAP_qc_check <- GLODAP %>% 
  filter(cruise == 717) %>% 
  count(talkqc)

# calculate number of measured co2 system variables

GLODAP <- GLODAP %>%
  mutate(measured_CO2_vars = rowSums(select(., c(
    tco2f, talkf, fco2f, phts25p0f
  )) == 2))

# identify cruises on which talk/tco2 was calculated

talk_qc_error_cruises <- GLODAP %>%
  select(cruise, tco2:phtsqc, measured_CO2_vars) %>% 
  filter(measured_CO2_vars == 2,
         fco2f == 2,
         talkf == 0) %>% 
  distinct(cruise, talkf, talkqc, fco2f)

tco2_qc_error_cruises <- GLODAP %>%
  select(cruise, tco2:phtsqc, measured_CO2_vars) %>% 
  filter(measured_CO2_vars == 2,
         fco2f == 2,
         tco2f == 0) %>% 
  distinct(cruise, tco2f, tco2qc, fco2f)

talk_qc_error_cruises %>% 
  write_csv("data/talk_qc_error_cruises_GLODAPv2_2021.csv")

tco2_qc_error_cruises %>% 
  write_csv("data/tco2_qc_error_cruises_GLODAPv2_2021.csv")

rm(talk_qc_error_cruises, tco2_qc_error_cruises)


# set qc = 0 for tco2 and talk values calculated from fco2   

GLODAP <- GLODAP %>%
  mutate(tco2qc = if_else(measured_CO2_vars == 2 &
                            fco2f == 2 & tco2f == 0,
                          0,
                          tco2qc))

GLODAP <- GLODAP %>%
  mutate(talkqc = if_else(measured_CO2_vars == 2 &
                            fco2f == 2 & talkf == 0,
                          0,
                          talkqc))

GLODAP <- GLODAP %>% 
  select(-measured_CO2_vars)

# calculate number of measured co2 system variables

GLODAP <- GLODAP %>%
  mutate(measured_CO2_vars = rowSums(select(., c(
    tco2f, talkf, fco2f, phts25p0f
  )) == 2))

# identify cruises on which talk/tco2 was calculated

tco2_talk_calc <- GLODAP %>%
  select(cruise, tco2:phtsqc, measured_CO2_vars) %>% 
  filter(measured_CO2_vars == 2,
         fco2f == 2,
         phts25p0f == 2)

GLODAP <- GLODAP %>% 
  select(-measured_CO2_vars)

2.2 Harmonize nomenclature

# create date column
GLODAP <- GLODAP %>%
  mutate(date = ymd(paste(year, month, day))) %>%
  relocate(date)

# harmonize column names
GLODAP <- GLODAP  %>%
  rename(sal = salinity,
         temp = temperature)

# harmonize coordinates
GLODAP <- GLODAP  %>%
  rename(lon = longitude,
         lat = latitude) %>%
  mutate(lon = if_else(lon < 20, lon + 360, lon))

2.3 Horizontal gridding

For merging with other data sets, all observations were grouped into latitude intervals of:

1° x 1°

GLODAP <- m_grid_horizontal(GLODAP)

2.4 Apply basin mask

# use only three basin to assign general basin mask
# ie this is not specific to the MLR fitting
basinmask <- basinmask %>% 
  filter(MLR_basins == "2") %>% 
  select(lat, lon, basin_AIP)

GLODAP <- inner_join(GLODAP, basinmask)

2.5 Add row number

GLODAP <- GLODAP  %>%  
  mutate(row_number = row_number()) %>% 
  relocate(row_number)

2.6 Split CO2 and tracers

# remove irrelevant columns
GLODAP <- GLODAP %>%
  select(-c(region,
            month:minute,
            maxsampdepth, bottle, sigma0:sigma4,
            nitrite:nitritef))


GLODAP_tracer <- GLODAP %>% 
  select(row_number:gamma,
         cfc11:sf6f,
         basin_AIP)

# select relevant columns
GLODAP <- GLODAP %>%
  select(row_number:talkqc,
         basin_AIP)

2.7 Subset tco2 data

The vast majority of rows is removed due to missing tco2 observations.

GLODAP <- GLODAP %>% 
  filter(!is.na(tco2))

2.8 Subset tracer data

Rows are removed if no tracer observation is available.

GLODAP_tracer <- GLODAP_tracer %>%
  filter(if_any(
    c(
      cfc11,
      cfc12,
      cfc113,
      ccl4,
      sf6,
      pcfc11,
      pcfc12,
      pcfc113,
      pccl4,
      psf6
    ),
    ~ !is.na(.)
  ))

2.9 Create clean observations grid

2.9.1 CO2

GLODAP_obs_grid <- GLODAP %>% 
  count(lat, lon)

GLODAP_grid_year <- GLODAP %>%
  count(lat, lon, year)

map +
  geom_raster(data = GLODAP_grid_year,
              aes(lon, lat)) +
  facet_wrap(~ year, ncol=3)

Version	Author	Date
dc8d958	jens-daniel-mueller	2021-10-20

2.9.2 Tracer

GLODAP_obs_grid_tracer <- GLODAP_tracer %>% 
  count(lat, lon)

GLODAP_grid_year_tracer <- GLODAP_tracer %>%
  count(lat, lon, year)

map +
  geom_raster(data = GLODAP_grid_year_tracer,
              aes(lon, lat)) +
  facet_wrap(~ year, ncol=3)

Version	Author	Date
2a50fa9	jens-daniel-mueller	2021-10-28

3 Flagging

3.1 qc

qc_flag <- full_join(
  GLODAP,
  GLODAP_expocodes
)

qc_flag <- qc_flag %>%
  mutate(decade = cut(
    year,
    seq(1990, 2020, 10),
    right = FALSE,
    labels = c("1990-1999", "2000-2009", "2010-2019")
  ),
  .after = year) %>%
  filter(!is.na(decade)) %>% 
  select(lon, lat, basin_AIP, decade, cruise_expocode, ends_with("qc")) %>%
  pivot_longer(ends_with("qc"),
               names_to = "parameter",
               values_to = "value")

qc_flag_grid <- qc_flag %>%
  count(lon, lat, decade, parameter, value)

p_qc_flag_map <- qc_flag_grid %>% 
  group_split(value) %>%
  # head(1) %>% 
  map(
    ~map +
  geom_tile(data = .x,
            aes(lon, lat, fill=n)) +
  facet_grid(parameter ~ decade) +
  labs(title = paste("qc flag =", unique(.x$value))) +
    scale_fill_viridis_c(option = "magma",
                         direction = -1,
                         trans = "log10")
  )

p_qc_flag_map

[[1]]

Version	Author	Date
daa43b9	jens-daniel-mueller	2021-12-06


[[2]]

Version	Author	Date
daa43b9	jens-daniel-mueller	2021-12-06

pdf("output/qc_flag_coverage_maps.pdf")
p_qc_flag_map

[[1]]


[[2]]

dev.off()

png 
  2

qc_flag %>% 
  filter(basin_AIP == "Pacific",
         decade == "1990-1999") %>% 
  count(cruise_expocode, parameter, value) %>% 
  arrange(value, -n) %>% 
  write_csv("output/Pacific_1990_qc_by_cruise_and_parameter.csv")

rm(qc_flag, qc_flag_grid, p_qc_flag_map)

3.2 f

f_flag <- full_join(
  GLODAP,
  GLODAP_expocodes
)

f_flag <- f_flag %>%
  mutate(decade = cut(
    year,
    seq(1990, 2020, 10),
    right = FALSE,
    labels = c("1990-1999", "2000-2009", "2010-2019")
  ),
  .after = year) %>%
  filter(!is.na(decade)) %>% 
  select(lon, lat, basin_AIP, decade, cruise_expocode, ends_with("f")) %>%
  pivot_longer(ends_with("f"),
               names_to = "parameter",
               values_to = "value")

f_flag_grid <- f_flag %>%
  count(lon, lat, decade, parameter, value)

p_f_flag_map <- f_flag_grid %>% 
  group_split(value) %>%
  # head(1) %>% 
  map(
    ~map +
  geom_tile(data = .x,
            aes(lon, lat, fill=n)) +
  facet_grid(parameter ~ decade) +
  labs(title = paste("f flag =", unique(.x$value))) +
    scale_fill_viridis_c(option = "magma",
                         direction = -1,
                         trans = "log10")
  )

p_f_flag_map

[[1]]

Version	Author	Date
de20732	jens-daniel-mueller	2021-12-08


[[2]]

Version	Author	Date
de20732	jens-daniel-mueller	2021-12-08


[[3]]

Version	Author	Date
de20732	jens-daniel-mueller	2021-12-08

pdf("output/f_flag_coverage_maps.pdf")
p_f_flag_map

[[1]]


[[2]]


[[3]]

dev.off()

png 
  2

f_flag %>% 
  filter(basin_AIP == "Pacific",
         decade == "1990-1999") %>% 
  count(cruise_expocode, parameter, value) %>% 
  arrange(value, -n) %>% 
  write_csv("output/Pacific_1990_f_by_cruise_and_parameter.csv")

rm(f_flag, f_flag_grid, p_f_flag_map)

3.3 data loss

loss_all <- full_join(
  GLODAP,
  GLODAP_expocodes
)

loss_all <- loss_all %>%
  mutate(decade = cut(
    year,
    seq(1989, 2019, 10),
    right = FALSE,
    labels = c("1989-1999", "2000-2009", "2010-2019")
  ),
  .after = year) %>%
  filter(!is.na(decade))

map +
  geom_tile(data = loss_all %>% distinct(lon, lat, decade),
            aes(lon, lat)) +
  facet_grid(decade ~ .)

loss <- loss_all %>% 
  filter(if_all(ends_with("f"), ~ . != 9))

loss_all_n <- loss_all %>% 
  count(basin_AIP, decade)

loss_n <- loss %>% 
  count(basin_AIP, decade)

3.3.1 qc

loss_qc <- loss %>% 
  select(lon, lat, basin_AIP, decade, cruise_expocode, ends_with("qc")) %>%
  pivot_longer(ends_with("qc"),
               names_to = "parameter",
               values_to = "value") %>% 
  mutate(parameter = str_remove(parameter, "qc"))

loss_qc_cruise <- loss_qc %>%
  count(cruise_expocode, basin_AIP, decade, parameter, value) %>%
  pivot_wider(
    names_from = value,
    names_prefix = "qc_",
    values_from = n,
    values_fill = 0
  ) %>%
  mutate(n_cruise = qc_0 + qc_1,
         category = if_else(qc_0 <= 0.1 * (n_cruise), "OK", "loss")) %>%
  mutate(parameter_class = if_else(
    parameter %in% c("tco2", "talk", "phosphate"),
    "target",
    "predictor"
  )) %>%
  count(cruise_expocode,
        basin_AIP,
        decade,
        n_cruise,
        parameter_class,
        category) %>% 
  pivot_wider(names_from = category,
              values_from = n,
              values_fill = 0) %>% 
  select(-OK) %>% 
  pivot_wider(names_from = parameter_class,
              values_from = loss) %>% 
  group_by(basin_AIP, decade) %>%
  mutate(rank_n_cruise = rank(-n_cruise)) %>%
  ungroup()

loss_qc_cruise <- full_join(loss_qc_cruise, loss_n)

loss_qc_cruise <- loss_qc_cruise %>% 
  mutate(n_cruise_rel = 100 * n_cruise / n) %>% 
  arrange(basin_AIP, decade, -n_cruise_rel) %>% 
  group_by(basin_AIP, decade) %>% 
  mutate(n_cruise_rel_cum = cumsum(n_cruise_rel)) %>% 
  ungroup() %>% 
  select(-n)

loss_qc_cruise <- loss_qc_cruise %>% 
  pivot_longer(predictor:target,
               names_to = "parameter_class",
               values_to = "loss") %>% 
  mutate(loss = as.factor(loss))

grey_plasma <- c("grey80", viridisLite::plasma(4))

loss_qc_cruise <- loss_qc_cruise %>%
  filter(n_cruise_rel >= 3)

loss_qc_cruise %>%
  group_split(basin_AIP) %>%
  # head(3) %>%
  map(
    ~ ggplot(data = .x,
             aes(rank_n_cruise, n_cruise_rel, fill = loss)) +
      geom_point(shape = 21, size = 2) +
      scale_fill_manual(values = grey_plasma,
                        name = "variables missing") +
      facet_grid(decade ~ parameter_class) +
      labs(title = paste("basin_AIP:", unique(.x$basin_AIP))) +
      ylim(0, NA)
  )

[[1]]

Version	Author	Date
6aa4b75	jens-daniel-mueller	2021-12-16
163f976	jens-daniel-mueller	2021-12-16
be0850d	jens-daniel-mueller	2021-12-16
ce6cdae	jens-daniel-mueller	2021-12-15


[[2]]

Version	Author	Date
6aa4b75	jens-daniel-mueller	2021-12-16
163f976	jens-daniel-mueller	2021-12-16
be0850d	jens-daniel-mueller	2021-12-16
ce6cdae	jens-daniel-mueller	2021-12-15


[[3]]

Version	Author	Date
6aa4b75	jens-daniel-mueller	2021-12-16
163f976	jens-daniel-mueller	2021-12-16
be0850d	jens-daniel-mueller	2021-12-16
ce6cdae	jens-daniel-mueller	2021-12-15

loss_qc_cruise %>% 
  filter(loss != 0) %>% 
  select(basin_AIP, decade, parameter_class, rank_n_cruise, cruise_expocode) %>% 
  arrange(basin_AIP, decade, parameter_class, rank_n_cruise) %>% 
  kable() %>% 
  kable_styling() %>% 
  scroll_box(height = "300px")

basin_AIP	decade	parameter_class	rank_n_cruise	cruise_expocode
Atlantic	1989-1999	target	11	06MT19900123
Atlantic	1989-1999	target	12	33LK19960415
Atlantic	1989-1999	target	13	33MW19930704
Atlantic	2000-2009	predictor	7	35TH20010823
Atlantic	2000-2009	predictor	13	33RO20070710
Atlantic	2000-2009	target	7	35TH20010823
Atlantic	2000-2009	target	8	74DI20040404
Atlantic	2000-2009	target	9	35TH20080610
Atlantic	2000-2009	target	11	35TH20040604
Atlantic	2000-2009	target	12	35TH20020611
Atlantic	2010-2019	predictor	5	74EQ20151206
Indian	1989-1999	target	11	320619960503
Pacific	1989-1999	predictor	2	31DS19940126
Pacific	1989-1999	predictor	4	31DS19920907
Pacific	1989-1999	target	4	31DS19920907
Pacific	1989-1999	target	6	316N19930222
Pacific	1989-1999	target	7	316N19921006
Pacific	1989-1999	target	8	90KD19920214
Pacific	1989-1999	target	11	316N19921204

loss_grid <- loss %>% distinct(lon, lat, cruise_expocode)

loss_qc_grid <- left_join(loss_qc_cruise,
                          loss_grid)

map +
  geom_tile(data = loss_qc_grid,
            aes(lon, lat, fill = loss)) +
  facet_grid(decade ~ parameter_class) +
  scale_fill_manual(values = grey_plasma)

Version	Author	Date
6aa4b75	jens-daniel-mueller	2021-12-16
163f976	jens-daniel-mueller	2021-12-16
be0850d	jens-daniel-mueller	2021-12-16

loss_qc_grid %>% filter(loss != 0) %>%
  group_split(parameter_class, decade) %>%
  # head(1) %>%
  map(
    ~ map +
      geom_tile(data = .x,
                aes(lon, lat, fill = cruise_expocode)) +
      scale_fill_brewer(palette = "Paired") +
      facet_grid(decade ~ parameter_class)
  )

[[1]]


[[2]]


[[3]]


[[4]]


[[5]]

3.3.2 f

loss_f <- loss %>% 
  select(lon, lat, basin_AIP, decade, cruise_expocode, ends_with("f")) %>%
  pivot_longer(ends_with("f"),
               names_to = "parameter",
               values_to = "value") %>% 
  mutate(parameter = str_remove(parameter, "f"))

loss_f_cruise <- loss_f %>%
  count(cruise_expocode, basin_AIP, decade, parameter, value) %>%
  pivot_wider(
    names_from = value,
    names_prefix = "f_",
    values_from = n,
    values_fill = 0
  ) %>%
  mutate(n_cruise = f_0 + f_2,
         category = if_else(f_0 <= 0.1 * (n_cruise), "OK", "loss")) %>%
  mutate(parameter_class = if_else(
    parameter %in% c("tco2", "talk", "phosphate"),
    "target",
    "predictor"
  )) %>%
  count(cruise_expocode,
        basin_AIP,
        decade,
        n_cruise,
        parameter_class,
        category) %>% 
  pivot_wider(names_from = category,
              values_from = n,
              values_fill = 0) %>% 
  select(-OK) %>% 
  pivot_wider(names_from = parameter_class,
              values_from = loss) %>% 
  group_by(basin_AIP, decade) %>%
  mutate(rank_n_cruise = rank(-n_cruise)) %>%
  ungroup()

loss_f_cruise <- full_join(loss_f_cruise, loss_n)

loss_f_cruise <- loss_f_cruise %>% 
  mutate(n_cruise_rel = 100 * n_cruise / n) %>% 
  arrange(basin_AIP, decade, -n_cruise_rel) %>% 
  group_by(basin_AIP, decade) %>% 
  mutate(n_cruise_rel_cum = cumsum(n_cruise_rel)) %>% 
  ungroup() %>% 
  select(-n)

loss_f_cruise <- loss_f_cruise %>% 
  pivot_longer(predictor:target,
               names_to = "parameter_class",
               values_to = "loss") %>% 
  mutate(loss = as.factor(loss))

grey_plasma <- c("grey80", viridisLite::plasma(4))

loss_f_cruise <- loss_f_cruise %>%
    filter(n_cruise_rel >= 3)

loss_f_cruise %>%
  # filter(n_cruise_rel_cum <= 90) %>%
  group_split(basin_AIP) %>%
  # head(1) %>%
  map(
    ~ ggplot(data = .x,
             aes(rank_n_cruise, n_cruise, fill = loss)) +
      geom_point(shape = 21, size = 2) +
      scale_fill_manual(values = grey_plasma,
                        name = "variables missing") +
      facet_grid(decade ~ parameter_class) +
      labs(title = paste("basin_AIP:", unique(.x$basin_AIP))) +
      ylim(0, NA)
  )

[[1]]

Version	Author	Date
6aa4b75	jens-daniel-mueller	2021-12-16
163f976	jens-daniel-mueller	2021-12-16
ce6cdae	jens-daniel-mueller	2021-12-15


[[2]]

Version	Author	Date
6aa4b75	jens-daniel-mueller	2021-12-16
163f976	jens-daniel-mueller	2021-12-16
ce6cdae	jens-daniel-mueller	2021-12-15


[[3]]

Version	Author	Date
6aa4b75	jens-daniel-mueller	2021-12-16
163f976	jens-daniel-mueller	2021-12-16
ce6cdae	jens-daniel-mueller	2021-12-15

loss_f_cruise %>% 
  filter(loss != 0) %>% 
  select(basin_AIP, decade, parameter_class, rank_n_cruise, cruise_expocode) %>% 
  arrange(basin_AIP, decade, parameter_class, rank_n_cruise) %>% 
  kable() %>% 
  kable_styling() %>% 
  scroll_box(height = "300px")

basin_AIP	decade	parameter_class	rank_n_cruise	cruise_expocode
Atlantic	1989-1999	target	1	323019940104
Atlantic	1989-1999	target	7	33RO19980123
Atlantic	1989-1999	target	9	35A319950113
Atlantic	1989-1999	target	11	06MT19900123
Atlantic	1989-1999	target	12	33LK19960415
Atlantic	1989-1999	target	13	33MW19930704
Atlantic	2000-2009	target	7	35TH20010823
Atlantic	2000-2009	target	8	74DI20040404
Atlantic	2000-2009	target	9	35TH20080610
Atlantic	2000-2009	target	11	35TH20040604
Atlantic	2000-2009	target	12	35TH20020611
Atlantic	2010-2019	target	10	33RO20110926
Indian	1989-1999	target	11	320619960503
Indian	2000-2009	target	2	33RR20080204
Pacific	1989-1999	target	3	31DS19960105
Pacific	1989-1999	target	6	316N19930222
Pacific	1989-1999	target	7	316N19921006
Pacific	1989-1999	target	8	90KD19920214
Pacific	1989-1999	target	11	316N19921204
Pacific	2000-2009	target	1	33RO20071215
Pacific	2010-2019	target	1	318M20091121
Pacific	2010-2019	target	5	320620170703

loss_grid <- loss %>% distinct(lon, lat, cruise_expocode)

loss_f_grid <- left_join(loss_f_cruise,
                          loss_grid)
map +
  geom_tile(data = loss_f_grid,
            aes(lon, lat, fill = loss)) +
  facet_grid(decade ~ parameter_class) +
  scale_fill_manual(values = grey_plasma)

Version	Author	Date
6aa4b75	jens-daniel-mueller	2021-12-16
163f976	jens-daniel-mueller	2021-12-16
be0850d	jens-daniel-mueller	2021-12-16

loss_f_grid %>% filter(loss != 0) %>%
  group_split(parameter_class, decade) %>%
  # head(1) %>%
  map(
    ~ map +
      geom_tile(data = .x,
                aes(lon, lat, fill = cruise_expocode)) +
      scale_fill_brewer(palette = "Paired") +
      facet_grid(decade ~ parameter_class)
  )

[[1]]


[[2]]


[[3]]

3.3.3 f == 9

loss_f9 <- loss_all %>% 
  select(lon, lat, basin_AIP, decade, cruise_expocode, ends_with("f")) %>%
  pivot_longer(ends_with("f"),
               names_to = "parameter",
               values_to = "value") %>% 
  mutate(parameter = str_remove(parameter, "f"))

loss_f9_cruise <- loss_f9 %>%
  count(cruise_expocode, basin_AIP, decade, parameter, value) %>%
  pivot_wider(
    names_from = value,
    names_prefix = "f_",
    values_from = n,
    values_fill = 0
  ) %>%
  mutate(n_cruise = f_0 + f_2 + f_9,
         category = if_else(f_9 <= 0.1 * (n_cruise), "OK", "loss")) %>%
  mutate(parameter_class = if_else(
    parameter %in% c("tco2", "talk", "phosphate"),
    "target",
    "predictor"
  )) %>%
  count(cruise_expocode,
        basin_AIP,
        decade,
        n_cruise,
        parameter_class,
        category) %>% 
  pivot_wider(names_from = category,
              values_from = n,
              values_fill = 0) %>% 
  select(-OK) %>% 
  pivot_wider(names_from = parameter_class,
              values_from = loss) %>% 
  group_by(basin_AIP, decade) %>%
  mutate(rank_n_cruise = rank(-n_cruise)) %>%
  ungroup()

loss_f9_cruise <- full_join(loss_f9_cruise, loss_all_n)

loss_f9_cruise <- loss_f9_cruise %>% 
  mutate(n_cruise_rel = 100 * n_cruise / n) %>% 
  arrange(basin_AIP, decade, -n_cruise_rel) %>% 
  group_by(basin_AIP, decade) %>% 
  mutate(n_cruise_rel_cum = cumsum(n_cruise_rel)) %>% 
  ungroup() %>% 
  select(-n)

loss_f9_cruise <- loss_f9_cruise %>% 
  pivot_longer(predictor:target,
               names_to = "parameter_class",
               values_to = "loss") %>% 
  mutate(loss = as.factor(loss))

grey_plasma <- c("grey80", viridisLite::plasma(4))

loss_f9_cruise <- loss_f9_cruise %>%
    filter(n_cruise_rel >= 3)

loss_f9_cruise %>%
  group_split(basin_AIP) %>%
  # head(1) %>%
  map(
    ~ ggplot(data = .x,
             aes(rank_n_cruise, n_cruise, fill = loss)) +
      geom_point(shape = 21, size = 2) +
      scale_fill_manual(values = grey_plasma,
                        name = "variables missing") +
      facet_grid(decade ~ parameter_class) +
      labs(title = paste("basin_AIP:", unique(.x$basin_AIP))) +
      ylim(0, NA)
  )

[[1]]

Version	Author	Date
6aa4b75	jens-daniel-mueller	2021-12-16


[[2]]

Version	Author	Date
6aa4b75	jens-daniel-mueller	2021-12-16


[[3]]

Version	Author	Date
6aa4b75	jens-daniel-mueller	2021-12-16

loss_f9_cruise %>% 
  filter(loss != 0) %>% 
  select(basin_AIP, decade, parameter_class, rank_n_cruise, cruise_expocode) %>% 
  arrange(basin_AIP, decade, parameter_class, rank_n_cruise) %>% 
  kable() %>% 
  kable_styling() %>% 
  scroll_box(height = "300px")

basin_AIP	decade	parameter_class	rank_n_cruise	cruise_expocode
Atlantic	1989-1999	predictor	2	316N19871123
Atlantic	1989-1999	predictor	4	06AQ19980328
Atlantic	1989-1999	predictor	6	74DI19970807
Atlantic	1989-1999	target	2	316N19871123
Atlantic	1989-1999	target	3	33RO19980123
Atlantic	1989-1999	target	4	06AQ19980328
Atlantic	1989-1999	target	6	74DI19970807
Atlantic	1989-1999	target	7	33MW19930704
Atlantic	2000-2009	target	1	33RO20050111
Atlantic	2000-2009	target	2	33RO20030604
Atlantic	2000-2009	target	3	06AQ20050122
Atlantic	2000-2009	target	4	06AQ20080210
Atlantic	2000-2009	target	5	35TH19990712
Atlantic	2010-2019	predictor	10	06M220170104
Atlantic	2010-2019	predictor	11	06AQ20120107
Atlantic	2010-2019	target	3	33RO20110926
Atlantic	2010-2019	target	6	29HE20130320
Atlantic	2010-2019	target	10	06M220170104
Indian	1989-1999	predictor	1	316N19951202
Indian	1989-1999	predictor	3	316N19950310
Indian	1989-1999	predictor	7	35MF19960220
Indian	1989-1999	target	1	316N19951202
Indian	1989-1999	target	5	316N19941201
Indian	1989-1999	target	8	320619960503
Indian	1989-1999	target	10	316N19950611
Indian	1989-1999	target	12	35MF19930123
Indian	2000-2009	predictor	9	09AR20071216
Indian	2000-2009	target	6	09AR20060102
Indian	2010-2019	predictor	7	09AR20141205
Indian	2010-2019	target	7	09AR20141205
Pacific	1989-1999	predictor	6	33MW19920224
Pacific	1989-1999	target	1	316N19920502
Pacific	1989-1999	target	6	33MW19920224
Pacific	1989-1999	target	8	316N19921006
Pacific	2000-2009	predictor	6	325020060213

loss_all_grid <- loss_all %>% distinct(lon, lat, cruise_expocode)

loss_f9_grid <- left_join(loss_f9_cruise,
                          loss_all_grid)
map +
  geom_tile(data = loss_f9_grid,
            aes(lon, lat, fill = loss)) +
  facet_grid(decade ~ parameter_class) +
  scale_fill_manual(values = grey_plasma)

Version	Author	Date
6aa4b75	jens-daniel-mueller	2021-12-16

loss_f9_grid %>% filter(loss != 0) %>%
  group_split(parameter_class, decade) %>%
  # head(1) %>%
  map(
    ~ map +
      geom_tile(data = .x,
                aes(lon, lat, fill = cruise_expocode)) +
      scale_fill_brewer(palette = "Paired") +
      facet_grid(decade ~ parameter_class)
  )

[[1]]


[[2]]


[[3]]


[[4]]


[[5]]


[[6]]

3.4 P18 phosphate

P18 <- full_join(
  GLODAP,
  GLODAP_expocodes
)

P18 <- P18 %>% 
  filter(cruise_expocode %in% c("33RO20161119",
                                "33RO20071215",
                                "31DS19940126"))

P18 %>% 
  ggplot(aes(date, lat)) +
  geom_point() +
  facet_grid() +
  facet_wrap(cruise_expocode ~., scales = "free_x", ncol = 1)

Version	Author	Date
70923f2	jens-daniel-mueller	2021-12-14
b68b58e	jens-daniel-mueller	2021-12-13

P18 %>% 
  filter(!is.na(nitrate)) %>% 
  ggplot(aes(lat, depth, col= nitrate)) +
  geom_point() +
  scale_color_viridis_c() +
  scale_y_reverse() +
  facet_grid(cruise_expocode ~.)

Version	Author	Date
70923f2	jens-daniel-mueller	2021-12-14
b68b58e	jens-daniel-mueller	2021-12-13

P18_grid <- P18 %>% 
  select(lat, lon, depth, cruise_expocode, nitrate) %>% 
  mutate(depth = as.numeric(as.character(cut(depth,
                     seq(0,1e4, 500), 
                     seq(250,1e4,500))))) %>% 
  group_by(lat, depth, cruise_expocode) %>% 
  summarise(nitrate = mean(nitrate, na.rm=TRUE)) %>% 
  ungroup()

P18_grid %>% 
  ggplot(aes(lat, depth, col= nitrate)) +
  geom_point() +
  scale_color_viridis_c() +
  scale_y_reverse() +
  facet_grid(cruise_expocode ~.)

Version	Author	Date
70923f2	jens-daniel-mueller	2021-12-14
b68b58e	jens-daniel-mueller	2021-12-13

P18_grid_offset <- P18_grid %>% 
  pivot_wider(names_from = cruise_expocode,
              values_from = nitrate) %>% 
  mutate(delta_nitrate_1994_2007 = (`31DS19940126` - `33RO20071215`) / `33RO20071215`,
         delta_nitrate_1994_2016 = (`31DS19940126` - `33RO20161119`) / `33RO20071215`,
         delta_nitrate_2007_2016 = (`33RO20071215` - `33RO20161119`) / `33RO20071215`) %>% 
  select(lat, depth, starts_with("delta")) %>% 
  pivot_longer(starts_with("delta"),
               values_to = "delta_nitrate",
               names_to = "years",
               names_prefix = "delta_nitrate_") %>% 
  filter(delta_nitrate > -20,
         depth > 1500)

P18_grid_offset %>% 
  ggplot(aes(lat, depth, col= delta_nitrate)) +
  geom_point() +
  scale_color_divergent() +
  scale_y_reverse() +
  facet_grid(years ~.)

Version	Author	Date
70923f2	jens-daniel-mueller	2021-12-14

P18_grid_offset %>%
  group_by(lat, years) %>%
  summarise(delta_nitrate = mean(delta_nitrate, na.rm = TRUE)) %>%
  ungroup() %>%
  ggplot(aes(lat, delta_nitrate, col = years, fill = years)) +
  geom_hline(yintercept = 0) +
  stat_smooth(method = "lm", formula = y ~ x + I(x ^ 2)) +
  geom_point() +
  geom_line()

Version	Author	Date
70923f2	jens-daniel-mueller	2021-12-14

rm(P18, P18_grid)

3.5 A16

A16 <- full_join(
  GLODAP,
  GLODAP_expocodes
)

A16 <- A16 %>% 
  filter(cruise_expocode %in% c(
    "33MW19930704" #A16N-1993
    ))

map + 
  geom_tile(data = A16 %>% distinct(lon, lat),
            aes(lon, lat))

Version	Author	Date
70923f2	jens-daniel-mueller	2021-12-14

A16 %>% 
  select(ends_with(c("qc"))) %>% 
  pivot_longer(everything(),
               names_to = "flag",
               values_to = "value") %>% 
  distinct(flag, value)

[38;5;246m# A tibble: 9 × 2[39m
  flag        value
  [3m[38;5;246m<chr>[39m[23m       [3m[38;5;246m<dbl>[39m[23m
[38;5;250m1[39m salinityqc      1
[38;5;250m2[39m oxygenqc        1
[38;5;250m3[39m nitrateqc       1
[38;5;250m4[39m silicateqc      1
[38;5;250m5[39m phosphateqc     1
[38;5;250m6[39m tco2qc          1
[38;5;250m7[39m talkqc          1
[38;5;250m8[39m talkqc          0
[38;5;250m9[39m tco2qc          0

rm(A16)

4 Adjustments

Typically, the reasons for multiple expocode entries of the same cruise in the adjustment table list are:

The cruise adjustments are different for different station, i.e. station split (e.g. 316N19821201)

-> How to merge? Based on first and last station? Cruise_ID not in GLODAP merged master file.

The cruise adjustments are different for different legs (e.g. 316N19871123.6) but have been merged into one cruise (316N19871123) for the product

-> How to merge? Based on first and last station?

The cruise adjustments have been updated/changed through the versions, here always look for the most recent entry (see table below) (e.g. 320620180309)

For the expocodes not listed in the expocode list the reason is that INDIGO has been splitted into three cruises: 35MF1985-1987 and the same holds for SAVE (316N1987 - 6legs). Further 49HH20011208 has been assigned wrongly and corrected to 49HH20011127.

Remove expocode INDIGO and maintain only 35MF19850224. Remove expocode SAVE and maintain only 316N1987.

GLODAP_adjustments <- GLODAP_adjustments %>%
  select(cruise_expocode,
         first_station, last_station,
         version,
         calculated_carbon_parameter,
         ends_with("_adj")) %>% 
  rename(talk_adj = alkalinity_adj)

# Remove cruises INDIGO and SAVE

GLODAP_adjustments <- 
  GLODAP_adjustments %>% 
  filter(!(cruise_expocode %in% c("INDIGO", "SAVE")))

# correct expocode 49HH20011208 to 49HH20011127

GLODAP_adjustments <- 
  GLODAP_adjustments %>% 
  mutate(cruise_expocode = if_else(
    cruise_expocode == "49HH20011208",
    "49HH20011127",
    cruise_expocode
  ))


# select latest adjustment versions

GLODAP_adjustments <- 
  GLODAP_adjustments %>% 
  group_by(cruise_expocode, first_station) %>% 
  mutate(n = n(),
         version_max = max(version)) %>%
  ungroup() %>% 
  filter(version == version_max | is.na(version)) %>% 
  select(-c(version_max, version, n))

# harmonize multiple cruise expocodes of 316N1987

GLODAP_adjustments <- GLODAP_adjustments %>% 
  # filter(str_detect(cruise_expocode, "\\.")) %>% 
  mutate(cruise_expocode = str_split(cruise_expocode,
                                     "\\.",
                                     simplify = TRUE)[,1])

# correct one wrong last_cruise label

GLODAP_adjustments <- GLODAP_adjustments %>%
  mutate(
    last_station = if_else(
      cruise_expocode == "318M20091121" &
        first_station == 1,
      127,
      last_station
    )
  )

# merge with expocode table
GLODAP_adjustments <- full_join(GLODAP_adjustments, GLODAP_expocodes) %>% 
  relocate(cruise)



GLODAP_adjustments_NA_cruises <- 
  GLODAP_adjustments %>% 
  filter(is.na(cruise))

GLODAP_adjustments_duplicated_cruises <- 
  GLODAP_adjustments %>% 
  group_by(cruise_expocode, cruise) %>% 
  mutate(n = n()) %>%
  ungroup() %>% 
  filter(n != 1)


GLODAP_adjustments %>% 
  pivot_longer(salinity_adj:c13_adj,
               names_to = "parameter",
               values_to = "adjustment") %>% 
  ggplot(aes(adjustment)) +
  geom_histogram() +
  scale_y_log10() +
  facet_wrap(~ parameter, scales = "free_x")

Version	Author	Date
6d6a23e	jens-daniel-mueller	2021-11-01

5 RV activity

RV_activity <- full_join(
  GLODAP,
  GLODAP_expocodes
)

RV_activity <- RV_activity %>%
  mutate(decade = cut(
    year,
    seq(1990, 2020, 10),
    right = FALSE,
    labels = c("1990-1999", "2000-2009", "2010-2019")
  ), .after = year) %>% 
  filter(!is.na(decade))

RV_activity <- RV_activity %>% 
  mutate(RV = str_sub(cruise_expocode, 1, 4))

RV_activity <- RV_activity %>% 
  count(decade, basin_AIP, RV) %>% 
  group_by(decade, basin_AIP) %>% 
  mutate(n_total = sum(n)) %>% 
  ungroup() %>% 
  mutate(n_prop = 100* n / n_total)

RV_activity <-RV_activity %>% 
  group_by(decade, basin_AIP) %>% 
  mutate(rank = rank(-n_prop)) %>% 
  ungroup()

RV_activity %>% 
  ggplot(aes(rank, n_prop)) +
  geom_line() +
  geom_point() +
  geom_text(data = RV_activity %>% filter(n_prop > 20),
             aes(rank, n_prop, label = RV),
             nudge_x = 5) +
  labs(y = "proportion of tco2 samples (%)") +
  facet_grid(decade ~ basin_AIP)

Version	Author	Date
daa43b9	jens-daniel-mueller	2021-12-06

rm(RV_activity)

6 Indian Ocean 1990 CRM

IO_CRM_meas <- IO_CRM_meas %>%
  fill(cruise:batch) %>% 
  select(-starts_with("ph")) %>% 
  rename(talk_meas = talk_ave,
         tco2_meas = tco2_ave)
  
CRM_ref <- CRM_ref %>% 
  select(-c(date, comment, sal)) %>% 
  rename(talk_ref = talk,
         tco2_ref = tco2)

IO_CRM_offset <-
  left_join(IO_CRM_meas,
            CRM_ref) %>% 
  mutate(batch = as.factor(batch))

IO_CRM_offset <- IO_CRM_offset %>% 
  mutate(talk_offset = talk_meas - talk_ref,
         tco2_offset = tco2_meas - tco2_ref)

IO_CRM_offset <- IO_CRM_offset %>% 
  select(-c(talk_meas:talk_ref)) %>% 
  pivot_longer(ends_with("_offset"),
               values_to = "offset",
               names_to = "parameter") %>% 
  mutate(parameter = str_remove(parameter, "_offset"),
         start_date = mdy(start_date))

IO_CRM_offset %>% 
  ggplot(aes(offset)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~ parameter)

Version	Author	Date
d454df1	jens-daniel-mueller	2021-12-15

IO_CRM_offset <- IO_CRM_offset %>% 
  filter(cell != "All")

IO_CRM_offset_mean <- IO_CRM_offset %>% 
  group_by(parameter) %>% 
  summarise(offset_mean = mean(offset),
            offset_sd = sd(offset)) %>% 
  ungroup()

IO_CRM_offset %>%
  filter(parameter == "talk") %>%
  ggplot() +
  scale_fill_brewer(palette = "Set1",
                    name = "CRM batch") +
  geom_hline(data = IO_CRM_offset_mean %>% filter(parameter == "talk"),
             aes(yintercept = offset_mean)) +
  geom_hline(
    data = IO_CRM_offset_mean %>% filter(parameter == "talk"),
    aes(yintercept = offset_mean - offset_sd),
    linetype = 2
  ) +
  geom_hline(
    data = IO_CRM_offset_mean %>% filter(parameter == "talk"),
    aes(yintercept = offset_mean + offset_sd),
    linetype = 2
  ) +
  geom_point(aes(start_date, offset, fill = batch, size=n),
             shape = 21) +
  scale_size(name = "Nr of\nmeasurements") +
  labs(x = "Cruise start date",
       y = "TA offset meas-CRM (µmol/kg)",
       title = "RV Knorr IO 1990 - TA reference measurements",
       subtitle = "Data source: Tables 1 and 2 from Millero et al. (1998)")

Version	Author	Date
61d5f49	jens-daniel-mueller	2021-12-15
d454df1	jens-daniel-mueller	2021-12-15

7 Write files

GLODAP  %>%
  write_csv(paste(path_preprocessing,
                  "GLODAPv2.2021_preprocessed.csv",
                  sep = ""))

GLODAP_tracer  %>%
  write_csv(paste(
    path_preprocessing,
    "GLODAPv2.2021_preprocessed_tracer.csv",
    sep = ""
  ))

GLODAP_adjustments  %>%
  write_csv(paste(path_preprocessing,
                  "GLODAPv2.2021_adustments.csv",
                  sep = ""))

# GLODAP_adjustments_NA_cruises  %>%
#   select(cruise_expocode, cruise) %>%
#   write_csv(paste(
#     path_preprocessing,
#     "GLODAPv2.2021_adustments_NA_cruises.csv",
#     sep = ""
#   ))
# 
# GLODAP_adjustments_duplicated_cruises  %>%
#   drop_na() %>%
#   write_csv(
#     paste(
#       path_preprocessing,
#       "GLODAPv2.2021_adustments_duplicated_cruises.csv",
#       sep = ""
#     )
#   )

8 Overview plots

8.1 Assign coarse spatial grid

For the following plots, the cleaned data set was re-opened and observations were gridded spatially to intervals of:

5° x 5°

GLODAP <- m_grid_horizontal_coarse(GLODAP)

8.2 Histogram Zonal coverage

GLODAP_histogram_lat <- GLODAP %>%
  group_by(lat_grid) %>%
  tally() %>%
  ungroup()

GLODAP_histogram_lat %>%
  ggplot(aes(lat_grid, n)) +
  geom_col() +
  coord_flip() +
  theme(legend.title = element_blank())

Version	Author	Date
98599d8	jens-daniel-mueller	2021-06-27
9d8353f	jens-daniel-mueller	2021-05-31

rm(GLODAP_histogram_lat)

8.3 Histogram temporal coverage

GLODAP_histogram_year <- GLODAP %>%
  group_by(year) %>%
  tally() %>%
  ungroup()

GLODAP_histogram_year %>%
  ggplot() +
  geom_col(aes(year, n)) +
  theme(
    axis.title.x = element_blank()
  )

Version	Author	Date
98599d8	jens-daniel-mueller	2021-06-27
9d8353f	jens-daniel-mueller	2021-05-31

rm(GLODAP_histogram_year)

8.4 Zonal temporal coverage (Hovmoeller)

GLODAP_hovmoeller_year <- GLODAP %>%
  group_by(year, lat_grid) %>%
  tally() %>%
  ungroup()

GLODAP_hovmoeller_year %>%
  ggplot(aes(year, lat_grid, fill = log10(n))) +
  geom_tile() +
  geom_vline(xintercept = c(1999.5, 2012.5)) +
  scale_fill_viridis_c(option = "magma", direction = -1) +
  theme(legend.position = "top",
        axis.title.x = element_blank())

Version	Author	Date
98599d8	jens-daniel-mueller	2021-06-27
9d8353f	jens-daniel-mueller	2021-05-31

rm(GLODAP_hovmoeller_year)

8.5 Coverage map

map +
  geom_raster(data = GLODAP_obs_grid,
              aes(lon, lat, fill = log10(n))) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1)

Version	Author	Date
98599d8	jens-daniel-mueller	2021-06-27
9d8353f	jens-daniel-mueller	2021-05-31

GLODAP_obs_grid_all_vars <- GLODAP %>% 
  select(year, lat, lon, cruise, sal, temp, oxygen,
         phosphate, nitrate, silicate, tco2, talk) %>% 
  pivot_longer(cols = sal:talk,
               names_to = "parameter",
               values_to = "value") %>% 
  mutate(presence = if_else(is.na(value), "missing", "available")) %>% 
  count(year, lat, lon, parameter, presence)

GLODAP_obs_grid_all_vars_wide <- GLODAP_obs_grid_all_vars %>% 
  pivot_wider(names_from = "presence",
              values_from = n,
              values_fill = 0) %>% 
  mutate(ratio_available = available/(available+missing))

all_plots <- GLODAP_obs_grid_all_vars_wide %>%
  # mutate(cruise = as.factor(cruise)) %>%
  group_split(year) %>%
  # tail(3) %>%
  map(
    ~ map +
      geom_tile(
        data = .x,
        aes(
          x = lon,
          y = lat,
          width = 1,
          height = 1,
          fill = ratio_available
        )
      ) +
      scale_fill_scico(palette = "berlin",
                       limits = c(0,1)) +
      labs(title = unique(.x$year)) +
      facet_wrap(~ parameter)
  )


pdf(file = paste0(path_preprocessing, "GLODAPv2.2021_preprocessed_coverage_maps.pdf"),
    width = 10, 
    height = 5)
all_plots

[[1]]


[[2]]


[[3]]


[[4]]


[[5]]


[[6]]


[[7]]


[[8]]


[[9]]


[[10]]


[[11]]


[[12]]


[[13]]


[[14]]


[[15]]


[[16]]


[[17]]


[[18]]


[[19]]


[[20]]


[[21]]


[[22]]


[[23]]


[[24]]


[[25]]


[[26]]


[[27]]


[[28]]


[[29]]


[[30]]


[[31]]


[[32]]


[[33]]


[[34]]


[[35]]


[[36]]


[[37]]


[[38]]


[[39]]


[[40]]


[[41]]


[[42]]


[[43]]


[[44]]


[[45]]

dev.off()

png 
  2

8.6 Time series

GLODAP_time_series <- GLODAP %>% 
  select(year, basin_AIP, lat, depth, sal, temp,
         oxygen, aou, nitrate, silicate, phosphate,
         tco2, talk)

GLODAP_time_series <- GLODAP_time_series %>% 
  mutate(depth_grid = cut(depth, seq(0,1e4,1000)))

GLODAP_time_series <- GLODAP_time_series %>% 
  pivot_longer(sal:talk,
               names_to = "parameter",
               values_to = "value") %>% 
  filter(!is.na(value),
         !is.na(depth_grid))

GLODAP_time_series %>%
  group_split(basin_AIP, depth_grid) %>%
  # head(1) %>%
  map(
    ~ ggplot(data = .x,
             aes(year, value, col = lat)) +
      geom_jitter(alpha = 0.1) +
      scale_color_divergent() +
      facet_grid(parameter ~ depth_grid,
                 scales = "free_y") +
      labs(title = paste(
        "basin_AIP:",
        unique(.x$basin_AIP),
        "| depth_grid:",
        unique(.x$depth_grid)
      ))
  )

9 CANYON-B

9.1 Comparison to GLODAP

source("/net/kryo/work/uptools/co2_calculation/CANYON-B/CANYONB.R")

GLODAP_Can_B <- GLODAP %>%
  mutate(lon = if_else(lon > 180, lon - 360, lon)) %>%
  arrange(year) %>% 
  select(row_number, year, date, lat, lon, depth, basin_AIP,
         temp, sal, oxygen,
         talk, tco2, nitrate, phosphate, silicate)

# filter rows with essential variables for Canyon-B
GLODAP_Can_B <- GLODAP_Can_B %>%
  filter(across(c(lat, lon, depth,
                  temp, sal, oxygen), ~ !is.na(.x)))

GLODAP_Can_B <- GLODAP_Can_B %>%
  mutate(as_tibble(
    CANYONB(
      date = paste0(as.character(date), " 12:00"),
      lat = lat,
      lon = lon,
      pres = depth,
      temp = temp,
      psal = sal,
      doxy = oxygen,
      param = c("AT", "CT", "NO3", "PO4", "SiOH4")
    )
  ))

GLODAP_Can_B <- GLODAP_Can_B %>%
  select(-ends_with(c("_cim", "_cin", "_cii")))


GLODAP_Can_B <- GLODAP_Can_B %>%
  rename(
    "talk_CANYONB" = "AT",
    "tco2_CANYONB" = "CT",
    "nitrate_CANYONB" = "NO3",
    "phosphate_CANYONB" = "PO4",
    "silicate_CANYONB" = "SiOH4"
  )


variables <- c("talk", "tco2", "nitrate", "phosphate", "silicate")

for (i_variable in variables) {
  # i_variable <- variables[1]
  
  # calculate equal axis limits and binwidth
  axis_lims <- GLODAP_Can_B %>%
    drop_na() %>% 
    summarise(max_value = max(c(max(
      !!sym(i_variable)
    ),
    max(!!sym(
      paste0(i_variable, "_CANYONB")
    )))),
    min_value = min(c(min(
      !!sym(i_variable)
    ),
    min(!!sym(
      paste0(i_variable, "_CANYONB")
    )))))
  
  binwidth_value <- (axis_lims$max_value - axis_lims$min_value) / 60
  axis_lims <- c(axis_lims$min_value, axis_lims$max_value)
  
  print(
    ggplot(GLODAP_Can_B, aes(
      x = !!sym(i_variable),
      y = !!sym(paste0(i_variable, "_CANYONB"))
    )) +
      geom_bin2d(binwidth = binwidth_value) +
      scale_fill_viridis_c(trans = "log10") +
      geom_abline(slope = 1, col = 'red') +
      coord_equal(xlim = axis_lims,
                  ylim = axis_lims) +
      facet_wrap( ~ basin_AIP) +
      labs(title = "All years")
  ) 
  
  
  # for (i_year in unique(GLODAP_Can_B$year)) {
  #   # i_year <- 2017
  #   
  #   print(
  #     ggplot(
  #       GLODAP_Can_B %>% filter(year == i_year),
  #       aes(x = !!sym(i_variable),
  #           y = !!sym(paste0(
  #             i_variable, "_CANYONB"
  #           )))
  #     ) +
  #       geom_bin2d(binwidth = binwidth_value) +
  #       scale_fill_viridis_c(trans = "log10") +
  #       geom_abline(slope = 1, col = 'red') +
  #       coord_equal(xlim = axis_lims,
  #                   ylim = axis_lims) +
  #       facet_wrap( ~ basin_AIP) +
  #       labs(title = paste("Year:", i_year))
  #   )
  # }
  
}

9.2 Write Canyon-B file

GLODAP_Can_B %>% 
  select(row_number,
         talk_CANYONB, tco2_CANYONB,
         nitrate_CANYONB, phosphate_CANYONB, silicate_CANYONB) %>% 
  write_csv(paste(path_preprocessing,
                             "GLODAPv2.2021_Canyon-B.csv",
                             sep = ""))

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: openSUSE Leap 15.2

Matrix products: default
BLAS:   /usr/local/R-4.0.3/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R-4.0.3/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] kableExtra_1.3.1 ggrepel_0.8.2    lubridate_1.7.9  ggforce_0.3.3   
 [5] metR_0.9.0       scico_1.2.0      patchwork_1.1.1  collapse_1.5.0  
 [9] forcats_0.5.0    stringr_1.4.0    dplyr_1.0.5      purrr_0.3.4     
[13] readr_1.4.0      tidyr_1.1.3      tibble_3.1.3     ggplot2_3.3.5   
[17] tidyverse_1.3.0  workflowr_1.6.2 

loaded via a namespace (and not attached):
 [1] nlme_3.1-149             fs_1.5.0                 RColorBrewer_1.1-2      
 [4] webshot_0.5.2            httr_1.4.2               rprojroot_2.0.2         
 [7] tools_4.0.3              backports_1.1.10         bslib_0.2.5.1           
[10] utf8_1.1.4               R6_2.5.0                 mgcv_1.8-33             
[13] DBI_1.1.0                colorspace_2.0-2         withr_2.3.0             
[16] tidyselect_1.1.0         compiler_4.0.3           git2r_0.27.1            
[19] cli_3.0.1                rvest_0.3.6              xml2_1.3.2              
[22] labeling_0.4.2           sass_0.4.0               scales_1.1.1            
[25] checkmate_2.0.0          digest_0.6.27            rmarkdown_2.10          
[28] pkgconfig_2.0.3          htmltools_0.5.1.1        highr_0.8               
[31] dbplyr_1.4.4             rlang_0.4.11             readxl_1.3.1            
[34] rstudioapi_0.13          jquerylib_0.1.4          generics_0.1.0          
[37] farver_2.0.3             jsonlite_1.7.1           magrittr_1.5            
[40] Matrix_1.2-18            Rcpp_1.0.5               munsell_0.5.0           
[43] fansi_0.4.1              lifecycle_1.0.0          stringi_1.5.3           
[46] whisker_0.4              yaml_2.2.1               MASS_7.3-53             
[49] grid_4.0.3               blob_1.2.1               parallel_4.0.3          
[52] promises_1.1.1           crayon_1.3.4             lattice_0.20-41         
[55] splines_4.0.3            haven_2.3.1              hms_0.5.3               
[58] knitr_1.33               pillar_1.6.2             reprex_0.3.0            
[61] glue_1.4.2               evaluate_0.14            RcppArmadillo_0.10.1.2.0
[64] data.table_1.14.0        modelr_0.1.8             vctrs_0.3.8             
[67] tweenr_1.0.2             httpuv_1.5.4             cellranger_1.1.0        
[70] gtable_0.3.0             polyclip_1.10-0          assertthat_0.2.1        
[73] xfun_0.25                broom_0.7.9              RcppEigen_0.3.3.7.0     
[76] later_1.2.0              viridisLite_0.3.0        ellipsis_0.3.2

GLODAPv2_2020

Jens Daniel Müller

16 December, 2021