Last updated: 2020-09-08

Checks: 7 0

Knit directory: Cant_eMLR/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20200707) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 03e5b6b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/
    Ignored:    data/GLODAPv1_1/
    Ignored:    data/GLODAPv2_2016b_MappedClimatologies/
    Ignored:    data/GLODAPv2_2020/
    Ignored:    data/Gruber_2019/
    Ignored:    data/WOCE/
    Ignored:    data/World_Ocean_Atlas_2013_Clement/
    Ignored:    data/World_Ocean_Atlas_2018/
    Ignored:    data/eMLR/
    Ignored:    data/mapping/
    Ignored:    data/pCO2_atmosphere/
    Ignored:    dump/

Unstaged changes:
    Modified:   analysis/_site.yml
    Modified:   analysis/analysis_this_study.Rmd
    Modified:   analysis/config_nomenclature.Rmd
    Modified:   analysis/config_parameterization.Rmd
    Modified:   analysis/eMLR_model_fitting.Rmd
    Modified:   analysis/mapping_cant_calculation.Rmd
    Modified:   analysis/read_World_Ocean_Atlas_2018.Rmd
    Modified:   code/plotting_functions.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/read_GLODAPv2_2020.Rmd) and HTML (docs/read_GLODAPv2_2020.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 03e5b6b jens-daniel-mueller 2020-09-08 cleaned script
html 95d4e9b jens-daniel-mueller 2020-09-08 Build site.
Rmd d3ddb89 jens-daniel-mueller 2020-09-08 considered qc flag
html a50f053 jens-daniel-mueller 2020-09-07 Build site.
Rmd fe673db jens-daniel-mueller 2020-09-07 fitted model by era and saved era coeffcients
html da445a6 jens-daniel-mueller 2020-09-04 Build site.
html de4515a jens-daniel-mueller 2020-09-04 Build site.
Rmd 8c0f6ac jens-daniel-mueller 2020-09-04 Revised data cleaning
html 5511301 jens-daniel-mueller 2020-09-04 Build site.
Rmd 41d3ba3 jens-daniel-mueller 2020-09-04 explanation added
html fd28faa jens-daniel-mueller 2020-09-04 Build site.
Rmd d998206 jens-daniel-mueller 2020-09-04 era as factor
html 1dcd293 jens-daniel-mueller 2020-09-04 Build site.
Rmd bac1f3a jens-daniel-mueller 2020-09-04 no flagging removal, but plotted
html fa11a74 jens-daniel-mueller 2020-09-02 Build site.
Rmd da4907d jens-daniel-mueller 2020-09-02 all flagged variables
html 429aab3 jens-daniel-mueller 2020-09-01 Build site.
html f4216dd jens-daniel-mueller 2020-09-01 Build site.
Rmd 8f3ce45 jens-daniel-mueller 2020-09-01 rebuild without PO4 star selection, oxygen only
html b41a87d jens-daniel-mueller 2020-09-01 Build site.
Rmd 4dcad9a jens-daniel-mueller 2020-09-01 plots subsetting maps with facets
html 86bada3 jens-daniel-mueller 2020-09-01 Build site.
Rmd a0b3cf0 jens-daniel-mueller 2020-09-01 only cstar flagging checked in map
html d895004 jens-daniel-mueller 2020-09-01 Build site.
Rmd be824b0 jens-daniel-mueller 2020-09-01 talk subsetting stats flagging resolved
html 3d9b20f jens-daniel-mueller 2020-09-01 Build site.
Rmd b047c4b jens-daniel-mueller 2020-09-01 subsetting stats for talk and po4
html 4049eb0 jens-daniel-mueller 2020-09-01 Build site.
Rmd 782c595 jens-daniel-mueller 2020-09-01 cleaning process maps
html c575584 jens-daniel-mueller 2020-08-31 Build site.
Rmd 824d7b7 jens-daniel-mueller 2020-08-31 add tco2 sampling grid to map
html 13a76d5 jens-daniel-mueller 2020-08-28 Build site.
Rmd 2e6a4ca jens-daniel-mueller 2020-08-28 XXX
html 27404de jens-daniel-mueller 2020-08-27 Build site.
html b6d0e6a jens-daniel-mueller 2020-08-27 Build site.
html f40e48b jens-daniel-mueller 2020-08-26 Build site.
html ec20f40 jens-daniel-mueller 2020-08-24 Build site.
html 0772e79 jens-daniel-mueller 2020-08-21 Build site.
Rmd e04b03d jens-daniel-mueller 2020-08-21 Individual cruise plots with all relevant parameters and histograms in eMLR
html 5ffe187 jens-daniel-mueller 2020-08-20 Build site.
html 1064ef8 jens-daniel-mueller 2020-08-19 Build site.
html 3574489 jens-daniel-mueller 2020-08-18 Build site.
Rmd 1a4d1a8 jens-daniel-mueller 2020-08-18 parameters included in mapping
html 66382bf jens-daniel-mueller 2020-08-18 Build site.
Rmd 7460b1f jens-daniel-mueller 2020-08-18 assign A16 cruises to GO_SHIP
html 29a537a jens-daniel-mueller 2020-08-18 Build site.
Rmd 7fb61d5 jens-daniel-mueller 2020-08-18 rerun with all parameters in one file
html 4faaf29 jens-daniel-mueller 2020-08-18 Build site.
Rmd a24a414 jens-daniel-mueller 2020-08-18 re-run with all parameters in one list
html 9eaa2fb jens-daniel-mueller 2020-08-18 Build site.
Rmd 079873a jens-daniel-mueller 2020-08-18 rerun with parameters in list
html a3b6b68 jens-daniel-mueller 2020-08-13 Build site.
html 00c2120 jens-daniel-mueller 2020-08-13 Build site.
html dec27b2 jens-daniel-mueller 2020-08-13 Build site.
html 6fd35c8 jens-daniel-mueller 2020-08-13 Build site.
Rmd 634ae4e jens-daniel-mueller 2020-08-13 cleaning and reference to parameters and masks
html bf69270 jens-daniel-mueller 2020-08-13 Build site.
html 1176c9a jens-daniel-mueller 2020-08-12 Build site.
html 1a078e7 jens-daniel-mueller 2020-08-12 Build site.
Rmd 6344282 jens-daniel-mueller 2020-08-12 updated lon conversion, variable names and masks
html f143b2d jens-daniel-mueller 2020-08-12 Build site.
html 2d179e7 jens-daniel-mueller 2020-08-12 Build site.
html 5d33341 jens-daniel-mueller 2020-08-11 Build site.
html 8a010ca jens-daniel-mueller 2020-08-11 Build site.
html a01041a jens-daniel-mueller 2020-08-11 Build site.
html 1b3f5e2 jens-daniel-mueller 2020-08-11 Build site.
Rmd e646f85 jens-daniel-mueller 2020-08-11 parameterization in separate file
html e18e59a jens-daniel-mueller 2020-08-10 Build site.
html 7d7900a jens-daniel-mueller 2020-08-07 Build site.
html ddf44d7 jens-daniel-mueller 2020-07-31 Build site.
Rmd b4fa5a8 jens-daniel-mueller 2020-07-31 link to all cruise plots included
html 243f31c jens-daniel-mueller 2020-07-31 Build site.
Rmd 977782e jens-daniel-mueller 2020-07-31 minor plot update, open questions added
html 9dc5d7f jens-daniel-mueller 2020-07-29 Build site.
html 21524b4 jens-daniel-mueller 2020-07-29 Build site.
html ff17968 jens-daniel-mueller 2020-07-29 Build site.
Rmd cb5074e jens-daniel-mueller 2020-07-29 subset Cant to GLODAP before merging
html 8d71a56 jens-daniel-mueller 2020-07-29 Build site.
Rmd 82db969 jens-daniel-mueller 2020-07-29 format and plots revised
html 2e2412d jens-daniel-mueller 2020-07-29 Build site.
Rmd a9fbf3a jens-daniel-mueller 2020-07-29 formating
html 5274d0b jens-daniel-mueller 2020-07-29 Build site.
Rmd 304aa74 jens-daniel-mueller 2020-07-29 plot improvements
html 75da185 jens-daniel-mueller 2020-07-29 Build site.
Rmd 26db597 jens-daniel-mueller 2020-07-29 cleaned code and output
html 2e08795 jens-daniel-mueller 2020-07-24 Build site.
html c84424c jens-daniel-mueller 2020-07-23 Build site.
Rmd 8d47b71 jens-daniel-mueller 2020-07-23 era names changed
html 2df2065 jens-daniel-mueller 2020-07-23 Build site.
Rmd fa350cf jens-daniel-mueller 2020-07-23 predictor correlation plots, bin2d map plots
html 9d1d67d jens-daniel-mueller 2020-07-23 Build site.
Rmd 3b6658b jens-daniel-mueller 2020-07-23 predictor correlation plots, bin2d map plots
html 2e3691a jens-daniel-mueller 2020-07-23 Build site.
Rmd 26bdc0a jens-daniel-mueller 2020-07-23 new era label, predictor correlation check started
html 556e6cc jens-daniel-mueller 2020-07-23 Build site.
html c1289a2 jens-daniel-mueller 2020-07-23 Build site.
html 2890e73 jens-daniel-mueller 2020-07-23 Build site.
html fdfa7b9 jens-daniel-mueller 2020-07-22 Build site.
html bb9c002 jens-daniel-mueller 2020-07-21 Build site.
Rmd d2ed0f8 jens-daniel-mueller 2020-07-21 harmonied lat lon labeling
html b47adc2 jens-daniel-mueller 2020-07-20 Build site.
Rmd 366d7d5 jens-daniel-mueller 2020-07-20 update plots
html 25812b9 jens-daniel-mueller 2020-07-20 Build site.
Rmd e45a882 jens-daniel-mueller 2020-07-20 basins included, higher grid resolution for maps
html 6a6431e jens-daniel-mueller 2020-07-20 Build site.
Rmd 50be8de jens-daniel-mueller 2020-07-20 formating
html b9769e8 jens-daniel-mueller 2020-07-20 Build site.
Rmd 82120fe jens-daniel-mueller 2020-07-20 nitrate, gamma included, 65N removed
html fe55909 jens-daniel-mueller 2020-07-19 Build site.
Rmd 66540a9 jens-daniel-mueller 2020-07-19 formating and color scale
html 61b3390 jens-daniel-mueller 2020-07-18 Build site.
Rmd d3032c7 jens-daniel-mueller 2020-07-18 formating
html 22b588c jens-daniel-mueller 2020-07-18 Build site.
html 3964fd7 jens-daniel-mueller 2020-07-17 Build site.
Rmd 4685ad7 jens-daniel-mueller 2020-07-17 150m depth limit implemented
html 56c3ed9 jens-daniel-mueller 2020-07-14 Build site.
Rmd aed89af jens-daniel-mueller 2020-07-14 added Rmd for MappedClimatologies
html 74d4abd jens-daniel-mueller 2020-07-14 Build site.
html 1c511ce jens-daniel-mueller 2020-07-14 Build site.
Rmd e03016e jens-daniel-mueller 2020-07-14 split read in per data set

library(tidyverse)
library(lubridate)
library(patchwork)
library(cutr)

1 Read files

Main data source for this project is GLODAPv2.2020_Merged_Master_File.csv downloaded from glodap.info in June 2020.

GLODAP <-
  read_csv(
    here::here(
      "data/GLODAPv2_2020/Merged_data_product",
      "GLODAPv2.2020_Merged_Master_File.csv"
    ),
    na = "-9999",
    col_types = cols(.default = col_double())
  )


# select relevant columns
GLODAP <- GLODAP %>%
  select(cruise:talkqc)

# harmonize column names
GLODAP <- GLODAP  %>%
  rename(sal = salinity,
         tem = temperature)

# harmonize coordinates
GLODAP <- GLODAP  %>%
  rename(lon = longitude,
         lat = latitude) %>%
  mutate(lon = if_else(lon < 20, lon + 360, lon))

# remove irrelevant columns
GLODAP <- GLODAP %>%
  select(-c(month:minute,
            maxsampdepth, bottle, sigma0:sigma4,
            nitrite:nitritef))

GLODAP <- GLODAP %>%
  mutate_at(vars(ends_with(c("f", "qc"))), as.factor)

2 Exploratory data analysis

source("code/eda.R")
eda(GLODAP, "GLODAPv2_2020")
rm(eda)

The output of an automated Exploratory Data Analysis (EDA) performed with the package DataExplorer can be accessed here:

Link to EDA report of GLODAPv2_2020 raw data

3 Data preparation

3.1 Horizontal gridding

For merging with other data sets, all observations were grouped into latitude intervals of:

  • 1° x 1°
GLODAP <- GLODAP %>% 
  mutate(lat = cut(lat, seq(-90, 90, 1), seq(-89.5, 89.5, 1)),
         lat = as.numeric(as.character(lat)),
         lon = cut(lon, seq(20, 380, 1), seq(20.5, 379.5, 1)),
         lon = as.numeric(as.character(lon)))

3.2 Reference eras

Samples were assigned to following eras:

  • JGOFS_WOCE: 1981 - 1999

  • GO_SHIP: 2000 - 2012

  • new_era: > 2013

GLODAP <- GLODAP %>%
  filter(year >= parameters$year_JGOFS_start) %>% 
  mutate(era = "JGOFS_WOCE",
         era = if_else(year > parameters$year_JGOFS_end, "GO_SHIP", era),
         era = if_else(year > parameters$year_GOSHIP_end, "new_era", era))

GLODAP <- GLODAP %>% 
  mutate(era = factor(era, c("JGOFS_WOCE", "GO_SHIP", "new_era")))

3.3 Spatial boundaries

3.3.1 Depth

Only observations were taken into consideration with:

  • minimum sampling depth: 150m
# create observations grid and summary stats before remving data

GLODAP_obs_grid <- GLODAP %>% 
  count(lat, lon, era) %>% 
  mutate(cleaning_level = "all")

GLODAP_stats <- GLODAP %>% 
  summarise(all = n())

##

GLODAP <- GLODAP %>% 
  filter(depth >= parameters$depth_min)

3.3.2 Bottomdepth

Only observations were taken into consideration with:

  • minimum bottom depth: 500m
GLODAP <- GLODAP %>% 
  filter(bottomdepth >= parameters$bottomdepth_min)

3.3.3 Basin mask

The basin mask from the World Ocean Atlas was used. For details consult the data base subsection for WOA18 data.

Please note that some GLODAP observations were made outside the WOA18 basin mask and will be removed for further analysis (eg North Sea, Sea of Japan).

GLODAP <- inner_join(GLODAP, basinmask)

##

GLODAP_stats_temp <- GLODAP %>% 
  summarise(spatial_boundaries = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>% 
  count(lat, lon, era) %>% 
  mutate(cleaning_level = "spatial_boundaries")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)
GLODAP_obs <- GLODAP %>% 
  group_by(lat, lon) %>% 
  summarise(n = n()) %>% 
  ungroup()

ggplot() +
  geom_raster(data = landmask,
              aes(lon, lat), fill = "grey80") +
  geom_raster(data = basinmask, aes(lon, lat, fill = basin)) +
  geom_raster(data = GLODAP_obs, aes(lon, lat)) +
  scale_fill_brewer(palette = "Dark2") +
  coord_quickmap(expand = 0) +
  theme(legend.position = "top",
        legend.title = element_blank(),
        axis.title = element_blank())

rm(GLODAP_obs)

3.4 Flags and missing data

Only rows (samples) for which all relevant parameters are available were selected, ie NA’s were removed.

According to Olsen et al (2020), flags within the merged master file identify:

  • f:

    • 2: Acceptable
    • 0: Interpolated or calculated value (or Average of replicate, Manual chromatographic peak measurement)
    • 9: Data not used (so, only NA data should have this flag)
  • qc:

    • 1: Adjusted or unadjusted data
    • 0: Data appear of good quality but have not been subjected to full secondary QC
    • data with poor or uncertain quality are excluded.

Following flagging criteria were taken into account:

  • flag_f: 2, 0
  • flag_qc: 1

Summary statistics were calculated during cleaning process.

3.4.1 tco2

3.4.1.1 NA

The vast majority of rows is removed due to missing tco2 observations.

GLODAP <- GLODAP %>% 
  filter(!is.na(tco2))

##

GLODAP_stats_temp <- GLODAP %>% 
  summarise(tco2_values = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>% 
  count(lat, lon, era) %>% 
  mutate(cleaning_level = "tco2_values")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.1.2 f flag

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, tco2f)

ggplot() +
  geom_raster(data = landmask,
              aes(lon, lat),
              fill = "grey80") +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era ~ tco2f) +
  coord_quickmap(expand = 0) +
  theme(legend.position = "top")

###

GLODAP <- GLODAP %>%
  filter(tco2f %in% parameters$flag_f)

# ##
# 
# GLODAP_stats_temp <- GLODAP %>%
#   summarise(tco2_f_flag = n())
# 
# GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
# rm(GLODAP_stats_temp)
# 
# ##
# 
# GLODAP_obs_grid_temp <- GLODAP %>%
#   count(lat, lon, era) %>%
#   mutate(cleaning_level = "tco2_f_flag")
# 
# GLODAP_obs_grid <-
#   bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)
# 
# rm(GLODAP_obs_grid_temp)

3.4.1.3 qc flag

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, tco2qc)

ggplot() +
  geom_raster(data = landmask,
              aes(lon, lat),
              fill = "grey80") +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era ~ tco2qc) +
  coord_quickmap(expand = 0) +
  theme(legend.position = "top")

###

GLODAP <- GLODAP %>%
  filter(tco2qc %in% parameters$flag_qc)

##

GLODAP_stats_temp <- GLODAP %>%
  summarise(tco2_qc_flag = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era) %>%
  mutate(cleaning_level = "tco2_qc_flag")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.2 talk

3.4.2.1 NA

Quite a few tco2 observations are discarded, due to missing talk data, in particular in the JGOFS/WOCE era. However, there seems to be a high number of observations remaining from the affected cruises, so that the coverage does not seem to be reduced drastically.

GLODAP <- GLODAP %>% 
  mutate(talkna = if_else(is.na(talk), "NA", "Value"))

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, talkna)

ggplot() +
  geom_raster(data = landmask,
              aes(lon, lat),
              fill = "grey80") +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era ~ talkna) +
  coord_quickmap(expand = 0) +
  theme(legend.position = "top")

GLODAP <- GLODAP %>% 
  select(-talkna) %>% 
  filter(!is.na(talk))

##

GLODAP_stats_temp <- GLODAP %>% 
  summarise(talk_values = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>% 
  count(lat, lon, era) %>% 
  mutate(cleaning_level = "talk_values")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.2.2 f flag

Restricting the f flag to 2 would results in data gaps in the south-east Pacific. Interpolated or calculated data (f flag 0) are therefore included.

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, talkf)

ggplot() +
  geom_raster(data = landmask,
              aes(lon, lat), fill = "grey80") +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
    scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era~talkf) +
  coord_quickmap(expand = 0) +
  theme(legend.position = "top",
        legend.title = element_blank())

# ###

GLODAP <- GLODAP %>%
  filter(talkf %in% parameters$flag_f)

##
# 
# GLODAP_stats_temp <- GLODAP %>%
#   summarise(talk_f_flag = n())
# 
# GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
# rm(GLODAP_stats_temp)
# 
# ##
# 
# GLODAP_obs_grid_temp <- GLODAP %>%
#   count(lat, lon, era) %>%
#   mutate(cleaning_level = "talk_f_flag")
# 
# GLODAP_obs_grid <-
#   bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)
# 
# rm(GLODAP_obs_grid_temp)

3.4.2.3 qc flag

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, talkqc)

ggplot() +
  geom_raster(data = landmask,
              aes(lon, lat), fill = "grey80") +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
    scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era~talkqc) +
  coord_quickmap(expand = 0) +
  theme(legend.position = "top",
        legend.title = element_blank())

###

GLODAP <- GLODAP %>%
  filter(talkqc %in% parameters$flag_qc)

##

GLODAP_stats_temp <- GLODAP %>%
  summarise(talk_qc_flag = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era) %>%
  mutate(cleaning_level = "talk_qc_flag")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.3 Phosphate

3.4.3.1 NA

Quite a few tco2/talk observations are discarded, due to missing phosphate data. However, there seems to be a high number of observations remaining from the affected cruises, so that the coverage does not seem to be reduced drastically.

GLODAP <- GLODAP %>% 
  mutate(phosphatena = if_else(is.na(phosphate), "NA", "Value"))

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, phosphatena)

ggplot() +
  geom_raster(data = landmask,
              aes(lon, lat),
              fill = "grey80") +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era ~ phosphatena) +
  coord_quickmap(expand = 0) +
  theme(legend.position = "top")

GLODAP <- GLODAP %>% 
  select(-phosphatena) %>% 
  filter(!is.na(phosphate))

##

GLODAP_stats_temp <- GLODAP %>% 
  summarise(phosphate_values = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>% 
  count(lat, lon, era) %>% 
  mutate(cleaning_level = "phosphate_values")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.3.2 f flag

Restricting the f flag to 2 would not drastically reduce available data. Interpolated or calculated data (f flag 0) are therefore included.

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, phosphatef)

ggplot() +
  geom_raster(data = landmask,
              aes(lon, lat), fill = "grey80") +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
    scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era~phosphatef) +
  coord_quickmap(expand = 0) +
  theme(legend.position = "top",
        legend.title = element_blank())

###

GLODAP <- GLODAP %>%
  filter(phosphatef %in% parameters$flag_f)

# ##
# 
# GLODAP_stats_temp <- GLODAP %>%
#   summarise(phosphate_f_flag = n())
# 
# GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
# rm(GLODAP_stats_temp)
# 
# ##
# 
# GLODAP_obs_grid_temp <- GLODAP %>%
#   count(lat, lon, era) %>%
#   mutate(cleaning_level = "phosphate_f_flag")
# 
# GLODAP_obs_grid <-
#   bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)
# 
# rm(GLODAP_obs_grid_temp)

3.4.3.3 qc flag

Phosphate data for which secondary quality was not applied (qc flag 0) were removed.

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, phosphateqc)

ggplot() +
  geom_raster(data = landmask,
              aes(lon, lat), fill = "grey80") +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
    scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era~phosphateqc) +
  coord_quickmap(expand = 0) +
  theme(legend.position = "top",
        legend.title = element_blank())

###

GLODAP <- GLODAP %>%
  filter(phosphateqc %in% parameters$flag_qc)

##

GLODAP_stats_temp <- GLODAP %>%
  summarise(phosphate_qc_flag = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era) %>%
  mutate(cleaning_level = "phosphate_qc_flag")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.4 eMLR variables

Rows with missing eMLR variables were removed, only the qc flag was considered.

GLODAP <- GLODAP %>% 
  filter(!is.na(tem))

##
  
GLODAP <- GLODAP %>% 
  filter(!is.na(sal))

GLODAP <- GLODAP %>%
  filter(salinityf %in% parameters$flag_f)

GLODAP <- GLODAP %>%
  filter(salinityqc %in% parameters$flag_qc)

##
  
GLODAP <- GLODAP %>% 
  filter(!is.na(silicate))

GLODAP <- GLODAP %>%
  filter(silicatef %in% parameters$flag_f)

GLODAP <- GLODAP %>%
  filter(silicateqc %in% parameters$flag_qc)
##
  
GLODAP <- GLODAP %>% 
  filter(!is.na(oxygen))

GLODAP <- GLODAP %>%
  filter(oxygenf %in% parameters$flag_f)

GLODAP <- GLODAP %>%
  filter(oxygenqc %in% parameters$flag_qc)
##

GLODAP <- GLODAP %>% 
  filter(!is.na(aou))

GLODAP <- GLODAP %>%
  filter(aouf %in% parameters$flag_f)
##
  
GLODAP <- GLODAP %>% 
  filter(!is.na(nitrate))

GLODAP <- GLODAP %>%
  filter(nitratef %in% parameters$flag_f)

GLODAP <- GLODAP %>%
  filter(nitrateqc %in% parameters$flag_qc)
##

GLODAP <- GLODAP %>% 
  filter(!is.na(depth))

GLODAP <- GLODAP %>% 
  filter(!is.na(gamma))

##

GLODAP_stats_temp <- GLODAP %>% 
  summarise(eMLR_variables = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)

rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>% 
  count(lat, lon, era) %>% 
  mutate(cleaning_level = "eMLR_variables")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.5 Manual adjustment A16 cruise

For harmonization with Gruber et al. (2019), cruises 1041 (A16N) and 1042 (A16S) were grouped into the GO_SHIP area despite taking place in 2013/14.

GLODAP_cruises <- GLODAP %>% 
  filter(basin == "Atlantic",
         year %in% c(2013, 2014)) %>% 
  count(lat, lon, cruise)

ggplot() +
  geom_raster(data = landmask,
              aes(lon, lat), fill = "grey80") +
  geom_raster(data = GLODAP_cruises, aes(lon, lat, fill = as.factor(cruise))) +
  scale_fill_brewer(palette = "Dark2") +
  coord_quickmap(expand = 0) +
  theme(legend.position = "top",
        legend.title = element_blank())

rm(GLODAP_cruises)
GLODAP <- GLODAP %>%
  mutate(era = as.character(era)) %>% 
  mutate(era = if_else(cruise %in% c(1041, 1042),
                      "GO_SHIP", era))

GLODAP <- GLODAP %>% 
  mutate(era = factor(era, c("JGOFS_WOCE", "GO_SHIP", "new_era")))

3.5 Write summary file

GLODAP_stats_long <- GLODAP_stats %>%
  pivot_longer(1:length(GLODAP_stats),
               names_to = "parameter",
               values_to = "n")

GLODAP_stats_long  %>%  write_csv(
  here::here(
    "data/GLODAPv2_2020/_summarized_data_files",
    "GLODAPv2.2020_stats.csv"
  )
)

##

GLODAP_obs_grid  %>%
  filter(cleaning_level == "eMLR_variables") %>%
  write_csv(
    here::here(
      "data/GLODAPv2_2020/_summarized_data_files",
      "GLODAPv2.2020_clean_obs_grid.csv"
    )
  )

##

GLODAP <- GLODAP %>% 
  select(year, era, basin, lat, lon, cruise,
         bottomdepth, depth, tem, sal, gamma,
         tco2, talk, phosphate,
         oxygen, aou, nitrate, silicate)

GLODAP  %>%  write_csv(
  here::here(
    "data/GLODAPv2_2020/_summarized_data_files",
    "GLODAPv2.2020_clean.csv"
  )
)

4 Overview plots

4.1 Cleaning stats

Number of observations at various steps of data cleaning.

GLODAP_stats_long <- GLODAP_stats_long %>%
  mutate(parameter = fct_reorder(parameter, n))

GLODAP_stats_long %>% 
  ggplot(aes(parameter, n/1000)) +
  geom_col() +
  coord_flip() +
  labs(y = "n (1000)") +
  theme(axis.title.y = element_blank())

rm(GLODAP_stats_long)

4.2 Assign coarse spatial grid

For the following plots, the cleaned data set was re-opened and observations were gridded spatially to intervals of:

  • 5° x 5°
GLODAP <- GLODAP %>% 
  mutate(lat_grid = cut(lat, seq(-90, 90, 5), seq(-87.5, 87.5, 5)),
         lat_grid = as.numeric(as.character(lat_grid)),
         lon_grid = cut(lon, seq(20, 380, 5), seq(22.5, 377.5, 5)),
         lon_grid = as.numeric(as.character(lon_grid)))

4.3 Histogram Zonal coverage

GLODAP_histogram_lat <- GLODAP %>%
  group_by(era, lat_grid, basin) %>%
  tally() %>%
  ungroup()

GLODAP_histogram_lat %>%
  ggplot(aes(lat_grid, n, fill = era)) +
  geom_col() +
  scale_fill_brewer(palette = "Dark2") +
  facet_wrap( ~ basin) +
  coord_flip(expand = 0) +
  theme(legend.position = "top",
        legend.title = element_blank())

rm(GLODAP_histogram_lat)

4.4 Histogram temporal coverage

GLODAP_histogram_year <- GLODAP %>%
  group_by(year, basin) %>%
  tally() %>%
  ungroup()

era_median_year <- GLODAP %>%
  group_by(era) %>%
  summarise(t_ref = median(year)) %>%
  ungroup()

GLODAP_histogram_year %>%
  ggplot() +
  geom_vline(xintercept = c(
    parameters$year_JGOFS_end + 0.5,
    parameters$year_GOSHIP_end + 0.5
  )) +
  geom_col(aes(year, n, fill = basin)) +
  geom_point(
    data = era_median_year,
    aes(t_ref, 0, shape = "Median year"),
    size = 2,
    fill = "white"
  ) +
  scale_fill_brewer(palette = "Dark2") +
  scale_shape_manual(values = 24, name = "") +
  scale_y_continuous() +
  coord_cartesian(expand = 0) +
  theme(
    legend.position = "top",
    legend.direction = "vertical",
    legend.title = element_blank(),
    axis.title.x = element_blank()
  )

rm(GLODAP_histogram_year,
   era_median_year)

4.5 Zonal temporal coverage (Hovmoeller)

GLODAP_hovmoeller_year <- GLODAP %>%
  group_by(year, lat_grid, basin) %>%
  tally() %>%
  ungroup()

GLODAP_hovmoeller_year %>%
  ggplot(aes(year, lat_grid, fill = log10(n))) +
  geom_tile() +
  geom_vline(xintercept = c(1999.5, 2012.5)) +
  scale_fill_viridis_c(option = "magma", direction = -1) +
  facet_wrap( ~ basin, ncol = 1) +
  theme(legend.position = "top",
        axis.title.x = element_blank())

rm(GLODAP_hovmoeller_year)

4.6 Coverage maps by era

4.6.1 Subsetting process

GLODAP_obs_grid <- GLODAP_obs_grid %>%
  mutate(era = factor(era, c("JGOFS_WOCE", "GO_SHIP", "new_era")),
         cleaning_level = factor(
           cleaning_level,
           unique(GLODAP_obs_grid$cleaning_level)
         ))



ggplot() +
  geom_raster(data = landmask,
              aes(lon, lat),
              fill = "grey80") +
  geom_raster(data = GLODAP_obs_grid %>%
                filter(cleaning_level == "all") %>% 
                select(-cleaning_level),
              aes(lon, lat, fill = "all")) +
  geom_raster(data = GLODAP_obs_grid %>%
                filter(cleaning_level != "all"),
              aes(lon, lat, fill = "subset")) +
  coord_quickmap(expand = FALSE) +
  scale_fill_manual(values = c("grey", "red"), name = "") +
  facet_grid(cleaning_level ~ era) +
  theme(legend.position = "top",
        axis.title = element_blank())

4.6.2 eMLR variables

Grey pixels refer to sampling locations filtered for availability of tco2 data only.

GLODAP <- GLODAP %>% 
  mutate(era = factor(era, c("JGOFS_WOCE", "GO_SHIP", "new_era")))

GLODAP_tco2_grid <- GLODAP %>%
  count(lat, lon, era)

GLODAP %>%
  ggplot(aes(lon, lat)) +
  geom_raster(data = landmask,
              aes(lon, lat), fill = "grey80") +
  geom_raster(data = GLODAP_tco2_grid, aes(lon, lat), fill = "grey80") +
  geom_bin2d(binwidth = c(1,1)) +
  scale_fill_viridis_c(option = "magma", direction = -1, trans = "log10",
                       name = "log10(eMLR_variables)") +
  coord_quickmap(expand = FALSE) +
  facet_wrap(~era, ncol = 1) +
  theme(legend.position = "top",
        axis.title = element_blank())


sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Germany.1252  LC_CTYPE=English_Germany.1252   
[3] LC_MONETARY=English_Germany.1252 LC_NUMERIC=C                    
[5] LC_TIME=English_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cutr_0.0.0.9000 patchwork_1.0.1 lubridate_1.7.9 forcats_0.5.0  
 [5] stringr_1.4.0   dplyr_1.0.0     purrr_0.3.4     readr_1.3.1    
 [9] tidyr_1.1.0     tibble_3.0.3    ggplot2_3.3.2   tidyverse_1.3.0
[13] workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5         here_0.1           assertthat_0.2.1   rprojroot_1.3-2   
 [5] digest_0.6.25      R6_2.4.1           cellranger_1.1.0   backports_1.1.8   
 [9] reprex_0.3.0       evaluate_0.14      httr_1.4.2         pillar_1.4.6      
[13] rlang_0.4.7        readxl_1.3.1       rstudioapi_0.11    whisker_0.4       
[17] blob_1.2.1         rmarkdown_2.3      labeling_0.3       munsell_0.5.0     
[21] broom_0.7.0        compiler_4.0.2     httpuv_1.5.4       modelr_0.1.8      
[25] xfun_0.16          pkgconfig_2.0.3    htmltools_0.5.0    tidyselect_1.1.0  
[29] viridisLite_0.3.0  fansi_0.4.1        crayon_1.3.4       dbplyr_1.4.4      
[33] withr_2.2.0        later_1.1.0.1      grid_4.0.2         jsonlite_1.7.0    
[37] gtable_0.3.0       lifecycle_0.2.0    DBI_1.1.0          git2r_0.27.1      
[41] magrittr_1.5       scales_1.1.1       cli_2.0.2          stringi_1.4.6     
[45] farver_2.0.3       fs_1.4.2           promises_1.1.1     xml2_1.3.2        
[49] ellipsis_0.3.1     generics_0.0.2     vctrs_0.3.2        RColorBrewer_1.1-2
[53] tools_4.0.2        glue_1.4.1         hms_0.5.3          yaml_2.2.1        
[57] colorspace_1.4-1   rvest_0.3.6        knitr_1.29         haven_2.3.1