Last updated: 2020-11-17

Checks: 7 0

Knit directory: Cant_eMLR/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200707)

The command set.seed(20200707) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 3a67ed0

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 3a67ed0. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/
    Ignored:    data/WOCE/
    Ignored:    data/World_Ocean_Atlas_2013_Clement/
    Ignored:    data/auxillary/
    Ignored:    data/eMLR/
    Ignored:    data/input/
    Ignored:    data/interim/
    Ignored:    data/mapping/
    Ignored:    data/output/
    Ignored:    dump/

Untracked files:
    Untracked:  output/figure/analysis_this_study/

Unstaged changes:
    Modified:   analysis/analysis_this_study.Rmd
    Modified:   analysis/read_GLODAPv2_2016_MappedClimatologies.Rmd
    Modified:   code/plotting_functions.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/read_GLODAPv2_2020.Rmd) and HTML (docs/read_GLODAPv2_2020.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	3a67ed0	jens-daniel-mueller	2020-11-17	test rebuild
html	fc70a34	jens-daniel-mueller	2020-11-10	Build site.
html	b170b79	jens-daniel-mueller	2020-11-03	Build site.
html	76fd767	jens-daniel-mueller	2020-10-14	Build site.
Rmd	4e84499	jens-daniel-mueller	2020-10-14	GLODAP link in new tab
html	411a35b	jens-daniel-mueller	2020-10-02	Build site.
Rmd	d0dd42e	jens-daniel-mueller	2020-10-02	included tco2 f flag 2 data
html	0d38979	jens-daniel-mueller	2020-10-01	Build site.
html	80e9b77	jens-daniel-mueller	2020-09-30	Build site.
Rmd	1794dc7	jens-daniel-mueller	2020-09-30	added 3d globe vis
html	4578cfa	jens-daniel-mueller	2020-09-19	Build site.
html	d59f716	jens-daniel-mueller	2020-09-18	Build site.
Rmd	27809b8	jens-daniel-mueller	2020-09-18	restricted tco2 f flag to 2
html	f5a20ac	jens-daniel-mueller	2020-09-17	Build site.
html	cf64460	jens-daniel-mueller	2020-09-17	Build site.
html	d43d9e2	jens-daniel-mueller	2020-09-16	Build site.
html	1f3319a	jens-daniel-mueller	2020-09-16	Build site.
html	da91d98	jens-daniel-mueller	2020-09-10	Build site.
html	a34a239	jens-daniel-mueller	2020-09-10	Build site.
html	c3ffdfd	jens-daniel-mueller	2020-09-08	Build site.
Rmd	04ee730	jens-daniel-mueller	2020-09-08	rebuild after revision
html	10f7e8f	jens-daniel-mueller	2020-09-08	Build site.
Rmd	03e5b6b	jens-daniel-mueller	2020-09-08	cleaned script
html	95d4e9b	jens-daniel-mueller	2020-09-08	Build site.
Rmd	d3ddb89	jens-daniel-mueller	2020-09-08	considered qc flag
html	a50f053	jens-daniel-mueller	2020-09-07	Build site.
Rmd	fe673db	jens-daniel-mueller	2020-09-07	fitted model by era and saved era coeffcients
html	da445a6	jens-daniel-mueller	2020-09-04	Build site.
html	de4515a	jens-daniel-mueller	2020-09-04	Build site.
Rmd	8c0f6ac	jens-daniel-mueller	2020-09-04	Revised data cleaning
html	5511301	jens-daniel-mueller	2020-09-04	Build site.
Rmd	41d3ba3	jens-daniel-mueller	2020-09-04	explanation added
html	fd28faa	jens-daniel-mueller	2020-09-04	Build site.
Rmd	d998206	jens-daniel-mueller	2020-09-04	era as factor
html	1dcd293	jens-daniel-mueller	2020-09-04	Build site.
Rmd	bac1f3a	jens-daniel-mueller	2020-09-04	no flagging removal, but plotted
html	fa11a74	jens-daniel-mueller	2020-09-02	Build site.
Rmd	da4907d	jens-daniel-mueller	2020-09-02	all flagged variables
html	429aab3	jens-daniel-mueller	2020-09-01	Build site.
html	f4216dd	jens-daniel-mueller	2020-09-01	Build site.
Rmd	8f3ce45	jens-daniel-mueller	2020-09-01	rebuild without PO4 star selection, oxygen only
html	b41a87d	jens-daniel-mueller	2020-09-01	Build site.
Rmd	4dcad9a	jens-daniel-mueller	2020-09-01	plots subsetting maps with facets
html	86bada3	jens-daniel-mueller	2020-09-01	Build site.
Rmd	a0b3cf0	jens-daniel-mueller	2020-09-01	only cstar flagging checked in map
html	d895004	jens-daniel-mueller	2020-09-01	Build site.
Rmd	be824b0	jens-daniel-mueller	2020-09-01	talk subsetting stats flagging resolved
html	3d9b20f	jens-daniel-mueller	2020-09-01	Build site.
Rmd	b047c4b	jens-daniel-mueller	2020-09-01	subsetting stats for talk and po4
html	4049eb0	jens-daniel-mueller	2020-09-01	Build site.
Rmd	782c595	jens-daniel-mueller	2020-09-01	cleaning process maps
html	c575584	jens-daniel-mueller	2020-08-31	Build site.
Rmd	824d7b7	jens-daniel-mueller	2020-08-31	add tco2 sampling grid to map
html	13a76d5	jens-daniel-mueller	2020-08-28	Build site.
Rmd	2e6a4ca	jens-daniel-mueller	2020-08-28	XXX
html	27404de	jens-daniel-mueller	2020-08-27	Build site.
html	b6d0e6a	jens-daniel-mueller	2020-08-27	Build site.
html	f40e48b	jens-daniel-mueller	2020-08-26	Build site.
html	ec20f40	jens-daniel-mueller	2020-08-24	Build site.
html	0772e79	jens-daniel-mueller	2020-08-21	Build site.
Rmd	e04b03d	jens-daniel-mueller	2020-08-21	Individual cruise plots with all relevant parameters and histograms in eMLR
html	5ffe187	jens-daniel-mueller	2020-08-20	Build site.
html	1064ef8	jens-daniel-mueller	2020-08-19	Build site.
html	3574489	jens-daniel-mueller	2020-08-18	Build site.
Rmd	1a4d1a8	jens-daniel-mueller	2020-08-18	parameters included in mapping
html	66382bf	jens-daniel-mueller	2020-08-18	Build site.
Rmd	7460b1f	jens-daniel-mueller	2020-08-18	assign A16 cruises to GO_SHIP
html	29a537a	jens-daniel-mueller	2020-08-18	Build site.
Rmd	7fb61d5	jens-daniel-mueller	2020-08-18	rerun with all parameters in one file
html	4faaf29	jens-daniel-mueller	2020-08-18	Build site.
Rmd	a24a414	jens-daniel-mueller	2020-08-18	re-run with all parameters in one list
html	9eaa2fb	jens-daniel-mueller	2020-08-18	Build site.
Rmd	079873a	jens-daniel-mueller	2020-08-18	rerun with parameters in list
html	a3b6b68	jens-daniel-mueller	2020-08-13	Build site.
html	00c2120	jens-daniel-mueller	2020-08-13	Build site.
html	dec27b2	jens-daniel-mueller	2020-08-13	Build site.
html	6fd35c8	jens-daniel-mueller	2020-08-13	Build site.
Rmd	634ae4e	jens-daniel-mueller	2020-08-13	cleaning and reference to parameters and masks
html	bf69270	jens-daniel-mueller	2020-08-13	Build site.
html	1176c9a	jens-daniel-mueller	2020-08-12	Build site.
html	1a078e7	jens-daniel-mueller	2020-08-12	Build site.
Rmd	6344282	jens-daniel-mueller	2020-08-12	updated lon conversion, variable names and masks
html	f143b2d	jens-daniel-mueller	2020-08-12	Build site.
html	2d179e7	jens-daniel-mueller	2020-08-12	Build site.
html	5d33341	jens-daniel-mueller	2020-08-11	Build site.
html	8a010ca	jens-daniel-mueller	2020-08-11	Build site.
html	a01041a	jens-daniel-mueller	2020-08-11	Build site.
html	1b3f5e2	jens-daniel-mueller	2020-08-11	Build site.
Rmd	e646f85	jens-daniel-mueller	2020-08-11	parameterization in separate file
html	e18e59a	jens-daniel-mueller	2020-08-10	Build site.
html	7d7900a	jens-daniel-mueller	2020-08-07	Build site.
html	ddf44d7	jens-daniel-mueller	2020-07-31	Build site.
Rmd	b4fa5a8	jens-daniel-mueller	2020-07-31	link to all cruise plots included
html	243f31c	jens-daniel-mueller	2020-07-31	Build site.
Rmd	977782e	jens-daniel-mueller	2020-07-31	minor plot update, open questions added
html	9dc5d7f	jens-daniel-mueller	2020-07-29	Build site.
html	21524b4	jens-daniel-mueller	2020-07-29	Build site.
html	ff17968	jens-daniel-mueller	2020-07-29	Build site.
Rmd	cb5074e	jens-daniel-mueller	2020-07-29	subset Cant to GLODAP before merging
html	8d71a56	jens-daniel-mueller	2020-07-29	Build site.
Rmd	82db969	jens-daniel-mueller	2020-07-29	format and plots revised
html	2e2412d	jens-daniel-mueller	2020-07-29	Build site.
Rmd	a9fbf3a	jens-daniel-mueller	2020-07-29	formating
html	5274d0b	jens-daniel-mueller	2020-07-29	Build site.
Rmd	304aa74	jens-daniel-mueller	2020-07-29	plot improvements
html	75da185	jens-daniel-mueller	2020-07-29	Build site.
Rmd	26db597	jens-daniel-mueller	2020-07-29	cleaned code and output
html	2e08795	jens-daniel-mueller	2020-07-24	Build site.
html	c84424c	jens-daniel-mueller	2020-07-23	Build site.
Rmd	8d47b71	jens-daniel-mueller	2020-07-23	era names changed
html	2df2065	jens-daniel-mueller	2020-07-23	Build site.
Rmd	fa350cf	jens-daniel-mueller	2020-07-23	predictor correlation plots, bin2d map plots
html	9d1d67d	jens-daniel-mueller	2020-07-23	Build site.
Rmd	3b6658b	jens-daniel-mueller	2020-07-23	predictor correlation plots, bin2d map plots
html	2e3691a	jens-daniel-mueller	2020-07-23	Build site.
Rmd	26bdc0a	jens-daniel-mueller	2020-07-23	new era label, predictor correlation check started
html	556e6cc	jens-daniel-mueller	2020-07-23	Build site.
html	c1289a2	jens-daniel-mueller	2020-07-23	Build site.
html	2890e73	jens-daniel-mueller	2020-07-23	Build site.
html	fdfa7b9	jens-daniel-mueller	2020-07-22	Build site.
html	bb9c002	jens-daniel-mueller	2020-07-21	Build site.
Rmd	d2ed0f8	jens-daniel-mueller	2020-07-21	harmonied lat lon labeling
html	b47adc2	jens-daniel-mueller	2020-07-20	Build site.
Rmd	366d7d5	jens-daniel-mueller	2020-07-20	update plots
html	25812b9	jens-daniel-mueller	2020-07-20	Build site.
Rmd	e45a882	jens-daniel-mueller	2020-07-20	basins included, higher grid resolution for maps
html	6a6431e	jens-daniel-mueller	2020-07-20	Build site.
Rmd	50be8de	jens-daniel-mueller	2020-07-20	formating
html	b9769e8	jens-daniel-mueller	2020-07-20	Build site.
Rmd	82120fe	jens-daniel-mueller	2020-07-20	nitrate, gamma included, 65N removed
html	fe55909	jens-daniel-mueller	2020-07-19	Build site.
Rmd	66540a9	jens-daniel-mueller	2020-07-19	formating and color scale
html	61b3390	jens-daniel-mueller	2020-07-18	Build site.
Rmd	d3032c7	jens-daniel-mueller	2020-07-18	formating
html	22b588c	jens-daniel-mueller	2020-07-18	Build site.
html	3964fd7	jens-daniel-mueller	2020-07-17	Build site.
Rmd	4685ad7	jens-daniel-mueller	2020-07-17	150m depth limit implemented
html	56c3ed9	jens-daniel-mueller	2020-07-14	Build site.
Rmd	aed89af	jens-daniel-mueller	2020-07-14	added Rmd for MappedClimatologies
html	74d4abd	jens-daniel-mueller	2020-07-14	Build site.
html	1c511ce	jens-daniel-mueller	2020-07-14	Build site.
Rmd	e03016e	jens-daniel-mueller	2020-07-14	split read in per data set

library(tidyverse)
library(lubridate)
library(patchwork)
library(cutr)

1 Read files

Main data source for this project is GLODAPv2.2020_Merged_Master_File.csv downloaded from glodap.info in June 2020.

GLODAP <-
  read_csv(
    here::here(
      "data/input/GLODAPv2_2020",
      "GLODAPv2.2020_Merged_Master_File.csv"
    ),
    na = "-9999",
    col_types = cols(.default = col_double())
  )

# select relevant columns
GLODAP <- GLODAP %>%
  select(cruise:talkqc)

# create date column
GLODAP <- GLODAP %>%
  mutate(date = ymd(paste(year, month, day))) %>%
  #decade = as.factor(floor(year / 10) * 10)) %>%
  relocate(date)

# harmonize column names
GLODAP <- GLODAP  %>%
  rename(sal = salinity,
         tem = temperature)

# harmonize coordinates
GLODAP <- GLODAP  %>%
  rename(lon = longitude,
         lat = latitude) %>%
  mutate(lon = if_else(lon < 20, lon + 360, lon))

# remove irrelevant columns
GLODAP <- GLODAP %>%
  select(-c(month:minute,
            maxsampdepth, bottle, sigma0:sigma4,
            nitrite:nitritef))

GLODAP <- GLODAP %>%
  mutate_at(vars(ends_with(c("f", "qc"))), as.factor)

2 Exploratory data analysis

source("code/eda.R")
eda(GLODAP, "GLODAPv2_2020")
rm(eda)

The output of an automated Exploratory Data Analysis (EDA) performed with the package DataExplorer can be accessed here:

Link to EDA report of GLODAPv2_2020 raw data

3 Data preparation

3.1 Horizontal gridding

For merging with other data sets, all observations were grouped into latitude intervals of:

1° x 1°

GLODAP <- GLODAP %>% 
  mutate(lat = cut(lat, seq(-90, 90, 1), seq(-89.5, 89.5, 1)),
         lat = as.numeric(as.character(lat)),
         lon = cut(lon, seq(20, 380, 1), seq(20.5, 379.5, 1)),
         lon = as.numeric(as.character(lon)))

3.2 Reference eras

Samples were assigned to following eras:

JGOFS_WOCE: 1981 - 1999
GO_SHIP: 2000 - 2012
new_era: > 2013

GLODAP <- GLODAP %>%
  filter(year >= parameters$year_JGOFS_start) %>% 
  mutate(era = "JGOFS_WOCE",
         era = if_else(year > parameters$year_JGOFS_end, "GO_SHIP", era),
         era = if_else(year > parameters$year_GOSHIP_end, "new_era", era))

GLODAP <- GLODAP %>% 
  mutate(era = factor(era, c("JGOFS_WOCE", "GO_SHIP", "new_era")))

3.3 Spatial boundaries

3.3.1 Depth

Data above:

minimum sampling depth: 150m

were excluded from the fit in order to avoid the inclusion of seasonal biases.

# create observations grid and summary stats before remving data

GLODAP_obs_grid <- GLODAP %>% 
  count(lat, lon, era) %>% 
  mutate(cleaning_level = "all")

GLODAP_stats <- GLODAP %>% 
  summarise(all = n())

##

GLODAP <- GLODAP %>% 
  filter(depth >= parameters$depth_min)

3.3.2 Bottomdepth

Following restriction was considered, but is currently not implemented:

minimum bottom depth: 500m

GLODAP <- GLODAP %>% 
  filter(bottomdepth >= parameters$bottomdepth_min)

3.3.3 Basin mask

The basin mask from the World Ocean Atlas was used. For details consult the data base subsection for WOA18 data.

Please note that some GLODAP observations were made outside the WOA18 basin mask and will be removed for further analysis (eg North Sea, Sea of Japan).

GLODAP <- inner_join(GLODAP, basinmask)

##

GLODAP_stats_temp <- GLODAP %>% 
  summarise(spatial_boundaries = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>% 
  count(lat, lon, era) %>% 
  mutate(cleaning_level = "spatial_boundaries")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

GLODAP_obs <- GLODAP %>% 
  group_by(lat, lon) %>% 
  summarise(n = n()) %>% 
  ungroup()

map +
  geom_raster(data = basinmask, aes(lon, lat, fill = basin)) +
  geom_raster(data = GLODAP_obs, aes(lon, lat)) +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.position = "top",
        legend.title = element_blank())

rm(GLODAP_obs)

3.4 Flags and missing data

Only rows (samples) for which all relevant parameters are available were selected, ie NA’s were removed.

According to Olsen et al (2020), flags within the merged master file identify:

f:
- 2: Acceptable
- 0: Interpolated or calculated value (or Average of replicate, Manual chromatographic peak measurement)
- 9: Data not used (so, only NA data should have this flag)
qc:
- 1: Adjusted or unadjusted data
- 0: Data appear of good quality but have not been subjected to full secondary QC
- data with poor or uncertain quality are excluded.

Following flagging criteria were taken into account:

flag_f: 2, 0
flag_qc: 1

Summary statistics were calculated during cleaning process.

3.4.1 tco2

3.4.1.1 NA

The vast majority of rows is removed due to missing tco2 observations.

GLODAP <- GLODAP %>% 
  filter(!is.na(tco2))

##

GLODAP_stats_temp <- GLODAP %>% 
  summarise(tco2_values = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>% 
  count(lat, lon, era) %>% 
  mutate(cleaning_level = "tco2_values")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.1.2 f flag

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, tco2f)

map +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era ~ tco2f) +
  theme(legend.position = "top")

###

GLODAP <- GLODAP %>%
  filter(tco2f %in% parameters$flag_f)

# ##
# 
# GLODAP_stats_temp <- GLODAP %>%
#   summarise(tco2_f_flag = n())
# 
# GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
# rm(GLODAP_stats_temp)
# 
# ##
# 
# GLODAP_obs_grid_temp <- GLODAP %>%
#   count(lat, lon, era) %>%
#   mutate(cleaning_level = "tco2_f_flag")
# 
# GLODAP_obs_grid <-
#   bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)
# 
# rm(GLODAP_obs_grid_temp)

3.4.1.3 qc flag

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, tco2qc)

map +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era ~ tco2qc) +
  theme(legend.position = "top")

###

GLODAP <- GLODAP %>%
  filter(tco2qc %in% parameters$flag_qc)

##

GLODAP_stats_temp <- GLODAP %>%
  summarise(tco2_qc_flag = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era) %>%
  mutate(cleaning_level = "tco2_qc_flag")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.2 talk

3.4.2.1 NA

Quite a few tco2 observations are discarded, due to missing talk data, in particular in the JGOFS/WOCE era. However, there seems to be a high number of observations remaining from the affected cruises, so that the coverage does not seem to be reduced drastically.

GLODAP <- GLODAP %>% 
  mutate(talkna = if_else(is.na(talk), "NA", "Value"))

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, talkna)

map +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era ~ talkna) +
  theme(legend.position = "top")

GLODAP <- GLODAP %>% 
  select(-talkna) %>% 
  filter(!is.na(talk))

##

GLODAP_stats_temp <- GLODAP %>% 
  summarise(talk_values = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>% 
  count(lat, lon, era) %>% 
  mutate(cleaning_level = "talk_values")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.2.2 f flag

Restricting the f flag to 2 would results in data gaps in the south-east Pacific. Interpolated or calculated data (f flag 0) are therefore included.

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, talkf)

map +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era ~ talkf) +
  theme(legend.position = "top",
        legend.title = element_blank())

# ###

GLODAP <- GLODAP %>%
  filter(talkf %in% parameters$flag_f)

##
# 
# GLODAP_stats_temp <- GLODAP %>%
#   summarise(talk_f_flag = n())
# 
# GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
# rm(GLODAP_stats_temp)
# 
# ##
# 
# GLODAP_obs_grid_temp <- GLODAP %>%
#   count(lat, lon, era) %>%
#   mutate(cleaning_level = "talk_f_flag")
# 
# GLODAP_obs_grid <-
#   bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)
# 
# rm(GLODAP_obs_grid_temp)

3.4.2.3 qc flag

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, talkqc)

map +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era ~ talkqc) +
  theme(legend.position = "top",
        legend.title = element_blank())

###

GLODAP <- GLODAP %>%
  filter(talkqc %in% parameters$flag_qc)

##

GLODAP_stats_temp <- GLODAP %>%
  summarise(talk_qc_flag = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era) %>%
  mutate(cleaning_level = "talk_qc_flag")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.3 Phosphate

3.4.3.1 NA

Quite a few tco2/talk observations are discarded, due to missing phosphate data. However, there seems to be a high number of observations remaining from the affected cruises, so that the coverage does not seem to be reduced drastically.

GLODAP <- GLODAP %>% 
  mutate(phosphatena = if_else(is.na(phosphate), "NA", "Value"))

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, phosphatena)

map +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era ~ phosphatena) +
  theme(legend.position = "top")

GLODAP <- GLODAP %>% 
  select(-phosphatena) %>% 
  filter(!is.na(phosphate))

##

GLODAP_stats_temp <- GLODAP %>% 
  summarise(phosphate_values = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>% 
  count(lat, lon, era) %>% 
  mutate(cleaning_level = "phosphate_values")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.3.2 f flag

Restricting the f flag to 2 would not drastically reduce available data. Interpolated or calculated data (f flag 0) are therefore included.

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, phosphatef)

map +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
    scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era~phosphatef) +
  theme(legend.position = "top",
        legend.title = element_blank())

###

GLODAP <- GLODAP %>%
  filter(phosphatef %in% parameters$flag_f)

# ##
# 
# GLODAP_stats_temp <- GLODAP %>%
#   summarise(phosphate_f_flag = n())
# 
# GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
# rm(GLODAP_stats_temp)
# 
# ##
# 
# GLODAP_obs_grid_temp <- GLODAP %>%
#   count(lat, lon, era) %>%
#   mutate(cleaning_level = "phosphate_f_flag")
# 
# GLODAP_obs_grid <-
#   bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)
# 
# rm(GLODAP_obs_grid_temp)

3.4.3.3 qc flag

Phosphate data for which secondary quality was not applied (qc flag 0) were removed.

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era, phosphateqc)

map +
  geom_raster(data = GLODAP_obs_grid_temp, aes(lon, lat, fill = n)) +
    scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       trans = "log10") +
  facet_grid(era~phosphateqc) +
  theme(legend.position = "top",
        legend.title = element_blank())

###

GLODAP <- GLODAP %>%
  filter(phosphateqc %in% parameters$flag_qc)

##

GLODAP_stats_temp <- GLODAP %>%
  summarise(phosphate_qc_flag = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)
rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>%
  count(lat, lon, era) %>%
  mutate(cleaning_level = "phosphate_qc_flag")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.4 eMLR variables

Rows with missing eMLR variables were removed, only the qc flag was considered.

GLODAP <- GLODAP %>% 
  filter(!is.na(tem))

##
  
GLODAP <- GLODAP %>% 
  filter(!is.na(sal))

GLODAP <- GLODAP %>%
  filter(salinityf %in% parameters$flag_f)

GLODAP <- GLODAP %>%
  filter(salinityqc %in% parameters$flag_qc)

##
  
GLODAP <- GLODAP %>% 
  filter(!is.na(silicate))

GLODAP <- GLODAP %>%
  filter(silicatef %in% parameters$flag_f)

GLODAP <- GLODAP %>%
  filter(silicateqc %in% parameters$flag_qc)
##
  
GLODAP <- GLODAP %>% 
  filter(!is.na(oxygen))

GLODAP <- GLODAP %>%
  filter(oxygenf %in% parameters$flag_f)

GLODAP <- GLODAP %>%
  filter(oxygenqc %in% parameters$flag_qc)
##

GLODAP <- GLODAP %>% 
  filter(!is.na(aou))

GLODAP <- GLODAP %>%
  filter(aouf %in% parameters$flag_f)
##
  
GLODAP <- GLODAP %>% 
  filter(!is.na(nitrate))

GLODAP <- GLODAP %>%
  filter(nitratef %in% parameters$flag_f)

GLODAP <- GLODAP %>%
  filter(nitrateqc %in% parameters$flag_qc)
##

GLODAP <- GLODAP %>% 
  filter(!is.na(depth))

GLODAP <- GLODAP %>% 
  filter(!is.na(gamma))

##

GLODAP_stats_temp <- GLODAP %>% 
  summarise(eMLR_variables = n())

GLODAP_stats <- cbind(GLODAP_stats, GLODAP_stats_temp)

rm(GLODAP_stats_temp)

##

GLODAP_obs_grid_temp <- GLODAP %>% 
  count(lat, lon, era) %>% 
  mutate(cleaning_level = "eMLR_variables")

GLODAP_obs_grid <-
  bind_rows(GLODAP_obs_grid, GLODAP_obs_grid_temp)

rm(GLODAP_obs_grid_temp)

3.4.5 Manual adjustment A16 cruise

For harmonization with Gruber et al. (2019), cruises 1041 (A16N) and 1042 (A16S) were grouped into the GO_SHIP area despite taking place in 2013/14.

GLODAP_cruises <- GLODAP %>% 
  filter(basin == "Atlantic",
         year %in% c(2013, 2014)) %>% 
  count(lat, lon, cruise)

map +
  geom_raster(data = GLODAP_cruises, aes(lon, lat, fill = as.factor(cruise))) +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.position = "top",
        legend.title = element_blank())

rm(GLODAP_cruises)

GLODAP <- GLODAP %>%
  mutate(era = as.character(era)) %>% 
  mutate(era = if_else(cruise %in% c(1041, 1042),
                      "GO_SHIP", era))

GLODAP <- GLODAP %>% 
  mutate(era = factor(era, c("JGOFS_WOCE", "GO_SHIP", "new_era")))

3.5 Create clean observations grid

GLODAP_obs_grid_clean <- GLODAP %>% 
  count(lat, lon) %>% 
  select(-n)

3.6 Write summary file

GLODAP_stats_long <- GLODAP_stats %>%
  pivot_longer(1:length(GLODAP_stats),
               names_to = "parameter",
               values_to = "n")

GLODAP_stats_long  %>%  write_csv(
  here::here(
    "data/interim",
    "GLODAPv2.2020_stats.csv"
  )
)

##

GLODAP_obs_grid_clean %>% 
  write_csv(
    here::here(
      "data/interim",
      "GLODAPv2.2020_clean_obs_grid.csv"
    )
  )

##

GLODAP <- GLODAP %>% 
  select(year, date, era, basin, lat, lon, cruise,
         bottomdepth, depth, tem, sal, gamma,
         tco2, talk, phosphate,
         oxygen, aou, nitrate, silicate)

GLODAP  %>%  write_csv(
  here::here(
    "data/interim",
    "GLODAPv2.2020_clean.csv"
  )
)

4 Overview plots

4.1 Cleaning stats

Number of observations at various steps of data cleaning.

GLODAP_stats_long <- GLODAP_stats_long %>%
  mutate(parameter = fct_reorder(parameter, n))

GLODAP_stats_long %>% 
  ggplot(aes(parameter, n/1000)) +
  geom_col() +
  coord_flip() +
  labs(y = "n (1000)") +
  theme(axis.title.y = element_blank())

rm(GLODAP_stats_long)

4.2 Assign coarse spatial grid

For the following plots, the cleaned data set was re-opened and observations were gridded spatially to intervals of:

5° x 5°

GLODAP <- GLODAP %>% 
  mutate(lat_grid = cut(lat, seq(-90, 90, 5), seq(-87.5, 87.5, 5)),
         lat_grid = as.numeric(as.character(lat_grid)),
         lon_grid = cut(lon, seq(20, 380, 5), seq(22.5, 377.5, 5)),
         lon_grid = as.numeric(as.character(lon_grid)))

4.3 Histogram Zonal coverage

GLODAP_histogram_lat <- GLODAP %>%
  group_by(era, lat_grid, basin) %>%
  tally() %>%
  ungroup()

GLODAP_histogram_lat %>%
  ggplot(aes(lat_grid, n, fill = era)) +
  geom_col() +
  scale_fill_brewer(palette = "Dark2") +
  facet_wrap( ~ basin) +
  coord_flip(expand = 0) +
  theme(legend.position = "top",
        legend.title = element_blank())

rm(GLODAP_histogram_lat)

4.4 Histogram temporal coverage

GLODAP_histogram_year <- GLODAP %>%
  group_by(year, basin) %>%
  tally() %>%
  ungroup()

era_median_year <- GLODAP %>%
  group_by(era) %>%
  summarise(t_ref = median(year)) %>%
  ungroup()

GLODAP_histogram_year %>%
  ggplot() +
  geom_vline(xintercept = c(
    parameters$year_JGOFS_end + 0.5,
    parameters$year_GOSHIP_end + 0.5
  )) +
  geom_col(aes(year, n, fill = basin)) +
  geom_point(
    data = era_median_year,
    aes(t_ref, 0, shape = "Median year"),
    size = 2,
    fill = "white"
  ) +
  scale_fill_brewer(palette = "Dark2") +
  scale_shape_manual(values = 24, name = "") +
  scale_y_continuous() +
  coord_cartesian(expand = 0) +
  theme(
    legend.position = "top",
    legend.direction = "vertical",
    legend.title = element_blank(),
    axis.title.x = element_blank()
  )

rm(GLODAP_histogram_year,
   era_median_year)

4.5 Zonal temporal coverage (Hovmoeller)

GLODAP_hovmoeller_year <- GLODAP %>%
  group_by(year, lat_grid, basin) %>%
  tally() %>%
  ungroup()

GLODAP_hovmoeller_year %>%
  ggplot(aes(year, lat_grid, fill = log10(n))) +
  geom_tile() +
  geom_vline(xintercept = c(1999.5, 2012.5)) +
  scale_fill_viridis_c(option = "magma", direction = -1) +
  facet_wrap( ~ basin, ncol = 1) +
  theme(legend.position = "top",
        axis.title.x = element_blank())

rm(GLODAP_hovmoeller_year)

4.6 Coverage maps by era

4.6.1 Subsetting process

GLODAP_obs_grid <- GLODAP_obs_grid %>%
  mutate(era = factor(era, c("JGOFS_WOCE", "GO_SHIP", "new_era")),
         cleaning_level = factor(
           cleaning_level,
           unique(GLODAP_obs_grid$cleaning_level)
         ))

map +
  geom_raster(data = GLODAP_obs_grid %>%
                filter(cleaning_level == "all") %>% 
                select(-cleaning_level),
              aes(lon, lat, fill = "all")) +
  geom_raster(data = GLODAP_obs_grid %>%
                filter(cleaning_level != "all"),
              aes(lon, lat, fill = "subset")) +
  scale_fill_manual(values = c("grey", "red"), name = "") +
  facet_grid(cleaning_level ~ era) +
  theme(legend.position = "top",
        axis.title = element_blank())

4.6.2 eMLR variables

Grey pixels refer to sampling locations filtered for availability of tco2 data only.

GLODAP <- GLODAP %>% 
  mutate(era = factor(era, c("JGOFS_WOCE", "GO_SHIP", "new_era")))

GLODAP_tco2_grid <- GLODAP %>%
  count(lat, lon, era)

map +
  geom_raster(data = GLODAP_tco2_grid, aes(lon, lat), fill = "grey80") +
  geom_bin2d(data = GLODAP,
             aes(lon, lat),
             binwidth = c(1,1)) +
  scale_fill_viridis_c(option = "magma", direction = -1, trans = "log10",
                       name = "log10(eMLR_variables)") +
  facet_wrap(~era, ncol = 1) +
  theme(legend.position = "top",
        axis.title = element_blank())

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Germany.1252  LC_CTYPE=English_Germany.1252   
[3] LC_MONETARY=English_Germany.1252 LC_NUMERIC=C                    
[5] LC_TIME=English_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cutr_0.0.0.9000 patchwork_1.0.1 lubridate_1.7.9 forcats_0.5.0  
 [5] stringr_1.4.0   dplyr_1.0.0     purrr_0.3.4     readr_1.3.1    
 [9] tidyr_1.1.0     tibble_3.0.3    ggplot2_3.3.2   tidyverse_1.3.0
[13] workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5         here_0.1           assertthat_0.2.1   rprojroot_1.3-2   
 [5] digest_0.6.25      R6_2.4.1           cellranger_1.1.0   backports_1.1.8   
 [9] reprex_0.3.0       evaluate_0.14      httr_1.4.2         pillar_1.4.6      
[13] rlang_0.4.7        readxl_1.3.1       rstudioapi_0.11    whisker_0.4       
[17] blob_1.2.1         rmarkdown_2.3      labeling_0.3       munsell_0.5.0     
[21] broom_0.7.0        compiler_4.0.2     httpuv_1.5.4       modelr_0.1.8      
[25] xfun_0.16          pkgconfig_2.0.3    htmltools_0.5.0    tidyselect_1.1.0  
[29] viridisLite_0.3.0  fansi_0.4.1        crayon_1.3.4       dbplyr_1.4.4      
[33] withr_2.2.0        later_1.1.0.1      grid_4.0.2         jsonlite_1.7.0    
[37] gtable_0.3.0       lifecycle_0.2.0    DBI_1.1.0          git2r_0.27.1      
[41] magrittr_1.5       scales_1.1.1       cli_2.0.2          stringi_1.4.6     
[45] farver_2.0.3       fs_1.4.2           promises_1.1.1     xml2_1.3.2        
[49] ellipsis_0.3.1     generics_0.0.2     vctrs_0.3.2        RColorBrewer_1.1-2
[53] tools_4.0.2        glue_1.4.1         hms_0.5.3          yaml_2.2.1        
[57] colorspace_1.4-1   rvest_0.3.6        knitr_1.30         haven_2.3.1

GLODAPv2_2020

Jens Daniel Müller

17 November, 2020