Last updated: 2021-03-16

Checks: 7 0

Knit directory: emlr_obs_v_XXX/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20200707) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version a9930ea. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Unstaged changes:
    Modified:   code/Workflowr_project_managment.R
    Modified:   data/auxillary/params_local.rds

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/eMLR_model_fitting.Rmd) and HTML (docs/eMLR_model_fitting.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
html a1d52ff jens-daniel-mueller 2021-03-15 Build site.
html 0bade3b jens-daniel-mueller 2021-03-15 Build site.
html 27c1f4b jens-daniel-mueller 2021-03-14 Build site.
html af75ebf jens-daniel-mueller 2021-03-14 Build site.
html 5017709 jens-daniel-mueller 2021-03-11 Build site.
html 585b07f jens-daniel-mueller 2021-03-11 Build site.
html 6482ed7 jens-daniel-mueller 2021-03-11 Build site.
html 85a5ed2 jens-daniel-mueller 2021-03-10 Build site.
html 00688a1 jens-daniel-mueller 2021-03-05 Build site.
html 6c0bec6 jens-daniel-mueller 2021-03-05 Build site.
html 3c2ec33 jens-daniel-mueller 2021-03-05 Build site.
html af70b94 jens-daniel-mueller 2021-03-04 Build site.
Rmd c9cf1fd jens-daniel-mueller 2021-03-04 rebuild with NA in Cant replaced by 0
html 27ae473 jens-daniel-mueller 2021-02-24 Build site.
Rmd 7f77d91 jens-daniel-mueller 2021-02-24 removed log10 color scale
html fec3558 jens-daniel-mueller 2021-02-24 Build site.
Rmd 9ebedac jens-daniel-mueller 2021-02-24 latitudinal residual plots
html 4bc00ea jens-daniel-mueller 2021-02-24 Build site.
Rmd de11bfe jens-daniel-mueller 2021-02-24 clean up purrr approach and residual plots
html 42eca5d jens-daniel-mueller 2021-02-24 Build site.
Rmd 06a2f3b jens-daniel-mueller 2021-02-24 purrr residual plots by basin
html a1ba577 jens-daniel-mueller 2021-02-24 Build site.
Rmd 9ae7d87 jens-daniel-mueller 2021-02-24 loop residual plots by basin
html 071743d jens-daniel-mueller 2021-02-24 Build site.
Rmd c45672c jens-daniel-mueller 2021-02-24 added residual plots
html ac1a836 jens-daniel-mueller 2021-02-24 Build site.
Rmd 5f655e0 jens-daniel-mueller 2021-02-24 added plots back to after switching to map aproach
html b03fbd3 jens-daniel-mueller 2021-02-24 Build site.
Rmd c69736b jens-daniel-mueller 2021-02-24 added plots back to after switching to map aproach
html 86406d5 jens-daniel-mueller 2021-02-24 Build site.
Rmd 1b3c171 jens-daniel-mueller 2021-02-24 introduced purrr::map to model fitting, rebuild all
html 3d3b4cc jens-daniel-mueller 2021-02-23 Build site.
Rmd cbfc388 jens-daniel-mueller 2021-02-23 introduced purrr::map to model fitting
html 7b672f7 jens-daniel-mueller 2021-01-11 Build site.
html 33ba23c jens-daniel-mueller 2021-01-07 Build site.
Rmd 0ad30ba jens-daniel-mueller 2021-01-07 removed GLODAP gamma filter, target variable mapped by eras+era
html 318609d jens-daniel-mueller 2020-12-23 adapted more variable predictor selection
html 9d0b2d0 jens-daniel-mueller 2020-12-23 Build site.
html 0aa2b50 jens-daniel-mueller 2020-12-23 remove html before duplication
html 39113c3 jens-daniel-mueller 2020-12-23 Build site.
Rmd bef9220 jens-daniel-mueller 2020-12-23 rebuild before sensitivity test
html 2886da0 jens-daniel-mueller 2020-12-19 Build site.
html 02f0ee9 jens-daniel-mueller 2020-12-18 cleaned up for copying template
html 965dba3 jens-daniel-mueller 2020-12-18 Build site.
html 5d452fe jens-daniel-mueller 2020-12-18 Build site.
Rmd ca65bf5 jens-daniel-mueller 2020-12-18 rebuild after final cleaning
html 7bcb4eb jens-daniel-mueller 2020-12-18 Build site.
html d397028 jens-daniel-mueller 2020-12-18 Build site.
Rmd 7e1b1c0 jens-daniel-mueller 2020-12-18 rebuild without na predictors
html 7131186 jens-daniel-mueller 2020-12-17 Build site.
Rmd 737d904 jens-daniel-mueller 2020-12-17 rebuild without na predictors
html 22b07fb jens-daniel-mueller 2020-12-17 Build site.
html a84ff3c jens-daniel-mueller 2020-12-17 Build site.
Rmd 40369db jens-daniel-mueller 2020-12-17 model selection criterion added
html 5b48ef5 jens-daniel-mueller 2020-12-17 Build site.
Rmd e6ed2bc jens-daniel-mueller 2020-12-17 plotted model results
html f3a708f jens-daniel-mueller 2020-12-17 Build site.
Rmd 7c8ace9 jens-daniel-mueller 2020-12-17 new MLR fitting routine, rmse corrected
html e4ca289 jens-daniel-mueller 2020-12-16 Build site.
Rmd 3d5a3e2 jens-daniel-mueller 2020-12-16 new MLR fitting routine
html 158fe26 jens-daniel-mueller 2020-12-15 Build site.
html 7a9a4cb jens-daniel-mueller 2020-12-15 Build site.
html 61b263c jens-daniel-mueller 2020-12-15 Build site.
html 4d612dd jens-daniel-mueller 2020-12-15 Build site.
html e91cebd jens-daniel-mueller 2020-12-15 Build site.
Rmd d7992c4 jens-daniel-mueller 2020-12-15 eMLR target variable selection
html 953caf3 jens-daniel-mueller 2020-12-15 Build site.
html 42daf5c jens-daniel-mueller 2020-12-14 Build site.
Rmd 923aa7f jens-daniel-mueller 2020-12-14 rebuild with new path and auto folder creation
html 984697e jens-daniel-mueller 2020-12-12 Build site.
html 3ebff89 jens-daniel-mueller 2020-12-12 Build site.
html ba112d3 jens-daniel-mueller 2020-12-11 Build site.
Rmd 91b2b29 jens-daniel-mueller 2020-12-11 selectable basinmask, try 5
html b01a367 jens-daniel-mueller 2020-12-09 Build site.
Rmd 71c63b0 jens-daniel-mueller 2020-12-09 rerun with variable predictor assignment
html 24a632f jens-daniel-mueller 2020-12-07 Build site.
html 92dca91 jens-daniel-mueller 2020-12-07 Build site.
html 6a8004b jens-daniel-mueller 2020-12-07 Build site.
html 70bf1a5 jens-daniel-mueller 2020-12-07 Build site.
html 7555355 jens-daniel-mueller 2020-12-07 Build site.
html 143d6fa jens-daniel-mueller 2020-12-07 Build site.
html abc6818 jens-daniel-mueller 2020-12-03 Build site.
Rmd 992ba15 jens-daniel-mueller 2020-12-03 rebuild with variable inventory depth
html c8c2e7b jens-daniel-mueller 2020-12-03 Build site.
Rmd 83203db jens-daniel-mueller 2020-12-03 calculate cant with variable inventory depth
html 090e4d5 jens-daniel-mueller 2020-12-02 Build site.
html 7c25f7a jens-daniel-mueller 2020-12-02 Build site.
html ec8dc38 jens-daniel-mueller 2020-12-02 Build site.
html c987de1 jens-daniel-mueller 2020-12-02 Build site.
html f8358f8 jens-daniel-mueller 2020-12-02 Build site.
html b03ddb8 jens-daniel-mueller 2020-12-02 Build site.
Rmd 9183e8f jens-daniel-mueller 2020-12-02 revised assignment of era to eras
html 22d0127 jens-daniel-mueller 2020-12-01 Build site.
html 0ff728b jens-daniel-mueller 2020-12-01 Build site.
html 91435ae jens-daniel-mueller 2020-12-01 Build site.
Rmd 17d09be jens-daniel-mueller 2020-12-01 auto eras naming
html cf19652 jens-daniel-mueller 2020-11-30 Build site.
Rmd 0895ad6 jens-daniel-mueller 2020-11-30 rebuild with all plot output
Rmd 2842970 jens-daniel-mueller 2020-11-30 cleaned for eMLR part only
html 196be51 jens-daniel-mueller 2020-11-30 Build site.
Rmd 7a4b015 jens-daniel-mueller 2020-11-30 first rebuild on ETH server
Rmd bc61ce3 Jens Müller 2020-11-30 Initial commit

1 Required data

Required are:

  • cleaned and prepared GLODAPv2_2020 file
GLODAP <-
  read_csv(paste(path_version_data,
                 "GLODAPv2.2020_MLR_fitting_ready.csv",
                 sep = ""))

2 Predictor combinations

Find all possible combinations of following considered predictor variables:

  • sal, temp, aou, nitrate, silicate, phosphate, phosphate_star
# the following code is a workaround to find all predictor combinations
# using the olsrr package and fit all models for one era, slab, and basin

i_basin <- unique(GLODAP$basin)[1]
i_era   <- unique(GLODAP$era)[1]

# subset one basin and era for fitting
GLODAP_basin_era <- GLODAP %>%
  filter(basin == i_basin, era == i_era)

i_gamma_slab <- unique(GLODAP_basin_era$gamma_slab)[1]
print(i_gamma_slab)

# subset one gamma slab
GLODAP_basin_era_slab <- GLODAP_basin_era %>%
  filter(gamma_slab == i_gamma_slab)

# fit the full linear model, i.e. all predictor combinations
lm_full <- lm(paste(
  params_local$MLR_target,
  paste(params_local$MLR_predictors, collapse = " + "),
  sep = " ~ "
),
data = GLODAP_basin_era_slab)

# fit linear models for all possible predictor combinations
# unfortunately, this functions does not provide model coefficients (yet)
lm_all <- ols_step_all_possible(lm_full)

# convert to tibble
lm_all <- as_tibble(lm_all)

# format model formula
lm_all <- lm_all %>% 
  select(n, predictors) %>% 
  mutate(model = str_replace_all(predictors, " ", " + "),
         model = paste(params_local$MLR_target, "~", model))

# remove helper objects
rm(i_gamma_slab,
   i_era,
   i_basin,
   GLODAP_basin_era,
   GLODAP_basin_era_slab,
   lm_full)

3 Apply predictor threshold

Select combinations with a total number of predictors in the range:

  • Minimum: 2
  • Maximum: 5
lm_all <- lm_all %>% 
  filter(n >= params_local$MLR_predictors_min,
         n <= params_local$MLR_predictors_max)

This results in a total number of MLR models of:

  • 112

4 Fit all models

Individual linear regression models were fitted for the chosen target variable:

  • cstar_tref

as a function of each predictor combination. Fitting was performed separately within each basin, era, and slab. Model diagnostics, such as the root mean squared error (RMSE), were calculated for each fitted model.

# GLODAP_NA <- GLODAP %>% 
#   filter_all(any_vars(is.na(.)))

# prepare nested data frame

GLODAP %>% 
  # filter(basin %in% unique(GLODAP$basin)[1],
  #        era %in% unique(GLODAP$era)[c(1,2)],
  #        gamma_slab %in% unique(GLODAP$gamma_slab)[c(5,6)]) %>%
  filter_all(any_vars(is.na(.)))

GLODAP_nested <- GLODAP %>% 
  # filter(basin %in% unique(GLODAP$basin)[1],
  #        era %in% unique(GLODAP$era)[c(1,2)],
  #        gamma_slab %in% unique(GLODAP$gamma_slab)[c(5,6)]) %>%
  drop_na() %>% 
  group_by(gamma_slab, era, basin) %>% 
  nest()

# expand with model definitions

GLODAP_nested_lm <- expand_grid(
  GLODAP_nested,
  lm_all
)

# fit models and extract tidy model output

GLODAP_nested_lm_fit <- GLODAP_nested_lm %>% 
  mutate(
    fit = map2(.x = data, .y = model, 
               ~ lm(as.formula(.y), data = .x)),
    tidied = map(fit, tidy),
    glanced = map(fit, glance),
    augmented = map(fit, augment)
  )

# print(object.size(GLODAP_nested), units = "MB")
# print(object.size(GLODAP_nested_lm), units = "MB")
# print(object.size(GLODAP_nested_lm_fit), units = "MB")

5 Tidy models

# extract glanced model output (model diagnostics, such as AIC)
GLODAP_glanced <- GLODAP_nested_lm_fit %>%
  select(-c(data, fit, tidied, augmented)) %>%
  unnest(glanced) %>% 
  rename(n_predictors = n)

# extract tidy model output (model coefficients)
GLODAP_tidy <- GLODAP_nested_lm_fit %>%
  select(-c(data, fit, glanced, augmented)) %>%
  unnest(tidied)

# extract augmented model output (fitted values and residuals)
GLODAP_augmented <- GLODAP_nested_lm_fit %>% 
  select(-c(data, fit, tidied, glanced)) %>% 
  unnest(augmented) 

# print(object.size(GLODAP_augmented), units = "MB")

# calculate RMSE from augmented output
GLODAP_glanced_rmse <- GLODAP_augmented %>%
  group_by(gamma_slab, era, basin, model) %>%
  summarise(rmse = sqrt(c(crossprod(.resid)) / length(.resid))) %>%
  ungroup()

# add RMSE to glanced output
GLODAP_glanced <- full_join(GLODAP_glanced, GLODAP_glanced_rmse)
rm(GLODAP_glanced_rmse)


# extract input data
GLODAP_data <- GLODAP_nested_lm_fit %>%
  select(-c(fit, tidied, glanced, augmented)) %>%
  unnest(data)

# append input data with augmented data
GLODAP_augmented <- bind_cols(
  GLODAP_data,
  GLODAP_augmented %>% select(.fitted, .resid)
)

rm(GLODAP, GLODAP_nested, GLODAP_nested_lm, GLODAP_nested_lm_fit, lm_all,
   GLODAP_data)

6 Prepare coeffcients

Coefficients are prepared for the mapping of Cant and the chosen target variable.

6.1 Predictor selection

Within each basin and slab, the following number of best linear regression models was selected:

  • 10

The criterion used to select the best models was:

  • rmse

The criterion was summed up for two adjacent eras, and the models with lowest summed values were selected.

# calculate RMSE sum for adjacent eras
lm_all_eras <- GLODAP_glanced  %>%
  select(basin, gamma_slab, model, era, AIC, rmse) %>% 
  arrange(era) %>% 
  group_by(basin, gamma_slab, model) %>% 
  mutate(eras = paste(lag(era), era, sep = " --> "),
         rmse_sum = rmse + lag(rmse),
         aic_sum = AIC + lag(AIC)
         ) %>% 
  ungroup() %>% 
  select(-c(era)) %>% 
  drop_na()

# subset models with lowest summed criterion
# chose which criterion is applied

if (params_local$MLR_criterion == "aic") {
  lm_best_eras <- lm_all_eras %>%
    group_by(basin, gamma_slab, eras) %>%
    slice_min(order_by = aic_sum,
              with_ties = FALSE,
              n = params_local$MLR_number) %>%
    ungroup() %>%
    arrange(basin, gamma_slab, eras, model)
} else {
  lm_best_eras <- lm_all_eras %>%
    group_by(basin, gamma_slab, eras) %>%
    slice_min(order_by = rmse_sum,
              with_ties = FALSE,
              n = params_local$MLR_number) %>%
    ungroup() %>%
    arrange(basin, gamma_slab, eras, model)
}


# print table
lm_best_eras %>% 
  kable() %>%
  add_header_above() %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "400px")
basin gamma_slab model AIC rmse eras rmse_sum aic_sum
Atlantic (-Inf,26] cstar_tref ~ sal + aou + nitrate + phosphate + phosphate_star 267.5869 4.617007 1982-1999 –> 2000-2010 10.432763 834.8215
Atlantic (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate 267.5626 4.724301 1982-1999 –> 2000-2010 11.166004 850.5839
Atlantic (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate + phosphate 269.5573 4.724009 1982-1999 –> 2000-2010 10.518846 836.1649
Atlantic (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate + phosphate_star 269.5441 4.723284 1982-1999 –> 2000-2010 10.514506 836.0431
Atlantic (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate + silicate 267.7446 4.625478 1982-1999 –> 2000-2010 11.030375 851.7688
Atlantic (-Inf,26] cstar_tref ~ sal + temp + nitrate + phosphate + phosphate_star 270.6510 4.784469 1982-1999 –> 2000-2010 10.566577 836.8760
Atlantic (-Inf,26] cstar_tref ~ temp + aou + nitrate + phosphate + phosphate_star 278.1825 5.222368 1982-1999 –> 2000-2010 11.011219 844.6103
Atlantic (-Inf,26] cstar_tref ~ temp + aou + nitrate + silicate + phosphate 277.4918 5.180593 1982-1999 –> 2000-2010 11.131456 848.7224
Atlantic (-Inf,26] cstar_tref ~ temp + aou + nitrate + silicate + phosphate_star 277.4106 5.175705 1982-1999 –> 2000-2010 11.106395 848.0504
Atlantic (-Inf,26] cstar_tref ~ temp + nitrate + silicate + phosphate + phosphate_star 277.2371 5.165272 1982-1999 –> 2000-2010 11.058904 846.7862
Atlantic (-Inf,26] cstar_tref ~ sal + aou + nitrate + phosphate + phosphate_star 413.7789 3.038329 2000-2010 –> 2011-2019 7.655337 681.3659
Atlantic (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate + phosphate 411.9823 3.003976 2000-2010 –> 2011-2019 7.727984 681.5396
Atlantic (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate + phosphate_star 411.8147 3.000791 2000-2010 –> 2011-2019 7.724075 681.3588
Atlantic (-Inf,26] cstar_tref ~ sal + temp + nitrate + phosphate + phosphate_star 411.4512 2.993895 2000-2010 –> 2011-2019 7.778364 682.1021
Atlantic (-Inf,26] cstar_tref ~ temp + aou + nitrate + phosphate + phosphate_star 411.1104 2.987445 2000-2010 –> 2011-2019 8.209813 689.2929
Atlantic (-Inf,26] cstar_tref ~ temp + aou + nitrate + phosphate_star 421.9721 3.240804 2000-2010 –> 2011-2019 8.479657 698.4256
Atlantic (-Inf,26] cstar_tref ~ temp + aou + nitrate + silicate + phosphate 424.4422 3.250461 2000-2010 –> 2011-2019 8.431054 701.9340
Atlantic (-Inf,26] cstar_tref ~ temp + aou + nitrate + silicate + phosphate_star 423.6359 3.233916 2000-2010 –> 2011-2019 8.409622 701.0465
Atlantic (-Inf,26] cstar_tref ~ temp + nitrate + phosphate + phosphate_star 419.8646 3.197865 2000-2010 –> 2011-2019 8.424452 696.1166
Atlantic (-Inf,26] cstar_tref ~ temp + nitrate + silicate + phosphate + phosphate_star 421.6980 3.194494 2000-2010 –> 2011-2019 8.359765 698.9351
Atlantic (26,28.2] cstar_tref ~ sal + temp + aou + nitrate + phosphate_star 120580.0635 6.785221 1982-1999 –> 2000-2010 13.268167 245884.3478
Atlantic (26,28.2] cstar_tref ~ sal + temp + aou + phosphate + phosphate_star 120420.9548 6.755435 1982-1999 –> 2000-2010 13.441361 246899.9695
Atlantic (26,28.2] cstar_tref ~ sal + temp + nitrate + phosphate + phosphate_star 121989.6994 7.054908 1982-1999 –> 2000-2010 13.675127 248092.3827
Atlantic (26,28.2] cstar_tref ~ temp + aou + nitrate + phosphate + phosphate_star 121374.2404 6.935867 1982-1999 –> 2000-2010 13.425131 246715.6369
Atlantic (26,28.2] cstar_tref ~ temp + aou + nitrate + phosphate_star 121510.0911 6.962354 1982-1999 –> 2000-2010 13.465526 246931.0655
Atlantic (26,28.2] cstar_tref ~ temp + aou + nitrate + silicate + phosphate_star 120062.2924 6.688772 1982-1999 –> 2000-2010 13.040116 244585.1150
Atlantic (26,28.2] cstar_tref ~ temp + aou + phosphate + phosphate_star 122644.7471 7.184250 1982-1999 –> 2000-2010 14.000637 249858.1158
Atlantic (26,28.2] cstar_tref ~ temp + aou + silicate + phosphate + phosphate_star 119221.9222 6.535140 1982-1999 –> 2000-2010 13.008782 244471.4741
Atlantic (26,28.2] cstar_tref ~ temp + nitrate + phosphate + phosphate_star 122328.5998 7.121722 1982-1999 –> 2000-2010 13.741999 248429.6194
Atlantic (26,28.2] cstar_tref ~ temp + nitrate + silicate + phosphate + phosphate_star 121953.2421 7.047800 1982-1999 –> 2000-2010 13.634594 247863.0572
Atlantic (26,28.2] cstar_tref ~ sal + temp + aou + nitrate + phosphate_star 77347.1983 7.201985 2000-2010 –> 2011-2019 13.987206 197927.2618
Atlantic (26,28.2] cstar_tref ~ sal + temp + aou + phosphate + phosphate_star 77388.8688 7.215166 2000-2010 –> 2011-2019 13.970601 197809.8236
Atlantic (26,28.2] cstar_tref ~ sal + temp + nitrate + phosphate + phosphate_star 77651.6470 7.298841 2000-2010 –> 2011-2019 14.353749 199641.3464
Atlantic (26,28.2] cstar_tref ~ temp + aou + nitrate + phosphate + phosphate_star 77441.1793 7.231746 2000-2010 –> 2011-2019 14.167613 198815.4197
Atlantic (26,28.2] cstar_tref ~ temp + aou + nitrate + phosphate_star 77446.8829 7.234191 2000-2010 –> 2011-2019 14.196545 198956.9740
Atlantic (26,28.2] cstar_tref ~ temp + aou + nitrate + silicate + phosphate_star 76391.2177 6.906129 2000-2010 –> 2011-2019 13.594901 196453.5101
Atlantic (26,28.2] cstar_tref ~ temp + aou + phosphate + phosphate_star 77778.7019 7.340290 2000-2010 –> 2011-2019 14.524540 200423.4490
Atlantic (26,28.2] cstar_tref ~ temp + aou + silicate + phosphate + phosphate_star 76056.8300 6.805538 2000-2010 –> 2011-2019 13.340679 195278.7522
Atlantic (26,28.2] cstar_tref ~ temp + nitrate + phosphate + phosphate_star 77653.2335 7.299990 2000-2010 –> 2011-2019 14.421712 199981.8333
Atlantic (26,28.2] cstar_tref ~ temp + nitrate + silicate + phosphate + phosphate_star 77272.7160 7.178486 2000-2010 –> 2011-2019 14.226286 199225.9581
Atlantic (28.2, Inf] cstar_tref ~ sal + aou + nitrate + phosphate + phosphate_star 10519.6366 3.157172 1982-1999 –> 2000-2010 6.292520 16389.6336
Atlantic (28.2, Inf] cstar_tref ~ sal + aou + phosphate + phosphate_star 10521.1730 3.159903 1982-1999 –> 2000-2010 6.295579 16389.4096
Atlantic (28.2, Inf] cstar_tref ~ sal + aou + silicate + phosphate + phosphate_star 10522.9117 3.159701 1982-1999 –> 2000-2010 6.238952 16351.6380
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + aou + nitrate + phosphate 10545.4220 3.177140 1982-1999 –> 2000-2010 6.321540 16422.0099
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + aou + phosphate 10549.6632 3.181991 1982-1999 –> 2000-2010 6.326604 16424.4054
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + aou + phosphate + phosphate_star 10500.6954 3.142585 1982-1999 –> 2000-2010 6.276173 16369.4092
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + aou + silicate + phosphate 10548.4942 3.179527 1982-1999 –> 2000-2010 6.232485 16357.6173
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + aou + silicate + phosphate_star 10500.0234 3.142069 1982-1999 –> 2000-2010 6.221305 16328.7391
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + silicate + phosphate + phosphate_star 10509.9349 3.149692 1982-1999 –> 2000-2010 6.213216 16326.9559
Atlantic (28.2, Inf] cstar_tref ~ temp + aou + silicate + phosphate + phosphate_star 10635.7966 3.248124 1982-1999 –> 2000-2010 6.335079 16470.2351
Atlantic (28.2, Inf] cstar_tref ~ sal + aou + silicate + phosphate + phosphate_star 4779.0256 3.161966 2000-2010 –> 2011-2019 6.321667 15301.9372
Atlantic (28.2, Inf] cstar_tref ~ sal + nitrate + silicate + phosphate + phosphate_star 4778.0811 3.160355 2000-2010 –> 2011-2019 6.322880 15304.6458
Atlantic (28.2, Inf] cstar_tref ~ sal + silicate + phosphate + phosphate_star 4777.3814 3.162573 2000-2010 –> 2011-2019 6.327534 15305.0955
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + aou + nitrate + phosphate 4781.1544 3.165599 2000-2010 –> 2011-2019 6.342738 15326.5764
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + aou + phosphate + phosphate_star 4786.4697 3.174687 2000-2010 –> 2011-2019 6.317272 15287.1651
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + aou + silicate + phosphate 4762.7300 3.134296 2000-2010 –> 2011-2019 6.313823 15311.2242
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + aou + silicate + phosphate_star 4765.1005 3.138306 2000-2010 –> 2011-2019 6.280374 15265.1239
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + nitrate + phosphate + phosphate_star 4794.9762 3.189287 2000-2010 –> 2011-2019 6.337626 15303.1537
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + phosphate + phosphate_star 4803.1111 3.206769 2000-2010 –> 2011-2019 6.356463 15311.0487
Atlantic (28.2, Inf] cstar_tref ~ sal + temp + silicate + phosphate + phosphate_star 4760.5381 3.130592 2000-2010 –> 2011-2019 6.280285 15270.4730
Indo-Pacific (-Inf,26] cstar_tref ~ sal + aou + nitrate + phosphate + phosphate_star 22961.4289 6.046591 1982-1999 –> 2000-2010 13.902919 31077.4692
Indo-Pacific (-Inf,26] cstar_tref ~ sal + aou + silicate + phosphate + phosphate_star 23304.7553 6.344873 1982-1999 –> 2000-2010 14.371333 31470.6712
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate 23126.3147 6.189788 1982-1999 –> 2000-2010 14.037118 31237.6872
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate + phosphate 22976.8754 6.059705 1982-1999 –> 2000-2010 13.818981 31063.9780
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate + phosphate_star 22986.7044 6.068064 1982-1999 –> 2000-2010 13.823982 31072.7993
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate + silicate 23125.4654 6.187315 1982-1999 –> 2000-2010 13.996297 31227.4337
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + aou + silicate + phosphate 23409.4149 6.438695 1982-1999 –> 2000-2010 14.357436 31543.8765
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + aou + silicate + phosphate_star 23448.2206 6.473833 1982-1999 –> 2000-2010 14.381782 31579.5071
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + nitrate + phosphate + phosphate_star 23036.5127 6.110602 1982-1999 –> 2000-2010 13.857692 31119.9560
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + silicate + phosphate + phosphate_star 23464.3603 6.488504 1982-1999 –> 2000-2010 14.394854 31595.1761
Indo-Pacific (-Inf,26] cstar_tref ~ sal + aou + nitrate + phosphate + phosphate_star 22436.4709 4.606690 2000-2010 –> 2011-2019 10.653282 45397.8998
Indo-Pacific (-Inf,26] cstar_tref ~ sal + aou + silicate + phosphate + phosphate_star 23616.5043 5.379380 2000-2010 –> 2011-2019 11.724253 46921.2596
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate 22928.4101 4.915612 2000-2010 –> 2011-2019 11.105400 46054.7248
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate + phosphate 22822.1074 4.846150 2000-2010 –> 2011-2019 10.905855 45798.9827
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate + phosphate_star 22850.6972 4.864391 2000-2010 –> 2011-2019 10.932455 45837.4015
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + aou + nitrate + silicate 22924.6377 4.911885 2000-2010 –> 2011-2019 11.099200 46050.1030
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + aou + silicate + phosphate 23924.4800 5.601548 2000-2010 –> 2011-2019 12.040243 47333.8949
Indo-Pacific (-Inf,26] cstar_tref ~ sal + temp + nitrate + phosphate + phosphate_star 23042.7490 4.988715 2000-2010 –> 2011-2019 11.099317 46079.2617
Indo-Pacific (-Inf,26] cstar_tref ~ temp + aou + nitrate + phosphate + phosphate_star 23657.5175 5.408450 2000-2010 –> 2011-2019 12.096528 47337.8775
Indo-Pacific (-Inf,26] cstar_tref ~ temp + nitrate + silicate + phosphate + phosphate_star 23586.4937 5.358208 2000-2010 –> 2011-2019 12.139975 47366.0404
Indo-Pacific (26,28.1] cstar_tref ~ aou + nitrate + silicate + phosphate + phosphate_star 320662.5479 5.544253 1982-1999 –> 2000-2010 11.041629 497949.9078
Indo-Pacific (26,28.1] cstar_tref ~ sal + aou + silicate + phosphate + phosphate_star 318684.9601 5.438195 1982-1999 –> 2000-2010 10.912104 495729.5031
Indo-Pacific (26,28.1] cstar_tref ~ sal + temp + aou + silicate + phosphate 318036.3068 5.403852 1982-1999 –> 2000-2010 10.980818 496139.5380
Indo-Pacific (26,28.1] cstar_tref ~ sal + temp + aou + silicate + phosphate_star 318356.4538 5.420775 1982-1999 –> 2000-2010 11.202949 498510.6970
Indo-Pacific (26,28.1] cstar_tref ~ sal + temp + nitrate + silicate + phosphate_star 316662.1484 5.331811 1982-1999 –> 2000-2010 11.008306 495769.4139
Indo-Pacific (26,28.1] cstar_tref ~ sal + temp + silicate + phosphate + phosphate_star 318275.5559 5.416494 1982-1999 –> 2000-2010 11.180084 498247.0792
Indo-Pacific (26,28.1] cstar_tref ~ temp + aou + nitrate + silicate + phosphate 318948.8510 5.452230 1982-1999 –> 2000-2010 11.028163 497041.5765
Indo-Pacific (26,28.1] cstar_tref ~ temp + aou + nitrate + silicate + phosphate_star 318518.4088 5.429356 1982-1999 –> 2000-2010 11.159719 498161.7656
Indo-Pacific (26,28.1] cstar_tref ~ temp + aou + silicate + phosphate + phosphate_star 320978.3438 5.561380 1982-1999 –> 2000-2010 11.136750 499065.3388
Indo-Pacific (26,28.1] cstar_tref ~ temp + nitrate + silicate + phosphate + phosphate_star 318529.6812 5.429954 1982-1999 –> 2000-2010 11.179025 498358.0383
Indo-Pacific (26,28.1] cstar_tref ~ sal + aou + silicate + phosphate + phosphate_star 231342.5291 4.644461 2000-2010 –> 2011-2019 10.082656 550027.4892
Indo-Pacific (26,28.1] cstar_tref ~ sal + temp + aou + silicate + phosphate 231041.9593 4.626665 2000-2010 –> 2011-2019 10.030517 549078.2661
Indo-Pacific (26,28.1] cstar_tref ~ sal + temp + aou + silicate + phosphate_star 231466.1909 4.651802 2000-2010 –> 2011-2019 10.072577 549822.6448
Indo-Pacific (26,28.1] cstar_tref ~ sal + temp + nitrate + silicate + phosphate_star 230764.8243 4.610317 2000-2010 –> 2011-2019 9.942128 547426.9727
Indo-Pacific (26,28.1] cstar_tref ~ sal + temp + silicate + phosphate + phosphate_star 231407.5854 4.648321 2000-2010 –> 2011-2019 10.064815 549683.1413
Indo-Pacific (26,28.1] cstar_tref ~ sal + temp + silicate + phosphate_star 234070.4204 4.809256 2000-2010 –> 2011-2019 10.490335 557227.1042
Indo-Pacific (26,28.1] cstar_tref ~ temp + aou + nitrate + silicate + phosphate 237194.9602 5.004936 2000-2010 –> 2011-2019 10.457166 556143.8111
Indo-Pacific (26,28.1] cstar_tref ~ temp + aou + nitrate + silicate + phosphate_star 236981.8466 4.991331 2000-2010 –> 2011-2019 10.420688 555500.2554
Indo-Pacific (26,28.1] cstar_tref ~ temp + nitrate + silicate + phosphate + phosphate_star 236998.8030 4.992412 2000-2010 –> 2011-2019 10.422367 555528.4842
Indo-Pacific (26,28.1] cstar_tref ~ temp + nitrate + silicate + phosphate_star 238089.7761 5.062595 2000-2010 –> 2011-2019 10.545236 557606.1465
Indo-Pacific (28.1, Inf] cstar_tref ~ aou + nitrate + silicate + phosphate + phosphate_star 39943.1941 3.095167 1982-1999 –> 2000-2010 6.971113 79510.5578
Indo-Pacific (28.1, Inf] cstar_tref ~ aou + silicate + phosphate + phosphate_star 40345.3847 3.176063 1982-1999 –> 2000-2010 7.067894 79969.0720
Indo-Pacific (28.1, Inf] cstar_tref ~ sal + aou + silicate + phosphate + phosphate_star 40246.0015 3.155575 1982-1999 –> 2000-2010 7.042739 79854.5771
Indo-Pacific (28.1, Inf] cstar_tref ~ sal + temp + aou + nitrate + phosphate 41607.2888 3.442043 1982-1999 –> 2000-2010 7.244918 80903.2538
Indo-Pacific (28.1, Inf] cstar_tref ~ sal + temp + aou + phosphate + phosphate_star 41894.6797 3.505770 1982-1999 –> 2000-2010 7.317537 81223.9484
Indo-Pacific (28.1, Inf] cstar_tref ~ temp + aou + nitrate + phosphate 41672.9863 3.456949 1982-1999 –> 2000-2010 7.271595 81011.0190
Indo-Pacific (28.1, Inf] cstar_tref ~ temp + aou + nitrate + phosphate + phosphate_star 41468.4418 3.411671 1982-1999 –> 2000-2010 7.223754 80798.8924
Indo-Pacific (28.1, Inf] cstar_tref ~ temp + aou + nitrate + silicate + phosphate 41094.0597 3.331107 1982-1999 –> 2000-2010 7.132780 80385.5160
Indo-Pacific (28.1, Inf] cstar_tref ~ temp + aou + silicate + phosphate + phosphate_star 40145.4433 3.135385 1982-1999 –> 2000-2010 6.944539 79464.9336
Indo-Pacific (28.1, Inf] cstar_tref ~ temp + nitrate + silicate + phosphate + phosphate_star 41556.4978 3.430902 1982-1999 –> 2000-2010 7.294899 81079.8330
Indo-Pacific (28.1, Inf] cstar_tref ~ aou + nitrate + silicate + phosphate + phosphate_star 30183.5032 2.692261 2000-2010 –> 2011-2019 5.787428 70126.6973
Indo-Pacific (28.1, Inf] cstar_tref ~ aou + silicate + phosphate + phosphate_star 30487.3156 2.758821 2000-2010 –> 2011-2019 5.934884 70832.7003
Indo-Pacific (28.1, Inf] cstar_tref ~ sal + aou + nitrate + phosphate + phosphate_star 29925.6504 2.637389 2000-2010 –> 2011-2019 6.034616 71327.6277
Indo-Pacific (28.1, Inf] cstar_tref ~ sal + aou + nitrate + silicate + phosphate 30061.0722 2.666066 2000-2010 –> 2011-2019 6.133257 71782.3999
Indo-Pacific (28.1, Inf] cstar_tref ~ sal + aou + phosphate + phosphate_star 29983.4557 2.650015 2000-2010 –> 2011-2019 6.156263 71878.2696
Indo-Pacific (28.1, Inf] cstar_tref ~ sal + aou + silicate + phosphate + phosphate_star 29941.3925 2.640706 2000-2010 –> 2011-2019 5.796282 70187.3941
Indo-Pacific (28.1, Inf] cstar_tref ~ sal + temp + aou + nitrate + phosphate 30059.8164 2.665799 2000-2010 –> 2011-2019 6.107842 71667.1052
Indo-Pacific (28.1, Inf] cstar_tref ~ sal + temp + aou + phosphate + phosphate_star 29843.6875 2.620182 2000-2010 –> 2011-2019 6.125952 71738.3672
Indo-Pacific (28.1, Inf] cstar_tref ~ temp + aou + nitrate + silicate + phosphate 30792.1734 2.826359 2000-2010 –> 2011-2019 6.157466 71886.2331
Indo-Pacific (28.1, Inf] cstar_tref ~ temp + aou + silicate + phosphate + phosphate_star 30136.5138 2.682177 2000-2010 –> 2011-2019 5.817562 70281.9571

6.2 Target variable coefficients

A data frame to map the target variable is prepared.

# create table with two era belonging to one eras
eras_forward <- GLODAP_glanced %>%
  arrange(era) %>% 
  group_by(basin, gamma_slab, model) %>% 
  mutate(eras = paste(era, lead(era), sep = " --> ")) %>% 
  ungroup() %>% 
  select(era, eras) %>% 
  unique()

eras_backward <- GLODAP_glanced %>%
  arrange(era) %>% 
  group_by(basin, gamma_slab, model) %>% 
  mutate(eras = paste(lag(era), era, sep = " --> ")) %>% 
  ungroup() %>% 
  select(era, eras) %>% 
  unique()

eras_era <- full_join(eras_backward, eras_forward) %>% 
  filter(str_detect(eras, "NA") == FALSE)

# extend best model selection from eras to era
lm_best <- full_join(
  lm_best_eras %>% select(basin, gamma_slab, model, eras),
  eras_era)

lm_best <- left_join(
  lm_best,
  GLODAP_tidy %>% select(basin, gamma_slab, era, model, term, estimate))

rm(eras_era, eras_forward, eras_backward)

6.3 Cant coeffcients

A data frame of coefficient offsets is prepared to facilitate the direct mapping of Cant.

# subtract coefficients of adjacent era  
lm_best_cant <- lm_best %>%
  arrange(era) %>%
  group_by(basin, gamma_slab, eras, model, term) %>%
  mutate(delta_coeff = estimate - lag(estimate)) %>%
  ungroup() %>%
  arrange(basin, gamma_slab, model, term, eras) %>%
  drop_na() %>%
  select(-c(era,estimate))

# pivot to wide format
lm_best_cant <- lm_best_cant %>%
  pivot_wider(values_from = delta_coeff,
              names_from = term,
              names_prefix = "delta_coeff_",
              values_fill = 0)

6.4 Write files

# create table of target varaible coefficients in wide format
lm_best_target <- lm_best %>% 
  pivot_wider(names_from = "term",
              names_prefix = "coeff_",
              values_from = "estimate",
              values_fill = 0
              )


lm_best_target %>%
  write_csv(paste(path_version_data,
                  "lm_best_target.csv",
                  sep = ""))

lm_best_cant %>%
  write_csv(paste(path_version_data,
                  "lm_best_cant.csv",
                  sep = ""))

7 Model diagnotics

7.1 Selection criterion vs predictors

The selection criterion (rmse) was plotted against the number of predictors (limited to 2 - 5).

7.1.1 All models

GLODAP_glanced %>%
  ggplot(aes(as.factor(n_predictors),
             !!sym(params_local$MLR_criterion),
             col = basin)) +
  geom_hline(yintercept = c(0,10)) +
  geom_boxplot() +
  facet_grid(gamma_slab~era) +
  scale_color_brewer(palette = "Set1") +
  ylim(c(0,NA)) +
  labs(x="Number of predictors")

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
ac1a836 jens-daniel-mueller 2021-02-24
b03fbd3 jens-daniel-mueller 2021-02-24
3d3b4cc jens-daniel-mueller 2021-02-23
7b672f7 jens-daniel-mueller 2021-01-11
33ba23c jens-daniel-mueller 2021-01-07
318609d jens-daniel-mueller 2020-12-23
9d0b2d0 jens-daniel-mueller 2020-12-23
0aa2b50 jens-daniel-mueller 2020-12-23
2886da0 jens-daniel-mueller 2020-12-19
02f0ee9 jens-daniel-mueller 2020-12-18
7bcb4eb jens-daniel-mueller 2020-12-18
7131186 jens-daniel-mueller 2020-12-17
5b48ef5 jens-daniel-mueller 2020-12-17
f3a708f jens-daniel-mueller 2020-12-17

7.1.2 Best models

left_join(lm_best_target %>% select(basin, gamma_slab, era, model),
          GLODAP_glanced) %>%
  ggplot(aes("",
             !!sym(params_local$MLR_criterion),
             col = basin)) +
  geom_hline(yintercept = c(0, 10)) +
  geom_boxplot() +
  facet_grid(gamma_slab ~ era) +
  scale_color_brewer(palette = "Set1") +
  ylim(c(0, NA)) +
  labs(x = "Number of predictors pooled")

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
ac1a836 jens-daniel-mueller 2021-02-24
b03fbd3 jens-daniel-mueller 2021-02-24
3d3b4cc jens-daniel-mueller 2021-02-23
7b672f7 jens-daniel-mueller 2021-01-11
33ba23c jens-daniel-mueller 2021-01-07
318609d jens-daniel-mueller 2020-12-23
9d0b2d0 jens-daniel-mueller 2020-12-23
0aa2b50 jens-daniel-mueller 2020-12-23
2886da0 jens-daniel-mueller 2020-12-19
02f0ee9 jens-daniel-mueller 2020-12-18
7bcb4eb jens-daniel-mueller 2020-12-18
7131186 jens-daniel-mueller 2020-12-17
5b48ef5 jens-daniel-mueller 2020-12-17
f3a708f jens-daniel-mueller 2020-12-17

7.2 RMSE correlation between eras

RMSE was plotted to compare the agreement for one model applied to two adjecent eras (ie check whether the same predictor combination performs equal in both eras).

7.2.1 All models

# find max rmse to scale axis
max_rmse <-
  max(c(lm_all_eras$rmse,
        lm_all_eras$rmse_sum - lm_all_eras$rmse))

lm_all_eras %>%
  ggplot(aes(rmse, rmse_sum - rmse, col = gamma_slab)) +
  geom_point() +
  scale_color_viridis_d() +
  coord_equal(xlim = c(0, max_rmse),
              ylim = c(0, max_rmse)) +
  geom_abline(slope = 1,
              col = 'red') +
  facet_grid(eras ~ basin)

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
ac1a836 jens-daniel-mueller 2021-02-24
b03fbd3 jens-daniel-mueller 2021-02-24
3d3b4cc jens-daniel-mueller 2021-02-23
7b672f7 jens-daniel-mueller 2021-01-11
33ba23c jens-daniel-mueller 2021-01-07
318609d jens-daniel-mueller 2020-12-23
9d0b2d0 jens-daniel-mueller 2020-12-23
0aa2b50 jens-daniel-mueller 2020-12-23
2886da0 jens-daniel-mueller 2020-12-19
02f0ee9 jens-daniel-mueller 2020-12-18
7bcb4eb jens-daniel-mueller 2020-12-18
7131186 jens-daniel-mueller 2020-12-17
5b48ef5 jens-daniel-mueller 2020-12-17
e4ca289 jens-daniel-mueller 2020-12-16
158fe26 jens-daniel-mueller 2020-12-15
7a9a4cb jens-daniel-mueller 2020-12-15
61b263c jens-daniel-mueller 2020-12-15
984697e jens-daniel-mueller 2020-12-12
3ebff89 jens-daniel-mueller 2020-12-12
ba112d3 jens-daniel-mueller 2020-12-11
24a632f jens-daniel-mueller 2020-12-07
6a8004b jens-daniel-mueller 2020-12-07
70bf1a5 jens-daniel-mueller 2020-12-07
7555355 jens-daniel-mueller 2020-12-07
143d6fa jens-daniel-mueller 2020-12-07
090e4d5 jens-daniel-mueller 2020-12-02
7c25f7a jens-daniel-mueller 2020-12-02
b03ddb8 jens-daniel-mueller 2020-12-02
91435ae jens-daniel-mueller 2020-12-01
196be51 jens-daniel-mueller 2020-11-30
rm(max_rmse)

7.2.2 Best models

# find max rmse to scale axis
max_rmse <-
  max(c(lm_best_eras$rmse,
        lm_best_eras$rmse_sum - lm_best_eras$rmse))

lm_best_eras %>%
  ggplot(aes(rmse, rmse_sum - rmse, col = gamma_slab)) +
  geom_point() +
  scale_color_viridis_d() +
  coord_equal(xlim = c(0, max_rmse),
              ylim = c(0, max_rmse)) +
  geom_abline(slope = 1,
              col = 'red') +
  facet_grid(eras ~ basin)

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
ac1a836 jens-daniel-mueller 2021-02-24
b03fbd3 jens-daniel-mueller 2021-02-24
3d3b4cc jens-daniel-mueller 2021-02-23
7b672f7 jens-daniel-mueller 2021-01-11
33ba23c jens-daniel-mueller 2021-01-07
318609d jens-daniel-mueller 2020-12-23
9d0b2d0 jens-daniel-mueller 2020-12-23
0aa2b50 jens-daniel-mueller 2020-12-23
2886da0 jens-daniel-mueller 2020-12-19
02f0ee9 jens-daniel-mueller 2020-12-18
7bcb4eb jens-daniel-mueller 2020-12-18
7131186 jens-daniel-mueller 2020-12-17
a84ff3c jens-daniel-mueller 2020-12-17
rm(max_rmse)

7.3 Predictor counts

The number of models where a particular predictor was included were counted for each basin, density slab and compared eras

# calculate cases of predictor used
lm_all_stats <- lm_best_cant %>% 
  pivot_longer(starts_with("delta_coeff_"),
               names_to = "term",
               names_prefix = "delta_coeff_",
               values_to = "delta_coeff") %>% 
  filter(term != "(Intercept)",
         delta_coeff != 0) %>% 
  group_by(basin, eras, gamma_slab) %>% 
  count(term) %>% 
  ungroup() %>% 
  pivot_wider(values_from = n, names_from = term)

# print table
lm_all_stats %>%
  gt(rowname_col = "gamma_slab",
     groupname_col = c("basin", "eras")) %>% 
  summary_rows(
    groups = TRUE,
    fns = list(total = "sum")
  )
aou nitrate phosphate phosphate_star sal silicate temp
Atlantic - 1982-1999 --> 2000-2010
(-Inf,26] 8 10 6 6 6 4 9
(26,28.2] 7 7 7 10 3 3 10
(28.2, Inf] 9 2 9 7 9 5 7
total 24.00 19.00 22.00 23.00 18.00 12.00 26.00
Atlantic - 2000-2010 --> 2011-2019
(-Inf,26] 7 10 7 8 4 3 9
(26,28.2] 7 7 7 10 3 3 10
(28.2, Inf] 5 3 9 8 10 6 7
total 19.00 20.00 23.00 26.00 17.00 12.00 26.00
Indo-Pacific - 1982-1999 --> 2000-2010
(-Inf,26] 8 6 6 6 10 5 8
(26,28.1] 7 5 7 8 5 10 8
(28.1, Inf] 9 6 10 7 3 6 7
total 24.00 17.00 23.00 21.00 18.00 21.00 23.00
Indo-Pacific - 2000-2010 --> 2011-2019
(-Inf,26] 8 8 7 6 8 4 8
(26,28.1] 5 5 5 8 6 10 9
(28.1, Inf] 10 5 10 7 6 6 4
total 23.00 18.00 22.00 21.00 20.00 20.00 21.00

7.4 RMSE alternatives

AIC is an alternative criterion to RMSE to judge model quality, but not (yet) taken into account.

lm_all_eras %>% 
  ggplot(aes(rmse, AIC, col = gamma_slab)) +
  geom_point() +
  scale_color_viridis_d() +
  facet_grid(eras~basin)

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
ac1a836 jens-daniel-mueller 2021-02-24
b03fbd3 jens-daniel-mueller 2021-02-24
3d3b4cc jens-daniel-mueller 2021-02-23
7b672f7 jens-daniel-mueller 2021-01-11
33ba23c jens-daniel-mueller 2021-01-07
318609d jens-daniel-mueller 2020-12-23
9d0b2d0 jens-daniel-mueller 2020-12-23
0aa2b50 jens-daniel-mueller 2020-12-23
2886da0 jens-daniel-mueller 2020-12-19
02f0ee9 jens-daniel-mueller 2020-12-18
7bcb4eb jens-daniel-mueller 2020-12-18
7131186 jens-daniel-mueller 2020-12-17
5b48ef5 jens-daniel-mueller 2020-12-17
lm_best_eras %>% 
  ggplot(aes(rmse, AIC, col = gamma_slab)) +
  geom_point() +
  scale_color_viridis_d() +
  facet_grid(eras~basin)

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
ac1a836 jens-daniel-mueller 2021-02-24
b03fbd3 jens-daniel-mueller 2021-02-24
3d3b4cc jens-daniel-mueller 2021-02-23
7b672f7 jens-daniel-mueller 2021-01-11
33ba23c jens-daniel-mueller 2021-01-07
318609d jens-daniel-mueller 2020-12-23
9d0b2d0 jens-daniel-mueller 2020-12-23
0aa2b50 jens-daniel-mueller 2020-12-23
2886da0 jens-daniel-mueller 2020-12-19
02f0ee9 jens-daniel-mueller 2020-12-18
7bcb4eb jens-daniel-mueller 2020-12-18
7131186 jens-daniel-mueller 2020-12-17
5b48ef5 jens-daniel-mueller 2020-12-17

7.5 Residual patterns

7.5.1 Fitted vs actual models

Plotted are fitted vs actual target variable values, here: rparams_local$MLR_target`

GLODAP_augmented_best <- left_join(
  lm_best_target %>% select(basin, gamma_slab, era, model),
  GLODAP_augmented
)
# calculate equal axis limits and binwidth
axis_lims <- GLODAP_augmented %>%
  summarise(
    max_value = max(
      c(max(.fitted, max(!!sym(params_local$MLR_target))))
      ),
    min_value = min(
      c(min(.fitted, min(!!sym(params_local$MLR_target))))
    )
  )

i_binwidth <- 1
# binwidth_value <- (axis_lims$max_value - axis_lims$min_value) / 40
axis_lims <- c(axis_lims$min_value, axis_lims$max_value)

GLODAP_augmented %>%
  ggplot(aes(cstar_tref, .fitted)) +
  geom_bin2d(binwidth = i_binwidth) +
  scale_fill_viridis_c() +
  geom_abline(slope = 1,
              col = 'red') +
  coord_equal(xlim = axis_lims,
              ylim = axis_lims) +
  labs(title = "All models")

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
27ae473 jens-daniel-mueller 2021-02-24
4bc00ea jens-daniel-mueller 2021-02-24
GLODAP_augmented_best %>%
  ggplot(aes(cstar_tref, .fitted)) +
  geom_bin2d(binwidth = i_binwidth) +
  scale_fill_viridis_c() +
  geom_abline(slope = 1,
              col = 'red') +
  coord_equal(xlim = axis_lims,
              ylim = axis_lims) +
  labs(title = "Selected models")

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
27ae473 jens-daniel-mueller 2021-02-24
4bc00ea jens-daniel-mueller 2021-02-24
rm(binwidth_value, axis_lims)

7.5.2 Pooled

In the following, we present residual patterns vs latitude across all domains.

i_ylim <- c(-30,30)

GLODAP_augmented_best %>%
  ggplot(aes(lat, .resid)) +
  geom_bin2d(binwidth = i_binwidth) +
  geom_hline(yintercept = 0, col = "white") +
  scale_fill_viridis_c() +
  labs(
    title = paste(
      "Target variable:",
      params_local$MLR_target,
      "| Selected models",
      "| All domains"
    )
  )

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
27ae473 jens-daniel-mueller 2021-02-24
4bc00ea jens-daniel-mueller 2021-02-24

Due to the few large residuals, we limit the y axis range for the plots below.

GLODAP_augmented_best %>%
  ggplot(aes(lat, .resid)) +
  geom_bin2d(binwidth = i_binwidth) +
  geom_hline(yintercept = 0, col = "white") +
  scale_fill_viridis_c() +
  coord_cartesian(ylim = i_ylim) +
  labs(
    title = paste(
      "Target variable:",
      params_local$MLR_target,
      "| Selected models",
      "| All domains"
    )
  )

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
27ae473 jens-daniel-mueller 2021-02-24
4bc00ea jens-daniel-mueller 2021-02-24

7.5.3 By model domain

In the following, we present residual patterns vs latitude for separate model domains, ie basins, density slabs and eras.

p_residuals <- function(df){
    ggplot(data = df, aes(lat, .resid)) +
      geom_bin2d(binwidth = i_binwidth) +
      geom_hline(yintercept = 0, col = "black") +
      scale_fill_viridis_c() +
      facet_grid(gamma_slab ~ era) +
      coord_cartesian(ylim = i_ylim) +
      labs(
        title = paste(
          "Target variable:",
          params_local$MLR_target,
          "| selected best models | basin:",
          unique(df$basin)
        )
      )
}

GLODAP_augmented_best %>% 
  group_split(basin) %>% 
  map(p_residuals)
[[1]]

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
27ae473 jens-daniel-mueller 2021-02-24
a1ba577 jens-daniel-mueller 2021-02-24
071743d jens-daniel-mueller 2021-02-24

[[2]]

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
27ae473 jens-daniel-mueller 2021-02-24
a1ba577 jens-daniel-mueller 2021-02-24

7.5.4 Latitudinal mean

GLODAP_augmented_best <- GLODAP_augmented_best %>%
  mutate(lat_grid = as.numeric(as.character(cut(
    lat,
    seq(-90, 90, 10),
    seq(-85, 85, 10)
  ))))

lat_residual <- GLODAP_augmented_best %>%
  group_by(basin, gamma_slab, era, lat_grid) %>% 
  summarise(.resid_mean = mean(.resid)) %>% 
  ungroup()

lat_residual %>%
  ggplot(aes(lat_grid, .resid_mean, col=era)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0, col = "black") +
  facet_grid(gamma_slab ~ basin)

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
fec3558 jens-daniel-mueller 2021-02-24

7.5.5 Latitudinal offset

# calculate residual offset for adjacent eras
lat_residual_offset <- lat_residual  %>%
  select(basin, gamma_slab, era, lat_grid, .resid_mean) %>% 
  arrange(era) %>% 
  group_by(basin, gamma_slab, lat_grid) %>% 
  mutate(eras = paste(lag(era), era, sep = " --> "),
         .resid_mean_offset = .resid_mean - lag(.resid_mean)
         ) %>% 
  ungroup() %>% 
  select(-era) %>% 
  drop_na() %>% 
  filter(eras != paste(unique(lat_residual$era)[1],
                       unique(lat_residual$era)[3],
                       sep = " --> "))

lat_residual_offset %>%
  ggplot(aes(lat_grid, .resid_mean_offset, col=eras)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0, col = "black") +
  facet_grid(gamma_slab ~ basin)

Version Author Date
a1d52ff jens-daniel-mueller 2021-03-15
0bade3b jens-daniel-mueller 2021-03-15
27c1f4b jens-daniel-mueller 2021-03-14
af75ebf jens-daniel-mueller 2021-03-14
5017709 jens-daniel-mueller 2021-03-11
585b07f jens-daniel-mueller 2021-03-11
85a5ed2 jens-daniel-mueller 2021-03-10
6c0bec6 jens-daniel-mueller 2021-03-05
3c2ec33 jens-daniel-mueller 2021-03-05
af70b94 jens-daniel-mueller 2021-03-04
fec3558 jens-daniel-mueller 2021-02-24

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: openSUSE Leap 15.2

Matrix products: default
BLAS:   /usr/local/R-4.0.3/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R-4.0.3/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gt_0.2.2         corrr_0.4.3      broom_0.7.2      kableExtra_1.3.1
 [5] knitr_1.30       olsrr_0.5.3      GGally_2.0.0     lubridate_1.7.9 
 [9] metR_0.9.0       scico_1.2.0      patchwork_1.1.1  collapse_1.5.0  
[13] forcats_0.5.0    stringr_1.4.0    dplyr_1.0.2      purrr_0.3.4     
[17] readr_1.4.0      tidyr_1.1.2      tibble_3.0.4     ggplot2_3.3.2   
[21] tidyverse_1.3.0  workflowr_1.6.2 

loaded via a namespace (and not attached):
 [1] fs_1.5.0                 webshot_0.5.2            RColorBrewer_1.1-2      
 [4] httr_1.4.2               rprojroot_2.0.2          tools_4.0.3             
 [7] backports_1.1.10         utf8_1.1.4               R6_2.5.0                
[10] nortest_1.0-4            DBI_1.1.0                colorspace_1.4-1        
[13] withr_2.3.0              gridExtra_2.3            tidyselect_1.1.0        
[16] curl_4.3                 compiler_4.0.3           git2r_0.27.1            
[19] cli_2.1.0                rvest_0.3.6              xml2_1.3.2              
[22] sass_0.2.0               labeling_0.4.2           scales_1.1.1            
[25] checkmate_2.0.0          goftest_1.2-2            digest_0.6.27           
[28] foreign_0.8-80           rmarkdown_2.5            rio_0.5.16              
[31] pkgconfig_2.0.3          htmltools_0.5.0          highr_0.8               
[34] dbplyr_1.4.4             rlang_0.4.9              readxl_1.3.1            
[37] rstudioapi_0.13          farver_2.0.3             generics_0.0.2          
[40] jsonlite_1.7.1           zip_2.1.1                car_3.0-10              
[43] magrittr_1.5             Matrix_1.2-18            Rcpp_1.0.5              
[46] munsell_0.5.0            fansi_0.4.1              abind_1.4-5             
[49] lifecycle_0.2.0          stringi_1.5.3            whisker_0.4             
[52] yaml_2.2.1               carData_3.0-4            plyr_1.8.6              
[55] grid_4.0.3               blob_1.2.1               parallel_4.0.3          
[58] promises_1.1.1           crayon_1.3.4             lattice_0.20-41         
[61] haven_2.3.1              hms_0.5.3                pillar_1.4.7            
[64] reprex_0.3.0             glue_1.4.2               evaluate_0.14           
[67] RcppArmadillo_0.10.1.2.0 data.table_1.13.2        modelr_0.1.8            
[70] vctrs_0.3.5              httpuv_1.5.4             cellranger_1.1.0        
[73] gtable_0.3.0             reshape_0.8.8            assertthat_0.2.1        
[76] xfun_0.18                openxlsx_4.2.3           RcppEigen_0.3.3.7.0     
[79] later_1.1.0.1            viridisLite_0.3.0        ellipsis_0.3.1          
[82] here_0.1