Last updated: 2021-02-24

Checks: 7 0

Knit directory: emlr_obs_v_XXX/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200707)

The command set.seed(20200707) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: c69736b

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version c69736b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  analysis/eMLR_model_fitting_for_loop.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/eMLR_model_fitting.Rmd) and HTML (docs/eMLR_model_fitting.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	c69736b	jens-daniel-mueller	2021-02-24	added plots back to after switching to map aproach
html	86406d5	jens-daniel-mueller	2021-02-24	Build site.
Rmd	1b3c171	jens-daniel-mueller	2021-02-24	introduced purrr::map to model fitting, rebuild all
html	3d3b4cc	jens-daniel-mueller	2021-02-23	Build site.
Rmd	cbfc388	jens-daniel-mueller	2021-02-23	introduced purrr::map to model fitting
html	7b672f7	jens-daniel-mueller	2021-01-11	Build site.
html	33ba23c	jens-daniel-mueller	2021-01-07	Build site.
Rmd	0ad30ba	jens-daniel-mueller	2021-01-07	removed GLODAP gamma filter, target variable mapped by eras+era
html	318609d	jens-daniel-mueller	2020-12-23	adapted more variable predictor selection
html	9d0b2d0	jens-daniel-mueller	2020-12-23	Build site.
html	0aa2b50	jens-daniel-mueller	2020-12-23	remove html before duplication
html	39113c3	jens-daniel-mueller	2020-12-23	Build site.
Rmd	bef9220	jens-daniel-mueller	2020-12-23	rebuild before sensitivity test
html	2886da0	jens-daniel-mueller	2020-12-19	Build site.
html	02f0ee9	jens-daniel-mueller	2020-12-18	cleaned up for copying template
html	965dba3	jens-daniel-mueller	2020-12-18	Build site.
html	5d452fe	jens-daniel-mueller	2020-12-18	Build site.
Rmd	ca65bf5	jens-daniel-mueller	2020-12-18	rebuild after final cleaning
html	7bcb4eb	jens-daniel-mueller	2020-12-18	Build site.
html	d397028	jens-daniel-mueller	2020-12-18	Build site.
Rmd	7e1b1c0	jens-daniel-mueller	2020-12-18	rebuild without na predictors
html	7131186	jens-daniel-mueller	2020-12-17	Build site.
Rmd	737d904	jens-daniel-mueller	2020-12-17	rebuild without na predictors
html	22b07fb	jens-daniel-mueller	2020-12-17	Build site.
html	a84ff3c	jens-daniel-mueller	2020-12-17	Build site.
Rmd	40369db	jens-daniel-mueller	2020-12-17	model selection criterion added
html	5b48ef5	jens-daniel-mueller	2020-12-17	Build site.
Rmd	e6ed2bc	jens-daniel-mueller	2020-12-17	plotted model results
html	f3a708f	jens-daniel-mueller	2020-12-17	Build site.
Rmd	7c8ace9	jens-daniel-mueller	2020-12-17	new MLR fitting routine, rmse corrected
html	e4ca289	jens-daniel-mueller	2020-12-16	Build site.
Rmd	3d5a3e2	jens-daniel-mueller	2020-12-16	new MLR fitting routine
html	158fe26	jens-daniel-mueller	2020-12-15	Build site.
html	7a9a4cb	jens-daniel-mueller	2020-12-15	Build site.
html	61b263c	jens-daniel-mueller	2020-12-15	Build site.
html	4d612dd	jens-daniel-mueller	2020-12-15	Build site.
html	e91cebd	jens-daniel-mueller	2020-12-15	Build site.
Rmd	d7992c4	jens-daniel-mueller	2020-12-15	eMLR target variable selection
html	953caf3	jens-daniel-mueller	2020-12-15	Build site.
html	42daf5c	jens-daniel-mueller	2020-12-14	Build site.
Rmd	923aa7f	jens-daniel-mueller	2020-12-14	rebuild with new path and auto folder creation
html	984697e	jens-daniel-mueller	2020-12-12	Build site.
html	3ebff89	jens-daniel-mueller	2020-12-12	Build site.
html	ba112d3	jens-daniel-mueller	2020-12-11	Build site.
Rmd	91b2b29	jens-daniel-mueller	2020-12-11	selectable basinmask, try 5
html	b01a367	jens-daniel-mueller	2020-12-09	Build site.
Rmd	71c63b0	jens-daniel-mueller	2020-12-09	rerun with variable predictor assignment
html	24a632f	jens-daniel-mueller	2020-12-07	Build site.
html	92dca91	jens-daniel-mueller	2020-12-07	Build site.
html	6a8004b	jens-daniel-mueller	2020-12-07	Build site.
html	70bf1a5	jens-daniel-mueller	2020-12-07	Build site.
html	7555355	jens-daniel-mueller	2020-12-07	Build site.
html	143d6fa	jens-daniel-mueller	2020-12-07	Build site.
html	abc6818	jens-daniel-mueller	2020-12-03	Build site.
Rmd	992ba15	jens-daniel-mueller	2020-12-03	rebuild with variable inventory depth
html	c8c2e7b	jens-daniel-mueller	2020-12-03	Build site.
Rmd	83203db	jens-daniel-mueller	2020-12-03	calculate cant with variable inventory depth
html	090e4d5	jens-daniel-mueller	2020-12-02	Build site.
html	7c25f7a	jens-daniel-mueller	2020-12-02	Build site.
html	ec8dc38	jens-daniel-mueller	2020-12-02	Build site.
html	c987de1	jens-daniel-mueller	2020-12-02	Build site.
html	f8358f8	jens-daniel-mueller	2020-12-02	Build site.
html	b03ddb8	jens-daniel-mueller	2020-12-02	Build site.
Rmd	9183e8f	jens-daniel-mueller	2020-12-02	revised assignment of era to eras
html	22d0127	jens-daniel-mueller	2020-12-01	Build site.
html	0ff728b	jens-daniel-mueller	2020-12-01	Build site.
html	91435ae	jens-daniel-mueller	2020-12-01	Build site.
Rmd	17d09be	jens-daniel-mueller	2020-12-01	auto eras naming
html	cf19652	jens-daniel-mueller	2020-11-30	Build site.
Rmd	0895ad6	jens-daniel-mueller	2020-11-30	rebuild with all plot output
Rmd	2842970	jens-daniel-mueller	2020-11-30	cleaned for eMLR part only
html	196be51	jens-daniel-mueller	2020-11-30	Build site.
Rmd	7a4b015	jens-daniel-mueller	2020-11-30	first rebuild on ETH server
Rmd	bc61ce3	Jens Müller	2020-11-30	Initial commit

1 Required data

Required are:

cleaned and prepared GLODAPv2_2020 file

GLODAP <-
  read_csv(paste(path_version_data,
                 "GLODAPv2.2020_MLR_fitting_ready.csv",
                 sep = ""))

2 Predictor combinations

Find all possible combinations of following considered predictor variables:

sal, temp, aou, nitrate, silicate, phosphate, phosphate_star

# the following code is a workaround to find all predictor combinations
# using the olsrr package and fit all models for one era, slab, and basin

i_basin <- unique(GLODAP$basin)[1]
i_era   <- unique(GLODAP$era)[1]

# subset one basin and era for fitting
GLODAP_basin_era <- GLODAP %>%
  filter(basin == i_basin, era == i_era)

i_gamma_slab <- unique(GLODAP_basin_era$gamma_slab)[1]
print(i_gamma_slab)

# subset one gamma slab
GLODAP_basin_era_slab <- GLODAP_basin_era %>%
  filter(gamma_slab == i_gamma_slab)

# fit the full linear model, i.e. all predictor combinations
lm_full <- lm(paste(
  params_local$MLR_target,
  paste(params_local$MLR_predictors, collapse = " + "),
  sep = " ~ "
),
data = GLODAP_basin_era_slab)

# fit linear models for all possible predictor combinations
# unfortunately, this functions does not provide model coefficients (yet)
lm_all <- ols_step_all_possible(lm_full)

# convert to tibble
lm_all <- as_tibble(lm_all)

# format model formula
lm_all <- lm_all %>% 
  select(n, predictors) %>% 
  mutate(model = str_replace_all(predictors, " ", " + "),
         model = paste(params_local$MLR_target, "~", model))

# remove helper objects
rm(i_gamma_slab,
   i_era,
   i_basin,
   GLODAP_basin_era,
   GLODAP_basin_era_slab,
   lm_full)

3 Apply predictor threshold

Select combinations with a total number of predictors in the range:

Minimum: 2
Maximum: 5

lm_all <- lm_all %>% 
  filter(n >= params_local$MLR_predictors_min,
         n <= params_local$MLR_predictors_max)

This results in a total number of MLR models of:

4 Fit all models

Individual linear regression models were fitted for the chosen target variable:

cstar_tref

as a function of each predictor combination. Fitting was performed separately within each basin, era, and slab. Model diagnostics, such as the root mean squared error (RMSE), were calculated for each fitted model.

GLODAP %>% 
  filter(basin %in% unique(GLODAP$basin)[1],
         era %in% unique(GLODAP$era)[c(1,2)],
         gamma_slab %in% unique(GLODAP$gamma_slab)[c(5,6)]) %>%
  filter_all(any_vars(is.na(.)))

GLODAP_nested <- GLODAP %>% 
  filter(basin %in% unique(GLODAP$basin)[1],
         era %in% unique(GLODAP$era)[c(1,2)],
         gamma_slab %in% unique(GLODAP$gamma_slab)[c(5,6)]) %>%
  drop_na() %>% 
  group_by(gamma_slab, era, basin) %>% 
  nest()

GLODAP_nested_lm <- expand_grid(
  GLODAP_nested,
  lm_all
)

GLODAP_nested_lm_fit <- GLODAP_nested_lm %>% 
  mutate(
    fit = map2(.x = data, .y = model, 
               ~ lm(as.formula(.y), data = .x)),
    tidied = map(fit, tidy),
    glanced = map(fit, glance),
    augmented = map(fit, augment)
  )

print(object.size(GLODAP_nested), units = "MB")
print(object.size(GLODAP_nested_lm), units = "MB")
print(object.size(GLODAP_nested_lm_fit), units = "MB")

GLODAP_glanced <- GLODAP_nested_lm_fit %>%
  select(-c(data, fit, tidied, augmented)) %>%
  unnest(glanced)

GLODAP_glanced <- GLODAP_glanced %>% 
  rename(n_predictors = n)

# GLODAP_glanced <- GLODAP_glanced %>% 
#   mutate(rmse = sqrt ( (sigma * df.residual) / nobs) )


GLODAP_tidy <- GLODAP_nested_lm_fit %>%
  select(-c(data, fit, glanced, augmented)) %>%
  unnest(tidied)

GLODAP_augmented <- GLODAP_nested_lm_fit %>% 
  select(-c(data, fit, tidied, glanced)) %>% 
  unnest(augmented) 

print(object.size(GLODAP_augmented), units = "MB")

GLODAP_glanced_rmse <- GLODAP_augmented %>%
  group_by(gamma_slab, era, basin, model) %>%
  summarise(rmse = sqrt(c(crossprod(.resid)) / length(.resid))) %>%
  ungroup()

GLODAP_glanced <- full_join(GLODAP_glanced, GLODAP_glanced_rmse)
rm(GLODAP_glanced_rmse)


# GLODAP_data <- GLODAP_nested_lm_fit %>% 
#   select(-c(fit, tidied, glanced, augmented)) %>% 
#   unnest(data) 


# # find max predictor correlation
# i_cor_max <- GLODAP_basin_era_slab %>%
#   select(!!!syms(str_split(i_predictors, " ",
#                            simplify = TRUE))) %>%
#   correlate(quiet = TRUE) %>%
#   select(-term) %>%
#   abs() %>%
#   max(na.rm = TRUE)
# 
# 
# # calculate maximum residual
# i_resid_max <- max(abs(i_lm_fit$residuals))
# 

rm(GLODAP, GLODAP_nested, GLODAP_nested_lm, GLODAP_nested_lm_fit, lm_all)

5 Prepare coeffcients

Coefficients are prepared for the mapping of Cant and the chosen target variable.

5.1 Predictor selection

Within each basin and slab, the following number of best linear regression models was selected:

The criterion used to select the best models was:

rmse

The criterion was summed up for two adjacent eras, and the models with lowest summed values were selected.

# calculate RMSE sum for adjacent eras
lm_all_eras <- GLODAP_glanced  %>%
  select(basin, gamma_slab, model, era, AIC, rmse) %>% 
  arrange(era) %>% 
  group_by(basin, gamma_slab, model) %>% 
  mutate(eras = paste(lag(era), era, sep = " --> "),
         rmse_sum = rmse + lag(rmse),
         aic_sum = AIC + lag(AIC)
         ) %>% 
  ungroup() %>% 
  select(-c(era)) %>% 
  drop_na()

# subset models with lowest summed criterion
# chose which criterion is applied

if (params_local$MLR_criterion == "aic") {
  lm_best_eras <- lm_all_eras %>%
    group_by(basin, gamma_slab, eras) %>%
    slice_min(order_by = aic_sum,
              with_ties = FALSE,
              n = params_local$MLR_number) %>%
    ungroup() %>%
    arrange(basin, gamma_slab, eras, model)
} else {
  lm_best_eras <- lm_all_eras %>%
    group_by(basin, gamma_slab, eras) %>%
    slice_min(order_by = rmse_sum,
              with_ties = FALSE,
              n = params_local$MLR_number) %>%
    ungroup() %>%
    arrange(basin, gamma_slab, eras, model)
}


# print table
lm_best_eras %>% 
  kable() %>%
  add_header_above() %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "400px")

basin	gamma_slab	model	AIC	rmse	eras	rmse_sum	aic_sum
Atlantic	(27.25,27.5]	cstar_tref ~ sal + aou + nitrate + phosphate + phosphate_star	16252.05	4.451293	1982-1999 –> 2000-2012	9.089033	30529.85
Atlantic	(27.25,27.5]	cstar_tref ~ sal + temp + aou + nitrate + phosphate	16172.26	4.388044	1982-1999 –> 2000-2012	9.019600	30443.60
Atlantic	(27.25,27.5]	cstar_tref ~ sal + temp + aou + nitrate + phosphate_star	16133.74	4.357842	1982-1999 –> 2000-2012	9.028179	30445.36
Atlantic	(27.25,27.5]	cstar_tref ~ sal + temp + aou + phosphate	16201.76	4.412906	1982-1999 –> 2000-2012	9.143955	30573.76
Atlantic	(27.25,27.5]	cstar_tref ~ sal + temp + aou + phosphate + phosphate_star	16176.48	4.391370	1982-1999 –> 2000-2012	9.116691	30544.63
Atlantic	(27.25,27.5]	cstar_tref ~ sal + temp + aou + silicate + phosphate	16199.70	4.409697	1982-1999 –> 2000-2012	9.138929	30571.85
Atlantic	(27.25,27.5]	cstar_tref ~ sal + temp + aou + silicate + phosphate_star	16175.86	4.390880	1982-1999 –> 2000-2012	9.153335	30581.82
Atlantic	(27.25,27.5]	cstar_tref ~ sal + temp + nitrate + phosphate + phosphate_star	16143.58	4.365533	1982-1999 –> 2000-2012	9.016608	30435.24
Atlantic	(27.25,27.5]	cstar_tref ~ sal + temp + phosphate + phosphate_star	16178.80	4.394774	1982-1999 –> 2000-2012	9.152330	30577.79
Atlantic	(27.25,27.5]	cstar_tref ~ sal + temp + silicate + phosphate + phosphate_star	16179.16	4.393479	1982-1999 –> 2000-2012	9.147065	30576.11
Atlantic	(27.85,27.95]	cstar_tref ~ sal + aou + nitrate + silicate + phosphate_star	25857.02	4.974593	1982-1999 –> 2000-2012	9.088464	38858.88
Atlantic	(27.85,27.95]	cstar_tref ~ sal + aou + silicate + phosphate + phosphate_star	25747.92	4.911509	1982-1999 –> 2000-2012	9.075460	38805.25
Atlantic	(27.85,27.95]	cstar_tref ~ sal + aou + silicate + phosphate_star	25855.32	4.974769	1982-1999 –> 2000-2012	9.139290	38911.27
Atlantic	(27.85,27.95]	cstar_tref ~ sal + nitrate + silicate + phosphate + phosphate_star	26188.51	5.171299	1982-1999 –> 2000-2012	9.358673	39271.55
Atlantic	(27.85,27.95]	cstar_tref ~ sal + silicate + phosphate + phosphate_star	26205.41	5.182745	1982-1999 –> 2000-2012	9.400108	39319.16
Atlantic	(27.85,27.95]	cstar_tref ~ sal + temp + aou + silicate + phosphate	25753.71	4.914835	1982-1999 –> 2000-2012	9.089378	38822.68
Atlantic	(27.85,27.95]	cstar_tref ~ sal + temp + aou + silicate + phosphate_star	25747.25	4.911124	1982-1999 –> 2000-2012	9.075467	38805.01
Atlantic	(27.85,27.95]	cstar_tref ~ sal + temp + nitrate + silicate + phosphate	25864.17	4.978761	1982-1999 –> 2000-2012	9.153894	38933.80
Atlantic	(27.85,27.95]	cstar_tref ~ sal + temp + silicate + phosphate	25868.62	4.982514	1982-1999 –> 2000-2012	9.292880	39082.36
Atlantic	(27.85,27.95]	cstar_tref ~ sal + temp + silicate + phosphate + phosphate_star	25751.98	4.913841	1982-1999 –> 2000-2012	9.084490	38816.68

5.2 Target variable coefficients

A data frame to map the target variable is prepared.

# create table with two era belonging to one eras
eras_forward <- GLODAP_glanced %>%
  arrange(era) %>% 
  group_by(basin, gamma_slab, model) %>% 
  mutate(eras = paste(era, lead(era), sep = " --> ")) %>% 
  ungroup() %>% 
  select(era, eras) %>% 
  unique()

eras_backward <- GLODAP_glanced %>%
  arrange(era) %>% 
  group_by(basin, gamma_slab, model) %>% 
  mutate(eras = paste(lag(era), era, sep = " --> ")) %>% 
  ungroup() %>% 
  select(era, eras) %>% 
  unique()

eras_era <- full_join(eras_backward, eras_forward) %>% 
  filter(str_detect(eras, "NA") == FALSE)

# extend best model selection from eras to era
lm_best <- full_join(
  lm_best_eras %>% select(basin, gamma_slab, model, eras),
  eras_era)

lm_best <- left_join(
  lm_best,
  GLODAP_tidy %>% select(basin, gamma_slab, era, model, term, estimate))

rm(eras_era, eras_forward, eras_backward)

5.3 Cant coeffcients

A data frame of coefficient offsets is prepared to facilitate the direct mapping of Cant.

# subtract coefficients of adjacent era  
lm_best_cant <- lm_best %>%
  arrange(era) %>%
  group_by(basin, gamma_slab, eras, model, term) %>%
  mutate(delta_coeff = estimate - lag(estimate)) %>%
  ungroup() %>%
  arrange(basin, gamma_slab, model, term, eras) %>%
  drop_na() %>%
  select(-c(era,estimate))

# pivot to wide format
lm_best_cant <- lm_best_cant %>%
  pivot_wider(values_from = delta_coeff,
              names_from = term,
              names_prefix = "delta_coeff_",
              values_fill = 0)

5.4 Write files

lm_best_target <- lm_best %>% 
  pivot_wider(names_from = "term",
              names_prefix = "coeff_",
              values_from = "estimate",
              values_fill = 0
              )


lm_best_target %>%
  write_csv(paste(path_version_data,
                  "lm_best_target.csv",
                  sep = ""))

lm_best_cant %>%
  write_csv(paste(path_version_data,
                  "lm_best_cant.csv",
                  sep = ""))

6 Model diagnotics

6.1 Selection criterion vs predictors

The selection criterion (rmse) was plotted against the number of predictors (limited to 2 - 5).

6.1.1 All models

GLODAP_glanced %>%
  ggplot(aes(as.factor(n_predictors),
             !!sym(params_local$MLR_criterion),
             col = basin)) +
  geom_hline(yintercept = c(0,10)) +
  geom_boxplot() +
  facet_grid(gamma_slab~era) +
  scale_color_brewer(palette = "Set1") +
  ylim(c(0,NA)) +
  labs(x="Number of predictors")

Version	Author	Date
3d3b4cc	jens-daniel-mueller	2021-02-23
7b672f7	jens-daniel-mueller	2021-01-11
33ba23c	jens-daniel-mueller	2021-01-07
318609d	jens-daniel-mueller	2020-12-23
9d0b2d0	jens-daniel-mueller	2020-12-23
0aa2b50	jens-daniel-mueller	2020-12-23
2886da0	jens-daniel-mueller	2020-12-19
02f0ee9	jens-daniel-mueller	2020-12-18
7bcb4eb	jens-daniel-mueller	2020-12-18
7131186	jens-daniel-mueller	2020-12-17
5b48ef5	jens-daniel-mueller	2020-12-17
f3a708f	jens-daniel-mueller	2020-12-17

6.1.2 Best models

left_join(lm_best_target %>% select(basin, gamma_slab, era, model),
          GLODAP_glanced) %>%
  ggplot(aes("",
             !!sym(params_local$MLR_criterion),
             col = basin)) +
  geom_hline(yintercept = c(0, 10)) +
  geom_boxplot() +
  facet_grid(gamma_slab ~ era) +
  scale_color_brewer(palette = "Set1") +
  ylim(c(0, NA)) +
  labs(x = "Number of predictors pooled")

Version	Author	Date
3d3b4cc	jens-daniel-mueller	2021-02-23
7b672f7	jens-daniel-mueller	2021-01-11
33ba23c	jens-daniel-mueller	2021-01-07
318609d	jens-daniel-mueller	2020-12-23
9d0b2d0	jens-daniel-mueller	2020-12-23
0aa2b50	jens-daniel-mueller	2020-12-23
2886da0	jens-daniel-mueller	2020-12-19
02f0ee9	jens-daniel-mueller	2020-12-18
7bcb4eb	jens-daniel-mueller	2020-12-18
7131186	jens-daniel-mueller	2020-12-17
5b48ef5	jens-daniel-mueller	2020-12-17
f3a708f	jens-daniel-mueller	2020-12-17

6.2 RMSE correlation between eras

RMSE was plotted to compare the agreement for one model applied to two adjecent eras (ie check whether the same predictor combination performs equal in both eras).

6.2.1 All models

# find max rmse to scale axis
max_rmse <-
  max(c(lm_all_eras$rmse,
        lm_all_eras$rmse_sum - lm_all_eras$rmse))

lm_all_eras %>%
  ggplot(aes(rmse, rmse_sum - rmse, col = gamma_slab)) +
  geom_point() +
  scale_color_viridis_d() +
  coord_equal(xlim = c(0, max_rmse),
              ylim = c(0, max_rmse)) +
  geom_abline(slope = 1,
              col = 'red') +
  facet_grid(eras ~ basin)

Version	Author	Date
3d3b4cc	jens-daniel-mueller	2021-02-23
7b672f7	jens-daniel-mueller	2021-01-11
33ba23c	jens-daniel-mueller	2021-01-07
318609d	jens-daniel-mueller	2020-12-23
9d0b2d0	jens-daniel-mueller	2020-12-23
0aa2b50	jens-daniel-mueller	2020-12-23
2886da0	jens-daniel-mueller	2020-12-19
02f0ee9	jens-daniel-mueller	2020-12-18
7bcb4eb	jens-daniel-mueller	2020-12-18
7131186	jens-daniel-mueller	2020-12-17
5b48ef5	jens-daniel-mueller	2020-12-17
e4ca289	jens-daniel-mueller	2020-12-16
158fe26	jens-daniel-mueller	2020-12-15
7a9a4cb	jens-daniel-mueller	2020-12-15
61b263c	jens-daniel-mueller	2020-12-15
984697e	jens-daniel-mueller	2020-12-12
3ebff89	jens-daniel-mueller	2020-12-12
ba112d3	jens-daniel-mueller	2020-12-11
24a632f	jens-daniel-mueller	2020-12-07
6a8004b	jens-daniel-mueller	2020-12-07
70bf1a5	jens-daniel-mueller	2020-12-07
7555355	jens-daniel-mueller	2020-12-07
143d6fa	jens-daniel-mueller	2020-12-07
090e4d5	jens-daniel-mueller	2020-12-02
7c25f7a	jens-daniel-mueller	2020-12-02
b03ddb8	jens-daniel-mueller	2020-12-02
91435ae	jens-daniel-mueller	2020-12-01
196be51	jens-daniel-mueller	2020-11-30

rm(max_rmse)

6.2.2 Best models

# find max rmse to scale axis
max_rmse <-
  max(c(lm_best_eras$rmse,
        lm_best_eras$rmse_sum - lm_best_eras$rmse))

lm_best_eras %>%
  ggplot(aes(rmse, rmse_sum - rmse, col = gamma_slab)) +
  geom_point() +
  scale_color_viridis_d() +
  coord_equal(xlim = c(0, max_rmse),
              ylim = c(0, max_rmse)) +
  geom_abline(slope = 1,
              col = 'red') +
  facet_grid(eras ~ basin)

Version	Author	Date
3d3b4cc	jens-daniel-mueller	2021-02-23
7b672f7	jens-daniel-mueller	2021-01-11
33ba23c	jens-daniel-mueller	2021-01-07
318609d	jens-daniel-mueller	2020-12-23
9d0b2d0	jens-daniel-mueller	2020-12-23
0aa2b50	jens-daniel-mueller	2020-12-23
2886da0	jens-daniel-mueller	2020-12-19
02f0ee9	jens-daniel-mueller	2020-12-18
7bcb4eb	jens-daniel-mueller	2020-12-18
7131186	jens-daniel-mueller	2020-12-17
a84ff3c	jens-daniel-mueller	2020-12-17

rm(max_rmse)

6.3 Predictor counts

The number of models where a particular predictor was included were counted for each basin, density slab and compared eras

# calculate cases of predictor used
lm_all_stats <- lm_best_cant %>% 
  pivot_longer(starts_with("delta_coeff_"),
               names_to = "term",
               names_prefix = "delta_coeff_",
               values_to = "delta_coeff") %>% 
  filter(term != "(Intercept)",
         delta_coeff != 0) %>% 
  group_by(basin, eras, gamma_slab) %>% 
  count(term) %>% 
  ungroup() %>% 
  pivot_wider(values_from = n, names_from = term)

# print table
lm_all_stats %>%
  gt(rowname_col = "gamma_slab",
     groupname_col = c("basin", "eras")) %>% 
  summary_rows(
    groups = TRUE,
    fns = list(total = "sum")
  )

	aou	nitrate	phosphate	phosphate_star	sal	silicate	temp
Atlantic - 1982-1999 --> 2000-2012
(27.25,27.5]	7	4	8	7	10	3	9
(27.85,27.95]	5	3	7	7	10	10	5
total	12.00	7.00	15.00	14.00	20.00	13.00	14.00

6.4 RMSE alternatives

AIC is an alternative criterion to RMSE to judge model quality, but not (yet) taken into account.

lm_all_eras %>% 
  ggplot(aes(rmse, AIC, col = gamma_slab)) +
  geom_point() +
  scale_color_viridis_d() +
  facet_grid(eras~basin)

Version	Author	Date
3d3b4cc	jens-daniel-mueller	2021-02-23
7b672f7	jens-daniel-mueller	2021-01-11
33ba23c	jens-daniel-mueller	2021-01-07
318609d	jens-daniel-mueller	2020-12-23
9d0b2d0	jens-daniel-mueller	2020-12-23
0aa2b50	jens-daniel-mueller	2020-12-23
2886da0	jens-daniel-mueller	2020-12-19
02f0ee9	jens-daniel-mueller	2020-12-18
7bcb4eb	jens-daniel-mueller	2020-12-18
7131186	jens-daniel-mueller	2020-12-17
5b48ef5	jens-daniel-mueller	2020-12-17

lm_best_eras %>% 
  ggplot(aes(rmse, AIC, col = gamma_slab)) +
  geom_point() +
  scale_color_viridis_d() +
  facet_grid(eras~basin)

Version	Author	Date
3d3b4cc	jens-daniel-mueller	2021-02-23
7b672f7	jens-daniel-mueller	2021-01-11
33ba23c	jens-daniel-mueller	2021-01-07
318609d	jens-daniel-mueller	2020-12-23
9d0b2d0	jens-daniel-mueller	2020-12-23
0aa2b50	jens-daniel-mueller	2020-12-23
2886da0	jens-daniel-mueller	2020-12-19
02f0ee9	jens-daniel-mueller	2020-12-18
7bcb4eb	jens-daniel-mueller	2020-12-18
7131186	jens-daniel-mueller	2020-12-17
5b48ef5	jens-daniel-mueller	2020-12-17

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: openSUSE Leap 15.2

Matrix products: default
BLAS:   /usr/local/R-4.0.3/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R-4.0.3/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gt_0.2.2         corrr_0.4.3      broom_0.7.2      kableExtra_1.3.1
 [5] knitr_1.30       olsrr_0.5.3      GGally_2.0.0     lubridate_1.7.9 
 [9] metR_0.9.0       scico_1.2.0      patchwork_1.1.1  collapse_1.5.0  
[13] forcats_0.5.0    stringr_1.4.0    dplyr_1.0.2      purrr_0.3.4     
[17] readr_1.4.0      tidyr_1.1.2      tibble_3.0.4     ggplot2_3.3.2   
[21] tidyverse_1.3.0  workflowr_1.6.2 

loaded via a namespace (and not attached):
 [1] fs_1.5.0                 webshot_0.5.2            RColorBrewer_1.1-2      
 [4] httr_1.4.2               rprojroot_2.0.2          tools_4.0.3             
 [7] backports_1.1.10         utf8_1.1.4               R6_2.5.0                
[10] nortest_1.0-4            DBI_1.1.0                colorspace_1.4-1        
[13] withr_2.3.0              gridExtra_2.3            tidyselect_1.1.0        
[16] curl_4.3                 compiler_4.0.3           git2r_0.27.1            
[19] cli_2.1.0                rvest_0.3.6              xml2_1.3.2              
[22] sass_0.2.0               labeling_0.4.2           scales_1.1.1            
[25] checkmate_2.0.0          goftest_1.2-2            digest_0.6.27           
[28] foreign_0.8-80           rmarkdown_2.5            rio_0.5.16              
[31] pkgconfig_2.0.3          htmltools_0.5.0          highr_0.8               
[34] dbplyr_1.4.4             rlang_0.4.9              readxl_1.3.1            
[37] rstudioapi_0.13          farver_2.0.3             generics_0.0.2          
[40] jsonlite_1.7.1           zip_2.1.1                car_3.0-10              
[43] magrittr_1.5             Matrix_1.2-18            Rcpp_1.0.5              
[46] munsell_0.5.0            fansi_0.4.1              abind_1.4-5             
[49] lifecycle_0.2.0          stringi_1.5.3            whisker_0.4             
[52] yaml_2.2.1               carData_3.0-4            plyr_1.8.6              
[55] grid_4.0.3               blob_1.2.1               parallel_4.0.3          
[58] promises_1.1.1           crayon_1.3.4             lattice_0.20-41         
[61] haven_2.3.1              hms_0.5.3                pillar_1.4.7            
[64] reprex_0.3.0             glue_1.4.2               evaluate_0.14           
[67] RcppArmadillo_0.10.1.2.0 data.table_1.13.2        modelr_0.1.8            
[70] vctrs_0.3.5              httpuv_1.5.4             cellranger_1.1.0        
[73] gtable_0.3.0             reshape_0.8.8            assertthat_0.2.1        
[76] xfun_0.18                openxlsx_4.2.3           RcppEigen_0.3.3.7.0     
[79] later_1.1.0.1            viridisLite_0.3.0        ellipsis_0.3.1          
[82] here_0.1

eMLR - model fitting

Jens Daniel Müller

24 February, 2021