Last updated: 2022-01-30
Checks: 7 0
Knit directory: ms_mariposas_biodiversity/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20211228)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version b29b528. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Unstaged changes:
Modified: data/mod_den_selectionBIC.csv
Modified: data/mod_riq_selectionBIC.csv
Modified: glmulti.analysis.modgen.back
Modified: glmulti.analysis.mods.back
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/modeliza.Rmd
) and HTML (docs/modeliza.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | b29b528 | ajpelu | 2022-01-30 | add equations |
Rmd | 04d5a3b | ajpelu | 2022-01-28 | modelling |
html | 04d5a3b | ajpelu | 2022-01-28 | modelling |
library(tidyverse)
library(readxl)
library(janitor)
library(here)
library(correlation)
library(patchwork)
library(vegan)
library(glmulti)
library(DHARMa)
library(MuMIn)
library(performance)
library(tweedie)
library(kableExtra)
library(visreg)
library(tab)
library(MASS)
library(equatiomatic)
library(report)
variables_sel <- c("Pp_anu", "TP_PEND", "FR_MATDE",
"TP_SU_NO", "FR_QUERC", "FR_CONIF",
"TP_RSH_V", "HIDRO_ITH", "temp_anu",
"TP_RSD_P", "Pp_ver", "TP_ES_OE", "TP_EXPO",
"DS_ARBOL", "elev")
diversidad_year <- read_csv(here::here("data/diversidad_by_year.csv"))
riqueza_year <- read_csv(here::here("data/riqueza_by_year.csv"))
densidad_year <- read_csv(here::here("data/densidad_by_year.csv"))
env <- read_csv(here::here("data/matrix_env_variables_selected.csv")) %>%
dplyr::select(-elev, -temp_anu, -Pp_ver, -Pp_anu)
climate_year <- read_csv(here::here("data/climate_year.csv")) %>%
dplyr::select(Id_transect, year, p_anu_year, p_ver_year, t_anu_year = t_anual)
m <- riqueza_year %>%
inner_join(diversidad_year) %>%
inner_join(densidad_year) %>%
dplyr::select(-longitud, -min_altitu, -max_altitu, -long_total, -abundancia) %>%
unite("id", c("id_transecto", "year")) %>%
inner_join(
(climate_year %>% unite("id", c("Id_transect", "year"), remove = FALSE)),
by="id") %>%
inner_join(env) %>%
rowwise() %>%
mutate(FR_ARBOL = sum(FR_CONIF, FR_QUERC)) %>%
rename(div = diversidad) %>%
relocate(transecto, id, Id_transect, site, elev,year) %>%
dplyr::select(-Transecto, -Abreviatura)
co <- correlation((m %>% dplyr::select(p_anu_year:FR_ARBOL)))
co %>% summary() %>%
plot(size_point = .5,
show_values = TRUE,
show_p = TRUE,
show_legend = FALSE,
size_text = 3.5) +
theme(axis.text = element_text(size = 8))
Warning: Removed 91 rows containing missing values (geom_point).
Version | Author | Date |
---|---|---|
04d5a3b | ajpelu | 2022-01-28 |
Creamos una variable llamada FR_ARBOL = FR_CONIF + FR_QUERC
Analizar correlación entre variables
theme_set(theme_bw())
(m %>% ggplot(aes(x=DS_ARBOL, y=FR_ARBOL)) + geom_point()) +
(m %>% ggplot(aes(x=DS_ARBOL, y=FR_QUERC)) + geom_point()) +
(m %>% ggplot(aes(x=DS_ARBOL, y=FR_CONIF)) + geom_point())
Version | Author | Date |
---|---|---|
04d5a3b | ajpelu | 2022-01-28 |
En caso de querer dejar alguna variable de arbolado, dejaríamos la FR_ARBOL, pero esta variable está muy correlacionada con DS_ARBOL (\(r > |.7|\)), por lo tanto, nos quedamos con DS_ARBOL y descartamos FR_QUERC y FR_CONIF.
(m %>% ggplot(aes(x=TP_RSD_P, y=TP_SU_NO)) + geom_point())
Version | Author | Date |
---|---|---|
04d5a3b | ajpelu | 2022-01-28 |
Elegimos a TP_RSD_P, por ser una varible de mayor sentido biológico (cantidad de radiación que recibe)
Analizamos ahora TP_PEND. Esta variable aparece muy correlacionada con HIDRO_ITH (r = -0.6925104) y con TP_RSH_V (r = -0.6747214), aunque valores muy cercanos a <|.7|. Puede ser buena idea descartar TP_PEND.
theme_set(theme_bw())
(m %>% ggplot(aes(x=elev, y=t_anu_year)) + geom_point()) +
(m %>% ggplot(aes(x=elev, y=DS_ARBOL)) + geom_point()) +
(m %>% ggplot(aes(x=elev, y=HIDRO_ITH)) + geom_point())
Version | Author | Date |
---|---|---|
04d5a3b | ajpelu | 2022-01-28 |
myvars <- m %>%
dplyr::select(-transecto, -id, -Id_transect, -site, -year,
-riq, -div, -den,
-t_anu_year, -FR_QUERC, -FR_CONIF,
-TP_SU_NO) %>% names()
corvif(m[,myvars])
Variance inflation factors
GVIF
elev 5.353464
p_anu_year 1.210317
p_ver_year 1.187311
TP_PEND 18.681484
FR_MATDE 2.360937
TP_RSH_V 5.410423
HIDRO_ITH 9.707477
TP_RSD_P 5.095531
TP_ES_OE 1.726852
TP_EXPO 4.495690
DS_ARBOL 8.592039
FR_ARBOL 4.955440
Efectivamente vemos altos valores de VIF para TP_PEND (la descartamos)
myvars <- m %>%
dplyr::select(-transecto, -id, -Id_transect, -site, -year,
-riq, -div, -den,
-t_anu_year, -FR_QUERC, -FR_CONIF,
-TP_SU_NO, -TP_PEND) %>% names()
corvif(m[,myvars])
Variance inflation factors
GVIF
elev 5.291071
p_anu_year 1.181742
p_ver_year 1.179339
FR_MATDE 2.289574
TP_RSH_V 2.717009
HIDRO_ITH 3.024327
TP_RSD_P 2.345438
TP_ES_OE 1.639223
TP_EXPO 4.495657
DS_ARBOL 7.108273
FR_ARBOL 3.415775
La siguente candidata a eliminar es DS_ARBOL, aunque dependerá del umbral que seleccionemos. Algunos autores hablan de VIF < 3, otros VIF < 5 y otros de VIF < 10. No obstante, antes vemos la posible relación con FR_ARBOL (r = -0.7147883), que es una variable derivada (combinada de FR_CONIF y FR_QUERC). Proponemos descartar FR_ARBOL.
myvars <- m %>%
dplyr::select(-transecto, -id, -Id_transect, -site, -year,
-riq, -div, -den,
-elev, -FR_QUERC, -FR_CONIF,
-TP_SU_NO, -TP_PEND, -FR_ARBOL) %>% names()
corvif(m[,myvars])
Variance inflation factors
GVIF
p_anu_year 1.172146
p_ver_year 1.190897
t_anu_year 2.196946
FR_MATDE 2.076348
TP_RSH_V 2.657142
HIDRO_ITH 2.290738
TP_RSD_P 2.167037
TP_ES_OE 1.625434
TP_EXPO 3.807454
DS_ARBOL 2.652751
Por tanto tenemos seleccionadas las siguientes variables:
m %>%
dplyr::select(-transecto, -id, -Id_transect, -site, -year,
-riq, -div, -den,
-t_anu_year, -FR_QUERC, -FR_CONIF,
-TP_SU_NO, -TP_PEND, -FR_ARBOL) %>% names()
[1] "elev" "p_anu_year" "p_ver_year" "FR_MATDE" "TP_RSH_V"
[6] "HIDRO_ITH" "TP_RSD_P" "TP_ES_OE" "TP_EXPO" "DS_ARBOL"
nobs_var <- nrow(m)/
ncol(m %>%
dplyr::select(-transecto, -id, -Id_transect, -site, -year,
-riq, -div, -den,
-t_anu_year, -FR_QUERC, -FR_CONIF,
-TP_SU_NO, -TP_PEND, -FR_ARBOL))
Algunos hablan de tener entre 15 - 25 veces el numero de observaciones por cada covariable. Actualmente tenemos 15.6
theme_set(theme_bw())
m %>%
pivot_longer(p_anu_year:FR_ARBOL) %>%
ggplot(aes(x=value, y=den)) +
geom_point() + geom_smooth() +
facet_wrap(~name, scales = "free_x")
Version | Author | Date |
---|---|---|
04d5a3b | ajpelu | 2022-01-28 |
# Define formula
fden <- as.formula(
paste("den",
paste(
names(
m %>% dplyr::select(-transecto, -id, -Id_transect, -site, -year,
-riq, -div, -den,
-t_anu_year, -FR_QUERC, -FR_CONIF,
-TP_SU_NO, -TP_PEND, -FR_ARBOL)), collapse = "+"),
sep = "~")
)
fden
den ~ elev + p_anu_year + p_ver_year + FR_MATDE + TP_RSH_V +
HIDRO_ITH + TP_RSD_P + TP_ES_OE + TP_EXPO + DS_ARBOL
# Probamos varias familias, entre ellas Tweedie
library(statmod)
library(tweedie)
profile1 <- tweedie.profile(den ~ 1 + FR_MATDE + HIDRO_ITH + DS_ARBOL, data = m, p.vec = seq(1.1, 3.0, 0.1), fit.glm = TRUE)
print(profile1$p.max)
# automatic model selection
set.seed(1234)
# fam <- "poisson"
# fam <- "gaussian"
fam <- "gamma"
select_fden <- glmulti(fden, data = m,
level= 1,
chunk = 1, chunks = 4,
method = "ga", crit = "bic",
family = Gamma(link ="log"),
marginality = TRUE,
confsetsize = 5,
plotty = FALSE, report = FALSE)
TASK: Genetic algorithm in the candidate set.
Initialization...
Algorithm started...
Improvements in best and average IC have bebingo en below the specified goals.
Algorithm is declared to have converged.
Completed.
fden1 <- glm(select_fden@formulas[[1]],
family = Gamma(link ="log"), data = m)
fden2 <- glm(select_fden@formulas[[2]],
family = Gamma(link ="log"), data = m)
fden3 <- glm(select_fden@formulas[[3]],
family = Gamma(link ="log"), data = m)
fden4 <- glm(select_fden@formulas[[4]],
family = Gamma(link ="log"), data = m)
fden5 <- glm(select_fden@formulas[[5]],
family = Gamma(link ="log"), data = m)
top5_table_fden <- as.data.frame(model.sel(fden1, fden2, fden3, fden4, fden5, rank = BIC)) %>%
dplyr::select(-family) %>%
mutate(model =
c(fden1$formula, fden2$formula, fden3$formula, fden4$formula, fden5$formula)) %>%
relocate(model)
write.csv(as.matrix(top5_table_fden), file=here::here("data/mod_den_selectionBIC.csv"))
### Model validation
performance::check_model(fden1)
Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
"none")` instead.
Version | Author | Date |
---|---|---|
04d5a3b | ajpelu | 2022-01-28 |
performance(fden1) %>%
kbl() %>%
kable_styling()
AIC | BIC | R2_Nagelkerke | RMSE | Sigma |
---|---|---|---|---|
511.9234 | 527.1727 | 0.727684 | 1.642396 | 0.4541641 |
select_fden@formulas[[1]]
den ~ 1 + FR_MATDE + HIDRO_ITH + DS_ARBOL
<environment: 0x7fbfde2956c8>
modelo_densidad <- glm(den ~ 1 + FR_MATDE + HIDRO_ITH + DS_ARBOL,
family = Gamma(link ="log"), data = m)
ytitle <- "Density"
visreg(modelo_densidad, scale="response", partial=TRUE,
ylab = ytitle, "FR_MATDE", xlab = "FR_MATDE")
Version | Author | Date |
---|---|---|
04d5a3b | ajpelu | 2022-01-28 |
visreg(modelo_densidad, scale="response", partial=TRUE,
ylab = ytitle, "DS_ARBOL", xlab = "DS_ARBOL")
Version | Author | Date |
---|---|---|
04d5a3b | ajpelu | 2022-01-28 |
visreg(modelo_densidad, scale="response", partial=TRUE,
"HIDRO_ITH", xlab = "HIDRO_ITH")
Version | Author | Date |
---|---|---|
04d5a3b | ajpelu | 2022-01-28 |
ms <- modelo_densidad
tc <- tab::tabglm(ms, columns = c("beta.se", "test", "p"), decimals = 4)
names(tc) <- c("Variable", "Estimate", "Zvalue", "pvalue")
tablita_auxiliar <- data.frame(
Variable = c("DegreeFreedom", "AIC", "BIC", "DevianceExplained"),
Estimate = as.character(c(df.residual(ms), round(AIC(ms),0), round(BIC(ms), 0),
round( ((ms$null.deviance - ms$deviance) / ms$null.deviance),3))),
Zvalue = "", pvalue = "")
tc <- bind_rows(tc, tablita_auxiliar)
write.csv(tc, here::here("data/mod_den_coefficients.csv"))
tc %>% kbl() %>% kable_styling()
Variable | Estimate | Zvalue | pvalue |
---|---|---|---|
Intercept | 0.6159 (0.2048) | 3.0067 | 0.003 |
FR_MATDE | 0.0243 (0.0032) | 7.5854 | <0.001 |
HIDRO_ITH | 0.0802 (0.0256) | 3.1338 | 0.002 |
DS_ARBOL | -0.0003 (0.0000) | -12.8372 | <0.001 |
DegreeFreedom | 152 | ||
AIC | 512 | ||
BIC | 527 | ||
DevianceExplained | 0.664 |
extract_eq(modelo_densidad, wrap = TRUE, intercept = "beta", use_coefs = TRUE)
\[ \begin{aligned} \log ({ \widehat{E( \operatorname{den} )} }) &= 0.62 + 0.02(\operatorname{FR\_MATDE}) + 0.08(\operatorname{HIDRO\_ITH}) + 0(\operatorname{DS\_ARBOL}) \end{aligned} \]
report::report(modelo_densidad)
We fitted a general linear model (Gamma family with a log link) (estimated using ML) to predict den with FR_MATDE, HIDRO_ITH and DS_ARBOL (formula: den ~ 1 + FR_MATDE + HIDRO_ITH + DS_ARBOL). The model's explanatory power is substantial (Nagelkerke's R2 = 0.73). The model's intercept, corresponding to FR_MATDE = 0, HIDRO_ITH = 0 and DS_ARBOL = 0, is at 0.62 (95% CI [0.21, 1.01], t(152) = 3.01, p = 0.003). Within this model:
- The effect of FR_MATDE is statistically significant and positive (beta = 0.02, 95% CI [0.02, 0.03], t(152) = 7.59, p < .001; Std. beta = 0.31, 95% CI [0.23, 0.40])
- The effect of HIDRO_ITH is statistically significant and positive (beta = 0.08, 95% CI [0.03, 0.13], t(152) = 3.13, p = 0.002; Std. beta = 0.12, 95% CI [0.05, 0.20])
- The effect of DS_ARBOL is statistically significant and negative (beta = -3.14e-04, 95% CI [-3.62e-04, -2.64e-04], t(152) = -12.84, p < .001; Std. beta = -0.53, 95% CI [-0.61, -0.45])
Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95% Confidence Intervals (CIs) and p-values were computed using
theme_set(theme_bw())
m %>%
pivot_longer(p_anu_year:FR_ARBOL) %>%
ggplot(aes(x=value, y=div)) +
geom_point() + geom_smooth() +
facet_wrap(~name, scales = "free_x")
Warning: Removed 15 rows containing non-finite values (stat_smooth).
Warning: Removed 15 rows containing missing values (geom_point).
# Define formula
fdiv <- as.formula(
paste("div",
paste(
names(
m %>% dplyr::select(-transecto, -id, -Id_transect, -site, -year,
-riq, -div, -den,
-t_anu_year, -FR_QUERC, -FR_CONIF,
-TP_SU_NO, -TP_PEND, -FR_ARBOL,
-TP_EXPO, -p_anu_year, -p_ver_year)), collapse = "+"),
sep = "~")
)
fdiv
div ~ elev + FR_MATDE + TP_RSH_V + HIDRO_ITH + TP_RSD_P + TP_ES_OE +
DS_ARBOL
# automatic model selection
set.seed(1234)
# fam <- "poisson"
fam <- "gaussian"
select_fdiv <- glmulti(fdiv, data = m,
level= 1,
chunk = 1, chunks = 4,
method = "ga", crit = "bic",
family = fam,
marginality = TRUE,
confsetsize = 5,
plotty = FALSE, report = FALSE)
TASK: Genetic algorithm in the candidate set.
Initialization...
Algorithm started...
Improvements in best and average IC have bebingo en below the specified goals.
Algorithm is declared to have converged.
Completed.
fdiv1 <- glm(select_fdiv@formulas[[1]], data = m)
fdiv2 <- glm(select_fdiv@formulas[[2]], data = m)
fdiv3 <- glm(select_fdiv@formulas[[3]], data = m)
fdiv4 <- glm(select_fdiv@formulas[[4]], data = m)
fdiv5 <- glm(select_fdiv@formulas[[5]], data = m)
top5_table_fdiv <- as.data.frame(model.sel(fdiv1, fdiv2, fdiv3, fdiv4, fdiv5, rank = BIC)) %>%
dplyr::select(-family) %>%
mutate(model =
c(fdiv1$formula, fdiv2$formula, fdiv3$formula,
fdiv4$formula, fdiv5$formula)) %>%
relocate(model)
write.csv(as.matrix(top5_table_fdiv), file=here::here("data/mod_div_selectionBIC.csv"))
### Model validation
performance::check_model(fdiv2)
Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
"none")` instead.
performance(fdiv2) %>%
kbl() %>%
kable_styling()
AIC | BIC | R2 | RMSE | Sigma |
---|---|---|---|---|
148.2462 | 166.5067 | 0.7374419 | 0.3755233 | 0.3817307 |
select_fdiv@formulas[[2]]
div ~ 1 + FR_MATDE + HIDRO_ITH + TP_ES_OE + DS_ARBOL
<environment: 0x7fbff98918c0>
modelo_diversidad <- glm(div ~ 1 + FR_MATDE + HIDRO_ITH + TP_ES_OE + DS_ARBOL, data = m)
ytitle <- "Diversity"
visreg(modelo_diversidad, scale="response", partial=TRUE,
ylab = ytitle, "FR_MATDE", xlab = "FR_MATDE")
visreg(modelo_diversidad, scale="response", partial=TRUE,
ylab = ytitle, "HIDRO_ITH", xlab = "HIDRO_ITH")
visreg(modelo_diversidad, scale="response", partial=TRUE,
ylab = ytitle, "TP_ES_OE", xlab = "TP_ES_OE")
visreg(modelo_diversidad, scale="response", partial=TRUE,
ylab = ytitle, "DS_ARBOL", xlab = "DS_ARBOL")
ms <- modelo_diversidad
tc <- tab::tabglm(ms, columns = c("beta.se", "test", "p"), decimals = 4)
names(tc) <- c("Variable", "Estimate", "Zvalue", "pvalue")
tablita_auxiliar <- data.frame(
Variable = c("DegreeFreedom", "AIC", "BIC", "DevianceExplained"),
Estimate = as.character(c(df.residual(ms), round(AIC(ms),0), round(BIC(ms), 0),
round( ((ms$null.deviance - ms$deviance) / ms$null.deviance),3))),
Zvalue = "", pvalue = "")
tc <- bind_rows(tc, tablita_auxiliar)
write.csv(tc, here::here("data/mod_div_coefficients.csv"))
tc %>% kbl() %>% kable_styling()
Variable | Estimate | Zvalue | pvalue |
---|---|---|---|
Intercept | 3.7419 (0.1920) | 19.4891 | <0.001 |
FR_MATDE | 0.0242 (0.0028) | 8.5998 | <0.001 |
HIDRO_ITH | -0.1417 (0.0212) | -6.6774 | <0.001 |
TP_ES_OE | -0.0081 (0.0019) | -4.2891 | <0.001 |
DS_ARBOL | -0.0002 (0.0000) | -10.6152 | <0.001 |
DegreeFreedom | 150 | ||
AIC | 148 | ||
BIC | 167 | ||
DevianceExplained | 0.737 |
extract_eq(modelo_diversidad, wrap = TRUE, intercept = "beta", use_coefs = TRUE)
\[ \begin{aligned} \widehat{E( \operatorname{div} )} &= 3.74 + 0.02(\operatorname{FR\_MATDE}) - 0.14(\operatorname{HIDRO\_ITH}) - 0.01(\operatorname{TP\_ES\_OE})\ + \\ &\quad 0(\operatorname{DS\_ARBOL}) \end{aligned} \]
report::report(modelo_diversidad)
We fitted a linear model (estimated using ML) to predict div with FR_MATDE, HIDRO_ITH, TP_ES_OE and DS_ARBOL (formula: div ~ 1 + FR_MATDE + HIDRO_ITH + TP_ES_OE + DS_ARBOL). The model's explanatory power is substantial (R2 = 0.74). The model's intercept, corresponding to FR_MATDE = 0, HIDRO_ITH = 0, TP_ES_OE = 0 and DS_ARBOL = 0, is at 3.74 (95% CI [3.37, 4.12], t(150) = 19.49, p < .001). Within this model:
- The effect of FR_MATDE is statistically significant and positive (beta = 0.02, 95% CI [0.02, 0.03], t(150) = 8.60, p < .001; Std. beta = 0.42, 95% CI [0.33, 0.52])
- The effect of HIDRO_ITH is statistically significant and negative (beta = -0.14, 95% CI [-0.18, -0.10], t(150) = -6.68, p < .001; Std. beta = -0.29, 95% CI [-0.37, -0.20])
- The effect of TP_ES_OE is statistically significant and negative (beta = -8.07e-03, 95% CI [-0.01, -4.38e-03], t(150) = -4.29, p < .001; Std. beta = -0.20, 95% CI [-0.29, -0.11])
- The effect of DS_ARBOL is statistically significant and negative (beta = -2.33e-04, 95% CI [-2.76e-04, -1.90e-04], t(150) = -10.62, p < .001; Std. beta = -0.53, 95% CI [-0.63, -0.43])
Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95% Confidence Intervals (CIs) and p-values were computed using
theme_set(theme_bw())
m %>%
pivot_longer(p_anu_year:FR_ARBOL) %>%
ggplot(aes(x=value, y=riq)) +
geom_point() + geom_smooth() +
facet_wrap(~name, scales = "free_x")
# Define formula
friq <- as.formula(
paste("riq",
paste(
names(
m %>% dplyr::select(-transecto, -id, -Id_transect, -site, -year,
-riq, -div, -den,
-t_anu_year, -FR_QUERC, -FR_CONIF,
-TP_SU_NO, -TP_PEND, -FR_ARBOL,
-p_anu_year, -p_ver_year)), collapse = "+"),
sep = "~")
)
friq
riq ~ elev + FR_MATDE + TP_RSH_V + HIDRO_ITH + TP_RSD_P + TP_ES_OE +
TP_EXPO + DS_ARBOL
# automatic model selection
set.seed(1234)
# fam <- "poisson"
fam <- "gaussian"
select_friq <- glmulti(friq, data = m,
level= 1,
chunk = 1, chunks = 4,
method = "ga", crit = "bic",
family = fam,
marginality = TRUE,
confsetsize = 5,
plotty = FALSE, report = FALSE)
TASK: Genetic algorithm in the candidate set.
Initialization...
Algorithm started...
Improvements in best and average IC have bebingo en below the specified goals.
Algorithm is declared to have converged.
Completed.
friq1 <- glm(select_friq@formulas[[1]], data = m)
friq2 <- glm(select_friq@formulas[[2]], data = m)
friq3 <- glm(select_friq@formulas[[3]], data = m)
friq4 <- glm(select_friq@formulas[[4]], data = m)
friq5 <- glm(select_friq@formulas[[5]], data = m)
top5_table_friq <- as.data.frame(model.sel(friq1, friq2, friq3, friq4, friq5, rank = BIC)) %>%
dplyr::select(-family) %>%
mutate(model =
c(friq1$formula, friq3$formula, friq3$formula, friq4$formula, friq5$formula)) %>%
relocate(model)
write.csv(as.matrix(top5_table_friq), file=here::here("data/mod_riq_selectionBIC.csv"))
### Model validation
performance::check_model(friq1)
Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
"none")` instead.
Version | Author | Date |
---|---|---|
04d5a3b | ajpelu | 2022-01-28 |
performance(friq1) %>%
kbl() %>%
kable_styling()
AIC | BIC | R2 | RMSE | Sigma |
---|---|---|---|---|
999.4117 | 1026.86 | 0.8699678 | 5.621575 | 5.77151 |
select_friq@formulas[[1]]
riq ~ 1 + elev + FR_MATDE + TP_RSH_V + HIDRO_ITH + TP_RSD_P +
TP_ES_OE + TP_EXPO
<environment: 0x7fbfe4549ee8>
modelo_riqueza <- glm(riq ~ 1 + elev + FR_MATDE + TP_RSH_V + HIDRO_ITH + TP_RSD_P + TP_ES_OE + TP_EXPO, data = m)
ytitle <- "Richness"
visreg(modelo_riqueza, scale="response", partial=TRUE,
ylab = ytitle, "elev", xlab = "elev")
visreg(modelo_riqueza, scale="response", partial=TRUE,
ylab = ytitle, "FR_MATDE", xlab = "FR_MATDE")
visreg(modelo_riqueza, scale="response", partial=TRUE,
ylab = ytitle, "TP_RSH_V", xlab = "TP_RSH_V")
visreg(modelo_riqueza, scale="response", partial=TRUE,
ylab = ytitle, "HIDRO_ITH", xlab = "HIDRO_ITH")
visreg(modelo_riqueza, scale="response", partial=TRUE,
ylab = ytitle, "TP_RSD_P", xlab = "TP_RSD_P")
visreg(modelo_riqueza, scale="response", partial=TRUE,
ylab = ytitle, "TP_ES_OE", xlab = "TP_ES_OE")
visreg(modelo_riqueza, scale="response", partial=TRUE,
ylab = ytitle, "TP_EXPO", xlab = "TP_EXPO")
ms <- modelo_riqueza
tc <- tab::tabglm(ms, columns = c("beta.se", "test", "p"), decimals = 4)
names(tc) <- c("Variable", "Estimate", "Zvalue", "pvalue")
tablita_auxiliar <- data.frame(
Variable = c("DegreeFreedom", "AIC", "BIC", "DevianceExplained"),
Estimate = as.character(c(df.residual(ms), round(AIC(ms),0), round(BIC(ms), 0),
round( ((ms$null.deviance - ms$deviance) / ms$null.deviance),3))),
Zvalue = "", pvalue = "")
tc <- bind_rows(tc, tablita_auxiliar)
write.csv(tc, here::here("data/mod_riq_coefficients.csv"))
tc %>% kbl() %>% kable_styling()
Variable | Estimate | Zvalue | pvalue |
---|---|---|---|
Intercept | -56.3488 (14.9369) | -3.7724 | <0.001 |
elev | -0.0051 (0.0012) | -4.2190 | <0.001 |
FR_MATDE | 1.0777 (0.0438) | 24.6190 | <0.001 |
TP_RSH_V | 8.4994 (1.0706) | 7.9386 | <0.001 |
HIDRO_ITH | -3.9907 (0.4811) | -8.2949 | <0.001 |
TP_RSD_P | 0.0029 (0.0007) | 4.0488 | <0.001 |
TP_ES_OE | -0.2069 (0.0315) | -6.5788 | <0.001 |
TP_EXPO | -0.0769 (0.0154) | -4.9800 | <0.001 |
DegreeFreedom | 148 | ||
AIC | 999 | ||
BIC | 1027 | ||
DevianceExplained | 0.87 |
extract_eq(modelo_riqueza, wrap = TRUE, intercept = "beta", use_coefs = TRUE)
\[ \begin{aligned} \widehat{E( \operatorname{riq} )} &= -56.35 - 0.01(\operatorname{elev}) + 1.08(\operatorname{FR\_MATDE}) + 8.5(\operatorname{TP\_RSH\_V})\ - \\ &\quad 3.99(\operatorname{HIDRO\_ITH}) + 0(\operatorname{TP\_RSD\_P}) - 0.21(\operatorname{TP\_ES\_OE}) - 0.08(\operatorname{TP\_EXPO}) \end{aligned} \]
report::report(modelo_riqueza)
We fitted a linear model (estimated using ML) to predict riq with elev, FR_MATDE, TP_RSH_V, HIDRO_ITH, TP_RSD_P, TP_ES_OE and TP_EXPO (formula: riq ~ 1 + elev + FR_MATDE + TP_RSH_V + HIDRO_ITH + TP_RSD_P + TP_ES_OE + TP_EXPO). The model's explanatory power is substantial (R2 = 0.87). The model's intercept, corresponding to elev = 0, FR_MATDE = 0, TP_RSH_V = 0, HIDRO_ITH = 0, TP_RSD_P = 0, TP_ES_OE = 0 and TP_EXPO = 0, is at -56.35 (95% CI [-85.62, -27.07], t(148) = -3.77, p < .001). Within this model:
- The effect of elev is statistically significant and negative (beta = -5.09e-03, 95% CI [-7.45e-03, -2.73e-03], t(148) = -4.22, p < .001; Std. beta = -0.22, 95% CI [-0.33, -0.12])
- The effect of FR_MATDE is statistically significant and positive (beta = 1.08, 95% CI [0.99, 1.16], t(148) = 24.62, p < .001; Std. beta = 0.88, 95% CI [0.81, 0.95])
- The effect of TP_RSH_V is statistically significant and positive (beta = 8.50, 95% CI [6.40, 10.60], t(148) = 7.94, p < .001; Std. beta = 0.33, 95% CI [0.25, 0.41])
- The effect of HIDRO_ITH is statistically significant and negative (beta = -3.99, 95% CI [-4.93, -3.05], t(148) = -8.29, p < .001; Std. beta = -0.38, 95% CI [-0.47, -0.29])
- The effect of TP_RSD_P is statistically significant and positive (beta = 2.88e-03, 95% CI [1.48e-03, 4.27e-03], t(148) = 4.05, p < .001; Std. beta = 0.18, 95% CI [0.09, 0.26])
- The effect of TP_ES_OE is statistically significant and negative (beta = -0.21, 95% CI [-0.27, -0.15], t(148) = -6.58, p < .001; Std. beta = -0.24, 95% CI [-0.31, -0.17])
- The effect of TP_EXPO is statistically significant and negative (beta = -0.08, 95% CI [-0.11, -0.05], t(148) = -4.98, p < .001; Std. beta = -0.28, 95% CI [-0.39, -0.17])
Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95% Confidence Intervals (CIs) and p-values were computed using
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] report_0.3.0 equatiomatic_0.2.0 MASS_7.3-53 tab_4.1.1
[5] knitr_1.31 visreg_2.7.0 kableExtra_1.3.1 tweedie_2.3.3
[9] performance_0.8.0 MuMIn_1.43.17 DHARMa_0.3.3.0 glmulti_1.0.8
[13] leaps_3.1 rJava_0.9-13 vegan_2.5-7 lattice_0.20-41
[17] permute_0.9-5 patchwork_1.1.1 correlation_0.6.1 here_1.0.1
[21] janitor_2.1.0 readxl_1.3.1 forcats_0.5.1 stringr_1.4.0
[25] dplyr_1.0.6 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3
[29] tibble_3.1.2 ggplot2_3.3.5 tidyverse_1.3.1 workflowr_1.7.0
loaded via a namespace (and not attached):
[1] backports_1.2.1 plyr_1.8.6 splines_4.0.2 TH.data_1.0-10
[5] digest_0.6.27 qqplotr_0.0.5 foreach_1.5.1 htmltools_0.5.2
[9] fansi_0.4.2 magrittr_2.0.1 cluster_2.1.0 see_0.6.4
[13] modelr_0.1.8 sandwich_3.0-0 colorspace_2.0-2 rvest_1.0.0
[17] ggrepel_0.9.1 mitools_2.4 haven_2.3.1 xfun_0.23
[21] callr_3.7.0 crayon_1.4.1 jsonlite_1.7.2 lme4_1.1-27.1
[25] survival_3.2-7 zoo_1.8-8 iterators_1.0.13 glue_1.4.2
[29] gtable_0.3.0 emmeans_1.5.4 webshot_0.5.2 DEoptimR_1.0-8
[33] scales_1.1.1.9000 mvtnorm_1.1-1 DBI_1.1.1 Rcpp_1.0.7
[37] viridisLite_0.4.0 xtable_1.8-4 stats4_4.0.2 survey_4.0
[41] httr_1.4.2 ellipsis_0.3.2 pkgconfig_2.0.3 farver_2.1.0
[45] sass_0.3.1 dbplyr_2.1.1 utf8_1.1.4 tidyselect_1.1.1
[49] labeling_0.4.2 rlang_0.4.12 later_1.1.0.1 effectsize_0.4.5
[53] munsell_0.5.0 cellranger_1.1.0 tools_4.0.2 cli_2.5.0
[57] generics_0.1.0 broom_0.7.9 ggridges_0.5.3 evaluate_0.14
[61] fastmap_1.1.0 yaml_2.2.1 processx_3.5.1 fs_1.5.0
[65] robustbase_0.93-7 nlme_3.1-152 whisker_0.4 xml2_1.3.2
[69] compiler_4.0.2 rstudioapi_0.13 reprex_2.0.0 bslib_0.2.4
[73] stringi_1.7.4 highr_0.8 ps_1.5.0 parameters_0.14.0
[77] Matrix_1.3-2 nloptr_1.2.2.2 vctrs_0.3.8 pillar_1.6.1
[81] lifecycle_1.0.1 jquerylib_0.1.3 estimability_1.3 insight_0.14.4
[85] httpuv_1.5.5 R6_2.5.1 promises_1.2.0.1 gridExtra_2.3
[89] codetools_0.2-18 boot_1.3-26 assertthat_0.2.1 rprojroot_2.0.2
[93] withr_2.4.1 multcomp_1.4-16 mgcv_1.8-33 bayestestR_0.9.0
[97] parallel_4.0.2 hms_1.0.0 grid_4.0.2 coda_0.19-4
[101] minqa_1.2.4 rmarkdown_2.8 snakecase_0.11.0 git2r_0.28.0
[105] getPass_0.2-2 lubridate_1.7.10