Last updated: 2022-07-04

Checks: 7 0

Knit directory: ms_mariposas_biodiversity/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20211228) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version a9c1df0. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  data/tabla_4-v2.xlsx

Unstaged changes:
    Modified:   data/mod_den_selectionBIC.csv
    Modified:   data/mod_div_coefficients.csv
    Modified:   data/mod_div_selectionBIC.csv
    Modified:   data/mod_riq_coefficients.csv
    Modified:   data/mod_riq_selectionBIC.csv
    Modified:   glmulti.analysis.modgen.back
    Modified:   glmulti.analysis.mods.back

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/modeliza.Rmd) and HTML (docs/modeliza.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd a9c1df0 ajpelu 2022-07-04 eliminar TP_ES_OE y rehacer modelos
html 2a10774 ajpelu 2022-01-30 Build site.
Rmd b29b528 ajpelu 2022-01-30 add equations
Rmd 04d5a3b ajpelu 2022-01-28 modelling
html 04d5a3b ajpelu 2022-01-28 modelling

library(tidyverse)
library(readxl)
library(janitor)
library(here)
library(correlation)
Warning: package 'correlation' was built under R version 4.0.5
library(patchwork)
library(vegan)
library(glmulti)
library(DHARMa)
library(MuMIn)
library(performance)
library(tweedie)
library(kableExtra)
library(visreg)
library(tab)
library(MASS)
library(equatiomatic)
library(report)
Warning: package 'report' was built under R version 4.0.5

Introduction

Prepara datos

  • De las variables climáticas según nuestra selección previa tenemos:
variables_sel <- c("Pp_anu", "TP_PEND", "FR_MATDE", 
             "TP_SU_NO", "FR_QUERC", "FR_CONIF", 
             "TP_RSH_V", "HIDRO_ITH", "temp_anu",
             "TP_RSD_P", "Pp_ver", "TP_ES_OE", "TP_EXPO",
             "DS_ARBOL", "elev")
  • Por tanto de las variables climáticas anuales solo cogeré: Pp_anu, t_anual, Pp_ver
diversidad_year <- read_csv(here::here("data/diversidad_by_year.csv"))
riqueza_year <- read_csv(here::here("data/riqueza_by_year.csv"))
densidad_year <- read_csv(here::here("data/densidad_by_year.csv"))

env <- read_csv(here::here("data/matrix_env_variables_selected.csv")) %>% 
  dplyr::select(-elev, -temp_anu, -Pp_ver, -Pp_anu)
climate_year <- read_csv(here::here("data/climate_year.csv")) %>% 
  dplyr::select(Id_transect, year, p_anu_year, p_ver_year, t_anu_year = t_anual) 

m <- riqueza_year %>% 
  inner_join(diversidad_year) %>% 
  inner_join(densidad_year) %>% 
  dplyr::select(-longitud, -min_altitu, -max_altitu, -long_total, -abundancia) %>% 
  unite("id", c("id_transecto", "year")) %>% 
  inner_join(
    (climate_year %>% unite("id", c("Id_transect", "year"), remove = FALSE)),
    by="id") %>% 
  inner_join(env) %>% 
  rowwise() %>% 
  mutate(FR_ARBOL = sum(FR_CONIF, FR_QUERC)) %>% 
  rename(div = diversidad) %>% 
  relocate(transecto, id, Id_transect, site, elev,year) %>% 
  dplyr::select(-Transecto, -Abreviatura)

Exploración y selección de variables

Correlaciones

co <- correlation((m %>% dplyr::select(p_anu_year:FR_ARBOL)))

co %>% summary() %>% 
  corrplot(size_point = .5, 
       show_values = TRUE,
       show_p = TRUE,
       show_legend = FALSE,
       size_text = 3.5) + 
  theme(axis.text = element_text(size = 8))

Evaluar variables de arbolado

  1. Creamos una variable llamada FR_ARBOL = FR_CONIF + FR_QUERC

  2. Analizar correlación entre variables

  • FR_CONIF y FR_QUERC (r = -0.1884633)
  • FR_CONIF y DS_ARBOL (r = -0.5334323)
  • FR_QUERC y DS_ARBOL (r = -0.366733)
  • FR_ARBOL y DS_ARBOL (r = -0.7147883)
  1. Evaluar relación entre variables:
theme_set(theme_bw())
(m %>% ggplot(aes(x=DS_ARBOL, y=FR_ARBOL)) + geom_point()) + 
(m %>% ggplot(aes(x=DS_ARBOL, y=FR_QUERC)) + geom_point()) +
(m %>% ggplot(aes(x=DS_ARBOL, y=FR_CONIF)) + geom_point())

Version Author Date
04d5a3b ajpelu 2022-01-28

En caso de querer dejar alguna variable de arbolado, dejaríamos la FR_ARBOL, pero esta variable está muy correlacionada con DS_ARBOL (\(r > |.7|\)), por lo tanto, nos quedamos con DS_ARBOL y descartamos FR_QUERC y FR_CONIF.

Evaluar variables de gradientes topográficos

  • Las variables TP_RSD_P y TP_SU_NO están muy correlacionadas (r = -0.9459845)
(m %>% ggplot(aes(x=TP_RSD_P, y=TP_SU_NO)) + geom_point())

Version Author Date
04d5a3b ajpelu 2022-01-28
  • Elegimos a TP_RSD_P, por ser una varible de mayor sentido biológico (cantidad de radiación que recibe)

  • Analizamos ahora TP_PEND. Esta variable aparece muy correlacionada con HIDRO_ITH (r = -0.6925104) y con TP_RSH_V (r = -0.6747214), aunque valores muy cercanos a <|.7|. Puede ser buena idea descartar TP_PEND.

Evaluar elevación

  • La elevación presenta alta correlación con la t_anual_year (r = -0.8056942).
  • Asímismo, la elevación presenta una correlación cercana al umbral (|.7|) con DS_ARBOL (r = 0.5965598) y con HIDRO_ITH (r = -0.6394274).
  • Sin embargo, la elevación a priori es una covariable que nos interesa mantener, por lo que descartamos t_anu_year.
theme_set(theme_bw())
(m %>% ggplot(aes(x=elev, y=t_anu_year)) + geom_point()) + 
(m %>% ggplot(aes(x=elev, y=DS_ARBOL)) + geom_point()) +
(m %>% ggplot(aes(x=elev, y=HIDRO_ITH)) + geom_point())

Version Author Date
04d5a3b ajpelu 2022-01-28

Evaluar VIF

myvars <- m %>% 
  dplyr::select(-transecto, -id, -Id_transect, -site, -year, 
                -riq, -div, -den, 
                -t_anu_year, -FR_QUERC, -FR_CONIF, 
                -TP_SU_NO) %>% names()
corvif(m[,myvars])


Variance inflation factors

                GVIF
elev        5.353464
p_anu_year  1.210317
p_ver_year  1.187311
TP_PEND    18.681484
FR_MATDE    2.360937
TP_RSH_V    5.410423
HIDRO_ITH   9.707477
TP_RSD_P    5.095531
TP_ES_OE    1.726852
TP_EXPO     4.495690
DS_ARBOL    8.592039
FR_ARBOL    4.955440

Efectivamente vemos altos valores de VIF para TP_PEND (la descartamos)

myvars <- m %>% 
  dplyr::select(-transecto, -id, -Id_transect, -site, -year, 
                -riq, -div, -den, 
                -t_anu_year, -FR_QUERC, -FR_CONIF, 
                -TP_SU_NO, -TP_PEND) %>% names()

corvif(m[,myvars])


Variance inflation factors

               GVIF
elev       5.291071
p_anu_year 1.181742
p_ver_year 1.179339
FR_MATDE   2.289574
TP_RSH_V   2.717009
HIDRO_ITH  3.024327
TP_RSD_P   2.345438
TP_ES_OE   1.639223
TP_EXPO    4.495657
DS_ARBOL   7.108273
FR_ARBOL   3.415775

La siguente candidata a eliminar es DS_ARBOL, aunque dependerá del umbral que seleccionemos. Algunos autores hablan de VIF < 3, otros VIF < 5 y otros de VIF < 10. No obstante, antes vemos la posible relación con FR_ARBOL (r = -0.7147883), que es una variable derivada (combinada de FR_CONIF y FR_QUERC). Proponemos descartar FR_ARBOL.

myvars <- m %>% 
  dplyr::select(-transecto, -id, -Id_transect, -site, -year, 
                -riq, -div, -den, 
                -elev, -FR_QUERC, -FR_CONIF, 
                -TP_SU_NO, -TP_PEND, -FR_ARBOL, -TP_ES_OE) %>% names()

corvif(m[,myvars])


Variance inflation factors

               GVIF
p_anu_year 1.172004
p_ver_year 1.190846
t_anu_year 2.180990
FR_MATDE   1.606927
TP_RSH_V   2.572037
HIDRO_ITH  2.284596
TP_RSD_P   2.062400
TP_EXPO    3.793956
DS_ARBOL   2.437238

Por tanto tenemos seleccionadas las siguientes variables:

m %>% 
  dplyr::select(-transecto, -id, -Id_transect, -site, -year, 
                -riq, -div, -den, 
                -t_anu_year, -FR_QUERC, -FR_CONIF, 
                -TP_SU_NO, -TP_PEND, -FR_ARBOL, -TP_ES_OE) %>% names()
[1] "elev"       "p_anu_year" "p_ver_year" "FR_MATDE"   "TP_RSH_V"  
[6] "HIDRO_ITH"  "TP_RSD_P"   "TP_EXPO"    "DS_ARBOL"  
nobs_var <- nrow(m)/
  ncol(m %>% 
 dplyr::select(-transecto, -id, -Id_transect, -site, -year, 
                -riq, -div, -den, 
                -t_anu_year, -FR_QUERC, -FR_CONIF, 
                -TP_SU_NO, -TP_PEND, -FR_ARBOL, -TP_ES_OE))

Algunos hablan de tener entre 15 - 25 veces el numero de observaciones por cada covariable. Actualmente tenemos 17.3333333

Modelo de Densidad

  • Explorar densidad frente a todas las variables
theme_set(theme_bw())
m %>% 
  pivot_longer(p_anu_year:FR_ARBOL) %>% 
  ggplot(aes(x=value, y=den)) +
  geom_point() + geom_smooth() + 
  facet_wrap(~name, scales = "free_x")

Version Author Date
04d5a3b ajpelu 2022-01-28
# Define formula 
fden <- as.formula(
  paste("den", 
      paste(
        names(
          m %>% dplyr::select(-transecto, -id, -Id_transect, -site, -year, 
                -riq, -div, -den, 
                -t_anu_year, -FR_QUERC, -FR_CONIF, 
                -TP_SU_NO, -TP_PEND, -FR_ARBOL, -TP_ES_OE)), collapse = "+"),
      sep = "~")
  )
fden
den ~ elev + p_anu_year + p_ver_year + FR_MATDE + TP_RSH_V + 
    HIDRO_ITH + TP_RSD_P + TP_EXPO + DS_ARBOL

Aproximación modelo GLM

  • Probamos con varias familias, y optamos por Gamma
# Probamos varias familias, entre ellas Tweedie 
library(statmod)
library(tweedie)
profile1 <- tweedie.profile(den ~ 1 + FR_MATDE + HIDRO_ITH + DS_ARBOL, data = m, p.vec = seq(1.1, 3.0, 0.1), fit.glm = TRUE)
print(profile1$p.max)
# automatic model selection 
set.seed(1234)
# fam <- "poisson"
# fam <- "gaussian"
fam <- "gamma" 
select_fden <- glmulti(fden, data = m,
              level= 1, 
              chunk = 1, chunks = 4, 
              method = "ga", crit = "bic", 
              family = Gamma(link ="log"), 
              marginality = TRUE,
              confsetsize = 5, 
              plotty = FALSE, report = FALSE)
TASK: Genetic algorithm in the candidate set.
Initialization...
Algorithm started...
Improvements in best and average IC have bebingo en below the specified goals.
Algorithm is declared to have converged.
Completed.
fden1 <- glm(select_fden@formulas[[1]], 
             family = Gamma(link ="log"), data = m)
fden2 <- glm(select_fden@formulas[[2]], 
             family = Gamma(link ="log"), data = m)
fden3 <- glm(select_fden@formulas[[3]], 
             family = Gamma(link ="log"), data = m)
fden4 <- glm(select_fden@formulas[[4]], 
             family = Gamma(link ="log"), data = m)
fden5 <- glm(select_fden@formulas[[5]], 
             family = Gamma(link ="log"), data = m)
  • Generar tabla de top five modelos
top5_table_fden <- as.data.frame(model.sel(fden1, fden2, fden3, fden4, fden5, rank = BIC)) %>% 
    dplyr::select(-family) %>% 
    mutate(model = 
             c(fden1$formula, fden2$formula, fden3$formula, fden4$formula, fden5$formula)) %>% 
    relocate(model)

write.csv(as.matrix(top5_table_fden), file=here::here("data/mod_den_selectionBIC.csv"))

### Model validation

performance::check_model(fden1) 
Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
"none")` instead.

Version Author Date
04d5a3b ajpelu 2022-01-28
  • GOF
performance(fden1) %>% 
  kbl() %>% 
  kable_styling()
AIC BIC R2_Nagelkerke RMSE Sigma
511.9234 527.1727 0.727684 1.642396 0.4541641

Modelo seleccionado

select_fden@formulas[[1]]
den ~ 1 + FR_MATDE + HIDRO_ITH + DS_ARBOL
<environment: 0x7fb04c1d6198>
modelo_densidad <- glm(den ~ 1 + FR_MATDE + HIDRO_ITH + DS_ARBOL, 
                       family = Gamma(link ="log"), data = m)

Visualización

ytitle <- "Density" 

visreg(modelo_densidad, scale="response", partial=TRUE,
       ylab = ytitle, "FR_MATDE", xlab = "FR_MATDE")

Version Author Date
04d5a3b ajpelu 2022-01-28
visreg(modelo_densidad, scale="response", partial=TRUE,
       ylab = ytitle, "DS_ARBOL", xlab = "DS_ARBOL")

Version Author Date
04d5a3b ajpelu 2022-01-28
visreg(modelo_densidad, scale="response", partial=TRUE,
       "HIDRO_ITH", xlab = "HIDRO_ITH")

Version Author Date
04d5a3b ajpelu 2022-01-28

Parámetros

ms <- modelo_densidad
tc <- tab::tabglm(ms, columns = c("beta.se",  "test", "p"), decimals = 4) 
names(tc) <- c("Variable", "Estimate", "Zvalue", "pvalue") 
  
tablita_auxiliar <- data.frame(
    Variable = c("DegreeFreedom", "AIC", "BIC", "DevianceExplained"), 
    Estimate = as.character(c(df.residual(ms), round(AIC(ms),0), round(BIC(ms), 0), 
                              round( ((ms$null.deviance - ms$deviance) / ms$null.deviance),3))),
    Zvalue = "", pvalue = "")
  
tc <- bind_rows(tc, tablita_auxiliar)
write.csv(tc, here::here("data/mod_den_coefficients.csv"))
tc %>% kbl() %>% kable_styling()
Variable Estimate Zvalue pvalue
Intercept 0.6159 (0.2048) 3.0067 0.003
FR_MATDE 0.0243 (0.0032) 7.5854 <0.001
HIDRO_ITH 0.0802 (0.0256) 3.1338 0.002
DS_ARBOL -0.0003 (0.0000) -12.8372 <0.001
DegreeFreedom 152
AIC 512
BIC 527
DevianceExplained 0.664
extract_eq(modelo_densidad, wrap = TRUE, intercept = "beta", use_coefs = TRUE)

\[ \begin{aligned} \log ({ \widehat{E( \operatorname{den} )} }) &= 0.62 + 0.02(\operatorname{FR\_MATDE}) + 0.08(\operatorname{HIDRO\_ITH}) + 0(\operatorname{DS\_ARBOL}) \end{aligned} \]

report::report(modelo_densidad)
We fitted a general linear model (Gamma family with a log link) (estimated using ML) to predict den with FR_MATDE, HIDRO_ITH and DS_ARBOL (formula: den ~ 1 + FR_MATDE + HIDRO_ITH + DS_ARBOL). The model's explanatory power is substantial (Nagelkerke's R2 = 0.73). The model's intercept, corresponding to FR_MATDE = 0, HIDRO_ITH = 0 and DS_ARBOL = 0, is at 0.62 (95% CI [0.21, 1.01], t(152) = 3.01, p = 0.003). Within this model:

  - The effect of FR MATDE is statistically significant and positive (beta = 0.02, 95% CI [0.02, 0.03], t(152) = 7.59, p < .001; Std. beta = 0.31, 95% CI [0.23, 0.40])
  - The effect of HIDRO ITH is statistically significant and positive (beta = 0.08, 95% CI [0.03, 0.13], t(152) = 3.13, p = 0.002; Std. beta = 0.12, 95% CI [0.05, 0.20])
  - The effect of DS ARBOL is statistically significant and negative (beta = -3.14e-04, 95% CI [-3.62e-04, -2.64e-04], t(152) = -12.84, p < .001; Std. beta = -0.53, 95% CI [-0.61, -0.45])

Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95% Confidence Intervals (CIs) and p-values were computed using 

Modelo de Diversidad

  • Explorar diversidad frente a todas las variables
theme_set(theme_bw())
m %>% 
  pivot_longer(p_anu_year:FR_ARBOL) %>% 
  ggplot(aes(x=value, y=div)) +
  geom_point() + geom_smooth() + 
  facet_wrap(~name, scales = "free_x")
Warning: Removed 15 rows containing non-finite values (stat_smooth).
Warning: Removed 15 rows containing missing values (geom_point).

Version Author Date
2a10774 ajpelu 2022-01-30
# Define formula 
fdiv <- as.formula(
  paste("div", 
      paste(
        names(
          m %>% dplyr::select(-transecto, -id, -Id_transect, -site, -year, 
                -riq, -div, -den, 
                -t_anu_year, -FR_QUERC, -FR_CONIF, 
                -TP_SU_NO, -TP_PEND, -FR_ARBOL, 
                -TP_EXPO, -TP_ES_OE, -p_anu_year, -p_ver_year)), collapse = "+"),
      sep = "~")
  )
fdiv
div ~ elev + FR_MATDE + TP_RSH_V + HIDRO_ITH + TP_RSD_P + DS_ARBOL

Aproximación modelo GLM

  • Probamos con varias familias, y optamos por Gaussian
  • Ojo, he quitado TP_EXPO, p_anu_year, p_ver_year por valores altos de vif según performance::check_model
# automatic model selection 
set.seed(1234)
# fam <- "poisson"
fam <- "gaussian"
 
select_fdiv <- glmulti(fdiv, data = m,
              level= 1, 
              chunk = 1, chunks = 4, 
              method = "ga", crit = "bic", 
              family = fam, 
              marginality = TRUE,
              confsetsize = 5, 
              plotty = FALSE, report = FALSE)
TASK: Genetic algorithm in the candidate set.
Initialization...
Algorithm started...
Improvements in best and average IC have bebingo en below the specified goals.
Algorithm is declared to have converged.
Completed.
fdiv1 <- glm(select_fdiv@formulas[[1]], data = m)
fdiv2 <- glm(select_fdiv@formulas[[2]], data = m)
fdiv3 <- glm(select_fdiv@formulas[[3]], data = m)
fdiv4 <- glm(select_fdiv@formulas[[4]], data = m)
fdiv5 <- glm(select_fdiv@formulas[[5]], data = m)
  • Generar tabla de top five modelos
top5_table_fdiv <- as.data.frame(model.sel(fdiv1, fdiv2, fdiv3, fdiv4, fdiv5, rank = BIC)) %>% 
    dplyr::select(-family) %>% 
    mutate(model = 
             c(fdiv1$formula, fdiv2$formula, fdiv3$formula,
               fdiv4$formula, fdiv5$formula)) %>% 
    relocate(model)

write.csv(as.matrix(top5_table_fdiv), file=here::here("data/mod_div_selectionBIC.csv"))

### Model validation

  • Eligo modelo 2, no hay casi diferencias y tiene menos params.
performance::check_model(fdiv2) 
Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
"none")` instead.

Version Author Date
2a10774 ajpelu 2022-01-30
  • GOF
performance(fdiv2) %>% 
  kbl() %>% 
  kable_styling()
AIC BIC R2 RMSE Sigma
164.1773 179.3944 0.7052414 0.3978849 0.4031205

Modelo seleccionado

select_fdiv@formulas[[2]]
div ~ 1 + FR_MATDE + HIDRO_ITH + DS_ARBOL
<environment: 0x7fb068751ae8>
modelo_diversidad <- glm(div ~ 1 + FR_MATDE + HIDRO_ITH + DS_ARBOL, data = m)

Visualización

ytitle <- "Diversity" 

visreg(modelo_diversidad, scale="response", partial=TRUE, 
       ylab = ytitle, "FR_MATDE", xlab = "FR_MATDE")

Version Author Date
2a10774 ajpelu 2022-01-30
visreg(modelo_diversidad, scale="response", partial=TRUE,
       ylab = ytitle, "HIDRO_ITH", xlab = "HIDRO_ITH")

Version Author Date
2a10774 ajpelu 2022-01-30
#visreg(modelo_diversidad, scale="response", partial=TRUE,
#       ylab = ytitle, "TP_ES_OE", xlab = "TP_ES_OE")

visreg(modelo_diversidad, scale="response", partial=TRUE,
       ylab = ytitle, "DS_ARBOL", xlab = "DS_ARBOL")

Version Author Date
2a10774 ajpelu 2022-01-30

Parámetros

ms <- modelo_diversidad
tc <- tab::tabglm(ms, columns = c("beta.se",  "test", "p"), decimals = 4) 
names(tc) <- c("Variable", "Estimate", "Zvalue", "pvalue") 
  
tablita_auxiliar <- data.frame(
    Variable = c("DegreeFreedom", "AIC", "BIC", "DevianceExplained"), 
    Estimate = as.character(c(df.residual(ms), round(AIC(ms),0), round(BIC(ms), 0), 
                              round( ((ms$null.deviance - ms$deviance) / ms$null.deviance),3))),
    Zvalue = "", pvalue = "")
  
tc <- bind_rows(tc, tablita_auxiliar)
write.csv(tc, here::here("data/mod_div_coefficients.csv"))
tc %>% kbl() %>% kable_styling()
Variable Estimate Zvalue pvalue
Intercept 3.3504 (0.1784) 18.7825 <0.001
FR_MATDE 0.0200 (0.0028) 7.1733 <0.001
HIDRO_ITH -0.1323 (0.0223) -5.9345 <0.001
DS_ARBOL -0.0003 (0.0000) -12.3877 <0.001
DegreeFreedom 151
AIC 164
BIC 179
DevianceExplained 0.705
extract_eq(modelo_diversidad, wrap = TRUE, intercept = "beta", use_coefs = TRUE)

\[ \begin{aligned} \widehat{E( \operatorname{div} )} &= 3.35 + 0.02(\operatorname{FR\_MATDE}) - 0.13(\operatorname{HIDRO\_ITH}) + 0(\operatorname{DS\_ARBOL}) \end{aligned} \]

report::report(modelo_diversidad)
We fitted a linear model (estimated using ML) to predict div with FR_MATDE, HIDRO_ITH and DS_ARBOL (formula: div ~ 1 + FR_MATDE + HIDRO_ITH + DS_ARBOL). The model's explanatory power is substantial (R2 = 0.71). The model's intercept, corresponding to FR_MATDE = 0, HIDRO_ITH = 0 and DS_ARBOL = 0, is at 3.35 (95% CI [3.00, 3.70], t(151) = 18.78, p < .001). Within this model:

  - The effect of FR MATDE is statistically significant and positive (beta = 0.02, 95% CI [0.01, 0.03], t(151) = 7.17, p < .001; Std. beta = 0.35, 95% CI [0.25, 0.44])
  - The effect of HIDRO ITH is statistically significant and negative (beta = -0.13, 95% CI [-0.18, -0.09], t(151) = -5.93, p < .001; Std. beta = -0.27, 95% CI [-0.36, -0.18])
  - The effect of DS ARBOL is statistically significant and negative (beta = -2.67e-04, 95% CI [-3.10e-04, -2.25e-04], t(151) = -12.39, p < .001; Std. beta = -0.61, 95% CI [-0.70, -0.51])

Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95% Confidence Intervals (CIs) and p-values were computed using 

Modelo de Riqueza

  • Explorar riqueza frente a todas las variables
theme_set(theme_bw())
m %>% 
  pivot_longer(p_anu_year:FR_ARBOL) %>% 
  ggplot(aes(x=value, y=riq)) +
  geom_point() + geom_smooth() + 
  facet_wrap(~name, scales = "free_x")

Version Author Date
2a10774 ajpelu 2022-01-30
# Define formula 
friq <- as.formula(
  paste("riq", 
      paste(
        names(
          m %>% dplyr::select(-transecto, -id, -Id_transect, -site, -year, 
                -riq, -div, -den, 
                -t_anu_year, -FR_QUERC, -FR_CONIF, 
                -TP_SU_NO, -TP_ES_OE, -TP_PEND, -FR_ARBOL, 
                -p_anu_year, -p_ver_year)), collapse = "+"),
      sep = "~")
  )
friq
riq ~ elev + FR_MATDE + TP_RSH_V + HIDRO_ITH + TP_RSD_P + TP_EXPO + 
    DS_ARBOL

Aproximación modelo GLM

  • Probamos con varias familias, y optamos por Gamma
# automatic model selection 
set.seed(1234)
# fam <- "poisson"
fam <- "gaussian"
select_friq <- glmulti(friq, data = m,
              level= 1, 
              chunk = 1, chunks = 4, 
              method = "ga", crit = "bic", 
              family = fam, 
              marginality = TRUE,
              confsetsize = 5, 
              plotty = FALSE, report = FALSE)
TASK: Genetic algorithm in the candidate set.
Initialization...
Algorithm started...
Improvements in best and average IC have bebingo en below the specified goals.
Algorithm is declared to have converged.
Completed.
friq1 <- glm(select_friq@formulas[[1]], data = m)
friq2 <- glm(select_friq@formulas[[2]], data = m)
friq3 <- glm(select_friq@formulas[[3]], data = m)
friq4 <- glm(select_friq@formulas[[4]], data = m)
friq5 <- glm(select_friq@formulas[[5]], data = m)
  • Generar tabla de top five modelos
top5_table_friq <- as.data.frame(model.sel(friq1, friq2, friq3, friq4, friq5, rank = BIC)) %>% 
    dplyr::select(-family) %>% 
    mutate(model = 
             c(friq1$formula, friq3$formula, friq3$formula, friq4$formula, friq5$formula)) %>% 
    relocate(model)

write.csv(as.matrix(top5_table_friq), file=here::here("data/mod_riq_selectionBIC.csv"))

### Model validation

performance::check_model(friq1) 
Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
"none")` instead.

Version Author Date
2a10774 ajpelu 2022-01-30
04d5a3b ajpelu 2022-01-28
  • GOF
performance(friq1) %>% 
  kbl() %>% 
  kable_styling()
AIC BIC R2 RMSE Sigma
1037.431 1061.83 0.8319412 6.390918 6.539318

Modelo seleccionado

select_friq@formulas[[1]]
riq ~ 1 + elev + FR_MATDE + TP_RSH_V + HIDRO_ITH + TP_RSD_P + 
    TP_EXPO
<environment: 0x7fb062a0c708>
modelo_riqueza <- glm(riq ~ 1 + elev + FR_MATDE + TP_RSH_V + HIDRO_ITH + TP_RSD_P + TP_EXPO, data = m)

Visualización

ytitle <- "Richness" 

visreg(modelo_riqueza, scale="response", partial=TRUE,
       ylab = ytitle, "elev", xlab = "elev")

Version Author Date
2a10774 ajpelu 2022-01-30
visreg(modelo_riqueza, scale="response", partial=TRUE,
       ylab = ytitle, "FR_MATDE", xlab = "FR_MATDE")

Version Author Date
2a10774 ajpelu 2022-01-30
visreg(modelo_riqueza, scale="response", partial=TRUE,
       ylab = ytitle, "TP_RSH_V", xlab = "TP_RSH_V")

Version Author Date
2a10774 ajpelu 2022-01-30
visreg(modelo_riqueza, scale="response", partial=TRUE,
       ylab = ytitle, "HIDRO_ITH", xlab = "HIDRO_ITH")

Version Author Date
2a10774 ajpelu 2022-01-30
visreg(modelo_riqueza, scale="response", partial=TRUE,
       ylab = ytitle, "TP_RSD_P", xlab = "TP_RSD_P")

Version Author Date
2a10774 ajpelu 2022-01-30
# visreg(modelo_riqueza, scale="response", partial=TRUE,
#        ylab = ytitle, "TP_ES_OE", xlab = "TP_ES_OE")

visreg(modelo_riqueza, scale="response", partial=TRUE,
       ylab = ytitle, "TP_EXPO", xlab = "TP_EXPO")

Version Author Date
2a10774 ajpelu 2022-01-30

Parámetros

ms <- modelo_riqueza
tc <- tab::tabglm(ms, columns = c("beta.se",  "test", "p"), decimals = 4) 
names(tc) <- c("Variable", "Estimate", "Zvalue", "pvalue") 
  
tablita_auxiliar <- data.frame(
    Variable = c("DegreeFreedom", "AIC", "BIC", "DevianceExplained"), 
    Estimate = as.character(c(df.residual(ms), round(AIC(ms),0), round(BIC(ms), 0), 
                              round( ((ms$null.deviance - ms$deviance) / ms$null.deviance),3))),
    Zvalue = "", pvalue = "")
  
tc <- bind_rows(tc, tablita_auxiliar)
write.csv(tc, here::here("data/mod_riq_coefficients.csv"))
tc %>% kbl() %>% kable_styling()
Variable Estimate Zvalue pvalue
Intercept -48.5462 (16.8706) -2.8776 0.005
elev -0.0073 (0.0013) -5.5456 <0.001
FR_MATDE 0.9570 (0.0450) 21.2517 <0.001
TP_RSH_V 8.0258 (1.2103) 6.6311 <0.001
HIDRO_ITH -4.3652 (0.5413) -8.0645 <0.001
TP_RSD_P 0.0021 (0.0008) 2.6954 0.008
TP_EXPO -0.0884 (0.0174) -5.0847 <0.001
DegreeFreedom 149
AIC 1037
BIC 1062
DevianceExplained 0.832
extract_eq(modelo_riqueza, wrap = TRUE, intercept = "beta", use_coefs = TRUE)

\[ \begin{aligned} \widehat{E( \operatorname{riq} )} &= -48.55 - 0.01(\operatorname{elev}) + 0.96(\operatorname{FR\_MATDE}) + 8.03(\operatorname{TP\_RSH\_V})\ - \\ &\quad 4.37(\operatorname{HIDRO\_ITH}) + 0(\operatorname{TP\_RSD\_P}) - 0.09(\operatorname{TP\_EXPO}) \end{aligned} \]

report::report(modelo_riqueza)
We fitted a linear model (estimated using ML) to predict riq with elev, FR_MATDE, TP_RSH_V, HIDRO_ITH, TP_RSD_P and TP_EXPO (formula: riq ~ 1 + elev + FR_MATDE + TP_RSH_V + HIDRO_ITH + TP_RSD_P + TP_EXPO). The model's explanatory power is substantial (R2 = 0.83). The model's intercept, corresponding to elev = 0, FR_MATDE = 0, TP_RSH_V = 0, HIDRO_ITH = 0, TP_RSD_P = 0 and TP_EXPO = 0, is at -48.55 (95% CI [-81.61, -15.48], t(149) = -2.88, p = 0.004). Within this model:

  - The effect of elev is statistically significant and negative (beta = -7.28e-03, 95% CI [-9.86e-03, -4.71e-03], t(149) = -5.55, p < .001; Std. beta = -0.32, 95% CI [-0.43, -0.21])
  - The effect of FR MATDE is statistically significant and positive (beta = 0.96, 95% CI [0.87, 1.05], t(149) = 21.25, p < .001; Std. beta = 0.79, 95% CI [0.71, 0.86])
  - The effect of TP RSH V is statistically significant and positive (beta = 8.03, 95% CI [5.65, 10.40], t(149) = 6.63, p < .001; Std. beta = 0.31, 95% CI [0.22, 0.40])
  - The effect of HIDRO ITH is statistically significant and negative (beta = -4.37, 95% CI [-5.43, -3.30], t(149) = -8.06, p < .001; Std. beta = -0.41, 95% CI [-0.51, -0.31])
  - The effect of TP RSD P is statistically significant and positive (beta = 2.14e-03, 95% CI [5.85e-04, 3.70e-03], t(149) = 2.70, p = 0.007; Std. beta = 0.13, 95% CI [0.04, 0.23])
  - The effect of TP EXPO is statistically significant and negative (beta = -0.09, 95% CI [-0.12, -0.05], t(149) = -5.08, p < .001; Std. beta = -0.32, 95% CI [-0.45, -0.20])

Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95% Confidence Intervals (CIs) and p-values were computed using 

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] report_0.5.1       equatiomatic_0.2.0 MASS_7.3-53        tab_4.1.1         
 [5] knitr_1.31         visreg_2.7.0       kableExtra_1.3.1   tweedie_2.3.3     
 [9] performance_0.8.0  MuMIn_1.43.17      DHARMa_0.3.3.0     glmulti_1.0.8     
[13] leaps_3.1          rJava_0.9-13       vegan_2.5-7        lattice_0.20-41   
[17] permute_0.9-5      patchwork_1.1.1    correlation_0.8.0  here_1.0.1        
[21] janitor_2.1.0      readxl_1.3.1       forcats_0.5.1      stringr_1.4.0     
[25] dplyr_1.0.6        purrr_0.3.4        readr_1.4.0        tidyr_1.1.3       
[29] tibble_3.1.2       ggplot2_3.3.5      tidyverse_1.3.1    workflowr_1.7.0   

loaded via a namespace (and not attached):
  [1] TH.data_1.0-10     minqa_1.2.4        colorspace_2.0-2  
  [4] ggridges_0.5.3     ellipsis_0.3.2     rprojroot_2.0.2   
  [7] estimability_1.3   snakecase_0.11.0   parameters_0.17.0 
 [10] fs_1.5.0           rstudioapi_0.13    farver_2.1.0      
 [13] ggrepel_0.9.1      mvtnorm_1.1-1      fansi_0.4.2       
 [16] lubridate_1.7.10   xml2_1.3.2         codetools_0.2-18  
 [19] splines_4.0.2      robustbase_0.93-7  jsonlite_1.7.2    
 [22] nloptr_1.2.2.2     broom_0.7.9        cluster_2.1.0     
 [25] dbplyr_2.1.1       effectsize_0.6.0.1 compiler_4.0.2    
 [28] httr_1.4.2         emmeans_1.5.4      backports_1.2.1   
 [31] assertthat_0.2.1   Matrix_1.3-2       fastmap_1.1.0     
 [34] survey_4.0         cli_2.5.0          later_1.1.0.1     
 [37] htmltools_0.5.2    tools_4.0.2        coda_0.19-4       
 [40] gtable_0.3.0       glue_1.4.2         Rcpp_1.0.7        
 [43] cellranger_1.1.0   jquerylib_0.1.3    vctrs_0.3.8       
 [46] nlme_3.1-152       iterators_1.0.13   insight_0.17.0    
 [49] xfun_0.30          ps_1.5.0           lme4_1.1-27.1     
 [52] rvest_1.0.0        lifecycle_1.0.1    DEoptimR_1.0-8    
 [55] zoo_1.8-8          getPass_0.2-2      scales_1.1.1.9000 
 [58] hms_1.0.0          promises_1.2.0.1   sandwich_3.0-0    
 [61] parallel_4.0.2     qqplotr_0.0.5      yaml_2.2.1        
 [64] gridExtra_2.3      see_0.6.4          sass_0.4.1        
 [67] stringi_1.7.4      highr_0.8          bayestestR_0.11.5 
 [70] foreach_1.5.1      boot_1.3-26        rlang_0.4.12      
 [73] pkgconfig_2.0.3    evaluate_0.14      labeling_0.4.2    
 [76] processx_3.5.1     tidyselect_1.1.1   plyr_1.8.6        
 [79] magrittr_2.0.1     R6_2.5.1           generics_0.1.0    
 [82] multcomp_1.4-16    DBI_1.1.1          pillar_1.6.1      
 [85] haven_2.3.1        whisker_0.4        withr_2.4.1       
 [88] mgcv_1.8-33        survival_3.2-7     datawizard_0.4.0  
 [91] modelr_0.1.8       crayon_1.4.1       utf8_1.1.4        
 [94] rmarkdown_2.14     grid_4.0.2         callr_3.7.0       
 [97] git2r_0.28.0       reprex_2.0.0       digest_0.6.27     
[100] webshot_0.5.2      xtable_1.8-4       httpuv_1.5.5      
[103] stats4_4.0.2       munsell_0.5.0      viridisLite_0.4.0 
[106] bslib_0.3.1        mitools_2.4