Static Predictors - Project and Folder description

0. Meta information:

Project title: Static Predictors
Author: Friederike Johanna Rosa Wölke, MSc
Date: 2025-05-28
Location: Prague, Czech Republic
License: CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/)
R Package versions: registered in file renv.lock
Computational demands:
- Estimated total run time: 50 h
  locally on a laptop without parallelization;
  can be reduced to approx. 6 h by running the predictor scripts in parallel or enabling multiple cores
Manual to the folder: Folder_metadata.xlsx
- has a list of A) scripts, input and output files, figure locations, run times, etc. and B) Files and their sources

How to use this folder:

install the renv and here packages if they are not already installed
open Git.Rproj in RStudio or VS Code and make sure the working directory, as returned by here::here(), is the root folder (“Git”)
use renv::restore() to restore packages and dependencies from the lockfile (this triggers a large package download)
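
The setup steps above can be sketched in a few lines of R. This is a minimal sketch, not a script shipped with the project: `missing_packages()` and `setup_project()` are hypothetical helper names introduced here for illustration, and only `renv::restore()`, `here::here()` and `install.packages()` are existing functions.

```r
# Return the subset of `pkgs` that is not installed in the current library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper (not part of the project): install the two bootstrap
# packages if needed, report the project root, then restore the package
# library recorded in renv.lock.
setup_project <- function() {
  to_install <- missing_packages(c("renv", "here"))
  if (length(to_install) > 0) install.packages(to_install)
  message("Project root: ", here::here())
  renv::restore()
}
```

Calling `setup_project()` from the opened Git.Rproj session would run all three steps; the function is only defined here, not executed, because the restore step downloads many packages.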

1. Project/File structure:

fs::dir_tree(here::here(), recurse = FALSE)

C:/Users/wolke/OneDrive - CZU v Praze/Frieda_PhD_files/02_StaticPatterns/Git
├── Code
├── Data
├── Demo_NewYork
├── Documentation
├── Figures
├── Folder_metadata.xlsx
├── Git.Rproj
├── images
├── Project_Description.html
├── Project_Description.qmd
├── Project_Description.rmarkdown
├── README.html
├── README.md
├── README.qmd
├── README_files
└── StaticPatterns_Results_all.xlsx

There are three sections in this project. The first part (A) produces the predictors and data needed for modelling: it retrieves data from the database, cleans and filters it, and then computes the predictors. The second part (B) uses the predictors to train a random forest model and evaluates it using xAI (explainable AI), interaction effects and variable importance. In the last part (C), the model predictions are checked against the latest replication of the empirical data.

Additionally, the project contains several sensitivity analyses and robustness checks, which are not part of the main analysis but were used to aid interpretation of the results and determine patterns of stochasticity in the data.

Script nomenclature:

Each R script is labelled by part (A,B,C) and script sequence (1-14).
The 00_Configuration.R script is needed for almost all other scripts. It ensures that packages are installed and defines the file paths, global variables and lookup tables needed for many steps.

2. Methods summary

A) Workflow diagram

#install.packages("DiagrammeR")
library(DiagrammeR)

grViz("
digraph workflow {
  
  graph [rankdir = LR]  // left to right
  node [shape = box, style = filled, fillcolor = LightBlue, fontname = Helvetica]

  Step1 [label = 'A Get Data', fillcolor = '#f1a340']
  Step2 [label = 'A Clean Data', fillcolor = '#f1a340']
  Step3 [label = 'A Prepare Predictors', fillcolor = '#f1a340']
  Step4 [label = 'B Random Forest' , fillcolor = '#7fbf7b']
  Step5 [label = 'B Performance Evaluation', fillcolor = '#7fbf7b']
  Step6 [label = 'B xAI', fillcolor = '#7fbf7b']
  Step7 [label = 'C Validation against Atlas 3', fillcolor = '#998ec3']
  Step8 [label = 'B Phylogenetic Autocorrelation', fillcolor = '#7fbf7b']
  Step9 [label = 'C Jaccard simulations', fillcolor = '#998ec3']
  Step10 [label = 'D Make Change Maps', fillcolor = '#e66101']
  Step11 [label = 'Figure 1', fillcolor = '#e66101']
  Step12 [label = 'Figure 2', fillcolor = '#e66101']
  Step13 [label = 'Figure 3', fillcolor = '#e66101']
  Step14 [label = 'Figure 4', fillcolor = '#e66101']

  Step1 -> Step2 -> Step3 -> Step4 -> Step5;
  Step3 -> Step9;
  Step4 -> Step6; Step4 -> Step7; Step4 -> Step8;
  Step2 -> Step10;
  Step3 -> Step11; Step3 -> Step12;
  Step5 -> Step13; Step6 -> Step13;
  Step7 -> Step14;
  Step10 -> Step11;
}
")

B) Description of steps

1. Get data from the MOBI database for the first two replications (Cz, Ny, Jp, Eu)
2. Remove cells and species that were not sampled twice; filter species based on expert knowledge and introduced status
3. Prepare predictors for H1 and H2; use datasetID as H3 to determine the effect of atlas in the ‘full model’
   H1: Body mass, Habitat_5, Threatened_01, Generalism_01, Phylodistinct, Migration_123, Global range size
   H2: Fractal dimension, Lacunarity, Spatial autocorrelation, circularity, AOO, minimum distance to the border from the centroid
   H3: datasetID
4. Calculate responses:
   a) Jaccard_dissimilarity,
   b) log Ratio AOO,
   c) log Ratio AOO per year
5. Make simulations of Jaccard_dissimilarity based on different combinations of parameters and evaluate the effect of these on the Jaccard values. Certain combinations of parameters restrict Jaccard_dissimilarity to a range of values. This can be used to determine the effect of mathematical constraints on the Jaccard_dissimilarity values.
   Parameters:
   a) initially occupied cells,
   b) total number of cells possible,
   c) number of changes
6. Train model with
   a) ‘all data’ and
   b) subsets for each datasetID (‘split data’) using random forest
7. Extract for all three responses:
   a) rsq, rmse,
   b) hyper-parameters,
   c) predictions,
   d) variable importance,
   e) interactions,
   f) partial dependence plots
8. Test for phylogenetic autocorrelation for each datasetID in the model residuals (and the raw data)
9. Calculate responses from the third atlas replication (Cz, Jp), use predictors calculated from the second period to predict responses for the third period, and get residuals
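
The Jaccard simulation step can be illustrated with a small base-R sketch. It is not the project's simulation script: the function names, the grid size of 100 cells and the choice to flip randomly selected cells are assumptions made here to show how the three parameters (initially occupied cells, total cells, number of changes) bound the attainable dissimilarity.

```r
# Jaccard dissimilarity between two binary occupancy vectors (0/1).
jaccard_dissim <- function(occ1, occ2) {
  union_n <- sum(occ1 | occ2)
  if (union_n == 0) return(NA_real_)  # undefined when nothing is occupied
  1 - sum(occ1 & occ2) / union_n
}

# Simulate one species: start with `n_occupied` of `n_total` cells occupied,
# then flip the occupancy status of `n_changes` randomly chosen cells.
simulate_jaccard <- function(n_total, n_occupied, n_changes) {
  stopifnot(n_occupied <= n_total, n_changes <= n_total)
  occ1 <- rep(c(1L, 0L), times = c(n_occupied, n_total - n_occupied))
  occ2 <- occ1
  flipped <- sample.int(n_total, n_changes)
  occ2[flipped] <- 1L - occ2[flipped]
  jaccard_dissim(occ1, occ2)
}

set.seed(42)
# Same number of changes (10), different initial occupancy
sims_rare   <- replicate(500, simulate_jaccard(100, 5, 10))
sims_common <- replicate(500, simulate_jaccard(100, 80, 10))

range(sims_rare)    # bounded between 2/3 and 1
range(sims_common)  # bounded between 1/9 and 1/8
```

In this toy setup, ten changes force a species occupying 5 of 100 cells into the interval [2/3, 1], while a species occupying 80 cells cannot leave [1/9, 1/8]. The bounds follow directly from the set sizes, independent of which cells change.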

Modeling settings:

80/20 split (80% training, 20% testing)
3x repeated 10-fold cross-validation
permutation importance (not impurity)
always split variables: datasetID
respect unordered factors = TRUE
Bayesian hyperparameter tuning:
- mtry = 2-10
- min_n = 5-15
- trees = 1000-5000
- initial values = 5
- iterations = 50
- no improve = 10
- set a seed for reproducibility
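
The resampling scheme in these settings (80/20 split, then 3x repeated 10-fold cross-validation on the training part) can be sketched in base R. This only illustrates the index bookkeeping; it is not the project's modelling code and does not cover the Bayesian tuning or the random forest itself. The row count `n` and the helper `make_folds()` are placeholders introduced here.

```r
set.seed(123)  # fixed seed so the split and the folds are reproducible

n <- 1054  # placeholder row count; replace with nrow() of the model data

# 80/20 split: sample 80% of the row indices for training
train_idx <- sample.int(n, size = floor(0.8 * n))
test_idx  <- setdiff(seq_len(n), train_idx)

# One round of v-fold CV: assign every training row to exactly one fold
make_folds <- function(idx, v = 10) {
  fold_id <- sample(rep_len(seq_len(v), length(idx)))
  split(idx, fold_id)
}

# 3x repeated 10-fold CV: three independent fold assignments
cv_repeats <- replicate(3, make_folds(train_idx), simplify = FALSE)

# Assessment and analysis rows for repeat 1, fold 1
assess_rows   <- cv_repeats[[1]][[1]]
analysis_rows <- setdiff(train_idx, assess_rows)
```

The test rows stay untouched until the final evaluation; hyperparameters are compared only on the 30 analysis/assessment pairs derived from the training rows.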

C) Data profiling

Responses:

Jaccard_dissimilarity
log_R2_1 (log ratio between sampling period 2 and 1)
log_R2_1_per_year (log ratio between sampling period 2 and 1 divided by the number of years between sampling)
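
A worked toy example may help to read the three responses. The occupancy vectors, the cell area and the number of years below are invented for illustration; only the response names and their definitions come from this document.

```r
# Toy presence/absence data for one species on a 10-cell grid (hypothetical)
occ_period1 <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
occ_period2 <- c(1, 1, 0, 1, 1, 0, 0, 0, 0, 0)
cell_area_km2 <- 100  # assumed equal-area cells
years_between <- 15   # assumed years between the end of period 1 and period 2

# Jaccard dissimilarity: 1 - shared cells / cells occupied in either period
shared <- sum(occ_period1 == 1 & occ_period2 == 1)
either <- sum(occ_period1 == 1 | occ_period2 == 1)
Jaccard_dissim <- 1 - shared / either

# AOO: summed area of occupied cells in each period
aoo1 <- sum(occ_period1) * cell_area_km2
aoo2 <- sum(occ_period2) * cell_area_km2

# Natural log ratio of AOO (period 2 over period 1), raw and per year
log_R2_1 <- log(aoo2 / aoo1)
log_R2_1_per_year <- log_R2_1 / years_between
```

Here three of the five ever-occupied cells change status, so Jaccard_dissim is 0.6, while AOO grows only from 300 to 400 km², giving log_R2_1 = log(4/3). The example shows why the two responses carry different information: turnover can be high even when the net change in AOO is modest.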

Notes:

The higher Jaccard_dissim, the more variable log_R2_1 and log_R2_1_per_year
The smaller AOO, the more variable log_R2_1 and log_R2_1_per_year
The lower D (fractal dimension, D_AOO_a), the more variable (and more positive?) log_R2_1
The higher mean_lnLac, the more variable (and more positive?) log_R2_1
The higher mean_lnLac, the lower Jaccard_dissim
Species in New York and Japan have higher Jaccard_dissim than species in Czechia and Europe

Table of Variables

Variable	Type	Hypothesis	Explanation	Reference
Jaccard_dissimilarity	response	-	Proportion of sites that have changed occupancy status between sampling periods. Symmetrical, hence does not indicate direction. Magnitude depends on the smallest number of sites occupied across sampling periods and the total number of sites possible to occupy. Few occupied sites in a large matrix will produce large dissimilarity.
log_R2_1	response	-	(natural) log change in AOO between sampling periods. Magnitude depends on the number of sites occupied in period 1 (few sites will produce large log ratios)
log_R2_1_per_year	response	-	(natural) log change in AOO between sampling periods, adjusted for the number of years between the end of period 1 and the end of period 2.	Blowes et al. 2024
Body mass	predictor	1	Bird body mass	Avonet (Tobias 2021)
Habitat_5	predictor	1	Preferred bird habitat in 5 categories (open, closed, marine, freshwater, human modified)	Avonet (Tobias 2021)
Threatened	predictor	1	Binary IUCN status (Threatened: 0, 1)	IUCN (retrieved: 25 March 2025)
Generalism	predictor	1	Modification of the “primary.lifestyle” column from AVONET, where the category ‘Generalist’ was assigned a 1 and all other categories a 0	Avonet (Tobias 2021)
Phylogenetic distinctiveness	predictor	1	How isolated the lineage is in the tree. Larger values of pd indicate phylogenetic isolation and independent evolution. Species with ‘old genes’ like this may be better or worse adapted to changing environments.	BirdTree (Jetz, 2012)
Global range size	predictor	1	Total occupied area calculated from the native breeding and resident ranges for extant species / locations.	BirdLife v9 (Oct, 2024)
Migration	predictor	1	Sedentary (1), Partially migratory (2), migratory (3)	Avonet (Tobias 2021)
mean ln lacunarity	predictor	2	log ‘gappiness’ of the species distribution, averaged over 6 windows of increasing size. Lacunarity is estimated by moving a window across the species distribution for windows of different sizes. Lacunarity is then the sum of all occupied patches within this window. It is averaged across all window positions and then across window sizes to get a ‘scale invariant’ version of lacunarity.
fractal dimension	predictor	2	Computed using the area of each cell (as opposed to cell side length) using the formula: D = -2 * (slope of the OAR) + 2. OAR is the regression between AOO and scale.	Wilson 2004, Kunin 1998
AOO	predictor	2	Sum of areas of all occupied cells	IUCN recommendations
spatial autocorrelation	predictor	2	Join count statistics count the average number of joins for each cell and compare this number to an expected value based on the number of occupied sites. We use the difference between the expected and modelled value for JC (hereafter: joincount delta)	Joincount for binary data (0,1)
normalized circularity	predictor	2	aka ‘compactness ratio’. Dimensionless. Formula: (perimeter^2) / (4 * pi * area). A perfect circle = 1. Values > 1 indicate deviation from the circle (deviations increase the perimeter relative to the area, thus increasing the values of circNorm).	Gabriel
minimum distance to the border from range centroid	predictor	2	The smallest distance from the centroid of the species distribution to the border of the atlas region. Ranges that have a centroid in the middle of the atlas region have a higher stochastic potential to fill the whole plane, while those situated closer to the borders are (artificially) limited by these borders within the analysis. This could be a useful predictor for the magnitude of change that may happen, as ranges close to the border may be ‘cut out’ of the study area and thus show smaller change (e.g., because the change is happening in the periphery of the range, which may be outside of the study area)	-
datasetID	predictor	3	Indicator of the dataset (Czechia, Japan, New York, Europe)	-
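
Two of the geometry predictors in the table are defined by explicit formulas, which can be checked on simple shapes. The sketch below is an illustration under stated assumptions, not the project's predictor code: it assumes the OAR regression is fitted on log-log axes (log AOO against log cell area), and the function names, grain sizes and test shapes are made up for this example.

```r
# Fractal dimension from the OAR slope: D = -2 * slope + 2.
# Assumption: the regression is log(AOO) ~ log(cell area) across grain sizes.
fractal_dimension <- function(aoo, cell_area) {
  slope <- unname(coef(lm(log(aoo) ~ log(cell_area)))[2])
  -2 * slope + 2
}

cell_area <- c(100, 400, 1600, 6400)  # hypothetical grain sizes (km^2)

# A range that fills the plane: AOO is constant across grains, so D = 2
d_filled <- fractal_dimension(aoo = rep(64000, 4), cell_area = cell_area)

# A single isolated occurrence: AOO equals one cell at every grain, so D = 0
d_point <- fractal_dimension(aoo = cell_area, cell_area = cell_area)

# Normalized circularity (compactness ratio): perimeter^2 / (4 * pi * area)
circ_norm <- function(perimeter, area) perimeter^2 / (4 * pi * area)

r <- 10
circ_circle <- circ_norm(perimeter = 2 * pi * r, area = pi * r^2)  # equals 1
circ_square <- circ_norm(perimeter = 4 * 10, area = 10^2)          # equals 4 / pi
```

The two limiting cases reproduce the observed range of D_AOO_a (0 to 2) in the data overview, and the square shows how any departure from a circle pushes circNorm above 1.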

3. Data overview

skimr::skim(dat_reduced %>% group_by(samplingPeriodID))

Data summary
Name	dat_reduced %>% group_by(…
Number of rows	2108
Number of columns	18
_______________________
Column type frequency:
factor	5
numeric	12
________________________
Group variables	samplingPeriodID

Variable type: factor

skim_variable	samplingPeriodID	n_missing	complete_rate	ordered	n_unique	top_counts
Migration	1	9	0.99	FALSE	3	3: 563, 2: 247, 1: 235, NA: 0
Migration	2	9	0.99	FALSE	3	3: 563, 2: 247, 1: 235, NA: 0
Habitat_5	1	9	0.99	FALSE	5	clo: 393, fre: 265, ope: 243, mar: 95
Habitat_5	2	9	0.99	FALSE	5	clo: 393, fre: 265, ope: 243, mar: 95
Generalism	1	0	1.00	FALSE	2	0: 918, 1: 136
Generalism	2	0	1.00	FALSE	2	0: 918, 1: 136
Threatened	1	2	1.00	FALSE	2	0: 902, 1: 150
Threatened	2	2	1.00	FALSE	2	0: 902, 1: 150
datasetID	1	0	1.00	FALSE	4	26: 412, 6: 233, 13: 208, 5: 201
datasetID	2	0	1.00	FALSE	4	26: 412, 6: 233, 13: 208, 5: 201

Variable type: numeric

skim_variable	samplingPeriodID	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
pd	1	9	0.99	8.20	6.31	1.11	4.36	6.19	9.71	56.96	▇▁▁▁▁
pd	2	9	0.99	8.20	6.31	1.11	4.36	6.19	9.71	56.96	▇▁▁▁▁
D_AOO_a	1	0	1.00	1.30	0.51	0.00	0.95	1.40	1.73	2.00	▂▂▅▆▇
D_AOO_a	2	0	1.00	1.34	0.49	0.00	1.00	1.43	1.75	2.00	▁▂▅▆▇
mean_lnLac	1	0	1.00	1.40	1.10	0.06	0.55	1.13	2.10	5.73	▇▅▂▁▁
mean_lnLac	2	0	1.00	1.31	1.06	0.06	0.45	1.03	1.97	5.73	▇▃▂▁▁
joincount_delta	1	4	1.00	0.94	0.72	-0.03	0.35	0.80	1.39	2.98	▇▆▃▂▁
joincount_delta	2	5	1.00	0.92	0.72	-0.07	0.35	0.74	1.41	3.17	▇▆▃▂▁
Jaccard_dissim	1	0	1.00	0.50	0.29	0.00	0.26	0.50	0.74	1.00	▇▇▇▇▇
Jaccard_dissim	2	0	1.00	0.50	0.29	0.00	0.26	0.50	0.74	1.00	▇▇▇▇▇
log_R2_1	1	0	1.00	0.17	0.67	-3.17	-0.04	0.05	0.26	7.96	▁▇▁▁▁
log_R2_1	2	0	1.00	0.17	0.67	-3.17	-0.04	0.05	0.26	7.96	▁▇▁▁▁
log_R2_1_per_year	1	0	1.00	0.01	0.02	-0.07	0.00	0.00	0.01	0.28	▃▇▁▁▁
log_R2_1_per_year	2	0	1.00	0.01	0.02	-0.07	0.00	0.00	0.01	0.28	▃▇▁▁▁
log_mass	1	9	0.99	4.60	1.78	1.13	2.99	4.35	6.11	9.28	▅▇▆▅▁
log_mass	2	9	0.99	4.60	1.78	1.13	2.99	4.35	6.11	9.28	▅▇▆▅▁
log_aoo	1	0	1.00	11.05	2.82	-0.37	9.52	11.19	13.06	15.54	▁▁▃▇▆
log_aoo	2	0	1.00	11.21	2.72	1.60	9.69	11.23	13.26	15.57	▁▂▃▇▅
log_circ	1	0	1.00	3.65	1.26	0.24	2.86	3.79	4.50	6.60	▂▃▇▇▂
log_circ	2	0	1.00	3.69	1.22	0.24	2.97	3.82	4.47	6.65	▁▃▇▇▂
log_rangesize	1	3	1.00	15.85	1.47	6.38	15.40	16.13	16.75	18.59	▁▁▁▇▇
log_rangesize	2	3	1.00	15.85	1.47	6.38	15.40	16.13	16.75	18.59	▁▁▁▇▇
log_dist	1	0	1.00	10.78	1.00	6.29	10.22	10.89	11.35	12.99	▁▁▃▇▂
log_dist	2	0	1.00	10.76	1.00	5.77	10.17	10.84	11.35	13.03	▁▁▃▇▂

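Univariate distributions

plots

[Plot output not reproduced: one bar chart (categorical variables) or histogram (numeric variables) per column, in the order Migration, Habitat_5, Generalism, Threatened, pd, D_AOO_a, mean_lnLac, joincount_delta, datasetID, Jaccard_dissim, log_R2_1, log_R2_1_per_year, samplingPeriodID, log_mass, log_aoo, log_circ, log_rangesize, log_dist]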

4. List of Figures

Figure 1 - Panels:

a) Change maps for one example species from each datasetID

b) Histograms of Jaccard_dissimilarity, log Ratio AOO, log Ratio AOO per year

Figure 2

Relationship between Jaccard_dissimilarity and log Ratio AOO (incl. marginal plots)

Figure 3 - Panels:

Model performance plots: rmse/rsq

variable importance

interactions

partial dependence plots

Figure 4

Prediction validation against third atlas replication (Cz, Jp) for panels:

Jaccard_dissimilarity

log Ratio AOO

Table 1: Model results

response	datasetID	test rmse	test rsq	oob rsq	oob mse	mtry	min_n	trees
Jaccard_dissim	all	0.11	0.85	0.86	0.01	6	5	4998
Jaccard_dissim	5	0.10	0.90	0.89	0.01	5	5	1420
Jaccard_dissim	6	0.11	0.85	0.85	0.01	9	5	1804
Jaccard_dissim	13	0.15	0.66	0.70	0.02	6	5	3005
Jaccard_dissim	26	0.12	0.72	0.72	0.01	6	5	4976
log_R2_1	all	0.62	0.09	0.35	0.31	10	5	1074
log_R2_1	5	0.30	0.07	0.10	0.19	2	15	4967
log_R2_1	6	0.72	0.02	0.38	0.38	5	13	2968
log_R2_1	13	0.72	0.28	0.51	0.51	10	5	1118
log_R2_1	26	0.45	0.22	0.06	0.20	2	6	1008
log_R2_1_per_year	all	0.02	0.14	0.34	0.00	7	5	2148
log_R2_1_per_year	5	0.02	0.07	0.09	0.00	2	15	4688
log_R2_1_per_year	6	0.03	0.02	0.15	0.00	4	15	1173
log_R2_1_per_year	13	0.02	0.29	0.58	0.00	10	5	1945
log_R2_1_per_year	26	0.01	0.22	0.06	0.00	2	6	4921

--- title: "Static Predictors - Project and Folder description" format: html: self-contained: true embed-resources: true toc: true # optional: adds a table of contents theme: cosmo # optional: Bootstrap theme code-fold: true # optional: collapsible code blocks code-tools: true # optional: adds copy/paste buttons toc-depth: 2 editor: markdown: wrap: 72 --- # 0. Meta information: - **Project title**: Static Predictors - **Author**: Friederike Johanna Rosa Wölke, MSc - **Date:** 2025-05-28 - **Location**: Prague, Czech Republic - **License:** CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/) - **R Package versions**: registered in file `renv.lock` - **Computational demands:** - *Estimated total run time*: 50 h\ locally on laptop without parallelization;\ can be significantly enhanced by running predictor scripts in parallel or enable multiple cores to approx. 6 h - **Manual to the folder:** `Folder_metadata.xlsx` - has a list of A) scripts, input and output files, figure locations, run times, etc. and B) Files and their sources ------------------------------------------------------------------------ ### How to use this folder: - install `renv` and `here` packages if not already installed - open the Git.Rproj in RStudio or VScode and set `here::here()` as working directory to the root folder ("Git") - use `renv::restore()` to restore packages & dependencies from the lockfile (this will lead in a huge downloading session of packages) ------------------------------------------------------------------------ # 1. Project/File structure: ```{r} fs::dir_tree(here::here(), recurse = F) ``` There are three sections in this project: The first part (A) produces the predictors and data needed for modelling. It starts by grabbing data from the database, cleaning it, filtering it, ad then producing the predictors. 
The second part (B) uses the predictors to train a randomForest model and evaluate it using xAI (explainable AI), interaction effects and variable importance. In the last part (C), the model predictions are checked against the latest replication of the empirical data. Additionally, the project contains several sensitivity analyses and robustness checks, which are not part of the main analysis but were used to aid interpretation of the results and determine patterns of stochasticity in the data. ### Script nomenclature: Each R script is labelled by part (A,B,C) and script sequence (1-14).\ The 00_Configuration.R script is needed for almost all other scripts. It ensures that packages are installed and has file paths and global variables and lookup tables needed for many steps. # 2. Methods summary ## A) Workflow diagram ```{r} #install.packages("DiagrammeR") library(DiagrammeR) grViz(" digraph workflow { graph [rankdir = LR] // left to right node [shape = box, style = filled, fillcolor = LightBlue, fontname = Helvetica] Step1 [label = 'A Get Data', fillcolor = '#f1a340'] Step2 [label = 'A Clean Data', fillcolor = '#f1a340'] Step3 [label = 'A Prepare Predictors', fillcolor = '#f1a340'] Step4 [label = 'B Random Forest' , fillcolor = '#7fbf7b'] Step5 [label = 'B Performance Evaluation', fillcolor = '#7fbf7b'] Step6 [label = 'B xAI', fillcolor = '#7fbf7b'] Step7 [label = 'C Validation against Atlas 3', fillcolor = '#998ec3'] Step8 [label = 'B Phylogenetic Autocorrelation', fillcolor = '#7fbf7b'] Step9 [label = 'C Jaccard simulations', fillcolor = '#998ec3'] Step10 [label = 'D Make Change Maps', fillcolor = '#e66101'] Step11 [label = 'Figure 1', fillcolor = '#e66101'] Step12 [label = 'Figure 2', fillcolor = '#e66101'] Step13 [label = 'Figure 3', fillcolor = '#e66101'] Step14 [label = 'Figure 4', fillcolor = '#e66101'] Step1 -> Step2 -> Step3 -> Step4 -> Step5; Step3 -> Step9; Step4 -> Step6; Step4 -> Step7; Step4 -> Step8; Step2 -> Step10; Step3 -> Step11; Step3 -> 
Step12; Step5 -> Step13; Step6 -> Step13; Step7 -> Step14; Step10 -> Step11; } ") ``` ## B) Description of steps 1. Get data from MOBI database for first two replications (Cz, Ny, Jp, Eu) 2. Remove cells and species that were not sampled twice; filter species based on expert knowledge and introduced status 3. Prepare predictors for H1 and H2, use datasetID as H3 to determine effect of atlas in the 'full model'\ [H1:]{.underline} Body mass, Habitat_5, Threatened_01, Generalism_01, Phylodistinct, Migration_123, Global range size\ [H2:]{.underline} Fractal dimension, Lacunarity, Spatial autocorrelation, circularity, AOO, minimum distance to the border from the centroid\ [H3:]{.underline} datasetID 4. Calculate responses:\ a) Jaccard_dissimilarity,\ b) log Ratio AOO,\ c) log Ratio AOO per year 5. Make simulations of Jaccard_dissimilarity based on different combinations of parameters and evaluate the effect of these on the Jaccard values. Certain combinations of parameters restrict Jaccard_dissimilarity to a range of values. This can be used to determine the effect of mathematical constraints on the Jaccard_dissimilarity values.\ [Parameters:]{.underline}\ a) initially occupied cells,\ b) total number of cells possible,\ c) number of changes 6. Train model with\ a) 'all data' and\ b) subsets for each datasetID ('split data') using random forest 7. Extract for all three responses:\ a) rsq, rmse,\ b) hyper-parameters,\ c) predictions,\ d) variable importance,\ e) interactions\ f) partial dependence plots 8. Test for phylogenetic autocorrelation for each datasetID in the model residuals (and the raw data) 9. 
Calculate responses from third atlas replication (Cz, Jp), use predictors calculated from second period to predict responses for the third period and get residuals #### Modeling settings: - 80/20 split (80 training, 20 testing) - 3x repeated 10-fold cross validation - permutation importance (not impurity) - always split variables : datasetID - respect unordered factors = T - Bayesian hyperparameter tuning: - mtry = 2-10 - min_n = 5-15 - trees = 1000-5000 - initial values = 5 - iterations = 50 - no improve = 10 - set a seed. ## C) Data profiling ### Responses: - Jaccard_dissimilarity - log_R2_1 (log ratio between sampling period 2 and 1) - log_R2_1_per_year (log ratio between sampling period 2 and 1 divided by the number of years between sampling) #### Notes: - The higher J_dissim, the more variable log_R2_1 and log_R2_1_per_year - The smaller AOO, the more variable log_R2_1 and log_R2_1_per_year - The lower D, the more variable (and more positive?) log_R2_1 - The higher mean_lnLac, the more variable (and more positive?) log_R2_1 - The higher mean_lnLac, the lower Jaccard_dissim - Species in New York and Japan are more dissimilar than species in Czechia and Europe ## Table of Variables +-----------+-----------+-----------+-----------+-----------+ | Va riable | Type | Hy po th | E x | Ref | | | | es is | planation | erence | +===========+===========+===========+===========+===========+ | Jac | resp onse | \- | P | | | card_d | | | roportion | | | issimi | | | of sites | | | larity | | | that have | | | | | | changed | | | | | | status in | | | | | | occupancy | | | | | | between | | | | | | sampling | | | | | | period. S | | | | | | y m | | | | | | metrical, | | | | | | hence | | | | | | does not | | | | | | indicate | | | | | | d | | | | | | irection. 
| | | | | | Magnitude | | | | | | depends | | | | | | on the | | | | | | smallest | | | | | | number of | | | | | | sites | | | | | | occupied | | | | | | across | | | | | | sampling | | | | | | periods | | | | | | and the | | | | | | total | | | | | | number of | | | | | | sites | | | | | | possible | | | | | | to | | | | | | occupy. | | | | | | Few | | | | | | occupied | | | | | | sites in | | | | | | a large | | | | | | matrix | | | | | | will | | | | | | produce | | | | | | large dis | | | | | | s i | | | | | | milarity. | | +-----------+-----------+-----------+-----------+-----------+ | lo g_R2_1 | resp onse | \- | (natural) | | | | | | log | | | | | | change in | | | | | | AOO | | | | | | between | | | | | | sampling | | | | | | periods. | | | | | | Magnitude | | | | | | depends | | | | | | on number | | | | | | of sites | | | | | | occupied | | | | | | in period | | | | | | 1 (few | | | | | | sites | | | | | | will | | | | | | produce | | | | | | large log | | | | | | ratios) | | +-----------+-----------+-----------+-----------+-----------+ | log_R | resp onse | \- | (natural) | Blowes et | | 2_1_pe | | | log | al. 2024 | | r_year | | | change in | | | | | | AOO | | | | | | between | | | | | | sampling | | | | | | periods - | | | | | | adjusted | | | | | | for the | | | | | | number of | | | | | | years | | | | | | between | | | | | | the end | | | | | | of period | | | | | | 1 and the | | | | | | end of | | | | | | period 2. 
| | +-----------+-----------+-----------+-----------+-----------+ | Body mass | p redi | 1 | Bird body | Avonet ( | | | ctor | | mass | Tobias | | | | | | 2021) | +-----------+-----------+-----------+-----------+-----------+ | Hab | p redi | 1 | Preferred | Avonet ( | | itat_5 | ctor | | bird | Tobias | | | | | habitat | 2021) | | | | | in 5 c | | | | | | ategories | | | | | | (open, | | | | | | closed, | | | | | | marine, f | | | | | | r | | | | | | eshwater, | | | | | | human | | | | | | modified) | | +-----------+-----------+-----------+-----------+-----------+ | Thre | p redi | 1 | Binary | IUCN | | atened | ctor | | IUCN | (retr | | | | | status ( | ieved: 25 | | | | | T h | .March | | | | | reatened: | .2025) | | | | | 0, 1) | | +-----------+-----------+-----------+-----------+-----------+ | Gene | p redi | 1 | M o d | Avonet ( | | ralism | ctor | | ification | Tobias | | | | | of | 2021) | | | | | "primary | | | | | | . l | | | | | | ifestyle" | | | | | | column | | | | | | from | | | | | | AVONET | | | | | | where | | | | | | category | | | | | | ' G e | | | | | | neralist' | | | | | | got | | | | | | assigned | | | | | | a 1 and | | | | | | all other | | | | | | c | | | | | | ategories | | | | | | got a 0 | | +-----------+-----------+-----------+-----------+-----------+ | Phylog | p redi | 1 | How | Bi rdTree | | enetic | ctor | | isolated | (Jetz, | | dis | | | the | 2012) | | tincti | | | lineage | | | veness | | | is in the | | | | | | tree. | | | | | | Larger | | | | | | values of | | | | | | pd | | | | | | indicate | | | | | | p h y | | | | | | logenetic | | | | | | isolation | | | | | | and i n | | | | | | dependent | | | | | | e | | | | | | volution. | | | | | | Species | | | | | | with 'old | | | | | | genes' | | | | | | like this | | | | | | may be | | | | | | better or | | | | | | worse | | | | | | adapted | | | | | | to | | | | | | changing | | | | | | en v i | | | | | | ronments. 
| | +-----------+-----------+-----------+-----------+-----------+ | Global | p redi | 1 | Total | Bi rdLife | | range | ctor | | occupied | v9 (Oct, | | size | | | area c | 2024) | | | | | alculated | | | | | | from the | | | | | | native | | | | | | breeding | | | | | | and | | | | | | resident | | | | | | ranges | | | | | | for | | | | | | extant | | | | | | species / | | | | | | l | | | | | | ocations. | | +-----------+-----------+-----------+-----------+-----------+ | Mig | p redi | 1 | Sedentary | Avonet ( | | ration | ctor | | (1), | Tobias | | | | | Partially | 2021) | | | | | migratory | | | | | | (2), | | | | | | migratory | | | | | | (3) | | +-----------+-----------+-----------+-----------+-----------+ | mean ln | | 2 | log ' G a | | | lacu | | | ppieness' | | | narity | | | of the | | | | | | species | | | | | | di s t | | | | | | ribution, | | | | | | averaged | | | | | | over 6 | | | | | | windows | | | | | | with i | | | | | | ncreasing | | | | | | size. L | | | | | | acunarity | | | | | | estimates | | | | | | include | | | | | | creating | | | | | | a moving | | | | | | window | | | | | | across | | | | | | the | | | | | | species d | | | | | | i s | | | | | | tribution | | | | | | for | | | | | | windows | | | | | | of | | | | | | different | | | | | | sizes. L | | | | | | acunarity | | | | | | is then | | | | | | the sum | | | | | | of all | | | | | | occupied | | | | | | patches | | | | | | within | | | | | | this | | | | | | window. | | | | | | It is | | | | | | averaged | | | | | | across | | | | | | all | | | | | | movement | | | | | | instances | | | | | | and then | | | | | | across | | | | | | window | | | | | | sizes to | | | | | | get a | | | | | | 'scale i | | | | | | nvariant' | | | | | | version | | | | | | of l | | | | | | acunarity | | | | | | across | | | | | | different | | | | | | window | | | | | | sizes. 
| | +-----------+-----------+-----------+-----------+-----------+ | f ractal | | 2 | Computed | Wilson | | dim | | | using the | 2004,\ | | ension | | | area of | Kunin | | | | | each cell | 1998 | | | | | (as | | | | | | opposed | | | | | | to cell | | | | | | side | | | | | | length) | | | | | | using the | | | | | | formula:\ | | | | | | D = -2 \* | | | | | | slope | | | | | | from | | | | | | OAR + 2.\ | | | | | | OAR is | | | | | | the r | | | | | | egression | | | | | | between | | | | | | AOO and | | | | | | scale. | | +-----------+-----------+-----------+-----------+-----------+ | AOO | | 2 | Sum of | IUCN rec | | | | | areas of | ommend | | | | | all | ations | | | | | occupied | | | | | | cells | | +-----------+-----------+-----------+-----------+-----------+ | s patial | | 2 | Joincount | Joi | | aut | | | s | ncount | | ocorre | | | tatistics | for | | lation | | | literally | binary | | | | | counts | data | | | | | the | (0,1) | | | | | average | | | | | | number of | | | | | | joins for | | | | | | each cell | | | | | | and | | | | | | compares | | | | | | this | | | | | | number to | | | | | | an | | | | | | expected | | | | | | value | | | | | | based on | | | | | | occupied | | | | | | number of | | | | | | sites. We | | | | | | use the d | | | | | | ifference | | | | | | between | | | | | | the | | | | | | expected | | | | | | and | | | | | | modelled | | | | | | value for | | | | | | JC ( h | | | | | | ereafter: | | | | | | joincount | | | | | | delta) | | +-----------+-----------+-----------+-----------+-----------+ | norm | | 2 | aka ' c o | G abriel | | alized | | | mpactness | | | circu | | | ratio'. | | | larity | | | Dim e n | | | | | | sionless. | | | | | | | | | | | | Formula: | | | | | | (pe r i | | | | | | meter\^2) | | | | | | / (4 \* | | | | | | pi \* | | | | | | area).\ | | | | | | A perfect | | | | | | circle | | | | | | = 1. 
| | | | | | Values \> | | | | | | 1 | | | | | | indicate | | | | | | deviation | | | | | | from the | | | | | | circle | | | | | | (as d | | | | | | eviations | | | | | | increase | | | | | | the | | | | | | perimeter | | | | | | in ratio | | | | | | to the | | | | | | area, | | | | | | thus i | | | | | | ncreasing | | | | | | the | | | | | | values of | | | | | | circNorm. | | +-----------+-----------+-----------+-----------+-----------+ | m inimum | | 2 | Measured | \- | | di stance | | | the | | | to the | | | smallest | | | border | | | distance | | | from | | | from the | | | range ce | | | centroid | | | ntroid | | | of the | | | | | | species d | | | | | | i s | | | | | | tribution | | | | | | to the | | | | | | border of | | | | | | the atlas | | | | | | region. | | | | | | Ranges | | | | | | that have | | | | | | a | | | | | | centroid | | | | | | in the | | | | | | middle of | | | | | | the atlas | | | | | | region | | | | | | have a | | | | | | higher s | | | | | | tochastic | | | | | | potential | | | | | | to fill | | | | | | the whole | | | | | | plane, | | | | | | while | | | | | | those | | | | | | situated | | | | | | closer to | | | | | | the | | | | | | borders | | | | | | are (a r | | | | | | t | | | | | | ifically) | | | | | | limited | | | | | | by these | | | | | | borders | | | | | | within | | | | | | the | | | | | | analysis. 
| | | | | | This | | | | | | could be | | | | | | a useful | | | | | | predictor | | | | | | for the | | | | | | magnitude | | | | | | of change | | | | | | that may | | | | | | happen, | | | | | | as ranges | | | | | | close to | | | | | | the | | | | | | border | | | | | | may be | | | | | | 'cut out' | | | | | | of the | | | | | | study | | | | | | area and | | | | | | thus show | | | | | | smaller | | | | | | change | | | | | | (e.g., | | | | | | because | | | | | | the | | | | | | change is | | | | | | happening | | | | | | in the | | | | | | periphery | | | | | | of the | | | | | | range | | | | | | which may | | | | | | be | | | | | | outside | | | | | | of the | | | | | | study | | | | | | area) | | +-----------+-----------+-----------+-----------+-----------+ | dat | | 3 | Indicator | \- | | asetID | | | which | | | | | | dataset | | | | | | (Czechia, | | | | | | Japan, | | | | | | New York, | | | | | | Europe) | | +-----------+-----------+-----------+-----------+-----------+ # 3. 
Data overview ```{r} #| echo: false #| include: true #| warning: false #| error: false #| message: false suppressPackageStartupMessages(library(dplyr)) suppressPackageStartupMessages(library(ggplot2)) dat <- readRDS(here::here("Data/output/1_all_predictors_merged.rds")) H1 <- c("Mass", "GlobRangeSize_km2", "Migration", "Habitat_5", "Generalism", "Threatened", "pd") H2 <- c("D_AOO_a", "mean_lnLac", "AOO", "joincount_delta", "circNorm", "minDist_toBorder_centr") H3 <- c("datasetID") predictors <- c(H1, H2, H3) responses <- c("Jaccard_dissim", "log_R2_1", "log_R2_1_per_year") dat_reduced <- dat %>% select(all_of(c(predictors, responses, "samplingPeriodID"))) %>% mutate(log_mass = log(Mass), log_aoo = log(AOO), log_circ = log(circNorm), log_rangesize = log(GlobRangeSize_km2), log_dist = log(minDist_toBorder_centr)) %>% select(-Mass, -AOO, -circNorm, -GlobRangeSize_km2, -minDist_toBorder_centr) plots <- list() # Store plots for (col_i in seq_along(names(dat_reduced))) { col_name <- names(dat_reduced)[col_i] is_cat <- !is.numeric(dat_reduced[[col_name]]) # Detect categorical # Categorical variable if (is_cat) { p <- ggplot(dat_reduced) + geom_bar(aes(x = .data[[col_name]], fill = samplingPeriodID), position = "dodge", alpha = 0.7) + xlab(col_name) + ggthemes::theme_few() + scale_fill_manual(values = c("#998ec3", "#f1a340")) + theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Numeric variable (two-group vs plain histogram) } else if (col_name %in% c(H2, "log_circ", "log_aoo", "log_dist")) { p <- ggplot(dat_reduced) + geom_histogram( aes(x = .data[[col_name]], fill = samplingPeriodID), alpha = 0.5, position = "identity"#,binwidth = 0.2 ) + xlab(col_name) + ggthemes::theme_few() + scale_fill_manual(values = c("#998ec3", "#f1a340")) } else { p <- ggplot(dat_reduced) + geom_histogram( aes(x = .data[[col_name]]), fill = "#7fbf7b", alpha = 0.5#, binwidth = 0.3 ) + xlab(col_name) + ggthemes::theme_few() } plots[[col_name]] <- p } ``` ::: panel-tabset ## Data summary 
```{r}
#| warning: false
#| error: false
#| message: false

skimr::skim(dat_reduced %>% group_by(samplingPeriodID))
```

## Univariate distributions

```{r, fig.width=6, fig.height=3}
#| warning: false
#| error: false
#| message: false

plots
```
:::

# 4. List of Figures

## Figure 1 - Panels:

a) Change maps for one example species from each datasetID

![](Figures/A_data/Fig1_Maps.svg){width="516"}

b) Histograms of Jaccard_dissimilarity, log Ratio AOO, log Ratio AOO per year

![](Figures/A_data/D_02_J_Histogram.svg){width="257"}![](Figures/A_data/D_02_LogRatio_Histogram.svg){width="256"}

## Figure 2

Relationship between Jaccard_dissimilarity and log Ratio AOO (incl. marginal plots)

![](Figures/A_data/D_03_Figure_2.bmp){width="519"}

## Figure 3 - Panels:

### Model performance plots: rmse/rsq

![](Figures/B_models/performance/Model_performance_all_data.svg){width="270"}![](Figures/B_models/performance/Model_performance_split_data.svg){width="270"}

### variable importance

![](Figures/B_models/performance/Variable_importance_all_data.svg)

![](Figures/B_models/performance/Figure_3_vip_J_split.svg)

![](Figures/B_models/performance/Figure_3_vip_lnR_split.svg)

![](Figures/B_models/performance/Figure_3_vip_lnRy_split.svg)

### interactions

![](Figures/B_models/performance/interactions_full_data.svg)

![](Figures/B_models/performance/interactions_split_data.svg)

### partial dependence plots

![](Figures/B_models/performance/B_04_Figure3c_Partial_Jaccard.svg)

![](Figures/B_models/performance/B_04_Figure3c_Partial_LogRatio.svg)

## Figure 4

Prediction validation against third atlas replication (Cz, Jp) for panels:

### Jaccard_dissimilarity

![](Figures/C_validation/C_01_Validation_Jaccard.svg){width="320"}

### log Ratio AOO

![](Figures/C_validation/C_01_Validation_logRatio.svg){width="320"}

## Table 1: Model results

+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| response          | datasetID | test rmse | test rsq | oob rsq | oob mse | mtry | min_n | trees |
+===================+===========+===========+==========+=========+=========+======+=======+=======+
| Jaccard_dissim    | all       | 0.11      | 0.85     | 0.86    | 0.01    | 6    | 5     | 4998  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| Jaccard_dissim    | 5         | 0.10      | 0.90     | 0.89    | 0.01    | 5    | 5     | 1420  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| Jaccard_dissim    | 6         | 0.11      | 0.85     | 0.85    | 0.01    | 9    | 5     | 1804  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| Jaccard_dissim    | 13        | 0.15      | 0.66     | 0.70    | 0.02    | 6    | 5     | 3005  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| Jaccard_dissim    | 26        | 0.12      | 0.72     | 0.72    | 0.01    | 6    | 5     | 4976  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| log_R2_1          | all       | 0.62      | 0.09     | 0.35    | 0.31    | 10   | 5     | 1074  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| log_R2_1          | 5         | 0.30      | 0.07     | 0.10    | 0.19    | 2    | 15    | 4967  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| log_R2_1          | 6         | 0.72      | 0.02     | 0.38    | 0.38    | 5    | 13    | 2968  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| log_R2_1          | 13        | 0.72      | 0.28     | 0.51    | 0.51    | 10   | 5     | 1118  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| log_R2_1          | 26        | 0.45      | 0.22     | 0.06    | 0.20    | 2    | 6     | 1008  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| log_R2_1_per_year | all       | 0.02      | 0.14     | 0.34    | 0.00    | 7    | 5     | 2148  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| log_R2_1_per_year | 5         | 0.02      | 0.07     | 0.09    | 0.00    | 2    | 15    | 4688  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| log_R2_1_per_year | 6         | 0.03      | 0.02     | 0.15    | 0.00    | 4    | 15    | 1173  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| log_R2_1_per_year | 13        | 0.02      | 0.29     | 0.58    | 0.00    | 10   | 5     | 1945  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
| log_R2_1_per_year | 26        | 0.01      | 0.22     | 0.06    | 0.00    | 2    | 6     | 4921  |
+-------------------+-----------+-----------+----------+---------+---------+------+-------+-------+
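
As a reading aid for the metric columns in Table 1, here is a minimal base-R sketch of how an RMSE and an R² can be computed from observed and predicted values. The helper names (`rmse`, `rsq_cor`, `rsq_trad`) and the toy vectors are hypothetical and not taken from the project scripts. Whether the reported `rsq` values use the squared-correlation form (the definition used by `yardstick::rsq()`) or the traditional `1 - SSE/SST` form is an assumption that should be checked against the modelling code.

```r
# Hypothetical helpers for illustration; not taken from the project scripts.
rmse <- function(truth, estimate) {
  sqrt(mean((truth - estimate)^2))
}

# R-squared as squared Pearson correlation (the yardstick::rsq() definition).
rsq_cor <- function(truth, estimate) {
  cor(truth, estimate)^2
}

# Traditional R-squared: 1 - SSE / SST (can be negative for poor models).
rsq_trad <- function(truth, estimate) {
  1 - sum((truth - estimate)^2) / sum((truth - mean(truth))^2)
}

# Made-up observed and predicted values, only to exercise the functions.
truth    <- c(0.10, 0.40, 0.35, 0.80, 0.60)
estimate <- c(0.15, 0.35, 0.40, 0.70, 0.65)

metrics <- c(
  rmse     = rmse(truth, estimate),
  rsq      = rsq_cor(truth, estimate),
  rsq_trad = rsq_trad(truth, estimate)
)
print(round(metrics, 3))
```

Going by the column names, `oob mse` is a mean squared error while `test rmse` is a root mean squared error, so the square root of `oob mse` is the quantity to compare against `test rmse`.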