9 Palaeobiology demo: disparity-through-time and within groups

This demo aims to give a quick overview of the dispRity package (v.1.3) for palaeobiology analyses of disparity, including disparity-through-time analyses.

This demo showcases a typical disparity-through-time analysis: we are going to test whether disparity changed through time in a subset of eutherian mammals over the last 100 million years, using a dataset from Beck and Lee (2014).

9.1 Before starting

9.1.1 The morphospace

In this example, we are going to use a subset of the data from Beck and Lee (2014). See the example data description for more details. Briefly, this dataset contains:

- an ordinated matrix of 50 mammal taxa based on discrete morphological characters (BeckLee_mat50; 50 rows and 48 ordination axes);
- another matrix with the same 50 mammals plus the estimated discrete character data of their ancestors, i.e. the 49 internal nodes of the tree (thus 50 + 49 rows, BeckLee_mat99);
- a data frame with the first and last occurrence dates (FAD and LAD) of some of the taxa (BeckLee_ages);
- a phylogenetic tree with the relationships among the 50 mammals (BeckLee_tree).

The ordinated matrix will represent our full morphospace, i.e. all the mammalian morphologies that ever existed through time (for this dataset).

##                    [,1]          [,2]        [,3]       [,4]      [,5]
## Cimolestes   -0.5319679  0.1117759259  0.09865194 -0.1933148 0.2035833
## Maelestes    -0.4087147  0.0139690317  0.26268300  0.2297096 0.1310953
## Batodon      -0.6923194  0.3308625215 -0.10175223 -0.1899656 0.1003108
## Bulaklestes  -0.6802291 -0.0134872777  0.11018009 -0.4103588 0.4326298
## Daulestes    -0.7386111  0.0009001369  0.12006449 -0.4978191 0.4741342
## Uchkudukodon -0.5105254 -0.2420633915  0.44170317 -0.1172972 0.3602273
## [1] 50 48
##             FAD  LAD
## Adapis     37.2 36.8
## Asioryctes 83.6 72.1
## Leptictis  33.9 33.3
## Miacis     49.0 46.7
## Mimotona   61.6 59.2
## Notharctus 50.2 47.0

You can get an even nicer-looking tree plot by using the strap package!

9.2 A disparity-through-time analysis

9.2.1 Splitting the morphospace through time

One of the crucial steps in a disparity-through-time analysis is to split the full morphospace into smaller time subsets. These subsets contain the morphologies present either at certain points in time (time-slicing) or during certain periods of time (time-binning). The full morphospace contains every morphology across the whole time span, so it is always at least as large as any of its time subsets.

The dispRity package provides a chrono.subsets function that splits the morphospace into time slices (method = "continuous") or into time bins (method = "discrete"). In this example, we are going to split the morphospace into five equal time bins, each 20 million years long, from 100 million years ago to the present. We also provide the function with a table of first and last occurrence dates for some fossils. This accounts for fossils that occur in more than one of our time bins.

## [1] 100  80  60  40  20   0
##  ---- dispRity object ---- 
## 5 discrete time subsets for 50 elements:
##     100 - 80, 80 - 60, 60 - 40, 40 - 20, 20 - 0.
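To show what discrete time-binning with first and last occurrence dates does, here is a minimal standalone Python sketch. It is not the dispRity R API, and the function names are hypothetical. It uses the six FAD/LAD pairs printed from BeckLee_ages in this chapter. A taxon is placed in every bin that its range overlaps. The exact boundary convention is an assumption of the sketch. The slicing helper ignores the tree, whereas dispRity's continuous method also uses the phylogeny.

```python
# Standalone sketch of time-binning and time-slicing; NOT the dispRity R API.
# Function names are hypothetical illustrations.


def make_bin_edges(start, end, width):
    """Return bin edges from `start` down to `end`, e.g. 100, 80, ..., 0."""
    return list(range(start, end - 1, -width))


def assign_to_bins(ranges, edges):
    """Map each bin label "old - young" to the taxa whose range overlaps it.

    `ranges` maps a taxon name to (FAD, LAD) in Mya, with FAD >= LAD.
    Assumed boundary convention: FAD strictly older than the bin's young
    edge and LAD no older than the bin's old edge.
    """
    bins = {}
    for old, young in zip(edges[:-1], edges[1:]):
        bins[f"{old} - {young}"] = [
            name for name, (fad, lad) in ranges.items()
            if fad > young and lad <= old
        ]
    return bins


def taxa_at(ranges, age):
    """Time-slicing counterpart: taxa whose range spans one point in time."""
    return [name for name, (fad, lad) in ranges.items() if lad <= age <= fad]


# FAD/LAD values taken from the BeckLee_ages output in this chapter.
ages = {
    "Adapis": (37.2, 36.8),
    "Asioryctes": (83.6, 72.1),
    "Leptictis": (33.9, 33.3),
    "Miacis": (49.0, 46.7),
    "Mimotona": (61.6, 59.2),
    "Notharctus": (50.2, 47.0),
}

edges = make_bin_edges(100, 0, 20)
bins = assign_to_bins(ages, edges)
for label, taxa in bins.items():
    print(label, taxa)
print(taxa_at(ages, 60))
```

Here Asioryctes (83.6 to 72.1 Mya) and Mimotona (61.6 to 59.2 Mya) each fall into two bins. This is why the bin sizes in the chrono.subsets output can add up to more than the number of taxa.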

The output object is a dispRity object (see the manual section on dispRity objects for more details). In brief, dispRity objects are lists of different elements (e.g. disparity results, morphospace time subsets, morphospace attributes). When printed, they display only a summary of the object, to avoid filling the R console with superfluous output.

## [1] "dispRity"
## List of 3
##  $ matrix : num [1:50, 1:48] -0.532 -0.409 -0.692 -0.68 -0.739 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:50] "Cimolestes" "Maelestes" "Batodon" "Bulaklestes" ...
##   .. ..$ : NULL
##  $ call   :List of 1
##   ..$ subsets: chr "discrete"
##  $ subsets:List of 5
##   ..$ 100 - 80:List of 1
##   .. ..$ elements: int [1:8, 1] 5 4 6 8 43 10 11 42
##   ..$ 80 - 60 :List of 1
##   .. ..$ elements: int [1:15, 1] 7 8 9 1 2 3 12 13 14 44 ...
##   ..$ 60 - 40 :List of 1
##   .. ..$ elements: int [1:13, 1] 41 49 24 25 26 27 28 21 22 19 ...
##   ..$ 40 - 20 :List of 1
##   .. ..$ elements: int [1:6, 1] 15 39 40 35 23 47
##   ..$ 20 - 0  :List of 1
##   .. ..$ elements: int [1:10, 1] 36 37 38 32 33 34 50 48 29 30
##  - attr(*, "class")= chr "dispRity"
## [1] "matrix"  "call"    "subsets"
##  ---- dispRity object ---- 
## 5 discrete time subsets for 50 elements:
##     100 - 80, 80 - 60, 60 - 40, 40 - 20, 20 - 0.

These objects gradually gain more information as we complete the following steps of the disparity-through-time analysis.

9.2.2 Bootstrapping the data

Once we have our different time subsets, we can bootstrap and rarefy them (i.e. pseudoreplicate the data). Bootstrapping makes the disparity estimate of each subset more robust to outliers. Rarefaction lets us compare subsets with the same number of taxa, which removes sampling biases (i.e. more taxa in one subset than in the others). The boot.matrix function bootstraps the dispRity object, and its rarefaction argument performs the rarefaction.

##  ---- dispRity object ---- 
## 5 discrete time subsets for 50 elements with 48 dimensions:
##     100 - 80, 80 - 60, 60 - 40, 40 - 20, 20 - 0.
## Data was bootstrapped 100 times (method:"full").
## [1] 6
##  ---- dispRity object ---- 
## 5 discrete time subsets for 50 elements with 48 dimensions:
##     100 - 80, 80 - 60, 60 - 40, 40 - 20, 20 - 0.
## Data was bootstrapped 100 times (method:"full") and rarefied to 6 elements.
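The pseudoreplication step can be illustrated with a minimal standalone Python sketch. It is not the boot.matrix implementation, and the function name and toy coordinates are hypothetical. Each replicate draws rows with replacement. A "full" bootstrap draws as many rows as the subset contains. A rarefied replicate draws a fixed number of rows, here the size of the smallest subset (6, from the n column of this chapter's summary tables).

```python
# Standalone sketch of bootstrapping and rarefying one time subset;
# NOT the dispRity implementation. Names and data are hypothetical.
import random


def pseudoreplicate(rows, n_boot, rarefaction=None, seed=None):
    """Return `n_boot` replicates of `rows`, each drawn with replacement.

    With rarefaction=None every replicate has len(rows) rows (a "full"
    bootstrap). Otherwise every replicate has `rarefaction` rows, so that
    subsets of different sizes are compared at the same sample size.
    """
    rng = random.Random(seed)
    size = len(rows) if rarefaction is None else rarefaction
    return [rng.choices(rows, k=size) for _ in range(n_boot)]


# Elements per time bin (the n column of the summary tables).
subset_sizes = {"100 - 80": 8, "80 - 60": 15, "60 - 40": 13,
                "40 - 20": 6, "20 - 0": 10}
rarefaction_level = min(subset_sizes.values())  # smallest subset

# Hypothetical 2-D coordinates standing in for a 15-taxon subset.
subset = [(float(i), float(i % 3)) for i in range(15)]
full = pseudoreplicate(subset, n_boot=100, seed=1)
rarefied = pseudoreplicate(subset, n_boot=100,
                           rarefaction=rarefaction_level, seed=1)
print(rarefaction_level, len(full[0]), len(rarefied[0]))
```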

9.2.3 Calculating disparity

We can now calculate the disparity within each time subset, along with confidence intervals generated by the pseudoreplication step above (bootstraps/rarefaction). Disparity can be calculated in many ways, and this package allows users to define their own disparity metrics. For more details, please refer to the dispRity metric section.

In this example, we are going to measure the spread of the data in each time subset with the dispRity function. Disparity is calculated as the sum of the variances of each dimension of the morphospace. In other words, disparity here is the multidimensional variance of each time subset (i.e. the spread of the taxa within the morphospace). Note that this metric comes with a caveat (not addressed here): it ignores covariances among the dimensions of the morphospace. We use it because it is a standard metric in disparity-through-time analyses (Wills, Briggs, and Fortey 1994).

##  ---- dispRity object ---- 
## 5 discrete time subsets for 50 elements with 48 dimensions:
##     100 - 80, 80 - 60, 60 - 40, 40 - 20, 20 - 0.
## Data was bootstrapped 100 times (method:"full").
## Disparity was calculated as: c(sum, variances).
##  ---- dispRity object ---- 
## 5 discrete time subsets for 50 elements with 48 dimensions:
##     100 - 80, 80 - 60, 60 - 40, 40 - 20, 20 - 0.
## Data was bootstrapped 100 times (method:"full") and rarefied to 6 elements.
## Disparity was calculated as: c(sum, variances).
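The metric printed as c(sum, variances) can be sketched in a few lines of standalone Python. This is not the dispRity R code. The sketch assumes the variances use the n - 1 denominator, as R's var does. Each column is one ordination axis and each row is one taxon. Applying the function to every bootstrap replicate of a subset gives the distribution that summary reports.

```python
# Standalone sketch of the "sum of variances" disparity metric;
# NOT the dispRity R code.
from statistics import variance


def sum_of_variances(matrix):
    """Sum of the per-column sample variances (n - 1 denominator).

    Rows are taxa and columns are ordination axes. Covariances between
    axes are ignored, which is the caveat mentioned in the text.
    """
    return sum(variance(column) for column in zip(*matrix))


# Hypothetical 3-taxon, 2-axis morphospace.
toy = [
    [0.0, 1.0],
    [2.0, 1.0],
    [4.0, 4.0],
]
print(sum_of_variances(toy))  # axis variances 4.0 + 3.0
```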

The dispRity function does not actually display the calculated disparity values but rather only the properties of the disparity object (size, subsets, metric, etc.). To display the actual calculated scores, we need to summarise the disparity object using the S3 method summary that is applied to a dispRity object (see ?summary.dispRity for more details).

As for any R package, you can refer to the help files for each individual function for more details.

##    subsets  n   obs bs.median  2.5%   25%   75% 97.5%
## 1 100 - 80  8 1.675     1.494 1.182 1.409 1.537 1.661
## 2  80 - 60 15 1.782     1.656 1.517 1.601 1.700 1.776
## 3  60 - 40 13 1.913     1.783 1.635 1.733 1.803 1.861
## 4  40 - 20  6 2.022     1.678 1.243 1.589 1.816 1.942
## 5   20 - 0 10 1.971     1.768 1.573 1.694 1.840 1.912
##    subsets  n   obs bs.median  2.5%   25%   75% 97.5%
## 1 100 - 80  8 1.675     1.501 1.253 1.442 1.565 1.614
## 2 100 - 80  6    NA     1.464 1.004 1.377 1.586 1.705
## 3  80 - 60 15 1.782     1.649 1.504 1.592 1.699 1.778
## 4  80 - 60  6    NA     1.673 1.394 1.570 1.792 1.929
## 5  60 - 40 13 1.913     1.790 1.634 1.729 1.818 1.858
## 6  60 - 40  6    NA     1.764 1.373 1.638 1.861 1.987
## 7  40 - 20  6 2.022     1.692 1.063 1.591 1.822 1.904
## 8   20 - 0 10 1.971     1.786 1.541 1.715 1.841 1.917
## 9   20 - 0  6    NA     1.808 1.361 1.679 1.880 2.011

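In these tables:

- n is the number of elements in each subset;
- obs is the disparity observed in the non-resampled data;
- bs.median is the median of the bootstrapped disparity values;
- the remaining columns are quantiles of the bootstrapped values, i.e. the 95% and 50% confidence intervals.

Rarefied rows have no observed value (NA). The 40 - 20 bin appears only once in the second table because it already contains 6 elements. The summary.dispRity function comes with many options on which values to calculate (central tendency and quantiles) and on how many digits to display. Refer to the function's manual for more details.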
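The bs.median and quantile columns can be reproduced from a vector of bootstrapped disparity values. Below is a minimal standalone Python sketch with a hypothetical function name (it is not summary.dispRity). It assumes the linear interpolation of R's default quantile type 7, which Python's statistics.quantiles uses with method="inclusive".

```python
# Standalone sketch of summarising bootstrapped disparity values;
# the function name is hypothetical, NOT summary.dispRity.
from statistics import median, quantiles


def summarise_bootstraps(values):
    """Median plus 2.5, 25, 75 and 97.5% quantiles of bootstrap values.

    quantiles(..., n=40) returns 39 cut points spaced every 2.5%;
    method="inclusive" interpolates like R's default quantile type 7.
    """
    cuts = quantiles(values, n=40, method="inclusive")
    return {
        "bs.median": median(values),
        "2.5%": cuts[0],    # 1/40
        "25%": cuts[9],     # 10/40
        "75%": cuts[29],    # 30/40
        "97.5%": cuts[38],  # 39/40
    }


# Hypothetical bootstrap distribution: the values 1, 2, ..., 41.
result = summarise_bootstraps(list(range(1, 42)))
print(result)
```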

9.2.4 Plotting the results

It is sometimes easier to visualise the results in a plot than in a table. For that, we can use the plot S3 method for dispRity objects (see ?plot.dispRity for more details).

9.3 Testing differences

Finally, to draw valid conclusions from these results, we can apply statistical tests. For example, we can test whether mammalian disparity changed significantly over the last 100 million years. To do so, we compare the bootstrapped disparity of the time bins sequentially: is the disparity in bin n equal to that in bin n+1, is that in turn equal to the disparity in bin n+2, and so on.

Our data are temporally autocorrelated (i.e. what happens in bin n+1 depends on what happened in bin n). They are also pseudoreplicated (i.e. the bootstrap draws are not independent because they are all resampled from the same time subsets). For these reasons, we use a non-parametric comparison of location, the Wilcoxon rank-sum test (wilcox.test), rather than a parametric comparison of means. We also need a p-value correction (e.g. the Bonferroni correction) to account for multiple testing (see ?p.adjust for more details).

## [[1]]
##                    statistic: W
## 100 - 80 : 80 - 60          687
## 80 - 60 : 60 - 40          1121
## 60 - 40 : 40 - 20          6566
## 40 - 20 : 20 - 0           3678
## 
## [[2]]
##                         p.value
## 100 - 80 : 80 - 60 2.330360e-25
## 80 - 60 : 60 - 40  1.049970e-20
## 60 - 40 : 40 - 20  5.226732e-04
## 40 - 20 : 20 - 0   4.968995e-03
## [[1]]
##                    statistic: W
## 100 - 80 : 80 - 60          857
## 80 - 60 : 60 - 40          1049
## 60 - 40 : 40 - 20          6328
## 40 - 20 : 20 - 0           3732
## 
## [[2]]
##                         p.value
## 100 - 80 : 80 - 60 1.769873e-23
## 80 - 60 : 60 - 40  1.916852e-21
## 60 - 40 : 40 - 20  4.720974e-03
## 40 - 20 : 20 - 0   7.819379e-03
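The logic of these sequential tests can be sketched in standalone Python. This is not the dispRity or R code, and all function names are hypothetical. The W statistic is the one R reports: the number of pairs where the first bin's value exceeds the second's, with ties counting 0.5. The p-value uses the normal approximation with continuity and tie corrections. R's wilcox.test uses this approximation when both samples have 50 or more values, as with the 100 bootstraps per bin here. For smaller untied samples R computes exact p-values instead, which this sketch does not do. Bonferroni multiplies each p-value by the number of comparisons, capped at 1.

```python
# Standalone sketch of sequential Wilcoxon rank-sum comparisons with a
# Bonferroni correction; NOT the dispRity/R code. Names are hypothetical.
from collections import Counter
from math import sqrt
from statistics import NormalDist


def rank_sum_w(x, y):
    """W as reported by R: pairs with x > y, ties counting 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in x for b in y)


def wilcox_p_normal(x, y):
    """Two-sided p-value from the normal approximation to W, with
    continuity and tie corrections (no exact small-sample p-values)."""
    nx, ny = len(x), len(y)
    n = nx + ny
    shift = rank_sum_w(x, y) - nx * ny / 2
    correction = 0.5 if shift > 0 else -0.5 if shift < 0 else 0.0
    ties = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    sigma = sqrt(nx * ny / 12 * ((n + 1) - ties / (n * (n - 1))))
    if sigma == 0:
        return float("nan")  # every value tied: the test is undefined
    z = (shift - correction) / sigma
    return 2 * NormalDist().cdf(-abs(z))


def sequential_pairs(labels):
    """Consecutive comparisons: (bin 1, bin 2), (bin 2, bin 3), ..."""
    return list(zip(labels[:-1], labels[1:]))


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


labels = ["100 - 80", "80 - 60", "60 - 40", "40 - 20", "20 - 0"]
print(sequential_pairs(labels))
print(rank_sum_w([4, 5, 6], [1, 2, 3]), wilcox_p_normal([4, 5, 6], [1, 2, 3]))
print(bonferroni([0.01, 0.02, 0.5]))
```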

Here our results show significant changes in disparity between all consecutive time bins (all p-values < 0.05). In this run, the rarefied results (second output) lead to the same conclusion. However, the p-value for the comparison between the 60-40 and 40-20 Mya bins is roughly an order of magnitude larger after rarefaction (4.7e-03 vs 5.2e-04). This suggests that part of the difference detected between these two bins might be due to the different number of taxa sampled in each (13 vs 6). Because bootstrapping and rarefaction are random, the exact values will vary between runs.

References

Beck, Robin M, and Michael S Lee. 2014. “Ancient Dates or Accelerated Rates? Morphological Clocks and the Antiquity of Placental Mammals.” Proceedings of the Royal Society B: Biological Sciences 281 (20141278): 1–10. https://doi.org/10.1098/rspb.2014.1278.

Wills, Matthew A., Derek E. G. Briggs, and Richard A. Fortey. 1994. "Disparity as an Evolutionary Index: A Comparison of Cambrian and Recent Arthropods." Paleobiology 20 (2). Paleontological Society: 93–130. https://doi.org/10.2307/2401014.