7 The guts of the dispRity package

7.1 Manipulating dispRity objects

Disparity analysis involves manipulating many matrices (especially when bootstrapping), which can be impractical to visualise and will quickly overwhelm your R console. Even the simple Beck and Lee 2014 example above produces an object with more than 72 lines of lists of lists of matrices!

Therefore dispRity uses a specific class of object called a dispRity object. These objects allow users to use S3 method functions such as summary.dispRity, plot.dispRity and print.dispRity. dispRity also contains various utility functions that manipulate the dispRity object (e.g. sort.dispRity, extract.dispRity; see the full list in the next section). These functions modify the dispRity object without the user having to delve into its complex structure! The full structure of a dispRity object is detailed in section 7.3 (The dispRity object content).

The class of such an object, the names of its elements and its printed summary look like this:

## [1] "dispRity"
## [1] "matrix"    "call"      "subsets"   "disparity"
##  ---- dispRity object ---- 
## 7 continuous (acctran) time subsets for 99 elements with 97 dimensions:
##      90, 80, 70, 60, 50 ...
## Data was bootstrapped 100 times (method:"full") and rarefied to 20, 15, 10, 5 elements.
## Disparity was calculated as: c(median, centroids).

Note that it is always possible to display the full object using the argument all = TRUE in print.dispRity, for example print(disparity, all = TRUE) for a dispRity object called disparity.
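The S3 mechanism behind a compact print method with an all switch can be sketched in base R. This is only an illustration under stated assumptions: the class toyObject and its print method are hypothetical and are not dispRity's implementation.

```r
# Hypothetical toy class, unrelated to dispRity's internals
toy <- structure(
    list(matrix = matrix(1:20, nrow = 5, ncol = 4),
         call   = list(dimensions = 4)),
    class = "toyObject"
)

# S3 print method: compact summary by default, full content if all = TRUE
print.toyObject <- function(x, all = FALSE, ...) {
    if (all) {
        # Drop the class so the default list printing is used
        print(unclass(x))
    } else {
        cat(" ---- toy object ----\n")
        cat(nrow(x$matrix), "elements with", ncol(x$matrix), "dimensions.\n")
    }
    invisible(x)
}

print(toy)              # compact summary
print(toy, all = TRUE)  # full underlying list
```

Because the object is still a plain list underneath, the compact method only changes how it is displayed, not what it stores.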

7.2 dispRity utilities

The package also provides some utility functions to facilitate multidimensional analysis.

7.2.1 dispRity object utilities

The first set of utilities are functions for manipulating dispRity objects:

7.2.1.1 make.dispRity

This function creates empty dispRity objects, or dispRity objects that contain only a matrix. Such objects print as follows:

## Empty dispRity object.
##  ---- dispRity object ---- 
## Contains only a matrix 5x4.

7.2.1.2 fill.dispRity

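This function initialises a dispRity object and generates its call properties. In the output below, the $call element of an object containing only a 5x4 matrix is first empty (list()); once filled, the object reports 5 elements with 4 dimensions and its $call element contains $dimensions: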

## list()
##  ---- dispRity object ---- 
## 5 elements with 4 dimensions.
## $dimensions
## [1] 4

7.2.1.3 matrix.dispRity

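This function extracts a specific matrix from a dispRity object. The matrix can be one of the bootstrapped matrices and/or a rarefied matrix. The structure (str) of two such extracted matrices, with 18 and 15 elements respectively, is shown below: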

##  num [1:18, 1:97] -0.1038 0.2844 0.2848 0.0927 0.1619 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:18] "Leptictis" "Dasypodidae" "n24" "Potamogalinae" ...
##   ..$ : NULL
##  num [1:15, 1:97] -0.7161 0.3496 -0.573 -0.0445 -0.1427 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:15] "n7" "n34" "Maelestes" "n20" ...
##   ..$ : NULL

7.2.1.5 combine.subsets

This function merges different subsets.

Note that the computed values (bootstrapped data and disparity metric) are not merged.

7.2.1.7 rescale.dispRity

This is the modified S3 method for scale (scaling and/or centring). It can be applied to the disparity data of a dispRity object and can take optional arguments (for example, rescaling by dividing by a maximum value).
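As a rough illustration of the underlying operation, the base R scale function can centre values, scale them, or divide them by a maximum. The sketch below works on a plain numeric vector of made-up disparity values; it does not call the dispRity method, and the names used are hypothetical.

```r
# Hypothetical disparity values for three subsets
disparity_values <- c(sub1 = 1.5, sub2 = 3.0, sub3 = 6.0)

# Base scale() defaults: centre on the mean and scale to unit variance
centred_scaled <- scale(disparity_values)

# No centring; divide by the maximum so that the largest value becomes 1
rescaled <- scale(disparity_values, center = FALSE,
                  scale = max(disparity_values))

# scale() returns a one-column matrix; flatten it for display
as.numeric(rescaled)
```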

7.2.1.8 sort.dispRity

This is the S3 method of sort for sorting the subsets alphabetically (default) or following a specific pattern.
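The idea of reordering subsets by name can be sketched on a plain named list in base R. This is an analogy only: the list below is made up, and the actual arguments of sort.dispRity are not shown here.

```r
# Hypothetical subsets stored as a named list, in arbitrary order
subsets <- list("90" = 1:3, "70" = 4:6, "80" = 7:9)

# Alphabetical order of the subset names
sorted_alpha <- subsets[sort(names(subsets))]

# A specific, user-defined order
sorted_custom <- subsets[c("80", "90", "70")]

names(sorted_alpha)
names(sorted_custom)
```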

7.3 The dispRity object content

The functions above are utilities to easily and safely access different elements in the dispRity object. Alternatively, of course, each element can be accessed manually. Here is an explanation of how it works. The dispRity object is a list of two to four elements, each of which is detailed below:

  • $matrix: an object of class matrix, the full multidimensional space.
  • $call: an object of class list containing information on the dispRity object content.
  • $subsets: an object of class list containing the subsets of the multidimensional space.
  • $disparity: an object of class list containing the disparity values.

The dispRity object is loosely based on C structure objects. It is composed of one unique instance of a matrix (the multidimensional space) upon which the metric function is called via “pointers” to only a certain number of elements and/or dimensions of this matrix. This allows for: (1) faster and easily tractable execution: the metric functions are called through apply family functions and can be parallelised; and (2) a really low memory footprint: at any time, only one matrix is present in the R environment rather than multiple copies of it for each subset.
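The “pointer” idea described above can be sketched in base R: a single matrix is kept, subsets are stored as row indices, and a metric is applied to the indexed rows through an apply family function. All names here (space, pointers, dims, sum_of_variances) are hypothetical and chosen for illustration; this is not the package's internal code.

```r
set.seed(1)

# One unique instance of the multidimensional space: 10 elements, 3 dimensions
space <- matrix(rnorm(30), nrow = 10, ncol = 3,
                dimnames = list(paste0("t", 1:10), NULL))

# Subsets are stored as row indices ("pointers"), not as copies of the matrix
pointers <- list(sub1 = c(5, 4, 6, 7), sub2 = c(1, 2, 3, 8, 9, 10))

# Dimensions can be "pointed" to in the same way
dims <- 1:2

# Hypothetical metric: the sum of the variances of each dimension
sum_of_variances <- function(m) sum(apply(m, 2, var))

# The metric is called on the indexed rows and columns of the single matrix
disparity_values <- sapply(pointers, function(rows) {
    sum_of_variances(space[rows, dims, drop = FALSE])
})
disparity_values
```

Only the index vectors grow with the number of subsets; the matrix itself is stored once.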

7.3.1 $matrix

This is the multidimensional space, stored in the R environment as a matrix object. It requires row names but not column names. By default, if the row names are missing, the dispRity functions will generate them in numeric order (i.e. rownames(matrix) <- 1:nrow(matrix)). This element of the dispRity object is never modified.

7.3.2 $call

This element contains the information on the dispRity object content. It is a list that can contain the following:

  • $call$subsets: a character vector with information on the subset type (either "continuous", "discrete" or "custom") and, where applicable, the model ("acctran", "deltran", "random", "proximity", "equal.split", "gradual.split"). This element is generated only once via chrono.subsets() or custom.subsets().
  • $call$dimensions: either a single numeric value indicating how many dimensions to use or a vector of numeric values indicating which specific dimensions to use. This element is by default the number of columns in $matrix but can be modified through boot.matrix() or dispRity().
  • $call$bootstrap: this is a list containing three elements:
    • [[1]]: the number of bootstrap replicates (numeric)
    • [[2]]: the bootstrap method (character)
    • [[3]]: the rarefaction levels (numeric vector)
  • $call$disparity: this is a list containing one element, $metric, that is a list containing the different functions passed to the metric argument in dispRity. These are call elements and get modified each time the dispRity function is used (the first element is the first metric(s), the second, the second metric(s), etc.).
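To make the layout of the list above concrete, here is a hand-built mock-up in base R. It only mirrors the description in this section, using the values printed for the example object in section 7.1; it is an assumption-laden sketch, not an object produced by the package, and call_info is a hypothetical name.

```r
# Hand-built mock-up of the $call element described above (not package output)
call_info <- list(
    # Subset type and model
    subsets    = c("continuous", "acctran"),
    # Number of dimensions used
    dimensions = 97,
    # [[1]] bootstrap replicates, [[2]] method, [[3]] rarefaction levels
    bootstrap  = list(100, "full", c(20, 15, 10, 5)),
    # Metrics stored as unevaluated calls, so nothing needs to be loaded
    disparity  = list(metric = list(quote(c(median, centroids))))
)

call_info$bootstrap[[3]]  # the rarefaction levels
str(call_info)
```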

7.3.3 $subsets

This element contains the subsets of the multidimensional space, if any. It is a list of subset names. Each subset name is in turn a list of at least one element called elements, which is in turn a matrix. This elements matrix contains the raw (observed) elements in the subset. It is composed of numeric values in one column and n rows (the number of elements in the subset). Each of these values is a “pointer” (C inspired) to an element of $matrix. For example, let’s assume a dispRity object called disparity, composed of at least one subset called sub1:

 disparity$subsets$sub1$elements
      [,1]
 [1,]    5
 [2,]    4
 [3,]    6
 [4,]    7

The values in the matrix “point” to the elements in $matrix: here, the subset contains only the 4th, 5th, 6th and 7th elements of the multidimensional space. The following elements in disparity$subsets$sub1 correspond to the same kind of “pointers” but drawn from the bootstrap replicates. The columns correspond to different bootstrap replicates. For example:

 disparity$subsets$sub1[[2]]
      [,1] [,2] [,3] [,4]
 [1,]   57   43   70    4
 [2,]   43   44    4    4
 [3,]   42   84   44    1
 [4,]   84    7    2   10

This signifies that we have four bootstrap pseudo-replicates, each pointing to four elements in $matrix. The next element ([[3]]) has the same structure for the first rarefaction level, if any (i.e. the resulting bootstrap matrix has m rows, where m is the number of elements for this rarefaction level). The element after that ([[4]]) has the same structure for another rarefaction level, and so forth.
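The layout of a subset (observed elements, then bootstrap replicates, then rarefaction levels) can be mimicked in base R. The sketch assumes a bootstrap that resamples the subset's own elements with replacement and a single rarefaction level of 3 elements; the variable names are hypothetical and the package's own resampling code may differ.

```r
set.seed(42)

# Observed elements: one column of "pointers" to rows of the full matrix
elements <- matrix(c(5, 4, 6, 7), ncol = 1)
n_bootstraps <- 4

# Bootstrap with all elements: one column per pseudo-replicate
bootstraps <- replicate(n_bootstraps,
                        sample(elements[, 1], size = nrow(elements),
                               replace = TRUE))

# Rarefaction to m = 3 elements: same idea, but with m rows
rarefied <- replicate(n_bootstraps,
                      sample(elements[, 1], size = 3, replace = TRUE))

# Same ordering as described above: $elements, then [[2]], then [[3]]
sub1 <- list(elements = elements, bootstraps, rarefied)
dim(sub1[[2]])  # 4 rows (elements) by 4 columns (replicates)
dim(sub1[[3]])  # 3 rows (rarefied elements) by 4 columns (replicates)
```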

7.3.4 $disparity

The $disparity element has the same structure as the $subsets element (a list of list(s) containing matrices), but the matrices contain the result of the disparity metric applied to the “pointers” rather than the “pointers” to $matrix themselves. For example, in our first example ($elements) from above, if the disparity metric is of dimension level 1, we would have:

 disparity$disparity$sub1$elements
      [,1]
 [1,]    1.82

This is the observed disparity (1.82) for the subset called sub1. If the disparity metric is of dimension level 2 (say the function range that outputs two values), we would have:

 disparity$disparity$sub1$elements
      [,1]
 [1,]    0.82
 [2,]    2.82

The following elements in the list follow the same logic as before: rows are disparity values (one row for a dimension level 1 metric, multiple rows for a dimension level 2 metric) and columns are the bootstrap replicates (the bootstrap with all elements followed by any rarefaction levels). For example, for the bootstrap without rarefaction (second element of the list):

 disparity$disparity$sub1[[2]]
          [,1]     [,2]     [,3]     [,4]
 [1,] 1.744668 1.777418 1.781624 1.739679
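The row and column logic of $disparity can be reproduced with a small base R sketch: a metric is applied to each column of “pointers”, giving one row per output value and one column per replicate. The space, the metrics and the helper apply_metric are all hypothetical stand-ins, so the numbers will not match the values shown above.

```r
set.seed(1)

# Hypothetical multidimensional space: 10 elements, 4 dimensions
space <- matrix(rnorm(40), nrow = 10, ncol = 4)

# Observed "pointers" and four bootstrap pseudo-replicates of them
elements   <- matrix(c(5, 4, 6, 7), ncol = 1)
bootstraps <- replicate(4, sample(elements[, 1], replace = TRUE))

# A metric returning one value, and one returning two values
sum_of_variances <- function(m) sum(apply(m, 2, var))
value_range      <- function(m) range(m)

# Apply a metric to every column of a pointer matrix;
# rows of the result are metric values, columns are replicates
apply_metric <- function(pointers, metric) {
    values <- apply(pointers, 2, function(rows) {
        metric(space[rows, , drop = FALSE])
    })
    matrix(values, ncol = ncol(pointers))
}

sub1_disparity <- list(
    elements = apply_metric(elements, sum_of_variances),  # 1 x 1, observed
    apply_metric(bootstraps, sum_of_variances)            # 1 x 4, bootstrapped
)

dim(apply_metric(elements, value_range))  # 2 rows: two values per replicate
```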