Ordering Objects using Seriation in R

Introduction

Installation

Getting started

Heatmaps with seriation

Last updated: 2023-10-05

Checks: 7 0

Knit directory: muse/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200712)

The command set.seed(20200712) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: a63df84

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version a63df84. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    r_packages_4.3.0/

Untracked files:
    Untracked:  analysis/cell_ranger.Rmd
    Untracked:  analysis/complex_heatmap.Rmd
    Untracked:  analysis/sleuth.Rmd
    Untracked:  analysis/tss_xgboost.Rmd
    Untracked:  code/multiz100way/
    Untracked:  data/HG00702_SH089_CHSTrio.chr1.vcf.gz
    Untracked:  data/HG00702_SH089_CHSTrio.chr1.vcf.gz.tbi
    Untracked:  data/ncrna_NONCODE[v3.0].fasta.tar.gz
    Untracked:  data/ncrna_noncode_v3.fa
    Untracked:  data/netmhciipan.out.gz
    Untracked:  data/test
    Untracked:  export/davetang039sblog.WordPress.2023-06-30.xml
    Untracked:  export/output/
    Untracked:  women.json

Unstaged changes:
    Modified:   analysis/graph.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/seriation.Rmd) and HTML (docs/seriation.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	a63df84	Dave Tang	2023-10-05	Ordering objects using seriation

Introduction

Installation

Install stable CRAN version.

if(! "seriation" %in% installed.packages()[, 1]){
  install.packages("seriation", repos = c("https://mhahsler.r-universe.dev", "https://cloud.r-project.org/"))
}

packageVersion("seriation")

[1] '1.5.1.1'

Getting started

Use the example dataset SupremeCourt, which:

Contains a (a subset of the) decisions for the stable 8-yr period 1995-2002 of the second Rehnquist Supreme Court. Decisions are aggregated to the joint probability for disagreement between judges.

library(seriation)
data("SupremeCourt")
SupremeCourt

           Breyer Ginsburg Kennedy OConnor Rehnquist  Scalia  Souter Stevens
Breyer    0.00000  0.11966 0.25000 0.20940   0.29915 0.35256 0.11752 0.16239
Ginsburg  0.11966  0.00000 0.26790 0.25214   0.30769 0.36966 0.09615 0.14530
Kennedy   0.25000  0.26709 0.00000 0.15598   0.12179 0.18803 0.24786 0.32692
OConnor   0.20940  0.25214 0.15598 0.00000   0.16239 0.20726 0.22009 0.32906
Rehnquist 0.29915  0.30769 0.12179 0.16239   0.00000 0.14316 0.29274 0.40171
Scalia    0.35256  0.36966 0.18803 0.20726   0.14316 0.00000 0.33761 0.43803
Souter    0.11752  0.09615 0.24790 0.22009   0.29274 0.33761 0.00000 0.16880
Stevens   0.16239  0.14530 0.32692 0.32906   0.40171 0.43803 0.16880 0.00000
Thomas    0.35897  0.36752 0.17735 0.20513   0.13675 0.06624 0.33120 0.43590
           Thomas
Breyer    0.35897
Ginsburg  0.36752
Kennedy   0.17735
OConnor   0.20513
Rehnquist 0.13675
Scalia    0.06624
Souter    0.33120
Stevens   0.43590
Thomas    0.00000

Convert to distance matrix.

d <- as.dist(SupremeCourt)
d

           Breyer Ginsburg Kennedy OConnor Rehnquist  Scalia  Souter Stevens
Ginsburg  0.11966                                                           
Kennedy   0.25000  0.26709                                                  
OConnor   0.20940  0.25214 0.15598                                          
Rehnquist 0.29915  0.30769 0.12179 0.16239                                  
Scalia    0.35256  0.36966 0.18803 0.20726   0.14316                        
Souter    0.11752  0.09615 0.24790 0.22009   0.29274 0.33761                
Stevens   0.16239  0.14530 0.32692 0.32906   0.40171 0.43803 0.16880        
Thomas    0.35897  0.36752 0.17735 0.20513   0.13675 0.06624 0.33120 0.43590

Perform the default seriation method to reorder the objects.

my_order <- seriate(d)
get_order(my_order)

   Scalia    Thomas Rehnquist   Kennedy   OConnor    Souter    Breyer  Ginsburg 
        6         9         5         3         4         7         1         2 
  Stevens 
        8

Plot heatmap.

p1 <- ggpimage(d, upper_tri = TRUE) +
  ggtitle("Judges (original alphabetical order)")

p2 <- ggpimage(d, my_order, upper_tri = TRUE) +
  ggtitle("Judges (reordered by seriation)")

p1 + p2 & scale_fill_gradientn(colours = c("darkgrey", "skyblue"))

Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.

Return linear configuration where more similar objects are located closer to each other.

get_config(my_order)

    Breyer   Ginsburg    Kennedy    OConnor  Rehnquist     Scalia     Souter 
 0.2362454  0.2814388 -0.1501451 -0.1051162 -0.2656098 -0.4159504  0.2139295 
   Stevens     Thomas 
 0.6129974 -0.4077896

Plot linear configuration.

plot_config(my_order)

Heatmaps with seriation

The Wood dataset consists of:

A data matrix containing a sample of the normalized gene expression data for 6 locations in the stem of Popla trees published in the study by Herzberg et al (2001). The sample of 136 genes selected by Caraux and Pinloche (2005).

data("Wood")
dim(Wood)

[1] 136   6

Methods of interest for heatmaps are dendrogram leaf order-based methods applied to rows and columns. This is done using method = "heatmap". The actual seriation method can be passed on as parameter seriaton_method, but it has a suitable default if it is omitted.

wood_hc_complete <- seriate(Wood, method = "Heatmap", seriation_method = "HC_complete")
wood_olo_complete <- seriate(Wood, method = "Heatmap", seriation_method = "OLO_complete")
get_order(wood_hc_complete)

AI162370 AI166095 AI164136 AI165011 AI165492 AI166057 AI162004 AI163151 
      26      131       69       91      106      128       15       44 
AI164970 AI166086 AI162593 AI163756 AI164684 AI164612 AI165835 AI163485 
      88      130       32       59       80       78      119       51 
AI163315 AI161513 AI162809 AI162561 AI162652 AI165426 AI162216 AI163303 
      47        3       38       31       34      105       23       46 
AI164585 AI164635 AI165041 AI163821 AI162521 AI163812 AI166111 AI161629 
      76       79       93       62       30       61      133        6 
AI164686 AI164884 AI165668 AI163528 AI165913 AI163131 AI166101 AI165006 
      81       86      111       52      123       43      132       90 
AI165990 AI163617 AI164793 AI163249 AI163994 AI162997 AI163650 AI163758 
     126       56       85       45       66       41       58       60 
AI161573 AI162600 AI165529 AI165730 AI162928 AI165868 AI162429 AI164730 
       5       33      108      116       39      121       28       83 
AI165247 AI165289 AI166182 AI163328 AI164060 AI165272 AI165803 AI165655 
     101      103      136       48       67      102      117      110 
AI162157 AI165826 AI161500 AI165575 AI163608 AI164450 AI162158 AI165175 
      20      118        2      109       55       74       21       97 
AI165687 AI166167 AI164370 AI162148 AI165162 AI163474 AI161694 AI161899 
     113      135       72       19       96       50        9       14 
AI165836 AI164101 AI166068 AI164435 AI164711 AI165215 AI161730 AI165062 
     120       68      129       73       82      100       11       94 
AI164964 AI162452 AI165206 AI163012 AI165107 AI162249 AI163991 AI163880 
      87       29       99       42       95       24       65       63 
AI165691 AI165189 AI161827 AI163941 AI165320 AI164753 AI163624 AI162402 
     115       98       13       64      104       84       57       27 
AI161452 AI165690 AI164359 AI165949 AI163594 AI166034 AI162710 AI162940 
       1      114       71      124       54      127       36       40 
AI162318 AI164979 AI164231 AI165974 AI163580 AI161823 AI165520 AI165903 
      25       89       70      125       53       12      107      122 
AI162684 AI161697 AI162729 AI161572 AI162094 AI163338 AI166128 AI162215 
      35       10       37        4       18       49      134       22 
AI165673 AI161638 AI161674 AI164604 AI165021 AI162092 AI162060 AI164546 
     112        7        8       77       92       17       16       75

Ordering.

get_order(wood_hc_complete, 2)

P A B E C D 
1 2 3 6 4 5

get_order(wood_olo_complete, 2)

P A B C D E 
1 2 3 4 5 6

Heatmap.

p1 <- ggpimage(Wood) +
  ggtitle("Wood (no order)") +
  theme(legend.position = "none")
  
p2 <- ggpimage(Wood, wood_hc_complete) +
  ggtitle("Wood (HC complete)") +
  theme(legend.position = "none")

p3 <- ggpimage(Wood, wood_olo_complete) +
  ggtitle("Wood (OLO complete)")

p1 + p2 + p3 & scale_fill_gradientn(colours = c("skyblue", "red"))

Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.

sessionInfo()

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] seriation_1.5.1-1    patchwork_1.1.3.9000 lubridate_1.9.2     
 [4] forcats_1.0.0        stringr_1.5.0        dplyr_1.1.2         
 [7] purrr_1.0.1          readr_2.1.4          tidyr_1.3.0         
[10] tibble_3.2.1         ggplot2_3.4.2        tidyverse_2.0.0     
[13] workflowr_1.7.0     

loaded via a namespace (and not attached):
 [1] gtable_0.3.3     xfun_0.39        bslib_0.5.0      processx_3.8.1  
 [5] callr_3.7.3      tzdb_0.4.0       vctrs_0.6.2      tools_4.3.0     
 [9] ps_1.7.5         generics_0.1.3   ca_0.71.1        fansi_1.0.4     
[13] highr_0.10       pkgconfig_2.0.3  lifecycle_1.0.3  compiler_4.3.0  
[17] farver_2.1.1     git2r_0.32.0     munsell_0.5.0    getPass_0.2-2   
[21] codetools_0.2-19 httpuv_1.6.11    htmltools_0.5.5  sass_0.4.6      
[25] yaml_2.3.7       later_1.3.1      pillar_1.9.0     jquerylib_0.1.4 
[29] whisker_0.4.1    cachem_1.0.8     iterators_1.0.14 TSP_1.2-4       
[33] foreach_1.5.2    tidyselect_1.2.0 digest_0.6.31    stringi_1.7.12  
[37] labeling_0.4.2   rprojroot_2.0.3  fastmap_1.1.1    grid_4.3.0      
[41] colorspace_2.1-0 cli_3.6.1        magrittr_2.0.3   utf8_1.2.3      
[45] withr_2.5.0      scales_1.2.1     promises_1.2.0.1 registry_0.5-1  
[49] timechange_0.2.0 rmarkdown_2.22   httr_1.4.6       hms_1.1.3       
[53] evaluate_0.21    knitr_1.43       rlang_1.1.1      Rcpp_1.0.10     
[57] glue_1.6.2       rstudioapi_0.14  jsonlite_1.8.5   R6_2.5.1        
[61] fs_1.6.2