Ordering Objects using Seriation in R

Introduction

Installation

Getting started

Heatmaps with seriation

Last updated: 2023-10-05

Checks: 7 0

Knit directory: muse/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200712)

The command set.seed(20200712) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 1e94d2f

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 1e94d2f. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    r_packages_4.3.0/

Untracked files:
    Untracked:  analysis/cell_ranger.Rmd
    Untracked:  analysis/complex_heatmap.Rmd
    Untracked:  analysis/sleuth.Rmd
    Untracked:  analysis/tss_xgboost.Rmd
    Untracked:  code/multiz100way/
    Untracked:  data/HG00702_SH089_CHSTrio.chr1.vcf.gz
    Untracked:  data/HG00702_SH089_CHSTrio.chr1.vcf.gz.tbi
    Untracked:  data/ncrna_NONCODE[v3.0].fasta.tar.gz
    Untracked:  data/ncrna_noncode_v3.fa
    Untracked:  data/netmhciipan.out.gz
    Untracked:  data/test
    Untracked:  export/davetang039sblog.WordPress.2023-06-30.xml
    Untracked:  export/output/
    Untracked:  women.json

Unstaged changes:
    Modified:   analysis/graph.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/seriation.Rmd) and HTML (docs/seriation.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	1e94d2f	Dave Tang	2023-10-05	Dendrogram
html	b57aaff	Dave Tang	2023-10-05	Build site.
Rmd	8f6754a	Dave Tang	2023-10-05	Silence
html	2e62b65	Dave Tang	2023-10-05	Build site.
Rmd	a63df84	Dave Tang	2023-10-05	Ordering objects using seriation

Introduction

Installation

Install stable CRAN version.

if(! "seriation" %in% installed.packages()[, 1]){
  install.packages("seriation", repos = c("https://mhahsler.r-universe.dev", "https://cloud.r-project.org/"))
}

packageVersion("seriation")

[1] '1.5.1.1'

Getting started

Use the example dataset SupremeCourt, which:

Contains a (a subset of the) decisions for the stable 8-yr period 1995-2002 of the second Rehnquist Supreme Court. Decisions are aggregated to the joint probability for disagreement between judges.

library(seriation)
data("SupremeCourt")
SupremeCourt

           Breyer Ginsburg Kennedy OConnor Rehnquist  Scalia  Souter Stevens
Breyer    0.00000  0.11966 0.25000 0.20940   0.29915 0.35256 0.11752 0.16239
Ginsburg  0.11966  0.00000 0.26790 0.25214   0.30769 0.36966 0.09615 0.14530
Kennedy   0.25000  0.26709 0.00000 0.15598   0.12179 0.18803 0.24786 0.32692
OConnor   0.20940  0.25214 0.15598 0.00000   0.16239 0.20726 0.22009 0.32906
Rehnquist 0.29915  0.30769 0.12179 0.16239   0.00000 0.14316 0.29274 0.40171
Scalia    0.35256  0.36966 0.18803 0.20726   0.14316 0.00000 0.33761 0.43803
Souter    0.11752  0.09615 0.24790 0.22009   0.29274 0.33761 0.00000 0.16880
Stevens   0.16239  0.14530 0.32692 0.32906   0.40171 0.43803 0.16880 0.00000
Thomas    0.35897  0.36752 0.17735 0.20513   0.13675 0.06624 0.33120 0.43590
           Thomas
Breyer    0.35897
Ginsburg  0.36752
Kennedy   0.17735
OConnor   0.20513
Rehnquist 0.13675
Scalia    0.06624
Souter    0.33120
Stevens   0.43590
Thomas    0.00000

Convert to distance matrix.

d <- as.dist(SupremeCourt)
d

           Breyer Ginsburg Kennedy OConnor Rehnquist  Scalia  Souter Stevens
Ginsburg  0.11966                                                           
Kennedy   0.25000  0.26709                                                  
OConnor   0.20940  0.25214 0.15598                                          
Rehnquist 0.29915  0.30769 0.12179 0.16239                                  
Scalia    0.35256  0.36966 0.18803 0.20726   0.14316                        
Souter    0.11752  0.09615 0.24790 0.22009   0.29274 0.33761                
Stevens   0.16239  0.14530 0.32692 0.32906   0.40171 0.43803 0.16880        
Thomas    0.35897  0.36752 0.17735 0.20513   0.13675 0.06624 0.33120 0.43590

Perform the default seriation method to reorder the objects.

my_order <- seriate(d)
get_order(my_order)

   Scalia    Thomas Rehnquist   Kennedy   OConnor    Souter    Breyer  Ginsburg 
        6         9         5         3         4         7         1         2 
  Stevens 
        8

Plot heatmap.

p1 <- ggpimage(d, upper_tri = TRUE) +
  ggtitle("Judges (original alphabetical order)")

p2 <- ggpimage(d, my_order, upper_tri = TRUE) +
  ggtitle("Judges (reordered by seriation)")

p1 + p2 & scale_fill_gradientn(colours = c("darkgrey", "skyblue"))

Version	Author	Date
2e62b65	Dave Tang	2023-10-05

Return linear configuration where more similar objects are located closer to each other.

get_config(my_order)

    Breyer   Ginsburg    Kennedy    OConnor  Rehnquist     Scalia     Souter 
 0.2362454  0.2814388 -0.1501451 -0.1051162 -0.2656098 -0.4159504  0.2139295 
   Stevens     Thomas 
 0.6129974 -0.4077896

Plot linear configuration.

plot_config(my_order)

Version	Author	Date
2e62b65	Dave Tang	2023-10-05

Heatmaps with seriation

The Wood dataset consists of:

A data matrix containing a sample of the normalized gene expression data for 6 locations in the stem of Popla trees published in the study by Herzberg et al (2001). The sample of 136 genes selected by Caraux and Pinloche (2005).

data("Wood")
dim(Wood)

[1] 136   6

Check out Wood.

head(Wood)

                  P          A          B          C          D          E
AI161452 -0.7546223 -2.2447910 -2.4157241 -0.8181829  1.0121892  0.8839819
AI161500 -2.0621934  0.2127532  0.3556842  0.2219739 -0.6714808  0.3477471
AI161513  0.1708342  1.3265617  0.4093247 -1.2003526 -3.3316990 -2.0194944
AI161572 -1.1837279 -1.5292043 -2.1512254 -1.0145349  1.1844282 -0.4033869
AI161573 -1.8637857 -2.1495779 -2.5108412 -0.8444706  1.4952223 -1.7662259
AI161629  1.5917360  1.0212036 -0.1519370 -1.3543136 -2.7099315 -1.3129411

Methods of interest for heatmaps are dendrogram leaf order-based methods applied to rows and columns. This is done using method = "heatmap". The actual seriation method can be passed on as parameter seriaton_method, but it has a suitable default if it is omitted.

wood_hc_complete <- seriate(Wood, method = "Heatmap", seriation_method = "HC_complete")
wood_olo_complete <- seriate(Wood, method = "Heatmap", seriation_method = "OLO_complete")
get_order(wood_olo_complete)

AI165492 AI166057 AI162004 AI164970 AI163151 AI166086 AI163756 AI162593 
     106      128       15       88       44      130       59       32 
AI165011 AI164136 AI166095 AI162370 AI165426 AI162652 AI162561 AI164684 
      91       69      131       26      105       34       31       80 
AI164612 AI165835 AI163485 AI162809 AI161513 AI163315 AI163528 AI166101 
      78      119       51       38        3       47       52      132 
AI163131 AI165913 AI166111 AI161629 AI164686 AI164884 AI165668 AI164585 
      43      123      133        6       81       86      111       76 
AI165041 AI164635 AI163821 AI162521 AI163812 AI163303 AI162216 AI163994 
      93       79       62       30       61       46       23       66 
AI163249 AI163617 AI164793 AI165990 AI165006 AI162997 AI163650 AI166068 
      45       56       85      126       90       41       58      129 
AI164101 AI165836 AI163012 AI165107 AI162249 AI163991 AI165062 AI161730 
      68      120       42       95       24       65       94       11 
AI164964 AI162452 AI165206 AI164435 AI165215 AI164711 AI161827 AI163941 
      87       29       99       73      100       82       13       64 
AI165320 AI165189 AI165691 AI163880 AI165690 AI161452 AI162402 AI163624 
     104       98      115       63      114        1       27       57 
AI164753 AI165949 AI164359 AI163594 AI166034 AI162940 AI162710 AI162318 
      84      124       71       54      127       40       36       25 
AI164979 AI164231 AI165974 AI161823 AI165520 AI162684 AI161697 AI162729 
      89       70      125       12      107       35       10       37 
AI165903 AI163580 AI164546 AI162060 AI162092 AI161674 AI161638 AI164604 
     122       53       75       16       17        8        7       77 
AI165021 AI163338 AI162215 AI165673 AI166128 AI161572 AI162094 AI162600 
      92       49       22      112      134        4       18       33 
AI161573 AI165868 AI162928 AI165529 AI165730 AI163758 AI164060 AI163328 
       5      121       39      108      116       60       67       48 
AI165803 AI165272 AI166182 AI165289 AI165247 AI162157 AI165826 AI165655 
     117      102      136      103      101       20      118      110 
AI162429 AI164730 AI165162 AI163474 AI161899 AI161694 AI162148 AI164370 
      28       83       96       50       14        9       19       72 
AI166167 AI165687 AI165175 AI162158 AI164450 AI163608 AI161500 AI165575 
     135      113       97       21       74       55        2      109

Ignore the numbers of the order above; they indicate the index of the gene in Wood. AI165492 and AI166057 are the most similar to each other. If I use those values in Wood, I get their expression data.

Wood[c(106, 128), ]

                 P         A         B         C          D         E
AI165492 -0.825104 0.1972707 0.4277217 0.7228185 -0.8326557 -2.503835
AI166057 -1.005708 0.5118941 0.7032220 0.6993795 -1.1373152 -2.562734

wood_olo_complete is also a hclust.

class(wood_olo_complete[[1]])

[1] "ser_permutation_vector" "hclust"

Plot hierarchical clustering result.

as.dendrogram(wood_olo_complete[[1]]) %>%
  plot(horiz = TRUE)

Gene order of the dendrogram matches the order produced by seriation using OLO complete.

o <- wood_olo_complete[[1]]$order
identical(wood_olo_complete[[1]]$labels[o], names(get_order(wood_olo_complete)))

[1] TRUE

Location ordering.

get_order(wood_hc_complete, 2)

P A B E C D 
1 2 3 6 4 5

get_order(wood_olo_complete, 2)

P A B C D E 
1 2 3 4 5 6

Heatmap.

p1 <- ggpimage(Wood) +
  ggtitle("Wood (no order)") +
  theme(legend.position = "none")
  
p2 <- ggpimage(Wood, wood_hc_complete) +
  ggtitle("Wood (HC complete)") +
  theme(legend.position = "none")

p3 <- ggpimage(Wood, wood_olo_complete) +
  ggtitle("Wood (OLO complete)")

p1 + p2 + p3 & scale_fill_gradientn(colours = c("skyblue", "red"))

Version	Author	Date
2e62b65	Dave Tang	2023-10-05

sessionInfo()

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] seriation_1.5.1-1    dendextend_1.17.1    patchwork_1.1.3.9000
 [4] lubridate_1.9.2      forcats_1.0.0        stringr_1.5.0       
 [7] dplyr_1.1.2          purrr_1.0.1          readr_2.1.4         
[10] tidyr_1.3.0          tibble_3.2.1         ggplot2_3.4.2       
[13] tidyverse_2.0.0      workflowr_1.7.0     

loaded via a namespace (and not attached):
 [1] gtable_0.3.3      xfun_0.39         bslib_0.5.0       processx_3.8.1   
 [5] callr_3.7.3       tzdb_0.4.0        vctrs_0.6.2       tools_4.3.0      
 [9] ps_1.7.5          generics_0.1.3    ca_0.71.1         fansi_1.0.4      
[13] highr_0.10        pkgconfig_2.0.3   lifecycle_1.0.3   farver_2.1.1     
[17] compiler_4.3.0    git2r_0.32.0      munsell_0.5.0     getPass_0.2-2    
[21] codetools_0.2-19  httpuv_1.6.11     htmltools_0.5.5   sass_0.4.6       
[25] yaml_2.3.7        later_1.3.1       pillar_1.9.0      jquerylib_0.1.4  
[29] whisker_0.4.1     cachem_1.0.8      iterators_1.0.14  viridis_0.6.3    
[33] TSP_1.2-4         foreach_1.5.2     tidyselect_1.2.0  digest_0.6.31    
[37] stringi_1.7.12    labeling_0.4.2    rprojroot_2.0.3   fastmap_1.1.1    
[41] grid_4.3.0        colorspace_2.1-0  cli_3.6.1         magrittr_2.0.3   
[45] utf8_1.2.3        withr_2.5.0       scales_1.2.1      promises_1.2.0.1 
[49] registry_0.5-1    timechange_0.2.0  rmarkdown_2.22    httr_1.4.6       
[53] gridExtra_2.3     hms_1.1.3         evaluate_0.21     knitr_1.43       
[57] viridisLite_0.4.2 rlang_1.1.1       Rcpp_1.0.10       glue_1.6.2       
[61] rstudioapi_0.14   jsonlite_1.8.5    R6_2.5.1          fs_1.6.2