Structural analysis with MISTy - based on DOT deconvolution

Last updated: 2024-03-17

Knit directory: ProtocolLabRotationSaezRodriguezGroup/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1).

Seed: set.seed(20240306)

The command set.seed(20240306) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Repository version: a5a761b

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/MistyRStructuralAnalysisPipelineDOT.Rmd) and HTML (docs/MistyRStructuralAnalysisPipelineDOT.html) files.

File	Version	Author	Date	Message
html	ec66e86	leotenshii	2024-03-17	Build site.
html	36c6e22	leotenshii	2024-03-11	Build site.
html	a7395df	leotenshii	2024-03-11	Build site.
html	64a10dc	leotenshii	2024-03-11	added html
Rmd	38a9f93	leotenshii	2024-03-11	Revert "changed paths to data/…"
Rmd	5e9bf15	leotenshii	2024-03-11	changed paths to data/…
Rmd	5ec41b8	leotenshii	2024-03-11	upload of vignettes

Introduction

MISTy is designed to analyze spatial omics datasets within and between distinct spatial contexts referred to as views. This analysis can focus solely on structural information. Spatial transcriptomic methods such as Visium capture information from areas containing multiple cells. Then, deconvolution is applied to relate the measured data of the spots back to individual cells. In this vignette we will use the R package DOT for deconvolution.

This vignette presents a workflow for the analysis of structural data, guiding users through the application of mistyR to the results of DOT deconvolution.

The package DOT can be installed from GitHub with remotes::install_github("saezlab/DOT").

Load the necessary packages:

# MISTy 
library(mistyR) 
library(future) 

# DOT
library(DOT)

# Loading experiment data 
library(Seurat)
library(SeuratObject)

# Data manipulation 
library(tidyverse)

Warning: package 'ggplot2' was built under R version 4.3.3

Warning: package 'readr' was built under R version 4.3.3

# Distances
library(distances)

Get and load the data

For this showcase, we use a 10X Visium spatial slide from Kuppe et al., 2022, where they created a spatial multi-omic map of human myocardial infarction. The example tissue comes from the heart of patient 14, which is in a later stage after myocardial infarction. The Seurat object contains, among other things, the spot coordinates on the slide, which we will need for deconvolution. First, we have to download and extract the file:

# Download the data
download.file("https://zenodo.org/records/6580069/files/10X_Visium_ACH005.tar.gz?download=1",
    destfile = "10X_Visium_ACH005.tar.gz", method = "curl")
untar("10X_Visium_ACH005.tar.gz")

The next step is to load the data and extract the location of the spots. Visium spots are arranged in a staggered grid, so neighboring rows are shifted against each other. As a result, the same difference in row and column numbers does not always correspond to the same real distance between two spots. It is therefore advantageous to use the pixel coordinates instead of row and column numbers, as they represent the distances accurately.

spatial_data <- readRDS("ACH005/ACH005.rds")

geometry <- GetTissueCoordinates(spatial_data, cols = c("imagerow", "imagecol"), scale = NULL)
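
To illustrate why pixel coordinates are preferable, here is a minimal sketch on a toy staggered grid. It does not use the ACH005 data: the object `grid`, the columns `px_x` and `px_y`, the helper `euclid` and the spacing of one unit are assumptions made for illustration only. In index space, a same-row neighbor and a diagonal neighbor appear to be at different distances, while in pixel space they are equally far away.

```r
# Toy staggered grid using Visium-style array indices:
# even rows use even columns, odd rows use odd columns.
grid <- expand.grid(row = 0:2, col = 0:5)
grid <- grid[(grid$row + grid$col) %% 2 == 0, ]

# Hypothetical pixel coordinates with a centre-to-centre spacing of 1 unit.
grid$px_x <- grid$col * 0.5
grid$px_y <- grid$row * sqrt(3) / 2

# Pick a spot, its same-row neighbour and one of its diagonal neighbours.
centre   <- grid[grid$row == 1 & grid$col == 1, ]
same_row <- grid[grid$row == 1 & grid$col == 3, ]
diagonal <- grid[grid$row == 0 & grid$col == 2, ]

# Euclidean distance between two one-row data frames over the given columns.
euclid <- function(a, b, cols) {
  sqrt(sum((unlist(a[cols]) - unlist(b[cols]))^2))
}

# Distances in index space differ although the spots are equally far apart.
index_dist <- c(same_row = euclid(centre, same_row, c("row", "col")),
                diagonal = euclid(centre, diagonal, c("row", "col")))

# Distances in pixel space are the same for both neighbours.
pixel_dist <- c(same_row = euclid(centre, same_row, c("px_x", "px_y")),
                diagonal = euclid(centre, diagonal, c("px_x", "px_y")))

print(index_dist)
print(pixel_dist)
```

Any distance-based neighborhood definition applied later would inherit this distortion if it were computed on row and column numbers.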

For deconvolution, we additionally need a reference single-cell data set containing a gene x cell count matrix and a vector containing the corresponding cell annotations. For each sample, Kuppe et al., 2022, isolated nuclei from the remaining tissue and used them for snRNA-seq. The data corresponding to the same patient as the spatial data will be used as reference data in DOT. First download the file:

download.file("https://www.dropbox.com/scl/fi/sq24xaavxplkc98iimvpz/hca_p14.rds?rlkey=h8cyxzhypavkydbv0z3pqadus&dl=1",
              destfile = "hca_p14.rds",
              mode = "wb")

Now load the data. From this, we retrieve a gene x cell count matrix and the respective cell annotations.

ref_data <- readRDS("hca_p14.rds")

ref_counts_P14 <- ref_data$counts
ref_ct <- ref_data$celltypes

Deconvolution with DOT

Next, we need to set up the DOT object. The two inputs we need are the count matrix and pixel coordinates of the spatial data and the count matrix and cell annotations of the single-cell reference data.

dot.srt <- setup.srt(srt_data = spatial_data@assays$Spatial@counts, srt_coords = geometry)

dot.ref <- setup.ref(ref_data = ref_counts_P14, ref_annotations = ref_ct, 10)

dot <- create.DOT(dot.srt, dot.ref)

Now we can carry out deconvolution:

# Run DOT
dot <- run.DOT.lowresolution(dot)

The results can be found under dot@weights. To obtain the calculated cell-type proportion per spot, we normalize the result to a row sum of 1.

# Normalize DOT results
DOT_weights <- sweep(dot@weights, 1, rowSums(dot@weights), "/")
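
The row normalization can be checked on a small example. The sketch below uses a made-up weight matrix (`toy_weights` and its values are assumptions, not DOT output) and applies the same `sweep` call.

```r
# Toy weight matrix: 3 spots (rows) x 3 cell types (columns); values are made up.
toy_weights <- matrix(c(2, 1, 1,
                        0, 3, 1,
                        5, 0, 5),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(paste0("spot", 1:3),
                                      c("CM", "Fib", "Endo")))

# Guard for real data: a spot whose weights sum to 0 would produce NaN.
stopifnot(all(rowSums(toy_weights) > 0))

# Divide every row by its row sum, as done for dot@weights.
toy_props <- sweep(toy_weights, 1, rowSums(toy_weights), "/")

# Each row now sums to 1 and can be read as cell-type proportions.
print(toy_props)
print(rowSums(toy_props))
```

The second argument of `sweep` (`1`) selects rows, so each spot is scaled independently of all other spots.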

Visualize cell proportion in spots

Now we can visually explore the slide itself and the abundance of cell types at each spot.

# Tissue Slide
SpatialPlot(spatial_data, keep.scale = NULL, alpha = 0)

Version	Author	Date
64a10dc	leotenshii	2024-03-11

# Results DOT
draw_maps(geometry, 
          DOT_weights, 
          background = "white", 
          normalize = FALSE, 
          ncol = 3, 
          viridis_option = "viridis")

Based on the plots, we can observe that some cell types are found more frequently than others. Additionally, we can identify patterns in the distribution of cells, with some being widespread across the entire slide while others are concentrated in specific areas. Furthermore, there are cell types that share a similar distribution.

MISTy views
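
The downstream analysis uses an object called misty_results and a view named "para.126", but the code that creates them is not shown in this section. The following is a rough sketch of how such views could be built with mistyR, not necessarily the code that produced the results reported here. The radius heuristic, the Gaussian kernel (`family = "gaussian"`), the output folder name and the helper `run_structural_misty` are all assumptions. Only the radius calculation on toy coordinates is executed; the helper is merely defined, since it needs mistyR and the real deconvolution results.

```r
# Toy spot coordinates standing in for `geometry` (values are made up).
toy_coords <- data.frame(imagerow = c(0, 0, 100, 100, 50),
                         imagecol = c(0, 100, 0, 100, 50))

# Distance from each spot to its nearest neighbour (base R only).
dist_mat <- as.matrix(dist(toy_coords))
diag(dist_mat) <- Inf
nn_dist <- apply(dist_mat, 1, min)

# One possible heuristic (an assumption): mean nearest-neighbour distance
# plus one standard deviation, rounded up.
paraview_radius <- ceiling(mean(nn_dist) + sd(nn_dist))
print(paraview_radius)

# Hypothetical helper wrapping the mistyR calls. It is only defined here,
# not executed.
run_structural_misty <- function(weights, coords, radius, out_dir) {
  # Intraview: cell-type proportions of each spot.
  views <- mistyR::create_initial_view(as.data.frame(weights))
  # Paraview: proportions in the neighbourhood, weighted by distance.
  views <- mistyR::add_paraview(views, coords, l = radius, family = "gaussian")
  # Train the models and write the results to out_dir.
  mistyR::run_misty(views, results.folder = out_dir)
  mistyR::collect_results(out_dir)
}

# Usage with the objects from this vignette might look like:
# misty_results <- run_structural_misty(DOT_weights, geometry,
#                                       paraview_radius, "result/structural_dot")
```

mistyR names a paraview after its parameter `l`, so a view called "para.126" would correspond to `l = 126` in the pixel units of the slide.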

Downstream Analysis

With the collected results, we can now answer the following questions:

1. To what extent can the cell types present in the surrounding tissue explain the cell type composition of a spot, compared to the intraview alone?

Here we can look at two different statistics: multi.R2 shows the total variance explained by the multiview model, and gain.R2 shows the increase in explained variance contributed by the paraview.

misty_results %>%
  plot_improvement_stats("multi.R2") %>% 
  plot_improvement_stats("gain.R2")

Warning: Removed 11 rows containing missing values or values outside the scale range
(`geom_segment()`).

The paraview particularly increases the explained variance for adipocytes and mast cells. In general, a significant gain in R² can be interpreted as follows:

“We can better explain the expression of marker X when we consider additional views other than the intrinsic view.”
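
The relation between the two statistics can be made concrete with a deliberately simplified sketch. MISTy trains a separate model per view, combines them in a meta-model and estimates performance by cross-validation; the example below replaces all of that with plain linear regression on synthetic data, so the variable names and numbers are assumptions for illustration only. The gain is the difference between the R² of the model that sees both views and the R² of the intraview-only model.

```r
set.seed(1)
n <- 200

# Synthetic proportions: the target depends on one predictor in the same
# spot (intra) and one predictor summarised over the neighbourhood (para).
intra_pred <- runif(n)
para_pred  <- runif(n)
target     <- 0.5 * intra_pred + 0.8 * para_pred + rnorm(n, sd = 0.05)

# R2 (in percent) of a model that only sees the intraview predictor.
intra_r2 <- 100 * summary(lm(target ~ intra_pred))$r.squared

# R2 (in percent) of a model that sees both views.
multi_r2 <- 100 * summary(lm(target ~ intra_pred + para_pred))$r.squared

# Gain in explained variance attributable to the additional view.
gain_r2 <- multi_r2 - intra_r2

print(round(c(intra.R2 = intra_r2, multi.R2 = multi_r2, gain.R2 = gain_r2), 1))
```

A target with a high multi.R2 but a gain.R2 close to zero is already well explained by the cell types within the same spot.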

2. What are the specific relations that can explain the contributions?

To explain the contributions, we can visualize the importance of each cell type in predicting the cell type distribution for each view separately. With trim, we display only targets with a value above 50 for multi.R2. To set an importance threshold we would apply cutoff.

First, for the intrinsic view:

misty_results %>% plot_interaction_heatmap(view = "intra", 
                                           clean = TRUE,
                                           trim.measure = "multi.R2",
                                           trim = 50)

We can observe that cardiomyocytes are a significant predictor for some cell types when in the same spot. To identify the target with the best prediction by cardiomyocytes, we can view the importance values as follows:

misty_results$importances.aggregated %>%
  filter(view == "intra", Predictor == "CM") %>%
  arrange(-Importance)

# A tibble: 11 × 5
   view  Predictor Target   Importance nsamples
   <chr> <chr>     <chr>         <dbl>    <int>
 1 intra CM        Fib           2.79         1
 2 intra CM        Endo          2.20         1
 3 intra CM        vSMCs         2.08         1
 4 intra CM        PC            1.93         1
 5 intra CM        prolif        1.23         1
 6 intra CM        Adipo         0.764        1
 7 intra CM        Myeloid       0.419        1
 8 intra CM        Lymphoid     -0.269        1
 9 intra CM        Mast         -0.464        1
10 intra CM        Neuronal     -0.575        1
11 intra CM        CM           NA            1

Let’s take a look at the spatial distribution of cardiomyocytes and their most important target, fibroblasts, in the tissue slide:

draw_maps(geometry, 
          DOT_weights[, c("Fib", "CM")], 
          background = "white", 
          size = 1.25, 
          normalize = FALSE, 
          ncol = 1,
          viridis_option = "viridis")

We can observe that areas with high proportions of cardiomyocytes have low proportions of fibroblasts and vice versa.
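
A visual impression like this can be backed up with a rank correlation. The sketch below uses made-up proportions (`toy_cm` and `toy_fib` are assumptions, not the deconvolution results) that mimic a mutually exclusive pattern.

```r
set.seed(42)

# Made-up proportions for 100 spots: fibroblasts take up most of what
# cardiomyocytes leave, which mimics a mutually exclusive pattern.
toy_cm  <- runif(100, min = 0, max = 0.9)
toy_fib <- (1 - toy_cm) * runif(100, min = 0.6, max = 0.9)

# A strongly negative rank correlation indicates that spots rich in one
# cell type tend to be poor in the other.
rho <- cor(toy_cm, toy_fib, method = "spearman")
print(rho)
```

With the objects from this vignette, the analogous call would be `cor(DOT_weights[, "CM"], DOT_weights[, "Fib"], method = "spearman")`.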

Now we repeat this analysis with the paraview:

misty_results %>% plot_interaction_heatmap(view = "para.126", 
                                           clean = TRUE, 
                                           trim = 0.1,
                                           trim.measure = "gain.R2")

Here, we select the target adipocytes, as we know from the previous analysis that the paraview contributes a large part to explaining their distribution. The best predictor for adipocytes is myeloid cells. To better identify the localization of the two cell types, we set the color scaling to a smaller range: a few spots have a high proportion, which makes the distribution of spots with a low proportion difficult to recognize.

draw_maps(geometry,
          DOT_weights[, c("Myeloid","Adipo")],
          background = "white",
          size = 1.25,
          normalize = FALSE, 
          ncol = 1,
          viridis_option = "viridis") +
       scale_colour_viridis_c(limits = c(0,0.33))
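
One consequence of restricting the limits is worth keeping in mind: by default, ggplot2 treats values outside the limits as missing and draws them in the NA color. The sketch below shows this on toy data (the data frame `toy` is an assumption) and contrasts it with `oob = scales::squish`, which clamps such values to the nearest limit instead. It illustrates the scale behavior in plain ggplot2 and is not a change to the plot above.

```r
library(ggplot2)

# Toy data: most spots have low proportions, one spot is much higher.
toy <- data.frame(x = 1:5, y = 1, prop = c(0.02, 0.10, 0.20, 0.30, 0.90))

# With limits only, values above 0.33 are censored and drawn in the NA colour.
p_censor <- ggplot(toy, aes(x, y, colour = prop)) +
  geom_point(size = 4) +
  scale_colour_viridis_c(limits = c(0, 0.33))

# With oob = scales::squish, such values are clamped to the upper limit.
p_squish <- ggplot(toy, aes(x, y, colour = prop)) +
  geom_point(size = 4) +
  scale_colour_viridis_c(limits = c(0, 0.33), oob = scales::squish)

# Inspect the colours ggplot2 assigns to each point.
cols_censor <- layer_data(p_censor)$colour
cols_squish <- layer_data(p_squish)$colour
print(cols_censor)
print(cols_squish)
```

Which behavior is preferable depends on the goal: censoring makes outliers stand out as a separate color, while squishing keeps them on the color scale at the cost of hiding how far above the limit they are.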

The plots show us that, in some places, the localizations of the two cell types overlap.

Session Info

Here is the output of sessionInfo() at the point when this document was compiled.

sessionInfo()

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8   
[3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] distances_0.1.10   lubridate_1.9.3    forcats_1.0.0      stringr_1.5.1     
 [5] dplyr_1.1.4        purrr_1.0.2        readr_2.1.5        tidyr_1.3.0       
 [9] tibble_3.2.1       ggplot2_3.5.0      tidyverse_2.0.0    Seurat_5.0.1      
[13] SeuratObject_5.0.1 sp_2.1-2           DOT_0.0.0.9000     future_1.33.1     
[17] mistyR_1.10.0      workflowr_1.7.1   

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.21       splines_4.3.2          later_1.3.2           
  [4] filelock_1.0.3         fields_15.2            R.oo_1.26.0           
  [7] polyclip_1.10-6        hardhat_1.3.1          pROC_1.18.5           
 [10] rpart_4.1.23           fastDummies_1.7.3      lifecycle_1.0.4       
 [13] rprojroot_2.0.4        vroom_1.6.5            globals_0.16.3        
 [16] processx_3.8.3         lattice_0.22-5         MASS_7.3-60           
 [19] magrittr_2.0.3         plotly_4.10.3          sass_0.4.8            
 [22] rmarkdown_2.25         jquerylib_0.1.4        yaml_2.3.8            
 [25] rlist_0.4.6.2          httpuv_1.6.13          sctransform_0.4.1     
 [28] spam_2.10-0            spatstat.sparse_3.0-3  reticulate_1.34.0     
 [31] cowplot_1.1.2          pbapply_1.7-2          RColorBrewer_1.1-3    
 [34] maps_3.4.2             abind_1.4-5            Rtsne_0.17            
 [37] R.utils_2.12.3         nnet_7.3-19            ipred_0.9-14          
 [40] git2r_0.33.0           lava_1.8.0             ggrepel_0.9.4         
 [43] irlba_2.3.5.1          listenv_0.9.1          spatstat.utils_3.0-4  
 [46] goftest_1.2-3          RSpectra_0.16-1        spatstat.random_3.2-2 
 [49] fitdistrplus_1.1-11    parallelly_1.37.1      leiden_0.4.3.1        
 [52] codetools_0.2-19       tidyselect_1.2.0       farver_2.1.1          
 [55] stats4_4.3.2           matrixStats_1.2.0      spatstat.explore_3.2-5
 [58] jsonlite_1.8.8         caret_6.0-94           ellipsis_0.3.2        
 [61] progressr_0.14.0       ggridges_0.5.5         survival_3.5-7        
 [64] iterators_1.0.14       foreach_1.5.2          tools_4.3.2           
 [67] ica_1.0-3              Rcpp_1.0.11            glue_1.6.2            
 [70] prodlim_2023.08.28     gridExtra_2.3          xfun_0.41             
 [73] ranger_0.16.0          withr_3.0.0            fastmap_1.1.1         
 [76] fansi_1.0.6            callr_3.7.3            digest_0.6.33         
 [79] timechange_0.3.0       R6_2.5.1               mime_0.12             
 [82] colorspace_2.1-0       scattermore_1.2        tensor_1.5            
 [85] spatstat.data_3.0-3    R.methodsS3_1.8.2      utf8_1.2.4            
 [88] generics_0.1.3         data.table_1.15.2      recipes_1.0.10        
 [91] class_7.3-22           httr_1.4.7             ridge_3.3             
 [94] htmlwidgets_1.6.4      whisker_0.4.1          ModelMetrics_1.2.2.2  
 [97] uwot_0.1.16            pkgconfig_2.0.3        gtable_0.3.4          
[100] timeDate_4032.109      lmtest_0.9-40          furrr_0.3.1           
[103] htmltools_0.5.7        dotCall64_1.1-1        scales_1.3.0          
[106] png_0.1-8              gower_1.0.1            knitr_1.45            
[109] rstudioapi_0.15.0      tzdb_0.4.0             reshape2_1.4.4        
[112] nlme_3.1-164           cachem_1.0.8           zoo_1.8-12            
[115] KernSmooth_2.23-22     parallel_4.3.2         miniUI_0.1.1.1        
[118] pillar_1.9.0           grid_4.3.2             vctrs_0.6.5           
[121] RANN_2.6.1             promises_1.2.1         xtable_1.8-4          
[124] cluster_2.1.6          archive_1.1.7          evaluate_0.23         
[127] cli_3.6.2              compiler_4.3.2         crayon_1.5.2          
[130] rlang_1.1.2            future.apply_1.11.1    labeling_0.4.3        
[133] ps_1.7.5               getPass_0.2-4          plyr_1.8.9            
[136] fs_1.6.3               stringi_1.8.3          viridisLite_0.4.2     
[139] deldir_2.0-4           assertthat_0.2.1       munsell_0.5.0         
[142] lazyeval_0.2.2         spatstat.geom_3.2-7    Matrix_1.6-4          
[145] RcppHNSW_0.5.0         hms_1.1.3              patchwork_1.2.0.9000  
[148] bit64_4.0.5            shiny_1.8.0            highr_0.10            
[151] ROCR_1.0-11            igraph_1.6.0           bslib_0.6.1           
[154] bit_4.0.5
