Last updated: 2024-06-25


Knit directory: KODAMA-Analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20240618) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version c515217. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  .gitignore

Unstaged changes:
    Modified:   analysis/MERFISH.Rmd
    Modified:   data/Moffitt_and_Bambah-Mukku_et_al_merfish_all_cells.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/MERFISH.Rmd) and HTML (docs/MERFISH.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd e56c4ff GitHub 2024-06-20 Update MERFISH.Rmd
Rmd 9df203d GitHub 2024-06-20 Update MERFISH.Rmd
Rmd be3cdeb GitHub 2024-06-20 Update MERFISH.Rmd
Rmd 2e7fa5e GitHub 2024-06-20 Update MERFISH.Rmd
Rmd a21b903 GitHub 2024-06-20 Update MERFISH.Rmd
html ee4ee17 GitHub 2024-06-19 Add files via upload
Rmd 615fc05 GitHub 2024-06-19 Add files via upload


MERFISH Data Analysis

The molecular imaging technology MERFISH (Multiplexed Error-Robust Fluorescence In Situ Hybridization) has revolutionized RNA mapping at the cellular level with unmatched spatial precision. This technique enables the simultaneous visualization of numerous genes within individual cells, offering unique insights into molecular processes in biological tissues. MERFISH data provides detailed information on cell structure and function, paving the way for new discoveries in cellular and molecular biology.

The dataset analyzed here, as presented in this article, covers 30 mice and profiles the hypothalamic preoptic region of the brain, with several slices taken at different positions within the same animal. The “Bregma” column gives the location of each slice relative to bregma, an anatomical reference point on the skull used in neuroscience.

Our focus is on advanced analysis of MERFISH data from biological tissues, introducing key advancements such as a 12 Extended Layer Analysis method for detailed tissue structure understanding and a 3D clustering feature for accurately representing the spatial relationships and RNA interactions within cells.

Our main objective is to integrate multiple tissue sections for unified analysis across different samples. We employ data preprocessing, dimension reduction, and advanced clustering techniques to explore the relationship between tissue structure and cellular behavior, highlighting spatial variations and gene expression patterns associated with specific biological functions.

Load necessary libraries

library(Seurat)
Loading required package: SeuratObject
Loading required package: sp

Attaching package: 'SeuratObject'
The following objects are masked from 'package:base':

    intersect, t
library(KODAMAextra)
Loading required package: KODAMA
Loading required package: minerva
Loading required package: Rtsne
Loading required package: umap

Attaching package: 'KODAMA'
The following object is masked from 'package:umap':

    umap.defaults
Loading required package: parallel
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
Loading required package: e1071
library(ggplot2)
library(NMI)
library(mclust)
Package 'mclust' version 6.1.1
Type 'citation("mclust")' for citing this R package in publications.
library(bluster)
library(igraph)

Attaching package: 'igraph'
The following object is masked from 'package:KODAMA':

    vertex
The following object is masked from 'package:Seurat':

    components
The following objects are masked from 'package:stats':

    decompose, spectrum
The following object is masked from 'package:base':

    union

Downloading the Data

The data used in the rest of the analysis are downloaded from this link.

# Load data
data <- read.csv("data/Moffitt_and_Bambah-Mukku_et_al_merfish_all_cells.csv")
data <- as.data.frame(data)

Data Pre-processing

Preprocessing removes columns that should not drive the clustering and restricts the analysis to a single animal. The data initially contain 170 genes. We first drop the “Blank” genes, which are control (background) probes that do not contribute to the main analysis, leaving 165 genes. We then remove the “Fos” gene, which is suspected of having an unwanted or disruptive effect on the data, leaving 164 genes. Finally, we keep only the cells with Animal_ID 1 and the selected Bregma values, and split them into one table per slice so that each slice can be processed separately.
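The column filtering and per-slice split described above can be tried on a small made-up table before running it on the full file. This is only a sketch: the `toy` data frame, its values, and the gene name `Ace2` are invented for illustration, and only the column names `Cell_ID`, `Animal_ID`, `Bregma`, `Fos`, and the "Blank" prefix follow the analysis.

```r
# Toy table (hypothetical values): metadata columns first, then gene columns.
toy <- data.frame(
  Cell_ID   = paste0("c", 1:6),
  Animal_ID = c(1, 1, 1, 1, 2, 2),
  Bregma    = c(0.26, 0.26, 0.21, 0.21, 0.26, 0.21),
  Ace2      = c(0, 2, 1, 0, 3, 1),
  Fos       = c(1, 0, 0, 2, 0, 1),
  Blank_1   = c(0, 0, 1, 0, 0, 0),
  Blank_2   = c(0, 1, 0, 0, 0, 0)
)

# Locate the control ("Blank") columns and the Fos column by name.
drop_cols <- c(grep("Blank", colnames(toy)), which(colnames(toy) == "Fos"))

# Guard the empty case: toy[, -integer(0)] would silently drop every column.
toy_clean <- if (length(drop_cols) > 0) toy[, -drop_cols, drop = FALSE] else toy

# Keep one animal, index cells by ID, and split into one data frame per slice.
animal1 <- subset(toy_clean, Animal_ID == 1)
rownames(animal1) <- animal1$Cell_ID
slices <- split(animal1, animal1$Bregma)

print(colnames(toy_clean))
print(names(slices))
```

Note that `split()` orders the slices by the sorted values of the grouping column, so a step that reorders the list (such as the `match()` call in the analysis code) is needed when a specific slice order matters.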

# Data cleaning
colnames_data <- colnames(data)
blankgenes <- grep("Blank", colnames_data)
fos_index <- which(colnames(data) == "Fos")
df <- data[, -c(fos_index, blankgenes)]
selected_bregma <- unique(df$Bregma)

# Filter data for Animal_ID 1 and specific Bregma values
exp <- subset(df, Animal_ID == 1 & Bregma %in% selected_bregma)
rownames(exp) <- exp$Cell_ID
data_list <- split(exp, exp$Bregma)
data_list <- data_list[match(selected_bregma, names(data_list))]


# Initialize lists 
xy <- list()
pca <- list()
v <- list()
kk <- list()
vis <- list()
pred <- list()
refine <- list()
cons <- list()
ARI <- list()
NMI <- list()


# Normalization and dimension reduction by PCA
for (i in names(data_list)) {
  print(i)
  x <- data_list[[i]]$Centroid_X - min(data_list[[i]]$Centroid_X)
  y <- data_list[[i]]$Centroid_Y - min(data_list[[i]]$Centroid_Y)
  xy[[i]] <- cbind(x, y)
  rownames(xy[[i]]) <- rownames(data_list[[i]])
  cons[[i]] <- t(data_list[[i]][, 10:ncol(data_list[[i]])])
  colnames(cons[[i]]) <- rownames(data_list[[i]])
  v[[i]] <- t(LogNormalize(cons[[i]]))

  # Dimensionality reduction by principal component analysis (PCA)
  pca[[i]] <- prcomp(v[[i]], scale. = TRUE)$x[, 1:50]

  # KODAMA clustering
  kk[[i]] <- KODAMA.matrix.parallel(pca[[i]], spatial = xy[[i]], FUN = "PLS", landmarks = 10000, n.cores = 4)
  vis[[i]] <- KODAMA.visualization(kk[[i]], method = "UMAP")
  # Label the embedding rows with the cell IDs
  rownames(vis[[i]]) <- rownames(data_list[[i]])

  # Graph-based clustering
  g <- bluster::makeSNNGraph(as.matrix(vis[[i]]), k = 100)
  g_walk <- igraph::cluster_walktrap(g)
  pred[[i]] <- as.character(igraph::cut_at(g_walk, no = 8))
  refine[[i]] <- refinecluster(pred[[i]], xy[[i]], shape = "hexagon")
}
[1] "0.26"
socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================
[1] "0.21"
socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================
[1] "0.16"
socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================
[1] "0.11"
socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================
[1] "0.06"
socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================
[1] "0.01"
socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================
[1] "-0.04"
socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================
[1] "-0.09"
socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================
[1] "-0.14"
socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================
[1] "-0.19"
socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================
[1] "-0.24"
socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================
[1] "-0.29"
socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================
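
The per-slice normalization, PCA, and coordinate shift performed inside the loop can be illustrated on simulated counts using base R only. This is a sketch under stated assumptions, not the code used for the results: `log_normalize` is a hypothetical helper written by hand to mimic a log-normalization (scale each cell to a common total, then apply `log1p`), the scale factor of 1e4 is an assumption, and the toy matrix sizes are arbitrary.

```r
set.seed(1)

# Simulated count matrix (hypothetical): 5 genes in rows, 40 cells in columns.
counts <- matrix(rpois(5 * 40, lambda = 5), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:40)))

# Hand-written log-normalization: divide each cell (column) by its total,
# multiply by a scale factor (assumed 1e4 here), then take log(1 + x).
log_normalize <- function(m, scale_factor = 1e4) {
  log1p(sweep(m, 2, colSums(m), "/") * scale_factor)
}
norm_mat <- t(log_normalize(counts))  # cells in rows, genes in columns

# PCA on standardized genes; never request more components than exist.
n_pcs <- min(3, ncol(norm_mat))
pcs <- prcomp(norm_mat, scale. = TRUE)$x[, seq_len(n_pcs), drop = FALSE]

# Shift the spatial coordinates so that each axis starts at zero.
cx <- runif(40, 100, 200)
cy <- runif(40, -50, 50)
xy_toy <- cbind(x = cx - min(cx), y = cy - min(cy))
rownames(xy_toy) <- rownames(norm_mat)

print(dim(pcs))
```

Capping the number of components with `min()` avoids a subscript error when a table has fewer columns than the requested number of components; the analysis itself keeps the first 50 components, which requires at least 50 genes.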

Visualization

For better visualization of the results, import the vis.R file at this stage.

# Importing the vis.R code
source("data/vis.R")

The plots below show the domains obtained by clustering the KODAMA embedding of each slice, first as raw cluster assignments (pred) and then after spatial refinement (refine).

# Define colors for visualizations
cols <- c("#669bbc", "#81b29a", "#f2cc8f", "#adc178",
          "#dde5b6", "#a8dadc", "#e5989b", "#e07a5f")

# Visualize clusters
plotClustersFacet(xy, pred, selected_bregma, size = 0.2) +
  scale_color_manual("Domain", values = cols) +
  guides(color = guide_legend(nrow = 1, override.aes = list(size = 2)))

plotClustersFacet(xy, refine, selected_bregma, size = 1) +
  scale_color_manual("Domain", values = cols) +
  guides(color = guide_legend(nrow = 1, override.aes = list(size = 2)))
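
The helper `plotClustersFacet` comes from vis.R, which is not shown on this page. As a rough idea of what a faceted per-slice plot involves, the sketch below stacks per-slice coordinates and labels into one long data frame and facets it by slice. Everything here is an assumption for illustration: the `xy_demo` and `pred_demo` inputs are simulated, and the plotting code is not the implementation in vis.R.

```r
set.seed(2)

# Simulated per-slice inputs (hypothetical), shaped like `xy` and `pred`:
# named lists with one coordinate matrix and one label vector per slice.
slice_names <- c("0.26", "0.21")
xy_demo <- lapply(slice_names, function(s) cbind(x = runif(30), y = runif(30)))
pred_demo <- lapply(slice_names, function(s) {
  as.character(sample(1:3, 30, replace = TRUE))
})
names(xy_demo) <- names(pred_demo) <- slice_names

# Stack all slices into one long data frame with a slice label per cell.
plot_df <- do.call(rbind, lapply(slice_names, function(s) {
  data.frame(x = xy_demo[[s]][, "x"],
             y = xy_demo[[s]][, "y"],
             domain = pred_demo[[s]],
             slice = s,
             stringsAsFactors = FALSE)
}))

# Keep the facets in the order of slice_names rather than alphabetical order.
plot_df$slice <- factor(plot_df$slice, levels = slice_names)

# Build the plot only if ggplot2 is installed; call print(p) to draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(plot_df, ggplot2::aes(x = x, y = y, color = domain)) +
    ggplot2::geom_point(size = 0.5) +
    ggplot2::facet_wrap(~slice) +
    ggplot2::coord_fixed() +
    ggplot2::theme_minimal()
}
```

Fixing the aspect ratio with `coord_fixed()` keeps distances comparable on both axes, which matters when the points are physical cell positions.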

This script illustrates a standard approach to MERFISH data analysis with commonly used bioinformatics tools. It covers preprocessing, dimensionality reduction, clustering, and visualization, the steps needed to study the spatial distribution of RNA in cells.


sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=French_France.utf8  LC_CTYPE=French_France.utf8   
[3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C                  
[5] LC_TIME=French_France.utf8    

time zone: Africa/Johannesburg
tzcode source: internal

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] igraph_2.0.3       bluster_1.12.0     mclust_6.1.1       NMI_2.0           
 [5] ggplot2_3.5.1      KODAMAextra_1.0    e1071_1.7-14       doParallel_1.0.17 
 [9] iterators_1.0.14   foreach_1.5.2      KODAMA_3.1         umap_0.2.10.0     
[13] Rtsne_0.17         minerva_1.5.10     Seurat_5.1.0       SeuratObject_5.0.2
[17] sp_2.1-4          

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3     rstudioapi_0.16.0      jsonlite_1.8.8        
  [4] magrittr_2.0.3         spatstat.utils_3.0-5   farver_2.1.2          
  [7] rmarkdown_2.27         fs_1.6.4               vctrs_0.6.5           
 [10] ROCR_1.0-11            spatstat.explore_3.2-7 askpass_1.2.0         
 [13] htmltools_0.5.8.1      BiocNeighbors_1.20.2   sass_0.4.9            
 [16] sctransform_0.4.1      parallelly_1.37.1      KernSmooth_2.23-24    
 [19] bslib_0.7.0            htmlwidgets_1.6.4      ica_1.0-3             
 [22] plyr_1.8.9             plotly_4.10.4          zoo_1.8-12            
 [25] cachem_1.1.0           whisker_0.4.1          mime_0.12             
 [28] lifecycle_1.0.4        pkgconfig_2.0.3        Matrix_1.6-5          
 [31] R6_2.5.1               fastmap_1.2.0          fitdistrplus_1.1-11   
 [34] future_1.33.2          shiny_1.8.1.1          digest_0.6.35         
 [37] colorspace_2.1-0       S4Vectors_0.40.2       patchwork_1.2.0       
 [40] rprojroot_2.0.4        tensor_1.5             RSpectra_0.16-1       
 [43] irlba_2.3.5.1          labeling_0.4.3         progressr_0.14.0      
 [46] fansi_1.0.6            spatstat.sparse_3.0-3  httr_1.4.7            
 [49] polyclip_1.10-6        abind_1.4-5            compiler_4.3.3        
 [52] proxy_0.4-27           withr_3.0.0            BiocParallel_1.36.0   
 [55] fastDummies_1.7.3      highr_0.11             MASS_7.3-60.0.1       
 [58] openssl_2.2.0          tools_4.3.3            lmtest_0.9-40         
 [61] httpuv_1.6.15          future.apply_1.11.2    goftest_1.2-3         
 [64] glue_1.7.0             nlme_3.1-165           promises_1.3.0        
 [67] grid_4.3.3             cluster_2.1.6          reshape2_1.4.4        
 [70] snow_0.4-4             generics_0.1.3         gtable_0.3.5          
 [73] spatstat.data_3.0-4    class_7.3-22           tidyr_1.3.1           
 [76] data.table_1.15.4      utf8_1.2.4             BiocGenerics_0.48.1   
 [79] spatstat.geom_3.2-9    RcppAnnoy_0.0.22       ggrepel_0.9.5         
 [82] RANN_2.6.1             pillar_1.9.0           stringr_1.5.1         
 [85] spam_2.10-0            RcppHNSW_0.6.0         later_1.3.2           
 [88] splines_4.3.3          dplyr_1.1.4            lattice_0.22-6        
 [91] survival_3.7-0         deldir_2.0-4           tidyselect_1.2.1      
 [94] miniUI_0.1.1.1         pbapply_1.7-2          knitr_1.47            
 [97] git2r_0.33.0           gridExtra_2.3          scattermore_1.2       
[100] stats4_4.3.3           xfun_0.45              matrixStats_1.3.0     
[103] stringi_1.8.4          workflowr_1.7.1        lazyeval_0.2.2        
[106] yaml_2.3.8             evaluate_0.24.0        codetools_0.2-20      
[109] tibble_3.2.1           cli_3.6.2              uwot_0.2.2            
[112] xtable_1.8-4           reticulate_1.37.0      munsell_0.5.1         
[115] jquerylib_0.1.4        Rcpp_1.0.12            doSNOW_1.0.20         
[118] globals_0.16.3         spatstat.random_3.2-3  png_0.1-8             
[121] dotCall64_1.1-1        listenv_0.9.1          viridisLite_0.4.2     
[124] scales_1.3.0           ggridges_0.5.6         leiden_0.4.3.1        
[127] purrr_1.0.2            rlang_1.1.3            cowplot_1.1.3