Last updated: 2024-07-08

Checks: 7 0

Knit directory: KODAMA-Analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20240618)

The command set.seed(20240618) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 5224b8a

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 5224b8a. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  .gitignore

Unstaged changes:
    Deleted:    _site.yml
    Modified:   data/Moffitt_and_Bambah-Mukku_et_al_merfish_all_cells.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/simulation.Rmd) and HTML (docs/simulation.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
html	a6a951e	Stefano Cacciatore	2024-07-08	Build site.
Rmd	075a60e	Stefano Cacciatore	2024-07-08	Start my new project
html	db922b6	GitHub	2024-07-04	Update simulation.html
Rmd	8e5f41c	GitHub	2024-06-20	Update simulation.Rmd
html	ee4ee17	GitHub	2024-06-19	Add files via upload
Rmd	615fc05	GitHub	2024-06-19	Add files via upload

Introduction

In the context of working with biological spatial data, such as spatial sequencing data, which provide information on the location of biological molecules in a sample, the goal is to identify clusters of molecules grouped in specific areas, thus indicating regions of particular biological activity. For example, in immunology, analyzing the spatial distribution of immune cells within tissues can reveal insights into immune responses and disease mechanisms. Techniques like KODAMA and UMAP are valuable tools for this type of analysis. In the domain of immunology, we could simulate spatial data to study the distribution of immune cells within a tissue. For instance, simulating the localization of T cells and B cells within a lymph node can help us understand how these cells are organized spatially, which is critical for understanding their interactions and functions in immune responses.

Main simulation study

Scenario 1

The simulated data in this example are generated to demonstrate how to use the KODAMA and UMAP techniques for cluster and spatial domain analysis. We create two distinct regions representing areas rich in T cells and B cells within the lymph node. We generate x and y coordinates for the two regions, assigning labels to simulate different cell types (T cells and B cells).

library("KODAMA")
library("KODAMAextra")
x1=runif(100,min=-1,max=0)
x2=runif(100,min=0,max=1)
y1=runif(100)
y2=runif(100)
x=c(x1,x2)
y=c(y1,y2)
xy=cbind(x,y)

labels=rep(c(TRUE,FALSE),each=100)
ss=sample(100,5)
labels[ss]=!labels[ss]
labels=labels
region=rep(1:0,each=100)

data=cbind(rnorm(200,mean=labels,sd=0.1),
           rnorm(200,mean=labels,sd=0.1))

color.code=c("#1d79d0aa","#53ca3eaa")

###Visualization with KODAMA and UMAP

We use KODAMA and UMAP methods to analyze and visualize the data, applying dimensionality reduction and identifying clusters and spatial domains.

kk <- KODAMA.matrix.parallel(data,spatial=xy,spatial.resolution=0.1,M=100,
                      FUN="PLS",
                      landmarks = 100000,
                      splitting = 100,
                      f.par.pls =  2,
                      n.cores=4)

socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================

vis=RunKODAMAvisualization(kk,method="UMAP")

u=umap(data)$layout

par(mfrow=c(1,3))

plot(x,y,bg=color.code[region+1],pch=21+2*!labels,cex=1+1.5*region,axes=FALSE,main="Position")
axis(1)
axis(2,las=2)
box()

plot(vis,type="n",axes=FALSE,main="KODAMA")
axis(1)
axis(2,las=2)
box()
points(vis[!labels,],bg=color.code[region+1][!labels],pch=21+2,cex=1+1.5*region[!labels])
points(vis[labels,],bg=color.code[region+1][labels],pch=21+2*0,cex=1+1.5*region[labels])

plot(u,type="n",axes=FALSE,xlab="Dimension 1",ylab="Dimension 2",main="UMAP")
axis(1)
axis(2,las=2)
box()
points(u[!labels,],bg=color.code[region+1][!labels],pch=21+2,cex=1+1.5*region[!labels])
points(u[labels,],bg=color.code[region+1][labels],pch=21+2*0,cex=1+1.5*region[labels])

Version	Author	Date
a6a951e	Stefano Cacciatore	2024-07-08
ee4ee17	GitHub	2024-06-19

The generated visualizations show the position of the points, as well as the results of the KODAMA and UMAP analyses. The visualizations indicate the molecular positions and identified clusters, providing a better understanding of the spatial structure of the immune cells within the lymph node. This can help identify regions of interest for further analysis, with significant implications for research in immunology and the development of targeted therapies.

Scenario 2

In this section, we build upon the previous data generation and visualization methods to demonstrate advanced manipulations and visualizations using KODAMA and UMAP techniques.

###Data Generation and Setting the Random Seed

Continuing from the previous section, we set the random seed and generate data for two distinct regions, assigning labels and specific regions to each data point.

x1=runif(100,min=-1,max=0)
x2=runif(100,min=0,max=1)
y1=runif(100)
y2=runif(100)
x=c(x1,x2)
y=c(y1,y2)
xy=cbind(x,y)


labels=rep(c(TRUE,FALSE),each=100)

region=rep(1:0,each=100)

data=cbind(rnorm(200,mean=1+labels,sd=0.1),
           rnorm(200,mean=1+labels,sd=0.1),
           rnorm(200,mean=1+labels,sd=0.1),
           rnorm(200,mean=1+labels,sd=0.1))

ll=length(data)
ss=sample(ll,ll*0.5)
data[ss]=0
color.code=c("#1d79d0aa","#53ca3eaa")

sel=apply(data,1,function(x) sum(x>0))>2
data=data[sel,]
region=region[sel]
labels=labels[sel]
xy=xy[sel,]


labels=data>0
labels=labels[,1]+
  labels[,2]*2+
  labels[,3]*4+
  labels[,4]*8
labels=as.numeric(as.factor(labels))+1

###Visualization with KODAMA and UMAP

We apply Principal Component Analysis (PCA) on the filtered data and use KODAMA and UMAP methods to analyze and visualize the data, applying dimensionality reduction and identifying clusters and spatial domains

pca=prcomp(scale(data))$x

kk <- KODAMA.matrix.parallel(pca,spatial=xy,spatial.resolution=0.1,M=100,
                             FUN="PLS",
                             landmarks = 100000,
                             splitting = 100,
                             f.par.pls =  10,
                             n.cores=4)

The number of components selected for PLS-DA is too high and it will be automatically reduced to 4

socket cluster with 4 nodes on host 'localhost'
================================================================================[1] "Finished parallel computation"

[1] "Calculation of dissimilarity matrix..."
================================================================================

config=umap.defaults
config$n_neighbors=15
vis=RunKODAMAvisualization(kk,method="UMAP",config=config)

u=umap(pca)$layout
old.par = par(mfrow=c(1,3))



plot(xy,bg=labels,pch=21+2*(region),cex=2+1*region,axes=FALSE,main="Position")
axis(1)
axis(2,las=2)
box()

plot(vis,bg=labels,pch=21+2*(region),cex=1+1.5*region,axes=FALSE,main="KODAMA")
axis(1)
axis(2,las=2)
box()

plot(u,bg=labels,pch=21+2*(region),cex=1+1.5*region,axes=FALSE,main="UMAP")
axis(1)
axis(2,las=2)
box()

Version	Author	Date
a6a951e	Stefano Cacciatore	2024-07-08

par(old.par)

By integrating advanced manipulations and visualizations, we gain deeper insights into spatial data analysis, providing a foundation for further exploration and research in complex biological systems.

sessionInfo()

R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Africa/Johannesburg
tzcode source: internal

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] KODAMAextra_1.0   e1071_1.7-14      doParallel_1.0.17 iterators_1.0.14 
[5] foreach_1.5.2     KODAMA_3.1        umap_0.2.10.0     Rtsne_0.17       
[9] minerva_1.5.10   

loaded via a namespace (and not attached):
 [1] sass_0.4.9        utf8_1.2.4        class_7.3-22      stringi_1.8.4    
 [5] lattice_0.22-6    digest_0.6.36     magrittr_2.0.3    evaluate_0.24.0  
 [9] grid_4.3.3        fastmap_1.2.0     rprojroot_2.0.4   workflowr_1.7.1  
[13] jsonlite_1.8.8    Matrix_1.6-5      whisker_0.4.1     RSpectra_0.16-1  
[17] doSNOW_1.0.20     promises_1.3.0    fansi_1.0.6       codetools_0.2-20 
[21] jquerylib_0.1.4   cli_3.6.2         rlang_1.1.4       cachem_1.1.0     
[25] yaml_2.3.8        tools_4.3.3       httpuv_1.6.15     reticulate_1.38.0
[29] vctrs_0.6.5       R6_2.5.1          png_0.1-8         proxy_0.4-27     
[33] lifecycle_1.0.4   git2r_0.33.0      stringr_1.5.1     fs_1.6.4         
[37] pkgconfig_2.0.3   pillar_1.9.0      bslib_0.7.0       later_1.3.2      
[41] glue_1.7.0        Rcpp_1.0.12       highr_0.11        xfun_0.45        
[45] tibble_3.2.1      rstudioapi_0.16.0 knitr_1.47        snow_0.4-4       
[49] htmltools_0.5.8.1 rmarkdown_2.27    compiler_4.3.3    askpass_1.2.0    
[53] openssl_2.2.0

About

Introduction

Main simulation study

Scenario 1

Scenario 2