Last updated: 2025-05-20

Checks: 7 passed, 0 failed

Knit directory: CosMx_pipeline_LGA/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20250517)

The command set.seed(20250517) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 2b284f3

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 2b284f3. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/
    Ignored:    NBClust-Plots/
    Ignored:    analysis/figure/
    Ignored:    data/flatFiles/CoronalHemisphere/Run1000_S1_Half_exprMat_file.csv
    Ignored:    data/flatFiles/CoronalHemisphere/Run1000_S1_Half_fov_positions_file.csv
    Ignored:    data/flatFiles/CoronalHemisphere/Run1000_S1_Half_metadata_file.csv
    Ignored:    output/processed_data/Log/
    Ignored:    output/processed_data/RC/
    Ignored:    output/processed_data/SCT/
    Ignored:    output/processed_data/exprMat_unfiltered.RDS
    Ignored:    output/processed_data/fov_positions_unfiltered.RDS
    Ignored:    output/processed_data/metadata_unfiltered.RDS
    Ignored:    output/processed_data/negMat_unfiltered.RDS
    Ignored:    output/processed_data/seu_filtered.RDS
    Ignored:    output/processed_data/seu_semifiltered.RDS

Untracked files:
    Untracked:  analysis/images/

Unstaged changes:
    Modified:   output/performance_reports/0.0_data_loading_PR.csv
    Modified:   output/performance_reports/1.0_qc_and_filtering_PR.csv
    Modified:   output/performance_reports/2.0_normalization_PR.csv
    Modified:   output/performance_reports/3.0_dimensional_reduction_PR.csv
    Modified:   output/performance_reports/4.0_insitutype_cell_typing_PR.csv
    Modified:   output/performance_reports/4.1_insitutype_unsup_clustering_PR.csv
    Modified:   output/performance_reports/4.2_seurat_unsup_clustering_PR.csv
    Modified:   output/performance_reports/5.0_RC_normalization_PR.csv
    Modified:   output/performance_reports/5.1_RC_dimensional_reduction_PR.csv
    Modified:   output/performance_reports/6.0_Log_normalization_PR.csv
    Modified:   output/performance_reports/6.1_Log_dimensional_reduction_PR.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/clus_examples.Rmd) and HTML (docs/clus_examples.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	2b284f3	lidiaga	2025-05-20	Publish all for the first time
Rmd	815e2a7	lidiaga	2025-05-19	Fix plot sizes in clus_examples.Rmd
Rmd	cf77662	lidiaga	2025-05-19	Edit website files

Introduction

Cell typing is a fundamental step in gene expression analysis, as from this point onward, the results and subsequent analyses acquire biological meaning. In practice, this can be approached through unsupervised clustering, where cells are grouped based on the similarity of their gene expression profiles [1,2], followed by annotation of the resulting clusters based on their markers [3]. However, there are other approaches, such as supervised classification, which uses reference profiles to assign predefined cell types to cells based on their expression profiles; or semi-supervised methods, which classify cells based on a reference while also allowing the detection of new clusters or rare populations [3].

While clustering in scRNA-seq relies solely on gene expression, the multimodal nature of spatial transcriptomics, which includes spatial location data and histological or immunofluorescence images, has led to the development of new clustering methods that integrate this information to improve clustering quality [3]. In this context, the InSituType algorithm, developed by Danaher et al. [4] as part of Nanostring® official tools, allows the incorporation of complementary information and supports all three mentioned approaches: unsupervised or semi-supervised clustering, and supervised classification. For this reason, it was considered a particularly suitable option for this pipeline.

In this project, the proposed pipeline incorporates supervised and unsupervised InSituType, as well as an unsupervised clustering alternative, by executing the FindNeighbors and FindClusters functions from the Seurat package:

Supervised InSituType: based on a probabilistic model, employing a Bayesian classifier in its supervised version [4].
Unsupervised InSituType: based on a probabilistic model, employing an Expectation Maximization (EM) algorithm for the unsupervised and semi-supervised methods [4].
Seurat’s unsupervised clustering, using the Louvain algorithm: Seurat clustering functions support several graph-based clustering methods, including the Louvain algorithm. Although it uses only gene expression for clustering, this algorithm has been reported to perform strongly on spatial transcriptomics data, comparably to methods that also incorporate spatial information [3].

For simplicity, the main pipeline example shows only one approach, supervised InSituType. In this section, all three alternatives can be explored.

Examples

Supervised InSituType

Cell typing with InSituType Supervised classification

To run the InSituType supervised algorithm, three inputs are needed: 1) the raw, unnormalized expression matrix; 2) a vector with the mean negative probe expression per cell; and 3) a reference profile.

Nanostring® provides various public profiles, derived from both scRNA-seq and CosMx™ SMI data. However, the function also accepts profiles from other sources, as long as they are formatted appropriately.

With this information, the function assigns the pre-defined cell types from the reference to the cells according to their expression profiles.
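The three inputs described above can be prepared as in the following minimal sketch. All data are simulated, and every object name (counts, neg_counts, neg_mean, ref_profiles) and the run_supervised() wrapper are hypothetical. The call to insitutypeML() with the x, neg and reference_profiles arguments follows the InSituType package documentation as far as we know; the exact signature may differ between versions (newer releases may also expect an assay type argument), so check the installed version before relying on it.

```r
# Simulated stand-ins for real CosMx data (illustration only).
set.seed(1)
n_cells <- 60
genes <- paste0("Gene", 1:20)
cells <- paste0("cell", seq_len(n_cells))

# 1) Raw, unnormalized counts: cells in rows, genes in columns.
counts <- matrix(rpois(n_cells * length(genes), lambda = 3),
                 nrow = n_cells, dimnames = list(cells, genes))

# Negative-probe counts: cells in rows, one column per negative probe.
neg_counts <- matrix(rpois(n_cells * 5, lambda = 0.2),
                     nrow = n_cells,
                     dimnames = list(cells, paste0("NegPrb", 1:5)))

# 2) Mean negative-probe expression per cell (one value per cell).
neg_mean <- rowMeans(neg_counts)

# 3) Reference profile: genes in rows, cell types in columns (made up here).
ref_profiles <- matrix(rgamma(length(genes) * 3, shape = 2),
                       nrow = length(genes),
                       dimnames = list(genes, c("TypeA", "TypeB", "TypeC")))

# Basic consistency checks before classification.
stopifnot(length(neg_mean) == nrow(counts),
          length(intersect(colnames(counts), rownames(ref_profiles))) > 0)

# Hypothetical wrapper; only defined here, not executed.
# Argument names are assumed from the InSituType documentation.
run_supervised <- function(counts, neg_mean, ref_profiles) {
  InSituType::insitutypeML(x = counts, neg = neg_mean,
                           reference_profiles = ref_profiles)
}
```

With InSituType installed, run_supervised(counts, neg_mean, ref_profiles) should return a list whose clust element holds the assigned cell type per cell; treat the element name as an assumption to verify against the package version in use.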

Unsupervised InSituType

Unsupervised clustering with InSituType + ScType annotation based on markers

To run the InSituType unsupervised clustering method, the needed inputs are: 1) the raw, unnormalized expression matrix; 2) a vector with the mean negative probe expression per cell; and 3) a number or range of clusters to be evaluated.

If a range is provided, the algorithm executes the clustering with all the different numbers of clusters and returns the one that provides the best fit.
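As a rough sketch of how a single number or a range of clusters can be passed, the hypothetical wrapper below validates the inputs and then calls insitutype(). The wrapper name, its checks and the default range 8:12 are illustrative assumptions, not the pipeline's actual settings; the x, neg and n_clusts arguments follow the InSituType documentation as far as we know and should be checked against the installed version.

```r
# Hypothetical wrapper around InSituType's unsupervised clustering.
# n_clusts may be a single number (e.g. 10) or a range (e.g. 8:12).
run_unsupervised <- function(counts, neg_mean, n_clusts = 8:12) {
  stopifnot(
    is.numeric(n_clusts), length(n_clusts) >= 1, all(n_clusts >= 1),
    length(neg_mean) == nrow(counts)   # one background value per cell
  )
  if (!requireNamespace("InSituType", quietly = TRUE)) {
    stop("The InSituType package is required for this step.")
  }
  # Argument names assumed from the InSituType documentation.
  InSituType::insitutype(x = counts, neg = neg_mean, n_clusts = n_clusts)
}
```

A call such as run_unsupervised(counts, neg_mean, n_clusts = 8:12) would evaluate each cluster number in the range, while n_clusts = 10 would fix it to a single value.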

Afterwards, in the proposed pipeline, annotation has been performed using the ScType package, a computational method for automated annotation based on marker genes [5].

Unsupervised clustering with Seurat-Louvain + ScType annotation based on markers

In this case, the Seurat method can be run directly on the Seurat object, providing the reduction and the number of components to work with.

Unlike InSituType, this algorithm does not require a specific number of clusters to be evaluated. Instead, it determines the number of clusters from the chosen resolution. Afterwards, in the proposed pipeline, annotation has been performed using the ScType package [5], as previously.
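The two Seurat calls named in the introduction can be combined as in this minimal sketch. The wrapper name and the default values (reduction "pca", 30 components, resolution 0.5) are illustrative assumptions rather than the settings used in this pipeline; the Seurat object is assumed to already contain the chosen reduction.

```r
# Hypothetical wrapper around Seurat's graph-based clustering.
cluster_louvain <- function(seu, reduction = "pca", dims = 1:30,
                            resolution = 0.5) {
  if (!requireNamespace("Seurat", quietly = TRUE)) {
    stop("The Seurat package is required for this step.")
  }
  # Build the shared nearest neighbour graph on the chosen components.
  seu <- Seurat::FindNeighbors(seu, reduction = reduction, dims = dims)
  # algorithm = 1 selects the original Louvain algorithm;
  # higher resolution values generally yield more clusters.
  Seurat::FindClusters(seu, resolution = resolution, algorithm = 1)
}
```

After a call such as seu <- cluster_louvain(seu), the cluster labels are available in seu$seurat_clusters and can be passed on to the annotation step.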

Performance

Chunk	Time_sec	Memory_Mb
Sup	988.45	63.5
Unsup	159.81	16.6
SNNClust	64.89	61.1

In terms of time, the Seurat approach is the fastest of the three (about 65 s), followed by unsupervised InSituType (about 160 s) and supervised InSituType (about 988 s). However, it should be noted that the supervised InSituType method provides a full annotation of the cells in a single step, while the unsupervised clustering methods need subsequent steps to annotate the obtained clusters.

On the other hand, in terms of memory, unsupervised InSituType shows by far the lowest consumption (16.6 Mb), while supervised InSituType (63.5 Mb) and Seurat’s unsupervised approach (61.1 Mb) show comparable values.
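The comparison can be reproduced from the values in the performance table with a few lines of base R. The column names Time_vs_fastest and Memory_vs_lowest are introduced here only for illustration.

```r
# Values copied from the performance table in this section.
perf <- data.frame(
  Chunk     = c("Sup", "Unsup", "SNNClust"),
  Time_sec  = c(988.45, 159.81, 64.89),
  Memory_Mb = c(63.5, 16.6, 61.1)
)

# Express each method relative to the fastest and the leanest one.
perf$Time_vs_fastest  <- round(perf$Time_sec / min(perf$Time_sec), 1)
perf$Memory_vs_lowest <- round(perf$Memory_Mb / min(perf$Memory_Mb), 1)

# Sort from fastest to slowest.
perf_sorted <- perf[order(perf$Time_sec), ]
print(perf_sorted)
```

These are single-run measurements, so the ratios are indicative only.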

Bibliography

1. Qi R, Ma A, Ma Q, Zou Q. Clustering and classification methods for single-cell RNA-sequencing data. Briefings in Bioinformatics [Internet]. 2020 Jul 15 [cited 2025 May 7];21(4):1196–208. Available from: https://doi.org/10.1093/bib/bbz062
2. Zhang S, Li X, Lin J, Lin Q, Wong KC. Review of single-cell RNA-seq data clustering for cell-type identification and characterization. RNA [Internet]. 2023 May [cited 2025 May 7];29(5):517–30. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10158997/
3. Cheng A, Hu G, Li WV. Benchmarking cell-type clustering methods for spatially resolved transcriptomics data. Briefings in Bioinformatics [Internet]. 2023 Jan 1 [cited 2025 May 7];24(1):bbac475. Available from: https://doi.org/10.1093/bib/bbac475
4. Danaher P, Zhao E, Yang Z, Ross D, Gregory M, Reitz Z, et al. Insitutype: likelihood-based cell typing for single cell spatial transcriptomics [Internet]. Bioinformatics; 2022 [cited 2025 May 7]. Available from: http://biorxiv.org/lookup/doi/10.1101/2022.10.19.512902
5. Ianevski A, Giri AK, Aittokallio T. Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data. Nat Commun [Internet]. 2022 Mar 10 [cited 2025 Apr 16];13(1):1246. Available from: https://www.nature.com/articles/s41467-022-28803-w

sessionInfo()