Visium part 1

Last updated: 2025-10-02

Checks: 7 0

Knit directory: 2025_cytoconnect_spatial_workshop/

This reproducible R Markdown analysis was created with workflowr (version 1.7.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20251002)

The command set.seed(20251002) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 383f3f1

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 383f3f1. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store

Untracked files:
    Untracked:  data/sc_seurat_object_10x.qs
    Untracked:  data/visium/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/visium_01.Rmd) and HTML (docs/visium_01.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	383f3f1	Givanna Putri	2025-10-02	First publish for website

Introduction

In this part of the visium workshop, we will introduce the dataset we will be using in the visium workshop, learn how to load it up into a Seurat object and interrogate it.

Load libraries

library(Seurat)
library(qs2)

The dataset

The data used for this tutorial is an FFPE tissue from a patient with colorectal cancer. It is described in a bioRxiv paper: https://doi.org/10.1101/2024.06.04.597233

The data can be downloaded from here. Choose Visium CytAssist v2 Sample P2 CRC on the tab on the left.

Obtaining the spot x gene matrix

To get a spot x gene matrix, one typically have to process the raw data using the Spaceranger software from 10x. Depending on your experiment, running Spaceranger can be very complex. It is beyond the scope of this workshop.

If you are keen to try running Spaceranger, here is a bash script you can use for this dataset:

#!/bin/bash

# make sure you specify which folder is datdir and refdir
spaceranger count \
    --id="Visium_FFPE_p2_crc" \
    --transcriptome=$refdir \
    --probe-set=$datadir/Visium_V2_Human_Colon_Cancer_P2_probe_set.csv \
    --fastqs=$datadir/Visium_V2_Human_Colon_Cancer_P2_fastqs \
    --cytaimage=$datadir/Visium_V2_Human_Colon_Cancer_P2_image.tif \
    --image=$datadir/Visium_V2_Human_Colon_Cancer_P2_tissue_image.btf \
    --slide=V53A10-078 \
    --area=B1\
    --loupe-alignment=$datadir/Visium_V2_Human_Colon_Cancer_P2_alignment_file.json \
    --localcores=20 \
    --localmem=60 \
    --create-bam=true

Before running spaceranger, make sure you download all the input files use in the code block above from here - (choose Visium CytAssist v2 Sample P2 CRC), as well as the relevant pre-built reference genome here.

Spaceranger is quite resource hungry. So make sure you run it in a computer with a sizeable amount of resources available or preferably on a High Performance Computing platform.

After running Spaceranger, you will get a lot of folders and files. The ones that you need to create a Seurat object are located in the outs folder. It will contain the following files and folders:

tree outs/

outs/
├── analysis/
│   ├── clustering/
│   ├── diffexp/
│   ├── pca/
│   ├── tsne/
│   └── umap/
├── cloupe.cloupe
├── deconvolution/
├── filtered_feature_bc_matrix/
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── filtered_feature_bc_matrix.h5
├── metrics_summary.csv
├── molecule_info.h5
├── possorted_genome_bam.bam
├── possorted_genome_bam.bam.bai
├── probe_set.csv
├── raw_feature_bc_matrix/
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── raw_feature_bc_matrix.h5
├── raw_probe_bc_matrix.h5
├── spatial/
│   ├── aligned_fiducials.jpg
│   ├── aligned_tissue_image.jpg
│   ├── cytassist_image.tiff
│   ├── detected_tissue_image.jpg
│   ├── scalefactors_json.json
│   ├── spatial_enrichment.csv
│   ├── tissue_hires_image.png
│   ├── tissue_lowres_image.png
│   └── tissue_positions.csv
└── web_summary.html

Brief description of the most frequently used files, some of which, we will be using in this workshop:

web_summary.html: Spaceranger run summary metrics and plots. This file will give you some information about the samples, e.g., how many reads were detected and successfully mapped to the reference genome, etc.
cloupe.cloupe: Loupe Browser visualization and analysis file to load into 10x Loupe Browser software. Spaceranger will run few cookie cutter analysis on your data. You can view the output by loading this file into the Loupe Browser.
spatial/: A folder containing outputs that capture the spatial information of your sample. Packages such as Seurat is likely to require files in this folder to work properly.
filtered_feature_bc_matrix (folder or the h5 file): Filtered matrix containing the number of UMIs associated with a gene (row) and a spot barcode (column). The matrix is filtered as it only includes spots associated with the tissue. These information are provided as either a combination of mtx file (for the matrix) and tsv files (for information on the features and barcodes) stored in a folder, or as one HDF5 file (the h5 file). Either can be used as an input to packages such as Seurat for downstream analyses.
raw_feature_bc_matrices (folder or the h5 file): Same information provided in filtered_feature_bc_matrix but unfiltered, i.e. keeping all the barcodes, including those that may not be associated with the tissue. This matrix is a lot larger than the one in filtered_feature_bc_matrix.

Full description of the generated files and folders are available here.

Loading data into Seurat Object

To do this, we will need the H5 file and the spatial folder that came from SpaceRanger. There is no need to purposely run SpaceRanger to get these files. 10x kindly provided them on their website: https://www.10xgenomics.com/products/visium-hd-spatial-gene-expression/dataset-human-crc (choose Visium CytAssist v2 Sample P2 CRC).

The spatial folder is required for Seurat to find the following extra information:

tissue_lowres_image.png: the image of the slide in low resolution.
scalefactors_json.json: the scale factors to convert the high-resolution coordinates and the scaled-down coordinates.
tissue_positions_list.csv: coordinates for each spot

We can use the code block below to import the data into a Seurat object. However, before doing so, you most likely have the change the data.dir parameter to point to the directory where you store the H5 file and the spatial folder. In this example, they are stored in a “data” folder of our current working directory.

dat <- Load10X_Spatial(
  data.dir = file.path(here::here(), "data", "visium", "raw"),
  filename = 'Visium_V2_Human_Colon_Cancer_P2_filtered_feature_bc_matrix.h5'
)

Inspecting Seurat object

We can inspect the object to see the number of spots and genes there are in the dataset.

# 4,269 spots - samples in the seurat object description
# 18,085 genes.
dat

An object of class Seurat 
18085 features across 4269 samples within 1 assay 
Active assay: Spatial (18085 features, 0 variable features)
 1 layer present: counts
 1 spatial field of view present: slice1

A Seurat object can contain multiple Assays to store data from different omics technologies in one object. Since we have only loaded a visium data, we only have one assay called “Spatial”. You can only have one assay “active” at a time. To change the active assay you can run the following code block - with “Spatial” being the name of the assay to activate:

DefaultAssay(dat) <- "Spatial"

Within an assay, we can have multiple Layers, each containing one spot by gene matrix. The idea is we can perform some transformations or apply some functions to the matrix in a layer and store it in another layer. This benefit will be a lot clearer as we progress through the workshop.

By default, when we run Load10x_Spatial, Seurat will store the spot by gene matrix in the Spatial assay under the counts layer. You can change this assay name by specifying a different name for the assay parameter when running the Load10x_Spatial function.

We can retrieve the spot by gene matrix like so:

# showing only 20 genes for 5 spots
head(dat[["Spatial"]]$counts, c(20, 5))

20 x 5 sparse Matrix of class "dgCMatrix"
         AACAATGTGCTCCGAG-1 AACACCATTCGCATAC-1 AACACGACAACGGAGT-1
SAMD11                    1                  1                  .
NOC2L                     4                  .                 12
KLHL17                    .                  .                  1
PLEKHN1                   2                  .                 11
PERM1                     .                  .                  .
HES4                      2                  1                  3
ISG15                    10                  3                 18
AGRN                      6                  .                 25
RNF223                    .                  .                  1
C1orf159                  3                  .                  1
TTLL10                    1                  .                  .
TNFRSF18                  1                  .                  2
TNFRSF4                   .                  .                  1
SDF4                     29                  3                 45
B3GALT6                   6                  .                 12
C1QTNF12                  1                  .                  .
UBE2J2                    6                  2                 14
SCNN1D                    .                  .                  .
ACAP3                    14                  1                  8
PUSL1                     3                  .                  4
         AACACGCAGATAACAA-1 AACACGTTGATACCGC-1
SAMD11                    .                  .
NOC2L                     .                  .
KLHL17                    1                  1
PLEKHN1                   .                  2
PERM1                     .                  1
HES4                      .                  .
ISG15                     .                  1
AGRN                      .                  .
RNF223                    .                  .
C1orf159                  .                  .
TTLL10                    .                  .
TNFRSF18                  .                  1
TNFRSF4                   .                  2
SDF4                      5                  7
B3GALT6                   .                  2
C1QTNF12                  .                  .
UBE2J2                    1                  2
SCNN1D                    .                  1
ACAP3                     .                  5
PUSL1                     1                  .

By default, genes (features) are stored as rows and spots are stored as columns.

Gene and spot metadata

We can retrieve the name of the genes we have loaded in like so:

# only showing 20 genes. Remove the head function to show everything.
head(rownames(dat), 20)

 [1] "SAMD11"   "NOC2L"    "KLHL17"   "PLEKHN1"  "PERM1"    "HES4"    
 [7] "ISG15"    "AGRN"     "RNF223"   "C1orf159" "TTLL10"   "TNFRSF18"
[13] "TNFRSF4"  "SDF4"     "B3GALT6"  "C1QTNF12" "UBE2J2"   "SCNN1D"  
[19] "ACAP3"    "PUSL1"

# only showing 20 genes. Remove the head function to show everything.
head(Features(dat), 20)

 [1] "SAMD11"   "NOC2L"    "KLHL17"   "PLEKHN1"  "PERM1"    "HES4"    
 [7] "ISG15"    "AGRN"     "RNF223"   "C1orf159" "TTLL10"   "TNFRSF18"
[13] "TNFRSF4"  "SDF4"     "B3GALT6"  "C1QTNF12" "UBE2J2"   "SCNN1D"  
[19] "ACAP3"    "PUSL1"

And the unique name assigned to each spot like so:

# only showing 20 spots Remove the head function to show everything.
head(colnames(dat), 20)

 [1] "AACAATGTGCTCCGAG-1" "AACACCATTCGCATAC-1" "AACACGACAACGGAGT-1"
 [4] "AACACGCAGATAACAA-1" "AACACGTTGATACCGC-1" "AACACTCGTGAGCTTC-1"
 [7] "AACAGCCTCCTGACTA-1" "AACAGGAATTCTGTGA-1" "AACAGGCCAGGCCGCT-1"
[10] "AACAGGCCATTGTCAC-1" "AACAGTCGAGGTAGAT-1" "AACAGTCTGACGAATT-1"
[13] "AACATACTCCAGCTGA-1" "AACATAGAAGACTGTT-1" "AACATAGAAGGTGAGT-1"
[16] "AACATAGAAGTGCGCA-1" "AACATAGGAGGCGTCC-1" "AACATAGGTCCATAGC-1"
[19] "AACATAGGTTCCGCAC-1" "AACATAGGTTGGCACC-1"

# only showing 20 spots Remove the head function to show everything.
head(Cells(dat), 20)

 [1] "AACAATGTGCTCCGAG-1" "AACACCATTCGCATAC-1" "AACACGACAACGGAGT-1"
 [4] "AACACGCAGATAACAA-1" "AACACGTTGATACCGC-1" "AACACTCGTGAGCTTC-1"
 [7] "AACAGCCTCCTGACTA-1" "AACAGGAATTCTGTGA-1" "AACAGGCCAGGCCGCT-1"
[10] "AACAGGCCATTGTCAC-1" "AACAGTCGAGGTAGAT-1" "AACAGTCTGACGAATT-1"
[13] "AACATACTCCAGCTGA-1" "AACATAGAAGACTGTT-1" "AACATAGAAGGTGAGT-1"
[16] "AACATAGAAGTGCGCA-1" "AACATAGGAGGCGTCC-1" "AACATAGGTCCATAGC-1"
[19] "AACATAGGTTCCGCAC-1" "AACATAGGTTGGCACC-1"

We can also query the seurat object as if it is a table by using base R functions like ncol, nrow, and dim:

# how many spots
ncol(dat)

[1] 4269

# how many genes
nrow(dat)

[1] 18085

# how many genes and spots (rows and cols respectively)
dim(dat)

[1] 18085  4269

Metadata

Metadata for the spots can be retrieved like so:

# only showing 20 rows. Remove the head function to show everything.
head(dat[[]], 20)

                      orig.ident nCount_Spatial nFeature_Spatial
AACAATGTGCTCCGAG-1 SeuratProject         159480            12673
AACACCATTCGCATAC-1 SeuratProject           5719             3393
AACACGACAACGGAGT-1 SeuratProject         244735            13370
AACACGCAGATAACAA-1 SeuratProject          12387             6003
AACACGTTGATACCGC-1 SeuratProject          58894            11198
AACACTCGTGAGCTTC-1 SeuratProject          55548            10947
AACAGCCTCCTGACTA-1 SeuratProject         270292            13726
AACAGGAATTCTGTGA-1 SeuratProject          14250             5628
AACAGGCCAGGCCGCT-1 SeuratProject         146950            13120
AACAGGCCATTGTCAC-1 SeuratProject         120019            12746
AACAGTCGAGGTAGAT-1 SeuratProject         200220            13057
AACAGTCTGACGAATT-1 SeuratProject         136015            12656
AACATACTCCAGCTGA-1 SeuratProject          65435            12270
AACATAGAAGACTGTT-1 SeuratProject         200960            12695
AACATAGAAGGTGAGT-1 SeuratProject           8817             4874
AACATAGAAGTGCGCA-1 SeuratProject         233746            13541
AACATAGGAGGCGTCC-1 SeuratProject          54057            11153
AACATAGGTCCATAGC-1 SeuratProject         117304            12415
AACATAGGTTCCGCAC-1 SeuratProject          11350             5506
AACATAGGTTGGCACC-1 SeuratProject         146714            12390

Here, each row is a spot. nCount_Spatial shows the total number of transcripts (specifically total number of unique UMIs) in a spot. nFeature_Spatial shows how many unique genes are expressed per spot. orig.ident refers to the “original identity” of the sample. These information are all pre-filled and computed by the Load10x_Spatialfunction.

The metadata in Seurat object is actually stored as a data.frame. Thus, if you know the column you want to inspect, you can interact with it as if it is a data.frame, like so

# only showing 20 rows. Remove the head function to show everything.
head(dat$nCount_Spatial, 20)

AACAATGTGCTCCGAG-1 AACACCATTCGCATAC-1 AACACGACAACGGAGT-1 AACACGCAGATAACAA-1 
            159480               5719             244735              12387 
AACACGTTGATACCGC-1 AACACTCGTGAGCTTC-1 AACAGCCTCCTGACTA-1 AACAGGAATTCTGTGA-1 
             58894              55548             270292              14250 
AACAGGCCAGGCCGCT-1 AACAGGCCATTGTCAC-1 AACAGTCGAGGTAGAT-1 AACAGTCTGACGAATT-1 
            146950             120019             200220             136015 
AACATACTCCAGCTGA-1 AACATAGAAGACTGTT-1 AACATAGAAGGTGAGT-1 AACATAGAAGTGCGCA-1 
             65435             200960               8817             233746 
AACATAGGAGGCGTCC-1 AACATAGGTCCATAGC-1 AACATAGGTTCCGCAC-1 AACATAGGTTGGCACC-1 
             54057             117304              11350             146714

We can also add more metadata to it using data.frame notation. For example, we can add a separate column identifying the patient the data originates from:

dat$patient_id <- "P2"
# only showing 20 rows. Remove the head function to show everything.
head(dat[[]], 20)

                      orig.ident nCount_Spatial nFeature_Spatial patient_id
AACAATGTGCTCCGAG-1 SeuratProject         159480            12673         P2
AACACCATTCGCATAC-1 SeuratProject           5719             3393         P2
AACACGACAACGGAGT-1 SeuratProject         244735            13370         P2
AACACGCAGATAACAA-1 SeuratProject          12387             6003         P2
AACACGTTGATACCGC-1 SeuratProject          58894            11198         P2
AACACTCGTGAGCTTC-1 SeuratProject          55548            10947         P2
AACAGCCTCCTGACTA-1 SeuratProject         270292            13726         P2
AACAGGAATTCTGTGA-1 SeuratProject          14250             5628         P2
AACAGGCCAGGCCGCT-1 SeuratProject         146950            13120         P2
AACAGGCCATTGTCAC-1 SeuratProject         120019            12746         P2
AACAGTCGAGGTAGAT-1 SeuratProject         200220            13057         P2
AACAGTCTGACGAATT-1 SeuratProject         136015            12656         P2
AACATACTCCAGCTGA-1 SeuratProject          65435            12270         P2
AACATAGAAGACTGTT-1 SeuratProject         200960            12695         P2
AACATAGAAGGTGAGT-1 SeuratProject           8817             4874         P2
AACATAGAAGTGCGCA-1 SeuratProject         233746            13541         P2
AACATAGGAGGCGTCC-1 SeuratProject          54057            11153         P2
AACATAGGTCCATAGC-1 SeuratProject         117304            12415         P2
AACATAGGTTCCGCAC-1 SeuratProject          11350             5506         P2
AACATAGGTTGGCACC-1 SeuratProject         146714            12390         P2

To access the metadata associated with the genes, we can use similar notation but also include which assay we want to retrieve the metadata from.

dat[['Spatial']][[]]

data frame with 0 columns and 18085 rows

Currently, we have nothing in it. So do not be alarmed.

Let’s save the object now so we can use it in the subsequent sections. We can either save it as an RDS object using saveRDS(dat, "output/visium/filtered_seurat.qs") or as a QS object using the qs2 package. The latter is a lot quicker.

qs_save(dat, file.path(here::here(), "data", "visium", "data", "visium_seurat.qs2"))

sessionInfo()

R version 4.5.1 (2025-06-13)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] qs2_0.1.5          Seurat_5.3.0       SeuratObject_5.1.0 sp_2.2-0          
[5] workflowr_1.7.2   

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3     rstudioapi_0.17.1      jsonlite_2.0.0        
  [4] magrittr_2.0.4         spatstat.utils_3.1-5   farver_2.1.2          
  [7] rmarkdown_2.29         fs_1.6.6               vctrs_0.6.5           
 [10] ROCR_1.0-11            spatstat.explore_3.5-2 htmltools_0.5.8.1     
 [13] sass_0.4.10            sctransform_0.4.2      parallelly_1.45.1     
 [16] KernSmooth_2.23-26     bslib_0.9.0            htmlwidgets_1.6.4     
 [19] ica_1.0-3              plyr_1.8.9             plotly_4.11.0         
 [22] zoo_1.8-14             cachem_1.1.0           whisker_0.4.1         
 [25] igraph_2.1.4           mime_0.13              lifecycle_1.0.4       
 [28] pkgconfig_2.0.3        Matrix_1.7-3           R6_2.6.1              
 [31] fastmap_1.2.0          fitdistrplus_1.2-4     future_1.67.0         
 [34] shiny_1.11.1           digest_0.6.37          colorspace_2.1-1      
 [37] patchwork_1.3.2        ps_1.9.1               rprojroot_2.1.1       
 [40] tensor_1.5.1           RSpectra_0.16-2        irlba_2.3.5.1         
 [43] progressr_0.15.1       spatstat.sparse_3.1-0  httr_1.4.7            
 [46] polyclip_1.10-7        abind_1.4-8            compiler_4.5.1        
 [49] here_1.0.2             bit64_4.6.0-1          S7_0.2.0              
 [52] fastDummies_1.7.5      MASS_7.3-65            tools_4.5.1           
 [55] lmtest_0.9-40          httpuv_1.6.16          future.apply_1.20.0   
 [58] goftest_1.2-3          glue_1.8.0             callr_3.7.6           
 [61] nlme_3.1-168           promises_1.3.3         grid_4.5.1            
 [64] Rtsne_0.17             getPass_0.2-4          cluster_2.1.8.1       
 [67] reshape2_1.4.4         generics_0.1.4         hdf5r_1.3.12          
 [70] gtable_0.3.6           spatstat.data_3.1-6    tidyr_1.3.1           
 [73] data.table_1.17.8      stringfish_0.17.0      spatstat.geom_3.5-0   
 [76] RcppAnnoy_0.0.22       ggrepel_0.9.6          RANN_2.6.2            
 [79] pillar_1.11.0          stringr_1.5.2          spam_2.11-1           
 [82] RcppHNSW_0.6.0         later_1.4.4            splines_4.5.1         
 [85] dplyr_1.1.4            lattice_0.22-7         bit_4.6.0             
 [88] survival_3.8-3         deldir_2.0-4           tidyselect_1.2.1      
 [91] miniUI_0.1.2           pbapply_1.7-4          knitr_1.50            
 [94] git2r_0.36.2           gridExtra_2.3          scattermore_1.2       
 [97] xfun_0.53              matrixStats_1.5.0      stringi_1.8.7         
[100] lazyeval_0.2.2         yaml_2.3.10            evaluate_1.0.5        
[103] codetools_0.2-20       tibble_3.3.0           cli_3.6.5             
[106] uwot_0.2.3             RcppParallel_5.1.10    xtable_1.8-4          
[109] reticulate_1.43.0      processx_3.8.6         jquerylib_0.1.4       
[112] dichromat_2.0-0.1      Rcpp_1.1.0             globals_0.18.0        
[115] spatstat.random_3.4-1  png_0.1-8              spatstat.univar_3.1-4 
[118] parallel_4.5.1         ggplot2_4.0.0          dotCall64_1.2         
[121] listenv_0.9.1          viridisLite_0.4.2      scales_1.4.0          
[124] ggridges_0.5.6         purrr_1.1.0            rlang_1.1.6           
[127] cowplot_1.2.0