Last updated: 2025-10-02
Checks: 7 0
Knit directory:
2025_cytoconnect_spatial_workshop/
This reproducible R Markdown analysis was created with workflowr (version 1.7.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20251002) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 383f3f1. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Untracked files:
Untracked: data/sc_seurat_object_10x.qs
Untracked: data/visium/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/visium_01.Rmd) and HTML
(docs/visium_01.html) files. If you’ve configured a remote
Git repository (see ?wflow_git_remote), click on the
hyperlinks in the table below to view the files as they were in that
past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 383f3f1 | Givanna Putri | 2025-10-02 | First publish for website |
In this part of the visium workshop, we will introduce the dataset we will be using in the visium workshop, learn how to load it up into a Seurat object and interrogate it.
library(Seurat)
library(qs2)
The data used for this tutorial is an FFPE tissue from a patient with colorectal cancer. It is described in a bioRxiv paper: https://doi.org/10.1101/2024.06.04.597233
The data can be downloaded from here. Choose Visium CytAssist v2 Sample P2 CRC on the tab on the left.
To get a spot x gene matrix, one typically have to process the raw data using the Spaceranger software from 10x. Depending on your experiment, running Spaceranger can be very complex. It is beyond the scope of this workshop.
If you are keen to try running Spaceranger, here is a bash script you can use for this dataset:
#!/bin/bash
# make sure you specify which folder is datdir and refdir
spaceranger count \
--id="Visium_FFPE_p2_crc" \
--transcriptome=$refdir \
--probe-set=$datadir/Visium_V2_Human_Colon_Cancer_P2_probe_set.csv \
--fastqs=$datadir/Visium_V2_Human_Colon_Cancer_P2_fastqs \
--cytaimage=$datadir/Visium_V2_Human_Colon_Cancer_P2_image.tif \
--image=$datadir/Visium_V2_Human_Colon_Cancer_P2_tissue_image.btf \
--slide=V53A10-078 \
--area=B1\
--loupe-alignment=$datadir/Visium_V2_Human_Colon_Cancer_P2_alignment_file.json \
--localcores=20 \
--localmem=60 \
--create-bam=true
Before running spaceranger, make sure you download all the input files use in the code block above from here - (choose Visium CytAssist v2 Sample P2 CRC), as well as the relevant pre-built reference genome here.
Spaceranger is quite resource hungry. So make sure you run it in a computer with a sizeable amount of resources available or preferably on a High Performance Computing platform.
After running Spaceranger, you will get a lot of folders and files.
The ones that you need to create a Seurat object are located in the
outs folder. It will contain the following files and
folders:
tree outs/
outs/
├── analysis/
│ ├── clustering/
│ ├── diffexp/
│ ├── pca/
│ ├── tsne/
│ └── umap/
├── cloupe.cloupe
├── deconvolution/
├── filtered_feature_bc_matrix/
│ ├── barcodes.tsv.gz
│ ├── features.tsv.gz
│ └── matrix.mtx.gz
├── filtered_feature_bc_matrix.h5
├── metrics_summary.csv
├── molecule_info.h5
├── possorted_genome_bam.bam
├── possorted_genome_bam.bam.bai
├── probe_set.csv
├── raw_feature_bc_matrix/
│ ├── barcodes.tsv.gz
│ ├── features.tsv.gz
│ └── matrix.mtx.gz
├── raw_feature_bc_matrix.h5
├── raw_probe_bc_matrix.h5
├── spatial/
│ ├── aligned_fiducials.jpg
│ ├── aligned_tissue_image.jpg
│ ├── cytassist_image.tiff
│ ├── detected_tissue_image.jpg
│ ├── scalefactors_json.json
│ ├── spatial_enrichment.csv
│ ├── tissue_hires_image.png
│ ├── tissue_lowres_image.png
│ └── tissue_positions.csv
└── web_summary.html
Brief description of the most frequently used files, some of which, we will be using in this workshop:
web_summary.html: Spaceranger run summary metrics and
plots. This file will give you some information about the samples, e.g.,
how many reads were detected and successfully mapped to the reference
genome, etc.cloupe.cloupe: Loupe Browser visualization and analysis
file to load into 10x
Loupe Browser software. Spaceranger will run few cookie cutter
analysis on your data. You can view the output by loading this file into
the Loupe Browser.spatial/: A folder containing outputs that capture the
spatial information of your sample. Packages such as Seurat is likely to
require files in this folder to work properly.filtered_feature_bc_matrix (folder or the h5 file):
Filtered matrix containing the number of UMIs associated with a
gene (row) and a spot barcode (column). The matrix is filtered as it
only includes spots associated with the tissue. These information are
provided as either a combination of mtx file (for the matrix) and tsv
files (for information on the features and barcodes) stored in a folder,
or as one HDF5 file (the h5 file). Either can be used as an input to
packages such as Seurat for downstream analyses.raw_feature_bc_matrices (folder or the h5 file): Same
information provided in filtered_feature_bc_matrix but
unfiltered, i.e. keeping all the barcodes, including those that
may not be associated with the tissue. This matrix is a lot larger than
the one in filtered_feature_bc_matrix.Full description of the generated files and folders are available here.
To do this, we will need the H5 file and the spatial folder that came from SpaceRanger. There is no need to purposely run SpaceRanger to get these files. 10x kindly provided them on their website: https://www.10xgenomics.com/products/visium-hd-spatial-gene-expression/dataset-human-crc (choose Visium CytAssist v2 Sample P2 CRC).
The spatial folder is required for Seurat to find the following extra information:
tissue_lowres_image.png: the image of the slide in low
resolution.scalefactors_json.json: the scale factors to convert
the high-resolution coordinates and the scaled-down coordinates.tissue_positions_list.csv: coordinates for each
spotWe can use the code block below to import the data into a Seurat
object. However, before doing so, you most likely have the change the
data.dir parameter to point to the directory where you
store the H5 file and the spatial folder. In this example, they are
stored in a “data” folder of our current working directory.
dat <- Load10X_Spatial(
data.dir = file.path(here::here(), "data", "visium", "raw"),
filename = 'Visium_V2_Human_Colon_Cancer_P2_filtered_feature_bc_matrix.h5'
)
We can inspect the object to see the number of spots and genes there are in the dataset.
# 4,269 spots - samples in the seurat object description
# 18,085 genes.
dat
An object of class Seurat
18085 features across 4269 samples within 1 assay
Active assay: Spatial (18085 features, 0 variable features)
1 layer present: counts
1 spatial field of view present: slice1
A Seurat object can contain multiple Assays to store data from different omics technologies in one object. Since we have only loaded a visium data, we only have one assay called “Spatial”. You can only have one assay “active” at a time. To change the active assay you can run the following code block - with “Spatial” being the name of the assay to activate:
DefaultAssay(dat) <- "Spatial"
Within an assay, we can have multiple Layers, each containing one spot by gene matrix. The idea is we can perform some transformations or apply some functions to the matrix in a layer and store it in another layer. This benefit will be a lot clearer as we progress through the workshop.
By default, when we run Load10x_Spatial, Seurat will
store the spot by gene matrix in the Spatial assay under
the counts layer. You can change this assay name by
specifying a different name for the assay parameter when
running the Load10x_Spatial function.
We can retrieve the spot by gene matrix like so:
# showing only 20 genes for 5 spots
head(dat[["Spatial"]]$counts, c(20, 5))
20 x 5 sparse Matrix of class "dgCMatrix"
AACAATGTGCTCCGAG-1 AACACCATTCGCATAC-1 AACACGACAACGGAGT-1
SAMD11 1 1 .
NOC2L 4 . 12
KLHL17 . . 1
PLEKHN1 2 . 11
PERM1 . . .
HES4 2 1 3
ISG15 10 3 18
AGRN 6 . 25
RNF223 . . 1
C1orf159 3 . 1
TTLL10 1 . .
TNFRSF18 1 . 2
TNFRSF4 . . 1
SDF4 29 3 45
B3GALT6 6 . 12
C1QTNF12 1 . .
UBE2J2 6 2 14
SCNN1D . . .
ACAP3 14 1 8
PUSL1 3 . 4
AACACGCAGATAACAA-1 AACACGTTGATACCGC-1
SAMD11 . .
NOC2L . .
KLHL17 1 1
PLEKHN1 . 2
PERM1 . 1
HES4 . .
ISG15 . 1
AGRN . .
RNF223 . .
C1orf159 . .
TTLL10 . .
TNFRSF18 . 1
TNFRSF4 . 2
SDF4 5 7
B3GALT6 . 2
C1QTNF12 . .
UBE2J2 1 2
SCNN1D . 1
ACAP3 . 5
PUSL1 1 .
By default, genes (features) are stored as rows and spots are stored as columns.
We can retrieve the name of the genes we have loaded in like so:
# only showing 20 genes. Remove the head function to show everything.
head(rownames(dat), 20)
[1] "SAMD11" "NOC2L" "KLHL17" "PLEKHN1" "PERM1" "HES4"
[7] "ISG15" "AGRN" "RNF223" "C1orf159" "TTLL10" "TNFRSF18"
[13] "TNFRSF4" "SDF4" "B3GALT6" "C1QTNF12" "UBE2J2" "SCNN1D"
[19] "ACAP3" "PUSL1"
# only showing 20 genes. Remove the head function to show everything.
head(Features(dat), 20)
[1] "SAMD11" "NOC2L" "KLHL17" "PLEKHN1" "PERM1" "HES4"
[7] "ISG15" "AGRN" "RNF223" "C1orf159" "TTLL10" "TNFRSF18"
[13] "TNFRSF4" "SDF4" "B3GALT6" "C1QTNF12" "UBE2J2" "SCNN1D"
[19] "ACAP3" "PUSL1"
And the unique name assigned to each spot like so:
# only showing 20 spots Remove the head function to show everything.
head(colnames(dat), 20)
[1] "AACAATGTGCTCCGAG-1" "AACACCATTCGCATAC-1" "AACACGACAACGGAGT-1"
[4] "AACACGCAGATAACAA-1" "AACACGTTGATACCGC-1" "AACACTCGTGAGCTTC-1"
[7] "AACAGCCTCCTGACTA-1" "AACAGGAATTCTGTGA-1" "AACAGGCCAGGCCGCT-1"
[10] "AACAGGCCATTGTCAC-1" "AACAGTCGAGGTAGAT-1" "AACAGTCTGACGAATT-1"
[13] "AACATACTCCAGCTGA-1" "AACATAGAAGACTGTT-1" "AACATAGAAGGTGAGT-1"
[16] "AACATAGAAGTGCGCA-1" "AACATAGGAGGCGTCC-1" "AACATAGGTCCATAGC-1"
[19] "AACATAGGTTCCGCAC-1" "AACATAGGTTGGCACC-1"
# only showing 20 spots Remove the head function to show everything.
head(Cells(dat), 20)
[1] "AACAATGTGCTCCGAG-1" "AACACCATTCGCATAC-1" "AACACGACAACGGAGT-1"
[4] "AACACGCAGATAACAA-1" "AACACGTTGATACCGC-1" "AACACTCGTGAGCTTC-1"
[7] "AACAGCCTCCTGACTA-1" "AACAGGAATTCTGTGA-1" "AACAGGCCAGGCCGCT-1"
[10] "AACAGGCCATTGTCAC-1" "AACAGTCGAGGTAGAT-1" "AACAGTCTGACGAATT-1"
[13] "AACATACTCCAGCTGA-1" "AACATAGAAGACTGTT-1" "AACATAGAAGGTGAGT-1"
[16] "AACATAGAAGTGCGCA-1" "AACATAGGAGGCGTCC-1" "AACATAGGTCCATAGC-1"
[19] "AACATAGGTTCCGCAC-1" "AACATAGGTTGGCACC-1"
We can also query the seurat object as if it is a table by using base
R functions like ncol, nrow, and
dim:
# how many spots
ncol(dat)
[1] 4269
# how many genes
nrow(dat)
[1] 18085
# how many genes and spots (rows and cols respectively)
dim(dat)
[1] 18085 4269
Metadata for the spots can be retrieved like so:
# only showing 20 rows. Remove the head function to show everything.
head(dat[[]], 20)
orig.ident nCount_Spatial nFeature_Spatial
AACAATGTGCTCCGAG-1 SeuratProject 159480 12673
AACACCATTCGCATAC-1 SeuratProject 5719 3393
AACACGACAACGGAGT-1 SeuratProject 244735 13370
AACACGCAGATAACAA-1 SeuratProject 12387 6003
AACACGTTGATACCGC-1 SeuratProject 58894 11198
AACACTCGTGAGCTTC-1 SeuratProject 55548 10947
AACAGCCTCCTGACTA-1 SeuratProject 270292 13726
AACAGGAATTCTGTGA-1 SeuratProject 14250 5628
AACAGGCCAGGCCGCT-1 SeuratProject 146950 13120
AACAGGCCATTGTCAC-1 SeuratProject 120019 12746
AACAGTCGAGGTAGAT-1 SeuratProject 200220 13057
AACAGTCTGACGAATT-1 SeuratProject 136015 12656
AACATACTCCAGCTGA-1 SeuratProject 65435 12270
AACATAGAAGACTGTT-1 SeuratProject 200960 12695
AACATAGAAGGTGAGT-1 SeuratProject 8817 4874
AACATAGAAGTGCGCA-1 SeuratProject 233746 13541
AACATAGGAGGCGTCC-1 SeuratProject 54057 11153
AACATAGGTCCATAGC-1 SeuratProject 117304 12415
AACATAGGTTCCGCAC-1 SeuratProject 11350 5506
AACATAGGTTGGCACC-1 SeuratProject 146714 12390
Here, each row is a spot. nCount_Spatial shows the total
number of transcripts (specifically total number of unique UMIs) in a
spot. nFeature_Spatial shows how many unique genes are
expressed per spot. orig.ident refers to the “original
identity” of the sample. These information are all pre-filled and
computed by the Load10x_Spatialfunction.
The metadata in Seurat object is actually stored as a data.frame. Thus, if you know the column you want to inspect, you can interact with it as if it is a data.frame, like so
# only showing 20 rows. Remove the head function to show everything.
head(dat$nCount_Spatial, 20)
AACAATGTGCTCCGAG-1 AACACCATTCGCATAC-1 AACACGACAACGGAGT-1 AACACGCAGATAACAA-1
159480 5719 244735 12387
AACACGTTGATACCGC-1 AACACTCGTGAGCTTC-1 AACAGCCTCCTGACTA-1 AACAGGAATTCTGTGA-1
58894 55548 270292 14250
AACAGGCCAGGCCGCT-1 AACAGGCCATTGTCAC-1 AACAGTCGAGGTAGAT-1 AACAGTCTGACGAATT-1
146950 120019 200220 136015
AACATACTCCAGCTGA-1 AACATAGAAGACTGTT-1 AACATAGAAGGTGAGT-1 AACATAGAAGTGCGCA-1
65435 200960 8817 233746
AACATAGGAGGCGTCC-1 AACATAGGTCCATAGC-1 AACATAGGTTCCGCAC-1 AACATAGGTTGGCACC-1
54057 117304 11350 146714
We can also add more metadata to it using data.frame notation. For example, we can add a separate column identifying the patient the data originates from:
dat$patient_id <- "P2"
# only showing 20 rows. Remove the head function to show everything.
head(dat[[]], 20)
orig.ident nCount_Spatial nFeature_Spatial patient_id
AACAATGTGCTCCGAG-1 SeuratProject 159480 12673 P2
AACACCATTCGCATAC-1 SeuratProject 5719 3393 P2
AACACGACAACGGAGT-1 SeuratProject 244735 13370 P2
AACACGCAGATAACAA-1 SeuratProject 12387 6003 P2
AACACGTTGATACCGC-1 SeuratProject 58894 11198 P2
AACACTCGTGAGCTTC-1 SeuratProject 55548 10947 P2
AACAGCCTCCTGACTA-1 SeuratProject 270292 13726 P2
AACAGGAATTCTGTGA-1 SeuratProject 14250 5628 P2
AACAGGCCAGGCCGCT-1 SeuratProject 146950 13120 P2
AACAGGCCATTGTCAC-1 SeuratProject 120019 12746 P2
AACAGTCGAGGTAGAT-1 SeuratProject 200220 13057 P2
AACAGTCTGACGAATT-1 SeuratProject 136015 12656 P2
AACATACTCCAGCTGA-1 SeuratProject 65435 12270 P2
AACATAGAAGACTGTT-1 SeuratProject 200960 12695 P2
AACATAGAAGGTGAGT-1 SeuratProject 8817 4874 P2
AACATAGAAGTGCGCA-1 SeuratProject 233746 13541 P2
AACATAGGAGGCGTCC-1 SeuratProject 54057 11153 P2
AACATAGGTCCATAGC-1 SeuratProject 117304 12415 P2
AACATAGGTTCCGCAC-1 SeuratProject 11350 5506 P2
AACATAGGTTGGCACC-1 SeuratProject 146714 12390 P2
To access the metadata associated with the genes, we can use similar notation but also include which assay we want to retrieve the metadata from.
dat[['Spatial']][[]]
data frame with 0 columns and 18085 rows
Currently, we have nothing in it. So do not be alarmed.
Let’s save the object now so we can use it in the subsequent
sections. We can either save it as an RDS object using
saveRDS(dat, "output/visium/filtered_seurat.qs") or as a QS
object using the qs2
package. The latter is a lot quicker.
qs_save(dat, file.path(here::here(), "data", "visium", "data", "visium_seurat.qs2"))
sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Australia/Melbourne
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] qs2_0.1.5 Seurat_5.3.0 SeuratObject_5.1.0 sp_2.2-0
[5] workflowr_1.7.2
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.17.1 jsonlite_2.0.0
[4] magrittr_2.0.4 spatstat.utils_3.1-5 farver_2.1.2
[7] rmarkdown_2.29 fs_1.6.6 vctrs_0.6.5
[10] ROCR_1.0-11 spatstat.explore_3.5-2 htmltools_0.5.8.1
[13] sass_0.4.10 sctransform_0.4.2 parallelly_1.45.1
[16] KernSmooth_2.23-26 bslib_0.9.0 htmlwidgets_1.6.4
[19] ica_1.0-3 plyr_1.8.9 plotly_4.11.0
[22] zoo_1.8-14 cachem_1.1.0 whisker_0.4.1
[25] igraph_2.1.4 mime_0.13 lifecycle_1.0.4
[28] pkgconfig_2.0.3 Matrix_1.7-3 R6_2.6.1
[31] fastmap_1.2.0 fitdistrplus_1.2-4 future_1.67.0
[34] shiny_1.11.1 digest_0.6.37 colorspace_2.1-1
[37] patchwork_1.3.2 ps_1.9.1 rprojroot_2.1.1
[40] tensor_1.5.1 RSpectra_0.16-2 irlba_2.3.5.1
[43] progressr_0.15.1 spatstat.sparse_3.1-0 httr_1.4.7
[46] polyclip_1.10-7 abind_1.4-8 compiler_4.5.1
[49] here_1.0.2 bit64_4.6.0-1 S7_0.2.0
[52] fastDummies_1.7.5 MASS_7.3-65 tools_4.5.1
[55] lmtest_0.9-40 httpuv_1.6.16 future.apply_1.20.0
[58] goftest_1.2-3 glue_1.8.0 callr_3.7.6
[61] nlme_3.1-168 promises_1.3.3 grid_4.5.1
[64] Rtsne_0.17 getPass_0.2-4 cluster_2.1.8.1
[67] reshape2_1.4.4 generics_0.1.4 hdf5r_1.3.12
[70] gtable_0.3.6 spatstat.data_3.1-6 tidyr_1.3.1
[73] data.table_1.17.8 stringfish_0.17.0 spatstat.geom_3.5-0
[76] RcppAnnoy_0.0.22 ggrepel_0.9.6 RANN_2.6.2
[79] pillar_1.11.0 stringr_1.5.2 spam_2.11-1
[82] RcppHNSW_0.6.0 later_1.4.4 splines_4.5.1
[85] dplyr_1.1.4 lattice_0.22-7 bit_4.6.0
[88] survival_3.8-3 deldir_2.0-4 tidyselect_1.2.1
[91] miniUI_0.1.2 pbapply_1.7-4 knitr_1.50
[94] git2r_0.36.2 gridExtra_2.3 scattermore_1.2
[97] xfun_0.53 matrixStats_1.5.0 stringi_1.8.7
[100] lazyeval_0.2.2 yaml_2.3.10 evaluate_1.0.5
[103] codetools_0.2-20 tibble_3.3.0 cli_3.6.5
[106] uwot_0.2.3 RcppParallel_5.1.10 xtable_1.8-4
[109] reticulate_1.43.0 processx_3.8.6 jquerylib_0.1.4
[112] dichromat_2.0-0.1 Rcpp_1.1.0 globals_0.18.0
[115] spatstat.random_3.4-1 png_0.1-8 spatstat.univar_3.1-4
[118] parallel_4.5.1 ggplot2_4.0.0 dotCall64_1.2
[121] listenv_0.9.1 viridisLite_0.4.2 scales_1.4.0
[124] ggridges_0.5.6 purrr_1.1.0 rlang_1.1.6
[127] cowplot_1.2.0