Last updated: 2019-10-31



Background

Spatial Transcriptomics (ST)

The tutorial aims to lay the foundation of best practice for ST data analysis. As such, we assume that the user is already familiar with the underlying method and do not describe it in detail here. Interested readers are referred to the original publication from 2016 (https://science.sciencemag.org/content/353/6294/78).

Figure: Schematic of Spatial Transcriptomics

In short, an ST experiment produces two main outputs: (i) the gene expression data and (ii) the image data.

All the steps explained in this guide could be performed with only the expression data. However, the image data, apart from being fundamental to the biological interpretation, is used to select the capture spots that lie directly under the tissue. This filtering excludes the unwanted data points outside the tissue, which lowers the memory burden of the data objects created and removes noise.

An introductory animation is available on our website: http://www.spatialresearch.org/

Selecting spots

The gene expression data consists of a count matrix with genes in rows and capture “spots” in columns. Each spot represents a small area on an ST array from which the captured transcripts have been barcoded with a unique sequence. The unique barcode makes it possible to map the transcripts onto a spatial position on the tissue section. It is equivalent to a cell-specific barcode in scRNA-seq data, except that it can tag a mixture of transcripts from multiple cells. The spatial position of a spot is an (x, y) coordinate that defines the centroid of the spot area. These spatial coordinates are stored in the spot ids (column names) and allow us to visualize gene expression (and other spot features) in the array grid system.

However, if you want to overlay a visualization on top of the HE image, you need to make sure that the spot coordinates are exact in relation to the morphological features of the image. When the spots are printed onto the ST array surface, their positions sometimes deviate from the (x, y) coordinates given by the spot ids, so the coordinates should be adjusted. In addition to the spot adjustment, you will also need to label the spots that are located directly under the tissue. Spot adjustment and selection can be done automatically using our ST spot detector web tool, which outputs a table of adjusted coordinates and labels for the spots under tissue.
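The relationship between the count matrix, the spot ids and the selection table can be sketched in base R. This is a minimal illustration on made-up data, not the STUtility implementation: the "XxY" spot id pattern and the column names of the selection table (`adj_x`, `adj_y`, `selected`) are assumptions chosen for the example, so check them against your own files.

```r
# Toy count matrix: genes in rows, capture spots in columns.
# Assumption: spot ids follow an "XxY" pattern such as "5x7".
counts <- matrix(
  c(3, 0, 1, 4,
    0, 2, 5, 1,
    7, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"),
                  c("5x7", "5x8", "6x7", "6x8"))
)

# Recover the array (x, y) coordinates from the column names.
parts <- strsplit(colnames(counts), "x", fixed = TRUE)
coords <- data.frame(
  spot = colnames(counts),
  x = as.numeric(vapply(parts, `[`, character(1), 1)),
  y = as.numeric(vapply(parts, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)

# Hypothetical selection table: adjusted coordinates plus a 0/1 flag
# marking the spots under the tissue (column names are illustrative).
selection <- data.frame(
  x = c(5, 5, 6, 6),
  y = c(7, 8, 7, 8),
  adj_x = c(5.02, 4.97, 6.01, 6.05),
  adj_y = c(7.01, 8.03, 6.98, 8.00),
  selected = c(1, 1, 0, 1)
)

# Match each spot to its row in the selection table via the spot id.
key <- paste(selection$x, selection$y, sep = "x")
idx <- match(coords$spot, key)
coords$adj_x <- selection$adj_x[idx]
coords$adj_y <- selection$adj_y[idx]
coords$selected <- selection$selected[idx] == 1

# Keep only the spots labelled as lying under the tissue.
counts_tissue <- counts[, coords$selected, drop = FALSE]
print(colnames(counts_tissue))
```

In this toy case the spot "6x7" is dropped because it is flagged as outside the tissue, and the remaining spots carry adjusted coordinates that could be used for an overlay on the HE image.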

Multiple samples

The STUtility tool was developed with multiple-sample input as a goal. As with all biological data, using multiple samples adds power to the analysis and is necessary for comprehensive insight, which otherwise suffers from stochastic uncertainty. In this vignette, we show how you can input multiple samples, look for complicating factors such as batch effects and missing data, apply methods to correct for them if present, get a holistic picture of your data, and conduct more in-depth analyses in various ways.

Seurat workflow

We have extensively tried different methods and workflows for handling ST data. While all roads lead to Rome, as of this writing we find the Seurat approach (https://satijalab.org/seurat/) to be the best suited for this type of data. Seurat is an R package designed for single-cell RNA-seq (scRNA-seq) data. This differs from the data that the ST technology currently produces, as the resolution of the array implies that each capture spot consists of transcripts originating from multiple cells. Nevertheless, the characteristics of ST data resemble those of scRNA-seq data to a large extent. Note that the STUtility package requires Seurat v3.0 or higher.

The data obtained from an ST experiment can be treated like data from an scRNA-seq experiment and be processed and analyzed using the Seurat package. What STUtility adds is a way of utilizing the spatial component of the dataset, mostly for visualization purposes. All the image-related information is stored in the Seurat object, but only when the object is created using the workflow described in the Getting Started section.
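As a rough illustration of treating ST data like scRNA-seq data, the sketch below passes a made-up count matrix directly to Seurat, with each spot taking the role of a cell. The matrix, the "XxY" spot ids and the project name are assumptions made for the example. An object created this way carries no image information; for that, the Getting Started workflow is needed.

```r
# Toy ST count matrix: genes in rows, capture spots in columns.
counts <- matrix(
  c(3, 0, 1, 4,
    0, 2, 5, 1,
    7, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"),
                  c("5x7", "5x8", "6x7", "6x8"))
)

if (requireNamespace("Seurat", quietly = TRUE)) {
  # Seurat works with sparse matrices, so convert the toy matrix first.
  sparse_counts <- Matrix::Matrix(counts, sparse = TRUE)
  # Each capture spot is handled the way a cell would be in scRNA-seq data.
  st_obj <- Seurat::CreateSeuratObject(counts = sparse_counts,
                                       project = "ST_demo")
  st_obj <- Seurat::NormalizeData(st_obj, verbose = FALSE)
  print(st_obj)
} else {
  message("Seurat is not installed; skipping the example.")
}
```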

10X Visium array

In late 2018, the company Spatial Transcriptomics was acquired by 10X Genomics, which has since been developing a new version, called Visium, of the technological platform that our research group has been using in past years. The Visium brings some changes to the experimental protocol and to the type of output, and consequently to the input to this R tool. Since our goal is for this R tool to be compatible with both past and future versions of the technology, both are supported. If you are working with the Visium platform, please see [The 10X Visium platform].

Naming conventions

For users familiar with the Seurat workflow: a number of Seurat plotting functions, e.g. Seurat::FeaturePlot(), have an “ST version”, which is called by adding “ST.” before the original function name, e.g. STutility::ST.FeaturePlot().

The external (user-facing) STUtility functions follow a PascalCase convention.
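The naming rule can be written out as a one-line helper. The helper name `st_version` is hypothetical and is not part of STUtility. Only ST.FeaturePlot() is named in this text, so whether an ST counterpart exists for any other Seurat function should be checked in the package documentation.

```r
# Build the name of the "ST version" of a Seurat plotting function
# by prefixing "ST." (st_version is a hypothetical helper).
st_version <- function(seurat_fun) paste0("ST.", seurat_fun)

st_name <- st_version("FeaturePlot")
print(st_name)

# If the package is installed, check that the function really exists.
if (requireNamespace("STutility", quietly = TRUE)) {
  print(exists(st_name, envir = asNamespace("STutility")))
}
```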

A work by Joseph Bergenstråhle and Ludvig Larsson

sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.3.0 Rcpp_1.0.2      digest_0.6.22   rprojroot_1.3-2
 [5] backports_1.1.5 git2r_0.26.1    magrittr_1.5    evaluate_0.14  
 [9] rlang_0.4.1     stringi_1.4.3   fs_1.3.1        whisker_0.4    
[13] rmarkdown_1.16  tools_3.6.1     stringr_1.4.0   glue_1.3.1     
[17] xfun_0.10       yaml_2.2.0      compiler_3.6.1  htmltools_0.4.0
[21] knitr_1.25

STUtility - Vignette

Joseph Bergenstråhle, Royal Institute of Technology (KTH)

Ludvig Larsson, Royal Institute of Technology (KTH)