DenoIST (Denoising Image-based Spatial Transcriptomics) is a method
for identifying and removing contamination artefacts in image-based
spatial transcriptomics (IST) data. This vignette shows how to use
it with either a SpatialExperiment object or a count matrix with
the cell coordinates supplied as a separate input.
For demonstration, we will use a small Xenium sample from a lung fibrosis study (Vannan & Lyu et al., 2025). It can be downloaded from GEO under accession GSE250346.
From the raw Xenium output, we can then construct a
SpatialExperiment object with
SpatialExperimentIO.
suppressPackageStartupMessages({
  library(DenoIST)
  library(SpatialExperiment)
  library(SpatialExperimentIO) # provides readXeniumSXE()
  library(ggplot2)
  library(patchwork)
})
# Path to the raw Xenium output directory
dir <- "/mnt/beegfs/mccarthy/backed_up/general/rlyu/Dataset/LFST_2022/GEO_2025/VUHD116A/relabel_output-XETG00048__0003817__VUHD116A__20230308__003730/outs/"
spe <- readXeniumSXE(dir, returnType = "SPE")
saveRDS(spe, "example_spe.rds")
For this vignette, we will use a pre-saved
SpatialExperiment object generated from the code above.
spe <- readRDS('example_spe.rds')
spe
#> class: SpatialExperiment
#> dim: 343 12915
#> metadata(4): experiment.xenium transcripts cell_boundaries nucleus_boundaries
#> assays(1): counts
#> rownames(343): ABCC2 ACKR1 ... YAP1 ZEB1
#> rowData names(3): ID Symbol Type
#> colnames(12915): aaaaaaab-1 aaaaaaac-1 ... aaaadchc-1 aaaadchd-1
#> colData names(10): cell_id transcript_counts ... nucleus_area sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(3): NegControlProbe UnassignedCodeword NegControlCodeword
#> spatialCoords names(2) : x_centroid y_centroid
#> imgData names(0):
Since the full transcript file is too large to upload, we read in a very small subset here to show what the format looks like:
tx <- readRDS('tx_sub.rds')
tx
#> X transcript_id cell_id overlaps_nucleus feature_name
#> 1 1 VUHD116A_281474976711003 VUHD116A_UNASSIGNED 0 NegControlCodeword_0517
#> 2 2 VUHD116A_281474976711398 VUHD116A_UNASSIGNED 0 COL1A1
#> 3 3 VUHD116A_281474976711406 VUHD116A_UNASSIGNED 0 LYZ
#> 4 4 VUHD116A_281474976711409 VUHD116A_UNASSIGNED 0 LAMP3
#> 5 5 VUHD116A_281474976711415 VUHD116A_UNASSIGNED 0 LYZ
#> 6 6 VUHD116A_281474976711418 VUHD116A_UNASSIGNED 0 LYZ
#> 7 7 VUHD116A_281474976711421 VUHD116A_UNASSIGNED 0 NUCB2
#> 8 8 VUHD116A_281474976712029 VUHD116A_UNASSIGNED 0 SFTPC
#> 9 9 VUHD116A_281474976712842 VUHD116A_UNASSIGNED 0 COL1A2
#> 10 10 VUHD116A_281474976712849 VUHD116A_UNASSIGNED 0 FGFBP2
#> x_location y_location z_location qv fov_name nucleus_distance
#> 1 202.896320 184.33257 17.63912 2.910288 Q15 329.0720
#> 2 7.082226 110.41018 12.68721 40.000000 Q15 537.2145
#> 3 76.882706 196.56784 12.74130 40.000000 Q15 439.8191
#> 4 101.830030 107.37660 13.04323 40.000000 Q15 455.0005
#> 5 149.067670 231.72057 12.96732 10.863165 Q15 360.2229
#> 6 175.447590 171.59155 13.25221 3.295905 Q15 359.2819
#> 7 179.364910 233.80177 13.01727 10.531877 Q15 330.7972
#> 8 201.708700 90.84219 12.75298 24.772476 Q15 383.3843
#> 9 73.322300 147.30553 14.10094 7.225322 Q15 461.5779
#> 10 127.075035 61.95987 21.58544 3.041668 Q15 459.5809
Most of the time you should only need one function, unless you
are trying to debug or understand the process. The main function is
denoist(), which takes a SpatialExperiment
object (or a matrix with coordinates) plus the transcript data frame as
input. It returns a list containing the memberships, adjusted
counts, and parameters for each gene.
The distance parameter specifies the maximum distance to
consider for local background estimation. The nbins
parameter specifies the number of bins to use for hexagonal binning,
which is used for calculating background transcript contamination. Both
have default values, but you can adjust them to suit your data. For
example, if your sample covers only a small area, a lower
distance and nbins may work better.
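One rough way to judge whether nbins suits a sample is to compare it with the spatial extent of the transcripts. The sketch below uses a made-up transcript table and assumes that nbins splits the x-range into that many bins, with square rather than hexagonal bins for simplicity. Both are assumptions made here for illustration and are not taken from the DenoIST source.

```r
# Toy transcript coordinates standing in for the real tx data frame
# (values are simulated, not from the Xenium sample).
set.seed(1)
tx_toy <- data.frame(
  x_location = runif(10000, 0, 2000),
  y_location = runif(10000, 0, 1500)
)

nbins <- 200

# Spatial extent of the transcripts along each axis.
x_extent <- diff(range(tx_toy$x_location))
y_extent <- diff(range(tx_toy$y_location))

# Approximate bin width if the x-range were split into `nbins` bins
# (assumption: this is how nbins is applied; square bins used for simplicity).
bin_width <- x_extent / nbins

# Rough number of bins covering the sample and average transcripts per bin.
approx_bins <- nbins * ceiling(y_extent / bin_width)
tx_per_bin <- nrow(tx_toy) / approx_bins
```

If tx_per_bin comes out very small, most bins would be nearly empty, which may be a hint to try a lower nbins.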
You should also check whether the transcript data frame has the correct
columns. The tx_x and tx_y parameters specify
the column names for the x and y coordinates, respectively. The
feature_label parameter specifies the name of the column that holds the
gene of each transcript. In this example, they are called
x_location, y_location and
feature_name. You can also speed up the process by using more
CPU cores via the cl option (which is highly recommended).
Lastly, you can specify an output directory with out_dir
to save the results automatically. If you don’t want to save them, just leave it
empty.
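Before running denoist(), it can help to confirm that the column names you plan to pass actually exist in the transcript data frame. The sketch below uses a toy table with made-up values. check_tx_columns() is a hypothetical helper written for this example and is not part of DenoIST.

```r
# Toy transcript table mimicking the Xenium column names shown above
# (values are made up for illustration).
tx_toy <- data.frame(
  feature_name = c("COL1A1", "LYZ", "SFTPC"),
  x_location   = c(7.08, 76.88, 201.71),
  y_location   = c(110.41, 196.57, 90.84)
)

# Hypothetical helper (not part of DenoIST): stop early if a required
# column is absent or the coordinate columns are not numeric.
check_tx_columns <- function(tx, tx_x, tx_y, feature_label) {
  needed <- c(tx_x, tx_y, feature_label)
  missing_cols <- setdiff(needed, colnames(tx))
  if (length(missing_cols) > 0) {
    stop("Missing columns in tx: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(is.numeric(tx[[tx_x]]), is.numeric(tx[[tx_y]]))
  invisible(TRUE)
}

check_tx_columns(tx_toy, "x_location", "y_location", "feature_name")
```

The same call on the real tx object would use the names passed to tx_x, tx_y and feature_label.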
# Run DenoIST directly on the SpatialExperiment object
result <- denoist(mat = spe, tx = tx, coords = NULL, tx_x = "x_location", tx_y = "y_location",
                  feature_label = "feature_name", distance = 50, nbins = 200, cl = 12)
Note that this is exactly the same as extracting the matrix and coordinates manually:
# Extract the count matrix and cell coordinates, then pass them separately
count_mat <- assay(spe)
coords <- spatialCoords(spe)
result <- denoist(mat = count_mat, tx = tx, coords = coords, tx_x = "x_location", tx_y = "y_location",
                  feature_label = "feature_name", distance = 50, nbins = 200, cl = 1)
This is useful if you want to run DenoIST on a matrix that is not a
SpatialExperiment object.
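When supplying a matrix and coordinates separately, the two inputs need to describe the same cells. The sketch below builds a toy gene-by-cell matrix (genes in rows and cells in columns, as in the SpatialExperiment above) and checks that there is one coordinate row per cell. The toy objects and the row names on the coordinates are assumptions for illustration. Real coordinate matrices may not carry row names, in which case only the dimension check applies.

```r
# Toy gene-by-cell count matrix (values made up).
count_toy <- matrix(
  c(0, 3, 1, 0, 2, 5),
  nrow = 2,
  dimnames = list(c("ACTA2", "AGER"), c("cell1", "cell2", "cell3"))
)

# Toy centroid coordinates, one row per cell, in the same order as the columns.
coords_toy <- cbind(x_centroid = c(10, 20, 30), y_centroid = c(5, 15, 25))
rownames(coords_toy) <- colnames(count_toy)

# Sanity checks: matching number of cells, matching order, two coordinate columns.
stopifnot(
  ncol(count_toy) == nrow(coords_toy),
  identical(colnames(count_toy), rownames(coords_toy)),
  ncol(coords_toy) == 2
)
```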
The most useful element of result is usually
adjusted_counts.
result$adjusted_counts[1:5, 1:5]
#> 5 x 5 sparse Matrix of class "dgCMatrix"
#> aaaaaaab-1 aaaaaaac-1 aaaaaaad-1 aaaaaaae-1 aaaaaaaf-1
#> ABCC2 . . . . .
#> ACKR1 . . . . .
#> ACTA2 . . . 13 9
#> AGER . . 1 . .
#> AGR3 . . . . .
The memberships and params are useful for
debugging if the adjusted counts are not what you expect.
result$memberships[1:5, 1:5]
#> aaaaaaab-1 aaaaaaac-1 aaaaaaad-1 aaaaaaae-1 aaaaaaaf-1
#> ABCC2 0 0 0 0 0
#> ACKR1 0 0 0 0 0
#> ACTA2 0 0 0 1 1
#> AGER 0 0 1 0 0
#> AGR3 0 0 0 0 0
result$params[[42]]
#> $memberships
#> [1] 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [45] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [89] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [133] 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [177] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [221] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1
#> [265] 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [309] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>
#> $posterior
#> [1] 0.7201987 0.7258556 0.5455571 0.6786789 0.7144710 0.7996963 0.7258556 0.6471338
#> [9] 0.6724878 0.7028072 0.7258556 0.7258556 0.7258556 0.6662361 0.9531125 0.7028072
#> [17] 0.6908731 0.8448978 0.6908731 0.7144710 0.6968734 0.6908731 0.7028072 0.6724878
#> [25] 0.7028072 0.7201987 0.5455571 0.7258556 0.6848078 0.7258556 0.6968734 0.7258556
#> [33] 0.7258556 0.7258556 0.7258556 0.7201987 0.7258556 0.6471338 0.7258556 0.7258556
#> [41] 0.7258556 0.9656349 0.7258556 0.7258556 0.7258556 0.7258556 0.7258556 0.7258556
#> [49] 0.8411598 0.7258556 0.7144710 0.7258556 0.7258556 0.7258556 0.6786789 0.9282427
#> [57] 0.8254747 0.5282724 0.7258556 0.7258556 0.7258556 0.7144710 0.7258556 0.7258556
#> [65] 0.7144710 0.7258556 0.7258556 0.7201987 0.6471338 0.7258556 0.7086735 0.7258556
#> [73] 0.7258556 0.7258556 0.8448978 0.7201987 0.6908731 0.6758313 0.8431829 0.7201987
#> [81] 0.7258556 0.7258556 0.6786789 0.7086735 0.7558684 0.7258556 0.7258556 0.9233165
#> [89] 0.7258556 0.7258556 0.7258556 0.6848078 0.7258556 0.7610434 0.7258556 0.6908731
#> [97] 0.7258556 0.7258556 0.7144710 0.6908731 0.6848078 0.5871575 0.7996963 0.8129271
#> [105] 0.7258556 0.7258556 0.7086735 0.9236264 0.8858805 0.7144710 0.6209235 0.7258556
#> [113] 0.8758555 0.7258556 0.8085931 0.7258556 0.7258556 0.9661364 0.7201987 0.9394897
#> [121] 0.7144710 0.8129271 0.7028072 0.7144710 0.8485637 0.7258556 0.7144710 0.9852349
#> [129] 0.7258556 0.7258556 0.7258556 0.7144710 0.7258556 0.6848078 0.7258556 0.7086735
#> [137] 0.7258556 0.6908731 0.5871575 0.7258556 0.7258556 0.7258556 0.7258556 0.7258556
#> [145] 0.6968734 0.9735373 0.8411598 0.7258556 0.7951330 0.6848078 0.7086735 0.7201987
#> [153] 0.8152308 0.7028072 0.8556817 0.7904930 0.7288739 0.6007819 0.8213675 0.7201987
#> [161] 0.7201987 0.7144710 0.7258556 0.7258556 0.7258556 0.7258556 0.7144710 0.7201987
#> [169] 0.7144710 0.7201987 0.7086735 0.9327380 0.6848078 0.7258556 0.7201987 0.6406567
#> [177] 0.7258556 0.6848078 0.8129271 0.6406567 0.7428118 0.6968734 0.7086735 0.7258556
#> [185] 0.8448978 0.7258556 0.7258556 0.7201987 0.7258556 0.7258556 0.7258556 0.7258556
#> [193] 0.6908731 0.7258556 0.7258556 0.7028072 0.7258556 0.4261710 0.7258556 0.7201987
#> [201] 0.7144710 0.7258556 0.7258556 0.7258556 0.6786789 0.8448978 0.7201987 0.7258556
#> [209] 0.7258556 0.7506180 0.7258556 0.9924496 0.7201987 0.8448978 0.7201987 0.6724878
#> [217] 0.7144710 0.7258556 0.7428118 0.7258556 0.7258556 0.7258556 0.7258556 0.9091725
#> [225] 0.7258556 0.8505093 0.7201987 0.7201987 0.7258556 0.7086735 0.6968734 0.6604173
#> [233] 0.6599253 0.7144710 0.7258556 0.6535573 0.7086735 0.8914688 0.6075374 0.7086735
#> [241] 0.8485637 0.7258556 0.8373490 0.7258556 0.6786789 0.8295070 0.9371001 0.8505093
#> [249] 0.7857763 0.9909515 0.2914157 0.7258556 0.7201987 0.5385451 0.6819891 0.7144710
#> [257] 0.7258556 0.7144710 0.9019275 0.7086735 0.8887053 0.6441532 0.7201987 0.7028072
#> [265] 0.7258556 0.7262917 0.7761126 0.7258556 0.6406567 0.5733983 0.7258556 0.7258556
#> [273] 0.4610173 0.3589379 0.7144710 0.7258556 0.6724878 0.7086735 0.6968734 0.6209235
#> [281] 0.7258556 0.7258556 0.6724878 0.2266190 0.4437953 0.6275495 0.6662361 0.6968734
#> [289] 0.8556817 0.7201987 0.8739496 0.7258556 0.7258556 0.7258556 0.8505093 0.7258556
#> [297] 0.6908731 0.7258556 0.8417449 0.8041829 0.9344887 0.7258556 0.9212923 0.6786789
#> [305] 0.6599253 0.7996963 0.6142521 0.7258556 0.7258556 0.7258556 0.6142521 0.7201987
#> [313] 0.7201987 0.7258556 0.7201987 0.7258556 0.7201987 0.8334650 0.7258556 0.7258556
#> [321] 0.7258556 0.7258556 0.7258556 0.8448978 0.8521581 0.7118059 0.7258556 0.9222318
#> [329] 0.7144710 0.7028072 0.6908731 0.9980048 0.7258556 0.7258556 0.7144710 0.7144710
#> [337] 0.8591352 0.7201987 0.7258556 0.9137313 0.9865918 0.7000789 0.6848078
#>
#> $lambda1
#> [1] 0.04887504
#>
#> $lambda2
#> [1] 0.02062609
#>
#> $pi
#> [1] 0.7314639
#>
#> $log_lik
#> [1] -193.9758
We can also quickly visualise the difference.
# A custom helper function for plotting one feature over the cell coordinates
plot_feature_scatter <- function(coords, mat, feature, size = 0.1, output_filename = NULL) {
  # Create a data frame from the coordinates and the chosen feature
  plot_data <- data.frame(x = coords[, 1], y = coords[, 2], feature = mat[feature, ])
  # Create the scatterplot
  p <- ggplot(plot_data, aes(x = x, y = y, color = feature)) +
    geom_point(size = size, alpha = 0.5) + # Small, semi-transparent dots
    theme_minimal() +
    labs(title = feature, x = "X Coordinate", y = "Y Coordinate", color = "Feature") +
    theme(legend.position = "right") +
    scale_color_viridis_c() # Use a color palette with higher contrast
  # Save the plot if an output filename is provided
  if (!is.null(output_filename)) {
    ggsave(output_filename, p, width = 10, height = 8, units = "in", dpi = 300)
  }
  # Return the plot
  return(p)
}
# Compare log-transformed counts before and after adjustment
orig <- plot_feature_scatter(coords = spatialCoords(spe), log(assay(spe) + 1), feature = "EPAS1") + ggtitle('Original')
adj <- plot_feature_scatter(coords = spatialCoords(spe), log(result$adjusted_counts + 1), feature = "EPAS1") + ggtitle('DenoIST adjusted')
(orig + adj) + plot_layout(guides = "collect")
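Alongside the plots, a per-gene numeric summary can show how much the adjustment changed each gene. The sketch below uses two small made-up matrices as stand-ins for assay(spe) and result$adjusted_counts. It assumes both have the same genes and cells in the same order. The zeroed entries are invented for illustration and are not DenoIST output.

```r
# Toy stand-ins for the original and adjusted count matrices (values made up).
orig_toy <- matrix(
  c(4, 0, 13, 2, 1, 9),
  nrow = 2,
  dimnames = list(c("ACTA2", "AGER"), c("c1", "c2", "c3"))
)
adj_toy <- orig_toy
adj_toy["ACTA2", "c1"] <- 0 # pretend these counts were flagged as contamination
adj_toy["AGER", "c2"] <- 0

# Fraction of each gene's total counts removed by the adjustment.
# Genes with zero total counts would give NaN and may need filtering first.
removed_frac <- 1 - rowSums(adj_toy) / rowSums(orig_toy)
sort(removed_frac, decreasing = TRUE)
```

Genes with a large removed fraction are natural candidates to inspect with the plotting helper above.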