Denoising Image-based Spatial Transcriptomics data with DenoIST

Aaron Kwok

2025-05-16

Introduction

DenoIST (Denoising Image-based Spatial Transcriptomics) is a method for identifying and removing contamination artefacts in image-based spatial transcriptomics (IST) data. This vignette shows how to use it with either a SpatialExperiment object or a count matrix with the coordinates supplied as a separate input.

Load data

For demonstration, we will use a small Xenium sample from a lung fibrosis study (Vannan & Lyu et al., 2025). The data can be downloaded from GEO under accession GSE250346.

From the raw Xenium output, we can then construct a SpatialExperiment object with SpatialExperimentIO.

suppressPackageStartupMessages({
  library(DenoIST)
  library(SpatialExperiment)
  library(SpatialExperimentIO)  # provides readXeniumSXE()
  library(ggplot2)
  library(patchwork)
})
# Path to the raw Xenium output folder (replace with your own location)
dir <- "/mnt/beegfs/mccarthy/backed_up/general/rlyu/Dataset/LFST_2022/GEO_2025/VUHD116A/relabel_output-XETG00048__0003817__VUHD116A__20230308__003730/outs/"
spe <- readXeniumSXE(dir, returnType = "SPE")
saveRDS(spe, "example_spe.rds")

For this vignette, we will use a pre-saved SpatialExperiment object generated from the code above.

spe <- readRDS('example_spe.rds')
spe
#> class: SpatialExperiment 
#> dim: 343 12915 
#> metadata(4): experiment.xenium transcripts cell_boundaries nucleus_boundaries
#> assays(1): counts
#> rownames(343): ABCC2 ACKR1 ... YAP1 ZEB1
#> rowData names(3): ID Symbol Type
#> colnames(12915): aaaaaaab-1 aaaaaaac-1 ... aaaadchc-1 aaaadchd-1
#> colData names(10): cell_id transcript_counts ... nucleus_area sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(3): NegControlProbe UnassignedCodeword NegControlCodeword
#> spatialCoords names(2) : x_centroid y_centroid
#> imgData names(0):

Since the transcript file is too big to upload, we will read in a very small subset here to show what the format looks like:

tx <- readRDS('tx_sub.rds')
tx
#>     X            transcript_id             cell_id overlaps_nucleus            feature_name
#> 1   1 VUHD116A_281474976711003 VUHD116A_UNASSIGNED                0 NegControlCodeword_0517
#> 2   2 VUHD116A_281474976711398 VUHD116A_UNASSIGNED                0                  COL1A1
#> 3   3 VUHD116A_281474976711406 VUHD116A_UNASSIGNED                0                     LYZ
#> 4   4 VUHD116A_281474976711409 VUHD116A_UNASSIGNED                0                   LAMP3
#> 5   5 VUHD116A_281474976711415 VUHD116A_UNASSIGNED                0                     LYZ
#> 6   6 VUHD116A_281474976711418 VUHD116A_UNASSIGNED                0                     LYZ
#> 7   7 VUHD116A_281474976711421 VUHD116A_UNASSIGNED                0                   NUCB2
#> 8   8 VUHD116A_281474976712029 VUHD116A_UNASSIGNED                0                   SFTPC
#> 9   9 VUHD116A_281474976712842 VUHD116A_UNASSIGNED                0                  COL1A2
#> 10 10 VUHD116A_281474976712849 VUHD116A_UNASSIGNED                0                  FGFBP2
#>    x_location y_location z_location        qv fov_name nucleus_distance
#> 1  202.896320  184.33257   17.63912  2.910288      Q15         329.0720
#> 2    7.082226  110.41018   12.68721 40.000000      Q15         537.2145
#> 3   76.882706  196.56784   12.74130 40.000000      Q15         439.8191
#> 4  101.830030  107.37660   13.04323 40.000000      Q15         455.0005
#> 5  149.067670  231.72057   12.96732 10.863165      Q15         360.2229
#> 6  175.447590  171.59155   13.25221  3.295905      Q15         359.2819
#> 7  179.364910  233.80177   13.01727 10.531877      Q15         330.7972
#> 8  201.708700   90.84219   12.75298 24.772476      Q15         383.3843
#> 9   73.322300  147.30553   14.10094  7.225322      Q15         461.5779
#> 10 127.075035   61.95987   21.58544  3.041668      Q15         459.5809

Denoising the data

In most cases you only need one function, denoist(); the others are mainly useful for debugging or for understanding the process. denoist() takes a SpatialExperiment object (or a count matrix plus coordinates) and the transcript data frame as input. It returns a list containing the memberships, the adjusted counts, and the parameters for each gene.

The distance parameter sets the maximum distance considered for local background estimation. The nbins parameter sets the number of bins used for hexagonal binning, which is used to calculate background transcript contamination. Both have default values, but you can adjust them to suit your data. For example, if your sample covers only a small area, lower values of distance and nbins may work better.
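As a rough way to judge whether distance and nbins are sensible for a given sample, the sketch below compares them with the spatial extent of the cells. It uses toy coordinates in place of spatialCoords(spe), and it assumes that nbins counts bins across the x range (as the xbins argument of the hexbin package does); this vignette does not confirm that DenoIST defines it the same way, so treat the bin width as an approximation only.

```r
# Toy centroids standing in for spatialCoords(spe).
# Units follow the input coordinates.
set.seed(1)
coords <- cbind(
  x_centroid = runif(500, 0, 2000),
  y_centroid = runif(500, 0, 1500)
)

distance <- 50
nbins <- 200

# Spatial extent of the sample along each axis
extent <- apply(coords, 2, function(v) diff(range(v)))

# ASSUMPTION: nbins is the number of bins across the x range
bin_width <- extent[["x_centroid"]] / nbins

# Number of other cells within `distance` of the first cell
centre <- matrix(coords[1, ], nrow(coords), 2, byrow = TRUE)
d <- sqrt(rowSums((coords - centre)^2))
n_neighbours <- sum(d <= distance) - 1

extent
bin_width
n_neighbours
```

If the neighbourhood contains very few cells, or the bin width is much smaller than a typical cell, the values are probably too low for the sample.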

You should also check that the transcript data frame has the correct columns. The tx_x and tx_y parameters give the column names for the x and y coordinates, respectively, and the feature_label parameter gives the column name holding the gene of each transcript. In this example, they are called x_location, y_location and feature_name. You can also speed up the process by using more CPU cores via the cl option, which is highly recommended.
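The column check described above can be sketched in base R. check_tx_columns() is a hypothetical helper written for this illustration, not a DenoIST function, and the toy data frame only mimics the column names used in this vignette.

```r
# Toy transcript table using the column names from this vignette
tx_toy <- data.frame(
  feature_name = c("COL1A1", "LYZ", "LAMP3"),
  x_location = c(7.08, 76.88, 101.83),
  y_location = c(110.41, 196.57, 107.38)
)

# Hypothetical helper (not part of DenoIST): stop early if a column
# named in tx_x, tx_y or feature_label is absent or non-numeric.
check_tx_columns <- function(tx, tx_x, tx_y, feature_label) {
  required <- c(tx_x, tx_y, feature_label)
  missing_cols <- setdiff(required, colnames(tx))
  if (length(missing_cols) > 0) {
    stop("Transcript data frame is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  # Coordinates must be numeric
  stopifnot(is.numeric(tx[[tx_x]]), is.numeric(tx[[tx_y]]))
  invisible(TRUE)
}

check_tx_columns(tx_toy, "x_location", "y_location", "feature_name")
```

Running the same check on the real tx object before calling denoist() gives a clearer error message than a failure deep inside the run.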

Lastly, you can set out_dir to an output directory to save the results automatically. If you don’t want to save them, just leave it empty.

# DenoIST
result <- denoist(
  mat = spe,
  tx = tx,
  coords = NULL,
  tx_x = "x_location",
  tx_y = "y_location",
  feature_label = "feature_name",
  distance = 50,
  nbins = 200,
  cl = 12
)

Note that this is exactly the same as extracting the matrix and coordinates manually:

count_mat <- assay(spe)
coords <- spatialCoords(spe)

result <- denoist(
  mat = count_mat,
  tx = tx,
  coords = coords,
  tx_x = "x_location",
  tx_y = "y_location",
  feature_label = "feature_name",
  distance = 50,
  nbins = 200,
  cl = 1
)

This is useful if you want to run DenoIST on a matrix that is not a SpatialExperiment object.
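When the matrix and coordinates come from separate sources, it is worth confirming that they describe the same cells in the same order. The sketch below uses toy inputs and assumes that denoist() pairs matrix columns with coordinate rows by position; the vignette does not state this explicitly, so the reordering step is a precaution rather than a documented requirement.

```r
# Toy inputs: 3 genes x 4 cells, and one coordinate row per cell
count_mat <- matrix(
  c(0, 2, 1, 0, 5, 0, 3, 1, 0, 0, 4, 2),
  nrow = 3,
  dimnames = list(c("ACTA2", "AGER", "LYZ"), paste0("cell", 1:4))
)
coords <- cbind(x = c(10, 20, 30, 40), y = c(5, 15, 25, 35))
# Deliberately shuffled to show the reordering
rownames(coords) <- c("cell3", "cell1", "cell4", "cell2")

# Both objects must contain the same set of cells
stopifnot(setequal(rownames(coords), colnames(count_mat)))

# Reorder the coordinate rows to match the matrix columns
coords <- coords[colnames(count_mat), , drop = FALSE]
stopifnot(identical(rownames(coords), colnames(count_mat)))

coords
```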

Check results

The most useful element of result is usually adjusted_counts, the count matrix after denoising.

result <- readRDS('result.rds')
result$adjusted_counts[1:5, 1:5]
#> 5 x 5 sparse Matrix of class "dgCMatrix"
#>       aaaaaaab-1 aaaaaaac-1 aaaaaaad-1 aaaaaaae-1 aaaaaaaf-1
#> ABCC2          .          .          .          .          .
#> ACKR1          .          .          .          .          .
#> ACTA2          .          .          .         13          9
#> AGER           .          .          1          .          .
#> AGR3           .          .          .          .          .
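To see which genes are most affected, the original and adjusted matrices can be compared gene by gene. This is a minimal sketch on made-up toy matrices, not the vignette data. With the real objects you would pass assay(spe) and result$adjusted_counts instead, and because the adjusted counts are a sparse dgCMatrix, rowSums would need to come from the Matrix package.

```r
genes <- c("ACTA2", "AGER", "LYZ")
cells <- paste0("cell", 1:4)

# Toy counts (made up for illustration), genes in rows, cells in columns
original <- matrix(
  c(0, 1, 2,  13, 0, 1,  9, 1, 0,  3, 0, 4),
  nrow = 3, dimnames = list(genes, cells)
)
adjusted <- matrix(
  c(0, 1, 0,  13, 0, 0,  9, 0, 0,  0, 0, 4),
  nrow = 3, dimnames = list(genes, cells)
)

# Counts removed per gene, and the fraction of each gene's total
removed <- rowSums(original) - rowSums(adjusted)
frac_removed <- ifelse(rowSums(original) > 0,
                       removed / rowSums(original), 0)

summary_df <- data.frame(
  gene = genes,
  removed = removed,
  frac_removed = round(frac_removed, 2)
)
# Most affected genes first
summary_df[order(-summary_df$frac_removed), ]
```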

The memberships and params are useful for debugging if the adjusted counts are not what you expect.

result$memberships[1:5, 1:5]
#>       aaaaaaab-1 aaaaaaac-1 aaaaaaad-1 aaaaaaae-1 aaaaaaaf-1
#> ABCC2          0          0          0          0          0
#> ACKR1          0          0          0          0          0
#> ACTA2          0          0          0          1          1
#> AGER           0          0          1          0          0
#> AGR3           0          0          0          0          0
result$params[[42]]
#> $memberships
#>   [1] 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [45] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [89] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [133] 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [177] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [221] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1
#> [265] 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [309] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 
#> $posterior
#>   [1] 0.7201987 0.7258556 0.5455571 0.6786789 0.7144710 0.7996963 0.7258556 0.6471338
#>   [9] 0.6724878 0.7028072 0.7258556 0.7258556 0.7258556 0.6662361 0.9531125 0.7028072
#>  [17] 0.6908731 0.8448978 0.6908731 0.7144710 0.6968734 0.6908731 0.7028072 0.6724878
#>  [25] 0.7028072 0.7201987 0.5455571 0.7258556 0.6848078 0.7258556 0.6968734 0.7258556
#>  [33] 0.7258556 0.7258556 0.7258556 0.7201987 0.7258556 0.6471338 0.7258556 0.7258556
#>  [41] 0.7258556 0.9656349 0.7258556 0.7258556 0.7258556 0.7258556 0.7258556 0.7258556
#>  [49] 0.8411598 0.7258556 0.7144710 0.7258556 0.7258556 0.7258556 0.6786789 0.9282427
#>  [57] 0.8254747 0.5282724 0.7258556 0.7258556 0.7258556 0.7144710 0.7258556 0.7258556
#>  [65] 0.7144710 0.7258556 0.7258556 0.7201987 0.6471338 0.7258556 0.7086735 0.7258556
#>  [73] 0.7258556 0.7258556 0.8448978 0.7201987 0.6908731 0.6758313 0.8431829 0.7201987
#>  [81] 0.7258556 0.7258556 0.6786789 0.7086735 0.7558684 0.7258556 0.7258556 0.9233165
#>  [89] 0.7258556 0.7258556 0.7258556 0.6848078 0.7258556 0.7610434 0.7258556 0.6908731
#>  [97] 0.7258556 0.7258556 0.7144710 0.6908731 0.6848078 0.5871575 0.7996963 0.8129271
#> [105] 0.7258556 0.7258556 0.7086735 0.9236264 0.8858805 0.7144710 0.6209235 0.7258556
#> [113] 0.8758555 0.7258556 0.8085931 0.7258556 0.7258556 0.9661364 0.7201987 0.9394897
#> [121] 0.7144710 0.8129271 0.7028072 0.7144710 0.8485637 0.7258556 0.7144710 0.9852349
#> [129] 0.7258556 0.7258556 0.7258556 0.7144710 0.7258556 0.6848078 0.7258556 0.7086735
#> [137] 0.7258556 0.6908731 0.5871575 0.7258556 0.7258556 0.7258556 0.7258556 0.7258556
#> [145] 0.6968734 0.9735373 0.8411598 0.7258556 0.7951330 0.6848078 0.7086735 0.7201987
#> [153] 0.8152308 0.7028072 0.8556817 0.7904930 0.7288739 0.6007819 0.8213675 0.7201987
#> [161] 0.7201987 0.7144710 0.7258556 0.7258556 0.7258556 0.7258556 0.7144710 0.7201987
#> [169] 0.7144710 0.7201987 0.7086735 0.9327380 0.6848078 0.7258556 0.7201987 0.6406567
#> [177] 0.7258556 0.6848078 0.8129271 0.6406567 0.7428118 0.6968734 0.7086735 0.7258556
#> [185] 0.8448978 0.7258556 0.7258556 0.7201987 0.7258556 0.7258556 0.7258556 0.7258556
#> [193] 0.6908731 0.7258556 0.7258556 0.7028072 0.7258556 0.4261710 0.7258556 0.7201987
#> [201] 0.7144710 0.7258556 0.7258556 0.7258556 0.6786789 0.8448978 0.7201987 0.7258556
#> [209] 0.7258556 0.7506180 0.7258556 0.9924496 0.7201987 0.8448978 0.7201987 0.6724878
#> [217] 0.7144710 0.7258556 0.7428118 0.7258556 0.7258556 0.7258556 0.7258556 0.9091725
#> [225] 0.7258556 0.8505093 0.7201987 0.7201987 0.7258556 0.7086735 0.6968734 0.6604173
#> [233] 0.6599253 0.7144710 0.7258556 0.6535573 0.7086735 0.8914688 0.6075374 0.7086735
#> [241] 0.8485637 0.7258556 0.8373490 0.7258556 0.6786789 0.8295070 0.9371001 0.8505093
#> [249] 0.7857763 0.9909515 0.2914157 0.7258556 0.7201987 0.5385451 0.6819891 0.7144710
#> [257] 0.7258556 0.7144710 0.9019275 0.7086735 0.8887053 0.6441532 0.7201987 0.7028072
#> [265] 0.7258556 0.7262917 0.7761126 0.7258556 0.6406567 0.5733983 0.7258556 0.7258556
#> [273] 0.4610173 0.3589379 0.7144710 0.7258556 0.6724878 0.7086735 0.6968734 0.6209235
#> [281] 0.7258556 0.7258556 0.6724878 0.2266190 0.4437953 0.6275495 0.6662361 0.6968734
#> [289] 0.8556817 0.7201987 0.8739496 0.7258556 0.7258556 0.7258556 0.8505093 0.7258556
#> [297] 0.6908731 0.7258556 0.8417449 0.8041829 0.9344887 0.7258556 0.9212923 0.6786789
#> [305] 0.6599253 0.7996963 0.6142521 0.7258556 0.7258556 0.7258556 0.6142521 0.7201987
#> [313] 0.7201987 0.7258556 0.7201987 0.7258556 0.7201987 0.8334650 0.7258556 0.7258556
#> [321] 0.7258556 0.7258556 0.7258556 0.8448978 0.8521581 0.7118059 0.7258556 0.9222318
#> [329] 0.7144710 0.7028072 0.6908731 0.9980048 0.7258556 0.7258556 0.7144710 0.7144710
#> [337] 0.8591352 0.7201987 0.7258556 0.9137313 0.9865918 0.7000789 0.6848078
#> 
#> $lambda1
#> [1] 0.04887504
#> 
#> $lambda2
#> [1] 0.02062609
#> 
#> $pi
#> [1] 0.7314639
#> 
#> $log_lik
#> [1] -193.9758
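When debugging, a quick first step is to compare the posterior values with the membership calls. The sketch below copies the first eight memberships and posterior values from the output printed above into a small list, so that it runs without the result object. It does not assume any particular cut-off; it only tabulates what is there.

```r
# First eight entries of result$params[[42]], copied from the output above
p <- list(
  memberships = c(1, 1, 0, 1, 1, 1, 1, 1),
  posterior = c(0.7201987, 0.7258556, 0.5455571, 0.6786789,
                0.7144710, 0.7996963, 0.7258556, 0.6471338)
)

# How many entries fall in each membership class
table(p$memberships)

# Range of posterior values within each class
tapply(p$posterior, p$memberships, range)
```

With the real object, replace p with result$params[[42]] to run the same summary over all entries.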

We can also quickly visualise the difference.

# a custom helper function for plotting
plot_feature_scatter <- function(coords, mat, feature, size = 0.1, output_filename = NULL) {
  # Create a data frame from the coordinates and features
  plot_data <- data.frame(x = coords[, 1], y = coords[, 2], feature = mat[feature,])
  
  # Create the scatterplot
  p <- ggplot(plot_data, aes(x = x, y = y, color = feature)) +
    geom_point(size = size, alpha = 0.5) +  # Make the dots smaller
    theme_minimal() +
    labs(title = feature, x = "X Coordinate", y = "Y Coordinate", color = "Feature") +
    theme(legend.position = "right") +
    scale_color_viridis_c()  # Use a color palette with higher contrast
  
  # Save the plot if an output filename is provided
  if (!is.null(output_filename)) {
    ggsave(output_filename, p, width = 10, height = 8, units = "in", dpi = 300)
  }
  
  # Return the plot
  return(p)
}
# log1p(x) is equivalent to log(x + 1) and keeps sparse matrices sparse
orig <- plot_feature_scatter(
  coords = spatialCoords(spe),
  mat = log1p(assay(spe)),
  feature = "EPAS1"
) + ggtitle("Original")
adj <- plot_feature_scatter(
  coords = spatialCoords(spe),
  mat = log1p(result$adjusted_counts),
  feature = "EPAS1"
) + ggtitle("DenoIST adjusted")
(orig + adj) + plot_layout(guides = "collect")