Last updated: 2025-03-18

Checks: 6 1

Knit directory: cf-eti-bal/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20240504)

The command set.seed(20240504) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: adc05cc

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version adc05cc. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .pki/
    Ignored:    cf-eti-bal/.DS_Store
    Ignored:    cf-eti-bal/.RData
    Ignored:    cf-eti-bal/analysis/.DS_Store
    Ignored:    cf-eti-bal/analysis/.Rhistory
    Ignored:    cf-eti-bal/analysis/ADGRE_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/ADGRE_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/ANNEXIN_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/ANNEXIN_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/APP_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/APP_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/ApoE_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/ApoE_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/CD45_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/CD45_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/CypA_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/CypA_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/FN1_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/FN1_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/GALECTIN_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/GALECTIN_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/GRN_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/GRN_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/MHC-II_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/MHC-II_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/MHC-I_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/MHC-I_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/MIF_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/MIF_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/MK_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/MK_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/PECAM1_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/PECAM1_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/RESISTIN_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/RESISTIN_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/VISFATIN_hierarchy_aggregate.png
    Ignored:    cf-eti-bal/analysis/VISFATIN_hierarchy_individual.png
    Ignored:    cf-eti-bal/analysis/figure/
    Ignored:    cf-eti-bal/data/.DS_Store

Untracked files:
    Untracked:  .Renviron
    Untracked:  .Rprofile
    Untracked:  ._.DS_Store
    Untracked:  ._G000323_Neeland
    Untracked:  .cache/
    Untracked:  .conda/
    Untracked:  .gitattributes
    Untracked:  .gitignore
    Untracked:  .local/
    Untracked:  .subversion/
    Untracked:  C133_Neeland_batch1/
    Untracked:  CF-BAL/
    Untracked:  CF-ETI-scRNA/
    Untracked:  Carraro_2020/
    Untracked:  G000323_Neeland/
    Untracked:  Li_2022/
    Untracked:  Loske_2021/
    Untracked:  SingleCellExperiment_tutorial/
    Untracked:  archive_CF-ETI-scRNA/
    Untracked:  archive_G000323_Neeland/
    Untracked:  archive_cf-eti-bal/
    Untracked:  backup/
    Untracked:  cellranger/
    Untracked:  cf-eti-bal/._.DS_Store
    Untracked:  cf-eti-bal/._RecM.markers.csv
    Untracked:  cf-eti-bal/._Rplots.pdf
    Untracked:  cf-eti-bal/._TRM-CCL.phate.markers.csv
    Untracked:  cf-eti-bal/._TRM-CCL.phate.markers_v250318.csv
    Untracked:  cf-eti-bal/._TRM.markers.csv
    Untracked:  cf-eti-bal/._eti.seurat.metadata.csv
    Untracked:  cf-eti-bal/._pairedRecMProp.csv
    Untracked:  cf-eti-bal/ETI.profibroticScore.fig.jpeg
    Untracked:  cf-eti-bal/RecM.markers.csv
    Untracked:  cf-eti-bal/Rplots.pdf
    Untracked:  cf-eti-bal/TRM-CCL.phate.markers.csv
    Untracked:  cf-eti-bal/TRM-CCL.phate.markers_v250318.csv
    Untracked:  cf-eti-bal/TRM.markers.csv
    Untracked:  cf-eti-bal/analysis/._.DS_Store
    Untracked:  cf-eti-bal/analysis/._04.annotation.Rmd
    Untracked:  cf-eti-bal/analysis/._06.TRM_analysis.Rmd
    Untracked:  cf-eti-bal/analysis/._10.ETI_analysis.Rmd
    Untracked:  cf-eti-bal/analysis/01.demux_BP1.Rmd
    Untracked:  cf-eti-bal/analysis/01.demux_BP2.Rmd
    Untracked:  cf-eti-bal/analysis/01.demux_Pi1.Rmd
    Untracked:  cf-eti-bal/analysis/01.demux_capture1.Rmd
    Untracked:  cf-eti-bal/analysis/02.qc_BP1.Rmd
    Untracked:  cf-eti-bal/analysis/02.qc_BP2.Rmd
    Untracked:  cf-eti-bal/analysis/02.qc_Pi1.Rmd
    Untracked:  cf-eti-bal/analysis/02.qc_capture1.Rmd
    Untracked:  cf-eti-bal/analysis/03.combine.Rmd
    Untracked:  cf-eti-bal/analysis/04.annotation.Rmd
    Untracked:  cf-eti-bal/analysis/05.RecM_analysis.v2.Rmd
    Untracked:  cf-eti-bal/analysis/06.TRM_analysis.Rmd
    Untracked:  cf-eti-bal/analysis/06.TRM_analysis.v2.Rmd
    Untracked:  cf-eti-bal/analysis/07.Epithelial_analysis.Rmd
    Untracked:  cf-eti-bal/analysis/08.TNK_analysis.Rmd
    Untracked:  cf-eti-bal/analysis/09.CD8T_analysis.Rmd
    Untracked:  cf-eti-bal/analysis/09.Others_analysis.Rmd
    Untracked:  cf-eti-bal/analysis/09.cellchat.Rmd
    Untracked:  cf-eti-bal/analysis/09.cellchat.v2.Rmd
    Untracked:  cf-eti-bal/analysis/10.ETI_analysis.Rmd
    Untracked:  cf-eti-bal/analysis/11.MDM-PLA2G7_analysis.Rmd
    Untracked:  cf-eti-bal/analysis/12.TRM_public_analysis.Rmd
    Untracked:  cf-eti-bal/analysis/13.TRM-CCL_analysis.Rmd
    Untracked:  cf-eti-bal/analysis/ADGRE_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/ADGRE_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/ANNEXIN_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/ANNEXIN_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/APP_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/APP_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/ApoE_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/ApoE_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/CD45_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/CD45_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/CypA_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/CypA_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/FN1_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/FN1_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/GALECTIN_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/GALECTIN_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/GRN_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/GRN_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/LAMININ_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/MHC-II_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/MHC-II_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/MHC-I_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/MHC-I_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/MIF_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/MIF_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/MK_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/MK_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/PECAM1_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/PECAM1_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/PHATE_check.Rmd
    Untracked:  cf-eti-bal/analysis/RESISTIN_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/RESISTIN_hierarchy_individual.svg
    Untracked:  cf-eti-bal/analysis/TrialandError.Rmd
    Untracked:  cf-eti-bal/analysis/VISFATIN_hierarchy_aggregate.svg
    Untracked:  cf-eti-bal/analysis/VISFATIN_hierarchy_individual.svg
    Untracked:  cf-eti-bal/before.csv
    Untracked:  cf-eti-bal/capture.list
    Untracked:  cf-eti-bal/clustering.pbs
    Untracked:  cf-eti-bal/code/._bal_RecM.integration_clustering.R
    Untracked:  cf-eti-bal/code/adam.process.R
    Untracked:  cf-eti-bal/code/annotation_HLCAv2.R
    Untracked:  cf-eti-bal/code/azimuth.pbs
    Untracked:  cf-eti-bal/code/bal.manual_annot.integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_CD8T.integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_Epithelial.integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_Epithelial.integration_clustering.round2.R
    Untracked:  cf-eti-bal/code/bal_GOenrichment.R
    Untracked:  cf-eti-bal/code/bal_GOenrichment_noETI.R
    Untracked:  cf-eti-bal/code/bal_GOenrichment_v2.R
    Untracked:  cf-eti-bal/code/bal_GOenrichment_v3.R
    Untracked:  cf-eti-bal/code/bal_GOenrichment_v4.R
    Untracked:  cf-eti-bal/code/bal_Others.integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_RecM.integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_RecM.public_integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_RecM_Control.integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_T.integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_T.integration_clustering.round2.R
    Untracked:  cf-eti-bal/code/bal_TNK.integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_TNK.integration_clustering.round2.R
    Untracked:  cf-eti-bal/code/bal_TNK.integration_clustering.round3.R
    Untracked:  cf-eti-bal/code/bal_TRM-CCL.integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_TRM.integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_TRM.public.CF_integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_TRM.public.HC_integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_TRM.public_integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_TRMimmune.integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_mac.integration_clustering.R
    Untracked:  cf-eti-bal/code/bal_mac.merge_clustering.R
    Untracked:  cf-eti-bal/code/bal_mac.merge_clustering.round2.R
    Untracked:  cf-eti-bal/code/bal_others.merge_clustering.R
    Untracked:  cf-eti-bal/code/bal_tnk.merge_clustering.R
    Untracked:  cf-eti-bal/code/bal_tnk.merge_clustering.round2.R
    Untracked:  cf-eti-bal/code/cellbender.pbs
    Untracked:  cf-eti-bal/code/cellbender.sh
    Untracked:  cf-eti-bal/code/cellranger-multi.sh
    Untracked:  cf-eti-bal/code/cellsnp-lite.pbs
    Untracked:  cf-eti-bal/code/cellsnp-lite.sh
    Untracked:  cf-eti-bal/code/cellsnp-lite_emptyDrops.pbs
    Untracked:  cf-eti-bal/code/cellsnp-lite_emptyDrops.sh
    Untracked:  cf-eti-bal/code/clustering.R
    Untracked:  cf-eti-bal/code/clustering_try.R
    Untracked:  cf-eti-bal/code/integration.R
    Untracked:  cf-eti-bal/code/integration_batch.R
    Untracked:  cf-eti-bal/code/integration_batchID.R
    Untracked:  cf-eti-bal/code/integration_clustering.R
    Untracked:  cf-eti-bal/code/mac.subcluster.R
    Untracked:  cf-eti-bal/code/qsub_azimuth.sh
    Untracked:  cf-eti-bal/code/qsub_cellbender.sh
    Untracked:  cf-eti-bal/code/qsub_cellsnp-lite.sh
    Untracked:  cf-eti-bal/code/qsub_cellsnp-lite_emptyDrops.sh
    Untracked:  cf-eti-bal/code/qsub_vireo.sh
    Untracked:  cf-eti-bal/code/qsub_vireo_emptyDrops.sh
    Untracked:  cf-eti-bal/code/vireo.pbs
    Untracked:  cf-eti-bal/code/vireo.sh
    Untracked:  cf-eti-bal/code/vireo_emptyDrops.pbs
    Untracked:  cf-eti-bal/code/vireo_emptyDrops.sh
    Untracked:  cf-eti-bal/code/writeDE.R
    Untracked:  cf-eti-bal/data.use.csv
    Untracked:  cf-eti-bal/data/._.DS_Store
    Untracked:  cf-eti-bal/data/._Fig2B.raw.csv
    Untracked:  cf-eti-bal/data/._Fig2B.stat.CFvsHC.csv
    Untracked:  cf-eti-bal/data/._Fig2B.stat.csv
    Untracked:  cf-eti-bal/data/._Fig3A.raw.csv
    Untracked:  cf-eti-bal/data/._Fig3A.stat.csv
    Untracked:  cf-eti-bal/data/._Li.stat.csv
    Untracked:  cf-eti-bal/data/._Liao.stat.csv
    Untracked:  cf-eti-bal/data/._Morrell.stat.csv
    Untracked:  cf-eti-bal/data/._TRM.phate.Cepo.markers.csv
    Untracked:  cf-eti-bal/data/._TableE2.csv
    Untracked:  cf-eti-bal/data/._Wendisch.stat.csv
    Untracked:  cf-eti-bal/data/._cftr.exp.csv
    Untracked:  cf-eti-bal/data/CellChat/
    Untracked:  cf-eti-bal/data/DE/
    Untracked:  cf-eti-bal/data/Fig2B.raw.csv
    Untracked:  cf-eti-bal/data/Fig2B.stat.CFvsHC.csv
    Untracked:  cf-eti-bal/data/Fig2B.stat.csv
    Untracked:  cf-eti-bal/data/Fig3A.raw.csv
    Untracked:  cf-eti-bal/data/Fig3A.stat.csv
    Untracked:  cf-eti-bal/data/FindAllMarkers/
    Untracked:  cf-eti-bal/data/GOBP/
    Untracked:  cf-eti-bal/data/Li.stat.csv
    Untracked:  cf-eti-bal/data/Liao.stat.csv
    Untracked:  cf-eti-bal/data/Morrell.stat.csv
    Untracked:  cf-eti-bal/data/SCEs/
    Untracked:  cf-eti-bal/data/TRM.phate.Cepo.markers.csv
    Untracked:  cf-eti-bal/data/TableE2.csv
    Untracked:  cf-eti-bal/data/Wendisch.stat.csv
    Untracked:  cf-eti-bal/data/cellbender/
    Untracked:  cf-eti-bal/data/cellranger/
    Untracked:  cf-eti-bal/data/cellsnp-lite/
    Untracked:  cf-eti-bal/data/cellsnp-lite_emptyDrops/
    Untracked:  cf-eti-bal/data/cftr.exp.csv
    Untracked:  cf-eti-bal/data/emptyDrops/
    Untracked:  cf-eti-bal/data/pathway/
    Untracked:  cf-eti-bal/data/plots/
    Untracked:  cf-eti-bal/data/public_datasets/
    Untracked:  cf-eti-bal/data/sample_sheets/
    Untracked:  cf-eti-bal/data/tradeSeq/
    Untracked:  cf-eti-bal/data/tradeseq/
    Untracked:  cf-eti-bal/data/vireo/
    Untracked:  cf-eti-bal/data/vireo_emptyDrops/
    Untracked:  cf-eti-bal/eti.seurat.metadata.csv
    Untracked:  cf-eti-bal/integration.pbs
    Untracked:  cf-eti-bal/minmax.csv
    Untracked:  cf-eti-bal/notes/
    Untracked:  cf-eti-bal/obsolete/
    Untracked:  cf-eti-bal/pairedRecMProp.csv
    Untracked:  cf-eti-bal/pairedRecMProp.jpeg
    Untracked:  cf-eti-bal/phate_batch.jpeg
    Untracked:  cf-eti-bal/phate_dens_batch.jpeg
    Untracked:  cf-eti-bal/phate_dens_condition.jpeg
    Untracked:  cf-eti-bal/phate_fastMNN.jpeg
    Untracked:  cf-eti-bal/qsub_clustering.sh
    Untracked:  cf-eti-bal/qsub_integration.sh
    Untracked:  cf-eti-bal/queue_log/
    Untracked:  cf-eti-bal/vireo.list
    Untracked:  cf-eti-bal/workflowr.sh
    Untracked:  cf-eti-bal/zscore.csv
    Untracked:  cf-pbmc-bal/
    Untracked:  chapter3/
    Untracked:  chapter4/
    Untracked:  healthy-bal/
    Untracked:  presentation/

Staged changes:
    New:        cf-eti-bal/.Rprofile
    New:        cf-eti-bal/.gitattributes
    New:        cf-eti-bal/.gitignore
    New:        cf-eti-bal/README.md
    New:        cf-eti-bal/_workflowr.yml
    New:        cf-eti-bal/analysis/about.Rmd
    New:        cf-eti-bal/analysis/index.Rmd
    New:        cf-eti-bal/analysis/license.Rmd
    New:        cf-eti-bal/cf-eti-bal.Rproj
    New:        cf-eti-bal/code/README.md
    New:        cf-eti-bal/data/README.md
    New:        cf-eti-bal/output/README.md

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
xaringanExtra::use_panelset()

1 Load packages

suppressPackageStartupMessages({
library(DropletUtils)
library(here)
library(ggplot2)
library(Seurat)
library(cowplot)
library(patchwork)
library(scater)
library(dplyr)
library(vcfR)
library(janitor)
library(stringr)
})
set.seed(1990)

2 Set names for this pool

# Specify batch name
batch_name <- "BP1"
num_of_captures <- 3

# Specify capture name
capture_names <- c(paste0(batch_name, "-c",1:num_of_captures))
capture_names <- setNames(capture_names, capture_names)

# Assign sample ID to HTO ID
# Manually list sample names matching HTO id (in alphabetical order)
samples <- c("M1C198B", #HTO3
             "M1C208A", #HTO6
             "M1C166(1)", #HTO7
             "M1C199B", #HTO8
             "M1C207B", #HTO10
             "M1N087", #HTO12
             "M1N080" #HTO13
             )

3 Read CellBender outputs

Ambient RNA and empty droplets were removed by CellBender (see code/cellbender.sh).

captures <- setNames(
  here("data",
       "cellbender",
       capture_names,
       paste0(capture_names,".cellbender_filtered.h5")),
  capture_names)

sce <- read10xCounts(samples = captures, col.names = TRUE)
stopifnot(!anyDuplicated(colnames(sce)))

sce <- splitAltExps(
  sce,
  rowData(sce)$Type,
  "Gene Expression")

# Tidy up colData
sce$Capture <- factor(sce$Sample)
capture_names <- levels(sce$Capture)
capture_names <- setNames(capture_names, capture_names)
sce$Sample <- NULL

# Number of droplets after CellBender
dim(sce)

[1] 36601 62050

4 HTO demultiplexing

HTODemux from the Seurat package was used to demultiplex droplets. Demultiplexing was performed separately on each capture.

4.1 Prepare HTO data

# change DelayedMatrix to dgCMatrix
assay(sce, "counts") <- as(assay(sce, "counts"), "dgCMatrix")

# Read features.csv used in cellranger multi
features <- read.csv(here("data",
                          "sample_sheets",
                          paste0(batch_name,".features.csv")))

# HTO id are always in alphabetical order
features

            id         name read pattern        sequence     feature_type
1  Human_HTO_3  Human_HTO_3   R2  5P(BC) TTCCGCCTCTCTTTG Antibody Capture
2  Human_HTO_6  Human_HTO_6   R2  5P(BC) GGTTGCCAGATGTCA Antibody Capture
3  Human_HTO_7  Human_HTO_7   R2  5P(BC) TGTCTTTCCTGCCAG Antibody Capture
4  Human_HTO_8  Human_HTO_8   R2  5P(BC) CTCCTCTGCAATTAC Antibody Capture
5 Human_HTO_10 Human_HTO_10   R2  5P(BC) ATTGACCCGCGTTAG Antibody Capture
6 Human_HTO_12 Human_HTO_12   R2  5P(BC) TAACGACCAGCCATA Antibody Capture
7 Human_HTO_13 Human_HTO_13   R2  5P(BC) AAATCTCTCAGGCTC Antibody Capture

# Rename assay
is_hto <- grepl("^Human_HTO", rownames(altExp(sce, "Antibody Capture")))
altExp(sce, "HTO") <- altExp(sce, "Antibody Capture")[is_hto, ]
altExp(sce, "Antibody Capture") <- NULL

4.2 HTO Demultiplexing

# load in umi and hto matrix
umis <- counts(sce)
htos <- counts(altExp(sce))

# Select cell barcodes detected by both RNA and HTO
joint.bcs <- intersect(colnames(umis), colnames(htos))

# Subset RNA and HTO counts by joint cell barcodes
umis <- umis[, joint.bcs]
htos <- as.matrix(htos[, joint.bcs])

# Confirm that the HTO have the correct names
rownames(htos)

[1] "Human_HTO_3"  "Human_HTO_6"  "Human_HTO_7"  "Human_HTO_8"  "Human_HTO_10"
[6] "Human_HTO_12" "Human_HTO_13"

# Perform demultiplexing separately on each capture
obj.list <- list()

lappend <- function (lst, ...){
  lst <- c(lst, list(...))
  return(lst)
}

for (cn in capture_names) {
  message(cn)
  umi = as.matrix(umis[, sce$Capture == cn])
  hto = as.matrix(htos[, sce$Capture == cn])
  
  # Setup Seurat object
  hashtag <- CreateSeuratObject(counts = umi)
  
  # Normalize RNA data with log normalization
  hashtag <- NormalizeData(hashtag)
  
  # Find and scale variable features
  hashtag <- FindVariableFeatures(hashtag, selection.method = "mean.var.plot")
  hashtag <- ScaleData(hashtag, features = VariableFeatures(hashtag))
  
  # Add HTO data as a new assay independent from RNA
  hashtag[["HTO"]] <- CreateAssayObject(counts = hto)
  
  # Normalize HTO data, here we use centered log-ratio (CLR) transformation
  hashtag <- NormalizeData(hashtag, assay = "HTO", normalization.method = "CLR")
  
  # Demultiplex cells based on HTO enrichment
  hashtag <- HTODemux(hashtag, assay = "HTO", positive.quantile = 0.99)
  
  # Rename hash.ID column 
  hashtag$hash.ID <- factor(gsub("-","_",hashtag$hash.ID), 
                            levels = c(features$id, "Doublet", "Negative"))
  obj.list <- lappend(obj.list, hashtag)
}

names(obj.list) <- capture_names
hashtag <- NULL

4.3 Add HTO demultiplexing results to SCE

# add hash.ID column to SCE object
hash.ID <- Reduce(c,
                  lapply(obj.list, function(x) 
                    setNames(x@meta.data$hash.ID, colnames(x))))
sce$HTODemux.ID <- hash.ID

# Store HTODemux results to SCE object
sce$HTODemux_result <- bind_rows(lapply(obj.list, function(x) x@meta.data))

# Add sample ID to SCE object
sampleID.HTO <- setNames(c(samples,"Doublet","Negative"), levels(sce$HTODemux.ID))
sampleID.HTO <- sampleID.HTO[sce$HTODemux.ID]
names(sampleID.HTO) <- colnames(sce)
sce$sampleID.HTO <- factor(sampleID.HTO, levels=c(samples,"Doublet","Negative"))

5 Visualize HTO demultiplexing results

Some code in this section are adapted from internal code by Dr. Peter F. Hickey.

# add colour
sce$colours <- S4Vectors::make_zero_col_DFrame(ncol(sce))

hto_colours <- setNames(
  c(palette.colors(nlevels(sce$HTODemux.ID)-2, "Tableau 10"),"#ced4da","#6c757d"),
  levels(sce$HTODemux.ID))
sce$colours$hto_colours <- hto_colours[sce$HTODemux.ID]

sample_colours <- setNames(
  c(palette.colors(nlevels(sce$sampleID.HTO)-2, "Tableau 10"),"#ced4da","#6c757d"),
  levels(sce$sampleID.HTO))
sce$colours$sample_colours <- sample_colours[sce$sampleID.HTO]

capture_colours <- setNames(
  palette.colors(nlevels(sce$Capture), "Accent"),
  levels(sce$Capture))
sce$colours$capture_colours <- capture_colours[sce$Capture]

5.1 Number of singlets

BP1-c1

table(obj.list$`BP1-c1`$HTO_classification.global)


 Doublet Negative  Singlet 
    8382     4778    10705

BP1-c2

table(obj.list$`BP1-c2`$HTO_classification.global)


 Doublet Negative  Singlet 
    2426     5413    10396

BP1-c3

table(obj.list$`BP1-c3`$HTO_classification.global)


 Doublet Negative  Singlet 
    2944     6229    10777

5.2 HTO expression in ridge plot

BP1-c1

# Group cells based on the max HTO signal
Idents(obj.list$`BP1-c1`) <- "hash.ID"
RidgePlot(obj.list$`BP1-c1`, assay = "HTO", 
          features = rownames(obj.list$`BP1-c1`[["HTO"]]))

BP1-c2

# Group cells based on the max HTO signal
Idents(obj.list$`BP1-c2`) <- "hash.ID"
RidgePlot(obj.list$`BP1-c2`, assay = "HTO", 
          features = rownames(obj.list$`BP1-c2`[["HTO"]]))

BP1-c3

# Group cells based on the max HTO signal
Idents(obj.list$`BP1-c3`) <- "hash.ID"
RidgePlot(obj.list$`BP1-c3`, assay = "HTO", 
          features = rownames(obj.list$`BP1-c3`[["HTO"]]))

5.3 HTO expression in heatmap

BP1-c1

HTOHeatmap(obj.list$`BP1-c1`, assay = "HTO", ncells = 5000) + 
  ggtitle("BP1-c1") + 
  theme(plot.title = element_text(hjust=0.5, size=10))

BP1-c2

HTOHeatmap(obj.list$`BP1-c2`, assay = "HTO", ncells = 5000) + 
  ggtitle("BP1-c2") + 
  theme(plot.title = element_text(hjust=0.5, size=12))

BP1-c3

HTOHeatmap(obj.list$`BP1-c3`, assay = "HTO", ncells = 5000) + 
  ggtitle("BP1-c3") + 
  theme(plot.title = element_text(hjust=0.5, size=12))

5.4 tSNE embeddings for HTOs

BP1-c1

# remove negative cells from the object
hashtag.subset <- subset(obj.list$`BP1-c1`, idents = "Negative", invert = TRUE)

# calculate a tSNE embedding of the HTO data
DefaultAssay(hashtag.subset) <- "HTO"
hashtag.subset <- ScaleData(hashtag.subset, features = rownames(hashtag.subset))
hashtag.subset <- RunPCA(hashtag.subset, features = rownames(hashtag.subset), approx = FALSE)
hashtag.subset <- RunTSNE(hashtag.subset, dims = 1:length(rownames(hashtag.subset)), 
                          perplexity = 100, check_duplicates = FALSE)

# tSNE plot
DimPlot(hashtag.subset)

BP1-c2

# remove negative cells from the object
hashtag.subset <- subset(obj.list$`BP1-c2`, idents = "Negative", invert = TRUE)

# calculate a tSNE embedding of the HTO data
DefaultAssay(hashtag.subset) <- "HTO"
hashtag.subset <- ScaleData(hashtag.subset, features = rownames(hashtag.subset))
hashtag.subset <- RunPCA(hashtag.subset, features = rownames(hashtag.subset), approx = FALSE)
hashtag.subset <- RunTSNE(hashtag.subset, dims = 1:length(rownames(hashtag.subset)), 
                          perplexity = 100, check_duplicates = FALSE)

# tSNE plot
DimPlot(hashtag.subset)

BP1-c3

# remove negative cells from the object
hashtag.subset <- subset(obj.list$`BP1-c3`, idents = "Negative", invert = TRUE)

# calculate a tSNE embedding of the HTO data
DefaultAssay(hashtag.subset) <- "HTO"
hashtag.subset <- ScaleData(hashtag.subset, features = rownames(hashtag.subset))
hashtag.subset <- RunPCA(hashtag.subset, features = rownames(hashtag.subset), approx = FALSE)
hashtag.subset <- RunTSNE(hashtag.subset, dims = 1:length(rownames(hashtag.subset)), 
                          perplexity = 100, check_duplicates = FALSE)

# tSNE plot
DimPlot(hashtag.subset)

5.5 Number of demultiplexed droplets

df <- as.data.frame(colData(sce))

Named by HTO

# Number of droplets per HTO classification
p1 <- ggplot(df) + 
  geom_bar(
    aes(
      x = factor(hash.ID, levels(sce$HTODemux.ID)),
      fill=hash.ID), 
    position = position_stack(reverse = TRUE)) + 
  geom_text(stat='count', aes(x = hash.ID, label=..count..), hjust=1, size=2.5) +
  coord_flip() +
  scale_fill_manual(values = hto_colours) +
  ylab("Number of droplets") +
  xlab("HTODemux classification") +
  theme_cowplot(font_size = 8) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p1 + NoLegend() + p1 + facet_grid(~sce$Capture) + plot_layout(widths=c(1, length(capture_names)))

Named by sample ID

# Number of droplets per sample
p2 <- ggplot(df) + 
  geom_bar(
    aes(
      x = factor(sampleID.HTO, levels(sce$sampleID.HTO)),
      fill=sampleID.HTO), 
    position = position_stack(reverse = TRUE)) + 
  geom_text(stat='count', aes(x = sampleID.HTO, label=..count..), hjust=1, size=2.5) +
  coord_flip() +
  scale_fill_manual(values = sample_colours) +
  ylab("Number of droplets") +
  xlab("Sample ID") +
  theme_cowplot(font_size = 8) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p2 + NoLegend() + p2 + facet_grid(~sce$Capture) + plot_layout(widths=c(1, length(capture_names)))

5.6 Proportion of singlets per capture

p3 <- ggplot(df) + 
  geom_bar(
    aes(
      x = Capture,
      fill= HTODemux_result.HTO_classification.global), 
    position = position_fill(reverse = TRUE)) + 
  coord_flip() +
  scale_fill_manual(values = setNames(c("#6c757d","#ced4da","#cfe1b9"),
                                      c("Negative","Doublet","Singlet"))) +
  ylab("Proportion of droplets") +
  xlab("HTODemux classification") +
  theme_cowplot(font_size = 8) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p3 +
  guides(fill=guide_legend(title="HTO_classification.global"))

6 Genetic demultiplexing

Some code in this section are adapted from internal code by Dr. Peter F. Hickey.

SNPs were called from each capture using cellsnp-lite v1.2.3. Based on the SNP information, cells were assigned to genetic donors using vireo v0.5.8.

6.1 Read vireo results

vireo_df <- do.call(
  rbind,
  lapply(capture_names, function(cn) {
    message(cn)
    vireo_df <- read.table(
      here("data", "vireo", cn, "donor_ids.tsv"),
      header = TRUE)
    # Rename 'doublet' and 'unassigned' to match the terms used in HTODemux 
    vireo_df$donor_id <- gsub("doublet", "Doublet", vireo_df$donor_id)
    vireo_df$donor_id <- gsub("unassigned", "Negative", vireo_df$donor_id)
    vireo_df$donor_id <- paste0(cn, "_", vireo_df$donor_id)
    captureNumber <- str_sub(cn, start= -1)
    vireo_df$colname <- paste0(captureNumber, "_", vireo_df$cell)
    # reorder columns to matches SCE.
    j <- match(colnames(sce)[sce$Capture == cn], vireo_df$colname)
    stopifnot(!anyNA(j))
    vireo_df <- vireo_df[j, ]
  }))

6.2 Summarise vireo results

knitr::kable(
  tabyl(vireo_df, donor_id) %>%
    adorn_pct_formatting(1),
  caption = "Assignment of droplets to donors using vireo.")

Assignment of droplets to donors using vireo.
donor_id	n	percent
BP1-c1_Doublet	2733	4.4%
BP1-c1_Negative	3871	6.2%
BP1-c1_donor0	3159	5.1%
BP1-c1_donor1	2358	3.8%
BP1-c1_donor2	2025	3.3%
BP1-c1_donor3	4526	7.3%
BP1-c1_donor4	2571	4.1%
BP1-c1_donor5	1794	2.9%
BP1-c1_donor6	828	1.3%
BP1-c2_Doublet	1652	2.7%
BP1-c2_Negative	2928	4.7%
BP1-c2_donor0	2428	3.9%
BP1-c2_donor1	1829	2.9%
BP1-c2_donor2	3604	5.8%
BP1-c2_donor3	1678	2.7%
BP1-c2_donor4	642	1.0%
BP1-c2_donor5	1509	2.4%
BP1-c2_donor6	1965	3.2%
BP1-c3_Doublet	2200	3.5%
BP1-c3_Negative	2803	4.5%
BP1-c3_donor0	769	1.2%
BP1-c3_donor1	1668	2.7%
BP1-c3_donor2	2006	3.2%
BP1-c3_donor3	3831	6.2%
BP1-c3_donor4	1741	2.8%
BP1-c3_donor5	2659	4.3%
BP1-c3_donor6	2273	3.7%

6.3 Identify the best match between donors from the combination of captures

# capture 1 is used as base
cn1 <- capture_names[1]

# read vcf
f1 <- here("data","vireo",cn1,"GT_donors.vireo.vcf.gz")
x1 <- read.vcfR(f1, verbose=FALSE)

# create unique ID for each locus in each capture.
y1 <- paste(
  x1@fix[,"CHROM"],
  x1@fix[,"POS"],
  x1@fix[,"REF"],
  x1@fix[,"ALT"],
  sep = "_")

# match donors of every remaining captures to capture 1
f.best_match_df <- data.frame()

for (cn2 in capture_names[2:length(capture_names)]) {
  
  # read vcf
  f2 <- here("data","vireo",cn2,"GT_donors.vireo.vcf.gz")
  x2 <- read.vcfR(f2, verbose=FALSE)
  
  # create unique ID for each locus in each capture.
  y2 <- paste(
    x2@fix[,"CHROM"],
    x2@fix[,"POS"],
    x2@fix[,"REF"],
    x2@fix[,"ALT"],
    sep = "_")
  
  # only keep the loci in common between the 2 captures.
  i1 <- na.omit(match(y2, y1))
  i2 <- na.omit(match(y1, y2))
  
  # construct genotype matrix at common loci from the 2 captures.
  donor_names <- colnames(x1@gt)[-1]
  
  g1 <- apply(
    x1@gt[i1, donor_names],
    2,
    function(x) sapply(strsplit(x, ":"), `[[`, 1))
  g2 <- apply(
    x2@gt[i2, donor_names],
    2,
    function(x) sapply(strsplit(x, ":"), `[[`, 1))
  
  # count number of genotype matches between pairs of donors (one from each 
  # capture) and convert to a proportion.
  z <- matrix(
    NA_real_,
    nrow = length(donor_names),
    ncol = length(donor_names),
    dimnames = list(donor_names, donor_names))
  
  for (i in rownames(z)) {
    for (j in colnames(z)) {
      z[i, j] <- sum(g1[, i] == g2[, j]) / nrow(g1)
    }
  }
  rownames(z) <- paste(cn1, rownames(z),sep="_")
  colnames(z) <- paste(cn2, colnames(z),sep="_")
  
  # look for the best match between donors
  best_match_df <- data.frame(
    cn1 = rownames(z),
    cn2 = apply(
      z, 1,
      function(x) colnames(z)[which.max(x)]),
    check.names = FALSE)
  
  knitr::kable(
    dplyr::select(best_match_df, everything()),
    caption = paste0("Best match between donors between ", 
                     cn1, " and ", cn2, "."),
    row.names = FALSE)
  
  stopifnot(identical(colnames(sce), vireo_df$colname))
  
  # visualize the best match in a heatmap
  pheatmap::pheatmap(
    z,
    color = viridisLite::inferno(101),
    cluster_rows = FALSE,
    cluster_cols = FALSE,
    main = paste0("Proportion of matching genotypes between ", 
                  cn1, " and ", cn2, "."))
  
  # add matching results to the final best match data frame
  if (nrow(f.best_match_df)==0) {
    f.best_match_df <- best_match_df
  } else {
    f.best_match_df <- f.best_match_df %>% 
      mutate(c=best_match_df$cn2)
    colnames(f.best_match_df) <- paste0("cn",1:ncol(f.best_match_df))
  }
}

# add genetic donor name
f.best_match_df$genetic_donor <- paste0("donor_", LETTERS[1:length(donor_names)])

knitr::kable(
  dplyr::select(f.best_match_df, genetic_donor, everything()),
  caption = "Best match between donors from the two captures.",
  row.names = FALSE)

Best match between donors from the two captures.
genetic_donor	cn1	cn2	cn3
donor_A	BP1-c1_donor0	BP1-c2_donor0	BP1-c3_donor5
donor_B	BP1-c1_donor1	BP1-c2_donor1	BP1-c3_donor2
donor_C	BP1-c1_donor2	BP1-c2_donor3	BP1-c3_donor4
donor_D	BP1-c1_donor3	BP1-c2_donor2	BP1-c3_donor3
donor_E	BP1-c1_donor4	BP1-c2_donor6	BP1-c3_donor6
donor_F	BP1-c1_donor5	BP1-c2_donor5	BP1-c3_donor1
donor_G	BP1-c1_donor6	BP1-c2_donor4	BP1-c3_donor0

# transform to a long data frame 
best_match_long_df <- tidyr::pivot_longer(
  data=f.best_match_df,
  cols=starts_with("cn")) %>%
  dplyr::select(genetic_donor, value)

6.4 Add genetic demultiplexing results to SCE

# add genetic donor to SCE object
sce$vireo <- DataFrame(
  vireo_df[, setdiff(colnames(vireo_df), c("cell", "colname"))])

sce$genetic_donor <- left_join(
  vireo_df,
  best_match_long_df, 
  by = c("donor_id" = "value")) %>%
  mutate(
    genetic_donor = factor(
      case_when(
        is.na(genetic_donor) ~ sub(paste0(batch_name,"-c[0-9]_"), "", donor_id),
        TRUE ~ genetic_donor),
      levels = c(paste0("donor_", LETTERS[1:length(donor_names)]), "Doublet", "Negative"))) %>%
  pull(genetic_donor)

# generate a table of HTO to genetic droplets
tb <- as.data.frame.matrix(
  table(as.data.frame(colData(sce)[, c("sampleID.HTO", "genetic_donor")]))) %>%
  dplyr::select(1:length(donor_names)) %>% # discard columns of Doublets and Negatives
  slice_head(n=length(donor_names)) # discard rows of Doublets and Negatives

# match donor to sample ID according to the best match with HTO
sampleID.genetics <- setNames(c(rownames(tb)[apply(tb, MARGIN = 2, which.max)],
                                "Doublet","Negative"),
                              c(colnames(tb),
                                "Doublet","Negative"))
sampleID.genetics <- sampleID.genetics[sce$genetic_donor]
names(sampleID.genetics) <- colnames(sce)
sce$sampleID.genetics <- factor(sampleID.genetics, 
                                levels=c(samples,"Doublet","Negative"))

7 Visualize genetic demultiplexing results

7.1 Table of number of droplets

janitor::tabyl(
  as.data.frame(colData(sce)[, c("HTODemux.ID", "genetic_donor")]),
  HTODemux.ID,
  genetic_donor) |>
  adorn_title(placement = "combined") |>
  adorn_totals("both") |>
  knitr::kable(
    caption = "Number of droplets assigned to each `HTO`/`Genetic donor` combination.")

Number of droplets assigned to each `HTO`/`Genetic donor` combination.
HTODemux.ID/genetic_donor	donor_A	donor_B	donor_C	donor_D	donor_E	donor_F	donor_G	Doublet	Negative	Total
Human_HTO_3	4917	42	7	32	34	8	6	353	446	5845
Human_HTO_6	26	64	12	5965	45	21	2	376	1141	7652
Human_HTO_7	50	102	3398	91	77	23	9	214	937	4901
Human_HTO_8	41	108	20	98	90	38	1625	167	859	3046
Human_HTO_10	34	63	32	84	58	2752	13	174	1279	4489
Human_HTO_12	19	50	10	28	2771	13	7	203	501	3602
Human_HTO_13	24	1819	9	19	25	6	1	144	296	2343
Doublet	1567	796	1092	2101	926	631	340	4268	2031	13752
Negative	1568	3149	864	3543	2783	1479	236	686	2112	16420
Total	8246	6193	5444	11961	6809	4971	2239	6585	9602	62050

7.2 Number of droplets (named by HTO and genetic donor)

# add colour
genetic_donor_colours <- setNames(
  c(palette.colors(nlevels(sce$genetic_donor)-2, "Set3"), "#d4a276","#9c6644"),
  levels(sce$genetic_donor))
sce$colours$genetic_donor_colours <- genetic_donor_colours[sce$genetic_donor]

# number of droplets assigned by HTO method
p1 <- ggcells(sce) + 
  geom_bar(aes(x = HTODemux.ID, fill = HTODemux.ID)) + 
  geom_text(stat='count', aes(x = HTODemux.ID, label=..count..), 
            hjust=1, size=2) +
  coord_flip() + 
  ylab("Number of droplets") + 
  theme_cowplot(font_size = 8) + 
  scale_y_continuous(breaks=seq(0,16000,4000),limits=c(0,18000)) +
  scale_fill_manual(values = sce$colours$hto_colours) 

p1.facet <- ggcells(sce) + 
  geom_bar(aes(x = HTODemux.ID, fill = HTODemux.ID)) + 
  geom_text(stat='count', aes(x = HTODemux.ID, label=..count..), 
            hjust=1, size=2) +
  #ggtitle("By genetics") +
  ylab("Number of droplets") + 
  facet_grid(~Capture, scales = "fixed", space = "fixed") + 
  theme_cowplot(font_size = 8) + 
  scale_y_continuous(breaks=seq(0,8000,2000), limits = c(0,10000)) +
  scale_fill_manual(values = sce$colours$hto_colours) +
  theme(axis.title.y = element_blank()) +
  coord_flip()

# number of droplets assigned by genetic method
p2 <- ggcells(sce) + 
  geom_bar(aes(x = genetic_donor, fill = genetic_donor)) + 
  geom_text(stat='count', aes(x = genetic_donor, label=..count..,),
            hjust=1, size=2) +
  coord_flip() + 
  ylab("Number of droplets") + 
  theme_cowplot(font_size = 8) + 
  scale_y_continuous(breaks=seq(0,16000,4000),limits=c(0,18000)) +
  scale_fill_manual(values = sce$colours$genetic_donor_colours)

p2.facet <- ggcells(sce) + 
  geom_bar(aes(x = genetic_donor, fill = genetic_donor)) + 
  geom_text(stat='count', aes(x = genetic_donor, label=..count..), 
            hjust=1, size=2) +
  #ggtitle("By genetics") +
  ylab("Number of droplets") + 
  facet_grid(~Capture, scales = "fixed", space = "fixed") + 
  theme_cowplot(font_size = 8) + 
  scale_y_continuous(breaks=seq(0,8000,2000), limits = c(0,10000)) +
  scale_fill_manual(values = sce$colours$genetic_donor_colours) +
  theme(axis.title.y = element_blank()) +
  coord_flip()

(p1 + p1.facet+plot_layout(width=c(1,2))) /
  (p2 + p2.facet+plot_layout(width=c(1,2))) +
  plot_layout(guides="collect")

7.3 Proportion of droplets (named by HTO and genetic donor)

# proportion of genetically assigned droplets in each HTO
p3 <- ggcells(sce) + 
  geom_bar(
    aes(x = HTODemux.ID, fill = genetic_donor),
    position = position_fill(reverse = TRUE)) +
  coord_flip() +
  ylab("Frequency") +
  theme_cowplot(font_size = 8) + 
  scale_fill_manual(values = sce$colours$genetic_donor_colours)

# proportion of HTO assigned droplets in each genetic donor
p4 <- ggcells(sce) + 
  geom_bar(
    aes(x = genetic_donor, fill = HTODemux.ID),
    position = position_fill(reverse = TRUE)) +
  coord_flip() +
  ylab("Frequency") +
  theme_cowplot(font_size = 8) + 
  scale_fill_manual(values = sce$colours$hto_colours)

(p3 + p3 + facet_grid(~Capture) + plot_layout(widths = c(1, 2))) /
  (p4 + p4 + facet_grid(~Capture) + plot_layout(widths = c(1, 2))) +
  plot_layout(guides = "collect")

7.4 Number of droplets (named by sample ID)

# add colour
sampleID.HTO_colours <- setNames(
  c(palette.colors(nlevels(sce$sampleID.HTO)-2, "Tableau 10"), "#ced4da","#6c757d"),
  levels(sce$sampleID.HTO))
sce$colours$sampleID.HTO_colours <- sampleID.HTO_colours[sce$sampleID.HTO]

sampleID.genetics_colours <- setNames(
  c(palette.colors(nlevels(sce$sampleID.genetics)-2, "Set3"), "#d4a276","#9c6644"),
  levels(sce$sampleID.genetics))
sce$colours$sampleID.genetics_colours <- sampleID.genetics_colours[sce$sampleID.genetics]

# number of droplets assigned by HTO method
p5 <- ggcells(sce) + 
  geom_bar(aes(x = sampleID.HTO, fill = sampleID.HTO)) + 
  geom_text(stat='count', aes(x = sampleID.HTO, label=..count..), hjust=1, size=2) +
  coord_flip() + 
  ggtitle("By HTO") +
  ylab("Number of droplets") + 
  theme_cowplot(font_size = 8) + 
  scale_y_continuous(breaks=seq(0,16000,4000),limits=c(0,18000)) +
  scale_fill_manual(values = sce$colours$sampleID.HTO_colours) +
  theme(axis.title.y = element_blank()) +
  guides(fill=guide_legend(title="sampleID"))

p5.facet <- ggcells(sce) + 
  geom_bar(aes(x = sampleID.HTO, fill = sampleID.HTO)) + 
  geom_text(stat='count', aes(x = sampleID.HTO, label=..count..), 
            hjust=1, size=2) +
  #ggtitle("By HTO") +
  ylab("Number of droplets") + 
  facet_grid(~Capture, scales = "fixed", space = "fixed") + 
  theme_cowplot(font_size = 8) + 
  scale_y_continuous(breaks=seq(0,8000,2000), limits = c(0,10000)) +
  scale_fill_manual(values = sce$colours$sampleID.HTO_colours) +
  theme(axis.title.y = element_blank()) +
  guides(fill=guide_legend(title="sampleID")) +
    coord_flip()

# number of droplets assigned by genetic method
p6 <- ggcells(sce) + 
  geom_bar(aes(x = sampleID.genetics, fill = sampleID.genetics)) + 
  geom_text(stat='count', aes(x = sampleID.genetics, label=..count..), hjust=1, size=2) +
  coord_flip() + 
  ggtitle("By genetics") +
  ylab("Number of droplets") + 
  theme_cowplot(font_size = 8) + 
  scale_y_continuous(breaks=seq(0,16000,4000), limits = c(0,18000)) +
  scale_fill_manual(values = sce$colours$sampleID.genetics_colours) +
  theme(axis.title.y = element_blank()) +
  guides(fill=FALSE)

p6.facet <- ggcells(sce) + 
  geom_bar(aes(x = sampleID.genetics, fill = sampleID.genetics)) + 
  geom_text(stat='count', aes(x = sampleID.genetics, label=..count..), 
            hjust=1, size=2) +
  ylab("Number of droplets") + 
  facet_grid(~Capture, scales = "fixed", space = "fixed") + 
  theme_cowplot(font_size = 8) + 
  scale_y_continuous(breaks=seq(0,8000,2000), limits = c(0,10000)) +
  scale_fill_manual(values = sce$colours$sampleID.genetics_colours) +
  theme(axis.title.y = element_blank()) +
  guides(fill=guide_legend(title="sampleID")) +
  coord_flip()

(p5+p5.facet+plot_layout(width=c(1,2))) /
  (p6+p6.facet+plot_layout(width=c(1,2))) +
  plot_layout(guides="collect")

7.5 Proportion of droplets (named by sample ID)

# proportion of genetically assigned droplets in each HTO
p7 <- ggcells(sce) + 
  geom_bar(
    aes(x = sampleID.HTO, fill = sampleID.genetics),
    position = position_fill(reverse = TRUE)) +
  coord_flip() +
  ylab("Frequency") +
  theme_cowplot(font_size = 8) + 
  scale_fill_manual(values = sce$colours$sampleID.genetics_colours)

# proportion of HTO assigned droplets in each genetic donor
p8 <- ggcells(sce) + 
  geom_bar(
    aes(x = sampleID.genetics, fill = sampleID.HTO),
    position = position_fill(reverse = TRUE)) +
  coord_flip() +
  ylab("Frequency") +
  theme_cowplot(font_size = 8) + 
  scale_fill_manual(values = sce$colours$sampleID.HTO_colours)

((p7 + ggtitle("By HTO")) +
    p7 + facet_grid(~Capture) + plot_layout(widths = c(1, 2))) /
  ((p8 + ggtitle("By genetics")) + 
     p8 + facet_grid(~Capture) + plot_layout(widths = c(1, 2))) +
  plot_layout(guides = "collect")

8 Save object

demux_dir <- here("data","SCEs","demux")
if(!dir.exists(demux_dir)) {
  dir.create(demux_dir, recursive = TRUE)
}

out <- paste0(demux_dir,'/',
              paste0(batch_name,".cellbender.demux.SCE.rds"))

if(!file.exists(out)) saveRDS(sce, out)

sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /hpc/software/installed/R/4.1.2/lib64/R/lib/libRblas.so
LAPACK: /hpc/software/installed/R/4.1.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] stringr_1.5.1               janitor_2.1.0              
 [3] vcfR_1.12.0                 dplyr_1.1.4                
 [5] scater_1.22.0               scuttle_1.4.0              
 [7] patchwork_1.2.0             cowplot_1.1.3              
 [9] SeuratObject_5.0.1          Seurat_4.4.0               
[11] ggplot2_3.4.0               here_1.0.1                 
[13] DropletUtils_1.14.2         SingleCellExperiment_1.16.0
[15] SummarizedExperiment_1.24.0 Biobase_2.54.0             
[17] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
[19] IRanges_2.28.0              S4Vectors_0.32.4           
[21] BiocGenerics_0.40.0         MatrixGenerics_1.6.0       
[23] matrixStats_1.1.0           workflowr_1.7.0            

loaded via a namespace (and not attached):
  [1] utf8_1.2.4                spatstat.explore_3.2-7   
  [3] reticulate_1.36.1.9000    R.utils_2.12.2           
  [5] tidyselect_1.2.1          htmlwidgets_1.6.4        
  [7] grid_4.1.2                BiocParallel_1.28.3      
  [9] Rtsne_0.17                munsell_0.5.1            
 [11] ScaledMatrix_1.2.0        codetools_0.2-18         
 [13] ica_1.0-3                 future_1.33.2            
 [15] miniUI_0.1.1.1            withr_3.0.0              
 [17] spatstat.random_3.2-3     colorspace_2.1-0         
 [19] progressr_0.14.0          highr_0.10               
 [21] knitr_1.46                rstudioapi_0.13          
 [23] ROCR_1.0-11               tensor_1.5               
 [25] listenv_0.9.1             labeling_0.4.3           
 [27] git2r_0.31.0              GenomeInfoDbData_1.2.7   
 [29] polyclip_1.10-6           pheatmap_1.0.12          
 [31] farver_2.1.1              rhdf5_2.38.1             
 [33] rprojroot_2.0.4           parallelly_1.37.1        
 [35] vctrs_0.6.5               generics_0.1.3           
 [37] xfun_0.43                 R6_2.5.1                 
 [39] ggbeeswarm_0.6.0          rsvd_1.0.5               
 [41] locfit_1.5-9.4            bitops_1.0-7             
 [43] rhdf5filters_1.6.0        spatstat.utils_3.0-4     
 [45] DelayedArray_0.20.0       promises_1.3.0           
 [47] scales_1.3.0              pinfsc50_1.2.0           
 [49] beeswarm_0.4.0            gtable_0.3.5             
 [51] beachmat_2.10.0           globals_0.16.3           
 [53] processx_3.8.0            goftest_1.2-3            
 [55] xaringanExtra_0.7.0       spam_2.10-0              
 [57] rlang_1.1.3               splines_4.1.2            
 [59] lazyeval_0.2.2            spatstat.geom_3.2-9      
 [61] yaml_2.3.8                reshape2_1.4.4           
 [63] abind_1.4-5               httpuv_1.6.15            
 [65] tools_4.1.2               ellipsis_0.3.2           
 [67] jquerylib_0.1.4           RColorBrewer_1.1-3       
 [69] ggridges_0.5.6            Rcpp_1.0.12              
 [71] plyr_1.8.9                sparseMatrixStats_1.6.0  
 [73] zlibbioc_1.40.0           purrr_1.0.1              
 [75] RCurl_1.98-1.9            ps_1.7.2                 
 [77] deldir_2.0-4              pbapply_1.7-2            
 [79] viridis_0.6.2             zoo_1.8-12               
 [81] ggrepel_0.9.1             cluster_2.1.2            
 [83] fs_1.6.4                  magrittr_2.0.3           
 [85] data.table_1.15.4         scattermore_1.2          
 [87] lmtest_0.9-40             RANN_2.6.1               
 [89] whisker_0.4               fitdistrplus_1.1-11      
 [91] mime_0.12                 evaluate_0.23            
 [93] xtable_1.8-4              gridExtra_2.3            
 [95] compiler_4.1.2            tibble_3.2.1             
 [97] KernSmooth_2.23-20        R.oo_1.24.0              
 [99] htmltools_0.5.8.1         mgcv_1.8-39              
[101] later_1.3.2               tidyr_1.2.0              
[103] lubridate_1.8.0           MASS_7.3-55              
[105] Matrix_1.6-5              permute_0.9-7            
[107] cli_3.6.2                 R.methodsS3_1.8.1        
[109] parallel_4.1.2            dotCall64_1.1-1          
[111] igraph_2.0.3              pkgconfig_2.0.3          
[113] getPass_0.2-2             sp_2.1-3                 
[115] plotly_4.10.4.9000        spatstat.sparse_3.0-3    
[117] memuse_4.2-1              vipor_0.4.5              
[119] bslib_0.3.1               dqrng_0.3.2              
[121] XVector_0.34.0            snakecase_0.11.0         
[123] callr_3.7.3               digest_0.6.35            
[125] sctransform_0.4.1         RcppAnnoy_0.0.19         
[127] vegan_2.5-7               spatstat.data_3.0-4      
[129] rmarkdown_2.26            leiden_0.4.3.1           
[131] uwot_0.1.14               edgeR_3.36.0             
[133] DelayedMatrixStats_1.16.0 shiny_1.8.1.1            
[135] lifecycle_1.0.4           nlme_3.1-155             
[137] jsonlite_1.8.8            Rhdf5lib_1.16.0          
[139] BiocNeighbors_1.12.0      viridisLite_0.4.2        
[141] limma_3.50.3              fansi_1.0.6              
[143] pillar_1.9.0              lattice_0.20-45          
[145] ggrastr_1.0.1             fastmap_1.1.1            
[147] httr_1.4.7                survival_3.2-13          
[149] glue_1.7.0                png_0.1-8                
[151] stringi_1.8.3             sass_0.4.9               
[153] HDF5Array_1.22.1          BiocSingular_1.10.0      
[155] ape_5.6-1                 irlba_2.3.5.1            
[157] future.apply_1.11.2

Demultiplexing the BAL pool: BP1

Anson Wong

2025-03-18

1 Load packages

2 Set names for this pool

3 Read CellBender outputs

4 HTO demultiplexing

4.1 Prepare HTO data

4.2 HTO Demultiplexing

4.3 Add HTO demultiplexing results to SCE

5 Visualize HTO demultiplexing results

5.1 Number of singlets

BP1-c1

BP1-c2

BP1-c3

5.2 HTO expression in ridge plot

BP1-c1

BP1-c2

BP1-c3

5.3 HTO expression in heatmap

BP1-c1

BP1-c2

BP1-c3

5.4 tSNE embeddings for HTOs

BP1-c1

BP1-c2

BP1-c3

5.5 Number of demultiplexed droplets

Named by HTO

Named by sample ID

5.6 Proportion of singlets per capture

6 Genetic demultiplexing

6.1 Read vireo results

6.2 Summarise vireo results

6.3 Identify the best match between donors from the combination of captures

6.4 Add genetic demultiplexing results to SCE

7 Visualize genetic demultiplexing results

7.1 Table of number of droplets

7.2 Number of droplets (named by HTO and genetic donor)

7.3 Proportion of droplets (named by HTO and genetic donor)

7.4 Number of droplets (named by sample ID)

7.5 Proportion of droplets (named by sample ID)

8 Save object