• Introduction
  • Load libraries
  • Load data
  • Harmony with kNN
    • Supercell
    • Single cell
  • rPCA
    • Supercell
    • Single cell
  • References

In this analysis, we explore whether a cell type label transfer workflow can be used to carry annotations from a CITEseq dataset onto a collection of supercells derived from cytometry data.

The reference CITEseq data used in this study was obtained from a bone marrow sample of a healthy adult, quantified using AbSeq (Triana et al. 2021).

For cytometry data, we utilised a healthy bone marrow sample (Levine et al. 2015) from a benchmarking study on clustering (Weber and Robinson 2016).

The label transfer workflow is as follows. For the cytometry data, we first transformed the markers using an arcsinh transformation with a co-factor of 5, while for the CITEseq data we used the Centered Log Ratio (CLR) transformation. SuperCellCyto was then applied with a gamma value of 20. Next, for both the supercells and the CITEseq data, we retained only the proteins/markers common to both. Lastly, we performed the label transfer using either Seurat rPCA (Hao et al. 2021) or Harmony (Korsunsky et al. 2019) combined with k-Nearest Neighbor (kNN). For the latter, Harmony was used to integrate the supercells with the CITEseq data, and kNN was used to assign each supercell a cell type annotation from the CITEseq data.

The scripts necessary to replicate the workflow are available in the code/label_transfer directory.

The results below were produced by running this workflow.
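The two transformations and the marker-matching step described above can be sketched in base R as follows. This is only an illustration on toy values: the function names, the marker names, and the pseudocount of 1 in the CLR are assumptions made for the example, and the CLR variant used in the actual workflow scripts may differ in detail.

```r
# Arcsinh transformation for cytometry marker intensities.
# The co-factor of 5 is the value stated in the workflow description.
arcsinh_transform <- function(x, cofactor = 5) {
  asinh(x / cofactor)
}

# Centred log ratio for one cell's antibody counts:
# log counts minus the mean of the log counts.
# The pseudocount of 1 is an assumption that keeps log() defined at zero.
clr_transform <- function(counts, pseudocount = 1) {
  log_counts <- log(counts + pseudocount)
  log_counts - mean(log_counts)
}

# Toy values for a single cell, not taken from either dataset.
cyto_intensities <- c(CD3 = 0, CD4 = 5, CD8 = 250, CD45 = 90)
adt_counts <- c(CD3 = 0, CD4 = 12, CD8 = 340, CD19 = 7)

cyto_transformed <- arcsinh_transform(cyto_intensities)
adt_transformed <- clr_transform(adt_counts)

# Keep only the markers measured in both modalities.
common_markers <- intersect(names(cyto_transformed), names(adt_transformed))
cyto_common <- cyto_transformed[common_markers]
adt_common <- adt_transformed[common_markers]
```

In the toy example, CD45 and CD19 are dropped because each is present in only one of the two vectors, which mirrors the "common proteins/markers" restriction in the workflow.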

Load libraries

library(data.table)
library(ggplot2)
library(here)

Load data

harmony_res <- fread(here("output", "label_transfer", "harmony_knn.csv"))
rpca_res <- fread(here("output", "label_transfer", "seurat_rPCA.csv"))
rpca_res_singlecell <- fread(here("output", "label_transfer", "seurat_rPCA_singlecell.csv"))
harmony_res_singlecell <- fread(here("output", "label_transfer", "harmony_knn_singlecell.csv"))

We remove the cells whose gated population is "unassigned", as their true identity is unknown and the predictions for them cannot be assessed.

harmony_res <- harmony_res[Gated_Population != "unassigned"]
rpca_res <- rpca_res[Gated_Population != "unassigned"]
rpca_res_singlecell <- rpca_res_singlecell[Gated_Population != "unassigned"]
harmony_res_singlecell <- harmony_res_singlecell[Gated_Population != "unassigned"]

Harmony with kNN
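
As described in the introduction, once Harmony has integrated the supercells with the CITEseq data, kNN assigns each supercell a label from its nearest CITEseq cells. The sketch below illustrates only that kNN voting step in base R. It assumes the integrated embedding is already available as plain matrices. The function name `knn_transfer`, the toy coordinates, and the choice of k = 3 are hypothetical and are not taken from the workflow scripts.

```r
# Assign each query row the majority label among its k nearest reference rows.
knn_transfer <- function(ref_embedding, ref_labels, query_embedding, k = 3) {
  apply(query_embedding, 1, function(query_point) {
    # Euclidean distance from this query point to every reference point
    distances <- sqrt(colSums((t(ref_embedding) - query_point)^2))
    nearest <- order(distances)[seq_len(k)]
    # Majority vote among the k nearest reference labels
    votes <- table(ref_labels[nearest])
    names(votes)[which.max(votes)]
  })
}

# Toy 2D embedding standing in for the integrated space.
ref_embedding <- rbind(
  c(0.0, 0.0), c(0.1, 0.2), c(0.2, 0.1),
  c(5.0, 5.0), c(5.1, 4.9), c(4.8, 5.2)
)
ref_labels <- c("B cell", "B cell", "B cell",
                "Monocyte", "Monocyte", "Monocyte")

# Two query points playing the role of supercells.
query_embedding <- rbind(c(0.1, 0.1), c(5.0, 5.1))

predicted_labels <- knn_transfer(ref_embedding, ref_labels, query_embedding, k = 3)
```

Ties in the vote are broken here by taking the first label in alphabetical order, which is a simplification chosen for the example.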


Supercell

# Confusion matrix: rows are predicted labels, columns are gated (actual) labels
conf_mat_harmony_supercell <- with(harmony_res, table(predicted_population, Gated_Population))
# Divide each column by its total to get proportions per gated population
conf_mat_proportion <- sweep(conf_mat_harmony_supercell, 2, colSums(conf_mat_harmony_supercell), "/")
conf_mat_harmony_supercell_dt <- data.table(conf_mat_proportion)
conf_mat_harmony_supercell_dt <- conf_mat_harmony_supercell_dt[order(Gated_Population, predicted_population)]
# Keep only the non-zero cells so that empty combinations are not drawn
conf_mat_harmony_supercell_dt <- conf_mat_harmony_supercell_dt[N > 0]
ggplot(conf_mat_harmony_supercell_dt, aes(x=Gated_Population, y=predicted_population)) +
  geom_point(aes(size = N, fill = N), pch=21, color="grey") +
  scale_fill_distiller(palette = "RdBu", direction = -1) +
  theme_minimal() +
  theme(
    panel.border = element_rect(colour = "black", fill=NA, linewidth=0.5),
    axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
    panel.grid.major = element_blank()
  ) +
  labs(x = "Actual Label", y = "Predicted Label", size = "Proportion", fill = "Proportion",
       title = "Harmony Combined with kNN")
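
To make the column normalisation used in these plots concrete, here is a small worked example on toy labels (the labels and counts are invented for illustration). Each column of the proportion table sums to 1, so a point on the diagonal shows the fraction of a gated population that received the matching predicted label. The overall accuracy computed at the end is an additional summary that is not part of the original analysis.

```r
# Toy labels, not taken from the analysis.
cell_types <- c("B", "NK", "T")
actual    <- c("B", "B", "B", "T", "T", "NK", "NK", "NK", "NK")
predicted <- c("B", "B", "T", "T", "T", "NK", "NK", "NK", "B")

# Shared factor levels keep the rows and columns in the same order.
conf_mat <- table(
  predicted = factor(predicted, levels = cell_types),
  actual = factor(actual, levels = cell_types)
)

# Divide each column by its total: proportions per actual population.
conf_prop <- sweep(conf_mat, 2, colSums(conf_mat), "/")

# Diagonal entries: fraction of each actual population labelled correctly.
per_population_recall <- setNames(diag(conf_prop), cell_types)

# Fraction of all cells whose predicted label matches the actual label.
overall_accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
```

Because the normalisation is per column, a rare population and an abundant one are shown on the same 0 to 1 scale, whereas the overall accuracy is dominated by the abundant populations.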

Single cell

# Confusion matrix: rows are predicted labels, columns are gated (actual) labels
conf_mat_harmony_singlecell <- with(harmony_res_singlecell, table(predicted_population, Gated_Population))
# Divide each column by its total to get proportions per gated population
conf_mat_proportion <- sweep(conf_mat_harmony_singlecell, 2, colSums(conf_mat_harmony_singlecell), "/")
conf_mat_harmony_singlecell_dt <- data.table(conf_mat_proportion)
conf_mat_harmony_singlecell_dt <- conf_mat_harmony_singlecell_dt[order(Gated_Population, predicted_population)]
conf_mat_harmony_singlecell_dt <- conf_mat_harmony_singlecell_dt[N > 0]
ggplot(conf_mat_harmony_singlecell_dt, aes(x=Gated_Population, y=predicted_population)) +
  geom_point(aes(size = N, fill = N), pch=21, color="grey") +
  scale_fill_distiller(palette = "RdBu", direction = -1) +
  theme_minimal() +
  theme(
    panel.border = element_rect(colour = "black", fill=NA, linewidth=0.5),
    axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
    panel.grid.major = element_blank()
  ) +
  labs(x = "Actual Label", y = "Predicted Label", size = "Proportion", fill = "Proportion",
       title = "Harmony Combined with kNN Single Cell")



rPCA

Supercell

# Confusion matrix: rows are predicted labels, columns are gated (actual) labels
conf_mat_rpca_supercell <- with(rpca_res, table(predicted_population, Gated_Population))
# Divide each column by its total to get proportions per gated population
conf_mat_proportion <- sweep(conf_mat_rpca_supercell, 2, colSums(conf_mat_rpca_supercell), "/")
conf_mat_rpca_supercell_dt <- data.table(conf_mat_proportion)
conf_mat_rpca_supercell_dt <- conf_mat_rpca_supercell_dt[order(Gated_Population, predicted_population)]
conf_mat_rpca_supercell_dt <- conf_mat_rpca_supercell_dt[N > 0]
ggplot(conf_mat_rpca_supercell_dt, aes(x=Gated_Population, y=predicted_population)) +
  geom_point(aes(size = N, fill = N), pch=21, color="grey") +
  scale_fill_distiller(palette = "RdBu", direction = -1) +
  theme_minimal() +
  theme(
    panel.border = element_rect(colour = "black", fill=NA, linewidth=0.5),
    axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
    panel.grid.major = element_blank()
  ) +
  labs(x = "Actual Label", y = "Predicted Label", size = "Proportion", fill = "Proportion",
       title = "Seurat rPCA")

Single cell

# Confusion matrix: rows are predicted labels, columns are gated (actual) labels
conf_mat_rpca_singlecell <- with(rpca_res_singlecell, table(predicted_population, Gated_Population))
# Divide each column by its total to get proportions per gated population
conf_mat_proportion <- sweep(conf_mat_rpca_singlecell, 2, colSums(conf_mat_rpca_singlecell), "/")
conf_mat_rpca_singlecell_dt <- data.table(conf_mat_proportion)
conf_mat_rpca_singlecell_dt <- conf_mat_rpca_singlecell_dt[order(Gated_Population, predicted_population)]
conf_mat_rpca_singlecell_dt <- conf_mat_rpca_singlecell_dt[N > 0]
ggplot(conf_mat_rpca_singlecell_dt, aes(x=Gated_Population, y=predicted_population)) +
  geom_point(aes(size = N, fill = N), pch=21, color="grey") +
  scale_fill_distiller(palette = "RdBu", direction = -1) +
  theme_minimal() +
  theme(
    panel.border = element_rect(colour = "black", fill=NA, linewidth=0.5),
    axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
    panel.grid.major = element_blank()
  ) +
  labs(x = "Actual Label", y = "Predicted Label", size = "Proportion", fill = "Proportion",
       title = "Seurat rPCA Single Cell")


