The previous analysis applied different matrix factorization approaches to the full pancreas data set. A key challenge in analyzing the full pancreas data set is that it has large batch or data-set effects, which some matrix factorization approaches (the topic model in particular) have difficulty handling. Here we look more closely at a couple of the individual data sets to better highlight how the different factorizations yield different representations of the underlying structure in the cells, without the added complication of batch effects.
First, load the packages needed for this analysis.
Set the seed for reproducibility.
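As a minimal sketch of what setting the seed achieves, the snippet below fixes the random number generator state and checks that two draws made after the same seed are identical. The seed value 1 is an arbitrary placeholder chosen for illustration, not the value used in this analysis.

```r
# Assumption: the seed value (1) is a placeholder; any fixed integer
# gives the same reproducibility guarantee.
set.seed(1)
x1 <- runif(3)

# Resetting the same seed reproduces exactly the same random draws.
set.seed(1)
x2 <- runif(3)

identical(x1, x2)
```

Any later step that relies on random numbers, such as a random initialization or a stochastic embedding, then gives the same result each time the analysis is rerun from the top.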
We start with the “CEL-Seq2” data from Muraro et al. (2016). (These data were generated using the CEL-Seq2 protocol.)
Load the CEL-Seq2 pancreas data and the outputs generated by running the compute_pancreas_celseq2_factors.R script.
i <- which(sample_info$tech == "celseq2")
sample_info <- sample_info[i,]
counts <- counts[i,]
sample_info <- transform(sample_info,celltype = factor(celltype))
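To illustrate what the subsetting steps do, here is a small self-contained sketch on toy data. The toy `sample_info` and `counts` objects, including the cell ids, gene names and technology labels other than "celseq2", are hypothetical stand-ins invented for this example. The sketch assumes, as the code above implies, that the rows of `counts` are cells in the same order as the rows of `sample_info`.

```r
# Toy stand-ins for the real objects (all values are hypothetical).
sample_info <- data.frame(
  id       = paste0("cell", 1:5),
  tech     = c("celseq2", "smartseq2", "celseq2", "indrop", "celseq2"),
  celltype = factor(c("alpha", "beta", "beta", "gamma", "alpha")),
  stringsAsFactors = FALSE
)
counts <- matrix(1:15, nrow = 5, ncol = 3,
                 dimnames = list(sample_info$id,
                                 c("geneA", "geneB", "geneC")))

# Keep only the CEL-Seq2 cells, in both the metadata and the counts.
i <- which(sample_info$tech == "celseq2")
sample_info <- sample_info[i, ]
counts <- counts[i, ]

# Subsetting a factor keeps all of its original levels, including
# levels that no longer occur among the remaining cells.
levels_before <- levels(sample_info$celltype)

# Re-creating the factor drops the unused levels.
sample_info <- transform(sample_info, celltype = factor(celltype))
levels_after <- levels(sample_info$celltype)
```

In this toy example, `levels_before` still contains "gamma" even though no remaining cell has that label, whereas `levels_after` contains only the cell types that are actually present. Dropping the unused levels keeps empty categories out of later tables and plot legends.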
