Introduction

Load libraries

Young and old female and male PBMCs

Broad cell types analysis
Refined cell types analysis
Young vs Old
Male vs female

Heart development analysis

Anova test with propeller.anova
Modelling development as a continuous variable

COVID data

Figure 3

Last updated: 2022-08-18

Checks: 7 0

Knit directory: propeller-paper-analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20220531)

The command set.seed(20220531) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 2586453

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 2586453. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/cold_warm_fresh_cellinfo.txt
    Ignored:    data/covid.cell.annotation.meta.txt
    Ignored:    data/heartFYA.Rds
    Ignored:    data/pool_1.rds

Untracked files:
    Untracked:  analysis/PlotsForPaper.Rmd
    Untracked:  code/SimCode.R
    Untracked:  code/SimCodeTrueDiff.R
    Untracked:  code/auroc.R
    Untracked:  code/convertData.R
    Untracked:  data/CTpropsTransposed.txt
    Untracked:  data/CelltypeLevels.csv
    Untracked:  data/TypeIErrTables.Rdata
    Untracked:  data/appnote1cdata.rdata
    Untracked:  data/cellinfo.csv
    Untracked:  data/nullsimsVaryN_results.Rdata
    Untracked:  data/sampleinfo.csv
    Untracked:  output/1x/
    Untracked:  output/Fig1ab.pdf
    Untracked:  output/Fig1cde.pdf
    Untracked:  output/Fig2ab.pdf
    Untracked:  output/Fig2abc.pdf
    Untracked:  output/Fig2c.pdf
    Untracked:  output/Figure1.pdf
    Untracked:  output/Figure1.png
    Untracked:  output/Figure2-01.png
    Untracked:  output/Figure2-with#.pdf
    Untracked:  output/Figure2-with#.png
    Untracked:  output/Figure2.ai
    Untracked:  output/Figure2.pdf
    Untracked:  output/Figure2.png
    Untracked:  output/Figure2_E_annotatedwithProp.pdf
    Untracked:  output/Figure3.pdf
    Untracked:  output/Figure3.png
    Untracked:  output/PDF/
    Untracked:  output/SuppFig4.pdf
    Untracked:  output/SuppFig4.png
    Untracked:  output/SuppTrueDiff10.pdf
    Untracked:  output/SuppTrueDiff20.pdf
    Untracked:  output/SuppTrueDiff3.pdf
    Untracked:  output/Supplementary Figure 2 v2.png
    Untracked:  output/Supplementary Figure 2.ai
    Untracked:  output/Supplementary Figure 2.pdf
    Untracked:  output/Supplementary Figure 2.png
    Untracked:  output/SupplementaryFigure3.pdf
    Untracked:  output/TrueDiffSimResults.Rda
    Untracked:  output/covidResults.pdf
    Untracked:  output/example_simdata.pdf
    Untracked:  output/extremeCaseTrueProps20CT.pdf
    Untracked:  output/fig2d.pdf
    Untracked:  output/gude-2022-06-27.log
    Untracked:  output/gude-2022-06-29.log
    Untracked:  output/heartResults.pdf
    Untracked:  output/heatmap20CT.pdf
    Untracked:  output/legend-fig2d.pdf
    Untracked:  output/pbmcOldYoungResults.pdf
    Untracked:  output/type1error5.csv
    Untracked:  output/typeIerrorResults.Rda

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/RealDataAnalysis.Rmd) and HTML (docs/RealDataAnalysis.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	2586453	bphipson	2022-08-18	update analysis scripts
html	1613612	bphipson	2022-06-06	Build site.
Rmd	622390e	bphipson	2022-06-06	update real data analysis
html	a39985a	bphipson	2022-06-03	Build site.
Rmd	cf6f450	bphipson	2022-06-03	add real data analysis

Introduction

We analysed three different publicly available single cell datasets to highlight the different types of models that can be fitted within the propeller framework.

Young and old female and male PBMCs
- Huang, Zhaohao, Binyao Chen, Xiuxing Liu, He Li, Lihui Xie, Yuehan Gao, Runping Duan, et al. 2021. “Effects of Sex and Aging on the Immune Cell Landscape as Assessed by Single-Cell Transcriptomic Analysis.” Proceedings of the National Academy of Sciences of the United States of America 118 (33). https://doi.org/10.1073/pnas.2023216118.
Healthy human heart biopsies across development
- Sim, Choon Boon, Belinda Phipson, Mark Ziemann, Haloom Rafehi, Richard J. Mills, Kevin I. Watt, Kwaku D. Abu-Bonsrah, et al. 2021. “Sex-Specific Control of Human Heart Maturation by the Progesterone Receptor.” Circulation 143 (16): 1614–28.
Bronchoalveolar lavage fluid in a COVID19 dataset
- Liao, Mingfeng, Yang Liu, Jing Yuan, Yanling Wen, Gang Xu, Juanjuan Zhao, Lin Cheng, et al. 2020. “Single-Cell Landscape of Bronchoalveolar Immune Cells in Patients with COVID-19.” Nature Medicine 26 (6): 842–44.

For all three datasets we use the logit transformation, except for the covid dataset. In the covid dataset, there is clearly an outlier for the plasma cell type. Using the logit transformation, plasma is found to be significantly enriched in severe covid, but using the arcsin square root transform shows that it is not significantly enriched. While logit may be more powerful according to the simulations, it appears sensitive to outliers, and in this case we would prefer to use the arcsin square root transform.

Load libraries

library(speckle)
library(limma)
library(edgeR)
library(pheatmap)
library(gt)

source("./code/convertData.R")

Young and old female and male PBMCs

This dataset was published in PNAS in 2021 and examined PBMCs from 20 individuals. The dataset was balanced in terms of age and sex (5 samples in each group: Male + Young, Male + Old, Female + Young, Female + Old).

The data analysis reported in the paper was ANOVA directly on cell type proportions modelling sex and age plus interaction. It was not clear from the description whether separate models were fitted for sex and age, removing the interaction term in order to interpret the effects for sex and age correctly.

From the supplementary data I have extracted the cell type proportions and using information from the paper I have converted the proportions into the necessary data type (a list object) to analyse with propeller.

Broad cell types analysis

sexCT <- read.delim("./data/CTpropsTransposed.txt", row.names = 1)
sexprops <- sexCT/100
prop.list <- convertDataToList(sexprops,data.type="proportions", transform="asin",
                               scale.fac=174684/20)

sampleinfo <- read.csv("./data/sampleinfo.csv", nrows = 20)
celltypes <- read.csv("./data/CelltypeLevels.csv")
group.immune <- paste(sampleinfo$Sex, sampleinfo$Age, sep=".")

celltypes$Celltype_L0 <- celltypes$Celltype_L1
celltypes$Celltype_L0[celltypes$Celltype_L1 == "CD4" | celltypes$Celltype_L1 == "CD8" | celltypes$Celltype_L1 == "CD4-CD8-" | celltypes$Celltype_L1 == "CD4+CD8+" | celltypes$Celltype_L1 == "T-mito"] <- "TC"

gt(data.frame(table(sampleinfo$Sex, sampleinfo$Age)), rownames_to_stub = TRUE, caption="Sample info for aging dataset")

Sample info for aging dataset
	Var1	Var2	Freq
1	female	old	5
2	male	old	5
3	female	young	5
4	male	young	5

levels(factor(celltypes$Celltype_L0))

[1] "BC" "DC" "MC" "NK" "TC"

sexprops.broad <-  matrix(NA,nrow=length(levels(factor(celltypes$Celltype_L0))), ncol=ncol(sexprops)) 
rownames(sexprops.broad) <- levels(factor(celltypes$Celltype_L0))
colnames(sexprops.broad) <- colnames(sexprops)

for(i in 1:ncol(sexprops.broad)) sexprops.broad[,i] <- tapply(sexprops[,i],celltypes$Celltype_L0,sum)

prop.broad.list <- convertDataToList(sexprops.broad,data.type="proportions", transform="asin", scale.fac=174684/20)

par(mfrow=c(1,5))
for(i in 1:nrow(sexprops.broad)){
  stripchart(as.numeric(sexprops.broad[i,])~group.immune,
           vertical=TRUE, pch=c(8,16), method="jitter",
           col = c(ggplotColors(20)[20],"hotpink",4, "darkblue"),cex=2,
           ylab="Proportions", cex.axis=1.25, cex.lab=1.5,
           group.names=c("F.Old","F.Young","M.Old","M.Young"))
  title(rownames(sexprops.broad)[i], cex.main=1.5, adj=0)
}

designAS <- model.matrix(~0+sampleinfo$Age + sampleinfo$Sex)
colnames(designAS) <- c("old","young","MvsF")

# Young vs old
mycontr <- makeContrasts(young-old, levels=designAS)
propeller.ttest(prop.list = prop.broad.list,design = designAS, contrasts = mycontr,
                robust=TRUE,trend=FALSE,sort=TRUE)

   PropMean.old PropMean.young PropRatio Tstatistic     P.Value        FDR
MC   0.19656790     0.14320118 0.7285074 -3.0721365 0.006146365 0.03073183
TC   0.53445740     0.60263724 1.1275684  1.9680058 0.063474712 0.15868678
NK   0.15808995     0.13347185 0.8442779 -1.0554668 0.304133538 0.50688923
BC   0.09432895     0.10249865 1.0866086  0.6569470 0.518905633 0.64863204
DC   0.01655580     0.01819108 1.0987734  0.4041554 0.690503393 0.69050339

designSex <- model.matrix(~0+sampleinfo$Sex + sampleinfo$Age)
colnames(designSex) <- c("female","male","YvO")

# Male vs female
mycontr <- makeContrasts(male-female, levels=designSex)
propeller.ttest(prop.list = prop.broad.list,design = designSex, contrasts = mycontr,
                robust=TRUE,trend=FALSE,sort=TRUE)

   PropMean.female PropMean.male PropRatio Tstatistic    P.Value       FDR
NK       0.1169176    0.17464421 1.4937377  2.3732661 0.02805937 0.1402968
TC       0.5973764    0.53971825 0.9034811 -1.6579946 0.11334335 0.2833584
DC       0.0157718    0.01897507 1.2031011  1.0868808 0.29036247 0.4839375
BC       0.1011660    0.09566164 0.9455912 -0.3897772 0.70093229 0.8761654
MC       0.1687683    0.17100082 1.0132286  0.1074843 0.91550468 0.9155047

Refined cell types analysis

Young vs Old

Set up design matrix using a means model, taking into account age and sex.

designAS <- model.matrix(~0+sampleinfo$Age + sampleinfo$Sex)
colnames(designAS) <- c("old","young","MvsF")

# Young vs old
mycontr <- makeContrasts(young-old, levels=designAS)
propeller.ttest(prop.list = prop.list,design = designAS, contrasts = mycontr,
                robust=TRUE,trend=FALSE,sort=TRUE)

          PropMean.old PropMean.young PropRatio Tstatistic      P.Value
CD8.Naive 0.0251316033   0.0901386289 3.5866645  4.5220310 0.0002519973
CD16      0.0346697484   0.0184110417 0.5310405 -3.9241289 0.0009547072
TREG      0.0224289338   0.0173984106 0.7757128 -2.7351400 0.0133315595
T-mito    0.0050757865   0.0030999505 0.6107330 -2.6230363 0.0169438969
INTER     0.0237901860   0.0160354568 0.6740366 -2.4875531 0.0225506463
CD4-CD8-  0.0160339954   0.0301971880 1.8833227  2.2851533 0.0343944316
CD14      0.1381079661   0.1087546814 0.7874613 -2.0325551 0.0568152857
ABC       0.0042268748   0.0024242272 0.5735271 -1.6729544 0.1110894015
CD4.Naive 0.1048008357   0.1320789977 1.2602857  1.4434939 0.1657237113
CD8.CTL   0.1154952554   0.0782456785 0.6774796 -1.2843478 0.2149863850
CD8.Tem   0.0639345157   0.0802785262 1.2556367  1.2590561 0.2237791504
MBC       0.0358736549   0.0452983191 1.2627183  1.2537215 0.2256688587
CDC1      0.0005479607   0.0008647764 1.5781725  1.0625556 0.3015992495
NK3       0.0886767142   0.0704022112 0.7939199 -0.8885615 0.3857294967
pre-DC    0.0002285096   0.0003193005 1.3973177  0.8863058 0.3867803942
NK1       0.0125858088   0.0102540186 0.8147286 -0.8762818 0.3920651391
PC        0.0054708831   0.0049397487 0.9029162 -0.8219337 0.4215370844
CD4.Tem   0.0873700973   0.0832684783 0.9530547 -0.4812340 0.6360341593
CD4+CD8+  0.0142514516   0.0133167354 0.9344126 -0.3026144 0.7655453386
NBC       0.0487575349   0.0498363513 1.0221261  0.2928181 0.7729439779
PDC       0.0031666984   0.0034597825 1.0925519  0.2888826 0.7758695169
CDC2      0.0126126342   0.0135472159 1.0740989  0.2557422 0.8009596764
NK2       0.0568274310   0.0528156246 0.9294037 -0.1699836 0.8668815789
CD4.Tcm   0.0799349205   0.0746146497 0.9334425 -0.1236681 0.9029215277
                  FDR
CD8.Naive 0.006047935
CD16      0.011456486
TREG      0.101663382
T-mito    0.101663382
INTER     0.108243102
CD4-CD8-  0.137577726
CD14      0.194795265
ABC       0.333268204
CD4.Naive 0.441929897
CD8.CTL   0.451337717
CD8.Tem   0.451337717
MBC       0.451337717
CDC1      0.556798615
NK3       0.588097709
pre-DC    0.588097709
NK1       0.588097709
PC        0.595111178
CD4.Tem   0.848045546
CD4+CD8+  0.873774192
NBC       0.873774192
PDC       0.873774192
CDC2      0.873774192
NK2       0.902921528
CD4.Tcm   0.902921528

Visualise significant cell types:

par(mfrow=c(1,2))
stripchart(as.numeric(sexprops["CD8.Naive",])~group.immune,
           vertical=TRUE, pch=c(8,16), method="jitter",
           col = c(ggplotColors(20)[20],"hotpink",4, "darkblue"),cex=2,
           ylab="Proportions", cex.axis=1.25, cex.lab=1.5,
           group.names=c("F.Old","F.Young","M.Old","M.Young"))
title("CD8.Naive: Young Vs Old", cex.main=1.5, adj=0)
text(3.2,0.18, labels = "Adj.Pval = 0.004")

stripchart(as.numeric(sexprops["CD16",])~group.immune,
           vertical=TRUE, pch=c(8,16), method="jitter",
           col = c(ggplotColors(20)[20],"hotpink",4, "darkblue"),cex=2,
           ylab="Proportions", cex.axis=1.25, cex.lab=1.5,
           group.names=c("F.Old","F.Young","M.Old","M.Young"))
title("CD16: Young Vs Old", cex.main=1.5, adj=0)
text(2.2,0.049, labels = "Adj.Pval = 0.023")

pbmc.oldyoung <- propeller.ttest(prop.list = prop.list,design = designAS, contrasts = mycontr,
                robust=TRUE,trend=FALSE,sort=TRUE)
sig.pbmc <- rownames(pbmc.oldyoung)

pdf(file="output/pbmcOldYoungResults.pdf",width = 13, height=13)
par(mfrow=c(6,4))
par(mar=c(4,5,2,2))
for(i in 1:length(sig.pbmc)){
stripchart(as.numeric(sexprops[sig.pbmc[i],])~group.immune,
           vertical=TRUE, pch=c(8,16), method="jitter",
           col = c(ggplotColors(20)[20],"hotpink",4, "darkblue"),cex=2,
           ylab="Proportions", cex.axis=1.25, cex.lab=1.5,
           group.names=c("F.Old","F.Young","M.Old","M.Young"))
title(paste(sig.pbmc[i],": Young Vs Old", sep=""), cex.main=1.5, adj=0)
legend("top", legend = paste("Adj.Pval = ",round(pbmc.oldyoung$FDR,3)[i],sep=""),cex=1.5,bty="n",bg="n")  
}
dev.off()

png 
  2

Male vs female

designSex <- model.matrix(~0+sampleinfo$Sex + sampleinfo$Age)
colnames(designSex) <- c("female","male","YvO")

# Male vs female
mycontr <- makeContrasts(male-female, levels=designSex)
propeller.ttest(prop.list = prop.list,design = designSex, contrasts = mycontr,
                robust=TRUE,trend=FALSE,sort=TRUE)

          PropMean.female PropMean.male PropRatio Tstatistic    P.Value
PC           0.0071388502   0.003271782 0.4583065 -2.9676619 0.00804369
CD4-CD8-     0.0169315724   0.029299611 1.7304720  1.7466913 0.09739463
CDC2         0.0114064513   0.014753399 1.2934258  1.7028982 0.10525677
pre-DC       0.0001775081   0.000370302 2.0861130  1.6922855 0.10729321
NK2          0.0457752659   0.063867790 1.3952467  1.5792666 0.13134444
T-mito       0.0034714265   0.004704310 1.3551520  1.5108683 0.14764218
NK3          0.0607526719   0.098326253 1.6184680  1.3381691 0.19717535
CD8.CTL      0.1180843551   0.075656579 0.6406994 -1.2934355 0.21189365
NK1          0.0103896562   0.012450171 1.1983237  1.0360062 0.31349508
TREG         0.0209130057   0.018914339 0.9044295 -0.9699518 0.34452822
PDC          0.0035994459   0.003027035 0.8409725 -0.8723142 0.39416997
CDC1         0.0005883982   0.000824339 1.4009883  0.8506963 0.40576781
CD16         0.0281162892   0.024964501 0.8879017 -0.7867408 0.44141885
CD4.Tcm      0.0843546548   0.070194915 0.8321404 -0.7086577 0.48743577
CD8.Tem      0.0747782768   0.069434765 0.9285419 -0.6242353 0.54015091
CD8.Naive    0.0547935108   0.060476721 1.1037205  0.6099297 0.54938164
CD4.Naive    0.1243810258   0.112498808 0.9044692 -0.6072558 0.55111630
CD4+CD8+     0.0134400072   0.014128180 1.0512033  0.4069711 0.68867913
ABC          0.0035141234   0.003136979 0.8926774 -0.3947226 0.69754192
CD14         0.1200231846   0.126839463 1.0567913  0.3939839 0.69813009
INTER        0.0206287843   0.019196858 0.9305860 -0.3590350 0.72361649
CD4.Tem      0.0862285523   0.084410023 0.9789104 -0.2035515 0.84094249
NBC          0.0497589512   0.048834935 0.9814301  0.1775313 0.86103455
MBC          0.0407540321   0.040417942 0.9917532 -0.1763579 0.86194297
                FDR
PC        0.1930486
CD4-CD8-  0.5905687
CDC2      0.5905687
pre-DC    0.5905687
NK2       0.5905687
T-mito    0.5905687
NK3       0.6356810
CD8.CTL   0.6356810
NK1       0.7780465
TREG      0.7780465
PDC       0.7780465
CDC1      0.7780465
CD16      0.7780465
CD4.Tcm   0.7780465
CD8.Tem   0.7780465
CD8.Naive 0.7780465
CD4.Naive 0.7780465
CD4+CD8+  0.8269903
ABC       0.8269903
CD14      0.8269903
INTER     0.8269903
CD4.Tem   0.8619430
NBC       0.8619430
MBC       0.8619430

Heart development analysis

This dataset was published by Sim et al in 2021 and looked at heart development across fetal, young and adult developmental time points. There are three samples at each developmental time point. There is also mix of male and female samples and one of the key findings of the study was transcriptional differences in cardiomyocyte development between males and females.

Here we look at differences in cellular composition of human hearts across the developmental trajectory, taking sex into account.

We can simply perform an anova test to find any differences between the three time points. The propeller.anova function can be called to do this directly.

We can also look at development as a continuous trajectory and model development as a continuous variable by getting the transformed proportions and using fitting functions in limma directly.

Both of these analysis approaches are shown below.

heart.info <- read.csv(file="./data/cellinfo.csv", row.names = 1)
heart.counts <- table(heart.info$Celltype, heart.info$Sample)
trueprops <- rowSums(heart.counts)/sum(rowSums(heart.counts))

heart.info$Group <- NA
heart.info$Group[grep("f",heart.info$Sample)] <- "fetal"
heart.info$Group[grep("y",heart.info$Sample)] <- "young"
heart.info$Group[grep("a",heart.info$Sample)] <- "adult"

sample <- factor(heart.info$Sample, levels= paste(rep(c("f","y","a"), each=3),c(1:3),sep=""))
group <- factor(heart.info$Group, levels=c("fetal","young","adult"))

grp <- factor(rep(c("fetal","young","adult"),each=3), levels=c("fetal","young","adult"))
sex <- factor(c("m","m","f","m","f","m","f","m","m"))
dose <- rep(c(1,2,3), each=3)

The sample information is summarised below:

gt(data.frame(Sample=1:9,Group=grp, Sex=sex),caption="Sample info for heart dataset")

Sample info for heart dataset
Sample	Group	Sex
1	fetal	m
2	fetal	m
3	fetal	f
4	young	m
5	young	f
6	young	m
7	adult	f
8	adult	m
9	adult	m

Anova test with propeller.anova

For the original analysis in the published paper by Sim et al., we performed an ANOVA using propeller(logit).

prop.logit <- getTransformedProps(clusters = heart.info$Celltype, sample=sample,
                                  transform = "logit")
design.anova <- model.matrix(~0+grp+sex)

propeller.anova(prop.logit,design = design.anova, coef=c(1,2,3), robust=TRUE, 
                trend = FALSE, sort=TRUE)

                    PropMean.grpfetal PropMean.grpyoung PropMean.grpadult
Erythroid                 0.004433044        0.00000000       0.000000000
Immune cells              0.027545963        0.10875124       0.189587828
Cardiomyocytes            0.682410381        0.42676145       0.273546585
Fibroblast                0.111342233        0.26192406       0.298689329
Neurons                   0.012643346        0.02620977       0.011380837
Epicardial cells          0.051414853        0.07541028       0.093157709
Smooth muscle cells       0.008101973        0.00846540       0.009099294
Endothelial cells         0.102108207        0.09247781       0.124538418
                    Fstatistic      P.Value          FDR
Erythroid           42.8302506 1.162495e-13 9.299960e-13
Immune cells        10.3529005 9.295704e-05 3.718282e-04
Cardiomyocytes       7.6881182 8.440468e-04 2.250792e-03
Fibroblast           4.0630697 2.057471e-02 4.114942e-02
Neurons              1.3712270 2.592465e-01 4.147943e-01
Epicardial cells     0.7965363 4.541612e-01 6.055483e-01
Smooth muscle cells  0.3492357 7.062149e-01 7.326515e-01
Endothelial cells    0.3122046 7.326515e-01 7.326515e-01

Modelling development as a continuous variable

Here we model the data in a different way with development as a continuous variable, and include sex as an additional covariate to control for.

des.dose <- model.matrix(~dose + sex)
des.dose

  (Intercept) dose sexm
1           1    1    1
2           1    1    1
3           1    1    0
4           1    2    1
5           1    2    0
6           1    2    1
7           1    3    0
8           1    3    1
9           1    3    1
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$sex
[1] "contr.treatment"

fit <- lmFit(prop.logit$TransformedProps,des.dose)
fit <- eBayes(fit, robust=TRUE)
topTable(fit,coef=2)

                          logFC    AveExpr          t      P.Value   adj.P.Val
Immune cells         1.01359669 -2.4222033  4.2685426 0.0007551546 0.004980697
Erythroid           -1.62960543 -7.8002460 -4.0140721 0.0012451742 0.004980697
Cardiomyocytes      -0.91607837 -0.1936611 -3.4462160 0.0038586975 0.010289860
Fibroblast           0.62700537 -1.3756904  2.5326823 0.0236986525 0.047397305
Epicardial cells     0.29219063 -2.6566961  1.2026143 0.2487822518 0.398051603
Smooth muscle cells  0.18767427 -4.9125588  0.6717760 0.5125008180 0.603572791
Endothelial cells    0.13688475 -2.1652913  0.6467132 0.5281261918 0.603572791
Neurons             -0.05626966 -4.1954452 -0.2315895 0.8201585898 0.820158590
                             B
Immune cells        -0.3850285
Erythroid           -0.8666233
Cardiomyocytes      -1.9515138
Fibroblast          -3.6607326
Epicardial cells    -5.6855671
Smooth muscle cells -6.1705968
Endothelial cells   -6.1872807
Neurons             -6.3739773

fit.plot <- lmFit(prop.logit$Proportions,des.dose)
fit.plot <- eBayes(fit.plot, robust=TRUE)

The two analyses identify the same cell types as significantly enriched/depleted, although there is a change in the order of significance.

Three significant cell types are visualised below.

par(mfrow=c(1,3))
stripchart(as.numeric(prop.logit$Proportions["Immune cells",])~grp,
           vertical=TRUE, pch=16, method="jitter",
           col = ggplotColors(4),cex=2, 
           ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("Immune development", cex.main=1.5, adj=0)
abline(a=fit.plot$coefficients["Immune cells",1], b=fit.plot$coefficients["Immune cells",2], lty=2, lwd=2)


stripchart(as.numeric(prop.logit$Proportions["Cardiomyocytes",])~grp,
           vertical=TRUE, pch=16, method="jitter",
           col = ggplotColors(4),cex=2, 
           ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("Cardiomyocyte development", cex.main=1.5, adj=0)
abline(a=fit.plot$coefficients["Cardiomyocytes",1], b=fit.plot$coefficients["Cardiomyocytes",2], lty=2, lwd=2)
text(2.6,0.77, labels = "Adj.Pval = 0.01")

stripchart(as.numeric(prop.logit$Proportions["Fibroblast",])~grp,
           vertical=TRUE, pch=16, method="jitter",
           col = ggplotColors(4),cex=2, 
           ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("Fibroblast development", cex.main=1.5, adj=0)
abline(a=fit.plot$coefficients["Fibroblast",1], b=fit.plot$coefficients["Fibroblast",2], lty=2, lwd=2)

sig.heart <- rownames(topTable(fit,coef=2))
grp <- factor(rep(c("fetal","young","adult"),each=3), levels=c("fetal","young","adult"))

pdf(file="output/heartResults.pdf",width = 13, height=6)
par(mfrow=c(2,4))
par(mar=c(4,5,2,2))
for(i in 1:length(sig.heart)){
  
stripchart(as.numeric(prop.logit$Proportions[sig.heart[i],])~grp,
           vertical=TRUE, pch=16, method="jitter",
           col = ggplotColors(4),cex=2, 
           ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title(sig.heart[i], cex.main=1.5, adj=0)
legend("top", legend = paste("Adj.Pval = ",round(topTable(fit,coef=2)$adj.P.Val,3)[i],sep=""),cex=1.5,bty="n",bg="n")
#abline(a=fit.plot$coefficients[sig.heart[i],1], b=fit.plot$coefficients[sig.heart[i],2], lty=2, lwd=2)
}
dev.off()

png 
  2

COVID data

This dataset was published by Liao et al in 2020 in Nature Medicine. They compared moderate and severe covid to healthy controls. They sampled bronchoalveolar lavage fluid from each individual.

For this dataset there are no additional covariates and we use the propeller() function with the cell level annotation information for cell type, sample, and group. The function automatically detects more than two groups and performs an ANOVA for each cell type. We fit with both the logit and arcsin square root transformed data to show the effect of an outlier sample in the plasma cell type.

covid <- read.delim("./data/covid.cell.annotation.meta.txt")

propeller using logit transformed proportions:

output.logit <- propeller(clusters=covid$celltype, sample=covid$sample_new, group=covid$group, transform="logit")
output.logit

            BaselineProp  PropMean.HC  PropMean.M  PropMean.S Fstatistic
Neutrophil   0.024417668 0.0000000000 0.001204431 0.055593846 34.5347175
Plasma       0.015817544 0.0002244276 0.002149909 0.050912723  8.7181367
pDC          0.002309574 0.0011342772 0.008209773 0.001065692  5.7852728
NK           0.016425326 0.0088943293 0.052465923 0.017978750  5.3912409
T            0.117241275 0.0945944806 0.325029563 0.137097435  3.1551241
mDC          0.014860286 0.0237761816 0.030831149 0.008875553  2.4842289
B            0.003342805 0.0035032662 0.012989518 0.004005391  2.3945864
Epithelial   0.053652014 0.1302459194 0.051903406 0.118455372  1.8134669
Macrophages  0.750869889 0.7352897831 0.512995844 0.604316237  1.6207397
Mast         0.001063620 0.0023373349 0.002220485 0.001699001  0.6920735
                 P.Value          FDR
Neutrophil  3.546468e-07 3.546468e-06
Plasma      2.056673e-03 1.028336e-02
pDC         1.052218e-02 3.493472e-02
NK          1.397389e-02 3.493472e-02
T           6.468749e-02 1.293750e-01
mDC         1.090338e-01 1.673738e-01
B           1.171617e-01 1.673738e-01
Epithelial  1.901866e-01 2.377332e-01
Macrophages 2.239111e-01 2.487901e-01
Mast        5.127128e-01 5.127128e-01

propeller using arcsin square root transformed proportions:

output.asin <- propeller(clusters=covid$celltype, sample=covid$sample_new, group=covid$group, transform="asin")
output.asin

            BaselineProp  PropMean.HC  PropMean.M  PropMean.S Fstatistic
pDC          0.002309574 0.0011342772 0.008209773 0.001065692 7.66800413
Neutrophil   0.024417668 0.0000000000 0.001204431 0.055593846 7.69977312
NK           0.016425326 0.0088943293 0.052465923 0.017978750 7.52069109
T            0.117241275 0.0945944806 0.325029563 0.137097435 5.37765924
mDC          0.014860286 0.0237761816 0.030831149 0.008875553 5.03722483
B            0.003342805 0.0035032662 0.012989518 0.004005391 3.18020059
Plasma       0.015817544 0.0002244276 0.002149909 0.050912723 1.37935620
Macrophages  0.750869889 0.7352897831 0.512995844 0.604316237 1.21863016
Epithelial   0.053652014 0.1302459194 0.051903406 0.118455372 0.30529601
Mast         0.001063620 0.0023373349 0.002220485 0.001699001 0.04651485
                P.Value        FDR
pDC         0.007891726 0.02801024
Neutrophil  0.008006769 0.02801024
NK          0.008403072 0.02801024
T           0.023311489 0.05466535
mDC         0.027332677 0.05466535
B           0.080297449 0.13382908
Plasma      0.291776927 0.41544492
Macrophages 0.332355933 0.41544492
Epithelial  0.742906637 0.82545182
Mast        0.954732486 0.95473249

props.covid <- getTransformedProps(clusters=covid$celltype, sample=covid$sample_new,
                                   transform="logit")

par(mfrow=c(1,1))
grp.covid <- rep(c("Control","Moderate","Severe"), c(4, 3, 6))
stripchart(as.numeric(props.covid$Proportions["Neutrophil",])~grp.covid,
           vertical=TRUE, pch=16, method="jitter",
           col = c(4,"purple",2),cex=2, 
           ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("Neutrophils in severe covid", cex.main=1.5, adj=0)
text(1.5,0.195, labels = "Adj.Pval = 3.5e-06")

Number of samples in each group:

table(grp.covid)

grp.covid
 Control Moderate   Severe 
       4        3        6

All significant cell types

par(mfrow=c(2,2))
stripchart(as.numeric(props.covid$Proportions["Neutrophil",])~grp.covid,
           vertical=TRUE, pch=16, method="jitter",
           col = c(4,"purple",2),cex=2, 
           ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("Neutrophils in covid", cex.main=1.5, adj=0)

stripchart(as.numeric(props.covid$Proportions["Plasma",])~grp.covid,
           vertical=TRUE, pch=16, method="jitter",
           col = c(4,"purple",2),cex=2, 
           ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("Plasma in covid - outlier sample", cex.main=1.5, adj=0)

stripchart(as.numeric(props.covid$Proportions["pDC",])~grp.covid,
           vertical=TRUE, pch=16, method="jitter",
           col = c(4,"purple",2),cex=2, 
           ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("pDC in covid", cex.main=1.5, adj=0)

stripchart(as.numeric(props.covid$Proportions["NK",])~grp.covid,
           vertical=TRUE, pch=16, method="jitter",
           col = c(4,"purple",2),cex=2, 
           ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("NK in covid", cex.main=1.5, adj=0)

sig.covid <- rownames(output.logit)

pdf(file="output/covidResults.pdf",width = 13, height=5)
par(mfrow=c(2,5))
par(mar=c(4,5,2,2))
for(i in 1:length(sig.covid)){
  
stripchart(as.numeric(props.covid$Proportions[sig.covid[i],])~grp.covid,
           vertical=TRUE, pch=16, method="jitter",
           col = c(4,"purple",2),cex=2, 
           ylab="Proportions",cex.axis=1.25, cex.lab=1.5)

title(sig.covid[i], cex.main=1.5, adj=0)
legend("top", legend = paste("Adj.Pval = ",round(output.asin[sig.covid[i],]$FDR,3),sep=""),cex=1.5,bty="n",bg="n")

}
dev.off()

png 
  2

Figure 3

layout(matrix(c(1,1,2,3,3,4,5,5,6,7,7,7,8,8,8,9,9,9), 2, 9, byrow = TRUE))

par(mar=c(5.5,5.5,3,0))
barplot(prop.logit$Proportions, col=ggplotColors(nrow(prop.logit$Proportions)),
        cex.lab=1.5, cex.axis = 1.5, cex.names=1.5, ylab="Proportions",xlab="Samples",las=2)
title("a) Human heart development", adj=0,cex.main=1.5)
par(mar=c(0,0,0,0))
plot(1, type = "n", xlab = "", ylab = "", xaxt="n",yaxt="n", bty="n")
legend("left",fill=ggplotColors(8),legend=rownames(prop.logit$Proportions), cex=1.25,
       bty="n")

par(mar=c(5.5,5.5,3,0))
par(mgp=c(4,1,0))
barplot(prop.list$Proportions, col=ggplotColors(24),las=2, cex.lab=1.5, 
        cex.axis = 1.5, cex.names=1.5, ylab="Proportions",xlab="Samples")
title("b) Old and young PBMCs", adj=0,cex.main=1.5)

par(mar=c(0,0,0,0))
plot(1, type = "n", xlab = "", ylab = "", xaxt="n",yaxt="n", bty="n")
legend("left",fill=ggplotColors(24),legend=rownames(prop.list$Proportions), cex=1.25,
       bty="n")


par(mar=c(5.5,5.5,3,0))
barplot(props.covid$Proportions, col=ggplotColors(nrow(props.covid$Proportions)),
        cex.lab=1.5, cex.axis = 1.5, cex.names=1.5, ylab="Proportions",xlab="Samples",las=2)
title("c) COVID vs healthy controls ", adj=0,cex.main=1.5)

par(mar=c(0,0,0,0))
plot(1, type = "n", xlab = "", ylab = "", xaxt="n",yaxt="n", bty="n")
legend("left",fill=ggplotColors(10),legend=rownames(props.covid$Proportions), cex=1.25,
       bty="n")

par(mar=c(5,5.5,3,2))
stripchart(as.numeric(prop.logit$Proportions["Cardiomyocytes",])~grp,
           vertical=TRUE, pch=16, method="jitter",
           col = ggplotColors(4),cex=2, 
           ylab="Proportions",cex.axis=1.5, cex.lab=1.5)
title("d) Cardiomyocyte development", cex.main=1.5, adj=0)
abline(a=fit.plot$coefficients["Cardiomyocytes",1], b=fit.plot$coefficients["Cardiomyocytes",2], lty=2, lwd=2)
text(2.6,0.77, labels = "Adj.Pval = 0.01",cex=1.5)


stripchart(as.numeric(sexprops["CD8.Naive",])~group.immune,
           vertical=TRUE, pch=c(8,16), method="jitter",
           col = c(ggplotColors(20)[20],"hotpink",4, "darkblue"),cex=2,
           ylab="Proportions", cex.axis=1.5, cex.lab=1.5,
           group.names=c("F.Old","F.Young","M.Old","M.Young"))
title("e) CD8.Naive: Young Vs Old", cex.main=1.5, adj=0)
text(3.2,0.18, labels = "Adj.Pval = 0.004",cex=1.5)



grp.covid <- rep(c("Control","Moderate","Severe"), c(4, 3, 6))
stripchart(as.numeric(props.covid$Proportions["Neutrophil",])~grp.covid,
           vertical=TRUE, pch=16, method="jitter",
           col = c(4,"purple",2),cex=2, 
           ylab="Proportions",cex.axis=1.5, cex.lab=1.5)
title("f) Neutrophils in severe covid", cex.main=1.5, adj=0)
text(1.5,0.195, labels = "Adj.Pval = 0.028",cex=1.5)

sessionInfo()

R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gt_0.6.0        pheatmap_1.0.12 edgeR_3.38.1    limma_3.52.1   
[5] speckle_0.99.0  workflowr_1.7.0

loaded via a namespace (and not attached):
  [1] backports_1.4.1             plyr_1.8.7                 
  [3] igraph_1.3.1                lazyeval_0.2.2             
  [5] sp_1.4-7                    splines_4.2.0              
  [7] BiocParallel_1.30.2         listenv_0.8.0              
  [9] scattermore_0.8             GenomeInfoDb_1.32.2        
 [11] ggplot2_3.3.6               digest_0.6.29              
 [13] htmltools_0.5.2             fansi_1.0.3                
 [15] checkmate_2.1.0             magrittr_2.0.3             
 [17] memoise_2.0.1               tensor_1.5                 
 [19] cluster_2.1.3               ROCR_1.0-11                
 [21] globals_0.15.0              Biostrings_2.64.0          
 [23] matrixStats_0.62.0          spatstat.sparse_2.1-1      
 [25] colorspace_2.0-3            blob_1.2.3                 
 [27] ggrepel_0.9.1               xfun_0.31                  
 [29] dplyr_1.0.9                 callr_3.7.0                
 [31] crayon_1.5.1                RCurl_1.98-1.6             
 [33] jsonlite_1.8.0              org.Mm.eg.db_3.15.0        
 [35] progressr_0.10.0            spatstat.data_2.2-0        
 [37] survival_3.3-1              zoo_1.8-10                 
 [39] glue_1.6.2                  polyclip_1.10-0            
 [41] gtable_0.3.0                zlibbioc_1.42.0            
 [43] XVector_0.36.0              leiden_0.4.2               
 [45] DelayedArray_0.22.0         SingleCellExperiment_1.18.0
 [47] future.apply_1.9.0          BiocGenerics_0.42.0        
 [49] abind_1.4-5                 scales_1.2.0               
 [51] DBI_1.1.2                   spatstat.random_2.2-0      
 [53] miniUI_0.1.1.1              Rcpp_1.0.8.3               
 [55] viridisLite_0.4.0           xtable_1.8-4               
 [57] reticulate_1.25             spatstat.core_2.4-4        
 [59] bit_4.0.4                   stats4_4.2.0               
 [61] htmlwidgets_1.5.4           httr_1.4.3                 
 [63] RColorBrewer_1.1-3          ellipsis_0.3.2             
 [65] Seurat_4.1.1                ica_1.0-2                  
 [67] scuttle_1.6.2               pkgconfig_2.0.3            
 [69] uwot_0.1.11                 sass_0.4.1                 
 [71] deldir_1.0-6                locfit_1.5-9.5             
 [73] utf8_1.2.2                  tidyselect_1.1.2           
 [75] rlang_1.0.2                 reshape2_1.4.4             
 [77] later_1.3.0                 AnnotationDbi_1.58.0       
 [79] munsell_0.5.0               tools_4.2.0                
 [81] cachem_1.0.6                cli_3.3.0                  
 [83] generics_0.1.2              RSQLite_2.2.14             
 [85] ggridges_0.5.3              evaluate_0.15              
 [87] stringr_1.4.0               fastmap_1.1.0              
 [89] yaml_2.3.5                  goftest_1.2-3              
 [91] org.Hs.eg.db_3.15.0         processx_3.5.3             
 [93] knitr_1.39                  bit64_4.0.5                
 [95] fs_1.5.2                    fitdistrplus_1.1-8         
 [97] purrr_0.3.4                 RANN_2.6.1                 
 [99] KEGGREST_1.36.0             sparseMatrixStats_1.8.0    
[101] pbapply_1.5-0               future_1.26.1              
[103] nlme_3.1-157                whisker_0.4                
[105] mime_0.12                   compiler_4.2.0             
[107] rstudioapi_0.13             plotly_4.10.0              
[109] png_0.1-7                   spatstat.utils_2.3-1       
[111] statmod_1.4.36              tibble_3.1.7               
[113] bslib_0.3.1                 stringi_1.7.6              
[115] highr_0.9                   ps_1.7.0                   
[117] rgeos_0.5-9                 lattice_0.20-45            
[119] Matrix_1.4-1                vctrs_0.4.1                
[121] pillar_1.7.0                lifecycle_1.0.1            
[123] spatstat.geom_2.4-0         lmtest_0.9-40              
[125] jquerylib_0.1.4             RcppAnnoy_0.0.19           
[127] data.table_1.14.2           cowplot_1.1.1              
[129] bitops_1.0-7                irlba_2.3.5                
[131] GenomicRanges_1.48.0        httpuv_1.6.5               
[133] patchwork_1.1.1             R6_2.5.1                   
[135] promises_1.2.0.1            KernSmooth_2.23-20         
[137] gridExtra_2.3               IRanges_2.30.0             
[139] parallelly_1.31.1           codetools_0.2-18           
[141] MASS_7.3-57                 assertthat_0.2.1           
[143] SummarizedExperiment_1.26.1 rprojroot_2.0.3            
[145] SeuratObject_4.1.0          sctransform_0.3.3          
[147] S4Vectors_0.34.0            GenomeInfoDbData_1.2.8     
[149] mgcv_1.8-40                 parallel_4.2.0             
[151] beachmat_2.12.0             rpart_4.1.16               
[153] grid_4.2.0                  tidyr_1.2.0                
[155] DelayedMatrixStats_1.18.0   rmarkdown_2.14             
[157] MatrixGenerics_1.8.0        Rtsne_0.16                 
[159] git2r_0.30.1                getPass_0.2-2              
[161] Biobase_2.56.0              shiny_1.7.1

Application to real single cell RNA-seq datasets

Belinda Phipson

06/03/2022