Last updated: 2022-08-18
Checks: 7 0
Knit directory: propeller-paper-analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20220531)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 2586453. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: data/cold_warm_fresh_cellinfo.txt
Ignored: data/covid.cell.annotation.meta.txt
Ignored: data/heartFYA.Rds
Ignored: data/pool_1.rds
Untracked files:
Untracked: analysis/PlotsForPaper.Rmd
Untracked: code/SimCode.R
Untracked: code/SimCodeTrueDiff.R
Untracked: code/auroc.R
Untracked: code/convertData.R
Untracked: data/CTpropsTransposed.txt
Untracked: data/CelltypeLevels.csv
Untracked: data/TypeIErrTables.Rdata
Untracked: data/appnote1cdata.rdata
Untracked: data/cellinfo.csv
Untracked: data/nullsimsVaryN_results.Rdata
Untracked: data/sampleinfo.csv
Untracked: output/1x/
Untracked: output/Fig1ab.pdf
Untracked: output/Fig1cde.pdf
Untracked: output/Fig2ab.pdf
Untracked: output/Fig2abc.pdf
Untracked: output/Fig2c.pdf
Untracked: output/Figure1.pdf
Untracked: output/Figure1.png
Untracked: output/Figure2-01.png
Untracked: output/Figure2-with#.pdf
Untracked: output/Figure2-with#.png
Untracked: output/Figure2.ai
Untracked: output/Figure2.pdf
Untracked: output/Figure2.png
Untracked: output/Figure2_E_annotatedwithProp.pdf
Untracked: output/Figure3.pdf
Untracked: output/Figure3.png
Untracked: output/PDF/
Untracked: output/SuppFig4.pdf
Untracked: output/SuppFig4.png
Untracked: output/SuppTrueDiff10.pdf
Untracked: output/SuppTrueDiff20.pdf
Untracked: output/SuppTrueDiff3.pdf
Untracked: output/Supplementary Figure 2 v2.png
Untracked: output/Supplementary Figure 2.ai
Untracked: output/Supplementary Figure 2.pdf
Untracked: output/Supplementary Figure 2.png
Untracked: output/SupplementaryFigure3.pdf
Untracked: output/TrueDiffSimResults.Rda
Untracked: output/covidResults.pdf
Untracked: output/example_simdata.pdf
Untracked: output/extremeCaseTrueProps20CT.pdf
Untracked: output/fig2d.pdf
Untracked: output/gude-2022-06-27.log
Untracked: output/gude-2022-06-29.log
Untracked: output/heartResults.pdf
Untracked: output/heatmap20CT.pdf
Untracked: output/legend-fig2d.pdf
Untracked: output/pbmcOldYoungResults.pdf
Untracked: output/type1error5.csv
Untracked: output/typeIerrorResults.Rda
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/RealDataAnalysis.Rmd
) and
HTML (docs/RealDataAnalysis.html
) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote
),
click on the hyperlinks in the table below to view the files as they
were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 2586453 | bphipson | 2022-08-18 | update analysis scripts |
html | 1613612 | bphipson | 2022-06-06 | Build site. |
Rmd | 622390e | bphipson | 2022-06-06 | update real data analysis |
html | a39985a | bphipson | 2022-06-03 | Build site. |
Rmd | cf6f450 | bphipson | 2022-06-03 | add real data analysis |
We analysed three different publicly available single cell datasets to highlight the different types of models that can be fitted within the propeller framework.
For all three datasets we use the logit transformation, except for the covid dataset. In the covid dataset, there is clearly an outlier for the plasma cell type. Using the logit transformation, plasma is found to be significantly enriched in severe covid, but using the arcsin square root transform shows that it is not significantly enriched. While logit may be more powerful according to the simulations, it appears sensitive to outliers, and in this case we would prefer to use the arcsin square root transform.
library(speckle)
library(limma)
library(edgeR)
library(pheatmap)
library(gt)
source("./code/convertData.R")
This dataset was published in PNAS in 2021 and examined PBMCs from 20 individuals. The dataset was balanced in terms of age and sex (5 samples in each group: Male + Young, Male + Old, Female + Young, Female + Old).
The data analysis reported in the paper was ANOVA directly on cell type proportions modelling sex and age plus interaction. It was not clear from the description whether separate models were fitted for sex and age, removing the interaction term in order to interpret the effects for sex and age correctly.
From the supplementary data I have extracted the cell type proportions and using information from the paper I have converted the proportions into the necessary data type (a list object) to analyse with propeller.
sexCT <- read.delim("./data/CTpropsTransposed.txt", row.names = 1)
sexprops <- sexCT/100
prop.list <- convertDataToList(sexprops,data.type="proportions", transform="asin",
scale.fac=174684/20)
sampleinfo <- read.csv("./data/sampleinfo.csv", nrows = 20)
celltypes <- read.csv("./data/CelltypeLevels.csv")
group.immune <- paste(sampleinfo$Sex, sampleinfo$Age, sep=".")
celltypes$Celltype_L0 <- celltypes$Celltype_L1
celltypes$Celltype_L0[celltypes$Celltype_L1 == "CD4" | celltypes$Celltype_L1 == "CD8" | celltypes$Celltype_L1 == "CD4-CD8-" | celltypes$Celltype_L1 == "CD4+CD8+" | celltypes$Celltype_L1 == "T-mito"] <- "TC"
gt(data.frame(table(sampleinfo$Sex, sampleinfo$Age)), rownames_to_stub = TRUE, caption="Sample info for aging dataset")
Var1 | Var2 | Freq | |
---|---|---|---|
1 | female | old | 5 |
2 | male | old | 5 |
3 | female | young | 5 |
4 | male | young | 5 |
levels(factor(celltypes$Celltype_L0))
[1] "BC" "DC" "MC" "NK" "TC"
sexprops.broad <- matrix(NA,nrow=length(levels(factor(celltypes$Celltype_L0))), ncol=ncol(sexprops))
rownames(sexprops.broad) <- levels(factor(celltypes$Celltype_L0))
colnames(sexprops.broad) <- colnames(sexprops)
for(i in 1:ncol(sexprops.broad)) sexprops.broad[,i] <- tapply(sexprops[,i],celltypes$Celltype_L0,sum)
prop.broad.list <- convertDataToList(sexprops.broad,data.type="proportions", transform="asin", scale.fac=174684/20)
par(mfrow=c(1,5))
for(i in 1:nrow(sexprops.broad)){
stripchart(as.numeric(sexprops.broad[i,])~group.immune,
vertical=TRUE, pch=c(8,16), method="jitter",
col = c(ggplotColors(20)[20],"hotpink",4, "darkblue"),cex=2,
ylab="Proportions", cex.axis=1.25, cex.lab=1.5,
group.names=c("F.Old","F.Young","M.Old","M.Young"))
title(rownames(sexprops.broad)[i], cex.main=1.5, adj=0)
}
designAS <- model.matrix(~0+sampleinfo$Age + sampleinfo$Sex)
colnames(designAS) <- c("old","young","MvsF")
# Young vs old
mycontr <- makeContrasts(young-old, levels=designAS)
propeller.ttest(prop.list = prop.broad.list,design = designAS, contrasts = mycontr,
robust=TRUE,trend=FALSE,sort=TRUE)
PropMean.old PropMean.young PropRatio Tstatistic P.Value FDR
MC 0.19656790 0.14320118 0.7285074 -3.0721365 0.006146365 0.03073183
TC 0.53445740 0.60263724 1.1275684 1.9680058 0.063474712 0.15868678
NK 0.15808995 0.13347185 0.8442779 -1.0554668 0.304133538 0.50688923
BC 0.09432895 0.10249865 1.0866086 0.6569470 0.518905633 0.64863204
DC 0.01655580 0.01819108 1.0987734 0.4041554 0.690503393 0.69050339
designSex <- model.matrix(~0+sampleinfo$Sex + sampleinfo$Age)
colnames(designSex) <- c("female","male","YvO")
# Male vs female
mycontr <- makeContrasts(male-female, levels=designSex)
propeller.ttest(prop.list = prop.broad.list,design = designSex, contrasts = mycontr,
robust=TRUE,trend=FALSE,sort=TRUE)
PropMean.female PropMean.male PropRatio Tstatistic P.Value FDR
NK 0.1169176 0.17464421 1.4937377 2.3732661 0.02805937 0.1402968
TC 0.5973764 0.53971825 0.9034811 -1.6579946 0.11334335 0.2833584
DC 0.0157718 0.01897507 1.2031011 1.0868808 0.29036247 0.4839375
BC 0.1011660 0.09566164 0.9455912 -0.3897772 0.70093229 0.8761654
MC 0.1687683 0.17100082 1.0132286 0.1074843 0.91550468 0.9155047
Set up design matrix using a means model, taking into account age and sex.
designAS <- model.matrix(~0+sampleinfo$Age + sampleinfo$Sex)
colnames(designAS) <- c("old","young","MvsF")
# Young vs old
mycontr <- makeContrasts(young-old, levels=designAS)
propeller.ttest(prop.list = prop.list,design = designAS, contrasts = mycontr,
robust=TRUE,trend=FALSE,sort=TRUE)
PropMean.old PropMean.young PropRatio Tstatistic P.Value
CD8.Naive 0.0251316033 0.0901386289 3.5866645 4.5220310 0.0002519973
CD16 0.0346697484 0.0184110417 0.5310405 -3.9241289 0.0009547072
TREG 0.0224289338 0.0173984106 0.7757128 -2.7351400 0.0133315595
T-mito 0.0050757865 0.0030999505 0.6107330 -2.6230363 0.0169438969
INTER 0.0237901860 0.0160354568 0.6740366 -2.4875531 0.0225506463
CD4-CD8- 0.0160339954 0.0301971880 1.8833227 2.2851533 0.0343944316
CD14 0.1381079661 0.1087546814 0.7874613 -2.0325551 0.0568152857
ABC 0.0042268748 0.0024242272 0.5735271 -1.6729544 0.1110894015
CD4.Naive 0.1048008357 0.1320789977 1.2602857 1.4434939 0.1657237113
CD8.CTL 0.1154952554 0.0782456785 0.6774796 -1.2843478 0.2149863850
CD8.Tem 0.0639345157 0.0802785262 1.2556367 1.2590561 0.2237791504
MBC 0.0358736549 0.0452983191 1.2627183 1.2537215 0.2256688587
CDC1 0.0005479607 0.0008647764 1.5781725 1.0625556 0.3015992495
NK3 0.0886767142 0.0704022112 0.7939199 -0.8885615 0.3857294967
pre-DC 0.0002285096 0.0003193005 1.3973177 0.8863058 0.3867803942
NK1 0.0125858088 0.0102540186 0.8147286 -0.8762818 0.3920651391
PC 0.0054708831 0.0049397487 0.9029162 -0.8219337 0.4215370844
CD4.Tem 0.0873700973 0.0832684783 0.9530547 -0.4812340 0.6360341593
CD4+CD8+ 0.0142514516 0.0133167354 0.9344126 -0.3026144 0.7655453386
NBC 0.0487575349 0.0498363513 1.0221261 0.2928181 0.7729439779
PDC 0.0031666984 0.0034597825 1.0925519 0.2888826 0.7758695169
CDC2 0.0126126342 0.0135472159 1.0740989 0.2557422 0.8009596764
NK2 0.0568274310 0.0528156246 0.9294037 -0.1699836 0.8668815789
CD4.Tcm 0.0799349205 0.0746146497 0.9334425 -0.1236681 0.9029215277
FDR
CD8.Naive 0.006047935
CD16 0.011456486
TREG 0.101663382
T-mito 0.101663382
INTER 0.108243102
CD4-CD8- 0.137577726
CD14 0.194795265
ABC 0.333268204
CD4.Naive 0.441929897
CD8.CTL 0.451337717
CD8.Tem 0.451337717
MBC 0.451337717
CDC1 0.556798615
NK3 0.588097709
pre-DC 0.588097709
NK1 0.588097709
PC 0.595111178
CD4.Tem 0.848045546
CD4+CD8+ 0.873774192
NBC 0.873774192
PDC 0.873774192
CDC2 0.873774192
NK2 0.902921528
CD4.Tcm 0.902921528
Visualise significant cell types:
par(mfrow=c(1,2))
stripchart(as.numeric(sexprops["CD8.Naive",])~group.immune,
vertical=TRUE, pch=c(8,16), method="jitter",
col = c(ggplotColors(20)[20],"hotpink",4, "darkblue"),cex=2,
ylab="Proportions", cex.axis=1.25, cex.lab=1.5,
group.names=c("F.Old","F.Young","M.Old","M.Young"))
title("CD8.Naive: Young Vs Old", cex.main=1.5, adj=0)
text(3.2,0.18, labels = "Adj.Pval = 0.004")
stripchart(as.numeric(sexprops["CD16",])~group.immune,
vertical=TRUE, pch=c(8,16), method="jitter",
col = c(ggplotColors(20)[20],"hotpink",4, "darkblue"),cex=2,
ylab="Proportions", cex.axis=1.25, cex.lab=1.5,
group.names=c("F.Old","F.Young","M.Old","M.Young"))
title("CD16: Young Vs Old", cex.main=1.5, adj=0)
text(2.2,0.049, labels = "Adj.Pval = 0.023")
pbmc.oldyoung <- propeller.ttest(prop.list = prop.list,design = designAS, contrasts = mycontr,
robust=TRUE,trend=FALSE,sort=TRUE)
sig.pbmc <- rownames(pbmc.oldyoung)
pdf(file="output/pbmcOldYoungResults.pdf",width = 13, height=13)
par(mfrow=c(6,4))
par(mar=c(4,5,2,2))
for(i in 1:length(sig.pbmc)){
stripchart(as.numeric(sexprops[sig.pbmc[i],])~group.immune,
vertical=TRUE, pch=c(8,16), method="jitter",
col = c(ggplotColors(20)[20],"hotpink",4, "darkblue"),cex=2,
ylab="Proportions", cex.axis=1.25, cex.lab=1.5,
group.names=c("F.Old","F.Young","M.Old","M.Young"))
title(paste(sig.pbmc[i],": Young Vs Old", sep=""), cex.main=1.5, adj=0)
legend("top", legend = paste("Adj.Pval = ",round(pbmc.oldyoung$FDR,3)[i],sep=""),cex=1.5,bty="n",bg="n")
}
dev.off()
png
2
designSex <- model.matrix(~0+sampleinfo$Sex + sampleinfo$Age)
colnames(designSex) <- c("female","male","YvO")
# Male vs female
mycontr <- makeContrasts(male-female, levels=designSex)
propeller.ttest(prop.list = prop.list,design = designSex, contrasts = mycontr,
robust=TRUE,trend=FALSE,sort=TRUE)
PropMean.female PropMean.male PropRatio Tstatistic P.Value
PC 0.0071388502 0.003271782 0.4583065 -2.9676619 0.00804369
CD4-CD8- 0.0169315724 0.029299611 1.7304720 1.7466913 0.09739463
CDC2 0.0114064513 0.014753399 1.2934258 1.7028982 0.10525677
pre-DC 0.0001775081 0.000370302 2.0861130 1.6922855 0.10729321
NK2 0.0457752659 0.063867790 1.3952467 1.5792666 0.13134444
T-mito 0.0034714265 0.004704310 1.3551520 1.5108683 0.14764218
NK3 0.0607526719 0.098326253 1.6184680 1.3381691 0.19717535
CD8.CTL 0.1180843551 0.075656579 0.6406994 -1.2934355 0.21189365
NK1 0.0103896562 0.012450171 1.1983237 1.0360062 0.31349508
TREG 0.0209130057 0.018914339 0.9044295 -0.9699518 0.34452822
PDC 0.0035994459 0.003027035 0.8409725 -0.8723142 0.39416997
CDC1 0.0005883982 0.000824339 1.4009883 0.8506963 0.40576781
CD16 0.0281162892 0.024964501 0.8879017 -0.7867408 0.44141885
CD4.Tcm 0.0843546548 0.070194915 0.8321404 -0.7086577 0.48743577
CD8.Tem 0.0747782768 0.069434765 0.9285419 -0.6242353 0.54015091
CD8.Naive 0.0547935108 0.060476721 1.1037205 0.6099297 0.54938164
CD4.Naive 0.1243810258 0.112498808 0.9044692 -0.6072558 0.55111630
CD4+CD8+ 0.0134400072 0.014128180 1.0512033 0.4069711 0.68867913
ABC 0.0035141234 0.003136979 0.8926774 -0.3947226 0.69754192
CD14 0.1200231846 0.126839463 1.0567913 0.3939839 0.69813009
INTER 0.0206287843 0.019196858 0.9305860 -0.3590350 0.72361649
CD4.Tem 0.0862285523 0.084410023 0.9789104 -0.2035515 0.84094249
NBC 0.0497589512 0.048834935 0.9814301 0.1775313 0.86103455
MBC 0.0407540321 0.040417942 0.9917532 -0.1763579 0.86194297
FDR
PC 0.1930486
CD4-CD8- 0.5905687
CDC2 0.5905687
pre-DC 0.5905687
NK2 0.5905687
T-mito 0.5905687
NK3 0.6356810
CD8.CTL 0.6356810
NK1 0.7780465
TREG 0.7780465
PDC 0.7780465
CDC1 0.7780465
CD16 0.7780465
CD4.Tcm 0.7780465
CD8.Tem 0.7780465
CD8.Naive 0.7780465
CD4.Naive 0.7780465
CD4+CD8+ 0.8269903
ABC 0.8269903
CD14 0.8269903
INTER 0.8269903
CD4.Tem 0.8619430
NBC 0.8619430
MBC 0.8619430
This dataset was published by Sim et al in 2021 and looked at heart development across fetal, young and adult developmental time points. There are three samples at each developmental time point. There is also mix of male and female samples and one of the key findings of the study was transcriptional differences in cardiomyocyte development between males and females.
Here we look at differences in cellular composition of human hearts across the developmental trajectory, taking sex into account.
We can simply perform an anova test to find any differences between the three time points. The propeller.anova function can be called to do this directly.
We can also look at development as a continuous trajectory and model development as a continuous variable by getting the transformed proportions and using fitting functions in limma directly.
Both of these analysis approaches are shown below.
heart.info <- read.csv(file="./data/cellinfo.csv", row.names = 1)
heart.counts <- table(heart.info$Celltype, heart.info$Sample)
trueprops <- rowSums(heart.counts)/sum(rowSums(heart.counts))
heart.info$Group <- NA
heart.info$Group[grep("f",heart.info$Sample)] <- "fetal"
heart.info$Group[grep("y",heart.info$Sample)] <- "young"
heart.info$Group[grep("a",heart.info$Sample)] <- "adult"
sample <- factor(heart.info$Sample, levels= paste(rep(c("f","y","a"), each=3),c(1:3),sep=""))
group <- factor(heart.info$Group, levels=c("fetal","young","adult"))
grp <- factor(rep(c("fetal","young","adult"),each=3), levels=c("fetal","young","adult"))
sex <- factor(c("m","m","f","m","f","m","f","m","m"))
dose <- rep(c(1,2,3), each=3)
The sample information is summarised below:
gt(data.frame(Sample=1:9,Group=grp, Sex=sex),caption="Sample info for heart dataset")
Sample | Group | Sex |
---|---|---|
1 | fetal | m |
2 | fetal | m |
3 | fetal | f |
4 | young | m |
5 | young | f |
6 | young | m |
7 | adult | f |
8 | adult | m |
9 | adult | m |
For the original analysis in the published paper by Sim et al., we performed an ANOVA using propeller(logit).
prop.logit <- getTransformedProps(clusters = heart.info$Celltype, sample=sample,
transform = "logit")
design.anova <- model.matrix(~0+grp+sex)
propeller.anova(prop.logit,design = design.anova, coef=c(1,2,3), robust=TRUE,
trend = FALSE, sort=TRUE)
PropMean.grpfetal PropMean.grpyoung PropMean.grpadult
Erythroid 0.004433044 0.00000000 0.000000000
Immune cells 0.027545963 0.10875124 0.189587828
Cardiomyocytes 0.682410381 0.42676145 0.273546585
Fibroblast 0.111342233 0.26192406 0.298689329
Neurons 0.012643346 0.02620977 0.011380837
Epicardial cells 0.051414853 0.07541028 0.093157709
Smooth muscle cells 0.008101973 0.00846540 0.009099294
Endothelial cells 0.102108207 0.09247781 0.124538418
Fstatistic P.Value FDR
Erythroid 42.8302506 1.162495e-13 9.299960e-13
Immune cells 10.3529005 9.295704e-05 3.718282e-04
Cardiomyocytes 7.6881182 8.440468e-04 2.250792e-03
Fibroblast 4.0630697 2.057471e-02 4.114942e-02
Neurons 1.3712270 2.592465e-01 4.147943e-01
Epicardial cells 0.7965363 4.541612e-01 6.055483e-01
Smooth muscle cells 0.3492357 7.062149e-01 7.326515e-01
Endothelial cells 0.3122046 7.326515e-01 7.326515e-01
Here we model the data in a different way with development as a continuous variable, and include sex as an additional covariate to control for.
des.dose <- model.matrix(~dose + sex)
des.dose
(Intercept) dose sexm
1 1 1 1
2 1 1 1
3 1 1 0
4 1 2 1
5 1 2 0
6 1 2 1
7 1 3 0
8 1 3 1
9 1 3 1
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$sex
[1] "contr.treatment"
fit <- lmFit(prop.logit$TransformedProps,des.dose)
fit <- eBayes(fit, robust=TRUE)
topTable(fit,coef=2)
logFC AveExpr t P.Value adj.P.Val
Immune cells 1.01359669 -2.4222033 4.2685426 0.0007551546 0.004980697
Erythroid -1.62960543 -7.8002460 -4.0140721 0.0012451742 0.004980697
Cardiomyocytes -0.91607837 -0.1936611 -3.4462160 0.0038586975 0.010289860
Fibroblast 0.62700537 -1.3756904 2.5326823 0.0236986525 0.047397305
Epicardial cells 0.29219063 -2.6566961 1.2026143 0.2487822518 0.398051603
Smooth muscle cells 0.18767427 -4.9125588 0.6717760 0.5125008180 0.603572791
Endothelial cells 0.13688475 -2.1652913 0.6467132 0.5281261918 0.603572791
Neurons -0.05626966 -4.1954452 -0.2315895 0.8201585898 0.820158590
B
Immune cells -0.3850285
Erythroid -0.8666233
Cardiomyocytes -1.9515138
Fibroblast -3.6607326
Epicardial cells -5.6855671
Smooth muscle cells -6.1705968
Endothelial cells -6.1872807
Neurons -6.3739773
fit.plot <- lmFit(prop.logit$Proportions,des.dose)
fit.plot <- eBayes(fit.plot, robust=TRUE)
The two analyses identify the same cell types as significantly enriched/depleted, although there is a change in the order of significance.
Three significant cell types are visualised below.
par(mfrow=c(1,3))
stripchart(as.numeric(prop.logit$Proportions["Immune cells",])~grp,
vertical=TRUE, pch=16, method="jitter",
col = ggplotColors(4),cex=2,
ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("Immune development", cex.main=1.5, adj=0)
abline(a=fit.plot$coefficients["Immune cells",1], b=fit.plot$coefficients["Immune cells",2], lty=2, lwd=2)
stripchart(as.numeric(prop.logit$Proportions["Cardiomyocytes",])~grp,
vertical=TRUE, pch=16, method="jitter",
col = ggplotColors(4),cex=2,
ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("Cardiomyocyte development", cex.main=1.5, adj=0)
abline(a=fit.plot$coefficients["Cardiomyocytes",1], b=fit.plot$coefficients["Cardiomyocytes",2], lty=2, lwd=2)
text(2.6,0.77, labels = "Adj.Pval = 0.01")
stripchart(as.numeric(prop.logit$Proportions["Fibroblast",])~grp,
vertical=TRUE, pch=16, method="jitter",
col = ggplotColors(4),cex=2,
ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("Fibroblast development", cex.main=1.5, adj=0)
abline(a=fit.plot$coefficients["Fibroblast",1], b=fit.plot$coefficients["Fibroblast",2], lty=2, lwd=2)
sig.heart <- rownames(topTable(fit,coef=2))
grp <- factor(rep(c("fetal","young","adult"),each=3), levels=c("fetal","young","adult"))
pdf(file="output/heartResults.pdf",width = 13, height=6)
par(mfrow=c(2,4))
par(mar=c(4,5,2,2))
for(i in 1:length(sig.heart)){
stripchart(as.numeric(prop.logit$Proportions[sig.heart[i],])~grp,
vertical=TRUE, pch=16, method="jitter",
col = ggplotColors(4),cex=2,
ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title(sig.heart[i], cex.main=1.5, adj=0)
legend("top", legend = paste("Adj.Pval = ",round(topTable(fit,coef=2)$adj.P.Val,3)[i],sep=""),cex=1.5,bty="n",bg="n")
#abline(a=fit.plot$coefficients[sig.heart[i],1], b=fit.plot$coefficients[sig.heart[i],2], lty=2, lwd=2)
}
dev.off()
png
2
This dataset was published by Liao et al in 2020 in Nature Medicine. They compared moderate and severe covid to healthy controls. They sampled bronchoalveolar lavage fluid from each individual.
For this dataset there are no additional covariates and we use the propeller() function with the cell level annotation information for cell type, sample, and group. The function automatically detects more than two groups and performs an ANOVA for each cell type. We fit with both the logit and arcsin square root transformed data to show the effect of an outlier sample in the plasma cell type.
covid <- read.delim("./data/covid.cell.annotation.meta.txt")
propeller using logit transformed proportions:
output.logit <- propeller(clusters=covid$celltype, sample=covid$sample_new, group=covid$group, transform="logit")
output.logit
BaselineProp PropMean.HC PropMean.M PropMean.S Fstatistic
Neutrophil 0.024417668 0.0000000000 0.001204431 0.055593846 34.5347175
Plasma 0.015817544 0.0002244276 0.002149909 0.050912723 8.7181367
pDC 0.002309574 0.0011342772 0.008209773 0.001065692 5.7852728
NK 0.016425326 0.0088943293 0.052465923 0.017978750 5.3912409
T 0.117241275 0.0945944806 0.325029563 0.137097435 3.1551241
mDC 0.014860286 0.0237761816 0.030831149 0.008875553 2.4842289
B 0.003342805 0.0035032662 0.012989518 0.004005391 2.3945864
Epithelial 0.053652014 0.1302459194 0.051903406 0.118455372 1.8134669
Macrophages 0.750869889 0.7352897831 0.512995844 0.604316237 1.6207397
Mast 0.001063620 0.0023373349 0.002220485 0.001699001 0.6920735
P.Value FDR
Neutrophil 3.546468e-07 3.546468e-06
Plasma 2.056673e-03 1.028336e-02
pDC 1.052218e-02 3.493472e-02
NK 1.397389e-02 3.493472e-02
T 6.468749e-02 1.293750e-01
mDC 1.090338e-01 1.673738e-01
B 1.171617e-01 1.673738e-01
Epithelial 1.901866e-01 2.377332e-01
Macrophages 2.239111e-01 2.487901e-01
Mast 5.127128e-01 5.127128e-01
propeller using arcsin square root transformed proportions:
output.asin <- propeller(clusters=covid$celltype, sample=covid$sample_new, group=covid$group, transform="asin")
output.asin
BaselineProp PropMean.HC PropMean.M PropMean.S Fstatistic
pDC 0.002309574 0.0011342772 0.008209773 0.001065692 7.66800413
Neutrophil 0.024417668 0.0000000000 0.001204431 0.055593846 7.69977312
NK 0.016425326 0.0088943293 0.052465923 0.017978750 7.52069109
T 0.117241275 0.0945944806 0.325029563 0.137097435 5.37765924
mDC 0.014860286 0.0237761816 0.030831149 0.008875553 5.03722483
B 0.003342805 0.0035032662 0.012989518 0.004005391 3.18020059
Plasma 0.015817544 0.0002244276 0.002149909 0.050912723 1.37935620
Macrophages 0.750869889 0.7352897831 0.512995844 0.604316237 1.21863016
Epithelial 0.053652014 0.1302459194 0.051903406 0.118455372 0.30529601
Mast 0.001063620 0.0023373349 0.002220485 0.001699001 0.04651485
P.Value FDR
pDC 0.007891726 0.02801024
Neutrophil 0.008006769 0.02801024
NK 0.008403072 0.02801024
T 0.023311489 0.05466535
mDC 0.027332677 0.05466535
B 0.080297449 0.13382908
Plasma 0.291776927 0.41544492
Macrophages 0.332355933 0.41544492
Epithelial 0.742906637 0.82545182
Mast 0.954732486 0.95473249
props.covid <- getTransformedProps(clusters=covid$celltype, sample=covid$sample_new,
transform="logit")
par(mfrow=c(1,1))
grp.covid <- rep(c("Control","Moderate","Severe"), c(4, 3, 6))
stripchart(as.numeric(props.covid$Proportions["Neutrophil",])~grp.covid,
vertical=TRUE, pch=16, method="jitter",
col = c(4,"purple",2),cex=2,
ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("Neutrophils in severe covid", cex.main=1.5, adj=0)
text(1.5,0.195, labels = "Adj.Pval = 3.5e-06")
Number of samples in each group:
table(grp.covid)
grp.covid
Control Moderate Severe
4 3 6
All significant cell types
par(mfrow=c(2,2))
stripchart(as.numeric(props.covid$Proportions["Neutrophil",])~grp.covid,
vertical=TRUE, pch=16, method="jitter",
col = c(4,"purple",2),cex=2,
ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("Neutrophils in covid", cex.main=1.5, adj=0)
stripchart(as.numeric(props.covid$Proportions["Plasma",])~grp.covid,
vertical=TRUE, pch=16, method="jitter",
col = c(4,"purple",2),cex=2,
ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("Plasma in covid - outlier sample", cex.main=1.5, adj=0)
stripchart(as.numeric(props.covid$Proportions["pDC",])~grp.covid,
vertical=TRUE, pch=16, method="jitter",
col = c(4,"purple",2),cex=2,
ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("pDC in covid", cex.main=1.5, adj=0)
stripchart(as.numeric(props.covid$Proportions["NK",])~grp.covid,
vertical=TRUE, pch=16, method="jitter",
col = c(4,"purple",2),cex=2,
ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title("NK in covid", cex.main=1.5, adj=0)
sig.covid <- rownames(output.logit)
pdf(file="output/covidResults.pdf",width = 13, height=5)
par(mfrow=c(2,5))
par(mar=c(4,5,2,2))
for(i in 1:length(sig.covid)){
stripchart(as.numeric(props.covid$Proportions[sig.covid[i],])~grp.covid,
vertical=TRUE, pch=16, method="jitter",
col = c(4,"purple",2),cex=2,
ylab="Proportions",cex.axis=1.25, cex.lab=1.5)
title(sig.covid[i], cex.main=1.5, adj=0)
legend("top", legend = paste("Adj.Pval = ",round(output.asin[sig.covid[i],]$FDR,3),sep=""),cex=1.5,bty="n",bg="n")
}
dev.off()
png
2
layout(matrix(c(1,1,2,3,3,4,5,5,6,7,7,7,8,8,8,9,9,9), 2, 9, byrow = TRUE))
par(mar=c(5.5,5.5,3,0))
barplot(prop.logit$Proportions, col=ggplotColors(nrow(prop.logit$Proportions)),
cex.lab=1.5, cex.axis = 1.5, cex.names=1.5, ylab="Proportions",xlab="Samples",las=2)
title("a) Human heart development", adj=0,cex.main=1.5)
par(mar=c(0,0,0,0))
plot(1, type = "n", xlab = "", ylab = "", xaxt="n",yaxt="n", bty="n")
legend("left",fill=ggplotColors(8),legend=rownames(prop.logit$Proportions), cex=1.25,
bty="n")
par(mar=c(5.5,5.5,3,0))
par(mgp=c(4,1,0))
barplot(prop.list$Proportions, col=ggplotColors(24),las=2, cex.lab=1.5,
cex.axis = 1.5, cex.names=1.5, ylab="Proportions",xlab="Samples")
title("b) Old and young PBMCs", adj=0,cex.main=1.5)
par(mar=c(0,0,0,0))
plot(1, type = "n", xlab = "", ylab = "", xaxt="n",yaxt="n", bty="n")
legend("left",fill=ggplotColors(24),legend=rownames(prop.list$Proportions), cex=1.25,
bty="n")
par(mar=c(5.5,5.5,3,0))
barplot(props.covid$Proportions, col=ggplotColors(nrow(props.covid$Proportions)),
cex.lab=1.5, cex.axis = 1.5, cex.names=1.5, ylab="Proportions",xlab="Samples",las=2)
title("c) COVID vs healthy controls ", adj=0,cex.main=1.5)
par(mar=c(0,0,0,0))
plot(1, type = "n", xlab = "", ylab = "", xaxt="n",yaxt="n", bty="n")
legend("left",fill=ggplotColors(10),legend=rownames(props.covid$Proportions), cex=1.25,
bty="n")
par(mar=c(5,5.5,3,2))
stripchart(as.numeric(prop.logit$Proportions["Cardiomyocytes",])~grp,
vertical=TRUE, pch=16, method="jitter",
col = ggplotColors(4),cex=2,
ylab="Proportions",cex.axis=1.5, cex.lab=1.5)
title("d) Cardiomyocyte development", cex.main=1.5, adj=0)
abline(a=fit.plot$coefficients["Cardiomyocytes",1], b=fit.plot$coefficients["Cardiomyocytes",2], lty=2, lwd=2)
text(2.6,0.77, labels = "Adj.Pval = 0.01",cex=1.5)
stripchart(as.numeric(sexprops["CD8.Naive",])~group.immune,
vertical=TRUE, pch=c(8,16), method="jitter",
col = c(ggplotColors(20)[20],"hotpink",4, "darkblue"),cex=2,
ylab="Proportions", cex.axis=1.5, cex.lab=1.5,
group.names=c("F.Old","F.Young","M.Old","M.Young"))
title("e) CD8.Naive: Young Vs Old", cex.main=1.5, adj=0)
text(3.2,0.18, labels = "Adj.Pval = 0.004",cex=1.5)
grp.covid <- rep(c("Control","Moderate","Severe"), c(4, 3, 6))
stripchart(as.numeric(props.covid$Proportions["Neutrophil",])~grp.covid,
vertical=TRUE, pch=16, method="jitter",
col = c(4,"purple",2),cex=2,
ylab="Proportions",cex.axis=1.5, cex.lab=1.5)
title("f) Neutrophils in severe covid", cex.main=1.5, adj=0)
text(1.5,0.195, labels = "Adj.Pval = 0.028",cex=1.5)
sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gt_0.6.0 pheatmap_1.0.12 edgeR_3.38.1 limma_3.52.1
[5] speckle_0.99.0 workflowr_1.7.0
loaded via a namespace (and not attached):
[1] backports_1.4.1 plyr_1.8.7
[3] igraph_1.3.1 lazyeval_0.2.2
[5] sp_1.4-7 splines_4.2.0
[7] BiocParallel_1.30.2 listenv_0.8.0
[9] scattermore_0.8 GenomeInfoDb_1.32.2
[11] ggplot2_3.3.6 digest_0.6.29
[13] htmltools_0.5.2 fansi_1.0.3
[15] checkmate_2.1.0 magrittr_2.0.3
[17] memoise_2.0.1 tensor_1.5
[19] cluster_2.1.3 ROCR_1.0-11
[21] globals_0.15.0 Biostrings_2.64.0
[23] matrixStats_0.62.0 spatstat.sparse_2.1-1
[25] colorspace_2.0-3 blob_1.2.3
[27] ggrepel_0.9.1 xfun_0.31
[29] dplyr_1.0.9 callr_3.7.0
[31] crayon_1.5.1 RCurl_1.98-1.6
[33] jsonlite_1.8.0 org.Mm.eg.db_3.15.0
[35] progressr_0.10.0 spatstat.data_2.2-0
[37] survival_3.3-1 zoo_1.8-10
[39] glue_1.6.2 polyclip_1.10-0
[41] gtable_0.3.0 zlibbioc_1.42.0
[43] XVector_0.36.0 leiden_0.4.2
[45] DelayedArray_0.22.0 SingleCellExperiment_1.18.0
[47] future.apply_1.9.0 BiocGenerics_0.42.0
[49] abind_1.4-5 scales_1.2.0
[51] DBI_1.1.2 spatstat.random_2.2-0
[53] miniUI_0.1.1.1 Rcpp_1.0.8.3
[55] viridisLite_0.4.0 xtable_1.8-4
[57] reticulate_1.25 spatstat.core_2.4-4
[59] bit_4.0.4 stats4_4.2.0
[61] htmlwidgets_1.5.4 httr_1.4.3
[63] RColorBrewer_1.1-3 ellipsis_0.3.2
[65] Seurat_4.1.1 ica_1.0-2
[67] scuttle_1.6.2 pkgconfig_2.0.3
[69] uwot_0.1.11 sass_0.4.1
[71] deldir_1.0-6 locfit_1.5-9.5
[73] utf8_1.2.2 tidyselect_1.1.2
[75] rlang_1.0.2 reshape2_1.4.4
[77] later_1.3.0 AnnotationDbi_1.58.0
[79] munsell_0.5.0 tools_4.2.0
[81] cachem_1.0.6 cli_3.3.0
[83] generics_0.1.2 RSQLite_2.2.14
[85] ggridges_0.5.3 evaluate_0.15
[87] stringr_1.4.0 fastmap_1.1.0
[89] yaml_2.3.5 goftest_1.2-3
[91] org.Hs.eg.db_3.15.0 processx_3.5.3
[93] knitr_1.39 bit64_4.0.5
[95] fs_1.5.2 fitdistrplus_1.1-8
[97] purrr_0.3.4 RANN_2.6.1
[99] KEGGREST_1.36.0 sparseMatrixStats_1.8.0
[101] pbapply_1.5-0 future_1.26.1
[103] nlme_3.1-157 whisker_0.4
[105] mime_0.12 compiler_4.2.0
[107] rstudioapi_0.13 plotly_4.10.0
[109] png_0.1-7 spatstat.utils_2.3-1
[111] statmod_1.4.36 tibble_3.1.7
[113] bslib_0.3.1 stringi_1.7.6
[115] highr_0.9 ps_1.7.0
[117] rgeos_0.5-9 lattice_0.20-45
[119] Matrix_1.4-1 vctrs_0.4.1
[121] pillar_1.7.0 lifecycle_1.0.1
[123] spatstat.geom_2.4-0 lmtest_0.9-40
[125] jquerylib_0.1.4 RcppAnnoy_0.0.19
[127] data.table_1.14.2 cowplot_1.1.1
[129] bitops_1.0-7 irlba_2.3.5
[131] GenomicRanges_1.48.0 httpuv_1.6.5
[133] patchwork_1.1.1 R6_2.5.1
[135] promises_1.2.0.1 KernSmooth_2.23-20
[137] gridExtra_2.3 IRanges_2.30.0
[139] parallelly_1.31.1 codetools_0.2-18
[141] MASS_7.3-57 assertthat_0.2.1
[143] SummarizedExperiment_1.26.1 rprojroot_2.0.3
[145] SeuratObject_4.1.0 sctransform_0.3.3
[147] S4Vectors_0.34.0 GenomeInfoDbData_1.2.8
[149] mgcv_1.8-40 parallel_4.2.0
[151] beachmat_2.12.0 rpart_4.1.16
[153] grid_4.2.0 tidyr_1.2.0
[155] DelayedMatrixStats_1.18.0 rmarkdown_2.14
[157] MatrixGenerics_1.8.0 Rtsne_0.16
[159] git2r_0.30.1 getPass_0.2-2
[161] Biobase_2.56.0 shiny_1.7.1