Last updated: 2022-09-30

Checks: 7 0

Knit directory: scATACseq-topics/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20200729) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version b779c73. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    data/.DS_Store
    Ignored:    data/Buenrostro_2018/
    Ignored:    data/Cusanovich_2018/processed_data/Cusanovich_2018_Kidney.RData
    Ignored:    output/Buenrostro_2018/binarized/filtered_peaks/de-buenrostro2018-k=10-noshrink.RData
    Ignored:    output/Buenrostro_2018/binarized/filtered_peaks/fit-Buenrostro2018-binarized-filtered-scd-ex-k=10.rds
    Ignored:    output/Buenrostro_2018/binarized/filtered_peaks/fit-Buenrostro2018-binarized-filtered-scd-ex-k=8.rds
    Ignored:    output/Cusanovich_2018/.DS_Store
    Ignored:    output/Cusanovich_2018/fit-Cusanovich2018-scd-ex-k=13.rds
    Ignored:    output/Cusanovich_2018/tissues/.DS_Store
    Ignored:    output/Cusanovich_2018/tissues/de-cusanovich2018-kidney-k=10-noshrink.RData
    Ignored:    output/Cusanovich_2018/tissues/fit-Cusanovich2018-Kidney-scd-ex-k=10.rds
    Ignored:    output/Cusanovich_2018/tissues/gene-scores-cusanovich2018-kidney-k=10.RData
    Ignored:    scripts/seq_gene.md.gz

Untracked files:
    Untracked:  analysis/fit-Buenrostro2018-binarized-scd-ex-k=10.rds
    Untracked:  data/Buenrostro_2018_binarized.RData
    Untracked:  output/Buenrostro_2018/binarized/filtered_peaks/Buenrostro_2018_binarized_filtered.RData
    Untracked:  plots/
    Untracked:  scripts/fit-buenrostro-2018-k=8.rds

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/cusanovich2018_kidney_k10.Rmd) and HTML (docs/cusanovich2018_kidney_k10.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd b779c73 Peter Carbonetto 2022-09-30 workflowr::wflow_publish("cusanovich2018_kidney_k10.Rmd")
html 3912476 Peter Carbonetto 2022-09-29 Build site.
Rmd 91dda41 Peter Carbonetto 2022-09-29 workflowr::wflow_publish("analysis/cusanovich2018_kidney_k10.Rmd")
Rmd 36327f7 Peter Carbonetto 2022-09-29 I have a first draft of the function for creating the volcano enrichment plots.
Rmd d111ca6 Peter Carbonetto 2022-09-28 Edit to cusanovich2018_kidney_k10 analysis.
Rmd 494e06a Peter Carbonetto 2022-09-28 Working on adding volcano plots to cusanovich2018_kidney_k10 analysis.
Rmd b3b7350 Peter Carbonetto 2022-09-16 Made some improvements to exploratory analysis temp.R.
html 822da57 Peter Carbonetto 2022-08-05 Added some accompanying text to the cusanovich2018_kidney_k10 analysis.
Rmd 8d16590 Peter Carbonetto 2022-08-05 workflowr::wflow_publish("analysis/cusanovich2018_kidney_k10.Rmd")
html 0d6c9fd Peter Carbonetto 2022-08-05 Added structure plots to cusanovich2018_kidney_k10 analysis.
Rmd d9cfa44 Peter Carbonetto 2022-08-05 workflowr::wflow_publish("analysis/cusanovich2018_kidney_k10.Rmd")
html 68fba7a Peter Carbonetto 2022-08-05 First build of cusanovich2018_kidney_k10 analysis.
Rmd 4549e6a Peter Carbonetto 2022-08-05 workflowr::wflow_publish("analysis/cusanovich2018_kidney_k10.Rmd")

Add overview of the analysis here.

Load the packages used in the analysis below.

library(Matrix)
library(pathways)
library(fastTopics)
library(ggplot2)
library(ggrepel)
library(cowplot)
source("code/kidney_genes.R")
source("code/volcano_plots.R")

Load the binarized count data.

load("data/Cusanovich_2018/processed_data/Cusanovich_2018_Kidney.RData")

Load the \(K = 10\) topic model fit.

fit <- readRDS(file.path("output/Cusanovich_2018/tissues",
                         "fit-Cusanovich2018-Kidney-scd-ex-k=10.rds"))$fit
fit <- poisson2multinom(fit)

Load the gene-wise results obtained from the de_analysis of the peaks.

load(file.path("output/Cusanovich_2018/tissues",
               "de-gene-cusanovich2018-kidney-k=10.RData"))

This next Structure plot shows the correspondence between the topics and the final cell-type labels from Cusanovich et al (“cell_label”). Some of more distinct cell-types (e.g., loop of henle, podocytes) are captured by a single topic, Note that some of the cell-type labels are combined to improve the Structure plot; in particular, several of the labels that only appear once or a few times are combined into the “Other or unknown” grouping.

set.seed(1)
x <- samples$cell_label
x[x == "Activated B cells" | x == "B cells" | x == "Alveolar macrophages" |
  x == "Monocytes" | x == "NK cells" | x == "T cells" | 
  x == "Regulatory T cells" | x == "Dendritic cells" |
  x == "Macrophages"] <- "Immune cells"
x[x == "Endothelial I (glomerular)" |
  x == "Endothelial I cells"] <- "Endothelial I cells"
x[x == "Collisions" | x == "Enterocytes" | x == "Hepatocytes" |
  x == "Sperm" | x == "Type II pneumocytes" | x == "Endothelial II cells" |
  x == "Hematopoietic progenitors" | x == "Unknown"] <- "Other or unknown"
x <- factor(x,c("Collecting duct","DCT/CD","Distal convoluted tubule",
                "Loop of henle","Proximal tubule","Proximal tubule S3",
                "Podocytes","Endothelial I cells","Immune cells",
                "Other or unknown"))
samples <- transform(samples,cell_label = x)
topic_colors <- c("orange","darkmagenta","magenta","gold","peru",
                  "red","lightgray","limegreen","dodgerblue","royalblue")
p1 <- structure_plot(fit,grouping = samples$cell_label,gap = 20,
                     colors = topic_colors,verbose = FALSE)
print(p1)

Version Author Date
0d6c9fd Peter Carbonetto 2022-08-05

Let’s now visualize the topics without the benefit of the previously inferred clusters. Instead, we can use PCs computed from the topic proportions to subdivide the cells into 5 clusters. (Here we are assigning labels to these clusters only to make it easier to refer to them.)

pca <- prcomp(fit$L)$x
pc1 <- pca[,1]
pc2 <- pca[,2]
x <- rep("other",nrow(pca))
x[pc1 < 0] <- "proximal tubule"
x[pc1 > 0 & pc2 > 0.2] <- "endothelial I"
rows <- which(x == "other")
fit2 <- select_loadings(fit,loadings = rows)
pca  <- prcomp(fit2$L)$x
y    <- rep("DCT + CD",nrow(pca))
pc1  <- pca[,1]
pc2  <- pca[,2]
y[pc1 < 0 & pc2 > -0.15] <- "loop of henle"
y[pc1 >= 0 & pc2 > -0.15] <- "podocytes + other"
x[rows] <- y
samples$cluster <- factor(x,c("DCT + CD","loop of henle","proximal tubule",
                              "podocytes + other","endothelial I"))

This is what the Structure plot looks like without any external grouping of the cells:

set.seed(1)
p2 <- structure_plot(fit,grouping = samples$cluster,gap = 20,
                     colors = topic_colors,verbose = FALSE)
print(p2)

Version Author Date
0d6c9fd Peter Carbonetto 2022-08-05

The topics clearly identify four distinctive cell types—(1) loop of henle, (2) proximal tubule, (3) distal convoluted tubule (DCT) and collecting duct (CD), and (4) endothelial cells—and within these distinctive cell types there is sometimes a good amount of continuous variation in expression that is further captured by the topics, such as the two topics for DCT and CD, and the heterogeneity among the proximal tubule cells captured by several topics.

Next, let’s see if the gene ranking generated from the differential accessibility analysis of the topics supports these interpretations, and helps us interpret some of the other topics that are less clear such as topic 9.

In this next code chunk we prepare the gene_info and de_gene data structures for the volcano plots.

data(gene_sets_mouse)
gene_info <- gene_sets_mouse$gene_info
gene_info <- subset(gene_info,!is.na(Ensembl))
gene_info <- subset(gene_info,!duplicated(Ensembl))
ids <- gene_info$Ensembl
ids <- intersect(ids,rownames(de_gene$coef))
de_gene$coef  <- de_gene$coef[ids,]
de_gene$logLR <- de_gene$logLR[ids,]
ids       <- rownames(de_gene$coef)
gene_info <- subset(gene_info,is.element(Ensembl,ids))
rownames(gene_info) <- gene_info$Ensembl
gene_info <- gene_info[ids,]
genes     <- gene_info$Symbol

Add text here.

k <- 2
label_genes <- (is.element(genes,cd_genes) & de_gene$logLR[,k] > 4) |
               (de_gene$logLR[,k] > 35 & de_gene$coef[,k] > 0) |
               (de_gene$coef[,k] > 3)
k2_genes <- genes
k2_genes[!label_genes] <- ""
p2 <- volcano_plot_enrich(de_gene,k = 2,labels = k2_genes,
                          highlight = is.element(genes,cd_genes),
                          ymax = 100)
k <- 5
label_genes <- (is.element(genes,dct_genes) & de_gene$logLR[,k] > 4) |
               (de_gene$logLR[,k] > 40 & de_gene$coef[,k] > 1) |
               (de_gene$coef[,k] > 3)
k5_genes <- genes
k5_genes[!label_genes] <- ""
p5 <- volcano_plot_enrich(de_gene,k = 5,labels = k5_genes,
                          highlight = is.element(genes,dct_genes),
                          ymax = 100)
k <- 3
label_genes <- (is.element(genes,endo_genes) & de_gene$logLR[,k] > 4) |
               (de_gene$logLR[,k] > 85 & de_gene$coef[,k] > 2.25) |
               (de_gene$coef[,k] > 4.5)
k3_genes <- genes
k3_genes[!label_genes] <- ""
p3 <- volcano_plot_enrich(de_gene,k = 3,labels = k3_genes,
                          highlight = is.element(genes,endo_genes),
                          ymax = 125)
k <- 1
label_genes <- (is.element(genes,podo_genes) & de_gene$logLR[,k] > 4) |
               (de_gene$logLR[,k] > 50 & de_gene$coef[,k] > 2) |
               (de_gene$logLR[,k] > 20 & de_gene$coef[,k] > 3.75)
k1_genes <- genes
k1_genes[!label_genes] <- ""
p1 <- volcano_plot_enrich(de_gene,k = 1,labels = k1_genes,
                          highlight = is.element(genes,podo_genes),
                          ymax = 80)
k <- 8
label_genes <- (is.element(genes,loh_genes) & de_gene$logLR[,k] > 4) |
               (de_gene$logLR[,k] > 60 & de_gene$coef[,k] > 0) |
               (de_gene$coef[,k] > 3)
k8_genes <- genes
k8_genes[!label_genes] <- ""
p8 <- volcano_plot_enrich(de_gene,k = 8,labels = k8_genes,
                          highlight = is.element(genes,loh_genes),
                          ymax = 140)
plot_grid(p2,p5,
          p3,p1,
          p8,
          nrow = 3,ncol = 2)

Version Author Date
3912476 Peter Carbonetto 2022-09-29

Add text here.

k <- 9
label_genes <- (is.element(genes,c(pt_genes,S1_genes)) &
                de_gene$logLR[,k] > 4) |
               (de_gene$logLR[,k] > 100 & de_gene$coef[,k] > 1) |
               (de_gene$coef[,k] > 3)
k9_genes <- genes
k9_genes[!label_genes] <- ""
k9_highlight <- rep(0,length(genes))
k9_highlight[is.element(genes,pt_genes)] <- 1
k9_highlight[is.element(genes,S1_genes)] <- 2
p9 <- volcano_plot_enrich(de_gene,k = 9,labels = k9_genes,
                          highlight = factor(k9_highlight),ymax = 200)
k <- 10
label_genes <- (is.element(genes,c(pt_genes,S3_genes)) &
                de_gene$logLR[,k] > 4) |
               (de_gene$logLR[,k] > 65 & de_gene$coef[,k] > 1.5) |
               (de_gene$coef[,k] > 3)
k10_genes <- genes
k10_genes[!label_genes] <- ""
k10_highlight <- rep(0,length(genes))
k10_highlight[is.element(genes,pt_genes)] <- 1
k10_highlight[is.element(genes,S3_genes)] <- 2
p10 <- volcano_plot_enrich(de_gene,k = 10,labels = k10_genes,
                           highlight = factor(k10_highlight),ymax = 200)
plot_grid(p9,p10,nrow = 1,ncol = 2)

Version Author Date
3912476 Peter Carbonetto 2022-09-29

To do:

  • Compile enrichment results into tables.

  • Create volcano plots for remaining topics.

  • Put together volcano plots for paper.

# Add remaining volcano plots here.

Diagram from osmosis.com:

kidney diagram


sessionInfo()
# R version 3.6.2 (2019-12-12)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS Catalina 10.15.7
# 
# Matrix products: default
# BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] cowplot_1.1.1      ggrepel_0.9.1      ggplot2_3.3.6      fastTopics_0.6-131
# [5] pathways_0.1-20    Matrix_1.4-2      
# 
# loaded via a namespace (and not attached):
#  [1] mcmc_0.9-6          fs_1.5.2            progress_1.2.2     
#  [4] httr_1.4.2          rprojroot_1.3-2     tools_3.6.2        
#  [7] backports_1.1.5     bslib_0.3.1         utf8_1.1.4         
# [10] R6_2.4.1            irlba_2.3.3         uwot_0.1.10        
# [13] DBI_1.1.0           lazyeval_0.2.2      colorspace_1.4-1   
# [16] withr_2.5.0         tidyselect_1.1.1    gridExtra_2.3      
# [19] prettyunits_1.1.1   compiler_3.6.2      git2r_0.29.0       
# [22] quantreg_5.54       SparseM_1.78        plotly_4.9.2       
# [25] labeling_0.3        sass_0.4.0          scales_1.1.0       
# [28] SQUAREM_2017.10-1   quadprog_1.5-8      pbapply_1.5-1      
# [31] mixsqp_0.3-46       systemfonts_1.0.2   stringr_1.4.0      
# [34] digest_0.6.23       rmarkdown_2.11      MCMCpack_1.4-5     
# [37] pkgconfig_2.0.3     htmltools_0.5.2     fastmap_1.1.0      
# [40] invgamma_1.1        highr_0.8           htmlwidgets_1.5.1  
# [43] rlang_0.4.11        jquerylib_0.1.4     generics_0.0.2     
# [46] farver_2.0.1        jsonlite_1.7.2      BiocParallel_1.18.1
# [49] dplyr_1.0.7         magrittr_2.0.1      Rcpp_1.0.8         
# [52] munsell_0.5.0       fansi_0.4.0         lifecycle_1.0.0    
# [55] stringi_1.4.3       whisker_0.4         yaml_2.2.0         
# [58] MASS_7.3-51.4       Rtsne_0.15          grid_3.6.2         
# [61] parallel_3.6.2      promises_1.1.0      crayon_1.4.1       
# [64] lattice_0.20-38     hms_1.1.0           knitr_1.37         
# [67] pillar_1.6.2        fgsea_1.15.1        fastmatch_1.1-0    
# [70] glue_1.4.2          evaluate_0.14       data.table_1.12.8  
# [73] RcppParallel_5.1.5  vctrs_0.3.8         httpuv_1.5.2       
# [76] MatrixModels_0.4-1  gtable_0.3.0        purrr_0.3.4        
# [79] tidyr_1.1.3         assertthat_0.2.1    ashr_2.2-54        
# [82] xfun_0.29           coda_0.19-3         later_1.0.0        
# [85] ragg_0.3.1          viridisLite_0.3.0   truncnorm_1.0-8    
# [88] tibble_3.1.3        workflowr_1.7.0     ellipsis_0.3.2