Last updated: 2024-09-11

Knit directory: lung_lymph_scMultiomics/

GoM DE analysis on u19 dataset

Model fitting


N_updates = 150
N_topics = 12

Model evaluation

Check the convergence

Version Author Date
68d9e18 Jing Gu 2024-05-09
Model overview:
  Number of data rows, n: 53647
  Number of data cols, m: 17420
  Rank/number of topics, k: 12
Evaluation of model fit (170 updates performed):
  Poisson NMF log-likelihood: -1.997995557900e+08
  Multinomial topic model log-likelihood: -1.995509032083e+08
  Poisson NMF deviance: +2.634951598369e+08
  Max KKT residual: +1.430262e-02
Set show.size.factors = TRUE, show.mixprops = TRUE and/or show.topic.reps = TRUE in print(...) for more information


Visualize topics with structure plots

  • plot by tissue

  • plot by tissue and cell-type


Characterize topics with enrichment analysis

Two ways to perform an enrichment test:

  • Over-representation analysis (ORA)

This test depends purely on occurrences: it asks whether the genes related to a topic occur more frequently in a GO term than the background genes do.

  • Two-sample t-test or regression (MAGMA)

This test asks whether the genes in a set are more strongly associated with the phenotype than the genes outside the set.
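
The occurrence-based test can be sketched as a one-sided hypergeometric test. The Python snippet below is only a minimal illustration using the standard library: the function name and the example counts are hypothetical, not taken from this analysis, and the enrichment tool actually used here may differ in its details.

```python
from math import comb

def ora_pvalue(n_background, n_in_term, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_in_term belong to the GO term."""
    total = comb(n_background, n_selected)
    upper = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, x) * comb(n_background - n_in_term, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20 background genes, 5 of them in the GO term,
# 4 topic genes selected, 3 of which fall in the term.
p = ora_pvalue(20, 5, 4, 3)
print(f"ORA p-value: {p:.4f}")
```

A small p-value indicates that the overlap between the topic genes and the GO term is larger than expected by chance given the background.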

GO enrichment results (ORA method)

Top genes ranked by loadings


  • Top 500 loading genes vs. all genes
  • Database: Biological_Process, Cellular_Component, Molecular_Function

The top-loading genes are enriched in a large number of GO terms, which have broad functions.

Topic-specific genes through DE analysis


  • Up-regulated genes specific to each topic (z > 0 and lfsr < 0.01) vs. all genes
  • Database: Biological_Process, Cellular_Component, Molecular_Function
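
The selection rule above (z > 0 and lfsr < 0.01) amounts to a simple filter over the DE results. The sketch below assumes a hypothetical list of per-gene records for one topic; the gene names and values are made up for illustration and are not results from this analysis.

```python
def select_upregulated(de_results, lfsr_cutoff=0.01):
    """Return genes with a positive z-score and lfsr below the cutoff."""
    return [
        r["gene"]
        for r in de_results
        if r["z"] > 0 and r["lfsr"] < lfsr_cutoff
    ]

# Hypothetical DE results for one topic (made-up genes and values).
toy_results = [
    {"gene": "geneA", "z": 6.2, "lfsr": 1e-6},   # up-regulated and confident
    {"gene": "geneB", "z": -5.1, "lfsr": 1e-5},  # confident but down-regulated
    {"gene": "geneC", "z": 1.3, "lfsr": 0.2},    # up-regulated but not confident
]
selected = select_upregulated(toy_results)
print(selected)
```

Only genes passing both conditions are kept, so a confidently down-regulated gene is excluded even though its lfsr is small.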

Volcano plots for GoM DE results

Axes: z-scores of the posterior mean log-fold change estimates vs. the log-fold change estimates

The volcano plots show that several genes encoding different types of cytokines are present in topic 6. Chemokine ligands (CCL4, CCL20, etc.) are proteins that signal leukocyte migration, while the interleukins IL17A and IL22 are cytokines that signal immune cells to defend against pathogens.

[1] "Number of up-regulated genes in each topic against the rest:"
  t1   t2   t3   t4   t5   t6   t7   t8   t9  t10  t11  t12 
1602  122  594  562  391  381  267  505  111  626  465  489 

GO Enrichment result table

All topics except topic 1 have relatively few enriched GO terms.

Visualizing GO enrichments with ComplexHeatmap

Parameters:

  • -log10(FDR) as value input
  • No clustering due to missing data
  • Top 10 GO terms shown for each topic

Legend:

  • Color scale capped at FDR = \(10^{-15}\)
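
The value transformation described above can be sketched as follows. This is an illustrative Python helper, not the code used to draw the heatmap: the function name is hypothetical, and treating a missing entry as `None` is an assumption about how a GO term that is absent for a topic might be represented.

```python
import math

def heatmap_value(fdr, cap=15.0):
    """Return -log10(FDR), capped at `cap` (i.e. at FDR = 10^-cap)."""
    if fdr is None:
        return None  # assumed encoding of a missing entry
    if fdr <= 0:
        return cap   # an FDR of 0 (underflow) gets the capped value
    return min(-math.log10(fdr), cap)

# Made-up FDR values for illustration.
values = [heatmap_value(f) for f in (1e-3, 1e-30, None)]
print(values)
```

Capping keeps a few extremely small FDR values from dominating the color scale.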

BP results show that k1 is enriched for granulocyte activation and neutrophil-mediated immunity, as well as cell adhesion and motility. These pathways are highly relevant to asthma. Because a large number of GO terms are over-represented among k1 genes, we may repeat the topic modeling with a larger number of topics. Several topics (k2, k6, k7, k9, k10) are strongly enriched for protein localization. Topics k3-k6 have genes over-represented in T-cell activation, while k10 and k12 have genes over-represented in B-cell activation.

MF results show broad enrichment of molecular binding, with k7 highly enriched for cell adhesion molecule binding.

Selecting by FDR

Version Author Date
2e475c9 Jing Gu 2024-05-13
68d9e18 Jing Gu 2024-05-09

Identify topics correlated with tissue difference

  1. Test whether the topic proportion is correlated with the tissue of origin using a regression model: \[ F = \beta X_{\text{tissue}} + \text{Covariates} + \epsilon \]
  2. Perform a t-test to check whether topic proportions differ significantly between the two tissues.
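
As a rough illustration of the second option, the sketch below computes Welch's t statistic and its approximate degrees of freedom for two groups of topic proportions. The report does not state which t-test variant was used, so Welch's version is an assumption here, and the proportions are made-up values. Converting the statistic to a p-value requires the t distribution, which is left to a statistics library.

```python
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / (vx + vy) ** 0.5
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical topic proportions for cells from two tissues (made-up values).
lung = [0.30, 0.35, 0.40, 0.45]
spleen = [0.10, 0.15, 0.20, 0.25]
t_stat, df = welch_t(lung, spleen)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```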

Barplot for topic proportions in cell types

  • Topic proportions are similar between tissues across T cell subsets.
  • \(CD8^+\) T cells have the highest proportion of k3 among cell types, similar to NK cells.
  • Topic k1 dominates lung monocytes and also separates ILC3 cells from other T/NK cells in lungs.
  • The largest differences in topic proportions between tissues occur in B cells (e.g., k4, k10, k12).

Number of cells per tissue and cell type:

          Naive_B Memory_B Monocytes  ILC3 CD4_T CD8_T  Treg  Th17    NK Other
  lungs      1174     5287      1010   499  6980 12210  1336  2732  8067   145
  spleens    1710    10507        22    52   886   421    47    68   464    30

Perform t-test while adjusting for confounders


Test the mean difference between tissues one donor at a time, then combine the per-donor p-values in a meta-analysis using Fisher’s method.
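
Fisher's method combines k independent p-values through the statistic \(X = -2 \sum_i \log p_i\), which follows a chi-square distribution with 2k degrees of freedom under the null. The Python sketch below illustrates the calculation with made-up per-donor p-values; it is not the code used in this analysis (the session info lists poolr, which may have been used instead), and it assumes every p-value is strictly positive.

```python
import math

def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    For even degrees of freedom 2k, the chi-square upper tail has the
    closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!, so no statistics
    library is needed.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

# Hypothetical per-donor p-values (made-up values).
combined = fisher_combine([0.01, 0.04, 0.20])
print(f"combined p-value: {combined:.4g}")
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed form.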


The x-axis denotes cell types and the y-axis denotes topics. For the major cell types, the majority of topics show significant differences in proportions between tissues.

Version Author Date
87c183b Jing Gu 2024-05-25
95bfdff Jing Gu 2024-05-13
2e475c9 Jing Gu 2024-05-13

Find evidence for the function of TFs with motif enrichment in genes loaded on topic 5


Version Author Date
744d4f0 Jing Gu 2024-05-22

R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/

 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C           
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] catecolors_0.1        ComplexHeatmap_2.14.0 colorRamp2_0.1.0     
 [4] tidyr_1.3.1           dplyr_1.1.4           poolr_1.1-1          
 [7] cowplot_1.1.3         ggplot2_3.4.0         fastTopics_0.6-175   
[10] Matrix_1.6-5          workflowr_1.7.1      

loaded via a namespace (and not attached):
  [1] matrixStats_1.2.0   fs_1.6.4            RColorBrewer_1.1-3 
  [4] doParallel_1.0.17   progress_1.2.3      httr_1.4.7         
  [7] rprojroot_2.0.4     tools_4.2.0         bslib_0.7.0        
 [10] DT_0.33             utf8_1.2.4          R6_2.5.1           
 [13] irlba_2.3.5.1       BiocGenerics_0.44.0 uwot_0.2.2         
 [16] lazyeval_0.2.2      colorspace_2.1-0    GetoptLong_1.0.5   
 [19] withr_3.0.0         tidyselect_1.2.1    prettyunits_1.2.0  
 [22] processx_3.8.3      compiler_4.2.0      git2r_0.33.0       
 [25] cli_3.6.2           Cairo_1.6-2         plotly_4.10.4      
 [28] labeling_0.4.3      sass_0.4.9          scales_1.3.0       
 [31] SQUAREM_2021.1      quadprog_1.5-8      callr_3.7.3        
 [34] pbapply_1.7-2       mixsqp_0.3-54       stringr_1.5.1      
 [37] digest_0.6.35       rmarkdown_2.26      RhpcBLASctl_0.23-42
 [40] pkgconfig_2.0.3     htmltools_0.5.8.1   highr_0.10         
 [43] fastmap_1.1.1       invgamma_1.1        GlobalOptions_0.1.2
 [46] htmlwidgets_1.6.4   rlang_1.1.3         rstudioapi_0.15.0  
 [49] farver_2.1.1        shape_1.4.6         jquerylib_0.1.4    
 [52] generics_0.1.3      jsonlite_1.8.8      crosstalk_1.2.1    
 [55] gtools_3.9.5        magrittr_2.0.3      S4Vectors_0.36.2   
 [58] Rcpp_1.0.12         munsell_0.5.1       fansi_1.0.6        
 [61] lifecycle_1.0.4     stringi_1.7.6       whisker_0.4.1      
 [64] yaml_2.3.8          mathjaxr_1.6-0      Rtsne_0.17         
 [67] parallel_4.2.0      promises_1.3.0      ggrepel_0.9.5      
 [70] crayon_1.5.2        lattice_0.22-5      circlize_0.4.15    
 [73] hms_1.1.3           knitr_1.46          ps_1.7.6           
 [76] pillar_1.9.0        rjson_0.2.21        stats4_4.2.0       
 [79] codetools_0.2-19    glue_1.7.0          evaluate_0.23      
 [82] getPass_0.2-2       data.table_1.15.4   RcppParallel_5.1.7 
 [85] vctrs_0.6.5         png_0.1-8           httpuv_1.6.14      
 [88] foreach_1.5.2       gtable_0.3.5        purrr_1.0.2        
 [91] clue_0.3-65         ashr_2.2-63         cachem_1.0.8       
 [94] xfun_0.43           later_1.3.2         viridisLite_0.4.2  
 [97] truncnorm_1.0-9     tibble_3.2.1        iterators_1.0.14   
[100] IRanges_2.32.0      cluster_2.1.6