Last updated: 2020-08-28

Checks: 7 passed, 0 failed

Knit directory: jesslyn_ovca/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20200713)

The command set.seed(20200713) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: cd835db

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version cd835db. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    GO_results/.DS_Store
    Ignored:    GO_results/GO_PCA/.DS_Store
    Ignored:    GO_results/GO_Vln/.DS_Store
    Ignored:    GO_results/GO_Vln/PDX/.DS_Store
    Ignored:    GO_results/GO_plots/.DS_Store
    Ignored:    GO_results/GO_plots/PDX/.DS_Store
    Ignored:    GO_results/GO_plots/PDX/F/
    Ignored:    GO_results/GO_plots/SS2/.DS_Store
    Ignored:    GO_results/Tables/.DS_Store
    Ignored:    GO_results/Tables/PDX/.DS_Store
    Ignored:    GO_results/Tables/PDX/F/
    Ignored:    GO_results/Tables/SS2/.DS_Store
    Ignored:    analysis/.DS_Store
    Ignored:    code/.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    data/HTAPP/
    Ignored:    data/Izar_2020/
    Ignored:    data/gene_lists/.DS_Store
    Ignored:    data/gene_lists/GO_PDX/.DS_Store
    Ignored:    data/gene_lists/GO_SS2/.DS_Store
    Ignored:    data/gene_lists/extra/.DS_Store
    Ignored:    jesslyn_plots/
    Ignored:    mike_plots/
    Ignored:    old/.DS_Store
    Ignored:    old/edited/.DS_Store
    Ignored:    renv/.DS_Store
    Ignored:    renv/library/
    Ignored:    renv/python/
    Ignored:    renv/staging/
    Ignored:    vignettes/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/03.1_Izar2020_PDX_DEAnalysis.Rmd) and HTML (docs/03.1_Izar2020_PDX_DEAnalysis.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	58e936c	jgoh2	2020-08-27	SS2 and PDX GO
Rmd	56eea68	jgoh2	2020-08-27	SS2 and PDX GO
Rmd	4bca25d	jgoh2	2020-08-27	SS2 and PDX GO plots
Rmd	6759890	jgoh2	2020-08-25	GO work in progress
html	b7634b5	jgoh2	2020-08-23	Build site.
Rmd	e35baf5	jgoh2	2020-08-23	Cell cycle ggplot
html	2c1accf	jgoh2	2020-08-19	Build site.
Rmd	729ebf4	jgoh2	2020-08-19	workflowr::wflow_publish(files = files)
html	92568b8	jgoh2	2020-08-17	Build site.
Rmd	d60aeff	jgoh2	2020-08-17	More DE Analysis of other hallmarks
html	80345fe	jgoh2	2020-08-15	Build site.
Rmd	b1ab47b	jgoh2	2020-08-15	workflowr::wflow_publish(files = files)
html	789982d	jgoh2	2020-08-14	Build site.
Rmd	2880f59	jgoh2	2020-08-14	workflowr::wflow_publish(files = files)
html	e05c328	jgoh2	2020-08-12	Build site.
Rmd	b5bbe6d	jgoh2	2020-08-12	workflowr::wflow_publish(files = files)
html	84edf85	jgoh2	2020-08-12	Build site.
Rmd	d504c59	jgoh2	2020-08-12	workflowr::wflow_publish(files = files)
html	a3ddf54	jgoh2	2020-08-07	Build site.
html	d801a7a	jgoh2	2020-08-04	Build site.
html	26f64a3	jgoh2	2020-08-03	Build site.
Rmd	eaa900e	jgoh2	2020-08-03	workflowr::wflow_publish(files = files)
html	2dc1bee	jgoh2	2020-07-31	Build site.
Rmd	284aad4	jgoh2	2020-07-31	workflowr::wflow_publish(files = files)
Rmd	c8bb9fc	jgoh2	2020-07-30	PDX Exploratory + DE + Cell Cycle Analyses
html	c8bb9fc	jgoh2	2020-07-30	PDX Exploratory + DE + Cell Cycle Analyses
html	6be6c85	jgoh2	2020-07-28	Build site.
Rmd	cdd10f9	jgoh2	2020-07-28	SS2 DE Analysis
html	cdd10f9	jgoh2	2020-07-28	SS2 DE Analysis
html	35c7947	jgoh2	2020-07-27	SS2 Analysis Part 1 and 2
Rmd	8ca1e01	jgoh2	2020-07-27	PDX analysis edits
html	8ca1e01	jgoh2	2020-07-27	PDX analysis edits
html	f1acd7b	jgoh2	2020-07-24	Move PDX_choices.Rmd to old
Rmd	bc21d3a	jgoh2	2020-07-23	PDX DE Analysis
html	bc21d3a	jgoh2	2020-07-23	PDX DE Analysis
Rmd	e27cfd1	jgoh2	2020-07-22	Moved files out of the analysis folder + AddModulescore in read_Izar_2020.R
Rmd	979ae91	jgoh2	2020-07-20	Reorganize PDX code and add to the analysis folder

IZAR 2020 PDX (COHORT 3) DATA DIFFERENTIAL EXPRESSION ANALYSIS

OVERVIEW

The PDX data from Izar 2020 consists only of malignant cells. Three HGSOC PDX models, derived from patients with different treatment histories, were selected for implantation:
- DF20 (BRCA WT treatment-naive, clinically platinum sensitive)
- DF101 (BRCA1 mutant, 2 lines of prior therapy, clinically platinum resistant)
- DF68 (BRCA1 mutant, 6 lines of prior therapy, clinically platinum resistant)
After tumors were established, animals were divided into two groups per model:
- Vehicle (treated with DMSO)
- Carboplatin (treated with IP carboplatin)
  - Carboplatin-treated mice in the minimal residual disease (MRD) group were harvested for scRNA-seq
  - The remaining carboplatin-treated mice were harvested at endpoint (relapse)
In our 5-part analysis of the Izar 2020 PDX data, we are interested in identifying differentially expressed genes and hallmark genesets between treatment statuses within each model.
We split our PDX analysis into five parts:
1. Load Data and Create PDX Seurat Object
  1. The code to this part of our analysis is in the read_Izar_2020.R file in the code folder. During this part of our analysis we:
    1. Load in PDX count matrix and Create Seurat Object
    2. Assign Metadata identities including:
      - Mouse ID
      - Model ID
      - Treatment Status
    3. Score cells for cell cycle and hallmark genesets
      - Note: It does not matter whether we call AddModuleScore before or after subsetting and scaling each model because AddModuleScore uses the data slot.
    4. Save Seurat Object
2. Process Data and Exploratory Data Analysis
  1. The code to this part of our analysis can be found in the 03_Izar2020_PDX_Load file in the old/edited folder. During this part of our analysis we:
    1. Load in PDX Seurat Object from Part 1 and subset by model. Continue analysis separately for each model.
    2. Scale and FindVariableFeatures (prepares data for dimensionality reduction)
    3. Dimensionality Reduction (PCA + UMAP)
    4. Save Seurat Objects
3. Exploratory Data Analysis
  1. The code to this part of our analysis can be found in the 03.0_Izar2020_PDX_Exploratory Analysis file in the analysis folder. During this part of our analysis we:
    1. Load in PDX Seurat Object from Part 2. Analyze separately for each model.
    2. Compute summary metrics for PDX data such as:
      - Number of cells per model per treatment
      - Number of cells per treatment per cell cycle phase
    3. Visualize how cells separate based on metadata identities via UMAP
      - Intermodel heterogeneity: How do cells separate by model?
      - Intramodel heterogeneity:
        - How do cells separate by treatment status?
        - How do cells separate by cell cycle phase?
4. DE Analysis
  1. TYPE #1 DE ANALYSIS: Visualizing and Quantifying Differential Expression on Predefined GO Genesets
    1. Violin Plots and UMAP
    2. Gene Set Enrichment Analysis (GSEA)
  2. TYPE #2 DE ANALYSIS: Finding DE Genes from scratch
    1. Volcano Plots
5. CELL CYCLE ANALYSIS
  1. TYPE #1 CELL CYCLE ANALYSIS: Examine correlation between treatment condition and cell cycle phase
  2. Evaluate the idea that cell cycle might influence expression of signatures

This is the fourth part of our 5-part analysis of the Izar 2020 PDX (Cohort 3) data.

DIFFERENTIAL EXPRESSION ANALYSIS IN-DEPTH EXPLANATION

We are interested in answering a few questions for our DE Analysis:

DE ANALYSIS #1. Visualizing and Quantifying DE Hallmark Genesets

QUESTION Are modules (OXPHOS and UPR) differentially expressed across treatment conditions within each model?
- APPROACH #1 Violin Plots and UMAP
  - Visualize differences in hallmark score across treatment conditions with:
    - Violin Plot
    - UMAP by treatment vs. UMAP by hallmark score
  - Statistical test: is the difference in hallmark score across treatment conditions statistically significant?
- APPROACH #2 GSEA
  - GSEA enrichment plots for hallmarks of interest between condition.1 vs. condition.2
  - Rank genes and compute GSEA enrichment scores
  - Statistical test: how significant are the enrichment scores?
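
To make the "rank genes and compute GSEA enrichment scores" step in APPROACH #2 more concrete, here is a minimal sketch of a weighted running-sum enrichment score in base R. It is an illustration under stated assumptions, not the implementation used by any GSEA package: the helper name `enrichment_score`, the weighting exponent `p`, and the toy statistics are all hypothetical.

```r
# Hypothetical helper (illustration only): walk down the ranked gene list,
# stepping up at genes in the set and down at genes outside it.
enrichment_score <- function(stats, gene_set, p = 1) {
  stats  <- sort(stats, decreasing = TRUE)      # rank genes by the statistic
  in_set <- names(stats) %in% gene_set
  stopifnot(any(in_set), any(!in_set))          # need both hits and misses
  hit  <- abs(stats)^p * in_set                 # weighted step for set members
  miss <- as.numeric(!in_set)                   # uniform step for non-members
  running <- cumsum(hit / sum(hit) - miss / sum(miss))
  unname(running[which.max(abs(running))])      # largest deviation from zero
}

# Toy ranking statistic (e.g. a logFC-like value per gene); values are made up.
toy_stats <- c(g1 = 3, g2 = 2, g3 = 1, g4 = -1, g5 = -2, g6 = -3)

es_top    <- enrichment_score(toy_stats, c("g1", "g2"))  # set at the top of the ranking
es_bottom <- enrichment_score(toy_stats, c("g5", "g6"))  # set at the bottom of the ranking
```

A positive score means the geneset is concentrated among genes higher in condition.1, a negative score among genes higher in condition.2. Significance would still have to come from a permutation-based null, which this sketch does not attempt.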

DE ANALYSIS #2. Identifying Individual DE Genes

STEP 1 LOAD IN SEURAT OBJECTS AND GENESETS

# Load packages
source(here::here('packages.R'))

#Read in PDX RDS object
PDX_All = readRDS("data/Izar_2020/test/jesslyn_PDX_All_processed.RDS")
PDX_DF20 = readRDS("data/Izar_2020/test/jesslyn_PDX_DF20_processed.RDS")
PDX_DF101 = readRDS("data/Izar_2020/test/jesslyn_PDX_DF101_processed.RDS")
PDX_DF68 = readRDS("data/Izar_2020/test/jesslyn_PDX_DF68_processed.RDS")

#Read in hallmarks of interest
hallmark_names = read_lines("data/gene_lists/hallmarks.txt")
hallmark.list <- vector(mode = "list", length = length(hallmark_names))
names(hallmark.list) <- hallmark_names

for(hm in hallmark_names){
  if(file.exists(glue("data/gene_lists/hallmarks/{hm}_updated.txt"))){
    file <- read_lines(glue("data/gene_lists/hallmarks/{hm}_updated.txt"), skip = 1)
    hallmark.list[[hm]] <- file
  }
  else{
    file <- read_lines(glue("data/gene_lists/extra/{hm}.txt"), skip =2)
    hallmark.list[[hm]] <- file
  }
}

#center module and cell cycle scores and reassign to the metadata of each Seurat object
hm.names <- names(PDX_All@meta.data)[9:57]

for(i in hm.names){
    DF20.hm.centered <- scale(PDX_DF20[[i]], center = TRUE, scale = FALSE)
    PDX_DF20 <- AddMetaData(PDX_DF20, DF20.hm.centered, col.name = glue("{i}.centered"))
    
    DF101.hm.centered <- scale(PDX_DF101[[i]], center = TRUE, scale = FALSE)
    PDX_DF101 <- AddMetaData(PDX_DF101, DF101.hm.centered, col.name = glue("{i}.centered"))
    
    DF68.hm.centered <- scale(PDX_DF68[[i]], center = TRUE, scale = FALSE)
    PDX_DF68 <- AddMetaData(PDX_DF68, DF68.hm.centered, col.name = glue("{i}.centered"))
}
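
The centering step above subtracts each model's mean score from every cell in that model. A minimal base R sketch with made-up scores (the cell names, values, and group labels are hypothetical) shows that `scale(..., center = TRUE, scale = FALSE)` is equivalent to subtracting the mean, and that differences between treatment groups are unchanged by centering:

```r
# Toy module scores for four cells of one model (values are made up).
scores <- c(cell1 = 0.2, cell2 = 0.5, cell3 = -0.1, cell4 = 0.4)
group  <- c("vehicle", "vehicle", "MRD", "MRD")

# scale() with scale = FALSE only subtracts the column mean.
centered <- as.numeric(scale(scores, center = TRUE, scale = FALSE))
manual   <- as.numeric(scores - mean(scores))

# Centering shifts every cell by the same constant, so the difference
# between group means is the same before and after.
diff_raw      <- mean(scores[group == "MRD"])   - mean(scores[group == "vehicle"])
diff_centered <- mean(centered[group == "MRD"]) - mean(centered[group == "vehicle"])
```

After centering, a score of 0 corresponds to the model-wide average, which makes violin plots for different models easier to read side by side.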

STEP 2 DETERMINING OXPHOS GENESET

Before doing any DE Analysis on the genesets, we investigate which of the OXPHOS genesets we found is best for our data (two versions of the hallmark geneset, plus the GO and KEGG genesets). The OXPHOS genesets that we test are listed as follows:
- Unupdated Version of HALLMARK_OXIDATIVE_PHOSPHORYLATION (200 genes) https://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_OXIDATIVE_PHOSPHORYLATION
- Updated Version of HALLMARK_OXIDATIVE_PHOSPHORYLATION (200 genes) https://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_OXIDATIVE_PHOSPHORYLATION
- GO_OXIDATIVE_PHOSPHORYLATION (144 genes) https://www.gsea-msigdb.org/gsea/msigdb/cards/GO_OXIDATIVE_PHOSPHORYLATION
- KEGG_OXIDATIVE_PHOSPHORYLATION (131 genes) https://www.gsea-msigdb.org/gsea/msigdb/cards/KEGG_OXIDATIVE_PHOSPHORYLATION
We determine which geneset to use by asking these questions:
1. How many and which genes are not found in the PDX Seurat Object for each geneset?
2. What makes the genesets different from each other? Which genes in each geneset are the most DE (or have the highest logFC)? Are they the same?
  1. Volcano Plot: Call FindMarkers on each model, plot onto Volcano Plot, label genes by each geneset
3. Which geneset gives us the most statistically significant results?
  1. VlnPlot: Plot hallmark scores for each geneset group by treatment status for each model
  2. Label the VlnPlot with the number of genes found in PDX Seurat Object for each OXPHOS geneset

ANSWERING QUESTION #1: How many and which genes are not found in the PDX Seurat Object for each geneset?

hm.length.df <- data.frame(
  "UNUPDATED.OXPHOS" = length(hallmark.list[["UNUPDATED.OXPHOS"]]),
  "HALLMARK.OXPHOS" = length(hallmark.list[["HALLMARK_OXIDATIVE_PHOSPHORYLATION"]]), 
  "GO.OXPHOS" = length(hallmark.list[["GO.OXPHOS"]]), 
  "KEGG.OXPHOS" = length(hallmark.list[["KEGG.OXPHOS"]])
)

Found.df <- data.frame(
  "UNUPDATED.OXPHOS" = sum((hallmark.list[["UNUPDATED.OXPHOS"]] %in% rownames(PDX_All))),
  "HALLMARK.OXPHOS" = sum((hallmark.list[["HALLMARK_OXIDATIVE_PHOSPHORYLATION"]] %in% rownames(PDX_All))), 
  "GO.OXPHOS" = sum((hallmark.list[["GO.OXPHOS"]] %in% rownames(PDX_All))), 
  "KEGG.OXPHOS" = sum((hallmark.list[["KEGG.OXPHOS"]] %in% rownames(PDX_All)))
)

PFound.df <- data.frame(
  "UNUPDATED.OXPHOS" = (sum((hallmark.list[["UNUPDATED.OXPHOS"]] %in% rownames(PDX_All)))/length(hallmark.list[["UNUPDATED.OXPHOS"]]))*100,
  "HALLMARK.OXPHOS" = (sum((hallmark.list[["HALLMARK_OXIDATIVE_PHOSPHORYLATION"]] %in% rownames(PDX_All)))/length(hallmark.list[["HALLMARK_OXIDATIVE_PHOSPHORYLATION"]])) * 100, 
  "GO.OXPHOS" = (sum((hallmark.list[["GO.OXPHOS"]] %in% rownames(PDX_All)))/length(hallmark.list[["GO.OXPHOS"]]))*100, 
  "KEGG.OXPHOS" = (sum((hallmark.list[["KEGG.OXPHOS"]] %in% rownames(PDX_All)))/length(hallmark.list[["KEGG.OXPHOS"]]))*100
)

NA.df <- data.frame(
  "UNUPDATED.OXPHOS" = sum(!(hallmark.list[["UNUPDATED.OXPHOS"]] %in% rownames(PDX_All))),
  "HALLMARK.OXPHOS" = sum(!(hallmark.list[["HALLMARK_OXIDATIVE_PHOSPHORYLATION"]] %in% rownames(PDX_All))), 
  "GO.OXPHOS" = sum(!(hallmark.list[["GO.OXPHOS"]] %in% rownames(PDX_All))), 
  "KEGG.OXPHOS" = sum(!(hallmark.list[["KEGG.OXPHOS"]] %in% rownames(PDX_All)))
)

all.df <- rbind(hm.length.df, Found.df, PFound.df, NA.df)
rownames(all.df) <- c("NumGenes", "Found", "%Found", "Not Found")
all.df[,"GO.OXPHOS"] <- round(all.df[,"GO.OXPHOS"])
all.df[,"KEGG.OXPHOS"] <- round(all.df[,"KEGG.OXPHOS"])
all.df

          UNUPDATED.OXPHOS HALLMARK.OXPHOS GO.OXPHOS KEGG.OXPHOS
NumGenes               200             200       144         131
Found                  184             182       107         101
%Found                  92              91        74          77
Not Found               16              18        37          30
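
The same coverage table can be built more compactly by applying one function to every geneset. The sketch below uses toy genesets and a toy vector standing in for `rownames(PDX_All)`; the names `gene_sets` and `object_genes` are hypothetical and the numbers are not from our data.

```r
# Toy stand-ins: two genesets and the genes present in the object.
gene_sets    <- list(SET_A = c("A", "B", "C", "D"), SET_B = c("C", "D", "E"))
object_genes <- c("A", "B", "C", "X", "Y")

# One column per geneset, one row per summary metric.
coverage <- sapply(gene_sets, function(genes) {
  found <- sum(genes %in% object_genes)
  c(NumGenes = length(genes),
    Found    = found,
    PctFound = round(100 * found / length(genes)),
    NotFound = length(genes) - found)
})

# The genes that are missing from the object, per geneset.
missing_genes <- lapply(gene_sets, setdiff, y = object_genes)
```

Writing the metric once avoids the copy-and-paste errors that can creep in when each geneset has its own hand-written line.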

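# IDENTIFY GENES THAT ARE NOT FOUND -----------
# NA.genes.df is a named list: one character vector of missing gene symbols per geneset.
# The vector of geneset names is called geneset.names to avoid confusion with base::names().
geneset.names <- c("UNUPDATED.OXPHOS", "HALLMARK_OXIDATIVE_PHOSPHORYLATION", "GO.OXPHOS", "KEGG.OXPHOS")
NA.genes.df <- vector("list", length = length(geneset.names))
names(NA.genes.df) <- geneset.names
for(i in geneset.names){
  NA.genes.df[[i]] <- hallmark.list[[i]][!(hallmark.list[[i]] %in% rownames(PDX_All))]
}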

OBSERVATIONS
- The UNUPDATED.OXPHOS genelist actually has the most genes that are found in the PDX object.
- We tried removing the dash (-) from gene names starting with “MT-”, and also tried removing the “MT” prefix entirely, but those genes are still not found in the PDX object.
- Since the UNUPDATED.OXPHOS genelist works better than the updated version (HALLMARK_OXIDATIVE_PHOSPHORYLATION), we will continue our comparison of genesets using the UNUPDATED version.
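
The gene-name check described in the observations above can be sketched as follows. This is only an illustration with toy data: the gene symbols, the `object_genes` vector, and the two renaming rules are assumptions, not the exact code we ran.

```r
# Toy geneset with two mitochondrial genes, and a toy set of genes in the object.
geneset      <- c("MT-CO1", "MT-ND4", "NDUFA9")
object_genes <- c("NDUFA9", "COX5A", "SDHD")

# Candidate spellings of the same gene symbols.
variants <- list(
  original  = geneset,
  no_dash   = sub("^MT-", "MT", geneset),   # "MT-CO1" -> "MTCO1"
  no_prefix = sub("^MT-", "",   geneset)    # "MT-CO1" -> "CO1"
)

# Number of geneset genes matched in the object under each spelling.
matches <- sapply(variants, function(v) sum(v %in% object_genes))
```

If none of the spellings increases the match count, as in this toy case, the genes are most likely absent from the count matrix and not just named differently.
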
We now wonder:
- Which OXPHOS genes are the most DE if we call FindMarkers?
- Which geneset(s) do the most DE OXPHOS genes belong to?

ANSWERING QUESTION #2: Which genes in each geneset are the most DE? Are they the same?
- Used the Wilcoxon rank sum test for FindMarkers

PDXs <- list(PDX_DF20, PDX_DF101, PDX_DF68)  # explicit list with one Seurat object per model
PDX.names <- c("DF20", "DF101", "DF68")
oxphos.hm <- c("UNUPDATED.OXPHOS", "GO.OXPHOS", "KEGG.OXPHOS")
PDX.hm.plots <- vector("list", length = 3)
names(PDX.hm.plots) <- PDX.names

markers <- vector("list", length = 3)
names(markers) <- PDX.names

markers[["DF20"]] <- FindMarkers(PDX_DF20, group.by = "treatment.status", ident.1 = "MRD", ident.2 = "vehicle", test.use = "wilcox", logfc.threshold = 0) 
markers[["DF101"]]  <- FindMarkers(PDX_DF101, group.by = "treatment.status", ident.1 = "MRD", ident.2 = "vehicle", test.use = "wilcox", logfc.threshold = 0)
markers[["DF68"]] <- FindMarkers(PDX_DF68, group.by = "treatment.status", ident.1 = "MRD", ident.2 = "vehicle", test.use = "wilcox", logfc.threshold = 0)

for(i in 1:length(PDXs)){
    PDX <- PDX.names[[i]]
    DF.hm.plot <- vector("list", length = 3)
    names(DF.hm.plot) <- oxphos.hm
    marker <- markers[[PDX]]
    
  for(oxphos in oxphos.hm){
    
    avgLFC <- rownames(marker[which(abs(marker$avg_logFC) > 0.5),])
    keyvals <- ifelse(!(rownames(marker) %in% hallmark.list[[oxphos]]), 'black', 
                      ifelse(abs(marker$avg_logFC) > 0.5, 'red', 'grey'))
    names(keyvals)[keyvals == 'red'] <- 'OXPHOS & logFC'
    names(keyvals)[keyvals == 'grey'] <- 'OXPHOS x logFC'
    names(keyvals)[keyvals == 'black'] <- 'Not OXPHOS'
    found = length(avgLFC[which(avgLFC %in% hallmark.list[[oxphos]])])
    
    p <- EnhancedVolcano(marker, 
           lab = rownames(marker),
           selectLab = avgLFC[which(avgLFC %in% hallmark.list[[oxphos]])],
           labCol = "red",
           x='avg_logFC', y='p_val_adj', pCutoff = 0.05, 
           FCcutoff = 0.5, 
           colCustom = keyvals,
           pointSize = c(ifelse(rownames(marker) %in% hallmark.list[[oxphos]], 2.5,1)), 
           drawConnectors = TRUE,
           boxedLabels = TRUE,
           labvjust = 1,
           title= glue("{PDX.names[[i]]} MRD vs. vehicle"), subtitle= "LogFC cutoff: 0.5, p cutoff: 0.05",
           caption = glue("{oxphos}: {found} high LFC oxphos genes found")
           )
    DF.hm.plot[[oxphos]] <- p
  }
  
  PDX.hm.plots[[PDX]] <- DF.hm.plot[["UNUPDATED.OXPHOS"]] + DF.hm.plot[["GO.OXPHOS"]] + DF.hm.plot[["KEGG.OXPHOS"]]
}
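
The colour key used in the volcano plots assigns every gene to one of three categories. A minimal sketch with a toy FindMarkers-like table (gene names and logFC values are made up) shows the nested `ifelse` logic on its own:

```r
# Toy marker table: row names are genes, one logFC column.
marker <- data.frame(
  avg_logFC = c(0.9, -0.7, 0.1, 1.2, -0.2),
  row.names = c("NDUFA9", "SDHD", "COX5A", "ACTB", "GAPDH")
)
oxphos_genes <- c("NDUFA9", "SDHD", "COX5A")   # toy geneset

in_set   <- rownames(marker) %in% oxphos_genes
high_lfc <- abs(marker$avg_logFC) > 0.5        # same cutoff as FCcutoff above

# Genes outside the geneset are labelled first; geneset genes are then
# split by whether they pass the logFC cutoff.
category <- ifelse(!in_set, "Not OXPHOS",
                   ifelse(high_lfc, "OXPHOS & logFC", "OXPHOS x logFC"))
names(category) <- rownames(marker)

counts <- table(category)
```

Note that the cutoff is applied to the absolute logFC, so both up- and down-regulated genes count as "high logFC".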

PDX.hm.plots[["DF20"]]

PDX.hm.plots[["DF101"]]

PDX.hm.plots[["DF68"]]

# number of DE oxphos genes found in each geneset within each model -----------------
DF20.de.df <- data.frame(
  "UNUPDATED.OXPHOS" = sum(rownames(markers[["DF20"]][which(abs(markers[["DF20"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["UNUPDATED.OXPHOS"]]),
  "GO.OXPHOS" = sum(rownames(markers[["DF20"]][which(abs(markers[["DF20"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["GO.OXPHOS"]]),
  "KEGG.OXPHOS" = sum(rownames(markers[["DF20"]][which(abs(markers[["DF20"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["KEGG.OXPHOS"]])
)

DF101.de.df <- data.frame(
  "UNUPDATED.OXPHOS" = sum(rownames(markers[["DF101"]][which(abs(markers[["DF101"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["UNUPDATED.OXPHOS"]]),
  "GO.OXPHOS" = sum(rownames(markers[["DF101"]][which(abs(markers[["DF101"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["GO.OXPHOS"]]),
  "KEGG.OXPHOS" = sum(rownames(markers[["DF101"]][which(abs(markers[["DF101"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["KEGG.OXPHOS"]])
)

DF68.de.df <- data.frame(
  "UNUPDATED.OXPHOS" = sum(rownames(markers[["DF68"]][which(abs(markers[["DF68"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["UNUPDATED.OXPHOS"]]),
  "GO.OXPHOS" = sum(rownames(markers[["DF68"]][which(abs(markers[["DF68"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["GO.OXPHOS"]]),
  "KEGG.OXPHOS" = sum(rownames(markers[["DF68"]][which(abs(markers[["DF68"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["KEGG.OXPHOS"]])
)

all.de.df <- rbind(DF20.de.df, DF101.de.df, DF68.de.df)
rownames(all.de.df) <- c("DF20.MRDvVehicle", "DF101.MRDvVehicle", "DF68.MRDvVehicle")
all.de.df

                  UNUPDATED.OXPHOS GO.OXPHOS KEGG.OXPHOS
DF20.MRDvVehicle                24        15           9
DF101.MRDvVehicle               65        50          38
DF68.MRDvVehicle                47        25          20

percent.logFC <- data.frame(
  "UNUPDATED.OXPHOS" = round(all.de.df[, "UNUPDATED.OXPHOS"]/all.df["Found", "UNUPDATED.OXPHOS"]*100, 2), 
  "GO.OXPHOS" = round(all.de.df[, "GO.OXPHOS"]/all.df["Found", "GO.OXPHOS"]*100, 2), 
  "KEGG.OXPHOS" = round(all.de.df[, "KEGG.OXPHOS"]/all.df["Found", "KEGG.OXPHOS"]*100, 2)
)
rownames(percent.logFC) <- c("DF20 %Found highLFC", "DF101 %Found highLFC", "DF68 %Found highLFC")
rbind(all.df[,names(all.df) != "HALLMARK.OXPHOS"], percent.logFC)

                     UNUPDATED.OXPHOS GO.OXPHOS KEGG.OXPHOS
NumGenes                       200.00    144.00      131.00
Found                          184.00    107.00      101.00
%Found                          92.00     74.00       77.00
Not Found                       16.00     37.00       30.00
DF20 %Found highLFC             13.04     14.02        8.91
DF101 %Found highLFC            35.33     46.73       37.62
DF68 %Found highLFC             25.54     23.36       19.80
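
As a quick check on the "%Found highLFC" rows in the table above, the sketch below re-enters those three rows by hand and computes the average per geneset and the best geneset per model. The matrix name `pct_high_lfc` is hypothetical; the numbers are copied from the table.

```r
# "%Found highLFC" rows copied from the table above.
pct_high_lfc <- matrix(
  c(13.04, 14.02,  8.91,
    35.33, 46.73, 37.62,
    25.54, 23.36, 19.80),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("DF20", "DF101", "DF68"),
                  c("UNUPDATED.OXPHOS", "GO.OXPHOS", "KEGG.OXPHOS"))
)

# Average percentage across the three models, per geneset.
avg_pct <- round(colMeans(pct_high_lfc), 2)

# Geneset with the highest percentage within each model.
best_per_model <- apply(pct_high_lfc, 1, function(r) names(which.max(r)))
```

GO.OXPHOS has the highest average and is highest in DF20 and DF101, while UNUPDATED.OXPHOS is highest in DF68.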

OBSERVATIONS
- The UNUPDATED.OXPHOS geneset consistently has the most OXPHOS genes with high logFC (|logFC| > 0.5, regardless of padj) relative to the other two genesets for each MRD vs. vehicle comparison. However, as a percentage of the OXPHOS genes found, GO.OXPHOS is highest in DF20 and DF101 (and on average across models), while UNUPDATED.OXPHOS is highest in DF68.
- Although a few of the high logFC OXPHOS genes appear in more than one geneset, it is hard to see from the volcano plots which ones overlap, and whether the overlapping genes are among those with the highest logFC (regardless of padj). We therefore create a table for each model, extract the top 5 high logFC genes from each geneset, and compare whether they are the same.

DF20.top5 <- markers[["DF20"]] %>% arrange(-abs(avg_logFC))
DF101.top5 <- markers[["DF101"]] %>% arrange(-abs(avg_logFC))
DF68.top5 <- markers[["DF68"]] %>% arrange(-abs(avg_logFC))

DF20.gs.df <- data.frame(
  "UNUPDATED.OXPHOS" = head(rownames(DF20.top5)[which(rownames(DF20.top5) %in% hallmark.list[["UNUPDATED.OXPHOS"]])], 5), 
  "UNUPDATED.OXPHOS" = select(DF20.top5[head(rownames(DF20.top5)[which(rownames(DF20.top5) %in% hallmark.list[["UNUPDATED.OXPHOS"]])], 5),], avg_logFC),
  "GO.OXPHOS" = head(rownames(DF20.top5)[which(rownames(DF20.top5) %in% hallmark.list[["GO.OXPHOS"]])], 5),
  "GO.OXPHOS" = select(DF20.top5[head(rownames(DF20.top5)[which(rownames(DF20.top5) %in% hallmark.list[["GO.OXPHOS"]])], 5),], avg_logFC),
  "KEGG.OXPHOS" = head(rownames(DF20.top5)[which(rownames(DF20.top5) %in% hallmark.list[["KEGG.OXPHOS"]])], 5), 
  "KEGG.OXPHOS" = select(DF20.top5[head(rownames(DF20.top5)[which(rownames(DF20.top5) %in% hallmark.list[["KEGG.OXPHOS"]])], 5),], avg_logFC)
)
rownames(DF20.gs.df) <- seq(from = 1, length = nrow(DF20.gs.df))

DF101.gs.df <- data.frame(
  "UNUPDATED.OXPHOS" = head(rownames(DF101.top5)[which(rownames(DF101.top5) %in% hallmark.list[["UNUPDATED.OXPHOS"]])], 5), 
  "UNUPDATED.OXPHOS" = select(DF101.top5[head(rownames(DF101.top5)[which(rownames(DF101.top5) %in% hallmark.list[["UNUPDATED.OXPHOS"]])], 5),], avg_logFC),
  "GO.OXPHOS" = head(rownames(DF101.top5)[which(rownames(DF101.top5) %in% hallmark.list[["GO.OXPHOS"]])], 5),
  "GO.OXPHOS" = select(DF101.top5[head(rownames(DF101.top5)[which(rownames(DF101.top5) %in% hallmark.list[["GO.OXPHOS"]])], 5),], avg_logFC),
  "KEGG.OXPHOS" = head(rownames(DF101.top5)[which(rownames(DF101.top5) %in% hallmark.list[["KEGG.OXPHOS"]])], 5), 
  "KEGG.OXPHOS" = select(DF101.top5[head(rownames(DF101.top5)[which(rownames(DF101.top5) %in% hallmark.list[["KEGG.OXPHOS"]])], 5),], avg_logFC)
)
rownames(DF101.gs.df) <- seq(from = 1, length = nrow(DF101.gs.df))


DF68.gs.df <- data.frame(
  "UNUPDATED.OXPHOS" = head(rownames(DF68.top5)[which(rownames(DF68.top5) %in% hallmark.list[["UNUPDATED.OXPHOS"]])], 5), 
  "UNUPDATED.OXPHOS" = select(DF68.top5[head(rownames(DF68.top5)[which(rownames(DF68.top5) %in% hallmark.list[["UNUPDATED.OXPHOS"]])], 5),], avg_logFC),
  "GO.OXPHOS" = head(rownames(DF68.top5)[which(rownames(DF68.top5) %in% hallmark.list[["GO.OXPHOS"]])], 5),
  "GO.OXPHOS" = select(DF68.top5[head(rownames(DF68.top5)[which(rownames(DF68.top5) %in% hallmark.list[["GO.OXPHOS"]])], 5),], avg_logFC),
  "KEGG.OXPHOS" = head(rownames(DF68.top5)[which(rownames(DF68.top5) %in% hallmark.list[["KEGG.OXPHOS"]])], 5), 
  "KEGG.OXPHOS" = select(DF68.top5[head(rownames(DF68.top5)[which(rownames(DF68.top5) %in% hallmark.list[["KEGG.OXPHOS"]])], 5),], avg_logFC)
)
rownames(DF68.gs.df) <- seq(from = 1, length = nrow(DF68.gs.df))

DF20.gs.df

  UNUPDATED.OXPHOS  avg_logFC GO.OXPHOS avg_logFC.1 KEGG.OXPHOS avg_logFC.2
1           TIMM10  1.1454854     NUPR1   1.9607774    NDUFA4L2  -1.0646767
2             IDH1  1.0249810     SURF1  -0.9575538      NDUFB3   0.7948077
3            SURF1 -0.9575538      TEFM   0.9397055       COX10  -0.6567115
4           ACADVL -0.9479864     CCNB1   0.9113314       COX15  -0.6314722
5           MRPL35 -0.8855315    NDUFB3   0.7948077       COX5A   0.6066344

DF101.gs.df

  UNUPDATED.OXPHOS avg_logFC GO.OXPHOS avg_logFC.1 KEGG.OXPHOS avg_logFC.2
1          ATP6V0C  2.803572    NDUFA9  -1.0849423     ATP6V0C   2.8035722
2              BAX  1.723575      SDHD  -1.0338880      NDUFA9  -1.0849423
3             ECI1 -1.563555   NDUFA12  -0.9821144        SDHD  -1.0338880
4             OPA1  1.259500    NDUFB1  -0.9460268      NDUFB1  -0.9460268
5          ALDH6A1  1.184952     COX7B  -0.9355418       COX7B  -0.9355418

DF68.gs.df

  UNUPDATED.OXPHOS avg_logFC GO.OXPHOS avg_logFC.1 KEGG.OXPHOS avg_logFC.2
1             DLST -1.687756     SURF1   1.5060351    ATP6V1G2   -1.518730
2            SURF1  1.506035     PINK1   1.1743957     ATP6V0C   -1.376521
3          ATP6V0C -1.376521   NDUFAF1   1.0715203    ATP6V1B2   -1.226365
4            TIMM9 -1.204707    NDUFB3   1.0008525      TCIRG1    1.057882
5           TCIRG1  1.057882     COX15   0.8470823      NDUFB3    1.000852
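
The overlap between the top 5 lists can also be counted directly instead of compared by eye. The sketch below re-enters the DF101 gene names from the table above by hand (the list name `top5` is hypothetical) and counts the genes shared by each pair of genesets:

```r
# Top 5 high logFC OXPHOS genes for DF101, copied from the table above.
top5 <- list(
  UNUPDATED.OXPHOS = c("ATP6V0C", "BAX", "ECI1", "OPA1", "ALDH6A1"),
  GO.OXPHOS        = c("NDUFA9", "SDHD", "NDUFA12", "NDUFB1", "COX7B"),
  KEGG.OXPHOS      = c("ATP6V0C", "NDUFA9", "SDHD", "NDUFB1", "COX7B")
)

# Number of shared genes for every pair of genesets.
pairs   <- combn(names(top5), 2, simplify = FALSE)
overlap <- sapply(pairs, function(pr) length(intersect(top5[[pr[1]]], top5[[pr[2]]])))
names(overlap) <- sapply(pairs, paste, collapse = " & ")

# Genes present in all three top 5 lists.
shared_by_all <- Reduce(intersect, top5)
```

For DF101, GO.OXPHOS and KEGG.OXPHOS share four of their top 5 genes, UNUPDATED.OXPHOS and KEGG.OXPHOS share one (ATP6V0C), and no gene appears in all three lists.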

OBSERVATIONS
- Note: All of these top 5 high logFC OXPHOS genes (|logFC| > 0.5) have a padj of 1.00
- There is little overlap of high logFC OXPHOS genes across all three genesets in any of the three models
- There seems to be a trend in which the top high logFC OXPHOS genes in the UNUPDATED.OXPHOS geneset have larger absolute logFC values than those in the other two genesets (NUPR1 in the DF20 GO.OXPHOS list is an exception).
Our answers to QUESTION #1 and #2 seem to suggest that the UNUPDATED.OXPHOS hallmark geneset is the best out of all the genesets tested because:
1. It has the fewest genes that are not found within the PDX Seurat Object, and the highest percentage of genes that are found.
2. It has the largest number of high logFC OXPHOS genes (|logFC| > 0.5, regardless of padj) relative to the other two genesets for each model comparison between MRD and vehicle.
3. The high logFC OXPHOS genes present within this geneset have relatively larger absolute logFC values (they are more DE, regardless of padj) in comparison to the high logFC OXPHOS genes present in the other two genesets.
We answer our third question to check whether the UNUPDATED.OXPHOS geneset is indeed the most appropriate one for our data relative to the other two genesets.

ANSWERING QUESTION #3: Which geneset gives us the most statistically significant results?

oxphos.centered <- c("UNUPDATED.OXPHOS37.centered", "GO.OXPHOS35.centered", "KEGG.OXPHOS36.centered")
Oxphos.Vln.plots <- vector("list", length(PDXs))
names(Oxphos.Vln.plots) <- PDX.names

for (i in 1:length(PDXs)){
  obj <- PDXs[[i]]
  name <- PDX.names[[i]]
  numCells <- nrow(PDXs[[i]]@meta.data)
  
  my_comparisons <- list(
    c("MRD", "vehicle"),
    c("MRD", "relapse"), 
    c("vehicle", "relapse")
  )
  
  # Chain the branches with else-if so that exactly one y.max applies per model
  if(name == "DF68"){
    y.max <- 1.5
  } else if(name == "DF101"){
    y.max <- 1.8
  } else{
    y.max <- 2.0
  }
  p <- VlnPlot(obj, features = oxphos.centered, group.by = "treatment.status", pt.size = 0, cols = c("#00AFBB", "#E7B800", "#FC4E07"), combine = F, y.max = y.max)
  
  unupdated.found <- sum(hallmark.list[["UNUPDATED.OXPHOS"]] %in% rownames(obj))
  unupdated.length <- length(hallmark.list[["UNUPDATED.OXPHOS"]])
  unupdated.pFound <- round((unupdated.found / unupdated.length)*100, 2)
  
  p[[1]] <- p[[1]] + labs(title = glue("{name} UNUPDATED_OXPHOS scores across treatment"), x = name, subtitle = glue("{numCells} Malignant Cells, {unupdated.found} out of {unupdated.length} OXPHOS genes found ({unupdated.pFound}%)")) + 
    theme(plot.title = element_text(size = 12), plot.caption = element_text(size = 10)) + 
    geom_boxplot(width = 0.15, position = position_dodge(0.9), alpha = 0.3, show.legend = F) + 
    geom_text(label = paste(sum(obj$treatment.status == "vehicle"), "cells"), x = "vehicle", y = min(obj$UNUPDATED.OXPHOS37.centered) -0.03) + 
    geom_text(label = paste(sum(obj$treatment.status == "MRD"), "cells"), x = "MRD", y = min(obj$UNUPDATED.OXPHOS37.centered) - 0.03) + 
    geom_text(label = paste(sum(obj$treatment.status == "relapse"), "cells"), x = "relapse", y = min(obj$UNUPDATED.OXPHOS37.centered) - 0.03) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.06) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.06, bracket.size = 0, vjust = 1.8)
  
  go.found <- sum(hallmark.list[["GO.OXPHOS"]] %in% rownames(obj))
  go.length <- length(hallmark.list[["GO.OXPHOS"]])
  go.pFound <- round((go.found / go.length)*100, 2)
  p[[2]] <- p[[2]] + labs(title = glue("{name} GO_OXPHOS scores across treatment"), x = name, subtitle = glue("{numCells} Malignant Cells, {go.found} out of {go.length} OXPHOS genes found ({go.pFound}%)")) +
    theme(plot.title = element_text(size = 12), plot.caption = element_text(size = 10)) + 
    geom_boxplot(width = 0.15, position = position_dodge(0.9), alpha = 0.3, show.legend = F) + 
    geom_text(label = paste(sum(obj$treatment.status == "vehicle"), "cells"), x = "vehicle", y = min(obj$GO.OXPHOS35.centered) -0.03) + 
    geom_text(label = paste(sum(obj$treatment.status == "MRD"), "cells"), x = "MRD", y = min(obj$GO.OXPHOS35.centered) - 0.03) + 
    geom_text(label = paste(sum(obj$treatment.status == "relapse"), "cells"), x = "relapse", y = min(obj$GO.OXPHOS35.centered) - 0.03) +
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.06) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.06, bracket.size = 0, vjust = 1.8)
  
  kegg.found <- sum(hallmark.list[["KEGG.OXPHOS"]] %in% rownames(obj))
  kegg.length <- length(hallmark.list[["KEGG.OXPHOS"]])
  kegg.pFound <- round((kegg.found / kegg.length)*100, 2)
  p[[3]] <- p[[3]] + labs(title = glue("{name} KEGG_OXPHOS scores across treatment"), x = name, subtitle = glue("{numCells} Malignant Cells, {kegg.found} out of {kegg.length} OXPHOS genes found ({kegg.pFound}%)")) +
    theme(plot.title = element_text(size = 12), plot.caption = element_text(size = 10)) + 
    geom_boxplot(width = 0.15, position = position_dodge(0.9), alpha = 0.3, show.legend = F) + 
    geom_text(label = paste(sum(obj$treatment.status == "vehicle"), "cells"), x = "vehicle", y = min(obj$KEGG.OXPHOS36.centered) -0.03) + 
    geom_text(label = paste(sum(obj$treatment.status == "MRD"), "cells"), x = "MRD", y = min(obj$KEGG.OXPHOS36.centered) - 0.03) + 
    geom_text(label = paste(sum(obj$treatment.status == "relapse"), "cells"), x = "relapse", y = min(obj$KEGG.OXPHOS36.centered) - 0.03) +
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.06) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.06, bracket.size = 0, vjust = 1.8)
  
  p <- p[[1]] + p[[2]] + p[[3]] + plot_layout(guides= 'collect')
  
  Oxphos.Vln.plots[[name]] <- p
}

Oxphos.Vln.plots[["DF20"]]

Version	Author	Date
26f64a3	jgoh2	2020-08-03
2dc1bee	jgoh2	2020-07-31
c8bb9fc	jgoh2	2020-07-30
8ca1e01	jgoh2	2020-07-27

Oxphos.Vln.plots[["DF101"]]

Version	Author	Date
26f64a3	jgoh2	2020-08-03
2dc1bee	jgoh2	2020-07-31
c8bb9fc	jgoh2	2020-07-30
8ca1e01	jgoh2	2020-07-27

Oxphos.Vln.plots[["DF68"]]

Version	Author	Date
26f64a3	jgoh2	2020-08-03
2dc1bee	jgoh2	2020-07-31
c8bb9fc	jgoh2	2020-07-30
8ca1e01	jgoh2	2020-07-27

OBSERVATIONS
- The overall trend between treatment groups within each model is the same across all three OXPHOS genesets. However, the difference in scores appears most pronounced with the KEGG geneset.
- More comparisons are statistically significant with the GO or KEGG genesets than with the HALLMARK geneset.
  - All comparisons within DF101 are significant
    - Vehicle > MRD > Relapse (does not support our hypothesis)
  - DF68 MRD > Relapse is significant (supports our hypothesis)

SUMMARY STATISTICS

#DF20 ---------------------------------
DF20.vehicle <- subset(PDX_DF20, subset = (treatment.status == "vehicle"))
DF20.MRD <- subset(PDX_DF20, subset = (treatment.status == "MRD"))
DF20.relapse <- subset(PDX_DF20, subset = (treatment.status == "relapse"))

DF20.hm.oxphos.df <- data.frame(
  "MRDvsVehicle" = wilcox.test(DF20.MRD$UNUPDATED.OXPHOS37.centered, DF20.vehicle$UNUPDATED.OXPHOS37.centered)$p.value, 
  "MRDvsRelapse" = wilcox.test(DF20.MRD$UNUPDATED.OXPHOS37.centered, DF20.relapse$UNUPDATED.OXPHOS37.centered)$p.value, 
  "VehiclevsRelapse" = wilcox.test(DF20.vehicle$UNUPDATED.OXPHOS37.centered, DF20.relapse$UNUPDATED.OXPHOS37.centered)$p.value
)

DF20.go.oxphos.df <- 
  data.frame(
  "MRDvsVehicle" = wilcox.test(DF20.MRD$GO.OXPHOS35.centered, DF20.vehicle$GO.OXPHOS35.centered)$p.value, 
  "MRDvsRelapse" = wilcox.test(DF20.MRD$GO.OXPHOS35.centered, DF20.relapse$GO.OXPHOS35.centered)$p.value, 
  "VehiclevsRelapse" = wilcox.test(DF20.vehicle$GO.OXPHOS35.centered, DF20.relapse$GO.OXPHOS35.centered)$p.value
)

DF20.kegg.oxphos.df <- 
  data.frame(
  "MRDvsVehicle" = wilcox.test(DF20.MRD$KEGG.OXPHOS36.centered, DF20.vehicle$KEGG.OXPHOS36.centered)$p.value,
  "MRDvsRelapse" = wilcox.test(DF20.MRD$KEGG.OXPHOS36.centered, DF20.relapse$KEGG.OXPHOS36.centered)$p.value,
  "VehiclevsRelapse" = wilcox.test(DF20.vehicle$KEGG.OXPHOS36.centered, DF20.relapse$KEGG.OXPHOS36.centered)$p.value
)

#DF101 ---------------------------------
DF101.vehicle <- subset(PDX_DF101, subset = (treatment.status == "vehicle"))
DF101.MRD <- subset(PDX_DF101, subset = (treatment.status == "MRD"))
DF101.relapse <- subset(PDX_DF101, subset = (treatment.status == "relapse"))

DF101.hm.oxphos.df <- data.frame(
  "MRDvsVehicle" = wilcox.test(DF101.MRD$UNUPDATED.OXPHOS37.centered, DF101.vehicle$UNUPDATED.OXPHOS37.centered)$p.value, 
  "MRDvsRelapse" = wilcox.test(DF101.MRD$UNUPDATED.OXPHOS37.centered, DF101.relapse$UNUPDATED.OXPHOS37.centered)$p.value, 
  "VehiclevsRelapse" = wilcox.test(DF101.vehicle$UNUPDATED.OXPHOS37.centered, DF101.relapse$UNUPDATED.OXPHOS37.centered)$p.value
)

DF101.go.oxphos.df <- 
  data.frame(
  "MRDvsVehicle" = wilcox.test(DF101.MRD$GO.OXPHOS35.centered, DF101.vehicle$GO.OXPHOS35.centered)$p.value, 
  "MRDvsRelapse" = wilcox.test(DF101.MRD$GO.OXPHOS35.centered, DF101.relapse$GO.OXPHOS35.centered)$p.value, 
  "VehiclevsRelapse" = wilcox.test(DF101.vehicle$GO.OXPHOS35.centered, DF101.relapse$GO.OXPHOS35.centered)$p.value
)

DF101.kegg.oxphos.df <- 
  data.frame(
  "MRDvsVehicle" = wilcox.test(DF101.MRD$KEGG.OXPHOS36.centered, DF101.vehicle$KEGG.OXPHOS36.centered)$p.value,
  "MRDvsRelapse" = wilcox.test(DF101.MRD$KEGG.OXPHOS36.centered, DF101.relapse$KEGG.OXPHOS36.centered)$p.value,
  "VehiclevsRelapse" = wilcox.test(DF101.vehicle$KEGG.OXPHOS36.centered, DF101.relapse$KEGG.OXPHOS36.centered)$p.value
)

#DF68 ---------------------------------
DF68.vehicle <- subset(PDX_DF68, subset = (treatment.status == "vehicle"))
DF68.MRD <- subset(PDX_DF68, subset = (treatment.status == "MRD"))
DF68.relapse <- subset(PDX_DF68, subset = (treatment.status == "relapse"))

DF68.hm.oxphos.df <- data.frame(
  "MRDvsVehicle" = wilcox.test(DF68.MRD$UNUPDATED.OXPHOS37.centered, DF68.vehicle$UNUPDATED.OXPHOS37.centered)$p.value, 
  "MRDvsRelapse" = wilcox.test(DF68.MRD$UNUPDATED.OXPHOS37.centered, DF68.relapse$UNUPDATED.OXPHOS37.centered)$p.value, 
  "VehiclevsRelapse" = wilcox.test(DF68.vehicle$UNUPDATED.OXPHOS37.centered, DF68.relapse$UNUPDATED.OXPHOS37.centered)$p.value
)

DF68.go.oxphos.df <- 
  data.frame(
  "MRDvsVehicle" = wilcox.test(DF68.MRD$GO.OXPHOS35.centered, DF68.vehicle$GO.OXPHOS35.centered)$p.value, 
  "MRDvsRelapse" = wilcox.test(DF68.MRD$GO.OXPHOS35.centered, DF68.relapse$GO.OXPHOS35.centered)$p.value, 
  "VehiclevsRelapse" = wilcox.test(DF68.vehicle$GO.OXPHOS35.centered, DF68.relapse$GO.OXPHOS35.centered)$p.value
)

DF68.kegg.oxphos.df <- 
  data.frame(
  "MRDvsVehicle" = wilcox.test(DF68.MRD$KEGG.OXPHOS36.centered, DF68.vehicle$KEGG.OXPHOS36.centered)$p.value,
  "MRDvsRelapse" = wilcox.test(DF68.MRD$KEGG.OXPHOS36.centered, DF68.relapse$KEGG.OXPHOS36.centered)$p.value,
  "VehiclevsRelapse" = wilcox.test(DF68.vehicle$KEGG.OXPHOS36.centered, DF68.relapse$KEGG.OXPHOS36.centered)$p.value
)

#combine ------------------------------
hm.oxphos.DF <- rbind(DF20.hm.oxphos.df, DF101.hm.oxphos.df, DF68.hm.oxphos.df)
rownames(hm.oxphos.DF) <- c("HM.OXPHOS.DF20", "HM.OXPHOS.DF101", "HM.OXPHOS.DF68")

go.oxphos.DF <- rbind(DF20.go.oxphos.df, DF101.go.oxphos.df, DF68.go.oxphos.df)
rownames(go.oxphos.DF) <- c("GO.OXPHOS.DF20", "GO.OXPHOS.DF101", "GO.OXPHOS.DF68")

kegg.oxphos.DF <- rbind(DF20.kegg.oxphos.df, DF101.kegg.oxphos.df, DF68.kegg.oxphos.df)
rownames(kegg.oxphos.DF) <- c("KEGG.OXPHOS.DF20", "KEGG.OXPHOS.DF101", "KEGG.OXPHOS.DF68")

all.oxphos.DF <- rbind(hm.oxphos.DF, go.oxphos.DF, kegg.oxphos.DF)
DT::datatable(all.oxphos.DF) %>% 
  DT::formatSignif(names(all.oxphos.DF), digits = 2) %>% 
  DT::formatStyle(names(all.oxphos.DF), color = DT::styleInterval(0.05, c('red', 'black')))
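
The per-model blocks above repeat the same three pairwise Wilcoxon rank-sum tests for every score column. As a minimal sketch of how that pattern could be factored out, the helper below (`pairwise_wilcox_p`, a hypothetical name that is not part of the original analysis) takes a score vector and the matching treatment labels and returns a one-row data frame of p-values. The data are simulated stand-ins, not PDX results.

```r
# Hypothetical helper (not in the original analysis): run the three pairwise
# Wilcoxon rank-sum tests for one score column within one model.
pairwise_wilcox_p <- function(scores, status) {
  # Each comparison is c(group.1, group.2), mirroring my_comparisons
  comparisons <- list(
    MRDvsVehicle     = c("MRD", "vehicle"),
    MRDvsRelapse     = c("MRD", "relapse"),
    VehiclevsRelapse = c("vehicle", "relapse")
  )
  p.values <- vapply(comparisons, function(groups) {
    wilcox.test(scores[status == groups[1]], scores[status == groups[2]])$p.value
  }, numeric(1))
  # One row, one column per comparison, ready for rbind() across models
  as.data.frame(as.list(p.values))
}

# Simulated stand-in for one model's metadata (real input would be obj@meta.data)
set.seed(1)
meta <- data.frame(
  treatment.status = rep(c("vehicle", "MRD", "relapse"), each = 30),
  score = c(rnorm(30, 0), rnorm(30, 0.2), rnorm(30, -1))
)
p.df <- pairwise_wilcox_p(meta$score, meta$treatment.status)
print(p.df)
```

Rows produced this way for each model and geneset could be stacked with `rbind()` exactly as in the combine step above.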

We also tested how the UNUPDATED version of HALLMARK_UNFOLDED_PROTEIN_RESPONSE differs from the updated version.

upr.length.df <- data.frame(
  "UNUPDATED.HM.UPR" = length(hallmark.list[["UNUPDATED.UPR"]]),
  "HALLMARK.UPR" = length(hallmark.list[["HALLMARK_UNFOLDED_PROTEIN_RESPONSE"]])
)

Found.upr.df <- data.frame(
  "UNUPDATED.HM.UPR" = sum((hallmark.list[["UNUPDATED.UPR"]] %in% rownames(PDX_All))),
  "HALLMARK.UPR" = sum((hallmark.list[["HALLMARK_UNFOLDED_PROTEIN_RESPONSE"]] %in% rownames(PDX_All)))
)

PFound.upr.df <- data.frame(
  "UNUPDATED.HM.UPR" = (sum((hallmark.list[["UNUPDATED.UPR"]] %in% rownames(PDX_All)))/length(hallmark.list[["UNUPDATED.UPR"]]))*100,
  "HALLMARK.UPR" = (sum((hallmark.list[["HALLMARK_UNFOLDED_PROTEIN_RESPONSE"]] %in% rownames(PDX_All)))/length(hallmark.list[["HALLMARK_UNFOLDED_PROTEIN_RESPONSE"]])) * 100
)

upr.df <- rbind(upr.length.df, Found.upr.df, PFound.upr.df)
upr.df[,"UNUPDATED.HM.UPR"] <- round(upr.df[,"UNUPDATED.HM.UPR"])
upr.df[,"HALLMARK.UPR"] <- round(upr.df[,"HALLMARK.UPR"])
rownames(upr.df) <- c("Num UPR Genes", "Num Found", "% Found")
upr.df

              UNUPDATED.HM.UPR HALLMARK.UPR
Num UPR Genes              113          113
Num Found                  107          105
% Found                     95           93

Since the UNUPDATED version identifies two more genes than the updated version (107 vs. 105 of 113), we proceed with the UNUPDATED.UPR geneset.
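
The coverage comparison above boils down to three numbers per geneset: its size, how many of its genes are detected, and the percentage found. A minimal sketch of that calculation as a reusable function is shown here; `geneset_coverage` is a hypothetical name, and the gene lists are toy inputs standing in for `hallmark.list[[...]]` and `rownames(PDX_All)`.

```r
# Hypothetical helper: summarise how much of a geneset is present in a dataset.
geneset_coverage <- function(geneset, detected.genes) {
  n.genes <- length(geneset)
  n.found <- sum(geneset %in% detected.genes)
  c("Num Genes" = n.genes,
    "Num Found" = n.found,
    "% Found"   = round(n.found / n.genes * 100))
}

# Toy inputs (illustrative only, not the real genesets)
detected <- c("ATF4", "XBP1", "HSPA5", "CALR")
toy.sets <- list(
  set.A = c("ATF4", "XBP1", "HSPA5", "DDIT3"),
  set.B = c("ATF4", "XBP1", "ERN1", "DDIT3")
)

# One column per geneset, one row per summary statistic
coverage <- sapply(toy.sets, geneset_coverage, detected.genes = detected)
print(coverage)
```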

Having chosen the genesets to use for our data, we can now analyze the differential expression of OXPHOS and UPR genes across treatment conditions within each model.

STEP 3 DE ANALYSIS #1. VISUALIZING AND QUANTIFYING DE HALLMARK GENESETS

QUESTION Are modules (OXPHOS and UPR) differentially expressed across treatment conditions within each model?
HYPOTHESIS We hypothesize that cells in the MRD treatment condition will show higher OXPHOS and UPR hallmark scores than cells in the vehicle and relapse treatment conditions (MRD > vehicle, MRD > relapse).
- APPROACH #1 Violin Plots and UMAP
  - Center OXPHOS and UPR module scores at 0 and reassign to metadata
  - Visualize differences in hallmark score across treatment conditions with violin plots of the centered scores
  - Statistical test (Wilcoxon rank-sum test): is the difference in hallmark score across treatment conditions statistically significant?
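
The centering and testing steps in Approach #1 can be sketched on simulated scores as follows. This is an illustration under one stated assumption: that "centering at 0" means subtracting the mean score across all cells in a model. The column names `module.score` and `module.score.centered` are hypothetical.

```r
# Simulated module scores for one model; real scores live in the Seurat metadata
set.seed(20200713)
meta <- data.frame(
  treatment.status = factor(
    rep(c("vehicle", "MRD", "relapse"), times = c(40, 25, 35)),
    levels = c("vehicle", "MRD", "relapse")
  ),
  module.score = rnorm(100, mean = 0.3, sd = 0.1)
)

# Assumption: centering subtracts the mean over all cells in the model
meta$module.score.centered <- meta$module.score - mean(meta$module.score)

# Wilcoxon rank-sum test for one comparison (MRD vs. vehicle)
mrd     <- meta$module.score.centered[meta$treatment.status == "MRD"]
vehicle <- meta$module.score.centered[meta$treatment.status == "vehicle"]
test.centered <- wilcox.test(mrd, vehicle)
print(test.centered$p.value)
```

Because the rank-sum test depends only on the ordering of the values, subtracting the same constant from every cell shifts the plot axis but leaves the p-value unchanged.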

#Swarm plots 
hms.centered <- c("UNUPDATED.OXPHOS37.centered", "UNUPDATED.UPR38.centered")
hms <- c("UNUPDATED.OXPHOS", "UNUPDATED.UPR")
hms.names <- c("OXPHOS", "UPR")
hms.plots <- vector("list", length = 2)
names(hms.plots) <- hms.names
PDXs <- list(PDX_DF20, PDX_DF101, PDX_DF68)  # explicit list of Seurat objects, indexed with [[i]] below
PDX.names <- c("DF20", "DF101", "DF68")

#OXPHOS ----------
oxphos.swarm.plots <- vector("list", length = 3)
names(oxphos.swarm.plots) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  
  my_comparisons <- list(
    c("MRD", "vehicle"),
    c("MRD", "relapse"), 
    c("vehicle", "relapse")
  )
  
  feature.found <- sum(hallmark.list[["UNUPDATED.OXPHOS"]] %in% rownames(PDX))
  feature.length <- length(hallmark.list[["UNUPDATED.OXPHOS"]])
  feature.pFound <- round((feature.found / feature.length)*100, 2)
  numCells <- nrow(PDX@meta.data)
  
  p <- ggplot(PDX@meta.data, aes(x= treatment.status , y = UNUPDATED.OXPHOS37.centered, color = treatment.status)) +
  geom_quasirandom(groupOnX =TRUE) + 
  theme_bw() + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    labs(title = glue("{PDX.name} OXPHOS expression across treatment"), x = PDX.name, subtitle = glue("{numCells} Malignant Cells, {feature.found} out of {feature.length} OXPHOS genes found ({feature.pFound}%)")) + 
  geom_boxplot(width = 0.10, position = position_dodge(0.9), alpha = 1, show.legend = F, color = "black", outlier.alpha = 0) + 
    geom_text(label = paste(sum(PDX$treatment.status == "vehicle"), "cells"), x = "vehicle", y = min(PDX[["UNUPDATED.OXPHOS37.centered"]]) -0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "MRD"), "cells"), x = "MRD", y = min(PDX[["UNUPDATED.OXPHOS37.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "relapse"), "cells"), x = "relapse", y = min(PDX[["UNUPDATED.OXPHOS37.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.06) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.06, bracket.size = 0, vjust = 1.8)
  
  oxphos.swarm.plots[[PDX.name]] <- p 
}

oxphos.swarm.plots[["DF20"]] + oxphos.swarm.plots[["DF101"]] + oxphos.swarm.plots[["DF68"]] + plot_layout(guides = "collect", ncol = 3)

Version	Author	Date
80345fe	jgoh2	2020-08-15
84edf85	jgoh2	2020-08-12
26f64a3	jgoh2	2020-08-03
2dc1bee	jgoh2	2020-07-31
c8bb9fc	jgoh2	2020-07-30
8ca1e01	jgoh2	2020-07-27
bc21d3a	jgoh2	2020-07-23

#UPR --------------
upr.swarm.plots <- vector("list", length = 3)
names(upr.swarm.plots) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  
  my_comparisons <- list(
    c("MRD", "vehicle"),
    c("MRD", "relapse"), 
    c("vehicle", "relapse")
  )
  
  feature.found <- sum(hallmark.list[["UNUPDATED.UPR"]] %in% rownames(PDX))
  feature.length <- length(hallmark.list[["UNUPDATED.UPR"]])
  feature.pFound <- round((feature.found / feature.length)*100, 2)
  numCells <- nrow(PDX@meta.data)
  
  p <- ggplot(PDX@meta.data, aes(x= treatment.status , y = UNUPDATED.UPR38.centered, color = treatment.status)) +
  geom_quasirandom(groupOnX =TRUE) + 
  theme_bw() + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    labs(title = glue("{PDX.name} UPR expression across treatment"), x = PDX.name, subtitle = glue("{numCells} Malignant Cells, {feature.found} out of {feature.length} UPR genes found ({feature.pFound}%)")) + 
  geom_boxplot(width = 0.10, position = position_dodge(0.9), alpha = 1, show.legend = F, color = "black", outlier.alpha = 0) + 
    geom_text(label = paste(sum(PDX$treatment.status == "vehicle"), "cells"), x = "vehicle", y = min(PDX[["UNUPDATED.UPR38.centered"]]) -0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "MRD"), "cells"), x = "MRD", y = min(PDX[["UNUPDATED.UPR38.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "relapse"), "cells"), x = "relapse", y = min(PDX[["UNUPDATED.UPR38.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.06) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.06, bracket.size = 0, vjust = 1.8)
  
  upr.swarm.plots[[PDX.name]] <- p 
}

upr.swarm.plots[["DF20"]] + upr.swarm.plots[["DF101"]] + upr.swarm.plots[["DF68"]] + plot_layout(guides = "collect", ncol = 3)

Version	Author	Date
80345fe	jgoh2	2020-08-15
84edf85	jgoh2	2020-08-12
26f64a3	jgoh2	2020-08-03
2dc1bee	jgoh2	2020-07-31
c8bb9fc	jgoh2	2020-07-30
8ca1e01	jgoh2	2020-07-27
bc21d3a	jgoh2	2020-07-23

CONCLUSIONS
- The only statistically significant difference (p < 0.05) in hallmark scores is between treatment conditions within model DF101 for OXPHOS
- DF101 OXPHOS: Vehicle > MRD > Relapse
- However, the trend found in DF101 does not agree with our hypothesis that cells in MRD are enriched for OXPHOS genes

Because comparing module scores did not support our hypothesis, we try to confirm our results with our second approach: GSEA

STATISTICAL SUMMARY

#DF20 ---------------------------------

DF20.oxphos.df <- data.frame(
  "MRDvsVehicle" = wilcox.test(DF20.MRD$UNUPDATED.OXPHOS37.centered, DF20.vehicle$UNUPDATED.OXPHOS37.centered)$p.value, 
  "MRDvsRelapse" = wilcox.test(DF20.MRD$UNUPDATED.OXPHOS37.centered, DF20.relapse$UNUPDATED.OXPHOS37.centered)$p.value, 
  "VehiclevsRelapse" = wilcox.test(DF20.vehicle$UNUPDATED.OXPHOS37.centered, DF20.relapse$UNUPDATED.OXPHOS37.centered)$p.value
)
DF20.UPR.df <- 
  data.frame(
  "MRDvsVehicle" = wilcox.test(DF20.MRD$UNUPDATED.UPR38.centered, DF20.vehicle$UNUPDATED.UPR38.centered)$p.value, 
  "MRDvsRelapse" = wilcox.test(DF20.MRD$UNUPDATED.UPR38.centered, DF20.relapse$UNUPDATED.UPR38.centered)$p.value, 
  "VehiclevsRelapse" = wilcox.test(DF20.vehicle$UNUPDATED.UPR38.centered, DF20.relapse$UNUPDATED.UPR38.centered)$p.value
)
#DF101 ---------------------------------

DF101.oxphos.df <- data.frame(
  "MRDvsVehicle" = wilcox.test(DF101.MRD$UNUPDATED.OXPHOS37.centered, DF101.vehicle$UNUPDATED.OXPHOS37.centered)$p.value, 
  "MRDvsRelapse" = wilcox.test(DF101.MRD$UNUPDATED.OXPHOS37.centered, DF101.relapse$UNUPDATED.OXPHOS37.centered)$p.value, 
  "VehiclevsRelapse" = wilcox.test(DF101.vehicle$UNUPDATED.OXPHOS37.centered, DF101.relapse$UNUPDATED.OXPHOS37.centered)$p.value
)
DF101.UPR.df <- 
  data.frame(
  "MRDvsVehicle" = wilcox.test(DF101.MRD$UNUPDATED.UPR38.centered, DF101.vehicle$UNUPDATED.UPR38.centered)$p.value, 
  "MRDvsRelapse" = wilcox.test(DF101.MRD$UNUPDATED.UPR38.centered, DF101.relapse$UNUPDATED.UPR38.centered)$p.value, 
  "VehiclevsRelapse" = wilcox.test(DF101.vehicle$UNUPDATED.UPR38.centered, DF101.relapse$UNUPDATED.UPR38.centered)$p.value
)
#DF68 ---------------------------------

DF68.oxphos.df <- data.frame(
  "MRDvsVehicle" = wilcox.test(DF68.MRD$UNUPDATED.OXPHOS37.centered, DF68.vehicle$UNUPDATED.OXPHOS37.centered)$p.value, 
  "MRDvsRelapse" = wilcox.test(DF68.MRD$UNUPDATED.OXPHOS37.centered, DF68.relapse$UNUPDATED.OXPHOS37.centered)$p.value, 
  "VehiclevsRelapse" = wilcox.test(DF68.vehicle$UNUPDATED.OXPHOS37.centered, DF68.relapse$UNUPDATED.OXPHOS37.centered)$p.value
)
DF68.UPR.df <- 
  data.frame(
  "MRDvsVehicle" = wilcox.test(DF68.MRD$UNUPDATED.UPR38.centered, DF68.vehicle$UNUPDATED.UPR38.centered)$p.value, 
  "MRDvsRelapse" = wilcox.test(DF68.MRD$UNUPDATED.UPR38.centered, DF68.relapse$UNUPDATED.UPR38.centered)$p.value, 
  "VehiclevsRelapse" = wilcox.test(DF68.vehicle$UNUPDATED.UPR38.centered, DF68.relapse$UNUPDATED.UPR38.centered)$p.value
)
#combine ------------------------------
oxphos.DF <- rbind(DF20.oxphos.df, DF101.oxphos.df, DF68.oxphos.df)
rownames(oxphos.DF) <- c("OXPHOS.DF20", "OXPHOS.DF101", "OXPHOS.DF68")
UPR.DF <- rbind(DF20.UPR.df, DF101.UPR.df, DF68.UPR.df)
rownames(UPR.DF) <- c("UPR.DF20", "UPR.DF101", "UPR.DF68")
all.DF <- rbind(oxphos.DF, UPR.DF)
DT::datatable(all.DF) %>% 
  DT::formatSignif(names(all.DF), digits = 2) %>% 
  DT::formatStyle(names(all.DF), color = DT::styleInterval(0.05, c('red', 'black')))

- APPROACH #2 Geneset Enrichment Analysis (GSEA)
  - GSEA enrichment plots for hallmarks of interest between condition.1 vs. condition.2
  - Rank genes and compute GSEA enrichment scores
  - Statistical test: how significant are the enrichment scores?
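
To make the "rank genes and compute enrichment scores" step concrete, here is a minimal sketch of a running-sum enrichment score in the style of classic GSEA. It is not the method chosen for this analysis (the ranking statistic and test are still open); `enrichment_score`, the gene names, and the ranking values are all hypothetical, and significance testing is deliberately left out.

```r
# Hypothetical helper: running-sum enrichment score in the style of classic GSEA.
# stats:   named numeric vector of per-gene ranking statistics
# geneset: character vector of gene names
# weight:  exponent applied to |stat| for genes in the set
enrichment_score <- function(stats, geneset, weight = 1) {
  stats <- sort(stats, decreasing = TRUE)   # rank genes, highest statistic first
  in.set <- names(stats) %in% geneset
  stopifnot(any(in.set), !all(in.set))
  hit  <- ifelse(in.set, abs(stats)^weight, 0)  # step up at genes in the set
  miss <- ifelse(in.set, 0, 1)                  # step down at all other genes
  running <- cumsum(hit / sum(hit)) - cumsum(miss / sum(miss))
  running[which.max(abs(running))]          # largest deviation from zero
}

# Toy ranking statistic for eight genes (illustrative values only)
stats <- c(g1 = 2.0, g2 = 1.5, g3 = 1.0, g4 = 0.5,
           g5 = -0.5, g6 = -1.0, g7 = -1.5, g8 = -2.0)

es.top    <- enrichment_score(stats, c("g1", "g2", "g3"))  # set at the top of the ranking
es.bottom <- enrichment_score(stats, c("g6", "g7", "g8"))  # set at the bottom of the ranking
print(c(top = es.top, bottom = es.bottom))
```

A geneset concentrated at the top of the ranking yields a positive score and one concentrated at the bottom yields a negative score; judging whether a score is significant would still require a null distribution, for example from permutations.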

Note: we have not finalized the statistical test and ranking method to use for GSEA.

STEP 4 DE ANALYSIS #2. IDENTIFYING INDIVIDUAL DE GENES

Note: we have not finalized the statistical test to use for the volcano plots (it needs to match what we use for GSEA).

PDX.markers <- vector("list", length = 3)
names(PDX.markers) <- PDX.names

for(i in 1:3){
  PDX.markers[[i]] <- list(
  FindMarkers(PDXs[[i]], group.by = "treatment.status", ident.1 = "MRD", ident.2 = "vehicle", logfc.threshold = 0), 
  FindMarkers(PDXs[[i]], group.by = "treatment.status", ident.1 = "MRD", ident.2 = "relapse", logfc.threshold = 0),
  FindMarkers(PDXs[[i]], group.by = "treatment.status", ident.1 = "vehicle", ident.2 = "relapse", logfc.threshold = 0)
  )
}

PDX.all.plots <- vector("list", length = 3)
names(PDX.all.plots) <- PDX.names

PDX.UPR.plots <- vector("list", length = 3)
names(PDX.UPR.plots) <- PDX.names

PDX.OXPHOS.plots <- vector("list", length = 3)
names(PDX.OXPHOS.plots) <- PDX.names

comparisons <- c("MRD vehicle", "MRD relapse", "vehicle relapse")

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  
  obj.all.plots <- vector("list", length = 3)
  obj.UPR.plots <- vector("list", length = 3)
  obj.OXPHOS.plots <- vector("list", length = 3)
  
  for(j in 1:3){  # j indexes the comparison so the inner loop does not overwrite the outer index i
  marker <- PDX.markers[[PDX.name]][[j]]
  comp <- comparisons[[j]]
  comp.split <- stringr::str_split(comp, pattern = " ")
  group.1 <- comp.split[[1]][1]
  group.2 <- comp.split[[1]][2]
  
  obj.all.plots[[j]] <- DEAnalysis_code(PDX, markers = marker, group.by = "treatment.status", group.1 = group.1, group.2 = group.2, graph = TRUE) + labs(title = paste(PDX.name, group.1, "vs", group.2, "DE Genes"))
  
  obj.UPR.plots[[j]] <- DEAnalysis_code(PDX, markers = marker, group.by = "treatment.status", group.1 = group.1, group.2 = group.2, geneset = hallmark.list[["UNUPDATED.UPR"]]) + labs(title = paste(PDX.name, group.1, "vs", group.2, "DE UPR Genes"))
  
  obj.OXPHOS.plots[[j]] <- DEAnalysis_code(PDX, markers = marker, group.by = "treatment.status", group.1 = group.1, group.2 = group.2, geneset = hallmark.list[["UNUPDATED.OXPHOS"]]) + labs(title = paste(PDX.name, group.1, "vs", group.2, "DE OXPHOS Genes"))
  }
  
  PDX.all.plots[[PDX.name]] <- obj.all.plots[[1]] + obj.all.plots[[2]] + obj.all.plots[[3]] + plot_layout(ncol = 3, nrow =1)
  PDX.UPR.plots[[PDX.name]] <- obj.UPR.plots[[1]] + obj.UPR.plots[[2]] + obj.UPR.plots[[3]] + plot_layout(ncol = 3, nrow =1)
  PDX.OXPHOS.plots[[PDX.name]] <- obj.OXPHOS.plots[[1]] + obj.OXPHOS.plots[[2]] + obj.OXPHOS.plots[[3]] + plot_layout(ncol = 3, nrow =1)
}

PDX.all.plots[["DF20"]]

Version	Author	Date
e05c328	jgoh2	2020-08-12

PDX.OXPHOS.plots[["DF20"]]

Version	Author	Date
e05c328	jgoh2	2020-08-12

PDX.UPR.plots[["DF20"]]

Version	Author	Date
e05c328	jgoh2	2020-08-12

PDX.all.plots[["DF101"]]

Version	Author	Date
e05c328	jgoh2	2020-08-12

PDX.OXPHOS.plots[["DF101"]]

Version	Author	Date
e05c328	jgoh2	2020-08-12

PDX.UPR.plots[["DF101"]]

Version	Author	Date
e05c328	jgoh2	2020-08-12

PDX.all.plots[["DF68"]]

Version	Author	Date
e05c328	jgoh2	2020-08-12

PDX.OXPHOS.plots[["DF68"]]

Version	Author	Date
e05c328	jgoh2	2020-08-12

PDX.UPR.plots[["DF68"]]

Version	Author	Date
e05c328	jgoh2	2020-08-12

OBSERVATIONS
- UPR: no DE UPR genes across treatment conditions in any of the three models
- OXPHOS:
  - DF20: ACADVL consistently depleted in MRD; NDUFB2 enriched in relapse relative to vehicle
  - DF101: CASP7 enriched in MRD relative to relapse; COX5B, COX7B, UQCR11, NDUFA4 enriched in relapse relative to vehicle
  - DF68: NDUFA1 enriched in MRD relative to relapse
- Results are not consistent across models. We cannot conclude that OXPHOS or UPR genes are enriched in MRD.
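
As a rough illustration of how individual DE genes within a geneset can be called from a marker table, the sketch below filters a toy data frame by fold change and adjusted p-value. It does not reproduce `DEAnalysis_code`. The column names `avg_logFC` and `p_val_adj` are assumed to match the `FindMarkers()` output of the Seurat version in use, the cutoffs are hypothetical because the thresholds are not stated here, and the genes are placeholders.

```r
# Toy stand-in for a FindMarkers() result: one row per gene.
# Assumption: columns are named avg_logFC and p_val_adj.
markers <- data.frame(
  avg_logFC = c(1.2, -0.8, 0.1, 0.9, -1.5),
  p_val_adj = c(0.001, 0.03, 0.60, 0.20, 0.0004),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE")
)
geneset <- c("geneA", "geneC", "geneE")  # placeholder geneset

# Hypothetical cutoffs (not taken from the analysis)
logfc.cutoff <- 0.5
padj.cutoff  <- 0.05

# A gene is called DE if it passes both the effect-size and significance cutoffs
is.de <- abs(markers$avg_logFC) >= logfc.cutoff & markers$p_val_adj < padj.cutoff

# Positive log fold change means higher expression in group.1 (ident.1)
markers$direction <- ifelse(!is.de, "not DE",
                            ifelse(markers$avg_logFC > 0, "up in group.1", "up in group.2"))

# Restrict to DE genes that belong to the geneset of interest
de.in.set <- markers[is.de & rownames(markers) %in% geneset, ]
print(de.in.set)
```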

STEP 5. ANALYZING OTHER GENE SETS

Since we did not obtain any conclusive results from analyzing OXPHOS and UPR expression across treatment samples within each SS2 Patient, we now examine the expression of other gene sets to find potential trends. We are specifically interested in these gene sets:
- Stemness (stemness_genes44, RAMALHO_STEMNESS_UP45, CANCER_PROGENITORS_UP46)
- Apoptosis (HALLMARK_APOPTOSIS2)
- Angiogenesis (HALLMARK_ANGIOGENESIS1)
- Epithelial Mesenchymal Transition (HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION6)
- Fatty Acid Metabolism (HALLMARK_FATTY_ACID_METABOLISM9)
- Hypoxia (HALLMARK_HYPOXIA13)
- ROS Pathway (HALLMARK_REACTIVE_OXYGEN_SPECIES_PATHWAY30)

#difference in mean 
hallmarks <- names(PDX_DF20@meta.data)[59:105][-(39:44)]

mean.diff <- data.frame()
for(i in 1:length(hallmarks)){
  hm <- hallmarks[[i]]
  df <- data.frame(
    "DF20.MV" = (mean(deframe(DF20.MRD[[hm]])) - mean(deframe(DF20.vehicle[[hm]]))), 
    "DF20.MR" = (mean(deframe(DF20.MRD[[hm]])) - mean(deframe(DF20.relapse[[hm]]))), 
    "DF20.VR" = (mean(deframe(DF20.vehicle[[hm]])) - mean(deframe(DF20.relapse[[hm]]))), 
    "DF101.MV" = (mean(deframe(DF101.MRD[[hm]])) - mean(deframe(DF101.vehicle[[hm]]))), 
    "DF101.MR" = (mean(deframe(DF101.MRD[[hm]])) - mean(deframe(DF101.relapse[[hm]]))), 
    "DF101.VR" = (mean(deframe(DF101.vehicle[[hm]])) - mean(deframe(DF101.relapse[[hm]]))),  
    "DF68.MV" = (mean(deframe(DF68.MRD[[hm]])) - mean(deframe(DF68.vehicle[[hm]]))), 
    "DF68.MR" = (mean(deframe(DF68.MRD[[hm]])) - mean(deframe(DF68.relapse[[hm]]))), 
    "DF68.VR" = (mean(deframe(DF68.vehicle[[hm]])) - mean(deframe(DF68.relapse[[hm]]))) 
    
  )
  mean.diff <- rbind(mean.diff, df)
}
rownames(mean.diff) <- hallmarks

#significance ----------
sig.diff <- data.frame()
for(i in 1:length(hallmarks)){
  hm <- hallmarks[[i]]
  df <- data.frame(
    "DF20.MV.sig" = wilcox.test(deframe(DF20.MRD[[hm]]), deframe(DF20.vehicle[[hm]]))$p.value, 
    "DF20.MR.sig" = wilcox.test(deframe(DF20.MRD[[hm]]), deframe(DF20.relapse[[hm]]))$p.value,  
    "DF20.VR.sig" = wilcox.test(deframe(DF20.vehicle[[hm]]), deframe(DF20.relapse[[hm]]))$p.value, 
    "DF101.MV.sig" = wilcox.test(deframe(DF101.MRD[[hm]]), deframe(DF101.vehicle[[hm]]))$p.value, 
    "DF101.MR.sig" = wilcox.test(deframe(DF101.MRD[[hm]]), deframe(DF101.relapse[[hm]]))$p.value,  
    "DF101.VR.sig" = wilcox.test(deframe(DF101.vehicle[[hm]]), deframe(DF101.relapse[[hm]]))$p.value,
    "DF68.MV.sig" = wilcox.test(deframe(DF68.MRD[[hm]]), deframe(DF68.vehicle[[hm]]))$p.value, 
    "DF68.MR.sig" = wilcox.test(deframe(DF68.MRD[[hm]]), deframe(DF68.relapse[[hm]]))$p.value,  
    "DF68.VR.sig" = wilcox.test(deframe(DF68.vehicle[[hm]]), deframe(DF68.relapse[[hm]]))$p.value
  )
  sig.diff <- rbind(sig.diff, df)
}
rownames(sig.diff) <- hallmarks

#combine diff with significance ------------
sig.diff <- rownames_to_column(sig.diff)
mean.diff <- rownames_to_column(mean.diff)
both <- merge(mean.diff, sig.diff, by = "rowname")
both <- column_to_rownames(both)

DT::datatable(both, options = list(
    columnDefs = list(list(targets = c(10,11,12,13,14,15,16,17,18), visible = FALSE))
)) %>% 
  DT::formatSignif(names(both), digits = 3) %>% 
  DT::formatStyle('DF20.MV', 'DF20.MV.sig',
  color = DT::styleInterval(0.05, c('red', 'black'))) %>% 
  DT::formatStyle('DF20.MR', 'DF20.MR.sig',
  color = DT::styleInterval(0.05, c('red', 'black'))) %>% 
  DT::formatStyle('DF20.VR', 'DF20.VR.sig',
  color = DT::styleInterval(0.05, c('red', 'black'))) %>% 
  DT::formatStyle('DF101.MV', 'DF101.MV.sig',
  color = DT::styleInterval(0.05, c('red', 'black'))) %>% 
  DT::formatStyle('DF101.MR', 'DF101.MR.sig',
  color = DT::styleInterval(0.05, c('red', 'black'))) %>% 
  DT::formatStyle('DF101.VR', 'DF101.VR.sig',
  color = DT::styleInterval(0.05, c('red', 'black'))) %>% 
  DT::formatStyle('DF68.MV', 'DF68.MV.sig',
  color = DT::styleInterval(0.05, c('red', 'black'))) %>% 
  DT::formatStyle('DF68.MR', 'DF68.MR.sig',
  color = DT::styleInterval(0.05, c('red', 'black'))) %>% 
  DT::formatStyle('DF68.VR', 'DF68.VR.sig',
  color = DT::styleInterval(0.05, c('red', 'black')))
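
The table above pairs a difference in means with a Wilcoxon p-value for every hallmark and comparison. A compact sketch of the same bookkeeping, written as a loop over score columns and comparisons, is given below. `compare_groups` and the `setA.centered` / `setB.centered` columns are hypothetical, and the data are simulated.

```r
# Hypothetical helper: difference in mean score and Wilcoxon p-value for one comparison
compare_groups <- function(meta, score.col, group.1, group.2) {
  x <- meta[meta$treatment.status == group.1, score.col]
  y <- meta[meta$treatment.status == group.2, score.col]
  c(diff = mean(x) - mean(y), p.value = wilcox.test(x, y)$p.value)
}

# Simulated metadata with two centered score columns
set.seed(1)
meta <- data.frame(
  treatment.status = rep(c("vehicle", "MRD", "relapse"), each = 20),
  setA.centered = rnorm(60),
  setB.centered = rnorm(60) + rep(c(0, 1, 0), each = 20)  # MRD cells shifted upward
)
score.cols <- c("setA.centered", "setB.centered")
comparisons <- list(MV = c("MRD", "vehicle"),
                    MR = c("MRD", "relapse"),
                    VR = c("vehicle", "relapse"))

# One row per score column; columns are named MV.diff, MV.p.value, MR.diff, ...
rows <- lapply(score.cols, function(col) {
  unlist(lapply(comparisons, function(g) compare_groups(meta, col, g[1], g[2])))
})
summary.table <- as.data.frame(do.call(rbind, rows), row.names = score.cols)
print(summary.table)
```

Keeping the difference and its p-value in adjacent columns makes it straightforward to colour each difference by its own significance, as the `formatStyle()` calls above do.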

OBSERVATIONS
- Consistent statistically significant trends:
  - HALLMARK_APOPTOSIS2
    - DF101: MRD > vehicle and relapse
  - HALLMARK_DNA_REPAIR4
    - DF101 and DF68: MRD > relapse
  - HALLMARK_ESTROGEN_RESPONSE_LATE8
    - DF101: MRD > vehicle and relapse
  - HALLMARK_FATTY_ACID_METABOLISM9
    - DF20 and DF101: vehicle > relapse
    - DF101: MRD > relapse
  - HALLMARK_INTERFERON_ALPHA_RESPONSE17
    - DF20 and DF101: MRD > vehicle and relapse
    - DF20: vehicle > relapse
  - HALLMARK_PEROXISOME27
    - DF20 and DF101: MRD > relapse; vehicle > relapse
  - RAMALHO_STEMNESS_UP45
    - DF20: MRD > relapse; vehicle > relapse
  - stemness_genes44
    - DF20: MRD > relapse

VIOLIN PLOTS OF THE HALLMARKS MENTIONED ABOVE
- Plot swarm plots of the hallmarks above that are also significant and agree with our hypothesis in our PDX analysis:

#HALLMARK_DNA_REPAIR4 ----------
dna.swarm.plots <- vector("list", length = 3)
names(dna.swarm.plots) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  
  my_comparisons <- list(
    c("MRD", "vehicle"),
    c("MRD", "relapse"), 
    c("vehicle", "relapse")
  )
  
  feature.found <- sum(hallmark.list[["HALLMARK_DNA_REPAIR"]] %in% rownames(PDX))
  feature.length <- length(hallmark.list[["HALLMARK_DNA_REPAIR"]])
  feature.pFound <- round((feature.found / feature.length)*100, 2)
  numCells <- nrow(PDX@meta.data)
  
  p <- ggplot(PDX@meta.data, aes(x= treatment.status , y = HALLMARK_DNA_REPAIR4.centered, color = treatment.status)) +
  geom_quasirandom(groupOnX =TRUE) + 
  theme_bw() + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    labs(title = glue("{PDX.name} DNA Repair expression across treatment"), x = PDX.name, subtitle = glue("{numCells} Malignant Cells, {feature.found} out of {feature.length} DNA Repair genes found ({feature.pFound}%)")) + 
  geom_boxplot(width = 0.10, position = position_dodge(0.9), alpha = 1, show.legend = F, color = "black", outlier.alpha = 0) + 
    geom_text(label = paste(sum(PDX$treatment.status == "vehicle"), "cells"), x = "vehicle", y = min(PDX[["HALLMARK_DNA_REPAIR4.centered"]]) -0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "MRD"), "cells"), x = "MRD", y = min(PDX[["HALLMARK_DNA_REPAIR4.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "relapse"), "cells"), x = "relapse", y = min(PDX[["HALLMARK_DNA_REPAIR4.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.06) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.06, bracket.size = 0, vjust = 1.8)
  
  dna.swarm.plots[[PDX.name]] <- p 
}

dna.swarm.plots[["DF20"]] + dna.swarm.plots[["DF101"]] + dna.swarm.plots[["DF68"]] + plot_layout(guides = "collect", ncol = 3)

Version	Author	Date
92568b8	jgoh2	2020-08-17

#HALLMARK_FATTY_ACID_METABOLISM9 ----------
fa.swarm.plots <- vector("list", length = 3)
names(fa.swarm.plots) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  
  my_comparisons <- list(
    c("MRD", "vehicle"),
    c("MRD", "relapse"), 
    c("vehicle", "relapse")
  )
  
  feature.found <- sum(hallmark.list[["HALLMARK_FATTY_ACID_METABOLISM"]] %in% rownames(PDX))
  feature.length <- length(hallmark.list[["HALLMARK_FATTY_ACID_METABOLISM"]])
  feature.pFound <- round((feature.found / feature.length)*100, 2)
  numCells <- nrow(PDX@meta.data)
  
  p <- ggplot(PDX@meta.data, aes(x= treatment.status , y = HALLMARK_FATTY_ACID_METABOLISM9.centered, color = treatment.status)) +
  geom_quasirandom(groupOnX =TRUE) + 
  theme_bw() + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    labs(title = glue("{PDX.name} FA Metabolism expression across treatment"), x = PDX.name, subtitle = glue("{numCells} Malignant Cells, {feature.found} out of {feature.length} FA Metabolism genes found ({feature.pFound}%)")) + 
  geom_boxplot(width = 0.10, position = position_dodge(0.9), alpha = 1, show.legend = F, color = "black", outlier.alpha = 0) + 
    geom_text(label = paste(sum(PDX$treatment.status == "vehicle"), "cells"), x = "vehicle", y = min(PDX[["HALLMARK_FATTY_ACID_METABOLISM9.centered"]]) -0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "MRD"), "cells"), x = "MRD", y = min(PDX[["HALLMARK_FATTY_ACID_METABOLISM9.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "relapse"), "cells"), x = "relapse", y = min(PDX[["HALLMARK_FATTY_ACID_METABOLISM9.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.06) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.06, bracket.size = 0, vjust = 1.8)
  
  fa.swarm.plots[[PDX.name]] <- p 
}

fa.swarm.plots[["DF20"]] + fa.swarm.plots[["DF101"]] + fa.swarm.plots[["DF68"]] + plot_layout(guides = "collect", ncol = 3)

Version	Author	Date
92568b8	jgoh2	2020-08-17

#HALLMARK_PEROXISOME27 ----------
perox.swarm.plots <- vector("list", length = 3)
names(perox.swarm.plots) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  
  my_comparisons <- list(
    c("MRD", "vehicle"),
    c("MRD", "relapse"), 
    c("vehicle", "relapse")
  )
  
  feature.found <- sum(hallmark.list[["HALLMARK_PEROXISOME"]] %in% rownames(PDX))
  feature.length <- length(hallmark.list[["HALLMARK_PEROXISOME"]])
  feature.pFound <- round((feature.found / feature.length)*100, 2)
  numCells <- nrow(PDX@meta.data)
  
  p <- ggplot(PDX@meta.data, aes(x= treatment.status , y = HALLMARK_PEROXISOME27.centered, color = treatment.status)) +
  geom_quasirandom(groupOnX =TRUE) + 
  theme_bw() + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    labs(title = glue("{PDX.name} Peroxisome expression across treatment"), x = PDX.name, subtitle = glue("{numCells} Malignant Cells, {feature.found} out of {feature.length} Peroxisome genes found ({feature.pFound}%)")) + 
  geom_boxplot(width = 0.10, position = position_dodge(0.9), alpha = 1, show.legend = F, color = "black", outlier.alpha = 0) + 
    geom_text(label = paste(sum(PDX$treatment.status == "vehicle"), "cells"), x = "vehicle", y = min(PDX[["HALLMARK_PEROXISOME27.centered"]]) -0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "MRD"), "cells"), x = "MRD", y = min(PDX[["HALLMARK_PEROXISOME27.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "relapse"), "cells"), x = "relapse", y = min(PDX[["HALLMARK_PEROXISOME27.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.06) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.06, bracket.size = 0, vjust = 1.8)
  
  perox.swarm.plots[[PDX.name]] <- p 
}

perox.swarm.plots[["DF20"]] + perox.swarm.plots[["DF101"]] + perox.swarm.plots[["DF68"]] + plot_layout(guides = "collect", ncol = 3)

Version	Author	Date
92568b8	jgoh2	2020-08-17

#RAMALHO_STEMNESS_UP46 ----------
stem.swarm.plots <- vector("list", length = 3)
names(stem.swarm.plots) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  
  my_comparisons <- list(
    c("MRD", "vehicle"),
    c("MRD", "relapse"), 
    c("vehicle", "relapse")
  )
  
  feature.found <- sum(hallmark.list[["RAMALHO_STEMNESS_UP"]] %in% rownames(PDX))
  feature.length <- length(hallmark.list[["RAMALHO_STEMNESS_UP"]])
  feature.pFound <- round((feature.found / feature.length)*100, 2)
  numCells <- nrow(PDX@meta.data)
  
  p <- ggplot(PDX@meta.data, aes(x= treatment.status , y = RAMALHO_STEMNESS_UP46.centered, color = treatment.status)) +
  geom_quasirandom(groupOnX =TRUE) + 
  theme_bw() + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    labs(title = glue("{PDX.name} Stemness gene expression across treatment"), x = PDX.name, subtitle = glue("{numCells} Malignant Cells, {feature.found} out of {feature.length} Stemness genes found ({feature.pFound}%)")) + 
  geom_boxplot(width = 0.10, position = position_dodge(0.9), alpha = 1, show.legend = F, color = "black", outlier.alpha = 0) + 
    geom_text(label = paste(sum(PDX$treatment.status == "vehicle"), "cells"), x = "vehicle", y = min(PDX[["RAMALHO_STEMNESS_UP46.centered"]]) -0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "MRD"), "cells"), x = "MRD", y = min(PDX[["RAMALHO_STEMNESS_UP46.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "relapse"), "cells"), x = "relapse", y = min(PDX[["RAMALHO_STEMNESS_UP46.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.06) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.06, bracket.size = 0, vjust = 1.8)
  
  stem.swarm.plots[[PDX.name]] <- p 
}

stem.swarm.plots[["DF20"]] + stem.swarm.plots[["DF101"]] + stem.swarm.plots[["DF68"]] + plot_layout(guides = "collect", ncol = 3)

Version	Author	Date
b7634b5	jgoh2	2020-08-23
92568b8	jgoh2	2020-08-17

STEP 6. GENE ONTOLOGY FUNCTIONAL ANALYSIS

Our gene-set enrichment analyses above show that the gene sets we hypothesized to be differentially expressed (DE) in p.1 are not DE. Here, instead of testing a specific gene set, we use gene ontology (GO) functional analysis to check whether our genes of interest are associated with certain biological functions more often than would be expected for a random set of genes. We conduct GO analysis on the following gene lists:

PC loadings from PC1-PC10
1. Loadings are sorted in both directions, and the top 500 loadings are extracted from each direction.
2. Loadings that are not significantly associated with the PC (JackStraw padj > 0.05) are filtered out first, and the remaining loadings are then sorted as described above.
Output from FindMarkers
1. Filter out insignificant genes (padj > 0.05) and sort by logFC in both directions.
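
The two ways of building a gene list from PC loadings can be sketched on toy data as follows. This is a minimal illustration, not the code used in this analysis: `top_loadings()`, `toy.loadings`, and `toy.padj` are hypothetical names, and the sketch assumes the adjusted p-values are already available as a gene-by-PC matrix.

```r
# Hypothetical helper (not part of the original analysis): return the n genes with
# the most extreme loadings in each direction for one PC. If a matrix of adjusted
# p-values is supplied, genes with padj > alpha are dropped before sorting.
top_loadings <- function(loadings, pc, n = 500, padj = NULL, alpha = 0.05) {
  x <- loadings[, pc]
  names(x) <- rownames(loadings)
  if (!is.null(padj)) {
    keep <- padj[names(x), pc] <= alpha
    x <- x[keep]
  }
  list(
    positive = names(head(sort(x, decreasing = TRUE), n)),
    negative = names(head(sort(x, decreasing = FALSE), n))
  )
}

# Toy input: 8 genes, 1 PC (values made up for illustration)
toy.loadings <- matrix(c(0.9, -0.8, 0.7, -0.6, 0.5, -0.4, 0.1, -0.1), ncol = 1,
                       dimnames = list(paste0("gene", 1:8), "PC_1"))
toy.padj <- matrix(c(0.001, 0.001, 0.9, 0.9, 0.002, 0.002, 0.8, 0.8), ncol = 1,
                   dimnames = dimnames(toy.loadings))

# Option 1: unfiltered loadings
unfiltered <- top_loadings(toy.loadings, "PC_1", n = 2)
unfiltered$positive  # "gene1" "gene3"
unfiltered$negative  # "gene2" "gene4"

# Option 2: drop insignificant loadings first, then sort
filtered <- top_loadings(toy.loadings, "PC_1", n = 2, padj = toy.padj)
filtered$positive    # "gene1" "gene5"
filtered$negative    # "gene2" "gene6"
```

With only a few significant genes, filtering first changes which genes reach the top of each list, which is why the two options can give different GO results.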

PART 1: PC LOADINGS FROM PC1-PC10 OPTION i

library(pcaExplorer)
library(org.Hs.eg.db)
library(topGO)

PCs <- c("PC_1", "PC_2", "PC_3", "PC_4", "PC_5", "PC_6", "PC_7", "PC_8", "PC_9", "PC_10")
background <- rownames(PDX_All@assays$RNA@counts)

#DF20 -----------------------------------
DF20.pc.go1 <- vector("list", length = 10)
names(DF20.pc.go1) <- PCs
loadings <- as.data.frame(PDX_DF20@reductions$pca@feature.loadings) %>% rownames_to_column()

for(i in 1:10){
  PC <- PCs[[i]]
  pc.go.both <- vector("list", length = 2)
  names(pc.go.both) <- c("positive", "negative")
  
  pc.positive <- loadings %>% 
    dplyr::select("rowname", PC) %>% dplyr::arrange(-loadings[[PC]]) %>% dplyr::select("rowname") %>% deframe() %>% head(500)
  pc.negative <- loadings %>% 
    dplyr::select("rowname", PC) %>% dplyr::arrange(loadings[[PC]]) %>% dplyr::select("rowname") %>% deframe() %>% head(500)
  
  pc.go.both[["positive"]] <- topGOtable(DEgenes = pc.positive, BGgenes = background, ontology = "BP", mapping = "org.Hs.eg.db", do_padj = TRUE)
  pc.go.both[["negative"]] <- topGOtable(DEgenes = pc.negative, BGgenes = background, ontology = "BP", mapping = "org.Hs.eg.db", do_padj = TRUE)
  
  DF20.pc.go1[[PC]] <- pc.go.both
}

#DF101 -----------------------------------
DF101.pc.go1 <- vector("list", length = 10)
names(DF101.pc.go1) <- PCs
loadings <- as.data.frame(PDX_DF101@reductions$pca@feature.loadings) %>% rownames_to_column()

for(i in 1:10){
  PC <- PCs[[i]]
  pc.go.both <- vector("list", length = 2)
  names(pc.go.both) <- c("positive", "negative")
  
  pc.positive <- loadings %>% 
    dplyr::select("rowname", PC) %>% dplyr::arrange(-loadings[[PC]]) %>% dplyr::select("rowname") %>% deframe() %>% head(500)
  pc.negative <- loadings %>% 
    dplyr::select("rowname", PC) %>% dplyr::arrange(loadings[[PC]]) %>% dplyr::select("rowname") %>% deframe() %>% head(500)
  
  pc.go.both[["positive"]] <- topGOtable(DEgenes = pc.positive, BGgenes = background, ontology = "BP", mapping = "org.Hs.eg.db", do_padj = TRUE)
  pc.go.both[["negative"]] <- topGOtable(DEgenes = pc.negative, BGgenes = background, ontology = "BP", mapping = "org.Hs.eg.db", do_padj = TRUE)
  
  DF101.pc.go1[[PC]] <- pc.go.both
}

#DF68 -----------------------------------
DF68.pc.go1 <- vector("list", length = 10)
names(DF68.pc.go1) <- PCs
loadings <- as.data.frame(PDX_DF68@reductions$pca@feature.loadings) %>% rownames_to_column()

for(i in 1:10){
  PC <- PCs[[i]]
  pc.go.both <- vector("list", length = 2)
  names(pc.go.both) <- c("positive", "negative")
  
  pc.positive <- loadings %>% 
    dplyr::select("rowname", PC) %>% dplyr::arrange(-loadings[[PC]]) %>% dplyr::select("rowname") %>% deframe() %>% head(500)
  pc.negative <- loadings %>% 
    dplyr::select("rowname", PC) %>% dplyr::arrange(loadings[[PC]]) %>% dplyr::select("rowname") %>% deframe() %>% head(500)
  
  pc.go.both[["positive"]] <- topGOtable(DEgenes = pc.positive, BGgenes = background, ontology = "BP", mapping = "org.Hs.eg.db", do_padj = TRUE)
  pc.go.both[["negative"]] <- topGOtable(DEgenes = pc.negative, BGgenes = background, ontology = "BP", mapping = "org.Hs.eg.db", do_padj = TRUE)
  
  DF68.pc.go1[[PC]] <- pc.go.both
}

#save output
for(i in 1:10){
  for(z in c("positive", "negative")){
    PC <- PCs[[i]]
    type <- z
    write_csv(DF20.pc.go1[[PC]][[z]], path = glue("GO_results/Tables/PDX/UF/GOuf_DF20_{PC}_{type}.csv"))
    write_csv(DF101.pc.go1[[PC]][[z]], path = glue("GO_results/Tables/PDX/UF/GOuf_DF101_{PC}_{type}.csv"))
    write_csv(DF68.pc.go1[[PC]][[z]], path = glue("GO_results/Tables/PDX/UF/GOuf_DF68_{PC}_{type}.csv"))
  }
}


#visualization 

plots.DF20 <- vector("list", length = 10)
plots.DF101 <- vector("list", length = 10)
plots.DF68 <- vector("list", length = 10)
names(plots.DF20) <- PCs
names(plots.DF101) <- PCs
names(plots.DF68) <- PCs

for(i in 1:10){
  
  pcDF20.plots <- vector("list", length = 2)
  pcDF101.plots <- vector("list", length = 2)
  pcDF68.plots <- vector("list", length = 2)
  names(pcDF20.plots) <- c("positive", "negative")
  names(pcDF101.plots) <- c("positive", "negative")
  names(pcDF68.plots) <- c("positive", "negative")
  
  for(z in c("positive", "negative")){
    PC <- PCs[[i]]
    
    data.DF20 <- DF20.pc.go1[[PC]][[z]]
    data.DF20 <- data.DF20 %>% dplyr::filter(padj_BY_elim < 0.05)
    pcDF20.plots[[z]] <- ggplot(data.DF20, aes(x= reorder(Term, -padj_BY_elim), y= ((Significant/Annotated)*100), fill = padj_BY_elim)) +
              geom_bar(stat = "identity") +
              labs(title = glue("DF20 {PC} {z} significant GO terms"), x = "Biological Process", y = "% Significant/Annotated") +
              coord_flip()

    data.DF101 <- DF101.pc.go1[[PC]][[z]]
    data.DF101 <- data.DF101 %>% dplyr::filter(padj_BY_elim < 0.05)
    pcDF101.plots[[z]] <- ggplot(data.DF101, aes(x= reorder(Term, -padj_BY_elim), y= ((Significant/Annotated)*100), fill = padj_BY_elim)) +
              geom_bar(stat = "identity") +
              labs(title = glue("DF101 {PC} {z} significant GO terms"), x = "Biological Process", y = "% Significant/Annotated") +
              coord_flip()

    data.DF68 <- DF68.pc.go1[[PC]][[z]]
    data.DF68 <- data.DF68 %>% dplyr::filter(padj_BY_elim < 0.05)
    pcDF68.plots[[z]] <- ggplot(data.DF68, aes(x= reorder(Term, -padj_BY_elim), y= ((Significant/Annotated)*100), fill = padj_BY_elim)) +
              geom_bar(stat = "identity") +
              labs(title = glue("DF68 {PC} {z} significant GO terms"), x = "Biological Process", y = "% Significant/Annotated") +
              coord_flip()
  }
  
  plots.DF20[[PC]] <- pcDF20.plots
  plots.DF101[[PC]] <- pcDF101.plots
  plots.DF68[[PC]] <- pcDF68.plots
  
  p <- plots.DF20[[PC]][["positive"]] + plots.DF20[[PC]][["negative"]] + plots.DF101[[PC]][["positive"]] + plots.DF101[[PC]][["negative"]] + plots.DF68[[PC]][["positive"]] + plots.DF68[[PC]][["negative"]] + plot_layout(nrow = 3, ncol = 2)
  ggsave(plot = p, filename = glue("PDXuf_{PC}.png"), path = "GO_results/GO_plots/PDX/UF", width = 30, height = 20)

  
}
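
For readers unfamiliar with the quantities plotted above, the following base R sketch shows a classic over-representation test for a single GO term, the Significant/Annotated percentage, and a Benjamini-Yekutieli (BY) adjustment. All counts are made up for illustration. Note the assumptions: the `padj_BY_elim` column name suggests BY-adjusted p-values from topGO's elim algorithm, which additionally accounts for the GO hierarchy, so this sketch shows the simpler classic test rather than what `topGOtable` computes.

```r
# Toy counts (made up for illustration)
n.background  <- 20000  # genes in the background set
n.annotated   <- 200    # background genes annotated to the GO term ("Annotated")
n.selected    <- 500    # genes in the list of interest
n.significant <- 25     # listed genes annotated to the term ("Significant")

# Probability of seeing at least n.significant annotated genes by chance
p.hyper <- phyper(n.significant - 1, n.annotated, n.background - n.annotated,
                  n.selected, lower.tail = FALSE)

# The same test expressed as a one-sided Fisher's exact test on the 2x2 table
tab <- matrix(c(n.significant,
                n.selected - n.significant,
                n.annotated - n.significant,
                n.background - n.annotated - (n.selected - n.significant)),
              nrow = 2)
p.fisher <- fisher.test(tab, alternative = "greater")$p.value

# Quantity shown on the bar-plot axis
pct.significant <- 100 * n.significant / n.annotated

# BY adjustment across several terms (the other p-values are also made up)
p.adj <- p.adjust(c(p.hyper, 0.03, 0.2, 0.6), method = "BY")
```

Here 5 annotated genes would be expected by chance (500 x 200 / 20000), so observing 25 gives a very small p-value.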

PART 2: PC LOADINGS FROM PC1-PC10 OPTION ii (filter out insignificant loadings first)

We focus on the results from the first approach (unfiltered PC loadings), as in our SS2 data analysis.
Unlike our SS2 results, which show a consistent enrichment of mitochondrial genes, the PDX results show that the following GO terms are the most frequently associated with PCs:
- PC_7
  - Type I interferon signaling pathway (most associated with PC_7)
  - Defense to virus
- PC_6
  - Defense to virus
- Negative Regulation of Viral Genome Replication
  - Never associated with the same PC for all three PDX models at once, but appears frequently and is associated with 2 of the 3 PDX models for PCs such as PC_3, PC_7, PC_9, and PC_10
After narrowing down our search to the 3 most consistently significant GO terms across our PDX data, we repeat our DE analysis on these 3 gene sets, as we did above with the OXPHOS and UPR gene sets:
- Score and center the scores of each geneset
- DE Analysis:
  - VlnPlots plotting the distribution of gene-set score for each treatment status
  - PCA plots using the associated PCs as the axis colored by sample and by gene-set score
    - Even if VlnPlots show that the overall gene-set score is not significantly different across treatments, a few cells within a specific treatment condition may not follow the trend. We therefore use PCA plots to try to identify possible subpopulations of cells within a treatment condition that behave differently from the rest.
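
As a minimal sketch of the centering step, the toy example below (random numbers, not our data) shows what `scale(..., center = TRUE, scale = FALSE)` does to a vector of scores: the overall mean is subtracted, the spread is unchanged, and each treatment group's mean can then be read relative to zero. The data frame `toy` and its columns are made up for illustration.

```r
set.seed(1)  # toy data only
toy <- data.frame(
  treatment.status = rep(c("vehicle", "MRD", "relapse"), times = c(5, 4, 6)),
  score = rnorm(15)
)

# center = TRUE subtracts the overall mean; scale = FALSE leaves the spread unchanged
toy$score.centered <- as.numeric(scale(toy$score, center = TRUE, scale = FALSE))

# Overall mean is zero up to floating point error
overall.mean <- mean(toy$score.centered)

# Per-treatment means, now expressed relative to the overall mean
group.means <- tapply(toy$score.centered, toy$treatment.status, mean)
```

Because centering only shifts the scores, it does not affect rank-based comparisons between treatment groups; it only makes the plots easier to compare across models.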

STEP 7. DE ANALYSIS OF THE IDENTIFIED ENRICHED GO TERMS

SCORE CELLS FOR EACH OF THE 3 IDENTIFIED GO TERMS

#read in GOs of interest
GO_names <- c("DEFENSE_TO_VIRUS", "NEG_REG_OF_VIRAL_GENOME_REPLICATION", "TYPE_I_INTERFERON")
GO.list <- vector(mode = "list", length = length(GO_names))
names(GO.list) <- GO_names

GO.list[["DEFENSE_TO_VIRUS"]] <- read_lines("data/gene_lists/GO_PDX/DEFENSE_RESPONSE_TO_VIRUS.txt", skip =1)
GO.list[["NEG_REG_OF_VIRAL_GENOME_REPLICATION"]] <- read_lines("data/gene_lists/GO_PDX/NEGREG_VIRAL.txt", skip =1)
GO.list[["TYPE_I_INTERFERON"]] <- read_xlsx("data/gene_lists/GO_PDX/INTERFERON.xlsx")
GO.list[["TYPE_I_INTERFERON"]] <- GO.list[["TYPE_I_INTERFERON"]] %>% dplyr::select(Symbol) %>% deframe() %>% toupper()

#score cells with AddModuleScore
PDX_DF20 <- AddModuleScore(PDX_DF20, features = GO.list, name = names(GO.list), nbin = 50, search = T)
PDX_DF101 <- AddModuleScore(PDX_DF101, features = GO.list, name = names(GO.list), nbin = 50, search = T)
PDX_DF68 <- AddModuleScore(PDX_DF68, features = GO.list, name = names(GO.list), nbin = 50, search = T)

#center scores 
GO.centered <- c("DEFENSE_TO_VIRUS1", "NEG_REG_OF_VIRAL_GENOME_REPLICATION2", "TYPE_I_INTERFERON3")
for(i in GO.centered){
    DF20.hm.centered <- scale(PDX_DF20[[i]], center = TRUE, scale = FALSE)
    PDX_DF20 <- AddMetaData(PDX_DF20, DF20.hm.centered, col.name = glue("{i}.centered"))
    
    DF101.hm.centered <- scale(PDX_DF101[[i]], center = TRUE, scale = FALSE)
    PDX_DF101 <- AddMetaData(PDX_DF101, DF101.hm.centered, col.name = glue("{i}.centered"))
    
    DF68.hm.centered <- scale(PDX_DF68[[i]], center = TRUE, scale = FALSE)
    PDX_DF68 <- AddMetaData(PDX_DF68, DF68.hm.centered, col.name = glue("{i}.centered"))
}
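
The score column names hard-coded in `GO.centered` above follow a pattern: each gene-set name is followed by its position in the list. Assuming that pattern holds (it matches the names used throughout this page, but we have not checked it against other Seurat versions), the names could be derived rather than typed by hand, as in this small sketch. `score.cols` and `centered.cols` are hypothetical names.

```r
GO_names <- c("DEFENSE_TO_VIRUS", "NEG_REG_OF_VIRAL_GENOME_REPLICATION", "TYPE_I_INTERFERON")

# Assumed naming pattern: gene-set name followed by its index in the list
score.cols <- paste0(GO_names, seq_along(GO_names))

# Names of the centered columns added by the loop above
centered.cols <- paste0(score.cols, ".centered")

score.cols
# "DEFENSE_TO_VIRUS1" "NEG_REG_OF_VIRAL_GENOME_REPLICATION2" "TYPE_I_INTERFERON3"
```

Deriving the names this way avoids a mismatch if gene sets are later added to or reordered in `GO.list`.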

VIOLIN PLOTS OF THE GO TERMS MENTIONED ABOVE
- Plot swarm plots of each GO term we’re interested in

# rebuild the list so it holds the objects with the newly added GO scores
PDXs <- list(PDX_DF20, PDX_DF101, PDX_DF68)

#DEFENSE_TO_VIRUS1 ----------
def.swarm.plots <- vector("list", length = 3)
names(def.swarm.plots) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  
  my_comparisons <- list(
    c("MRD", "vehicle"),
    c("MRD", "relapse"), 
    c("vehicle", "relapse")
  )
  
  feature.found <- sum(GO.list[["DEFENSE_TO_VIRUS"]] %in% rownames(PDX))
  feature.length <- length(GO.list[["DEFENSE_TO_VIRUS"]])
  feature.pFound <- round((feature.found / feature.length)*100, 2)
  numCells <- nrow(PDX@meta.data)
  
  p <- ggplot(PDX@meta.data, aes(x= treatment.status , y = DEFENSE_TO_VIRUS1.centered, color = treatment.status)) +
  geom_quasirandom(groupOnX =TRUE) + 
  theme_bw() + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    labs(title = glue("{PDX.name} DEFENSE TO VIRUS expression across treatment"), x = PDX.name, subtitle = glue("{numCells} Malignant Cells, {feature.found} out of {feature.length} DEFENSE TO VIRUS genes found ({feature.pFound}%)")) + 
  geom_boxplot(width = 0.10, position = position_dodge(0.9), alpha = 1, show.legend = F, color = "black", outlier.alpha = 0) + 
    geom_text(label = paste(sum(PDX$treatment.status == "vehicle"), "cells"), x = "vehicle", y = min(PDX[["DEFENSE_TO_VIRUS1.centered"]]) -0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "MRD"), "cells"), x = "MRD", y = min(PDX[["DEFENSE_TO_VIRUS1.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "relapse"), "cells"), x = "relapse", y = min(PDX[["DEFENSE_TO_VIRUS1.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.06) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.06, bracket.size = 0, vjust = 1.8)
  
  def.swarm.plots[[PDX.name]] <- p 
}

p1 <- def.swarm.plots[["DF20"]] + def.swarm.plots[["DF101"]] + def.swarm.plots[["DF68"]] + plot_layout(guides = "collect", ncol = 3)
p1

ggsave(plot = p1, filename = glue("PDX_VIRUS_DEFENSE.png"), path = "GO_results/GO_Vln/PDX", width = 18, height = 8)


#NEG_REG_OF_VIRAL_GENOME_REPLICATION2 ----------
neg.swarm.plots <- vector("list", length = 3)
names(neg.swarm.plots) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  
  my_comparisons <- list(
    c("MRD", "vehicle"),
    c("MRD", "relapse"), 
    c("vehicle", "relapse")
  )
  
  feature.found <- sum(GO.list[["NEG_REG_OF_VIRAL_GENOME_REPLICATION"]] %in% rownames(PDX))
  feature.length <- length(GO.list[["NEG_REG_OF_VIRAL_GENOME_REPLICATION"]])
  feature.pFound <- round((feature.found / feature.length)*100, 2)
  numCells <- nrow(PDX@meta.data)
  
  p <- ggplot(PDX@meta.data, aes(x= treatment.status , y = NEG_REG_OF_VIRAL_GENOME_REPLICATION2.centered, color = treatment.status)) +
  geom_quasirandom(groupOnX =TRUE) + 
  theme_bw() + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    labs(title = glue("{PDX.name} NEG REG TO VIRAL GENOME REPLICATION expression across treatment"), x = PDX.name, subtitle = glue("{numCells} Malignant Cells, {feature.found} out of {feature.length} genes found ({feature.pFound}%)")) + 
  geom_boxplot(width = 0.10, position = position_dodge(0.9), alpha = 1, show.legend = F, color = "black", outlier.alpha = 0) + 
    geom_text(label = paste(sum(PDX$treatment.status == "vehicle"), "cells"), x = "vehicle", y = min(PDX[["NEG_REG_OF_VIRAL_GENOME_REPLICATION2.centered"]]) -0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "MRD"), "cells"), x = "MRD", y = min(PDX[["NEG_REG_OF_VIRAL_GENOME_REPLICATION2.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "relapse"), "cells"), x = "relapse", y = min(PDX[["NEG_REG_OF_VIRAL_GENOME_REPLICATION2.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.06) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.06, bracket.size = 0, vjust = 1.8)
  
  neg.swarm.plots[[PDX.name]] <- p 
}

p2 <- neg.swarm.plots[["DF20"]] + neg.swarm.plots[["DF101"]] + neg.swarm.plots[["DF68"]] + plot_layout(guides = "collect", ncol = 3)
p2

ggsave(plot = p2, filename = glue("PDX_NEG_REG.png"), path = "GO_results/GO_Vln/PDX", width = 18, height = 8)


#TYPE_I_INTERFERON3 ----------
int.swarm.plots <- vector("list", length = 3)
names(int.swarm.plots) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  
  my_comparisons <- list(
    c("MRD", "vehicle"),
    c("MRD", "relapse"), 
    c("vehicle", "relapse")
  )
  
  feature.found <- sum(GO.list[["TYPE_I_INTERFERON"]] %in% rownames(PDX))
  feature.length <- length(GO.list[["TYPE_I_INTERFERON"]])
  feature.pFound <- round((feature.found / feature.length)*100, 2)
  numCells <- nrow(PDX@meta.data)
  
  p <- ggplot(PDX@meta.data, aes(x= treatment.status , y = TYPE_I_INTERFERON3.centered, color = treatment.status)) +
  geom_quasirandom(groupOnX =TRUE) + 
  theme_bw() + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    labs(title = glue("{PDX.name} TYPE I INTERFERON SIGNALING expression across treatment"), x = PDX.name, subtitle = glue("{numCells} Malignant Cells, {feature.found} out of {feature.length} genes found ({feature.pFound}%)")) + 
  geom_boxplot(width = 0.10, position = position_dodge(0.9), alpha = 1, show.legend = F, color = "black", outlier.alpha = 0) + 
    geom_text(label = paste(sum(PDX$treatment.status == "vehicle"), "cells"), x = "vehicle", y = min(PDX[["TYPE_I_INTERFERON3.centered"]]) -0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "MRD"), "cells"), x = "MRD", y = min(PDX[["TYPE_I_INTERFERON3.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    geom_text(label = paste(sum(PDX$treatment.status == "relapse"), "cells"), x = "relapse", y = min(PDX[["TYPE_I_INTERFERON3.centered"]]) - 0.03, show.legend = FALSE, color = "black") + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.06) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.06, bracket.size = 0, vjust = 1.8)
  
  int.swarm.plots[[PDX.name]] <- p 
}

p3 <- int.swarm.plots[["DF20"]] + int.swarm.plots[["DF101"]] + int.swarm.plots[["DF68"]] + plot_layout(guides = "collect", ncol = 3)
p3

ggsave(plot = p3, filename = glue("PDX_INTERFERON.png"), path = "GO_results/GO_Vln/PDX", width = 18, height = 8)

CONCLUSIONS
- The statistically significant results (p < 0.05) are:
  - DEFENSE RESPONSE TO VIRUS
    - DF101: MRD > Vehicle (supports our hypothesis)
  - NEGATIVE REGULATION OF VIRAL GENOME REPLICATION
    - DF20: MRD and Relapse > Vehicle (supports our hypothesis)
    - DF101: MRD > Vehicle (supports our hypothesis)
  - TYPE I INTERFERON SIGNALING PATHWAY
    - DF20: MRD > Vehicle (supports our hypothesis)
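
The pairwise comparisons behind these p-values can be sketched in base R on toy data as shown below. This is an illustration under assumptions, not a re-analysis: the scores are simulated, and the Holm adjustment is one possible correction that we add here for comparison. To our knowledge, the p-values drawn on the swarm plots are computed separately for each pair without adjustment.

```r
set.seed(2)  # toy data only
toy <- data.frame(
  treatment.status = factor(rep(c("vehicle", "MRD", "relapse"), each = 20),
                            levels = c("vehicle", "MRD", "relapse")),
  score = c(rnorm(20, mean = 0), rnorm(20, mean = 1), rnorm(20, mean = 0.2))
)

my_comparisons <- list(
  c("MRD", "vehicle"),
  c("MRD", "relapse"),
  c("vehicle", "relapse")
)

# Two-sided Wilcoxon rank-sum test for each pair of treatment groups
p.raw <- sapply(my_comparisons, function(pair) {
  x <- toy$score[toy$treatment.status == pair[1]]
  y <- toy$score[toy$treatment.status == pair[2]]
  wilcox.test(x, y)$p.value
})
names(p.raw) <- sapply(my_comparisons, paste, collapse = " vs ")

# Holm adjustment across the three comparisons (an assumption, not in the original)
p.holm <- p.adjust(p.raw, method = "holm")
```

With three comparisons per gene set and three gene sets per model, a result just under 0.05 may not survive adjustment, which is worth keeping in mind when reading the list above.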

While all of our statistically significant results support our hypothesis, they are not consistent across all of our PDX data. We investigate this further with PCA plots to detect possible subpopulations of cells within a specific treatment condition that are enriched for these GO terms. The axes of each PCA plot depend on the PCs that the GO term is associated with:

- PC_7
  - Type I interferon signaling pathway (most associated with PC_7)
  - Defense to virus
- PC_6
  - Defense to virus
- Negative Regulation of Viral Genome Replication
  - Never associated with the same PC for all three PDX models at once, but appears frequently and is associated with 2 of the 3 PDX models for PCs such as PC_3, PC_7, PC_9, and PC_10

PCs <- vector("list", length=3)
names(PCs) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  
  PCs[[PDX.name]] <- as.data.frame(PDX@reductions$pca@cell.embeddings)
  PCs[[PDX.name]] <- PCs[[PDX.name]] %>% 
    dplyr::mutate("DEFENSE_TO_VIRUS1.centered" = deframe(PDX[["DEFENSE_TO_VIRUS1.centered"]])) %>%
    dplyr::mutate("NEG_REG_OF_VIRAL_GENOME_REPLICATION2.centered" = deframe(PDX[["NEG_REG_OF_VIRAL_GENOME_REPLICATION2.centered"]])) %>% 
    dplyr::mutate("TYPE_I_INTERFERON3.centered" = deframe(PDX[["TYPE_I_INTERFERON3.centered"]])) %>% 
    dplyr::mutate("treatment.status" = deframe(PDX[["treatment.status"]]))
  
}

# DEFENSE_TO_VIRUS1.centered -------------------
GO.terms <- c("DEFENSE_TO_VIRUS1.centered", "NEG_REG_OF_VIRAL_GENOME_REPLICATION2.centered", "TYPE_I_INTERFERON3.centered")
GO.names <- c("DEFENSE TO VIRUS", "NEG REG VIRAL GENOME REPLICATION", "TYPE I INTERFERON SIGNALING")

def.pca.plots <- vector("list", length = 3)
names(def.pca.plots) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  plots <- vector("list", length =2)
  
  plots[[1]] <- ggplot(PCs[[PDX.name]], aes(x = PC_6, y = PC_7, colour = treatment.status)) + geom_point(alpha = 0.7) + labs(title = glue("{PDX.name} by Treatment (PC6 vs PC7)")) + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    theme(plot.title = element_text(size = 10))
  
  plots[[2]] <- ggplot(PCs[[PDX.name]], aes(x = PC_6, y = PC_7, colour = DEFENSE_TO_VIRUS1.centered)) + geom_point() + labs(title = glue("{PDX.name} DEFENSE AGAINST VIRUS score"), colour = "DEFENSE AGAINST VIRUS") + 
    theme(plot.title = element_text(size = 10))
  
  def.pca.plots[[PDX.name]] <- plots
}
  
p1 <- def.pca.plots[["DF20"]][[1]] + def.pca.plots[["DF20"]][[2]] + def.pca.plots[["DF101"]][[1]] + def.pca.plots[["DF101"]][[2]] + def.pca.plots[["DF68"]][[1]] + def.pca.plots[["DF68"]][[2]] + plot_layout(ncol = 2, nrow = 3)
p1

ggsave(plot = p1, filename = glue("PDX_VIRUS_DEFENSE.png"), path = "GO_results/GO_PCA/PDX", width = 10, height = 12)

# TYPE_I_INTERFERON3.centered -------------------
int.pca.plots <- vector("list", length = 3)
names(int.pca.plots) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  plots <- vector("list", length =2)
  
  plots[[1]] <- ggplot(PCs[[PDX.name]], aes(x = PC_6, y = PC_7, colour = treatment.status)) + geom_point(alpha = 0.7) + labs(title = glue("{PDX.name} by Treatment (PC6 vs PC7)")) + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    theme(plot.title = element_text(size = 10))
  
  plots[[2]] <- ggplot(PCs[[PDX.name]], aes(x = PC_6, y = PC_7, colour = TYPE_I_INTERFERON3.centered)) + geom_point() + labs(title = glue("{PDX.name} TYPE I INTERFERON SIGNALING score"), colour = "TYPE I INTERFERON SIGNALING") + 
    theme(plot.title = element_text(size = 10))
  
  int.pca.plots[[PDX.name]] <- plots
}
  
p2 <- int.pca.plots[["DF20"]][[1]] + int.pca.plots[["DF20"]][[2]] + int.pca.plots[["DF101"]][[1]] + int.pca.plots[["DF101"]][[2]] + int.pca.plots[["DF68"]][[1]] + int.pca.plots[["DF68"]][[2]] + plot_layout(ncol = 2, nrow = 3)
p2

ggsave(plot = p2, filename = glue("PDX_INTERFERON.png"), path = "GO_results/GO_PCA/PDX", width = 10, height = 12)

# NEG_REG_OF_VIRAL_GENOME_REPLICATION2.centered (PC3 v PC7)-------------------
neg.pca.plots <- vector("list", length = 3)
names(neg.pca.plots) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  plots <- vector("list", length =2)
  
  plots[[1]] <- ggplot(PCs[[PDX.name]], aes(x = PC_3, y = PC_7, colour = treatment.status)) + geom_point(alpha = 0.7) + labs(title = glue("{PDX.name} by Treatment (PC3 vs PC7)")) + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    theme(plot.title = element_text(size = 10))
  
  plots[[2]] <- ggplot(PCs[[PDX.name]], aes(x = PC_3, y = PC_7, colour = NEG_REG_OF_VIRAL_GENOME_REPLICATION2.centered)) + geom_point() + labs(title = glue("{PDX.name} NEG REG VIRAL GENOME REPLICATION score"), colour = "NEG REG VIRAL GENOME REP.") + 
    theme(plot.title = element_text(size = 10))
  
  neg.pca.plots[[PDX.name]] <- plots
}
  
p3 <- neg.pca.plots[["DF20"]][[1]] + neg.pca.plots[["DF20"]][[2]] + neg.pca.plots[["DF101"]][[1]] + neg.pca.plots[["DF101"]][[2]] + neg.pca.plots[["DF68"]][[1]] + neg.pca.plots[["DF68"]][[2]] + plot_layout(ncol = 2, nrow = 3)
p3

# save the PC3 vs PC7 plot (p3); the original call saved p2 (interferon) under this name
ggsave(plot = p3, filename = glue("PDX_NEG_REG1.png"), path = "GO_results/GO_PCA/PDX", width = 10, height = 12)


# NEG_REG_OF_VIRAL_GENOME_REPLICATION2.centered (PC9 v PC10) -------------------
neg.pca.plots2 <- vector("list", length = 3)
names(neg.pca.plots2) <- PDX.names

for(i in 1:3){
  PDX <- PDXs[[i]]
  PDX.name <- PDX.names[[i]]
  plots <- vector("list", length =2)
  
  plots[[1]] <- ggplot(PCs[[PDX.name]], aes(x = PC_9, y = PC_10, colour = treatment.status)) + geom_point(alpha = 0.7) + labs(title = glue("{PDX.name} by Treatment (PC9 vs PC10)")) + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) + 
    theme(plot.title = element_text(size = 10))
  
  plots[[2]] <- ggplot(PCs[[PDX.name]], aes(x = PC_9, y = PC_10, colour = NEG_REG_OF_VIRAL_GENOME_REPLICATION2.centered)) + geom_point() + labs(title = glue("{PDX.name} NEG REG VIRAL GENOME REPLICATION score"), colour = "NEG REG VIRAL GENOME REP.") + 
    theme(plot.title = element_text(size = 10))
  
  neg.pca.plots2[[PDX.name]] <- plots
}
  
p4 <- neg.pca.plots2[["DF20"]][[1]] + neg.pca.plots2[["DF20"]][[2]] + neg.pca.plots2[["DF101"]][[1]] + neg.pca.plots2[["DF101"]][[2]] + neg.pca.plots2[["DF68"]][[1]] + neg.pca.plots2[["DF68"]][[2]] + plot_layout(ncol = 2, nrow = 3)
p4

ggsave(plot = p4, filename = glue("PDX_NEG_REG2.png"), path = "GO_results/GO_PCA/PDX", width = 10, height = 12)

In these PCA plots, there does not seem to be a clear separation of cells for any of the GO terms. It therefore does not seem that particular subpopulations of cells within a treatment condition are enriched for our GO terms of interest.

CONCLUSION
- Considering all of the results from our DE analysis of the gene sets of interest, we obtained some statistically significant results that support our hypothesis (MRD enriches OXPHOS, UPR, and other gene sets of interest), but these results are not consistent across all comparisons, including our SS2 data. We therefore cannot conclude that our results support our hypothesis, most likely because our analysis has low statistical power given how few cells we have to analyze.

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] topGO_2.40.0          SparseM_1.78          GO.db_3.11.4          graph_1.66.0          org.Hs.eg.db_3.11.4  
 [6] AnnotationDbi_1.50.1  IRanges_2.22.2        S4Vectors_0.26.1      pcaExplorer_2.14.2    Biobase_2.48.0       
[11] BiocGenerics_0.34.0   ggbeeswarm_0.6.0      ggpubr_0.4.0          GGally_2.0.0          gt_0.2.1             
[16] reshape2_1.4.4        tidyselect_1.1.0      fgsea_1.14.0          presto_1.0.0          data.table_1.12.8    
[21] Rcpp_1.0.5            glue_1.4.1            patchwork_1.0.1       EnhancedVolcano_1.6.0 ggrepel_0.8.2        
[26] here_0.1              readxl_1.3.1          forcats_0.5.0         stringr_1.4.0         dplyr_1.0.0          
[31] purrr_0.3.4           readr_1.3.1           tidyr_1.1.0           tibble_3.0.3          ggplot2_3.3.2        
[36] tidyverse_1.3.0       cowplot_1.0.0         Seurat_3.1.5          BiocManager_1.30.10   renv_0.11.0-4        

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.1              AnnotationForge_1.30.1      pkgmaker_0.31.1             bit64_0.9-7.1              
  [5] knitr_1.29                  irlba_2.3.3                 DelayedArray_0.14.1         RCurl_1.98-1.2             
  [9] doParallel_1.0.15           generics_0.0.2              RSQLite_2.2.0               RANN_2.6.1                 
 [13] future_1.18.0               bit_1.1-15.2                webshot_0.5.2               xml2_1.3.2                 
 [17] lubridate_1.7.9             httpuv_1.5.4                SummarizedExperiment_1.18.2 assertthat_0.2.1           
 [21] viridis_0.5.1               xfun_0.15                   hms_0.5.3                   TSP_1.1-10                 
 [25] evaluate_0.14               promises_1.1.1              fansi_0.4.1                 progress_1.2.2             
 [29] caTools_1.18.0              dendextend_1.13.4           dbplyr_1.4.4                Rgraphviz_2.32.0           
 [33] igraph_1.2.5                DBI_1.1.0                   geneplotter_1.66.0          htmlwidgets_1.5.1          
 [37] reshape_0.8.8               ellipsis_0.3.1              crosstalk_1.1.0.1           backports_1.1.8            
 [41] annotate_1.66.0             gridBase_0.4-7              biomaRt_2.44.1              vctrs_0.3.2                
 [45] ROCR_1.0-11                 abind_1.4-5                 withr_2.2.0                 sctransform_0.2.1          
 [49] gclus_1.3.2                 prettyunits_1.1.1           cluster_2.1.0               ape_5.4                    
 [53] lazyeval_0.2.2              crayon_1.3.4                genefilter_1.70.0           pkgconfig_2.0.3            
 [57] labeling_0.3                GenomeInfoDb_1.24.2         seriation_1.2-8             nlme_3.1-148               
 [61] vipor_0.4.5                 rlang_0.4.7                 globals_0.12.5              lifecycle_0.2.0            
 [65] registry_0.5-1              BiocFileCache_1.12.1        GOstats_2.54.0              modelr_0.1.8               
 [69] rsvd_1.0.3                  cellranger_1.1.0            rprojroot_1.3-2             matrixStats_0.56.0         
 [73] lmtest_0.9-37               rngtools_1.5                Matrix_1.2-18               carData_3.0-4              
 [77] zoo_1.8-8                   reprex_0.3.0                base64enc_0.1-3             beeswarm_0.2.3             
 [81] pheatmap_1.0.12             whisker_0.4                 ggridges_0.5.2              png_0.1-7                  
 [85] viridisLite_0.3.0           bitops_1.0-6                shinydashboard_0.7.1        KernSmooth_2.23-17         
 [89] blob_1.2.1                  workflowr_1.6.2             shinyAce_0.4.1              rstatix_0.6.0              
 [93] ggsignif_0.6.0              scales_1.1.1                GSEABase_1.50.1             memoise_1.1.0              
 [97] magrittr_1.5                plyr_1.8.6                  ica_1.0-2                   gplots_3.0.4               
[101] gdata_2.18.0                bibtex_0.4.2.2              zlibbioc_1.34.0             threejs_0.3.3              
[105] compiler_4.0.2              RColorBrewer_1.1-2          DESeq2_1.28.1               fitdistrplus_1.1-1         
[109] cli_2.0.2                   XVector_0.28.0              Category_2.54.0             listenv_0.8.0              
[113] pbapply_1.4-2               MASS_7.3-51.6               stringi_1.4.6               shinyBS_0.61               
[117] yaml_2.2.1                  askpass_1.1                 locfit_1.5-9.4              grid_4.0.2                 
[121] fastmatch_1.1-0             tools_4.0.2                 future.apply_1.6.0          rio_0.5.16                 
[125] rstudioapi_0.11             foreach_1.5.0               foreign_0.8-80              git2r_0.27.1               
[129] gridExtra_2.3               farver_2.0.3                Rtsne_0.15                  digest_0.6.25              
[133] shiny_1.5.0                 GenomicRanges_1.40.0        car_3.0-8                   broom_0.7.0                
[137] later_1.1.0.1               RcppAnnoy_0.0.16            httr_1.4.1                  colorspace_1.4-1           
[141] rvest_0.3.5                 XML_3.99-0.4                fs_1.4.2                    reticulate_1.16            
[145] splines_4.0.2               RBGL_1.64.0                 uwot_0.1.8                  plotly_4.9.2.1             
[149] xtable_1.8-4                jsonlite_1.7.0              heatmaply_1.1.1             R6_2.4.1                   
[153] pillar_1.4.6                htmltools_0.5.0             mime_0.9                    NMF_0.23.0                 
[157] fastmap_1.0.1               DT_0.14                     BiocParallel_1.22.0         codetools_0.2-16           
[161] tsne_0.1-3                  lattice_0.20-41             curl_4.3                    leiden_0.3.3               
[165] gtools_3.8.2                zip_2.0.4                   openxlsx_4.1.5              openssl_1.4.2              
[169] survival_3.2-3              limma_3.44.3                rmarkdown_2.3               munsell_0.5.0              
[173] GenomeInfoDbData_1.2.3      iterators_1.0.12            haven_2.3.1                 gtable_0.3.0

Izar 2020 PDX (Cohort 3) DE Analysis

Jesslyn Goh and Mike Cuoco

7/20/2020