Last updated: 2020-08-04

Checks: 7 0

Knit directory: jesslyn_ovca/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200713)

The command set.seed(20200713) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 6d23590

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 6d23590. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    code/.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    data/HTAPP/
    Ignored:    data/Izar_2020/
    Ignored:    data/gene_lists/.DS_Store
    Ignored:    data/gene_lists/extra/.DS_Store
    Ignored:    jesslyn_plots/
    Ignored:    mike_plots/
    Ignored:    old/.DS_Store
    Ignored:    old/edited/.DS_Store
    Ignored:    renv/.DS_Store
    Ignored:    renv/library/
    Ignored:    renv/python/
    Ignored:    renv/staging/
    Ignored:    vignettes/

Unstaged changes:
    Modified:   old/edited/02_Izar2020_SS2_Load_Plots.Rmd
    Modified:   old/edited/PDX_choices.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/02.1_Izar2020_SS2_DEAnalysis.Rmd) and HTML (docs/02.1_Izar2020_SS2_DEAnalysis.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
html	26f64a3	jgoh2	2020-08-03	Build site.
Rmd	eaa900e	jgoh2	2020-08-03	workflowr::wflow_publish(files = files)
html	2dc1bee	jgoh2	2020-07-31	Build site.
Rmd	284aad4	jgoh2	2020-07-31	workflowr::wflow_publish(files = files)
Rmd	c8bb9fc	jgoh2	2020-07-30	PDX Exploratory + DE + Cell Cycle Analyses
html	6be6c85	jgoh2	2020-07-28	Build site.
Rmd	cdd10f9	jgoh2	2020-07-28	SS2 DE Analysis
html	cdd10f9	jgoh2	2020-07-28	SS2 DE Analysis
Rmd	35c7947	jgoh2	2020-07-27	SS2 Analysis Part 1 and 2
Rmd	e27cfd1	jgoh2	2020-07-22	Moved files out of the analysis folder + AddModulescore in read_Izar_2020.R
Rmd	527247a	jgoh2	2020-07-20	Reorganize SS2 code and add to the analysis folder

IZAR 2020 SS2 (COHORT 2) DATA DIFFERANTIAL EXPRESISON ANALYSIS

OVERVIEW

The SS2 data from Izar2020 consists of cells from 14 ascites samples from 6 individuals:
- selected for malignant EPCAM+CD24+ cells by fluorophore-activated cell sorting into 96-well plates
- Includes both malignant and nonmalignant populations but differs from 10X data - examines malignant ascites more closely.
- We are only interested in the malignant ascites population, specifically samples from patient 8 and 9 because they are the only patients that have samples from different treatment statuses.
In our 5-part analysis of the Izar 2020 SS2 data, we are interested in identifying differentially expressed genes and hallmark genesets between treatment statuses within patients 8 and 9
We split our SS2 analysis into three parts:
1. Load Data and Create SS2 Seurat Object
  1. The code to this part of our analysis is in the read_Izar_2020.R file in the code folder. During this part of our analysis we:
    1. Load in SS2 count matrix and Create Seurat Object
    2. Assign Metadata identities including:
      - Patient ID
      - Time
      - Sample ID
      - Treatment Status
    3. Score cells for cell cycle and hallmark genesets - Note: It does not matter whether we call AddModuleScore before or after subsetting and scaling each model because AddModuleScore uses the data slot.
    4. Save Seurat Object
2. Process Data and Exploratory Data Analysis
  1. The code to this part of our analysis can be found in the 02_Izar2020_SS2_Load_Plots file in the old/edited folder. During this part of our analysis we:
    1. Load in SS2 Seurat Object from Part 1 and subset by patients 8 and 9. Continue analysis separately for each model.
    2. Scale and FindVariableFeatures (prepares data for dimensionality reduction)
    3. Dimensionality Reduction (PCA + UMAP)
    4. Save Seurat Objects
3. Exploratory Data Analysis
  1. The code to this part of our analysis can be found in the 02.0_Izar2020_SS2_Exploratory Analysis file in the analysis folder. During this part of our analysis we:
    1. Load in SS2 Seurat Object from Part 2. Analyze separately for patients 8 and 9
    2. Compute summary metrics for SS2 data such as:
      - Number of cells per patient per treatment
      - Number of cells per treatment per cell cycle phase
    3. Visualize how cells separate based on metadata identities via UMAP and PCA - Intermodel heterogeneity: How do cells separate by sample? - Intramodel heterogeneity:
      - How do cells separate by treament status / sample?
      - How do cells separate by cell cycle phase?
4. DE Analysis
  1. TYPE #1 DE ANALYSIS: Visualizing and Quantifying Differentially Expression on Predefined GO Genesets
    1. Violin Plots and UMAP
    2. Gene Set Enrichment Analysis (GSEA)
  2. TYPE #2 DE ANALYSIS: Finding DE Genes from scratch
    1. Volcano Plots
5. CELL CYCLE ANALYSIS
  1. TYPE #1 CELL CYCLE ANALYSIS: Examine correlation between treatment condition and cell cycle phase
  2. Evaluate the idea that cell cycle might influence expression of signatures

This is the third part of our 5-part analysis of the Izar 2020 SS2 (Cohort 2) data.

STEP 1 LOAD SEURAT OBJECT

# Load packages
source(here::here('packages.R'))

#Read in SS2 RDS object
SS2Malignant = readRDS(file = "data/Izar_2020/jesslyn_SS2Malignant_processed.RDS")
SS2Malignant.8 = readRDS(file = "data/Izar_2020/jesslyn_SS2Malignant8_processed.RDS")
SS2Malignant.9 = readRDS(file = "data/Izar_2020/jesslyn_SS2Malignant9_processed.RDS")

#Read in hallmarks of interest
hallmark_names = read_lines("data/gene_lists/hallmarks.txt")
hallmark.list <- vector(mode = "list", length = length(hallmark_names) + 4)
names(hallmark.list) <- c(hallmark_names, "GO.OXPHOS", "KEGG.OXPHOS", "UNUPDATED.OXPHOS", "UNUPDATED.UPR")
for(hm in hallmark_names){
  file <- read_lines(glue("data/gene_lists/hallmarks/{hm}_updated.txt"), skip = 1)
  hallmark.list[[hm]] <- file
}

hallmark.list[["GO.OXPHOS"]] <- read_lines("data/gene_lists/extra/GO.OXPHOS.txt", skip = 1)
hallmark.list[["KEGG.OXPHOS"]] <- read_lines("data/gene_lists/extra/KEGG.OXPHOS.txt", skip = 2)
hallmark.list[["UNUPDATED.OXPHOS"]] <- read_lines("data/gene_lists/oxphos.txt", skip =1)
hallmark.list[["UNUPDATED.UPR"]] <- read_lines("data/gene_lists/upr.txt", skip =1)

#center module and cell cycle scores and reassign to the metadata of each Seurat object
hm.names <- names(SS2Malignant@meta.data)[14:53]

for(i in hm.names){
    SS2Malignant.hm.centered <- scale(SS2Malignant[[i]], center = TRUE, scale = FALSE)
    SS2Malignant <- AddMetaData(SS2Malignant, SS2Malignant.hm.centered, col.name = glue("{i}.centered"))
    
    SS2Malignant8.hm.centered <- scale(SS2Malignant.8[[i]], center = TRUE, scale = FALSE)
    SS2Malignant.8 <- AddMetaData(SS2Malignant.8, SS2Malignant8.hm.centered, col.name = glue("{i}.centered"))
    
    SS2Malignant9.hm.centered <- scale(SS2Malignant.9[[i]], center = TRUE, scale = FALSE)
    SS2Malignant.9 <- AddMetaData(SS2Malignant.9, SS2Malignant9.hm.centered, col.name = glue("{i}.centered"))
}

STEP 2 DETERMINING OXPHOS GENESET

Before doing any DE Analysis on the genesets, we investigate which of the three OXPHOS genesets we found is best for our data. Although we’ve investigated this for our PDX data, the geneset that is more suitable for SS2 may be different. The OXPHOS genesets that we test are listed as follows:
- Unupdated Version of HALLMARK_OXIDATIVE_PHOSPHORYLATION (200 genes) https://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_OXIDATIVE_PHOSPHORYLATION
- Updated Version of HALLMARK_OXIDATIVE_PHOSPHORYLATION (200 genes) https://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_OXIDATIVE_PHOSPHORYLATION
- GO_OXIDATIVE_PHOSPHORYLATION (144 genes) https://www.gsea-msigdb.org/gsea/msigdb/cards/GO_OXIDATIVE_PHOSPHORYLATION
- KEGG_OXIDATIVE_PHOSPHORYLATION (131 genes) https://www.gsea-msigdb.org/gsea/msigdb/cards/KEGG_OXIDATIVE_PHOSPHORYLATION
We determine which geneset to use by asking these questions:
1. How many and which genes are not found in the SS2 Seurat Object for each geneset?
2. What makes the genesets different from each other? Which genes in each geneset are the most DE or have the higest logFC? Are they the same?
  1. Volcano Plot: Call FindMarkers on each model, plot onto Volcano Plot, label cells by each geneset
3. Which geneset gives us the most statistically significant results?
  1. VlnPlot: Plot hallmark scores for each geneset group by treatment status for each model
  2. Label the VlnPlot with the number of genes found in SS2 Seurat Object for each OXPHOS geneset

ANSWERING QUESTION #1: How many and which genes are not found in the SS2 Seurat Object for each geneset?

hm.length.df <- data.frame(
  "UNUPDATED.HM.OXPHOS" = length(hallmark.list[["UNUPDATED.OXPHOS"]]),
  "HALLMARK.OXPHOS" = length(hallmark.list[["HALLMARK_OXIDATIVE_PHOSPHORYLATION"]]), 
  "GO.OXPHOS" = length(hallmark.list[["GO.OXPHOS"]]), 
  "KEGG.OXPHOS" = length(hallmark.list[["KEGG.OXPHOS"]])
)

Found.df <- data.frame(
  "UNUPDATED.HM.OXPHOS" = sum((hallmark.list[["UNUPDATED.OXPHOS"]] %in% rownames(SS2Malignant))),
  "HALLMARK.OXPHOS" = sum((hallmark.list[["HALLMARK_OXIDATIVE_PHOSPHORYLATION"]] %in% rownames(SS2Malignant))), 
  "GO.OXPHOS" = sum((hallmark.list[["GO.OXPHOS"]] %in% rownames(SS2Malignant))), 
  "KEGG.OXPHOS" = sum((hallmark.list[["KEGG.OXPHOS"]] %in% rownames(SS2Malignant)))
)

PFound.df <- data.frame(
  "UNUPDATED.HM.OXPHOS" = (sum((hallmark.list[["UNUPDATED.OXPHOS"]] %in% rownames(SS2Malignant)))/length(hallmark.list[["UNUPDATED.OXPHOS"]]))*100,
  "HALLMARK.OXPHOS" = (sum((hallmark.list[["HALLMARK_OXIDATIVE_PHOSPHORYLATION"]] %in% rownames(SS2Malignant)))/length(hallmark.list[["HALLMARK_OXIDATIVE_PHOSPHORYLATION"]])) * 100, 
  "GO.OXPHOS" = (sum((hallmark.list[["GO.OXPHOS"]] %in% rownames(SS2Malignant)))/length(hallmark.list[["GO.OXPHOS"]]))*100, 
  "KEGG.OXPHOS" = (sum((hallmark.list[["KEGG.OXPHOS"]] %in% rownames(SS2Malignant)))/length(hallmark.list[["KEGG.OXPHOS"]]))*100
)

NA.df <- data.frame(
  "UNUPDATED.HM.OXPHOS" = sum(!(hallmark.list[["UNUPDATED.OXPHOS"]] %in% rownames(SS2Malignant))),
  "HALLMARK.OXPHOS" = sum(!(hallmark.list[["HALLMARK_OXIDATIVE_PHOSPHORYLATION"]] %in% rownames(SS2Malignant))), 
  "GO.OXPHOS" = sum(!(hallmark.list[["GO.OXPHOS"]] %in% rownames(SS2Malignant))), 
  "KEGG.OXPHOS" = sum(!(hallmark.list[["KEGG.OXPHOS"]] %in% rownames(SS2Malignant)))
)

all.df <- rbind(hm.length.df, Found.df, PFound.df, NA.df)
rownames(all.df) <- c("NumGenes", "Found", "%Found", "Not Found")
all.df[,"GO.OXPHOS"] <- round(all.df[,"GO.OXPHOS"])
all.df[,"KEGG.OXPHOS"] <- round(all.df[,"KEGG.OXPHOS"])
all.df

          UNUPDATED.HM.OXPHOS HALLMARK.OXPHOS GO.OXPHOS KEGG.OXPHOS
NumGenes                  200             200       144         131
Found                     184             182       107         101
%Found                     92              91        74          77
Not Found                  16              18        37          30

# IDENTIFY GENES THAT ARE NOT FOUND -----------
NA.genes.df <- vector("list", length = 4)
names <- c("UNUPDATED.OXPHOS", "HALLMARK_OXIDATIVE_PHOSPHORYLATION", "GO.OXPHOS", "KEGG.OXPHOS")
names(NA.genes.df) <- names
for(i in names){
  NA.genes.df[[i]] <- (hallmark.list[[i]])[which(!(hallmark.list[[i]] %in% rownames(SS2Malignant)))]
}

OBSERVATIONS
- Similar to what we observed in the PDX data, the UNUPDATED.OXPHOS genelist actually has the most genes that are found in the SS2 object.
- Since the UNUPDATED.OXPHOS genelist works better than the updated version (HALLMARK_OXIDATIVE_PHOSPHORYLATION), we will continue our comparison of genesets using the UNUPDATED version.
We now wonder:
- Which OXPHOS genes are the most DE (or have the highest logFC) if we call FindMarkers?
- Which geneset(s) do the most DE OXPHOS genes belong to?

ANSWERING QUESTION 2: Which genes in each geneset are the most DE? Are they the same?

Used the wilcoxon rank sum test for FindMarkers

SS2s <- c(SS2Malignant.8, SS2Malignant.9)
SS2.names <- c("SS2 Patient 8", "SS2 Patient 9")
oxphos.hm <- c("UNUPDATED.OXPHOS", "GO.OXPHOS", "KEGG.OXPHOS")
SS2.hm.plots <- vector("list", length = 3)
names(SS2.hm.plots) <- SS2.names

SS2.markers <- vector("list", length = 3)
names(SS2.markers) <- SS2.names

SS2.markers[["SS2 Patient 8"]] <- FindMarkers(SS2Malignant.8, group.by = "sample", ident.1 = "8.1", ident.2 = "8.0", test.use = "wilcox", logfc.threshold = 0) 
SS2.markers[["SS2 Patient 9"]]  <- FindMarkers(SS2Malignant.9, group.by = "sample", ident.1 = "9.1", ident.2 = "9.0", test.use = "wilcox", logfc.threshold = 0)

for(i in 1:length(SS2s)){
    SS2 <- SS2.names[[i]]
    DF.hm.plot <- vector("list", length = 3)
    names(DF.hm.plot) <- oxphos.hm
    marker <- SS2.markers[[SS2]]
    
  for(oxphos in oxphos.hm){
    
    avgLFC <- rownames(marker[which(abs(marker$avg_logFC) > 0.5),])
    keyvals <- ifelse(!(rownames(marker) %in% hallmark.list[[oxphos]]), 'black', 
                      ifelse(abs(marker$avg_logFC) > 0.5, 'red', 'grey'))
    names(keyvals)[keyvals == 'red'] <- 'OXPHOS & logFC'
    names(keyvals)[keyvals == 'grey'] <- 'OXPHOS x logFC'
    names(keyvals)[keyvals == 'black'] <- 'Not OXPHOS'
    found = length(avgLFC[which(avgLFC %in% hallmark.list[[oxphos]])])
    
    p <- EnhancedVolcano(marker, 
           lab = rownames(marker),
           selectLab = avgLFC[which(avgLFC %in% hallmark.list[[oxphos]])],
           labCol = "red",
           x='avg_logFC', y='p_val_adj', pCutoff = 0.05, 
           FCcutoff = 0.5, 
           colCustom = keyvals,
           pointSize = c(ifelse(rownames(marker) %in% hallmark.list[[oxphos]], 2.5,1)), 
           drawConnectors = TRUE,
           boxedLabels = TRUE,
           labvjust = 1,
           title= glue("{SS2} p.1 vs. p.0"), subtitle= "LogFC cutoff: 0.5, p cutoff: 0.05",
           caption = glue("{oxphos}: {found} high LFC oxphos genes found")
           )
    DF.hm.plot[[oxphos]] <- p
  }
  
  SS2.hm.plots[[SS2]] <- DF.hm.plot[["UNUPDATED.OXPHOS"]] + DF.hm.plot[["GO.OXPHOS"]] + DF.hm.plot[["KEGG.OXPHOS"]]
}

SS2.hm.plots[["SS2 Patient 8"]]

Version	Author	Date
26f64a3	jgoh2	2020-08-03
2dc1bee	jgoh2	2020-07-31

SS2.hm.plots[["SS2 Patient 9"]]

Version	Author	Date
26f64a3	jgoh2	2020-08-03
2dc1bee	jgoh2	2020-07-31

# number of DE oxphos genes found in each geneset within each model -----------------
P8.de.df <- data.frame(
  "UNUPDATED.HM.OXPHOS" = sum(rownames(SS2.markers[["SS2 Patient 8"]][which(abs(SS2.markers[["SS2 Patient 8"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["UNUPDATED.OXPHOS"]]),
  "GO.OXPHOS" = sum(rownames(SS2.markers[["SS2 Patient 8"]][which(abs(SS2.markers[["SS2 Patient 8"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["GO.OXPHOS"]]),
  "KEGG.OXPHOS" = sum(rownames(SS2.markers[["SS2 Patient 8"]][which(abs(SS2.markers[["SS2 Patient 8"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["KEGG.OXPHOS"]])
)

P9.de.df <- data.frame(
  "UNUPDATED.HM.OXPHOS" = sum(rownames(SS2.markers[["SS2 Patient 9"]][which(abs(SS2.markers[["SS2 Patient 9"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["UNUPDATED.OXPHOS"]]),
  "GO.OXPHOS" = sum(rownames(SS2.markers[["SS2 Patient 9"]][which(abs(SS2.markers[["SS2 Patient 9"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["GO.OXPHOS"]]),
  "KEGG.OXPHOS" = sum(rownames(SS2.markers[["SS2 Patient 9"]][which(abs(SS2.markers[["SS2 Patient 9"]]$avg_logFC) > 0.5),]) %in% hallmark.list[["KEGG.OXPHOS"]])
)

all.de.df <- rbind(P8.de.df, P9.de.df)
rownames(all.de.df) <- c("P8.1vs8.0", "P9.1vs9.0")
all.de.df

          UNUPDATED.HM.OXPHOS GO.OXPHOS KEGG.OXPHOS
P8.1vs8.0                  26        23          16
P9.1vs9.0                   5         3           1

percent.logFC <- data.frame(
  "UNUPDATED.HM.OXPHOS" = round(all.de.df[, "UNUPDATED.HM.OXPHOS"]/all.df["Found", "UNUPDATED.HM.OXPHOS"]*100, 2), 
  "GO.OXPHOS" = round(all.de.df[, "GO.OXPHOS"]/all.df["Found", "GO.OXPHOS"]*100, 2), 
  "KEGG.OXPHOS" = round(all.de.df[, "KEGG.OXPHOS"]/all.df["Found", "KEGG.OXPHOS"]*100, 2)
)
rownames(percent.logFC) <- c("P8 %Found highLFC", "P9 %Found highLFC")
rbind(all.df[,names(all.df) != "HALLMARK.OXPHOS"], percent.logFC)

                  UNUPDATED.HM.OXPHOS GO.OXPHOS KEGG.OXPHOS
NumGenes                       200.00     144.0      131.00
Found                          184.00     107.0      101.00
%Found                          92.00      74.0       77.00
Not Found                       16.00      37.0       30.00
P8 %Found highLFC               14.13      21.5       15.84
P9 %Found highLFC                2.72       2.8        0.99

OBSERVATIONS
- UNUPDATED.OXPHOS geneset consistently has the most number OXPHOS genes with high logFC (logFC > 0.5, regardless of padj) relative to the other two genesets for each model comparison between MRD and vehicle. However, GO.OXPHOS actually has the highest percentage of OXPHOS genes with high logFC out of the OXPHOS genes Found.
- Although it seems that there are a few overlaps of OXPHOS genes found, it is hard to visualize which ones overlap, and whether the ones that overlap are among the top DE genes (regardless of padj). We therefore create tables for each model, extract 5 genes with the highest logFC from each geneset and compare if they’re the same.

p8.top5 <- SS2.markers[["SS2 Patient 8"]] %>% arrange(-abs(avg_logFC))
p9.top5 <- SS2.markers[["SS2 Patient 9"]] %>% arrange(-abs(avg_logFC))

p8.gs.df <- data.frame(
  "UNUPDATED.OXPHOS" = head(rownames(p8.top5)[which(rownames(p8.top5) %in% hallmark.list[["UNUPDATED.OXPHOS"]])], 5), 
  "UNUPDATED.OXPHOS" = select(p8.top5[head(rownames(p8.top5)[which(rownames(p8.top5) %in% hallmark.list[["UNUPDATED.OXPHOS"]])], 5),], avg_logFC),
  "GO.OXPHOS" = head(rownames(p8.top5)[which(rownames(p8.top5) %in% hallmark.list[["GO.OXPHOS"]])], 5),
  "GO.OXPHOS" = select(p8.top5[head(rownames(p8.top5)[which(rownames(p8.top5) %in% hallmark.list[["GO.OXPHOS"]])], 5),], avg_logFC),
  "KEGG.OXPHOS" = head(rownames(p8.top5)[which(rownames(p8.top5) %in% hallmark.list[["KEGG.OXPHOS"]])], 5), 
  "KEGG.OXPHOS" = select(p8.top5[head(rownames(p8.top5)[which(rownames(p8.top5) %in% hallmark.list[["KEGG.OXPHOS"]])], 5),], avg_logFC)
)
rownames(p8.gs.df) <- seq(from = 1, length = nrow(p8.gs.df))

p9.gs.df <- data.frame(
  "UNUPDATED.OXPHOS" = head(rownames(p9.top5)[which(rownames(p9.top5) %in% hallmark.list[["UNUPDATED.OXPHOS"]])], 5), 
  "UNUPDATED.OXPHOS" = select(p9.top5[head(rownames(p9.top5)[which(rownames(p9.top5) %in% hallmark.list[["UNUPDATED.OXPHOS"]])], 5),], avg_logFC),
  "GO.OXPHOS" = head(rownames(p9.top5)[which(rownames(p9.top5) %in% hallmark.list[["GO.OXPHOS"]])], 5),
  "GO.OXPHOS" = select(p9.top5[head(rownames(p9.top5)[which(rownames(p9.top5) %in% hallmark.list[["GO.OXPHOS"]])], 5),], avg_logFC),
  "KEGG.OXPHOS" = head(rownames(p9.top5)[which(rownames(p9.top5) %in% hallmark.list[["KEGG.OXPHOS"]])], 5), 
  "KEGG.OXPHOS" = select(p9.top5[head(rownames(p9.top5)[which(rownames(p9.top5) %in% hallmark.list[["KEGG.OXPHOS"]])], 5),], avg_logFC)
)
rownames(p9.gs.df) <- seq(from = 1, length = nrow(p9.gs.df))

p8.gs.df

  UNUPDATED.OXPHOS  avg_logFC GO.OXPHOS avg_logFC.1 KEGG.OXPHOS avg_logFC.2
1             NQO2  1.2232801    MLXIPL  -1.4084023    NDUFA4L2  -1.3529251
2              OAT  0.9683255   CHCHD10  -0.9787089      NDUFB1  -0.9501970
3           NDUFB1 -0.9501970    NDUFB1  -0.9501970      NDUFA3  -0.9273634
4           NDUFA3 -0.9273634    NDUFA3  -0.9273634      NDUFB3  -0.9247449
5           NDUFB3 -0.9247449    NDUFB3  -0.9247449    ATP6V1B2   0.8188562

p9.gs.df

  UNUPDATED.OXPHOS avg_logFC GO.OXPHOS avg_logFC.1 KEGG.OXPHOS avg_logFC.2
1         SLC25A20 0.9290268  SLC25A33  -0.9222536    ATP6V1B1   0.6114638
2            RHOT1 0.8159080     SURF1   0.6201898    ATP6V0A4  -0.4970007
3            MTRF1 0.6758994   DNAJC15  -0.5914836    NDUFA4L2  -0.3899799
4            SURF1 0.6201898      COQ9   0.4983459    ATP6V0A2  -0.3867169
5             MTRR 0.6044816      ISCU   0.4919133    ATP6V0A1  -0.3449714

OBSERVATIONS
- Note: All of these top 5 OXPHOS genes (logFC > 0.5) have a padj of 1.00
- There are quite a few overlaps of OXPHOS genes with high LFC across all three genesets for both patient 8 and 9
- The UNUPDATED.OXPHOS and the GO.OXPHOS genesets both seem to have top OXPHOS genes with higher logFC values relative to KEGG.OXPHOS
Our answers to QUESTION #1 and #2 seem to suggest that UNUPDATED.OXPHOS hallmark geneset is the best our of all the genesets tested because:
1. It has the least number of genes that are not found within the SS2 Seurat Object, and the highest percentage of genes that are found.
2. It has OXPHOS genes with the highest LFC (logFC > 0.5, regardless of padj) relative to the other two genesets for each model comparison between MRD and vehicle
3. The high LFC OXPHOS genes present within this geneset have relatively higher logFC values (regardless of padj) in comparison to the high LFC OXPHOS genes present in the other two genesets, especially KEGG.
We answer our 3rd question to confirm whether the UNUPDATED.OXPHOS geneset is indeed the most appropriate one for our data relative to the other two genesets.

ANSWERING QUESTION #3: which geneset gives us the most statistically significant results

oxphos.centered <- c("UNUPDATED.OXPHOS37.centered", "GO.OXPHOS35.centered", "KEGG.OXPHOS36.centered")
SS2.Oxphos.Vln.plots <- vector("list", length(SS2s))
names(SS2.Oxphos.Vln.plots) <- SS2.names
patient <- c("8", "9")

for (i in 1:length(SS2s)){
  obj <- SS2s[[i]]
  name <- SS2.names[[i]]
  numCells <- nrow(SS2s[[i]]@meta.data)
  
  my_comparisons <- list(
    c(glue("{patient[[i]]}.0"), glue("{patient[[i]]}.1")), 
    c(glue("{patient[[i]]}.1"), glue("{patient[[i]]}.2")), 
    c(glue("{patient[[i]]}.0"), glue("{patient[[i]]}.2"))
  )
  
  if(patient[[i]] == "8"){
    p <- VlnPlot(obj, features = oxphos.centered, group.by = "sample", pt.size = 0, combine = F, cols = c("#00AFBB", "#E7B800", "#FC4E07"), y.max = 0.7)
  }
  else{
    p <- VlnPlot(obj, features = oxphos.centered, group.by = "sample", pt.size = 0, combine = F, cols = c("#00AFBB", "#E7B800", "#FC4E07"), y.max = 1.0)
  }
  unupdated.found <- sum(hallmark.list[["UNUPDATED.OXPHOS"]] %in% rownames(obj))
  unupdated.length <- length(hallmark.list[["UNUPDATED.OXPHOS"]])
  unupdated.pFound <- round((unupdated.found / unupdated.length)*100, 2)
  p[[1]] <- p[[1]] + labs(title = glue("{name} UNUPDATED.OXPHOS scores across sample"), x = name, subtitle = glue("{numCells} Malignant Cells, {unupdated.found} out of {unupdated.length} OXPHOS genes found ({unupdated.pFound}%)")) + 
    theme(plot.title = element_text(size = 12), plot.caption = element_text(size = 10)) + 
    geom_boxplot(width = 0.15, position = position_dodge(0.9), alpha = 0.3, show.legend = F) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].0")), "cells"), x = glue("{patient[[i]]}.0"), y = min(obj$UNUPDATED.OXPHOS37.centered) -0.03) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].1")), "cells"), x = glue("{patient[[i]]}.1"), y = min(obj$UNUPDATED.OXPHOS37.centered) - 0.03) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].2")), "cells"), x = glue("{patient[[i]]}.2"), y = min(obj$UNUPDATED.OXPHOS37.centered) - 0.03) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.05) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.05, bracket.size = 0, vjust = 1.8)
  
  GO.found <- sum(hallmark.list[["GO.OXPHOS"]] %in% rownames(obj))
  GO.length <- length(hallmark.list[["GO.OXPHOS"]])
  GO.pFound <- round((GO.found / GO.length)*100,2)
  p[[2]] <- p[[2]] + labs(title = glue("{name} GO.OXPHOS scores across sample"), x = name, subtitle = glue("{numCells} Malignant Cells, {GO.found} out of {GO.length} OXPHOS genes found ({GO.pFound}%)")) +
    theme(plot.title = element_text(size = 12), plot.caption = element_text(size = 10)) + 
    geom_boxplot(width = 0.15, position = position_dodge(0.9), alpha = 0.3, show.legend = F) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].0")), "cells"), x = glue("{patient[[i]]}.0"), y = min(obj$GO.OXPHOS35.centered) -0.03) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].1")), "cells"), x = glue("{patient[[i]]}.1"), y = min(obj$GO.OXPHOS35.centered) - 0.03) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].2")), "cells"), x = glue("{patient[[i]]}.2"), y = min(obj$GO.OXPHOS35.centered) - 0.03) +
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.05) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.05, bracket.size = 0, vjust = 1.8)
  
  KEGG.found <- sum(hallmark.list[["KEGG.OXPHOS"]] %in% rownames(obj))
  KEGG.length <- length(hallmark.list[["KEGG.OXPHOS"]])
  KEGG.pFound <- round((KEGG.found / KEGG.length)*100,2)
  p[[3]] <- p[[3]] + labs(title = glue("{name} KEGG.OXPHOS scores across sample"), x = name, subtitle = glue("{numCells} Malignant Cells, {KEGG.found} out of {KEGG.length} OXPHOS genes found ({KEGG.pFound}%)")) +
    theme(plot.title = element_text(size = 12), plot.caption = element_text(size = 10)) + 
    geom_boxplot(width = 0.15, position = position_dodge(0.9), alpha = 0.3, show.legend = F) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].0")), "cells"), x = glue("{patient[[i]]}.0"), y = min(obj$KEGG.OXPHOS36.centered) -0.03) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].1")), "cells"), x = glue("{patient[[i]]}.1"), y = min(obj$KEGG.OXPHOS36.centered) - 0.03) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].2")), "cells"), x = glue("{patient[[i]]}.2"), y = min(obj$KEGG.OXPHOS36.centered) - 0.03) +
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.05) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.05, bracket.size = 0, vjust = 1.8)
  
  p <- p[[1]] + p[[2]] + p[[3]] + plot_layout(guides= 'collect')
  
  SS2.Oxphos.Vln.plots[[name]] <- p
}

SS2.Oxphos.Vln.plots[["SS2 Patient 8"]]

Version	Author	Date
26f64a3	jgoh2	2020-08-03
2dc1bee	jgoh2	2020-07-31

SS2.Oxphos.Vln.plots[["SS2 Patient 9"]]

Version	Author	Date
26f64a3	jgoh2	2020-08-03
2dc1bee	jgoh2	2020-07-31

OBSERVATIONS
- The overall trend we see between treatment groups within each model is the same across all three oxphos genesets. However, it seems like the difference in scores is most obvious/drastic when using the KEGG geneset.
- Statistical tests show us that while KEGG gives us the most statistically significant results in Patient 8 and UNUPDATED gives us the most statistically significant results in Patient 9, the GO geneset is actually the most consistent, giving us statistically signifcant results in both patients.
CONCLUSIONS
- Although the UNUPDATED seems like the most suitable one for criteria 1 and 2, the GO geneset gives us the best result for the 3rd criterion:
  1. Question 1: GO has the most number of unfound genes in the SS2 object. However:
  2. Question 2: It has a pretty close tie in comparison to the UNUPDATED geneset, where they had similar numbers of high LFC OXPHOS genes present from FindMarkers, and the top high LFC OXPHOS genes in both genesets have comparable logFC.
  3. Question 3: GO gives us statistically significant results in both patients.
Therefore, considering these three criteria, we decide that the GO geneset is best for our SS2 data.
SUMMARY STATISTICS

#Patient 8 ---------------------------------
SS2Malignant8.0 <- subset(SS2Malignant.8, subset = (sample == "8.0"))
SS2Malignant8.1 <- subset(SS2Malignant.8, subset = (sample == "8.1"))
SS2Malignant8.2 <- subset(SS2Malignant.8, subset = (sample == "8.2"))

SS2M8.uhm.oxphos.df <- data.frame(
  "p.0vsp.1" = wilcox.test(SS2Malignant8.0$UNUPDATED.OXPHOS37.centered, SS2Malignant8.1$UNUPDATED.OXPHOS37.centered)$p.value, 
  "p.1vsp.2" = wilcox.test(SS2Malignant8.1$UNUPDATED.OXPHOS37.centered, SS2Malignant8.2$UNUPDATED.OXPHOS37.centered)$p.value, 
  "p.0vsp.2" = wilcox.test(SS2Malignant8.0$UNUPDATED.OXPHOS37.centered, SS2Malignant8.2$UNUPDATED.OXPHOS37.centered)$p.value
)

SS2M8.go.oxphos.df <- data.frame(
  "p.0vsp.1" = wilcox.test(SS2Malignant8.0$GO.OXPHOS35.centered, SS2Malignant8.1$GO.OXPHOS35.centered)$p.value, 
  "p.1vsp.2" = wilcox.test(SS2Malignant8.1$GO.OXPHOS35.centered, SS2Malignant8.2$GO.OXPHOS35.centered)$p.value, 
  "p.0vsp.2" = wilcox.test(SS2Malignant8.0$GO.OXPHOS35.centered, SS2Malignant8.2$GO.OXPHOS35.centered)$p.value
)

SS2M8.kegg.oxphos.df <- data.frame(
  "p.0vsp.1" = wilcox.test(SS2Malignant8.0$KEGG.OXPHOS36.centered, SS2Malignant8.1$KEGG.OXPHOS36.centered)$p.value, 
  "p.1vsp.2" = wilcox.test(SS2Malignant8.1$KEGG.OXPHOS36.centered, SS2Malignant8.2$KEGG.OXPHOS36.centered)$p.value, 
  "p.0vsp.2" = wilcox.test(SS2Malignant8.0$KEGG.OXPHOS36.centered, SS2Malignant8.2$KEGG.OXPHOS36.centered)$p.value
)

#Patient 9 ---------------------------------
SS2Malignant9.0 <- subset(SS2Malignant.9, subset = (sample == "9.0"))
SS2Malignant9.1 <- subset(SS2Malignant.9, subset = (sample == "9.1"))
SS2Malignant9.2 <- subset(SS2Malignant.9, subset = (sample == "9.2"))

SS2M9.uhm.oxphos.df <- data.frame(
  "p.0vsp.1" = wilcox.test(SS2Malignant9.0$UNUPDATED.OXPHOS37.centered, SS2Malignant9.1$UNUPDATED.OXPHOS37.centered)$p.value, 
  "p.1vsp.2" = wilcox.test(SS2Malignant9.1$UNUPDATED.OXPHOS37.centered, SS2Malignant9.2$UNUPDATED.OXPHOS37.centered)$p.value, 
  "p.0vsp.2" = wilcox.test(SS2Malignant9.0$UNUPDATED.OXPHOS37.centered, SS2Malignant9.2$UNUPDATED.OXPHOS37.centered)$p.value
)

SS2M9.go.oxphos.df <- data.frame(
  "p.0vsp.1" = wilcox.test(SS2Malignant9.0$GO.OXPHOS35.centered, SS2Malignant9.1$GO.OXPHOS35.centered)$p.value, 
  "p.1vsp.2" = wilcox.test(SS2Malignant9.1$GO.OXPHOS35.centered, SS2Malignant9.2$GO.OXPHOS35.centered)$p.value, 
  "p.0vsp.2" = wilcox.test(SS2Malignant9.0$GO.OXPHOS35.centered, SS2Malignant9.2$GO.OXPHOS35.centered)$p.value
)

SS2M9.kegg.oxphos.df <- data.frame(
  "p.0vsp.1" = wilcox.test(SS2Malignant9.0$KEGG.OXPHOS36.centered, SS2Malignant9.1$KEGG.OXPHOS36.centered)$p.value, 
  "p.1vsp.2" = wilcox.test(SS2Malignant9.1$KEGG.OXPHOS36.centered, SS2Malignant9.2$KEGG.OXPHOS36.centered)$p.value, 
  "p.0vsp.2" = wilcox.test(SS2Malignant9.0$KEGG.OXPHOS36.centered, SS2Malignant9.2$KEGG.OXPHOS36.centered)$p.value
)

#combine ------------------------------
hm.oxphos.DF <- rbind(SS2M8.uhm.oxphos.df, SS2M9.uhm.oxphos.df)
rownames(hm.oxphos.DF) <- c("HM.OXPHOS.P8", "HM.OXPHOS.P9")

go.oxphos.DF <- rbind(SS2M8.go.oxphos.df, SS2M9.go.oxphos.df)
rownames(go.oxphos.DF) <- c("GO.OXPHOS.P8", "GO.OXPHOS.P9")

kegg.oxphos.DF <- rbind(SS2M8.kegg.oxphos.df, SS2M9.kegg.oxphos.df)
rownames(kegg.oxphos.DF) <- c("KEGG.OXPHOS.P8", "KEGG.OXPHOS.P9")

all.oxphos.DF <- rbind(hm.oxphos.DF, go.oxphos.DF, kegg.oxphos.DF)
DT::datatable(all.oxphos.DF) %>% 
  DT::formatSignif(names(all.oxphos.DF), digits = 2) %>% 
  DT::formatStyle(names(all.oxphos.DF), color = DT::styleInterval(0.05, c('red', 'black')))

We also tested how the UNUPDATED version of HALLMARK_UNFOLDED_PROTEIN_RESPONSE differs from the updated version.

upr.length.df <- data.frame(
  "UNUPDATED.HM.UPR" = length(hallmark.list[["UNUPDATED.UPR"]]),
  "HALLMARK.UPR" = length(hallmark.list[["HALLMARK_UNFOLDED_PROTEIN_RESPONSE"]])
)

Found.upr.df <- data.frame(
  "UNUPDATED.HM.UPR" = sum((hallmark.list[["UNUPDATED.UPR"]] %in% rownames(SS2Malignant))),
  "HALLMARK.UPR" = sum((hallmark.list[["HALLMARK_UNFOLDED_PROTEIN_RESPONSE"]] %in% rownames(SS2Malignant)))
)

PFound.upr.df <- data.frame(
  "UNUPDATED.HM.UPR" = (sum((hallmark.list[["UNUPDATED.UPR"]] %in% rownames(SS2Malignant)))/length(hallmark.list[["UNUPDATED.UPR"]]))*100,
  "HALLMARK.UPR" = (sum((hallmark.list[["HALLMARK_UNFOLDED_PROTEIN_RESPONSE"]] %in% rownames(SS2Malignant)))/length(hallmark.list[["HALLMARK_UNFOLDED_PROTEIN_RESPONSE"]])) * 100
)

upr.df <- rbind(upr.length.df, Found.upr.df, PFound.upr.df)
upr.df[,"UNUPDATED.HM.UPR"] <- round(upr.df[,"UNUPDATED.HM.UPR"])
upr.df[,"HALLMARK.UPR"] <- round(upr.df[,"HALLMARK.UPR"])
rownames(upr.df) <- c("Num UPR Genes", "Num Found", "% Found")
upr.df

              UNUPDATED.HM.UPR HALLMARK.UPR
Num UPR Genes              113          113
Num Found                  107          105
% Found                     95           93

Since the UNUPDATED version identifies two more genes than the updated version, we decide to proceed with the UNUPDATED.UPR geneset.

After deciding the geneset to use for our data, we can now analyze the differential expression of OXPHOS and UPR genes across treatment condition within each model: - OXPHOS: GO_OXIDATIVE_PHOSPHORYLATION geneset - UPR: unupdated HALLMARK_UNFOLDED_PROTEIN_RESPONSE geneset

STEP 3 DE ANALYSIS #1. VISUALIZING AND QUANTIFYING DE HALLMARK GENESETS

QUESTION Are modules (OXPHOS and UPR) differentially expressed across treatment conditions within each patient?
HYPOTHESIS We hypothesize that the on-treatment time-points (p.1 and p.2) will express enriched levels of OXPHOS and UPR hallmarks relative to cells treatment-naive time-point (p.0). We are not so sure what to expect between p.1 and p.2 on-treatment samples because we are not told how the two samples are different in terms of their treatment status.
- APPROACH #1 Violin Plots and UMAP
  - Visualize differences in hallmark score across treatment conditions with:
    - Violin Plot (center module scores at 0)
  - Statistical test (wilcoxon rank sum test): is the difference in hallmark score across treatment conditions statistically significant?

hms.centered = c("GO.OXPHOS35.centered", "UNUPDATED.UPR38.centered")

#VlnPlot
SS2.names <- c("SS2 Patient 8", "SS2 Patient 9")
SS2s <- c(SS2Malignant.8, SS2Malignant.9)
SS2.Vln.plots <- vector("list", length(SS2s))
names(SS2.Vln.plots) <- SS2.names

for (i in 1:length(SS2s)){
  obj <- SS2s[[i]]
  name <- SS2.names[[i]]
  numCells <- nrow(obj@meta.data)
  
  my_comparisons <- list(
    c(glue("{patient[[i]]}.0"), glue("{patient[[i]]}.1")), 
    c(glue("{patient[[i]]}.1"), glue("{patient[[i]]}.2")), 
    c(glue("{patient[[i]]}.0"), glue("{patient[[i]]}.2"))
  )
  
  if(patient[[i]] == "8"){
    p <- VlnPlot(obj, features = hms.centered, group.by = "sample", pt.size = 0, combine = F, cols = c("#00AFBB", "#E7B800", "#FC4E07"), y.max = 0.9)
  }
  else{
    p <- VlnPlot(obj, features = hms.centered, group.by = "sample", pt.size = 0, combine = F, cols = c("#00AFBB", "#E7B800", "#FC4E07"), y.max = 1.0)
  }
  
  GO.found <- sum(hallmark.list[["GO.OXPHOS"]] %in% rownames(obj))
  GO.length <- length(hallmark.list[["GO.OXPHOS"]])
  GO.pFound <- round((GO.found / GO.length) * 100,2)
    
  p[[1]] <- p[[1]] + labs(title = glue("{name} OXPHOS scores across treatment"), x = name, subtitle = glue("{numCells} Malignant Cells, {GO.found} out of {GO.length} OXPHOS genes found ({GO.pFound}%)"), caption = "GO_OXIDATIVE_PHOSPHORYLATION") +
    geom_boxplot(width = 0.15, position = position_dodge(0.9), alpha = 0.3, show.legend = F) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].0")), "cells"), x = glue("{patient[[i]]}.0"), y = min(obj$UNUPDATED.OXPHOS37.centered) -0.03) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].1")), "cells"), x = glue("{patient[[i]]}.1"), y = min(obj$UNUPDATED.OXPHOS37.centered) - 0.03) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].2")), "cells"), x = glue("{patient[[i]]}.2"), y = min(obj$UNUPDATED.OXPHOS37.centered) - 0.03) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.05) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.05, bracket.size = 0, vjust = 1.8)
  
  UPR.found <- sum(hallmark.list[["UNUPDATED.UPR"]] %in% rownames(obj))
  UPR.length <- length(hallmark.list[["UNUPDATED.UPR"]])
  UPR.pFound <- round((UPR.found / UPR.length) * 100,2)
  p[[2]] <- p[[2]] + labs(title = glue("{name} UPR scores across treatment"), x = name, subtitle = glue("{numCells} Malignant Cells, {UPR.found} out of {UPR.length} UPR genes found ({UPR.pFound}%)"), caption = "HALLMARK_UNFOLDED_PROTEIN_RESPONSE") +
    geom_boxplot(width = 0.15, position = position_dodge(0.9), alpha = 0.3, show.legend = F) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].0")), "cells"), x = glue("{patient[[i]]}.0"), y = min(obj$UNUPDATED.UPR38.centered) -0.03) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].1")), "cells"), x = glue("{patient[[i]]}.1"), y = min(obj$UNUPDATED.UPR38.centered) - 0.03) + 
    geom_text(label = paste(sum(str_detect(obj$sample, "[:digit:].2")), "cells"), x = glue("{patient[[i]]}.2"), y = min(obj$UNUPDATED.UPR38.centered) - 0.03)
  p <- p[[1]] + p[[2]] + plot_layout(guides= 'collect') + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format", step.increase = 0.05) + 
    stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", step.increase = 0.05, bracket.size = 0, vjust = 1.8)
  
  SS2.Vln.plots[[name]] <- p
}

SS2.Vln.plots[["SS2 Patient 8"]]

Version	Author	Date
26f64a3	jgoh2	2020-08-03
2dc1bee	jgoh2	2020-07-31
cdd10f9	jgoh2	2020-07-28

SS2.Vln.plots[["SS2 Patient 9"]]

Version	Author	Date
26f64a3	jgoh2	2020-08-03
2dc1bee	jgoh2	2020-07-31
cdd10f9	jgoh2	2020-07-28

CONCLUSIONS
- The statistically significant results are (p < 0.05):
  - Patient 8
    - OXPHOS: Sample 8.0 > 8.1 ; 8.0 > 8.2 (does not support our hypothesis)
    - UPR: Sample 8.2 > 8.0 ; 8.1 > 8.0 (supports our hypothesis)
  - Patient 9
    - OXPHOS: Sample 9.1 > 9.2 ; 9.1 > 9.0 (supports our hypothesis) ; 9.0 > 9.2 (does not support our hypothesis)
- The trends that support our hypothesis for both OXPHOS and UPR are not consistent across patients 8 and 9. Since our sample size is so small, we’re also not sure how representative this data is of the true population.
- We are, however, not sure how to compare between p.1 vs. p.2 because they are both categorized as “on-treatment” and the clinical status of samples obtained from patients 8 and 9 is “Recurrent”, instead of denoting two types of treatment status “Diagnosis/Initial treatment” that corresponds to the two treatment timepoints for the 10X data.
STATISTICAL SUMMARY

#Patient 8 ---------------------------------
SS2M8.oxphos.df <- data.frame(
  "p.0vsp.1" = wilcox.test(SS2Malignant8.0$GO.OXPHOS35.centered, SS2Malignant8.1$GO.OXPHOS35.centered)$p.value, 
  "p.1vsp.2" = wilcox.test(SS2Malignant8.1$GO.OXPHOS35.centered, SS2Malignant8.2$GO.OXPHOS35.centered)$p.value, 
  "p.0vsp.2" = wilcox.test(SS2Malignant8.0$GO.OXPHOS35.centered, SS2Malignant8.2$GO.OXPHOS35.centered)$p.value
)

SS2M8.UPR.df <- 
  data.frame(
  "p.0vsp.1" = wilcox.test(SS2Malignant8.0$UNUPDATED.UPR38.centered, SS2Malignant8.1$UNUPDATED.UPR38.centered)$p.value, 
  "p.1vsp.2" = wilcox.test(SS2Malignant8.1$UNUPDATED.UPR38.centered, SS2Malignant8.2$UNUPDATED.UPR38.centered)$p.value, 
  "p.0vsp.2" = wilcox.test(SS2Malignant8.0$UNUPDATED.UPR38.centered, SS2Malignant8.2$UNUPDATED.UPR38.centered)$p.value
)

#Patient 9 ---------------------------------
SS2M9.oxphos.df <- data.frame(
  "p.0vsp.1" = wilcox.test(SS2Malignant9.0$GO.OXPHOS35.centered, SS2Malignant9.1$GO.OXPHOS35.centered)$p.value, 
  "p.1vsp.2" = wilcox.test(SS2Malignant9.1$GO.OXPHOS35.centered, SS2Malignant9.2$GO.OXPHOS35.centered)$p.value, 
  "p.0vsp.2" = wilcox.test(SS2Malignant9.0$GO.OXPHOS35.centered, SS2Malignant9.2$GO.OXPHOS35.centered)$p.value
)

SS2M9.UPR.df <- 
  data.frame(
  "p.0vsp.1" = wilcox.test(SS2Malignant9.0$UNUPDATED.UPR38.centered, SS2Malignant9.1$UNUPDATED.UPR38.centered)$p.value, 
  "p.1vsp.2" = wilcox.test(SS2Malignant9.1$UNUPDATED.UPR38.centered, SS2Malignant9.2$UNUPDATED.UPR38.centered)$p.value, 
  "p.0vsp.2" = wilcox.test(SS2Malignant9.0$UNUPDATED.UPR38.centered, SS2Malignant9.2$UNUPDATED.UPR38.centered)$p.value
)

#combine ------------------------------
SS2.oxphos.DF <- rbind(SS2M8.oxphos.df, SS2M9.oxphos.df)
rownames(SS2.oxphos.DF) <- c("OXPHOS.PATIENT8", "OXPHOS.PATIENT9")

SS2.UPR.DF <- rbind(SS2M8.UPR.df, SS2M9.UPR.df)
rownames(SS2.UPR.DF) <- c("UPR.PATIENT8", "UPR.PATIENT9")

SS2.all.DF <- rbind(SS2.oxphos.DF, SS2.UPR.DF)
DT::datatable(SS2.all.DF) %>% 
  DT::formatSignif(names(SS2.all.DF), digits = 2) %>% 
  DT::formatStyle(names(SS2.all.DF), color = DT::styleInterval(0.05, c('red', 'black')))

We try to confirm our results again with our second appraoch: GSEA

APPROACH #2 Geneset Enrichment Analysis (GSEA) - GSEA enrichment plots for hallmarks of interest between condition.1 vs. condition.2 - Rank genes and compute GSEA enrichment scores - Statistical test: how significant are the enrichment scores?

have not finalized the statistical test and ranking method to use for GSEA

STEP 3 DE ANALYSIS #2. IDENTIFYING INDIVIDUAL DE GENES

have not finalized the statistical test to use for volcano plot (need to match what we use for GSEA)

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] ggpubr_0.4.0          GGally_2.0.0          gt_0.2.1              reshape2_1.4.4        tidyselect_1.1.0     
 [6] presto_1.0.0          data.table_1.12.8     Rcpp_1.0.5            glue_1.4.1            patchwork_1.0.1      
[11] EnhancedVolcano_1.6.0 ggrepel_0.8.2         here_0.1              readxl_1.3.1          forcats_0.5.0        
[16] stringr_1.4.0         dplyr_1.0.0           purrr_0.3.4           readr_1.3.1           tidyr_1.1.0          
[21] tibble_3.0.3          ggplot2_3.3.2         tidyverse_1.3.0       cowplot_1.0.0         Seurat_3.1.5         
[26] BiocManager_1.30.10   renv_0.11.0-4        

loaded via a namespace (and not attached):
  [1] Rtsne_0.15         colorspace_1.4-1   ggsignif_0.6.0     rio_0.5.16         ellipsis_0.3.1     ggridges_0.5.2    
  [7] rprojroot_1.3-2    fs_1.4.2           rstudioapi_0.11    farver_2.0.3       leiden_0.3.3       listenv_0.8.0     
 [13] DT_0.14            fansi_0.4.1        lubridate_1.7.9    xml2_1.3.2         codetools_0.2-16   splines_4.0.2     
 [19] knitr_1.29         jsonlite_1.7.0     workflowr_1.6.2    broom_0.7.0        ica_1.0-2          cluster_2.1.0     
 [25] dbplyr_1.4.4       png_0.1-7          uwot_0.1.8         sctransform_0.2.1  compiler_4.0.2     httr_1.4.1        
 [31] backports_1.1.8    assertthat_0.2.1   Matrix_1.2-18      lazyeval_0.2.2     limma_3.44.3       cli_2.0.2         
 [37] later_1.1.0.1      htmltools_0.5.0    tools_4.0.2        rsvd_1.0.3         igraph_1.2.5       gtable_0.3.0      
 [43] RANN_2.6.1         carData_3.0-4      cellranger_1.1.0   vctrs_0.3.2        ape_5.4            nlme_3.1-148      
 [49] crosstalk_1.1.0.1  lmtest_0.9-37      xfun_0.15          globals_0.12.5     openxlsx_4.1.5     rvest_0.3.5       
 [55] lifecycle_0.2.0    irlba_2.3.3        rstatix_0.6.0      future_1.18.0      MASS_7.3-51.6      zoo_1.8-8         
 [61] scales_1.1.1       hms_0.5.3          promises_1.1.1     parallel_4.0.2     RColorBrewer_1.1-2 curl_4.3          
 [67] yaml_2.2.1         reticulate_1.16    pbapply_1.4-2      gridExtra_2.3      reshape_0.8.8      stringi_1.4.6     
 [73] zip_2.0.4          rlang_0.4.7        pkgconfig_2.0.3    evaluate_0.14      lattice_0.20-41    ROCR_1.0-11       
 [79] labeling_0.3       htmlwidgets_1.5.1  RcppAnnoy_0.0.16   plyr_1.8.6         magrittr_1.5       R6_2.4.1          
 [85] generics_0.0.2     DBI_1.1.0          foreign_0.8-80     pillar_1.4.6       haven_2.3.1        whisker_0.4       
 [91] withr_2.2.0        fitdistrplus_1.1-1 abind_1.4-5        survival_3.2-3     future.apply_1.6.0 tsne_0.1-3        
 [97] car_3.0-8          modelr_0.1.8       crayon_1.3.4       KernSmooth_2.23-17 plotly_4.9.2.1     rmarkdown_2.3     
[103] grid_4.0.2         blob_1.2.1         git2r_0.27.1       reprex_0.3.0       digest_0.6.25      httpuv_1.5.4      
[109] munsell_0.5.0      viridisLite_0.3.0

Izar 2020 SS2 (Cohort 2) DE Analysis

Jesslyn Goh and Mike Cuoco