Last updated: 2020-03-05

Checks: 7 0

Knit directory: 20170327_Psen2S4Ter_RNASeq/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200119)

The command set.seed(20200119) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 2c56cef

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Unstaged changes:
    Modified:   output/Enrichment_Hom_V_Het.csv
    Modified:   output/Enrichment_Mutant_V_WT.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File	Version	Author	Date	Message
Rmd	2c56cef	Steve Ped	2020-03-05	Tidied column names for enrichment output
html	7fe1505	Steve Ped	2020-03-05	Updated all analyses & heatmaps after changing to cpmPostNorm instead of fit\(fitted.values for enrichment/heatmaps</td> </tr> <tr> <td>Rmd</td> <td><a href="https://github.com/UofABioinformaticsHub/20170327_Psen2S4Ter_RNASeq/blob/bad81c1731ba8bc4487d25ec2719e92ae6917e4a/analysis/3_Enrichment_MutantVsWT.Rmd" target="_blank">bad81c1</a></td> <td>Steve Ped</td> <td>2020-03-05</td> <td>Updated MutVsWt UpSet plots for KEGG/HALLMARK</td> </tr> <tr> <td>Rmd</td> <td><a href="https://github.com/UofABioinformaticsHub/20170327_Psen2S4Ter_RNASeq/blob/377410a12ca13a7e233e39fb2f77f165e846d830/analysis/3_Enrichment_MutantVsWT.Rmd" target="_blank">377410a</a></td> <td>Steve Ped</td> <td>2020-03-05</td> <td>Finished revision of Enrichment Analysis</td> </tr> <tr> <td>Rmd</td> <td><a href="https://github.com/UofABioinformaticsHub/20170327_Psen2S4Ter_RNASeq/blob/c0b89ed8f761b0c7222538c353d21cd5aa402b94/analysis/3_Enrichment_MutantVsWT.Rmd" target="_blank">c0b89ed</a></td> <td>Steve Ped</td> <td>2020-03-05</td> <td>Reran after changing to cpmPotNorm instead of the incorrect fit\)fitted.values
html	7e59d3c	Steve Ped	2020-02-20	Recompiled
Rmd	4ede752	Steve Ped	2020-02-20	Added headings
html	0168824	Steve Ped	2020-02-20	Rebult after resizing
Rmd	de9c4b3	Steve Ped	2020-02-20	Resized heatmaps again
html	86ffa07	Steve Ped	2020-02-20	Updated saample layout figure
Rmd	4d2af23	Steve Ped	2020-02-19	Modified heatmap sizes & initial network layout
Rmd	8fe35b0	Steve Ped	2020-02-19	Added multiple heatmaps & UpSet plots
html	679b10d	Steve Ped	2020-02-19	Rendered OXPHOS Plot
Rmd	66ba4e0	Steve Ped	2020-02-19	Added OXPHOS plot
html	e0288c6	Steve Ped	2020-02-19	Generated enrichment tables
Rmd	541fbf0	Steve Ped	2020-02-19	Revised Hom Vs Het Enrichment
Rmd	77f167d	Steve Ped	2020-02-19	Revised enrichment
html	876e40f	Steve Ped	2020-02-17	Compiled after minor corrections
Rmd	53ed0e3	Steve Ped	2020-02-17	Corrected Ensembl Release
Rmd	f55be85	Steve Ped	2020-02-17	Added commas
Rmd	8f29458	Steve Ped	2020-02-17	Minor tweaks to formatting
html	9104ecd	Steve Ped	2020-01-28	First draft of Hom Vs Het
Rmd	207cdc8	Steve Ped	2020-01-28	Added code for Hom Vs Het Enrichment
Rmd	468e6e3	Steve Ped	2020-01-28	Ran enrichment of Mut Vs WT
html	468e6e3	Steve Ped	2020-01-28	Ran enrichment of Mut Vs WT
Rmd	3a9933c	Steve Ped	2020-01-28	Finished Enrichment analysis on MutVsWt
html	3a9933c	Steve Ped	2020-01-28	Finished Enrichment analysis on MutVsWt

Setup

library(tidyverse)
library(magrittr)
library(edgeR)
library(scales)
library(pander)
library(goseq)
library(msigdbr)
library(AnnotationDbi)
library(RColorBrewer)
library(ngsReports)
library(fgsea)
library(metap)
library(UpSetR)
library(pheatmap)

theme_set(theme_bw())
panderOptions("table.split.table", Inf)
panderOptions("table.style", "rmarkdown")
panderOptions("big.mark", ",")

samples <- read_csv("data/samples.csv") %>%
  distinct(sampleName, .keep_all = TRUE) %>%
  dplyr::select(sample = sampleName, sampleID, genotype) %>%
  mutate(
    genotype = factor(genotype, levels = c("WT", "Het", "Hom")),
    mutant = genotype %in% c("Het", "Hom"),
    homozygous = genotype == "Hom"
  )
genoCols <- samples$genotype %>%
  levels() %>%
  length() %>%
  brewer.pal("Set1") %>%
  setNames(levels(samples$genotype))

dgeList <- here::here("data/dgeList.rds") %>% read_rds()
cpmPostNorm <- here::here("data/cpmPostNorm.rds") %>% read_rds()
entrezGenes <- dgeList$genes %>%
  dplyr::filter(!is.na(entrezid)) %>%
  unnest(entrezid) %>%
  dplyr::rename(entrez_gene = entrezid)

formatP <- function(p, m = 0.0001){
  out <- rep("", length(p))
  out[p < m] <- sprintf("%.2e", p[p<m])
  out[p >= m] <- sprintf("%.4f", p[p>=m])
  out
}

Introduction

Enrichment analysis for this dataset present some challenges. Despite normalisation to account for gene length and GC bias, some appeared to still be present in the final results. In addition, the confounding of incomplete rRNA removal with genotype may lead to other distortions in both DE genes and ranking statistics.

Two steps for enrichment analysis will be undertaken.

Testing for enrichment within discrete sets of DE genes as defined in the previous steps
Testing for enrichment within ranked lists, regardless of DE status or statistical significance, before integration of all enrichment analyses using Fisher’s method for combining p-values.

Testing for enrichment within discrete gene sets will be performed using goseq as this allows for the incorporation of a single covariate as a predictor of differential expression. GC content, gene length and correlation with rRNA removal can all be supplied as separate covariates.

Testing for enrichment with ranked lists will be performed using:

fry as this can take into account inter-gene correlations. Values supplied will be fitted values for each gene/sample as these have been corrected for GC and length biases.
camera, which also accommodates inter-gene correlations. Values supplied will be fitted values for each gene/sample as these have been corrected for GC and length biases.
fgsea which is an R implementation of GSEA. This approach simply takes a ranked list and doesn’t directly account for correlations. However, the ranked list will be derived from analysis using CQN-normalisation so will be robust to these technical artefacts.

In the case of camera, inter-gene correlations will be calculated for each gene-set prior to analysis to ensure more conservative p-values are obtained.

Databases used for testing

Data was sourced using the msigdbr package. The initial database used for testing was the Hallmark Gene Sets, with mappings from gene-set to EntrezGene IDs performed by the package authors.

Hallmark Gene Sets

hm <- msigdbr("Danio rerio", category = "H")  %>% 
  left_join(entrezGenes) %>%
  dplyr::filter(!is.na(gene_id)) %>%
  distinct(gs_name, gene_id, .keep_all = TRUE)
hmByGene <- hm %>%
  split(f = .$gene_id) %>%
  lapply(extract2, "gs_name")
hmByID <- hm %>%
  split(f = .$gs_name) %>%
  lapply(extract2, "gene_id")

Mappings are required from gene to pathway, and Ensembl identifiers were used to map from gene to pathway, based on the mappings in the previously used annotations (Ensembl Release 98). A total of 3,459 Ensembl IDs were mapped to pathways from the hallmark gene sets.

KEGG Gene Sets

kg <- msigdbr("Danio rerio", category = "C2", subcategory = "CP:KEGG")  %>% 
  left_join(entrezGenes) %>%
  dplyr::filter(!is.na(gene_id)) %>%
  distinct(gs_name, gene_id, .keep_all = TRUE)
kgByGene <- kg  %>%
  split(f = .$gene_id) %>%
  lapply(extract2, "gs_name")
kgByID <- kg  %>%
  split(f = .$gs_name) %>%
  lapply(extract2, "gene_id")

The same mapping process was applied to KEGG gene sets. A total of 3,614 Ensembl IDs were mapped to pathways from the KEGG gene sets.

Gene Ontology Gene Sets

goSummaries <- url("https://uofabioinformaticshub.github.io/summaries2GO/data/goSummaries.RDS") %>%
  readRDS() %>%
  mutate(
    Term = Term(id),
    gs_name = Term %>% str_to_upper() %>% str_replace_all("[ -]", "_"),
    gs_name = paste0("GO_", gs_name)
    )
minPath <- 3

go <- msigdbr("Danio rerio", category = "C5") %>% 
  left_join(entrezGenes) %>%
  dplyr::filter(!is.na(gene_id)) %>%
  left_join(goSummaries) %>% 
  dplyr::filter(shortest_path >= minPath) %>%
  distinct(gs_name, gene_id, .keep_all = TRUE)
goByGene <- go %>%
  split(f = .$gene_id) %>%
  lapply(extract2, "gs_name")
goByID <- go %>%
  split(f = .$gs_name) %>%
  lapply(extract2, "gene_id")

For analysis of gene-sets from the GO database, gene-sets were restricted to those with 3 or more steps back to the ontology root terms. A total of 11,245 Ensembl IDs were mapped to pathways from restricted database of 8,834 GO gene sets.

gsSizes <- bind_rows(hm, kg, go) %>% 
  dplyr::select(gs_name, gene_symbol, gene_id) %>% 
  chop(c(gene_symbol, gene_id)) %>%
  mutate(gs_size = vapply(gene_symbol, length, integer(1)))

Enrichment in the DE Gene Set

deTable <- here::here("output", "psen2VsWT.csv") %>% 
  read_csv() %>%
  mutate(
    entrezid = dgeList$genes$entrezid[gene_id]
  )

The first step of analysis using goseq, regardless of the gene-set, is estimation of the probability weight function (PWF) which quantifies the probability of a gene being considered as DE based on a single covariate. As GC content and length should have been accounted for during conditional-quantile normalisation, these were not required for any bias. However, the gene-level correlations with rRNA contamination were instead used a predictor of bias in selection of a gene as being DE.

rawFqc <- list.files(
  path = here::here("data/0_rawData/FastQC/"),
  pattern = "zip",
  full.names = TRUE
) %>%
  FastqcDataList()
gc <- getModule(rawFqc, "Per_sequence_GC") %>%
  group_by(Filename) %>% 
  mutate(Freq = Count / sum(Count)) %>%
  ungroup()
gcDev <- gc %>%
  left_join(getGC(gcTheoretical, "Drerio", "Trans")) %>%
  mutate(
    sample = str_remove(Filename, "_R[12].fastq.gz"),
    resid = Freq - Drerio
  ) %>% 
  left_join(samples) %>%
  group_by(sample) %>%
  summarise(
    ss = sum(resid^2),
    n = n(),
    sd = sqrt(ss / (n - 1))
  )

riboVec <- structure(gcDev$sd, names = gcDev$sample)
riboCors <- cpm(dgeList, log = TRUE) %>%
  apply(1, function(x){
    cor(x, riboVec[names(x)])
  })

Values were calculated as per the previous steps, using the logCPM values for each gene, with the sample-level standard deviations from the theoretical GC distribution being used as a measure of rRNA contamination. For estimation of the probability weight function, squared correlations were used to place negative and positive correlations on the same scale. This accounts for genes which are both negatively & positively biased by the presence of excessive rRNA. Clearly, the confounding of genotype with rRNA means that some genes driven by the genuine biology may be down-weighted under this approach.

riboPwf <- deTable %>%
  mutate(riboCors = riboCors[gene_id]^2) %>%
  dplyr::select(gene_id, DE, riboCors) %>%
  distinct(gene_id, .keep_all = TRUE) %>% 
  with(
    nullp(
      DEgenes = structure(
        as.integer(DE), names = gene_id
      ), 
      genome = "danRer10", 
      id = "ensGene", 
      bias.data = riboCors,
      plot.fit = FALSE
    )
  )
plotPWF(riboPwf, main = "Bias from rRNA proportions")

Using this approach, it was clear that correlation with rRNA proportions significantly biased the probability of a gene being considered as DE.

Version	Author	Date
468e6e3	Steve Ped	2020-01-28

All gene-sets were then tested using this PWF.

Hallmark Gene Sets

hmRiboGoseq <- goseq(riboPwf, gene2cat = hmByGene) %>%
  as_tibble %>%
  dplyr::filter(numDEInCat > 0) %>%
  mutate( 
    adjP = p.adjust(over_represented_pvalue, method = "bonf"),
    FDR = p.adjust(over_represented_pvalue, method = "fdr")
  ) %>%
  dplyr::select(-contains("under")) %>%
  dplyr::rename(
    gs_name = category,
    PValue = over_represented_pvalue
  )

No gene-sets achieved significance in the DE genes with the lowest FDR being 41%

KEGG Gene Sets

kgRiboGoseq <- goseq(riboPwf, gene2cat = kgByGene) %>%
  as_tibble %>%
  dplyr::filter(numDEInCat > 0) %>%
  mutate( 
    adjP = p.adjust(over_represented_pvalue, method = "bonf"),
    FDR = p.adjust(over_represented_pvalue, method = "fdr")
  ) %>%
  dplyr::select(-contains("under"))  %>%
  dplyr::rename(
    gs_name = category,
    PValue = over_represented_pvalue
  )

kgRiboGoseq %>%
  dplyr::slice(1:5) %>%
  mutate(
    p = formatP(PValue),
    adjP = formatP(adjP),
    FDR = formatP(FDR)
  ) %>%
  dplyr::select(
    `Gene Set` = gs_name,
    DE = numDEInCat,
    `Set Size` = numInCat,
    PValue,
    `p~bonf~` = adjP,
    `p~FDR~` = FDR
  ) %>%
  pander(
    justify = "lrrrrr",
    caption = paste(
      "The", nrow(.), "most highly-ranked KEGG pathways.",
      "Bonferroni-adjusted p-values are the most stringent and give high",
      "confidence when below 0.05."
    )
  )

The 5 most highly-ranked KEGG pathways. Bonferroni-adjusted p-values are the most stringent and give high confidence when below 0.05.
Gene Set	DE	Set Size	PValue	p_bonf	p_FDR
KEGG_RIBOSOME	26	80	4.754e-09	5.32e-07	5.32e-07
KEGG_PRIMARY_IMMUNODEFICIENCY	2	15	0.01642	1.0000	0.7173
KEGG_CYSTEINE_AND_METHIONINE_METABOLISM	4	26	0.02959	1.0000	0.7173
KEGG_ASTHMA	1	3	0.04088	1.0000	0.7173
KEGG_RETINOL_METABOLISM	3	25	0.0512	1.0000	0.7173

Notably, the KEGG gene-set for Ribosomal genes was detected as enriched in the set of DE genes, with no other KEGG gene-sets being considered significant.

GO Gene Sets

goRiboGoseq <- goseq(riboPwf, gene2cat = goByGene) %>%
  as_tibble %>%
  dplyr::filter(numDEInCat > 0) %>%
  mutate( 
    adjP = p.adjust(over_represented_pvalue, method = "bonf"),
    FDR = p.adjust(over_represented_pvalue, method = "fdr")
  ) %>%
  dplyr::select(-contains("under")) %>%
  dplyr::rename(
    gs_name = category,
    PValue = over_represented_pvalue
  )

goRiboGoseq %>%
  dplyr::filter(adjP < 0.05) %>%
  mutate(
    p = formatP(PValue),
    adjP = formatP(adjP),
    FDR = formatP(FDR)
  ) %>%
  dplyr::select(
    `Gene Set` = gs_name,
    DE = numDEInCat,
    `Set Size` = numInCat,
    PValue,
    `p~bonf~` = adjP,
    `p~FDR~` = FDR
  ) %>%
  pander(
    justify = "lrrrrr",
    caption = paste(
      "*The", nrow(.), "most highly-ranked GO terms.",
      "Bonferroni-adjusted p-values are the most stringent and give high",
      "confidence when below 0.05, with all terms here reaching this threshold.",
      "However, most terms once again indicate the presence of rRNA.*"
    )
  )

The 14 most highly-ranked GO terms. Bonferroni-adjusted p-values are the most stringent and give high confidence when below 0.05, with all terms here reaching this threshold. However, most terms once again indicate the presence of rRNA.
Gene Set	DE	Set Size	PValue	p_bonf	p_FDR
GO_COTRANSLATIONAL_PROTEIN_TARGETING_TO_MEMBRANE	28	93	7.36e-10	2.83e-06	2.03e-06
GO_CYTOSOLIC_RIBOSOME	27	97	1.056e-09	4.06e-06	2.03e-06
GO_ESTABLISHMENT_OF_PROTEIN_LOCALIZATION_TO_ENDOPLASMIC_RETICULUM	27	105	1.6e-08	6.16e-05	2.05e-05
GO_TRANSLATIONAL_INITIATION	29	177	2.178e-07	0.0008	0.0002
GO_PROTEIN_TARGETING_TO_MEMBRANE	30	171	2.533e-07	0.0010	0.0002
GO_PROTEIN_LOCALIZATION_TO_ENDOPLASMIC_RETICULUM	27	128	3.849e-07	0.0015	0.0002
GO_CYTOSOLIC_PART	30	206	4.424e-07	0.0017	0.0002
GO_VIRAL_GENE_EXPRESSION	29	172	6.939e-07	0.0027	0.0003
GO_NUCLEAR_TRANSCRIBED_MRNA_CATABOLIC_PROCESS	30	183	7.911e-07	0.0030	0.0003
GO_RIBOSOMAL_SUBUNIT	28	175	1.99e-06	0.0077	0.0008
GO_RIBOSOME	30	210	2.659e-06	0.0102	0.0009
GO_CYTOSOLIC_LARGE_RIBOSOMAL_SUBUNIT	15	51	2.826e-06	0.0109	0.0009
GO_SYMPORTER_ACTIVITY	10	94	7.571e-06	0.0291	0.0022
GO_ANION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY	17	228	1.23e-05	0.0473	0.0034

Enrichment Testing on Ranked Lists

rnk <- structure(
  -sign(deTable$logFC)*log10(deTable$PValue), 
  names = deTable$gene_id
) %>% sort()
np <- 1e5

Genes were ranked by -sign(logFC)*log₁₀(p) for approaches which required a ranked list. Multiple approaches were first calculated individually, before being combined for the final integrated set of results.

Hallmark Gene Sets

hmFry <- cpmPostNorm %>%
  fry(
    index = hmByID,
    design = dgeList$design,
    contrast = "mutant",
    sort = "directional"
    ) %>%
  rownames_to_column("gs_name") %>%
  as_tibble()

For analysis under camera when inter-gene correlations were calculated for a more conservative result.

hmCamera <- cpmPostNorm %>%
  camera(
    index = hmByID,
    design = dgeList$design,
    contrast = "mutant",
    inter.gene.cor = NULL
    ) %>%
  rownames_to_column("gs_name") %>%
  as_tibble()

For generation of the GSEA ranked list, 100,000 permutations were conducted.

hmGsea <- fgsea(
  pathways = hmByID, 
  stats = rnk,
  nperm = np
) %>%
  as_tibble() %>%
  dplyr::rename(gs_name = pathway, PValue = pval) %>%
  arrange(PValue)

Results for all analyses, including goseq were then combined using Wilkinson’s method to combine p-values. For a conservative approach, under \(m\) tests, the \(m - 1^{\text{th}}\) smallest p-value was chosen.

hmMeta <- hmFry %>%
  dplyr::select(gs_name, fry = PValue) %>%
  left_join(
    dplyr::select(hmCamera, gs_name, camera = PValue)
  ) %>%
  left_join(
    dplyr::select(hmGsea, gs_name, gsea = PValue)
  ) %>%
  left_join(
    dplyr::select(hmRiboGoseq, gs_name, goseq = PValue)
  ) %>% 
  nest(p = one_of(c("fry", "camera", "gsea", "goseq"))) %>%
  mutate(
    n_p = vapply(p, function(x){sum(!is.na(unlist(x)))}, integer(1)), 
    wilkinson_p = vapply(p, function(x){
      x <- unlist(x)
      x <- x[!is.na(x)]
      wilkinsonp(x, length(x) - 1)$p
    }, numeric(1)),
    FDR = p.adjust(wilkinson_p, "fdr"), 
    adjP = p.adjust(wilkinson_p, "bonferroni")
  ) %>% 
  arrange(wilkinson_p) %>% 
  unnest(p) %>%
  left_join(gsSizes) %>%
  mutate(
    DE = lapply(gene_id, intersect, dplyr::filter(deTable, DE)$gene_id),
    DE = lapply(DE, unique),
    nDE = vapply(DE, length, integer(1))
  )

hmMeta %>%
  dplyr::filter(FDR < 0.05) %>%
  mutate_at(vars(one_of(c("wilkinson_p", "FDR", "adjP"))), formatP) %>%
  dplyr::select(`Gene Set` = gs_name, `Number DE` = nDE, `Set Size` = gs_size, `Wilkinson~p~` = wilkinson_p, `p~FDR~` = FDR, `p~bonf~` = adjP) %>%
  pander(
    caption = "Results from combining all above approaches for the Hallmark Gene Sets. All terms are significant to an FDR of 0.05.",
    justify = "lrrrrr"
  )

Results from combining all above approaches for the Hallmark Gene Sets. All terms are significant to an FDR of 0.05.
Gene Set	Number DE	Set Size	Wilkinson_p	p_FDR	p_bonf
HALLMARK_UV_RESPONSE_UP	7	141	5.90e-08	2.95e-06	2.95e-06
HALLMARK_XENOBIOTIC_METABOLISM	11	146	1.87e-05	0.0005	0.0009
HALLMARK_MTORC1_SIGNALING	11	194	9.13e-05	0.0012	0.0046
HALLMARK_FATTY_ACID_METABOLISM	6	135	9.46e-05	0.0012	0.0047
HALLMARK_GLYCOLYSIS	8	172	0.0002	0.0015	0.0084
HALLMARK_PEROXISOME	2	95	0.0002	0.0015	0.0115
HALLMARK_COAGULATION	3	80	0.0003	0.0015	0.0127
HALLMARK_MYC_TARGETS_V1	15	192	0.0003	0.0015	0.0138
HALLMARK_CHOLESTEROL_HOMEOSTASIS	2	69	0.0003	0.0015	0.0153
HALLMARK_REACTIVE_OXYGEN_SPECIES_PATHWAY	5	45	0.0003	0.0015	0.0158
HALLMARK_BILE_ACID_METABOLISM	1	86	0.0003	0.0015	0.0169
HALLMARK_ESTROGEN_RESPONSE_LATE	3	153	0.0004	0.0016	0.0188
HALLMARK_ADIPOGENESIS	13	186	0.0007	0.0027	0.0355
HALLMARK_HYPOXIA	9	169	0.0009	0.0032	0.0451
HALLMARK_PROTEIN_SECRETION	3	92	0.0009	0.0032	0.0475
HALLMARK_UNFOLDED_PROTEIN_RESPONSE	4	113	0.0017	0.0054	0.0864
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION	7	156	0.0020	0.0060	0.1020
HALLMARK_TNFA_SIGNALING_VIA_NFKB	7	149	0.0037	0.0099	0.1841
HALLMARK_IL2_STAT5_SIGNALING	2	147	0.0038	0.0099	0.1889
HALLMARK_OXIDATIVE_PHOSPHORYLATION	15	201	0.0041	0.0101	0.2028
HALLMARK_NOTCH_SIGNALING	0	29	0.0135	0.0317	0.6744
HALLMARK_ANGIOGENESIS	1	26	0.0140	0.0317	0.6985
HALLMARK_ANDROGEN_RESPONSE	2	90	0.0222	0.0482	1.0000

Exploration of significant Hallmark Gene Sets

hmMeta %>% 
  dplyr::filter(nDE > 0, FDR < 0.05) %>%
  dplyr::select(gs_name, DE) %>%
  unnest(DE) %>% 
  mutate(
    gs_name = str_remove(gs_name, "HALLMARK_"),
    gs_name = fct_lump(gs_name, n = 12)
    ) %>% 
  split(f = .$gs_name) %>% 
  lapply(magrittr::extract2, "DE") %>% 
  fromList() %>% 
  upset(
    nsets = length(.), 
    order.by = "freq", 
    mb.ratio = c(0.6, 0.4)
    )

UpSet plot indicating distribution of DE genes within all significant HALLMARK gene sets. Most gene sets seem relatively independent of each other.

Version	Author	Date
7fe1505	Steve Ped	2020-03-05
7e59d3c	Steve Ped	2020-02-20
86ffa07	Steve Ped	2020-02-20

hmHeat <- hmMeta %>% 
  dplyr::filter(nDE > 10, FDR < 0.05) %>%
  dplyr::select(gs_name, DE) %>%
  unnest(DE) %>%
  left_join(dgeList$genes, by = c("DE" = "gene_id")) %>%
  dplyr::select(gs_name, DE, gene_name) %>%
  mutate(belongs = TRUE) %>%
  pivot_wider(
    id_cols = c(DE, gene_name),
    names_from = gs_name,
    values_from = belongs,
    values_fill = list(belongs = FALSE)
  ) %>%
  left_join(
    cpmPostNorm %>%
      as.data.frame() %>%
      rownames_to_column("DE")
  )
hmHeat %>%
  dplyr::select(gene_name, starts_with("Ps")) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    labels_col = hmHeat %>%
      dplyr::select(starts_with("Ps")) %>% 
      colnames() %>% 
      enframe(name = NULL) %>% 
      dplyr::rename(sample = value) %>%
      left_join(dgeList$samples) %>%
      dplyr::select(sample, sampleID) %>%
      with(
        structure(sampleID, names = sample)
      ),
    cutree_rows = 6,
    cutree_cols = 2,
    annotation_names_row = FALSE,
    annotation_row = hmHeat %>%
      dplyr::select(gene_name, starts_with("HALLMARK_")) %>%
      mutate_at(vars(starts_with("HALL")), as.character) %>%
      as.data.frame() %>%
      column_to_rownames("gene_name") %>%
      set_colnames(
        str_remove(colnames(.), "HALLMARK_")
      )
  )

Gene expression patterns for all DE genes in HALLMARK gene sets, containing more than 10 DE genes.

Version	Author	Date
7fe1505	Steve Ped	2020-03-05
86ffa07	Steve Ped	2020-02-20

KEGG Gene Sets

kgFry <- cpmPostNorm %>%
  fry(
    index = kgByID,
    design = dgeList$design,
    contrast = "mutant",
    sort = "directional"
    ) %>%
  rownames_to_column("gs_name") %>%
  as_tibble()

For analysis under camera when inter-gene correlations were calculated for a more conservative result.

kgCamera <- cpmPostNorm %>%
  camera(
    index = kgByID,
    design = dgeList$design,
    contrast = "mutant",
    inter.gene.cor = NULL
    ) %>%
  rownames_to_column("gs_name") %>%
  as_tibble()

For generation of the GSEA ranked list, 100,000 permutations were conducted.

kgGsea <- fgsea(
  pathways = kgByID, 
  stats = rnk,
  nperm = np
) %>%
  as_tibble() %>%
  dplyr::rename(gs_name = pathway, PValue = pval) %>%
  arrange(PValue)

kgMeta <- kgFry %>%
  dplyr::select(gs_name, fry = PValue) %>%
  left_join(
    dplyr::select(kgCamera, gs_name, camera = PValue)
  ) %>%
  left_join(
    dplyr::select(kgGsea, gs_name, gsea = PValue)
  ) %>%
  left_join(
    dplyr::select(kgRiboGoseq, gs_name, goseq = PValue)
  ) %>% 
  nest(p = one_of(c("fry", "camera", "gsea", "goseq"))) %>%
  mutate(
    n_p = vapply(p, function(x){sum(!is.na(unlist(x)))}, integer(1)), 
    wilkinson_p = vapply(p, function(x){
      x <- unlist(x)
      x <- x[!is.na(x)]
      wilkinsonp(x, length(x) - 1)$p
    }, numeric(1)),
    FDR = p.adjust(wilkinson_p, "fdr"), 
    adjP = p.adjust(wilkinson_p, "bonferroni")
  ) %>% 
  arrange(wilkinson_p) %>% 
  unnest(p) %>%
  left_join(gsSizes) %>%
  mutate(
    DE = lapply(gene_id, intersect, dplyr::filter(deTable, DE)$gene_id),
    DE = lapply(DE, unique),
    nDE = vapply(DE, length, integer(1))
  )

kgMeta %>%
  dplyr::filter(FDR < 0.05, nDE > 2) %>%
  mutate_at(vars(one_of(c("wilkinson_p", "FDR", "adjP"))), formatP) %>%
  dplyr::select(`Gene Set` = gs_name, `Number DE` = nDE, `Set Size` = gs_size, `Wilkinson~p~` = wilkinson_p, `p~FDR~` = FDR, `p~bonf~` = adjP) %>%
  pander(
    caption = "Results from combining all above approaches for the KEGG Gene Sets. All terms are significant to an FDR of 0.05. Only Gene Sets with at least three DE genes are shown.",
    justify = "lrrrrr"
  )

Results from combining all above approaches for the KEGG Gene Sets. All terms are significant to an FDR of 0.05. Only Gene Sets with at least three DE genes are shown.
Gene Set	Number DE	Set Size	Wilkinson_p	p_FDR	p_bonf
KEGG_RIBOSOME	26	80	8.32e-09	1.55e-06	1.55e-06
KEGG_GLUTATHIONE_METABOLISM	4	36	1.22e-06	7.56e-05	0.0002
KEGG_CYSTEINE_AND_METHIONINE_METABOLISM	4	26	0.0001	0.0013	0.0189
KEGG_METABOLISM_OF_XENOBIOTICS_BY_CYTOCHROME_P450	3	26	0.0002	0.0020	0.0393
KEGG_DRUG_METABOLISM_CYTOCHROME_P450	3	26	0.0003	0.0021	0.0487
KEGG_RETINOL_METABOLISM	3	25	0.0005	0.0036	0.0960
KEGG_PARKINSONS_DISEASE	13	112	0.0006	0.0040	0.1122
KEGG_PURINE_METABOLISM	4	135	0.0007	0.0044	0.1310
KEGG_HUNTINGTONS_DISEASE	15	159	0.0027	0.0125	0.5003
KEGG_PATHOGENIC_ESCHERICHIA_COLI_INFECTION	3	44	0.0036	0.0148	0.6616
KEGG_ALZHEIMERS_DISEASE	12	141	0.0039	0.0159	0.7294
KEGG_OXIDATIVE_PHOSPHORYLATION	14	119	0.0046	0.0180	0.8633
KEGG_HEMATOPOIETIC_CELL_LINEAGE	3	31	0.0047	0.0180	0.8812
KEGG_GLYCOLYSIS_GLUCONEOGENESIS	3	39	0.0067	0.0229	1.0000
KEGG_CARDIAC_MUSCLE_CONTRACTION	3	63	0.0158	0.0451	1.0000
KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS	5	86	0.0182	0.0498	1.0000

Exploration of Significant KEGG Gene Sets

kgMeta %>% 
  dplyr::filter(nDE > 2, FDR < 0.05) %>%
  dplyr::select(gs_name, DE) %>%
  unnest(DE) %>% 
  mutate(
    gs_name = str_remove(gs_name, "KEGG_"),
    gs_name = fct_lump(gs_name, n = 8)
    ) %>% 
  split(f = .$gs_name) %>% 
  lapply(magrittr::extract2, "DE") %>% 
  fromList() %>% 
  upset(
    nsets = length(.), 
    order.by = "freq",
    mb.ratio = c(0.6, 0.4)
    )

UpSet plot indicating distribution of DE genes within all significant terms from the KEGG gene sets. There is considerable overlap between Oxidative Phosphorylation and both Parkinson’s and Huntington’s Disease, indicating that these terms essentially capture the same signal.

Version	Author	Date
7fe1505	Steve Ped	2020-03-05
7e59d3c	Steve Ped	2020-02-20
86ffa07	Steve Ped	2020-02-20

riboOx <- kg %>%
  dplyr::filter(
    grepl("OXIDATIVE_PHOS", gs_name) | grepl("RIBOSOME", gs_name),
    gene_id %in% dplyr::filter(deTable, DE)$gene_id
    ) %>%
  mutate(gs_name = str_remove(gs_name, "(KEGG|HALLMARK)_")) %>%
  distinct(gs_name, gene_id) %>%
  left_join(
    cpmPostNorm %>% 
      as.data.frame() %>% 
      rownames_to_column("gene_id") %>%
      as_tibble()
    ) %>%
  pivot_longer(cols = starts_with("Ps"), names_to = "sample", values_to = "CPM") %>%
  left_join(dgeList$samples) %>% 
  left_join(dgeList$genes) %>%
  dplyr::select(gs_name, gene_name, sampleID, genotype, CPM) %>%
  pivot_wider(
    id_cols = c(gs_name, gene_name),
    values_from = CPM,
    names_from = sampleID
  ) 
riboOx %>% 
  dplyr::select(-gs_name) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    annotation_row = data.frame(
      GeneSet = riboOx$gs_name %>% str_replace("OXI.+", "OXPHOS"), 
      row.names = riboOx$gene_name
      ),
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    annotation_names_row = FALSE,
    cutree_rows = 3,
    cutree_cols = 2
  )

Given that the largest gene sets within the KEGG results were Oxidative Phosphorylation and Ribosomal gene sets, all DE genes associated with these KEGG terms are displayed. All genes appear to show increased expression in mutant samples.

Version	Author	Date
7fe1505	Steve Ped	2020-03-05
86ffa07	Steve Ped	2020-02-20

GO Gene Sets

goFry <- cpmPostNorm %>%
  fry(
    index = goByID,
    design = dgeList$design,
    contrast = "mutant",
    sort = "directional"
    ) %>%
  rownames_to_column("gs_name") %>%
  as_tibble()

For analysis under camera inter-gene correlations were directly calculated for a more conservative result.

goCamera <- cpmPostNorm %>%
  camera(
    index = goByID,
    design = dgeList$design,
    contrast = "mutant",
    inter.gene.cor = NULL
    ) %>%
  rownames_to_column("gs_name") %>%
  as_tibble()

For generation of the GSEA ranked list, 100,000 permutations were conducted.

goGsea <- fgsea(
  pathways = goByID, 
  stats = rnk,
  nperm = np
) %>%
  as_tibble() %>%
  dplyr::rename(gs_name = pathway, PValue = pval) %>%
  arrange(PValue)

goMeta <- goFry %>%
  dplyr::select(gs_name, fry = PValue) %>%
  left_join(
    dplyr::select(goCamera, gs_name, camera = PValue)
  ) %>%
  left_join(
    dplyr::select(goGsea, gs_name, gsea = PValue)
  ) %>%
  left_join(
    dplyr::select(goRiboGoseq, gs_name, goseq = PValue)
  )  %>% 
  nest(p = one_of(c("fry", "camera", "gsea", "goseq"))) %>%
  mutate(
    n_p = vapply(p, function(x){sum(!is.na(unlist(x)))}, integer(1)), 
    wilkinson_p = vapply(p, function(x){
      x <- unlist(x)
      x <- x[!is.na(x)]
      wilkinsonp(x, length(x) - 1)$p
    }, numeric(1)),
    FDR = p.adjust(wilkinson_p, "fdr"), 
    adjP = p.adjust(wilkinson_p, "bonferroni")
  ) %>% 
  arrange(wilkinson_p) %>% 
   unnest(p) %>%
  left_join(gsSizes) %>%
  mutate(
    DE = lapply(gene_id, intersect, dplyr::filter(deTable, DE)$gene_id),
    DE = lapply(DE, unique),
    nDE = vapply(DE, length, integer(1))
  )

goMeta %>%
  dplyr::filter(FDR < 0.01, nDE >= 10) %>%
  mutate_at(vars(one_of(c("wilkinson_p", "FDR", "adjP"))), formatP) %>%
  mutate(gs_name = str_trunc(gs_name, 70)) %>%
  dplyr::select(`Gene Set` = gs_name, `Number DE` = nDE, `Set Size` = gs_size, `Wilkinson~p~` = wilkinson_p, `p~FDR~` = FDR, `p~bonf~` = adjP) %>%
  pander(
    caption = "Results from combining all above approaches for the GO Gene Sets. All terms are significant using an FDR < 0.01. Only Gene Sets with at least ten DE genes are shown.",
    justify = "lrrrrr"
  )

Results from combining all above approaches for the GO Gene Sets. All terms are significant using an FDR < 0.01. Only Gene Sets with at least ten DE genes are shown.
Gene Set	Number DE	Set Size	Wilkinson_p	p_FDR	p_bonf
GO_SYMPORTER_ACTIVITY	10	94	7.34e-12	6.49e-08	6.49e-08
GO_SECONDARY_ACTIVE_TRANSMEMBRANE_TRANSPORTER_ACTIVITY	13	154	2.41e-11	7.50e-08	2.13e-07
GO_ANION_TRANSMEMBRANE_TRANSPORT	15	203	2.55e-11	7.50e-08	2.25e-07
GO_ORGANIC_ANION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY	11	144	6.48e-11	1.04e-07	5.72e-07
GO_ESTABLISHMENT_OF_PROTEIN_LOCALIZATION_TO_MEMBRANE	30	287	7.06e-11	1.04e-07	6.23e-07
GO_ANION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY	17	228	1.42e-10	1.79e-07	1.26e-06
GO_ANION_TRANSPORT	22	434	3.19e-10	3.38e-07	2.81e-06
GO_PROTEIN_TARGETING_TO_MEMBRANE	30	171	2.57e-09	2.27e-06	2.27e-05
GO_PROTEIN_TARGETING	34	379	7.09e-09	5.22e-06	6.26e-05
GO_CYTOSOLIC_PART	30	206	9.18e-09	5.52e-06	8.11e-05
GO_CYTOSOLIC_SMALL_RIBOSOMAL_SUBUNIT	11	41	9.40e-09	5.52e-06	8.30e-05
GO_ESTABLISHMENT_OF_PROTEIN_LOCALIZATION_TO_ORGANELLE	38	484	1.00e-08	5.52e-06	8.84e-05
GO_TRANSLATIONAL_INITIATION	29	177	1.29e-08	6.70e-06	0.0001
GO_CYTOSOLIC_LARGE_RIBOSOMAL_SUBUNIT	15	51	1.47e-08	7.20e-06	0.0001
GO_ESTABLISHMENT_OF_PROTEIN_LOCALIZATION_TO_ENDOPLASMIC_RETICULUM	27	105	1.58e-08	7.34e-06	0.0001
GO_COTRANSLATIONAL_PROTEIN_TARGETING_TO_MEMBRANE	28	93	1.91e-08	8.46e-06	0.0002
GO_PROTEIN_LOCALIZATION_TO_ENDOPLASMIC_RETICULUM	27	128	2.36e-08	9.92e-06	0.0002
GO_CYTOSOLIC_RIBOSOME	27	97	3.13e-08	1.20e-05	0.0003
GO_ORGANIC_ACID_TRANSPORT	14	240	4.33e-08	1.59e-05	0.0004
GO_ORGANIC_CYCLIC_COMPOUND_CATABOLIC_PROCESS	40	474	5.44e-08	1.92e-05	0.0005
GO_RNA_CATABOLIC_PROCESS	34	337	7.51e-08	2.46e-05	0.0007
GO_RESPONSE_TO_DRUG	31	746	1.05e-07	3.30e-05	0.0009
GO_NUCLEAR_TRANSCRIBED_MRNA_CATABOLIC_PROCESS	30	183	1.61e-07	4.45e-05	0.0014
GO_CYTOPLASMIC_TRANSLATION	11	87	3.11e-07	7.09e-05	0.0028
GO_ORGANONITROGEN_COMPOUND_BIOSYNTHETIC_PROCESS	68	1,431	3.13e-07	7.09e-05	0.0028
GO_AMIDE_BIOSYNTHETIC_PROCESS	40	678	3.31e-07	7.31e-05	0.0029
GO_CELLULAR_AMIDE_METABOLIC_PROCESS	45	805	3.62e-07	7.79e-05	0.0032
GO_INNER_MITOCHONDRIAL_MEMBRANE_PROTEIN_COMPLEX	16	124	3.77e-07	7.94e-05	0.0033
GO_SMALL_RIBOSOMAL_SUBUNIT	11	69	3.96e-07	8.13e-05	0.0035
GO_ACTIVE_TRANSMEMBRANE_TRANSPORTER_ACTIVITY	15	254	4.19e-07	8.22e-05	0.0037
GO_ORGANIC_ANION_TRANSPORT	16	345	4.49e-07	8.63e-05	0.0040
GO_PEPTIDE_BIOSYNTHETIC_PROCESS	38	572	1.16e-06	0.0002	0.0102
GO_RESPONSE_TO_CARBOHYDRATE	11	176	1.31e-06	0.0002	0.0116
GO_RIBOSOMAL_SUBUNIT	28	175	1.34e-06	0.0002	0.0118
GO_ORGANIC_ACID_BIOSYNTHETIC_PROCESS	12	198	1.63e-06	0.0003	0.0144
GO_RRNA_BINDING	11	57	2.23e-06	0.0003	0.0197
GO_RIBOSOME	30	210	2.32e-06	0.0003	0.0205
GO_REGULATION_OF_HORMONE_LEVELS	19	362	2.48e-06	0.0003	0.0220
GO_REGULATION_OF_HORMONE_SECRETION	12	200	2.49e-06	0.0003	0.0220
GO_RESPONSE_TO_TOXIC_SUBSTANCE	20	380	3.18e-06	0.0004	0.0281
GO_PROTEIN_LOCALIZATION_TO_MEMBRANE	35	539	3.42e-06	0.0004	0.0302
GO_VIRAL_GENE_EXPRESSION	29	172	3.66e-06	0.0004	0.0323
GO_LARGE_RIBOSOMAL_SUBUNIT	17	108	3.84e-06	0.0005	0.0340
GO_SMALL_MOLECULE_BIOSYNTHETIC_PROCESS	23	478	6.98e-06	0.0007	0.0616
GO_RNA_METABOLIC_PROCESS	48	915	1.27e-05	0.0011	0.1119
GO_OXIDATIVE_PHOSPHORYLATION	15	122	1.35e-05	0.0011	0.1189
GO_MONOCARBOXYLIC_ACID_METABOLIC_PROCESS	10	315	2.46e-05	0.0016	0.2174
GO_SEQUENCE_SPECIFIC_DNA_BINDING	29	827	2.63e-05	0.0017	0.2327
GO_HOMEOSTATIC_PROCESS	55	1,368	3.07e-05	0.0018	0.2715
GO_RESPONSE_TO_OXYGEN_CONTAINING_COMPOUND	43	1,156	3.17e-05	0.0018	0.2803
GO_LIPID_BIOSYNTHETIC_PROCESS	13	508	3.25e-05	0.0018	0.2874
GO_REGULATION_OF_SMALL_MOLECULE_METABOLIC_PROCESS	11	262	3.61e-05	0.0020	0.3186
GO_NEGATIVE_REGULATION_OF_CELL_DIFFERENTIATION	23	549	4.09e-05	0.0021	0.3612
GO_GENERATION_OF_PRECURSOR_METABOLITES_AND_ENERGY	25	422	4.45e-05	0.0022	0.3931
GO_REGULATORY_REGION_NUCLEIC_ACID_BINDING	24	720	4.60e-05	0.0023	0.4059
GO_ATP_SYNTHESIS_COUPLED_ELECTRON_TRANSPORT	12	85	4.62e-05	0.0023	0.4078
GO_RESPIRATORY_CHAIN_COMPLEX	11	75	5.10e-05	0.0025	0.4508
GO_LIPID_METABOLIC_PROCESS	30	906	5.38e-05	0.0025	0.4750
GO_ENZYME_INHIBITOR_ACTIVITY	12	219	5.92e-05	0.0026	0.5226
GO_CELLULAR_LIPID_METABOLIC_PROCESS	16	649	6.23e-05	0.0027	0.5508
GO_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY	25	616	6.28e-05	0.0027	0.5544
GO_ORGANIC_ACID_METABOLIC_PROCESS	23	648	7.15e-05	0.0030	0.6318
GO_ENERGY_DERIVATION_BY_OXIDATION_OF_ORGANIC_COMPOUNDS	18	242	7.55e-05	0.0031	0.6666
GO_ENDOPLASMIC_RETICULUM	38	1,407	9.11e-05	0.0034	0.8048
GO_POSITIVE_REGULATION_OF_TRANSCRIPTION_BY_RNA_POLYMERASE_II	31	903	9.22e-05	0.0034	0.8141
GO_NUCLEAR_BODY	21	649	9.34e-05	0.0035	0.8250
GO_REGULATION_OF_NUCLEOBASE_CONTAINING_COMPOUND_METABOLIC_PROCESS	24	421	9.38e-05	0.0035	0.8286
GO_ENDOPLASMIC_RETICULUM_PART	26	1,018	9.68e-05	0.0035	0.8551
GO_REGULATION_OF_APOPTOTIC_SIGNALING_PATHWAY	16	301	0.0001	0.0039	0.9840
GO_CELLULAR_MACROMOLECULE_CATABOLIC_PROCESS	48	939	0.0001	0.0039	1.0000
GO_PROTEIN_LOCALIZATION_TO_ORGANELLE	43	788	0.0001	0.0039	1.0000
GO_DOUBLE_STRANDED_DNA_BINDING	30	740	0.0001	0.0040	1.0000
GO_SEQUENCE_SPECIFIC_DOUBLE_STRANDED_DNA_BINDING	23	668	0.0001	0.0043	1.0000
GO_TRANSMEMBRANE_TRANSPORT	38	1,149	0.0002	0.0047	1.0000
GO_RESPIRATORY_ELECTRON_TRANSPORT_CHAIN	12	103	0.0002	0.0047	1.0000
GO_PROXIMAL_PROMOTER_SEQUENCE_SPECIFIC_DNA_BINDING	15	413	0.0002	0.0049	1.0000
GO_ELECTRON_TRANSPORT_CHAIN	14	153	0.0002	0.0051	1.0000
GO_ION_TRANSPORT	41	1,184	0.0002	0.0051	1.0000
GO_INFLAMMATORY_RESPONSE	18	376	0.0002	0.0052	1.0000
GO_COFACTOR_METABOLIC_PROCESS	10	317	0.0002	0.0052	1.0000
GO_APOPTOTIC_SIGNALING_PATHWAY	24	458	0.0002	0.0052	1.0000
GO_HORMONE_TRANSPORT	12	242	0.0002	0.0053	1.0000
GO_TRANSCRIPTION_FACTOR_BINDING	24	563	0.0002	0.0060	1.0000
GO_ORGANIC_HYDROXY_COMPOUND_METABOLIC_PROCESS	15	342	0.0002	0.0061	1.0000
GO_RESPONSE_TO_CYTOKINE	32	752	0.0003	0.0062	1.0000
GO_RESPONSE_TO_OXIDATIVE_STRESS	14	347	0.0003	0.0064	1.0000
GO_SECRETORY_VESICLE	24	713	0.0003	0.0072	1.0000
GO_NUCLEOSIDE_PHOSPHATE_BIOSYNTHETIC_PROCESS	10	227	0.0003	0.0074	1.0000
GO_CELLULAR_RESPONSE_TO_OXYGEN_CONTAINING_COMPOUND	31	816	0.0004	0.0079	1.0000
GO_NEGATIVE_REGULATION_OF_APOPTOTIC_SIGNALING_PATHWAY	10	171	0.0004	0.0079	1.0000
GO_MACROMOLECULE_CATABOLIC_PROCESS	52	1,116	0.0004	0.0079	1.0000
GO_REGULATION_OF_DNA_METABOLIC_PROCESS	15	236	0.0004	0.0080	1.0000
GO_CATION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY	18	458	0.0004	0.0081	1.0000
GO_CYTOPLASMIC_VESICLE_PART	22	1,141	0.0005	0.0087	1.0000
GO_ION_TRANSMEMBRANE_TRANSPORT	29	826	0.0005	0.0088	1.0000
GO_PROTEIN_HOMOOLIGOMERIZATION	10	246	0.0005	0.0090	1.0000
GO_POSITIVE_REGULATION_OF_RNA_BIOSYNTHETIC_PROCESS	43	1,232	0.0005	0.0093	1.0000
GO_VESICLE_MEMBRANE	11	609	0.0005	0.0093	1.0000
GO_MONOVALENT_INORGANIC_CATION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY	13	268	0.0005	0.0097	1.0000
GO_INTRACELLULAR_TRANSPORT	65	1,571	0.0005	0.0098	1.0000
GO_TRANSMEMBRANE_RECEPTOR_PROTEIN_TYROSINE_KINASE_SIGNALING_PATHWAY	12	569	0.0006	0.0098	1.0000
GO_CARBOHYDRATE_DERIVATIVE_BIOSYNTHETIC_PROCESS	18	540	0.0006	0.0099	1.0000

Exploration of Significant GO Gene Sets

goMeta %>% 
  dplyr::filter(nDE >= 25, FDR < 0.05) %>%
  dplyr::select(gs_name, DE) %>%
  unnest(DE) %>% 
  mutate(
    gs_name = str_remove(gs_name, "GO_"),
    gs_name = fct_lump(gs_name, n = 20)
    ) %>% 
  split(f = .$gs_name) %>% 
  lapply(magrittr::extract2, "DE") %>% 
  .[setdiff(names(.), "Other")] %>%
  fromList() %>% 
  upset(
    nsets = length(.), 
    order.by = "freq", 
    mb.ratio = c(0.55, 0.45)
  )

UpSet plot indicating distribution of DE genes within significant terms from the GO gene sets. For this visualisation, GO terms were restricted to those with 25 or more DE genes and an FDR < 0.05. A group of about 300 genes is widely spread across multiple terms and not shown, with the next largest group being 19 genes uniquely associated with RNA Metabolic Process. A group of 15 genes is shared by a large group of terms, indicating that these genes largely drive the signal for these terms. These genes tend to indicate Ribosomal activity (i.e. protein synthesis). However, a set of 14 genes seems to more uniquely indicate Mitochondrial activity, followed by 12 uniquely associated with Ion Transport, and a set of 9 genes associated with homeostasis followed by 25 (9 + + 8) involved with regulation of gene expression.

Version	Author	Date
7fe1505	Steve Ped	2020-03-05
0168824	Steve Ped	2020-02-20
86ffa07	Steve Ped	2020-02-20

Genes Associated with RNA Metabolic Process

go %>%
  dplyr::filter(
    gs_name == "GO_RNA_METABOLIC_PROCESS",
    gene_id %in% dplyr::filter(deTable, DE)$gene_id
  ) %>%
  dplyr::select(
    gene_id, gene_name
  ) %>%
  left_join(
    cpmPostNorm %>%
      as.data.frame() %>%
      rownames_to_column("gene_id")
  ) %>%
  dplyr::select(gene_name, starts_with("Ps")) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    labels_col = hmHeat %>%
      dplyr::select(starts_with("Ps")) %>% 
      colnames() %>% 
      enframe(name = NULL) %>% 
      dplyr::rename(sample = value) %>%
      left_join(dgeList$samples) %>%
      dplyr::select(sample, sampleID) %>%
      with(
        structure(sampleID, names = sample)
      ),
    cutree_rows = 4,
    cutree_cols = 2
  )

All DE genes associated with the GO term ‘RNA Metabolic Process’.

Version	Author	Date
7fe1505	Steve Ped	2020-03-05
0168824	Steve Ped	2020-02-20
86ffa07	Steve Ped	2020-02-20

Genes Associated with the Mitochondrial Envelope

go %>%
  dplyr::filter(
    gs_name == "GO_MITOCHONDRION",
    gene_id %in% dplyr::filter(deTable, DE)$gene_id
  ) %>%
  dplyr::select(
    gene_id, gene_name
  ) %>%
  left_join(
    cpmPostNorm %>%
      as.data.frame() %>%
      rownames_to_column("gene_id")
  ) %>%
  dplyr::select(gene_name, starts_with("Ps")) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    labels_col = hmHeat %>%
      dplyr::select(starts_with("Ps")) %>% 
      colnames() %>% 
      enframe(name = NULL) %>% 
      dplyr::rename(sample = value) %>%
      left_join(dgeList$samples) %>%
      dplyr::select(sample, sampleID) %>%
      with(
        structure(sampleID, names = sample)
      ),
    cutree_rows = 5,
    cutree_cols = 2
  )

All DE genes associated with the GO term ‘Mitochondrion’.

Version	Author	Date
7fe1505	Steve Ped	2020-03-05
86ffa07	Steve Ped	2020-02-20

Genes Associated with Ion Transport

go %>%
  dplyr::filter(
    gene_id %in% dplyr::filter(deTable, DE)$gene_id,
    gs_name %in% c("GO_ION_TRANSPORT")
  ) %>%
  group_by(gene_id, gene_name) %>%
  tally() %>%
  ungroup() %>%
  dplyr::filter(n == 1) %>%
  dplyr::select(
    gene_id, gene_name
  ) %>%
  left_join(
    cpmPostNorm %>%
      as.data.frame() %>%
      rownames_to_column("gene_id")
  ) %>%
  dplyr::select(gene_name, starts_with("Ps")) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    labels_col = hmHeat %>%
      dplyr::select(starts_with("Ps")) %>% 
      colnames() %>% 
      enframe(name = NULL) %>% 
      dplyr::rename(sample = value) %>%
      left_join(dgeList$samples) %>%
      dplyr::select(sample, sampleID) %>%
      with(
        structure(sampleID, names = sample)
      ),
    cutree_rows = 4,
    cutree_cols = 2
  )

All DE genes associated with the GO term ‘Ion Transport’.

Version	Author	Date
7fe1505	Steve Ped	2020-03-05
0168824	Steve Ped	2020-02-20
86ffa07	Steve Ped	2020-02-20

Genes Associated with Homeostasis

go %>%
  dplyr::filter(
    gene_id %in% dplyr::filter(deTable, DE)$gene_id,
    gs_name %in% c("GO_HOMEOSTATIC_PROCESS")
  ) %>%
  group_by(gene_id, gene_name) %>%
  tally() %>%
  ungroup() %>%
  dplyr::filter(n == 1) %>%
  dplyr::select(
    gene_id, gene_name
  ) %>%
  left_join(
    cpmPostNorm %>%
      as.data.frame() %>%
      rownames_to_column("gene_id")
  ) %>%
  dplyr::select(gene_name, starts_with("Ps")) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    labels_col = hmHeat %>%
      dplyr::select(starts_with("Ps")) %>% 
      colnames() %>% 
      enframe(name = NULL) %>% 
      dplyr::rename(sample = value) %>%
      left_join(dgeList$samples) %>%
      dplyr::select(sample, sampleID) %>%
      with(
        structure(sampleID, names = sample)
      ),
    cutree_rows = 4,
    cutree_cols = 2
  )

All DE genes associated with the GO term ‘Homeostatic Process’.

Version	Author	Date
7fe1505	Steve Ped	2020-03-05

Genes Associated with Gene Regulation

go %>%
  dplyr::filter(
    gene_id %in% dplyr::filter(deTable, DE)$gene_id,
    gs_name %in% c("GO_POSITIVE_REGULATION_OF_GENE_EXPRESSION")
  ) %>%
  group_by(gene_id, gene_name) %>%
  tally() %>%
  ungroup() %>%
  dplyr::filter(n == 1) %>%
  dplyr::select(
    gene_id, gene_name
  ) %>%
  left_join(
    cpmPostNorm %>%
      as.data.frame() %>%
      rownames_to_column("gene_id")
  ) %>%
  dplyr::select(gene_name, starts_with("Ps")) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    labels_col = hmHeat %>%
      dplyr::select(starts_with("Ps")) %>% 
      colnames() %>% 
      enframe(name = NULL) %>% 
      dplyr::rename(sample = value) %>%
      left_join(dgeList$samples) %>%
      dplyr::select(sample, sampleID) %>%
      with(
        structure(sampleID, names = sample)
      ),
    cutree_rows = 4,
    cutree_cols = 2
  )

All DE genes associated with the GO term ‘Positive Regulation of Gene Expression’.

Version	Author	Date
7fe1505	Steve Ped	2020-03-05

Data Export

All enriched gene sets with an FDR adjusted p-value < 0.05 were exported as a single csv file.

add_prefix <- function(x, pre = "p_"){
  paste0(pre, x)
}
bind_rows(
  hmMeta,
  kgMeta,
  goMeta
) %>%
  dplyr::filter(FDR < 0.05) %>%
  mutate(
    DE = lapply(DE, function(x){dplyr::filter(deTable, gene_id %in% x)$gene_name}),
    DE = lapply(DE, unique),
    DE = vapply(DE, paste, character(1), collapse = ";")
  ) %>%
  arrange(wilkinson_p) %>%
  dplyr::select(
    `Gene Set` = gs_name, 
    `Nbr Detected Genes` = gs_size, 
    `Nbr DE Genes` = nDE, 
    combined = wilkinson_p, FDR, 
    fry, camera, gsea, goseq, 
    `DE Genes` = DE
  ) %>%
  rename_at(
    vars(combined, fry, camera, gsea, goseq),
    add_prefix
  ) %>%
  write_csv(
    here::here("output", "Enrichment_Mutant_V_WT.csv")
  )

devtools::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.3 (2020-02-29)
 os       Ubuntu 18.04.4 LTS          
 system   x86_64, linux-gnu           
 ui       X11                         
 language en_AU:en                    
 collate  en_AU.UTF-8                 
 ctype    en_AU.UTF-8                 
 tz       Australia/Adelaide          
 date     2020-03-05                  

─ Packages ───────────────────────────────────────────────────────────────────
 package              * version    date       lib source        
 AnnotationDbi        * 1.48.0     2019-10-29 [2] Bioconductor  
 askpass                1.1        2019-01-13 [2] CRAN (R 3.6.0)
 assertthat             0.2.1      2019-03-21 [2] CRAN (R 3.6.0)
 backports              1.1.5      2019-10-02 [2] CRAN (R 3.6.1)
 BiasedUrn            * 1.07       2015-12-28 [2] CRAN (R 3.6.1)
 bibtex                 0.4.2.2    2020-01-02 [2] CRAN (R 3.6.2)
 Biobase              * 2.46.0     2019-10-29 [2] Bioconductor  
 BiocFileCache          1.10.2     2019-11-08 [2] Bioconductor  
 BiocGenerics         * 0.32.0     2019-10-29 [2] Bioconductor  
 BiocParallel           1.20.1     2019-12-21 [2] Bioconductor  
 biomaRt                2.42.0     2019-10-29 [2] Bioconductor  
 Biostrings             2.54.0     2019-10-29 [2] Bioconductor  
 bit                    1.1-15.2   2020-02-10 [2] CRAN (R 3.6.2)
 bit64                  0.9-7      2017-05-08 [2] CRAN (R 3.6.0)
 bitops                 1.0-6      2013-08-17 [2] CRAN (R 3.6.0)
 blob                   1.2.1      2020-01-20 [2] CRAN (R 3.6.2)
 broom                  0.5.4      2020-01-27 [2] CRAN (R 3.6.2)
 callr                  3.4.2      2020-02-12 [2] CRAN (R 3.6.2)
 cellranger             1.1.0      2016-07-27 [2] CRAN (R 3.6.0)
 cli                    2.0.1      2020-01-08 [2] CRAN (R 3.6.2)
 cluster                2.1.0      2019-06-19 [2] CRAN (R 3.6.1)
 codetools              0.2-16     2018-12-24 [4] CRAN (R 3.6.0)
 colorspace             1.4-1      2019-03-18 [2] CRAN (R 3.6.0)
 crayon                 1.3.4      2017-09-16 [2] CRAN (R 3.6.0)
 curl                   4.3        2019-12-02 [2] CRAN (R 3.6.2)
 data.table             1.12.8     2019-12-09 [2] CRAN (R 3.6.2)
 DBI                    1.1.0      2019-12-15 [2] CRAN (R 3.6.2)
 dbplyr                 1.4.2      2019-06-17 [2] CRAN (R 3.6.0)
 DelayedArray           0.12.2     2020-01-06 [2] Bioconductor  
 desc                   1.2.0      2018-05-01 [2] CRAN (R 3.6.0)
 devtools               2.2.2      2020-02-17 [2] CRAN (R 3.6.2)
 digest                 0.6.25     2020-02-23 [2] CRAN (R 3.6.2)
 dplyr                * 0.8.4      2020-01-31 [2] CRAN (R 3.6.2)
 edgeR                * 3.28.1     2020-02-26 [2] Bioconductor  
 ellipsis               0.3.0      2019-09-20 [2] CRAN (R 3.6.1)
 evaluate               0.14       2019-05-28 [2] CRAN (R 3.6.0)
 FactoMineR             2.2        2020-02-05 [2] CRAN (R 3.6.2)
 fansi                  0.4.1      2020-01-08 [2] CRAN (R 3.6.2)
 farver                 2.0.3      2020-01-16 [2] CRAN (R 3.6.2)
 fastmatch              1.1-0      2017-01-28 [2] CRAN (R 3.6.0)
 fgsea                * 1.12.0     2019-10-29 [2] Bioconductor  
 flashClust             1.01-2     2012-08-21 [2] CRAN (R 3.6.1)
 forcats              * 0.4.0      2019-02-17 [2] CRAN (R 3.6.0)
 fs                     1.3.1      2019-05-06 [2] CRAN (R 3.6.0)
 gbRd                   0.4-11     2012-10-01 [2] CRAN (R 3.6.0)
 geneLenDataBase      * 1.22.0     2019-11-05 [2] Bioconductor  
 generics               0.0.2      2018-11-29 [2] CRAN (R 3.6.0)
 GenomeInfoDb           1.22.0     2019-10-29 [2] Bioconductor  
 GenomeInfoDbData       1.2.2      2019-11-21 [2] Bioconductor  
 GenomicAlignments      1.22.1     2019-11-12 [2] Bioconductor  
 GenomicFeatures        1.38.2     2020-02-15 [2] Bioconductor  
 GenomicRanges          1.38.0     2019-10-29 [2] Bioconductor  
 ggdendro               0.1-20     2016-04-27 [2] CRAN (R 3.6.0)
 ggplot2              * 3.2.1      2019-08-10 [2] CRAN (R 3.6.1)
 ggrepel                0.8.1      2019-05-07 [2] CRAN (R 3.6.0)
 git2r                  0.26.1     2019-06-29 [2] CRAN (R 3.6.1)
 glue                   1.3.1      2019-03-12 [2] CRAN (R 3.6.0)
 GO.db                  3.10.0     2019-11-21 [2] Bioconductor  
 goseq                * 1.38.0     2019-10-29 [2] Bioconductor  
 gridExtra              2.3        2017-09-09 [2] CRAN (R 3.6.0)
 gtable                 0.3.0      2019-03-25 [2] CRAN (R 3.6.0)
 haven                  2.2.0      2019-11-08 [2] CRAN (R 3.6.1)
 here                   0.1        2017-05-28 [2] CRAN (R 3.6.0)
 highr                  0.8        2019-03-20 [2] CRAN (R 3.6.0)
 hms                    0.5.3      2020-01-08 [2] CRAN (R 3.6.2)
 htmltools              0.4.0      2019-10-04 [2] CRAN (R 3.6.1)
 htmlwidgets            1.5.1      2019-10-08 [2] CRAN (R 3.6.1)
 httpuv                 1.5.2      2019-09-11 [2] CRAN (R 3.6.1)
 httr                   1.4.1      2019-08-05 [2] CRAN (R 3.6.1)
 hwriter                1.3.2      2014-09-10 [2] CRAN (R 3.6.0)
 IRanges              * 2.20.2     2020-01-13 [2] Bioconductor  
 jpeg                   0.1-8.1    2019-10-24 [2] CRAN (R 3.6.1)
 jsonlite               1.6.1      2020-02-02 [2] CRAN (R 3.6.2)
 kableExtra             1.1.0      2019-03-16 [2] CRAN (R 3.6.1)
 knitr                  1.28       2020-02-06 [2] CRAN (R 3.6.2)
 labeling               0.3        2014-08-23 [2] CRAN (R 3.6.0)
 later                  1.0.0      2019-10-04 [2] CRAN (R 3.6.1)
 lattice                0.20-40    2020-02-19 [4] CRAN (R 3.6.2)
 latticeExtra           0.6-29     2019-12-19 [2] CRAN (R 3.6.2)
 lazyeval               0.2.2      2019-03-15 [2] CRAN (R 3.6.0)
 leaps                  3.1        2020-01-16 [2] CRAN (R 3.6.2)
 lifecycle              0.1.0      2019-08-01 [2] CRAN (R 3.6.1)
 limma                * 3.42.2     2020-02-03 [2] Bioconductor  
 locfit                 1.5-9.1    2013-04-20 [2] CRAN (R 3.6.0)
 lubridate              1.7.4      2018-04-11 [2] CRAN (R 3.6.0)
 magrittr             * 1.5        2014-11-22 [2] CRAN (R 3.6.0)
 MASS                   7.3-51.5   2019-12-20 [4] CRAN (R 3.6.2)
 Matrix                 1.2-18     2019-11-27 [2] CRAN (R 3.6.1)
 matrixStats            0.55.0     2019-09-07 [2] CRAN (R 3.6.1)
 memoise                1.1.0      2017-04-21 [2] CRAN (R 3.6.0)
 metap                * 1.3        2020-01-23 [2] CRAN (R 3.6.2)
 mgcv                   1.8-31     2019-11-09 [4] CRAN (R 3.6.1)
 mnormt                 1.5-6      2020-02-03 [2] CRAN (R 3.6.2)
 modelr                 0.1.6      2020-02-22 [2] CRAN (R 3.6.2)
 msigdbr              * 7.0.1      2019-09-04 [2] CRAN (R 3.6.2)
 multcomp               1.4-12     2020-01-10 [2] CRAN (R 3.6.2)
 multtest               2.42.0     2019-10-29 [2] Bioconductor  
 munsell                0.5.0      2018-06-12 [2] CRAN (R 3.6.0)
 mutoss                 0.1-12     2017-12-04 [2] CRAN (R 3.6.2)
 mvtnorm                1.1-0      2020-02-24 [2] CRAN (R 3.6.2)
 ngsReports           * 1.1.2      2019-10-16 [1] Bioconductor  
 nlme                   3.1-144    2020-02-06 [4] CRAN (R 3.6.2)
 numDeriv               2016.8-1.1 2019-06-06 [2] CRAN (R 3.6.2)
 openssl                1.4.1      2019-07-18 [2] CRAN (R 3.6.1)
 pander               * 0.6.3      2018-11-06 [2] CRAN (R 3.6.0)
 pheatmap             * 1.0.12     2019-01-04 [2] CRAN (R 3.6.0)
 pillar                 1.4.3      2019-12-20 [2] CRAN (R 3.6.2)
 pkgbuild               1.0.6      2019-10-09 [2] CRAN (R 3.6.1)
 pkgconfig              2.0.3      2019-09-22 [2] CRAN (R 3.6.1)
 pkgload                1.0.2      2018-10-29 [2] CRAN (R 3.6.0)
 plotly                 4.9.2      2020-02-12 [2] CRAN (R 3.6.2)
 plotrix                3.7-7      2019-12-05 [2] CRAN (R 3.6.2)
 plyr                   1.8.5      2019-12-10 [2] CRAN (R 3.6.2)
 png                    0.1-7      2013-12-03 [2] CRAN (R 3.6.0)
 prettyunits            1.1.1      2020-01-24 [2] CRAN (R 3.6.2)
 processx               3.4.2      2020-02-09 [2] CRAN (R 3.6.2)
 progress               1.2.2      2019-05-16 [2] CRAN (R 3.6.0)
 promises               1.1.0      2019-10-04 [2] CRAN (R 3.6.1)
 ps                     1.3.2      2020-02-13 [2] CRAN (R 3.6.2)
 purrr                * 0.3.3      2019-10-18 [2] CRAN (R 3.6.1)
 R6                     2.4.1      2019-11-12 [2] CRAN (R 3.6.1)
 rappdirs               0.3.1      2016-03-28 [2] CRAN (R 3.6.0)
 RColorBrewer         * 1.1-2      2014-12-07 [2] CRAN (R 3.6.0)
 Rcpp                 * 1.0.3      2019-11-08 [2] CRAN (R 3.6.1)
 RCurl                  1.98-1.1   2020-01-19 [2] CRAN (R 3.6.2)
 Rdpack                 0.11-1     2019-12-14 [2] CRAN (R 3.6.2)
 readr                * 1.3.1      2018-12-21 [2] CRAN (R 3.6.0)
 readxl                 1.3.1      2019-03-13 [2] CRAN (R 3.6.0)
 remotes                2.1.1      2020-02-15 [2] CRAN (R 3.6.2)
 reprex                 0.3.0      2019-05-16 [2] CRAN (R 3.6.0)
 reshape2               1.4.3      2017-12-11 [2] CRAN (R 3.6.0)
 rlang                  0.4.4      2020-01-28 [2] CRAN (R 3.6.2)
 rmarkdown              2.1        2020-01-20 [2] CRAN (R 3.6.2)
 rprojroot              1.3-2      2018-01-03 [2] CRAN (R 3.6.0)
 Rsamtools              2.2.3      2020-02-23 [2] Bioconductor  
 RSQLite                2.2.0      2020-01-07 [2] CRAN (R 3.6.2)
 rstudioapi             0.11       2020-02-07 [2] CRAN (R 3.6.2)
 rtracklayer            1.46.0     2019-10-29 [2] Bioconductor  
 rvest                  0.3.5      2019-11-08 [2] CRAN (R 3.6.1)
 S4Vectors            * 0.24.3     2020-01-18 [2] Bioconductor  
 sandwich               2.5-1      2019-04-06 [2] CRAN (R 3.6.1)
 scales               * 1.1.0      2019-11-18 [2] CRAN (R 3.6.1)
 scatterplot3d          0.3-41     2018-03-14 [2] CRAN (R 3.6.1)
 sessioninfo            1.1.1      2018-11-05 [2] CRAN (R 3.6.0)
 ShortRead              1.44.3     2020-02-03 [2] Bioconductor  
 sn                     1.5-5      2020-01-30 [2] CRAN (R 3.6.2)
 statmod                1.4.34     2020-02-17 [2] CRAN (R 3.6.2)
 stringi                1.4.6      2020-02-17 [2] CRAN (R 3.6.2)
 stringr              * 1.4.0      2019-02-10 [2] CRAN (R 3.6.0)
 SummarizedExperiment   1.16.1     2019-12-19 [2] Bioconductor  
 survival               3.1-8      2019-12-03 [2] CRAN (R 3.6.2)
 testthat               2.3.1      2019-12-01 [2] CRAN (R 3.6.1)
 TFisher                0.2.0      2018-03-21 [2] CRAN (R 3.6.2)
 TH.data                1.0-10     2019-01-21 [2] CRAN (R 3.6.1)
 tibble               * 2.1.3      2019-06-06 [2] CRAN (R 3.6.0)
 tidyr                * 1.0.2      2020-01-24 [2] CRAN (R 3.6.2)
 tidyselect             1.0.0      2020-01-27 [2] CRAN (R 3.6.2)
 tidyverse            * 1.3.0      2019-11-21 [2] CRAN (R 3.6.1)
 truncnorm              1.0-8      2018-02-27 [2] CRAN (R 3.6.0)
 UpSetR               * 1.4.0      2019-05-22 [2] CRAN (R 3.6.0)
 usethis                1.5.1      2019-07-04 [2] CRAN (R 3.6.1)
 vctrs                  0.2.3      2020-02-20 [2] CRAN (R 3.6.2)
 viridisLite            0.3.0      2018-02-01 [2] CRAN (R 3.6.0)
 webshot                0.5.2      2019-11-22 [2] CRAN (R 3.6.1)
 whisker                0.4        2019-08-28 [2] CRAN (R 3.6.1)
 withr                  2.1.2      2018-03-15 [2] CRAN (R 3.6.0)
 workflowr            * 1.6.0      2019-12-19 [2] CRAN (R 3.6.2)
 xfun                   0.12       2020-01-13 [2] CRAN (R 3.6.2)
 XML                    3.99-0.3   2020-01-20 [2] CRAN (R 3.6.2)
 xml2                   1.2.2      2019-08-09 [2] CRAN (R 3.6.1)
 XVector                0.26.0     2019-10-29 [2] Bioconductor  
 yaml                   2.2.1      2020-02-01 [2] CRAN (R 3.6.2)
 zlibbioc               1.32.0     2019-10-29 [2] Bioconductor  
 zoo                    1.8-7      2020-01-10 [2] CRAN (R 3.6.2)

[1] /home/steveped/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library

Enrichment Analysis: Mutant Vs Wild-Type

Steve Pederson

05 March, 2020

Setup

Introduction

Databases used for testing

Hallmark Gene Sets

KEGG Gene Sets

Gene Ontology Gene Sets

Enrichment in the DE Gene Set

Hallmark Gene Sets

KEGG Gene Sets

GO Gene Sets

Enrichment Testing on Ranked Lists

Hallmark Gene Sets

Exploration of significant Hallmark Gene Sets

KEGG Gene Sets

Exploration of Significant KEGG Gene Sets

GO Gene Sets

Exploration of Significant GO Gene Sets

Genes Associated with RNA Metabolic Process

Genes Associated with the Mitochondrial Envelope

Genes Associated with Ion Transport

Genes Associated with Homeostasis

Genes Associated with Gene Regulation

Data Export