Last updated: 2020-03-05

Checks: 7 0

Knit directory: 20170327_Psen2S4Ter_RNASeq/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20200119) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Unstaged changes:
    Modified:   output/Enrichment_Hom_V_Het.csv
    Modified:   output/Enrichment_Mutant_V_WT.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd 2c56cef Steve Ped 2020-03-05 Tidied column names for enrichment output
html 7fe1505 Steve Ped 2020-03-05 Updated all analyses & heatmaps after changing to cpmPostNorm instead of fit\(fitted.values for enrichment/heatmaps</td> </tr> <tr> <td>Rmd</td> <td><a href="https://github.com/UofABioinformaticsHub/20170327_Psen2S4Ter_RNASeq/blob/bad81c1731ba8bc4487d25ec2719e92ae6917e4a/analysis/3_Enrichment_MutantVsWT.Rmd" target="_blank">bad81c1</a></td> <td>Steve Ped</td> <td>2020-03-05</td> <td>Updated MutVsWt UpSet plots for KEGG/HALLMARK</td> </tr> <tr> <td>Rmd</td> <td><a href="https://github.com/UofABioinformaticsHub/20170327_Psen2S4Ter_RNASeq/blob/377410a12ca13a7e233e39fb2f77f165e846d830/analysis/3_Enrichment_MutantVsWT.Rmd" target="_blank">377410a</a></td> <td>Steve Ped</td> <td>2020-03-05</td> <td>Finished revision of Enrichment Analysis</td> </tr> <tr> <td>Rmd</td> <td><a href="https://github.com/UofABioinformaticsHub/20170327_Psen2S4Ter_RNASeq/blob/c0b89ed8f761b0c7222538c353d21cd5aa402b94/analysis/3_Enrichment_MutantVsWT.Rmd" target="_blank">c0b89ed</a></td> <td>Steve Ped</td> <td>2020-03-05</td> <td>Reran after changing to cpmPotNorm instead of the incorrect fit\)fitted.values
html 7e59d3c Steve Ped 2020-02-20 Recompiled
Rmd 4ede752 Steve Ped 2020-02-20 Added headings
html 0168824 Steve Ped 2020-02-20 Rebult after resizing
Rmd de9c4b3 Steve Ped 2020-02-20 Resized heatmaps again
html 86ffa07 Steve Ped 2020-02-20 Updated saample layout figure
Rmd 4d2af23 Steve Ped 2020-02-19 Modified heatmap sizes & initial network layout
Rmd 8fe35b0 Steve Ped 2020-02-19 Added multiple heatmaps & UpSet plots
html 679b10d Steve Ped 2020-02-19 Rendered OXPHOS Plot
Rmd 66ba4e0 Steve Ped 2020-02-19 Added OXPHOS plot
html e0288c6 Steve Ped 2020-02-19 Generated enrichment tables
Rmd 541fbf0 Steve Ped 2020-02-19 Revised Hom Vs Het Enrichment
Rmd 77f167d Steve Ped 2020-02-19 Revised enrichment
html 876e40f Steve Ped 2020-02-17 Compiled after minor corrections
Rmd 53ed0e3 Steve Ped 2020-02-17 Corrected Ensembl Release
Rmd f55be85 Steve Ped 2020-02-17 Added commas
Rmd 8f29458 Steve Ped 2020-02-17 Minor tweaks to formatting
html 9104ecd Steve Ped 2020-01-28 First draft of Hom Vs Het
Rmd 207cdc8 Steve Ped 2020-01-28 Added code for Hom Vs Het Enrichment
Rmd 468e6e3 Steve Ped 2020-01-28 Ran enrichment of Mut Vs WT
html 468e6e3 Steve Ped 2020-01-28 Ran enrichment of Mut Vs WT
Rmd 3a9933c Steve Ped 2020-01-28 Finished Enrichment analysis on MutVsWt
html 3a9933c Steve Ped 2020-01-28 Finished Enrichment analysis on MutVsWt

Setup

library(tidyverse)
library(magrittr)
library(edgeR)
library(scales)
library(pander)
library(goseq)
library(msigdbr)
library(AnnotationDbi)
library(RColorBrewer)
library(ngsReports)
library(fgsea)
library(metap)
library(UpSetR)
library(pheatmap)
theme_set(theme_bw())
panderOptions("table.split.table", Inf)
panderOptions("table.style", "rmarkdown")
panderOptions("big.mark", ",")
samples <- read_csv("data/samples.csv") %>%
  distinct(sampleName, .keep_all = TRUE) %>%
  dplyr::select(sample = sampleName, sampleID, genotype) %>%
  mutate(
    genotype = factor(genotype, levels = c("WT", "Het", "Hom")),
    mutant = genotype %in% c("Het", "Hom"),
    homozygous = genotype == "Hom"
  )
genoCols <- samples$genotype %>%
  levels() %>%
  length() %>%
  brewer.pal("Set1") %>%
  setNames(levels(samples$genotype))
dgeList <- here::here("data/dgeList.rds") %>% read_rds()
cpmPostNorm <- here::here("data/cpmPostNorm.rds") %>% read_rds()
entrezGenes <- dgeList$genes %>%
  dplyr::filter(!is.na(entrezid)) %>%
  unnest(entrezid) %>%
  dplyr::rename(entrez_gene = entrezid)
formatP <- function(p, m = 0.0001){
  out <- rep("", length(p))
  out[p < m] <- sprintf("%.2e", p[p<m])
  out[p >= m] <- sprintf("%.4f", p[p>=m])
  out
}

Introduction

Enrichment analysis for this dataset present some challenges. Despite normalisation to account for gene length and GC bias, some appeared to still be present in the final results. In addition, the confounding of incomplete rRNA removal with genotype may lead to other distortions in both DE genes and ranking statistics.

Two steps for enrichment analysis will be undertaken.

  1. Testing for enrichment within discrete sets of DE genes as defined in the previous steps
  2. Testing for enrichment within ranked lists, regardless of DE status or statistical significance, before integration of all enrichment analyses using Fisher’s method for combining p-values.

Testing for enrichment within discrete gene sets will be performed using goseq as this allows for the incorporation of a single covariate as a predictor of differential expression. GC content, gene length and correlation with rRNA removal can all be supplied as separate covariates.

Testing for enrichment with ranked lists will be performed using:

  1. fry as this can take into account inter-gene correlations. Values supplied will be fitted values for each gene/sample as these have been corrected for GC and length biases.
  2. camera, which also accommodates inter-gene correlations. Values supplied will be fitted values for each gene/sample as these have been corrected for GC and length biases.
  3. fgsea which is an R implementation of GSEA. This approach simply takes a ranked list and doesn’t directly account for correlations. However, the ranked list will be derived from analysis using CQN-normalisation so will be robust to these technical artefacts.

In the case of camera, inter-gene correlations will be calculated for each gene-set prior to analysis to ensure more conservative p-values are obtained.

Databases used for testing

Data was sourced using the msigdbr package. The initial database used for testing was the Hallmark Gene Sets, with mappings from gene-set to EntrezGene IDs performed by the package authors.

Hallmark Gene Sets

hm <- msigdbr("Danio rerio", category = "H")  %>% 
  left_join(entrezGenes) %>%
  dplyr::filter(!is.na(gene_id)) %>%
  distinct(gs_name, gene_id, .keep_all = TRUE)
hmByGene <- hm %>%
  split(f = .$gene_id) %>%
  lapply(extract2, "gs_name")
hmByID <- hm %>%
  split(f = .$gs_name) %>%
  lapply(extract2, "gene_id")

Mappings are required from gene to pathway, and Ensembl identifiers were used to map from gene to pathway, based on the mappings in the previously used annotations (Ensembl Release 98). A total of 3,459 Ensembl IDs were mapped to pathways from the hallmark gene sets.

KEGG Gene Sets

kg <- msigdbr("Danio rerio", category = "C2", subcategory = "CP:KEGG")  %>% 
  left_join(entrezGenes) %>%
  dplyr::filter(!is.na(gene_id)) %>%
  distinct(gs_name, gene_id, .keep_all = TRUE)
kgByGene <- kg  %>%
  split(f = .$gene_id) %>%
  lapply(extract2, "gs_name")
kgByID <- kg  %>%
  split(f = .$gs_name) %>%
  lapply(extract2, "gene_id")

The same mapping process was applied to KEGG gene sets. A total of 3,614 Ensembl IDs were mapped to pathways from the KEGG gene sets.

Gene Ontology Gene Sets

goSummaries <- url("https://uofabioinformaticshub.github.io/summaries2GO/data/goSummaries.RDS") %>%
  readRDS() %>%
  mutate(
    Term = Term(id),
    gs_name = Term %>% str_to_upper() %>% str_replace_all("[ -]", "_"),
    gs_name = paste0("GO_", gs_name)
    )
minPath <- 3
go <- msigdbr("Danio rerio", category = "C5") %>% 
  left_join(entrezGenes) %>%
  dplyr::filter(!is.na(gene_id)) %>%
  left_join(goSummaries) %>% 
  dplyr::filter(shortest_path >= minPath) %>%
  distinct(gs_name, gene_id, .keep_all = TRUE)
goByGene <- go %>%
  split(f = .$gene_id) %>%
  lapply(extract2, "gs_name")
goByID <- go %>%
  split(f = .$gs_name) %>%
  lapply(extract2, "gene_id")

For analysis of gene-sets from the GO database, gene-sets were restricted to those with 3 or more steps back to the ontology root terms. A total of 11,245 Ensembl IDs were mapped to pathways from restricted database of 8,834 GO gene sets.

gsSizes <- bind_rows(hm, kg, go) %>% 
  dplyr::select(gs_name, gene_symbol, gene_id) %>% 
  chop(c(gene_symbol, gene_id)) %>%
  mutate(gs_size = vapply(gene_symbol, length, integer(1)))

Enrichment in the DE Gene Set

deTable <- here::here("output", "psen2VsWT.csv") %>% 
  read_csv() %>%
  mutate(
    entrezid = dgeList$genes$entrezid[gene_id]
  )

The first step of analysis using goseq, regardless of the gene-set, is estimation of the probability weight function (PWF) which quantifies the probability of a gene being considered as DE based on a single covariate. As GC content and length should have been accounted for during conditional-quantile normalisation, these were not required for any bias. However, the gene-level correlations with rRNA contamination were instead used a predictor of bias in selection of a gene as being DE.

rawFqc <- list.files(
  path = here::here("data/0_rawData/FastQC/"),
  pattern = "zip",
  full.names = TRUE
) %>%
  FastqcDataList()
gc <- getModule(rawFqc, "Per_sequence_GC") %>%
  group_by(Filename) %>% 
  mutate(Freq = Count / sum(Count)) %>%
  ungroup()
gcDev <- gc %>%
  left_join(getGC(gcTheoretical, "Drerio", "Trans")) %>%
  mutate(
    sample = str_remove(Filename, "_R[12].fastq.gz"),
    resid = Freq - Drerio
  ) %>% 
  left_join(samples) %>%
  group_by(sample) %>%
  summarise(
    ss = sum(resid^2),
    n = n(),
    sd = sqrt(ss / (n - 1))
  )
riboVec <- structure(gcDev$sd, names = gcDev$sample)
riboCors <- cpm(dgeList, log = TRUE) %>%
  apply(1, function(x){
    cor(x, riboVec[names(x)])
  })

Values were calculated as per the previous steps, using the logCPM values for each gene, with the sample-level standard deviations from the theoretical GC distribution being used as a measure of rRNA contamination. For estimation of the probability weight function, squared correlations were used to place negative and positive correlations on the same scale. This accounts for genes which are both negatively & positively biased by the presence of excessive rRNA. Clearly, the confounding of genotype with rRNA means that some genes driven by the genuine biology may be down-weighted under this approach.

riboPwf <- deTable %>%
  mutate(riboCors = riboCors[gene_id]^2) %>%
  dplyr::select(gene_id, DE, riboCors) %>%
  distinct(gene_id, .keep_all = TRUE) %>% 
  with(
    nullp(
      DEgenes = structure(
        as.integer(DE), names = gene_id
      ), 
      genome = "danRer10", 
      id = "ensGene", 
      bias.data = riboCors,
      plot.fit = FALSE
    )
  )
plotPWF(riboPwf, main = "Bias from rRNA proportions")
*Using this approach, it was clear that correlation with rRNA proportions significantly biased the probability of a gene being considered as DE.*

Using this approach, it was clear that correlation with rRNA proportions significantly biased the probability of a gene being considered as DE.

Version Author Date
468e6e3 Steve Ped 2020-01-28

All gene-sets were then tested using this PWF.

Hallmark Gene Sets

hmRiboGoseq <- goseq(riboPwf, gene2cat = hmByGene) %>%
  as_tibble %>%
  dplyr::filter(numDEInCat > 0) %>%
  mutate( 
    adjP = p.adjust(over_represented_pvalue, method = "bonf"),
    FDR = p.adjust(over_represented_pvalue, method = "fdr")
  ) %>%
  dplyr::select(-contains("under")) %>%
  dplyr::rename(
    gs_name = category,
    PValue = over_represented_pvalue
  )

No gene-sets achieved significance in the DE genes with the lowest FDR being 41%

KEGG Gene Sets

kgRiboGoseq <- goseq(riboPwf, gene2cat = kgByGene) %>%
  as_tibble %>%
  dplyr::filter(numDEInCat > 0) %>%
  mutate( 
    adjP = p.adjust(over_represented_pvalue, method = "bonf"),
    FDR = p.adjust(over_represented_pvalue, method = "fdr")
  ) %>%
  dplyr::select(-contains("under"))  %>%
  dplyr::rename(
    gs_name = category,
    PValue = over_represented_pvalue
  )
kgRiboGoseq %>%
  dplyr::slice(1:5) %>%
  mutate(
    p = formatP(PValue),
    adjP = formatP(adjP),
    FDR = formatP(FDR)
  ) %>%
  dplyr::select(
    `Gene Set` = gs_name,
    DE = numDEInCat,
    `Set Size` = numInCat,
    PValue,
    `p~bonf~` = adjP,
    `p~FDR~` = FDR
  ) %>%
  pander(
    justify = "lrrrrr",
    caption = paste(
      "The", nrow(.), "most highly-ranked KEGG pathways.",
      "Bonferroni-adjusted p-values are the most stringent and give high",
      "confidence when below 0.05."
    )
  )
The 5 most highly-ranked KEGG pathways. Bonferroni-adjusted p-values are the most stringent and give high confidence when below 0.05.
Gene Set DE Set Size PValue pbonf pFDR
KEGG_RIBOSOME 26 80 4.754e-09 5.32e-07 5.32e-07
KEGG_PRIMARY_IMMUNODEFICIENCY 2 15 0.01642 1.0000 0.7173
KEGG_CYSTEINE_AND_METHIONINE_METABOLISM 4 26 0.02959 1.0000 0.7173
KEGG_ASTHMA 1 3 0.04088 1.0000 0.7173
KEGG_RETINOL_METABOLISM 3 25 0.0512 1.0000 0.7173

Notably, the KEGG gene-set for Ribosomal genes was detected as enriched in the set of DE genes, with no other KEGG gene-sets being considered significant.

GO Gene Sets

goRiboGoseq <- goseq(riboPwf, gene2cat = goByGene) %>%
  as_tibble %>%
  dplyr::filter(numDEInCat > 0) %>%
  mutate( 
    adjP = p.adjust(over_represented_pvalue, method = "bonf"),
    FDR = p.adjust(over_represented_pvalue, method = "fdr")
  ) %>%
  dplyr::select(-contains("under")) %>%
  dplyr::rename(
    gs_name = category,
    PValue = over_represented_pvalue
  )
goRiboGoseq %>%
  dplyr::filter(adjP < 0.05) %>%
  mutate(
    p = formatP(PValue),
    adjP = formatP(adjP),
    FDR = formatP(FDR)
  ) %>%
  dplyr::select(
    `Gene Set` = gs_name,
    DE = numDEInCat,
    `Set Size` = numInCat,
    PValue,
    `p~bonf~` = adjP,
    `p~FDR~` = FDR
  ) %>%
  pander(
    justify = "lrrrrr",
    caption = paste(
      "*The", nrow(.), "most highly-ranked GO terms.",
      "Bonferroni-adjusted p-values are the most stringent and give high",
      "confidence when below 0.05, with all terms here reaching this threshold.",
      "However, most terms once again indicate the presence of rRNA.*"
    )
  )
The 14 most highly-ranked GO terms. Bonferroni-adjusted p-values are the most stringent and give high confidence when below 0.05, with all terms here reaching this threshold. However, most terms once again indicate the presence of rRNA.
Gene Set DE Set Size PValue pbonf pFDR
GO_COTRANSLATIONAL_PROTEIN_TARGETING_TO_MEMBRANE 28 93 7.36e-10 2.83e-06 2.03e-06
GO_CYTOSOLIC_RIBOSOME 27 97 1.056e-09 4.06e-06 2.03e-06
GO_ESTABLISHMENT_OF_PROTEIN_LOCALIZATION_TO_ENDOPLASMIC_RETICULUM 27 105 1.6e-08 6.16e-05 2.05e-05
GO_TRANSLATIONAL_INITIATION 29 177 2.178e-07 0.0008 0.0002
GO_PROTEIN_TARGETING_TO_MEMBRANE 30 171 2.533e-07 0.0010 0.0002
GO_PROTEIN_LOCALIZATION_TO_ENDOPLASMIC_RETICULUM 27 128 3.849e-07 0.0015 0.0002
GO_CYTOSOLIC_PART 30 206 4.424e-07 0.0017 0.0002
GO_VIRAL_GENE_EXPRESSION 29 172 6.939e-07 0.0027 0.0003
GO_NUCLEAR_TRANSCRIBED_MRNA_CATABOLIC_PROCESS 30 183 7.911e-07 0.0030 0.0003
GO_RIBOSOMAL_SUBUNIT 28 175 1.99e-06 0.0077 0.0008
GO_RIBOSOME 30 210 2.659e-06 0.0102 0.0009
GO_CYTOSOLIC_LARGE_RIBOSOMAL_SUBUNIT 15 51 2.826e-06 0.0109 0.0009
GO_SYMPORTER_ACTIVITY 10 94 7.571e-06 0.0291 0.0022
GO_ANION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY 17 228 1.23e-05 0.0473 0.0034

Enrichment Testing on Ranked Lists

rnk <- structure(
  -sign(deTable$logFC)*log10(deTable$PValue), 
  names = deTable$gene_id
) %>% sort()
np <- 1e5

Genes were ranked by -sign(logFC)*log10(p) for approaches which required a ranked list. Multiple approaches were first calculated individually, before being combined for the final integrated set of results.

Hallmark Gene Sets

hmFry <- cpmPostNorm %>%
  fry(
    index = hmByID,
    design = dgeList$design,
    contrast = "mutant",
    sort = "directional"
    ) %>%
  rownames_to_column("gs_name") %>%
  as_tibble()

For analysis under camera when inter-gene correlations were calculated for a more conservative result.

hmCamera <- cpmPostNorm %>%
  camera(
    index = hmByID,
    design = dgeList$design,
    contrast = "mutant",
    inter.gene.cor = NULL
    ) %>%
  rownames_to_column("gs_name") %>%
  as_tibble()

For generation of the GSEA ranked list, 100,000 permutations were conducted.

hmGsea <- fgsea(
  pathways = hmByID, 
  stats = rnk,
  nperm = np
) %>%
  as_tibble() %>%
  dplyr::rename(gs_name = pathway, PValue = pval) %>%
  arrange(PValue)

Results for all analyses, including goseq were then combined using Wilkinson’s method to combine p-values. For a conservative approach, under \(m\) tests, the \(m - 1^{\text{th}}\) smallest p-value was chosen.

hmMeta <- hmFry %>%
  dplyr::select(gs_name, fry = PValue) %>%
  left_join(
    dplyr::select(hmCamera, gs_name, camera = PValue)
  ) %>%
  left_join(
    dplyr::select(hmGsea, gs_name, gsea = PValue)
  ) %>%
  left_join(
    dplyr::select(hmRiboGoseq, gs_name, goseq = PValue)
  ) %>% 
  nest(p = one_of(c("fry", "camera", "gsea", "goseq"))) %>%
  mutate(
    n_p = vapply(p, function(x){sum(!is.na(unlist(x)))}, integer(1)), 
    wilkinson_p = vapply(p, function(x){
      x <- unlist(x)
      x <- x[!is.na(x)]
      wilkinsonp(x, length(x) - 1)$p
    }, numeric(1)),
    FDR = p.adjust(wilkinson_p, "fdr"), 
    adjP = p.adjust(wilkinson_p, "bonferroni")
  ) %>% 
  arrange(wilkinson_p) %>% 
  unnest(p) %>%
  left_join(gsSizes) %>%
  mutate(
    DE = lapply(gene_id, intersect, dplyr::filter(deTable, DE)$gene_id),
    DE = lapply(DE, unique),
    nDE = vapply(DE, length, integer(1))
  )
hmMeta %>%
  dplyr::filter(FDR < 0.05) %>%
  mutate_at(vars(one_of(c("wilkinson_p", "FDR", "adjP"))), formatP) %>%
  dplyr::select(`Gene Set` = gs_name, `Number DE` = nDE, `Set Size` = gs_size, `Wilkinson~p~` = wilkinson_p, `p~FDR~` = FDR, `p~bonf~` = adjP) %>%
  pander(
    caption = "Results from combining all above approaches for the Hallmark Gene Sets. All terms are significant to an FDR of 0.05.",
    justify = "lrrrrr"
  )
Results from combining all above approaches for the Hallmark Gene Sets. All terms are significant to an FDR of 0.05.
Gene Set Number DE Set Size Wilkinsonp pFDR pbonf
HALLMARK_UV_RESPONSE_UP 7 141 5.90e-08 2.95e-06 2.95e-06
HALLMARK_XENOBIOTIC_METABOLISM 11 146 1.87e-05 0.0005 0.0009
HALLMARK_MTORC1_SIGNALING 11 194 9.13e-05 0.0012 0.0046
HALLMARK_FATTY_ACID_METABOLISM 6 135 9.46e-05 0.0012 0.0047
HALLMARK_GLYCOLYSIS 8 172 0.0002 0.0015 0.0084
HALLMARK_PEROXISOME 2 95 0.0002 0.0015 0.0115
HALLMARK_COAGULATION 3 80 0.0003 0.0015 0.0127
HALLMARK_MYC_TARGETS_V1 15 192 0.0003 0.0015 0.0138
HALLMARK_CHOLESTEROL_HOMEOSTASIS 2 69 0.0003 0.0015 0.0153
HALLMARK_REACTIVE_OXYGEN_SPECIES_PATHWAY 5 45 0.0003 0.0015 0.0158
HALLMARK_BILE_ACID_METABOLISM 1 86 0.0003 0.0015 0.0169
HALLMARK_ESTROGEN_RESPONSE_LATE 3 153 0.0004 0.0016 0.0188
HALLMARK_ADIPOGENESIS 13 186 0.0007 0.0027 0.0355
HALLMARK_HYPOXIA 9 169 0.0009 0.0032 0.0451
HALLMARK_PROTEIN_SECRETION 3 92 0.0009 0.0032 0.0475
HALLMARK_UNFOLDED_PROTEIN_RESPONSE 4 113 0.0017 0.0054 0.0864
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 7 156 0.0020 0.0060 0.1020
HALLMARK_TNFA_SIGNALING_VIA_NFKB 7 149 0.0037 0.0099 0.1841
HALLMARK_IL2_STAT5_SIGNALING 2 147 0.0038 0.0099 0.1889
HALLMARK_OXIDATIVE_PHOSPHORYLATION 15 201 0.0041 0.0101 0.2028
HALLMARK_NOTCH_SIGNALING 0 29 0.0135 0.0317 0.6744
HALLMARK_ANGIOGENESIS 1 26 0.0140 0.0317 0.6985
HALLMARK_ANDROGEN_RESPONSE 2 90 0.0222 0.0482 1.0000

Exploration of significant Hallmark Gene Sets

hmMeta %>% 
  dplyr::filter(nDE > 0, FDR < 0.05) %>%
  dplyr::select(gs_name, DE) %>%
  unnest(DE) %>% 
  mutate(
    gs_name = str_remove(gs_name, "HALLMARK_"),
    gs_name = fct_lump(gs_name, n = 12)
    ) %>% 
  split(f = .$gs_name) %>% 
  lapply(magrittr::extract2, "DE") %>% 
  fromList() %>% 
  upset(
    nsets = length(.), 
    order.by = "freq", 
    mb.ratio = c(0.6, 0.4)
    )
*UpSet plot indicating distribution of DE genes within all significant HALLMARK gene sets. Most gene sets seem relatively independent of each other.*

UpSet plot indicating distribution of DE genes within all significant HALLMARK gene sets. Most gene sets seem relatively independent of each other.

Version Author Date
7fe1505 Steve Ped 2020-03-05
7e59d3c Steve Ped 2020-02-20
86ffa07 Steve Ped 2020-02-20
hmHeat <- hmMeta %>% 
  dplyr::filter(nDE > 10, FDR < 0.05) %>%
  dplyr::select(gs_name, DE) %>%
  unnest(DE) %>%
  left_join(dgeList$genes, by = c("DE" = "gene_id")) %>%
  dplyr::select(gs_name, DE, gene_name) %>%
  mutate(belongs = TRUE) %>%
  pivot_wider(
    id_cols = c(DE, gene_name),
    names_from = gs_name,
    values_from = belongs,
    values_fill = list(belongs = FALSE)
  ) %>%
  left_join(
    cpmPostNorm %>%
      as.data.frame() %>%
      rownames_to_column("DE")
  )
hmHeat %>%
  dplyr::select(gene_name, starts_with("Ps")) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    labels_col = hmHeat %>%
      dplyr::select(starts_with("Ps")) %>% 
      colnames() %>% 
      enframe(name = NULL) %>% 
      dplyr::rename(sample = value) %>%
      left_join(dgeList$samples) %>%
      dplyr::select(sample, sampleID) %>%
      with(
        structure(sampleID, names = sample)
      ),
    cutree_rows = 6,
    cutree_cols = 2,
    annotation_names_row = FALSE,
    annotation_row = hmHeat %>%
      dplyr::select(gene_name, starts_with("HALLMARK_")) %>%
      mutate_at(vars(starts_with("HALL")), as.character) %>%
      as.data.frame() %>%
      column_to_rownames("gene_name") %>%
      set_colnames(
        str_remove(colnames(.), "HALLMARK_")
      )
  )
*Gene expression patterns for all DE genes in HALLMARK gene sets, containing more than 10 DE genes.*

Gene expression patterns for all DE genes in HALLMARK gene sets, containing more than 10 DE genes.

Version Author Date
7fe1505 Steve Ped 2020-03-05
86ffa07 Steve Ped 2020-02-20

KEGG Gene Sets

kgFry <- cpmPostNorm %>%
  fry(
    index = kgByID,
    design = dgeList$design,
    contrast = "mutant",
    sort = "directional"
    ) %>%
  rownames_to_column("gs_name") %>%
  as_tibble()

For analysis under camera when inter-gene correlations were calculated for a more conservative result.

kgCamera <- cpmPostNorm %>%
  camera(
    index = kgByID,
    design = dgeList$design,
    contrast = "mutant",
    inter.gene.cor = NULL
    ) %>%
  rownames_to_column("gs_name") %>%
  as_tibble()

For generation of the GSEA ranked list, 100,000 permutations were conducted.

kgGsea <- fgsea(
  pathways = kgByID, 
  stats = rnk,
  nperm = np
) %>%
  as_tibble() %>%
  dplyr::rename(gs_name = pathway, PValue = pval) %>%
  arrange(PValue)

Results for all analyses, including goseq were then combined using Wilkinson’s method to combine p-values. For a conservative approach, under \(m\) tests, the \(m - 1^{\text{th}}\) smallest p-value was chosen.

kgMeta <- kgFry %>%
  dplyr::select(gs_name, fry = PValue) %>%
  left_join(
    dplyr::select(kgCamera, gs_name, camera = PValue)
  ) %>%
  left_join(
    dplyr::select(kgGsea, gs_name, gsea = PValue)
  ) %>%
  left_join(
    dplyr::select(kgRiboGoseq, gs_name, goseq = PValue)
  ) %>% 
  nest(p = one_of(c("fry", "camera", "gsea", "goseq"))) %>%
  mutate(
    n_p = vapply(p, function(x){sum(!is.na(unlist(x)))}, integer(1)), 
    wilkinson_p = vapply(p, function(x){
      x <- unlist(x)
      x <- x[!is.na(x)]
      wilkinsonp(x, length(x) - 1)$p
    }, numeric(1)),
    FDR = p.adjust(wilkinson_p, "fdr"), 
    adjP = p.adjust(wilkinson_p, "bonferroni")
  ) %>% 
  arrange(wilkinson_p) %>% 
  unnest(p) %>%
  left_join(gsSizes) %>%
  mutate(
    DE = lapply(gene_id, intersect, dplyr::filter(deTable, DE)$gene_id),
    DE = lapply(DE, unique),
    nDE = vapply(DE, length, integer(1))
  )
kgMeta %>%
  dplyr::filter(FDR < 0.05, nDE > 2) %>%
  mutate_at(vars(one_of(c("wilkinson_p", "FDR", "adjP"))), formatP) %>%
  dplyr::select(`Gene Set` = gs_name, `Number DE` = nDE, `Set Size` = gs_size, `Wilkinson~p~` = wilkinson_p, `p~FDR~` = FDR, `p~bonf~` = adjP) %>%
  pander(
    caption = "Results from combining all above approaches for the KEGG Gene Sets. All terms are significant to an FDR of 0.05. Only Gene Sets with at least three DE genes are shown.",
    justify = "lrrrrr"
  )
Results from combining all above approaches for the KEGG Gene Sets. All terms are significant to an FDR of 0.05. Only Gene Sets with at least three DE genes are shown.
Gene Set Number DE Set Size Wilkinsonp pFDR pbonf
KEGG_RIBOSOME 26 80 8.32e-09 1.55e-06 1.55e-06
KEGG_GLUTATHIONE_METABOLISM 4 36 1.22e-06 7.56e-05 0.0002
KEGG_CYSTEINE_AND_METHIONINE_METABOLISM 4 26 0.0001 0.0013 0.0189
KEGG_METABOLISM_OF_XENOBIOTICS_BY_CYTOCHROME_P450 3 26 0.0002 0.0020 0.0393
KEGG_DRUG_METABOLISM_CYTOCHROME_P450 3 26 0.0003 0.0021 0.0487
KEGG_RETINOL_METABOLISM 3 25 0.0005 0.0036 0.0960
KEGG_PARKINSONS_DISEASE 13 112 0.0006 0.0040 0.1122
KEGG_PURINE_METABOLISM 4 135 0.0007 0.0044 0.1310
KEGG_HUNTINGTONS_DISEASE 15 159 0.0027 0.0125 0.5003
KEGG_PATHOGENIC_ESCHERICHIA_COLI_INFECTION 3 44 0.0036 0.0148 0.6616
KEGG_ALZHEIMERS_DISEASE 12 141 0.0039 0.0159 0.7294
KEGG_OXIDATIVE_PHOSPHORYLATION 14 119 0.0046 0.0180 0.8633
KEGG_HEMATOPOIETIC_CELL_LINEAGE 3 31 0.0047 0.0180 0.8812
KEGG_GLYCOLYSIS_GLUCONEOGENESIS 3 39 0.0067 0.0229 1.0000
KEGG_CARDIAC_MUSCLE_CONTRACTION 3 63 0.0158 0.0451 1.0000
KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS 5 86 0.0182 0.0498 1.0000

Exploration of Significant KEGG Gene Sets

kgMeta %>% 
  dplyr::filter(nDE > 2, FDR < 0.05) %>%
  dplyr::select(gs_name, DE) %>%
  unnest(DE) %>% 
  mutate(
    gs_name = str_remove(gs_name, "KEGG_"),
    gs_name = fct_lump(gs_name, n = 8)
    ) %>% 
  split(f = .$gs_name) %>% 
  lapply(magrittr::extract2, "DE") %>% 
  fromList() %>% 
  upset(
    nsets = length(.), 
    order.by = "freq",
    mb.ratio = c(0.6, 0.4)
    )
*UpSet plot indicating distribution of DE genes within all significant terms from the KEGG gene sets. There is considerable overlap between Oxidative Phosphorylation and both Parkinson's and Huntington's Disease, indicating that these terms essentially capture the same signal.*

UpSet plot indicating distribution of DE genes within all significant terms from the KEGG gene sets. There is considerable overlap between Oxidative Phosphorylation and both Parkinson’s and Huntington’s Disease, indicating that these terms essentially capture the same signal.

Version Author Date
7fe1505 Steve Ped 2020-03-05
7e59d3c Steve Ped 2020-02-20
86ffa07 Steve Ped 2020-02-20
riboOx <- kg %>%
  dplyr::filter(
    grepl("OXIDATIVE_PHOS", gs_name) | grepl("RIBOSOME", gs_name),
    gene_id %in% dplyr::filter(deTable, DE)$gene_id
    ) %>%
  mutate(gs_name = str_remove(gs_name, "(KEGG|HALLMARK)_")) %>%
  distinct(gs_name, gene_id) %>%
  left_join(
    cpmPostNorm %>% 
      as.data.frame() %>% 
      rownames_to_column("gene_id") %>%
      as_tibble()
    ) %>%
  pivot_longer(cols = starts_with("Ps"), names_to = "sample", values_to = "CPM") %>%
  left_join(dgeList$samples) %>% 
  left_join(dgeList$genes) %>%
  dplyr::select(gs_name, gene_name, sampleID, genotype, CPM) %>%
  pivot_wider(
    id_cols = c(gs_name, gene_name),
    values_from = CPM,
    names_from = sampleID
  ) 
riboOx %>% 
  dplyr::select(-gs_name) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    annotation_row = data.frame(
      GeneSet = riboOx$gs_name %>% str_replace("OXI.+", "OXPHOS"), 
      row.names = riboOx$gene_name
      ),
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    annotation_names_row = FALSE,
    cutree_rows = 3,
    cutree_cols = 2
  )
*Given that the largest gene sets within the KEGG results were Oxidative Phosphorylation and Ribosomal gene sets, all DE genes associated with these KEGG terms are displayed. All genes appear to show increased expression in mutant samples.*

Given that the largest gene sets within the KEGG results were Oxidative Phosphorylation and Ribosomal gene sets, all DE genes associated with these KEGG terms are displayed. All genes appear to show increased expression in mutant samples.

Version Author Date
7fe1505 Steve Ped 2020-03-05
86ffa07 Steve Ped 2020-02-20

GO Gene Sets

goFry <- cpmPostNorm %>%
  fry(
    index = goByID,
    design = dgeList$design,
    contrast = "mutant",
    sort = "directional"
    ) %>%
  rownames_to_column("gs_name") %>%
  as_tibble()

For analysis under camera inter-gene correlations were directly calculated for a more conservative result.

goCamera <- cpmPostNorm %>%
  camera(
    index = goByID,
    design = dgeList$design,
    contrast = "mutant",
    inter.gene.cor = NULL
    ) %>%
  rownames_to_column("gs_name") %>%
  as_tibble()

For generation of the GSEA ranked list, 100,000 permutations were conducted.

goGsea <- fgsea(
  pathways = goByID, 
  stats = rnk,
  nperm = np
) %>%
  as_tibble() %>%
  dplyr::rename(gs_name = pathway, PValue = pval) %>%
  arrange(PValue)

Results for all analyses, including goseq were then combined using Wilkinson’s method to combine p-values. For a conservative approach, under \(m\) tests, the \(m - 1^{\text{th}}\) smallest p-value was chosen.

goMeta <- goFry %>%
  dplyr::select(gs_name, fry = PValue) %>%
  left_join(
    dplyr::select(goCamera, gs_name, camera = PValue)
  ) %>%
  left_join(
    dplyr::select(goGsea, gs_name, gsea = PValue)
  ) %>%
  left_join(
    dplyr::select(goRiboGoseq, gs_name, goseq = PValue)
  )  %>% 
  nest(p = one_of(c("fry", "camera", "gsea", "goseq"))) %>%
  mutate(
    n_p = vapply(p, function(x){sum(!is.na(unlist(x)))}, integer(1)), 
    wilkinson_p = vapply(p, function(x){
      x <- unlist(x)
      x <- x[!is.na(x)]
      wilkinsonp(x, length(x) - 1)$p
    }, numeric(1)),
    FDR = p.adjust(wilkinson_p, "fdr"), 
    adjP = p.adjust(wilkinson_p, "bonferroni")
  ) %>% 
  arrange(wilkinson_p) %>% 
   unnest(p) %>%
  left_join(gsSizes) %>%
  mutate(
    DE = lapply(gene_id, intersect, dplyr::filter(deTable, DE)$gene_id),
    DE = lapply(DE, unique),
    nDE = vapply(DE, length, integer(1))
  )
goMeta %>%
  dplyr::filter(FDR < 0.01, nDE >= 10) %>%
  mutate_at(vars(one_of(c("wilkinson_p", "FDR", "adjP"))), formatP) %>%
  mutate(gs_name = str_trunc(gs_name, 70)) %>%
  dplyr::select(`Gene Set` = gs_name, `Number DE` = nDE, `Set Size` = gs_size, `Wilkinson~p~` = wilkinson_p, `p~FDR~` = FDR, `p~bonf~` = adjP) %>%
  pander(
    caption = "Results from combining all above approaches for the GO Gene Sets. All terms are significant using an FDR < 0.01. Only Gene Sets with at least ten DE genes are shown.",
    justify = "lrrrrr"
  )
Results from combining all above approaches for the GO Gene Sets. All terms are significant using an FDR < 0.01. Only Gene Sets with at least ten DE genes are shown.
Gene Set Number DE Set Size Wilkinsonp pFDR pbonf
GO_SYMPORTER_ACTIVITY 10 94 7.34e-12 6.49e-08 6.49e-08
GO_SECONDARY_ACTIVE_TRANSMEMBRANE_TRANSPORTER_ACTIVITY 13 154 2.41e-11 7.50e-08 2.13e-07
GO_ANION_TRANSMEMBRANE_TRANSPORT 15 203 2.55e-11 7.50e-08 2.25e-07
GO_ORGANIC_ANION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY 11 144 6.48e-11 1.04e-07 5.72e-07
GO_ESTABLISHMENT_OF_PROTEIN_LOCALIZATION_TO_MEMBRANE 30 287 7.06e-11 1.04e-07 6.23e-07
GO_ANION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY 17 228 1.42e-10 1.79e-07 1.26e-06
GO_ANION_TRANSPORT 22 434 3.19e-10 3.38e-07 2.81e-06
GO_PROTEIN_TARGETING_TO_MEMBRANE 30 171 2.57e-09 2.27e-06 2.27e-05
GO_PROTEIN_TARGETING 34 379 7.09e-09 5.22e-06 6.26e-05
GO_CYTOSOLIC_PART 30 206 9.18e-09 5.52e-06 8.11e-05
GO_CYTOSOLIC_SMALL_RIBOSOMAL_SUBUNIT 11 41 9.40e-09 5.52e-06 8.30e-05
GO_ESTABLISHMENT_OF_PROTEIN_LOCALIZATION_TO_ORGANELLE 38 484 1.00e-08 5.52e-06 8.84e-05
GO_TRANSLATIONAL_INITIATION 29 177 1.29e-08 6.70e-06 0.0001
GO_CYTOSOLIC_LARGE_RIBOSOMAL_SUBUNIT 15 51 1.47e-08 7.20e-06 0.0001
GO_ESTABLISHMENT_OF_PROTEIN_LOCALIZATION_TO_ENDOPLASMIC_RETICULUM 27 105 1.58e-08 7.34e-06 0.0001
GO_COTRANSLATIONAL_PROTEIN_TARGETING_TO_MEMBRANE 28 93 1.91e-08 8.46e-06 0.0002
GO_PROTEIN_LOCALIZATION_TO_ENDOPLASMIC_RETICULUM 27 128 2.36e-08 9.92e-06 0.0002
GO_CYTOSOLIC_RIBOSOME 27 97 3.13e-08 1.20e-05 0.0003
GO_ORGANIC_ACID_TRANSPORT 14 240 4.33e-08 1.59e-05 0.0004
GO_ORGANIC_CYCLIC_COMPOUND_CATABOLIC_PROCESS 40 474 5.44e-08 1.92e-05 0.0005
GO_RNA_CATABOLIC_PROCESS 34 337 7.51e-08 2.46e-05 0.0007
GO_RESPONSE_TO_DRUG 31 746 1.05e-07 3.30e-05 0.0009
GO_NUCLEAR_TRANSCRIBED_MRNA_CATABOLIC_PROCESS 30 183 1.61e-07 4.45e-05 0.0014
GO_CYTOPLASMIC_TRANSLATION 11 87 3.11e-07 7.09e-05 0.0028
GO_ORGANONITROGEN_COMPOUND_BIOSYNTHETIC_PROCESS 68 1,431 3.13e-07 7.09e-05 0.0028
GO_AMIDE_BIOSYNTHETIC_PROCESS 40 678 3.31e-07 7.31e-05 0.0029
GO_CELLULAR_AMIDE_METABOLIC_PROCESS 45 805 3.62e-07 7.79e-05 0.0032
GO_INNER_MITOCHONDRIAL_MEMBRANE_PROTEIN_COMPLEX 16 124 3.77e-07 7.94e-05 0.0033
GO_SMALL_RIBOSOMAL_SUBUNIT 11 69 3.96e-07 8.13e-05 0.0035
GO_ACTIVE_TRANSMEMBRANE_TRANSPORTER_ACTIVITY 15 254 4.19e-07 8.22e-05 0.0037
GO_ORGANIC_ANION_TRANSPORT 16 345 4.49e-07 8.63e-05 0.0040
GO_PEPTIDE_BIOSYNTHETIC_PROCESS 38 572 1.16e-06 0.0002 0.0102
GO_RESPONSE_TO_CARBOHYDRATE 11 176 1.31e-06 0.0002 0.0116
GO_RIBOSOMAL_SUBUNIT 28 175 1.34e-06 0.0002 0.0118
GO_ORGANIC_ACID_BIOSYNTHETIC_PROCESS 12 198 1.63e-06 0.0003 0.0144
GO_RRNA_BINDING 11 57 2.23e-06 0.0003 0.0197
GO_RIBOSOME 30 210 2.32e-06 0.0003 0.0205
GO_REGULATION_OF_HORMONE_LEVELS 19 362 2.48e-06 0.0003 0.0220
GO_REGULATION_OF_HORMONE_SECRETION 12 200 2.49e-06 0.0003 0.0220
GO_RESPONSE_TO_TOXIC_SUBSTANCE 20 380 3.18e-06 0.0004 0.0281
GO_PROTEIN_LOCALIZATION_TO_MEMBRANE 35 539 3.42e-06 0.0004 0.0302
GO_VIRAL_GENE_EXPRESSION 29 172 3.66e-06 0.0004 0.0323
GO_LARGE_RIBOSOMAL_SUBUNIT 17 108 3.84e-06 0.0005 0.0340
GO_SMALL_MOLECULE_BIOSYNTHETIC_PROCESS 23 478 6.98e-06 0.0007 0.0616
GO_RNA_METABOLIC_PROCESS 48 915 1.27e-05 0.0011 0.1119
GO_OXIDATIVE_PHOSPHORYLATION 15 122 1.35e-05 0.0011 0.1189
GO_MONOCARBOXYLIC_ACID_METABOLIC_PROCESS 10 315 2.46e-05 0.0016 0.2174
GO_SEQUENCE_SPECIFIC_DNA_BINDING 29 827 2.63e-05 0.0017 0.2327
GO_HOMEOSTATIC_PROCESS 55 1,368 3.07e-05 0.0018 0.2715
GO_RESPONSE_TO_OXYGEN_CONTAINING_COMPOUND 43 1,156 3.17e-05 0.0018 0.2803
GO_LIPID_BIOSYNTHETIC_PROCESS 13 508 3.25e-05 0.0018 0.2874
GO_REGULATION_OF_SMALL_MOLECULE_METABOLIC_PROCESS 11 262 3.61e-05 0.0020 0.3186
GO_NEGATIVE_REGULATION_OF_CELL_DIFFERENTIATION 23 549 4.09e-05 0.0021 0.3612
GO_GENERATION_OF_PRECURSOR_METABOLITES_AND_ENERGY 25 422 4.45e-05 0.0022 0.3931
GO_REGULATORY_REGION_NUCLEIC_ACID_BINDING 24 720 4.60e-05 0.0023 0.4059
GO_ATP_SYNTHESIS_COUPLED_ELECTRON_TRANSPORT 12 85 4.62e-05 0.0023 0.4078
GO_RESPIRATORY_CHAIN_COMPLEX 11 75 5.10e-05 0.0025 0.4508
GO_LIPID_METABOLIC_PROCESS 30 906 5.38e-05 0.0025 0.4750
GO_ENZYME_INHIBITOR_ACTIVITY 12 219 5.92e-05 0.0026 0.5226
GO_CELLULAR_LIPID_METABOLIC_PROCESS 16 649 6.23e-05 0.0027 0.5508
GO_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY 25 616 6.28e-05 0.0027 0.5544
GO_ORGANIC_ACID_METABOLIC_PROCESS 23 648 7.15e-05 0.0030 0.6318
GO_ENERGY_DERIVATION_BY_OXIDATION_OF_ORGANIC_COMPOUNDS 18 242 7.55e-05 0.0031 0.6666
GO_ENDOPLASMIC_RETICULUM 38 1,407 9.11e-05 0.0034 0.8048
GO_POSITIVE_REGULATION_OF_TRANSCRIPTION_BY_RNA_POLYMERASE_II 31 903 9.22e-05 0.0034 0.8141
GO_NUCLEAR_BODY 21 649 9.34e-05 0.0035 0.8250
GO_REGULATION_OF_NUCLEOBASE_CONTAINING_COMPOUND_METABOLIC_PROCESS 24 421 9.38e-05 0.0035 0.8286
GO_ENDOPLASMIC_RETICULUM_PART 26 1,018 9.68e-05 0.0035 0.8551
GO_REGULATION_OF_APOPTOTIC_SIGNALING_PATHWAY 16 301 0.0001 0.0039 0.9840
GO_CELLULAR_MACROMOLECULE_CATABOLIC_PROCESS 48 939 0.0001 0.0039 1.0000
GO_PROTEIN_LOCALIZATION_TO_ORGANELLE 43 788 0.0001 0.0039 1.0000
GO_DOUBLE_STRANDED_DNA_BINDING 30 740 0.0001 0.0040 1.0000
GO_SEQUENCE_SPECIFIC_DOUBLE_STRANDED_DNA_BINDING 23 668 0.0001 0.0043 1.0000
GO_TRANSMEMBRANE_TRANSPORT 38 1,149 0.0002 0.0047 1.0000
GO_RESPIRATORY_ELECTRON_TRANSPORT_CHAIN 12 103 0.0002 0.0047 1.0000
GO_PROXIMAL_PROMOTER_SEQUENCE_SPECIFIC_DNA_BINDING 15 413 0.0002 0.0049 1.0000
GO_ELECTRON_TRANSPORT_CHAIN 14 153 0.0002 0.0051 1.0000
GO_ION_TRANSPORT 41 1,184 0.0002 0.0051 1.0000
GO_INFLAMMATORY_RESPONSE 18 376 0.0002 0.0052 1.0000
GO_COFACTOR_METABOLIC_PROCESS 10 317 0.0002 0.0052 1.0000
GO_APOPTOTIC_SIGNALING_PATHWAY 24 458 0.0002 0.0052 1.0000
GO_HORMONE_TRANSPORT 12 242 0.0002 0.0053 1.0000
GO_TRANSCRIPTION_FACTOR_BINDING 24 563 0.0002 0.0060 1.0000
GO_ORGANIC_HYDROXY_COMPOUND_METABOLIC_PROCESS 15 342 0.0002 0.0061 1.0000
GO_RESPONSE_TO_CYTOKINE 32 752 0.0003 0.0062 1.0000
GO_RESPONSE_TO_OXIDATIVE_STRESS 14 347 0.0003 0.0064 1.0000
GO_SECRETORY_VESICLE 24 713 0.0003 0.0072 1.0000
GO_NUCLEOSIDE_PHOSPHATE_BIOSYNTHETIC_PROCESS 10 227 0.0003 0.0074 1.0000
GO_CELLULAR_RESPONSE_TO_OXYGEN_CONTAINING_COMPOUND 31 816 0.0004 0.0079 1.0000
GO_NEGATIVE_REGULATION_OF_APOPTOTIC_SIGNALING_PATHWAY 10 171 0.0004 0.0079 1.0000
GO_MACROMOLECULE_CATABOLIC_PROCESS 52 1,116 0.0004 0.0079 1.0000
GO_REGULATION_OF_DNA_METABOLIC_PROCESS 15 236 0.0004 0.0080 1.0000
GO_CATION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY 18 458 0.0004 0.0081 1.0000
GO_CYTOPLASMIC_VESICLE_PART 22 1,141 0.0005 0.0087 1.0000
GO_ION_TRANSMEMBRANE_TRANSPORT 29 826 0.0005 0.0088 1.0000
GO_PROTEIN_HOMOOLIGOMERIZATION 10 246 0.0005 0.0090 1.0000
GO_POSITIVE_REGULATION_OF_RNA_BIOSYNTHETIC_PROCESS 43 1,232 0.0005 0.0093 1.0000
GO_VESICLE_MEMBRANE 11 609 0.0005 0.0093 1.0000
GO_MONOVALENT_INORGANIC_CATION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY 13 268 0.0005 0.0097 1.0000
GO_INTRACELLULAR_TRANSPORT 65 1,571 0.0005 0.0098 1.0000
GO_TRANSMEMBRANE_RECEPTOR_PROTEIN_TYROSINE_KINASE_SIGNALING_PATHWAY 12 569 0.0006 0.0098 1.0000
GO_CARBOHYDRATE_DERIVATIVE_BIOSYNTHETIC_PROCESS 18 540 0.0006 0.0099 1.0000

Exploration of Significant GO Gene Sets

goMeta %>% 
  dplyr::filter(nDE >= 25, FDR < 0.05) %>%
  dplyr::select(gs_name, DE) %>%
  unnest(DE) %>% 
  mutate(
    gs_name = str_remove(gs_name, "GO_"),
    gs_name = fct_lump(gs_name, n = 20)
    ) %>% 
  split(f = .$gs_name) %>% 
  lapply(magrittr::extract2, "DE") %>% 
  .[setdiff(names(.), "Other")] %>%
  fromList() %>% 
  upset(
    nsets = length(.), 
    order.by = "freq", 
    mb.ratio = c(0.55, 0.45)
  )
*UpSet plot indicating distribution of DE genes within significant terms from the GO gene sets. For this visualisation, GO terms were restricted to those with 25 or more DE genes and an FDR < 0.05. A group of about 300 genes is widely spread across multiple terms and not shown, with the next largest group being 19 genes uniquely associated with RNA Metabolic Process. A group of 15 genes is shared by a large group of terms, indicating that these genes largely drive the signal for these terms. These genes tend to indicate Ribosomal activity (i.e. protein synthesis). However, a set of 14 genes seems to more uniquely indicate Mitochondrial activity, followed by 12 uniquely associated with Ion Transport, and a set of 9 genes associated with homeostasis followed by 25 (9 + + 8) involved with regulation of gene expression.*

UpSet plot indicating distribution of DE genes within significant terms from the GO gene sets. For this visualisation, GO terms were restricted to those with 25 or more DE genes and an FDR < 0.05. A group of about 300 genes is widely spread across multiple terms and not shown, with the next largest group being 19 genes uniquely associated with RNA Metabolic Process. A group of 15 genes is shared by a large group of terms, indicating that these genes largely drive the signal for these terms. These genes tend to indicate Ribosomal activity (i.e. protein synthesis). However, a set of 14 genes seems to more uniquely indicate Mitochondrial activity, followed by 12 uniquely associated with Ion Transport, and a set of 9 genes associated with homeostasis followed by 25 (9 + + 8) involved with regulation of gene expression.

Version Author Date
7fe1505 Steve Ped 2020-03-05
0168824 Steve Ped 2020-02-20
86ffa07 Steve Ped 2020-02-20

Genes Associated with RNA Metabolic Process

go %>%
  dplyr::filter(
    gs_name == "GO_RNA_METABOLIC_PROCESS",
    gene_id %in% dplyr::filter(deTable, DE)$gene_id
  ) %>%
  dplyr::select(
    gene_id, gene_name
  ) %>%
  left_join(
    cpmPostNorm %>%
      as.data.frame() %>%
      rownames_to_column("gene_id")
  ) %>%
  dplyr::select(gene_name, starts_with("Ps")) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    labels_col = hmHeat %>%
      dplyr::select(starts_with("Ps")) %>% 
      colnames() %>% 
      enframe(name = NULL) %>% 
      dplyr::rename(sample = value) %>%
      left_join(dgeList$samples) %>%
      dplyr::select(sample, sampleID) %>%
      with(
        structure(sampleID, names = sample)
      ),
    cutree_rows = 4,
    cutree_cols = 2
  )
*All DE genes associated with the GO term 'RNA Metabolic Process'.*

All DE genes associated with the GO term ‘RNA Metabolic Process’.

Version Author Date
7fe1505 Steve Ped 2020-03-05
0168824 Steve Ped 2020-02-20
86ffa07 Steve Ped 2020-02-20

Genes Associated with the Mitochondrial Envelope

go %>%
  dplyr::filter(
    gs_name == "GO_MITOCHONDRION",
    gene_id %in% dplyr::filter(deTable, DE)$gene_id
  ) %>%
  dplyr::select(
    gene_id, gene_name
  ) %>%
  left_join(
    cpmPostNorm %>%
      as.data.frame() %>%
      rownames_to_column("gene_id")
  ) %>%
  dplyr::select(gene_name, starts_with("Ps")) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    labels_col = hmHeat %>%
      dplyr::select(starts_with("Ps")) %>% 
      colnames() %>% 
      enframe(name = NULL) %>% 
      dplyr::rename(sample = value) %>%
      left_join(dgeList$samples) %>%
      dplyr::select(sample, sampleID) %>%
      with(
        structure(sampleID, names = sample)
      ),
    cutree_rows = 5,
    cutree_cols = 2
  )
*All DE genes associated with the GO term 'Mitochondrion'.*

All DE genes associated with the GO term ‘Mitochondrion’.

Version Author Date
7fe1505 Steve Ped 2020-03-05
86ffa07 Steve Ped 2020-02-20

Genes Associated with Ion Transport

go %>%
  dplyr::filter(
    gene_id %in% dplyr::filter(deTable, DE)$gene_id,
    gs_name %in% c("GO_ION_TRANSPORT")
  ) %>%
  group_by(gene_id, gene_name) %>%
  tally() %>%
  ungroup() %>%
  dplyr::filter(n == 1) %>%
  dplyr::select(
    gene_id, gene_name
  ) %>%
  left_join(
    cpmPostNorm %>%
      as.data.frame() %>%
      rownames_to_column("gene_id")
  ) %>%
  dplyr::select(gene_name, starts_with("Ps")) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    labels_col = hmHeat %>%
      dplyr::select(starts_with("Ps")) %>% 
      colnames() %>% 
      enframe(name = NULL) %>% 
      dplyr::rename(sample = value) %>%
      left_join(dgeList$samples) %>%
      dplyr::select(sample, sampleID) %>%
      with(
        structure(sampleID, names = sample)
      ),
    cutree_rows = 4,
    cutree_cols = 2
  )
*All DE genes associated with the GO term 'Ion Transport'.*

All DE genes associated with the GO term ‘Ion Transport’.

Version Author Date
7fe1505 Steve Ped 2020-03-05
0168824 Steve Ped 2020-02-20
86ffa07 Steve Ped 2020-02-20

Genes Associated with Homeostasis

go %>%
  dplyr::filter(
    gene_id %in% dplyr::filter(deTable, DE)$gene_id,
    gs_name %in% c("GO_HOMEOSTATIC_PROCESS")
  ) %>%
  group_by(gene_id, gene_name) %>%
  tally() %>%
  ungroup() %>%
  dplyr::filter(n == 1) %>%
  dplyr::select(
    gene_id, gene_name
  ) %>%
  left_join(
    cpmPostNorm %>%
      as.data.frame() %>%
      rownames_to_column("gene_id")
  ) %>%
  dplyr::select(gene_name, starts_with("Ps")) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    labels_col = hmHeat %>%
      dplyr::select(starts_with("Ps")) %>% 
      colnames() %>% 
      enframe(name = NULL) %>% 
      dplyr::rename(sample = value) %>%
      left_join(dgeList$samples) %>%
      dplyr::select(sample, sampleID) %>%
      with(
        structure(sampleID, names = sample)
      ),
    cutree_rows = 4,
    cutree_cols = 2
  )
*All DE genes associated with the GO term 'Homeostatic Process'.*

All DE genes associated with the GO term ‘Homeostatic Process’.

Version Author Date
7fe1505 Steve Ped 2020-03-05

Genes Associated with Gene Regulation

go %>%
  dplyr::filter(
    gene_id %in% dplyr::filter(deTable, DE)$gene_id,
    gs_name %in% c("GO_POSITIVE_REGULATION_OF_GENE_EXPRESSION")
  ) %>%
  group_by(gene_id, gene_name) %>%
  tally() %>%
  ungroup() %>%
  dplyr::filter(n == 1) %>%
  dplyr::select(
    gene_id, gene_name
  ) %>%
  left_join(
    cpmPostNorm %>%
      as.data.frame() %>%
      rownames_to_column("gene_id")
  ) %>%
  dplyr::select(gene_name, starts_with("Ps")) %>%
  as.data.frame() %>%
  column_to_rownames("gene_name") %>%
  as.matrix() %>%
  pheatmap(
    color = viridis_pal(option = "magma")(100),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    labels_col = hmHeat %>%
      dplyr::select(starts_with("Ps")) %>% 
      colnames() %>% 
      enframe(name = NULL) %>% 
      dplyr::rename(sample = value) %>%
      left_join(dgeList$samples) %>%
      dplyr::select(sample, sampleID) %>%
      with(
        structure(sampleID, names = sample)
      ),
    cutree_rows = 4,
    cutree_cols = 2
  )
*All DE genes associated with the GO term 'Positive Regulation of Gene Expression'.*

All DE genes associated with the GO term ‘Positive Regulation of Gene Expression’.

Version Author Date
7fe1505 Steve Ped 2020-03-05

Data Export

All enriched gene sets with an FDR adjusted p-value < 0.05 were exported as a single csv file.

add_prefix <- function(x, pre = "p_"){
  paste0(pre, x)
}
bind_rows(
  hmMeta,
  kgMeta,
  goMeta
) %>%
  dplyr::filter(FDR < 0.05) %>%
  mutate(
    DE = lapply(DE, function(x){dplyr::filter(deTable, gene_id %in% x)$gene_name}),
    DE = lapply(DE, unique),
    DE = vapply(DE, paste, character(1), collapse = ";")
  ) %>%
  arrange(wilkinson_p) %>%
  dplyr::select(
    `Gene Set` = gs_name, 
    `Nbr Detected Genes` = gs_size, 
    `Nbr DE Genes` = nDE, 
    combined = wilkinson_p, FDR, 
    fry, camera, gsea, goseq, 
    `DE Genes` = DE
  ) %>%
  rename_at(
    vars(combined, fry, camera, gsea, goseq),
    add_prefix
  ) %>%
  write_csv(
    here::here("output", "Enrichment_Mutant_V_WT.csv")
  )

devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.3 (2020-02-29)
 os       Ubuntu 18.04.4 LTS          
 system   x86_64, linux-gnu           
 ui       X11                         
 language en_AU:en                    
 collate  en_AU.UTF-8                 
 ctype    en_AU.UTF-8                 
 tz       Australia/Adelaide          
 date     2020-03-05                  

─ Packages ───────────────────────────────────────────────────────────────────
 package              * version    date       lib source        
 AnnotationDbi        * 1.48.0     2019-10-29 [2] Bioconductor  
 askpass                1.1        2019-01-13 [2] CRAN (R 3.6.0)
 assertthat             0.2.1      2019-03-21 [2] CRAN (R 3.6.0)
 backports              1.1.5      2019-10-02 [2] CRAN (R 3.6.1)
 BiasedUrn            * 1.07       2015-12-28 [2] CRAN (R 3.6.1)
 bibtex                 0.4.2.2    2020-01-02 [2] CRAN (R 3.6.2)
 Biobase              * 2.46.0     2019-10-29 [2] Bioconductor  
 BiocFileCache          1.10.2     2019-11-08 [2] Bioconductor  
 BiocGenerics         * 0.32.0     2019-10-29 [2] Bioconductor  
 BiocParallel           1.20.1     2019-12-21 [2] Bioconductor  
 biomaRt                2.42.0     2019-10-29 [2] Bioconductor  
 Biostrings             2.54.0     2019-10-29 [2] Bioconductor  
 bit                    1.1-15.2   2020-02-10 [2] CRAN (R 3.6.2)
 bit64                  0.9-7      2017-05-08 [2] CRAN (R 3.6.0)
 bitops                 1.0-6      2013-08-17 [2] CRAN (R 3.6.0)
 blob                   1.2.1      2020-01-20 [2] CRAN (R 3.6.2)
 broom                  0.5.4      2020-01-27 [2] CRAN (R 3.6.2)
 callr                  3.4.2      2020-02-12 [2] CRAN (R 3.6.2)
 cellranger             1.1.0      2016-07-27 [2] CRAN (R 3.6.0)
 cli                    2.0.1      2020-01-08 [2] CRAN (R 3.6.2)
 cluster                2.1.0      2019-06-19 [2] CRAN (R 3.6.1)
 codetools              0.2-16     2018-12-24 [4] CRAN (R 3.6.0)
 colorspace             1.4-1      2019-03-18 [2] CRAN (R 3.6.0)
 crayon                 1.3.4      2017-09-16 [2] CRAN (R 3.6.0)
 curl                   4.3        2019-12-02 [2] CRAN (R 3.6.2)
 data.table             1.12.8     2019-12-09 [2] CRAN (R 3.6.2)
 DBI                    1.1.0      2019-12-15 [2] CRAN (R 3.6.2)
 dbplyr                 1.4.2      2019-06-17 [2] CRAN (R 3.6.0)
 DelayedArray           0.12.2     2020-01-06 [2] Bioconductor  
 desc                   1.2.0      2018-05-01 [2] CRAN (R 3.6.0)
 devtools               2.2.2      2020-02-17 [2] CRAN (R 3.6.2)
 digest                 0.6.25     2020-02-23 [2] CRAN (R 3.6.2)
 dplyr                * 0.8.4      2020-01-31 [2] CRAN (R 3.6.2)
 edgeR                * 3.28.1     2020-02-26 [2] Bioconductor  
 ellipsis               0.3.0      2019-09-20 [2] CRAN (R 3.6.1)
 evaluate               0.14       2019-05-28 [2] CRAN (R 3.6.0)
 FactoMineR             2.2        2020-02-05 [2] CRAN (R 3.6.2)
 fansi                  0.4.1      2020-01-08 [2] CRAN (R 3.6.2)
 farver                 2.0.3      2020-01-16 [2] CRAN (R 3.6.2)
 fastmatch              1.1-0      2017-01-28 [2] CRAN (R 3.6.0)
 fgsea                * 1.12.0     2019-10-29 [2] Bioconductor  
 flashClust             1.01-2     2012-08-21 [2] CRAN (R 3.6.1)
 forcats              * 0.4.0      2019-02-17 [2] CRAN (R 3.6.0)
 fs                     1.3.1      2019-05-06 [2] CRAN (R 3.6.0)
 gbRd                   0.4-11     2012-10-01 [2] CRAN (R 3.6.0)
 geneLenDataBase      * 1.22.0     2019-11-05 [2] Bioconductor  
 generics               0.0.2      2018-11-29 [2] CRAN (R 3.6.0)
 GenomeInfoDb           1.22.0     2019-10-29 [2] Bioconductor  
 GenomeInfoDbData       1.2.2      2019-11-21 [2] Bioconductor  
 GenomicAlignments      1.22.1     2019-11-12 [2] Bioconductor  
 GenomicFeatures        1.38.2     2020-02-15 [2] Bioconductor  
 GenomicRanges          1.38.0     2019-10-29 [2] Bioconductor  
 ggdendro               0.1-20     2016-04-27 [2] CRAN (R 3.6.0)
 ggplot2              * 3.2.1      2019-08-10 [2] CRAN (R 3.6.1)
 ggrepel                0.8.1      2019-05-07 [2] CRAN (R 3.6.0)
 git2r                  0.26.1     2019-06-29 [2] CRAN (R 3.6.1)
 glue                   1.3.1      2019-03-12 [2] CRAN (R 3.6.0)
 GO.db                  3.10.0     2019-11-21 [2] Bioconductor  
 goseq                * 1.38.0     2019-10-29 [2] Bioconductor  
 gridExtra              2.3        2017-09-09 [2] CRAN (R 3.6.0)
 gtable                 0.3.0      2019-03-25 [2] CRAN (R 3.6.0)
 haven                  2.2.0      2019-11-08 [2] CRAN (R 3.6.1)
 here                   0.1        2017-05-28 [2] CRAN (R 3.6.0)
 highr                  0.8        2019-03-20 [2] CRAN (R 3.6.0)
 hms                    0.5.3      2020-01-08 [2] CRAN (R 3.6.2)
 htmltools              0.4.0      2019-10-04 [2] CRAN (R 3.6.1)
 htmlwidgets            1.5.1      2019-10-08 [2] CRAN (R 3.6.1)
 httpuv                 1.5.2      2019-09-11 [2] CRAN (R 3.6.1)
 httr                   1.4.1      2019-08-05 [2] CRAN (R 3.6.1)
 hwriter                1.3.2      2014-09-10 [2] CRAN (R 3.6.0)
 IRanges              * 2.20.2     2020-01-13 [2] Bioconductor  
 jpeg                   0.1-8.1    2019-10-24 [2] CRAN (R 3.6.1)
 jsonlite               1.6.1      2020-02-02 [2] CRAN (R 3.6.2)
 kableExtra             1.1.0      2019-03-16 [2] CRAN (R 3.6.1)
 knitr                  1.28       2020-02-06 [2] CRAN (R 3.6.2)
 labeling               0.3        2014-08-23 [2] CRAN (R 3.6.0)
 later                  1.0.0      2019-10-04 [2] CRAN (R 3.6.1)
 lattice                0.20-40    2020-02-19 [4] CRAN (R 3.6.2)
 latticeExtra           0.6-29     2019-12-19 [2] CRAN (R 3.6.2)
 lazyeval               0.2.2      2019-03-15 [2] CRAN (R 3.6.0)
 leaps                  3.1        2020-01-16 [2] CRAN (R 3.6.2)
 lifecycle              0.1.0      2019-08-01 [2] CRAN (R 3.6.1)
 limma                * 3.42.2     2020-02-03 [2] Bioconductor  
 locfit                 1.5-9.1    2013-04-20 [2] CRAN (R 3.6.0)
 lubridate              1.7.4      2018-04-11 [2] CRAN (R 3.6.0)
 magrittr             * 1.5        2014-11-22 [2] CRAN (R 3.6.0)
 MASS                   7.3-51.5   2019-12-20 [4] CRAN (R 3.6.2)
 Matrix                 1.2-18     2019-11-27 [2] CRAN (R 3.6.1)
 matrixStats            0.55.0     2019-09-07 [2] CRAN (R 3.6.1)
 memoise                1.1.0      2017-04-21 [2] CRAN (R 3.6.0)
 metap                * 1.3        2020-01-23 [2] CRAN (R 3.6.2)
 mgcv                   1.8-31     2019-11-09 [4] CRAN (R 3.6.1)
 mnormt                 1.5-6      2020-02-03 [2] CRAN (R 3.6.2)
 modelr                 0.1.6      2020-02-22 [2] CRAN (R 3.6.2)
 msigdbr              * 7.0.1      2019-09-04 [2] CRAN (R 3.6.2)
 multcomp               1.4-12     2020-01-10 [2] CRAN (R 3.6.2)
 multtest               2.42.0     2019-10-29 [2] Bioconductor  
 munsell                0.5.0      2018-06-12 [2] CRAN (R 3.6.0)
 mutoss                 0.1-12     2017-12-04 [2] CRAN (R 3.6.2)
 mvtnorm                1.1-0      2020-02-24 [2] CRAN (R 3.6.2)
 ngsReports           * 1.1.2      2019-10-16 [1] Bioconductor  
 nlme                   3.1-144    2020-02-06 [4] CRAN (R 3.6.2)
 numDeriv               2016.8-1.1 2019-06-06 [2] CRAN (R 3.6.2)
 openssl                1.4.1      2019-07-18 [2] CRAN (R 3.6.1)
 pander               * 0.6.3      2018-11-06 [2] CRAN (R 3.6.0)
 pheatmap             * 1.0.12     2019-01-04 [2] CRAN (R 3.6.0)
 pillar                 1.4.3      2019-12-20 [2] CRAN (R 3.6.2)
 pkgbuild               1.0.6      2019-10-09 [2] CRAN (R 3.6.1)
 pkgconfig              2.0.3      2019-09-22 [2] CRAN (R 3.6.1)
 pkgload                1.0.2      2018-10-29 [2] CRAN (R 3.6.0)
 plotly                 4.9.2      2020-02-12 [2] CRAN (R 3.6.2)
 plotrix                3.7-7      2019-12-05 [2] CRAN (R 3.6.2)
 plyr                   1.8.5      2019-12-10 [2] CRAN (R 3.6.2)
 png                    0.1-7      2013-12-03 [2] CRAN (R 3.6.0)
 prettyunits            1.1.1      2020-01-24 [2] CRAN (R 3.6.2)
 processx               3.4.2      2020-02-09 [2] CRAN (R 3.6.2)
 progress               1.2.2      2019-05-16 [2] CRAN (R 3.6.0)
 promises               1.1.0      2019-10-04 [2] CRAN (R 3.6.1)
 ps                     1.3.2      2020-02-13 [2] CRAN (R 3.6.2)
 purrr                * 0.3.3      2019-10-18 [2] CRAN (R 3.6.1)
 R6                     2.4.1      2019-11-12 [2] CRAN (R 3.6.1)
 rappdirs               0.3.1      2016-03-28 [2] CRAN (R 3.6.0)
 RColorBrewer         * 1.1-2      2014-12-07 [2] CRAN (R 3.6.0)
 Rcpp                 * 1.0.3      2019-11-08 [2] CRAN (R 3.6.1)
 RCurl                  1.98-1.1   2020-01-19 [2] CRAN (R 3.6.2)
 Rdpack                 0.11-1     2019-12-14 [2] CRAN (R 3.6.2)
 readr                * 1.3.1      2018-12-21 [2] CRAN (R 3.6.0)
 readxl                 1.3.1      2019-03-13 [2] CRAN (R 3.6.0)
 remotes                2.1.1      2020-02-15 [2] CRAN (R 3.6.2)
 reprex                 0.3.0      2019-05-16 [2] CRAN (R 3.6.0)
 reshape2               1.4.3      2017-12-11 [2] CRAN (R 3.6.0)
 rlang                  0.4.4      2020-01-28 [2] CRAN (R 3.6.2)
 rmarkdown              2.1        2020-01-20 [2] CRAN (R 3.6.2)
 rprojroot              1.3-2      2018-01-03 [2] CRAN (R 3.6.0)
 Rsamtools              2.2.3      2020-02-23 [2] Bioconductor  
 RSQLite                2.2.0      2020-01-07 [2] CRAN (R 3.6.2)
 rstudioapi             0.11       2020-02-07 [2] CRAN (R 3.6.2)
 rtracklayer            1.46.0     2019-10-29 [2] Bioconductor  
 rvest                  0.3.5      2019-11-08 [2] CRAN (R 3.6.1)
 S4Vectors            * 0.24.3     2020-01-18 [2] Bioconductor  
 sandwich               2.5-1      2019-04-06 [2] CRAN (R 3.6.1)
 scales               * 1.1.0      2019-11-18 [2] CRAN (R 3.6.1)
 scatterplot3d          0.3-41     2018-03-14 [2] CRAN (R 3.6.1)
 sessioninfo            1.1.1      2018-11-05 [2] CRAN (R 3.6.0)
 ShortRead              1.44.3     2020-02-03 [2] Bioconductor  
 sn                     1.5-5      2020-01-30 [2] CRAN (R 3.6.2)
 statmod                1.4.34     2020-02-17 [2] CRAN (R 3.6.2)
 stringi                1.4.6      2020-02-17 [2] CRAN (R 3.6.2)
 stringr              * 1.4.0      2019-02-10 [2] CRAN (R 3.6.0)
 SummarizedExperiment   1.16.1     2019-12-19 [2] Bioconductor  
 survival               3.1-8      2019-12-03 [2] CRAN (R 3.6.2)
 testthat               2.3.1      2019-12-01 [2] CRAN (R 3.6.1)
 TFisher                0.2.0      2018-03-21 [2] CRAN (R 3.6.2)
 TH.data                1.0-10     2019-01-21 [2] CRAN (R 3.6.1)
 tibble               * 2.1.3      2019-06-06 [2] CRAN (R 3.6.0)
 tidyr                * 1.0.2      2020-01-24 [2] CRAN (R 3.6.2)
 tidyselect             1.0.0      2020-01-27 [2] CRAN (R 3.6.2)
 tidyverse            * 1.3.0      2019-11-21 [2] CRAN (R 3.6.1)
 truncnorm              1.0-8      2018-02-27 [2] CRAN (R 3.6.0)
 UpSetR               * 1.4.0      2019-05-22 [2] CRAN (R 3.6.0)
 usethis                1.5.1      2019-07-04 [2] CRAN (R 3.6.1)
 vctrs                  0.2.3      2020-02-20 [2] CRAN (R 3.6.2)
 viridisLite            0.3.0      2018-02-01 [2] CRAN (R 3.6.0)
 webshot                0.5.2      2019-11-22 [2] CRAN (R 3.6.1)
 whisker                0.4        2019-08-28 [2] CRAN (R 3.6.1)
 withr                  2.1.2      2018-03-15 [2] CRAN (R 3.6.0)
 workflowr            * 1.6.0      2019-12-19 [2] CRAN (R 3.6.2)
 xfun                   0.12       2020-01-13 [2] CRAN (R 3.6.2)
 XML                    3.99-0.3   2020-01-20 [2] CRAN (R 3.6.2)
 xml2                   1.2.2      2019-08-09 [2] CRAN (R 3.6.1)
 XVector                0.26.0     2019-10-29 [2] Bioconductor  
 yaml                   2.2.1      2020-02-01 [2] CRAN (R 3.6.2)
 zlibbioc               1.32.0     2019-10-29 [2] Bioconductor  
 zoo                    1.8-7      2020-01-10 [2] CRAN (R 3.6.2)

[1] /home/steveped/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library