Last updated: 2020-02-17

Checks: 7 0

Knit directory: 20170327_Psen2S4Ter_RNASeq/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200119)

The command set.seed(20200119) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 53ed0e3

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File	Version	Author	Date	Message
html	68033fe	Steve Ped	2020-01-26	Corrected chunk labels after publishing DE analysis
Rmd	25922d6	Steve Ped	2020-01-26	Added first pass of enrichment
Rmd	96d8cc7	Steve Ped	2020-01-25	Compiled after data export & added compression to output
html	96d8cc7	Steve Ped	2020-01-25	Compiled after data export & added compression to output
Rmd	07a6053	Steve Ped	2020-01-25	Corrected paths
Rmd	09180e5	Steve Ped	2020-01-25	Added data export
Rmd	fca554f	Steve Ped	2020-01-25	Tweaked explanations & labels
Rmd	be95a60	Steve Ped	2020-01-25	Finished first pass of DE Analysis
html	be95a60	Steve Ped	2020-01-25	Finished first pass of DE Analysis
html	f04bb47	Steve Ped	2020-01-25	Minor typos
Rmd	7251bd2	Steve Ped	2020-01-25	Tweaks
Rmd	7fea156	Steve Ped	2020-01-25	Added rRNA checks & heatmaps
html	9bff516	Steve Ped	2020-01-24	Added analysis without CQN
Rmd	7b680b2	Steve Ped	2020-01-24	Added analysis without CQN
html	95ccc90	Steve Ped	2020-01-22	Compiled DE so far
Rmd	1858f49	Steve Ped	2020-01-22	Started presenting actual DE genes
html	1858f49	Steve Ped	2020-01-22	Started presenting actual DE genes
Rmd	10a285b	Steve Ped	2020-01-22	Expanded description in index and tried reusing an image
html	10a285b	Steve Ped	2020-01-22	Expanded description in index and tried reusing an image
html	aabb570	Steve Ped	2020-01-21	Rebuilt to get rid of warnings
Rmd	71b8832	Steve Ped	2020-01-21	Added DE QC for GC bias
html	71b8832	Steve Ped	2020-01-21	Added DE QC for GC bias
Rmd	e825637	Steve Ped	2020-01-21	Minor updates to DE plots
html	01512da	Steve Ped	2020-01-21	Added initial DE analysis to index
Rmd	fbb6242	Steve Ped	2020-01-21	Paused DE analysis
Rmd	c560637	Steve Ped	2020-01-20	Started DE analysis

Setup

library(ngsReports)
library(tidyverse)
library(magrittr)
library(edgeR)
library(AnnotationHub)
library(ensembldb)
library(scales)
library(pander)
library(cqn)
library(ggrepel)
library(pheatmap)
library(RColorBrewer)

if (interactive()) setwd(here::here())
theme_set(theme_bw())
panderOptions("big.mark", ",")
panderOptions("table.split.table", Inf)
panderOptions("table.style", "rmarkdown")

Annotations

ah <- AnnotationHub() %>%
    subset(species == "Danio rerio") %>%
    subset(rdataclass == "EnsDb")
ensDb <- ah[["AH74989"]]

grTrans <- transcripts(ensDb)
trLengths <- exonsBy(ensDb, "tx") %>%
    width() %>%
    vapply(sum, integer(1))
mcols(grTrans)$length <- trLengths[names(grTrans)]

gcGene <- grTrans %>%
    mcols() %>%
    as.data.frame() %>%
    dplyr::select(gene_id, tx_id, gc_content, length) %>%
    as_tibble() %>%
    group_by(gene_id) %>%
    summarise(
        gc_content = sum(gc_content*length) / sum(length),
        length = ceiling(median(length))
    )
grGenes <- genes(ensDb)
mcols(grGenes) %<>%
    as.data.frame() %>%
    left_join(gcGene) %>%
    as.data.frame() %>%
    DataFrame()

Similarly to the Quality Assessment steps, GRanges objects were formed at the gene and transcript levels, to enable estimation of GC content and length for each transcript and gene. GC content and transcript length are available for each transcript, and for gene-level estimates, GC content was taken as the sum of all GC bases divided by the sum of all transcript lengths, effectively averaging across all transcripts. Gene length was defined as the median transcript length.

samples <- read_csv("data/samples.csv") %>%
  distinct(sampleName, .keep_all = TRUE) %>%
  dplyr::select(sample = sampleName, sampleID, genotype) %>%
  mutate(
    genotype = factor(genotype, levels = c("WT", "Het", "Hom")),
    mutant = genotype %in% c("Het", "Hom"),
    homozygous = genotype == "Hom"
  )
genoCols <- samples$genotype %>%
  levels() %>%
  length() %>%
  brewer.pal("Set1") %>%
  setNames(levels(samples$genotype))

Sample metadata was also loaded, with only the sampleID and genotype being retained. All other fields were considered irrelevant.

Count Data

minCPM <- 1.5
minSamples <- 4
dgeList <- file.path("data", "2_alignedData", "featureCounts", "genes.out") %>%
    read_delim(delim = "\t") %>%
    set_names(basename(names(.))) %>%
    as.data.frame() %>%
    column_to_rownames("Geneid") %>%
    as.matrix() %>% 
    set_colnames(str_remove(colnames(.), "Aligned.sortedByCoord.out.bam")) %>%
    .[rowSums(cpm(.) >= minCPM) >= minCPM,] %>%
    DGEList(
        samples = tibble(sample = colnames(.)) %>%
            left_join(samples),
        genes = grGenes[rownames(.)] %>%
            as.data.frame() %>%
            dplyr::select(
                chromosome = seqnames, start, end, 
                gene_id, gene_name, gene_biotype, description, 
                entrezid, gc_content, length
            )
    ) %>%
    .[!grepl("rRNA", .$genes$gene_biotype),] %>%
    calcNormFactors()

Gene-level count data as output by featureCounts, was loaded and formed into a DGEList object. During this process, genes were removed if:

They were not considered as detectable (CPM < 1.5 in > 8 samples). This translates to > 18 reads assigned a gene in all samples from one or more of the genotype groups
The gene_biotype was any type of rRNA.

These filtering steps returned gene-level counts for 16,640 genes, with total library sizes between 11,852,141 and 16,997,219 reads assigned to genes. It was noted that these library sizes were about 1.5-fold larger than the transcript-level counts used for the QA steps.

cpm(dgeList, log = TRUE) %>%
  as.data.frame() %>%
  pivot_longer(
    cols = everything(),
    names_to = "sample",
    values_to = "logCPM"
  ) %>%
  split(f = .$sample) %>%
  lapply(function(x){
    d <- density(x$logCPM)
    tibble(
      sample = unique(x$sample),
      x = d$x,
      y = d$y
    )
  }) %>%
  bind_rows() %>%
  left_join(samples) %>%
  ggplot(aes(x, y, colour = genotype, group = sample)) +
  geom_line() +
  scale_colour_manual(
    values = genoCols
  ) +
  labs(
    x = "logCPM",
    y = "Density",
    colour = "Genotype"
  )

Expression density plots for all samples after filtering, showing logCPM values.

Version	Author	Date
7251bd2	Steve Ped	2020-01-25
1858f49	Steve Ped	2020-01-22
01512da	Steve Ped	2020-01-21

Additional Functions

contLabeller <- as_labeller(
    c(
        HetVsWT = "S4Ter/+ Vs +/+",
        HomVsWT = "S4Ter/S4Ter Vs +/+",
        HomVsHet = "S4Ter/S4Ter Vs S4Ter/+",
        Hom = "S4Ter/S4Ter",
        Het = "S4Ter/+",
        WT = "+/+",
        mutant = "S4Ter Vs WT",
        homozygous = "S4Ter/S4Ter Vs S4Ter/+"
    )
)
geneLabeller <- structure(grGenes$gene_name, names = grGenes$gene_id) %>%
    as_labeller()

Labeller functions for genotypes, contrasts and gene names were additionally defined for simpler plotting using ggplot2.

Correlations with rRNA contaminants

In order to asses the impact of the potential rRNA contaminant, correlations between the initial estimate of contamination and each gene’s expression pattern were also investigated. Conceptually, genes which are highly correlated with rRNA may be physically associated with these complexes. Similary, genes with high negative correlations may be actively selected against in the presence of rRNA complexes.

rawFqc <- list.files(
  path = "data/0_rawData/FastQC/",
  pattern = "zip",
  full.names = TRUE
) %>%
  FastqcDataList()
gc <- getModule(rawFqc, "Per_sequence_GC") %>%
  group_by(Filename) %>% 
  mutate(Freq = Count / sum(Count)) %>%
  ungroup()
rawGC <- gc %>%
  dplyr::filter(GC_Content > 70) %>% 
  group_by(Filename) %>%
  summarise(Freq = sum(Freq)) %>% 
  arrange(desc(Freq)) %>%
  mutate(sample = str_remove(Filename, "_R[12].fastq.gz")) %>%
  group_by(sample) %>%
  summarise(Freq = mean(Freq)) %>%
  left_join(samples)

riboVec <- structure(rawGC$Freq, names = rawGC$sample)
riboCors <- cpm(dgeList, log = TRUE) %>%
  apply(1, function(x){
    cor(x, riboVec[names(x)])
  })

Analysis

PCA

pca <- dgeList %>%
    cpm(log = TRUE) %>%
    t() %>%
    prcomp() 
pcaVars <- percent_format(0.1)(summary(pca)$importance["Proportion of Variance",])

pca$x %>%
  as.data.frame() %>%
  rownames_to_column("sample") %>%
  left_join(samples) %>%
  as_tibble() %>%
  ggplot(aes(PC1, PC2, colour = genotype, fill = genotype)) +
  geom_point() +
  geom_text_repel(aes(label = sampleID), show.legend = FALSE) +
  stat_ellipse(geom = "polygon", alpha = 0.05, show.legend = FALSE) +
  guides(fill = FALSE) +
  scale_colour_manual(
    values = genoCols
  ) +
  labs(
    x = paste0("PC1 (", pcaVars[["PC1"]], ")"),
    y = paste0("PC2 (", pcaVars[["PC2"]], ")"),
    colour = "Genotype"
  )

PCA of gene-level counts.

Version	Author	Date
7251bd2	Steve Ped	2020-01-25
1858f49	Steve Ped	2020-01-22
01512da	Steve Ped	2020-01-21

A Principal Component Analysis (PCA) was also performed using logCPM values from each sample. Both mutant genotypes appear to cluster together, however it has previously been noted that GC content appears to track closely with PC1, as a result of variable rRNA removal.

Model Description

Version	Author	Date
10a285b	Steve Ped	2020-01-22

The initial expectation was to perform three pairwise comparisons as described in the above figure. However, given there was a strong similarity between mutants, the model matrix was defined as containing an intercept (i.e. baseline expression in wild-type), with additional columns defining presence of any mutant alleles, and the final column capturing the difference between mutants. This avoids any issues with the use of hard cutoffs when combing lists, such as would occur when comparing separate results from both mutant genotypes against wild-type samples.

d <- model.matrix(~mutant + homozygous, data = dgeList$samples) %>%
  set_colnames(str_remove(colnames(.), "TRUE"))

pheatmap(
  d, 
  cluster_cols = FALSE, 
  cluster_rows = FALSE, 
  color = c("white", "grey50"),
  annotation_row = dgeList$samples["genotype"],
  annotation_colors = list(genotype = genoCols),
  legend = FALSE
)

Visualisation of the design matrix

Version	Author	Date
7251bd2	Steve Ped	2020-01-25

Normalisation

As GC content and length was noted as being of concern for this dataset, conditional-quantile normalisation was performed using the cqn package. This adds a gene and sample-level offset for each count which takes into account any systemic bias, such as that identified previously as an artefact of variable rRNA removal. The resultant glm.offset values were added to the original DGEList object, and all dispersion estimates were calculated.

gcCqn <- cqn(
    counts = dgeList$counts,
    x = dgeList$genes$gc_content,
    lengths = dgeList$genes$length,
    sizeFactors = dgeList$samples$lib.size
)

par(mfrow = c(1, 2))
cqnplot(gcCqn, n = 1, xlab = "GC Content", col = genoCols)
cqnplot(gcCqn, n = 2, xlab = "Length", col = genoCols)
legend("bottomright", legend = levels(samples$genotype), col = genoCols, lty = 1)

Model fits for GC content and gene length under the CQN model. Genotype-specific effects are clearly visible.

Version	Author	Date
7251bd2	Steve Ped	2020-01-25
01512da	Steve Ped	2020-01-21

par(mfrow = c(1, 1))

dgeList$offset <- gcCqn$glm.offset 
dgeList %<>% estimateDisp(design = d)

Model Fitting

minLfc <- log2(2)
alpha <- 0.01
fit <- glmFit(dgeList)
topTables <- colnames(d)[2:3] %>%
  sapply(function(x){
    glmLRT(fit, coef = x) %>%
      topTags(n = Inf) %>%
      .[["table"]] %>%
      as_tibble() %>%
      arrange(PValue) %>%
      dplyr::select(
        gene_id, gene_name, logFC, logCPM, PValue, FDR, everything()  
      ) %>%
      mutate(
        coef = x,
        bonfP = p.adjust(PValue, "bonf"),
        DE = case_when(
          bonfP < alpha ~ TRUE,
          FDR < alpha & abs(logFC) > minLfc ~ TRUE
        ),
        DE = ifelse(is.na(DE), FALSE, DE)
      )
  }, simplify = FALSE)

Models were fit using the negative-binomial approaches of glmFit(). Top Tables were then obtained using likelihood-ratio tests in glmLRT(). These test the standard \(H_0\) that the true value of the estimated model coefficient is zero. These model coefficients effectively estimate:

the effect of the presence of a mutation, and
the difference in heterozygous and heterozygous mutants

For enrichment testing, genes were initially considered to be DE using:

A Bonferroni-adjusted p-value < 0.01
An FDR-adjusted p-value < 0.01 along with an estimated logFC outside of the range \(\pm \log_2(2)\).

As fewer genes were detected in the comparisons between homozygous and heterozygous mutants, a simple FDR of 0.05 was subsequently chosen for this comparison.

topTables$homozygous %<>%
  mutate(DE = FDR < 0.05)

Using these criteria, the following initial DE gene-sets were defined:

topTables %>%
  lapply(dplyr::filter, DE) %>% 
  vapply(nrow, integer(1)) %>%
  set_names(
    case_when(
      names(.) == "mutant" ~ "psen2 mutant",
      names(.) == "homozygous" ~ "HomVsHet"
    )
  ) %>%
  pander()

psen2 mutant	HomVsHet
615	7

deCols <- c(
  `FALSE` = rgb(0.5, 0.5, 0.5, 0.4), 
  `TRUE` = rgb(1, 0, 0, 0.7)
)

Model Checking

Checks for rRNA impacts

Given the previously identified concerns about variable rRNA removal, correlations were calculated between each gene’s expression values and the proportion of the raw libraries with > 70% GC. Those with the strongest +^ve correlation, and for which the FDR is < 0.01 are shown below.

Most notably, the most correlated RNA with these rRNA proportions was eif1b which is the initiation factor for translation. As this is a protein coding gene with the protein product interacting with the ribosome, the reason for this correlation at the RNA level raises difficult questions which cannot be answered here.

topTables$mutant %>%
  mutate(riboCors = riboCors[gene_id]) %>%
  dplyr::filter(
    FDR < alpha
  ) %>%
  dplyr::select(gene_id, gene_name, logFC, logCPM, FDR, riboCors, DE) %>%
  arrange(desc(riboCors)) %>%
  mutate(FDR = case_when(
    FDR >= 0.0001 ~ sprintf("%.4f", FDR),
    FDR < 0.0001 ~ sprintf("%.2e", FDR)
    )
  ) %>%
  dplyr::slice(1:40) %>%
  pander(
    justify = "llrrrrl",
    style = "rmarkdown",
    caption = paste(
      "The", nrow(.), "genes most correlated with the original high GC content.",
      "Many failed the selection criteria for differential expression,",
      "primarily due to the stringent logFC filter.",
      "However, the unusually high number of ribosomal protein coding genes",
      "is currently difficult to explain, but notable.",
      "Two possibilities are these RNAs are somehow translated with physical",
      "linkage to the ribosomal complex, or that there is genuinely a global",
      "shift in the level of ribosomal activity.",
      "Neither explanation is initially satisfying."
    )
  )

The 40 genes most correlated with the original high GC content. Many failed the selection criteria for differential expression, primarily due to the stringent logFC filter. However, the unusually high number of ribosomal protein coding genes is currently difficult to explain, but notable. Two possibilities are these RNAs are somehow translated with physical linkage to the ribosomal complex, or that there is genuinely a global shift in the level of ribosomal activity. Neither explanation is initially satisfying.
gene_id	gene_name	logFC	logCPM	FDR	riboCors	DE
ENSDARG00000012688	eif1b	0.5229	7.482	0.0024	0.9906	FALSE
ENSDARG00000012871	npepl1	0.6049	4.485	0.0028	0.9823	FALSE
ENSDARG00000103994	ppiab	0.8891	7.372	6.18e-05	0.9814	FALSE
ENSDARG00000034897	rps10	0.8809	5.898	0.0001	0.9795	FALSE
ENSDARG00000068995	h2afx1	0.644	6.605	0.0025	0.9734	FALSE
ENSDARG00000077291	rps2	1.305	7.52	1.58e-07	0.9733	TRUE
ENSDARG00000002240	psmb6	0.6407	4.702	0.0056	0.9733	FALSE
ENSDARG00000036298	rps13	0.6762	6.076	8.26e-05	0.9707	FALSE
ENSDARG00000037071	rps26	0.651	5.323	0.0045	0.97	FALSE
ENSDARG00000057026	ran	0.6265	7.071	0.0002	0.9689	FALSE
ENSDARG00000111753	hist1h4l	0.9775	2.015	0.0061	0.9685	FALSE
ENSDARG00000038028	ndufa6	1.051	4.693	2.57e-05	0.9667	TRUE
ENSDARG00000046157	RPS17	0.7725	6.74	0.0002	0.9667	FALSE
ENSDARG00000078929	abhd16a	0.5358	4.906	0.0013	0.9664	FALSE
ENSDARG00000099226	CABZ01076667.1	0.5407	3.973	0.0008	0.9652	FALSE
ENSDARG00000014690	rps4x	0.6511	7.368	0.0003	0.9649	FALSE
ENSDARG00000092553	slc25a5	0.9012	8.952	2.64e-05	0.9648	TRUE
ENSDARG00000011665	aldoaa	0.6082	7.763	0.0029	0.9639	FALSE
ENSDARG00000045447	slc35g2b	0.766	4.873	0.0002	0.9627	FALSE
ENSDARG00000034534	atp6v1aa	0.6212	7.049	0.0003	0.9618	FALSE
ENSDARG00000099766	myl12.1	0.3841	6.468	0.0058	0.9595	FALSE
ENSDARG00000037962	psmb7	0.7979	4.104	3.30e-05	0.9592	FALSE
ENSDARG00000019230	rpl7a	0.9036	7.71	5.34e-05	0.9591	FALSE
ENSDARG00000099104	rpl24	0.662	6.729	0.0004	0.9571	FALSE
ENSDARG00000055475	rps27.2	0.6175	6.719	0.0019	0.9566	FALSE
ENSDARG00000026322	dhrs13a.1	0.7388	2.804	0.0032	0.9563	FALSE
ENSDARG00000025581	rpl10	1.096	6.381	0.0003	0.9562	TRUE
ENSDARG00000092807	si:dkey-151g10.6	0.8303	6.491	9.37e-07	0.9551	TRUE
ENSDARG00000053365	rpl31	1.215	6.638	8.99e-06	0.9549	TRUE
ENSDARG00000075445	psmb5	0.8405	5.59	5.33e-05	0.9544	FALSE
ENSDARG00000030237	pgrmc2	0.3734	6.21	0.0046	0.9538	FALSE
ENSDARG00000011405	rps9	0.823	6.209	0.0006	0.9537	FALSE
ENSDARG00000035808	clcn4	0.6887	5.678	0.0014	0.9528	FALSE
ENSDARG00000043561	psmc1b	0.6844	4.673	0.0004	0.9517	FALSE
ENSDARG00000017235	eif5a	0.7003	7.388	5.54e-06	0.9514	TRUE
ENSDARG00000034291	rpl37	1.085	5.662	5.03e-06	0.9513	TRUE
ENSDARG00000104173	tufm	0.7123	5.051	0.0013	0.9511	FALSE
ENSDARG00000058451	rpl6	0.7848	7.877	0.0004	0.9497	FALSE
ENSDARG00000014867	rpl8	0.9139	6.716	1.50e-06	0.9493	TRUE
ENSDARG00000089976	spcs3	0.7379	3.54	0.0034	0.949	FALSE

topTables$mutant %>%
  mutate(riboCors = riboCors[gene_id]) %>%
  dplyr::filter(
    FDR < alpha
  ) %>%
  dplyr::select(gene_id, gene_name, logFC, logCPM, FDR, riboCors, DE) %>%
  arrange(riboCors) %>%
  mutate(FDR = case_when(
    FDR >= 0.0001 ~ sprintf("%.4f", FDR),
    FDR < 0.0001 ~ sprintf("%.2e", FDR)
    )
  ) %>%
  dplyr::slice(1:20) %>%
  pander(
    justify = "llrrrrl",
    style = "rmarkdown",
    caption = paste(
      "The", nrow(.), "genes most negatively correlated with the original high GC content.",
      "Most failed the selection criteria for differential expression,",
      "primarily due to the stringent logFC filter.",
      "These genes may be being selected against in the presence of rRNA."
    )
  )

The 20 genes most negatively correlated with the original high GC content. Most failed the selection criteria for differential expression, primarily due to the stringent logFC filter. These genes may be being selected against in the presence of rRNA.
gene_id	gene_name	logFC	logCPM	FDR	riboCors	DE
ENSDARG00000001829	zgc:112982	-0.8984	6.26	0.0010	-0.9826	FALSE
ENSDARG00000100915	si:ch211-255f4.7	-0.6696	3.31	0.0015	-0.9652	FALSE
ENSDARG00000033889	spty2d1	-0.8317	5.462	0.0004	-0.9611	FALSE
ENSDARG00000019223	arglu1a	-0.8874	7.854	0.0028	-0.9602	FALSE
ENSDARG00000059097	nktr	-1.657	6.727	0.0002	-0.9581	TRUE
ENSDARG00000059768	gpatch8	-0.9409	7.864	0.0011	-0.9526	FALSE
ENSDARG00000075380	kri1	-0.778	4.336	0.0006	-0.9496	FALSE
ENSDARG00000001162	matr3l1.1	-0.6702	6.855	0.0003	-0.9495	FALSE
ENSDARG00000014825	si:dkeyp-69b9.6	-0.4038	6.53	0.0061	-0.9441	FALSE
ENSDARG00000077060	rbm6	-0.5108	5.891	0.0037	-0.9438	FALSE
ENSDARG00000104840	si:dkey-28a3.2	-0.472	4.533	0.0019	-0.9435	FALSE
ENSDARG00000069855	pnisr	-0.7195	7.192	0.0056	-0.9434	FALSE
ENSDARG00000007444	ift81	-0.6065	3.6	0.0009	-0.9396	FALSE
ENSDARG00000097086	si:dkey-172k15.4	-0.7701	3.038	0.0028	-0.9391	FALSE
ENSDARG00000012763	arl13b	-0.8101	4.787	2.80e-05	-0.9383	FALSE
ENSDARG00000061490	u2surp	-1.191	6.981	4.01e-06	-0.933	TRUE
ENSDARG00000044625	pcf11	-0.879	7.04	3.29e-05	-0.9312	FALSE
ENSDARG00000060361	cep104	-0.6092	4.35	0.0026	-0.9281	FALSE
ENSDARG00000051953	ythdc1	-0.7477	7.073	0.0007	-0.9263	FALSE
ENSDARG00000104682	tpm2	-0.4561	5.335	0.0028	-0.9237	FALSE

GC and Length Bias

topTables %>%
    bind_rows() %>%
    mutate(stat = -sign(logFC)*log10(PValue)) %>%
    ggplot(aes(gc_content, stat)) +
    geom_point(aes(colour = DE), alpha = 0.4) +
    geom_smooth(se = FALSE) +
    facet_wrap(~coef, labeller = contLabeller)  +
    labs(
        x = "GC content (%)",
        y = "Ranking Statistic"
    ) +
    coord_cartesian(ylim = c(-10, 10)) +
    scale_colour_manual(values = deCols) +
    theme(legend.position = "none")

Checks for GC bias in differential expression. GC content is shown against the ranking statistic, using -log10(p) multiplied by the sign of log fold-change. A small amount of bias was noted particularly in the comparison between mutants and wild-type.

Version	Author	Date
96d8cc7	Steve Ped	2020-01-25
be95a60	Steve Ped	2020-01-25
7251bd2	Steve Ped	2020-01-25
1858f49	Steve Ped	2020-01-22
71b8832	Steve Ped	2020-01-21

topTables %>%
    bind_rows() %>%
    mutate(stat = -sign(logFC)*log10(PValue)) %>%
    ggplot(aes(length, stat)) +
    geom_point(aes(colour = DE), alpha = 0.4) +
    geom_smooth(se = FALSE) +
    facet_wrap(~coef, labeller = contLabeller)  +
    labs(
        x = "Gene Length (bp)",
        y = "Ranking Statistic"
    ) +
    coord_cartesian(ylim = c(-10, 10)) +
    scale_x_log10(labels = comma) +
    scale_colour_manual(values = deCols) +
    theme(legend.position = "none")

Checks for length bias in differential expression. Gene length is shown against the ranking statistic, using -log10(p) multiplied by the sign of log fold-change. Again, a small amount of bias was noted particularly in the comparison between mutants and wild-type.

Version	Author	Date
96d8cc7	Steve Ped	2020-01-25
be95a60	Steve Ped	2020-01-25
7251bd2	Steve Ped	2020-01-25
1858f49	Steve Ped	2020-01-22
71b8832	Steve Ped	2020-01-21

Checks for both GC and length bias on differential expression showed that a small bias remained evident, despite using conditional-quantile normalisation. However, an alternative analysis on the same dataset excluding the CQN steps revealed vastly different and exaggerated bias. As such, the impact of CQN normalisation was considered to be appropriate.

Other Artefacts

topTables %>%
  bind_rows() %>%
  arrange(DE) %>%
  ggplot(aes(logCPM, logFC)) +
  geom_point(aes(colour = DE)) +
  geom_text_repel(
    aes(label = gene_name, colour = DE),
    data = . %>% dplyr::filter(DE & abs(logFC) > 3)
  ) +
  geom_text_repel(
    aes(label = gene_name, colour = DE),
    data = . %>% dplyr::filter(FDR < 0.05 & coef == "homozygous")
  ) +
  geom_smooth(se = FALSE) +
  geom_hline(
    yintercept = c(-1, 1)*minLfc,
    linetype = 2,
    colour = "red"
  ) +
  facet_wrap(~coef, nrow = 2, labeller = contLabeller) +
  scale_y_continuous(breaks = seq(-8, 8, by = 2)) +
  scale_colour_manual(values = deCols) +
  theme(legend.position = "none")

MA plots checking for any logFC bias across the range of expression values. The small curve in the average at the low end of expression values was considered to be an artefact of the sparse points at this end. Initial DE genes are shown in red, with select points labelled.

Version	Author	Date
96d8cc7	Steve Ped	2020-01-25
be95a60	Steve Ped	2020-01-25
7251bd2	Steve Ped	2020-01-25
1858f49	Steve Ped	2020-01-22
71b8832	Steve Ped	2020-01-21
01512da	Steve Ped	2020-01-21

topTables %>%
  bind_rows() %>%
  ggplot(aes(PValue)) +
  geom_histogram(
    binwidth = 0.02,
    colour = "black", fill = "grey90"
    ) +
  facet_wrap(~coef, labeller = contLabeller)

Histograms of p-values for both sets of coefficients. Values for the difference between mutants follow the expected distribution for when there are very few differences, whilst values for the presence of a mutant allele show the expected distribution for when there are many differences. In particular the spike near zero indicates many genes which are differentially expressed.

Version	Author	Date
96d8cc7	Steve Ped	2020-01-25
be95a60	Steve Ped	2020-01-25
7251bd2	Steve Ped	2020-01-25

Results

topTables %>%
    bind_rows() %>%
    ggplot(aes(logFC, -log10(PValue), colour = DE)) +
    geom_point(alpha = 0.4) +
    geom_text_repel(
        aes(label = gene_name),
        data = . %>% dplyr::filter(PValue < 1e-13 | abs(logFC) > 4)
    ) +
    geom_text_repel(
        aes(label = gene_name),
        data = . %>% dplyr::filter(FDR < 0.05 & coef == "homozygous")
    ) +
  geom_vline(
    xintercept = c(-1, 1)*minLfc,
    linetype = 2,
    colour = "red"
  ) +
  facet_wrap(~coef, nrow = 1, labeller = contLabeller) +
  scale_colour_manual(values = deCols) +
  scale_x_continuous(breaks = seq(-8, 8, by = 2)) +
  theme(legend.position = "none")

Volcano Plots showing DE genes against logFC.

Version	Author	Date
68033fe	Steve Ped	2020-01-26
96d8cc7	Steve Ped	2020-01-25
be95a60	Steve Ped	2020-01-25
7251bd2	Steve Ped	2020-01-25
1858f49	Steve Ped	2020-01-22
71b8832	Steve Ped	2020-01-21
01512da	Steve Ped	2020-01-21

deGenes <- topTables %>%
  lapply(dplyr::filter, DE) %>%
  lapply(magrittr::extract2, "gene_id")

Genes Tracking with psen2^S4Ter

n <- 40

A set of 615 genes was identified as DE in the presence of the mutant psen2^S4Ter transcript. The most highly ranked 40 genes are shown in the following heatmap. The presence of SNORNA70 in this list is of some concern as this is an ncRNA which is known to interact with ribosomes. In particular the expression pattern follows very closely the patterns previosuly identified for rRNA depletion using the percentage of libraries with GC > 70. For example, WT_86 was the sample with most complete rRNA depletion whilst, Het_84 showed the largest proportion of the library as being derived from rRNA. These represent the samples with lowest and highest expression of SNORNA70 respectively.

genoCols <- RColorBrewer::brewer.pal(3, "Set1") %>%
  setNames(levels(dgeList$samples$genotype))
fit$fitted.values %>%
  cpm(log = TRUE) %>% 
  extract(deGenes$mutant[seq_len(n)],) %>%
  as.data.frame() %>%
  set_rownames(unlist(geneLabeller(rownames(.)))) %>%
  pheatmap::pheatmap(
    color = viridis_pal(option = "magma")(100),
    labels_col = colnames(.) %>%
      str_replace_all(".+(Het|Hom|WT).+F3(_[0-9]{2}).+", "\\1\\2"),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    annotation_col = dgeList$samples %>%
      dplyr::select(Genotype = genotype),
    annotation_names_col = FALSE,
    annotation_colors = list(Genotype = genoCols),
    cutree_cols = 2,
    cutree_rows = 6
  )

The 40 most highly-ranked genes by FDR which are commonly considered DE between mutants and WT samples. Plotted values are logCPM based on normalised counts.

Version	Author	Date
68033fe	Steve Ped	2020-01-26
96d8cc7	Steve Ped	2020-01-25
7251bd2	Steve Ped	2020-01-25
1858f49	Steve Ped	2020-01-22

Genes showing differences between mutants

When inspecting the genes showing differences between mutants, it was noted that the WT and Homozygous mutant samples clustered together, implying that there was a unique effect on these genes which was specific to the heterozygous condition.

fit$fitted.values %>%
  cpm(log = TRUE) %>% 
  extract(deGenes$homozygous,) %>%
  as.data.frame() %>%
  set_rownames(unlist(geneLabeller(rownames(.)))) %>%
  pheatmap::pheatmap(
    color = viridis_pal(option = "magma")(100),
    labels_col = colnames(.) %>%
      str_replace_all(".+(Het|Hom|WT).+F3(_[0-9]{2}).+", "\\1\\2"),
    legend_breaks = c(seq(-2, 8, by = 2), max(.)),
    legend_labels = c(seq(-2, 8, by = 2), "logCPM\n"),
    annotation_col = dgeList$samples %>%
      dplyr::select(Genotype = genotype),
    annotation_names_col = FALSE,
    annotation_colors = list(Genotype = genoCols),
    cutree_cols = 2,
    cutree_rows = 3
  )

The 7 most highly-ranked genes by FDR which are DE between between mutants. Plotted values are logCPM based on normalised counts.

Version	Author	Date
68033fe	Steve Ped	2020-01-26
96d8cc7	Steve Ped	2020-01-25
7251bd2	Steve Ped	2020-01-25

Data Export

Final gene lists were exported as separate csv files, along with fit and dgeList objects.

write_csv(topTables$mutant, "output/psen2VsWT.csv")
write_csv(topTables$homozygous, "output/psen2HomVsHet.csv")
write_rds(fit, "data/fit.rds", compress = "gz")
write_rds(dgeList, "data/dgeList.rds", compress = "gz")

devtools::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.2 (2019-12-12)
 os       Ubuntu 18.04.4 LTS          
 system   x86_64, linux-gnu           
 ui       X11                         
 language en_AU:en                    
 collate  en_AU.UTF-8                 
 ctype    en_AU.UTF-8                 
 tz       Australia/Adelaide          
 date     2020-02-17                  

─ Packages ───────────────────────────────────────────────────────────────────
 package                * version  date       lib source        
 AnnotationDbi          * 1.48.0   2019-10-29 [2] Bioconductor  
 AnnotationFilter       * 1.10.0   2019-10-29 [2] Bioconductor  
 AnnotationHub          * 2.18.0   2019-10-29 [2] Bioconductor  
 askpass                  1.1      2019-01-13 [2] CRAN (R 3.6.0)
 assertthat               0.2.1    2019-03-21 [2] CRAN (R 3.6.0)
 backports                1.1.5    2019-10-02 [2] CRAN (R 3.6.1)
 Biobase                * 2.46.0   2019-10-29 [2] Bioconductor  
 BiocFileCache          * 1.10.2   2019-11-08 [2] Bioconductor  
 BiocGenerics           * 0.32.0   2019-10-29 [2] Bioconductor  
 BiocManager              1.30.10  2019-11-16 [2] CRAN (R 3.6.1)
 BiocParallel             1.20.1   2019-12-21 [2] Bioconductor  
 BiocVersion              3.10.1   2019-06-06 [2] Bioconductor  
 biomaRt                  2.42.0   2019-10-29 [2] Bioconductor  
 Biostrings               2.54.0   2019-10-29 [2] Bioconductor  
 bit                      1.1-15.1 2020-01-14 [2] CRAN (R 3.6.2)
 bit64                    0.9-7    2017-05-08 [2] CRAN (R 3.6.0)
 bitops                   1.0-6    2013-08-17 [2] CRAN (R 3.6.0)
 blob                     1.2.1    2020-01-20 [2] CRAN (R 3.6.2)
 broom                    0.5.4    2020-01-27 [2] CRAN (R 3.6.2)
 callr                    3.4.1    2020-01-24 [2] CRAN (R 3.6.2)
 cellranger               1.1.0    2016-07-27 [2] CRAN (R 3.6.0)
 cli                      2.0.1    2020-01-08 [2] CRAN (R 3.6.2)
 cluster                  2.1.0    2019-06-19 [2] CRAN (R 3.6.1)
 colorspace               1.4-1    2019-03-18 [2] CRAN (R 3.6.0)
 cqn                    * 1.32.0   2019-10-29 [2] Bioconductor  
 crayon                   1.3.4    2017-09-16 [2] CRAN (R 3.6.0)
 curl                     4.3      2019-12-02 [2] CRAN (R 3.6.2)
 data.table               1.12.8   2019-12-09 [2] CRAN (R 3.6.2)
 DBI                      1.1.0    2019-12-15 [2] CRAN (R 3.6.2)
 dbplyr                 * 1.4.2    2019-06-17 [2] CRAN (R 3.6.0)
 DelayedArray             0.12.2   2020-01-06 [2] Bioconductor  
 desc                     1.2.0    2018-05-01 [2] CRAN (R 3.6.0)
 devtools                 2.2.1    2019-09-24 [2] CRAN (R 3.6.1)
 digest                   0.6.23   2019-11-23 [2] CRAN (R 3.6.1)
 dplyr                  * 0.8.4    2020-01-31 [2] CRAN (R 3.6.2)
 edgeR                  * 3.28.0   2019-10-29 [2] Bioconductor  
 ellipsis                 0.3.0    2019-09-20 [2] CRAN (R 3.6.1)
 ensembldb              * 2.10.2   2019-11-20 [2] Bioconductor  
 evaluate                 0.14     2019-05-28 [2] CRAN (R 3.6.0)
 FactoMineR               2.1      2020-01-17 [2] CRAN (R 3.6.2)
 fansi                    0.4.1    2020-01-08 [2] CRAN (R 3.6.2)
 farver                   2.0.3    2020-01-16 [2] CRAN (R 3.6.2)
 fastmap                  1.0.1    2019-10-08 [2] CRAN (R 3.6.1)
 flashClust               1.01-2   2012-08-21 [2] CRAN (R 3.6.1)
 forcats                * 0.4.0    2019-02-17 [2] CRAN (R 3.6.0)
 fs                       1.3.1    2019-05-06 [2] CRAN (R 3.6.0)
 generics                 0.0.2    2018-11-29 [2] CRAN (R 3.6.0)
 GenomeInfoDb           * 1.22.0   2019-10-29 [2] Bioconductor  
 GenomeInfoDbData         1.2.2    2019-11-21 [2] Bioconductor  
 GenomicAlignments        1.22.1   2019-11-12 [2] Bioconductor  
 GenomicFeatures        * 1.38.1   2020-01-22 [2] Bioconductor  
 GenomicRanges          * 1.38.0   2019-10-29 [2] Bioconductor  
 ggdendro                 0.1-20   2016-04-27 [2] CRAN (R 3.6.0)
 ggplot2                * 3.2.1    2019-08-10 [2] CRAN (R 3.6.1)
 ggrepel                * 0.8.1    2019-05-07 [2] CRAN (R 3.6.0)
 git2r                    0.26.1   2019-06-29 [2] CRAN (R 3.6.1)
 glue                     1.3.1    2019-03-12 [2] CRAN (R 3.6.0)
 gtable                   0.3.0    2019-03-25 [2] CRAN (R 3.6.0)
 haven                    2.2.0    2019-11-08 [2] CRAN (R 3.6.1)
 highr                    0.8      2019-03-20 [2] CRAN (R 3.6.0)
 hms                      0.5.3    2020-01-08 [2] CRAN (R 3.6.2)
 htmltools                0.4.0    2019-10-04 [2] CRAN (R 3.6.1)
 htmlwidgets              1.5.1    2019-10-08 [2] CRAN (R 3.6.1)
 httpuv                   1.5.2    2019-09-11 [2] CRAN (R 3.6.1)
 httr                     1.4.1    2019-08-05 [2] CRAN (R 3.6.1)
 hwriter                  1.3.2    2014-09-10 [2] CRAN (R 3.6.0)
 interactiveDisplayBase   1.24.0   2019-10-29 [2] Bioconductor  
 IRanges                * 2.20.2   2020-01-13 [2] Bioconductor  
 jpeg                     0.1-8.1  2019-10-24 [2] CRAN (R 3.6.1)
 jsonlite                 1.6      2018-12-07 [2] CRAN (R 3.6.0)
 kableExtra               1.1.0    2019-03-16 [2] CRAN (R 3.6.1)
 knitr                    1.27     2020-01-16 [2] CRAN (R 3.6.2)
 labeling                 0.3      2014-08-23 [2] CRAN (R 3.6.0)
 later                    1.0.0    2019-10-04 [2] CRAN (R 3.6.1)
 lattice                  0.20-38  2018-11-04 [4] CRAN (R 3.6.0)
 latticeExtra             0.6-29   2019-12-19 [2] CRAN (R 3.6.2)
 lazyeval                 0.2.2    2019-03-15 [2] CRAN (R 3.6.0)
 leaps                    3.1      2020-01-16 [2] CRAN (R 3.6.2)
 lifecycle                0.1.0    2019-08-01 [2] CRAN (R 3.6.1)
 limma                  * 3.42.1   2020-01-29 [2] Bioconductor  
 locfit                   1.5-9.1  2013-04-20 [2] CRAN (R 3.6.0)
 lubridate                1.7.4    2018-04-11 [2] CRAN (R 3.6.0)
 magrittr               * 1.5      2014-11-22 [2] CRAN (R 3.6.0)
 MASS                     7.3-51.5 2019-12-20 [4] CRAN (R 3.6.2)
 Matrix                   1.2-18   2019-11-27 [2] CRAN (R 3.6.1)
 MatrixModels             0.4-1    2015-08-22 [2] CRAN (R 3.6.1)
 matrixStats              0.55.0   2019-09-07 [2] CRAN (R 3.6.1)
 mclust                 * 5.4.5    2019-07-08 [2] CRAN (R 3.6.1)
 memoise                  1.1.0    2017-04-21 [2] CRAN (R 3.6.0)
 mgcv                     1.8-31   2019-11-09 [4] CRAN (R 3.6.1)
 mime                     0.8      2019-12-19 [2] CRAN (R 3.6.2)
 modelr                   0.1.5    2019-08-08 [2] CRAN (R 3.6.1)
 munsell                  0.5.0    2018-06-12 [2] CRAN (R 3.6.0)
 ngsReports             * 1.1.2    2019-10-16 [1] Bioconductor  
 nlme                     3.1-144  2020-02-06 [4] CRAN (R 3.6.2)
 nor1mix                * 1.3-0    2019-06-13 [2] CRAN (R 3.6.1)
 openssl                  1.4.1    2019-07-18 [2] CRAN (R 3.6.1)
 pander                 * 0.6.3    2018-11-06 [2] CRAN (R 3.6.0)
 pheatmap               * 1.0.12   2019-01-04 [2] CRAN (R 3.6.0)
 pillar                   1.4.3    2019-12-20 [2] CRAN (R 3.6.2)
 pkgbuild                 1.0.6    2019-10-09 [2] CRAN (R 3.6.1)
 pkgconfig                2.0.3    2019-09-22 [2] CRAN (R 3.6.1)
 pkgload                  1.0.2    2018-10-29 [2] CRAN (R 3.6.0)
 plotly                   4.9.1    2019-11-07 [2] CRAN (R 3.6.1)
 plyr                     1.8.5    2019-12-10 [2] CRAN (R 3.6.2)
 png                      0.1-7    2013-12-03 [2] CRAN (R 3.6.0)
 preprocessCore         * 1.48.0   2019-10-29 [2] Bioconductor  
 prettyunits              1.1.1    2020-01-24 [2] CRAN (R 3.6.2)
 processx                 3.4.1    2019-07-18 [2] CRAN (R 3.6.1)
 progress                 1.2.2    2019-05-16 [2] CRAN (R 3.6.0)
 promises                 1.1.0    2019-10-04 [2] CRAN (R 3.6.1)
 ProtGenerics             1.18.0   2019-10-29 [2] Bioconductor  
 ps                       1.3.0    2018-12-21 [2] CRAN (R 3.6.0)
 purrr                  * 0.3.3    2019-10-18 [2] CRAN (R 3.6.1)
 quantreg               * 5.54     2019-12-13 [2] CRAN (R 3.6.2)
 R6                       2.4.1    2019-11-12 [2] CRAN (R 3.6.1)
 rappdirs                 0.3.1    2016-03-28 [2] CRAN (R 3.6.0)
 RColorBrewer           * 1.1-2    2014-12-07 [2] CRAN (R 3.6.0)
 Rcpp                     1.0.3    2019-11-08 [2] CRAN (R 3.6.1)
 RCurl                    1.98-1.1 2020-01-19 [2] CRAN (R 3.6.2)
 readr                  * 1.3.1    2018-12-21 [2] CRAN (R 3.6.0)
 readxl                   1.3.1    2019-03-13 [2] CRAN (R 3.6.0)
 remotes                  2.1.0    2019-06-24 [2] CRAN (R 3.6.0)
 reprex                   0.3.0    2019-05-16 [2] CRAN (R 3.6.0)
 reshape2                 1.4.3    2017-12-11 [2] CRAN (R 3.6.0)
 rlang                    0.4.4    2020-01-28 [2] CRAN (R 3.6.2)
 rmarkdown                2.1      2020-01-20 [2] CRAN (R 3.6.2)
 rprojroot                1.3-2    2018-01-03 [2] CRAN (R 3.6.0)
 Rsamtools                2.2.1    2019-11-06 [2] Bioconductor  
 RSQLite                  2.2.0    2020-01-07 [2] CRAN (R 3.6.2)
 rstudioapi               0.10     2019-03-19 [2] CRAN (R 3.6.0)
 rtracklayer              1.46.0   2019-10-29 [2] Bioconductor  
 rvest                    0.3.5    2019-11-08 [2] CRAN (R 3.6.1)
 S4Vectors              * 0.24.3   2020-01-18 [2] Bioconductor  
 scales                 * 1.1.0    2019-11-18 [2] CRAN (R 3.6.1)
 scatterplot3d            0.3-41   2018-03-14 [2] CRAN (R 3.6.1)
 sessioninfo              1.1.1    2018-11-05 [2] CRAN (R 3.6.0)
 shiny                    1.4.0    2019-10-10 [2] CRAN (R 3.6.1)
 ShortRead                1.44.1   2019-12-19 [2] Bioconductor  
 SparseM                * 1.78     2019-12-13 [2] CRAN (R 3.6.2)
 stringi                  1.4.5    2020-01-11 [2] CRAN (R 3.6.2)
 stringr                * 1.4.0    2019-02-10 [2] CRAN (R 3.6.0)
 SummarizedExperiment     1.16.1   2019-12-19 [2] Bioconductor  
 testthat                 2.3.1    2019-12-01 [2] CRAN (R 3.6.1)
 tibble                 * 2.1.3    2019-06-06 [2] CRAN (R 3.6.0)
 tidyr                  * 1.0.2    2020-01-24 [2] CRAN (R 3.6.2)
 tidyselect               1.0.0    2020-01-27 [2] CRAN (R 3.6.2)
 tidyverse              * 1.3.0    2019-11-21 [2] CRAN (R 3.6.1)
 truncnorm                1.0-8    2018-02-27 [2] CRAN (R 3.6.0)
 usethis                  1.5.1    2019-07-04 [2] CRAN (R 3.6.1)
 vctrs                    0.2.2    2020-01-24 [2] CRAN (R 3.6.2)
 viridisLite              0.3.0    2018-02-01 [2] CRAN (R 3.6.0)
 webshot                  0.5.2    2019-11-22 [2] CRAN (R 3.6.1)
 whisker                  0.4      2019-08-28 [2] CRAN (R 3.6.1)
 withr                    2.1.2    2018-03-15 [2] CRAN (R 3.6.0)
 workflowr              * 1.6.0    2019-12-19 [2] CRAN (R 3.6.2)
 xfun                     0.12     2020-01-13 [2] CRAN (R 3.6.2)
 XML                      3.99-0.3 2020-01-20 [2] CRAN (R 3.6.2)
 xml2                     1.2.2    2019-08-09 [2] CRAN (R 3.6.1)
 xtable                   1.8-4    2019-04-21 [2] CRAN (R 3.6.0)
 XVector                  0.26.0   2019-10-29 [2] Bioconductor  
 yaml                     2.2.1    2020-02-01 [2] CRAN (R 3.6.2)
 zlibbioc                 1.32.0   2019-10-29 [2] Bioconductor  
 zoo                      1.8-7    2020-01-10 [2] CRAN (R 3.6.2)

[1] /home/steveped/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library

Differential Expression

Steve Pederson

17 February, 2020

Setup

Annotations

Count Data

Additional Functions

Correlations with rRNA contaminants

Analysis

PCA

Model Description

Normalisation

Model Fitting

Model Checking

Checks for rRNA impacts

GC and Length Bias

Other Artefacts

Results

Genes Tracking with psen2^S4Ter

Genes showing differences between mutants

Data Export

Differential Expression

Steve Pederson

17 February, 2020

Setup

Annotations

Count Data

Additional Functions

Correlations with rRNA contaminants

Analysis

PCA

Model Description

Normalisation

Model Fitting

Model Checking

Checks for rRNA impacts

GC and Length Bias

Other Artefacts

Results

Genes Tracking with psen2S4Ter

Genes showing differences between mutants

Data Export

Genes Tracking with psen2^S4Ter