Last updated: 2023-03-14

Checks: 7 0

Knit directory: NMD-analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20230314) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 780272b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  NMD-DEG-limma.xlsx
    Untracked:  data/LTK_Sample Metafile_V3.txt

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/DEG-analysis.Rmd) and HTML (docs/DEG-analysis.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
html 780272b unawaz1996 2023-03-14 Build site.
html 5a8cba6 unawaz1996 2023-03-14 Build site.
Rmd 4856ece unawaz1996 2023-03-14 Add DEG analysis

Introduction

source("code/plots.R") 
source("code/functions.R")
source("code/libraries.R")
# Quality control of genes 
salmon.files = ("/home/neuro/Documents/NMD_analysis/Analysis/Results/Quantification/Salmon")
md.files = "data/LTK_Sample Metafile_V3.txt"

# Loading data
txdf = transcripts(EnsDb.Mmusculus.v79, return.type="DataFrame")
tx2gene = as.data.frame(txdf[,c("tx_id","gene_id")])

salmon = list.files(salmon.files, pattern = "transcripts$", full.names = TRUE)
all_files = file.path(salmon, "quant.sf")
names(all_files) <- paste(c(212:229))

txi_genes <- tximport(all_files, type="salmon", txOut=FALSE,
                      countsFromAbundance="scaledTPM", tx2gene = tx2gene, ignoreTxVersion = TRUE, ignoreAfterBar = TRUE)


md = read.table(md.files, header= TRUE) %>%  mutate(files = file.path(salmon, "quant.sf"))
md$Group = factor(md$Group)

QC analysis

Detection of gene expression

group.labs <- c("Control", "UPF3A KD", "UPF3A OE", "UPF3B KD", "Double KD", "UPF3A OE, UPF3B KD")
names(group.labs) <- unique(md$Group)

# upf3a
txi_genes$abundance["ENSMUSG00000038398",] %>% melt() %>% 
    rownames_to_column("sample") %>% cbind(md) %>% 
    ggplot(aes(x=as.character(Sample), y= value, fill = Group)) + 
    geom_bar(stat="identity", width = 4, color = "black", alpha = 0.9) +
    facet_grid(~Group, labeller = labeller(Group = group.labs) ) + ylab("tpm") + theme_bw() + xlab("") + 
    scale_fill_manual(values = c("#9D5D7C",  "#8B9488", "#657FA2", "#8483AD", "#8D779E", "#3A4170"),
                       labels = c("UPF3A_KD_UPF3B_KD" = "Double KD", "UPF3A_KD" = "UPF3A KD", 
                                  "UPF3B_KD" = "UPF3B KD", "UPF3A_OE" = "UPF3A OE", 
                                  "UPF3A_OE_UPF3B_KD" = "UPF3A OE UPF3B KD")) +
    theme(legend.position = "none", axis.text.x=element_blank(),
          strip.background =element_rect(fill="#D3D5E3", color = "black"))

Version Author Date
5a8cba6 unawaz1996 2023-03-14
#upf3b 
txi_genes$abundance["ENSMUSG00000036572",] %>% melt() %>% 
    rownames_to_column("sample") %>% cbind(md) %>% 
    ggplot(aes(x=as.character(Sample), y= value, fill = Group)) + 
    geom_bar(stat="identity", width = 4, color = "black", alpha = 0.9) +
    facet_grid(~Group, labeller = labeller(Group = group.labs))  + 
    ylab("tpm") + theme_bw() + xlab("") + 
    scale_fill_manual(values = c("#9D5D7C",  "#8B9488", "#657FA2", "#8483AD", "#8D779E", "#3A4170")) +
    theme(legend.position = "none", axis.text.x=element_blank(),
          strip.background =element_rect(fill="#D3D5E3", color = "black"))

Version Author Date
5a8cba6 unawaz1996 2023-03-14

Filtering

keep = (rowSums(txi_genes$abundance >= .5 ) >= 3)

txi_genes$counts %>% cpm(log=TRUE) %>%
    melt() %>%
    dplyr::filter(is.finite(value)) %>% 
    ggplot(aes(x=value, color = as.character(Var2))) +
    geom_density() +
    ggtitle("Before filtering") +
    labs(x = "logCPM", y = "Proportion of Genes") + 
    theme_bw() +
    theme(legend.position = "none")

Version Author Date
5a8cba6 unawaz1996 2023-03-14
txi_genes$counts %>%
    cpm(log=TRUE) %>%
    magrittr::extract(keep,) %>% 
    melt() %>%
    dplyr::filter(is.finite(value)) %>% 
    ggplot(aes(x=value, color = as.character(Var2))) +
    geom_density() +
    ggtitle("After filtering") +
    labs(x = "logCPM", y = "Proportion of Genes") + theme_bw() +
    theme(legend.position = "none")

Version Author Date
5a8cba6 unawaz1996 2023-03-14

Another level of filtering I’d like to do here is remove all genes that don’t have an annotation

txi_genes_filtered = txi_genes$counts[keep,] %>%  
    as.data.frame() %>% 
    rownames_to_column("geneID") %>%
    mutate(geneName = mapIds(org.Mm.eg.db, keys=.$geneID,
                             column="SYMBOL",keytype="ENSEMBL",
                         multiVals="first")) %>%
    drop_na(geneName) %>% 
    column_to_rownames("geneID") %>% dplyr::select(-geneName)

PCA

PC = log2(txi_genes$abundance[keep,]  + 10^-6 ) %>%
    t() %>% 
    prcomp(scale = TRUE)

PC$x[,1:2] %>% 
    as.data.frame()  %>%
    cbind(md) %>%
    ggplot(aes(PC1, PC2, color = Group, shape = Condition_UPF3B, label = Condition_UPF3B)) +
    geom_point(size = 11.5) +
    theme_bw() + scale_shape_manual(values = c(16,15)) + labs(shape="UPF3B status", colour= "Samples") +
    theme(axis.title.y = element_text(size = 12), axis.title.x = element_text(size = 12), 
          axis.text = element_text(size = 10)) +
    xlab(paste0("Principal Component 1 (28.1%)")) + ylab("Principal Component 2 (16.6%)") +
    scale_color_manual(values = c("#874257",  "#8B9488", "#657FA2", "#8483AD", "#C2A8BE", "#3A4170"),
                      labels = c("UPF3A_KD_UPF3B_KD" = "Double KD", "UPF3A_KD" = "UPF3A KD", 
                                                                   "UPF3B_KD" = "UPF3B KD", "UPF3A_OE" = "UPF3A OE", 
                                                                   "UPF3A_OE_UPF3B_KD" = "UPF3A OE UPF3B KD"))

Version Author Date
5a8cba6 unawaz1996 2023-03-14

Boxplots of gene expression and library size

txi_genes$abundance[keep,] %>% 
    as.data.frame() %>% 
    gather(sample,value) %>% 
    ggplot(aes(x=sample, y = log(value), fill = sample)) + geom_boxplot() +
    ylab("Log(TPM)") + xlab("Samples") +
    theme_bw() + theme(legend.position = "none") + scale_fill_manual(values = hydrangea[1:18])

Version Author Date
5a8cba6 unawaz1996 2023-03-14
# Library size 

countsSum <- colSums(txi_genes$counts) # library size
countsSum %>% melt() %>% rownames_to_column("Sample") %>% 
    ggplot(aes(x=Sample, y = value)) + geom_bar(stat = "identity") + ylab("Library size") +
    theme_bw()

Version Author Date
5a8cba6 unawaz1996 2023-03-14
#max(countsSum) / min(countsSum)

Differential gene expression using limma

y <- DGEList(txi_genes_filtered)


design <- model.matrix(~0+Group, data = md) %>% 
    set_colnames(gsub(pattern = "Group", replacement ="", x = colnames(.)))

contrasts <- makeContrasts(levels = colnames(design), 
                           UPF3B_vs_Control = UPF3B_KD - Control, 
                           UPF3A_vs_Control = UPF3A_KD - Control, 
                           UPF3A_OE_vs_Control = UPF3A_OE - Control, 
                           DoubleKD_vs_Control = UPF3A_KD_UPF3B_KD - Control, 
                           UPF3A_OE_UPF3B_KD_vs_Control = UPF3A_OE_UPF3B_KD - Control)
y <- calcNormFactors(y)
v <- voom(y, design)

fit = lmFit(v, design) %>% 
    contrasts.fit(contrasts) %>% 
    eBayes()
results <- decideTests(fit, p.value = 0.05, adjust.method = "fdr", method= "global", lfc = log2(1.5))
limma_results = write_fit(fit, results= results, adjust = "fdr") %>% 
    rownames_to_column("ensembl_gene_id") %>% 
    mutate(gene = mapIds(org.Mm.eg.db, keys=ensembl_gene_id,  column="SYMBOL",keytype="ENSEMBL", multiVals="first"), 
           entrez = mapIds(org.Mm.eg.db, keys=ensembl_gene_id,  column="ENTREZID",keytype="ENSEMBL", multiVals="first")) %>% drop_na(entrez)


#save(limma_results, file="/home/neuro/Documents/NMD_analysis/Analysis/Results/DEGs/DEG-limma-results.Rda")
#save(results, fit,y,v,contrasts, design, contrasts,  file = "/home/neuro/Documents/NMD_analysis/Analysis/Results/DEGs/limma-matrices.Rda")
DEGenes <- list()
DEGenes[["UPF3B_vs_Control"]] <- limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control != 0, abs(Coef.UPF3B_vs_Control) > log2(1.5))
DEGenes[["UPF3A_vs_Control"]] <- limma_results %>%
    dplyr::filter(Res.UPF3A_vs_Control != 0, abs(Coef.UPF3A_vs_Control) > log2(1.5))
DEGenes[["UPF3A_OE_vs_Control"]] <- limma_results %>%
    dplyr::filter(Res.UPF3A_OE_vs_Control != 0, abs(Coef.UPF3A_OE_vs_Control) > log2(1.5)) %>%
    arrange(p.value.UPF3A_OE_vs_Control)
DEGenes[["DoubleKD_vs_Control"]] <- limma_results %>%
    dplyr::filter(Res.DoubleKD_vs_Control != 0, abs(Coef.DoubleKD_vs_Control) > log2(1.5)) %>%
    arrange(p.value.DoubleKD_vs_Control)
DEGenes[["UPF3A_OE_UPF3B_KD_vs_Control"]] <- limma_results %>%
    dplyr::filter(Res.UPF3A_OE_UPF3B_KD_vs_Control != 0, abs(Coef.UPF3A_OE_UPF3B_KD_vs_Control) > log2(1.5)) %>%
    arrange(p.value.UPF3A_OE_UPF3B_KD_vs_Control)

write.xlsx(DEGenes, "NMD-DEG-limma.xlsx")

#save(DEGenes, file="/home/neuro/Documents/NMD_analysis/Analysis/Results/DEGs/DEG-list.Rda")

Total number of DEGs

summary(results)
       UPF3B_vs_Control UPF3A_vs_Control UPF3A_OE_vs_Control
Down                 95               58                  24
NotSig            10564            10568               10926
Up                  298              331                   7
       DoubleKD_vs_Control UPF3A_OE_UPF3B_KD_vs_Control
Down                   227                          125
NotSig               10016                        10626
Up                     714                          206
summary(results) %>% melt() %>% 
    dplyr::filter(Var1 != "NotSig") %>% 
    ggplot(aes(x=reorder(Var2, -value), y = value, fill=Var1)) + 
    geom_bar(stat = "identity",   colour = "white") + 
    theme_bw() + 
    theme(axis.text.x = element_text(angle = 70,  hjust=1, face = "bold"), 
          legend.position = "bottom", 
          legend.title= element_blank(), axis.title.y = element_text(size = 12)) + 
    scale_fill_manual(values = c("#0B032D", "#E56B6F")) +
    ylab("No. DEGs") + xlab("") + scale_x_discrete(labels=c("UPF3A_KD_UPF3B_KD_vs_Control" = "Double KD", "UPF3A_KD_vs_Control" = "UPF3A KD", 
                                                            "UPF3B_KD_vs_Control" = "UPF3B KD", "UPF3A_OE_vs_Control" = "UPF3A OE", 
                                                            "UPF3A_OE_UPF3B_KD_vs_Control" = "UPF3A OE UPF3B KD"))

Version Author Date
5a8cba6 unawaz1996 2023-03-14

After performing pairwise differential gene expression analysis, we can see that majority of all significantly expressed genes in all contrasts are upregulated. Overall, we find that a total of 395 genes are differentially expressed between UPF3B KD and controls. 388 genes are differentially expressed between UPF3A KD vs Controls.

Overlapping DEGs

## Venn diagram and table representing each venn diagram with gene name and its results 
ven <- venndetail(list("UPF3B KD" = DEGenes$UPF3B_vs_Control$gene, 
                       "UPF3A KD" = DEGenes$UPF3A_vs_Control$gene,
                    "Double KD" = DEGenes$DoubleKD_vs_Control$gene,
                    "UPF3A OE" = DEGenes$UPF3A_OE_vs_Control$gene, 
                  "UPF3A OE, UPF3B KD" = DEGenes$UPF3A_OE_UPF3B_KD_vs_Control$gene))
venn(list("UPF3B KD" = DEGenes$UPF3B_vs_Control$ensembl_gene_id, 
                       "UPF3A KD" = DEGenes$UPF3A_vs_Control$ensembl_gene_id,
                    "Double KD" = DEGenes$DoubleKD_vs_Control$ensembl_gene_id,
                    "UPF3A OE" = DEGenes$UPF3A_OE_vs_Control$ensembl_gene_id, 
                  "UPF3A OE, UPF3B KD" = DEGenes$UPF3A_OE_UPF3B_KD_vs_Control$ensembl_gene_id))

Version Author Date
5a8cba6 unawaz1996 2023-03-14
#detail(ven) 
#result(ven)

UPF3B vs Controls

DEColours <- c("-1" = "#503E5C", "1" = "#874257", "0" = "#D3D5E3")
volc = limma_results %>%
    ggplot(aes(x = Coef.UPF3B_vs_Control, 
               y = -log10(p.value.UPF3B_vs_Control), 
               colour = as.factor(as.character(Res.UPF3B_vs_Control)))) +
    geom_point(alpha = 0.8, size =1) +
    scale_colour_manual(values = DEColours) + theme_bw() + 
    theme(legend.position = "none") +
    geom_hline(yintercept = -log10(0.05), color = "grey60", size = 0.5, lty = "dashed") +
    geom_hline(yintercept = -log10(limma_results %>% 
                                       dplyr::filter(Res.UPF3B_vs_Control != 0) %>% 
                                       use_series(p.value.UPF3B_vs_Control) %>% max), 
               color = "black") +
    labs(x = "log2 Fold Change", y = "-log10 p-value") +
    ggtitle("UPF3B KD vs. Control") + xlim(-2.5,2.5) 


my_gg = volc + geom_point_interactive(aes(tooltip = gene, data_id = gene), 
    size = 1, hover_nearest = TRUE)
girafe(ggobj = my_gg)
DEGenes$UPF3B_vs_Control %>% 
    dplyr::select(ensembl_gene_id,gene,entrez, contains("UPF3B_vs")) %>% 
    dplyr::select(-contains("Res")) %>% 
    mutate(genename = mapIds(org.Mm.eg.db, keys=ensembl_gene_id,  column="GENENAME",keytype="ENSEMBL", multiVals="first")) %>%     DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'), caption = "Genes significantly expressed in UPF3B KD relative to Controls",
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))

Given that previous studies have confirmed UPF3B KD results in a decrease of efficieny of NMD resulting in an increase of upregulated genes, we are interested in seeing how the signifiantly expressed upregulated genes behave in other conditions compared to controls.

In order to do this, we will create a heatmap visualisaiton.

rg <- limma_results %>% 
     dplyr::filter(Res.UPF3B_vs_Control == 1) %>%
    dplyr::select(contains("Coef"), gene)  %>%
    column_to_rownames("gene") %>% max(abs(.));

limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control == 1) %>%
    dplyr::select(contains("Coef"), gene)  %>%
    column_to_rownames("gene") %>%
    pheatmap(,
        scale="none", 
        cluster_cols = FALSE,
        breaks = seq(-rg, rg, length.out = 100),
        clustering_distance_rows = "euclidean", 
        border_color = "white", 
        treeheight_row = 0,
        fontsize = 6,
        color = colorRampPalette(c("#10c1e5", "#82e0b4","#F9F9F9", "#FBB829", "#FF0066"))(100),
        labels_row = row.names(.), 
        cellwidth = 8
    )

Version Author Date
5a8cba6 unawaz1996 2023-03-14

From the heatmap above, we can notice a few trends that upregulated genes in UPF3B KD have in other conditions.

  1. Inverted expression in UPF3A OE genes

  2. Inverted expression in UPF3A KD

These genes can be explored here:

limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control == 1) %>%
    dplyr::select(gene, contains("Coef"))  %>%
   # column_to_rownames("gene") %>%  
    DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))

Overlap with other significant genes

We are now also interested in seeing how all genes differentially expressed in UPF3B KD overlap with significant genes from other conditions.

Comparison with UPF3A KD significant genes

#venn(list("UPF3B KD" = DEGenes$UPF3B_vs_Control$ensembl_gene_id, 
 #                      "UPF3A KD" = DEGenes$UPF3A_vs_Control$ensembl_gene_id))
x = list( "UPF3B KD" = DEGenes$UPF3B_vs_Control$ensembl_gene_id, 
    "UPF3A KD" = DEGenes$UPF3A_vs_Control$ensembl_gene_id, 
    "Double KD" = DEGenes$DoubleKD_vs_Control$ensembl_gene_id,
    "UPF3A OE" = DEGenes$UPF3A_OE_vs_Control$ensembl_gene_id, 
    "UPF3A OE UPF3B KD" = DEGenes$UPF3A_OE_UPF3B_KD_vs_Control$ensembl_gene_id)

ggvenn(x[c(1,2)])

Version Author Date
5a8cba6 unawaz1996 2023-03-14

When we compare all significant genes in UPF3B KD with UPF3A KD, we can see that there is an overlap of 128 genes.

limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.UPF3A_vs_Control !=0) %>% 
    ggscatter(., x= "Coef.UPF3B_vs_Control", y="Coef.UPF3A_vs_Control", 
              cor.coef = TRUE, add = "reg.line", 
              conf.int = TRUE, add.params = list(color = "blue",
                                                 fill = "lightgray")) + 
    theme_bw() + ylab("UPF3A KD (log2FoldChange)") + xlab("UPF3B KD (log2FoldChange)")

Version Author Date
5a8cba6 unawaz1996 2023-03-14

When we compare the direction of gene expression in these 128 genes, we can see that there is a positive correlation (r = 0.68).

We are now interested in seeing which genes are present in each of the quadrants:

  1. Quadrant 1: Definied as x < 0, y > 0 where is significant genes belogning to UPF3A KD group show upregulation and genes belonging to UPF3B KD group show downregulation. A total of 12 genes belong to this group:
limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control == -1 & Res.UPF3A_vs_Control == 1) %>%  
    dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "UPF3A_vs"))) %>% 
    dplyr::arrange(Res.UPF3B_vs_Control) %>%
     DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
                   caption ="Overlapping genes between UPF3B KD and UPF3A KD relative to controls in Quadrant 1 (Upregulated in UPF3A KD, downregulated in UPF3B KD)",
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))
  1. Quadrant 2: Defined as x > 0, y > 0 where significant genes from both conditions show upregulation. 102 genes belong to this group.
limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control == 1 & Res.UPF3A_vs_Control ==1) %>% 
    dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "UPF3A_vs"))) %>% 
    dplyr::arrange(Res.UPF3B_vs_Control) %>%
    DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
                  caption ="Overlapping genes between UPF3B KD and UPF3A KD relative to controls in Quadrant 2 (Upregulated in both conditions)",
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))
  1. Quadrant 3 is defined as x < 0, y < 0 where genes in both groups show downregulation. 12 genes are in this quadrant:
limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control == -1 & Res.UPF3A_vs_Control == -1) %>%
    dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "UPF3A_vs"))) %>% 
    dplyr::arrange(Res.UPF3B_vs_Control) %>%
    DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
                  caption ="Overlapping genes between UPF3B KD and UPF3A KD relative to controls in Quadrant 3 (Downregulated in both conditions)",
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))
  1. Quadrant 4 is defined as x > 0, y < 0, where significant genes belonging to UPF3B KD group are upregulated and genes belonging to UPF3A KD group are downregulated. Only 2 genes belong in this group.
limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control == 1 & Res.UPF3A_vs_Control ==-1) %>% 
    dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "UPF3A_vs"))) %>% 
    dplyr::arrange(Res.UPF3B_vs_Control) %>%
     DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
                   caption ="Overlapping genes between UPF3B KD and UPF3A KD relative to controls in Quadrant 4 (Upregulated in UPF3B KD, downregulated in UPF3A KD)",
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))

We are also interested in seeing how the 128 overlapping genes between UPF3A and UPF3B KD behave in other conditions. To do this, we will use a heatmap visualisation again.

limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.UPF3A_vs_Control !=0) %>% 
    dplyr::select(ensembl_gene_id,gene, contains(c("Coef."))) %>%
 DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
               caption ="Overlapping genes between UPF3B KD and UPF3A KD (padj < 0.05; absolute log2FoldChange > log2(1.5)) in other comparisons)",
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))
rg <- limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.UPF3A_vs_Control !=0) %>%
    dplyr::select(contains("Coef"), gene)  %>%
    column_to_rownames("gene") %>% max(abs(.));

limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.UPF3A_vs_Control !=0) %>%
    dplyr::select(contains("Coef"), gene)  %>%
    column_to_rownames("gene") %>%
    pheatmap(,
       # scale="row", 
        cluster_cols = FALSE,
        breaks = seq(-rg, rg, length.out = 100),
        clustering_distance_rows = "euclidean", 
        border_color = "white", 
        treeheight_row = 0,
        cutree_rows = 2,
        fontsize = 6,
        color = colorRampPalette(c("#10c1e5", "#82e0b4","#F9F9F9", "#FBB829", "#FF0066"))(100),
        labels_row = row.names(.),
        cellheight = 10, 
        cellwidth = 12
    )

Version Author Date
5a8cba6 unawaz1996 2023-03-14

UPF3B vs Double KD

ggvenn(x[c(1,3)])

Version Author Date
5a8cba6 unawaz1996 2023-03-14

A total of 309 genes overlap betweeen the Double KDs and UPF3B KD. When we look at the correlation, we see that there is a strong positive correlation.

limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.DoubleKD_vs_Control !=0) %>% 
    ggscatter(., x= "Coef.UPF3B_vs_Control", y="Coef.DoubleKD_vs_Control", 
              cor.coef = TRUE, add = "reg.line", 
              conf.int = TRUE, add.params = list(color = "blue",
                                                 fill = "lightgray")) + 
    theme_bw() 

Version Author Date
5a8cba6 unawaz1996 2023-03-14
  1. Quadrant 1: Definied as x < 0, y > 0 where is significant genes belogning to Double KD group show upregulation and genes belonging to UPF3B KD group show downregulation. A total of 2 genes belong to this group:
limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control == -1 & Res.DoubleKD_vs_Control == 1) %>% 
    dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "Double"))) %>% 
    dplyr::arrange(Res.UPF3B_vs_Control) %>%
     DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))
  1. Quadrant 2: Defined as x > 0, y > 0 where significant genes from both conditions show upregulation. 246 genes belong to this group.
limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control == 1 & Res.DoubleKD_vs_Control ==1) %>% 
    dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "DoubleKD"))) %>% 
    dplyr::arrange(Res.UPF3B_vs_Control) %>%
   DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))
  1. Quadrant 3 is defined as x < 0, y < 0 where genes in both groups show downregulation. 61 genes are in this quadrant:
limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control == -1 & Res.DoubleKD_vs_Control == -1) %>%
    dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "DoubleKD"))) %>% 
    dplyr::arrange(Res.UPF3B_vs_Control) %>%
     DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))

UPF3B vs UPF3A OE, UPF3B KD

ggvenn(x[c(1,5)])

Version Author Date
5a8cba6 unawaz1996 2023-03-14
limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.UPF3A_OE_UPF3B_KD_vs_Control !=0) %>% 
    ggscatter(., x= "Coef.UPF3B_vs_Control", y="Coef.UPF3A_OE_UPF3B_KD_vs_Control", 
              cor.coef = TRUE, add = "reg.line", 
              conf.int = TRUE, add.params = list(color = "blue",
                                                 fill = "lightgray")) + 
    theme_bw() 

Version Author Date
5a8cba6 unawaz1996 2023-03-14

When we compare UPF3B KD with UPF3A OE in UPF3B KD cell lines, we can see there is a strong positive correlation. This correlaiton is very likely induced by the UPF3B KD cell line itself, and not the impact of the UPF3A OE. For interest, here are the gene in Quadrant 2 and Quadrant 3

  1. Quadrant 2: Defined as x > 0, y > 0 where significant genes from both conditions show upregulation. 170 genes belong to this group.
limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control == 1 & Res.UPF3A_OE_UPF3B_KD_vs_Control ==1) %>% 
    dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "DoubleKD"))) %>% 
    dplyr::arrange(Res.UPF3B_vs_Control) %>%
   DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))
  1. Quadrant 3 is defined as x < 0, y < 0 where genes in both groups show downregulation. 64 genes are in this quadrant:
limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control == -1 & Res.UPF3A_OE_UPF3B_KD_vs_Control== -1) %>% 
    dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "DoubleKD"))) %>% 
    dplyr::arrange(Res.UPF3B_vs_Control) %>%
     DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))

Following up from this, it would be good to see how the genes that were specific to the UPF3B KD (159 genes) and UPF3A OE, UPF3B KD (97) behave relative to the other condition.

limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control != 0 &  Res.UPF3A_OE_UPF3B_KD_vs_Control == 0 ) %>% 
    ggscatter(., x= "Coef.UPF3B_vs_Control", y="Coef.UPF3A_OE_UPF3B_KD_vs_Control", 
              cor.coef = TRUE, add = "reg.line", 
              conf.int = TRUE, add.params = list(color = "blue",
                                                 fill = "lightgray")) + 
    theme_bw() 

Version Author Date
5a8cba6 unawaz1996 2023-03-14
limma_results %>% 
    dplyr::filter(Res.UPF3B_vs_Control ==0 &  Res.UPF3A_OE_UPF3B_KD_vs_Control != 0 ) %>% 
    ggscatter(., x= "Coef.UPF3B_vs_Control", y="Coef.UPF3A_OE_UPF3B_KD_vs_Control", 
              cor.coef = TRUE, add = "reg.line", 
              conf.int = TRUE, add.params = list(color = "blue",
                                                 fill = "lightgray")) + 
    theme_bw() 

Version Author Date
5a8cba6 unawaz1996 2023-03-14

Overall the above plots show that there is a strong positive correlation.

UPF3A

DEColours <- c("-1" = "#503E5C", "1" = "#874257", "0" = "#D3D5E3")
volc = limma_results %>%
    ggplot(aes(x = Coef.UPF3A_vs_Control, 
               y = -log10(p.value.UPF3A_vs_Control), 
               colour = as.factor(as.character(Res.UPF3A_vs_Control)))) +
    geom_point(alpha = 0.8, size =0.8) +
    scale_colour_manual(values = DEColours) + theme_bw() + 
    theme(legend.position = "none") +
    geom_hline(yintercept = -log10(0.05), color = "grey60", size = 0.5, lty = "dashed") +
    geom_hline(yintercept = -log10(limma_results %>% 
                                       dplyr::filter(Res.UPF3A_vs_Control != 0) %>% 
                                       use_series(p.value.UPF3A_vs_Control) %>% max), 
               color = "black") +
    labs(x = "log2 Fold Change", y = "-log10 p-value") +
    ggtitle("UPF3B KD vs. Control") + xlim(-2.5,2.5) 


my_gg = volc + geom_point_interactive(aes(tooltip = gene, data_id = gene), 
    size = 1, hover_nearest = TRUE)
girafe(ggobj = my_gg)
DEGenes$UPF3A_vs_Control %>% 
    dplyr::select(ensembl_gene_id,gene,entrez, contains("UPF3A_vs")) %>% 
    mutate(genename = mapIds(org.Mm.eg.db, keys=ensembl_gene_id,  column="GENENAME",keytype="ENSEMBL", multiVals="first")) %>%     DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
                                                                                                                                                 caption = "Genes significantly expression in UPF3A KDs relative to Controls",
            options = list(scrollX = TRUE, 
                           scrollCollapse = TRUE, 
                           pageLength = 5, autoWidth = TRUE, 
                           dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))

UPF3A vs Double KDs

ggvenn(x[c(2,3)])

Version Author Date
5a8cba6 unawaz1996 2023-03-14
limma_results %>% 
    dplyr::filter(Res.UPF3A_vs_Control != 0 & Res.DoubleKD_vs_Control !=0) %>% 
    ggscatter(., x= "Coef.UPF3B_vs_Control", y="Coef.UPF3A_OE_UPF3B_KD_vs_Control", 
              cor.coef = TRUE, add = "reg.line", 
              conf.int = TRUE, add.params = list(color = "blue",
                                                 fill = "lightgray")) + 
    theme_bw() 

Version Author Date
5a8cba6 unawaz1996 2023-03-14

sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      stats4    tools     stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] VennDiagram_1.7.3           futile.logger_1.4.3        
 [3] UpSetR_1.4.0                fgsea_1.24.0               
 [5] GOplot_1.0.2                RColorBrewer_1.1-3         
 [7] gridExtra_2.3               ggdendro_0.1.23            
 [9] AnnotationHub_3.6.0         BiocFileCache_2.6.1        
[11] dbplyr_2.3.1                openxlsx_4.2.5.2           
[13] ggiraph_0.8.6               wasabi_1.0.1               
[15] sleuth_0.30.1               DT_0.27                    
[17] VennDetail_1.14.0           msigdb_1.6.0               
[19] GSEABase_1.60.0             graph_1.76.0               
[21] annotate_1.76.0             XML_3.99-0.13              
[23] pheatmap_1.0.12             ggvenn_0.1.9               
[25] MetBrewer_0.2.0             ggpubr_0.6.0               
[27] venn_1.11                   viridis_0.6.2              
[29] viridisLite_0.4.1           tximeta_1.16.1             
[31] tximport_1.26.1             goseq_1.50.0               
[33] geneLenDataBase_1.34.0      BiasedUrn_2.0.9            
[35] org.Mm.eg.db_3.16.0         EnsDb.Mmusculus.v79_2.99.0 
[37] ensembldb_2.22.0            AnnotationFilter_1.22.0    
[39] GenomicFeatures_1.50.4      AnnotationDbi_1.60.0       
[41] biomaRt_2.54.0              edgeR_3.40.2               
[43] limma_3.54.1                DESeq2_1.38.3              
[45] SummarizedExperiment_1.28.0 Biobase_2.58.0             
[47] MatrixGenerics_1.10.0       matrixStats_0.63.0         
[49] GenomicRanges_1.50.2        GenomeInfoDb_1.34.9        
[51] IRanges_2.32.0              S4Vectors_0.36.1           
[53] BiocGenerics_0.44.0         corrplot_0.92              
[55] lubridate_1.9.2             forcats_1.0.0              
[57] purrr_1.0.1                 readr_2.1.4                
[59] tidyverse_2.0.0             stringr_1.5.0              
[61] tidyr_1.3.0                 scales_1.2.1               
[63] data.table_1.14.8           readxl_1.4.2               
[65] tibble_3.1.8                magrittr_2.0.3             
[67] reshape2_1.4.4              ggplot2_3.4.1              
[69] dplyr_1.1.0.9000            workflowr_1.7.0            

loaded via a namespace (and not attached):
  [1] utf8_1.2.3                    tidyselect_1.2.0             
  [3] htmlwidgets_1.6.1             RSQLite_2.3.0                
  [5] BiocParallel_1.32.5           munsell_0.5.0                
  [7] codetools_0.2-19              withr_2.5.0                  
  [9] colorspace_2.1-0              filelock_1.0.2               
 [11] highr_0.10                    uuid_1.1-0                   
 [13] knitr_1.42                    rstudioapi_0.14              
 [15] ggsignif_0.6.4                labeling_0.4.2               
 [17] git2r_0.31.0                  GenomeInfoDbData_1.2.9       
 [19] farver_2.1.1                  bit64_4.0.5                  
 [21] rhdf5_2.42.0                  rprojroot_2.0.3              
 [23] vctrs_0.5.2.9000              generics_0.1.3               
 [25] lambda.r_1.2.4                xfun_0.37                    
 [27] timechange_0.2.0              R6_2.5.1                     
 [29] locfit_1.5-9.7                rhdf5filters_1.10.0          
 [31] bitops_1.0-7                  cachem_1.0.7                 
 [33] DelayedArray_0.24.0           vroom_1.6.1                  
 [35] promises_1.2.0.1              BiocIO_1.8.0                 
 [37] gtable_0.3.1                  processx_3.8.0               
 [39] rlang_1.0.6.9000              systemfonts_1.0.4            
 [41] splines_4.2.2                 rtracklayer_1.58.0           
 [43] rstatix_0.7.2                 lazyeval_0.2.2               
 [45] broom_1.0.3                   BiocManager_1.30.20          
 [47] yaml_2.3.7                    abind_1.4-5                  
 [49] crosstalk_1.2.0               backports_1.4.1              
 [51] httpuv_1.6.9                  ellipsis_0.3.2               
 [53] jquerylib_0.1.4               Rcpp_1.0.10                  
 [55] plyr_1.8.8                    progress_1.2.2               
 [57] zlibbioc_1.44.0               RCurl_1.98-1.10              
 [59] ps_1.7.2                      prettyunits_1.1.1            
 [61] cowplot_1.1.1                 fs_1.6.1                     
 [63] futile.options_1.0.1          whisker_0.4.1                
 [65] ProtGenerics_1.30.0           hms_1.1.2                    
 [67] mime_0.12                     evaluate_0.20                
 [69] xtable_1.8-4                  compiler_4.2.2               
 [71] crayon_1.5.2                  htmltools_0.5.4              
 [73] mgcv_1.8-41                   later_1.3.0                  
 [75] tzdb_0.3.0                    geneplotter_1.76.0           
 [77] DBI_1.1.3                     formatR_1.14                 
 [79] MASS_7.3-58.2                 rappdirs_0.3.3               
 [81] Matrix_1.5-3                  car_3.1-1                    
 [83] cli_3.6.0                     parallel_4.2.2               
 [85] pkgconfig_2.0.3               getPass_0.2-2                
 [87] GenomicAlignments_1.34.0      xml2_1.3.3                   
 [89] bslib_0.4.2                   admisc_0.30                  
 [91] XVector_0.38.0                callr_3.7.3                  
 [93] digest_0.6.31                 Biostrings_2.66.0            
 [95] fastmatch_1.1-3               rmarkdown_2.20               
 [97] cellranger_1.1.0              restfulr_0.0.15              
 [99] curl_5.0.0                    shiny_1.7.4                  
[101] Rsamtools_2.14.0              rjson_0.2.21                 
[103] lifecycle_1.0.3               nlme_3.1-162                 
[105] jsonlite_1.8.4                Rhdf5lib_1.20.0              
[107] carData_3.0-5                 fansi_1.0.4                  
[109] pillar_1.8.1                  lattice_0.20-45              
[111] KEGGREST_1.38.0               fastmap_1.1.1                
[113] httr_1.4.5                    GO.db_3.16.0                 
[115] interactiveDisplayBase_1.36.0 glue_1.6.2                   
[117] zip_2.2.2                     png_0.1-8                    
[119] BiocVersion_3.16.0            bit_4.0.5                    
[121] stringi_1.7.12                sass_0.4.5                   
[123] blob_1.2.3                    memoise_2.0.1