Last updated: 2023-03-14
Checks: 7 0
Knit directory: NMD-analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20230314) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 4856ece. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .Rproj.user/
Untracked files:
Untracked: NMD-DEG-limma.xlsx
Untracked: data/LTK_Sample Metafile_V3.txt
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/DEG-analysis.Rmd) and HTML
(docs/DEG-analysis.html) files. If you’ve configured a
remote Git repository (see ?wflow_git_remote), click on the
hyperlinks in the table below to view the files as they were in that
past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 4856ece | unawaz1996 | 2023-03-14 | Add DEG analysis |
source("code/plots.R")
source("code/functions.R")
source("code/libraries.R")
# Quality control of genes
salmon.files = ("/home/neuro/Documents/NMD_analysis/Analysis/Results/Quantification/Salmon")
md.files = "data/LTK_Sample Metafile_V3.txt"
# Loading data
txdf = transcripts(EnsDb.Mmusculus.v79, return.type="DataFrame")
tx2gene = as.data.frame(txdf[,c("tx_id","gene_id")])
salmon = list.files(salmon.files, pattern = "transcripts$", full.names = TRUE)
all_files = file.path(salmon, "quant.sf")
names(all_files) <- paste(c(212:229))
txi_genes <- tximport(all_files, type="salmon", txOut=FALSE,
countsFromAbundance="scaledTPM", tx2gene = tx2gene, ignoreTxVersion = TRUE, ignoreAfterBar = TRUE)
md = read.table(md.files, header= TRUE) %>% mutate(files = file.path(salmon, "quant.sf"))
md$Group = factor(md$Group)
group.labs <- c("Control", "UPF3A KD", "UPF3A OE", "UPF3B KD", "Double KD", "UPF3A OE, UPF3B KD")
names(group.labs) <- unique(md$Group)
# upf3a
txi_genes$abundance["ENSMUSG00000038398",] %>% melt() %>%
rownames_to_column("sample") %>% cbind(md) %>%
ggplot(aes(x=as.character(Sample), y= value, fill = Group)) +
geom_bar(stat="identity", width = 4, color = "black", alpha = 0.9) +
facet_grid(~Group, labeller = labeller(Group = group.labs) ) + ylab("tpm") + theme_bw() + xlab("") +
scale_fill_manual(values = c("#9D5D7C", "#8B9488", "#657FA2", "#8483AD", "#8D779E", "#3A4170"),
labels = c("UPF3A_KD_UPF3B_KD" = "Double KD", "UPF3A_KD" = "UPF3A KD",
"UPF3B_KD" = "UPF3B KD", "UPF3A_OE" = "UPF3A OE",
"UPF3A_OE_UPF3B_KD" = "UPF3A OE UPF3B KD")) +
theme(legend.position = "none", axis.text.x=element_blank(),
strip.background =element_rect(fill="#D3D5E3", color = "black"))

#upf3b
txi_genes$abundance["ENSMUSG00000036572",] %>% melt() %>%
rownames_to_column("sample") %>% cbind(md) %>%
ggplot(aes(x=as.character(Sample), y= value, fill = Group)) +
geom_bar(stat="identity", width = 4, color = "black", alpha = 0.9) +
facet_grid(~Group, labeller = labeller(Group = group.labs)) +
ylab("tpm") + theme_bw() + xlab("") +
scale_fill_manual(values = c("#9D5D7C", "#8B9488", "#657FA2", "#8483AD", "#8D779E", "#3A4170")) +
theme(legend.position = "none", axis.text.x=element_blank(),
strip.background =element_rect(fill="#D3D5E3", color = "black"))

keep = (rowSums(txi_genes$abundance >= .5 ) >= 3)
txi_genes$counts %>% cpm(log=TRUE) %>%
melt() %>%
dplyr::filter(is.finite(value)) %>%
ggplot(aes(x=value, color = as.character(Var2))) +
geom_density() +
ggtitle("Before filtering") +
labs(x = "logCPM", y = "Proportion of Genes") +
theme_bw() +
theme(legend.position = "none")

txi_genes$counts %>%
cpm(log=TRUE) %>%
magrittr::extract(keep,) %>%
melt() %>%
dplyr::filter(is.finite(value)) %>%
ggplot(aes(x=value, color = as.character(Var2))) +
geom_density() +
ggtitle("After filtering") +
labs(x = "logCPM", y = "Proportion of Genes") + theme_bw() +
theme(legend.position = "none")

Another level of filtering I’d like to do here is remove all genes that don’t have an annotation
txi_genes_filtered = txi_genes$counts[keep,] %>%
as.data.frame() %>%
rownames_to_column("geneID") %>%
mutate(geneName = mapIds(org.Mm.eg.db, keys=.$geneID,
column="SYMBOL",keytype="ENSEMBL",
multiVals="first")) %>%
drop_na(geneName) %>%
column_to_rownames("geneID") %>% dplyr::select(-geneName)
PC = log2(txi_genes$abundance[keep,] + 10^-6 ) %>%
t() %>%
prcomp(scale = TRUE)
PC$x[,1:2] %>%
as.data.frame() %>%
cbind(md) %>%
ggplot(aes(PC1, PC2, color = Group, shape = Condition_UPF3B, label = Condition_UPF3B)) +
geom_point(size = 11.5) +
theme_bw() + scale_shape_manual(values = c(16,15)) + labs(shape="UPF3B status", colour= "Samples") +
theme(axis.title.y = element_text(size = 12), axis.title.x = element_text(size = 12),
axis.text = element_text(size = 10)) +
xlab(paste0("Principal Component 1 (28.1%)")) + ylab("Principal Component 2 (16.6%)") +
scale_color_manual(values = c("#874257", "#8B9488", "#657FA2", "#8483AD", "#C2A8BE", "#3A4170"),
labels = c("UPF3A_KD_UPF3B_KD" = "Double KD", "UPF3A_KD" = "UPF3A KD",
"UPF3B_KD" = "UPF3B KD", "UPF3A_OE" = "UPF3A OE",
"UPF3A_OE_UPF3B_KD" = "UPF3A OE UPF3B KD"))

txi_genes$abundance[keep,] %>%
as.data.frame() %>%
gather(sample,value) %>%
ggplot(aes(x=sample, y = log(value), fill = sample)) + geom_boxplot() +
ylab("Log(TPM)") + xlab("Samples") +
theme_bw() + theme(legend.position = "none") + scale_fill_manual(values = hydrangea[1:18])

# Library size
countsSum <- colSums(txi_genes$counts) # library size
countsSum %>% melt() %>% rownames_to_column("Sample") %>%
ggplot(aes(x=Sample, y = value)) + geom_bar(stat = "identity") + ylab("Library size") +
theme_bw()

#max(countsSum) / min(countsSum)
y <- DGEList(txi_genes_filtered)
design <- model.matrix(~0+Group, data = md) %>%
set_colnames(gsub(pattern = "Group", replacement ="", x = colnames(.)))
contrasts <- makeContrasts(levels = colnames(design),
UPF3B_vs_Control = UPF3B_KD - Control,
UPF3A_vs_Control = UPF3A_KD - Control,
UPF3A_OE_vs_Control = UPF3A_OE - Control,
DoubleKD_vs_Control = UPF3A_KD_UPF3B_KD - Control,
UPF3A_OE_UPF3B_KD_vs_Control = UPF3A_OE_UPF3B_KD - Control)
y <- calcNormFactors(y)
v <- voom(y, design)
fit = lmFit(v, design) %>%
contrasts.fit(contrasts) %>%
eBayes()
results <- decideTests(fit, p.value = 0.05, adjust.method = "fdr", method= "global", lfc = log2(1.5))
limma_results = write_fit(fit, results= results, adjust = "fdr") %>%
rownames_to_column("ensembl_gene_id") %>%
mutate(gene = mapIds(org.Mm.eg.db, keys=ensembl_gene_id, column="SYMBOL",keytype="ENSEMBL", multiVals="first"),
entrez = mapIds(org.Mm.eg.db, keys=ensembl_gene_id, column="ENTREZID",keytype="ENSEMBL", multiVals="first")) %>% drop_na(entrez)
#save(limma_results, file="/home/neuro/Documents/NMD_analysis/Analysis/Results/DEGs/DEG-limma-results.Rda")
#save(results, fit,y,v,contrasts, design, contrasts, file = "/home/neuro/Documents/NMD_analysis/Analysis/Results/DEGs/limma-matrices.Rda")
DEGenes <- list()
DEGenes[["UPF3B_vs_Control"]] <- limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control != 0, abs(Coef.UPF3B_vs_Control) > log2(1.5))
DEGenes[["UPF3A_vs_Control"]] <- limma_results %>%
dplyr::filter(Res.UPF3A_vs_Control != 0, abs(Coef.UPF3A_vs_Control) > log2(1.5))
DEGenes[["UPF3A_OE_vs_Control"]] <- limma_results %>%
dplyr::filter(Res.UPF3A_OE_vs_Control != 0, abs(Coef.UPF3A_OE_vs_Control) > log2(1.5)) %>%
arrange(p.value.UPF3A_OE_vs_Control)
DEGenes[["DoubleKD_vs_Control"]] <- limma_results %>%
dplyr::filter(Res.DoubleKD_vs_Control != 0, abs(Coef.DoubleKD_vs_Control) > log2(1.5)) %>%
arrange(p.value.DoubleKD_vs_Control)
DEGenes[["UPF3A_OE_UPF3B_KD_vs_Control"]] <- limma_results %>%
dplyr::filter(Res.UPF3A_OE_UPF3B_KD_vs_Control != 0, abs(Coef.UPF3A_OE_UPF3B_KD_vs_Control) > log2(1.5)) %>%
arrange(p.value.UPF3A_OE_UPF3B_KD_vs_Control)
write.xlsx(DEGenes, "NMD-DEG-limma.xlsx")
#save(DEGenes, file="/home/neuro/Documents/NMD_analysis/Analysis/Results/DEGs/DEG-list.Rda")
summary(results)
UPF3B_vs_Control UPF3A_vs_Control UPF3A_OE_vs_Control
Down 95 58 24
NotSig 10564 10568 10926
Up 298 331 7
DoubleKD_vs_Control UPF3A_OE_UPF3B_KD_vs_Control
Down 227 125
NotSig 10016 10626
Up 714 206
summary(results) %>% melt() %>%
dplyr::filter(Var1 != "NotSig") %>%
ggplot(aes(x=reorder(Var2, -value), y = value, fill=Var1)) +
geom_bar(stat = "identity", colour = "white") +
theme_bw() +
theme(axis.text.x = element_text(angle = 70, hjust=1, face = "bold"),
legend.position = "bottom",
legend.title= element_blank(), axis.title.y = element_text(size = 12)) +
scale_fill_manual(values = c("#0B032D", "#E56B6F")) +
ylab("No. DEGs") + xlab("") + scale_x_discrete(labels=c("UPF3A_KD_UPF3B_KD_vs_Control" = "Double KD", "UPF3A_KD_vs_Control" = "UPF3A KD",
"UPF3B_KD_vs_Control" = "UPF3B KD", "UPF3A_OE_vs_Control" = "UPF3A OE",
"UPF3A_OE_UPF3B_KD_vs_Control" = "UPF3A OE UPF3B KD"))

After performing pairwise differential gene expression analysis, we can see that majority of all significantly expressed genes in all contrasts are upregulated. Overall, we find that a total of 395 genes are differentially expressed between UPF3B KD and controls. 388 genes are differentially expressed between UPF3A KD vs Controls.
## Venn diagram and table representing each venn diagram with gene name and its results
ven <- venndetail(list("UPF3B KD" = DEGenes$UPF3B_vs_Control$gene,
"UPF3A KD" = DEGenes$UPF3A_vs_Control$gene,
"Double KD" = DEGenes$DoubleKD_vs_Control$gene,
"UPF3A OE" = DEGenes$UPF3A_OE_vs_Control$gene,
"UPF3A OE, UPF3B KD" = DEGenes$UPF3A_OE_UPF3B_KD_vs_Control$gene))
venn(list("UPF3B KD" = DEGenes$UPF3B_vs_Control$ensembl_gene_id,
"UPF3A KD" = DEGenes$UPF3A_vs_Control$ensembl_gene_id,
"Double KD" = DEGenes$DoubleKD_vs_Control$ensembl_gene_id,
"UPF3A OE" = DEGenes$UPF3A_OE_vs_Control$ensembl_gene_id,
"UPF3A OE, UPF3B KD" = DEGenes$UPF3A_OE_UPF3B_KD_vs_Control$ensembl_gene_id))

#detail(ven)
#result(ven)
DEColours <- c("-1" = "#503E5C", "1" = "#874257", "0" = "#D3D5E3")
volc = limma_results %>%
ggplot(aes(x = Coef.UPF3B_vs_Control,
y = -log10(p.value.UPF3B_vs_Control),
colour = as.factor(as.character(Res.UPF3B_vs_Control)))) +
geom_point(alpha = 0.8, size =1) +
scale_colour_manual(values = DEColours) + theme_bw() +
theme(legend.position = "none") +
geom_hline(yintercept = -log10(0.05), color = "grey60", size = 0.5, lty = "dashed") +
geom_hline(yintercept = -log10(limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control != 0) %>%
use_series(p.value.UPF3B_vs_Control) %>% max),
color = "black") +
labs(x = "log2 Fold Change", y = "-log10 p-value") +
ggtitle("UPF3B KD vs. Control") + xlim(-2.5,2.5)
my_gg = volc + geom_point_interactive(aes(tooltip = gene, data_id = gene),
size = 1, hover_nearest = TRUE)
girafe(ggobj = my_gg)
DEGenes$UPF3B_vs_Control %>%
dplyr::select(ensembl_gene_id,gene,entrez, contains("UPF3B_vs")) %>%
dplyr::select(-contains("Res")) %>%
mutate(genename = mapIds(org.Mm.eg.db, keys=ensembl_gene_id, column="GENENAME",keytype="ENSEMBL", multiVals="first")) %>% DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'), caption = "Genes significantly expressed in UPF3B KD relative to Controls",
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
Given that previous studies have confirmed UPF3B KD results in a decrease of efficieny of NMD resulting in an increase of upregulated genes, we are interested in seeing how the signifiantly expressed upregulated genes behave in other conditions compared to controls.
In order to do this, we will create a heatmap visualisaiton.
rg <- limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control == 1) %>%
dplyr::select(contains("Coef"), gene) %>%
column_to_rownames("gene") %>% max(abs(.));
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control == 1) %>%
dplyr::select(contains("Coef"), gene) %>%
column_to_rownames("gene") %>%
pheatmap(,
scale="none",
cluster_cols = FALSE,
breaks = seq(-rg, rg, length.out = 100),
clustering_distance_rows = "euclidean",
border_color = "white",
treeheight_row = 0,
fontsize = 6,
color = colorRampPalette(c("#10c1e5", "#82e0b4","#F9F9F9", "#FBB829", "#FF0066"))(100),
labels_row = row.names(.),
cellwidth = 8
)

From the heatmap above, we can notice a few trends that upregulated genes in UPF3B KD have in other conditions.
Inverted expression in UPF3A OE genes
Inverted expression in UPF3A KD
These genes can be explored here:
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control == 1) %>%
dplyr::select(gene, contains("Coef")) %>%
# column_to_rownames("gene") %>%
DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
We are now also interested in seeing how all genes differentially expressed in UPF3B KD overlap with significant genes from other conditions.
#venn(list("UPF3B KD" = DEGenes$UPF3B_vs_Control$ensembl_gene_id,
# "UPF3A KD" = DEGenes$UPF3A_vs_Control$ensembl_gene_id))
x = list( "UPF3B KD" = DEGenes$UPF3B_vs_Control$ensembl_gene_id,
"UPF3A KD" = DEGenes$UPF3A_vs_Control$ensembl_gene_id,
"Double KD" = DEGenes$DoubleKD_vs_Control$ensembl_gene_id,
"UPF3A OE" = DEGenes$UPF3A_OE_vs_Control$ensembl_gene_id,
"UPF3A OE UPF3B KD" = DEGenes$UPF3A_OE_UPF3B_KD_vs_Control$ensembl_gene_id)
ggvenn(x[c(1,2)])

When we compare all significant genes in UPF3B KD with UPF3A KD, we can see that there is an overlap of 128 genes.
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.UPF3A_vs_Control !=0) %>%
ggscatter(., x= "Coef.UPF3B_vs_Control", y="Coef.UPF3A_vs_Control",
cor.coef = TRUE, add = "reg.line",
conf.int = TRUE, add.params = list(color = "blue",
fill = "lightgray")) +
theme_bw() + ylab("UPF3A KD (log2FoldChange)") + xlab("UPF3B KD (log2FoldChange)")

When we compare the direction of gene expression in these 128 genes, we can see that there is a positive correlation (r = 0.68).
We are now interested in seeing which genes are present in each of the quadrants:
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control == -1 & Res.UPF3A_vs_Control == 1) %>%
dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "UPF3A_vs"))) %>%
dplyr::arrange(Res.UPF3B_vs_Control) %>%
DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
caption ="Overlapping genes between UPF3B KD and UPF3A KD relative to controls in Quadrant 1 (Upregulated in UPF3A KD, downregulated in UPF3B KD)",
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control == 1 & Res.UPF3A_vs_Control ==1) %>%
dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "UPF3A_vs"))) %>%
dplyr::arrange(Res.UPF3B_vs_Control) %>%
DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
caption ="Overlapping genes between UPF3B KD and UPF3A KD relative to controls in Quadrant 2 (Upregulated in both conditions)",
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control == -1 & Res.UPF3A_vs_Control == -1) %>%
dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "UPF3A_vs"))) %>%
dplyr::arrange(Res.UPF3B_vs_Control) %>%
DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
caption ="Overlapping genes between UPF3B KD and UPF3A KD relative to controls in Quadrant 3 (Downregulated in both conditions)",
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control == 1 & Res.UPF3A_vs_Control ==-1) %>%
dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "UPF3A_vs"))) %>%
dplyr::arrange(Res.UPF3B_vs_Control) %>%
DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
caption ="Overlapping genes between UPF3B KD and UPF3A KD relative to controls in Quadrant 4 (Upregulated in UPF3B KD, downregulated in UPF3A KD)",
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
We are also interested in seeing how the 128 overlapping genes between UPF3A and UPF3B KD behave in other conditions. To do this, we will use a heatmap visualisation again.
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.UPF3A_vs_Control !=0) %>%
dplyr::select(ensembl_gene_id,gene, contains(c("Coef."))) %>%
DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
caption ="Overlapping genes between UPF3B KD and UPF3A KD (padj < 0.05; absolute log2FoldChange > log2(1.5)) in other comparisons)",
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
rg <- limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.UPF3A_vs_Control !=0) %>%
dplyr::select(contains("Coef"), gene) %>%
column_to_rownames("gene") %>% max(abs(.));
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.UPF3A_vs_Control !=0) %>%
dplyr::select(contains("Coef"), gene) %>%
column_to_rownames("gene") %>%
pheatmap(,
# scale="row",
cluster_cols = FALSE,
breaks = seq(-rg, rg, length.out = 100),
clustering_distance_rows = "euclidean",
border_color = "white",
treeheight_row = 0,
cutree_rows = 2,
fontsize = 6,
color = colorRampPalette(c("#10c1e5", "#82e0b4","#F9F9F9", "#FBB829", "#FF0066"))(100),
labels_row = row.names(.),
cellheight = 10,
cellwidth = 12
)

ggvenn(x[c(1,3)])

A total of 309 genes overlap betweeen the Double KDs and UPF3B KD. When we look at the correlation, we see that there is a strong positive correlation.
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.DoubleKD_vs_Control !=0) %>%
ggscatter(., x= "Coef.UPF3B_vs_Control", y="Coef.DoubleKD_vs_Control",
cor.coef = TRUE, add = "reg.line",
conf.int = TRUE, add.params = list(color = "blue",
fill = "lightgray")) +
theme_bw()

limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control == -1 & Res.DoubleKD_vs_Control == 1) %>%
dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "Double"))) %>%
dplyr::arrange(Res.UPF3B_vs_Control) %>%
DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control == 1 & Res.DoubleKD_vs_Control ==1) %>%
dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "DoubleKD"))) %>%
dplyr::arrange(Res.UPF3B_vs_Control) %>%
DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control == -1 & Res.DoubleKD_vs_Control == -1) %>%
dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "DoubleKD"))) %>%
dplyr::arrange(Res.UPF3B_vs_Control) %>%
DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
ggvenn(x[c(1,5)])

limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.UPF3A_OE_UPF3B_KD_vs_Control !=0) %>%
ggscatter(., x= "Coef.UPF3B_vs_Control", y="Coef.UPF3A_OE_UPF3B_KD_vs_Control",
cor.coef = TRUE, add = "reg.line",
conf.int = TRUE, add.params = list(color = "blue",
fill = "lightgray")) +
theme_bw()

When we compare UPF3B KD with UPF3A OE in UPF3B KD cell lines, we can see there is a strong positive correlation. This correlaiton is very likely induced by the UPF3B KD cell line itself, and not the impact of the UPF3A OE. For interest, here are the gene in Quadrant 2 and Quadrant 3
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control == 1 & Res.UPF3A_OE_UPF3B_KD_vs_Control ==1) %>%
dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "DoubleKD"))) %>%
dplyr::arrange(Res.UPF3B_vs_Control) %>%
DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control == -1 & Res.UPF3A_OE_UPF3B_KD_vs_Control== -1) %>%
dplyr::select(ensembl_gene_id,gene,entrez, contains(c("UPF3B_vs", "DoubleKD"))) %>%
dplyr::arrange(Res.UPF3B_vs_Control) %>%
DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
Following up from this, it would be good to see how the genes that were specific to the UPF3B KD (159 genes) and UPF3A OE, UPF3B KD (97) behave relative to the other condition.
limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control != 0 & Res.UPF3A_OE_UPF3B_KD_vs_Control == 0 ) %>%
ggscatter(., x= "Coef.UPF3B_vs_Control", y="Coef.UPF3A_OE_UPF3B_KD_vs_Control",
cor.coef = TRUE, add = "reg.line",
conf.int = TRUE, add.params = list(color = "blue",
fill = "lightgray")) +
theme_bw()

limma_results %>%
dplyr::filter(Res.UPF3B_vs_Control ==0 & Res.UPF3A_OE_UPF3B_KD_vs_Control != 0 ) %>%
ggscatter(., x= "Coef.UPF3B_vs_Control", y="Coef.UPF3A_OE_UPF3B_KD_vs_Control",
cor.coef = TRUE, add = "reg.line",
conf.int = TRUE, add.params = list(color = "blue",
fill = "lightgray")) +
theme_bw()

Overall the above plots show that there is a strong positive correlation.
DEColours <- c("-1" = "#503E5C", "1" = "#874257", "0" = "#D3D5E3")
volc = limma_results %>%
ggplot(aes(x = Coef.UPF3A_vs_Control,
y = -log10(p.value.UPF3A_vs_Control),
colour = as.factor(as.character(Res.UPF3A_vs_Control)))) +
geom_point(alpha = 0.8, size =0.8) +
scale_colour_manual(values = DEColours) + theme_bw() +
theme(legend.position = "none") +
geom_hline(yintercept = -log10(0.05), color = "grey60", size = 0.5, lty = "dashed") +
geom_hline(yintercept = -log10(limma_results %>%
dplyr::filter(Res.UPF3A_vs_Control != 0) %>%
use_series(p.value.UPF3A_vs_Control) %>% max),
color = "black") +
labs(x = "log2 Fold Change", y = "-log10 p-value") +
ggtitle("UPF3B KD vs. Control") + xlim(-2.5,2.5)
my_gg = volc + geom_point_interactive(aes(tooltip = gene, data_id = gene),
size = 1, hover_nearest = TRUE)
girafe(ggobj = my_gg)
DEGenes$UPF3A_vs_Control %>%
dplyr::select(ensembl_gene_id,gene,entrez, contains("UPF3A_vs")) %>%
mutate(genename = mapIds(org.Mm.eg.db, keys=ensembl_gene_id, column="GENENAME",keytype="ENSEMBL", multiVals="first")) %>% DT::datatable(filter = 'top', extensions = c('Buttons', 'FixedColumns'),
caption = "Genes significantly expression in UPF3A KDs relative to Controls",
options = list(scrollX = TRUE,
scrollCollapse = TRUE,
pageLength = 5, autoWidth = TRUE,
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'colvis'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
ggvenn(x[c(2,3)])

limma_results %>%
dplyr::filter(Res.UPF3A_vs_Control != 0 & Res.DoubleKD_vs_Control !=0) %>%
ggscatter(., x= "Coef.UPF3B_vs_Control", y="Coef.UPF3A_OE_UPF3B_KD_vs_Control",
cor.coef = TRUE, add = "reg.line",
conf.int = TRUE, add.params = list(color = "blue",
fill = "lightgray")) +
theme_bw()

sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats4 tools stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] VennDiagram_1.7.3 futile.logger_1.4.3
[3] UpSetR_1.4.0 fgsea_1.24.0
[5] GOplot_1.0.2 RColorBrewer_1.1-3
[7] gridExtra_2.3 ggdendro_0.1.23
[9] AnnotationHub_3.6.0 BiocFileCache_2.6.1
[11] dbplyr_2.3.1 openxlsx_4.2.5.2
[13] ggiraph_0.8.6 wasabi_1.0.1
[15] sleuth_0.30.1 DT_0.27
[17] VennDetail_1.14.0 msigdb_1.6.0
[19] GSEABase_1.60.0 graph_1.76.0
[21] annotate_1.76.0 XML_3.99-0.13
[23] pheatmap_1.0.12 ggvenn_0.1.9
[25] MetBrewer_0.2.0 ggpubr_0.6.0
[27] venn_1.11 viridis_0.6.2
[29] viridisLite_0.4.1 tximeta_1.16.1
[31] tximport_1.26.1 goseq_1.50.0
[33] geneLenDataBase_1.34.0 BiasedUrn_2.0.9
[35] org.Mm.eg.db_3.16.0 EnsDb.Mmusculus.v79_2.99.0
[37] ensembldb_2.22.0 AnnotationFilter_1.22.0
[39] GenomicFeatures_1.50.4 AnnotationDbi_1.60.0
[41] biomaRt_2.54.0 edgeR_3.40.2
[43] limma_3.54.1 DESeq2_1.38.3
[45] SummarizedExperiment_1.28.0 Biobase_2.58.0
[47] MatrixGenerics_1.10.0 matrixStats_0.63.0
[49] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[51] IRanges_2.32.0 S4Vectors_0.36.1
[53] BiocGenerics_0.44.0 corrplot_0.92
[55] lubridate_1.9.2 forcats_1.0.0
[57] purrr_1.0.1 readr_2.1.4
[59] tidyverse_2.0.0 stringr_1.5.0
[61] tidyr_1.3.0 scales_1.2.1
[63] data.table_1.14.8 readxl_1.4.2
[65] tibble_3.1.8 magrittr_2.0.3
[67] reshape2_1.4.4 ggplot2_3.4.1
[69] dplyr_1.1.0.9000 workflowr_1.7.0
loaded via a namespace (and not attached):
[1] utf8_1.2.3 tidyselect_1.2.0
[3] htmlwidgets_1.6.1 RSQLite_2.3.0
[5] BiocParallel_1.32.5 munsell_0.5.0
[7] codetools_0.2-19 withr_2.5.0
[9] colorspace_2.1-0 filelock_1.0.2
[11] highr_0.10 uuid_1.1-0
[13] knitr_1.42 rstudioapi_0.14
[15] ggsignif_0.6.4 labeling_0.4.2
[17] git2r_0.31.0 GenomeInfoDbData_1.2.9
[19] farver_2.1.1 bit64_4.0.5
[21] rhdf5_2.42.0 rprojroot_2.0.3
[23] vctrs_0.5.2.9000 generics_0.1.3
[25] lambda.r_1.2.4 xfun_0.37
[27] timechange_0.2.0 R6_2.5.1
[29] locfit_1.5-9.7 rhdf5filters_1.10.0
[31] bitops_1.0-7 cachem_1.0.7
[33] DelayedArray_0.24.0 vroom_1.6.1
[35] promises_1.2.0.1 BiocIO_1.8.0
[37] gtable_0.3.1 processx_3.8.0
[39] rlang_1.0.6.9000 systemfonts_1.0.4
[41] splines_4.2.2 rtracklayer_1.58.0
[43] rstatix_0.7.2 lazyeval_0.2.2
[45] broom_1.0.3 BiocManager_1.30.20
[47] yaml_2.3.7 abind_1.4-5
[49] crosstalk_1.2.0 backports_1.4.1
[51] httpuv_1.6.9 ellipsis_0.3.2
[53] jquerylib_0.1.4 Rcpp_1.0.10
[55] plyr_1.8.8 progress_1.2.2
[57] zlibbioc_1.44.0 RCurl_1.98-1.10
[59] ps_1.7.2 prettyunits_1.1.1
[61] cowplot_1.1.1 fs_1.6.1
[63] futile.options_1.0.1 whisker_0.4.1
[65] ProtGenerics_1.30.0 hms_1.1.2
[67] mime_0.12 evaluate_0.20
[69] xtable_1.8-4 compiler_4.2.2
[71] crayon_1.5.2 htmltools_0.5.4
[73] mgcv_1.8-41 later_1.3.0
[75] tzdb_0.3.0 geneplotter_1.76.0
[77] DBI_1.1.3 formatR_1.14
[79] MASS_7.3-58.2 rappdirs_0.3.3
[81] Matrix_1.5-3 car_3.1-1
[83] cli_3.6.0 parallel_4.2.2
[85] pkgconfig_2.0.3 getPass_0.2-2
[87] GenomicAlignments_1.34.0 xml2_1.3.3
[89] bslib_0.4.2 admisc_0.30
[91] XVector_0.38.0 callr_3.7.3
[93] digest_0.6.31 Biostrings_2.66.0
[95] fastmatch_1.1-3 rmarkdown_2.20
[97] cellranger_1.1.0 restfulr_0.0.15
[99] curl_5.0.0 shiny_1.7.4
[101] Rsamtools_2.14.0 rjson_0.2.21
[103] lifecycle_1.0.3 nlme_3.1-162
[105] jsonlite_1.8.4 Rhdf5lib_1.20.0
[107] carData_3.0-5 fansi_1.0.4
[109] pillar_1.8.1 lattice_0.20-45
[111] KEGGREST_1.38.0 fastmap_1.1.1
[113] httr_1.4.5 GO.db_3.16.0
[115] interactiveDisplayBase_1.36.0 glue_1.6.2
[117] zip_2.2.2 png_0.1-8
[119] BiocVersion_3.16.0 bit_4.0.5
[121] stringi_1.7.12 sass_0.4.5
[123] blob_1.2.3 memoise_2.0.1