Last updated: 2025-03-03
Checks: 5 2
Knit directory:
locust-comparative-genomics/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20221025) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
| absolute | relative |
|---|---|
| /Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data | data |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version f8bda68. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: analysis/.DS_Store
Ignored: analysis/.Rhistory
Ignored: analysis/figure/
Ignored: data/.DS_Store
Ignored: data/DEG_results/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/americana/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/cancellata/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/cubense/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/davidO/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/gregaria/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/nitens/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/piceifrons/.DS_Store
Ignored: data/DEG_results/RNAi/.DS_Store
Ignored: data/DEG_results/RNAi/All/.DS_Store
Ignored: data/DEG_results/RNAi/All_control/.DS_Store
Ignored: data/DEG_results/RNAi/All_no_rRNA/.DS_Store
Ignored: data/DEG_results/RNAi/Head/.DS_Store
Ignored: data/DEG_results/RNAi/Head_control/.DS_Store
Ignored: data/DEG_results/RNAi/Head_no_rRNA/.DS_Store
Ignored: data/DEG_results/RNAi/Thorax/.DS_Store
Ignored: data/DEG_results/RNAi/Thorax_no_rRNA/.DS_Store
Ignored: data/DEG_results/gregaria/
Ignored: data/DEG_results/single_cell/.DS_Store
Ignored: data/WGCNA/.DS_Store
Ignored: data/WGCNA/input/.DS_Store
Ignored: data/WGCNA/input/Bulk_RNAseq/.DS_Store
Ignored: data/WGCNA/output/
Ignored: data/behavioral_data/.DS_Store
Ignored: data/behavioral_data/Raw_data/.DS_Store
Ignored: data/custom_sgregaria_orgdb/.DS_Store
Ignored: data/list/.DS_Store
Ignored: data/list/Bulk_RNAseq/.DS_Store
Ignored: data/list/GO_Annotations/.DS_Store
Ignored: data/orthofinder/.DS_Store
Ignored: data/orthofinder/Polyneoptera/.DS_Store
Ignored: data/orthofinder/Polyneoptera/Results_I2/.DS_Store
Ignored: data/orthofinder/Polyneoptera/Results_I2/Orthogroups/.DS_Store
Ignored: data/orthofinder/Polyneoptera/Results_I5/.DS_Store
Ignored: data/orthofinder/Polyneoptera/Results_I5/Orthogroups/.DS_Store
Ignored: data/orthofinder/Schistocerca/.DS_Store
Ignored: data/orthofinder/Schistocerca/Results_I2/.DS_Store
Ignored: data/orthofinder/Schistocerca/Results_I2/Orthogroups/.DS_Store
Ignored: data/orthofinder/Schistocerca/Results_I5/.DS_Store
Ignored: data/orthofinder/Schistocerca/Results_I5/Orthogroups/.DS_Store
Ignored: data/overlap/.DS_Store
Ignored: data/overlap/Bulk_RNAseq/.DS_Store
Ignored: data/overlap/Bulk_RNAseq/cancellata/
Ignored: data/readcounts/.DS_Store
Ignored: data/readcounts/Bulk_RNAseq/.DS_Store
Ignored: data/readcounts/RNAi/.DS_Store
Untracked files:
Untracked: data/DEG_results/Bulk_RNAseq/GO10_enrichment_Thorax_gregaria_custom.csv
Untracked: data/DEG_results/Bulk_RNAseq/GO10_enrichment_Thorax_piceifrons_custom.csv
Untracked: data/DEG_results/RNAi/All_GFP/
Untracked: data/DEG_results/RNAi/All_GFP_no_rRNA/
Untracked: data/DEG_results/RNAi/Head_GFP/
Untracked: data/DEG_results/RNAi/Head_GFP_no_rRNA/
Untracked: data/DEG_results/RNAi/Thorax_GFP/
Untracked: data/DEG_results/RNAi/Thorax_GFP_no_rRNA/
Untracked: data/RefSeq/
Untracked: data/list/RNAi/All_RNAi_GFPCRDsample_list.csv
Untracked: data/list/RNAi/Head_RNAi_GFPCRDsample_list.csv
Untracked: data/list/RNAi/Thorax_RNAi_GFPCRDsample_list.csv
Unstaged changes:
Modified: analysis/3_compiling_tables.Rmd
Modified: analysis/3_go-enrichment.Rmd
Modified: analysis/3_overlap-venn.Rmd
Modified: analysis/4_RNAi_degs.Rmd
Modified: analysis/_site.yml
Modified: data/DEG_results/RNAi/All_control/GFP_vs_CONTROL/DEG_sigresults_GFP_vs_CONTROL.csv
Modified: data/DEG_results/RNAi/All_control_no_rRNA/GFP_vs_CONTROL/DEG_sigresults_GFP_vs_CONTROL.csv
Modified: data/DEG_results/RNAi/Head/UNCH_vs_GFP/volcano_plot_UNCH_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Head_control/GFP_vs_CONTROL/DEG_sigresults_GFP_vs_CONTROL.csv
Modified: data/DEG_results/RNAi/Head_control_no_rRNA/GFP_vs_CONTROL/DEG_sigresults_GFP_vs_CONTROL.csv
Modified: data/DEG_results/RNAi/Head_no_rRNA/Hex1_vs_GFP/volcano_plot_Hex1_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Head_no_rRNA/Hex2_vs_GFP/volcano_plot_Hex2_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Head_no_rRNA/JHMT_vs_GFP/volcano_plot_JHMT_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Head_no_rRNA/MIOX_vs_GFP/volcano_plot_MIOX_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Head_no_rRNA/PCA_vst_Gene_labelled.png
Modified: data/DEG_results/RNAi/Head_no_rRNA/UNCH_vs_GFP/volcano_plot_UNCH_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Head_no_rRNA/sva_stripchart_SV1.png
Modified: data/DEG_results/RNAi/Head_no_rRNA/sva_stripchart_SV2.png
Modified: data/DEG_results/RNAi/Head_no_rRNA/sva_stripchart_SV3.png
Modified: data/DEG_results/RNAi/Thorax_control/GFP_vs_CONTROL/DEG_sigresults_GFP_vs_CONTROL.csv
Modified: data/DEG_results/RNAi/Thorax_control_no_rRNA/GFP_vs_CONTROL/DEG_sigresults_GFP_vs_CONTROL.csv
Deleted: data/list/RNAi/Head_RNAi_noninjectedsample_list2.csv
Modified: data/orthofinder/Schistocerca/Results_I2/Orthogroups_genesproteinbiotype_Schistocerca_Jan2025.csv
Modified: data/overlap/summaryPolyneoptera_DEGs_Orthogroups_Feb2025.csv
Modified: data/overlap/summaryPolyneoptera_DEGs_Orthogroups_togregaria_Feb2025.csv
Modified: data/overlap/summarySchistocerca_DEGs_Orthogroups_Feb2025.csv
Modified: data/overlap/summarySchistocerca_DEGs_Orthogroups_togregaria_Feb2025.csv
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/3_go-enrichment.Rmd) and
HTML (docs/3_go-enrichment.html) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote),
click on the hyperlinks in the table below to view the files as they
were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 89984c0 | Maeva TECHER | 2025-02-19 | Add overlap update |
| Rmd | 3746422 | Maeva TECHER | 2025-02-12 | Add RNAi |
| html | 3746422 | Maeva TECHER | 2025-02-12 | Add RNAi |
| Rmd | 1fddc47 | Maeva TECHER | 2025-02-03 | Go enrichment |
| html | 1fddc47 | Maeva TECHER | 2025-02-03 | Go enrichment |
| Rmd | faf2db3 | Maeva TECHER | 2025-01-13 | update markdown |
| Rmd | 616f6d6 | Maeva TECHER | 2025-01-07 | remove old files |
| html | 616f6d6 | Maeva TECHER | 2025-01-07 | remove old files |
| Rmd | 0f0ac1f | Maeva TECHER | 2024-11-19 | update deseq2 |
| html | 0f0ac1f | Maeva TECHER | 2024-11-19 | update deseq2 |
| Rmd | fe6dae9 | Maeva TECHER | 2024-11-19 | changes ESA |
| html | fe6dae9 | Maeva TECHER | 2024-11-19 | changes ESA |
| Rmd | 3fa8e62 | Maeva TECHER | 2024-11-09 | updated analysis |
| html | 3fa8e62 | Maeva TECHER | 2024-11-09 | updated analysis |
| Rmd | edb70fe | Maeva TECHER | 2024-11-08 | overlap and deg results created |
| html | edb70fe | Maeva TECHER | 2024-11-08 | overlap and deg results created |
| html | ba35b82 | Maeva A. TECHER | 2024-06-20 | Build site. |
| html | d605bd3 | Maeva A. TECHER | 2024-05-16 | Build site. |
| Rmd | 9f04a80 | Maeva A. TECHER | 2024-05-16 | wflow_publish("analysis/3_go-enrichment.Rmd") |
| html | d7b2c58 | Maeva A. TECHER | 2024-05-16 | Build site. |
| Rmd | f5a78da | Maeva A. TECHER | 2024-05-16 | wflow_publish("analysis/3_go-enrichment.Rmd") |
| html | a32a56d | Maeva A. TECHER | 2024-05-15 | Build site. |
| Rmd | ebc0f04 | Maeva A. TECHER | 2024-05-15 | wflow_publish("analysis/3_go-enrichment.Rmd") |
Once we have shortlisted some genes of interest—whether they are obtained from top differentially expressed genes (DEGs), weighted gene co-expression network analysis (WGCNA) modules, or other comparative genomics analyses (e.g., signatures of selection, gene family expansion)—we want to determine if certain functions are enriched in our subset. For example, we hypothesize that although locusts have evolved similar traits, they may have diverged in their strategies to respond to the environment. Therefore, we expect to see DEGs involved in divergent biological processes, molecular function, and cellular components between S. gregaria, S. piceifrons and S. cancellata.
To test that, we need to look for Gene Ontology (GO) terms that can provide us a bird’s-eye of the related functions associated with our genes of interests.
To create the GO association file with each of our genome, we are
using the paid version of OmicsBox with the integrated
workflow Blast2Go. We details below our step-by-step with
one Schistocerca genome, but followed the same process for all
six RefSeq.


We choose the More Sensitive mode of blastx from the
Diamond Blast mode which allows to align large lists of nucelotide or
protein sequences against up-to-date public sequence collections.
Diamond Blast has a very similar accuracy compared to the NCBI Blast
with a much higher throughput. All our association files were run
against the Database (NR (2024-07-11)).

library(topGO)
library(dplyr)
library(ggplot2)
library(tidyr)
library(tibble)
# Define working directory and species
workDir <- "/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data"
species <- "gregaria"
# Step 1: Load DESeq2 results for the species
deg_file <- file.path(workDir, "DEG_results/Bulk_RNAseq/", paste0(species, "/Head/DESeq2_sigresults_Head_", species ,".csv"))
deg_data <- read.csv(deg_file, stringsAsFactors = FALSE)
names(deg_data)[names(deg_data) == "X"] <- "GeneID"
# Separate DEGs into upregulated and downregulated
upregulated_genes <- subset(deg_data, padj < 0.05 & log2FoldChange > 1)$GeneID
downregulated_genes <- subset(deg_data, padj < 0.05 & log2FoldChange < -1)$GeneID
# Load the custom annotation file
custom_annot_file <- file.path(workDir, "list/GO_Annotations", paste0("blast2go_", species, "_custom.txt"))
custom_annot_df <- read.table(custom_annot_file, sep = "\t", header = TRUE, quote = "", fill = TRUE, stringsAsFactors = FALSE)
# Prepare gene-to-GO mapping for topGO
colnames(custom_annot_df) <- c("GeneID", "Description", "GO_Extended")
#custom_annot_df <- custom_annot_df %>%
# separate(GO_Extended, into = c("Category", "GO_ID", "GO_Term"), sep = " ", extra = "merge") %>%
# mutate(Category = substr(Category, 1, 1))
library(data.table)
# Convert to data.table (if not already)
setDT(custom_annot_df)
# Split `GO_Extended` column efficiently
custom_annot_df[, c("Category", "GO_ID", "GO_Term") := tstrsplit(GO_Extended, " ", fixed = TRUE, keep = 1:3)]
# Extract first letter of `Category`
custom_annot_df[, Category := substr(Category, 1, 1)]
gene2GO <- custom_annot_df %>%
group_by(GeneID) %>%
summarize(GOterms = list(unique(GO_ID))) %>%
deframe()
# Function to run topGO analysis by ontology
run_topGO <- function(ontology, gene_set, gene2GO) {
all_genes <- factor(as.integer(names(gene2GO) %in% gene_set), levels = c(0, 1))
names(all_genes) <- names(gene2GO)
GOdata <- new("topGOdata", ontology = ontology, allGenes = all_genes, annot = annFUN.gene2GO, gene2GO = gene2GO)
resultFisher <- runTest(GOdata, algorithm = "classic", statistic = "fisher")
GenTable(GOdata, classicFisher = resultFisher, orderBy = "classicFisher", topNodes = 10)
}
# Run topGO for each ontology category and regulation type
allRes_up_BP <- run_topGO("BP", upregulated_genes, gene2GO) %>% mutate(Regulation = "Upregulated", ontology = "BP")
allRes_up_MF <- run_topGO("MF", upregulated_genes, gene2GO) %>% mutate(Regulation = "Upregulated", ontology = "MF")
allRes_up_CC <- run_topGO("CC", upregulated_genes, gene2GO) %>% mutate(Regulation = "Upregulated", ontology = "CC")
allRes_down_BP <- run_topGO("BP", downregulated_genes, gene2GO) %>% mutate(Regulation = "Downregulated", ontology = "BP")
allRes_down_MF <- run_topGO("MF", downregulated_genes, gene2GO) %>% mutate(Regulation = "Downregulated", ontology = "MF")
allRes_down_CC <- run_topGO("CC", downregulated_genes, gene2GO) %>% mutate(Regulation = "Downregulated", ontology = "CC")
# Combine all results with ontology labels
allRes <- bind_rows(
allRes_up_BP, allRes_up_MF, allRes_up_CC,
allRes_down_BP, allRes_down_MF, allRes_down_CC
)
# Check if ontology is retained
head(allRes)
GO.ID Term Annotated Significant
1 GO:0019310 inositol catabolic process 6 4
2 GO:0046164 alcohol catabolic process 7 4
3 GO:0046174 polyol catabolic process 7 4
4 GO:1901616 organic hydroxy compound catabolic proce... 7 4
5 GO:0006020 inositol metabolic process 8 4
6 GO:0055085 transmembrane transport 649 26
Expected classicFisher Regulation ontology
1 0.11 1.3e-06 Upregulated BP
2 0.12 3.0e-06 Upregulated BP
3 0.12 3.0e-06 Upregulated BP
4 0.12 3.0e-06 Upregulated BP
5 0.14 5.9e-06 Upregulated BP
6 11.37 2.9e-05 Upregulated BP
# Visualization with ggplot2
allRes$classicFisher <- as.numeric(as.character(allRes$classicFisher))
allRes$FoldEnrichment <- allRes$Significant / allRes$Expected
# Plot with ggplot2 using facet_wrap by ontology
ggplot(allRes, aes(x = reorder(Term, -log10(classicFisher)), y = -log10(classicFisher), size = Significant, color = FoldEnrichment)) +
geom_point() +
facet_wrap(~ ontology, scales = "free_y") +
coord_flip() +
labs(
x = "GO Term",
y = "-log10(p-value)",
size = "Gene Count",
color = "Fold Enrichment",
title = "Top 10 Enriched GO Terms by Ontology",
subtitle = "Head transcriptomes S. gregaria"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
plot.subtitle = element_text(size = 12, face = "bold.italic"),
axis.text.y = element_text(size = 8, face = "bold.italic", color = "black")
) +
scale_size_continuous(range = c(3, 10)) +
scale_color_viridis_c(option = "D")

# Plotting code up and downregulated
ggplot(allRes, aes(x = reorder(Term, -log10(classicFisher)), y = -log10(classicFisher), size = Significant, color = Regulation)) +
geom_point() +
facet_wrap(~ ontology, scales = "free_y") +
coord_flip() +
labs(
x = "GO Term",
y = "-log10(p-value)",
size = "Gene Count",
color = "Regulation",
title = "Top 10 Enriched GO Terms by Ontology",
subtitle = "Head transcriptomes S. gregaria"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
plot.subtitle = element_text(size = 12, face = "italic"),
axis.text.y = element_text(size = 8, face = "bold", color = "black")
) +
scale_size_continuous(range = c(3, 10)) +
scale_color_manual(values = c("Upregulated" = "red", "Downregulated" = "blue"))

# Bar Plot for top GO terms per ontology
ggplot(allRes, aes(x = FoldEnrichment, y = reorder(Term, FoldEnrichment), color = -log10(classicFisher), size = Significant)) +
geom_point() +
facet_wrap(~ ontology, scales = "free_y") +
labs(
x = "Fold Enrichment",
y = "GO Term",
color = "-log10(p-value)",
size = "Gene Count",
title = "GO Term Enrichment: Fold Enrichment vs p-value"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
plot.subtitle = element_text(size = 12, face = "bold.italic"),
axis.text.y = element_text(size = 8, face = "bold", color = "black")
) +
scale_fill_viridis_c(option = "C")

library(pheatmap)
# Aggregate data to ensure unique combinations of Term and ontology
# Keep the row with the smallest classicFisher value for each Term and ontology pair
heatmap_data <- allRes %>%
dplyr::group_by(Term, ontology) %>%
dplyr::slice_min(order_by = classicFisher, n = 1) %>%
dplyr::ungroup() %>%
dplyr::select(Term, ontology, classicFisher, Regulation) %>%
tidyr::spread(ontology, classicFisher) %>%
tibble::column_to_rownames("Term")
# Verify heatmap data
str(heatmap_data)
'data.frame': 56 obs. of 4 variables:
$ Regulation: chr "Upregulated" "Downregulated" "Downregulated" "Downregulated" ...
$ BP : num 3.0e-06 3.5e-04 1.9e-04 4.1e-04 NA 1.9e-04 4.1e-04 NA NA NA ...
$ CC : num NA NA NA NA NA NA NA NA 0.167 0.00242 ...
$ MF : num NA NA NA NA 3.1e-04 NA NA 1.9e-06 NA NA ...
# Ensure Regulation exists
if (!"Regulation" %in% colnames(heatmap_data)) {
stop("Error: 'Regulation' column is missing from heatmap_data!")
}
# Create annotation data frame
annotation <- data.frame(Regulation = heatmap_data$Regulation)
rownames(annotation) <- rownames(heatmap_data)
# Ensure no NA values
annotation <- na.omit(annotation)
# Define annotation colors
annotation_colors <- list(
Regulation = c("Upregulated" = "red", "Downregulated" = "blue")
)
# Ensure annotation_colors matches annotation values
if (!all(unique(annotation$Regulation) %in% names(annotation_colors$Regulation))) {
stop("Error: annotation_colors does not match all values in annotation$Regulation")
}
# Create heatmap matrix
heatmap_matrix <- as.matrix(-log10(heatmap_data %>%
dplyr::select(-Regulation)))
# Replace NA values
heatmap_matrix[is.na(heatmap_matrix)] <- -log10(1)
# Plot heatmap
pheatmap(
heatmap_matrix,
cluster_rows = TRUE,
cluster_cols = FALSE,
color = colorRampPalette(c("white", "darkblue"))(50),
annotation_row = annotation,
main = "GO Term Enrichment Heatmap by Ontology",
annotation_colors = annotation_colors
)

# Define working directory and species list
workDir <- "/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data"
species_list <- c("gregaria", "piceifrons", "cancellata", "americana", "cubense", "nitens")
# Function to run the GO analysis for a given species
run_GO_analysis_for_species <- function(species) {
# Load DESeq2 results for the species
deg_file <- file.path(workDir, "DEG_results/Bulk_RNAseq/", paste0(species, "/Head/DESeq2_results_Head_", species ,".csv"))
deg_data <- read.csv(deg_file, stringsAsFactors = FALSE)
names(deg_data)[names(deg_data) == "X"] <- "GeneID"
# Separate DEGs into upregulated and downregulated
upregulated_genes <- subset(deg_data, padj < 0.05 & log2FoldChange > 1)$GeneID
downregulated_genes <- subset(deg_data, padj < 0.05 & log2FoldChange < -1)$GeneID
# Load the custom annotation file
custom_annot_file <- file.path(workDir, "list/GO_Annotations", paste0("blast2go_", species, "_custom.txt"))
custom_annot_df <- read.table(custom_annot_file, sep = "\t", header = TRUE, quote = "", fill = TRUE, stringsAsFactors = FALSE)
# Prepare gene-to-GO mapping for topGO
colnames(custom_annot_df) <- c("GeneID", "Description", "GO_Extended")
library(data.table)
# Convert to data.table (if not already)
setDT(custom_annot_df)
# Split `GO_Extended` column efficiently
custom_annot_df[, c("Category", "GO_ID", "GO_Term") := tstrsplit(GO_Extended, " ", fixed = TRUE, keep = 1:3)]
# Extract first letter of `Category`
custom_annot_df[, Category := substr(Category, 1, 1)]
gene2GO <- custom_annot_df %>%
group_by(GeneID) %>%
summarize(GOterms = list(unique(GO_ID))) %>%
deframe()
# Function to run topGO analysis by ontology
run_topGO <- function(ontology, gene_set, gene2GO) {
all_genes <- factor(as.integer(names(gene2GO) %in% gene_set), levels = c(0, 1))
names(all_genes) <- names(gene2GO)
GOdata <- new("topGOdata", ontology = ontology, allGenes = all_genes, annot = annFUN.gene2GO, gene2GO = gene2GO)
resultFisher <- runTest(GOdata, algorithm = "classic", statistic = "fisher")
GenTable(GOdata, classicFisher = resultFisher, orderBy = "classicFisher", topNodes = 10)
}
# Run topGO for each ontology category and regulation type
allRes_up_BP <- run_topGO("BP", upregulated_genes, gene2GO) %>% mutate(Regulation = "Upregulated", ontology = "BP")
allRes_up_MF <- run_topGO("MF", upregulated_genes, gene2GO) %>% mutate(Regulation = "Upregulated", ontology = "MF")
allRes_up_CC <- run_topGO("CC", upregulated_genes, gene2GO) %>% mutate(Regulation = "Upregulated", ontology = "CC")
allRes_down_BP <- run_topGO("BP", downregulated_genes, gene2GO) %>% mutate(Regulation = "Downregulated", ontology = "BP")
allRes_down_MF <- run_topGO("MF", downregulated_genes, gene2GO) %>% mutate(Regulation = "Downregulated", ontology = "MF")
allRes_down_CC <- run_topGO("CC", downregulated_genes, gene2GO) %>% mutate(Regulation = "Downregulated", ontology = "CC")
# Combine all results with ontology labels
allRes <- bind_rows(
allRes_up_BP, allRes_up_MF, allRes_up_CC,
allRes_down_BP, allRes_down_MF, allRes_down_CC
)
# Calculate FoldEnrichment and convert p-values
allRes$classicFisher <- as.numeric(as.character(allRes$classicFisher))
allRes$FoldEnrichment <- allRes$Significant / allRes$Expected
# Export results for this species
output_file <- file.path(workDir, "DEG_results/Bulk_RNAseq", paste0("GO10_enrichment_Head_", species, "_custom.csv"))
write.csv(allRes, output_file, row.names = FALSE)
return(allRes)
}
# Name each element in species_list
names(species_list) <- species_list
# Run the analysis for each species
results_list <- lapply(species_list, run_GO_analysis_for_species)
# Combine all results into a single table if desired
combined_results <- bind_rows(results_list, .id = "Species")
# Define working directory and species list
workDir <- "/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data"
species_list <- c("gregaria", "piceifrons", "cancellata", "americana", "cubense", "nitens")
# Function to run the GO analysis for a given species
run_GO_analysis_for_species <- function(species) {
# Load DESeq2 results for the species
deg_file <- file.path(workDir, "DEG_results/Bulk_RNAseq/", paste0(species, "/Thorax/DESeq2_results_Thorax_", species ,".csv"))
deg_data <- read.csv(deg_file, stringsAsFactors = FALSE)
names(deg_data)[names(deg_data) == "X"] <- "GeneID"
# Separate DEGs into upregulated and downregulated
upregulated_genes <- subset(deg_data, padj < 0.05 & log2FoldChange > 1)$GeneID
downregulated_genes <- subset(deg_data, padj < 0.05 & log2FoldChange < -1)$GeneID
# Load the custom annotation file
custom_annot_file <- file.path(workDir, "list/GO_Annotations", paste0("blast2go_", species, "_custom.txt"))
custom_annot_df <- read.table(custom_annot_file, sep = "\t", header = TRUE, quote = "", fill = TRUE, stringsAsFactors = FALSE)
# Prepare gene-to-GO mapping for topGO
colnames(custom_annot_df) <- c("GeneID", "Description", "GO_Extended")
library(data.table)
# Convert to data.table (if not already)
setDT(custom_annot_df)
# Split `GO_Extended` column efficiently
custom_annot_df[, c("Category", "GO_ID", "GO_Term") := tstrsplit(GO_Extended, " ", fixed = TRUE, keep = 1:3)]
# Extract first letter of `Category`
custom_annot_df[, Category := substr(Category, 1, 1)]
gene2GO <- custom_annot_df %>%
group_by(GeneID) %>%
summarize(GOterms = list(unique(GO_ID))) %>%
deframe()
# Function to run topGO analysis by ontology
run_topGO <- function(ontology, gene_set, gene2GO) {
all_genes <- factor(as.integer(names(gene2GO) %in% gene_set), levels = c(0, 1))
names(all_genes) <- names(gene2GO)
GOdata <- new("topGOdata", ontology = ontology, allGenes = all_genes, annot = annFUN.gene2GO, gene2GO = gene2GO)
resultFisher <- runTest(GOdata, algorithm = "classic", statistic = "fisher")
GenTable(GOdata, classicFisher = resultFisher, orderBy = "classicFisher", topNodes = 10)
}
# Run topGO for each ontology category and regulation type
allRes_up_BP <- run_topGO("BP", upregulated_genes, gene2GO) %>% mutate(Regulation = "Upregulated", ontology = "BP")
allRes_up_MF <- run_topGO("MF", upregulated_genes, gene2GO) %>% mutate(Regulation = "Upregulated", ontology = "MF")
allRes_up_CC <- run_topGO("CC", upregulated_genes, gene2GO) %>% mutate(Regulation = "Upregulated", ontology = "CC")
allRes_down_BP <- run_topGO("BP", downregulated_genes, gene2GO) %>% mutate(Regulation = "Downregulated", ontology = "BP")
allRes_down_MF <- run_topGO("MF", downregulated_genes, gene2GO) %>% mutate(Regulation = "Downregulated", ontology = "MF")
allRes_down_CC <- run_topGO("CC", downregulated_genes, gene2GO) %>% mutate(Regulation = "Downregulated", ontology = "CC")
# Combine all results with ontology labels
allRes <- bind_rows(
allRes_up_BP, allRes_up_MF, allRes_up_CC,
allRes_down_BP, allRes_down_MF, allRes_down_CC
)
# Calculate FoldEnrichment and convert p-values
allRes$classicFisher <- as.numeric(as.character(allRes$classicFisher))
allRes$FoldEnrichment <- allRes$Significant / allRes$Expected
# Export results for this species
output_file <- file.path(workDir, "DEG_results/Bulk_RNAseq", paste0("GO10_enrichment_Thorax_", species, "_custom.csv"))
write.csv(allRes, output_file, row.names = FALSE)
return(allRes)
}
# Name each element in species_list
names(species_list) <- species_list
# Run the analysis for each species
results_list <- lapply(species_list, run_GO_analysis_for_species)
# Combine all results into a single table if desired
combined_results <- bind_rows(results_list, .id = "Species")
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Asia/Tokyo
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] pheatmap_1.0.12 data.table_1.17.0 tibble_3.2.1
[4] tidyr_1.3.1 ggplot2_3.5.1 dplyr_1.1.4
[7] topGO_2.58.0 SparseM_1.84-2 GO.db_3.20.0
[10] AnnotationDbi_1.68.0 IRanges_2.40.1 S4Vectors_0.44.0
[13] Biobase_2.66.0 graph_1.84.1 BiocGenerics_0.52.0
loaded via a namespace (and not attached):
[1] KEGGREST_1.46.0 gtable_0.3.6 xfun_0.51
[4] bslib_0.9.0 lattice_0.22-6 vctrs_0.6.5
[7] tools_4.4.2 generics_0.1.3 RSQLite_2.3.9
[10] blob_1.2.4 pkgconfig_2.0.3 RColorBrewer_1.1-3
[13] lifecycle_1.0.4 GenomeInfoDbData_1.2.13 farver_2.1.2
[16] compiler_4.4.2 stringr_1.5.1 git2r_0.35.0
[19] Biostrings_2.74.1 munsell_0.5.1 httpuv_1.6.15
[22] GenomeInfoDb_1.42.3 htmltools_0.5.8.1 sass_0.4.9
[25] yaml_2.3.10 later_1.4.1 pillar_1.10.1
[28] crayon_1.5.3 jquerylib_0.1.4 whisker_0.4.1
[31] cachem_1.1.0 tidyselect_1.2.1 digest_0.6.37
[34] stringi_1.8.4 purrr_1.0.4 labeling_0.4.3
[37] rprojroot_2.0.4 fastmap_1.2.0 grid_4.4.2
[40] colorspace_2.1-1 cli_3.6.4 magrittr_2.0.3
[43] withr_3.0.2 scales_1.3.0 UCSC.utils_1.2.0
[46] promises_1.3.2 bit64_4.6.0-1 rmarkdown_2.29
[49] XVector_0.46.0 httr_1.4.7 matrixStats_1.5.0
[52] bit_4.5.0.1 workflowr_1.7.1 png_0.1-8
[55] memoise_2.0.1 evaluate_1.0.3 knitr_1.49
[58] viridisLite_0.4.2 rlang_1.1.5 Rcpp_1.0.14
[61] glue_1.8.0 DBI_1.2.3 rstudioapi_0.17.1
[64] jsonlite_1.9.0 R6_2.6.1 fs_1.6.5
[67] zlibbioc_1.52.0