Last updated: 2024-11-13

Checks: 5 2

Knit directory: locust-comparative-genomics/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20221025) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute relative
/Users/maevatecher/Library/Mobile Documents/comappleCloudDocs/Documents/GitHub/locust-comparative-genomics/data data
/Users/maevatecher/Library/Mobile Documents/comappleCloudDocs/Documents/GitHub/locust-comparative-genomics/data/orthofinder/ data/orthofinder

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f78318e. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .RData
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/.Rhistory
    Ignored:    data/.DS_Store
    Ignored:    data/.Rhistory
    Ignored:    data/DEG-results/.DS_Store
    Ignored:    data/OLD/.DS_Store
    Ignored:    data/OLD/DEseq2_SCUBE_SCUBE_THORAX_STARnew_features/.DS_Store
    Ignored:    data/OLD/DEseq2_SGREG_SGREG_HEAD_STARnew_features/.DS_Store
    Ignored:    data/OLD/DEseq2_SGREG_SGREG_THORAX_STARnew_features/.DS_Store
    Ignored:    data/OLD/americana/.DS_Store
    Ignored:    data/OLD/americana/deg_counts/.DS_Store
    Ignored:    data/OLD/americana/deg_counts/STAR_newparams/.DS_Store
    Ignored:    data/OLD/cubense/deg_counts/STAR/cubense/featurecounts/
    Ignored:    data/OLD/cubense/deg_counts/STAR/gregaria/
    Ignored:    data/OLD/gregaria/.DS_Store
    Ignored:    data/OLD/gregaria/deg_counts/.DS_Store
    Ignored:    data/OLD/gregaria/deg_counts/STAR/.DS_Store
    Ignored:    data/OLD/gregaria/deg_counts/STAR/gregaria/.DS_Store
    Ignored:    data/OLD/gregaria/deg_counts/STAR_newparams/.DS_Store
    Ignored:    data/OLD/piceifrons/.DS_Store
    Ignored:    data/list/.DS_Store
    Ignored:    data/list/GO_Annotations/.DS_Store
    Ignored:    data/orthofinder/.DS_Store
    Ignored:    data/orthofinder/Orthogroups/.DS_Store
    Ignored:    figures/
    Ignored:    tables/

Untracked files:
    Untracked:  VennDiagram.2024-11-13_14-30-50.011991.log
    Untracked:  VennDiagram.2024-11-13_14-30-50.974102.log
    Untracked:  VennDiagram.2024-11-13_14-30-51.617989.log
    Untracked:  VennDiagram.2024-11-13_14-30-52.416183.log
    Untracked:  VennDiagram.2024-11-13_14-30-53.184871.log
    Untracked:  VennDiagram.2024-11-13_14-30-53.775563.log
    Untracked:  VennDiagram.2024-11-13_14-30-54.457887.log
    Untracked:  VennDiagram.2024-11-13_14-30-54.555096.log
    Untracked:  VennDiagram.2024-11-13_14-30-54.658232.log
    Untracked:  VennDiagram.2024-11-13_14-30-55.885554.log
    Untracked:  VennDiagram.2024-11-13_14-30-55.950948.log
    Untracked:  VennDiagram.2024-11-13_14-30-56.065268.log
    Untracked:  VennDiagram.2024-11-13_14-30-57.224978.log
    Untracked:  VennDiagram.2024-11-13_14-30-57.272004.log
    Untracked:  VennDiagram.2024-11-13_14-30-57.333934.log
    Untracked:  VennDiagram.2024-11-13_14-30-58.284478.log
    Untracked:  VennDiagram.2024-11-13_14-30-58.328494.log
    Untracked:  VennDiagram.2024-11-13_14-30-58.389913.log
    Untracked:  VennDiagram.2024-11-13_14-30-59.414708.log
    Untracked:  VennDiagram.2024-11-13_14-30-59.536009.log
    Untracked:  VennDiagram.2024-11-13_14-30-59.652241.log
    Untracked:  VennDiagram.2024-11-13_14-31-00.86269.log
    Untracked:  VennDiagram.2024-11-13_14-31-01.01457.log
    Untracked:  VennDiagram.2024-11-13_14-31-01.135563.log
    Untracked:  VennDiagram.2024-11-13_14-31-02.461225.log
    Untracked:  VennDiagram.2024-11-13_14-31-02.633344.log
    Untracked:  VennDiagram.2024-11-13_14-31-02.918113.log
    Untracked:  VennDiagram.2024-11-13_14-31-03.26959.log
    Untracked:  VennDiagram.2024-11-13_14-31-03.389265.log
    Untracked:  VennDiagram.2024-11-13_14-31-03.503396.log
    Untracked:  VennDiagram.2024-11-13_14-31-08.227974.log
    Untracked:  VennDiagram.2024-11-13_14-31-08.914107.log
    Untracked:  VennDiagram.2024-11-13_14-31-09.599848.log
    Untracked:  VennDiagram.2024-11-13_14-31-10.306526.log
    Untracked:  VennDiagram.2024-11-13_14-31-11.018901.log
    Untracked:  VennDiagram.2024-11-13_14-31-11.81221.log
    Untracked:  VennDiagram.2024-11-13_14-31-12.932698.log
    Untracked:  VennDiagram.2024-11-13_14-31-12.987567.log
    Untracked:  VennDiagram.2024-11-13_14-31-13.132029.log
    Untracked:  VennDiagram.2024-11-13_14-31-14.726798.log
    Untracked:  VennDiagram.2024-11-13_14-31-14.786706.log
    Untracked:  VennDiagram.2024-11-13_14-31-14.856016.log
    Untracked:  VennDiagram.2024-11-13_14-31-16.229469.log
    Untracked:  VennDiagram.2024-11-13_14-31-16.306154.log
    Untracked:  VennDiagram.2024-11-13_14-31-16.363059.log
    Untracked:  VennDiagram.2024-11-13_14-31-17.683699.log
    Untracked:  VennDiagram.2024-11-13_14-31-17.718708.log
    Untracked:  VennDiagram.2024-11-13_14-31-17.806893.log
    Untracked:  VennDiagram.2024-11-13_14-31-19.398628.log
    Untracked:  VennDiagram.2024-11-13_14-31-19.615697.log
    Untracked:  VennDiagram.2024-11-13_14-31-19.747907.log
    Untracked:  VennDiagram.2024-11-13_14-31-21.331558.log
    Untracked:  VennDiagram.2024-11-13_14-31-21.44838.log
    Untracked:  VennDiagram.2024-11-13_14-31-21.711258.log
    Untracked:  data/03-gregaria-vivian/
    Untracked:  data/DEG-results/DESeq2_results_AGY_gregaria.csv
    Untracked:  data/DEG-results/DESeq2_results_AGY_nitens.csv
    Untracked:  data/DEG-results/DESeq2_sigresults_AGY_gregaria.csv
    Untracked:  data/DEG-results/DESeq2_sigresults_AGY_nitens.csv
    Untracked:  data/DEG-results/DESeq2_sigresults_AGY_togregaria_nitens.csv
    Untracked:  data/DEG-results/GO10_enrichment_Head_americana_custom.csv
    Untracked:  data/DEG-results/GO10_enrichment_Head_cancellata_custom.csv
    Untracked:  data/DEG-results/GO10_enrichment_Head_cubense_custom.csv
    Untracked:  data/DEG-results/GO10_enrichment_Head_gregaria_custom.csv
    Untracked:  data/DEG-results/GO10_enrichment_Head_nitens_custom.csv
    Untracked:  data/DEG-results/GO10_enrichment_Head_piceifrons_custom.csv
    Untracked:  data/DEG-results/GO10_enrichment_Thorax_americana_custom.csv
    Untracked:  data/DEG-results/GO10_enrichment_Thorax_cancellata_custom.csv
    Untracked:  data/DEG-results/GO10_enrichment_Thorax_cubense_custom.csv
    Untracked:  data/DEG-results/GO10_enrichment_Thorax_gregaria_custom.csv
    Untracked:  data/DEG-results/GO10_enrichment_Thorax_nitens_custom.csv
    Untracked:  data/DEG-results/GO10_enrichment_Thorax_piceifrons_custom.csv
    Untracked:  data/DEG-results/GO30_enrichment_Head_americana_custom.csv
    Untracked:  data/DEG-results/GO30_enrichment_Head_cancellata_custom.csv
    Untracked:  data/DEG-results/GO30_enrichment_Head_cubense_custom.csv
    Untracked:  data/DEG-results/GO30_enrichment_Head_gregaria_custom.csv
    Untracked:  data/DEG-results/GO30_enrichment_Head_nitens_custom.csv
    Untracked:  data/DEG-results/GO30_enrichment_Head_piceifrons_custom.csv
    Untracked:  data/DEG-results/GO30_enrichment_Thorax_americana_custom.csv
    Untracked:  data/DEG-results/GO30_enrichment_Thorax_cancellata_custom.csv
    Untracked:  data/DEG-results/GO30_enrichment_Thorax_cubense_custom.csv
    Untracked:  data/DEG-results/GO30_enrichment_Thorax_gregaria_custom.csv
    Untracked:  data/DEG-results/GO30_enrichment_Thorax_nitens_custom.csv
    Untracked:  data/DEG-results/GO30_enrichment_Thorax_piceifrons_custom.csv
    Untracked:  data/DEG-results/GO_enrichment_head_americana_custom.csv
    Untracked:  data/DEG-results/GO_enrichment_head_cancellata_custom.csv
    Untracked:  data/DEG-results/GO_enrichment_head_cubense_custom.csv
    Untracked:  data/DEG-results/GO_enrichment_head_gregaria_custom.csv
    Untracked:  data/DEG-results/GO_enrichment_head_gregaria_custom_top10.csv
    Untracked:  data/DEG-results/GO_enrichment_head_nitens_custom.csv
    Untracked:  data/DEG-results/GO_enrichment_head_piceifrons_custom.csv
    Untracked:  data/OLD/orthologs/Orthogroups_reprocessed.txt
    Untracked:  data/list/AGY_vivian copy.txt
    Untracked:  data/list/AGY_vivian.txt
    Untracked:  data/list/AGY_vivian_extremes.txt
    Untracked:  data/list/AGY_vivian_extremes_nooutliers.txt
    Untracked:  data/list/AGY_vivian_gregarious.txt
    Untracked:  data/list/AGY_vivian_solitary.txt
    Untracked:  data/list/AGY_vivian_solitarysolitary.txt
    Untracked:  data/list/GO_Annotations/blast2go_americana_custom.txt
    Untracked:  data/list/GO_Annotations/blast2go_cancellata_custom.txt
    Untracked:  data/list/GO_Annotations/blast2go_cubense_custom.txt
    Untracked:  data/list/GO_Annotations/blast2go_nitens_custom.txt
    Untracked:  data/list/GO_Annotations/blast2go_piceifrons_custom.txt
    Untracked:  data/orthofinder/Orthogroups/Orthogroups_reprocessed.tsv
    Untracked:  data/orthofinder/Orthogroups/Orthogroups_reprocessed.txt

Unstaged changes:
    Modified:   analysis/3_deseq2-results.Rmd
    Modified:   analysis/3_go-enrichment.Rmd
    Modified:   analysis/3_overlap-venn.Rmd
    Modified:   analysis/3_wgcna-network.Rmd
    Deleted:    analysis/VennDiagram.2024-11-07_21-35-20.971724.log
    Deleted:    analysis/VennDiagram.2024-11-07_21-35-21.068495.log
    Deleted:    analysis/VennDiagram.2024-11-07_21-35-21.149934.log
    Deleted:    analysis/VennDiagram.2024-11-07_22-00-17.392767.log
    Deleted:    analysis/VennDiagram.2024-11-07_23-31-10.009739.log
    Deleted:    analysis/VennDiagram.2024-11-07_23-32-47.393962.log
    Deleted:    analysis/VennDiagram.2024-11-08_00-44-23.127328.log
    Deleted:    analysis/VennDiagram.2024-11-08_00-44-23.373311.log
    Deleted:    analysis/VennDiagram.2024-11-08_00-44-23.57838.log
    Deleted:    analysis/VennDiagram.2024-11-08_00-44-34.86787.log
    Deleted:    analysis/VennDiagram.2024-11-08_00-44-34.972065.log
    Deleted:    analysis/VennDiagram.2024-11-08_00-44-35.097888.log
    Deleted:    analysis/VennDiagram.2024-11-08_00-44-51.785126.log
    Deleted:    analysis/VennDiagram.2024-11-08_00-44-51.891257.log
    Deleted:    analysis/VennDiagram.2024-11-08_00-44-52.02239.log
    Deleted:    analysis/VennDiagram.2024-11-08_00-45-02.189987.log
    Deleted:    analysis/VennDiagram.2024-11-08_00-45-02.293896.log
    Deleted:    analysis/VennDiagram.2024-11-08_00-45-02.42197.log
    Deleted:    analysis/VennDiagram.2024-11-08_16-23-06.254856.log
    Deleted:    analysis/VennDiagram.2024-11-08_16-24-43.252158.log
    Deleted:    analysis/VennDiagram.2024-11-08_16-24-43.67364.log
    Deleted:    analysis/VennDiagram.2024-11-08_16-24-43.911885.log
    Deleted:    analysis/seqdata-qc.Rmd
    Modified:   data/DEG-results/overlapping_genes_head_thorax_americana.csv
    Modified:   data/DEG-results/overlapping_genes_head_thorax_cancellata.csv
    Modified:   data/DEG-results/overlapping_genes_head_thorax_cubense.csv
    Modified:   data/DEG-results/overlapping_genes_head_thorax_piceifrons.csv
    Modified:   data/DEG-results/scatter_plot_overlapping_genes_americana.png
    Modified:   data/DEG-results/scatter_plot_overlapping_genes_cancellata.png
    Modified:   data/DEG-results/scatter_plot_overlapping_genes_cubense.png
    Modified:   data/DEG-results/scatter_plot_overlapping_genes_gregaria.png
    Modified:   data/DEG-results/scatter_plot_overlapping_genes_piceifrons.png
    Modified:   data/orthofinder/Orthogroups/Orthogroups.txt

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/3_overlap-venn.Rmd) and HTML (docs/3_overlap-venn.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 3fa8e62 Maeva TECHER 2024-11-08 updated analysis
html 3fa8e62 Maeva TECHER 2024-11-08 updated analysis
Rmd edb70fe Maeva TECHER 2024-11-07 overlap and deg results created
html edb70fe Maeva TECHER 2024-11-07 overlap and deg results created
html ba35b82 Maeva A. TECHER 2024-06-19 Build site.
html 45d0b6b Maeva A. TECHER 2024-05-15 Build site.
Rmd 5dff93d Maeva A. TECHER 2024-05-15 wflow_publish("analysis/3_overlap-venn.Rmd")

Load libraries

We start by loading all the required R packages.

#(install first from CRAN or Bioconductor)
library(knitr)
library(dplyr) 
library(ggplot2)
library(plotly)
library(htmlwidgets)  # For saving interactive plots
library(ggVennDiagram)
library(pheatmap)
library(tidyr)
library(RColorBrewer)
library(viridis)
library(kableExtra)
library(tibble)
library(VennDiagram)
library(gridExtra)
library(grid)
library(DT)
library(readr)
library(tidyverse)

# Path for all species
workDir <- "/Users/maevatecher/Library/Mobile Documents/com~apple~CloudDocs/Documents/GitHub/locust-comparative-genomics/data"
ortho_dir <- "/Users/maevatecher/Library/Mobile Documents/com~apple~CloudDocs/Documents/GitHub/locust-comparative-genomics/data/orthofinder/"
allspecies_path <- file.path(workDir, "/list/allspecies_geneid.csv")
allspecies_df <- read.table(allspecies_path, sep = ",", header = TRUE, quote = "", fill = TRUE, stringsAsFactors = FALSE)
species_list <- c("gregaria", "piceifrons", "cancellata", "americana", "cubense", "nitens")
species_order <- c( "nitens", "cubense", "americana",  "piceifrons", "cancellata", "gregaria")

Here our objective is to compare the abundance, composition and overlap of the DEGs found in the head and thorax tissues of each species between the isolated and crowded last instar females. We found that the differential genes expressed detected by DESeq2 varied across species and tissues but we need some perspective: Are locusts up-regulated and down-regulated the same genes? In the later section GO enrichment, we will investigate what are the functions of these genes as we will see that each species seems to show different gene expression profiles in response to density changes.

STRATEGY 1: One genome S. gregaria

1. DEGs comparison among species

We summarized the number of genes differential expressed between density for each species and each tissues.

# Initialize an empty data frame to store results
summary_table_head <- data.frame(Species = character(),
                            Head_Upregulated = integer(),
                            Head_Upregulated_Strict = integer(),
                            Head_Downregulated = integer(),
                            Head_Downregulated_Strict = integer(),
                            stringsAsFactors = FALSE)

summary_table_thorax <- data.frame(Species = character(),
                            Thorax_Upregulated = integer(),
                            Thorax_Upregulated_Strict = integer(),
                            Thorax_Downregulated = integer(),
                            Thorax_Downregulated_Strict = integer(),
                            stringsAsFactors = FALSE)


# Loop through each species to process their data
for (species in species_list) {
    # Read the DESeq2 results and sigresults files
    head_sigresults_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_togregaria_", species, ".csv"))
    thorax_sigresults_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_togregaria_", species, ".csv"))

    head_sigresults <- read.csv(head_sigresults_file, stringsAsFactors = FALSE)
    thorax_sigresults <- read.csv(thorax_sigresults_file, stringsAsFactors = FALSE)

    # Count upregulated and downregulated genes for head
    head_upregulated <- nrow(head_sigresults %>% filter(log2FoldChange > 0))
    head_downregulated <- nrow(head_sigresults %>% filter(log2FoldChange < -0))
    head_upregulated_strict <- nrow(head_sigresults %>% filter(log2FoldChange > 1))
    head_downregulated_strict <- nrow(head_sigresults %>% filter(log2FoldChange < -1))

    # Count upregulated and downregulated genes for thorax
    thorax_upregulated <- nrow(thorax_sigresults %>% filter(log2FoldChange > 0))
    thorax_downregulated <- nrow(thorax_sigresults %>% filter(log2FoldChange < -0))
    thorax_upregulated_strict <- nrow(thorax_sigresults %>% filter(log2FoldChange > 1))
    thorax_downregulated_strict <- nrow(thorax_sigresults %>% filter(log2FoldChange < -1))

    # Add to the summary table
    summary_table_head <- rbind(summary_table_head, data.frame("Species" = species,
                                                      "Head_Upregulated" = head_upregulated,
                                                      "Head_Downregulated" = head_downregulated,
                                                      "Head_Upregulated_Strict" = head_upregulated_strict,
                                                      "Head_Downregulated_Strict" = head_downregulated_strict,
                                                      stringsAsFactors = FALSE))
    
    summary_table_thorax <- rbind(summary_table_thorax, data.frame("Species" = species,
                                                      "Thorax_Upregulated" = thorax_upregulated,
                                                      "Thorax_Downregulated" = thorax_downregulated,
                                                      "Thorax_Upregulated_Strict" = thorax_upregulated_strict,
                                                      "Thorax_Downregulated_Strict" = thorax_downregulated_strict,
                                                      stringsAsFactors = FALSE))
}

# Print the summary table in a markdown-friendly format
knitr::kable(summary_table_head, format = "markdown", caption = "Summary of differentially expressed genes in head per species")
Summary of differentially expressed genes in head per species
Species Head_Upregulated Head_Downregulated Head_Upregulated_Strict Head_Downregulated_Strict
gregaria 2709 2988 814 662
piceifrons 375 375 194 191
cancellata 689 756 301 386
americana 703 487 311 256
cubense 30 31 30 31
nitens 189 259 104 207
# Convert the summary table to a long format for easier plotting
summary_long_head <- summary_table_head %>%
  select(Species, Head_Upregulated_Strict, Head_Downregulated_Strict) %>%
  pivot_longer(cols = -Species, 
               names_to = c(".value", "Tissue"), 
               names_sep = "_")


# Adjust the values for downregulated genes to be negative
summary_long_head <- summary_long_head %>%
  mutate(Head = ifelse(Tissue == "Downregulated", -Head, Head))

summary_long_head$Species <- factor(summary_long_head$Species, levels = species_order)
# Now create a barplot for head
ggplot(summary_long_head, aes(x = Species, y = Head, fill = Tissue)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Upregulated and Downregulated Genes in Head  (absolute lfc >1)",
       x = "Species",
       y = "Number of Genes") +
  scale_fill_manual(values = c("Upregulated" = "red2", "Downregulated" = "blue")) +
  scale_y_continuous(labels = function(x) ifelse(x < 0, -x, x), limits = c(-1200, 1200)) + # Set fixed y-axis limits
  theme_minimal(base_size = 12) + # Increase base font size for better readability
  theme(legend.position = "top", 
        plot.title = element_text(hjust = 0.5, size = 14, face = "bold"), 
        axis.title.x = element_text(size = 14, face = "bold"), 
        axis.title.y = element_text(size = 14, face = "bold"), 
        axis.text.x = element_text(size = 12, angle = 45, hjust = 1),  # Adjust text size and angle for x-axis
        axis.text.y = element_text(size = 12), 
        panel.grid.major.y = element_line(color = "grey90", linetype = "dashed"), 
        panel.grid.minor = element_blank()) +
  coord_flip()  

Version Author Date
3fa8e62 Maeva TECHER 2024-11-08
edb70fe Maeva TECHER 2024-11-07
knitr::kable(summary_table_thorax, format = "markdown", caption = "Summary of differentially expressed genes in thorax per species")
Summary of differentially expressed genes in thorax per species
Species Thorax_Upregulated Thorax_Downregulated Thorax_Upregulated_Strict Thorax_Downregulated_Strict
gregaria 2751 2691 622 1174
piceifrons 1517 1200 549 221
cancellata 686 648 289 303
americana 398 699 149 339
cubense 104 218 64 154
nitens 0 0 0 0
# Convert the summary table to a long format for easier plotting
summary_long_thorax <- summary_table_thorax %>%
  select(Species, Thorax_Upregulated_Strict, Thorax_Downregulated_Strict) %>%
  pivot_longer(cols = -Species, 
               names_to = c(".value", "Tissue"), 
               names_sep = "_")

# Adjust the values for downregulated genes to be negative
summary_long_thorax <- summary_long_thorax %>%
  mutate(Thorax = ifelse(Tissue == "Downregulated", -Thorax, Thorax))

summary_long_thorax$Species <- factor(summary_long_thorax$Species, levels = species_order)
# Now create a barplot for head
ggplot(summary_long_thorax, aes(x = Species, y = Thorax, fill = Tissue)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Upregulated and Downregulated Genes in Thorax (absolute lfc >1)",
       x = "Species",
       y = "Number of Genes") +
  scale_fill_manual(values = c("Upregulated" = "red2", "Downregulated" = "blue")) +
  scale_y_continuous(labels = function(x) ifelse(x < 0, -x, x), limits = c(-1200, 1200)) + # Set fixed y-axis limits
  theme_minimal(base_size = 12) + # Increase base font size for better readability
  theme(legend.position = "top", 
        plot.title = element_text(hjust = 0.5, size = 14, face = "bold"), 
        axis.title.x = element_text(size = 14, face = "bold"), 
        axis.title.y = element_text(size = 14, face = "bold"), 
        axis.text.x = element_text(size = 12, angle = 45, hjust = 1),  # Adjust text size and angle for x-axis
        axis.text.y = element_text(size = 12), 
        panel.grid.major.y = element_line(color = "grey90", linetype = "dashed"), 
        panel.grid.minor = element_blank()) +
  coord_flip()  

Version Author Date
3fa8e62 Maeva TECHER 2024-11-08
edb70fe Maeva TECHER 2024-11-07
# Define custom colors for each GeneType
custom_colors <- c(
  "transcribed_pseudogene" = "#F4F1BB",  # Example color for transcribed_pseudogene
  "protein-coding" = "#9B57D3",         # Example color for protein-coding
  "lncRNA" = "#A5300F",                 # Example color for lncRNA
  "tRNA" = "#74D055FF",                   # Example color for tRNA
  "misc_RNA" = "#3B6978",               # Example color for misc_RNA
  "ncRNA" = "#29AF7FFF",                  # Example color for ncRNA
  "pseudogene" = "#81B29A",             # Example color for pseudogene
  "rRNA" = "#5982DB",                   # Example color for rRNA
  "snoRNA" = "#DCE318FF",                 # Example color for snoRNA
  "snRNA" = "#665EB8"                   # Example color for snRNA
)

# Use scale_fill_manual to map the custom colors to the GeneTypes
custom_color_scale <- scale_fill_manual(
  values = custom_colors
)
# Create an empty list to store the data for all species
all_species_data <- list()

# Loop through each species to process their data
for (species in species_list) {
  # Read the DESeq2 results for head and thorax
  head_sigresults_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_togregaria_", species, ".csv"))
  thorax_sigresults_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_togregaria_", species, ".csv"))
  
  head_sigresults <- read.csv(head_sigresults_file, stringsAsFactors = FALSE)
  thorax_sigresults <- read.csv(thorax_sigresults_file, stringsAsFactors = FALSE)
  
  # Add GeneType and Species columns (from `allspecies_df`)
  head_sigresults_merged <- merge(head_sigresults, allspecies_df[, c("GeneID", "GeneType", "Species")], by = "GeneID")
  thorax_sigresults_merged <- merge(thorax_sigresults, allspecies_df[, c("GeneID", "GeneType", "Species")], by = "GeneID")
  
  # Count for upregulated and downregulated genes in head
  head_upregulated <- head_sigresults_merged %>%
    filter(log2FoldChange > 1) %>%
    mutate(Regulation = "Upregulated", Tissue = "Head", Count = 1)
  
  head_downregulated <- head_sigresults_merged %>%
    filter(log2FoldChange < -1) %>%
    mutate(Regulation = "Downregulated", Tissue = "Head", Count = -1)  # Mutate downregulated genes to negative
  
  # Combine upregulated and downregulated genes for head
  head_combined <- rbind(head_upregulated, head_downregulated)
  
  # Ensure all GeneTypes are represented for this species, even if they have no DEGs
  head_combined <- head_combined %>%
    complete(GeneType = unique(allspecies_df$GeneType), 
             fill = list(Count = 0))  # Fill missing GeneTypes with Count = 0
  
  # Count for upregulated and downregulated genes in thorax
  thorax_upregulated <- thorax_sigresults_merged %>%
    filter(log2FoldChange > 1) %>%
    mutate(Regulation = "Upregulated", Tissue = "Thorax", Count = 1)
  
  thorax_downregulated <- thorax_sigresults_merged %>%
    filter(log2FoldChange < -1) %>%
    mutate(Regulation = "Downregulated", Tissue = "Thorax", Count = -1)  # Mutate downregulated genes to negative
  
  # Combine upregulated and downregulated genes for thorax
  thorax_combined <- rbind(thorax_upregulated, thorax_downregulated)
  
  # Ensure all GeneTypes are represented for this species in thorax, even if they have no DEGs
  thorax_combined <- thorax_combined %>%
    complete(GeneType = unique(allspecies_df$GeneType), 
             fill = list(Count = 0))  # Fill missing GeneTypes with Count = 0
  
  # Combine data for head and thorax into one
  combined_data <- rbind(head_combined, thorax_combined)
  
  # Add species column to the data
  combined_data$Species <- species
  
  # Append the data to the list for all species
  all_species_data[[species]] <- combined_data
}

# Combine all species data into one data frame
final_data <- bind_rows(all_species_data)

# Reorder species according to the desired order
final_data$Species <- factor(final_data$Species, levels = species_order)

# Filter for head tissue only
final_data_head <- final_data %>% filter(Tissue == "Head")
final_data_thorax <- final_data %>% filter(Tissue == "Thorax")

# Create the barplot for all species and only head tissue
ggplot(final_data_head, aes(x = Species, y = Count, fill = GeneType)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "DEGs by Gene Biotype for Head (absolute lfc >1)",
       x = "Species",
       y = "Number of Genes") +
  custom_color_scale +
  scale_y_continuous(labels = function(x) ifelse(x < 0, -x, x), limits = c(-1200, 1200))+
theme_minimal(base_size = 12) + 
  theme(legend.position = "top", 
        plot.title = element_text(hjust = 0.5, size = 14, face = "bold"), 
        axis.title.x = element_text(size = 14, face = "bold"), 
        axis.title.y = element_text(size = 14, face = "bold"), 
        axis.text.x = element_text(size = 12, angle = 45, hjust = 1), 
        axis.text.y = element_text(size = 12), 
        panel.grid.major.y = element_line(color = "grey90", linetype = "dashed"),
        panel.grid.minor = element_blank()) +
  coord_flip()  # Flip coordinates to make the plot horizontal

# Create the barplot for all species and only head tissue
ggplot(final_data_thorax, aes(x = Species, y = Count, fill = GeneType)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "DEGs by Gene Biotype for Thorax (absolute lfc >1)",
       x = "Species",
       y = "Number of Genes") +
  custom_color_scale +
  scale_y_continuous(labels = function(x) ifelse(x < 0, -x, x), limits = c(-1200, 1200))+
theme_minimal(base_size = 12) + 
  theme(legend.position = "top", 
        plot.title = element_text(hjust = 0.5, size = 14, face = "bold"), 
        axis.title.x = element_text(size = 14, face = "bold"), 
        axis.title.y = element_text(size = 14, face = "bold"), 
        axis.text.x = element_text(size = 12, angle = 45, hjust = 1), 
        axis.text.y = element_text(size = 12), 
        panel.grid.major.y = element_line(color = "grey90", linetype = "dashed"),
        panel.grid.minor = element_blank()) +
  coord_flip()  # Flip coordinates to make the plot horizontal

2. Overlap DEGs between tissues

gregaria


species <- "gregaria"  # Specify the species for which to generate plots

# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_togregaria_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_togregaria_", species, ".csv"))

head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)

# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
} else {
    # Filter for significant DEGs and select upregulated and downregulated genes
    head_up <- head_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    head_down <- head_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    thorax_up <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    thorax_down <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    # Prepare data for Venn diagram
    venn_data <- list(
        Head_Upregulated = head_up$GeneID,
        Head_Downregulated = head_down$GeneID,
        Thorax_Upregulated = thorax_up$GeneID,
        Thorax_Downregulated = thorax_down$GeneID
    )

    # Generate the four-way Venn diagram with specified colors and legend outside
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("Head Upregulated", "Head Downregulated", "Thorax Upregulated", "Thorax Downregulated"), 
        filename = NULL, 
        output = TRUE, 
        fill = c("red", "skyblue", "orange", "blue"),  # Set colors for upregulated and downregulated
        alpha = 0.5, 
        cex = 1.2,  # Text size for numbers
        cat.cex = 0,  # Text size for category labels
        cat.pos = c(0, 0, 0, 0),  # Position to center labels
        cat.dist = c(0.1, 0.1, 0.1, 0.1),  # Distance between category labels and circles
        main = paste("Venn Diagram of DEGs for S.", species),
        main.cex = 1.2,  # Size of the main title
        cat.col = c("red", "skyblue", "orange", "blue")  # Color the category labels
    )

    # Clear the current plotting area before drawing the next Venn diagram
    grid.newpage()

    # Display the Venn diagram
    grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("Head Up", "Head Down", "Thorax Up", "Thorax Down")
    legend_colors <- c("red", "skyblue", "orange", "blue")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }
    
    # Scatter plot for overlapping genes
    # Filter significant DEGs for both head and thorax
    head_sig_genes <- head_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj) 

    thorax_sig_genes <- thorax_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj)

    # Find overlapping genes based on GeneID
    overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))

    # Save the overlapping genes to a CSV file
    output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
    write.csv(overlapping_genes, output_file, row.names = FALSE)

    # Plot overlapping genes with scatter plot
    p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
        geom_point(aes(color = case_when(
            log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
            log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
            log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
            log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
        )), size = 3, alpha = 0.7) +
        geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
        labs(
            x = "Log2 Fold Change (Head)", 
            y = "Log2 Fold Change (Thorax)", 
            color = "Regulation Pattern", 
            title = "Comparison of Log2 Fold Changes in Overlapping Genes", 
            subtitle = paste("Head vs. Thorax in", species)
        ) +
        theme_minimal() + 
        theme(
            plot.title = element_text(size = 16, face = "bold"), 
            plot.subtitle = element_text(size = 12, face = "italic"), 
            legend.position = "top"
        ) +
        scale_color_manual(values = c(
            "Upregulated in Both" = "red", 
            "Downregulated in Both" = "blue", 
            "Up in Head, Down in Thorax" = "purple", 
            "Down in Head, Up in Thorax" = "green"
        ))

    # Save the scatter plot
    ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)

    # Display the scatter plot
    print(p)
}

piceifrons


species <- "piceifrons"  # Specify the species for which to generate plots

# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_togregaria_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_togregaria_", species, ".csv"))

head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)

# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
} else {
    # Filter for significant DEGs and select upregulated and downregulated genes
    head_up <- head_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    head_down <- head_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    thorax_up <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    thorax_down <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    # Prepare data for Venn diagram
    venn_data <- list(
        Head_Upregulated = head_up$GeneID,
        Head_Downregulated = head_down$GeneID,
        Thorax_Upregulated = thorax_up$GeneID,
        Thorax_Downregulated = thorax_down$GeneID
    )

    # Generate the four-way Venn diagram with specified colors and legend outside
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("Head Upregulated", "Head Downregulated", "Thorax Upregulated", "Thorax Downregulated"), 
        filename = NULL, 
        output = TRUE, 
        fill = c("red", "skyblue", "orange", "blue"),  # Set colors for upregulated and downregulated
        alpha = 0.5, 
        cex = 1.2,  # Text size for numbers
        cat.cex = 0,  # Text size for category labels
        cat.pos = c(0, 0, 0, 0),  # Position to center labels
        cat.dist = c(0.1, 0.1, 0.1, 0.1),  # Distance between category labels and circles
        main = paste("Venn Diagram of DEGs for S.", species),
        main.cex = 1.2,  # Size of the main title
        cat.col = c("red", "skyblue", "orange", "blue")  # Color the category labels
    )

    # Clear the current plotting area before drawing the next Venn diagram
    grid.newpage()

    # Display the Venn diagram
    grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("Head Up", "Head Down", "Thorax Up", "Thorax Down")
    legend_colors <- c("red", "skyblue", "orange", "blue")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }

    # Scatter plot for overlapping genes
    # Filter significant DEGs for both head and thorax
    head_sig_genes <- head_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj) 

    thorax_sig_genes <- thorax_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj)

    # Find overlapping genes based on GeneID
    overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))

    # Save the overlapping genes to a CSV file
    output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
    write.csv(overlapping_genes, output_file, row.names = FALSE)

    # Plot overlapping genes with scatter plot
    p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
        geom_point(aes(color = case_when(
            log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
            log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
            log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
            log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
        )), size = 3, alpha = 0.7) +
        geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
        labs(
            x = "Log2 Fold Change (Head)", 
            y = "Log2 Fold Change (Thorax)", 
            color = "Regulation Pattern", 
            title = "Comparison of Log2 Fold Changes in Overlapping Genes", 
            subtitle = paste("Head vs. Thorax in", species)
        ) +
        theme_minimal() + 
        theme(
            plot.title = element_text(size = 16, face = "bold"), 
            plot.subtitle = element_text(size = 12, face = "italic"), 
            legend.position = "top"
        ) +
        scale_color_manual(values = c(
            "Upregulated in Both" = "red", 
            "Downregulated in Both" = "blue", 
            "Up in Head, Down in Thorax" = "purple", 
            "Down in Head, Up in Thorax" = "green"
        ))

    # Save the scatter plot
    ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)

    # Display the scatter plot
    print(p)
}

cancellata


species <- "cancellata"  # Specify the species for which to generate plots

# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_togregaria_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_togregaria_", species, ".csv"))

head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)

# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
} else {
    # Filter for significant DEGs and select upregulated and downregulated genes
    head_up <- head_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    head_down <- head_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    thorax_up <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    thorax_down <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    # Prepare data for Venn diagram
    venn_data <- list(
        Head_Upregulated = head_up$GeneID,
        Head_Downregulated = head_down$GeneID,
        Thorax_Upregulated = thorax_up$GeneID,
        Thorax_Downregulated = thorax_down$GeneID
    )

    # Generate the four-way Venn diagram with specified colors and legend outside
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("Head Upregulated", "Head Downregulated", "Thorax Upregulated", "Thorax Downregulated"), 
        filename = NULL, 
        output = TRUE, 
        fill = c("red", "skyblue", "orange", "blue"),  # Set colors for upregulated and downregulated
        alpha = 0.5, 
        cex = 1.2,  # Text size for numbers
        cat.cex = 0,  # Text size for category labels
        cat.pos = c(0, 0, 0, 0),  # Position to center labels
        cat.dist = c(0.1, 0.1, 0.1, 0.1),  # Distance between category labels and circles
        main = paste("Venn Diagram of DEGs for S.", species),
        main.cex = 1.2,  # Size of the main title
        cat.col = c("red", "skyblue", "orange", "blue")  # Color the category labels
    )

    # Clear the current plotting area before drawing the next Venn diagram
    grid.newpage()

    # Display the Venn diagram
    grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("Head Up", "Head Down", "Thorax Up", "Thorax Down")
    legend_colors <- c("red", "skyblue", "orange", "blue")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }

    # Scatter plot for overlapping genes
    # Filter significant DEGs for both head and thorax
    head_sig_genes <- head_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj) 

    thorax_sig_genes <- thorax_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj)

    # Find overlapping genes based on GeneID
    overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))

    # Save the overlapping genes to a CSV file
    output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
    write.csv(overlapping_genes, output_file, row.names = FALSE)

    # Plot overlapping genes with scatter plot
    p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
        geom_point(aes(color = case_when(
            log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
            log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
            log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
            log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
        )), size = 3, alpha = 0.7) +
        geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
        labs(
            x = "Log2 Fold Change (Head)", 
            y = "Log2 Fold Change (Thorax)", 
            color = "Regulation Pattern", 
            title = "Comparison of Log2 Fold Changes in Overlapping Genes", 
            subtitle = paste("Head vs. Thorax in", species)
        ) +
        theme_minimal() + 
        theme(
            plot.title = element_text(size = 16, face = "bold"), 
            plot.subtitle = element_text(size = 12, face = "italic"), 
            legend.position = "top"
        ) +
        scale_color_manual(values = c(
            "Upregulated in Both" = "red", 
            "Downregulated in Both" = "blue", 
            "Up in Head, Down in Thorax" = "purple", 
            "Down in Head, Up in Thorax" = "green"
        ))

    # Save the scatter plot
    ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)

    # Display the scatter plot
    print(p)
}

americana


species <- "americana"  # Specify the species for which to generate plots

# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_togregaria_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_togregaria_", species, ".csv"))

head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)

# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
} else {
    # Filter for significant DEGs and select upregulated and downregulated genes
    head_up <- head_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    head_down <- head_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    thorax_up <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    thorax_down <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    # Prepare data for Venn diagram
    venn_data <- list(
        Head_Upregulated = head_up$GeneID,
        Head_Downregulated = head_down$GeneID,
        Thorax_Upregulated = thorax_up$GeneID,
        Thorax_Downregulated = thorax_down$GeneID
    )

    # Generate the four-way Venn diagram with specified colors and legend outside
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("Head Upregulated", "Head Downregulated", "Thorax Upregulated", "Thorax Downregulated"), 
        filename = NULL, 
        output = TRUE, 
        fill = c("red", "skyblue", "orange", "blue"),  # Set colors for upregulated and downregulated
        alpha = 0.5, 
        cex = 1.2,  # Text size for numbers
        cat.cex = 0,  # Text size for category labels
        cat.pos = c(0, 0, 0, 0),  # Position to center labels
        cat.dist = c(0.1, 0.1, 0.1, 0.1),  # Distance between category labels and circles
        main = paste("Venn Diagram of DEGs for S.", species),
        main.cex = 1.2,  # Size of the main title
        cat.col = c("red", "skyblue", "orange", "blue")  # Color the category labels
    )

    # Clear the current plotting area before drawing the next Venn diagram
    grid.newpage()

    # Display the Venn diagram
    grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("Head Up", "Head Down", "Thorax Up", "Thorax Down")
    legend_colors <- c("red", "skyblue", "orange", "blue")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }

    # Scatter plot for overlapping genes
    # Filter significant DEGs for both head and thorax
    head_sig_genes <- head_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj) 

    thorax_sig_genes <- thorax_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj)

    # Find overlapping genes based on GeneID
    overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))

    # Save the overlapping genes to a CSV file
    output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
    write.csv(overlapping_genes, output_file, row.names = FALSE)

    # Plot overlapping genes with scatter plot
    p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
        geom_point(aes(color = case_when(
            log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
            log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
            log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
            log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
        )), size = 3, alpha = 0.7) +
        geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
        labs(
            x = "Log2 Fold Change (Head)", 
            y = "Log2 Fold Change (Thorax)", 
            color = "Regulation Pattern", 
            title = "Comparison of Log2 Fold Changes in Overlapping Genes", 
            subtitle = paste("Head vs. Thorax in", species)
        ) +
        theme_minimal() + 
        theme(
            plot.title = element_text(size = 16, face = "bold"), 
            plot.subtitle = element_text(size = 12, face = "italic"), 
            legend.position = "top"
        ) +
        scale_color_manual(values = c(
            "Upregulated in Both" = "red", 
            "Downregulated in Both" = "blue", 
            "Up in Head, Down in Thorax" = "purple", 
            "Down in Head, Up in Thorax" = "green"
        ))

    # Save the scatter plot
    ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)

    # Display the scatter plot
    print(p)
}

cubense


species <- "cubense"  # Specify the species for which to generate plots

# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_togregaria_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_togregaria_", species, ".csv"))

head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)

# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
} else {
    # Filter for significant DEGs and select upregulated and downregulated genes
    head_up <- head_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    head_down <- head_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    thorax_up <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    thorax_down <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    # Prepare data for Venn diagram
    venn_data <- list(
        Head_Upregulated = head_up$GeneID,
        Head_Downregulated = head_down$GeneID,
        Thorax_Upregulated = thorax_up$GeneID,
        Thorax_Downregulated = thorax_down$GeneID
    )

    # Generate the four-way Venn diagram with specified colors and legend outside
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("Head Upregulated", "Head Downregulated", "Thorax Upregulated", "Thorax Downregulated"), 
        filename = NULL, 
        output = TRUE, 
        fill = c("red", "skyblue", "orange", "blue"),  # Set colors for upregulated and downregulated
        alpha = 0.5, 
        cex = 1.2,  # Text size for numbers
        cat.cex = 0,  # Text size for category labels
        cat.pos = c(0, 0, 0, 0),  # Position to center labels
        cat.dist = c(0.1, 0.1, 0.1, 0.1),  # Distance between category labels and circles
        main = paste("Venn Diagram of DEGs for S.", species),
        main.cex = 1.2,  # Size of the main title
        cat.col = c("red", "skyblue", "orange", "blue")  # Color the category labels
    )

    # Clear the current plotting area before drawing the next Venn diagram
    grid.newpage()

    # Display the Venn diagram
    grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("Head Up", "Head Down", "Thorax Up", "Thorax Down")
    legend_colors <- c("red", "skyblue", "orange", "blue")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }

    # Scatter plot for overlapping genes
    # Filter significant DEGs for both head and thorax
    head_sig_genes <- head_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj) 

    thorax_sig_genes <- thorax_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj)

    # Find overlapping genes based on GeneID
    overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))

    # Save the overlapping genes to a CSV file
    output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
    write.csv(overlapping_genes, output_file, row.names = FALSE)

    # Plot overlapping genes with scatter plot
    p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
        geom_point(aes(color = case_when(
            log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
            log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
            log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
            log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
        )), size = 3, alpha = 0.7) +
        geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
        labs(
            x = "Log2 Fold Change (Head)", 
            y = "Log2 Fold Change (Thorax)", 
            color = "Regulation Pattern", 
            title = "Comparison of Log2 Fold Changes in Overlapping Genes", 
            subtitle = paste("Head vs. Thorax in", species)
        ) +
        theme_minimal() + 
        theme(
            plot.title = element_text(size = 16, face = "bold"), 
            plot.subtitle = element_text(size = 12, face = "italic"), 
            legend.position = "top"
        ) +
        scale_color_manual(values = c(
            "Upregulated in Both" = "red", 
            "Downregulated in Both" = "blue", 
            "Up in Head, Down in Thorax" = "purple", 
            "Down in Head, Up in Thorax" = "green"
        ))

    # Save the scatter plot
    ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)

    # Display the scatter plot
    print(p)
}

nitens


species <- "nitens"  # Specify the species for which to generate plots

# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_togregaria_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_togregaria_", species, ".csv"))

head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)

# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
} else {
    # Filter for significant DEGs and select upregulated and downregulated genes
    head_up <- head_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    head_down <- head_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    thorax_up <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    thorax_down <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    # Prepare data for Venn diagram
    venn_data <- list(
        Head_Upregulated = head_up$GeneID,
        Head_Downregulated = head_down$GeneID,
        Thorax_Upregulated = thorax_up$GeneID,
        Thorax_Downregulated = thorax_down$GeneID
    )

    # Generate the four-way Venn diagram with specified colors and legend outside
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("Head Upregulated", "Head Downregulated", "Thorax Upregulated", "Thorax Downregulated"), 
        filename = NULL, 
        output = TRUE, 
        fill = c("red", "skyblue", "orange", "blue"),  # Set colors for upregulated and downregulated
        alpha = 0.5, 
        cex = 1.2,  # Text size for numbers
        cat.cex = 0,  # Text size for category labels
        cat.pos = c(0, 0, 0, 0),  # Position to center labels
        cat.dist = c(0.1, 0.1, 0.1, 0.1),  # Distance between category labels and circles
        main = paste("Venn Diagram of DEGs for S.", species),
        main.cex = 1.2,  # Size of the main title
        cat.col = c("red", "skyblue", "orange", "blue")  # Color the category labels
    )

    # Clear the current plotting area before drawing the next Venn diagram
    grid.newpage()

    # Display the Venn diagram
    grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("Head Up", "Head Down", "Thorax Up", "Thorax Down")
    legend_colors <- c("red", "skyblue", "orange", "blue")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }

    # Scatter plot for overlapping genes
    # Filter significant DEGs for both head and thorax
    head_sig_genes <- head_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj) 

    thorax_sig_genes <- thorax_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj)

    # Find overlapping genes based on GeneID
    overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))

    # Save the overlapping genes to a CSV file
    output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
    write.csv(overlapping_genes, output_file, row.names = FALSE)

    # Plot overlapping genes with scatter plot
    p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
        geom_point(aes(color = case_when(
            log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
            log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
            log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
            log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
        )), size = 3, alpha = 0.7) +
        geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
        labs(
            x = "Log2 Fold Change (Head)", 
            y = "Log2 Fold Change (Thorax)", 
            color = "Regulation Pattern", 
            title = "Comparison of Log2 Fold Changes in Overlapping Genes", 
            subtitle = paste("Head vs. Thorax in", species)
        ) +
        theme_minimal() + 
        theme(
            plot.title = element_text(size = 16, face = "bold"), 
            plot.subtitle = element_text(size = 12, face = "italic"), 
            legend.position = "top"
        ) +
        scale_color_manual(values = c(
            "Upregulated in Both" = "red", 
            "Downregulated in Both" = "blue", 
            "Up in Head, Down in Thorax" = "purple", 
            "Down in Head, Up in Thorax" = "green"
        ))

    # Save the scatter plot
    ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)

    # Display the scatter plot
    print(p)
}

3. Overlap DEGs among species

Locusts

Head tissues

# Define the species for Group 1
locusts <- c("gregaria", "piceifrons", "cancellata")

# Initialize an empty list to store DEG data
venn_data_locusts_up <- list()
venn_data_locusts_down <- list()
venn_data_locusts_all <- list()

# Function to load DEGs for a given group of species for head
load_deg_data <- function(species_list) {
  degs_up <- list()
  degs_down <- list()
  degs_all <- list()
  
  for (species in locusts) {
    head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_togregaria_", species, ".csv"))
    
    head_data <- read.csv(head_file, stringsAsFactors = FALSE)
    
    # Check if data is empty and handle accordingly
    if (nrow(head_data) == 0) {
      message(paste("No data for species:", species))
      next  # Skip to the next species if there's no data
    }
    
    # Filter for significant DEGs (both upregulated and downregulated)
    head_up <- head_data %>%
      filter(padj < 0.05 & log2FoldChange >= 1) %>%
      select(GeneID = X)
    
    head_down <- head_data %>%
      filter(padj < 0.05 & log2FoldChange <= -1) %>%
      select(GeneID = X)
    
    all_deg <- head_data %>%
      filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
      select(GeneID = X)

    # Store the DEGs in the list
    degs_up[[species]] <- head_up$GeneID
    degs_down[[species]] <- head_down$GeneID
    degs_all[[species]] <- all_deg$GeneID
  }
  
  return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Load DEG data for Group 1 for head
venn_data_locusts <- load_deg_data(locusts)

# Prepare the data for the Venn diagrams
venn_data_up <- list(
  gregaria = venn_data_locusts$up[["gregaria"]],
  piceifrons = venn_data_locusts$up[["piceifrons"]],
  cancellata = venn_data_locusts$up[["cancellata"]]
)

venn_data_down <- list(
  gregaria = venn_data_locusts$down[["gregaria"]],
  piceifrons = venn_data_locusts$down[["piceifrons"]],
  cancellata = venn_data_locusts$down[["cancellata"]]
)

venn_data_all <- list(
  gregaria = venn_data_locusts$all[["gregaria"]],
  piceifrons = venn_data_locusts$all[["piceifrons"]],
  cancellata = venn_data_locusts$all[["cancellata"]]
)

# Function to display Venn diagram and corresponding datatable
display_venn_with_datatable <- function(venn_data, title, allspecies_df) {
  # Calculate the overlapping genes
  overlap_genes <- Reduce(intersect, venn_data)
  
  # Create a data frame for the overlapping genes
  overlap_df <- data.frame(GeneID = overlap_genes)

  # Merge to get species information
  meta_brock_df <- merge(overlap_df, allspecies_df, by = "GeneID", all.x = TRUE)

  # Generate the Venn diagram
  venn_plot <- venn.diagram(
    x = venn_data, 
    category.names = c("gregaria", "piceifrons", "cancellata"), 
    filename = NULL, 
    output = TRUE, 
    fill = c("orange", "red", "orchid"),  # Set colors for the groups
    alpha = 0.5, 
    cex = 1.2, 
    cat.cex = 0, 
    main = title,
    main.cex = 1.2
  )

  # Clear the current plotting area before drawing the Venn diagram
  grid.newpage()
  
  # Display the Venn diagram
  grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("gregaria", "piceifrons", "cancellata")
    legend_colors <- c("orange", "red", "orchid")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }  
  # Display the merged overlapping genes table with datatable
  datatable(meta_brock_df, options = list(
      pageLength = 10,
      scrollX = TRUE,
      autoWidth = TRUE,
      searchHighlight = TRUE
  ),
  rownames = FALSE,
  escape = FALSE
  ) %>%
  formatStyle(
      'Species', target = 'cell',
      fontStyle = 'italic'
  ) %>%
  formatStyle(
      columns = names(meta_brock_df), 
      target = 'row',
      color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
      fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
      backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
  )
}

# Display the Venn diagram and datatable for head upregulated DEGs
display_venn_with_datatable(venn_data_up, "Venn Diagram of Head Upregulated DEGs - Locusts", allspecies_df)

# Display the Venn diagram and datatable for head downregulated DEGs
display_venn_with_datatable(venn_data_down, "Venn Diagram of Head Downregulated DEGs - Locusts", allspecies_df)

# Display the Venn diagram and datatable for all significant DEGs
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Significant DEGs - Locusts", allspecies_df)

# Define the species for Group 1
locusts <- c("gregaria", "piceifrons", "cancellata")

# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in locusts) {
  # Load DESeq2 results for head
  head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_togregaria_", species, ".csv"))
  
  # Load the data
  head_data <- read.csv(head_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(head_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Filter for significant DEGs and select top 100 upregulated and downregulated genes for each tissue
  head_up <- head_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  head_down <- head_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    head_up %>% mutate(Tissue = "Head", Regulation = "Upregulated", Species = species),
    head_down %>% mutate(Tissue = "Head", Regulation = "Downregulated", Species = species)
  ) %>%
    select(GeneID, log2FoldChange, Tissue, Regulation, Species)
  
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
  stop("No valid data available for heatmap generation.")
}

# Create heatmap matrix
heatmap_matrix <- final_heatmap_data %>%
  group_by(GeneID, Species) %>%
  summarize(
    Head_Combined = sum(log2FoldChange[Tissue == "Head"], na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_longer(cols = c("Head_Combined"), 
               names_to = c("Tissue", ".value"), 
               names_sep = "_") %>%
  # Combine the Head values for each GeneID while keeping Species
  group_by(GeneID, Species) %>%
  summarize(
    Head = sum(ifelse(Tissue == "Head", Combined, 0), na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_wider(names_from = Species, 
              values_from = c(Head), 
              values_fill = list(Head = 0)) %>%
  column_to_rownames("GeneID") %>%
  as.matrix()

color_palette <- c("cyan", "cyan3",  "black", "orange3", "orange")
color_palette2 <- c("blue3", "blue",  "white", "red", "red3")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Head tissue - STRATEGY 1"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Head tissue - STRATEGY 1"
)

Thorax tissues

# Define the species for Group 1
locusts <- c("gregaria", "piceifrons", "cancellata")

# Initialize an empty list to store DEG data
venn_data_locusts_up <- list()
venn_data_locusts_down <- list()
venn_data_locusts_all <- list()

# Function to load DEGs for a given group of species for thorax
load_deg_data <- function(species_list) {
  degs_up <- list()
  degs_down <- list()
  degs_all <- list()
  
  for (species in locusts) {
    thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_togregaria_", species, ".csv"))
    
    thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
    
    # Check if data is empty and handle accordingly
    if (nrow(thorax_data) == 0) {
      message(paste("No data for species:", species))
      next  # Skip to the next species if there's no data
    }
    
    # Filter for significant DEGs (both upregulated and downregulated)
    thorax_up <- thorax_data %>%
      filter(padj < 0.05 & log2FoldChange >= 1) %>%
      select(GeneID = X)
    
    thorax_down <- thorax_data %>%
      filter(padj < 0.05 & log2FoldChange <= -1) %>%
      select(GeneID = X)
    
    all_deg <- thorax_data %>%
      filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
      select(GeneID = X)

    # Store the DEGs in the list
    degs_up[[species]] <- thorax_up$GeneID
    degs_down[[species]] <- thorax_down$GeneID
    degs_all[[species]] <- all_deg$GeneID
  }
  
  return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Load DEG data for Group 1 for thorax
venn_data_locusts <- load_deg_data(locusts)

# Prepare the data for the Venn diagrams
venn_data_up <- list(
  gregaria = venn_data_locusts$up[["gregaria"]],
  piceifrons = venn_data_locusts$up[["piceifrons"]],
  cancellata = venn_data_locusts$up[["cancellata"]]
)

venn_data_down <- list(
  gregaria = venn_data_locusts$down[["gregaria"]],
  piceifrons = venn_data_locusts$down[["piceifrons"]],
  cancellata = venn_data_locusts$down[["cancellata"]]
)

venn_data_all <- list(
  gregaria = venn_data_locusts$all[["gregaria"]],
  piceifrons = venn_data_locusts$all[["piceifrons"]],
  cancellata = venn_data_locusts$all[["cancellata"]]
)

# Function to display Venn diagram and corresponding datatable
display_venn_with_datatable <- function(venn_data, title, allspecies_df) {
  # Calculate the overlapping genes
  overlap_genes <- Reduce(intersect, venn_data)
  
  # Create a data frame for the overlapping genes
  overlap_df <- data.frame(GeneID = overlap_genes)

  # Merge to get species information
  meta_brock_df <- merge(overlap_df, allspecies_df, by = "GeneID", all.x = TRUE)

  # Generate the Venn diagram
  venn_plot <- venn.diagram(
    x = venn_data, 
    category.names = c("gregaria", "piceifrons", "cancellata"), 
    filename = NULL, 
    output = TRUE, 
    fill = c("orange", "red", "orchid"),  # Set colors for the groups
    alpha = 0.5, 
    cex = 1.2, 
    cat.cex = 0, 
    main = title,
    main.cex = 1.2
  )

  # Clear the current plotting area before drawing the Venn diagram
  grid.newpage()
  
  # Display the Venn diagram
  grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("gregaria", "piceifrons", "cancellata")
    legend_colors <- c("orange", "red", "orchid")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }    
  # Display the merged overlapping genes table with datatable
  datatable(meta_brock_df, options = list(
      pageLength = 10,
      scrollX = TRUE,
      autoWidth = TRUE,
      searchHighlight = TRUE
  ),
  rownames = FALSE,
  escape = FALSE
  ) %>%
  formatStyle(
      'Species', target = 'cell',
      fontStyle = 'italic'
  ) %>%
  formatStyle(
      columns = names(meta_brock_df), 
      target = 'row',
      color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
      fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
      backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
  )
}

# Display the Venn diagram and datatable for thorax upregulated DEGs
display_venn_with_datatable(venn_data_up, "Venn Diagram of Thorax Upregulated DEGs - Locusts", allspecies_df)

# Display the Venn diagram and datatable for head downregulated DEGs
display_venn_with_datatable(venn_data_down, "Venn Diagram of Thorax Downregulated DEGs - Locusts", allspecies_df)

# Display the Venn diagram and datatable for all significant DEGs
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Significant DEGs - Locusts", allspecies_df)

# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in locusts) {
  # Load DESeq2 results for thorax
  thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_togregaria_", species, ".csv"))
  
  # Load the data
  thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Filter for significant DEGs and select top 100 upregulated and downregulated genes for each tissue
  thorax_up <- thorax_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  thorax_down <- thorax_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    thorax_up %>% mutate(Tissue = "Thorax", Regulation = "Upregulated", Species = species),
    thorax_down %>% mutate(Tissue = "Thorax", Regulation = "Downregulated", Species = species)
  ) %>%
    select(GeneID, log2FoldChange, Tissue, Regulation, Species)
  
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
  stop("No valid data available for heatmap generation.")
}

# Create heatmap matrix
heatmap_matrix <- final_heatmap_data %>%
  group_by(GeneID, Species) %>%
  summarize(
    Thorax_Combined = sum(log2FoldChange[Tissue == "Thorax"], na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_longer(cols = c("Thorax_Combined"), 
               names_to = c("Tissue", ".value"), 
               names_sep = "_") %>%
  # Combine the Thorax values for each GeneID while keeping Species
  group_by(GeneID, Species) %>%
  summarize(
    Thorax = sum(ifelse(Tissue == "Thorax", Combined, 0), na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_wider(names_from = Species, 
              values_from = c(Thorax), 
              values_fill = list(Thorax = 0)) %>%
  column_to_rownames("GeneID") %>%
  as.matrix()

color_palette <- c("cyan", "cyan3",  "black", "orange3", "orange")
color_palette2 <- c("blue3", "blue",  "white", "red", "red3")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Thorax tissue - STRATEGY 1"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Thorax tissue - STRATEGY 1"
)

piceifrons-americana-cubense

Head tissues

PACclade <- c("piceifrons", "americana", "cubense")

# Initialize an empty list to store DEG data
venn_data_PACclade_up <- list()
venn_data_PACclade_down <- list()
venn_data_PACclade_all <- list()

# Function to load DEGs for a given group of species for head
load_deg_data <- function(species_list) {
  degs_up <- list()
  degs_down <- list()
  degs_all <- list()
  
  for (species in PACclade) {
    head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_togregaria_", species, ".csv"))
    
    head_data <- read.csv(head_file, stringsAsFactors = FALSE)
    
    # Check if data is empty and handle accordingly
    if (nrow(head_data) == 0) {
      message(paste("No data for species:", species))
      next  # Skip to the next species if there's no data
    }
    
    # Filter for significant DEGs (both upregulated and downregulated)
    head_up <- head_data %>%
      filter(padj < 0.05 & log2FoldChange >= 1) %>%
      select(GeneID = X)
    
    head_down <- head_data %>%
      filter(padj < 0.05 & log2FoldChange <= -1) %>%
      select(GeneID = X)
    
    all_deg <- head_data %>%
      filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
      select(GeneID = X)

    # Store the DEGs in the list
    degs_up[[species]] <- head_up$GeneID
    degs_down[[species]] <- head_down$GeneID
    degs_all[[species]] <- all_deg$GeneID
  }
  
  return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Load DEG data for Group 1 for head
venn_data_PACclade <- load_deg_data(PACclade)

# Prepare the data for the Venn diagrams
venn_data_up <- list(
  piceifrons = venn_data_PACclade$up[["piceifrons"]],
  americana = venn_data_PACclade$up[["americana"]],
  cubense = venn_data_PACclade$up[["cubense"]]
)

venn_data_down <- list(
  piceifrons = venn_data_PACclade$down[["piceifrons"]],
  americana = venn_data_PACclade$down[["americana"]],
  cubense = venn_data_PACclade$down[["cubense"]]
)

venn_data_all <- list(
  piceifrons = venn_data_PACclade$all[["piceifrons"]],
  americana = venn_data_PACclade$all[["americana"]],
  cubense = venn_data_PACclade$all[["cubense"]]
)

# Function to display Venn diagram and corresponding datatable
display_venn_with_datatable <- function(venn_data, title, allspecies_df) {
  # Calculate the overlapping genes
  overlap_genes <- Reduce(intersect, venn_data)
  
  # Create a data frame for the overlapping genes
  overlap_df <- data.frame(GeneID = overlap_genes)

  # Merge to get species information
  meta_brock_df <- merge(overlap_df, allspecies_df, by = "GeneID", all.x = TRUE)

  # Generate the Venn diagram
  venn_plot <- venn.diagram(
    x = venn_data, 
    category.names = c("piceifrons", "americana", "cubense"), 
    filename = NULL, 
    output = TRUE, 
    fill = c("red", "green", "yellow"),  # Set colors for the groups
    alpha = 0.5, 
    cex = 1.2, 
    cat.cex = 0, 
    main = title,
    main.cex = 1.2
  )

  # Clear the current plotting area before drawing the Venn diagram
  grid.newpage()
  
  # Display the Venn diagram
  grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("piceifrons", "americana", "cubense")
    legend_colors <- c("red", "green", "yellow")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }    
  # Display the merged overlapping genes table with datatable
  datatable(meta_brock_df, options = list(
      pageLength = 10,
      scrollX = TRUE,
      autoWidth = TRUE,
      searchHighlight = TRUE
  ),
  rownames = FALSE,
  escape = FALSE
  ) %>%
  formatStyle(
      'Species', target = 'cell',
      fontStyle = 'italic'
  ) %>%
  formatStyle(
      columns = names(meta_brock_df), 
      target = 'row',
      color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
      fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
      backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
  )
}

# Display the Venn diagram and datatable for head upregulated DEGs
display_venn_with_datatable(venn_data_up, "Venn Diagram of Head Upregulated DEGs - PACclade", allspecies_df)

# Display the Venn diagram and datatable for head downregulated DEGs
display_venn_with_datatable(venn_data_down, "Venn Diagram of Head Downregulated DEGs - PACclade", allspecies_df)

# Display the Venn diagram and datatable for all significant DEGs
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Significant DEGs - PACclade", allspecies_df)

# Define the species for Group 1
PACclade <- c("piceifrons", "americana", "cubense")

# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in PACclade) {
  # Load DESeq2 results for head
  head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_togregaria_", species, ".csv"))
  
  # Load the data
  head_data <- read.csv(head_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(head_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Filter for significant DEGs and select top 100 upregulated and downregulated genes for each tissue
  head_up <- head_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  head_down <- head_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    head_up %>% mutate(Tissue = "Head", Regulation = "Upregulated", Species = species),
    head_down %>% mutate(Tissue = "Head", Regulation = "Downregulated", Species = species)
  ) %>%
    select(GeneID, log2FoldChange, Tissue, Regulation, Species)
  
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
  stop("No valid data available for heatmap generation.")
}

# Create heatmap matrix
heatmap_matrix <- final_heatmap_data %>%
  group_by(GeneID, Species) %>%
  summarize(
    Head_Combined = sum(log2FoldChange[Tissue == "Head"], na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_longer(cols = c("Head_Combined"), 
               names_to = c("Tissue", ".value"), 
               names_sep = "_") %>%
  # Combine the Head values for each GeneID while keeping Species
  group_by(GeneID, Species) %>%
  summarize(
    Head = sum(ifelse(Tissue == "Head", Combined, 0), na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_wider(names_from = Species, 
              values_from = c(Head), 
              values_fill = list(Head = 0)) %>%
  column_to_rownames("GeneID") %>%
  as.matrix()

color_palette <- c("cyan", "cyan3",  "black", "orange3", "orange")
color_palette2 <- c("blue3", "blue",  "white", "red", "red3")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Head tissue - STRATEGY 1"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Head tissue - STRATEGY 1"
)

Thorax tissues

# Define the species for PACclade
PACclade <- c("piceifrons", "americana", "cubense")

# Initialize an empty list to store DEG data
venn_data_PACclade_up <- list()
venn_data_PACclade_down <- list()
venn_data_PACclade_all <- list()

# Function to load DEGs for a given group of species for thorax
load_deg_data <- function(species_list) {
  degs_up <- list()
  degs_down <- list()
  degs_all <- list()
  
  for (species in PACclade) {
    thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_togregaria_", species, ".csv"))
    
    thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
    
    # Check if data is empty and handle accordingly
    if (nrow(thorax_data) == 0) {
      message(paste("No data for species:", species))
      next  # Skip to the next species if there's no data
    }
    
    # Filter for significant DEGs (both upregulated and downregulated)
    thorax_up <- thorax_data %>%
      filter(padj < 0.05 & log2FoldChange >= 1) %>%
      select(GeneID = X)
    
    thorax_down <- thorax_data %>%
      filter(padj < 0.05 & log2FoldChange <= -1) %>%
      select(GeneID = X)
    
    all_deg <- thorax_data %>%
      filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
      select(GeneID = X)

    # Store the DEGs in the list
    degs_up[[species]] <- thorax_up$GeneID
    degs_down[[species]] <- thorax_down$GeneID
    degs_all[[species]] <- all_deg$GeneID
  }
  
  return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Load DEG data for Group 1 for thorax
venn_data_PACclade <- load_deg_data(PACclade)

# Prepare the data for the Venn diagrams
venn_data_up <- list(
  piceifrons = venn_data_PACclade$up[["piceifrons"]],
  americana = venn_data_PACclade$up[["americana"]],
  cubense = venn_data_PACclade$up[["cubense"]]
)

venn_data_down <- list(
  piceifrons = venn_data_PACclade$down[["piceifrons"]],
  americana = venn_data_PACclade$down[["americana"]],
  cubense = venn_data_PACclade$down[["cubense"]]
)

venn_data_all <- list(
  piceifrons = venn_data_PACclade$all[["piceifrons"]],
  americana = venn_data_PACclade$all[["americana"]],
  cubense = venn_data_PACclade$all[["cubense"]]
)

# Function to display Venn diagram and corresponding datatable
display_venn_with_datatable <- function(venn_data, title, allspecies_df) {
  # Calculate the overlapping genes
  overlap_genes <- Reduce(intersect, venn_data)
  
  # Create a data frame for the overlapping genes
  overlap_df <- data.frame(GeneID = overlap_genes)

  # Merge to get species information
  meta_brock_df <- merge(overlap_df, allspecies_df, by = "GeneID", all.x = TRUE)

  # Generate the Venn diagram
  venn_plot <- venn.diagram(
    x = venn_data, 
    category.names = c("piceifrons", "americana", "cubense"), 
    filename = NULL, 
    output = TRUE, 
    fill = c("red", "green", "yellow"),   # Set colors for the groups
    alpha = 0.5, 
    cex = 1.2, 
    cat.cex = 0, 
    main = title,
    main.cex = 1.2
  )

  # Clear the current plotting area before drawing the Venn diagram
  grid.newpage()
  
  # Display the Venn diagram
  grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("piceifrons", "americana", "cubense")
    legend_colors <- c("red", "green", "yellow")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }    
  # Display the merged overlapping genes table with datatable
  datatable(meta_brock_df, options = list(
      pageLength = 10,
      scrollX = TRUE,
      autoWidth = TRUE,
      searchHighlight = TRUE
  ),
  rownames = FALSE,
  escape = FALSE
  ) %>%
  formatStyle(
      'Species', target = 'cell',
      fontStyle = 'italic'
  ) %>%
  formatStyle(
      columns = names(meta_brock_df), 
      target = 'row',
      color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
      fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
      backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
  )
}

# Display the Venn diagram and datatable for thorax upregulated DEGs
display_venn_with_datatable(venn_data_up, "Venn Diagram of Thorax Upregulated DEGs - PACclade", allspecies_df)

# Display the Venn diagram and datatable for head downregulated DEGs
display_venn_with_datatable(venn_data_down, "Venn Diagram of Thorax Downregulated DEGs - PACclade", allspecies_df)

# Display the Venn diagram and datatable for all significant DEGs
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Significant DEGs - PACclade", allspecies_df)

PACclade <- c("piceifrons", "americana", "cubense")

# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in PACclade) {
  # Load DESeq2 results for thorax
  thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_togregaria_", species, ".csv"))
  
  # Load the data
  thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Filter for significant DEGs and select top 500 upregulated and downregulated genes for each tissue
  thorax_up <- thorax_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  thorax_down <- thorax_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    thorax_up %>% mutate(Tissue = "Thorax", Regulation = "Upregulated", Species = species),
    thorax_down %>% mutate(Tissue = "Thorax", Regulation = "Downregulated", Species = species)
  ) %>%
    select(GeneID, log2FoldChange, Tissue, Regulation, Species)
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
  stop("No valid data available for heatmap generation.")
}

# Create heatmap matrix
heatmap_matrix <- final_heatmap_data %>%
  group_by(GeneID, Species) %>%
  summarize(
    Thorax_Combined = sum(log2FoldChange[Tissue == "Thorax"], na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_longer(cols = c("Thorax_Combined"), 
               names_to = c("Tissue", ".value"), 
               names_sep = "_") %>%
  # Combine the Thorax values for each GeneID while keeping Species
  group_by(GeneID, Species) %>%
  summarize(
    Thorax = sum(ifelse(Tissue == "Thorax", Combined, 0), na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_wider(names_from = Species, 
              values_from = c(Thorax), 
              values_fill = list(Thorax = 0)) %>%
  column_to_rownames("GeneID") %>%
  as.matrix()

color_palette <- c("cyan",  "black", "orange")
color_palette2 <- c("blue", "white",  "red")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Thorax tissue - STRATEGY 1"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Thorax tissue - STRATEGY 1"
)

Plastic species

Head tissues

# Define the species for Group 1
plastic_species <- c("gregaria", "piceifrons", "cancellata","americana")

# Initialize an empty list to store DEG data
venn_data_plastic_species_up <- list()
venn_data_plastic_species_down <- list()
venn_data_plastic_species_all <- list()

# Function to load DEGs for a given group of species for head
load_deg_data <- function(species_list) {
  degs_up <- list()
  degs_down <- list()
  degs_all <- list()
  
  for (species in plastic_species) {
    head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_togregaria_", species, ".csv"))
    
    head_data <- read.csv(head_file, stringsAsFactors = FALSE)
    
    # Check if data is empty and handle accordingly
    if (nrow(head_data) == 0) {
      message(paste("No data for species:", species))
      next  # Skip to the next species if there's no data
    }
    
    # Filter for significant DEGs (both upregulated and downregulated)
    head_up <- head_data %>%
      filter(padj < 0.05 & log2FoldChange >= 1) %>%
      select(GeneID = X)
    
    head_down <- head_data %>%
      filter(padj < 0.05 & log2FoldChange <= -1) %>%
      select(GeneID = X)
    
    all_deg <- head_data %>%
      filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
      select(GeneID = X)

    # Store the DEGs in the list
    degs_up[[species]] <- head_up$GeneID
    degs_down[[species]] <- head_down$GeneID
    degs_all[[species]] <- all_deg$GeneID
  }
  
  return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Load DEG data for Group 1 for head
venn_data_plastic_species <- load_deg_data(plastic_species)

# Prepare the data for the Venn diagrams
venn_data_up <- list(
  gregaria = venn_data_plastic_species$up[["gregaria"]],
  piceifrons = venn_data_plastic_species$up[["piceifrons"]],
  cancellata = venn_data_plastic_species$up[["cancellata"]],
  americana = venn_data_plastic_species$up[["americana"]]
)

venn_data_down <- list(
  gregaria = venn_data_plastic_species$down[["gregaria"]],
  piceifrons = venn_data_plastic_species$down[["piceifrons"]],
  cancellata = venn_data_plastic_species$down[["cancellata"]],
  americana = venn_data_plastic_species$down[["americana"]]
)

venn_data_all <- list(
  gregaria = venn_data_plastic_species$all[["gregaria"]],
  piceifrons = venn_data_plastic_species$all[["piceifrons"]],
  cancellata = venn_data_plastic_species$all[["cancellata"]],
  americana = venn_data_plastic_species$all[["americana"]]
)

# Function to display Venn diagram and corresponding datatable
display_venn_with_datatable <- function(venn_data, title, allspecies_df) {
  # Calculate the overlapping genes
  overlap_genes <- Reduce(intersect, venn_data)
  
  # Create a data frame for the overlapping genes
  overlap_df <- data.frame(GeneID = overlap_genes)

  # Merge to get species information
  meta_brock_df <- merge(overlap_df, allspecies_df, by = "GeneID", all.x = TRUE)

  # Generate the Venn diagram
  venn_plot <- venn.diagram(
    x = venn_data, 
    category.names = c("gregaria", "piceifrons", "cancellata","americana"),
    filename = NULL, 
    output = TRUE, 
    fill = c("orange", "red", "orchid", "green"),  # Set colors for the groups
    alpha = 0.5, 
    cex = 1.2, 
    cat.cex = 0, 
    main = title,
    main.cex = 1.2
  )

  # Clear the current plotting area before drawing the Venn diagram
  grid.newpage()
  
  # Display the Venn diagram
  grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("gregaria", "piceifrons", "cancellata","americana")
    legend_colors <- c("orange", "red", "orchid", "green")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }    
  # Display the merged overlapping genes table with datatable
  datatable(meta_brock_df, options = list(
      pageLength = 10,
      scrollX = TRUE,
      autoWidth = TRUE,
      searchHighlight = TRUE
  ),
  rownames = FALSE,
  escape = FALSE
  ) %>%
  formatStyle(
      'Species', target = 'cell',
      fontStyle = 'italic'
  ) %>%
  formatStyle(
      columns = names(meta_brock_df), 
      target = 'row',
      color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
      fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
      backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
  )
}

# Display the Venn diagram and datatable for head upregulated DEGs
display_venn_with_datatable(venn_data_up, "Venn Diagram of Head Upregulated DEGs - plastic_species", allspecies_df)

# Display the Venn diagram and datatable for head downregulated DEGs
display_venn_with_datatable(venn_data_down, "Venn Diagram of Head Downregulated DEGs - plastic_species", allspecies_df)

# Display the Venn diagram and datatable for all significant DEGs
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Significant DEGs - plastic_species", allspecies_df)

# Define the species for Group 1
plastic_species <- c("gregaria", "piceifrons", "cancellata","americana")

# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in plastic_species) {
  # Load DESeq2 results for head
  head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_togregaria_", species, ".csv"))
  
  # Load the data
  head_data <- read.csv(head_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(head_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Filter for significant DEGs and select top 100 upregulated and downregulated genes for each tissue
  head_up <- head_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  head_down <- head_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    head_up %>% mutate(Tissue = "Head", Regulation = "Upregulated", Species = species),
    head_down %>% mutate(Tissue = "Head", Regulation = "Downregulated", Species = species)
  ) %>%
    select(GeneID, log2FoldChange, Tissue, Regulation, Species)
  
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
  stop("No valid data available for heatmap generation.")
}

# Create heatmap matrix
heatmap_matrix <- final_heatmap_data %>%
  group_by(GeneID, Species) %>%
  summarize(
    Head_Combined = sum(log2FoldChange[Tissue == "Head"], na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_longer(cols = c("Head_Combined"), 
               names_to = c("Tissue", ".value"), 
               names_sep = "_") %>%
  # Combine the Head values for each GeneID while keeping Species
  group_by(GeneID, Species) %>%
  summarize(
    Head = sum(ifelse(Tissue == "Head", Combined, 0), na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_wider(names_from = Species, 
              values_from = c(Head), 
              values_fill = list(Head = 0)) %>%
  column_to_rownames("GeneID") %>%
  as.matrix()

color_palette <- c("cyan", "cyan3",  "black", "orange3", "orange")
color_palette2 <- c("blue3", "blue",  "white", "red", "red3")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Head tissue - STRATEGY 1"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Head tissue - STRATEGY 1"
)

Thorax tissues

plastic_species <- c("gregaria", "piceifrons", "cancellata","americana")

# Initialize an empty list to store DEG data
venn_data_plastic_species_up <- list()
venn_data_plastic_species_down <- list()
venn_data_plastic_species_all <- list()

# Function to load DEGs for a given group of species for thorax
load_deg_data <- function(species_list) {
  degs_up <- list()
  degs_down <- list()
  degs_all <- list()
  
  for (species in plastic_species) {
    thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_togregaria_", species, ".csv"))
    
    thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
    
    # Check if data is empty and handle accordingly
    if (nrow(thorax_data) == 0) {
      message(paste("No data for species:", species))
      next  # Skip to the next species if there's no data
    }
    
    # Filter for significant DEGs (both upregulated and downregulated)
    thorax_up <- thorax_data %>%
      filter(padj < 0.05 & log2FoldChange >= 1) %>%
      select(GeneID = X)
    
    thorax_down <- thorax_data %>%
      filter(padj < 0.05 & log2FoldChange <= -1) %>%
      select(GeneID = X)
    
    all_deg <- thorax_data %>%
      filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
      select(GeneID = X)

    # Store the DEGs in the list
    degs_up[[species]] <- thorax_up$GeneID
    degs_down[[species]] <- thorax_down$GeneID
    degs_all[[species]] <- all_deg$GeneID
  }
  
  return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Load DEG data for Group 1 for thorax
venn_data_plastic_species <- load_deg_data(plastic_species)

# Prepare the data for the Venn diagrams
venn_data_up <- list(
  gregaria = venn_data_plastic_species$up[["gregaria"]],
  piceifrons = venn_data_plastic_species$up[["piceifrons"]],
  cancellata = venn_data_plastic_species$up[["cancellata"]],
  americana = venn_data_plastic_species$up[["americana"]]
)

venn_data_down <- list(
  gregaria = venn_data_plastic_species$down[["gregaria"]],
  piceifrons = venn_data_plastic_species$down[["piceifrons"]],
  cancellata = venn_data_plastic_species$down[["cancellata"]],
  americana = venn_data_plastic_species$down[["americana"]]
)

venn_data_all <- list(
  gregaria = venn_data_plastic_species$all[["gregaria"]],
  piceifrons = venn_data_plastic_species$all[["piceifrons"]],
  cancellata = venn_data_plastic_species$all[["cancellata"]],
  americana = venn_data_plastic_species$all[["americana"]]
)

# Function to display Venn diagram and corresponding datatable
display_venn_with_datatable <- function(venn_data, title, allspecies_df) {
  # Calculate the overlapping genes
  overlap_genes <- Reduce(intersect, venn_data)
  
  # Create a data frame for the overlapping genes
  overlap_df <- data.frame(GeneID = overlap_genes)

  # Merge to get species information
  meta_brock_df <- merge(overlap_df, allspecies_df, by = "GeneID", all.x = TRUE)

  # Generate the Venn diagram
  venn_plot <- venn.diagram(
    x = venn_data, 
    category.names = c("gregaria", "piceifrons", "cancellata","americana"),
    filename = NULL, 
    output = TRUE, 
    fill = c("orange", "red", "orchid", "green"),  # Set colors for the groups
    alpha = 0.5, 
    cex = 1.2, 
    cat.cex = 0, 
    main = title,
    main.cex = 1.2
  )

  # Clear the current plotting area before drawing the Venn diagram
  grid.newpage()
  
  # Display the Venn diagram
  grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("gregaria", "piceifrons", "cancellata","americana")
    legend_colors <- c("orange", "red", "orchid", "green")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }    
  # Display the merged overlapping genes table with datatable
  datatable(meta_brock_df, options = list(
      pageLength = 10,
      scrollX = TRUE,
      autoWidth = TRUE,
      searchHighlight = TRUE
  ),
  rownames = FALSE,
  escape = FALSE
  ) %>%
  formatStyle(
      'Species', target = 'cell',
      fontStyle = 'italic'
  ) %>%
  formatStyle(
      columns = names(meta_brock_df), 
      target = 'row',
      color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
      fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
      backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
  )
}

# Display the Venn diagram and datatable for thorax upregulated DEGs
display_venn_with_datatable(venn_data_up, "Venn Diagram of Thorax Upregulated DEGs - plastic_species", allspecies_df)

# Display the Venn diagram and datatable for thorax downregulated DEGs
display_venn_with_datatable(venn_data_down, "Venn Diagram of Thorax Downregulated DEGs - plastic_species", allspecies_df)

# Display the Venn diagram and datatable for all significant DEGs
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Significant DEGs - plastic_species", allspecies_df)

plastic_species <- c("gregaria", "piceifrons", "cancellata","americana")

# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in plastic_species) {
  # Load DESeq2 results for thorax
  thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_togregaria_", species, ".csv"))
  
  # Load the data
  thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Filter for significant DEGs and select top 500 upregulated and downregulated genes for each tissue
  thorax_up <- thorax_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  thorax_down <- thorax_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    thorax_up %>% mutate(Tissue = "Thorax", Regulation = "Upregulated", Species = species),
    thorax_down %>% mutate(Tissue = "Thorax", Regulation = "Downregulated", Species = species)
  ) %>%
    select(GeneID, log2FoldChange, Tissue, Regulation, Species)
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
  stop("No valid data available for heatmap generation.")
}

# Create heatmap matrix
heatmap_matrix <- final_heatmap_data %>%
  group_by(GeneID, Species) %>%
  summarize(
    Thorax_Combined = sum(log2FoldChange[Tissue == "Thorax"], na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_longer(cols = c("Thorax_Combined"), 
               names_to = c("Tissue", ".value"), 
               names_sep = "_") %>%
  # Combine the Thorax values for each GeneID while keeping Species
  group_by(GeneID, Species) %>%
  summarize(
    Thorax = sum(ifelse(Tissue == "Thorax", Combined, 0), na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_wider(names_from = Species, 
              values_from = c(Thorax), 
              values_fill = list(Thorax = 0)) %>%
  column_to_rownames("GeneID") %>%
  as.matrix()

color_palette <- c("cyan",  "black", "orange")
color_palette2 <- c("blue", "white",  "red")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Thorax tissue - STRATEGY 1"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Thorax tissue - STRATEGY 1"
)

All species

Combined tissues

# Define the species for Group 1
allspecies <- c("gregaria", "piceifrons", "cancellata","americana", "cubense")

# Initialize an empty list to store DEG data
venn_data_allspecies_up <- list()
venn_data_allspecies_down <- list()
venn_data_allspecies_all <- list()

# Function to load DEGs for a given group of species for head
load_deg_data <- function(species_list) {
  degs_up <- list()
  degs_down <- list()
  degs_all <- list()
  
  for (species in allspecies) {
    head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_togregaria_", species, ".csv"))
    
    head_data <- read.csv(head_file, stringsAsFactors = FALSE)
    
    # Check if data is empty and handle accordingly
    if (nrow(head_data) == 0) {
      message(paste("No data for species:", species))
      next  # Skip to the next species if there's no data
    }
    
    # Filter for significant DEGs (both upregulated and downregulated)
    head_up <- head_data %>%
      filter(padj < 0.05 & log2FoldChange >= 1) %>%
      select(GeneID = X)
    
    head_down <- head_data %>%
      filter(padj < 0.05 & log2FoldChange <= -1) %>%
      select(GeneID = X)
    
    all_deg <- head_data %>%
      filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
      select(GeneID = X)

    # Store the DEGs in the list
    degs_up[[species]] <- head_up$GeneID
    degs_down[[species]] <- head_down$GeneID
    degs_all[[species]] <- all_deg$GeneID
  }
  
  return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Load DEG data for Group 1 for head
venn_data_allspecies <- load_deg_data(allspecies)

# Prepare the data for the Venn diagrams
venn_data_up <- list(
  gregaria = venn_data_allspecies$up[["gregaria"]],
  piceifrons = venn_data_allspecies$up[["piceifrons"]],
  cancellata = venn_data_allspecies$up[["cancellata"]],
  americana = venn_data_allspecies$up[["americana"]],
  cubense = venn_data_allspecies$up[["cubense"]]
)

venn_data_down <- list(
  gregaria = venn_data_allspecies$down[["gregaria"]],
  piceifrons = venn_data_allspecies$down[["piceifrons"]],
  cancellata = venn_data_allspecies$down[["cancellata"]],
  americana = venn_data_allspecies$down[["americana"]],
  cubense = venn_data_allspecies$down[["cubense"]]
)

venn_data_all <- list(
  gregaria = venn_data_allspecies$all[["gregaria"]],
  piceifrons = venn_data_allspecies$all[["piceifrons"]],
  cancellata = venn_data_allspecies$all[["cancellata"]],
  americana = venn_data_allspecies$all[["americana"]],
  cubense = venn_data_allspecies$all[["cubense"]]
)

# Function to display Venn diagram and corresponding datatable
display_venn_with_datatable <- function(venn_data, title, allspecies_df) {
  # Calculate the overlapping genes
  overlap_genes <- Reduce(intersect, venn_data)
  
  # Create a data frame for the overlapping genes
  overlap_df <- data.frame(GeneID = overlap_genes)

  # Merge to get species information
  meta_brock_df <- merge(overlap_df, allspecies_df, by = "GeneID", all.x = TRUE)

  # Generate the Venn diagram
  venn_plot <- venn.diagram(
    x = venn_data, 
    category.names = c("gregaria", "piceifrons", "cancellata","americana", "cubense"),
    filename = NULL, 
    output = TRUE, 
    fill = c("orange", "red", "orchid", "green", "yellow"),  # Set colors for the groups
    alpha = 0.5, 
    cex = 1.2, 
    cat.cex = 0, 
    main = title,
    main.cex = 1.2
  )

  # Clear the current plotting area before drawing the Venn diagram
  grid.newpage()
  
  # Display the Venn diagram
  grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("gregaria", "piceifrons", "cancellata","americana", "cubense")
    legend_colors <- c("orange", "red", "orchid", "green", "yellow")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }      
  # Display the merged overlapping genes table with datatable
  datatable(meta_brock_df, options = list(
      pageLength = 10,
      scrollX = TRUE,
      autoWidth = TRUE,
      searchHighlight = TRUE
  ),
  rownames = FALSE,
  escape = FALSE
  ) %>%
  formatStyle(
      'Species', target = 'cell',
      fontStyle = 'italic'
  ) %>%
  formatStyle(
      columns = names(meta_brock_df), 
      target = 'row',
      color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
      fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
      backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
  )
}

# Display the Venn diagram and datatable for head upregulated DEGs
display_venn_with_datatable(venn_data_up, "Venn Diagram of Head Upregulated DEGs - all species", allspecies_df)

# Display the Venn diagram and datatable for head downregulated DEGs
display_venn_with_datatable(venn_data_down, "Venn Diagram of Head Downregulated DEGs - all species", allspecies_df)

# Display the Venn diagram and datatable for all significant DEGs
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Significant DEGs - all species", allspecies_df)

# Thorax
# Initialize an empty list to store DEG data
venn_data_allspecies_up <- list()
venn_data_allspecies_down <- list()
venn_data_allspecies_all <- list()

# Function to load DEGs for a given group of species for thorax
load_deg_data <- function(species_list) {
  degs_up <- list()
  degs_down <- list()
  degs_all <- list()
  
  for (species in allspecies) {
    thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_togregaria_", species, ".csv"))
    
    thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
    
    # Check if data is empty and handle accordingly
    if (nrow(thorax_data) == 0) {
      message(paste("No data for species:", species))
      next  # Skip to the next species if there's no data
    }
    
    # Filter for significant DEGs (both upregulated and downregulated)
    thorax_up <- thorax_data %>%
      filter(padj < 0.05 & log2FoldChange >= 1) %>%
      select(GeneID = X)
    
    thorax_down <- thorax_data %>%
      filter(padj < 0.05 & log2FoldChange <= -1) %>%
      select(GeneID = X)
    
    all_deg <- thorax_data %>%
      filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
      select(GeneID = X)

    # Store the DEGs in the list
    degs_up[[species]] <- thorax_up$GeneID
    degs_down[[species]] <- thorax_down$GeneID
    degs_all[[species]] <- all_deg$GeneID
  }
  
  return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Load DEG data for Group 1 for thorax
venn_data_allspecies <- load_deg_data(allspecies)

# Prepare the data for the Venn diagrams
venn_data_up <- list(
  gregaria = venn_data_allspecies$up[["gregaria"]],
  piceifrons = venn_data_allspecies$up[["piceifrons"]],
  cancellata = venn_data_allspecies$up[["cancellata"]],
  americana = venn_data_allspecies$up[["americana"]],
  cubense = venn_data_allspecies$up[["cubense"]]
)

venn_data_down <- list(
  gregaria = venn_data_allspecies$down[["gregaria"]],
  piceifrons = venn_data_allspecies$down[["piceifrons"]],
  cancellata = venn_data_allspecies$down[["cancellata"]],
  americana = venn_data_allspecies$down[["americana"]],
  cubense = venn_data_allspecies$down[["cubense"]]
)

venn_data_all <- list(
  gregaria = venn_data_allspecies$all[["gregaria"]],
  piceifrons = venn_data_allspecies$all[["piceifrons"]],
  cancellata = venn_data_allspecies$all[["cancellata"]],
  americana = venn_data_allspecies$all[["americana"]],
  cubense = venn_data_allspecies$all[["cubense"]]
)

# Function to display Venn diagram and corresponding datatable
display_venn_with_datatable <- function(venn_data, title, allspecies_df) {
  # Calculate the overlapping genes
  overlap_genes <- Reduce(intersect, venn_data)
  
  # Create a data frame for the overlapping genes
  overlap_df <- data.frame(GeneID = overlap_genes)

  # Merge to get species information
  meta_brock_df <- merge(overlap_df, allspecies_df, by = "GeneID", all.x = TRUE)

  # Generate the Venn diagram
  venn_plot <- venn.diagram(
    x = venn_data, 
    category.names = c("gregaria", "piceifrons", "cancellata","americana", "cubense"),
    filename = NULL, 
    output = TRUE, 
    fill = c("orange", "red", "orchid", "green", "yellow"),  # Set colors for the groups
    alpha = 0.5, 
    cex = 1.2, 
    cat.cex = 0, 
    main = title,
    main.cex = 1.2
  )

  # Clear the current plotting area before drawing the Venn diagram
  grid.newpage()
  
  # Display the Venn diagram
  grid.draw(venn_plot)
    # Manually create a custom legend
    legend_labels <- c("gregaria", "piceifrons", "cancellata","americana", "cubense")
    legend_colors <- c("orange", "red", "orchid", "green", "yellow")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }     
  # Display the merged overlapping genes table with datatable
  datatable(meta_brock_df, options = list(
      pageLength = 10,
      scrollX = TRUE,
      autoWidth = TRUE,
      searchHighlight = TRUE
  ),
  rownames = FALSE,
  escape = FALSE
  ) %>%
  formatStyle(
      'Species', target = 'cell',
      fontStyle = 'italic'
  ) %>%
  formatStyle(
      columns = names(meta_brock_df), 
      target = 'row',
      color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
      fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
      backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
  )
}

# Display the Venn diagram and datatable for thorax upregulated DEGs
display_venn_with_datatable(venn_data_up, "Venn Diagram of Thorax Upregulated DEGs - all species", allspecies_df)

# Display the Venn diagram and datatable for head downregulated DEGs
display_venn_with_datatable(venn_data_down, "Venn Diagram of Thorax Downregulated DEGs - all species", allspecies_df)

# Display the Venn diagram and datatable for all significant DEGs
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Significant DEGs - all species", allspecies_df)

# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in species_list) {
  # Load DESeq2 results for head and thorax
  head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_togregaria_", species, ".csv"))
  thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_togregaria_", species, ".csv"))
  
  # Load the data
  head_data <- read.csv(head_file, stringsAsFactors = FALSE)
  thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Filter for significant DEGs and select top 100 upregulated and downregulated genes for each tissue
  head_up <- head_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  head_down <- head_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  thorax_up <- thorax_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  thorax_down <- thorax_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    head_up %>% mutate(Tissue = "Head", Regulation = "Upregulated", Species = species),
    head_down %>% mutate(Tissue = "Head", Regulation = "Downregulated", Species = species),
    thorax_up %>% mutate(Tissue = "Thorax", Regulation = "Upregulated", Species = species),
    thorax_down %>% mutate(Tissue = "Thorax", Regulation = "Downregulated", Species = species)
  ) %>%
    select(GeneID, log2FoldChange, Tissue, Regulation, Species)
  
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
  stop("No valid data available for heatmap generation.")
}

# Create heatmap matrix
heatmap_matrix <- final_heatmap_data %>%
  group_by(GeneID, Species) %>%
  summarize(
    Head_Combined = sum(log2FoldChange[Tissue == "Head"], na.rm = TRUE),
    Thorax_Combined = sum(log2FoldChange[Tissue == "Thorax"], na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_longer(cols = c("Head_Combined", "Thorax_Combined"), 
               names_to = c("Tissue", ".value"), 
               names_sep = "_") %>%
  # Combine the Head and Thorax values for each GeneID while keeping Species
  group_by(GeneID, Species) %>%
  summarize(
    Head = sum(ifelse(Tissue == "Head", Combined, 0), na.rm = TRUE),
    Thorax = sum(ifelse(Tissue == "Thorax", Combined, 0), na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_wider(names_from = Species, 
              values_from = c(Head, Thorax), 
              values_fill = list(Head = 0, Thorax = 0)) %>%
  column_to_rownames("GeneID") %>%
  as.matrix()

color_palette <- c("cyan", "cyan2", "cyan3", "black", "orange3", "orange2", "orange")
color_palette2 <- c("blue3", "blue2", "blue1", "white", "red", "red2", "red3")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Head and Thorax tissue - STRATEGY 1"
)

Version Author Date
3fa8e62 Maeva TECHER 2024-11-08
edb70fe Maeva TECHER 2024-11-07
pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Head and Thorax tissue - STRATEGY 1"
)

Version Author Date
3fa8e62 Maeva TECHER 2024-11-08
edb70fe Maeva TECHER 2024-11-07

Head tissues

# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in species_list) {
  # Load DESeq2 results for head
  head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_togregaria_", species, ".csv"))
  
  # Load the data
  head_data <- read.csv(head_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(head_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Filter for significant DEGs and select top 100 upregulated and downregulated genes for each tissue
  head_up <- head_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  head_down <- head_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    head_up %>% mutate(Tissue = "Head", Regulation = "Upregulated", Species = species),
    head_down %>% mutate(Tissue = "Head", Regulation = "Downregulated", Species = species)
  ) %>%
    select(GeneID, log2FoldChange, Tissue, Regulation, Species)
  
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
  stop("No valid data available for heatmap generation.")
}

# Create heatmap matrix
heatmap_matrix <- final_heatmap_data %>%
  group_by(GeneID, Species) %>%
  summarize(
    Head_Combined = sum(log2FoldChange[Tissue == "Head"], na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_longer(cols = c("Head_Combined"), 
               names_to = c("Tissue", ".value"), 
               names_sep = "_") %>%
  # Combine the Thorax values for each GeneID while keeping Species
  group_by(GeneID, Species) %>%
  summarize(
    Head = sum(ifelse(Tissue == "Head", Combined, 0), na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_wider(names_from = Species, 
              values_from = c(Head), 
              values_fill = list(Head = 0)) %>%
  column_to_rownames("GeneID") %>%
  as.matrix()

color_palette <- c("cyan", "cyan2", "cyan3", "black", "orange3", "orange2", "orange")
color_palette2 <- c("blue3", "blue2", "blue1", "white", "red", "red2", "red3")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Head tissue - STRATEGY 1"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Head tissue - STRATEGY 1"
)

Thorax tissues

# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in species_list) {
  # Load DESeq2 results for thorax
  thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_togregaria_", species, ".csv"))
  
  # Load the data
  thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Filter for significant DEGs and select top 100 upregulated and downregulated genes for each tissue
  thorax_up <- thorax_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  thorax_down <- thorax_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    thorax_up %>% mutate(Tissue = "Thorax", Regulation = "Upregulated", Species = species),
    thorax_down %>% mutate(Tissue = "Thorax", Regulation = "Downregulated", Species = species)
  ) %>%
    select(GeneID, log2FoldChange, Tissue, Regulation, Species)
  
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
  stop("No valid data available for heatmap generation.")
}

# Create heatmap matrix
heatmap_matrix <- final_heatmap_data %>%
  group_by(GeneID, Species) %>%
  summarize(
    Thorax_Combined = sum(log2FoldChange[Tissue == "Thorax"], na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_longer(cols = c( "Thorax_Combined"), 
               names_to = c("Tissue", ".value"), 
               names_sep = "_") %>%
  # Combine the Thorax values for each GeneID while keeping Species
  group_by(GeneID, Species) %>%
  summarize(
    Thorax = sum(ifelse(Tissue == "Thorax", Combined, 0), na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  pivot_wider(names_from = Species, 
              values_from = c(Thorax), 
              values_fill = list(Thorax = 0)) %>%
  column_to_rownames("GeneID") %>%
  as.matrix()

color_palette <- c("cyan", "black", "orange")
color_palette2 <- c("blue3",  "white",  "red3")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Thorax tissue - STRATEGY 1"
)

Version Author Date
3fa8e62 Maeva TECHER 2024-11-08
edb70fe Maeva TECHER 2024-11-07
pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Thorax tissue - STRATEGY 1"
)

Version Author Date
3fa8e62 Maeva TECHER 2024-11-08
edb70fe Maeva TECHER 2024-11-07

STRATEGY 2: Own RefSeq genome

Here the difference with STRATEGY 1 is that to look at the correspondance of genes across species for comparison, we will have to use orthologs (see section Orthofinder).

We load from our previous conversion

input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

1. DEGs comparison among species

We summarized the number of genes differential expressed between density for each species and each tissues.

# Initialize an empty data frame to store results
summary_table_head <- data.frame(Species = character(),
                            Head_Upregulated = integer(),
                            Head_Upregulated_Strict = integer(),
                            Head_Downregulated = integer(),
                            Head_Downregulated_Strict = integer(),
                            stringsAsFactors = FALSE)

summary_table_thorax <- data.frame(Species = character(),
                            Thorax_Upregulated = integer(),
                            Thorax_Upregulated_Strict = integer(),
                            Thorax_Downregulated = integer(),
                            Thorax_Downregulated_Strict = integer(),
                            stringsAsFactors = FALSE)


# Loop through each species to process their data
for (species in species_list) {
    # Read the DESeq2 results and sigresults files
    head_sigresults_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_", species, ".csv"))
    thorax_sigresults_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_", species, ".csv"))

    head_sigresults <- read.csv(head_sigresults_file, stringsAsFactors = FALSE)
    thorax_sigresults <- read.csv(thorax_sigresults_file, stringsAsFactors = FALSE)

    # Count upregulated and downregulated genes for head
    head_upregulated <- nrow(head_sigresults %>% filter(log2FoldChange > 0))
    head_downregulated <- nrow(head_sigresults %>% filter(log2FoldChange < -0))
    head_upregulated_strict <- nrow(head_sigresults %>% filter(log2FoldChange > 1))
    head_downregulated_strict <- nrow(head_sigresults %>% filter(log2FoldChange < -1))

    # Count upregulated and downregulated genes for thorax
    thorax_upregulated <- nrow(thorax_sigresults %>% filter(log2FoldChange > 0))
    thorax_downregulated <- nrow(thorax_sigresults %>% filter(log2FoldChange < -0))
    thorax_upregulated_strict <- nrow(thorax_sigresults %>% filter(log2FoldChange > 1))
    thorax_downregulated_strict <- nrow(thorax_sigresults %>% filter(log2FoldChange < -1))

    # Add to the summary table
    summary_table_head <- rbind(summary_table_head, data.frame("Species" = species,
                                                      "Head_Upregulated" = head_upregulated,
                                                      "Head_Downregulated" = head_downregulated,
                                                      "Head_Upregulated_Strict" = head_upregulated_strict,
                                                      "Head_Downregulated_Strict" = head_downregulated_strict,
                                                      stringsAsFactors = FALSE))
    
    summary_table_thorax <- rbind(summary_table_thorax, data.frame("Species" = species,
                                                      "Thorax_Upregulated" = thorax_upregulated,
                                                      "Thorax_Downregulated" = thorax_downregulated,
                                                      "Thorax_Upregulated_Strict" = thorax_upregulated_strict,
                                                      "Thorax_Downregulated_Strict" = thorax_downregulated_strict,
                                                      stringsAsFactors = FALSE))
}

# Print the summary table in a markdown-friendly format
knitr::kable(summary_table_head, format = "markdown", caption = "Summary of differentially expressed genes in head per species")
Summary of differentially expressed genes in head per species
Species Head_Upregulated Head_Downregulated Head_Upregulated_Strict Head_Downregulated_Strict
gregaria 2709 2988 814 662
piceifrons 538 518 301 263
cancellata 751 877 378 476
americana 802 619 357 339
cubense 49 56 49 55
nitens 233 314 122 245
# Convert the summary table to a long format for easier plotting
summary_long_head <- summary_table_head %>%
  select(Species, Head_Upregulated_Strict, Head_Downregulated_Strict) %>%
  pivot_longer(cols = -Species, 
               names_to = c(".value", "Tissue"), 
               names_sep = "_")


# Adjust the values for downregulated genes to be negative
summary_long_head <- summary_long_head %>%
  mutate(Head = ifelse(Tissue == "Downregulated", -Head, Head))

summary_long_head$Species <- factor(summary_long_head$Species, levels = species_order)
# Now create a barplot for head
ggplot(summary_long_head, aes(x = Species, y = Head, fill = Tissue)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Upregulated and Downregulated Genes in Head  (absolute lfc >1)",
       x = "Species",
       y = "Number of Genes") +
  scale_fill_manual(values = c("Upregulated" = "red2", "Downregulated" = "blue")) +
  scale_y_continuous(labels = function(x) ifelse(x < 0, -x, x), limits = c(-1200, 1200)) + # Set fixed y-axis limits
  theme_minimal(base_size = 12) + # Increase base font size for better readability
  theme(legend.position = "top", 
        plot.title = element_text(hjust = 0.5, size = 14, face = "bold"), 
        axis.title.x = element_text(size = 14, face = "bold"), 
        axis.title.y = element_text(size = 14, face = "bold"), 
        axis.text.x = element_text(size = 12, angle = 45, hjust = 1),  # Adjust text size and angle for x-axis
        axis.text.y = element_text(size = 12), 
        panel.grid.major.y = element_line(color = "grey90", linetype = "dashed"), 
        panel.grid.minor = element_blank())+
  coord_flip()

knitr::kable(summary_table_thorax, format = "markdown", caption = "Summary of differentially expressed genes in thorax per species")
Summary of differentially expressed genes in thorax per species
Species Thorax_Upregulated Thorax_Downregulated Thorax_Upregulated_Strict Thorax_Downregulated_Strict
gregaria 2751 2691 622 1174
piceifrons 1641 1361 652 332
cancellata 734 738 324 376
americana 460 798 181 427
cubense 127 251 78 185
nitens 0 0 0 0
# Convert the summary table to a long format for easier plotting
summary_long_thorax <- summary_table_thorax %>%
  select(Species, Thorax_Upregulated_Strict, Thorax_Downregulated_Strict) %>%
  pivot_longer(cols = -Species, 
               names_to = c(".value", "Tissue"), 
               names_sep = "_")

# Adjust the values for downregulated genes to be negative
summary_long_thorax <- summary_long_thorax %>%
  mutate(Thorax = ifelse(Tissue == "Downregulated", -Thorax, Thorax))

summary_long_thorax$Species <- factor(summary_long_thorax$Species, levels = species_order)
# Now create a barplot for head
ggplot(summary_long_thorax, aes(x = Species, y = Thorax, fill = Tissue)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Upregulated and Downregulated Genes in Thorax  (absolute lfc >1)",
       x = "Species",
       y = "Number of Genes") +
  scale_fill_manual(values = c("Upregulated" = "red2", "Downregulated" = "blue")) +
  scale_y_continuous(labels = function(x) ifelse(x < 0, -x, x), limits = c(-1200, 1200)) + # Set fixed y-axis limits
  theme_minimal(base_size = 12) + # Increase base font size for better readability
  theme(legend.position = "top", 
        plot.title = element_text(hjust = 0.5, size = 14, face = "bold"), 
        axis.title.x = element_text(size = 14, face = "bold"), 
        axis.title.y = element_text(size = 14, face = "bold"), 
        axis.text.x = element_text(size = 12, angle = 45, hjust = 1),  # Adjust text size and angle for x-axis
        axis.text.y = element_text(size = 12), 
        panel.grid.major.y = element_line(color = "grey90", linetype = "dashed"), 
        panel.grid.minor = element_blank())+
  coord_flip()

# Define custom colors for each GeneType
custom_colors <- c(
  "transcribed_pseudogene" = "#F4F1BB",  # Example color for transcribed_pseudogene
  "protein-coding" = "#9B57D3",         # Example color for protein-coding
  "lncRNA" = "#A5300F",                 # Example color for lncRNA
  "tRNA" = "#74D055FF",                   # Example color for tRNA
  "misc_RNA" = "#3B6978",               # Example color for misc_RNA
  "ncRNA" = "#29AF7FFF",                  # Example color for ncRNA
  "pseudogene" = "#81B29A",             # Example color for pseudogene
  "rRNA" = "#5982DB",                   # Example color for rRNA
  "snoRNA" = "#DCE318FF",                 # Example color for snoRNA
  "snRNA" = "#665EB8"                   # Example color for snRNA
)

# Use scale_fill_manual to map the custom colors to the GeneTypes
custom_color_scale <- scale_fill_manual(
  values = custom_colors
)
# Create an empty list to store the data for all species
all_species_data <- list()

# Loop through each species to process their data
for (species in species_list) {
  # Read the DESeq2 results for head and thorax
  head_sigresults_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_", species, ".csv"))
  thorax_sigresults_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_", species, ".csv"))
  
  head_sigresults <- read.csv(head_sigresults_file, stringsAsFactors = FALSE)
  thorax_sigresults <- read.csv(thorax_sigresults_file, stringsAsFactors = FALSE)
  
  # Add GeneType and Species columns (from `allspecies_df`)
  head_sigresults_merged <- merge(head_sigresults, allspecies_df[, c("GeneID", "GeneType", "Species")], by = "GeneID")
  thorax_sigresults_merged <- merge(thorax_sigresults, allspecies_df[, c("GeneID", "GeneType", "Species")], by = "GeneID")
  
  # Count for upregulated and downregulated genes in head
  head_upregulated <- head_sigresults_merged %>%
    filter(log2FoldChange > 1) %>%
    mutate(Regulation = "Upregulated", Tissue = "Head", Count = 1)
  
  head_downregulated <- head_sigresults_merged %>%
    filter(log2FoldChange < -1) %>%
    mutate(Regulation = "Downregulated", Tissue = "Head", Count = -1)  # Mutate downregulated genes to negative
  
  # Combine upregulated and downregulated genes for head
  head_combined <- rbind(head_upregulated, head_downregulated)
  
  # Ensure all GeneTypes are represented for this species, even if they have no DEGs
  head_combined <- head_combined %>%
    complete(GeneType = unique(allspecies_df$GeneType), 
             fill = list(Count = 0))  # Fill missing GeneTypes with Count = 0
  
  # Count for upregulated and downregulated genes in thorax
  thorax_upregulated <- thorax_sigresults_merged %>%
    filter(log2FoldChange > 1) %>%
    mutate(Regulation = "Upregulated", Tissue = "Thorax", Count = 1)
  
  thorax_downregulated <- thorax_sigresults_merged %>%
    filter(log2FoldChange < -1) %>%
    mutate(Regulation = "Downregulated", Tissue = "Thorax", Count = -1)  # Mutate downregulated genes to negative
  
  # Combine upregulated and downregulated genes for thorax
  thorax_combined <- rbind(thorax_upregulated, thorax_downregulated)
  
  # Ensure all GeneTypes are represented for this species in thorax, even if they have no DEGs
  thorax_combined <- thorax_combined %>%
    complete(GeneType = unique(allspecies_df$GeneType), 
             fill = list(Count = 0))  # Fill missing GeneTypes with Count = 0
  
  # Combine data for head and thorax into one
  combined_data <- rbind(head_combined, thorax_combined)
  
  # Add species column to the data
  combined_data$Species <- species
  
  # Append the data to the list for all species
  all_species_data[[species]] <- combined_data
}

# Combine all species data into one data frame
final_data <- bind_rows(all_species_data)

# Reorder species according to the desired order
final_data$Species <- factor(final_data$Species, levels = species_order)

# Filter for head tissue only
final_data_head <- final_data %>% filter(Tissue == "Head")
final_data_thorax <- final_data %>% filter(Tissue == "Thorax")

# Create the barplot for all species and only head tissue
ggplot(final_data_head, aes(x = Species, y = Count, fill = GeneType)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "DEGs by Gene Biotype for Head (absolute lfc >1)",
       x = "Species",
       y = "Number of Genes") +
  custom_color_scale +
  scale_y_continuous(labels = function(x) ifelse(x < 0, -x, x), limits = c(-1200, 1200))+
theme_minimal(base_size = 12) + 
  theme(legend.position = "top", 
        plot.title = element_text(hjust = 0.5, size = 14, face = "bold"), 
        axis.title.x = element_text(size = 14, face = "bold"), 
        axis.title.y = element_text(size = 14, face = "bold"), 
        axis.text.x = element_text(size = 12, angle = 45, hjust = 1), 
        axis.text.y = element_text(size = 12), 
        panel.grid.major.y = element_line(color = "grey90", linetype = "dashed"),
        panel.grid.minor = element_blank()) +
  coord_flip()  # Flip coordinates to make the plot horizontal

# Create the barplot for all species and only head tissue
ggplot(final_data_thorax, aes(x = Species, y = Count, fill = GeneType)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "DEGs by Gene Biotype for Thorax (absolute lfc >1)",
       x = "Species",
       y = "Number of Genes") +
  custom_color_scale +
  scale_y_continuous(labels = function(x) ifelse(x < 0, -x, x), limits = c(-1200, 1200))+
theme_minimal(base_size = 12) + 
  theme(legend.position = "top", 
        plot.title = element_text(hjust = 0.5, size = 14, face = "bold"), 
        axis.title.x = element_text(size = 14, face = "bold"), 
        axis.title.y = element_text(size = 14, face = "bold"), 
        axis.text.x = element_text(size = 12, angle = 45, hjust = 1), 
        axis.text.y = element_text(size = 12), 
        panel.grid.major.y = element_line(color = "grey90", linetype = "dashed"),
        panel.grid.minor = element_blank()) +
  coord_flip()  # Flip coordinates to make the plot horizontal

2. Overlap DEGs between tissues

gregaria


species <- "gregaria"  # Specify the species for which to generate plots

# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_", species, ".csv"))

head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)

# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
} else {
    # Filter for significant DEGs and select upregulated and downregulated genes
    head_up <- head_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    head_down <- head_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    thorax_up <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    thorax_down <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    # Prepare data for Venn diagram
    venn_data <- list(
        Head_Upregulated = head_up$GeneID,
        Head_Downregulated = head_down$GeneID,
        Thorax_Upregulated = thorax_up$GeneID,
        Thorax_Downregulated = thorax_down$GeneID
    )

    # Generate the four-way Venn diagram with specified colors and legend outside
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("Head Upregulated", "Head Downregulated", "Thorax Upregulated", "Thorax Downregulated"), 
        filename = NULL, 
        output = TRUE, 
        fill = c("red", "skyblue", "orange", "blue"),  # Set colors for upregulated and downregulated
        alpha = 0.5, 
        cex = 1.2,  # Text size for numbers
        cat.cex = 0,  # Text size for category labels
        cat.pos = c(0, 0, 0, 0),  # Position to center labels
        cat.dist = c(0.1, 0.1, 0.1, 0.1),  # Distance between category labels and circles
        main = paste("Venn Diagram of DEGs for S.", species),
        main.cex = 1.2,  # Size of the main title
        cat.col = c("red", "skyblue", "orange", "blue")  # Color the category labels
    )

    # Clear the current plotting area before drawing the next Venn diagram
    grid.newpage()

    # Display the Venn diagram
    grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("Head Up", "Head Down", "Thorax Up", "Thorax Down")
    legend_colors <- c("red", "skyblue", "orange", "blue")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }
    
    # Scatter plot for overlapping genes
    # Filter significant DEGs for both head and thorax
    head_sig_genes <- head_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj) 

    thorax_sig_genes <- thorax_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj)

    # Find overlapping genes based on GeneID
    overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))

    # Save the overlapping genes to a CSV file
    output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
    write.csv(overlapping_genes, output_file, row.names = FALSE)

    # Plot overlapping genes with scatter plot
    p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
        geom_point(aes(color = case_when(
            log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
            log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
            log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
            log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
        )), size = 3, alpha = 0.7) +
        geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
        labs(
            x = "Log2 Fold Change (Head)", 
            y = "Log2 Fold Change (Thorax)", 
            color = "Regulation Pattern", 
            title = "Comparison of Log2 Fold Changes in Overlapping Genes", 
            subtitle = paste("Head vs. Thorax in", species)
        ) +
        theme_minimal() + 
        theme(
            plot.title = element_text(size = 16, face = "bold"), 
            plot.subtitle = element_text(size = 12, face = "italic"), 
            legend.position = "top"
        ) +
        scale_color_manual(values = c(
            "Upregulated in Both" = "red", 
            "Downregulated in Both" = "blue", 
            "Up in Head, Down in Thorax" = "purple", 
            "Down in Head, Up in Thorax" = "green"
        ))

    # Save the scatter plot
    ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)

    # Display the scatter plot
    print(p)
}

piceifrons


species <- "piceifrons"  # Specify the species for which to generate plots

# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_", species, ".csv"))

head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)

# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
} else {
    # Filter for significant DEGs and select upregulated and downregulated genes
    head_up <- head_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    head_down <- head_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    thorax_up <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    thorax_down <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    # Prepare data for Venn diagram
    venn_data <- list(
        Head_Upregulated = head_up$GeneID,
        Head_Downregulated = head_down$GeneID,
        Thorax_Upregulated = thorax_up$GeneID,
        Thorax_Downregulated = thorax_down$GeneID
    )

    # Generate the four-way Venn diagram with specified colors and legend outside
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("Head Upregulated", "Head Downregulated", "Thorax Upregulated", "Thorax Downregulated"), 
        filename = NULL, 
        output = TRUE, 
        fill = c("red", "skyblue", "orange", "blue"),  # Set colors for upregulated and downregulated
        alpha = 0.5, 
        cex = 1.2,  # Text size for numbers
        cat.cex = 0,  # Text size for category labels
        cat.pos = c(0, 0, 0, 0),  # Position to center labels
        cat.dist = c(0.1, 0.1, 0.1, 0.1),  # Distance between category labels and circles
        main = paste("Venn Diagram of DEGs for S.", species),
        main.cex = 1.2,  # Size of the main title
        cat.col = c("red", "skyblue", "orange", "blue")  # Color the category labels
    )

    # Clear the current plotting area before drawing the next Venn diagram
    grid.newpage()

    # Display the Venn diagram
    grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("Head Up", "Head Down", "Thorax Up", "Thorax Down")
    legend_colors <- c("red", "skyblue", "orange", "blue")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }

    # Scatter plot for overlapping genes
    # Filter significant DEGs for both head and thorax
    head_sig_genes <- head_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj) 

    thorax_sig_genes <- thorax_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj)

    # Find overlapping genes based on GeneID
    overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))

    # Save the overlapping genes to a CSV file
    output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
    write.csv(overlapping_genes, output_file, row.names = FALSE)

    # Plot overlapping genes with scatter plot
    p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
        geom_point(aes(color = case_when(
            log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
            log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
            log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
            log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
        )), size = 3, alpha = 0.7) +
        geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
        labs(
            x = "Log2 Fold Change (Head)", 
            y = "Log2 Fold Change (Thorax)", 
            color = "Regulation Pattern", 
            title = "Comparison of Log2 Fold Changes in Overlapping Genes", 
            subtitle = paste("Head vs. Thorax in", species)
        ) +
        theme_minimal() + 
        theme(
            plot.title = element_text(size = 16, face = "bold"), 
            plot.subtitle = element_text(size = 12, face = "italic"), 
            legend.position = "top"
        ) +
        scale_color_manual(values = c(
            "Upregulated in Both" = "red", 
            "Downregulated in Both" = "blue", 
            "Up in Head, Down in Thorax" = "purple", 
            "Down in Head, Up in Thorax" = "green"
        ))

    # Save the scatter plot
    ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)

    # Display the scatter plot
    print(p)
}

cancellata


species <- "cancellata"  # Specify the species for which to generate plots

# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_", species, ".csv"))

head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)

# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
} else {
    # Filter for significant DEGs and select upregulated and downregulated genes
    head_up <- head_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    head_down <- head_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    thorax_up <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    thorax_down <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    # Prepare data for Venn diagram
    venn_data <- list(
        Head_Upregulated = head_up$GeneID,
        Head_Downregulated = head_down$GeneID,
        Thorax_Upregulated = thorax_up$GeneID,
        Thorax_Downregulated = thorax_down$GeneID
    )

    # Generate the four-way Venn diagram with specified colors and legend outside
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("Head Upregulated", "Head Downregulated", "Thorax Upregulated", "Thorax Downregulated"), 
        filename = NULL, 
        output = TRUE, 
        fill = c("red", "skyblue", "orange", "blue"),  # Set colors for upregulated and downregulated
        alpha = 0.5, 
        cex = 1.2,  # Text size for numbers
        cat.cex = 0,  # Text size for category labels
        cat.pos = c(0, 0, 0, 0),  # Position to center labels
        cat.dist = c(0.1, 0.1, 0.1, 0.1),  # Distance between category labels and circles
        main = paste("Venn Diagram of DEGs for S.", species),
        main.cex = 1.2,  # Size of the main title
        cat.col = c("red", "skyblue", "orange", "blue")  # Color the category labels
    )

    # Clear the current plotting area before drawing the next Venn diagram
    grid.newpage()

    # Display the Venn diagram
    grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("Head Up", "Head Down", "Thorax Up", "Thorax Down")
    legend_colors <- c("red", "skyblue", "orange", "blue")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }

    # Scatter plot for overlapping genes
    # Filter significant DEGs for both head and thorax
    head_sig_genes <- head_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj) 

    thorax_sig_genes <- thorax_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj)

    # Find overlapping genes based on GeneID
    overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))

    # Save the overlapping genes to a CSV file
    output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
    write.csv(overlapping_genes, output_file, row.names = FALSE)

    # Plot overlapping genes with scatter plot
    p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
        geom_point(aes(color = case_when(
            log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
            log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
            log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
            log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
        )), size = 3, alpha = 0.7) +
        geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
        labs(
            x = "Log2 Fold Change (Head)", 
            y = "Log2 Fold Change (Thorax)", 
            color = "Regulation Pattern", 
            title = "Comparison of Log2 Fold Changes in Overlapping Genes", 
            subtitle = paste("Head vs. Thorax in", species)
        ) +
        theme_minimal() + 
        theme(
            plot.title = element_text(size = 16, face = "bold"), 
            plot.subtitle = element_text(size = 12, face = "italic"), 
            legend.position = "top"
        ) +
        scale_color_manual(values = c(
            "Upregulated in Both" = "red", 
            "Downregulated in Both" = "blue", 
            "Up in Head, Down in Thorax" = "purple", 
            "Down in Head, Up in Thorax" = "green"
        ))

    # Save the scatter plot
    ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)

    # Display the scatter plot
    print(p)
}

americana


species <- "americana"  # Specify the species for which to generate plots

# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_", species, ".csv"))

head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)

# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
} else {
    # Filter for significant DEGs and select upregulated and downregulated genes
    head_up <- head_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    head_down <- head_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    thorax_up <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    thorax_down <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    # Prepare data for Venn diagram
    venn_data <- list(
        Head_Upregulated = head_up$GeneID,
        Head_Downregulated = head_down$GeneID,
        Thorax_Upregulated = thorax_up$GeneID,
        Thorax_Downregulated = thorax_down$GeneID
    )

    # Generate the four-way Venn diagram with specified colors and legend outside
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("Head Upregulated", "Head Downregulated", "Thorax Upregulated", "Thorax Downregulated"), 
        filename = NULL, 
        output = TRUE, 
        fill = c("red", "skyblue", "orange", "blue"),  # Set colors for upregulated and downregulated
        alpha = 0.5, 
        cex = 1.2,  # Text size for numbers
        cat.cex = 0,  # Text size for category labels
        cat.pos = c(0, 0, 0, 0),  # Position to center labels
        cat.dist = c(0.1, 0.1, 0.1, 0.1),  # Distance between category labels and circles
        main = paste("Venn Diagram of DEGs for S.", species),
        main.cex = 1.2,  # Size of the main title
        cat.col = c("red", "skyblue", "orange", "blue")  # Color the category labels
    )

    # Clear the current plotting area before drawing the next Venn diagram
    grid.newpage()

    # Display the Venn diagram
    grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("Head Up", "Head Down", "Thorax Up", "Thorax Down")
    legend_colors <- c("red", "skyblue", "orange", "blue")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }

    # Scatter plot for overlapping genes
    # Filter significant DEGs for both head and thorax
    head_sig_genes <- head_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj) 

    thorax_sig_genes <- thorax_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj)

    # Find overlapping genes based on GeneID
    overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))

    # Save the overlapping genes to a CSV file
    output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
    write.csv(overlapping_genes, output_file, row.names = FALSE)

    # Plot overlapping genes with scatter plot
    p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
        geom_point(aes(color = case_when(
            log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
            log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
            log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
            log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
        )), size = 3, alpha = 0.7) +
        geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
        labs(
            x = "Log2 Fold Change (Head)", 
            y = "Log2 Fold Change (Thorax)", 
            color = "Regulation Pattern", 
            title = "Comparison of Log2 Fold Changes in Overlapping Genes", 
            subtitle = paste("Head vs. Thorax in", species)
        ) +
        theme_minimal() + 
        theme(
            plot.title = element_text(size = 16, face = "bold"), 
            plot.subtitle = element_text(size = 12, face = "italic"), 
            legend.position = "top"
        ) +
        scale_color_manual(values = c(
            "Upregulated in Both" = "red", 
            "Downregulated in Both" = "blue", 
            "Up in Head, Down in Thorax" = "purple", 
            "Down in Head, Up in Thorax" = "green"
        ))

    # Save the scatter plot
    ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)

    # Display the scatter plot
    print(p)
}

cubense


species <- "cubense"  # Specify the species for which to generate plots

# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_", species, ".csv"))

head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)

# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
} else {
    # Filter for significant DEGs and select upregulated and downregulated genes
    head_up <- head_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    head_down <- head_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    thorax_up <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    thorax_down <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    # Prepare data for Venn diagram
    venn_data <- list(
        Head_Upregulated = head_up$GeneID,
        Head_Downregulated = head_down$GeneID,
        Thorax_Upregulated = thorax_up$GeneID,
        Thorax_Downregulated = thorax_down$GeneID
    )

    # Generate the four-way Venn diagram with specified colors and legend outside
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("Head Upregulated", "Head Downregulated", "Thorax Upregulated", "Thorax Downregulated"), 
        filename = NULL, 
        output = TRUE, 
        fill = c("red", "skyblue", "orange", "blue"),  # Set colors for upregulated and downregulated
        alpha = 0.5, 
        cex = 1.2,  # Text size for numbers
        cat.cex = 0,  # Text size for category labels
        cat.pos = c(0, 0, 0, 0),  # Position to center labels
        cat.dist = c(0.1, 0.1, 0.1, 0.1),  # Distance between category labels and circles
        main = paste("Venn Diagram of DEGs for S.", species),
        main.cex = 1.2,  # Size of the main title
        cat.col = c("red", "skyblue", "orange", "blue")  # Color the category labels
    )

    # Clear the current plotting area before drawing the next Venn diagram
    grid.newpage()

    # Display the Venn diagram
    grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("Head Up", "Head Down", "Thorax Up", "Thorax Down")
    legend_colors <- c("red", "skyblue", "orange", "blue")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }

    # Scatter plot for overlapping genes
    # Filter significant DEGs for both head and thorax
    head_sig_genes <- head_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj) 

    thorax_sig_genes <- thorax_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj)

    # Find overlapping genes based on GeneID
    overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))

    # Save the overlapping genes to a CSV file
    output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
    write.csv(overlapping_genes, output_file, row.names = FALSE)

    # Plot overlapping genes with scatter plot
    p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
        geom_point(aes(color = case_when(
            log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
            log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
            log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
            log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
        )), size = 3, alpha = 0.7) +
        geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
        labs(
            x = "Log2 Fold Change (Head)", 
            y = "Log2 Fold Change (Thorax)", 
            color = "Regulation Pattern", 
            title = "Comparison of Log2 Fold Changes in Overlapping Genes", 
            subtitle = paste("Head vs. Thorax in", species)
        ) +
        theme_minimal() + 
        theme(
            plot.title = element_text(size = 16, face = "bold"), 
            plot.subtitle = element_text(size = 12, face = "italic"), 
            legend.position = "top"
        ) +
        scale_color_manual(values = c(
            "Upregulated in Both" = "red", 
            "Downregulated in Both" = "blue", 
            "Up in Head, Down in Thorax" = "purple", 
            "Down in Head, Up in Thorax" = "green"
        ))

    # Save the scatter plot
    ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)

    # Display the scatter plot
    print(p)
}

nitens


species <- "nitens"  # Specify the species for which to generate plots

# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_", species, ".csv"))

head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)

# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
} else {
    # Filter for significant DEGs and select upregulated and downregulated genes
    head_up <- head_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    head_down <- head_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    thorax_up <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange > 1) %>%
        select(GeneID = X)

    thorax_down <- thorax_data %>%
        filter(padj < 0.05 & log2FoldChange < -1) %>%
        select(GeneID = X)

    # Prepare data for Venn diagram
    venn_data <- list(
        Head_Upregulated = head_up$GeneID,
        Head_Downregulated = head_down$GeneID,
        Thorax_Upregulated = thorax_up$GeneID,
        Thorax_Downregulated = thorax_down$GeneID
    )

    # Generate the four-way Venn diagram with specified colors and legend outside
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("Head Upregulated", "Head Downregulated", "Thorax Upregulated", "Thorax Downregulated"), 
        filename = NULL, 
        output = TRUE, 
        fill = c("red", "skyblue", "orange", "blue"),  # Set colors for upregulated and downregulated
        alpha = 0.5, 
        cex = 1.2,  # Text size for numbers
        cat.cex = 0,  # Text size for category labels
        cat.pos = c(0, 0, 0, 0),  # Position to center labels
        cat.dist = c(0.1, 0.1, 0.1, 0.1),  # Distance between category labels and circles
        main = paste("Venn Diagram of DEGs for S.", species),
        main.cex = 1.2,  # Size of the main title
        cat.col = c("red", "skyblue", "orange", "blue")  # Color the category labels
    )

    # Clear the current plotting area before drawing the next Venn diagram
    grid.newpage()

    # Display the Venn diagram
    grid.draw(venn_plot)

    # Manually create a custom legend
    legend_labels <- c("Head Up", "Head Down", "Thorax Up", "Thorax Down")
    legend_colors <- c("red", "skyblue", "orange", "blue")

    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")    # Lower the legend vertically

    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }

    # Scatter plot for overlapping genes
    # Filter significant DEGs for both head and thorax
    head_sig_genes <- head_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj) 

    thorax_sig_genes <- thorax_data %>%
        filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
        select(GeneID = X, log2FoldChange, padj)

    # Find overlapping genes based on GeneID
    overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))

    # Save the overlapping genes to a CSV file
    output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
    write.csv(overlapping_genes, output_file, row.names = FALSE)

    # Plot overlapping genes with scatter plot
    p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
        geom_point(aes(color = case_when(
            log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
            log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
            log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
            log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
        )), size = 3, alpha = 0.7) +
        geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
        labs(
            x = "Log2 Fold Change (Head)", 
            y = "Log2 Fold Change (Thorax)", 
            color = "Regulation Pattern", 
            title = "Comparison of Log2 Fold Changes in Overlapping Genes", 
            subtitle = paste("Head vs. Thorax in", species)
        ) +
        theme_minimal() + 
        theme(
            plot.title = element_text(size = 16, face = "bold"), 
            plot.subtitle = element_text(size = 12, face = "italic"), 
            legend.position = "top"
        ) +
        scale_color_manual(values = c(
            "Upregulated in Both" = "red", 
            "Downregulated in Both" = "blue", 
            "Up in Head, Down in Thorax" = "purple", 
            "Down in Head, Up in Thorax" = "green"
        ))

    # Save the scatter plot
    ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)

    # Display the scatter plot
    print(p)
}

3. Overlap DEGs among species

Locusts

Head tissues

# Define the species for Group 1
locusts <- c("gregaria", "piceifrons", "cancellata")
input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Function to load DEGs for a given group of species
load_deg_data <- function(locusts, allspecies_df, filtered_final_orthotable) {
    degs_up <- list()
    degs_down <- list()
    degs_all <- list()
    
    # Rename the "gene_id" column in filtered_final_orthotable for consistency
    colnames(filtered_final_orthotable)[colnames(filtered_final_orthotable) == "gene_id"] <- "GeneID"
    
    for (species in locusts) {
        head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_", species, ".csv"))
        
        # Check if the file exists
        if (!file.exists(head_file)) {
            message(paste("File not found for species:", species))
            next  # Skip this iteration if the file is missing
        }
        
        # Read the data
        head_data <- read.csv(head_file, stringsAsFactors = FALSE)
        
        # Rename the "X" column to "GeneID"
        colnames(head_data)[colnames(head_data) == "X"] <- "GeneID"
        
        # Merge DEG data with GeneType and Orthogroup information
        head_data_merged <- merge(head_data, allspecies_df[, c("GeneID", "GeneType", "Species")], by = "GeneID")
        head_data_merged <- merge(head_data_merged, filtered_final_orthotable[, c("GeneID", "Orthogroup")], by = "GeneID")
        
        # Handle missing Orthogroups
        head_data_merged$Orthogroup[is.na(head_data_merged$Orthogroup)] <- "Unknown"
        
        # Filter for significant DEGs (both upregulated and downregulated)
        head_up <- head_data_merged %>%
            filter(padj < 0.05 & log2FoldChange >= 1) %>%
            select(Orthogroup) %>%
            distinct()
        
        head_down <- head_data_merged %>%
            filter(padj < 0.05 & log2FoldChange <= -1) %>%
            select(Orthogroup) %>%
            distinct()
        
        all_deg <- head_data_merged %>%
            filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
            select(Orthogroup) %>%
            distinct()
        
        # Store the DEGs in the list
        degs_up[[species]] <- head_up$Orthogroup
        degs_down[[species]] <- head_down$Orthogroup
        degs_all[[species]] <- all_deg$Orthogroup
    }
    
    return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Function to display Venn diagram and corresponding datatable based on Orthogroups
display_venn_with_datatable <- function(venn_data, title, allspecies_df, filtered_final_orthotable) {
    
    # Calculate the overlapping Orthogroups
    overlap_orthogroups <- Reduce(intersect, venn_data)
    
    # Print the overlap to check if there are matching Orthogroups
    cat("Overlapping Orthogroups: \n")
    print(overlap_orthogroups)
    
    # Create a data frame for the overlapping Orthogroups
    overlap_df <- data.frame(Orthogroup = overlap_orthogroups)
    
    # Check if the overlap_df has any data
    if (nrow(overlap_df) == 0) {
        stop("No overlapping Orthogroups found.")
    }
    
    # Rename GeneID column to avoid conflict in allspecies_df
    colnames(allspecies_df)[colnames(allspecies_df) == "GeneID"] <- "GeneID_allspecies"
    
    # Merge to get species and other information from filtered_final_orthotable
    meta_brock_df <- merge(overlap_df, filtered_final_orthotable, by = "Orthogroup", all.x = TRUE)
    
    # Check if the merge produced any data
    cat("Merged Data (after Orthogroup merge): \n")
    if (nrow(meta_brock_df) == 0) {
        stop("Merge failed: No matching rows after merging Orthogroups.")
    }
    print(head(meta_brock_df))
    
    # Rename gene_id column to GeneID_meta_brock
    colnames(meta_brock_df)[colnames(meta_brock_df) == "gene_id"] <- "GeneID_meta_brock"
    
    # Ensure that GeneID columns are the same type (character)
    meta_brock_df$GeneID_meta_brock <- as.character(meta_brock_df$GeneID_meta_brock)
    allspecies_df$GeneID_allspecies <- as.character(allspecies_df$GeneID_allspecies)
    
    # Perform the merge with renamed columns
    meta_brock_df <- merge(meta_brock_df, allspecies_df, by.x = "GeneID_meta_brock", by.y = "GeneID_allspecies", all.x = TRUE)
    
    # Generate the Venn diagram
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("gregaria", "piceifrons", "cancellata"), 
        filename = NULL, 
        output = TRUE,
        fill = c("orange", "red", "orchid"),
        alpha = 0.5,
        cex = 1.2,
        cat.cex = 0,
        main = title,
        main.cex = 1.2
    )
    
    # Clear the current plotting area before drawing the Venn diagram
    grid.newpage()
    
    # Display the Venn diagram
    grid.draw(venn_plot)
    
    # Manually create a custom legend
    legend_labels <- c("gregaria", "piceifrons", "cancellata")
    legend_colors <- c("orange", "red", "orchid")
    
    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")   # Lower the legend vertically
    
    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }
    
    # Display the merged overlapping Orthogroups table with datatable
    datatable(meta_brock_df, options = list(
        pageLength = 10,
        scrollX = TRUE,
        autoWidth = TRUE,
        searchHighlight = TRUE
    ),
    rownames = FALSE,
    escape = FALSE
    ) %>%
        formatStyle(
            'Species.x', target = 'cell',
            fontStyle = 'italic'
        ) %>%
        formatStyle(
            columns = names(meta_brock_df), 
            target = 'row',
            color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
            fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
            backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
        )
}

# Example for testing with your data
venn_data_locusts <- load_deg_data(locusts, allspecies_df, filtered_final_orthotable)

# Prepare the data for the Venn diagrams
venn_data_up <- list(
  gregaria = venn_data_locusts$up[["gregaria"]],
  piceifrons = venn_data_locusts$up[["piceifrons"]],
  cancellata = venn_data_locusts$up[["cancellata"]]
)

venn_data_down <- list(
  gregaria = venn_data_locusts$down[["gregaria"]],
  piceifrons = venn_data_locusts$down[["piceifrons"]],
  cancellata = venn_data_locusts$down[["cancellata"]]
)

venn_data_all <- list(
  gregaria = venn_data_locusts$all[["gregaria"]],
  piceifrons = venn_data_locusts$all[["piceifrons"]],
  cancellata = venn_data_locusts$all[["cancellata"]]
)

# Display the Venn diagram and datatable for head upregulated DEGs
display_venn_with_datatable(venn_data_up, "Venn Diagram of Head Upregulated DEGs - Locusts", allspecies_df, filtered_final_orthotable)
Overlapping Orthogroups: 
[1] "OG0008343"
Merged Data (after Orthogroup merge): 
  Orthogroup Species     protein_id      gene_id
1  OG0008343   Scanc XP_049762685.1 LOC126088548
2  OG0008343   Scanc XP_049762686.1 LOC126088550
3  OG0008343   Sscub XP_049963929.1 LOC126484451
4  OG0008343   Sscub XP_049963930.1 LOC126484452
5  OG0008343   Sscub XP_049963931.1 LOC126484453
6  OG0008343   Sgreg XP_049831452.1 LOC126272572
                                   gene_description
1            putative beta-carotene-binding protein
2            putative beta-carotene-binding protein
3            putative beta-carotene-binding protein
4            putative beta-carotene-binding protein
5 putative beta-carotene-binding protein isoform X1
6            putative beta-carotene-binding protein
                        species
1       Schistocerca cancellata
2       Schistocerca cancellata
3 Schistocerca serialis cubense
4 Schistocerca serialis cubense
5 Schistocerca serialis cubense
6         Schistocerca gregaria

# Display the Venn diagram and datatable for head downregulated DEGs
display_venn_with_datatable(venn_data_down, "Venn Diagram of Head Downregulated DEGs - Locusts", allspecies_df, filtered_final_orthotable)
Overlapping Orthogroups: 
[1] "OG0007750" "OG0007665" "OG0007872"
Merged Data (after Orthogroup merge): 
  Orthogroup Species     protein_id      gene_id
1  OG0007665   Scanc XP_049770752.1 LOC126109740
2  OG0007665   Sgreg XP_049842240.1 LOC126293182
3  OG0007665   Sscub XP_049946148.1 LOC126428279
4  OG0007665   Snite XP_049797822.1 LOC126215200
5  OG0007665   Spice XP_047100451.1 LOC124718862
6  OG0007750   Snite XP_049792449.1 LOC126199569
                    gene_description                       species
1         putative defense protein 3       Schistocerca cancellata
2 reelin domain-containing protein 1         Schistocerca gregaria
3         putative defense protein 3 Schistocerca serialis cubense
4 reelin domain-containing protein 1           Schistocerca nitens
5         putative defense protein 3       Schistocerca piceifrons
6                       mucin-2-like           Schistocerca nitens

# Display the Venn diagram and datatable for all significant DEGs
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Head DEGs - Locusts", allspecies_df, filtered_final_orthotable)
Overlapping Orthogroups: 
[1] "OG0008343" "OG0007750" "OG0007665" "OG0007872"
Merged Data (after Orthogroup merge): 
  Orthogroup Species     protein_id      gene_id
1  OG0007665   Scanc XP_049770752.1 LOC126109740
2  OG0007665   Sgreg XP_049842240.1 LOC126293182
3  OG0007665   Sscub XP_049946148.1 LOC126428279
4  OG0007665   Snite XP_049797822.1 LOC126215200
5  OG0007665   Spice XP_047100451.1 LOC124718862
6  OG0007750   Snite XP_049792449.1 LOC126199569
                    gene_description                       species
1         putative defense protein 3       Schistocerca cancellata
2 reelin domain-containing protein 1         Schistocerca gregaria
3         putative defense protein 3 Schistocerca serialis cubense
4 reelin domain-containing protein 1           Schistocerca nitens
5         putative defense protein 3       Schistocerca piceifrons
6                       mucin-2-like           Schistocerca nitens

# Define the species for Group 1
locusts <- c("gregaria", "piceifrons", "cancellata")
input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in locusts) {
  # Load DESeq2 results for head
  head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_", species, ".csv"))
  
  # Load the DESeq2 results
  head_data <- read.csv(head_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(head_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Rename the "gene_id" column in filtered_final_orthotable for consistency
  colnames(filtered_final_orthotable)[colnames(filtered_final_orthotable) == "gene_id"] <- "GeneID"
  
  # Merge with filtered_final_orthotable to include Orthogroup
  merged_data <- merge(head_data, filtered_final_orthotable, by = "GeneID", all.x = TRUE)
  
  # Check if merge was successful
  if (nrow(merged_data) == 0) {
    message(paste("No matching data for species:", species))
    next  # Skip if no matching data after merging
  }

  # Filter for significant DEGs and select top 500 upregulated and downregulated genes for each tissue
  head_up <- merged_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  head_down <- merged_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    head_up %>% mutate(Tissue = "Head", Regulation = "Upregulated", Species = species),
    head_down %>% mutate(Tissue = "Head", Regulation = "Downregulated", Species = species)
  ) %>%
    select(Orthogroup, log2FoldChange, Tissue, Regulation, Species)
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
    stop("No valid data available for heatmap generation.")
}

# Filter out rows with missing Orthogroup values
final_heatmap_data <- final_heatmap_data %>%
    filter(!is.na(Orthogroup))

# Check if there are any missing values in log2FoldChange (optional, just in case)
final_heatmap_data <- final_heatmap_data %>%
    filter(!is.na(log2FoldChange))

# Create heatmap matrix using Orthogroup instead of GeneID
heatmap_matrix <- final_heatmap_data %>%
    group_by(Orthogroup, Species) %>%
    summarize(
        Head_Combined = sum(log2FoldChange[Tissue == "Head"], na.rm = TRUE),
        .groups = 'drop'
    ) %>%
    pivot_wider(names_from = Species, 
                values_from = Head_Combined, 
                values_fill = list(Head_Combined = 0)) %>%
    column_to_rownames("Orthogroup") %>%
    as.matrix()

# Check if heatmap_matrix is empty
if (nrow(heatmap_matrix) == 0) {
    stop("No valid data available for heatmap matrix.")
}

color_palette <- c("cyan", "cyan3", "cyan4","lightblue4", "black", "orange3", "orange")
color_palette2 <- c("blue3", "blue2","blue", "skyblue","cyan",  "white", "red3", "red4")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Head tissue - STRATEGY 2"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Head tissue - STRATEGY 2"
)

Thorax tissues

# Define the species for Group 1
locusts <- c("gregaria", "piceifrons", "cancellata")
input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Function to load DEGs for a given group of species (for Thorax)
load_deg_data <- function(locusts, allspecies_df, filtered_final_orthotable) {
    degs_up <- list()
    degs_down <- list()
    degs_all <- list()
    
    # Rename the "gene_id" column in filtered_final_orthotable for consistency
    colnames(filtered_final_orthotable)[colnames(filtered_final_orthotable) == "gene_id"] <- "GeneID"
    
    for (species in locusts) {
        thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_", species, ".csv"))
        
        # Check if the file exists
        if (!file.exists(thorax_file)) {
            message(paste("File not found for species:", species))
            next  # Skip this iteration if the file is missing
        }
        
        # Read the data
        thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
        
        # Rename the "X" column to "GeneID"
        colnames(thorax_data)[colnames(thorax_data) == "X"] <- "GeneID"
        
        # Merge DEG data with GeneType and Orthogroup information
        thorax_data_merged <- merge(thorax_data, allspecies_df[, c("GeneID", "GeneType", "Species")], by = "GeneID")
        thorax_data_merged <- merge(thorax_data_merged, filtered_final_orthotable[, c("GeneID", "Orthogroup")], by = "GeneID")
        
        # Handle missing Orthogroups
        thorax_data_merged$Orthogroup[is.na(thorax_data_merged$Orthogroup)] <- "Unknown"
        
        # Filter for significant DEGs (both upregulated and downregulated)
        thorax_up <- thorax_data_merged %>%
            filter(padj < 0.05 & log2FoldChange >= 1) %>%
            select(Orthogroup) %>%
            distinct()
        
        thorax_down <- thorax_data_merged %>%
            filter(padj < 0.05 & log2FoldChange <= -1) %>%
            select(Orthogroup) %>%
            distinct()
        
        all_deg <- thorax_data_merged %>%
            filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
            select(Orthogroup) %>%
            distinct()
        
        # Store the DEGs in the list
        degs_up[[species]] <- thorax_up$Orthogroup
        degs_down[[species]] <- thorax_down$Orthogroup
        degs_all[[species]] <- all_deg$Orthogroup
    }
    
    return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Function to display Venn diagram and corresponding datatable based on Orthogroups
display_venn_with_datatable <- function(venn_data, title, allspecies_df, filtered_final_orthotable) {
    
    # Calculate the overlapping Orthogroups
    overlap_orthogroups <- Reduce(intersect, venn_data)
    
    # Print the overlap to check if there are matching Orthogroups
    cat("Overlapping Orthogroups: \n")
    print(overlap_orthogroups)
    
    # Create a data frame for the overlapping Orthogroups
    overlap_df <- data.frame(Orthogroup = overlap_orthogroups)
    
    # Check if the overlap_df has any data
    if (nrow(overlap_df) == 0) {
        stop("No overlapping Orthogroups found.")
    }
    
    # Rename GeneID column to avoid conflict in allspecies_df
    colnames(allspecies_df)[colnames(allspecies_df) == "GeneID"] <- "GeneID_allspecies"
    
    # Merge to get species and other information from filtered_final_orthotable
    meta_brock_df <- merge(overlap_df, filtered_final_orthotable, by = "Orthogroup", all.x = TRUE)
    
    # Check if the merge produced any data
    cat("Merged Data (after Orthogroup merge): \n")
    if (nrow(meta_brock_df) == 0) {
        stop("Merge failed: No matching rows after merging Orthogroups.")
    }
#    print(head(meta_brock_df))
    
    # Rename gene_id column to GeneID_meta_brock
    colnames(meta_brock_df)[colnames(meta_brock_df) == "gene_id"] <- "GeneID_meta_brock"
    
    # Ensure that GeneID columns are the same type (character)
    meta_brock_df$GeneID_meta_brock <- as.character(meta_brock_df$GeneID_meta_brock)
    allspecies_df$GeneID_allspecies <- as.character(allspecies_df$GeneID_allspecies)
    
    # Perform the merge with renamed columns
    meta_brock_df <- merge(meta_brock_df, allspecies_df, by.x = "GeneID_meta_brock", by.y = "GeneID_allspecies", all.x = TRUE)
    
    # Generate the Venn diagram
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("gregaria", "piceifrons", "cancellata"), 
        filename = NULL, 
        output = TRUE,
        fill = c("orange", "red", "orchid"),
        alpha = 0.5,
        cex = 1.2,
        cat.cex = 0,
        main = title,
        main.cex = 1.2
    )
    
    # Clear the current plotting area before drawing the Venn diagram
    grid.newpage()
    
    # Display the Venn diagram
    grid.draw(venn_plot)
    
    # Manually create a custom legend
    legend_labels <- c("gregaria", "piceifrons", "cancellata")
    legend_colors <- c("orange", "red", "orchid")
    
    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")   # Lower the legend vertically
    
    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }
    
    # Display the merged overlapping Orthogroups table with datatable
    datatable(meta_brock_df, options = list(
        pageLength = 10,
        scrollX = TRUE,
        autoWidth = TRUE,
        searchHighlight = TRUE
    ),
    rownames = FALSE,
    escape = FALSE
    ) %>%
        formatStyle(
            'Species.x', target = 'cell',
            fontStyle = 'italic'
        ) %>%
        formatStyle(
            columns = names(meta_brock_df), 
            target = 'row',
            color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
            fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
            backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
        )
}

# Example for testing with your data (for Thorax)
venn_data_locusts <- load_deg_data(locusts, allspecies_df, filtered_final_orthotable)

# Prepare the data for the Venn diagrams for Thorax
venn_data_up <- list(
  gregaria = venn_data_locusts$up[["gregaria"]],
  piceifrons = venn_data_locusts$up[["piceifrons"]],
  cancellata = venn_data_locusts$up[["cancellata"]]
)

venn_data_down <- list(
  gregaria = venn_data_locusts$down[["gregaria"]],
  piceifrons = venn_data_locusts$down[["piceifrons"]],
  cancellata = venn_data_locusts$down[["cancellata"]]
)

venn_data_all <- list(
  gregaria = venn_data_locusts$all[["gregaria"]],
  piceifrons = venn_data_locusts$all[["piceifrons"]],
  cancellata = venn_data_locusts$all[["cancellata"]]
)

# Display the Venn diagram and datatable for thorax upregulated DEGs
display_venn_with_datatable(venn_data_up, "Venn Diagram of Thorax Upregulated DEGs - Locusts", allspecies_df, filtered_final_orthotable)
Overlapping Orthogroups: 
[1] "OG0007843" "OG0007936"
Merged Data (after Orthogroup merge): 

# Display the Venn diagram and datatable for thorax downregulated DEGs
display_venn_with_datatable(venn_data_down, "Venn Diagram of Thorax Downregulated DEGs - Locusts", allspecies_df, filtered_final_orthotable)
Overlapping Orthogroups: 
[1] "OG0007750" "OG0008372"
Merged Data (after Orthogroup merge): 

# Display the Venn diagram and datatable for all significant DEGs in Thorax
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Thorax DEGs - Locusts", allspecies_df, filtered_final_orthotable)
Overlapping Orthogroups: 
[1] "OG0007750" "OG0008372" "OG0007843" "OG0007936" "OG0008341"
Merged Data (after Orthogroup merge): 

# Define the species for Group 1
locusts <- c("gregaria", "piceifrons", "cancellata")
input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in locusts) {
  # Load DESeq2 results for head
  thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_", species, ".csv"))
  
  # Load the DESeq2 results
  thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Rename the "gene_id" column in filtered_final_orthotable for consistency
  colnames(filtered_final_orthotable)[colnames(filtered_final_orthotable) == "gene_id"] <- "GeneID"
  
  # Merge with filtered_final_orthotable to include Orthogroup
  merged_data <- merge(thorax_data, filtered_final_orthotable, by = "GeneID", all.x = TRUE)
  
  # Check if merge was successful
  if (nrow(merged_data) == 0) {
    message(paste("No matching data for species:", species))
    next  # Skip if no matching data after merging
  }

  # Filter for significant DEGs and select top 500 upregulated and downregulated genes for each tissue
  thorax_up <- merged_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  thorax_down <- merged_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    thorax_up %>% mutate(Tissue = "Thorax", Regulation = "Upregulated", Species = species),
    thorax_down %>% mutate(Tissue = "Thorax", Regulation = "Downregulated", Species = species)
  ) %>%
    select(Orthogroup, log2FoldChange, Tissue, Regulation, Species)
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
    stop("No valid data available for heatmap generation.")
}

# Filter out rows with missing Orthogroup values
final_heatmap_data <- final_heatmap_data %>%
    filter(!is.na(Orthogroup))

# Check if there are any missing values in log2FoldChange (optional, just in case)
final_heatmap_data <- final_heatmap_data %>%
    filter(!is.na(log2FoldChange))

# Create heatmap matrix using Orthogroup instead of GeneID
heatmap_matrix <- final_heatmap_data %>%
    group_by(Orthogroup, Species) %>%
    summarize(
        Thorax_Combined = sum(log2FoldChange[Tissue == "Thorax"], na.rm = TRUE),
        .groups = 'drop'
    ) %>%
    pivot_wider(names_from = Species, 
                values_from = Thorax_Combined, 
                values_fill = list(Thorax_Combined = 0)) %>%
    column_to_rownames("Orthogroup") %>%
    as.matrix()

# Check if heatmap_matrix is empty
if (nrow(heatmap_matrix) == 0) {
    stop("No valid data available for heatmap matrix.")
}

color_palette <- c("cyan", "black", "orange3", "orange")
color_palette2 <- c("blue2", "white", "red3", "red4")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Thorax tissue - STRATEGY 2"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Thorax tissue - STRATEGY 2"
)

piceifrons-americana-cubense

Head tissues

# Define the species for PACclade
PACclade <- c("piceifrons", "americana", "cubense")
input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Function to load DEGs for a given group of species (PACclade)
load_deg_data <- function(PACclade, allspecies_df, filtered_final_orthotable) {
    degs_up <- list()
    degs_down <- list()
    degs_all <- list()
    
    # Rename the "gene_id" column in filtered_final_orthotable for consistency
    colnames(filtered_final_orthotable)[colnames(filtered_final_orthotable) == "gene_id"] <- "GeneID"
    
    for (species in PACclade) {
        head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_", species, ".csv"))
        
        # Check if the file exists
        if (!file.exists(head_file)) {
            message(paste("File not found for species:", species))
            next  # Skip this iteration if the file is missing
        }
        
        # Read the data
        head_data <- read.csv(head_file, stringsAsFactors = FALSE)
        
        # Rename the "X" column to "GeneID"
        colnames(head_data)[colnames(head_data) == "X"] <- "GeneID"
        
        # Merge DEG data with GeneType and Orthogroup information
        head_data_merged <- merge(head_data, allspecies_df[, c("GeneID", "GeneType", "Species")], by = "GeneID")
        head_data_merged <- merge(head_data_merged, filtered_final_orthotable[, c("GeneID", "Orthogroup")], by = "GeneID")
        
        # Handle missing Orthogroups
        head_data_merged$Orthogroup[is.na(head_data_merged$Orthogroup)] <- "Unknown"
        
        # Filter for significant DEGs (both upregulated and downregulated)
        head_up <- head_data_merged %>%
            filter(padj < 0.05 & log2FoldChange >= 1) %>%
            select(Orthogroup) %>%
            distinct()
        
        head_down <- head_data_merged %>%
            filter(padj < 0.05 & log2FoldChange <= -1) %>%
            select(Orthogroup) %>%
            distinct()
        
        all_deg <- head_data_merged %>%
            filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
            select(Orthogroup) %>%
            distinct()
        
        # Store the DEGs in the list
        degs_up[[species]] <- head_up$Orthogroup
        degs_down[[species]] <- head_down$Orthogroup
        degs_all[[species]] <- all_deg$Orthogroup
    }
    
    return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Function to display Venn diagram and corresponding datatable based on Orthogroups
display_venn_with_datatable <- function(venn_data, title, allspecies_df, filtered_final_orthotable) {
    # Calculate the overlapping Orthogroups
    overlap_orthogroups <- Reduce(intersect, venn_data)
    
    # If no overlapping Orthogroups, we still display the Venn diagram, but inform the user
    if(length(overlap_orthogroups) == 0) {
        cat("No overlapping Orthogroups found for", title, "\n")
    } else {
        cat("Overlapping Orthogroups: \n")
        print(overlap_orthogroups)
    }
    
    # Create a data frame for the overlapping Orthogroups (even if it's empty)
    overlap_df <- data.frame(Orthogroup = overlap_orthogroups)
    
    # Rename GeneID column to avoid conflict in allspecies_df
    colnames(allspecies_df)[colnames(allspecies_df) == "GeneID"] <- "GeneID_allspecies"
    
    # Merge to get species and other information from filtered_final_orthotable
    meta_brock_df <- merge(overlap_df, filtered_final_orthotable, by = "Orthogroup", all.x = TRUE)
    
    # Check if the merge produced any data
    if (nrow(meta_brock_df) == 0) {
        cat("No matching rows after merging Orthogroups for", title, "\n")
    }
    
    # Rename gene_id column to GeneID_meta_brock
    colnames(meta_brock_df)[colnames(meta_brock_df) == "gene_id"] <- "GeneID_meta_brock"
    
    # Ensure that GeneID columns are the same type (character)
    meta_brock_df$GeneID_meta_brock <- as.character(meta_brock_df$GeneID_meta_brock)
    allspecies_df$GeneID_allspecies <- as.character(allspecies_df$GeneID_allspecies)
    
    # Perform the merge with renamed columns
    meta_brock_df <- merge(meta_brock_df, allspecies_df, by.x = "GeneID_meta_brock", by.y = "GeneID_allspecies", all.x = TRUE)
    
    # Generate the Venn diagram
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("piceifrons", "americana", "cubense"), 
        filename = NULL, 
        output = TRUE,
        fill = c("red", "green", "yellow"),
        alpha = 0.5,
        cex = 1.2,
        cat.cex = 0,
        main = title,
        main.cex = 1.2
    )
    
    # Clear the current plotting area before drawing the Venn diagram
    grid.newpage()
    
    # Display the Venn diagram
    grid.draw(venn_plot)
    
    # Manually create a custom legend
    legend_labels <- c("piceifrons", "americana", "cubense")
    legend_colors <- c("red", "green", "yellow")
    
    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")   # Lower the legend vertically
    
    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }
    
    # Display the merged overlapping Orthogroups table with datatable
    datatable(meta_brock_df, options = list(
        pageLength = 10,
        scrollX = TRUE,
        autoWidth = TRUE,
        searchHighlight = TRUE
    ),
    rownames = FALSE,
    escape = FALSE
    ) %>%
        formatStyle(
            'Species.x', target = 'cell',
            fontStyle = 'italic'
        ) %>%
        formatStyle(
            columns = names(meta_brock_df), 
            target = 'row',
            color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
            fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
            backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
        )
}

# Example for testing with your data (for PACclade)
venn_data_pacclade <- load_deg_data(PACclade, allspecies_df, filtered_final_orthotable)

# Prepare the data for the Venn diagrams for PACclade
venn_data_up <- list(
  piceifrons = venn_data_pacclade$up[["piceifrons"]],
  americana = venn_data_pacclade$up[["americana"]],
  cubense = venn_data_pacclade$up[["cubense"]]
)

venn_data_down <- list(
  piceifrons = venn_data_pacclade$down[["piceifrons"]],
  americana = venn_data_pacclade$down[["americana"]],
  cubense = venn_data_pacclade$down[["cubense"]]
)

venn_data_all <- list(
  piceifrons = venn_data_pacclade$all[["piceifrons"]],
  americana = venn_data_pacclade$all[["americana"]],
  cubense = venn_data_pacclade$all[["cubense"]]
)

# Display the Venn diagram and datatable for head upregulated DEGs (PACclade)
display_venn_with_datatable(venn_data_up, "Venn Diagram of Head Upregulated DEGs - PAC", allspecies_df, filtered_final_orthotable)
No overlapping Orthogroups found for Venn Diagram of Head Upregulated DEGs - PAC 
No matching rows after merging Orthogroups for Venn Diagram of Head Upregulated DEGs - PAC 

# Display the Venn diagram and datatable for head downregulated DEGs (PACclade)
display_venn_with_datatable(venn_data_down, "Venn Diagram of Head Downregulated DEGs - PAC", allspecies_df, filtered_final_orthotable)
No overlapping Orthogroups found for Venn Diagram of Head Downregulated DEGs - PAC 
No matching rows after merging Orthogroups for Venn Diagram of Head Downregulated DEGs - PAC 

# Display the Venn diagram and datatable for all significant DEGs (PACclade)
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Head DEGs - PAC", allspecies_df, filtered_final_orthotable)
No overlapping Orthogroups found for Venn Diagram of All Head DEGs - PAC 
No matching rows after merging Orthogroups for Venn Diagram of All Head DEGs - PAC 

# Define the species for Group 1
PACclade <- c("piceifrons", "americana", "cubense")
input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in PACclade) {
  # Load DESeq2 results for head
  head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_", species, ".csv"))
  
  # Load the DESeq2 results
  head_data <- read.csv(head_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(head_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Rename the "gene_id" column in filtered_final_orthotable for consistency
  colnames(filtered_final_orthotable)[colnames(filtered_final_orthotable) == "gene_id"] <- "GeneID"
  
  # Merge with filtered_final_orthotable to include Orthogroup
  merged_data <- merge(head_data, filtered_final_orthotable, by = "GeneID", all.x = TRUE)
  
  # Check if merge was successful
  if (nrow(merged_data) == 0) {
    message(paste("No matching data for species:", species))
    next  # Skip if no matching data after merging
  }

  # Filter for significant DEGs and select top 500 upregulated and downregulated genes for each tissue
  head_up <- merged_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  head_down <- merged_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    head_up %>% mutate(Tissue = "Head", Regulation = "Upregulated", Species = species),
    head_down %>% mutate(Tissue = "Head", Regulation = "Downregulated", Species = species)
  ) %>%
    select(Orthogroup, log2FoldChange, Tissue, Regulation, Species)
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
    stop("No valid data available for heatmap generation.")
}

# Filter out rows with missing Orthogroup values
final_heatmap_data <- final_heatmap_data %>%
    filter(!is.na(Orthogroup))

# Check if there are any missing values in log2FoldChange (optional, just in case)
final_heatmap_data <- final_heatmap_data %>%
    filter(!is.na(log2FoldChange))

# Create heatmap matrix using Orthogroup instead of GeneID
heatmap_matrix <- final_heatmap_data %>%
    group_by(Orthogroup, Species) %>%
    summarize(
        Head_Combined = sum(log2FoldChange[Tissue == "Head"], na.rm = TRUE),
        .groups = 'drop'
    ) %>%
    pivot_wider(names_from = Species, 
                values_from = Head_Combined, 
                values_fill = list(Head_Combined = 0)) %>%
    column_to_rownames("Orthogroup") %>%
    as.matrix()

# Check if heatmap_matrix is empty
if (nrow(heatmap_matrix) == 0) {
    stop("No valid data available for heatmap matrix.")
}

color_palette <- c("cyan", "cyan3", "cyan4", "black", "orange3", "orange")
color_palette2 <- c("blue3", "blue2","blue",  "white", "red3", "red4")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Head tissue - STRATEGY 2"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Head tissue - STRATEGY 2"
)

Thorax tissues

# Define the species for PACclade
PACclade <- c("piceifrons", "americana", "cubense")
input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Function to load DEGs for a given group of species (PACclade)
load_deg_data <- function(PACclade, allspecies_df, filtered_final_orthotable) {
    degs_up <- list()
    degs_down <- list()
    degs_all <- list()
    
    # Rename the "gene_id" column in filtered_final_orthotable for consistency
    colnames(filtered_final_orthotable)[colnames(filtered_final_orthotable) == "gene_id"] <- "GeneID"
    
    for (species in PACclade) {
        thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_", species, ".csv"))
        
        # Check if the file exists
        if (!file.exists(head_file)) {
            message(paste("File not found for species:", species))
            next  # Skip this iteration if the file is missing
        }
        
        # Read the data
        thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
        
        # Rename the "X" column to "GeneID"
        colnames(thorax_data)[colnames(thorax_data) == "X"] <- "GeneID"
        
        # Merge DEG data with GeneType and Orthogroup information
        thorax_data_merged <- merge(thorax_data, allspecies_df[, c("GeneID", "GeneType", "Species")], by = "GeneID")
        thorax_data_merged <- merge(thorax_data_merged, filtered_final_orthotable[, c("GeneID", "Orthogroup")], by = "GeneID")
        
        # Handle missing Orthogroups
        thorax_data_merged$Orthogroup[is.na(thorax_data_merged$Orthogroup)] <- "Unknown"
        
        # Filter for significant DEGs (both upregulated and downregulated)
        thorax_up <- thorax_data_merged %>%
            filter(padj < 0.05 & log2FoldChange >= 1) %>%
            select(Orthogroup) %>%
            distinct()
        
        thorax_down <- thorax_data_merged %>%
            filter(padj < 0.05 & log2FoldChange <= -1) %>%
            select(Orthogroup) %>%
            distinct()
        
        all_deg <- thorax_data_merged %>%
            filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
            select(Orthogroup) %>%
            distinct()
        
        # Store the DEGs in the list
        degs_up[[species]] <- thorax_up$Orthogroup
        degs_down[[species]] <- thorax_down$Orthogroup
        degs_all[[species]] <- all_deg$Orthogroup
    }
    
    return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Function to display Venn diagram and corresponding datatable based on Orthogroups
display_venn_with_datatable <- function(venn_data, title, allspecies_df, filtered_final_orthotable) {
    # Calculate the overlapping Orthogroups
    overlap_orthogroups <- Reduce(intersect, venn_data)
    
    # If no overlapping Orthogroups, we still display the Venn diagram, but inform the user
    if(length(overlap_orthogroups) == 0) {
        cat("No overlapping Orthogroups found for", title, "\n")
    } else {
        cat("Overlapping Orthogroups: \n")
        print(overlap_orthogroups)
    }
    
    # Create a data frame for the overlapping Orthogroups (even if it's empty)
    overlap_df <- data.frame(Orthogroup = overlap_orthogroups)
    
    # Rename GeneID column to avoid conflict in allspecies_df
    colnames(allspecies_df)[colnames(allspecies_df) == "GeneID"] <- "GeneID_allspecies"
    
    # Merge to get species and other information from filtered_final_orthotable
    meta_brock_df <- merge(overlap_df, filtered_final_orthotable, by = "Orthogroup", all.x = TRUE)
    
    # Check if the merge produced any data
    if (nrow(meta_brock_df) == 0) {
        cat("No matching rows after merging Orthogroups for", title, "\n")
    }
    
    # Rename gene_id column to GeneID_meta_brock
    colnames(meta_brock_df)[colnames(meta_brock_df) == "gene_id"] <- "GeneID_meta_brock"
    
    # Ensure that GeneID columns are the same type (character)
    meta_brock_df$GeneID_meta_brock <- as.character(meta_brock_df$GeneID_meta_brock)
    allspecies_df$GeneID_allspecies <- as.character(allspecies_df$GeneID_allspecies)
    
    # Perform the merge with renamed columns
    meta_brock_df <- merge(meta_brock_df, allspecies_df, by.x = "GeneID_meta_brock", by.y = "GeneID_allspecies", all.x = TRUE)
    
    # Generate the Venn diagram
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("piceifrons", "americana", "cubense"), 
        filename = NULL, 
        output = TRUE,
        fill = c("red", "green", "yellow"),
        alpha = 0.5,
        cex = 1.2,
        cat.cex = 0,
        main = title,
        main.cex = 1.2
    )
    
    # Clear the current plotting area before drawing the Venn diagram
    grid.newpage()
    
    # Display the Venn diagram
    grid.draw(venn_plot)
    
    # Manually create a custom legend
    legend_labels <- c("piceifrons", "americana", "cubense")
    legend_colors <- c("red", "green", "yellow")
    
    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")   # Lower the legend vertically
    
    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }
    
    # Display the merged overlapping Orthogroups table with datatable
    datatable(meta_brock_df, options = list(
        pageLength = 10,
        scrollX = TRUE,
        autoWidth = TRUE,
        searchHighlight = TRUE
    ),
    rownames = FALSE,
    escape = FALSE
    ) %>%
        formatStyle(
            'Species.x', target = 'cell',
            fontStyle = 'italic'
        ) %>%
        formatStyle(
            columns = names(meta_brock_df), 
            target = 'row',
            color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
            fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
            backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
        )
}

# Example for testing with your data (for PACclade)
venn_data_pacclade <- load_deg_data(PACclade, allspecies_df, filtered_final_orthotable)

# Prepare the data for the Venn diagrams for PACclade
venn_data_up <- list(
  piceifrons = venn_data_pacclade$up[["piceifrons"]],
  americana = venn_data_pacclade$up[["americana"]],
  cubense = venn_data_pacclade$up[["cubense"]]
)

venn_data_down <- list(
  piceifrons = venn_data_pacclade$down[["piceifrons"]],
  americana = venn_data_pacclade$down[["americana"]],
  cubense = venn_data_pacclade$down[["cubense"]]
)

venn_data_all <- list(
  piceifrons = venn_data_pacclade$all[["piceifrons"]],
  americana = venn_data_pacclade$all[["americana"]],
  cubense = venn_data_pacclade$all[["cubense"]]
)

# Display the Venn diagram and datatable for thorax upregulated DEGs (PACclade)
display_venn_with_datatable(venn_data_up, "Venn Diagram of Thorax Upregulated DEGs - PAC", allspecies_df, filtered_final_orthotable)
No overlapping Orthogroups found for Venn Diagram of Thorax Upregulated DEGs - PAC 
No matching rows after merging Orthogroups for Venn Diagram of Thorax Upregulated DEGs - PAC 

# Display the Venn diagram and datatable for head downregulated DEGs (PACclade)
display_venn_with_datatable(venn_data_down, "Venn Diagram of Thorax Downregulated DEGs - PAC", allspecies_df, filtered_final_orthotable)
No overlapping Orthogroups found for Venn Diagram of Thorax Downregulated DEGs - PAC 
No matching rows after merging Orthogroups for Venn Diagram of Thorax Downregulated DEGs - PAC 

# Display the Venn diagram and datatable for all significant DEGs (PACclade)
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Thorax DEGs - PAC", allspecies_df, filtered_final_orthotable)
No overlapping Orthogroups found for Venn Diagram of All Thorax DEGs - PAC 
No matching rows after merging Orthogroups for Venn Diagram of All Thorax DEGs - PAC 

# Define the species for PACclade
PACclade <- c("piceifrons", "americana", "cubense")
input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in PACclade) {
  # Load DESeq2 results for head
  thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_", species, ".csv"))
  
  # Load the DESeq2 results
  thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Rename the "gene_id" column in filtered_final_orthotable for consistency
  colnames(filtered_final_orthotable)[colnames(filtered_final_orthotable) == "gene_id"] <- "GeneID"
  
  # Merge with filtered_final_orthotable to include Orthogroup
  merged_data <- merge(thorax_data, filtered_final_orthotable, by = "GeneID", all.x = TRUE)
  
  # Check if merge was successful
  if (nrow(merged_data) == 0) {
    message(paste("No matching data for species:", species))
    next  # Skip if no matching data after merging
  }

  # Filter for significant DEGs and select top 500 upregulated and downregulated genes for each tissue
  thorax_up <- merged_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  thorax_down <- merged_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    thorax_up %>% mutate(Tissue = "Thorax", Regulation = "Upregulated", Species = species),
    thorax_down %>% mutate(Tissue = "Thorax", Regulation = "Downregulated", Species = species)
  ) %>%
    select(Orthogroup, log2FoldChange, Tissue, Regulation, Species)
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
    stop("No valid data available for heatmap generation.")
}

# Filter out rows with missing Orthogroup values
final_heatmap_data <- final_heatmap_data %>%
    filter(!is.na(Orthogroup))

# Check if there are any missing values in log2FoldChange (optional, just in case)
final_heatmap_data <- final_heatmap_data %>%
    filter(!is.na(log2FoldChange))

# Create heatmap matrix using Orthogroup instead of GeneID
heatmap_matrix <- final_heatmap_data %>%
    group_by(Orthogroup, Species) %>%
    summarize(
        Thorax_Combined = sum(log2FoldChange[Tissue == "Thorax"], na.rm = TRUE),
        .groups = 'drop'
    ) %>%
    pivot_wider(names_from = Species, 
                values_from = Thorax_Combined, 
                values_fill = list(Thorax_Combined = 0)) %>%
    column_to_rownames("Orthogroup") %>%
    as.matrix()

# Check if heatmap_matrix is empty
if (nrow(heatmap_matrix) == 0) {
    stop("No valid data available for heatmap matrix.")
}

color_palette <- c("cyan", "black", "orange3", "orange")
color_palette2 <- c("blue2", "white", "red3", "red4")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Thorax tissue - STRATEGY 2"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Thorax tissue - STRATEGY 2"
)

Plastic species

Head tissues

# Define the species for plastic_species
plastic_species <- c("gregaria", "piceifrons", "cancellata", "americana")
input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Function to load DEGs for a given group of species (plastic_species)
load_deg_data <- function(plastic_species, allspecies_df, filtered_final_orthotable) {
    degs_up <- list()
    degs_down <- list()
    degs_all <- list()
    
    # Rename the "gene_id" column in filtered_final_orthotable for consistency
    colnames(filtered_final_orthotable)[colnames(filtered_final_orthotable) == "gene_id"] <- "GeneID"
    
    for (species in plastic_species) {
        head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_", species, ".csv"))
        
        # Check if the file exists
        if (!file.exists(head_file)) {
            message(paste("File not found for species:", species))
            next  # Skip this iteration if the file is missing
        }
        
        # Read the data
        head_data <- read.csv(head_file, stringsAsFactors = FALSE)
        
        # Rename the "X" column to "GeneID"
        colnames(head_data)[colnames(head_data) == "X"] <- "GeneID"
        
        # Merge DEG data with GeneType and Orthogroup information
        head_data_merged <- merge(head_data, allspecies_df[, c("GeneID", "GeneType", "Species")], by = "GeneID")
        head_data_merged <- merge(head_data_merged, filtered_final_orthotable[, c("GeneID", "Orthogroup")], by = "GeneID")
        
        # Handle missing Orthogroups
        head_data_merged$Orthogroup[is.na(head_data_merged$Orthogroup)] <- "Unknown"
        
        # Filter for significant DEGs (both upregulated and downregulated)
        head_up <- head_data_merged %>%
            filter(padj < 0.05 & log2FoldChange >= 1) %>%
            select(Orthogroup) %>%
            distinct()
        
        head_down <- head_data_merged %>%
            filter(padj < 0.05 & log2FoldChange <= -1) %>%
            select(Orthogroup) %>%
            distinct()
        
        all_deg <- head_data_merged %>%
            filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
            select(Orthogroup) %>%
            distinct()
        
        # Store the DEGs in the list
        degs_up[[species]] <- head_up$Orthogroup
        degs_down[[species]] <- head_down$Orthogroup
        degs_all[[species]] <- all_deg$Orthogroup
    }
    
    return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Function to display Venn diagram and corresponding datatable based on Orthogroups
display_venn_with_datatable <- function(venn_data, title, allspecies_df, filtered_final_orthotable) {
    # Calculate the overlapping Orthogroups
    overlap_orthogroups <- Reduce(intersect, venn_data)
    
    # If no overlapping Orthogroups, we still display the Venn diagram, but inform the user
    if(length(overlap_orthogroups) == 0) {
        cat("No overlapping Orthogroups found for", title, "\n")
    } else {
        cat("Overlapping Orthogroups: \n")
        print(overlap_orthogroups)
    }
    
    # Create a data frame for the overlapping Orthogroups (even if it's empty)
    overlap_df <- data.frame(Orthogroup = overlap_orthogroups)
    
    # Rename GeneID column to avoid conflict in allspecies_df
    colnames(allspecies_df)[colnames(allspecies_df) == "GeneID"] <- "GeneID_allspecies"
    
    # Merge to get species and other information from filtered_final_orthotable
    meta_brock_df <- merge(overlap_df, filtered_final_orthotable, by = "Orthogroup", all.x = TRUE)
    
    # Check if the merge produced any data
    if (nrow(meta_brock_df) == 0) {
        cat("No matching rows after merging Orthogroups for", title, "\n")
    }
    
    # Rename gene_id column to GeneID_meta_brock
    colnames(meta_brock_df)[colnames(meta_brock_df) == "gene_id"] <- "GeneID_meta_brock"
    
    # Ensure that GeneID columns are the same type (character)
    meta_brock_df$GeneID_meta_brock <- as.character(meta_brock_df$GeneID_meta_brock)
    allspecies_df$GeneID_allspecies <- as.character(allspecies_df$GeneID_allspecies)
    
    # Perform the merge with renamed columns
    meta_brock_df <- merge(meta_brock_df, allspecies_df, by.x = "GeneID_meta_brock", by.y = "GeneID_allspecies", all.x = TRUE)
    
    # Generate the Venn diagram
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("piceifrons", "americana", "cubense", "gregaria"), 
        filename = NULL, 
        output = TRUE,
        fill = c("red", "green", "yellow", "orange"),
        alpha = 0.5,
        cex = 1.2,
        cat.cex = 0,
        main = title,
        main.cex = 1.2
    )
    
    # Clear the current plotting area before drawing the Venn diagram
    grid.newpage()
    
    # Display the Venn diagram
    grid.draw(venn_plot)
    
    # Manually create a custom legend
    legend_labels <- c("piceifrons", "americana", "cubense", "gregaria")
    legend_colors <- c("red", "green", "yellow", "orange")
    
    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")   # Lower the legend vertically
    
    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }
    
    # Display the merged overlapping Orthogroups table with datatable
    datatable(meta_brock_df, options = list(
        pageLength = 10,
        scrollX = TRUE,
        autoWidth = TRUE,
        searchHighlight = TRUE
    ),
    rownames = FALSE,
    escape = FALSE
    ) %>%
        formatStyle(
            'Species.x', target = 'cell',
            fontStyle = 'italic'
        ) %>%
        formatStyle(
            columns = names(meta_brock_df), 
            target = 'row',
            color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
            fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
            backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
        )
}

# Example for testing with your data (for plastic_species)
venn_data_plastic_species <- load_deg_data(plastic_species, allspecies_df, filtered_final_orthotable)

# Prepare the data for the Venn diagrams for plastic_species
venn_data_up <- list(
  gregaria = venn_data_plastic_species$up[["gregaria"]],
  piceifrons = venn_data_plastic_species$up[["piceifrons"]],
  cancellata = venn_data_plastic_species$up[["cancellata"]],
  americana = venn_data_plastic_species$up[["americana"]]
)

venn_data_down <- list(
  gregaria = venn_data_plastic_species$down[["gregaria"]],
  piceifrons = venn_data_plastic_species$down[["piceifrons"]],
  cancellata = venn_data_plastic_species$down[["cancellata"]],
  americana = venn_data_plastic_species$down[["americana"]]
)

venn_data_all <- list(
  gregaria = venn_data_plastic_species$all[["gregaria"]],
  piceifrons = venn_data_plastic_species$all[["piceifrons"]],
  cancellata = venn_data_plastic_species$all[["cancellata"]],
  americana = venn_data_plastic_species$all[["americana"]]
)

# Display the Venn diagram and datatable for head upregulated DEGs (plastic_species)
display_venn_with_datatable(venn_data_up, "Venn Diagram of Head Upregulated DEGs - Plastic Species", allspecies_df, filtered_final_orthotable)
No overlapping Orthogroups found for Venn Diagram of Head Upregulated DEGs - Plastic Species 
No matching rows after merging Orthogroups for Venn Diagram of Head Upregulated DEGs - Plastic Species 

# Display the Venn diagram and datatable for head downregulated DEGs (plastic_species)
display_venn_with_datatable(venn_data_down, "Venn Diagram of Head Downregulated DEGs - Plastic Species", allspecies_df, filtered_final_orthotable)
No overlapping Orthogroups found for Venn Diagram of Head Downregulated DEGs - Plastic Species 
No matching rows after merging Orthogroups for Venn Diagram of Head Downregulated DEGs - Plastic Species 

# Display the Venn diagram and datatable for all significant DEGs (plastic_species)
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Head DEGs - Plastic Species", allspecies_df, filtered_final_orthotable)
No overlapping Orthogroups found for Venn Diagram of All Head DEGs - Plastic Species 
No matching rows after merging Orthogroups for Venn Diagram of All Head DEGs - Plastic Species 

# Define the species for Group 1
plastic_species <- c("gregaria", "piceifrons", "cancellata", "americana")
input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in plastic_species) {
  # Load DESeq2 results for head
  head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_", species, ".csv"))
  
  # Load the DESeq2 results
  head_data <- read.csv(head_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(head_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Rename the "gene_id" column in filtered_final_orthotable for consistency
  colnames(filtered_final_orthotable)[colnames(filtered_final_orthotable) == "gene_id"] <- "GeneID"
  
  # Merge with filtered_final_orthotable to include Orthogroup
  merged_data <- merge(head_data, filtered_final_orthotable, by = "GeneID", all.x = TRUE)
  
  # Check if merge was successful
  if (nrow(merged_data) == 0) {
    message(paste("No matching data for species:", species))
    next  # Skip if no matching data after merging
  }

  # Filter for significant DEGs and select top 500 upregulated and downregulated genes for each tissue
  head_up <- merged_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  head_down <- merged_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    head_up %>% mutate(Tissue = "Head", Regulation = "Upregulated", Species = species),
    head_down %>% mutate(Tissue = "Head", Regulation = "Downregulated", Species = species)
  ) %>%
    select(Orthogroup, log2FoldChange, Tissue, Regulation, Species)
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
    stop("No valid data available for heatmap generation.")
}

# Filter out rows with missing Orthogroup values
final_heatmap_data <- final_heatmap_data %>%
    filter(!is.na(Orthogroup))

# Check if there are any missing values in log2FoldChange (optional, just in case)
final_heatmap_data <- final_heatmap_data %>%
    filter(!is.na(log2FoldChange))

# Create heatmap matrix using Orthogroup instead of GeneID
heatmap_matrix <- final_heatmap_data %>%
    group_by(Orthogroup, Species) %>%
    summarize(
        Head_Combined = sum(log2FoldChange[Tissue == "Head"], na.rm = TRUE),
        .groups = 'drop'
    ) %>%
    pivot_wider(names_from = Species, 
                values_from = Head_Combined, 
                values_fill = list(Head_Combined = 0)) %>%
    column_to_rownames("Orthogroup") %>%
    as.matrix()

# Check if heatmap_matrix is empty
if (nrow(heatmap_matrix) == 0) {
    stop("No valid data available for heatmap matrix.")
}

color_palette <- c("cyan", "cyan3", "cyan4", "black",  "orange")
color_palette2 <- c("blue3", "blue2","blue",  "white",  "red4")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Head tissue - STRATEGY 2"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Head tissue - STRATEGY 2"
)

Thorax tissues

# Define the species for plastic_species
plastic_species <- c("gregaria", "piceifrons", "cancellata", "americana")
input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Function to load DEGs for a given group of species (plastic_species)
load_deg_data <- function(plastic_species, allspecies_df, filtered_final_orthotable) {
    degs_up <- list()
    degs_down <- list()
    degs_all <- list()
    
    # Rename the "gene_id" column in filtered_final_orthotable for consistency
    colnames(filtered_final_orthotable)[colnames(filtered_final_orthotable) == "gene_id"] <- "GeneID"
    
    for (species in plastic_species) {
        thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_", species, ".csv"))
        
        # Check if the file exists
        if (!file.exists(thorax_file)) {
            message(paste("File not found for species:", species))
            next  # Skip this iteration if the file is missing
        }
        
        # Read the data
        thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
        
        # Rename the "X" column to "GeneID"
        colnames(thorax_data)[colnames(thorax_data) == "X"] <- "GeneID"
        
        # Merge DEG data with GeneType and Orthogroup information
        thorax_data_merged <- merge(thorax_data, allspecies_df[, c("GeneID", "GeneType", "Species")], by = "GeneID")
        thorax_data_merged <- merge(thorax_data_merged, filtered_final_orthotable[, c("GeneID", "Orthogroup")], by = "GeneID")
        
        # Handle missing Orthogroups
        thorax_data_merged$Orthogroup[is.na(thorax_data_merged$Orthogroup)] <- "Unknown"
        
        # Filter for significant DEGs (both upregulated and downregulated)
        thorax_up <- thorax_data_merged %>%
            filter(padj < 0.05 & log2FoldChange >= 1) %>%
            select(Orthogroup) %>%
            distinct()
        
        thorax_down <- thorax_data_merged %>%
            filter(padj < 0.05 & log2FoldChange <= -1) %>%
            select(Orthogroup) %>%
            distinct()
        
        all_deg <- thorax_data_merged %>%
            filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
            select(Orthogroup) %>%
            distinct()
        
        # Store the DEGs in the list
        degs_up[[species]] <- thorax_up$Orthogroup
        degs_down[[species]] <- thorax_down$Orthogroup
        degs_all[[species]] <- all_deg$Orthogroup
    }
    
    return(list(up = degs_up, down = degs_down, all = degs_all))
}

# Function to display Venn diagram and corresponding datatable based on Orthogroups
display_venn_with_datatable <- function(venn_data, title, allspecies_df, filtered_final_orthotable) {
    # Calculate the overlapping Orthogroups
    overlap_orthogroups <- Reduce(intersect, venn_data)
    
    # If no overlapping Orthogroups, we still display the Venn diagram, but inform the user
    if(length(overlap_orthogroups) == 0) {
        cat("No overlapping Orthogroups found for", title, "\n")
    } else {
        cat("Overlapping Orthogroups: \n")
        print(overlap_orthogroups)
    }
    
    # Create a data frame for the overlapping Orthogroups (even if it's empty)
    overlap_df <- data.frame(Orthogroup = overlap_orthogroups)
    
    # Rename GeneID column to avoid conflict in allspecies_df
    colnames(allspecies_df)[colnames(allspecies_df) == "GeneID"] <- "GeneID_allspecies"
    
    # Merge to get species and other information from filtered_final_orthotable
    meta_brock_df <- merge(overlap_df, filtered_final_orthotable, by = "Orthogroup", all.x = TRUE)
    
    # Check if the merge produced any data
    if (nrow(meta_brock_df) == 0) {
        cat("No matching rows after merging Orthogroups for", title, "\n")
    }
    
    # Rename gene_id column to GeneID_meta_brock
    colnames(meta_brock_df)[colnames(meta_brock_df) == "gene_id"] <- "GeneID_meta_brock"
    
    # Ensure that GeneID columns are the same type (character)
    meta_brock_df$GeneID_meta_brock <- as.character(meta_brock_df$GeneID_meta_brock)
    allspecies_df$GeneID_allspecies <- as.character(allspecies_df$GeneID_allspecies)
    
    # Perform the merge with renamed columns
    meta_brock_df <- merge(meta_brock_df, allspecies_df, by.x = "GeneID_meta_brock", by.y = "GeneID_allspecies", all.x = TRUE)
    
    # Generate the Venn diagram
    venn_plot <- venn.diagram(
        x = venn_data, 
        category.names = c("piceifrons", "americana", "cubense", "gregaria"), 
        filename = NULL, 
        output = TRUE,
        fill = c("red", "green", "yellow", "orange"),
        alpha = 0.5,
        cex = 1.2,
        cat.cex = 0,
        main = title,
        main.cex = 1.2
    )
    
    # Clear the current plotting area before drawing the Venn diagram
    grid.newpage()
    
    # Display the Venn diagram
    grid.draw(venn_plot)
    
    # Manually create a custom legend
    legend_labels <- c("piceifrons", "americana", "cubense", "gregaria")
    legend_colors <- c("red", "green", "yellow", "orange")
    
    # Positioning the legend lower on the right side of the plot
    legend_x <- unit(0.85, "npc")  # Adjust x position
    legend_y <- unit(0.2, "npc")   # Lower the legend vertically
    
    # Draw the legend
    for (i in 1:length(legend_labels)) {
        grid.rect(x = legend_x, y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  width = unit(0.02, "npc"), height = unit(0.02, "npc"), 
                  gp = gpar(fill = legend_colors[i], col = NA))
        grid.text(label = legend_labels[i], x = legend_x + unit(0.05, "npc"), 
                  y = legend_y - unit((i - 1) * 0.05, "npc"), 
                  just = "left", gp = gpar(cex = 0.8))
    }
    
    # Display the merged overlapping Orthogroups table with datatable
    datatable(meta_brock_df, options = list(
        pageLength = 10,
        scrollX = TRUE,
        autoWidth = TRUE,
        searchHighlight = TRUE
    ),
    rownames = FALSE,
    escape = FALSE
    ) %>%
        formatStyle(
            'Species.x', target = 'cell',
            fontStyle = 'italic'
        ) %>%
        formatStyle(
            columns = names(meta_brock_df), 
            target = 'row',
            color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
            fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal")),
            backgroundColor = styleEqual(c("red", "blue", "black"), c("white", "white", "white"))
        )
}

# Example for testing with your data (for plastic_species)
venn_data_plastic_species <- load_deg_data(plastic_species, allspecies_df, filtered_final_orthotable)

# Prepare the data for the Venn diagrams for plastic_species
venn_data_up <- list(
  gregaria = venn_data_plastic_species$up[["gregaria"]],
  piceifrons = venn_data_plastic_species$up[["piceifrons"]],
  cancellata = venn_data_plastic_species$up[["cancellata"]],
  americana = venn_data_plastic_species$up[["americana"]]
)

venn_data_down <- list(
  gregaria = venn_data_plastic_species$down[["gregaria"]],
  piceifrons = venn_data_plastic_species$down[["piceifrons"]],
  cancellata = venn_data_plastic_species$down[["cancellata"]],
  americana = venn_data_plastic_species$down[["americana"]]
)

venn_data_all <- list(
  gregaria = venn_data_plastic_species$all[["gregaria"]],
  piceifrons = venn_data_plastic_species$all[["piceifrons"]],
  cancellata = venn_data_plastic_species$all[["cancellata"]],
  americana = venn_data_plastic_species$all[["americana"]]
)

# Display the Venn diagram and datatable for thorax upregulated DEGs (plastic_species)
display_venn_with_datatable(venn_data_up, "Venn Diagram of Thorax Upregulated DEGs - Plastic Species", allspecies_df, filtered_final_orthotable)
No overlapping Orthogroups found for Venn Diagram of Thorax Upregulated DEGs - Plastic Species 
No matching rows after merging Orthogroups for Venn Diagram of Thorax Upregulated DEGs - Plastic Species 

# Display the Venn diagram and datatable for head downregulated DEGs (plastic_species)
display_venn_with_datatable(venn_data_down, "Venn Diagram of Thorax Downregulated DEGs - Plastic Species", allspecies_df, filtered_final_orthotable)
No overlapping Orthogroups found for Venn Diagram of Thorax Downregulated DEGs - Plastic Species 
No matching rows after merging Orthogroups for Venn Diagram of Thorax Downregulated DEGs - Plastic Species 

# Display the Venn diagram and datatable for all significant DEGs (plastic_species)
display_venn_with_datatable(venn_data_all, "Venn Diagram of All Thorax DEGs - Plastic Species", allspecies_df, filtered_final_orthotable)
No overlapping Orthogroups found for Venn Diagram of All Thorax DEGs - Plastic Species 
No matching rows after merging Orthogroups for Venn Diagram of All Thorax DEGs - Plastic Species 

# Define the species for PACclade
plastic_species <- c("gregaria", "piceifrons", "cancellata", "americana")
input_file <- file.path(ortho_dir, "Orthogroups_genesprotein_Schisto_Nov2024.txt")
filtered_final_orthotable <- read.table(input_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()

# Loop through each species to process their data
for (species in plastic_species) {
  # Load DESeq2 results for head
  thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_", species, ".csv"))
  
  # Load the DESeq2 results
  thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
  
  # Check if data is empty and handle accordingly
  if (nrow(thorax_data) == 0) {
    message(paste("No data for species:", species))
    next  # Skip to the next species if there's no data
  }
  
  # Rename the "gene_id" column in filtered_final_orthotable for consistency
  colnames(filtered_final_orthotable)[colnames(filtered_final_orthotable) == "gene_id"] <- "GeneID"
  
  # Merge with filtered_final_orthotable to include Orthogroup
  merged_data <- merge(thorax_data, filtered_final_orthotable, by = "GeneID", all.x = TRUE)
  
  # Check if merge was successful
  if (nrow(merged_data) == 0) {
    message(paste("No matching data for species:", species))
    next  # Skip if no matching data after merging
  }

  # Filter for significant DEGs and select top 500 upregulated and downregulated genes for each tissue
  thorax_up <- merged_data %>%
    filter(padj < 0.05 & log2FoldChange > 1) %>%
    arrange(desc(log2FoldChange)) %>%
    slice(1:500)
  
  thorax_down <- merged_data %>%
    filter(padj < 0.05 & log2FoldChange < -1) %>%
    arrange(log2FoldChange) %>%
    slice(1:500)
  
  # Combine data and prepare for heatmap, adding the species column
  heatmap_data <- bind_rows(
    thorax_up %>% mutate(Tissue = "Thorax", Regulation = "Upregulated", Species = species),
    thorax_down %>% mutate(Tissue = "Thorax", Regulation = "Downregulated", Species = species)
  ) %>%
    select(Orthogroup, log2FoldChange, Tissue, Regulation, Species)
  
  # Append the heatmap data to the list
  heatmap_list[[species]] <- heatmap_data
}

# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)

# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
    stop("No valid data available for heatmap generation.")
}

# Filter out rows with missing Orthogroup values
final_heatmap_data <- final_heatmap_data %>%
    filter(!is.na(Orthogroup))

# Check if there are any missing values in log2FoldChange (optional, just in case)
final_heatmap_data <- final_heatmap_data %>%
    filter(!is.na(log2FoldChange))

# Create heatmap matrix using Orthogroup instead of GeneID
heatmap_matrix <- final_heatmap_data %>%
    group_by(Orthogroup, Species) %>%
    summarize(
        Thorax_Combined = sum(log2FoldChange[Tissue == "Thorax"], na.rm = TRUE),
        .groups = 'drop'
    ) %>%
    pivot_wider(names_from = Species, 
                values_from = Thorax_Combined, 
                values_fill = list(Thorax_Combined = 0)) %>%
    column_to_rownames("Orthogroup") %>%
    as.matrix()

# Check if heatmap_matrix is empty
if (nrow(heatmap_matrix) == 0) {
    stop("No valid data available for heatmap matrix.")
}

color_palette <- c("cyan", "black", "orange3", "orange")
color_palette2 <- c("blue2", "white", "red3", "red4")

# Create heatmap
pheatmap(
  heatmap_matrix,
  color = color_palette2,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to TRUE to cluster samples
  show_rownames = FALSE,  # Show gene IDs
  show_colnames = TRUE,  # Show tissue and species names
  fontsize_row = 6,      # Adjust font size for rows if necessary
  fontsize_col = 10,     # Adjust font size for columns if necessary
  main = "Heatmap of Gene Expression of Thorax tissue - STRATEGY 2"
)

pheatmap(
  heatmap_matrix,
  color = color_palette,
  cluster_rows = TRUE,  # Set to TRUE to cluster genes
  cluster_cols = FALSE,  # Set to FALSE to prevent clustering on columns
  show_rownames = FALSE,  # Hide gene IDs
  show_colnames = TRUE,   # Show tissue and species names
  fontsize_row = 6,       # Adjust font size for rows if necessary
  fontsize_col = 10,      # Adjust font size for columns if necessary
  main = "Ordered Heatmap of Gene Expression of Thorax tissue - STRATEGY 2"
)

All species

Combined tissues


sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Phoenix
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] lubridate_1.9.3     forcats_1.0.0       stringr_1.5.1      
 [4] purrr_1.0.2         tidyverse_2.0.0     readr_2.1.5        
 [7] DT_0.33             gridExtra_2.3       VennDiagram_1.7.3  
[10] futile.logger_1.4.3 tibble_3.2.1        kableExtra_1.4.0   
[13] viridis_0.6.5       viridisLite_0.4.2   RColorBrewer_1.1-3 
[16] tidyr_1.3.1         pheatmap_1.0.12     ggVennDiagram_1.5.3
[19] htmlwidgets_1.6.4   plotly_4.10.4       ggplot2_3.5.1      
[22] dplyr_1.1.4         knitr_1.48         

loaded via a namespace (and not attached):
 [1] gtable_0.3.6         xfun_0.49            bslib_0.8.0         
 [4] tzdb_0.4.0           crosstalk_1.2.1      vctrs_0.6.5         
 [7] tools_4.4.1          generics_0.1.3       fansi_1.0.6         
[10] highr_0.11           pkgconfig_2.0.3      data.table_1.16.2   
[13] lifecycle_1.0.4      farver_2.1.2         compiler_4.4.1      
[16] git2r_0.35.0         textshaping_0.4.0    munsell_0.5.1       
[19] httpuv_1.6.15        htmltools_0.5.8.1    sass_0.4.9          
[22] yaml_2.3.10          lazyeval_0.2.2       later_1.3.2         
[25] pillar_1.9.0         jquerylib_0.1.4      whisker_0.4.1       
[28] cachem_1.1.0         tidyselect_1.2.1     digest_0.6.37       
[31] stringi_1.8.4        labeling_0.4.3       rprojroot_2.0.4     
[34] fastmap_1.2.0        colorspace_2.1-1     cli_3.6.3           
[37] magrittr_2.0.3       utf8_1.2.4           withr_3.0.2         
[40] scales_1.3.0         promises_1.3.0       timechange_0.3.0    
[43] rmarkdown_2.29       lambda.r_1.2.4       httr_1.4.7          
[46] workflowr_1.7.1      ragg_1.3.3           hms_1.1.3           
[49] evaluate_1.0.1       rlang_1.1.4          futile.options_1.0.1
[52] Rcpp_1.0.13-1        glue_1.8.0           formatR_1.14        
[55] xml2_1.3.6           svglite_2.1.3        rstudioapi_0.17.1   
[58] jsonlite_1.8.9       R6_2.5.1             systemfonts_1.1.0   
[61] fs_1.6.5