Last updated: 2024-11-07
Checks: 5 2
Knit directory:
locust-comparative-genomics/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20221025) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
| absolute | relative |
|---|---|
| /Users/maevatecher/Library/Mobile Documents/comappleCloudDocs/Documents/GitHub/locust-comparative-genomics/data | data |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 9fb9741. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .RData
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/.Rhistory
Ignored: data/.DS_Store
Ignored: data/.Rhistory
Ignored: data/OLD/.DS_Store
Ignored: data/OLD/DEseq2_SCUBE_SCUBE_THORAX_STARnew_features/.DS_Store
Ignored: data/OLD/DEseq2_SGREG_SGREG_HEAD_STARnew_features/.DS_Store
Ignored: data/OLD/DEseq2_SGREG_SGREG_THORAX_STARnew_features/.DS_Store
Ignored: data/OLD/americana/.DS_Store
Ignored: data/OLD/americana/deg_counts/.DS_Store
Ignored: data/OLD/americana/deg_counts/STAR_newparams/.DS_Store
Ignored: data/OLD/cubense/deg_counts/STAR/cubense/featurecounts/
Ignored: data/OLD/cubense/deg_counts/STAR/gregaria/
Ignored: data/OLD/gregaria/.DS_Store
Ignored: data/OLD/gregaria/deg_counts/.DS_Store
Ignored: data/OLD/gregaria/deg_counts/STAR/.DS_Store
Ignored: data/OLD/gregaria/deg_counts/STAR/gregaria/.DS_Store
Ignored: data/OLD/gregaria/deg_counts/STAR_newparams/.DS_Store
Ignored: data/OLD/piceifrons/.DS_Store
Ignored: data/list/.DS_Store
Ignored: figures/
Ignored: tables/
Untracked files:
Untracked: data/03-americana-DESeq2-togregaria/
Untracked: data/03-cancellata-DESeq2-togregaria/
Untracked: data/03-cubense-DESeq2-togregaria/
Untracked: data/03-gregaria-DESeq2-togregaria/
Untracked: data/03-nitens-DESeq2-togregaria/
Untracked: data/03-piceifrons-DESeq2-togregaria/
Untracked: data/DEG-results/
Untracked: data/list/GO_Annotations/
Untracked: data/list/allspecies_geneid.csv
Untracked: data/list/allspecies_protein2geneid.tsv
Untracked: data/orthofinder/
Unstaged changes:
Modified: analysis/1_seq-data.Rmd
Modified: analysis/2_orthologs-prediction.Rmd
Modified: analysis/3_deseq2-results.Rmd
Modified: analysis/3_go-enrichment.Rmd
Modified: analysis/3_overlap-venn.Rmd
Modified: analysis/3_seq-data-qc.Rmd
Modified: analysis/_site.yml
Deleted: data/list/tx2gene.americana.csv
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/3_overlap-venn.Rmd) and
HTML (docs/3_overlap-venn.html) files. If you’ve configured
a remote Git repository (see ?wflow_git_remote), click on
the hyperlinks in the table below to view the files as they were in that
past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| html | ba35b82 | Maeva A. TECHER | 2024-06-19 | Build site. |
| html | 45d0b6b | Maeva A. TECHER | 2024-05-15 | Build site. |
| Rmd | 5dff93d | Maeva A. TECHER | 2024-05-15 | wflow_publish("analysis/3_overlap-venn.Rmd") |
We start by loading all the required R packages.
#(install first from CRAN or Bioconductor)
library(knitr)
library(dplyr)
library(ggplot2)
library(plotly)
library(htmlwidgets) # For saving interactive plots
library(ggVennDiagram)
library(pheatmap)
library(tidyr)
library(RColorBrewer)
library(viridis)
library(kableExtra)
library(tibble)
# Path for all species
workDir <- "/Users/maevatecher/Library/Mobile Documents/com~apple~CloudDocs/Documents/GitHub/locust-comparative-genomics/data"
allspecies_path <- file.path(workDir, "/list/allspecies_geneid.csv")
allspecies_df <- read.table(allspecies_path, sep = ",", header = TRUE, quote = "", fill = TRUE, stringsAsFactors = FALSE)
species_list <- c("gregaria", "piceifrons", "cancellata", "americana", "cubense", "nitens")
Here our objective is to compare the abundance, composition and
overlap of the DEGs found in the head and thorax tissues of each species
between the isolated and crowded last instar females. We found that the
differential genes expressed detected by DESeq2 varied
across species and tissues but we need some perspective: Are locusts
up-regulated and down-regulated the same genes? In the later section GO
enrichment, we will investigate what are the functions of these genes as
we will see that each species seems to show different gene expression
profiles in response to density changes.
We summarized the number of genes differential expressed between density for each species and each tissues.
# Initialize an empty data frame to store results
summary_table_head <- data.frame(Species = character(),
Head_Upregulated = integer(),
Head_Upregulated_Strict = integer(),
Head_Downregulated = integer(),
Head_Downregulated_Strict = integer(),
stringsAsFactors = FALSE)
summary_table_thorax <- data.frame(Species = character(),
Thorax_Upregulated = integer(),
Thorax_Upregulated_Strict = integer(),
Thorax_Downregulated = integer(),
Thorax_Downregulated_Strict = integer(),
stringsAsFactors = FALSE)
# Loop through each species to process their data
for (species in species_list) {
# Read the DESeq2 results and sigresults files
head_sigresults_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_togregaria_", species, ".csv"))
thorax_sigresults_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_togregaria_", species, ".csv"))
head_sigresults <- read.csv(head_sigresults_file, stringsAsFactors = FALSE)
thorax_sigresults <- read.csv(thorax_sigresults_file, stringsAsFactors = FALSE)
# Count upregulated and downregulated genes for head
head_upregulated <- nrow(head_sigresults %>% filter(log2FoldChange > 0))
head_downregulated <- nrow(head_sigresults %>% filter(log2FoldChange < -0))
head_upregulated_strict <- nrow(head_sigresults %>% filter(log2FoldChange > 1))
head_downregulated_strict <- nrow(head_sigresults %>% filter(log2FoldChange < -1))
# Count upregulated and downregulated genes for thorax
thorax_upregulated <- nrow(thorax_sigresults %>% filter(log2FoldChange > 0))
thorax_downregulated <- nrow(thorax_sigresults %>% filter(log2FoldChange < -0))
thorax_upregulated_strict <- nrow(thorax_sigresults %>% filter(log2FoldChange > 1))
thorax_downregulated_strict <- nrow(thorax_sigresults %>% filter(log2FoldChange < -1))
# Add to the summary table
summary_table_head <- rbind(summary_table_head, data.frame("Species" = species,
"Head_Upregulated" = head_upregulated,
"Head_Downregulated" = head_downregulated,
"Head_Upregulated_Strict" = head_upregulated_strict,
"Head_Downregulated_Strict" = head_downregulated_strict,
stringsAsFactors = FALSE))
summary_table_thorax <- rbind(summary_table_thorax, data.frame("Species" = species,
"Thorax_Upregulated" = thorax_upregulated,
"Thorax_Downregulated" = thorax_downregulated,
"Thorax_Upregulated_Strict" = thorax_upregulated_strict,
"Thorax_Downregulated_Strict" = thorax_downregulated_strict,
stringsAsFactors = FALSE))
}
# Print the summary table in a markdown-friendly format
knitr::kable(summary_table_head, format = "markdown", caption = "Summary of differentially expressed genes in head per species")
| Species | Head_Upregulated | Head_Downregulated | Head_Upregulated_Strict | Head_Downregulated_Strict |
|---|---|---|---|---|
| gregaria | 2709 | 2988 | 814 | 662 |
| piceifrons | 375 | 375 | 194 | 191 |
| cancellata | 689 | 756 | 301 | 386 |
| americana | 703 | 487 | 311 | 256 |
| cubense | 30 | 31 | 30 | 31 |
| nitens | 189 | 259 | 104 | 207 |
# Convert the summary table to a long format for easier plotting
summary_long_head <- summary_table_head %>%
select(Species, Head_Upregulated_Strict, Head_Downregulated_Strict) %>%
pivot_longer(cols = -Species,
names_to = c(".value", "Tissue"),
names_sep = "_")
# Adjust the values for downregulated genes to be negative
summary_long_head <- summary_long_head %>%
mutate(Head = ifelse(Tissue == "Downregulated", -Head, Head))
summary_long_head$Species <- factor(summary_long_head$Species, levels = species_list)
# Now create a barplot for head
ggplot(summary_long_head, aes(x = Species, y = Head, fill = Tissue)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Upregulated and Downregulated Genes in Head Tissues (lfc > 1)",
x = "Species",
y = "Number of Genes") +
scale_fill_manual(values = c("Upregulated" = "red2", "Downregulated" = "blue")) +
scale_y_continuous(labels = function(x) ifelse(x < 0, -x, x), limits = c(-1200, 850)) + # Set fixed y-axis limits
theme_minimal(base_size = 12) + # Increase base font size for better readability
theme(legend.position = "top",
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text.x = element_text(size = 12, angle = 45, hjust = 1), # Adjust text size and angle for x-axis
axis.text.y = element_text(size = 12),
panel.grid.major.y = element_line(color = "grey90", linetype = "dashed"),
panel.grid.minor = element_blank())

knitr::kable(summary_table_thorax, format = "markdown", caption = "Summary of differentially expressed genes in thorax per species")
| Species | Thorax_Upregulated | Thorax_Downregulated | Thorax_Upregulated_Strict | Thorax_Downregulated_Strict |
|---|---|---|---|---|
| gregaria | 2751 | 2691 | 622 | 1174 |
| piceifrons | 1517 | 1200 | 549 | 221 |
| cancellata | 686 | 648 | 289 | 303 |
| americana | 398 | 699 | 149 | 339 |
| cubense | 104 | 218 | 64 | 154 |
| nitens | 0 | 0 | 0 | 0 |
# Convert the summary table to a long format for easier plotting
summary_long_thorax <- summary_table_thorax %>%
select(Species, Thorax_Upregulated_Strict, Thorax_Downregulated_Strict) %>%
pivot_longer(cols = -Species,
names_to = c(".value", "Tissue"),
names_sep = "_")
# Adjust the values for downregulated genes to be negative
summary_long_thorax <- summary_long_thorax %>%
mutate(Thorax = ifelse(Tissue == "Downregulated", -Thorax, Thorax))
summary_long_thorax$Species <- factor(summary_long_thorax$Species, levels = species_list)
# Now create a barplot for head
ggplot(summary_long_thorax, aes(x = Species, y = Thorax, fill = Tissue)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Upregulated and Downregulated Genes in Thorax Tissues (lfc > 1)",
x = "Species",
y = "Number of Genes") +
scale_fill_manual(values = c("Upregulated" = "red2", "Downregulated" = "blue")) +
scale_y_continuous(labels = function(x) ifelse(x < 0, -x, x), limits = c(-1200, 850)) + # Set fixed y-axis limits
theme_minimal(base_size = 12) + # Increase base font size for better readability
theme(legend.position = "top",
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text.x = element_text(size = 12, angle = 45, hjust = 1), # Adjust text size and angle for x-axis
axis.text.y = element_text(size = 12),
panel.grid.major.y = element_line(color = "grey90", linetype = "dashed"),
panel.grid.minor = element_blank())

species_list5 <- c("gregaria", "piceifrons", "cancellata", "americana", "cubense", "nitens")
# Loop through each species to process their data
for (species in species_list5) {
# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Head_togregaria_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_results_Thorax_togregaria_", species, ".csv"))
head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
message(paste("No data for species:", species))
next # Skip to the next species if there's no data
}
# Filter significant DEGs for both head and thorax
head_sig_genes <- head_data %>%
filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
select(GeneID = X, log2FoldChange, padj)
thorax_sig_genes <- thorax_data %>%
filter(padj < 0.05 & abs(log2FoldChange) > 1) %>%
select(GeneID = X, log2FoldChange, padj)
# Find overlapping genes based on GeneID
overlapping_genes <- inner_join(head_sig_genes, thorax_sig_genes, by = "GeneID", suffix = c("_head", "_thorax"))
# Save the overlapping genes to a CSV file
output_file <- file.path(workDir, "DEG-results", paste0("overlapping_genes_head_thorax_", species, ".csv"))
write.csv(overlapping_genes, output_file, row.names = FALSE)
# Plot overlapping genes with scatter plot
p <- ggplot(overlapping_genes, aes(x = log2FoldChange_head, y = log2FoldChange_thorax)) +
geom_point(aes(color = case_when(
log2FoldChange_head > 0 & log2FoldChange_thorax > 0 ~ "Upregulated in Both",
log2FoldChange_head < 0 & log2FoldChange_thorax < 0 ~ "Downregulated in Both",
log2FoldChange_head > 0 & log2FoldChange_thorax < 0 ~ "Up in Head, Down in Thorax",
log2FoldChange_head < 0 & log2FoldChange_thorax > 0 ~ "Down in Head, Up in Thorax"
)), size = 3, alpha = 0.7) +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
labs(
x = "Log2 Fold Change (Head)",
y = "Log2 Fold Change (Thorax)",
color = "Regulation Pattern",
title = "Comparison of Log2 Fold Changes in Overlapping Genes",
subtitle = paste("Head vs. Thorax in", species)
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
plot.subtitle = element_text(size = 12, face = "italic"),
legend.position = "top"
) +
scale_color_manual(values = c(
"Upregulated in Both" = "red",
"Downregulated in Both" = "blue",
"Up in Head, Down in Thorax" = "purple",
"Down in Head, Up in Thorax" = "green"
))
# Save the static plot
ggsave(filename = file.path(workDir, "DEG-results", paste0("scatter_plot_overlapping_genes_", species, ".png")), plot = p)
print(p)
# Convert to interactive Plotly plot
interactive_plot <- ggplotly(p) %>%
layout(title = paste("Scatter Plot of Overlapping Genes:", species), hoverlabel = list(bgcolor = "white"))
}






# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()
# Loop through each species to process their data
for (species in species_list) {
# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_togregaria_", species, ".csv"))
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_togregaria_", species, ".csv"))
# Load the data
head_data <- read.csv(head_file, stringsAsFactors = FALSE)
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
# Check if data is empty and handle accordingly
if (nrow(head_data) == 0 || nrow(thorax_data) == 0) {
message(paste("No data for species:", species))
next # Skip to the next species if there's no data
}
# Filter for significant DEGs and select top 100 upregulated and downregulated genes for each tissue
head_up <- head_data %>%
filter(padj < 0.05 & log2FoldChange > 1) %>%
arrange(desc(log2FoldChange)) %>%
slice(1:500)
head_down <- head_data %>%
filter(padj < 0.05 & log2FoldChange < -1) %>%
arrange(log2FoldChange) %>%
slice(1:500)
thorax_up <- thorax_data %>%
filter(padj < 0.05 & log2FoldChange > 1) %>%
arrange(desc(log2FoldChange)) %>%
slice(1:500)
thorax_down <- thorax_data %>%
filter(padj < 0.05 & log2FoldChange < -1) %>%
arrange(log2FoldChange) %>%
slice(1:500)
# Combine data and prepare for heatmap, adding the species column
heatmap_data <- bind_rows(
head_up %>% mutate(Tissue = "Head", Regulation = "Upregulated", Species = species),
head_down %>% mutate(Tissue = "Head", Regulation = "Downregulated", Species = species),
thorax_up %>% mutate(Tissue = "Thorax", Regulation = "Upregulated", Species = species),
thorax_down %>% mutate(Tissue = "Thorax", Regulation = "Downregulated", Species = species)
) %>%
select(GeneID, log2FoldChange, Tissue, Regulation, Species)
# Append the heatmap data to the list
heatmap_list[[species]] <- heatmap_data
}
# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)
# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
stop("No valid data available for heatmap generation.")
}
# Create heatmap matrix
heatmap_matrix <- final_heatmap_data %>%
group_by(GeneID, Species) %>%
summarize(
Head_Combined = sum(log2FoldChange[Tissue == "Head"], na.rm = TRUE),
Thorax_Combined = sum(log2FoldChange[Tissue == "Thorax"], na.rm = TRUE),
.groups = 'drop'
) %>%
pivot_longer(cols = c("Head_Combined", "Thorax_Combined"),
names_to = c("Tissue", ".value"),
names_sep = "_") %>%
# Combine the Head and Thorax values for each GeneID while keeping Species
group_by(GeneID, Species) %>%
summarize(
Head = sum(ifelse(Tissue == "Head", Combined, 0), na.rm = TRUE),
Thorax = sum(ifelse(Tissue == "Thorax", Combined, 0), na.rm = TRUE),
.groups = 'drop'
) %>%
pivot_wider(names_from = Species,
values_from = c(Head, Thorax),
values_fill = list(Head = 0, Thorax = 0)) %>%
column_to_rownames("GeneID") %>%
as.matrix()
color_palette <- c("cyan", "cyan2", "cyan3", "black", "orange3", "orange2", "orange")
color_palette2 <- c("blue3", "blue2", "blue1", "white", "red", "red2", "red3")
# Create heatmap
pheatmap(
heatmap_matrix,
color = color_palette2,
cluster_rows = TRUE, # Set to TRUE to cluster genes
cluster_cols = FALSE, # Set to TRUE to cluster samples
show_rownames = FALSE, # Show gene IDs
show_colnames = TRUE, # Show tissue and species names
fontsize_row = 6, # Adjust font size for rows if necessary
fontsize_col = 10, # Adjust font size for columns if necessary
main = "Heatmap of Gene Expression of Head and Thorax tissue - STRATEGY 1"
)

pheatmap(
heatmap_matrix,
color = color_palette,
cluster_rows = TRUE, # Set to TRUE to cluster genes
cluster_cols = FALSE, # Set to FALSE to prevent clustering on columns
show_rownames = FALSE, # Hide gene IDs
show_colnames = TRUE, # Show tissue and species names
fontsize_row = 6, # Adjust font size for rows if necessary
fontsize_col = 10, # Adjust font size for columns if necessary
main = "Ordered Heatmap of Gene Expression of Head and Thorax tissue - STRATEGY 1"
)

# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()
# Loop through each species to process their data
for (species in species_list) {
# Load DESeq2 results for head and thorax
head_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Head_togregaria_", species, ".csv"))
# Load the data
head_data <- read.csv(head_file, stringsAsFactors = FALSE)
# Check if data is empty and handle accordingly
if (nrow(head_data) == 0) {
message(paste("No data for species:", species))
next # Skip to the next species if there's no data
}
# Filter for significant DEGs and select top 100 upregulated and downregulated genes for each tissue
head_up <- head_data %>%
filter(padj < 0.05 & log2FoldChange > 1) %>%
arrange(desc(log2FoldChange)) %>%
slice(1:500)
head_down <- head_data %>%
filter(padj < 0.05 & log2FoldChange < -1) %>%
arrange(log2FoldChange) %>%
slice(1:500)
# Combine data and prepare for heatmap, adding the species column
heatmap_data <- bind_rows(
head_up %>% mutate(Tissue = "Head", Regulation = "Upregulated", Species = species),
head_down %>% mutate(Tissue = "Head", Regulation = "Downregulated", Species = species)
) %>%
select(GeneID, log2FoldChange, Tissue, Regulation, Species)
# Append the heatmap data to the list
heatmap_list[[species]] <- heatmap_data
}
# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)
# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
stop("No valid data available for heatmap generation.")
}
# Create heatmap matrix
heatmap_matrix <- final_heatmap_data %>%
group_by(GeneID, Species) %>%
summarize(
Head_Combined = sum(log2FoldChange[Tissue == "Head"], na.rm = TRUE),
.groups = 'drop'
) %>%
pivot_longer(cols = c("Head_Combined"),
names_to = c("Tissue", ".value"),
names_sep = "_") %>%
# Combine the Head and Thorax values for each GeneID while keeping Species
group_by(GeneID, Species) %>%
summarize(
Head = sum(ifelse(Tissue == "Head", Combined, 0), na.rm = TRUE),
.groups = 'drop'
) %>%
pivot_wider(names_from = Species,
values_from = c(Head),
values_fill = list(Head = 0)) %>%
column_to_rownames("GeneID") %>%
as.matrix()
color_palette <- c("cyan", "cyan2", "cyan3", "black", "orange3", "orange2", "orange")
color_palette2 <- c("blue3", "blue2", "blue1", "white", "red", "red2", "red3")
# Create heatmap
pheatmap(
heatmap_matrix,
color = color_palette2,
cluster_rows = TRUE, # Set to TRUE to cluster genes
cluster_cols = FALSE, # Set to TRUE to cluster samples
show_rownames = FALSE, # Show gene IDs
show_colnames = TRUE, # Show tissue and species names
fontsize_row = 6, # Adjust font size for rows if necessary
fontsize_col = 10, # Adjust font size for columns if necessary
main = "Heatmap of Gene Expression of Head tissue - STRATEGY 1"
)

pheatmap(
heatmap_matrix,
color = color_palette,
cluster_rows = TRUE, # Set to TRUE to cluster genes
cluster_cols = FALSE, # Set to FALSE to prevent clustering on columns
show_rownames = FALSE, # Hide gene IDs
show_colnames = TRUE, # Show tissue and species names
fontsize_row = 6, # Adjust font size for rows if necessary
fontsize_col = 10, # Adjust font size for columns if necessary
main = "Ordered Heatmap of Gene Expression of Head tissue - STRATEGY 1"
)

# Initialize an empty list to store heatmap data for each species
heatmap_list <- list()
# Loop through each species to process their data
for (species in species_list) {
# Load DESeq2 results for head and thorax
thorax_file <- file.path(workDir, "DEG-results", paste0("DESeq2_sigresults_Thorax_togregaria_", species, ".csv"))
# Load the data
thorax_data <- read.csv(thorax_file, stringsAsFactors = FALSE)
# Check if data is empty and handle accordingly
if (nrow(thorax_data) == 0) {
message(paste("No data for species:", species))
next # Skip to the next species if there's no data
}
# Filter for significant DEGs and select top 100 upregulated and downregulated genes for each tissue
thorax_up <- thorax_data %>%
filter(padj < 0.05 & log2FoldChange > 1) %>%
arrange(desc(log2FoldChange)) %>%
slice(1:500)
thorax_down <- thorax_data %>%
filter(padj < 0.05 & log2FoldChange < -1) %>%
arrange(log2FoldChange) %>%
slice(1:500)
# Combine data and prepare for heatmap, adding the species column
heatmap_data <- bind_rows(
thorax_up %>% mutate(Tissue = "Thorax", Regulation = "Upregulated", Species = species),
thorax_down %>% mutate(Tissue = "Thorax", Regulation = "Downregulated", Species = species)
) %>%
select(GeneID, log2FoldChange, Tissue, Regulation, Species)
# Append the heatmap data to the list
heatmap_list[[species]] <- heatmap_data
}
# Combine all species data into a single dataframe for heatmap matrix preparation
final_heatmap_data <- bind_rows(heatmap_list)
# Check if final_heatmap_data is empty before proceeding
if (nrow(final_heatmap_data) == 0) {
stop("No valid data available for heatmap generation.")
}
# Create heatmap matrix
heatmap_matrix <- final_heatmap_data %>%
group_by(GeneID, Species) %>%
summarize(
Thorax_Combined = sum(log2FoldChange[Tissue == "Thorax"], na.rm = TRUE),
.groups = 'drop'
) %>%
pivot_longer(cols = c( "Thorax_Combined"),
names_to = c("Tissue", ".value"),
names_sep = "_") %>%
# Combine the Head and Thorax values for each GeneID while keeping Species
group_by(GeneID, Species) %>%
summarize(
Thorax = sum(ifelse(Tissue == "Thorax", Combined, 0), na.rm = TRUE),
.groups = 'drop'
) %>%
pivot_wider(names_from = Species,
values_from = c(Thorax),
values_fill = list(Thorax = 0)) %>%
column_to_rownames("GeneID") %>%
as.matrix()
color_palette <- c("cyan", "black", "orange")
color_palette2 <- c("blue3", "white", "red3")
# Create heatmap
pheatmap(
heatmap_matrix,
color = color_palette2,
cluster_rows = TRUE, # Set to TRUE to cluster genes
cluster_cols = FALSE, # Set to TRUE to cluster samples
show_rownames = FALSE, # Show gene IDs
show_colnames = TRUE, # Show tissue and species names
fontsize_row = 6, # Adjust font size for rows if necessary
fontsize_col = 10, # Adjust font size for columns if necessary
main = "Heatmap of Gene Expression of Thorax tissue - STRATEGY 1"
)

pheatmap(
heatmap_matrix,
color = color_palette,
cluster_rows = TRUE, # Set to TRUE to cluster genes
cluster_cols = FALSE, # Set to FALSE to prevent clustering on columns
show_rownames = FALSE, # Hide gene IDs
show_colnames = TRUE, # Show tissue and species names
fontsize_row = 6, # Adjust font size for rows if necessary
fontsize_col = 10, # Adjust font size for columns if necessary
main = "Ordered Heatmap of Gene Expression of Thorax tissue - STRATEGY 1"
)

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.7
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Chicago
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tibble_3.2.1 kableExtra_1.4.0 viridis_0.6.5
[4] viridisLite_0.4.2 RColorBrewer_1.1-3 tidyr_1.3.1
[7] pheatmap_1.0.12 ggVennDiagram_1.5.3 htmlwidgets_1.6.4
[10] plotly_4.10.4 ggplot2_3.5.1 dplyr_1.1.4
[13] knitr_1.48
loaded via a namespace (and not attached):
[1] sass_0.4.9 utf8_1.2.4 generics_0.1.3 xml2_1.3.6
[5] stringi_1.8.4 digest_0.6.37 magrittr_2.0.3 evaluate_1.0.1
[9] grid_4.4.1 fastmap_1.2.0 rprojroot_2.0.4 workflowr_1.7.1
[13] jsonlite_1.8.9 whisker_0.4.1 gridExtra_2.3 promises_1.3.0
[17] httr_1.4.7 purrr_1.0.2 fansi_1.0.6 crosstalk_1.2.1
[21] scales_1.3.0 textshaping_0.4.0 lazyeval_0.2.2 jquerylib_0.1.4
[25] cli_3.6.3 rlang_1.1.4 munsell_0.5.1 withr_3.0.2
[29] cachem_1.1.0 yaml_2.3.10 tools_4.4.1 colorspace_2.1-1
[33] httpuv_1.6.15 vctrs_0.6.5 R6_2.5.1 lifecycle_1.0.4
[37] git2r_0.35.0 stringr_1.5.1 fs_1.6.5 ragg_1.3.3
[41] pkgconfig_2.0.3 pillar_1.9.0 bslib_0.8.0 later_1.3.2
[45] gtable_0.3.6 glue_1.8.0 data.table_1.16.2 Rcpp_1.0.13-1
[49] systemfonts_1.1.0 xfun_0.49 tidyselect_1.2.1 highr_0.11
[53] rstudioapi_0.17.1 farver_2.1.2 htmltools_0.5.8.1 labeling_0.4.3
[57] svglite_2.1.3 rmarkdown_2.29 compiler_4.4.1