Last updated: 2025-02-28
Checks: 6 1
Knit directory: CX5461_Project/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish
to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20250129)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version dd3f95a. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .RData
Ignored: .Rhistory
Ignored: .Rproj.user/
Unstaged changes:
Modified: analysis/Colorectal.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/Colorectal.Rmd
) and HTML
(docs/Colorectal.html
) files. If you’ve configured a remote
Git repository (see ?wflow_git_remote
), click on the
hyperlinks in the table below to view the files as they were in that
past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 1c1e1e4 | sayanpaul01 | 2025-02-28 | Commit |
html | 1c1e1e4 | sayanpaul01 | 2025-02-28 | Commit |
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.3.3
library(dplyr)
Warning: package 'dplyr' was built under R version 4.3.2
library(tidyr)
Warning: package 'tidyr' was built under R version 4.3.3
library(org.Hs.eg.db)
Warning: package 'AnnotationDbi' was built under R version 4.3.2
Warning: package 'BiocGenerics' was built under R version 4.3.1
Warning: package 'Biobase' was built under R version 4.3.1
Warning: package 'IRanges' was built under R version 4.3.1
Warning: package 'S4Vectors' was built under R version 4.3.1
library(clusterProfiler)
Warning: package 'clusterProfiler' was built under R version 4.3.3
library(biomaRt)
Warning: package 'biomaRt' was built under R version 4.3.2
# Define correct timepoints
timepoints <- c("6hr", "24hr", "72hr")
# Initialize an empty dataframe to store chi-square results
all_chi_results <- data.frame()
# Loop through each timepoint and perform Chi-square test
for (time in timepoints) {
# Load proportion data for the given timepoint
file_path <- paste0("data/Colorectal/Proportion_data_", time, ".csv")
# Check if the file exists before reading
if (!file.exists(file_path)) {
cat("\n🚨 Warning: File does not exist:", file_path, "\nSkipping this timepoint...\n")
next # Skip this iteration if the file is missing
}
proportion_data <- read.csv(file_path)
# Extract counts for Non-Response group
non_response_counts <- proportion_data %>%
filter(Set == "Non Response") %>%
dplyr::select(DEG, `Non.DEG`) %>%
unlist(use.names = FALSE) # Convert to numeric vector
# Debugging: Print Non-Response counts
cat("\nNon-Response Counts for", time, ":\n")
print(non_response_counts)
# Initialize a list to store chi-square test results for this timepoint
chi_results <- list()
# Perform chi-square test for each response group
for (group in unique(proportion_data$Set)) {
if (group == "Non Response") next # Skip Non Response group
# Extract counts for the current response group
group_counts <- proportion_data %>%
filter(Set == group) %>%
dplyr::select(DEG, `Non.DEG`) %>%
unlist(use.names = FALSE) # Convert to numeric vector
# Ensure valid counts for chi-square test
if (length(group_counts) < 2) group_counts <- c(group_counts, 0)
if (length(non_response_counts) < 2) non_response_counts <- c(non_response_counts, 0)
# Create contingency table
contingency_table <- matrix(c(
group_counts[1], group_counts[2], # Current response group counts
non_response_counts[1], non_response_counts[2] # Non-Response counts
), nrow = 2, byrow = TRUE)
# Debugging: Print contingency table
cat("\nProcessing Group:", group, "at", time, "\n")
cat("Contingency Table:\n")
print(contingency_table)
# Perform chi-square test
test_result <- chisq.test(contingency_table)
p_value <- test_result$p.value
significance <- ifelse(p_value < 0.05, "*", "") # Mark * for p < 0.05
# Store results
chi_results[[group]] <- data.frame(
Set = group,
Timepoint = time,
Chi2 = test_result$statistic,
p_value = p_value,
Significance = significance
)
}
# Combine results for this timepoint into a single dataframe
chi_results <- do.call(rbind, chi_results)
# Append to the overall results dataframe
all_chi_results <- rbind(all_chi_results, chi_results)
}
# Save final chi-square results
write.csv(all_chi_results, "data/Colorectal//Chi_Square_Results_All.csv", row.names = FALSE)
# Load the saved datasets
prob_all_1 <- read.csv("data/prob_all_1.csv")$Entrez_ID
prob_all_2 <- read.csv("data/prob_all_2.csv")$Entrez_ID
prob_all_3 <- read.csv("data/prob_all_3.csv")$Entrez_ID
prob_all_4 <- read.csv("data/prob_all_4.csv")$Entrez_ID
# Example Response Groups Data (Replace with actual data)
response_groups <- list(
"Non Response" = prob_all_1, # Replace 'prob_all_1', 'prob_all_2', etc. with your actual response group dataframes
"CX_DOX Shared Late Response" = prob_all_2,
"DOX-Specific Response" = prob_all_3,
"Late High Dose DOX-Specific Response" = prob_all_4
)
# Load the proportion data again for visualization
proportion_data <-read.csv("data/Colorectal/Proportion_data.csv")
# Merge chi-square results into proportion data for plotting
proportion_data <- proportion_data %>%
left_join(all_chi_results %>% dplyr::select(Set, Timepoint, Significance), by = c("Set", "Timepoint"))
# Convert to factors for ordered display
proportion_data$Set <- factor(proportion_data$Set, levels = c(
"Non Response",
"CX_DOX Shared Late Response",
"DOX-Specific Response",
"Late High Dose DOX-Specific Response"
))
proportion_data$Timepoint <- factor(proportion_data$Timepoint, levels = timepoints)
# Plot proportions with significance stars
ggplot(proportion_data, aes(x = Set, y = Percentage, fill = Category)) +
geom_bar(stat = "identity", position = "stack") +
facet_wrap(~Timepoint, scales = "fixed") +
scale_fill_manual(values = c("DEG" = "#e41a1c", "Non-DEG" = "#377eb8")) +
geom_text(
data = proportion_data %>% filter(Significance == "*") %>% distinct(Set, Timepoint, Significance),
aes(x = Set, y = 105, label = Significance),
inherit.aes = FALSE,
size = 6,
color = "black",
hjust = 0.5
) +
labs(
title = "Proportion of Colorectal cancer DEGs and Non-DEGs\nAcross Response Groups (6hr, 24hr, 72hr)",
x = "Response Groups",
y = "Percentage",
fill = "Category"
) +
theme_minimal() +
theme(
plot.title = element_text(size = rel(1.5), hjust = 0.5),
axis.title = element_text(size = 15, color = "black"),
axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
legend.title = element_blank(),
panel.border = element_rect(color = "black", fill = NA, size = 1.2),
strip.background = element_blank(),
strip.text = element_text(size = 12, face = "bold")
)
Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
generated.
Version | Author | Date |
---|---|---|
1c1e1e4 | sayanpaul01 | 2025-02-28 |
# Load DEGs Data
CX_0.1_3 <- read.csv("data/DEGs/Toptable_CX_0.1_3.csv")
CX_0.1_24 <- read.csv("data/DEGs/Toptable_CX_0.1_24.csv")
CX_0.1_48 <- read.csv("data/DEGs/Toptable_CX_0.1_48.csv")
CX_0.5_3 <- read.csv("data/DEGs/Toptable_CX_0.5_3.csv")
CX_0.5_24 <- read.csv("data/DEGs/Toptable_CX_0.5_24.csv")
CX_0.5_48 <- read.csv("data/DEGs/Toptable_CX_0.5_48.csv")
DOX_0.1_3 <- read.csv("data/DEGs/Toptable_DOX_0.1_3.csv")
DOX_0.1_24 <- read.csv("data/DEGs/Toptable_DOX_0.1_24.csv")
DOX_0.1_48 <- read.csv("data/DEGs/Toptable_DOX_0.1_48.csv")
DOX_0.5_3 <- read.csv("data/DEGs/Toptable_DOX_0.5_3.csv")
DOX_0.5_24 <- read.csv("data/DEGs/Toptable_DOX_0.5_24.csv")
DOX_0.5_48 <- read.csv("data/DEGs/Toptable_DOX_0.5_48.csv")
# Extract Significant DEGs
DEG1 <- as.character(CX_0.1_3$Entrez_ID[CX_0.1_3$adj.P.Val < 0.05])
DEG2 <- as.character(CX_0.1_24$Entrez_ID[CX_0.1_24$adj.P.Val < 0.05])
DEG3 <- as.character(CX_0.1_48$Entrez_ID[CX_0.1_48$adj.P.Val < 0.05])
DEG4 <- as.character(CX_0.5_3$Entrez_ID[CX_0.5_3$adj.P.Val < 0.05])
DEG5 <- as.character(CX_0.5_24$Entrez_ID[CX_0.5_24$adj.P.Val < 0.05])
DEG6 <- as.character(CX_0.5_48$Entrez_ID[CX_0.5_48$adj.P.Val < 0.05])
DEG7 <- as.character(DOX_0.1_3$Entrez_ID[DOX_0.1_3$adj.P.Val < 0.05])
DEG8 <- as.character(DOX_0.1_24$Entrez_ID[DOX_0.1_24$adj.P.Val < 0.05])
DEG9 <- as.character(DOX_0.1_48$Entrez_ID[DOX_0.1_48$adj.P.Val < 0.05])
DEG10 <- as.character(DOX_0.5_3$Entrez_ID[DOX_0.5_3$adj.P.Val < 0.05])
DEG11 <- as.character(DOX_0.5_24$Entrez_ID[DOX_0.5_24$adj.P.Val < 0.05])
DEG12 <- as.character(DOX_0.5_48$Entrez_ID[DOX_0.5_48$adj.P.Val < 0.05])
data_6hr <- read.csv("data/Colorectal/6hr.csv", stringsAsFactors = FALSE)
data_24hr <- read.csv("data/Colorectal/24hr.csv", stringsAsFactors = FALSE)
data_72hr <- read.csv("data/Colorectal/72hr.csv", stringsAsFactors = FALSE)
# Map gene symbols to Entrez IDs using org.Hs.eg.db
data_6hr <- data_6hr %>%
mutate(Entrez_ID = mapIds(org.Hs.eg.db,
keys = Symbol,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first"))
data_24hr <- data_24hr %>%
mutate(Entrez_ID = mapIds(org.Hs.eg.db,
keys = Symbol,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first"))
data_72hr <- data_72hr %>%
mutate(Entrez_ID = mapIds(org.Hs.eg.db,
keys = Symbol,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first"))
# Define CX-5461 DEG lists
CX_DEGs <- list(
"CX_0.1_3" = DEG1, "CX_0.1_24" = DEG2, "CX_0.1_48" = DEG3,
"CX_0.5_3" = DEG4, "CX_0.5_24" = DEG5, "CX_0.5_48" = DEG6
)
# Define DOX DEG lists
DOX_DEGs <- list(
"DOX_0.1_3" = DEG7, "DOX_0.1_24" = DEG8, "DOX_0.1_48" = DEG9,
"DOX_0.5_3" = DEG10, "DOX_0.5_24" = DEG11, "DOX_0.5_48" = DEG12
)
# Extract Entrez_IDs from cancer gene datasets for 6hr, 24hr, and 72hr
lit_genes <- list(
"6hr" = na.omit(data_6hr$Entrez_ID),
"24hr" = na.omit(data_24hr$Entrez_ID),
"72hr" = na.omit(data_72hr$Entrez_ID)
)
# Combine CX-5461 DEGs into a dataframe with a "Drug" column
CX_DEGs_df <- bind_rows(
lapply(CX_DEGs, function(ids) data.frame(Entrez_ID = ids, Drug = "CX-5461")),
.id = "Sample"
)
# Combine DOX DEGs into a dataframe with a "Drug" column
DOX_DEGs_df <- bind_rows(
lapply(DOX_DEGs, function(ids) data.frame(Entrez_ID = ids, Drug = "DOX")),
.id = "Sample"
)
# Merge CX-5461 and DOX datasets
DEGs_df <- bind_rows(CX_DEGs_df, DOX_DEGs_df)
# Match Entrez_IDs with DEGs from 6hr, 24hr, and 72hr datasets
DEGs_df <- DEGs_df %>%
mutate(
Category_6hr = ifelse(Entrez_ID %in% lit_genes[["6hr"]], "Yes", "No"),
Category_24hr = ifelse(Entrez_ID %in% lit_genes[["24hr"]], "Yes", "No"),
Category_72hr = ifelse(Entrez_ID %in% lit_genes[["72hr"]], "Yes", "No")
)
# Convert to long format for ggplot
proportion_data <- DEGs_df %>%
pivot_longer(cols = starts_with("Category"), names_to = "Timepoint", values_to = "Category") %>%
group_by(Sample, Drug, Timepoint, Category) %>%
summarise(Count = n(), .groups = "drop") %>%
group_by(Sample, Drug, Timepoint) %>%
mutate(Percentage = (Count / sum(Count)) * 100)
# **Fix: Normalize Percentages to Sum Exactly 100%**
proportion_data <- proportion_data %>%
group_by(Sample, Timepoint) %>%
mutate(Percentage = round(Percentage, 2)) %>% # Round each percentage
mutate(Adjustment = 100 - sum(Percentage, na.rm = TRUE)) %>% # Compute rounding difference
mutate(Percentage = ifelse(Category == "No", Percentage + Adjustment, Percentage)) %>% # Apply adjustment to "No"
mutate(Percentage = ifelse(Percentage < 0, 0, Percentage)) %>% # Prevent negatives
mutate(Percentage = ifelse(Percentage > 100, 100, Percentage)) %>% # Prevent values > 100
ungroup() %>% # Remove grouping to avoid select() errors
replace_na(list(Percentage = 0)) # Ensure no NA values
# **Ensure "Yes" is at the Bottom and "No" is at the Top**
proportion_data$Category <- factor(proportion_data$Category, levels = c("Yes", "No"))
# Define correct factor orders for samples and timepoints
sample_order <- c(
"CX_0.1_3", "CX_0.1_24", "CX_0.1_48", "CX_0.5_3", "CX_0.5_24", "CX_0.5_48",
"DOX_0.1_3", "DOX_0.1_24", "DOX_0.1_48", "DOX_0.5_3", "DOX_0.5_24", "DOX_0.5_48"
)
proportion_data$Sample <- factor(proportion_data$Sample, levels = sample_order)
proportion_data$Timepoint <- factor(
proportion_data$Timepoint,
levels = c("Category_6hr", "Category_24hr", "Category_72hr"),
labels = c("6hr", "24hr", "72hr")
)
# **Generate Combined Proportion Plot for CX-5461 and DOX**
ggplot(proportion_data, aes(x = Sample, y = Percentage, fill = Category)) +
geom_bar(stat = "identity", position = "stack") + # Stacked bars
facet_wrap(~Timepoint, scales = "fixed") + # Facet by Timepoint (6hr, 24hr, 72hr)
scale_y_continuous(labels = scales::percent_format(scale = 1), limits = c(0, 100)) + # Y-axis as percentage
scale_fill_manual(values = c("Yes" = "#e41a1c", "No" = "#377eb8")) + # Yes (Red) at Bottom, No (Blue) on Top
labs(
title = "Proportion of Human Colorectal Cancer Genes in CX-5461 and DOX DEGs",
x = "Samples (CX-5461 and DOX)",
y = "Percentage",
fill = "Category"
) +
theme_minimal() +
theme(
plot.title = element_text(size = rel(1.5), hjust = 0.5),
axis.title = element_text(size = 15, color = "black"),
axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
legend.title = element_blank(),
panel.border = element_rect(color = "black", fill = NA, linewidth = 1.2), # Updated for ggplot2 3.4.0+
strip.background = element_blank(),
strip.text = element_text(size = 12, face = "bold")
)
# Import the CSV file
data_6hr_logFC <- read.csv("data/Colorectal/6hrlogFC.csv", stringsAsFactors = FALSE)
# Map gene symbols to Entrez IDs using org.Hs.eg.db
data_6hr_logFC <- data_6hr_logFC %>%
mutate(Entrez_ID = mapIds(org.Hs.eg.db,
keys = Symbol,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first"))
# Merge datasets on Entrez_ID for CX and DOX at both concentrations (3 hours)
merged_CX_0.1 <- merge(data_6hr_logFC, CX_0.1_3, by = "Entrez_ID")
merged_CX_0.5 <- merge(data_6hr_logFC, CX_0.5_3, by = "Entrez_ID")
merged_DOX_0.1 <- merge(data_6hr_logFC, DOX_0.1_3, by = "Entrez_ID")
merged_DOX_0.5 <- merge(data_6hr_logFC, DOX_0.5_3, by = "Entrez_ID")
# Remove NA values
merged_CX_0.1 <- na.omit(merged_CX_0.1)
merged_CX_0.5 <- na.omit(merged_CX_0.5)
merged_DOX_0.1 <- na.omit(merged_DOX_0.1)
merged_DOX_0.5 <- na.omit(merged_DOX_0.5)
# Rename columns explicitly to avoid conflicts
colnames(merged_CX_0.1) <- colnames(merged_CX_0.5) <-
colnames(merged_DOX_0.1) <- colnames(merged_DOX_0.5) <-
c("Entrez_ID", "Symbol_6hr", "logFC_6hr", "logFC_Drug", "AveExpr_Drug", "t_Drug", "P.Value_Drug", "adj.P.Val_Drug", "B_Drug")
# Add drug and concentration labels for faceting
merged_CX_0.1$Drug <- "CX (3hr)"
merged_CX_0.5$Drug <- "CX (3hr)"
merged_DOX_0.1$Drug <- "DOX (3hr)"
merged_DOX_0.5$Drug <- "DOX (3hr)"
merged_CX_0.1$Concentration <- "0.1 µM"
merged_CX_0.5$Concentration <- "0.5 µM"
merged_DOX_0.1$Concentration <- "0.1 µM"
merged_DOX_0.5$Concentration <- "0.5 µM"
# Combine all datasets into a single data frame
merged_data <- rbind(
merged_CX_0.1[, c("Entrez_ID", "logFC_Drug", "logFC_6hr", "Concentration", "Drug")],
merged_CX_0.5[, c("Entrez_ID", "logFC_Drug", "logFC_6hr", "Concentration", "Drug")],
merged_DOX_0.1[, c("Entrez_ID", "logFC_Drug", "logFC_6hr", "Concentration", "Drug")],
merged_DOX_0.5[, c("Entrez_ID", "logFC_Drug", "logFC_6hr", "Concentration", "Drug")]
)
# Calculate correlations for each facet
cor_CX_0.1 <- cor.test(merged_CX_0.1$logFC_Drug, merged_CX_0.1$logFC_6hr, method = "pearson")
cor_CX_0.5 <- cor.test(merged_CX_0.5$logFC_Drug, merged_CX_0.5$logFC_6hr, method = "pearson")
cor_DOX_0.1 <- cor.test(merged_DOX_0.1$logFC_Drug, merged_DOX_0.1$logFC_6hr, method = "pearson")
cor_DOX_0.5 <- cor.test(merged_DOX_0.5$logFC_Drug, merged_DOX_0.5$logFC_6hr, method = "pearson")
# Data frame for r and p-values annotations
correlation_data <- data.frame(
Drug = rep(c("CX (3hr)", "DOX (3hr)"), each = 2),
Concentration = rep(c("0.1 µM", "0.5 µM"), times = 2),
x = max(merged_data$logFC_Drug, na.rm = TRUE) * 0.75, # Adjusted placement for better fit
y = max(merged_data$logFC_6hr, na.rm = TRUE) * 0.85,
label = c(
paste0("r = ", round(cor_CX_0.1$estimate, 3), "\np = ", signif(cor_CX_0.1$p.value, 3)),
paste0("r = ", round(cor_CX_0.5$estimate, 3), "\np = ", signif(cor_CX_0.5$p.value, 3)),
paste0("r = ", round(cor_DOX_0.1$estimate, 3), "\np = ", signif(cor_DOX_0.1$p.value, 3)),
paste0("r = ", round(cor_DOX_0.5$estimate, 3), "\np = ", signif(cor_DOX_0.5$p.value, 3))
)
)
# Create scatter plot with facets for concentration (vertically) and drug (horizontally)
scatter_plot <- ggplot(merged_data, aes(x = logFC_Drug, y = logFC_6hr)) +
geom_point(alpha = 0.6, color = "black") + # Black scatter points
geom_smooth(method = "lm", color = "black", se = FALSE) + # Black regression line
labs(
title = "Correlation between Drug (3hr) and Colorectal 6hr logFC",
x = "logFC (Drug_3hr)",
y = "logFC (Colorectal 6hr)"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold"),
# Outer box around the entire facet plot
panel.border = element_rect(color = "black", fill = NA, linewidth = 2),
# Black box around each facet title
strip.background = element_rect(fill = "white", color = "black", linewidth = 1.5),
strip.text = element_text(size = 12, face = "bold", color = "black")
) +
# Facet by drug (horizontally) and concentration (vertically)
facet_grid(Concentration ~ Drug) +
# Add r and p-value in the upper-right corner of each facet with better fitting font size
geom_text(data = correlation_data,
aes(x = x, y = y, label = label),
inherit.aes = FALSE, size = 3, fontface = "bold") # Reduced font size for better fit
# Display plot
print(scatter_plot)
# Import the CSV file
data_24hr_logFC <- read.csv("data/Colorectal/24hrlogFC.csv", stringsAsFactors = FALSE)
# Map gene symbols to Entrez IDs using org.Hs.eg.db
data_24hr_logFC <- data_24hr_logFC %>%
mutate(Entrez_ID = mapIds(org.Hs.eg.db,
keys = Symbol,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first"))
# Merge datasets on Entrez_ID for CX and DOX at both concentrations (24 hours)
merged_CX_0.1_24 <- merge(data_24hr_logFC, CX_0.1_24, by = "Entrez_ID")
merged_CX_0.5_24 <- merge(data_24hr_logFC, CX_0.5_24, by = "Entrez_ID")
merged_DOX_0.1_24 <- merge(data_24hr_logFC, DOX_0.1_24, by = "Entrez_ID")
merged_DOX_0.5_24 <- merge(data_24hr_logFC, DOX_0.5_24, by = "Entrez_ID")
# Remove NA values
merged_CX_0.1_24 <- na.omit(merged_CX_0.1_24)
merged_CX_0.5_24 <- na.omit(merged_CX_0.5_24)
merged_DOX_0.1_24 <- na.omit(merged_DOX_0.1_24)
merged_DOX_0.5_24 <- na.omit(merged_DOX_0.5_24)
# Rename columns explicitly to avoid conflicts
colnames(merged_CX_0.1_24) <- colnames(merged_CX_0.5_24) <-
colnames(merged_DOX_0.1_24) <- colnames(merged_DOX_0.5_24) <-
c("Entrez_ID", "Symbol_24hr", "logFC_24hr", "logFC_Drug", "AveExpr_Drug", "t_Drug", "P.Value_Drug", "adj.P.Val_Drug", "B_Drug")
# Add drug and concentration labels for faceting
merged_CX_0.1_24$Drug <- "CX (24hr)"
merged_CX_0.5_24$Drug <- "CX (24hr)"
merged_DOX_0.1_24$Drug <- "DOX (24hr)"
merged_DOX_0.5_24$Drug <- "DOX (24hr)"
merged_CX_0.1_24$Concentration <- "0.1 µM"
merged_CX_0.5_24$Concentration <- "0.5 µM"
merged_DOX_0.1_24$Concentration <- "0.1 µM"
merged_DOX_0.5_24$Concentration <- "0.5 µM"
# Combine all datasets into a single data frame
merged_data_24hr <- rbind(
merged_CX_0.1_24[, c("Entrez_ID", "logFC_Drug", "logFC_24hr", "Concentration", "Drug")],
merged_CX_0.5_24[, c("Entrez_ID", "logFC_Drug", "logFC_24hr", "Concentration", "Drug")],
merged_DOX_0.1_24[, c("Entrez_ID", "logFC_Drug", "logFC_24hr", "Concentration", "Drug")],
merged_DOX_0.5_24[, c("Entrez_ID", "logFC_Drug", "logFC_24hr", "Concentration", "Drug")]
)
# Calculate correlations for each facet
cor_CX_0.1_24 <- cor.test(merged_CX_0.1_24$logFC_Drug, merged_CX_0.1_24$logFC_24hr, method = "pearson")
cor_CX_0.5_24 <- cor.test(merged_CX_0.5_24$logFC_Drug, merged_CX_0.5_24$logFC_24hr, method = "pearson")
cor_DOX_0.1_24 <- cor.test(merged_DOX_0.1_24$logFC_Drug, merged_DOX_0.1_24$logFC_24hr, method = "pearson")
cor_DOX_0.5_24 <- cor.test(merged_DOX_0.5_24$logFC_Drug, merged_DOX_0.5_24$logFC_24hr, method = "pearson")
# Data frame for r and p-values annotations
correlation_data_24hr <- data.frame(
Drug = rep(c("CX (24hr)", "DOX (24hr)"), each = 2),
Concentration = rep(c("0.1 µM", "0.5 µM"), times = 2),
x = max(merged_data_24hr$logFC_Drug, na.rm = TRUE) * 0.75, # Adjusted placement for better fit
y = max(merged_data_24hr$logFC_24hr, na.rm = TRUE) * 0.85,
label = c(
paste0("r = ", round(cor_CX_0.1_24$estimate, 3), "\np = ", signif(cor_CX_0.1_24$p.value, 3)),
paste0("r = ", round(cor_CX_0.5_24$estimate, 3), "\np = ", signif(cor_CX_0.5_24$p.value, 3)),
paste0("r = ", round(cor_DOX_0.1_24$estimate, 3), "\np = ", signif(cor_DOX_0.1_24$p.value, 3)),
paste0("r = ", round(cor_DOX_0.5_24$estimate, 3), "\np = ", signif(cor_DOX_0.5_24$p.value, 3))
)
)
# Create scatter plot with facets for concentration (vertically) and drug (horizontally)
scatter_plot_24hr <- ggplot(merged_data_24hr, aes(x = logFC_Drug, y = logFC_24hr)) +
geom_point(alpha = 0.6, color = "black") + # Black scatter points
geom_smooth(method = "lm", color = "black", se = FALSE) + # Black regression line
labs(
title = "Correlation between Drug (24hr) and Colorectal 24hr logFC",
x = "logFC (Drug_24hr)",
y = "logFC (Colorectal 24hr)"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold"),
# Outer box around the entire facet plot
panel.border = element_rect(color = "black", fill = NA, linewidth = 2),
# Black box around each facet title
strip.background = element_rect(fill = "white", color = "black", linewidth = 1.5),
strip.text = element_text(size = 12, face = "bold", color = "black")
) +
# Facet by drug (horizontally) and concentration (vertically)
facet_grid(Concentration ~ Drug) +
# Add r and p-value in the upper-right corner of each facet with better fitting font size
geom_text(data = correlation_data_24hr,
aes(x = x, y = y, label = label),
inherit.aes = FALSE, size = 3, fontface = "bold") # Reduced font size for better fit
# Display plot
print(scatter_plot_24hr)
# Import the CSV file
data_72hr_logFC <- read.csv("data/Colorectal/72hrlogFC.csv", stringsAsFactors = FALSE)
# Map gene symbols to Entrez IDs using org.Hs.eg.db
data_72hr_logFC <- data_72hr_logFC %>%
mutate(Entrez_ID = mapIds(org.Hs.eg.db,
keys = Symbol,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first"))
# Merge datasets on Entrez_ID for CX and DOX at both concentrations (48 hours)
merged_CX_0.1_48 <- merge(data_72hr_logFC, CX_0.1_48, by = "Entrez_ID")
merged_CX_0.5_48 <- merge(data_72hr_logFC, CX_0.5_48, by = "Entrez_ID")
merged_DOX_0.1_48 <- merge(data_72hr_logFC, DOX_0.1_48, by = "Entrez_ID")
merged_DOX_0.5_48 <- merge(data_72hr_logFC, DOX_0.5_48, by = "Entrez_ID")
# Remove NA values
merged_CX_0.1_48 <- na.omit(merged_CX_0.1_48)
merged_CX_0.5_48 <- na.omit(merged_CX_0.5_48)
merged_DOX_0.1_48 <- na.omit(merged_DOX_0.1_48)
merged_DOX_0.5_48 <- na.omit(merged_DOX_0.5_48)
# Rename columns explicitly to avoid conflicts
colnames(merged_CX_0.1_48) <- colnames(merged_CX_0.5_48) <-
colnames(merged_DOX_0.1_48) <- colnames(merged_DOX_0.5_48) <-
c("Entrez_ID", "Symbol_72hr", "logFC_72hr", "logFC_Drug", "AveExpr_Drug", "t_Drug", "P.Value_Drug", "adj.P.Val_Drug", "B_Drug")
# Add drug and concentration labels for faceting
merged_CX_0.1_48$Drug <- "CX (48hr)"
merged_CX_0.5_48$Drug <- "CX (48hr)"
merged_DOX_0.1_48$Drug <- "DOX (48hr)"
merged_DOX_0.5_48$Drug <- "DOX (48hr)"
merged_CX_0.1_48$Concentration <- "0.1 µM"
merged_CX_0.5_48$Concentration <- "0.5 µM"
merged_DOX_0.1_48$Concentration <- "0.1 µM"
merged_DOX_0.5_48$Concentration <- "0.5 µM"
# Combine all datasets into a single data frame
merged_data_48hr <- rbind(
merged_CX_0.1_48[, c("Entrez_ID", "logFC_Drug", "logFC_72hr", "Concentration", "Drug")],
merged_CX_0.5_48[, c("Entrez_ID", "logFC_Drug", "logFC_72hr", "Concentration", "Drug")],
merged_DOX_0.1_48[, c("Entrez_ID", "logFC_Drug", "logFC_72hr", "Concentration", "Drug")],
merged_DOX_0.5_48[, c("Entrez_ID", "logFC_Drug", "logFC_72hr", "Concentration", "Drug")]
)
# Calculate correlations for each facet
cor_CX_0.1_48 <- cor.test(merged_CX_0.1_48$logFC_Drug, merged_CX_0.1_48$logFC_72hr, method = "pearson")
cor_CX_0.5_48 <- cor.test(merged_CX_0.5_48$logFC_Drug, merged_CX_0.5_48$logFC_72hr, method = "pearson")
cor_DOX_0.1_48 <- cor.test(merged_DOX_0.1_48$logFC_Drug, merged_DOX_0.1_48$logFC_72hr, method = "pearson")
cor_DOX_0.5_48 <- cor.test(merged_DOX_0.5_48$logFC_Drug, merged_DOX_0.5_48$logFC_72hr, method = "pearson")
# Data frame for r and p-values annotations
correlation_data_48hr <- data.frame(
Drug = rep(c("CX (48hr)", "DOX (48hr)"), each = 2),
Concentration = rep(c("0.1 µM", "0.5 µM"), times = 2),
x = max(merged_data_48hr$logFC_Drug, na.rm = TRUE) * 0.75, # Adjusted placement for better fit
y = max(merged_data_48hr$logFC_72hr, na.rm = TRUE) * 0.85,
label = c(
paste0("r = ", round(cor_CX_0.1_48$estimate, 3), "\np = ", signif(cor_CX_0.1_48$p.value, 3)),
paste0("r = ", round(cor_CX_0.5_48$estimate, 3), "\np = ", signif(cor_CX_0.5_48$p.value, 3)),
paste0("r = ", round(cor_DOX_0.1_48$estimate, 3), "\np = ", signif(cor_DOX_0.1_48$p.value, 3)),
paste0("r = ", round(cor_DOX_0.5_48$estimate, 3), "\np = ", signif(cor_DOX_0.5_48$p.value, 3))
)
)
# Create scatter plot with facets for concentration (vertically) and drug (horizontally)
scatter_plot_48hr <- ggplot(merged_data_48hr, aes(x = logFC_Drug, y = logFC_72hr)) +
geom_point(alpha = 0.6, color = "black") + # Black scatter points
geom_smooth(method = "lm", color = "black", se = FALSE) + # Black regression line
labs(
title = "Correlation between Drug (48hr) and Colorectal 72hr logFC",
x = "logFC (Drug_48hr)",
y = "logFC (Colorectal 72hr)"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold"),
# Outer box around the entire facet plot
panel.border = element_rect(color = "black", fill = NA, linewidth = 2),
# Black box around each facet title
strip.background = element_rect(fill = "white", color = "black", linewidth = 1.5),
strip.text = element_text(size = 12, face = "bold", color = "black")
) +
# Facet by drug (horizontally) and concentration (vertically)
facet_grid(Concentration ~ Drug) +
# Add r and p-value in the upper-right corner of each facet with better fitting font size
geom_text(data = correlation_data_48hr,
aes(x = x, y = y, label = label),
inherit.aes = FALSE, size = 3, fontface = "bold") # Reduced font size for better fit
# Display plot
print(scatter_plot_48hr)
sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/Chicago
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] biomaRt_2.58.2 clusterProfiler_4.10.1 org.Hs.eg.db_3.18.0
[4] AnnotationDbi_1.64.1 IRanges_2.36.0 S4Vectors_0.40.1
[7] Biobase_2.62.0 BiocGenerics_0.48.1 tidyr_1.3.1
[10] dplyr_1.1.4 ggplot2_3.5.1
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.17.1 jsonlite_1.8.9
[4] magrittr_2.0.3 farver_2.1.2 rmarkdown_2.29
[7] fs_1.6.3 zlibbioc_1.48.0 vctrs_0.6.5
[10] memoise_2.0.1 RCurl_1.98-1.13 ggtree_3.10.1
[13] htmltools_0.5.8.1 progress_1.2.3 curl_6.0.1
[16] gridGraphics_0.5-1 sass_0.4.9 bslib_0.8.0
[19] plyr_1.8.9 cachem_1.0.8 whisker_0.4.1
[22] igraph_2.1.1 lifecycle_1.0.4 pkgconfig_2.0.3
[25] Matrix_1.6-1.1 R6_2.5.1 fastmap_1.1.1
[28] gson_0.1.0 GenomeInfoDbData_1.2.11 digest_0.6.34
[31] aplot_0.2.3 enrichplot_1.22.0 colorspace_2.1-0
[34] patchwork_1.3.0 rprojroot_2.0.4 RSQLite_2.3.3
[37] labeling_0.4.3 filelock_1.0.3 mgcv_1.9-1
[40] httr_1.4.7 polyclip_1.10-7 compiler_4.3.0
[43] bit64_4.0.5 withr_3.0.2 BiocParallel_1.36.0
[46] viridis_0.6.5 DBI_1.2.3 ggforce_0.4.2
[49] MASS_7.3-60 rappdirs_0.3.3 HDO.db_0.99.1
[52] tools_4.3.0 ape_5.8 scatterpie_0.2.4
[55] httpuv_1.6.15 glue_1.7.0 nlme_3.1-166
[58] GOSemSim_2.28.1 promises_1.3.0 grid_4.3.0
[61] shadowtext_0.1.4 reshape2_1.4.4 fgsea_1.28.0
[64] generics_0.1.3 gtable_0.3.6 data.table_1.14.10
[67] hms_1.1.3 xml2_1.3.6 tidygraph_1.3.1
[70] XVector_0.42.0 ggrepel_0.9.6 pillar_1.10.1
[73] stringr_1.5.1 yulab.utils_0.1.8 later_1.3.2
[76] splines_4.3.0 tweenr_2.0.3 BiocFileCache_2.10.2
[79] treeio_1.26.0 lattice_0.22-5 bit_4.0.5
[82] tidyselect_1.2.1 GO.db_3.18.0 Biostrings_2.70.1
[85] knitr_1.49 git2r_0.35.0 gridExtra_2.3
[88] xfun_0.50 graphlayouts_1.2.0 stringi_1.8.3
[91] workflowr_1.7.1 lazyeval_0.2.2 ggfun_0.1.8
[94] yaml_2.3.10 evaluate_1.0.3 codetools_0.2-20
[97] ggraph_2.2.1 tibble_3.2.1 qvalue_2.34.0
[100] ggplotify_0.1.2 cli_3.6.1 munsell_0.5.1
[103] jquerylib_0.1.4 Rcpp_1.0.12 GenomeInfoDb_1.38.8
[106] dbplyr_2.5.0 png_0.1-8 XML_3.99-0.17
[109] parallel_4.3.0 blob_1.2.4 prettyunits_1.2.0
[112] DOSE_3.28.2 bitops_1.0-7 viridisLite_0.4.2
[115] tidytree_0.4.6 scales_1.3.0 purrr_1.0.2
[118] crayon_1.5.3 rlang_1.1.3 cowplot_1.1.3
[121] fastmatch_1.1-4 KEGGREST_1.42.0