Last updated: 2025-02-28

Checks: 6 1

Knit directory: CX5461_Project/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20250129) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version fa75295. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .RData
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Unstaged changes:
    Modified:   analysis/Tissue.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Tissue.Rmd) and HTML (docs/Tissue.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd fa75295 sayanpaul01 2025-02-28 Commit
html fa75295 sayanpaul01 2025-02-28 Commit
Rmd cf29408 sayanpaul01 2025-02-28 Commit
html cf29408 sayanpaul01 2025-02-28 Commit

📌 Tissue specificity analysis (Correlation Heatmap)

📌 Load Required Libraries

library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.3.3
library(dplyr)
Warning: package 'dplyr' was built under R version 4.3.2
library(tidyr)
Warning: package 'tidyr' was built under R version 4.3.3
library(org.Hs.eg.db)
Warning: package 'AnnotationDbi' was built under R version 4.3.2
Warning: package 'BiocGenerics' was built under R version 4.3.1
Warning: package 'Biobase' was built under R version 4.3.1
Warning: package 'IRanges' was built under R version 4.3.1
Warning: package 'S4Vectors' was built under R version 4.3.1
library(clusterProfiler)
Warning: package 'clusterProfiler' was built under R version 4.3.3
library(biomaRt)
Warning: package 'biomaRt' was built under R version 4.3.2

📌 Load Data

# Read the CSV file into R
file_path <- "data/count.csv"
df <- read.csv(file_path, check.names = FALSE)

# Remove 'x' from column headers
colnames(df) <- gsub("^x", "", colnames(df))

# View the updated dataframe
head(df)

# View updated column names
colnames(df)


# Step 1: Calculate IPSC_CM
# Select columns with 'VEH' in their names
veh_columns <- grep("VEH", colnames(df), value = TRUE)

# Calculate the average logCPM across VEH samples
df$IPSC_CM <- rowMeans(df[, veh_columns], na.rm = TRUE)

# Create a new dataframe with ENTREZID, SYMBOL, and IPSC_CM
veh_avg_df <- df[, c("Entrez_ID","IPSC_CM")]

# Step 2: Read the Tissue_Gtex dataset
Tissue_Gtex <- read.csv("data/Tissue_Gtex.csv")

# Step 3: Convert Ensembl IDs to Entrez IDs using biomaRt
mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Extract Entrez IDs for the Ensembl IDs in the gene.id column
gene_ids <- Tissue_Gtex$gene.id
conversion <- getBM(
  attributes = c("ensembl_gene_id", "entrezgene_id"),
  filters = "ensembl_gene_id",
  values = gene_ids,
  mart = mart
)

# Merge Entrez IDs back into Tissue_Gtex
Tissue_Gtex <- merge(Tissue_Gtex, conversion, by.x = "gene.id", by.y = "ensembl_gene_id", all.x = TRUE)

# Rename the column for consistency
colnames(Tissue_Gtex)[colnames(Tissue_Gtex) == "entrezgene_id"] <- "Entrez_ID"

# Step 4: Read the veh_avg_df dataframe
veh_avg_df <- df[, c("Entrez_ID", "IPSC_CM")]

# Step 5: Merge veh_avg_df with Tissue_Gtex by Entrez_ID
# Ensure column types match
veh_avg_df$Entrez_ID <- as.character(veh_avg_df$Entrez_ID)
Tissue_Gtex$Entrez_ID <- as.character(Tissue_Gtex$Entrez_ID)

# Perform the merge
merged_df <- merge(veh_avg_df, Tissue_Gtex, by = "Entrez_ID", all.x = TRUE)

# Step 6: Remove rows with NA values
cleaned_df <- na.omit(merged_df) 

📌 Correlation heatmap (Tissue specificity)

# Filter relevant tissue columns
tissue_cols <- colnames(cleaned_df)[which(colnames(cleaned_df) %in% c(
  "IPSC_CM", "Adrenal.Gland", "Spleen", "Heart...Atrial", "Pancreas","Artery","Breast",
  "small.Intestine","Colon","Nerve...Tibial","Esophagus","Muscle...Skeletal",
  "Thyroid","Heart..Ventricle", "Stomach", "Uterus","Vagina", "Skin", "Ovary", "Liver", "Lung", "Brain", "Pituitary", "Testis", 
  "Prostate", "Salivary.Gland"))]

data_subset <- cleaned_df[, tissue_cols]

# Compute Pearson and Spearman correlations
pearson_corr <- cor(data_subset, method = "pearson", use = "complete.obs")
spearman_corr <- cor(data_subset, method = "spearman", use = "complete.obs")

# Reorder tissues by highest correlation with IPSC_CM
order_pearson <- order(pearson_corr["IPSC_CM", ], decreasing = TRUE)
order_spearman <- order(spearman_corr["IPSC_CM", ], decreasing = TRUE)

pearson_corr <- pearson_corr[order_pearson, order_pearson]
spearman_corr <- spearman_corr[order_spearman, order_spearman]

# Plot heatmaps
library(pheatmap)
Warning: package 'pheatmap' was built under R version 4.3.1
# Pearson correlation heatmap
pheatmap(pearson_corr, 
         cluster_rows = FALSE, 
         cluster_cols = FALSE, 
         main = "Pearson Correlation Heatmap", 
         color = colorRampPalette(c("blue", "white", "red"))(100), 
         display_numbers = TRUE)

Version Author Date
cf29408 sayanpaul01 2025-02-28
# Spearman correlation heatmap
pheatmap(spearman_corr, 
         cluster_rows = FALSE, 
         cluster_cols = FALSE, 
         main = "Spearman Correlation Heatmap", 
         color = colorRampPalette(c("blue", "white", "red"))(100), 
         display_numbers = TRUE)

Version Author Date
cf29408 sayanpaul01 2025-02-28

📌 Tissue specific gene proportions in corrmotif clusters

📌 Load Datasets

# Load your Heart_genes dataset
heart_genes <- read.csv("data/Human_Heart_Genes.csv", stringsAsFactors = FALSE)

# Convert Gene names to Entrez IDs in heart_genes
heart_genes$Entrez_ID <- mapIds(
  org.Hs.eg.db, 
  keys = heart_genes$Gene, 
  column = "ENTREZID", 
  keytype = "SYMBOL", 
  multiVals = "first"
)

# Create a vector of Entrez IDs specific to the heart
heart_entrez_ids <- na.omit(heart_genes$Entrez_ID)


# Load the saved datasets
prob_all_1 <- read.csv("data/prob_all_1.csv")$Entrez_ID
prob_all_2 <- read.csv("data/prob_all_2.csv")$Entrez_ID
prob_all_3 <- read.csv("data/prob_all_3.csv")$Entrez_ID
prob_all_4 <- read.csv("data/prob_all_4.csv")$Entrez_ID

📌 Tissue specific gene proportions analysis

# Example Response Groups Data (Replace with actual data)
response_groups <- list(
  "Non Response" = prob_all_1, # Replace 'prob_all_1', 'prob_all_2', etc. with your actual response group dataframes
  "CX_DOX Shared Late Response" = prob_all_2,
  "DOX-Specific Response" = prob_all_3,
  "Late High Dose DOX-Specific Response" = prob_all_4
)

# Combine all response groups into a single dataframe
response_groups_df <- bind_rows(
  lapply(response_groups, function(ids) data.frame(Entrez_ID = ids)),
  .id = "Set"
)

# Categorize genes into Heart-specific and Non-Heart-specific
response_groups_df <- response_groups_df %>%
  mutate(
    Category = ifelse(Entrez_ID %in% heart_entrez_ids, "Heart-specific Genes", "Non-Heart-specific Genes")
  )

# Calculate counts for Heart-specific and Non-Heart-specific genes in each response group
proportion_data <- response_groups_df %>%
  group_by(Set, Category) %>%
  summarize(Count = n(), .groups = "drop")

# Reference counts for "Non Response"
non_response_counts <- proportion_data %>%
  filter(Set == "Non Response") %>%
  dplyr::select(Category, Count) %>%
  {setNames(.$Count, .$Category)} # Create named vector for "Non Response" counts

# Perform Chi-square test for selected response groups compared to "Non Response"
chi_results <- proportion_data %>%
  filter(Set != "Non Response") %>% # Exclude "Non Response"
  group_by(Set) %>%
  summarize(
    p_value = {
      # Extract counts for the current response group
      group_counts <- Count[Category %in% c("Heart-specific Genes", "Non-Heart-specific Genes")]
      # Ensure there are no missing categories, fill with 0 if missing
      if (length(group_counts) < 2) group_counts <- c(group_counts, 0)
      # Create contingency table
      contingency_table <- matrix(c(
        group_counts[1], group_counts[2],
        non_response_counts["Heart-specific Genes"], non_response_counts["Non-Heart-specific Genes"]
      ), nrow = 2)
      # Print the contingency table for debugging
      print(paste("Set:", unique(Set)))
      print("Contingency Table:")
      print(contingency_table)
      # Perform chi-square test
      if (all(contingency_table >= 0 & is.finite(contingency_table))) {
        chisq.test(contingency_table)$p.value
      } else {
        NA
      }
    },
    .groups = "drop"
  ) %>%
  mutate(Significance = ifelse(!is.na(p_value) & p_value < 0.05, "*", ""))
[1] "Set: CX_DOX Shared Late Response"
[1] "Contingency Table:"
     [,1] [,2]
[1,]    2  123
[2,]  412 6908
[1] "Set: DOX-Specific Response"
[1] "Contingency Table:"
     [,1] [,2]
[1,]   62  123
[2,] 1469 6908
[1] "Set: Late High Dose DOX-Specific Response"
[1] "Contingency Table:"
     [,1] [,2]
[1,]  159  123
[2,] 4715 6908
# Merge chi-square results back into the proportion data
proportion_data <- proportion_data %>%
  left_join(chi_results %>% dplyr::select(Set, p_value, Significance), by = "Set")

# Calculate proportions for plotting
proportion_data <- proportion_data %>%
  group_by(Set) %>%
  mutate(Percentage = (Count / sum(Count)) * 100)

# Define the correct order for response groups
response_order <- c(
  "Non Response",
  "CX_DOX Shared Late Response",
  "DOX-Specific Response",
  "Late High Dose DOX-Specific Response"
)
proportion_data$Set <- factor(proportion_data$Set, levels = response_order)

# Create the proportion plot with significance stars
ggplot(proportion_data, aes(x = Set, y = Percentage, fill = Category)) +
  geom_bar(stat = "identity", position = "stack") +
  geom_text(
    data = proportion_data %>% distinct(Set, Significance), # Add stars for significant groups
    aes(x = Set, y = 105, label = Significance), # Position stars above the bars
    inherit.aes = FALSE,
    size = 6,
    color = "black",
    hjust = 0.5
  ) +
  scale_fill_manual(values = c(
    "Heart-specific Genes" = "#4daf4a",  # Green
    "Non-Heart-specific Genes" = "#377eb8"   # Blue
  )) +
  labs(
    title = "Proportion of Heart-Specific and Non-Heart-Specific Genes Across Response Sets",
    x = "Response Sets",
    y = "Percentage of Genes",
    fill = "Gene Category"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = rel(1.5), hjust = 0.5),
    axis.title = element_text(size = 15, color = "black"),
    axis.ticks = element_line(linewidth = 1.5),
    axis.line = element_line(linewidth = 1.5),
    axis.text.x = element_text(size = 10, color = "black", angle = 45, hjust = 1),
    strip.text = element_text(size = 12, face = "bold"),
    panel.border = element_rect(color = "black", fill = NA, linewidth = 1.5) # Add border
  )
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_text()`).

Version Author Date
fa75295 sayanpaul01 2025-02-28

sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] pheatmap_1.0.12        biomaRt_2.58.2         clusterProfiler_4.10.1
 [4] org.Hs.eg.db_3.18.0    AnnotationDbi_1.64.1   IRanges_2.36.0        
 [7] S4Vectors_0.40.1       Biobase_2.62.0         BiocGenerics_0.48.1   
[10] tidyr_1.3.1            dplyr_1.1.4            ggplot2_3.5.1         

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3      rstudioapi_0.17.1       jsonlite_1.8.9         
  [4] magrittr_2.0.3          farver_2.1.2            rmarkdown_2.29         
  [7] fs_1.6.3                zlibbioc_1.48.0         vctrs_0.6.5            
 [10] memoise_2.0.1           RCurl_1.98-1.13         ggtree_3.10.1          
 [13] htmltools_0.5.8.1       progress_1.2.3          curl_6.0.1             
 [16] gridGraphics_0.5-1      sass_0.4.9              bslib_0.8.0            
 [19] plyr_1.8.9              cachem_1.0.8            whisker_0.4.1          
 [22] igraph_2.1.1            lifecycle_1.0.4         pkgconfig_2.0.3        
 [25] Matrix_1.6-1.1          R6_2.5.1                fastmap_1.1.1          
 [28] gson_0.1.0              GenomeInfoDbData_1.2.11 digest_0.6.34          
 [31] aplot_0.2.3             enrichplot_1.22.0       colorspace_2.1-0       
 [34] patchwork_1.3.0         rprojroot_2.0.4         RSQLite_2.3.3          
 [37] labeling_0.4.3          filelock_1.0.3          httr_1.4.7             
 [40] polyclip_1.10-7         compiler_4.3.0          bit64_4.0.5            
 [43] withr_3.0.2             BiocParallel_1.36.0     viridis_0.6.5          
 [46] DBI_1.2.3               ggforce_0.4.2           MASS_7.3-60            
 [49] rappdirs_0.3.3          HDO.db_0.99.1           tools_4.3.0            
 [52] ape_5.8                 scatterpie_0.2.4        httpuv_1.6.15          
 [55] glue_1.7.0              nlme_3.1-166            GOSemSim_2.28.1        
 [58] promises_1.3.0          grid_4.3.0              shadowtext_0.1.4       
 [61] reshape2_1.4.4          fgsea_1.28.0            generics_0.1.3         
 [64] gtable_0.3.6            data.table_1.14.10      hms_1.1.3              
 [67] xml2_1.3.6              tidygraph_1.3.1         XVector_0.42.0         
 [70] ggrepel_0.9.6           pillar_1.10.1           stringr_1.5.1          
 [73] yulab.utils_0.1.8       later_1.3.2             splines_4.3.0          
 [76] tweenr_2.0.3            BiocFileCache_2.10.2    treeio_1.26.0          
 [79] lattice_0.22-5          bit_4.0.5               tidyselect_1.2.1       
 [82] GO.db_3.18.0            Biostrings_2.70.1       knitr_1.49             
 [85] git2r_0.35.0            gridExtra_2.3           xfun_0.50              
 [88] graphlayouts_1.2.0      stringi_1.8.3           workflowr_1.7.1        
 [91] lazyeval_0.2.2          ggfun_0.1.8             yaml_2.3.10            
 [94] evaluate_1.0.3          codetools_0.2-20        ggraph_2.2.1           
 [97] tibble_3.2.1            qvalue_2.34.0           ggplotify_0.1.2        
[100] cli_3.6.1               munsell_0.5.1           jquerylib_0.1.4        
[103] Rcpp_1.0.12             GenomeInfoDb_1.38.8     dbplyr_2.5.0           
[106] png_0.1-8               XML_3.99-0.17           parallel_4.3.0         
[109] blob_1.2.4              prettyunits_1.2.0       DOSE_3.28.2            
[112] bitops_1.0-7            viridisLite_0.4.2       tidytree_0.4.6         
[115] scales_1.3.0            purrr_1.0.2             crayon_1.5.3           
[118] rlang_1.1.3             cowplot_1.1.3           fastmatch_1.1-4        
[121] KEGGREST_1.42.0