Knowles 2019 data

Last updated: 2023-06-16

Checks: 7 0

Knit directory: Cardiotoxicity/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20230109)

The command set.seed(20230109) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: fd4fe27

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version fd4fe27. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .RData
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/41588_2018_171_MOESM3_ESMeQTL_ST2_for paper.csv
    Ignored:    data/Arr_GWAS.txt
    Ignored:    data/Arr_geneset.RDS
    Ignored:    data/BC_cell_lines.csv
    Ignored:    data/CADGWASgene_table.csv
    Ignored:    data/CAD_geneset.RDS
    Ignored:    data/Clamp_Summary.csv
    Ignored:    data/Cormotif_24_k1-5_raw.RDS
    Ignored:    data/DAgostres24.RDS
    Ignored:    data/DAtable1.csv
    Ignored:    data/DDEMresp_list.csv
    Ignored:    data/DDE_reQTL.txt
    Ignored:    data/DDEresp_list.csv
    Ignored:    data/DEG-GO/
    Ignored:    data/DEG_cormotif.RDS
    Ignored:    data/DF_Plate_Peak.csv
    Ignored:    data/DRC48hoursdata.csv
    Ignored:    data/Da24counts.txt
    Ignored:    data/Dx24counts.txt
    Ignored:    data/Dx_reQTL_specific.txt
    Ignored:    data/Ep24counts.txt
    Ignored:    data/GOIsig.csv
    Ignored:    data/GOplots.R
    Ignored:    data/GTEX_setsimple.csv
    Ignored:    data/GTEx_gene_list.csv
    Ignored:    data/HFGWASgene_table.csv
    Ignored:    data/HF_geneset.RDS
    Ignored:    data/Heart_Left_Ventricle.v8.egenes.txt
    Ignored:    data/Hf_GWAS.txt
    Ignored:    data/K_cluster
    Ignored:    data/K_cluster_kisthree.csv
    Ignored:    data/K_cluster_kistwo.csv
    Ignored:    data/LDH48hoursdata.csv
    Ignored:    data/Mt24counts.txt
    Ignored:    data/RINsamplelist.txt
    Ignored:    data/Seonane2019supp1.txt
    Ignored:    data/TOP2Bi-24hoursGO_analysis.csv
    Ignored:    data/TR24counts.txt
    Ignored:    data/Top2biresp_cluster24h.csv
    Ignored:    data/Viabilitylistfull.csv
    Ignored:    data/allexpressedgenes.txt
    Ignored:    data/allgenes.txt
    Ignored:    data/allmatrix.RDS
    Ignored:    data/avgLD50.RDS
    Ignored:    data/backGL.txt
    Ignored:    data/cormotif_3hk1-8.RDS
    Ignored:    data/cormotif_initalK5.RDS
    Ignored:    data/cormotif_initialK5.RDS
    Ignored:    data/cormotif_initialall.RDS
    Ignored:    data/counts24hours.RDS
    Ignored:    data/cpmcount.RDS
    Ignored:    data/cpmnorm_counts.csv
    Ignored:    data/crispr_genes.csv
    Ignored:    data/cvd_GWAS.txt
    Ignored:    data/dat_cpm.RDS
    Ignored:    data/data_outline.txt
    Ignored:    data/efit2.RDS
    Ignored:    data/efit2results.RDS
    Ignored:    data/ensembl_backup.RDS
    Ignored:    data/ensgtotal.txt
    Ignored:    data/filenameonly.txt
    Ignored:    data/filtered_cpm_counts.csv
    Ignored:    data/filtered_raw_counts.csv
    Ignored:    data/filtermatrix_x.RDS
    Ignored:    data/folder_05top/
    Ignored:    data/geneDoxonlyQTL.csv
    Ignored:    data/gene_corr_frame.RDS
    Ignored:    data/gene_prob_tran3h.RDS
    Ignored:    data/gene_probabilityk5.RDS
    Ignored:    data/gostresTop2bi_ER.RDS
    Ignored:    data/gostresTop2bi_LR
    Ignored:    data/gostresTop2bi_LR.RDS
    Ignored:    data/gostresTop2bi_TI.RDS
    Ignored:    data/gostrescoNR
    Ignored:    data/gtex/
    Ignored:    data/heartgenes.csv
    Ignored:    data/individualDRCfile.RDS
    Ignored:    data/individual_DRC48.RDS
    Ignored:    data/individual_LDH48.RDS
    Ignored:    data/knowfig4.csv
    Ignored:    data/knowfig5.csv
    Ignored:    data/mymatrix.RDS
    Ignored:    data/nonresponse_cluster24h.csv
    Ignored:    data/norm_LDH.csv
    Ignored:    data/norm_counts.csv
    Ignored:    data/old_sets/
    Ignored:    data/plan2plot.png
    Ignored:    data/raw_counts.csv
    Ignored:    data/response_cluster24h.csv
    Ignored:    data/sigVDA24.txt
    Ignored:    data/sigVDA3.txt
    Ignored:    data/sigVDX24.txt
    Ignored:    data/sigVDX3.txt
    Ignored:    data/sigVEP24.txt
    Ignored:    data/sigVEP3.txt
    Ignored:    data/sigVMT24.txt
    Ignored:    data/sigVMT3.txt
    Ignored:    data/sigVTR24.txt
    Ignored:    data/sigVTR3.txt
    Ignored:    data/siglist.RDS
    Ignored:    data/table3a.omar
    Ignored:    data/toplistall.RDS
    Ignored:    data/tvl24hour.txt
    Ignored:    data/tvl24hourw.txt
    Ignored:    data/venn_code.R

Untracked files:
    Untracked:  .RDataTmp
    Untracked:  .RDataTmp1
    Untracked:  .RDataTmp2
    Untracked:  OmicNavigator_learn.R
    Untracked:  analysis/GOI_plots.Rmd
    Untracked:  analysis/Seoane_chrom.Rmd
    Untracked:  code/cpm_boxplot.R
    Untracked:  cormotif_probability_genelist.csv
    Untracked:  individual-legenddark2.png
    Untracked:  installed_old.rda
    Untracked:  motif_ER.txt
    Untracked:  motif_LR.txt
    Untracked:  motif_NR.txt
    Untracked:  motif_TI.txt
    Untracked:  output/output-old/
    Untracked:  reneebasecode.R

Unstaged changes:
    Modified:   analysis/DEG-GO_analysis.Rmd
    Modified:   analysis/index.Rmd
    Modified:   analysis/other_analysis.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Knowles2019.Rmd) and HTML (docs/Knowles2019.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	fd4fe27	reneeisnowhere	2023-06-16	fix spelling
html	d6ad3fe	reneeisnowhere	2023-06-16	Build site.
Rmd	4191e24	reneeisnowhere	2023-06-16	adding lcpm plots for dox spec
html	bd0e45f	reneeisnowhere	2023-06-15	Build site.
Rmd	af93421	reneeisnowhere	2023-06-15	adding in seperate files
Rmd	1a1a034	reneeisnowhere	2023-06-15	updates galore
Rmd	f8f511a	reneeisnowhere	2023-06-15	updates and simplifications of code
Rmd	7fc7ec7	reneeisnowhere	2023-06-14	updating code

package loading

Knowles data

The code below is how I wrangled the knowles supplemental lists into entrezgene_ids to overlap with my expressed gene lists

Enrichment of Pairwise

backGL <- read.csv("data/backGL.txt",     row.names =1)
drug_palNoVeh <- c("#8B006D","#DF707E","#F1B72B", "#3386DD","#707031")
drug_palc <- c("#8B006D","#DF707E","#F1B72B", "#3386DD","#707031","#41B333")
time <- rep((rep(c("3h", "24h"), c(6,6))), 6)
time <- ordered(time, levels =c("3h", "24h"))
drug <- rep(c("Daunorubicin","Doxorubicin","Epirubicin","Mitoxantrone","Trastuzumab", "Vehicle"),12)
mat_drug <- c("Daunorubicin","Doxorubicin","Epirubicin","Mitoxantrone")
indv <- as.factor(rep(c(1,2,3,4,5,6), c(12,12,12,12,12,12)))
labeltop <- (interaction(substring(drug, 0, 2), indv, time))
level_order2 <- c('75','87','77','79','78','71')


toplistall <- readRDS("data/toplistall.RDS")
knowles4 <-readRDS("output/knowles4.RDS")
knowles5 <-readRDS("output/knowles5.RDS")
DOXmeSNPs <- readRDS("output/DOXmeSNPs.RDS")
DOXreQTLs <- readRDS("output/DOXreQTLs.RDS")
DNRmeSNPs <- readRDS("output/DNRmeSNPs.RDS")
DNRreQTLs <- readRDS("output/DNRreQTLs.RDS")
EPImeSNPs <- readRDS("output/EPImeSNPs.RDS")
EPIreQTLs <- readRDS("output/EPIreQTLs.RDS")
MTXmeSNPs <- readRDS("output/MTXmeSNPs.RDS")
MTXreQTLs <- readRDS("output/MTXreQTLs.RDS")

toplist24hr <-   toplistall %>%
  mutate(id = as.factor(id)) %>%
  mutate(time=factor(time, levels=c("3_hours","24_hours"))) %>%
  filter(time=="24_hours")

toplist3hr <-   toplistall %>%
  mutate(id = as.factor(id)) %>%
  mutate(time=factor(time, levels=c("3_hours","24_hours"))) %>%
  filter(time=="3_hours")

Determining the genetic basis of anthracycline-cardiotoxicity by molecular response QTL mapping in induced cardiomyocytes
David A Knowles, Courtney K Burrows†, John D Blischak, Kristen M Patterson, Daniel J Serie, Nadine Norton, Carole Ober, Jonathan K Pritchard, Yoav Gilad

Knowles \(~~et~ al.~\) eLife 2018;7:e33480. DOI: https://doi.org/10.7554/eLife.33480

My first question was about transcriptional response at the 24 hour mark with my treatments. 3 hour RNA-seq had low numbers of DEGs,so my initial focus is at 24 hours. This time also happens to be when the Knowles paper collected their RNA-seq data.

Supplementary 4 contains a list of 518 SNPs within 1 Mb of TSS, which had a detectable marginal effect on expression (5% FDR). When converted from ensembl gene id to entrez gene id, my list of genes for this supplement = 521. I will call these meSNPS for marginal effect QTLs.

Supplementary 5 contains a list of 376 response eQTLs (reQTLs). These are variants that were associated with modulation of transcriptomic response to doxorubicin treatment. After database name conversion, I have 377 genes.

meSNPdf <- toplistall %>% 
  mutate(id = as.factor(id)) %>%
  mutate(time=factor(time, levels=c("3_hours","24_hours"))) %>%
  mutate(sigcount = if_else(adj.P.Val < 0.05,'sig','notsig'))%>%
  mutate(meSNP = if_else( ENTREZID %in% knowles4$entrezgene_id, "y" , "no")) %>% 
  dplyr::select(ENTREZID,id,time,meSNP,sigcount) %>%
  group_by(id,time,meSNP,sigcount) %>% 
  tally() %>% 
  pivot_wider(id_cols = c(id,time,sigcount), names_from = meSNP,names_glue = "meSNP_{meSNP}", values_from = n)

meSNPdf %>% 
  kable(., caption= "Number of genes from minimally expressed SNPs found in DEGs (sig and nonsig) broken down by drug,time, and significance") %>% 
  kable_paper("striped", full_width = FALSE) %>% 
  kable_styling(full_width = FALSE, position = "left",bootstrap_options = c("striped"),font_size = 18) %>% 
  scroll_box(width = "80%", height = "400px")

Number of genes from minimally expressed SNPs found in DEGs (sig and nonsig) broken down by drug,time, and significance
id	time	sigcount	meSNP_no	meSNP_y
Daunorubicin	3_hours	notsig	13032	497
Daunorubicin	3_hours	sig	549	6
Daunorubicin	24_hours	notsig	6916	304
Daunorubicin	24_hours	sig	6665	199
Doxorubicin	3_hours	notsig	13565	503
Doxorubicin	3_hours	sig	16	NA
Doxorubicin	24_hours	notsig	7249	319
Doxorubicin	24_hours	sig	6332	184
Epirubicin	3_hours	notsig	13365	499
Epirubicin	3_hours	sig	216	4
Epirubicin	24_hours	notsig	7551	331
Epirubicin	24_hours	sig	6030	172
Mitoxantrone	3_hours	notsig	13524	502
Mitoxantrone	3_hours	sig	57	1
Mitoxantrone	24_hours	notsig	12284	473
Mitoxantrone	24_hours	sig	1297	30
Trastuzumab	3_hours	notsig	13581	503
Trastuzumab	24_hours	notsig	13581	503

reQTLdf <- toplistall %>% 
  mutate(id = as.factor(id)) %>%
  mutate(time=factor(time, levels=c("3_hours","24_hours"))) %>%
  mutate(sigcount = if_else(adj.P.Val < 0.05,'sig','notsig'))%>%
  mutate(reQTL = if_else( ENTREZID %in% knowles5$entrezgene_id, "y" , "no")) %>% 
  dplyr::select(ENTREZID,id,time,reQTL,sigcount) %>%
  group_by(id,time,reQTL,sigcount) %>% 
  tally() %>% 
  pivot_wider(id_cols = c(id,time,sigcount), names_from = reQTL,names_glue = "reQTL_{reQTL}", values_from = n)

reQTLdf %>% 
  kable(., caption= "Number of genes from response eQTLS found in DEGs (sig and nonsig) broken down by drug,time, and significance") %>% 
  kable_paper("striped", full_width = FALSE) %>% 
  kable_styling(full_width = FALSE, position = "left",bootstrap_options = c("striped"),font_size = 18) %>% 
  scroll_box(width = "80%", height = "400px")

Number of genes from response eQTLS found in DEGs (sig and nonsig) broken down by drug,time, and significance
id	time	sigcount	reQTL_no	reQTL_y
Daunorubicin	3_hours	notsig	13168	361
Daunorubicin	3_hours	sig	542	13
Daunorubicin	24_hours	notsig	7033	187
Daunorubicin	24_hours	sig	6677	187
Doxorubicin	3_hours	notsig	13694	374
Doxorubicin	3_hours	sig	16	NA
Doxorubicin	24_hours	notsig	7374	194
Doxorubicin	24_hours	sig	6336	180
Epirubicin	3_hours	notsig	13494	370
Epirubicin	3_hours	sig	216	4
Epirubicin	24_hours	notsig	7684	198
Epirubicin	24_hours	sig	6026	176
Mitoxantrone	3_hours	notsig	13652	374
Mitoxantrone	3_hours	sig	58	NA
Mitoxantrone	24_hours	notsig	12423	334
Mitoxantrone	24_hours	sig	1287	40
Trastuzumab	3_hours	notsig	13710	374
Trastuzumab	24_hours	notsig	13710	374

dataframK_45 <- toplistall %>%
  mutate(id = as.factor(id)) %>%
  mutate(time=factor(time, levels=c("3_hours","24_hours"))) %>%
  mutate(K4 = if_else(ENTREZID %in% knowles4$entrezgene_id,'y','no'))%>%
  mutate(K5 = if_else(ENTREZID %in% knowles5$entrezgene_id,'y','no'))%>%
  filter(adj.P.Val<0.05) %>%
   group_by(time, id) %>%
 dplyr::summarize(n=n(), meSNP=sum(if_else(K4=="y",1,0)), reQTL=sum(if_else(K5=="y",1,0))) %>% 
  as.data.frame()


dataframK_45 %>% 
kable(., caption= "Summary of meSNPs and reQTLs found in my DEGs") %>% 
  kable_paper("striped", full_width = FALSE) %>% 
  kable_styling(font_size = 18) %>% 
  scroll_box(width = "80%", height = "400px")

Summary of meSNPs and reQTLs found in my DEGs
time	id	n	meSNP	reQTL
3_hours	Daunorubicin	555	6	13
3_hours	Doxorubicin	16	0	0
3_hours	Epirubicin	220	4	4
3_hours	Mitoxantrone	58	1	0
24_hours	Daunorubicin	6864	199	187
24_hours	Doxorubicin	6516	184	180
24_hours	Epirubicin	6202	172	176
24_hours	Mitoxantrone	1327	30	40

Visualization of proportions

toplistall %>%
  mutate(id = as.factor(id)) %>%
  mutate(time=factor(time, levels=c("3_hours","24_hours"))) %>%
  mutate(sigcount = if_else(adj.P.Val < 0.05,'sig','notsig'))%>%
  mutate(meSNP=if_else(ENTREZID %in%knowles4$entrezgene_id,"y","no")) %>% 
  mutate(reQTL=if_else(ENTREZID %in%knowles5$entrezgene_id,"a","b")) %>% 
  filter(time =="24_hours") %>% 
  group_by(id,sigcount,meSNP) %>% 
  summarize(K4count=n())%>% 
    pivot_wider(id_cols = c(id,sigcount), names_from=c(meSNP), values_from=K4count) %>% 
   mutate(SNPprop=(y/(y+no)*100)) %>% 
       ggplot(., aes(x=id, y=SNPprop)) +
       geom_col()+
       geom_text(aes(x=id, label = sprintf("%.2f",SNPprop), vjust=-.2))+
       #geom_text(aes(label = expression(paste0("number"~a,"out of",~b))))+
       facet_wrap(~sigcount)+
       ggtitle("non-significant and significant enrichment proporitions of Knowles meSNPs ")

toplistall %>%
  mutate(id = as.factor(id)) %>%
  mutate(time=factor(time, levels=c("3_hours","24_hours"))) %>%
  mutate(sigcount = if_else(adj.P.Val < 0.05,'sig','notsig'))%>%
  mutate(meSNP=if_else(ENTREZID %in%knowles4$entrezgene_id,"y","no")) %>% 
  mutate(reQTL=if_else(ENTREZID %in%knowles5$entrezgene_id,"a","b")) %>% 
  filter(time =="24_hours") %>% 
  group_by(id,sigcount,reQTL) %>% 
  summarize(K5count=n())%>% 
    pivot_wider(id_cols = c(id,sigcount), names_from=c(reQTL), values_from=K5count) %>% 
   mutate(QTLprop=(a/(a+b)*100)) %>% 
       ggplot(., aes(x=id, y=QTLprop)) +
       geom_col()+
       geom_text(aes(x=id, label = sprintf("%.2f",QTLprop), vjust=-.2))+
       #geom_text(aes(label = expression(paste0("number"~a,"out of",~b))))+
       facet_wrap(~sigcount)+
       ggtitle("non-significant and significant enrichment proporitions of Knowles reQTLs ")

chi_squarek4 <- toplistall %>% 
  mutate(id = as.factor(id)) %>%
  dplyr::filter(id!="Trastuzumab") %>% 
  mutate(sigcount = if_else(adj.P.Val <0.05,'sig','notsig'))%>%
  mutate(meSNP=if_else(ENTREZID %in%knowles4$entrezgene_id,"y","no")) %>% 
  group_by(id,time) %>% 
  summarise(pvalue= chisq.test(meSNP, sigcount)$p.value) 

chi_squarek5 <-   toplistall %>% 
  mutate(id = as.factor(id)) %>%
  dplyr::filter(id!="Trastuzumab") %>% 
  mutate(sigcount = if_else(adj.P.Val <0.05,'sig','notsig'))%>%
  mutate(reQTL=if_else(ENTREZID %in%knowles5$entrezgene_id,"a","b")) %>% 
  group_by(id,time) %>% 
  summarise(pvalue= chisq.test(reQTL, sigcount)$p.value) 

chi_squarek4 %>% 
  kable(., caption= "meSNPs (mininmally expressed QTLS) chi-squared pvalues") %>% 
  kable_paper("striped", full_width = FALSE) %>%  
  kable_styling(full_width = FALSE, position = "right",bootstrap_options = c("striped"),font_size = 18) %>%   scroll_box(width = "50%", height = "400px")

meSNPs (mininmally expressed QTLS) chi-squared pvalues
id	time	pvalue
Daunorubicin	24_hours	0.0000338
Daunorubicin	3_hours	0.0018777
Doxorubicin	24_hours	0.0000113
Doxorubicin	3_hours	0.9232984
Epirubicin	24_hours	0.0000074
Epirubicin	3_hours	0.2189641
Mitoxantrone	24_hours	0.0086490
Mitoxantrone	3_hours	0.6853643

chi_squarek5 %>% 
  kable(., caption= "reQTLS (response eQTLS) chi-squared pvalues") %>% 
  kable_paper("striped", full_width = FALSE) %>%  
  kable_styling(full_width = FALSE, position = "right",bootstrap_options = c("striped"),font_size = 18) %>%   scroll_box(width = "50%", height = "400px")

reQTLS (response eQTLS) chi-squared pvalues
id	time	pvalue
Daunorubicin	24_hours	0.6576304
Daunorubicin	3_hours	0.7387684
Doxorubicin	24_hours	0.4965954
Doxorubicin	3_hours	1.0000000
Epirubicin	24_hours	0.2539396
Epirubicin	3_hours	0.5705567
Mitoxantrone	24_hours	0.4445513
Mitoxantrone	3_hours	0.3946217

We can “see” the proportions of SNPs are not enriched significantly in the DE genes compared to the non-DE genes (chi square test).
We do not see significant enrichment of reQTLS in our DE gene sets over the non-DE gene sets.

So what about enrichment of meSNPs in DE compared to reQTLs in significantly DE gene sets?

Knowles_count <- 
  toplistall %>%
  filter(id!='Trastuzumab') %>% 
  mutate(id = as.factor(id)) %>%
  mutate(time=factor(time, levels=c("3_hours","24_hours"))) %>%
  group_by(time, id) %>%
  mutate(K4 = if_else(ENTREZID %in% knowles4$entrezgene_id,1,0))%>%
  mutate(K5 = if_else(ENTREZID %in% knowles5$entrezgene_id,1,0))%>%
  filter(adj.P.Val<0.05) %>%
  dplyr::summarize(n=n(), K4=sum(K4), K5=sum(K5)) %>% 
  as.tibble() %>% 
  dplyr::select(time,id,K4,K5) %>% rename("K4_y"='K4',"K5_y"='K5') %>% 
  mutate(time = case_match(time, '3_hours'~'3 hrs',
                           '24_hours'~'24 hrs',.default = time)) %>% 
  mutate(K4_n= 503-K4_y, K5_n=371-K5_y) %>% 
  pivot_longer(!c(time,id), names_to='QTL',values_to="gene_count") %>%
  separate(QTL,into=c("QTL_type",'group'),sep = '_') %>%  
  mutate(QTL_type =case_match(QTL_type, 'K4'~'meQTLs',
                         'K5'~'reQTLs',.default = QTL_type)) %>% 
  mutate(time=factor(time, levels=c("3 hrs","24 hrs"))) %>% 
  group_by(id,time,QTL_type) %>% 
  mutate(percent=gene_count/sum(gene_count)*100)
  
  ggplot(Knowles_count, aes(x=QTL_type,y=gene_count, group=group,  fill=group))+
    geom_col(position='fill')+
    # guides(fill="none")+
    facet_wrap(time~id,nrow=2,ncol=4)+
    theme_classic()+
   scale_color_manual(values=drug_palNoVeh)+
    scale_fill_manual(values=drug_palNoVeh)+
       scale_y_continuous(label = scales::percent)

  ###This works
  chi_frame <- Knowles_count %>%
  pivot_wider(id_cols = c(time, id,QTL_type), names_from = group, values_from = gene_count) 

testDNR3chi <- matrix(c(6,13,497, 358), nrow=2, ncol=2, byrow=FALSE,
dimnames=list(c("k4", "k5"),c( "y", "n"))) 
    DNR_3chi <- chisq.test(testDNR3chi,correct=FALSE)$p.value
testDNR24chi <- matrix(c(199,187,304, 184), nrow=2, ncol=2, byrow=FALSE,
dimnames=list(c("k4", "k5"),c( "y", "n")))  
    DNR_24chi <- chisq.test(testDNR24chi,correct=FALSE)$p.value

testDOX3chi <- matrix(c(0,0,503, 371), nrow=2, ncol=2, byrow=FALSE,
dimnames=list(c("k4", "k5"),c( "y", "n")))  
    DOX_3chi <- chisq.test(testDOX3chi,correct=FALSE)$p.value
testDOX24chi <- matrix(c(184,180,318, 191), nrow=2, ncol=2, byrow=FALSE,
dimnames=list(c("k4", "k5"),c( "y", "n"))) 
    DOX_24chi <- chisq.test(testDOX24chi,correct=FALSE)$p.value

testEPI3chi <- matrix(c(4,4,499, 367), nrow=2, ncol=2, byrow=FALSE,
dimnames=list(c("k4", "k5"),c( "y", "n"))) 
    EPI_3chi <- chisq.test(testEPI3chi,correct=FALSE)$p.value
testEPI24chi <- matrix(c(172,176,331, 195), nrow=2, ncol=2, byrow=FALSE,
dimnames=list(c("k4", "k5"),c( "y", "n"))) 
    EPI_24chi <- chisq.test(testEPI24chi,correct=FALSE)$p.value
  
testMTX3chi <- matrix(c(1,0,502, 370), nrow=2, ncol=2, byrow=FALSE,
dimnames=list(c("k4", "k5"),c( "y", "n"))) 
    MTX_3chi <- chisq.test(testMTX3chi,correct=FALSE)$p.value
testMTX24chi <- matrix(c(30,40,473, 331), nrow=2, ncol=2, byrow=FALSE,
dimnames=list(c("k4", "k5"),c( "y", "n"))) 
    MTX_24chi <- chisq.test(testMTX24chi,correct=FALSE)$p.value
K4nK5chi <- data.frame(treatment=c('DNR_3','DNR_24','DOX_3','DOX_24','EPI_3','EPI_24','MTX_3','MTX_24'), chi_p.value=c(DNR_3chi,DNR_24chi,DOX_3chi,DOX_24chi,EPI_3chi,EPI_24chi,MTX_3chi,MTX_24chi))

K4nK5chi %>% 
  separate(treatment, into= c('Drug','time')) %>% 
  pivot_wider(id_cols = Drug, names_from = time, values_from = chi_p.value) %>% 
  kable(., caption= "Chi Square p. values from chi-square test between proportions of sig-DE meQTLs and reQTLS by time and treatment") %>% 
  kable_paper("striped", full_width = TRUE) %>%  
  kable_styling(full_width = FALSE, font_size = 16) %>% 
  scroll_box( height = "500px")

Chi Square p. values from chi-square test between proportions of sig-DE meQTLs and reQTLS by time and treatment
Drug	3	24
DNR	0.0205682	0.0014217
DOX	NaN	0.0004405
EPI	0.6641977	0.0000770
MTX	0.3908069	0.0095036

Here we found that 24 hour reQTLs are significantly enriched in sig-DEgens compared to meQTLS, with Daunorubicin 3 hour treatment also showing significant enrichment.

toplistall %>% 
  filter(time=="24_hours") %>% 
  group_by(time, id) %>% 
  mutate(sigcount = if_else(adj.P.Val < 0.05,'sig','notsig'))%>%
  ggplot(., aes(x=id, y=abs(logFC)))+
  geom_boxplot(aes(fill=id))+
  ggpubr::fill_palette(palette =drug_palNoVeh)+
  guides(fill=guide_legend(title = "Treatment"))+
  facet_wrap(sigcount~time)+
  theme_bw()+
  xlab("")+
  ylab("abs |Log Fold Change|")+
  theme_bw()+
  facet_wrap(~time)+
  #ggtitle("All QTLs from all expressed genes (n=507)")+
  theme(plot.title = element_text(size = rel(1.5), hjust = 0.5),
        axis.title = element_text(size = 15, color = "black"),
        # axis.ticks = element_line(linewidth = 1.5),
        axis.line = element_line(linewidth = 1.5),
        strip.background = element_rect(fill = "transparent"),
        axis.text = element_text(size = 8, color = "black", angle = 0),
        strip.text.x = element_text(size = 12, color = "black", face = "bold"))

Dox-reQTLs

When I look ot the reQTLs across treatments at 24 hours, I see that dox has a total of 180 reQTLS. Because this gene set was specifically about doxorubicin response eQTLs, I wanted to see if these reQTLs were only Dox specific, AC specific, Top2i specific, or show any type of pattern.

reQTLcombine=list(DNRreQTLs$ENTREZID,DOXreQTLs$ENTREZID,EPIreQTLs$ENTREZID,MTXreQTLs$ENTREZID)

length(unique(c(DNRreQTLs$ENTREZID,DOXreQTLs$ENTREZID,EPIreQTLs$ENTREZID,MTXreQTLs$ENTREZID)))

[1] 218

print("number of unique genes expressed in pairwise DE set")

[1] "number of unique genes expressed in pairwise DE set"

QTLoverlaps <- VennDiagram::get.venn.partitions(reQTLcombine)
DoxonlyQTL <-  QTLoverlaps$..values..[[14]]
     ggVennDiagram::ggVennDiagram(reQTLcombine, category.names = c("DNR-187\nsigDEG","DOX-180 \nsigDEG\n","EPI-176\nsigDEG\n","MTX-40\nsigDEG"),label = "count") +scale_x_continuous(expand = expansion(mult = .2))+scale_y_continuous(expand = expansion(mult = .1))

Here, I found 119 reQTLs that were related to all anthracyclines, with 31 reQTLS related to all Top2i drugs at 24 hours.

 DOXeQTL_table <- toplistall %>% 
  mutate(sigcount = if_else(adj.P.Val < 0.05,'sig','notsig'))%>%
  mutate(DOXreQTLs=if_else(ENTREZID %in%DOXreQTLs$ENTREZID,"y","no")) %>% 
  dplyr::filter(time =="24_hours") %>% 
  dplyr::select(ENTREZID,id,DOXreQTLs,sigcount) %>%
  group_by(id,DOXreQTLs,sigcount) %>% 
  tally() %>% as.data.frame() %>% 
  pivot_wider(.,id_cols = c(id,DOXreQTLs),names_from = sigcount,values_from = n) %>% 
  dplyr::select(id,DOXreQTLs,sig) 
DOXeQTL_table%>% 
  kable(., caption= "All 24 hour sidDEGs broken down by drug and DOXreQTL status") %>% 
  kable_paper("striped", full_width = FALSE) %>% 
  kable_styling(full_width = FALSE, position = "left",bootstrap_options = c("striped"),font_size = 18) %>% 
  scroll_box(width = "80%", height = "400px")

All 24 hour sidDEGs broken down by drug and DOXreQTL status
id	DOXreQTLs	sig
Daunorubicin	no	6695
Daunorubicin	y	169
Doxorubicin	no	6336
Doxorubicin	y	180
Epirubicin	no	6048
Epirubicin	y	154
Mitoxantrone	no	1291
Mitoxantrone	y	36
Trastuzumab	no	NA
Trastuzumab	y	NA

DOXeQTL_table %>%
  dplyr::filter(DOXreQTLs=="y") %>% 
  mutate(opp= 180-sig) %>% 
  filter(id !=c('Trastuzumab','Doxorubicin')) %>% 
  rename("yes"=sig, "no"=opp) %>% 
  pivot_longer(!c(id,DOXreQTLs), names_to="group", values_to = "count") %>% 
  mutate(id=factor(id, levels=c("Daunorubicin","Epirubicin", "Mitoxantrone"))) %>% 
ggplot(., aes(y=id,x=count,fill=group))+
     geom_col(position='fill')+
    theme_classic()+
   scale_color_manual(values=c("red4","navy"))+
    scale_fill_manual(values=c("#2297E6","red2"))+
  ggtitle("DOXreQTLs overlaps")

cpm_boxplot <- function(cpmcounts, GOI,brewer_palette, fill_colors, ylab) {
  ##GOI needs to be ENTREZID
  df <- cpmcounts
  ##order of dataframe should have time,id, ENTREZID,SYMBOL logFC,AveExpr,and adj.P.Val (8 columns)
    df_plot <- df %>% 
      dplyr::filter(rownames(.)==GOI) %>%
      pivot_longer(everything(),
                   names_to = "treatment",values_to = "counts") %>%
      separate(treatment, c("drug","indv","time")) %>%
      mutate(time=factor(time, levels =c("3h", "24h"))) %>%
      mutate(indv=factor(indv, levels = c(1,2,3,4,5,6))) %>%
      mutate(drug =case_match(drug, "Da"~"Daunorubicin",
                            "Do"~"Doxorubicin",
                            "Ep"~"Epirubicin",
                            "Mi"~"Mitoxantrone",
                            "Tr"~"Trastuzumab",
                            "Ve"~"Vehicle", .default = drug))
    plot <- ggplot2::ggplot(df_plot, aes(x=drug, y=counts))+
      geom_boxplot(position="identity",aes(fill=drug))+
      geom_point(aes(col=indv, size=2, alpha=0.5))+
      guides(alpha= "none", size= "none")+
      scale_color_brewer(palette = brewer_palette, guide = "none")+
      scale_fill_manual(values=fill_colors)+
      facet_wrap("time", nrow=1, ncol=2)+
      theme_bw()+
      ylab(ylab)+
      xlab("")+
      theme(strip.background = element_rect(fill = "white"),
          plot.title = element_text(size=18,hjust = 0.5),
          axis.title = element_text(size = 15, color = "black"),
          axis.ticks = element_line(linewidth = 1.5),
          axis.line = element_line(linewidth = 1.5),
          axis.text.x = element_text(size = 12, color = "white", angle = 0),
          strip.text.x = element_text(size = 15, color = "black", face = "bold"))
    print(plot)
}

DOX specific-Dox dE-reqtl

DoxonlyQTL <-  as.numeric(QTLoverlaps$..values..[[14]])
# geneDoxonlyQTL <- getBM(attributes=my_attributes,filters ='entrezgene_id',
#                   values =DoxonlyQTL, mart = ensembl)
# write.csv(geneDoxonlyQTL,"data/geneDoxonlyQTL.csv")
# saveRDS(cpmcounts,"data/cpmcount.RDS")
geneDoxonlyQTL <- read.csv("data/geneDoxonlyQTL.csv",row.names = 1)
cpmcounts <- readRDS("data/cpmcount.RDS")

for (g in seq(from=1, to=length(geneDoxonlyQTL$entrezgene_id))){
  a <- geneDoxonlyQTL$hgnc_symbol[g]
  cpm_boxplot(cpmcounts,GOI=geneDoxonlyQTL[g,1],"Dark2",drug_palc,
                ylab=bquote(~italic(.(a))~log[2]~"cpm "))

}

sessionInfo()

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ComplexHeatmap_2.12.1 broom_1.0.5           kableExtra_1.3.4     
 [4] sjmisc_2.8.9          scales_1.2.1          ggpubr_0.6.0         
 [7] cowplot_1.1.1         RColorBrewer_1.1-3    biomaRt_2.52.0       
[10] ggsignif_0.6.4        lubridate_1.9.2       forcats_1.0.0        
[13] stringr_1.5.0         dplyr_1.1.2           purrr_1.0.1          
[16] readr_2.1.4           tidyr_1.3.0           tibble_3.2.1         
[19] ggplot2_3.4.2         tidyverse_2.0.0       limma_3.52.4         
[22] workflowr_1.7.0      

loaded via a namespace (and not attached):
  [1] backports_1.4.1        circlize_0.4.15        BiocFileCache_2.4.0   
  [4] systemfonts_1.0.4      GenomeInfoDb_1.32.4    digest_0.6.31         
  [7] foreach_1.5.2          htmltools_0.5.5        fansi_1.0.4           
 [10] magrittr_2.0.3         memoise_2.0.1          cluster_2.1.4         
 [13] doParallel_1.0.17      tzdb_0.4.0             Biostrings_2.64.1     
 [16] matrixStats_1.0.0      svglite_2.1.1          timechange_0.2.0      
 [19] prettyunits_1.1.1      RVenn_1.1.0            colorspace_2.1-0      
 [22] blob_1.2.4             rvest_1.0.3            rappdirs_0.3.3        
 [25] xfun_0.39              callr_3.7.3            crayon_1.5.2          
 [28] RCurl_1.98-1.12        jsonlite_1.8.5         iterators_1.0.14      
 [31] glue_1.6.2             gtable_0.3.3           zlibbioc_1.42.0       
 [34] XVector_0.36.0         webshot_0.5.4          GetoptLong_1.0.5      
 [37] car_3.1-2              shape_1.4.6            BiocGenerics_0.42.0   
 [40] abind_1.4-5            futile.options_1.0.1   DBI_1.1.3             
 [43] rstatix_0.7.2          Rcpp_1.0.10            viridisLite_0.4.2     
 [46] progress_1.2.2         units_0.8-2            clue_0.3-64           
 [49] proxy_0.4-27           bit_4.0.5              stats4_4.2.2          
 [52] httr_1.4.6             pkgconfig_2.0.3        XML_3.99-0.14         
 [55] farver_2.1.1           sass_0.4.6             dbplyr_2.3.2          
 [58] utf8_1.2.3             tidyselect_1.2.0       labeling_0.4.2        
 [61] rlang_1.1.1            later_1.3.1            AnnotationDbi_1.58.0  
 [64] munsell_0.5.0          tools_4.2.2            cachem_1.0.8          
 [67] cli_3.6.1              generics_0.1.3         RSQLite_2.3.1         
 [70] ggVennDiagram_1.2.2    sjlabelled_1.2.0       evaluate_0.21         
 [73] fastmap_1.1.1          yaml_2.3.7             processx_3.8.1        
 [76] knitr_1.43             bit64_4.0.5            fs_1.6.2              
 [79] KEGGREST_1.36.3        whisker_0.4.1          formatR_1.14          
 [82] xml2_1.3.4             compiler_4.2.2         rstudioapi_0.14       
 [85] filelock_1.0.2         curl_5.0.1             png_0.1-8             
 [88] e1071_1.7-13           bslib_0.5.0            stringi_1.7.12        
 [91] highr_0.10             ps_1.7.5               futile.logger_1.4.3   
 [94] classInt_0.4-9         vctrs_0.6.2            pillar_1.9.0          
 [97] lifecycle_1.0.3        jquerylib_0.1.4        GlobalOptions_0.1.2   
[100] bitops_1.0-7           insight_0.19.2         httpuv_1.6.11         
[103] R6_2.5.1               promises_1.2.0.1       KernSmooth_2.23-21    
[106] IRanges_2.30.1         codetools_0.2-19       lambda.r_1.2.4        
[109] rprojroot_2.0.3        rjson_0.2.21           withr_2.5.0           
[112] S4Vectors_0.34.0       GenomeInfoDbData_1.2.8 parallel_4.2.2        
[115] hms_1.1.3              VennDiagram_1.7.3      class_7.3-22          
[118] rmarkdown_2.22         carData_3.0-5          git2r_0.32.0          
[121] sf_1.0-13              getPass_0.2-2          Biobase_2.56.0