Last updated: 2023-11-28

Checks: 7 passed, 0 failed

Knit directory: Cardiotoxicity/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20230109) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
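As a small illustration of what the seed buys you, the toy sketch below (not part of the original analysis; the vector and sample size are made up) draws the same subsample twice after resetting the seed:

```r
# Toy illustration: resetting the seed reproduces the same "random" subsample.
set.seed(20230109)
first_draw <- sample(1:100, 5)

set.seed(20230109)
second_draw <- sample(1:100, 5)

# Without resetting the seed, the next draw continues the random stream
# and will generally differ from the first two.
third_draw <- sample(1:100, 5)

identical(first_draw, second_draw)  # TRUE
```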

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 8606776. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .RData
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/variance_values by gene.png
    Ignored:    data/41588_2018_171_MOESM3_ESMeQTL_ST2_for paper.csv
    Ignored:    data/Arr_GWAS.txt
    Ignored:    data/Arr_geneset.RDS
    Ignored:    data/BC_cell_lines.csv
    Ignored:    data/BurridgeDOXTOX.RDS
    Ignored:    data/CADGWASgene_table.csv
    Ignored:    data/CAD_geneset.RDS
    Ignored:    data/CALIMA_Data/
    Ignored:    data/CMD04_75DRCviability.csv
    Ignored:    data/CMD04_87DRCviability.csv
    Ignored:    data/CMD05_75DRCviability.csv
    Ignored:    data/CMD05_87DRCviability.csv
    Ignored:    data/Clamp_Summary.csv
    Ignored:    data/Cormotif_24_k1-5_raw.RDS
    Ignored:    data/Counts_RNA_ERMatthews.txt
    Ignored:    data/DAgostres24.RDS
    Ignored:    data/DAtable1.csv
    Ignored:    data/DDEMresp_list.csv
    Ignored:    data/DDE_reQTL.txt
    Ignored:    data/DDEresp_list.csv
    Ignored:    data/DEG-GO/
    Ignored:    data/DEG_cormotif.RDS
    Ignored:    data/DF_Plate_Peak.csv
    Ignored:    data/DRC48hoursdata.csv
    Ignored:    data/Da24counts.txt
    Ignored:    data/Dx24counts.txt
    Ignored:    data/Dx_reQTL_specific.txt
    Ignored:    data/EPIstorelist24.RDS
    Ignored:    data/Ep24counts.txt
    Ignored:    data/FC_necela.RDS
    Ignored:    data/FC_necela_names.RDS
    Ignored:    data/Full_LD_rep.csv
    Ignored:    data/GOIsig.csv
    Ignored:    data/GOplots.R
    Ignored:    data/GTEX_setsimple.csv
    Ignored:    data/GTEX_sig24.RDS
    Ignored:    data/GTEx_gene_list.csv
    Ignored:    data/HFGWASgene_table.csv
    Ignored:    data/HF_geneset.RDS
    Ignored:    data/Heart_Left_Ventricle.v8.egenes.txt
    Ignored:    data/Heatmap_mat.RDS
    Ignored:    data/Heatmap_sig.RDS
    Ignored:    data/Hf_GWAS.txt
    Ignored:    data/K_cluster
    Ignored:    data/K_cluster_kisthree.csv
    Ignored:    data/K_cluster_kistwo.csv
    Ignored:    data/Knowles_log2cpm_real.RDS
    Ignored:    data/Knowles_variation_data.RDS
    Ignored:    data/Knowlesvarlist.RDS
    Ignored:    data/LD50_05via.csv
    Ignored:    data/LDH48hoursdata.csv
    Ignored:    data/Mt24counts.txt
    Ignored:    data/NoRespDEG_final.csv
    Ignored:    data/RINsamplelist.txt
    Ignored:    data/RNA_seq_trial.RDS
    Ignored:    data/Seonane2019supp1.txt
    Ignored:    data/TMMnormed_x.RDS
    Ignored:    data/TOP2Bi-24hoursGO_analysis.csv
    Ignored:    data/TR24counts.txt
    Ignored:    data/TableS10.csv
    Ignored:    data/TableS11.csv
    Ignored:    data/TableS9.csv
    Ignored:    data/Top2_expression.RDS
    Ignored:    data/Top2biresp_cluster24h.csv
    Ignored:    data/Var_test_list.RDS
    Ignored:    data/Var_test_list24.RDS
    Ignored:    data/Var_test_list24alt.RDS
    Ignored:    data/Var_test_list3.RDS
    Ignored:    data/Vargenes.RDS
    Ignored:    data/Viabilitylistfull.csv
    Ignored:    data/allexpressedgenes.txt
    Ignored:    data/allfinal3hour.RDS
    Ignored:    data/allgenes.txt
    Ignored:    data/allmatrix.RDS
    Ignored:    data/allmymatrix.RDS
    Ignored:    data/annotation_data_frame.RDS
    Ignored:    data/averageviabilitytable.RDS
    Ignored:    data/avgLD50.RDS
    Ignored:    data/avg_LD50.RDS
    Ignored:    data/backGL.txt
    Ignored:    data/burr_genes.RDS
    Ignored:    data/calcium_data.RDS
    Ignored:    data/clamp_summary.RDS
    Ignored:    data/cormotif_3hk1-8.RDS
    Ignored:    data/cormotif_initalK5.RDS
    Ignored:    data/cormotif_initialK5.RDS
    Ignored:    data/cormotif_initialall.RDS
    Ignored:    data/cormotifprobs.csv
    Ignored:    data/counts24hours.RDS
    Ignored:    data/cpmcount.RDS
    Ignored:    data/cpmnorm_counts.csv
    Ignored:    data/crispr_genes.csv
    Ignored:    data/ctnnt_results.txt
    Ignored:    data/cvd_GWAS.txt
    Ignored:    data/dat_cpm.RDS
    Ignored:    data/data_outline.txt
    Ignored:    data/drug_noveh1.csv
    Ignored:    data/efit2.RDS
    Ignored:    data/efit2_final.RDS
    Ignored:    data/efit2results.RDS
    Ignored:    data/ensembl_backup.RDS
    Ignored:    data/ensgtotal.txt
    Ignored:    data/filcpm_counts.RDS
    Ignored:    data/filenameonly.txt
    Ignored:    data/filtered_cpm_counts.csv
    Ignored:    data/filtered_raw_counts.csv
    Ignored:    data/filtermatrix_x.RDS
    Ignored:    data/folder_05top/
    Ignored:    data/geneDoxonlyQTL.csv
    Ignored:    data/gene_corr_df.RDS
    Ignored:    data/gene_corr_frame.RDS
    Ignored:    data/gene_prob_tran3h.RDS
    Ignored:    data/gene_probabilityk5.RDS
    Ignored:    data/geneset_24.RDS
    Ignored:    data/gostresTop2bi_ER.RDS
    Ignored:    data/gostresTop2bi_LR
    Ignored:    data/gostresTop2bi_LR.RDS
    Ignored:    data/gostresTop2bi_TI.RDS
    Ignored:    data/gostrescoNR
    Ignored:    data/gtex/
    Ignored:    data/heartgenes.csv
    Ignored:    data/hsa_kegg_anno.RDS
    Ignored:    data/individualDRCfile.RDS
    Ignored:    data/individual_DRC48.RDS
    Ignored:    data/individual_LDH48.RDS
    Ignored:    data/indv_noveh1.csv
    Ignored:    data/kegglistDEG.RDS
    Ignored:    data/kegglistDEG24.RDS
    Ignored:    data/kegglistDEG3.RDS
    Ignored:    data/knowfig4.csv
    Ignored:    data/knowfig5.csv
    Ignored:    data/label_list.RDS
    Ignored:    data/ld50_table.csv
    Ignored:    data/mean_vardrug1.csv
    Ignored:    data/mean_varframe.csv
    Ignored:    data/mymatrix.RDS
    Ignored:    data/new_ld50avg.RDS
    Ignored:    data/nonresponse_cluster24h.csv
    Ignored:    data/norm_LDH.csv
    Ignored:    data/norm_counts.csv
    Ignored:    data/old_sets/
    Ignored:    data/organized_drugframe.csv
    Ignored:    data/plan2plot.png
    Ignored:    data/plot_intv_list.RDS
    Ignored:    data/plot_list_DRC.RDS
    Ignored:    data/qval24hr.RDS
    Ignored:    data/qval3hr.RDS
    Ignored:    data/qvalueEPItemp.RDS
    Ignored:    data/raw_counts.csv
    Ignored:    data/response_cluster24h.csv
    Ignored:    data/sampsettrz.RDS
    Ignored:    data/sigVDA24.txt
    Ignored:    data/sigVDA3.txt
    Ignored:    data/sigVDX24.txt
    Ignored:    data/sigVDX3.txt
    Ignored:    data/sigVEP24.txt
    Ignored:    data/sigVEP3.txt
    Ignored:    data/sigVMT24.txt
    Ignored:    data/sigVMT3.txt
    Ignored:    data/sigVTR24.txt
    Ignored:    data/sigVTR3.txt
    Ignored:    data/siglist.RDS
    Ignored:    data/siglist_final.RDS
    Ignored:    data/siglist_old.RDS
    Ignored:    data/slope_table.csv
    Ignored:    data/supp10_24hlist.RDS
    Ignored:    data/supp10_3hlist.RDS
    Ignored:    data/supp_normLDH48.RDS
    Ignored:    data/supp_pca_all_anno.RDS
    Ignored:    data/table3a.omar
    Ignored:    data/test_run_sample_list.txt
    Ignored:    data/testlist.txt
    Ignored:    data/toplistall.RDS
    Ignored:    data/trtonly_24h_genes.RDS
    Ignored:    data/trtonly_3h_genes.RDS
    Ignored:    data/tvl24hour.txt
    Ignored:    data/tvl24hourw.txt
    Ignored:    data/venn_code.R
    Ignored:    data/viability.RDS

Untracked files:
    Untracked:  .RDataTmp
    Untracked:  .RDataTmp1
    Untracked:  .RDataTmp2
    Untracked:  .RDataTmp3
    Untracked:  3hr all.pdf
    Untracked:  Code_files_list.csv
    Untracked:  Data_files_list.csv
    Untracked:  Doxorubicin_vehicle_3_24.csv
    Untracked:  Doxtoplist.csv
    Untracked:  EPIqvalue_analysis.Rmd
    Untracked:  GWAS_list_of_interest.xlsx
    Untracked:  KEGGpathwaylist.R
    Untracked:  OmicNavigator_learn.R
    Untracked:  SigDoxtoplist.csv
    Untracked:  analysis/DRC_viability_check.Rmd
    Untracked:  analysis/cellcycle_kegg_genes.R
    Untracked:  analysis/ciFIT.R
    Untracked:  analysis/export_to_excel.R
    Untracked:  analysis/featureCountsPLAY.R
    Untracked:  cleanupfiles_script.R
    Untracked:  code/biomart_gene_names.R
    Untracked:  code/constantcode.R
    Untracked:  code/corMotifcustom.R
    Untracked:  code/cpm_boxplot.R
    Untracked:  code/extracting_ggplot_data.R
    Untracked:  code/movingfilesto_ppl.R
    Untracked:  code/pearson_extract_func.R
    Untracked:  code/pearson_tox_extract.R
    Untracked:  code/plot1C.fun.R
    Untracked:  code/spearman_extract_func.R
    Untracked:  code/venndiagramcolor_control.R
    Untracked:  cormotif_p.post.list_4.csv
    Untracked:  figS1024h.pdf
    Untracked:  individual-legenddark2.png
    Untracked:  installed_old.rda
    Untracked:  motif_ER.txt
    Untracked:  motif_LR.txt
    Untracked:  motif_NR.txt
    Untracked:  motif_TI.txt
    Untracked:  output/DNR_DEGlist.csv
    Untracked:  output/DNRvenn.RDS
    Untracked:  output/DOX_DEGlist.csv
    Untracked:  output/DOXvenn.RDS
    Untracked:  output/EPI_DEGlist.csv
    Untracked:  output/EPIvenn.RDS
    Untracked:  output/FC_necela.RDS
    Untracked:  output/FC_necela_names.RDS
    Untracked:  output/Figures/
    Untracked:  output/GTEXv8_gene_median_tpm.RDS
    Untracked:  output/GTEXv8_gene_tpm_heart_left_ventricle.RDS
    Untracked:  output/HER2_gene.RDS
    Untracked:  output/KEGGcellcyclegenes.RDS
    Untracked:  output/Knowles_log2cpm.csv
    Untracked:  output/MTX_DEGlist.csv
    Untracked:  output/MTXvenn.RDS
    Untracked:  output/SETA_analysis_reyes.RDS
    Untracked:  output/TRZ_DEGlist.csv
    Untracked:  output/TableS8.csv
    Untracked:  output/Volcanoplot_10
    Untracked:  output/Volcanoplot_10.RDS
    Untracked:  output/allfinal_sup10.RDS
    Untracked:  output/counts_v8_heart_left_ventricle_gct.RDS
    Untracked:  output/crisprfoldchange.RDS
    Untracked:  output/endocytosisgenes.csv
    Untracked:  output/gene_corr_fig9.RDS
    Untracked:  output/legend_b.RDS
    Untracked:  output/motif_ERrep.RDS
    Untracked:  output/motif_LRrep.RDS
    Untracked:  output/motif_NRrep.RDS
    Untracked:  output/motif_TI_rep.RDS
    Untracked:  output/necela_list_test.RDS
    Untracked:  output/necela_val_genes.RDS
    Untracked:  output/output-old/
    Untracked:  output/rank24genes.csv
    Untracked:  output/rank3genes.csv
    Untracked:  output/reneem@ls6.tacc.utexas.edu
    Untracked:  output/sequencinginformationforsupp.csv
    Untracked:  output/sequencinginformationforsupp.prn
    Untracked:  output/sigVDA24.txt
    Untracked:  output/sigVDA3.txt
    Untracked:  output/sigVDX24.txt
    Untracked:  output/sigVDX3.txt
    Untracked:  output/sigVEP24.txt
    Untracked:  output/sigVEP3.txt
    Untracked:  output/sigVMT24.txt
    Untracked:  output/sigVMT3.txt
    Untracked:  output/sigVTR24.txt
    Untracked:  output/sigVTR3.txt
    Untracked:  output/supplementary_motif_list_GO.RDS
    Untracked:  output/toptablebydrug.RDS
    Untracked:  output/trop_knowles_fun.csv
    Untracked:  output/tvl24hour.txt
    Untracked:  output/x_counts.RDS
    Untracked:  reneebasecode.R

Unstaged changes:
    Modified:   analysis/GTEx_genes.Rmd
    Modified:   analysis/Var_genes.Rmd
    Modified:   analysis/after_comments.Rmd
    Modified:   analysis/variance_scrip.Rmd
    Modified:   output/daplot.RDS
    Modified:   output/dxplot.RDS
    Modified:   output/epplot.RDS
    Modified:   output/mtplot.RDS
    Modified:   output/plan2plot.png
    Modified:   output/trplot.RDS
    Modified:   output/veplot.RDS

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Knowels_trop_analysis.Rmd) and HTML (docs/Knowels_trop_analysis.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 8606776 reneeisnowhere 2023-11-28 updated with GO on Highly variable genes
html b651aa9 reneeisnowhere 2023-11-27 Build site.
Rmd 522ca8c reneeisnowhere 2023-11-27 adding more analysis
html b474072 reneeisnowhere 2023-11-21 Build site.
Rmd 5c45b92 reneeisnowhere 2023-11-21 adding knowles data again
html b4d55c4 reneeisnowhere 2023-11-21 Build site.
Rmd b1b9563 reneeisnowhere 2023-11-21 adding logcpm with knowles data
html b442590 reneeisnowhere 2023-11-21 Build site.
Rmd b464f89 reneeisnowhere 2023-11-21 updates to links again
html 19a3502 reneeisnowhere 2023-11-21 Build site.
Rmd d17d270 reneeisnowhere 2023-11-21 adding links
html c72467c reneeisnowhere 2023-11-09 Build site.
Rmd 8a6ebc1 reneeisnowhere 2023-11-09 updates on plots
html 9af9df6 reneeisnowhere 2023-11-09 Build site.
Rmd 60c64d1 reneeisnowhere 2023-11-09 adding boxplots
html d12232a reneeisnowhere 2023-11-07 Build site.
Rmd 441f82b reneeisnowhere 2023-11-07 adding more code
html cae46a9 reneeisnowhere 2023-11-07 Build site.
html bd0342c reneeisnowhere 2023-11-07 Build site.
Rmd 453ebe6 reneeisnowhere 2023-11-07 adding code
Rmd d6ecce9 reneeisnowhere 2023-11-07 adding code
html ae9124e reneeisnowhere 2023-10-30 Build site.
Rmd 74c2dc1 reneeisnowhere 2023-10-30 updated
Rmd d970e84 reneeisnowhere 2023-10-30 adding more analysis

library(tidyverse)
library(ggsignif)
library(cowplot)
library(ggpubr)
library(scales)
library(sjmisc)
library(kableExtra)
library(broom)
library(ComplexHeatmap)
library(ggVennDiagram)
library(biomaRt)
library(limma)
library(edgeR)
library(RColorBrewer)
palette_colors_mine <- colorRampPalette(colors = c("green","white","purple","red" ))(60)
scales::show_col(palette_colors_mine)

Version Author Date
9af9df6 reneeisnowhere 2023-11-09
bd0342c reneeisnowhere 2023-11-07

Here I attempt to recreate my correlation analysis on the Knowles data, using their troponin measurements and RNA-seq log2 cpm values.

### genes I want to know about
interest_genes <- read.csv("output/GOI_genelist.txt", row.names = 1)
trop_knowles <- read.csv("output/trop_knowles_fun.csv", row.names = 1)
Knowles_log2cpm <- readRDS("data/Knowles_log2cpm_real.RDS")
trop0.625 <- trop_knowles %>% 
  filter(dosage <1) 
store <- Knowles_log2cpm %>% 
  dplyr::select( 'ESGN',ends_with(c('0.625', '0'))) %>% 
  dplyr::filter(ESGN %in% interest_genes$ensembl_gene_id) %>% 
  pivot_longer(cols=!ESGN, names_to = "ind", values_to = "counts") %>% 
  separate(ind,into=c("cell_line","dosage"), sep = ":") %>%
  mutate(dosage = as.numeric(dosage)) %>% 
  full_join(., trop0.625, by=c("cell_line", "dosage")) %>% 
  group_by(cell_line) %>% 
  full_join(., interest_genes, by = c("ESGN" = "ensembl_gene_id"))
  
  
  ###new graph stuff
  for (gene in interest_genes$ensembl_gene_id){
    gene_plot <- store %>% 
      dplyr::filter(ESGN == gene) %>%
      ggplot(., aes(x=troponin, y=counts))+
      geom_point(aes(col=cell_line))+
      geom_smooth(method="lm")+
      
      facet_wrap(hgnc_symbol~dosage, scales="free")+
      theme_classic()+
      xlab("troponin I expression") +
      ylab("Gene counts in log2 cpm") +
      ggtitle(expression(paste("Correlation between counts and troponin I Knowles")))+
      scale_color_manual(values = palette_colors_mine, aesthetics = c("color", "fill"), guide = "none")+
      # guides(fill="none")+
      ggpubr::stat_cor(method="spearman",
                       # cor.coef.name="rho",
               aes(label = paste(after_stat(r.label), after_stat(p.label), sep = "~`,\n`~")),
               color = "black",
               label.x.npc = 0.01,
               label.y.npc=0.01, 
               size = 3)+
      theme(plot.title = element_text(size = rel(1.5), hjust = 0.5,face = "bold"),
            axis.title = element_text(size = 15, color = "black"),
            axis.ticks = element_line(linewidth = 1.5),
            axis.text = element_text(size = 8, color = "black", angle = 20),
            strip.text.x = element_text(size = 12, color = "black", face = "italic"))
   print(gene_plot)
   
  }
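The plots above print the Spearman coefficient on each panel. If the numbers themselves are wanted in a table, a base R sketch along these lines might do. It runs on simulated data: `toy`, `geneA`, `geneB` and `spearman_by_gene` are made-up names standing in for the joined `store` table, not objects from this analysis.

```r
# Toy data standing in for the joined table `store` (all values simulated).
set.seed(1)
toy <- data.frame(
  gene     = rep(c("geneA", "geneB"), each = 20),
  troponin = rep(runif(20, min = 0, max = 10), times = 2)
)
# geneA tracks troponin; geneB is unrelated noise.
toy$counts <- ifelse(toy$gene == "geneA",
                     0.5 * toy$troponin + rnorm(40, sd = 0.5),
                     rnorm(40, mean = 5))

# One Spearman test per gene, collected into a small summary table.
spearman_by_gene <- do.call(rbind, lapply(split(toy, toy$gene), function(d) {
  ct <- cor.test(d$troponin, d$counts, method = "spearman")
  data.frame(gene    = d$gene[1],
             rho     = unname(ct$estimate),
             p.value = ct$p.value)
}))
spearman_by_gene
```

With the real data the split would presumably be by gene and dosage, mirroring the `facet_wrap(hgnc_symbol~dosage)` panels.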

Version Author Date
bd0342c reneeisnowhere 2023-11-07

Version Author Date
b442590 reneeisnowhere 2023-11-21
9af9df6 reneeisnowhere 2023-11-09
bd0342c reneeisnowhere 2023-11-07

Knowles Boxplots of Fig9 genes

Knowles_log2cpm_box <- readRDS("data/Knowles_log2cpm_real.RDS")

store_box <- Knowles_log2cpm_box %>% 
  # dplyr::select( 'ESGN',ends_with(c('0.625', '0'))) %>% 
  dplyr::filter(ESGN %in% interest_genes$ensembl_gene_id) %>% 
  pivot_longer(cols=!ESGN, names_to = "ind", values_to = "counts") %>% 
  separate(ind,into=c("cell_line","dosage"), sep = ":") %>%
  mutate(dosage = as.numeric(dosage)) %>% 
  # full_join(., trop0.625, by=c("cell_line", "dosage")) %>% 
  group_by(cell_line) %>% 
  full_join(., interest_genes, by = c("ESGN" = "ensembl_gene_id"))
store_box %>% 
  mutate(dosage=factor(dosage, levels=c('0','0.000', '0.625','1.25', '2.5','5'))) %>% 
  ggplot(., aes(x=dosage, y=counts, group=dosage))+
  geom_boxplot()+
  facet_wrap(~hgnc_symbol)

Version Author Date
9af9df6 reneeisnowhere 2023-11-09

RNA-seq trial analysis

Analysis of expressed genes

RNA_seq_trial<- readRDS("data/RNA_seq_trial.RDS")

all_cpmcount <-  read_table("data/Counts_RNA_ERMatthews.txt")
cpm_count_main <- readRDS("data/cpmcount.RDS") %>% rownames_to_column(var = "ENTREZID")
colnames(cpm_count_main) <- colnames(all_cpmcount)


test_run_sample_list <- read.csv("data/test_run_sample_list.txt", row.names = 1)

colnames(RNA_seq_trial) <- c("ENTREZID",test_run_sample_list$Sample_ID)

lcpm_trial <- RNA_seq_trial %>% 
  column_to_rownames("ENTREZID") %>% 
  cpm(., log=TRUE) %>% 
  as.data.frame()
 

row_means <- rowMeans(lcpm_trial)
x_trial <- lcpm_trial[row_means > 0,]
dim(x_trial)
[1] 13277     4
list_genes_trial <- rownames(x_trial)
ggVennDiagram::ggVennDiagram(list(list_genes_trial, cpm_count_main$ENTREZID),
                             category.names = c("Trialgenes","Maingenes"),
              show_intersect = TRUE,
              set_color = "black",
              label = "count",
              label_percent_digit = 1,
              label_size = 4,
              label_alpha = 0,
              label_color = "black",
              edge_lty = "solid", set_size = 4.5)#+
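The filtering step above keeps genes whose mean log2 cpm across samples is above 0, then compares that list with the main data set. The sketch below walks through the same idea on a made-up count matrix. `log2_cpm()` is a simplified re-implementation of what I understand `edgeR::cpm(log = TRUE)` to do with its default prior count, so small numerical differences from edgeR are possible. `main_genes` is a hypothetical stand-in for `cpm_count_main$ENTREZID`.

```r
# Toy count matrix: 5 genes x 3 samples (values made up).
counts <- matrix(c(2000000, 2100000, 1900000,
                         0,       1,       0,
                       100,     120,      90,
                        20,      15,      30,
                   3000000, 2900000, 3100000),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:5), paste0("s", 1:3)))

# Simplified log2 counts-per-million with a library-size-scaled prior count.
# Assumption: this mirrors edgeR's default behaviour; it is not edgeR itself.
log2_cpm <- function(x, prior = 2) {
  lib <- colSums(x)
  prior_scaled <- prior * lib / mean(lib)
  # Work on the transpose so per-sample vectors recycle along samples.
  t(log2((t(x) + prior_scaled) / (lib + 2 * prior_scaled) * 1e6))
}

lcpm <- log2_cpm(counts)

# Keep genes with mean log2 cpm > 0 (gene2 falls below the cutoff here).
expressed <- rownames(lcpm)[rowMeans(lcpm) > 0]

# Numbers behind a two-set Venn diagram.
main_genes <- c("gene1", "gene2", "gene3", "gene5")  # hypothetical list
length(intersect(expressed, main_genes))  # shared genes
setdiff(expressed, main_genes)            # only in the toy "trial" list
setdiff(main_genes, expressed)            # only in the toy "main" list
```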

Correlation of counts files

lcpm_trial_full <- RNA_seq_trial %>% 
  column_to_rownames("ENTREZID") %>% 
  cpm(., log=TRUE) %>% 
  as.data.frame() %>% 
  rownames_to_column(var = "ENTREZID")

lcpm_trial_full %>%
  column_to_rownames(var="ENTREZID") %>%
  cor(.) %>% 
  Heatmap(.,layer_fun = function(j, i, x, y, width, height, fill) {
              grid.text(sprintf("%.3f", pindex(., i, j)), x, y, 
            gp = gpar(fontsize = 10))})

Version Author Date
ae9124e reneeisnowhere 2023-10-30
lcpm_main <- all_cpmcount %>% 
  column_to_rownames("ENTREZID") %>% 
  cpm(., log=TRUE) %>% 
  as.data.frame() %>% 
  rownames_to_column(var = "ENTREZID") %>% 
  dplyr::select(ENTREZID, starts_with("DOX")) %>% 
  dplyr::select(ENTREZID, ends_with("3h"))
  
combined_data <- lcpm_main %>%
  full_join(., lcpm_trial_full, by= "ENTREZID") %>%
  column_to_rownames("ENTREZID") %>% 
  dplyr::select(starts_with("DOX"),`3hr_0.5`)%>% 
  cor(.)


  
  Heatmap(combined_data,column_title = "Full gene list",
          layer_fun = function(j, i, x, y, width, height, fill) {
              grid.text(sprintf("%.3f", pindex(combined_data, i, j)), x, y, 
            gp = gpar(fontsize = 10))})

Version Author Date
b4d55c4 reneeisnowhere 2023-11-21
ae9124e reneeisnowhere 2023-10-30
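One thing worth checking for the `full_join()` followed by `cor()` pattern used for `combined_data`: a full join keeps genes that appear in only one table and fills the other side with `NA`, and `cor()` with its default settings returns `NA` for any pair of columns that contains one. Whether that happens here depends on whether both tables hold exactly the same genes, which I can't tell from this page. The base R sketch below shows the effect on made-up tables (`main_tbl`, `trial_tbl` and their values are hypothetical); `merge(..., all = TRUE)` plays the role of the full join.

```r
# Two toy log2 cpm tables that share only some genes (values made up).
main_tbl  <- data.frame(ENTREZID = c("1", "2", "3", "4"),
                        DOX.1.3h = c(5.1, 2.0, 7.3, 0.4))
trial_tbl <- data.frame(ENTREZID = c("2", "3", "4", "5"),
                        trial_3hr = c(2.2, 7.0, 0.1, 3.3))

# Base R equivalent of a full join: keep genes found in either table.
joined <- merge(main_tbl, trial_tbl, by = "ENTREZID", all = TRUE)
mat <- as.matrix(joined[, -1])
rownames(mat) <- joined$ENTREZID

# Default cor(): the DOX-vs-trial entry is NA because of the unmatched genes.
cor_default <- cor(mat)

# Pairwise-complete cor(): uses only genes present in both columns.
cor_pairwise <- cor(mat, use = "pairwise.complete.obs")

cor_default
cor_pairwise
```

An inner join (or filtering to shared genes first, as done for `lcpm_trial_filter_main`) avoids the issue altogether.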
  only79_ind <- lcpm_main %>%
  full_join(., lcpm_trial_full, by= "ENTREZID") %>% 
    dplyr::select(ENTREZID,'3hr_0.5',"DOX.4.3h") %>% 
    column_to_rownames("ENTREZID") %>% 
  cor(.)

  
  Heatmap(only79_ind,column_title = "Full gene list_79",
          layer_fun = function(j, i, x, y, width, height, fill) {
              grid.text(sprintf("%.3f", pindex(only79_ind, i, j)), x, y, 
            gp = gpar(fontsize = 10))})

Version Author Date
9af9df6 reneeisnowhere 2023-11-09
ae9124e reneeisnowhere 2023-10-30
lcpm_main_veh <- all_cpmcount %>% 
  column_to_rownames("ENTREZID") %>% 
  cpm(., log=TRUE) %>% 
  as.data.frame() %>% 
  rownames_to_column(var = "ENTREZID") %>% 
  dplyr::select(ENTREZID, starts_with("DOX"), starts_with("VEH")) %>% 
  dplyr::select(ENTREZID, ends_with("3h"))
  

combined_data_veh<- lcpm_main_veh %>%
  full_join(., lcpm_trial_full, by= "ENTREZID") %>% 
  column_to_rownames("ENTREZID") %>% 
  cor(.)
  
  
  
  Heatmap(combined_data_veh, column_title = "all genes in list, no filtering",
          layer_fun = function(j, i, x, y, width, height, fill) {
              grid.text(sprintf("%.3f", pindex(combined_data_veh, i, j)), x, y, 
            gp = gpar(fontsize = 8))})

Version Author Date
ae9124e reneeisnowhere 2023-10-30
lcpm_trial_filter_main <- lcpm_trial_full %>% 
  filter(ENTREZID %in% cpm_count_main$ENTREZID)
 


lcpm_trial_filter_main %>% 
column_to_rownames(var="ENTREZID") %>%
  cor(.) %>% 
  Heatmap(.,column_title = "Using 14,084 expressed genes from Main data",
          layer_fun = function(j, i, x, y, width, height, fill) {
              grid.text(sprintf("%.3f", pindex(., i, j)), x, y, 
            gp = gpar(fontsize = 8))})

Version Author Date
ae9124e reneeisnowhere 2023-10-30
lcpm_trial_filter <- lcpm_trial_full %>% 
  filter(ENTREZID %in% list_genes_trial)
 

lcpm_trial_filter %>% 
column_to_rownames(var="ENTREZID") %>%
  cor(.) %>% 
  Heatmap(.,column_title = "Using 13277 expressed genes",
          layer_fun = function(j, i, x, y, width, height, fill) {
              grid.text(sprintf("%.3f", pindex(., i, j)), x, y, 
            gp = gpar(fontsize = 8))})

Version Author Date
ae9124e reneeisnowhere 2023-10-30
lcpm_main_filter_trial <- lcpm_main_veh %>% 
  filter(ENTREZID %in% list_genes_trial)

lcpm_trial_filter %>% 
  full_join(., lcpm_main_filter_trial, by = "ENTREZID") %>% 
  column_to_rownames(var="ENTREZID") %>%
  cor(.) %>% 
  Heatmap(.,column_title = "Using 13277 expressed genes",
          layer_fun = function(j, i, x, y, width, height, fill) {
              grid.text(sprintf("%.3f", pindex(., i, j)), x, y, 
            gp = gpar(fontsize = 8))})

Version Author Date
ae9124e reneeisnowhere 2023-10-30
lcpm_trial_filter_main %>% 
  left_join(., lcpm_main, by = "ENTREZID") %>% 
  column_to_rownames(var="ENTREZID") %>%
  dplyr::select(DOX.4.3h,starts_with(("3hr")))%>% 
  cor(.) %>% 
  Heatmap(.,column_title = "Using 14084 expressed genes, just 79-1",
          layer_fun = function(j, i, x, y, width, height, fill) {
              grid.text(sprintf("%.3f", pindex(., i, j)), x, y, 
            gp = gpar(fontsize = 8))})

Version Author Date
bd0342c reneeisnowhere 2023-11-07
hr3_indv4 <- lcpm_trial_filter_main %>% 
  left_join(., lcpm_main, by = "ENTREZID") %>% 
  column_to_rownames(var="ENTREZID") %>%
  dplyr::select(DOX.4.3h,`3hr_0.5`,`3hr_0.0`)%>% 
  cor(.) %>% 
  Heatmap(.,column_title = "Using 14084 expressed genes, just 79-1",
          layer_fun = function(j, i, x, y, width, height, fill) {
              grid.text(sprintf("%.3f", pindex(., i, j)), x, y, 
            gp = gpar(fontsize = 8))})
  
plot(hr3_indv4)

Version Author Date
bd0342c reneeisnowhere 2023-11-07

correlation heatmap of 3hr Dox 1-6 individuals and trial data

lcpm_trial_filter_main %>% 
  left_join(., lcpm_main, by = "ENTREZID") %>% 
  column_to_rownames(var="ENTREZID") %>%
  dplyr::select(starts_with("DOX"),`3hr_0.5`)%>% 
  cor(.) %>% 
  Heatmap(.,column_title = "Using 14084 expressed genes, just 79-1 with all 3 hour samples",
          layer_fun = function(j, i, x, y, width, height, fill) {
              grid.text(sprintf("%.3f", pindex(., i, j)), x, y, 
            gp = gpar(fontsize = 8))})

Version Author Date
9af9df6 reneeisnowhere 2023-11-09
lcpm_main %>% 
  left_join(., lcpm_trial_full, by = "ENTREZID") %>% 
  column_to_rownames(var="ENTREZID") %>%
  dplyr::select(starts_with("DOX"),`3hr_0.5`)%>% 
  cor(.) %>% 
  Heatmap(.,column_title = "all 29395 genes experiment with trial 0.5 uM",
          layer_fun = function(j, i, x, y, width, height, fill) {
              grid.text(sprintf("%.3f", pindex(., i, j)), x, y, 
            gp = gpar(fontsize = 8))})

Version Author Date
b4d55c4 reneeisnowhere 2023-11-21

barplots

backGL <-read_csv("data/backGL.txt", 
    col_types = cols(...1 = col_skip()))

GOI_genelist <- read.csv("output/GOI_genelist.txt", row.names = 1)

cpm_boxplot_trial <-function(lcpm_trial, GOI, ylab) {
    ##GOI needs to be ENTREZID
  df_plot <- lcpm_trial %>% 
    dplyr::filter(rownames(.)== GOI) %>%
    pivot_longer(everything(),
                 names_to = "treatment",
                 values_to = "counts") %>%
    separate(treatment, c("time","conc"), sep= "_") %>%
    mutate(conc = factor(conc,levels=c('0.0','0.1','0.5','1.0'), labels = c ("NT", "0.1 uM", "0.5 uM", "1.0 uM")))
  
 plota <-  ggplot2::ggplot(df_plot, aes(x=conc, y= counts))+
    geom_col(position="identity")+
    theme_bw()+
    ylab(ylab)+
    xlab("")+
     ggtitle(paste(GOI))+
      theme(
        # strip.background = element_rect(fill = "white",linetype=1, linewidth = 0.5),
          plot.title = element_text(size=12,hjust = 0.5,face="bold"),
          axis.title = element_text(size = 10, color = "black"),
          axis.ticks = element_line(linewidth = 1.0),
          panel.background = element_rect(colour = "black", linewidth = 1),
          # axis.text.x = element_blank(),
          strip.text.x = element_text(margin = margin(2,0,2,0, "pt"),face = "bold"))
    print(plota)
}
  



for (g in seq_len(11)){
  datafilter <- GOI_genelist
  a <- GOI_genelist[g,3]
  # b <- datafilter[g,1]
  cpm_boxplot_trial(lcpm_trial,GOI=datafilter[g,1],
                           ylab =bquote(~italic(.(a))~log[2]~"cpm "))
  
}  

Version Author Date
c72467c reneeisnowhere 2023-11-09
d12232a reneeisnowhere 2023-11-07

expression of trial RNA seq data

Knowles log2cpm 24hr and my log2cpm 24hr

kcpm <- store_box %>%  
  mutate(dosage=factor(dosage, levels=c('0','0.000', '0.625','1.25', '2.5','5'))) %>% 
  dplyr::filter(dosage==("0")|dosage == "0.625") %>% 
  mutate(expr="K")
  
lcpm_24h <- all_cpmcount %>% 
  column_to_rownames("ENTREZID") %>% 
  cpm(., log=TRUE) %>% 
  as.data.frame() %>% 
  rownames_to_column(var = "ENTREZID") %>% 
  dplyr::select(ENTREZID, starts_with(c("DOX","VEH"))) %>% 
  dplyr::select(ENTREZID, ends_with("24h")) %>% 
  dplyr::filter(ENTREZID %in% interest_genes$entrezgene_id) %>% 
  pivot_longer(cols=!ENTREZID, names_to = "ind", values_to = "counts") %>% 
  mutate(ENTREZID = as.numeric(ENTREZID)) %>% 
  full_join(., interest_genes, by = c("ENTREZID"="entrezgene_id")) %>% 
  mutate(expr="ME") %>% 
  rename("ESGN"="ensembl_gene_id","entrezgene_id"="ENTREZID") %>% 
  separate(ind, into = c("dosage","cell_line",NA)) %>% 
  mutate(dosage=case_match(dosage,"DOX"~"0.5", .default = dosage)) 
  
lcpm_24h %>% 
  rbind(.,kcpm) %>% 
  mutate(dosage=factor(dosage, levels=c('0','0.625',"VEH","0.5"))) %>% 
  ggplot(., aes(x=dosage,y=counts), group=expr)+
  geom_boxplot()+
  facet_wrap(~hgnc_symbol, scales="free_y" )#+

Version Author Date
b474072 reneeisnowhere 2023-11-21
b4d55c4 reneeisnowhere 2023-11-21
  # geom_signif(
  #   comparisons = list(
  #     c('0', '0.5'),
  #     c('0', '0.625'),
  #     c('0.5','0.625')
  #   ),
  #   test = "t.test",
  #   tip_length = 0.01,
  #   map_signif_level = FALSE,
  #   textsize = 4,
  #   step_increase = 0.3
  # ) 

Replication of variance figures using Knowles data

# saveRDS(store_var, "data/Knowles_variation_data.RDS")
# store_var <- Knowles_log2cpm %>% 
#   dplyr::select( 'ESGN',ends_with(c('0.625', '0'))) %>% 
#   rowwise() %>% 
#   mutate(mean_DOX=mean(c_across(ends_with('0.625'))),
#          var_DOX=var(c_across(ends_with('0.625'))),
#         mean_NT=mean(c_across(ends_with('0'))),
#          var_NT=var(c_across(ends_with('0')))) %>% 
#   mutate(data=tidy(var.test(c_across(ends_with("0.625")),c_across(ends_with("0")))))# %>% 
  # dplyr::select("ESGN","mean_DOX","var_DOX","mean_NT", "var_NT","data")


store_var <- readRDS("data/Knowles_variation_data.RDS")

knowlesdrug<- store_var %>% 
  dplyr::select("ESGN","mean_DOX","var_DOX","mean_NT", "var_NT") %>% 
  pivot_longer(cols = !"ESGN", names_to = "short", values_to = "values") %>% 
  separate(short, into=c("calc","treatment")) #%>% 
knowlesdrug %>% 
  as.data.frame() %>% 
  dplyr::filter(calc == "mean") %>% 
  ggplot(., aes(x= treatment, y=values))+
  geom_boxplot()+
  ggtitle("Knowles Means across all genes")+
  geom_signif(
    comparisons = list(
      c("DOX", "NT")),
    test = "t.test",
    tip_length = 0.01,
    map_signif_level = FALSE
    # textsize = 4,
    # # y_position = 11,
    # step_increase = 0.05
  )

Version Author Date
b651aa9 reneeisnowhere 2023-11-27
knowlesdrug %>% 
  as.data.frame() %>% 
  dplyr::filter(calc == "var") %>% 
  ggplot(., aes(x= treatment, y=values))+
  geom_boxplot(outlier.shape= NA)+
  ggtitle("Knowles Variance across all genes")+
  
  geom_signif(
    comparisons = list(
      c("DOX", "NT")),
    test = "t.test",
    tip_length = 0.01,
    y_position = 0.5,
    # vjust=1,
    map_signif_level = FALSE)+
 ylim(NA,1.25)

Version Author Date
b651aa9 reneeisnowhere 2023-11-27
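
As far as I understand it, the bracket drawn by geom_signif with test = "t.test" reports the p-value from stats::t.test applied to the two groups' values, which is a Welch two-sample test by default. A small sketch of that comparison outside of ggplot follows. The data frame `toy` and its values are invented; the column names follow knowlesdrug.

```r
# Invented per-gene values for the two treatments (not the Knowles data)
toy <- data.frame(
  treatment = rep(c("DOX", "NT"), each = 5),
  values = c(0.42, 0.35, 0.51, 0.38, 0.47,
             0.21, 0.18, 0.25, 0.20, 0.23)
)

dox <- toy$values[toy$treatment == "DOX"]
nt  <- toy$values[toy$treatment == "NT"]

# Welch two-sample t-test (var.equal = FALSE is the default)
res <- t.test(dox, nt)
res$method
res$p.value
```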

qvalue data

library(qvalue)


# p_list <- map_df(store_var$data,~as.data.frame(.x$p.value))
#  rownames(p_list) <- store_var$ESGN

p_list <- store_var %>% 
  unnest(data) %>% 
  dplyr::select(ESGN,statistic,p.value)
estDOXk <- qvalue(p_list$p.value)
hist(estDOXk)

plot(estDOXk)

summary(estDOXk) 

Call:
qvalue(p = p_list$p.value)

pi0:    0.3993136   

Cumulative number of significant calls:

          <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1    <1
p-value     1919   2762  4065   4832  5533 6453 12317
q-value     1616   2447  3895   4810  5723 6956 12317
local FDR   1143   1759  2724   3313  3800 4507 12317
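
The summary reports pi0 of about 0.40, the estimated proportion of genes with no variance difference. A rough base-R sketch of how that estimate feeds into the q-values follows. It assumes the q-value is the Benjamini-Hochberg adjusted p-value scaled by pi0, which is my understanding of the qvalue package default and not something confirmed on this page. The p-values are invented.

```r
pi0 <- 0.3993136  # estimate reported by summary(estDOXk) above

# Invented p-values standing in for p_list$p.value
p <- c(1e-5, 4e-4, 0.003, 0.02, 0.04, 0.2, 0.5, 0.9)

bh <- p.adjust(p, method = "BH")  # Benjamini-Hochberg adjusted p-values
q_approx <- pi0 * bh              # assumed relation: q = pi0 * BH

data.frame(p, bh, q_approx)
```

Under this assumption, a pi0 well below 1 makes the q-values smaller than the BH-adjusted p-values. That would be consistent with the table above, where more genes pass the q-value cutoff than the raw p-value cutoff at 0.05 and 0.1.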
knowlesvar <- data.frame("ESGN"=p_list$ESGN,"pvalues"=estDOXk$pvalues,"qvalues"=estDOXk$qvalues,"lfdr"= estDOXk$lfdr)  

intersecting_K <- knowlesvar %>% 
  filter(qvalues<0.1)
my_qval_list24 <- readRDS("data/qval24hr.RDS") 

EPI508_list <- my_qval_list24 %>% 
  dplyr::select(ENTREZID,EPIqvalues) %>% 
  filter(EPIqvalues<0.1) %>% 
  dplyr::select(ENTREZID) %>%
  mutate(ENTREZID=as.numeric(ENTREZID)) %>% 
  left_join(.,backGL, by="ENTREZID")
Knowlesvarlist <- readRDS("data/Knowlesvarlist.RDS")
  
# Knowlesvarlist<- getBM(attributes=my_attributes,filters ='ensembl_gene_id',values = intersecting_K$ESGN, mart = ensembl)

length(intersect(EPI508_list$ENTREZID,Knowlesvarlist$entrezgene_id))
[1] 367
intersect_genes <- EPI508_list %>% 
  dplyr::filter(ENTREZID %in% Knowlesvarlist$entrezgene_id)

intersect_genes %>% 
kable(.,caption = "EPI highly variable genes found in Knowles highly variable DOX genes") %>% 
  kable_paper("striped", full_width = FALSE) %>%  
  kable_styling(full_width = TRUE, bootstrap_options = c("striped","hover")) %>% 
  scroll_box(width = "100%", height = "400px")
EPI highly variable genes found in Knowles highly variable DOX genes
ENTREZID SYMBOL
49856 WRAP73
23463 ICMT
5195 PEX14
57085 AGTRAP
23207 PLEKHM2
79363 CPLANE2
55160 ARHGEF10L
55616 ASAP3
57095 PITHD1
11313 LYPLA2
84065 TMEM222
23673 STX12
56063 TMEM234
3065 HDAC1
127544 RNF19B
113444 SMIM12
27095 TRAPPC3
112950 MED8
9670 IPO13
149483 CCDC17
387338 NSUN4
8543 LMO4
64858 DCLRE1B
128077 LIX1L
388695 LYSMD1
6944 VPS72
65005 MRPL9
4000 LMNA
11266 DUSP12
261726 TIPRL
5279 PIGC
83479 DDX59
134 ADORA1
64853 AIDA
163859 SDE2
5664 PSEN2
65094 JMJD4
126731 CCSAP
22796 COG2
79723 SUV39H2
253430 IPMK
26091 HERC4
219738 FAM241B
11319 ECD
5532 PPP3CB
79933 SYNPO2L
27063 ANKRD1
118924 FRA10AC1
282991 BLOC1S2
10360 NPM3
9937 DCLRE1A
9184 BUB3
161 AP2A2
7748 ZNF195
10612 TRIM3
23647 ARFIP2
65975 STK33
113174 SAAL1
627 BDNF
79797 ZNF408
10978 CLP1
55048 VPS37C
5436 POLR2G
144097 SPINDOC
57410 SCYL1
10524 KAT5
25855 BRMS1
10432 RBM14
338692 ANKRD13D
5883 RAD9A
5499 PPP1CA
2950 GSTP1
116985 ARAP1
5612 THAP12
56946 EMSY
282679 AQP11
51585 PCF11
60492 CCDC90B
9440 MED17
54851 ANKRD49
10929 SRSF8
5049 PAFAH1B2
219902 TLCD5
219899 TBCEL
9538 EI24
219833 KCNJ5-AS1
57102 C12orf4
8079 MLF2
25977 NECAP1
79657 RPAP3
55652 SLC48A1
1019 CDK4
23041 MON2
29110 TBK1
253827 MSRB3
117177 RAB3IP
22822 PHLDA1
89795 NAV3
121053 NOPCHAP1
1407 CRY1
51228 GLTP
400073 C12orf76
5564 PRKAB1
51499 TRIAP1
56616 DIABLO
387893 KMT5A
11066 SNRNP35
5901 RAN
100289635 ZNF605
221178 SPATA13
283537 SLC46A3
2963 GTF2F2
337867 UBAC2
3981 LIG4
84945 ABHD13
3621 ING1
253959 RALGAPA1
54813 KLHL28
6617 SNAPC1
123016 TTC8
51527 GSKIP
2972 BRF1
22893 BAHD1
9325 TRIP4
5371 PML
60490 PPCDC
79631 EFL1
84219 WDR24
23059 CLUAP1
2072 ERCC4
91949 COG7
23568 ARL2BP
29070 CCDC113
8824 CES2
1874 E2F4
6560 SLC12A4
146198 ZFP90
54386 TERF2IP
5119 CHMP1A
64359 NXN
5048 PAFAH1B1
388324 INCA1
9135 RABEP1
57336 ZNF287
79736 TEFM
55813 UTP6
54475 NLE1
5193 PEX12
22794 CASC3
3292 HSD17B1
10614 HEXIM1
114881 OSBPL7
29916 SNX11
8405 SPOP
10237 SLC35B1
81558 FAM117A
55316 RSAD1
8161 COIL
6426 SRSF1
54903 MKS1
55771 PRR11
284161 GDPD1
6198 RPS6KB1
57508 INTS2
84923 FAM104A
55028 C17orf80
6730 SRP68
9489 PGS1
9775 EIF4A3
57521 RPTOR
79643 CHMP6
5881 RAC3
8780 RIOK3
55364 IMPACT
54531 MIER2
126308 MOB3A
29985 SLC39A3
51343 FZR1
51588 PIAS4
6455 SH3GL1
5609 MAP2K7
79603 CERS4
93134 ZNF561
126070 ZNF440
2193 FARSA
85360 SYDE1
8907 AP1M1
54858 PGPEP1
93436 ARMC6
79414 LRFN3
163087 ZNF383
84503 ZNF527
22835 ZFP30
284323 ZNF780A
29950 SERTAD1
90324 CCDC97
56006 SMG9
7773 ZNF230
7769 ZNF226
9668 ZNF432
147657 ZNF480
112724 RDH13
163033 ZNF579
147694 ZNF548
100293516 ZNF587B
25799 ZNF324
7260 EIPR1
55006 TRMT61B
253635 GPATCH11
92906 HNRNPLL
8491 MAP4K3
57504 MTA3
53335 BCL11A
5861 RAB1A
27332 ZNF638
8446 DUSP11
1716 DGUOK
84173 ELMOD3
129303 TMEM150A
5903 RANBP2
10254 STAM2
10213 PSMD14
115677 NOSTRIN
79828 METTL8
80067 DCAF17
129831 RBM45
3628 INPP1
6775 STAT4
9330 GTF3C3
57404 CYP20A1
54891 INO80D
377007 KLHL30
4735 SEPTIN2
80023 NRSN2
55968 NSFL1C
55317 AP5S1
64412 GZF1
51230 PHF20
25980 AAR2
63905 MANBAL
10904 BLCAP
60625 DHX35
51098 IFT52
51006 SLC35C2
10564 ARFGEF2
11054 OGFR
80331 DNAJC5
29104 N6AMT1
84221 SPATC1L
51586 MED15
6598 SMARCB1
84700 MYO18B
402055 SRRD
84164 ASCC2
23780 APOL2
129138 ANKRD54
9463 PICK1
84247 RTL6
132001 TAMM41
23609 MKRN2
22908 SACM1L
51385 ZNF589
64925 CCDC71
11070 TMEM115
28972 SPCS1
25871 NEPRO
55254 TMEM39A
131601 TPRA1
7879 RAB7A
51122 COMMD2
64393 ZMAT3
86 ACTL6A
55486 PARL
90407 TMEM41A
1487 CTBP1
7469 NELFA
152 ADRA2C
132789 GNPDA2
10606 PAICS
92597 MOB1B
266812 NAP1L5
56916 SMARCAD1
55212 BBS7
90826 PRMT9
4750 NEK1
55100 WDR70
55814 BDP1
79810 PTCD2
167153 TENT2
9765 ZFYVE16
55781 RIOK2
90355 MACIR
153733 CCDC112
153443 SRFBP1
8572 PDLIM4
202052 DNAJC18
23438 HARS2
10826 FAXDC2
9443 MED7
5917 RARS1
80315 CPEB4
8899 PRPF4B
10473 HMGN4
7746 ZSCAN9
8449 DHX16
57827 C6orf47
578 BAK1
5467 PPARD
6428 SRSF3
9477 MED20
9533 POLR1C
26036 ZNF451
51715 RAB23
25821 MTO1
57226 LYRM2
23376 UFL1
26235 FBXL4
91749 MFSD4B
10758 TRAF3IP2
5689 PSMB1
5575 PRKAR1B
90639 COX19
84262 PSMG3
8379 MAD1L1
54476 RNF216
221830 POLR1F
3364 HUS1
55695 NSUN5
9569 GTF2IRD1
113878 DTX2
6717 SRI
9069 CLDN12
10898 CPSF4
3268 AGFG2
5001 ORC5
60561 RINT1
64418 TMEM168
27153 ZNF777
80346 REEP4
5533 PPP3CC
157313 CDCA2
23087 TRIM35
157574 FBXO16
7336 UBE2V2
90362 FAM110B
55824 PAG1
55656 INTS8
51123 ZNF706
51105 PHF20L1
203062 TSNARE1
55958 KLHL9
54840 APTX
55035 NOL8
10592 SMC2
5998 RGS3
51552 RAB14
399665 FAM102A
84885 ZDHHC12
5900 RALGDS
6837 MED22
57109 REXO4
92715 DPH7
23708 GSPT2
29934 SNX12
139596 UPRT
64860 ARMCX5
644596 SMIM10L2B
# saveRDS(Knowlesvarlist,"data/Knowlesvarlist.RDS")

GO analysis of Highly Variable genes

I am using two backgrounds:
(1) Set A: all genes expressed in my data (n = 14,084).
(2) Set B: the full set of EPI highly variable genes from my data (n = 508). Set B did not yield any enrichment.
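
To show how the background enters an over-representation test, here is a rough sketch using base R's phyper with figures taken from this page: the Set A background (14,084), the 367 intersecting genes as the query, and the term_size (415) and intersection_size (19) reported for the Herpes simplex virus 1 infection term in the KEGG table. This is an assumption-laden approximation: g:Profiler restricts the counts to annotated genes, may map identifiers differently, and applies FDR correction, so the result is not expected to match the reported p_value.

```r
N <- 14084  # background size (Set A, expressed genes)
n <- 367    # query size (EPI / Knowles intersecting genes)
K <- 415    # term_size for KEGG:05168 as reported in the table
k <- 19     # intersection_size for KEGG:05168 as reported in the table

# Overlap expected by chance if the query were a random draw from the background
expected_k <- n * K / N

# One-sided hypergeometric p-value: P(overlap >= k)
p_over <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

c(expected = expected_k, observed = k, p_value = p_over)
```

With a much smaller background such as Set B (n = 508), most of the background is already in the query, so the observed and expected overlaps can differ very little.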

A set

library(gprofiler2)
library(org.Hs.eg.db)
### This is for retrieving all genes annotated in the Herpes simplex virus list
# https://support.bioconductor.org/p/109871/
library(limma)
tab <- getGeneKEGGLinks(species="hsa")
tab$Symbol <- mapIds(org.Hs.eg.db, tab$GeneID,
                       column="SYMBOL", keytype="ENTREZID")
KEGG_05168 <- tab %>% dplyr::filter(PathwayID=="hsa05168")

# backGL <-read_csv("data/backGL.txt", 
#     col_types = cols(...1 = col_skip()))

# SETA_resgenes <- gost(query = intersect_genes$SYMBOL,
#                     organism = "hsapiens",
#                     significant = FALSE,
#                     ordered_query = TRUE,
#                     domain_scope = "custom",
#                     measure_underrepresentation = FALSE,
#                     evcodes = FALSE,
#                     user_threshold = 0.05,
#                     correction_method = c("fdr"),
#                     custom_bg = backGL$SYMBOL,
#                     sources=c("KEGG"))
# saveRDS(SETA_resgenes,"data/DEG-GO/SETA_resgenes.RDS")
SETA_resgenes <- readRDS("data/DEG-GO/SETA_resgenes.RDS")

Set_A_genes <- gostplot(SETA_resgenes, capped = FALSE, interactive = TRUE)
Set_A_genes
setA_table <- SETA_resgenes$result %>% 
  dplyr::select(c(source, term_id,
                  term_name,intersection_size, 
                   term_size, p_value))# %>% 

list_intersect_path <- KEGG_05168 %>% 
  filter(Symbol %in% intersect_genes$SYMBOL) 

list_intersect_path %>% 
  kable(., caption = "List of intersecting genes from the Herpes simplex KEGG pathway") %>%
  kable_paper("striped", full_width = FALSE) %>%
  kable_styling(
    full_width = FALSE,
    position = "left",
    bootstrap_options = c("striped", "hover")
  ) %>%
  scroll_box(width = "100%", height = "400px")
List of intersecting genes from the Herpes simplex KEGG pathway
GeneID PathwayID Symbol
100289635 hsa05168 ZNF605
10929 hsa05168 SRSF8
126070 hsa05168 ZNF440
146198 hsa05168 ZFP90
147657 hsa05168 ZNF480
147694 hsa05168 ZNF548
163087 hsa05168 ZNF383
22835 hsa05168 ZFP30
25799 hsa05168 ZNF324
27153 hsa05168 ZNF777
284323 hsa05168 ZNF780A
29110 hsa05168 TBK1
51385 hsa05168 ZNF589
5371 hsa05168 PML
5499 hsa05168 PPP1CA
578 hsa05168 BAK1
6426 hsa05168 SRSF1
6428 hsa05168 SRSF3
7748 hsa05168 ZNF195
7769 hsa05168 ZNF226
7773 hsa05168 ZNF230
84503 hsa05168 ZNF527
93134 hsa05168 ZNF561
9668 hsa05168 ZNF432
setA_table %>% 
  kable(., caption = "List of KEGG pathways enriched") %>%
  kable_paper("striped", full_width = FALSE) %>%
  kable_styling(
    full_width = FALSE,
    position = "left",
    bootstrap_options = c("striped", "hover")
  ) %>%
  scroll_box(width = "100%", height = "400px")
List of KEGG pathways enriched
source term_id term_name intersection_size term_size p_value
KEGG KEGG:05168 Herpes simplex virus 1 infection 19 415 0.0462437
KEGG KEGG:00900 Terpenoid backbone biosynthesis 1 21 0.2135685
KEGG KEGG:04144 Endocytosis 13 232 0.2135685
KEGG KEGG:04213 Longevity regulating pathway - multiple species 4 55 0.2343037
KEGG KEGG:03250 Viral life cycle - HIV-1 5 56 0.2343037
KEGG KEGG:05031 Amphetamine addiction 3 49 0.2343037
KEGG KEGG:03015 mRNA surveillance pathway 5 86 0.2343037
KEGG KEGG:00565 Ether lipid metabolism 3 28 0.3935654
KEGG KEGG:04146 Peroxisome 1 68 0.3935654
KEGG KEGG:04218 Cellular senescence 5 145 0.3935654
KEGG KEGG:04924 Renin secretion 2 50 0.3935654
KEGG KEGG:05166 Human T-cell leukemia virus 1 infection 5 180 0.4307887
KEGG KEGG:04710 Circadian rhythm 2 30 0.4307887
KEGG KEGG:03013 Nucleocytoplasmic transport 5 99 0.4307887
KEGG KEGG:03020 RNA polymerase 3 34 0.5006801
KEGG KEGG:04330 Notch signaling pathway 1 51 0.5006801
KEGG KEGG:05016 Huntington disease 4 256 0.5006801
KEGG KEGG:05212 Pancreatic cancer 5 75 0.5006801
KEGG KEGG:04110 Cell cycle 5 151 0.5006801
KEGG KEGG:04022 cGMP-PKG signaling pathway 3 133 0.5006801
KEGG KEGG:04666 Fc gamma R-mediated phagocytosis 1 75 0.5006801
KEGG KEGG:00563 Glycosylphosphatidylinositol (GPI)-anchor biosynthesis 1 25 0.5006801
KEGG KEGG:04720 Long-term potentiation 2 55 0.5006801
KEGG KEGG:05034 Alcoholism 3 112 0.5150544
KEGG KEGG:00970 Aminoacyl-tRNA biosynthesis 3 44 0.5150544
KEGG KEGG:05170 Human immunodeficiency virus 1 infection 6 165 0.5150544
KEGG KEGG:03060 Protein export 2 23 0.5150544
KEGG KEGG:00564 Glycerophospholipid metabolism 1 76 0.5150544
KEGG KEGG:05220 Chronic myeloid leukemia 1 75 0.5158959
KEGG KEGG:05132 Salmonella infection 1 214 0.5466451
KEGG KEGG:00310 Lysine degradation 2 59 0.5466451
KEGG KEGG:04115 p53 signaling pathway 2 65 0.5466451
KEGG KEGG:04910 Insulin signaling pathway 4 121 0.5494968
KEGG KEGG:04152 AMPK signaling pathway 3 103 0.5520660
KEGG KEGG:04140 Autophagy - animal 5 133 0.5520660
KEGG KEGG:04145 Phagosome 1 99 0.5520660
KEGG KEGG:04120 Ubiquitin mediated proteolysis 4 134 0.5520660
KEGG KEGG:05418 Fluid shear stress and atherosclerosis 4 114 0.5520660
KEGG KEGG:04215 Apoptosis - multiple species 2 30 0.5520660
KEGG KEGG:05032 Morphine addiction 1 56 0.5520660
KEGG KEGG:04114 Oocyte meiosis 2 108 0.5520660
KEGG KEGG:04961 Endocrine and other factor-regulated calcium reabsorption 1 41 0.5520660
KEGG KEGG:04370 VEGF signaling pathway 2 49 0.5520660
KEGG KEGG:04931 Insulin resistance 3 95 0.5520660
KEGG KEGG:04923 Regulation of lipolysis in adipocytes 1 41 0.5520660
KEGG KEGG:04922 Glucagon signaling pathway 2 84 0.5520660
KEGG KEGG:04921 Oxytocin signaling pathway 3 115 0.5520660
KEGG KEGG:04919 Thyroid hormone signaling pathway 1 112 0.5520660
KEGG KEGG:04728 Dopaminergic synapse 2 103 0.5520660
KEGG KEGG:04613 Neutrophil extracellular trap formation 1 100 0.5520660
KEGG KEGG:04211 Longevity regulating pathway 3 78 0.5520660
KEGG KEGG:05030 Cocaine addiction 1 35 0.5520660
KEGG KEGG:04660 T cell receptor signaling pathway 2 70 0.5520660
KEGG KEGG:03008 Ribosome biogenesis in eukaryotes 3 75 0.5520660
KEGG KEGG:05202 Transcriptional misregulation in cancer 1 128 0.5520660
KEGG KEGG:05210 Colorectal cancer 4 84 0.5520660
KEGG KEGG:05163 Human cytomegalovirus infection 5 176 0.5520660
KEGG KEGG:05221 Acute myeloid leukemia 2 56 0.5520660
KEGG KEGG:05169 Epstein-Barr virus infection 3 153 0.5520660
KEGG KEGG:04024 cAMP signaling pathway 3 151 0.5520660
KEGG KEGG:03450 Non-homologous end-joining 1 12 0.5520660
KEGG KEGG:00983 Drug metabolism - other enzymes 2 44 0.5520660
KEGG KEGG:05167 Kaposi sarcoma-associated herpesvirus infection 3 145 0.5520660
KEGG KEGG:00770 Pantothenate and CoA biosynthesis 1 15 0.5520660
KEGG KEGG:00562 Inositol phosphate metabolism 3 66 0.5520660
KEGG KEGG:05410 Hypertrophic cardiomyopathy 2 73 0.5520660
KEGG KEGG:05412 Arrhythmogenic right ventricular cardiomyopathy 1 63 0.5520660
KEGG KEGG:05414 Dilated cardiomyopathy 1 78 0.5520660
KEGG KEGG:05225 Hepatocellular carcinoma 6 145 0.5520660
KEGG KEGG:05203 Viral carcinogenesis 1 162 0.5520660
KEGG KEGG:05164 Influenza A 3 103 0.5528215
KEGG KEGG:04071 Sphingolipid signaling pathway 3 107 0.6010714
KEGG KEGG:04664 Fc epsilon RI signaling pathway 2 46 0.6037440
KEGG KEGG:04658 Th1 and Th2 cell differentiation 1 53 0.6289663
KEGG KEGG:04662 B cell receptor signaling pathway 1 56 0.6289663
KEGG KEGG:03010 Ribosome 1 127 0.6289663
KEGG KEGG:04070 Phosphatidylinositol signaling system 1 91 0.6289663
KEGG KEGG:00480 Glutathione metabolism 1 44 0.6289663
KEGG KEGG:04721 Synaptic vesicle cycle 1 51 0.6289663
KEGG KEGG:05219 Bladder cancer 1 37 0.6289663
KEGG KEGG:05235 PD-L1 expression and PD-1 checkpoint pathway in cancer 1 69 0.6289663
KEGG KEGG:05162 Measles 2 98 0.6289663
KEGG KEGG:04210 Apoptosis 1 118 0.6289663
KEGG KEGG:05204 Chemical carcinogenesis - DNA adducts 1 23 0.6312264
KEGG KEGG:04625 C-type lectin receptor signaling pathway 1 76 0.6312264
KEGG KEGG:05165 Human papillomavirus infection 1 272 0.6312264
KEGG KEGG:05160 Hepatitis C 2 115 0.6312264
KEGG KEGG:04659 Th17 cell differentiation 1 64 0.6312264
KEGG KEGG:05135 Yersinia infection 3 112 0.6312264
KEGG KEGG:04936 Alcoholic liver disease 3 97 0.6312264
KEGG KEGG:04150 mTOR signaling pathway 3 135 0.6312264
KEGG KEGG:03022 Basal transcription factors 2 40 0.6312264
KEGG KEGG:04012 ErbB signaling pathway 2 78 0.6312264
KEGG KEGG:03050 Proteasome 2 42 0.6312264
KEGG KEGG:05206 MicroRNAs in cancer 1 149 0.6312264
KEGG KEGG:00982 Drug metabolism - cytochrome P450 1 21 0.6312264
KEGG KEGG:04014 Ras signaling pathway 3 171 0.6324571
KEGG KEGG:04724 Glutamatergic synapse 1 74 0.6324571
KEGG KEGG:01522 Endocrine resistance 2 85 0.6324571
KEGG KEGG:04350 TGF-beta signaling pathway 2 84 0.6324571
KEGG KEGG:01524 Platinum drug resistance 1 65 0.6324571
KEGG KEGG:00980 Metabolism of xenobiotics by cytochrome P450 1 26 0.6324571
KEGG KEGG:04136 Autophagy - other 1 31 0.6350212
KEGG KEGG:04650 Natural killer cell mediated cytotoxicity 1 63 0.6424831
KEGG KEGG:04380 Osteoclast differentiation 1 87 0.6424831
KEGG KEGG:05231 Choline metabolism in cancer 2 84 0.6480728
KEGG KEGG:03420 Nucleotide excision repair 1 43 0.6480728
KEGG KEGG:05223 Non-small cell lung cancer 1 67 0.6663121
KEGG KEGG:04010 MAPK signaling pathway 2 240 0.6663121
KEGG KEGG:05218 Melanoma 1 58 0.6663121
KEGG KEGG:04750 Inflammatory mediator regulation of TRP channels 1 72 0.6663121
KEGG KEGG:05215 Prostate cancer 1 86 0.6663121
KEGG KEGG:04620 Toll-like receptor signaling pathway 2 60 0.6663121
KEGG KEGG:04722 Neurotrophin signaling pathway 1 110 0.6663121
KEGG KEGG:05214 Glioma 1 67 0.6663121
KEGG KEGG:04622 RIG-I-like receptor signaling pathway 1 49 0.6663121
KEGG KEGG:04137 Mitophagy - animal 1 70 0.6663121
KEGG KEGG:04623 Cytosolic DNA-sensing pathway 1 42 0.6663121
KEGG KEGG:03460 Fanconi anemia pathway 1 48 0.6663121
KEGG KEGG:04530 Tight junction 2 137 0.6663121
KEGG KEGG:04920 Adipocytokine signaling pathway 1 51 0.6697082
KEGG KEGG:04310 Wnt signaling pathway 5 138 0.6801724
KEGG KEGG:05200 Pathways in cancer 1 409 0.6812521
KEGG KEGG:04914 Progesterone-mediated oocyte maturation 3 86 0.6812521
KEGG KEGG:04913 Ovarian steroidogenesis 1 28 0.6822912
KEGG KEGG:05417 Lipid and atherosclerosis 2 153 0.6855345
KEGG KEGG:03040 Spliceosome 4 132 0.6855345
KEGG KEGG:05152 Tuberculosis 1 106 0.6855345
KEGG KEGG:04392 Hippo signaling pathway - multiple species 1 25 0.6909349
KEGG KEGG:04360 Axon guidance 1 162 0.6909349
KEGG KEGG:04714 Thermogenesis 6 195 0.6949445
KEGG KEGG:04657 IL-17 signaling pathway 1 56 0.6949445
KEGG KEGG:05222 Small cell lung cancer 1 87 0.6949445
KEGG KEGG:04340 Hedgehog signaling pathway 1 49 0.6949445
KEGG KEGG:05416 Viral myocarditis 1 40 0.7004481
KEGG KEGG:05161 Hepatitis B 3 136 0.7004481
KEGG KEGG:04611 Platelet activation 1 89 0.7004481
KEGG KEGG:00140 Steroid hormone biosynthesis 1 19 0.7004481
KEGG KEGG:04270 Vascular smooth muscle contraction 1 94 0.7004481
KEGG KEGG:00600 Sphingolipid metabolism 1 44 0.7004481
KEGG KEGG:04371 Apelin signaling pathway 2 106 0.7004481
KEGG KEGG:04933 AGE-RAGE signaling pathway in diabetic complications 1 91 0.7004481
KEGG KEGG:05017 Spinocerebellar ataxia 1 124 0.7004481
KEGG KEGG:04217 Necroptosis 3 106 0.7004481
KEGG KEGG:05224 Breast cancer 2 117 0.7096628
KEGG KEGG:04630 JAK-STAT signaling pathway 3 88 0.7096628
KEGG KEGG:04141 Protein processing in endoplasmic reticulum 4 161 0.7096628
KEGG KEGG:04020 Calcium signaling pathway 1 157 0.7366591
KEGG KEGG:04261 Adrenergic signaling in cardiomyocytes 1 128 0.7385556
KEGG KEGG:04390 Hippo signaling pathway 1 134 0.7428813
KEGG KEGG:05216 Thyroid cancer 1 35 0.7474086
KEGG KEGG:04520 Adherens junction 1 67 0.7590965
KEGG KEGG:05020 Prion disease 1 220 0.7590965
KEGG KEGG:04080 Neuroactive ligand-receptor interaction 1 108 0.7590965
KEGG KEGG:05131 Shigellosis 3 210 0.7590965
KEGG KEGG:05134 Legionellosis 1 39 0.7590965
KEGG KEGG:05022 Pathways of neurodegeneration - multiple diseases 2 385 0.7707889
KEGG KEGG:01521 EGFR tyrosine kinase inhibitor resistance 1 74 0.7707889
KEGG KEGG:05415 Diabetic cardiomyopathy 1 169 0.7785395
KEGG KEGG:05321 Inflammatory bowel disease 1 22 0.7785395
KEGG KEGG:01240 Biosynthesis of cofactors 2 115 0.7785395
KEGG KEGG:00830 Retinol metabolism 1 16 0.7785395
KEGG KEGG:05205 Proteoglycans in cancer 1 170 0.7785395
KEGG KEGG:00520 Amino sugar and nucleotide sugar metabolism 1 46 0.7785395
KEGG KEGG:00000 KEGG root term 2 5548 0.7785395
KEGG KEGG:04151 PI3K-Akt signaling pathway 4 254 0.7785395
KEGG KEGG:04934 Cushing syndrome 1 121 0.7785395
KEGG KEGG:04068 FoxO signaling pathway 1 109 0.7785395
KEGG KEGG:04510 Focal adhesion 1 177 0.7785395
KEGG KEGG:05014 Amyotrophic lateral sclerosis 8 303 0.7990055
KEGG KEGG:04932 Non-alcoholic fatty liver disease 1 131 0.8015859
KEGG KEGG:04810 Regulation of actin cytoskeleton 1 179 0.8042126
KEGG KEGG:04066 HIF-1 signaling pathway 1 89 0.8135646
KEGG KEGG:04621 NOD-like receptor signaling pathway 1 117 0.8135646
KEGG KEGG:00230 Purine metabolism 2 97 0.8135646
KEGG KEGG:05213 Endometrial cancer 1 57 0.8135646
KEGG KEGG:05010 Alzheimer disease 1 315 0.8135646
KEGG KEGG:04912 GnRH signaling pathway 1 76 0.8135646
KEGG KEGG:05100 Bacterial invasion of epithelial cells 1 74 0.8135646
KEGG KEGG:04064 NF-kappa B signaling pathway 1 71 0.8148829
KEGG KEGG:01232 Nucleotide metabolism 1 69 0.8149497
KEGG KEGG:05217 Basal cell carcinoma 1 49 0.8297427
KEGG KEGG:04668 TNF signaling pathway 1 87 0.8437401
KEGG KEGG:05171 Coronavirus disease - COVID-19 1 162 0.8625927
KEGG KEGG:03320 PPAR signaling pathway 1 50 0.8755330
KEGG KEGG:04926 Relaxin signaling pathway 1 104 0.8755330
KEGG KEGG:05226 Gastric cancer 1 117 0.8811497
KEGG KEGG:04142 Lysosome 1 118 0.8839738
KEGG KEGG:01100 Metabolic pathways 3 1146 0.8970300
KEGG KEGG:05146 Amoebiasis 1 64 0.9038124
KEGG KEGG:04062 Chemokine signaling pathway 1 113 0.9447847
KEGG KEGG:05207 Chemical carcinogenesis - receptor activation 1 142 0.9507918
KEGG KEGG:05130 Pathogenic Escherichia coli infection 2 150 0.9522431
KEGG KEGG:05012 Parkinson disease 2 226 0.9522431
KEGG KEGG:04015 Rap1 signaling pathway 1 164 0.9522431
KEGG KEGG:04550 Signaling pathways regulating pluripotency of stem cells 1 104 0.9552542
KEGG KEGG:05208 Chemical carcinogenesis - reactive oxygen species 1 186 0.9552542
KEGG KEGG:04072 Phospholipase D signaling pathway 1 110 0.9806823
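
Because the gost call was set up with significant = FALSE, the table above lists every tested term, and the p_value column should already be FDR-corrected (correction_method = "fdr"). A short sketch of picking out the terms that pass the 0.05 threshold follows. The data frame `setA_top` is a hypothetical stand-in built from the first three rows of the table, not the full setA_table.

```r
# First three rows of the KEGG table, copied by hand for illustration
setA_top <- data.frame(
  term_id = c("KEGG:05168", "KEGG:00900", "KEGG:04144"),
  term_name = c("Herpes simplex virus 1 infection",
                "Terpenoid backbone biosynthesis",
                "Endocytosis"),
  intersection_size = c(19, 1, 13),
  term_size = c(415, 21, 232),
  p_value = c(0.0462437, 0.2135685, 0.2135685)
)

# Keep only terms whose adjusted p-value passes the 0.05 threshold
sig_terms <- subset(setA_top, p_value < 0.05)
sig_terms$term_name
```

On the real object the same filter would be `subset(setA_table, p_value < 0.05)`.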

B set

# SETB_resgenes <- gost(query = intersect_genes$ENTREZID,
#                     organism = "hsapiens",
#                     significant = FALSE,
#                     ordered_query = TRUE,
#                     domain_scope = "custom",
#                     measure_underrepresentation = FALSE,
#                     evcodes = FALSE,
#                     user_threshold = 0.05,
#                     correction_method = c("fdr"),
#                     custom_bg = EPI508_list$ENTREZID,
#                     sources=c("KEGG"))
# saveRDS(SETB_resgenes,"data/DEG-GO/SETB_resgenes.RDS")
# SETB_resgenes <- readRDS("data/DEG-GO/SETB_resgenes.RDS")
# 
# Set_B_genes <- gostplot(SETB_resgenes, capped = FALSE, interactive = TRUE)
# Set_B_genes
# 
# setB_table <- SETB_resgenes$result %>% 
#   dplyr::select(c(source, term_id,
#                   term_name,intersection_size, 
#                    term_size, p_value))# %>% 
# 
# # list_intersect_path <- KEGG_05168 %>% 
# #   filter(Symbol%in% intersect_genes$SYMBOL) 
# 
# setB_table%>% 
#   kable(., caption = "No enrichment with this small background of KEGG pathway") %>%
#   kable_paper("striped", full_width = FALSE) %>%
#   kable_styling(
#     full_width = FALSE,
#     position = "left",
#     bootstrap_options = c("striped", "hover")
#   ) %>%
#   scroll_box(width = "100%", height = "400px")