Last updated: 2024-01-30

Checks: 7 passed, 0 failed

Knit directory: ATAC_learning/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20231016) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version ccb3f28. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/initial_complete_stats_run1.txt
    Ignored:    data/multiqc_fastqc_run1.txt
    Ignored:    data/multiqc_fastqc_run2.txt
    Ignored:    data/multiqc_genestat_run1.txt
    Ignored:    data/multiqc_genestat_run2.txt

Untracked files:
    Untracked:  code/just_for_Fun.R

Unstaged changes:
    Modified:   README.md

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Fastqc_results.Rmd) and HTML (docs/Fastqc_results.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd ccb3f28 reneeisnowhere 2024-01-30 first updates

library(tidyverse)
# library(ggsignif)
# library(cowplot)
# library(ggpubr)
# library(scales)
# library(sjmisc)
library(kableExtra)
# library(broom)
# library(biomaRt)
library(RColorBrewer)
# library(gprofiler2)
# library(qvalue)

This code takes the MultiQC FastQC output file and does the following. It splits the rows into trimmed (last 144 rows) and untrimmed (first 144 rows) files, separates the trimmed file names into the categories I want, and then adds back the untrimmed rows (splitting their file names the same way). After the rbind, I split treatmenttime by position, fix the values in the time column, and remove the digits from the trt column. I then add a new column called “trimmed” that lets me group by trimmed versus untrimmed file, and select only the columns I want to keep.

multiqc_fastqc2 <- read_csv("data/multiqc_fastqc_run2.txt")
multiqc_general_stats2 <- read_csv("data/multiqc_genestat_run2.txt")


fastqc_full <- multiqc_fastqc2 %>% 
  slice_tail(n=144) %>% 
  separate(Filename, into = c(NA,"ind","treatmenttime",NA,"read")) %>% 
  rbind(., (multiqc_fastqc2 %>% slice_head(n=144) %>% separate(Filename, into = c("ind","treatmenttime",NA,"read")))) %>% 
  separate_wider_position(., col =treatmenttime,c(2,trt=2,time=3),too_few = "align_start") %>% 
  mutate(time=case_match(trt,"E2"~"24h","E3"~"3h","M2"~"24h", "M3"~"3h","T2"~"24h","T3"~"3h","V2"~"24h","V3"~"3h",.default = time)) %>% 
  mutate(trt=gsub("[[:digit:]]", "", trt) ) %>% 
  mutate(trimmed = if_else(grepl(pattern ="^trim", x = Sample)==TRUE, "yes","no")) %>% 
  select(Sample:read, trimmed,`Total Sequences`:avg_sequence_length) %>% 
  full_join(., multiqc_general_stats2, join_by(Sample)) %>% 
   rename("percent_gc"="FastQC_mqc-generalstats-fastqc-percent_gc",
         "avg_seq_len"= "FastQC_mqc-generalstats-fastqc-avg_sequence_length",
         "percent_dup"= "FastQC_mqc-generalstats-fastqc-percent_duplicates",
         "percent_fails"= "FastQC_mqc-generalstats-fastqc-percent_fails",
         "total_sequences"= "FastQC_mqc-generalstats-fastqc-total_sequences") %>% 
  mutate(ind = factor(ind, levels = c("Ind1", "Ind2", "Ind3", "Ind4", "Ind5", "Ind6"))) %>%
  mutate(time = factor(time, levels = c("3h", "24h"), labels= c("3 hours","24 hours"))) %>% 
  mutate(trt = factor(trt, levels = c("DX","E", "DA","M", "T", "V"), labels = c("DOX","EPI", "DNR", "MTX", "TRZ", "VEH"))) 

The columns kept are Sample, ind, trt, time, read, trimmed, Total Sequences, Sequences flagged as poor quality, Sequence length, %GC, total_deduplicated_percentage, and avg_sequence_length. I then join in the general stats file and rename its columns to shorter names.
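The positional split of treatmenttime is the least obvious step, so here is a minimal sketch of it on its own. The `codes` tibble and its four values are made up for illustration (they only mimic the style of the real file names), and only the epirubicin case of the time fix is shown.

```r
library(dplyr)
library(tidyr)

# Hypothetical treatment/time codes in the same style as the file names
codes <- tibble(treatmenttime = c("75DA24h", "75DA3h", "75E24h", "75E3h"))

parsed <- codes %>%
  # widths: drop the first 2 characters, then 2 for trt, then up to 3 for time
  separate_wider_position(
    cols = treatmenttime,
    widths = c(2, trt = 2, time = 3),
    too_few = "align_start"
  ) %>%
  # one-letter drug codes leak the first time digit into trt ("E2", "E3"),
  # so rebuild time from trt for those rows and keep it as-is otherwise
  mutate(time = case_match(trt, "E2" ~ "24h", "E3" ~ "3h", .default = time)) %>%
  # strip the leaked digit so trt holds only the drug code
  mutate(trt = gsub("[[:digit:]]", "", trt))

print(parsed)
```

Two-letter codes such as DA already split cleanly, which is why the case_match step only needs entries for the one-letter drugs.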

fastqc_full
# A tibble: 288 × 17
   Sample                     ind   trt   time   read  trimmed `Total Sequences`
   <chr>                      <fct> <fct> <fct>  <chr> <chr>               <dbl>
 1 trimmed_Ind1_75DA24h_S7_R1 Ind1  DNR   24 ho… R1    yes              42551223
 2 trimmed_Ind1_75DA24h_S7_R2 Ind1  DNR   24 ho… R2    yes              42551223
 3 trimmed_Ind1_75DA3h_S1_R1  Ind1  DNR   3 hou… R1    yes              48230311
 4 trimmed_Ind1_75DA3h_S1_R2  Ind1  DNR   3 hou… R2    yes              48230311
 5 trimmed_Ind1_75DX24h_S8_R1 Ind1  DOX   24 ho… R1    yes              43284466
 6 trimmed_Ind1_75DX24h_S8_R2 Ind1  DOX   24 ho… R2    yes              43284466
 7 trimmed_Ind1_75DX3h_S2_R1  Ind1  DOX   3 hou… R1    yes              44480840
 8 trimmed_Ind1_75DX3h_S2_R2  Ind1  DOX   3 hou… R2    yes              44480840
 9 trimmed_Ind1_75E24h_S9_R1  Ind1  EPI   24 ho… R1    yes              42757767
10 trimmed_Ind1_75E24h_S9_R2  Ind1  EPI   24 ho… R2    yes              42757767
# ℹ 278 more rows
# ℹ 10 more variables: `Sequences flagged as poor quality` <dbl>,
#   `Sequence length` <chr>, `%GC` <dbl>, total_deduplicated_percentage <dbl>,
#   avg_sequence_length <dbl>, percent_dup <dbl>, percent_gc <dbl>,
#   avg_seq_len <dbl>, percent_fails <dbl>, total_sequences <dbl>
drug_pal <- c("#8B006D","#DF707E","#F1B72B", "#3386DD","#707031","#41B333")
fastqc_full %>% 
  filter(trimmed=="no") %>%
  ggplot(., aes(x=trt, y= `Total Sequences`))+
  geom_col(aes(fill= trt))+
  facet_wrap(ind~time)+
  scale_fill_manual(values=drug_pal)+
  theme_bw()+
  ggtitle("Total Sequences, untrimmed")+
      # ylab(ylab)+
      xlab("")+
      theme(strip.background = element_rect(fill = "white",linetype=1, linewidth = 0.5),
          plot.title = element_text(size=14,hjust = 0.5,face="bold"),
          axis.title = element_text(size = 10, color = "black"),
          axis.ticks = element_line(linewidth = 0.5),
          axis.line = element_line(linewidth = 0.5),
          axis.text.x = element_blank(),
          strip.text.x = element_text(margin = margin(2,0,2,0, "pt"),face = "bold"))

fastqc_full %>% 
  filter(trimmed=="yes") %>% 
  ggplot(., aes(x=trt, y= `Total Sequences`))+
  geom_col(aes(fill= trt))+
  facet_wrap(ind~time)+
  scale_fill_manual(values=drug_pal)+
  theme_bw()+
  ggtitle("Total Sequences, trimmed")+
      # ylab(ylab)+
      xlab("")+
      theme(strip.background = element_rect(fill = "white",linetype=1, linewidth = 0.5),
          plot.title = element_text(size=14,hjust = 0.5,face="bold"),
          axis.title = element_text(size = 10, color = "black"),
          axis.ticks = element_line(linewidth = 0.5),
          axis.line = element_line(linewidth = 0.5),
          axis.text.x = element_blank(),
          strip.text.x = element_text(margin = margin(2,0,2,0, "pt"),face = "bold"))

totseq <- fastqc_full %>% 
  dplyr::filter(read =='R1') %>% 
  # group_by(ind,trt,time) %>% 
 select(Sample, ind, trt, time, trimmed, `Total Sequences`) %>% 
  pivot_wider(id_cols = c(ind,trt,time), names_from = trimmed, values_from = `Total Sequences`) %>% 
  mutate(perc_removed=(no-yes)/no*100) #%>% 
 kable(list(totseq[1:36,], totseq[37:72,]),caption= "Summary of Total sequences before and after trimming, with percentage of removed sequences") %>%
  kable_paper("striped", full_width = FALSE) %>%
  kable_styling(full_width = FALSE,font_size = 18) #%>%
Summary of Total sequences before and after trimming, with percentage of removed sequences
ind trt time yes no perc_removed
Ind1 DNR 24 hours 42551223 42552417 0.0028060
Ind1 DNR 3 hours 48230311 48236407 0.0126378
Ind1 DOX 24 hours 43284466 43287755 0.0075980
Ind1 DOX 3 hours 44480840 44487109 0.0140917
Ind1 EPI 24 hours 42757767 42760898 0.0073221
Ind1 EPI 3 hours 43425217 43436091 0.0250345
Ind1 MTX 24 hours 42126655 42128326 0.0039665
Ind1 MTX 3 hours 34347823 34353188 0.0156172
Ind1 TRZ 24 hours 39592932 39618391 0.0642606
Ind1 TRZ 3 hours 32503524 32515737 0.0375603
Ind1 VEH 24 hours 43814361 43825945 0.0264318
Ind1 VEH 3 hours 33684259 33702857 0.0551823
Ind2 DNR 24 hours 45642245 45642976 0.0016016
Ind2 DNR 3 hours 49765031 49766630 0.0032130
Ind2 DOX 24 hours 47963129 47964028 0.0018743
Ind2 DOX 3 hours 43218795 43219481 0.0015872
Ind2 EPI 24 hours 37876743 37877549 0.0021279
Ind2 EPI 3 hours 46509493 46509868 0.0008063
Ind2 MTX 24 hours 47565953 47566199 0.0005172
Ind2 MTX 3 hours 46547872 46548086 0.0004597
Ind2 TRZ 24 hours 42224852 42225791 0.0022238
Ind2 TRZ 3 hours 38631467 38632152 0.0017731
Ind2 VEH 24 hours 38128833 38135521 0.0175375
Ind2 VEH 3 hours 39053023 39053395 0.0009525
Ind3 DNR 24 hours 37681759 37682029 0.0007165
Ind3 DNR 3 hours 71147689 71149112 0.0020000
Ind3 DOX 24 hours 39048735 39049130 0.0010115
Ind3 DOX 3 hours 43756089 43756875 0.0017963
Ind3 EPI 24 hours 41488932 41489927 0.0023982
Ind3 EPI 3 hours 50138075 50139561 0.0029637
Ind3 MTX 24 hours 36498742 36498973 0.0006329
Ind3 MTX 3 hours 49431453 49431897 0.0008982
Ind3 TRZ 24 hours 40927652 40928915 0.0030858
Ind3 TRZ 3 hours 38780291 38781957 0.0042958
Ind3 VEH 24 hours 31324889 31327676 0.0088963
Ind3 VEH 3 hours 44853273 44855052 0.0039661
ind trt time yes no perc_removed
Ind4 DNR 24 hours 41965134 41969945 0.0114630
Ind4 DNR 3 hours 57305036 57308058 0.0052733
Ind4 DOX 24 hours 39197691 39206235 0.0217925
Ind4 DOX 3 hours 38800646 38806358 0.0147192
Ind4 EPI 24 hours 43019600 43031730 0.0281885
Ind4 EPI 3 hours 43496611 43500262 0.0083931
Ind4 MTX 24 hours 41030501 41031137 0.0015500
Ind4 MTX 3 hours 41964607 41971195 0.0156965
Ind4 TRZ 24 hours 45437929 45444700 0.0148994
Ind4 TRZ 3 hours 53428222 53433439 0.0097635
Ind4 VEH 24 hours 38506744 38516586 0.0255526
Ind4 VEH 3 hours 37431664 37434854 0.0085215
Ind5 DNR 24 hours 46899322 46905257 0.0126532
Ind5 DNR 3 hours 49843467 49867445 0.0480835
Ind5 DOX 24 hours 36866406 36868936 0.0068621
Ind5 DOX 3 hours 50184488 50205715 0.0422800
Ind5 EPI 24 hours 40050758 40060480 0.0242683
Ind5 EPI 3 hours 45761710 45805846 0.0963545
Ind5 MTX 24 hours 47153857 47154510 0.0013848
Ind5 MTX 3 hours 35648456 35648586 0.0003647
Ind5 TRZ 24 hours 44126954 44145364 0.0417031
Ind5 TRZ 3 hours 38314546 38349375 0.0908203
Ind5 VEH 24 hours 49541600 49581967 0.0814147
Ind5 VEH 3 hours 41343052 41487846 0.3490034
Ind6 DNR 24 hours 41059389 41059920 0.0012932
Ind6 DNR 3 hours 44095342 44095825 0.0010953
Ind6 DOX 24 hours 37577690 37578841 0.0030629
Ind6 DOX 3 hours 45117924 45118756 0.0018440
Ind6 EPI 24 hours 43365041 43366727 0.0038878
Ind6 EPI 3 hours 43646213 43647011 0.0018283
Ind6 MTX 24 hours 41005624 41006798 0.0028629
Ind6 MTX 3 hours 45458055 45458459 0.0008887
Ind6 TRZ 24 hours 35700533 35701361 0.0023192
Ind6 TRZ 3 hours 35875024 35875416 0.0010927
Ind6 VEH 24 hours 39816658 39820062 0.0085485
Ind6 VEH 3 hours 42608440 42608980 0.0012673
   # scroll_box(width = "100%", height = "400px")
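The perc_removed column is the share of untrimmed reads lost to trimming, (no - yes) / no * 100. As a quick check of that arithmetic, the sketch below rebuilds one row using the Ind1 DNR 24 hours counts from the table above. The `toy` tibble is a hypothetical stand-in, not an object from the analysis.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row input: the same sample before ("no") and after ("yes") trimming
toy <- tibble(
  ind = "Ind1",
  trt = "DNR",
  time = "24 hours",
  trimmed = c("no", "yes"),
  `Total Sequences` = c(42552417, 42551223)
)

toy_wide <- toy %>%
  # one row per sample, with the untrimmed and trimmed counts side by side
  pivot_wider(
    id_cols = c(ind, trt, time),
    names_from = trimmed,
    values_from = `Total Sequences`
  ) %>%
  # percentage of the untrimmed reads that trimming removed
  mutate(perc_removed = (no - yes) / no * 100)

print(toy_wide)
```

Here 1194 of roughly 42.55 million reads were dropped, which matches the 0.0028060 reported for this sample.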
  
  totseq %>% 
    ggplot(.,aes(x=trt,y=perc_removed) )+
    geom_col(aes(fill= trt))+
  facet_wrap(ind~time)+
  scale_fill_manual(values=drug_pal)+
  theme_bw()+
  ggtitle("Total Sequences, percent removed")+
      # ylab(ylab)+
      xlab("")+
      theme(strip.background = element_rect(fill = "white",linetype=1, linewidth = 0.5),
          plot.title = element_text(size=14,hjust = 0.5,face="bold"),
          axis.title = element_text(size = 10, color = "black"),
          axis.ticks = element_line(linewidth = 0.5),
          axis.line = element_line(linewidth = 0.5),
          axis.text.x = element_blank(),
          strip.text.x = element_text(margin = margin(2,0,2,0, "pt"),face = "bold"))

Percenttrimmed


sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-3 kableExtra_1.3.4   lubridate_1.9.3    forcats_1.0.0     
 [5] stringr_1.5.0      dplyr_1.1.3        purrr_1.0.2        readr_2.1.4       
 [9] tidyr_1.3.0        tibble_3.2.1       ggplot2_3.4.4      tidyverse_2.0.0   
[13] workflowr_1.7.1   

loaded via a namespace (and not attached):
 [1] gtable_0.3.4      xfun_0.41         bslib_0.5.1       processx_3.8.2   
 [5] callr_3.7.3       tzdb_0.4.0        vctrs_0.6.4       tools_4.3.1      
 [9] ps_1.7.5          generics_0.1.3    parallel_4.3.1    fansi_1.0.5      
[13] highr_0.10        pkgconfig_2.0.3   webshot_0.5.5     lifecycle_1.0.4  
[17] farver_2.1.1      compiler_4.3.1    git2r_0.32.0      munsell_0.5.0    
[21] getPass_0.2-2     httpuv_1.6.12     htmltools_0.5.7   sass_0.4.7       
[25] yaml_2.3.7        crayon_1.5.2      later_1.3.1       pillar_1.9.0     
[29] jquerylib_0.1.4   whisker_0.4.1     cachem_1.0.8      tidyselect_1.2.0 
[33] rvest_1.0.3       digest_0.6.33     stringi_1.7.12    labeling_0.4.3   
[37] rprojroot_2.0.4   fastmap_1.1.1     grid_4.3.1        colorspace_2.1-0 
[41] cli_3.6.1         magrittr_2.0.3    utf8_1.2.4        withr_2.5.2      
[45] scales_1.2.1      promises_1.2.1    bit64_4.0.5       timechange_0.2.0 
[49] rmarkdown_2.25    httr_1.4.7        bit_4.0.5         hms_1.1.3        
[53] evaluate_0.23     knitr_1.45        viridisLite_0.4.2 rlang_1.1.2      
[57] Rcpp_1.0.11       glue_1.6.2        xml2_1.3.5        svglite_2.1.2    
[61] rstudioapi_0.15.0 vroom_1.6.4       jsonlite_1.8.7    R6_2.5.1         
[65] systemfonts_1.0.5 fs_1.6.3