Last updated: 2023-03-18
Checks: 6 1
Knit directory: duplex_sequencing_screen/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown is untracked by Git. To know which version of the R
Markdown file created these results, you’ll want to first commit it to
the Git repo. If you’re still working on the analysis, you can ignore
this warning. When you’re finished, you can run
wflow_publish
to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20200402)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 81a5af3. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: data/Consensus_Data/.Rhistory
Ignored: data/Consensus_Data/Novogene_lane11/sample1/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample1/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample2/archive/sscs_aligned_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample2/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample2/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample3/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample3/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample4/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample4/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample5/variant_caller_outputs/sscs_L858R_aligned_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample5/variant_caller_outputs/sscs_L858R_aligned_filtered_sample5.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample6/archive/sscs_aligned_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample6/sscs_L858R_aligned_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample6/variant_caller_outputs/variants_ann_sample6.csv.gz
Ignored: data/Consensus_Data/Novogene_lane11/sample7/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane12/sample1/low_sscscounts/sscs_aligned_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane12/sample1/sscs_aligned_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane12/sample3/sscs_combined_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane12/sample5/sscs_combined_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane12/sample7/sscs_combined_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane12/sample9/sscs_combined_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample1/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample1/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample10/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample10/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample11/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample11/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample12/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample12/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample2/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample3/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample4/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample5/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample6/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample7/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample7/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample8/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample8/variant_caller_outputs/
Ignored: data/Consensus_Data/Novogene_lane13/sample9/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane13/sample9/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample10_combined/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample10_combined/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample10_combined/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample11/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample11/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample11/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample12/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample12/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample12/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample13/
Ignored: data/Consensus_Data/Novogene_lane14/sample14_combined/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample14_combined/sscs.filt_1.fa.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample14_combined/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample14_combined/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample14b/
Ignored: data/Consensus_Data/Novogene_lane14/sample15/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample15/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample15/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample16/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample16/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample16/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample17/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample17/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample17/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample18/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample18/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample18/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample1_combined/
Ignored: data/Consensus_Data/Novogene_lane14/sample2_combined/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample3/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample4/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample5/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample6/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample7/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample7/variant_caller_outputs/duplex/
Ignored: data/Consensus_Data/Novogene_lane14/sample8/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample8/variant_caller_outputs/
Ignored: data/Consensus_Data/Novogene_lane14/sample9/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane14/sample9/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/Novogene_lane2/
Ignored: data/Consensus_Data/Novogene_lane3/
Ignored: data/Consensus_Data/Novogene_lane4/
Ignored: data/Consensus_Data/Novogene_lane5/
Ignored: data/Consensus_Data/Novogene_lane6/
Ignored: data/Consensus_Data/Novogene_lane7/
Ignored: data/Consensus_Data/Ranomics_Pooled/
Ignored: data/Consensus_Data/archive/
Ignored: data/Consensus_Data/novogene_lane15/sample_1/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_1/firstrun(lowsequencing)/duplex/
Ignored: data/Consensus_Data/novogene_lane15/sample_1/firstrun(lowsequencing)/sscs/
Ignored: data/Consensus_Data/novogene_lane15/sample_1/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_2/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_2/firstrun(lowsequencing)/sscs/
Ignored: data/Consensus_Data/novogene_lane15/sample_2/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_3/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_3/firstrun(lowsequencing)/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_3/firstrun(lowsequencing)/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_3/ngs/Sample3_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_3/ngs/sample3a(firsthalf)/Sample3_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_3/ngs/variants_ann.csv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_3/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_4/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_4/firstrun(lowsequencing)/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_4/firstrun(lowsequencing)/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_4/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_5/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_5/firstrun(lowsequencing)/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_5/firstrun(lowsequencing)/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_5/firstrun(lowsequencing)/sscs/variant_caller_outputs/.empty/
Ignored: data/Consensus_Data/novogene_lane15/sample_5/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_6/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_6/firstrun(lowsequencing)/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_6/firstrun(lowsequencing)/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_6/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_7/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_7/firstrun(lowsequencing)/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_7/firstrun(lowsequencing)/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane15/sample_7/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample10/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample10/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample11/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample11/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample12/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample12/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample12/sscs/variant_caller_outputs/
Ignored: data/Consensus_Data/novogene_lane16a/Sample13/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample13/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample13/sscs/variant_caller_outputs/
Ignored: data/Consensus_Data/novogene_lane16a/Sample14/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample14/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample1_combined/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample1_combined/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample2/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample2/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample3/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample3/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample4/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample4/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample5/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample5/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample6/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample6/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample7/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample7/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample8/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample8/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample9/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/Sample9/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16a/duplex/variant_caller_outputs/
Ignored: data/Consensus_Data/novogene_lane16b/Sample10/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample10/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample11/sscs/variant_caller_outputs/
Ignored: data/Consensus_Data/novogene_lane16b/Sample15/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample15/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample1_combined/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample1_combined/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample2/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample2/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample3/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample3/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample4/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample4/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample5/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample5/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample6/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample6/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample7_combined/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample7_combined/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample8_combined/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample8_combined/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample9/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane16b/Sample9/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample10/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample10/duplex/variant_caller_outputs/
Ignored: data/Consensus_Data/novogene_lane17/sample10/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample11/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample11/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample1_combined/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample1_combined/low_depth/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample1_combined/low_depth/duplex/low_depth/
Ignored: data/Consensus_Data/novogene_lane17/sample1_combined/low_depth/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample1_combined/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample2/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample2/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample3/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample3/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample4/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample4/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample5/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample5/low_seq_depth/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample5/low_seq_depth/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample5/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample6/low_seq_depths/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample6/low_seq_depths/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample6/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample7/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample7/low_seq_depths/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample7/low_seq_depths/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample7/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample8/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample8/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample9/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17/sample9/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17b/Sample1 copy 2/duplex/variant_caller_outputs/
Ignored: data/Consensus_Data/novogene_lane17b/Sample1 copy 2/sscs/variant_caller_outputs/
Ignored: data/Consensus_Data/novogene_lane17b/Sample1 copy 3/duplex/variant_caller_outputs/
Ignored: data/Consensus_Data/novogene_lane17b/Sample1 copy 3/sscs/variant_caller_outputs/
Ignored: data/Consensus_Data/novogene_lane17b/Sample1/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17b/Sample1/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17b/Sample2/duplex/duplex.consensus.counts.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17b/Sample2/duplex/duplex_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/novogene_lane17b/Sample2/sscs/sscs_sorted_filtered.tsv.gz
Ignored: data/Consensus_Data/sscs_dcs_comparisons/
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane3/il3_indep_1.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane3/il3_indep_1.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane3/il3_indep_2.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane3/il3_indep_2.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane3/il3_indep_3.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane3/il3_indep_3.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane3/sorted_1.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane3/sorted_1.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane3/sorted_2.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane3/sorted_2.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane3/sorted_3.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane3/sorted_3.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_1.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_1.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_2.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_2.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_3.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_3.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_4.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_4.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_5.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_5.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_1.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_1.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_2.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_2.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_3.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_3.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_4.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_4.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_5.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_5.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_6.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_6.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_High_D2.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_High_D2.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_High_D4.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_High_D4.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Low_D2.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Low_D2.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Low_D4.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Low_D4.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Medium_D2.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Medium_D2.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Medium_D4.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Medium_D4.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_High_D2.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_High_D2.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_High_D4.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_High_D4.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Low_D2.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Low_D2.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Low_D4.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Low_D4.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Medium_D2.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Medium_D2.1.consensus.variant-calls.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Medium_D4.1.consensus.variant-calls.genome.vcf.gz
Ignored: output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Medium_D4.1.consensus.variant-calls.vcf.gz
Untracked files:
Untracked: BCRABL_Imatinib_SatMut_Screen_031423.csv
Untracked: analysis/ABL_BaseEditor_Analyses.Rmd
Untracked: analysis/ABL_Gnomad_analysis.Rmd
Untracked: analysis/figure/
Untracked: code/shortest_codon_finder.R
Untracked: data/BE_ABL_Merged/
Untracked: data/Clinvar_ABL/
Untracked: data/Gnomad_ABL/
Untracked: data/Short_et_al_fig1/
Untracked: inSM_butnotin_BEmerged.csv
Untracked: output/BE_SM_FigLegend.pdf
Untracked: output/BE_SM_Plots/
Unstaged changes:
Modified: .DS_Store
Modified: analysis/.DS_Store
Modified: analysis/ABL_cosmic_analysis.Rmd
Modified: analysis/ABL_unevenness_analysis.Rmd
Modified: code/.DS_Store
Modified: code/variants_parser.R
Modified: data/.DS_Store
Modified: data/Consensus_Data/.DS_Store
Modified: data/Consensus_Data/Novogene_lane11/.DS_Store
Modified: data/Consensus_Data/Novogene_lane11/sample3/.DS_Store
Modified: data/Consensus_Data/Novogene_lane11/sample3/duplex/.DS_Store
Modified: data/Consensus_Data/Novogene_lane11/sample4/.DS_Store
Modified: data/Consensus_Data/Novogene_lane11/sample4/duplex/.DS_Store
Modified: data/Consensus_Data/Novogene_lane12/.DS_Store
Modified: data/Consensus_Data/Novogene_lane13/.DS_Store
Modified: data/Consensus_Data/Novogene_lane13/sample3/.DS_Store
Modified: data/Consensus_Data/Novogene_lane13/sample5/.DS_Store
Modified: data/Consensus_Data/Novogene_lane14/.DS_Store
Modified: data/Consensus_Data/Novogene_lane14/sample9/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_3/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_3/firstrun(lowsequencing)/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_3/firstrun(lowsequencing)/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_3/firstrun(lowsequencing)/sscs/variant_caller_outputs/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_3/firstrun(lowsequencing)/sscs/variant_caller_outputs/variants_distribution_files/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_5/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_5/firstrun(lowsequencing)/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_5/firstrun(lowsequencing)/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_5/firstrun(lowsequencing)/sscs/variant_caller_outputs/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_5/firstrun(lowsequencing)/sscs/variant_caller_outputs/variants_distribution_files/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_6/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_6/firstrun(lowsequencing)/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_6/firstrun(lowsequencing)/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_6/firstrun(lowsequencing)/sscs/variant_caller_outputs/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_6/firstrun(lowsequencing)/sscs/variant_caller_outputs/variants_distribution_files/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_7/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_7/duplex/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_7/firstrun(lowsequencing)/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_7/firstrun(lowsequencing)/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_7/firstrun(lowsequencing)/sscs/variant_caller_outputs/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_7/firstrun(lowsequencing)/sscs/variant_caller_outputs/variants_distribution_files/.DS_Store
Modified: data/Consensus_Data/novogene_lane15/sample_7/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane16a/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample10/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample1_combined/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample1_combined/duplex/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample2/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample2/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample3/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample4/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample5/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample5/duplex/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample5/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample6/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample6/duplex/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample7_combined/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample7_combined/duplex/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample7_combined/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample8_combined/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample8_combined/duplex/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample8_combined/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane16b/Sample9/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample10/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample11/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample1_combined/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample1_combined/low_depth/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample1_combined/low_depth/duplex/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample2/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample3/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample3/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample4/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample4/duplex/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample4/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample5/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample6/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample6/duplex/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample6/low_seq_depths/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample6/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample7/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample8/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample9/.DS_Store
Modified: data/Consensus_Data/novogene_lane17/sample9/sscs/.DS_Store
Modified: data/Consensus_Data/novogene_lane17b/.DS_Store
Modified: data/Consensus_Data/novogene_lane17b/Sample1/.DS_Store
Modified: data/Consensus_Data/prj00053/.DS_Store
Modified: data/Refs/.DS_Store
Modified: data/Refs/EGFR/.DS_Store
Modified: output/.DS_Store
Modified: output/Twinstrand/.DS_Store
Modified: shinyapp/.DS_Store
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
There are no past versions. Publish this analysis with
wflow_publish()
to start tracking its development.
# Contingency table maker function.
# Input: dataframe of sgRNAs and mutants and whether the enrichment scores are significant
# Output: a 2x2 contingency table
# input_df=bedata_inner_simple_filtered%>%filter(Type%in%"ABE")
contab_maker=function(input_df){
PP=input_df%>%filter(significance_status%in%"Both")
PP_unique=length(unique(PP$species))
NN=input_df%>%filter(significance_status%in%"Neither")
NN_unique=length(unique(NN$species))
NP=input_df%>%filter(significance_status%in%"BEOnly")
NP_unique=length(unique(NP$species))
PN=input_df%>%filter(significance_status%in%"SMOnly")
PN_unique=length(unique(PN$species))
table=rbind(cbind(PP_unique,PN_unique),cbind(NP_unique,NN_unique))
table
}
Data Parsing
# rm(list=ls())
bedata_inner=read.csv("data/BE_ABL_Merged/BE_SatMut_Screen_Inner20230315.csv",header=T,stringsAsFactors = F)
bedata_outer=read.csv("data/BE_ABL_Merged/BE_SatMut_Screen_Outer20230315.csv",header=T,stringsAsFactors = F)
# a=bedata_inner%>%select(ct_screen1_before,ct_screen2_before,pvalue)
#
# a=unique(bedata_inner$Mutation)
# # Data has 196 unique mutants
# a=unique(bedata_inner$sgRNA.Seq)
# # The 196 unique mutants are made by 277 unique guides
# a=unique(bedata_inner[bedata_inner$Significant%in%"True","Mutation"])
# # Amongst these 196 unique mutants, 45 have a statistically significant sat mut score
#
# a=unique(bedata_outer$Mutation)
# #5788 mutations overall
# a=unique(bedata_outer$sgRNA.Seq)
# # 2811 guides
# bedata_outer_filtered=bedata_outer%>%filter(ABL1_AA>=242,ABL1_AA<=494)
# a=unique(bedata_outer_filtered$Mutation)
# # out of the 5788 mutants, 717 are in the kinase
# a=unique(bedata_outer_filtered$sgRNA.Seq)
# # these 717 mutants are made by 349 guides
# a=unique(bedata_outer$species)
# # 2201 sat mut mutants
# a=unique(bedata_inner$species)
# # out of the 717 BE mutants, 290 are seen in the SM screens
# a=bedata_inner%>%filter(!Ref_AA%in%ref_aa)
# a=bedata_inner%>%filter(!Alt_AA%in%alt_aa)
# a=bedata_inner%>%filter(!Mutation%in%species)
# a=bedata_inner%>%filter(!Ref_Codon%in%ref_codon)
# a=bedata_inner%>%filter(!ABL1_AA%in%protein_start)
# ^^Just making sure that there are no cases where the BE data and the sat mut data coordinates are different
# a=bedata_inner%>%mutate(Alt_Codon=toupper(Alt_Codon))%>%filter(!Alt_Codon%in%alt_codon)
# The clause above is basically saying that 606 out of 760 ALT codons are different in the BE data than in the sat mut data, which makes sense
bedata_inner_simple=bedata_inner%>%
select(-c(X,Unnamed..0_x,Strand,consequence_terms,
ct_screen1_before,ct_screen1_after,ct_screen2_before,ct_screen2_after,
depth_screen1_before,depth_screen1_after,depth_screen2_before,depth_screen2_after,
maf_screen1_before,maf_screen1_after,maf_screen2_before,maf_screen2_after,
Ref_AA,ABL1_AA,Alt_AA,Ref_Codon,Mutation,alt_start_pos))
bedata_inner_simple=bedata_inner_simple%>%
rename(BE.Alt_Codon=Alt_Codon,SM.Alt_Codon=alt_codon,BE.LFC=BE_LFC,BE.pval=BE_p.value,BE.FDR=BE_FDR,SM.pval=pvalue,SM.padj=padj,SM.Significant=Significant,SM.ref=ref,SM.alt=alt,SM.LFC=log2FoldChange)%>%
relocate(ref_aa,protein_start,alt_aa,species,ref_codon,BE.Alt_Codon,SM.Alt_Codon,alt_codon_shortest,n_nuc_min,AA_Number)
# Now that I have simplified the dataset to include meaningful columns, I am going to add a column that has a flag for the number of Sat Mut nucleotides
bedata_inner_simple=bedata_inner_simple%>%rowwise%>%mutate(BE.n_nuc=str_count(sgRNA_Nuc,",")+1,SM.n_nuc=nchar(SM.alt))%>%relocate(BE.n_nuc,.after=sgRNA_Nuc)%>%relocate(SM.n_nuc,.after=SM.alt)
bedata_inner_simple$SM.Significant=toupper(bedata_inner_simple$SM.Significant)
bedata_inner_simple$SM.Significant=as.factor(bedata_inner_simple$SM.Significant)
# Adding the flag for a mutant captured by more than one sgRNA
bedata_inner_simple=bedata_inner_simple%>%mutate(SM.netgr_obs_mean=mean(netgr_obs_screen1,netgr_obs_screen2))
bedata_inner_simple=bedata_inner_simple%>%
mutate(BE.Significant=case_when(BE.FDR<0.02~T,
T~F),
SM.Significant=case_when(SM.padj<0.05~T,
T~F),
significance_status=case_when((BE.Significant%in%T)&&(SM.Significant%in%F)~"BEOnly",
(BE.Significant%in%F)&&(SM.Significant%in%T)~"SMOnly",
(BE.Significant%in%T)&&(SM.Significant%in%T)~"Both",
T~"Neither"))
# bedata_inner_simple=bedata_inner_simple%>%
# mutate(BE.Significant=case_when(BE.LFC<=-1~T,
# BE.LFC>=1~T,
# T~F),
# significance_status=case_when((BE.Significant%in%T)&&(SM.Significant%in%F)~"BEOnly",
# (BE.Significant%in%F)&&(SM.Significant%in%T)~"SMOnly",
# (BE.Significant%in%T)&&(SM.Significant%in%T)~"Both",
# T~"Neither"))
There are a lot fewer mutants in the merged BE SM dataset than in the IL3 dataset Main question: How many unique mutants in the IL3 dataset are not pseudo-counted and have not equal background counts and depths across two replicates and are in the kinase and are present in the BE data?
Figuring out how to predict which sgRNAs make what mutants
# Algorithm:
# For each unique sgRNA
# Calculate how many mutants are possible that we see in our library
# Figure out which mutants are MNVs
# For sgRNAs that only have one mutant, that's the default choice
# For sgRNAs with >1 possible mutant,
# choose the mutant that is closest to the 5th position away from the PAM
# If two mutants are the same distance away from the 5th position (i.e. at position 4 and 6), choose the higher distance mutant
# Note that this algorithm is PAM agnostic
# Note that this algorithm does not use MNVs
# Rule:
# x=bedata_inner_simple%>%filter(sgRNA_possible_mutants==2)
# x=bedata_inner_simple%>%filter(sgRNA_possible_mutants>=2)
# x=bedata_inner_simple
bedata_inner_simple=bedata_inner_simple%>%select(Type,sgRNA.Seq,species,sgRNA_Nuc,BE.n_nuc,ref_codon,BE.Alt_Codon,BE.LFC,BE.pval,BE.FDR,BE.Significant,SM.netgr_obs_mean,SM.Significant,significance_status)
# Calculating distances
# For SNPs, the distance is simple,
# For MNVs, choose one of the distances (simplifying assumption, I know)
bedata_inner_simple=bedata_inner_simple%>%mutate(sgRNA_Nuc=gsub("\\[|\\]","",sgRNA_Nuc))
# Figuring out which mutants are at the minimum distance
# What if two mutants are within the same distance? Choose the first mutant
bedata_inner_simple=bedata_inner_simple%>%mutate(distance=case_when(BE.n_nuc%in%1~sgRNA_Nuc,
BE.n_nuc>=2~strsplit(sgRNA_Nuc,",")[[1]][1]))
bedata_inner_simple$distance=as.numeric(bedata_inner_simple$distance)
bedata_inner_simple=bedata_inner_simple%>%mutate(distance_from_5=abs(5-distance))
# For each sgRNA, figure out which mutant is at the minimum distance
# If there are multiple mutants at the minimum distance, note down both of them.
bedata_sum=bedata_inner_simple%>%
group_by(sgRNA.Seq)%>%
summarize(mutants_per_sgRNA=n(),
mindist=min(distance_from_5),
species.mindist=paste(species[which(distance_from_5==min(distance_from_5))],collapse=", "))
# Sometimes a guide makes the same amino acid substitution two different ways (eg a snp and an mnv that make the same substitution). When this happens, the algorithm thinks that the guide is making two separate amino acid substitutions. this next conditional statement is going to remove those duplicates.
bedata_sum=bedata_sum%>%
rowwise()%>%
mutate(species.mindist=case_when(
strsplit(species.mindist,", ")[[1]][1]==
strsplit(species.mindist,", ")[[1]][2]~strsplit(species.mindist,", ")[[1]][1],
T~species.mindist))
bedata_inner_simple=merge(bedata_inner_simple,bedata_sum,by="sgRNA.Seq")
bedata_inner_simple_filtered=bedata_inner_simple%>%rowwise()%>%filter(species%in%species.mindist)
# bedata_mutant_sum=bedata_inner_simple_filtered%>%group_by(species,sgRNA.Seq)%>%summarize(num_ways=n())
# note that num_ways is the number of different ways that mutant can be made by that sgRNA.
# I use this
############# Are our sgRNA predictions working well?
# Questions to ask:
# What % of the sgRNAs get a single mutant
bedata_inner_simple_filtered=bedata_inner_simple_filtered%>%
# rowwise()%>%
mutate(multi_mutant_sgRNA=grepl(",",species.mindist))
x=bedata_inner_simple_filtered%>%filter(multi_mutant_sgRNA%in%F)
length(unique(bedata_inner_simple$sgRNA.Seq))
[1] 337
length(unique(x$sgRNA.Seq))
[1] 305
# Therefore 305 of 337 sgRNAs are predicted to make a single mutant if you choose a simple algorithm that helps pair sgRNAs with cognate by distance from the PAM
length(unique(bedata_inner_simple$species))
[1] 290
length(unique(bedata_inner_simple_filtered$species))
[1] 170
# However doing this distance-based filtering also reduces the total mutants you see from 283 to 179 (if you discard all mutants that aren't the highest priority to be made by a given sgRNA), This is because some mutants are far away from a guide and won't be predicted to be made
# Filter 1: discard
# What % of the mutants are predicted to be made by a single sgRNA?
bedata_singlemutants=bedata_inner_simple_filtered%>%
filter(multi_mutant_sgRNA%in%F)
# bedata_singlemutants=bedata_singlemutants%>%group_by(species)%>%summarize(ct=n())
bedata_singlemutants_sum=bedata_singlemutants%>%group_by(species.mindist)%>%summarize(ct=n())
# a=bedata_singlemutants_sum%>%filter(ct%in%c(1,2))
# a=bedata_singlemutants_sum%>%filter(ct%in%c(2))
# b=bedata_inner_simple_sum%>%filter(sgRNA.Seq=="GCCGTGAAGACCTTGAAGGA")
# 68 of the 171 mutants are predicted to be made by a single guide.
# 120 of the 171 mutants are predicted to be made by one or two guides
# If a mutant is expected to be the primary mutant made by two guides, do I just take the mean of the mutants made by the two guides?
length(unique(bedata_inner_simple$sgRNA.Seq))
[1] 337
######Next question to ask: of the mutants made by a single (or two) guides, what is the correlation like? Also you could modify the predicter so that the mutants that are made by a single guide that are the same as an mnv as they are as snv are figured out
# Of the sgRNAs that make multinucleotide variants. In how many cases is the SNV the same substitution as the mnv?
ggplot(bedata_inner_simple_filtered,aes(x=mutants_per_sgRNA))+geom_histogram(binwidth = 1,color="black",fill="gray")+
scale_x_continuous("# Unique mutants possible per sgRNA before distance-based prediction",breaks=c(1,2,3,4,5,6,7))+
scale_y_continuous("sgRNAs")
bedata_inner_simple_filtered_nodups=bedata_inner_simple_filtered%>%select(species,sgRNA.Seq)%>%group_by(sgRNA.Seq,species)%>%filter(row_number()==1)
# The no dups filtering takes out sgRNAs making the same mutant a couple of different ways
bedata_inner_simple_filtered_sum=bedata_inner_simple_filtered_nodups%>%group_by(sgRNA.Seq)%>%summarize(ct=n())
ggplot(bedata_inner_simple_filtered_sum,aes(x=ct))+geom_histogram(binwidth = 1,color="black",fill="gray")+
scale_x_continuous("# Unique mutants possible per sgRNA after distance-based prediction")+
scale_y_continuous("sgRNAs")
sgrnas_per_mutant=bedata_inner_simple%>%group_by(species)%>%summarize(sgRNAs_per_mutant=n())
ggplot(sgrnas_per_mutant,aes(x=sgRNAs_per_mutant))+geom_histogram(binwidth = 1,color="black",fill="gray")+
scale_x_continuous("# Unique sgRNAs that are predicted to have highest priority to make the mutant")+
scale_y_continuous("Mutants")
ggplot(bedata_singlemutants_sum,aes(x=ct))+geom_histogram(binwidth = 1,color="black",fill="gray")+
scale_x_continuous("# Unique sgRNAs that are predicted to have highest priority to make the mutant",breaks=c(1,2,3,4,5,6,7))+
scale_y_continuous("Mutants")
Plotting all the data
Plotting
ggplot(bedata_inner_simple,aes(x=BE.LFC))+geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(bedata_inner_simple,aes(x=SM.netgr_obs_mean))+geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#######################Plotting all data unfiltered#####################
# Figure Legend
ggplot(bedata_inner_simple%>%filter(Type%in%"ABE"),aes(x=BE.LFC,y=SM.netgr_obs_mean))+
geom_point(color="black",shape=21,size=2,aes(fill=significance_status))+
scale_y_continuous("SM Net Growth Rate")
# ggsave("output/BE_SM_FigLegend.pdf",width=10,height=4,units = "in",useDingbats=F)
ggplot(bedata_inner_simple%>%filter(Type%in%"ABE"),aes(x=BE.LFC,y=SM.netgr_obs_mean,label=species,color=significance_status))+
geom_text()+
scale_y_continuous("SM Net Growth Rate")+
cleanup+
theme(legend.position="none")
# ggsave("output/BE_SM_Plots/be_sm_abe_all_labels.pdf",width=5,height=4,units = "in",useDingbats=F)
ggplot(bedata_inner_simple%>%filter(Type%in%"ABE"),aes(x=BE.LFC,y=SM.netgr_obs_mean))+
geom_point(color="black",shape=21,size=2,aes(fill=significance_status))+
scale_y_continuous("SM Net Growth Rate")+
cleanup+
theme(legend.position="none")
# ggsave("output/BE_SM_Plots/be_sm_abe_all_ponts.pdf",width=5,height=4,units = "in",useDingbats=F)
ggplot(bedata_inner_simple%>%filter(Type%in%"CBE"),aes(x=BE.LFC,y=SM.netgr_obs_mean,label=species,color=significance_status))+
geom_text()+
scale_y_continuous("SM Net Growth Rate")+
cleanup+
theme(legend.position="none")
# ggsave("output/BE_SM_Plots/be_sm_cbe_all_labels.pdf",width=5,height=4,units = "in",useDingbats=F)
ggplot(bedata_inner_simple%>%filter(Type%in%"CBE"),aes(x=BE.LFC,y=SM.netgr_obs_mean))+
geom_point(color="black",shape=21,size=2,aes(fill=significance_status))+
scale_y_continuous("SM Net Growth Rate")+
cleanup+
theme(legend.position="none")
# ggsave("output/BE_SM_Plots/be_sm_cbe_all_ponts.pdf",width=5,height=4,units = "in",useDingbats=F)
#######################Plotting data filtered for sgRNA Predictions#####################
ggplot(bedata_inner_simple_filtered%>%filter(Type%in%"ABE"),aes(x=BE.LFC,y=SM.netgr_obs_mean,label=species,color=significance_status))+
geom_text()+
scale_y_continuous("SM Net Growth Rate")+
cleanup+
theme(legend.position="none")
# ggsave("output/BE_SM_Plots/be_sm_abe_filtered_labels.pdf",width=5,height=4,units = "in",useDingbats=F)
ggplot(bedata_inner_simple_filtered%>%filter(Type%in%"ABE"),aes(x=BE.LFC,y=SM.netgr_obs_mean))+
geom_point(color="black",shape=21,size=2,aes(fill=significance_status))+
scale_y_continuous("SM Net Growth Rate")+
cleanup+
theme(legend.position="none")
# ggsave("output/BE_SM_Plots/be_sm_abe_filtered_ponts.pdf",width=5,height=4,units = "in",useDingbats=F)
ggplot(bedata_inner_simple_filtered%>%filter(Type%in%"CBE"),aes(x=BE.LFC,y=SM.netgr_obs_mean,label=species,color=significance_status))+
geom_text()+
scale_y_continuous("SM Net Growth Rate")+
cleanup+
theme(legend.position="none")
# ggsave("output/BE_SM_Plots/be_sm_cbe_filtered_labels.pdf",width=5,height=4,units = "in",useDingbats=F)
ggplot(bedata_inner_simple_filtered%>%filter(Type%in%"CBE"),aes(x=BE.LFC,y=SM.netgr_obs_mean))+
geom_point(color="black",shape=21,size=2,aes(fill=significance_status))+
scale_y_continuous("SM Net Growth Rate")+
cleanup+
theme(legend.position="none")
# ggsave("output/BE_SM_Plots/be_sm_cbe_filtered_ponts.pdf",width=5,height=4,units = "in",useDingbats=F)
#############
bedata_inner_grouped=bedata_inner_simple_filtered%>%group_by(Type,species)%>%summarize(BE.LFC=mean(BE.LFC),SM.netgr_obs_mean=mean(SM.netgr_obs_mean))
`summarise()` has grouped output by 'Type'. You can override using the `.groups` argument.
bedata_inner_grouped=bedata_inner_grouped%>%rowwise()%>%
mutate(BE.Significant=case_when(BE.LFC<=-1~T,
BE.LFC>=1~T,
T~F),
SM.Significant=case_when(SM.netgr_obs_mean<.04~T,
SM.netgr_obs_mean>.065~T,
T~F),
significance_status=case_when((BE.Significant%in%T)&&(SM.Significant%in%F)~"BEOnly",
(BE.Significant%in%F)&&(SM.Significant%in%T)~"SMOnly",
(BE.Significant%in%T)&&(SM.Significant%in%T)~"Both",
T~"Neither"))
ggplot(bedata_inner_grouped%>%filter(Type%in%"ABE"),aes(x=BE.LFC,y=SM.netgr_obs_mean,label=species,color=significance_status))+
geom_text()+
scale_y_continuous("SM Net Growth Rate")+
cleanup
# theme(legend.position="none")+
# ggsave("output/BE_SM_Plots/be_sm_abe_filtered_labels_mean.pdf",width=5,height=4,units = "in",useDingbats=F)
#######################Figuring out correlation coefficients and FETs#####################
# ABE all
testdata=bedata_inner_simple%>%filter(Type%in%"ABE")
cor(testdata$BE.LFC,testdata$SM.netgr_obs_mean)
[1] 0.2652287
table=contab_maker(testdata)
table
PP_unique PN_unique
[1,] 36 17
[2,] 87 69
fisher.test(table)
Fisher's Exact Test for Count Data
data: table
p-value = 0.1464
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.8342767 3.4669174
sample estimates:
odds ratio
1.675423
# CBE all
testdata=bedata_inner_simple%>%filter(Type%in%"CBE")
cor(testdata$BE.LFC,testdata$SM.netgr_obs_mean)
[1] -0.004758033
table=contab_maker(testdata)
table
PP_unique PN_unique
[1,] 30 30
[2,] 86 92
fisher.test(table)
Fisher's Exact Test for Count Data
data: table
p-value = 0.8817
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.570784 2.004210
sample estimates:
odds ratio
1.069446
# ABE Filtered
testdata=bedata_inner_simple_filtered%>%filter(Type%in%"ABE")
cor(testdata$BE.LFC,testdata$SM.netgr_obs_mean)
[1] 0.3223448
table=contab_maker(testdata)
table
PP_unique PN_unique
[1,] 20 8
[2,] 47 29
fisher.test(table)
Fisher's Exact Test for Count Data
data: table
p-value = 0.4893
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.5571454 4.5808096
sample estimates:
odds ratio
1.536308
# CBE Filtered
testdata=bedata_inner_simple_filtered%>%filter(Type%in%"CBE")
cor(testdata$BE.LFC,testdata$SM.netgr_obs_mean)
[1] 0.1067391
table=contab_maker(testdata)
table
PP_unique PN_unique
[1,] 13 13
[2,] 42 38
fisher.test(table)
Fisher's Exact Test for Count Data
data: table
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.339141 2.416453
sample estimates:
odds ratio
0.9056182
# ABE Filtered Means
testdata=bedata_inner_grouped%>%filter(Type%in%"ABE")
cor(testdata$BE.LFC,testdata$SM.netgr_obs_mean)
[1] 0.2889848
table=contab_maker(testdata)
table
PP_unique PN_unique
[1,] 12 36
[2,] 6 34
fisher.test(table)
Fisher's Exact Test for Count Data
data: table
p-value = 0.296
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.5727923 6.8107397
sample estimates:
odds ratio
1.875528
##################################
############# Of the sgRNAs that score, how often do any of the predicted variants score (when considered as single mutations.) How does this break down by PAM sequence?
############## When a variant is "covered" by an sgRNA in the library, AND it scores by Saturation Mutagenesis, How often does the sgRNA score in the base editor screen? How does this break down by PAM?
table=contab_maker(bedata_inner_simple%>%filter(Type%in%"ABE"))
fisher.test(table)
Fisher's Exact Test for Count Data
data: table
p-value = 0.1464
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.8342767 3.4669174
sample estimates:
odds ratio
1.675423
table=contab_maker(bedata_inner_simple%>%filter(Type%in%"CBE"))
fisher.test(table)
Fisher's Exact Test for Count Data
data: table
p-value = 0.8817
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.570784 2.004210
sample estimates:
odds ratio
1.069446
############## Do mult-nucleotide variants score more often than single nucleotide variants. in the base editor library, Is this regardless of PAM sequence?
############## Are multinucleotide variant hits flagging regions of the protein that have more phenotypes than average? Less than average? Or just average? How does this break down by PAM?
############## If a region is negative by sgRNA how often is it negative by saturation mutagenesis? How is this affected by PAM others?
# bedata_filtered_forexport=bedata_inner_simple_filtered%>%select(-c(distance,))
# write.csv(bedata_inner_simple_filtered%>%select(-c(multi_mutant_sgRNA)),"data/BE_ABL_Merged/BE_SatMut_Screen_Kinase_Filtered_031823.csv")
# write.csv(bedata_inner_simple,"data/BE_ABL_Merged/BE_SatMut_Screen_Kinase_All_031823.csv")
sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 10.16
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] RColorBrewer_1.1-2 doParallel_1.0.15 iterators_1.0.12 foreach_1.5.0
[5] tictoc_1.0 plotly_4.9.2.1 ggplot2_3.3.3 dplyr_1.0.6
[9] stringr_1.4.0
loaded via a namespace (and not attached):
[1] tidyselect_1.1.0 xfun_0.31 bslib_0.3.1 purrr_0.3.4
[5] colorspace_1.4-1 vctrs_0.3.8 generics_0.0.2 htmltools_0.5.2
[9] viridisLite_0.3.0 yaml_2.2.1 utf8_1.1.4 rlang_0.4.11
[13] jquerylib_0.1.4 later_1.0.0 pillar_1.6.1 glue_1.4.1
[17] withr_2.4.2 DBI_1.1.0 lifecycle_1.0.0 munsell_0.5.0
[21] gtable_0.3.0 workflowr_1.6.2 htmlwidgets_1.5.1 codetools_0.2-16
[25] evaluate_0.14 labeling_0.3 knitr_1.28 fastmap_1.1.0
[29] httpuv_1.5.2 fansi_0.4.1 Rcpp_1.0.4.6 promises_1.1.0
[33] backports_1.1.7 scales_1.1.1 jsonlite_1.7.2 farver_2.0.3
[37] fs_1.4.1 digest_0.6.25 stringi_1.7.5 rprojroot_1.3-2
[41] grid_4.0.0 tools_4.0.0 magrittr_2.0.1 sass_0.4.1
[45] lazyeval_0.2.2 tibble_3.1.2 crayon_1.4.1 tidyr_1.1.3
[49] pkgconfig_2.0.3 ellipsis_0.3.2 data.table_1.12.8 assertthat_0.2.1
[53] rmarkdown_2.14 httr_1.4.2 R6_2.4.1 git2r_0.27.1
[57] compiler_4.0.0