Last updated: 2023-03-18

Checks: 6 passed, 1 warning

Knit directory: duplex_sequencing_screen/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.
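
The publish step mentioned above could be sketched roughly as follows. This assumes the workflowr package is installed and the working directory is the project root. The helper name publish_analysis and the example path are hypothetical; only wflow_publish() and its files and message arguments come from workflowr.

```r
# Hypothetical helper: commit an R Markdown file and rebuild its HTML.
# Assumes workflowr is installed and the working directory is the project root.
publish_analysis <- function(rmd_path, message) {
  workflowr::wflow_publish(files = rmd_path, message = message)
}

# Example call (not run here; the path is illustrative, not the file behind this page):
# publish_analysis("analysis/my_analysis.Rmd", "Publish analysis")
```

Wrapping the call in a function keeps the sketch safe to source outside a workflowr project, since nothing is committed until the helper is actually called.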

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
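
As a rough illustration of what an "empty environment" check can look like, the sketch below counts the visible objects in an environment. The function name environment_is_empty is hypothetical, and workflowr's own check may be implemented differently.

```r
# Returns TRUE when the given environment holds no visible objects.
# By default it inspects the global environment.
environment_is_empty <- function(env = globalenv()) {
  length(ls(envir = env)) == 0
}

# Demonstrate on a scratch environment rather than the real global one.
scratch <- new.env()
empty_before <- environment_is_empty(scratch)  # TRUE: nothing defined yet
assign("x", 1, envir = scratch)
empty_after <- environment_is_empty(scratch)   # FALSE: x now exists
```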

The command set.seed(20200402) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
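
A small base R sketch of why the seed matters: resetting the same seed before a random draw reproduces the draw exactly. The subsampling shown here is illustrative only and is not taken from the analysis itself.

```r
# Draw a random subsample twice, resetting the seed before each draw.
set.seed(20200402)
first_draw <- sample(1:100, 5)

set.seed(20200402)
second_draw <- sample(1:100, 5)

# With the same seed, both draws contain the same values in the same order.
draws_match <- identical(first_draw, second_draw)
```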

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
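
The recorded information can be inspected with base R's sessionInfo(). The sketch below pulls out a few of its fields; the variable names are illustrative.

```r
# Capture the session details that a reproducibility report typically records.
info <- sessionInfo()

r_version <- info$R.version$version.string  # R version string
os_name <- info$running                     # operating system description
attached <- names(info$otherPkgs)           # attached non-base packages (NULL if none)
```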

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
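
For illustration, a relative path can be built with file.path() so that it resolves from the project root on any machine. The directory used below appears in the Git status listing further down; the regular expression is only a simple heuristic for spotting absolute paths.

```r
# Build a path relative to the project root instead of hard-coding an absolute one.
lane_dir <- file.path("data", "Consensus_Data", "Novogene_lane11")

# Simple heuristic: absolute paths start with "/" or a drive letter such as "C:".
is_relative <- !grepl("^(/|[A-Za-z]:)", lane_dir)
```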

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 81a5af3. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/Consensus_Data/.Rhistory
    Ignored:    data/Consensus_Data/Novogene_lane11/sample1/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample1/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample2/archive/sscs_aligned_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample2/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample2/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample3/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample3/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample4/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample4/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample5/variant_caller_outputs/sscs_L858R_aligned_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample5/variant_caller_outputs/sscs_L858R_aligned_filtered_sample5.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample6/archive/sscs_aligned_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample6/sscs_L858R_aligned_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample6/variant_caller_outputs/variants_ann_sample6.csv.gz
    Ignored:    data/Consensus_Data/Novogene_lane11/sample7/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane12/sample1/low_sscscounts/sscs_aligned_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane12/sample1/sscs_aligned_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane12/sample3/sscs_combined_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane12/sample5/sscs_combined_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane12/sample7/sscs_combined_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane12/sample9/sscs_combined_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample1/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample1/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample10/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample10/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample11/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample11/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample12/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample12/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample2/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample3/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample4/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample5/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample6/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample7/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample7/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample8/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample8/variant_caller_outputs/
    Ignored:    data/Consensus_Data/Novogene_lane13/sample9/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane13/sample9/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample10_combined/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample10_combined/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample10_combined/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample11/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample11/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample11/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample12/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample12/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample12/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample13/
    Ignored:    data/Consensus_Data/Novogene_lane14/sample14_combined/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample14_combined/sscs.filt_1.fa.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample14_combined/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample14_combined/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample14b/
    Ignored:    data/Consensus_Data/Novogene_lane14/sample15/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample15/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample15/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample16/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample16/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample16/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample17/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample17/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample17/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample18/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample18/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample18/sscs/variant_caller_outputs/archive/variants_ann.csv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample1_combined/
    Ignored:    data/Consensus_Data/Novogene_lane14/sample2_combined/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample3/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample4/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample5/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample6/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample7/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample7/variant_caller_outputs/duplex/
    Ignored:    data/Consensus_Data/Novogene_lane14/sample8/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample8/variant_caller_outputs/
    Ignored:    data/Consensus_Data/Novogene_lane14/sample9/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane14/sample9/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/Novogene_lane2/
    Ignored:    data/Consensus_Data/Novogene_lane3/
    Ignored:    data/Consensus_Data/Novogene_lane4/
    Ignored:    data/Consensus_Data/Novogene_lane5/
    Ignored:    data/Consensus_Data/Novogene_lane6/
    Ignored:    data/Consensus_Data/Novogene_lane7/
    Ignored:    data/Consensus_Data/Ranomics_Pooled/
    Ignored:    data/Consensus_Data/archive/
    Ignored:    data/Consensus_Data/novogene_lane15/sample_1/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_1/firstrun(lowsequencing)/duplex/
    Ignored:    data/Consensus_Data/novogene_lane15/sample_1/firstrun(lowsequencing)/sscs/
    Ignored:    data/Consensus_Data/novogene_lane15/sample_1/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_2/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_2/firstrun(lowsequencing)/sscs/
    Ignored:    data/Consensus_Data/novogene_lane15/sample_2/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_3/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_3/firstrun(lowsequencing)/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_3/firstrun(lowsequencing)/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_3/ngs/Sample3_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_3/ngs/sample3a(firsthalf)/Sample3_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_3/ngs/variants_ann.csv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_3/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_4/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_4/firstrun(lowsequencing)/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_4/firstrun(lowsequencing)/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_4/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_5/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_5/firstrun(lowsequencing)/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_5/firstrun(lowsequencing)/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_5/firstrun(lowsequencing)/sscs/variant_caller_outputs/.empty/
    Ignored:    data/Consensus_Data/novogene_lane15/sample_5/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_6/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_6/firstrun(lowsequencing)/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_6/firstrun(lowsequencing)/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_6/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_7/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_7/firstrun(lowsequencing)/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_7/firstrun(lowsequencing)/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane15/sample_7/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample10/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample10/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample11/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample11/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample12/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample12/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample12/sscs/variant_caller_outputs/
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample13/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample13/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample13/sscs/variant_caller_outputs/
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample14/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample14/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample1_combined/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample1_combined/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample2/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample2/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample3/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample3/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample4/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample4/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample5/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample5/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample6/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample6/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample7/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample7/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample8/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample8/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample9/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/Sample9/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16a/duplex/variant_caller_outputs/
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample10/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample10/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample11/sscs/variant_caller_outputs/
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample15/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample15/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample1_combined/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample1_combined/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample2/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample2/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample3/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample3/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample4/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample4/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample5/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample5/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample6/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample6/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample7_combined/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample7_combined/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample8_combined/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample8_combined/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample9/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane16b/Sample9/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample10/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample10/duplex/variant_caller_outputs/
    Ignored:    data/Consensus_Data/novogene_lane17/sample10/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample11/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample11/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample1_combined/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample1_combined/low_depth/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample1_combined/low_depth/duplex/low_depth/
    Ignored:    data/Consensus_Data/novogene_lane17/sample1_combined/low_depth/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample1_combined/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample2/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample2/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample3/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample3/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample4/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample4/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample5/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample5/low_seq_depth/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample5/low_seq_depth/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample5/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample6/low_seq_depths/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample6/low_seq_depths/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample6/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample7/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample7/low_seq_depths/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample7/low_seq_depths/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample7/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample8/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample8/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample9/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17/sample9/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17b/Sample1 copy 2/duplex/variant_caller_outputs/
    Ignored:    data/Consensus_Data/novogene_lane17b/Sample1 copy 2/sscs/variant_caller_outputs/
    Ignored:    data/Consensus_Data/novogene_lane17b/Sample1 copy 3/duplex/variant_caller_outputs/
    Ignored:    data/Consensus_Data/novogene_lane17b/Sample1 copy 3/sscs/variant_caller_outputs/
    Ignored:    data/Consensus_Data/novogene_lane17b/Sample1/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17b/Sample1/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17b/Sample2/duplex/duplex.consensus.counts.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17b/Sample2/duplex/duplex_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/novogene_lane17b/Sample2/sscs/sscs_sorted_filtered.tsv.gz
    Ignored:    data/Consensus_Data/sscs_dcs_comparisons/
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane3/il3_indep_1.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane3/il3_indep_1.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane3/il3_indep_2.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane3/il3_indep_2.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane3/il3_indep_3.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane3/il3_indep_3.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane3/sorted_1.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane3/sorted_1.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane3/sorted_2.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane3/sorted_2.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane3/sorted_3.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane3/sorted_3.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_1.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_1.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_2.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_2.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_3.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_3.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_4.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_4.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_5.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/il3_indep_5.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_1.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_1.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_2.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_2.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_3.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_3.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_4.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_4.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_5.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_5.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_6.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane4/sorted_6.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_High_D2.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_High_D2.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_High_D4.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_High_D4.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Low_D2.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Low_D2.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Low_D4.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Low_D4.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Medium_D2.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Medium_D2.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Medium_D4.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP4_Im_Medium_D4.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_High_D2.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_High_D2.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_High_D4.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_High_D4.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Low_D2.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Low_D2.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Low_D4.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Low_D4.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Medium_D2.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Medium_D2.1.consensus.variant-calls.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Medium_D4.1.consensus.variant-calls.genome.vcf.gz
    Ignored:    output/Twinstrand/ABL1AppOutput/Novogene_Lane5/RP5_Im_Medium_D4.1.consensus.variant-calls.vcf.gz

Untracked files:
    Untracked:  BCRABL_Imatinib_SatMut_Screen_031423.csv
    Untracked:  analysis/ABL_BaseEditor_Analyses.Rmd
    Untracked:  analysis/ABL_Gnomad_analysis.Rmd
    Untracked:  analysis/figure/
    Untracked:  code/shortest_codon_finder.R
    Untracked:  data/BE_ABL_Merged/
    Untracked:  data/Clinvar_ABL/
    Untracked:  data/Gnomad_ABL/
    Untracked:  data/Short_et_al_fig1/
    Untracked:  inSM_butnotin_BEmerged.csv
    Untracked:  output/BE_SM_FigLegend.pdf
    Untracked:  output/BE_SM_Plots/

Unstaged changes:
    Modified:   .DS_Store
    Modified:   analysis/.DS_Store
    Modified:   analysis/ABL_cosmic_analysis.Rmd
    Modified:   analysis/ABL_unevenness_analysis.Rmd
    Modified:   code/.DS_Store
    Modified:   code/variants_parser.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.


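# Contingency table maker function.
# Input: dataframe of sgRNAs and mutants with a significance_status column
#   ("Both", "SMOnly", "BEOnly" or "Neither")
# Output: a 2x2 contingency table of unique mutant (species) counts
#   Rows: SM significant, SM not significant
#   Columns: BE significant, BE not significant
# input_df=bedata_inner_simple_filtered%>%filter(Type%in%"ABE")
contab_maker=function(input_df){
  # Significant in both screens
  PP=input_df%>%filter(significance_status%in%"Both")
  PP_unique=length(unique(PP$species))
  
  # Significant in neither screen
  NN=input_df%>%filter(significance_status%in%"Neither")
  NN_unique=length(unique(NN$species))
  
  # Significant in the base editor screen only
  NP=input_df%>%filter(significance_status%in%"BEOnly")
  NP_unique=length(unique(NP$species))
  
  # Significant in the saturation mutagenesis screen only
  PN=input_df%>%filter(significance_status%in%"SMOnly")
  PN_unique=length(unique(PN$species))
  
  # Row 1: Both, SMOnly; row 2: BEOnly, Neither
  contab=rbind(cbind(PP_unique,PN_unique),cbind(NP_unique,NN_unique))
  contab
}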
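To make the layout of the table concrete, the sketch below builds the same 2x2 arrangement in base R on a toy data frame and attaches row and column labels. The toy data, the helper name `contab_labeled`, and the dimnames are assumptions made for this illustration; they are not part of the analysis.

```r
# Toy input (hypothetical): one row per sgRNA-mutant pair.
toy <- data.frame(
  species = c("A1V", "A1V", "B2C", "C3D", "D4E", "E5F"),
  significance_status = c("Both", "Both", "SMOnly", "BEOnly", "Neither", "Neither"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count unique mutants per category and label the table.
# Rows follow the SM call and columns follow the BE call, as in contab_maker.
contab_labeled <- function(input_df) {
  n_unique <- function(status) {
    length(unique(input_df$species[input_df$significance_status %in% status]))
  }
  matrix(
    c(n_unique("Both"), n_unique("SMOnly"),
      n_unique("BEOnly"), n_unique("Neither")),
    nrow = 2, byrow = TRUE,
    dimnames = list(SM = c("SM_sig", "SM_ns"), BE = c("BE_sig", "BE_ns"))
  )
}

toy_tab <- contab_labeled(toy)
toy_tab
```

Because the counts are of unique `species`, the duplicated "A1V" row contributes only once to the "Both" cell.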

Data Parsing

# rm(list=ls())
bedata_inner=read.csv("data/BE_ABL_Merged/BE_SatMut_Screen_Inner20230315.csv",header=T,stringsAsFactors = F)
bedata_outer=read.csv("data/BE_ABL_Merged/BE_SatMut_Screen_Outer20230315.csv",header=T,stringsAsFactors = F)
# a=bedata_inner%>%select(ct_screen1_before,ct_screen2_before,pvalue)
# 
# a=unique(bedata_inner$Mutation)
# # Data has 196 unique mutants
# a=unique(bedata_inner$sgRNA.Seq)
# # The 196 unique mutants are made by 277 unique guides
# a=unique(bedata_inner[bedata_inner$Significant%in%"True","Mutation"])
# # Amongst these 196 unique mutants, 45 have a statistically significant sat mut score

# 
# a=unique(bedata_outer$Mutation)
# #5788 mutations overall
# a=unique(bedata_outer$sgRNA.Seq)
# # 2811 guides
# bedata_outer_filtered=bedata_outer%>%filter(ABL1_AA>=242,ABL1_AA<=494)
# a=unique(bedata_outer_filtered$Mutation)
# # out of the 5788 mutants, 717 are in the kinase
# a=unique(bedata_outer_filtered$sgRNA.Seq)
# # these 717 mutants are made by 349 guides
# a=unique(bedata_outer$species)
# # 2201 sat mut mutants
# a=unique(bedata_inner$species)
# # out of the 717 BE mutants, 290 are seen in the SM screens

# a=bedata_inner%>%filter(!Ref_AA%in%ref_aa)
# a=bedata_inner%>%filter(!Alt_AA%in%alt_aa)
# a=bedata_inner%>%filter(!Mutation%in%species)
# a=bedata_inner%>%filter(!Ref_Codon%in%ref_codon)
# a=bedata_inner%>%filter(!ABL1_AA%in%protein_start)
# ^^Just making sure that there are no cases where the BE data and the sat mut data coordinates are different
# a=bedata_inner%>%mutate(Alt_Codon=toupper(Alt_Codon))%>%filter(!Alt_Codon%in%alt_codon)
# The clause above is basically saying that 606 out of 760 ALT codons are different in the BE data than in the sat mut data, which makes sense




bedata_inner_simple=bedata_inner%>%
  select(-c(X,Unnamed..0_x,Strand,consequence_terms,
            ct_screen1_before,ct_screen1_after,ct_screen2_before,ct_screen2_after,
            depth_screen1_before,depth_screen1_after,depth_screen2_before,depth_screen2_after,
            maf_screen1_before,maf_screen1_after,maf_screen2_before,maf_screen2_after,
            Ref_AA,ABL1_AA,Alt_AA,Ref_Codon,Mutation,alt_start_pos))

bedata_inner_simple=bedata_inner_simple%>%
  rename(BE.Alt_Codon=Alt_Codon,SM.Alt_Codon=alt_codon,BE.LFC=BE_LFC,BE.pval=BE_p.value,BE.FDR=BE_FDR,SM.pval=pvalue,SM.padj=padj,SM.Significant=Significant,SM.ref=ref,SM.alt=alt,SM.LFC=log2FoldChange)%>%
  relocate(ref_aa,protein_start,alt_aa,species,ref_codon,BE.Alt_Codon,SM.Alt_Codon,alt_codon_shortest,n_nuc_min,AA_Number)

# Now that I have simplified the dataset to include meaningful columns, I am going to add a column that has a flag for the number of Sat Mut nucleotides
bedata_inner_simple=bedata_inner_simple%>%rowwise%>%mutate(BE.n_nuc=str_count(sgRNA_Nuc,",")+1,SM.n_nuc=nchar(SM.alt))%>%relocate(BE.n_nuc,.after=sgRNA_Nuc)%>%relocate(SM.n_nuc,.after=SM.alt)
bedata_inner_simple$SM.Significant=toupper(bedata_inner_simple$SM.Significant)
bedata_inner_simple$SM.Significant=as.factor(bedata_inner_simple$SM.Significant)
# Adding the flag for a mutant captured by more than one sgRNA



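# Mean of the two replicate growth rates. Note that mean(a,b) passes b as `trim` and returns a alone;
# the outputs printed in this report were generated with that call, i.e. with the screen 1 value.
bedata_inner_simple=bedata_inner_simple%>%mutate(SM.netgr_obs_mean=(netgr_obs_screen1+netgr_obs_screen2)/2)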

bedata_inner_simple=bedata_inner_simple%>%
  mutate(BE.Significant=case_when(BE.FDR<0.02~TRUE,
                                  TRUE~FALSE),
         SM.Significant=case_when(SM.padj<0.05~TRUE,
                                  TRUE~FALSE),
         significance_status=case_when((BE.Significant%in%TRUE)&(SM.Significant%in%FALSE)~"BEOnly",
                                       (BE.Significant%in%FALSE)&(SM.Significant%in%TRUE)~"SMOnly",
                                       (BE.Significant%in%TRUE)&(SM.Significant%in%TRUE)~"Both",
                                       TRUE~"Neither"))

# bedata_inner_simple=bedata_inner_simple%>%
#   mutate(BE.Significant=case_when(BE.LFC<=-1~T,
#                                   BE.LFC>=1~T,
#                                   T~F),
#          significance_status=case_when((BE.Significant%in%T)&&(SM.Significant%in%F)~"BEOnly",
#                                        (BE.Significant%in%F)&&(SM.Significant%in%T)~"SMOnly",
#                                        (BE.Significant%in%T)&&(SM.Significant%in%T)~"Both",
#                                        T~"Neither"))
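Two details of the parsing step are easy to get wrong, so here is a small base-R sketch on made-up numbers (the vectors `screen1`, `screen2`, `be_sig`, and `sm_sig` are hypothetical). It shows that `mean(a, b)` treats its second argument as `trim` rather than as a second value, and that the four-way significance status can be assigned with vectorized comparisons.

```r
# Hypothetical replicate growth rates for three mutants.
screen1 <- c(0.05, 0.04, 0.06)
screen2 <- c(0.07, 0.02, 0.06)

# mean(x, y) passes y as `trim`, so for a single value it returns x unchanged.
mean(screen1[1], screen2[1])

# An explicit per-row mean of the two replicates.
rep_mean <- rowMeans(cbind(screen1, screen2))
rep_mean

# Hypothetical significance calls, combined with vectorized logic.
be_sig <- c(TRUE, FALSE, FALSE)
sm_sig <- c(TRUE, TRUE, FALSE)
status <- ifelse(be_sig & sm_sig, "Both",
          ifelse(be_sig & !sm_sig, "BEOnly",
          ifelse(!be_sig & sm_sig, "SMOnly", "Neither")))
status
```

Using `&` rather than `&&` keeps the comparison element-wise, so the same expression works whether or not the data frame is processed row by row.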

There are far fewer mutants in the merged BE/SM dataset than in the IL3 dataset. Main question: how many unique mutants in the IL3 dataset are not pseudo-counted, do not have equal background counts and depths across the two replicates, are in the kinase, and are present in the BE data?

Figuring out how to predict which sgRNAs make what mutants

# Algorithm:
# For each unique sgRNA
# Calculate how many mutants are possible that we see in our library
# Figure out which mutants are MNVs
# For sgRNAs that only have one mutant, that's the default choice
# For sgRNAs with >1 possible mutant,
# choose the mutant that is closest to the 5th position away from the PAM
# If two mutants are the same distance away from the 5th position (i.e. at position 4 and 6), choose the higher distance mutant

# Note that this algorithm is PAM agnostic
# Note that this algorithm does not use MNVs

# Rule: 

# x=bedata_inner_simple%>%filter(sgRNA_possible_mutants==2)
# x=bedata_inner_simple%>%filter(sgRNA_possible_mutants>=2)
# x=bedata_inner_simple
bedata_inner_simple=bedata_inner_simple%>%select(Type,sgRNA.Seq,species,sgRNA_Nuc,BE.n_nuc,ref_codon,BE.Alt_Codon,BE.LFC,BE.pval,BE.FDR,BE.Significant,SM.netgr_obs_mean,SM.Significant,significance_status)
# Calculating distances
# For SNPs, the distance is simple,
# For MNVs, choose one of the distances (simplifying assumption, I know)
bedata_inner_simple=bedata_inner_simple%>%mutate(sgRNA_Nuc=gsub("\\[|\\]","",sgRNA_Nuc))
# Figuring out which mutants are at the minimum distance
# What if two mutants are within the same distance? Choose the first mutant
bedata_inner_simple=bedata_inner_simple%>%mutate(distance=case_when(BE.n_nuc%in%1~sgRNA_Nuc,
                                BE.n_nuc>=2~strsplit(sgRNA_Nuc,",")[[1]][1]))
bedata_inner_simple$distance=as.numeric(bedata_inner_simple$distance)
bedata_inner_simple=bedata_inner_simple%>%mutate(distance_from_5=abs(5-distance))

# For each sgRNA, figure out which mutant is at the minimum distance
# If there are multiple mutants at the minimum distance, note down both of them.
bedata_sum=bedata_inner_simple%>%
  group_by(sgRNA.Seq)%>%
  summarize(mutants_per_sgRNA=n(),
            mindist=min(distance_from_5),
            species.mindist=paste(species[which(distance_from_5==min(distance_from_5))],collapse=", "))
# Sometimes a guide makes the same amino acid substitution two different ways (e.g. a SNP and an MNV that produce the same substitution). When this happens, the algorithm treats them as two separate substitutions made by that guide. The next conditional statement removes those duplicates.
bedata_sum=bedata_sum%>%
  rowwise()%>%
  mutate(species.mindist=case_when(
    strsplit(species.mindist,", ")[[1]][1]==
      strsplit(species.mindist,", ")[[1]][2]~strsplit(species.mindist,", ")[[1]][1],
                                   T~species.mindist))



bedata_inner_simple=merge(bedata_inner_simple,bedata_sum,by="sgRNA.Seq")

bedata_inner_simple_filtered=bedata_inner_simple%>%rowwise()%>%filter(species%in%species.mindist)

# bedata_mutant_sum=bedata_inner_simple_filtered%>%group_by(species,sgRNA.Seq)%>%summarize(num_ways=n())
# note that num_ways is the number of different ways that mutant can be made by that sgRNA.
# I use this 

############# Are our sgRNA predictions working well?
# Questions to ask:
# What % of the sgRNAs get a single mutant 
bedata_inner_simple_filtered=bedata_inner_simple_filtered%>%
  # rowwise()%>%
  mutate(multi_mutant_sgRNA=grepl(",",species.mindist))
x=bedata_inner_simple_filtered%>%filter(multi_mutant_sgRNA%in%F)
length(unique(bedata_inner_simple$sgRNA.Seq))
[1] 337
length(unique(x$sgRNA.Seq))
[1] 305
# Therefore 305 of 337 sgRNAs are predicted to make a single mutant under a simple algorithm that pairs each sgRNA with its cognate mutant by distance from the PAM

length(unique(bedata_inner_simple$species))
[1] 290
length(unique(bedata_inner_simple_filtered$species))
[1] 170
# However, this distance-based filtering also reduces the total number of mutants from 290 to 170 (if you discard all mutants that aren't the highest priority to be made by a given sgRNA). This is because some mutants are far away from a guide and won't be predicted to be made

# Filter 1: discard 
# What % of the mutants are predicted to be made by a single sgRNA?
bedata_singlemutants=bedata_inner_simple_filtered%>%
  filter(multi_mutant_sgRNA%in%F)
# bedata_singlemutants=bedata_singlemutants%>%group_by(species)%>%summarize(ct=n())
bedata_singlemutants_sum=bedata_singlemutants%>%group_by(species.mindist)%>%summarize(ct=n())

# a=bedata_singlemutants_sum%>%filter(ct%in%c(1,2))
# a=bedata_singlemutants_sum%>%filter(ct%in%c(2))
# b=bedata_inner_simple_sum%>%filter(sgRNA.Seq=="GCCGTGAAGACCTTGAAGGA")


# 68 of the 171 mutants are predicted to be made by a single guide.
# 120 of the 171 mutants are predicted to be made by one or two guides
# If a mutant is expected to be the primary mutant made by two guides, do I just take the mean of the mutants made by the two guides?
length(unique(bedata_inner_simple$sgRNA.Seq))
[1] 337
###### Next question to ask: for the mutants made by a single guide (or by two guides), what is the correlation like? The predictor could also be modified to recognize when a single guide makes the same substitution as both an MNV and an SNV

# Of the sgRNAs that make multinucleotide variants, in how many cases is the SNV the same substitution as the MNV?


ggplot(bedata_inner_simple_filtered,aes(x=mutants_per_sgRNA))+geom_histogram(binwidth = 1,color="black",fill="gray")+
  scale_x_continuous("# Unique mutants possible per sgRNA before distance-based prediction",breaks=c(1,2,3,4,5,6,7))+
  scale_y_continuous("sgRNAs")

bedata_inner_simple_filtered_nodups=bedata_inner_simple_filtered%>%select(species,sgRNA.Seq)%>%group_by(sgRNA.Seq,species)%>%filter(row_number()==1)
# The no dups filtering takes out sgRNAs making the same mutant a couple of different ways
bedata_inner_simple_filtered_sum=bedata_inner_simple_filtered_nodups%>%group_by(sgRNA.Seq)%>%summarize(ct=n())
ggplot(bedata_inner_simple_filtered_sum,aes(x=ct))+geom_histogram(binwidth = 1,color="black",fill="gray")+
  scale_x_continuous("# Unique mutants possible per sgRNA after distance-based prediction")+
  scale_y_continuous("sgRNAs")

sgrnas_per_mutant=bedata_inner_simple%>%group_by(species)%>%summarize(sgRNAs_per_mutant=n())
ggplot(sgrnas_per_mutant,aes(x=sgRNAs_per_mutant))+geom_histogram(binwidth = 1,color="black",fill="gray")+
  scale_x_continuous("# Unique sgRNAs that are predicted to have highest priority to make the mutant")+
  scale_y_continuous("Mutants")

ggplot(bedata_singlemutants_sum,aes(x=ct))+geom_histogram(binwidth = 1,color="black",fill="gray")+
  scale_x_continuous("# Unique sgRNAs that are predicted to have highest priority to make the mutant",breaks=c(1,2,3,4,5,6,7))+
  scale_y_continuous("Mutants")
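The distance-based rule described in the algorithm comments of this section can be sketched on toy data as follows. This is a minimal base-R illustration, not the code that produced the results above; the guide names, mutant names, and positions are invented, and ties are kept (both mutants are reported), mirroring the `species.mindist` summary.

```r
# Hypothetical guide-mutant pairs with the position of the edited nucleotide.
toy <- data.frame(
  sgRNA = c("g1", "g1", "g2", "g3", "g3"),
  species = c("A1V", "B2C", "C3D", "D4E", "E5F"),
  position = c(5, 8, 7, 4, 6),
  stringsAsFactors = FALSE
)
toy$distance_from_5 <- abs(5 - toy$position)

# For each guide, keep the mutant(s) closest to position 5; ties are all kept.
pick_closest <- function(df) {
  keep <- df$distance_from_5 == min(df$distance_from_5)
  data.frame(
    sgRNA = df$sgRNA[1],
    mutants_per_sgRNA = nrow(df),
    mindist = min(df$distance_from_5),
    species.mindist = paste(unique(df$species[keep]), collapse = ", "),
    stringsAsFactors = FALSE
  )
}
toy_sum <- do.call(rbind, lapply(split(toy, toy$sgRNA), pick_closest))

# Flag guides whose closest position is shared by more than one mutant.
toy_sum$multi_mutant_sgRNA <- grepl(",", toy_sum$species.mindist)
toy_sum
```

In this toy example guide `g3` has two mutants at distance 1 (positions 4 and 6), so it is flagged as a multi-mutant guide, while `g1` and `g2` each resolve to a single mutant.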

Plotting all the data

Plotting

ggplot(bedata_inner_simple,aes(x=BE.LFC))+geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(bedata_inner_simple,aes(x=SM.netgr_obs_mean))+geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#######################Plotting all data unfiltered#####################

# Figure Legend
ggplot(bedata_inner_simple%>%filter(Type%in%"ABE"),aes(x=BE.LFC,y=SM.netgr_obs_mean))+
         geom_point(color="black",shape=21,size=2,aes(fill=significance_status))+
         scale_y_continuous("SM Net Growth Rate")

# ggsave("output/BE_SM_FigLegend.pdf",width=10,height=4,units = "in",useDingbats=F)



ggplot(bedata_inner_simple%>%filter(Type%in%"ABE"),aes(x=BE.LFC,y=SM.netgr_obs_mean,label=species,color=significance_status))+
  geom_text()+
  scale_y_continuous("SM Net Growth Rate")+
  cleanup+
  theme(legend.position="none")

# ggsave("output/BE_SM_Plots/be_sm_abe_all_labels.pdf",width=5,height=4,units = "in",useDingbats=F)

ggplot(bedata_inner_simple%>%filter(Type%in%"ABE"),aes(x=BE.LFC,y=SM.netgr_obs_mean))+
         geom_point(color="black",shape=21,size=2,aes(fill=significance_status))+
         scale_y_continuous("SM Net Growth Rate")+
         cleanup+
         theme(legend.position="none")

# ggsave("output/BE_SM_Plots/be_sm_abe_all_ponts.pdf",width=5,height=4,units = "in",useDingbats=F)


ggplot(bedata_inner_simple%>%filter(Type%in%"CBE"),aes(x=BE.LFC,y=SM.netgr_obs_mean,label=species,color=significance_status))+
  geom_text()+
  scale_y_continuous("SM Net Growth Rate")+
  cleanup+
  theme(legend.position="none")

# ggsave("output/BE_SM_Plots/be_sm_cbe_all_labels.pdf",width=5,height=4,units = "in",useDingbats=F)

ggplot(bedata_inner_simple%>%filter(Type%in%"CBE"),aes(x=BE.LFC,y=SM.netgr_obs_mean))+
         geom_point(color="black",shape=21,size=2,aes(fill=significance_status))+
         scale_y_continuous("SM Net Growth Rate")+
         cleanup+
         theme(legend.position="none")

# ggsave("output/BE_SM_Plots/be_sm_cbe_all_ponts.pdf",width=5,height=4,units = "in",useDingbats=F)

#######################Plotting data filtered for sgRNA Predictions#####################
ggplot(bedata_inner_simple_filtered%>%filter(Type%in%"ABE"),aes(x=BE.LFC,y=SM.netgr_obs_mean,label=species,color=significance_status))+
  geom_text()+
  scale_y_continuous("SM Net Growth Rate")+
  cleanup+
  theme(legend.position="none")

# ggsave("output/BE_SM_Plots/be_sm_abe_filtered_labels.pdf",width=5,height=4,units = "in",useDingbats=F)

ggplot(bedata_inner_simple_filtered%>%filter(Type%in%"ABE"),aes(x=BE.LFC,y=SM.netgr_obs_mean))+
         geom_point(color="black",shape=21,size=2,aes(fill=significance_status))+
         scale_y_continuous("SM Net Growth Rate")+
         cleanup+
         theme(legend.position="none")

# ggsave("output/BE_SM_Plots/be_sm_abe_filtered_ponts.pdf",width=5,height=4,units = "in",useDingbats=F)


ggplot(bedata_inner_simple_filtered%>%filter(Type%in%"CBE"),aes(x=BE.LFC,y=SM.netgr_obs_mean,label=species,color=significance_status))+
  geom_text()+
  scale_y_continuous("SM Net Growth Rate")+
  cleanup+
  theme(legend.position="none")

# ggsave("output/BE_SM_Plots/be_sm_cbe_filtered_labels.pdf",width=5,height=4,units = "in",useDingbats=F)

ggplot(bedata_inner_simple_filtered%>%filter(Type%in%"CBE"),aes(x=BE.LFC,y=SM.netgr_obs_mean))+
         geom_point(color="black",shape=21,size=2,aes(fill=significance_status))+
         scale_y_continuous("SM Net Growth Rate")+
         cleanup+
         theme(legend.position="none")

# ggsave("output/BE_SM_Plots/be_sm_cbe_filtered_ponts.pdf",width=5,height=4,units = "in",useDingbats=F)

#############
bedata_inner_grouped=bedata_inner_simple_filtered%>%group_by(Type,species)%>%summarize(BE.LFC=mean(BE.LFC),SM.netgr_obs_mean=mean(SM.netgr_obs_mean))
`summarise()` has grouped output by 'Type'. You can override using the `.groups` argument.
bedata_inner_grouped=bedata_inner_grouped%>%rowwise()%>%
  mutate(BE.Significant=case_when(BE.LFC<=-1~TRUE,
                                  BE.LFC>=1~TRUE,
                                  TRUE~FALSE),
         SM.Significant=case_when(SM.netgr_obs_mean<.04~TRUE,
                                  SM.netgr_obs_mean>.065~TRUE,
                                  TRUE~FALSE),
         significance_status=case_when((BE.Significant%in%TRUE)&(SM.Significant%in%FALSE)~"BEOnly",
                                       (BE.Significant%in%FALSE)&(SM.Significant%in%TRUE)~"SMOnly",
                                       (BE.Significant%in%TRUE)&(SM.Significant%in%TRUE)~"Both",
                                       TRUE~"Neither"))

ggplot(bedata_inner_grouped%>%filter(Type%in%"ABE"),aes(x=BE.LFC,y=SM.netgr_obs_mean,label=species,color=significance_status))+
  geom_text()+
  scale_y_continuous("SM Net Growth Rate")+
  cleanup

  # theme(legend.position="none")+
# ggsave("output/BE_SM_Plots/be_sm_abe_filtered_labels_mean.pdf",width=5,height=4,units = "in",useDingbats=F)



#######################Figuring out correlation coefficients and FETs#####################
# ABE all
testdata=bedata_inner_simple%>%filter(Type%in%"ABE")
cor(testdata$BE.LFC,testdata$SM.netgr_obs_mean)
[1] 0.2652287
table=contab_maker(testdata)
table
     PP_unique PN_unique
[1,]        36        17
[2,]        87        69
fisher.test(table)

    Fisher's Exact Test for Count Data

data:  table
p-value = 0.1464
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.8342767 3.4669174
sample estimates:
odds ratio 
  1.675423 
# CBE all
testdata=bedata_inner_simple%>%filter(Type%in%"CBE")
cor(testdata$BE.LFC,testdata$SM.netgr_obs_mean)
[1] -0.004758033
table=contab_maker(testdata)
table
     PP_unique PN_unique
[1,]        30        30
[2,]        86        92
fisher.test(table)

    Fisher's Exact Test for Count Data

data:  table
p-value = 0.8817
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.570784 2.004210
sample estimates:
odds ratio 
  1.069446 
# ABE Filtered
testdata=bedata_inner_simple_filtered%>%filter(Type%in%"ABE")
cor(testdata$BE.LFC,testdata$SM.netgr_obs_mean)
[1] 0.3223448
table=contab_maker(testdata)
table
     PP_unique PN_unique
[1,]        20         8
[2,]        47        29
fisher.test(table)

    Fisher's Exact Test for Count Data

data:  table
p-value = 0.4893
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.5571454 4.5808096
sample estimates:
odds ratio 
  1.536308 
# CBE Filtered
testdata=bedata_inner_simple_filtered%>%filter(Type%in%"CBE")
cor(testdata$BE.LFC,testdata$SM.netgr_obs_mean)
[1] 0.1067391
table=contab_maker(testdata)
table
     PP_unique PN_unique
[1,]        13        13
[2,]        42        38
fisher.test(table)

    Fisher's Exact Test for Count Data

data:  table
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.339141 2.416453
sample estimates:
odds ratio 
 0.9056182 
# ABE Filtered Means
testdata=bedata_inner_grouped%>%filter(Type%in%"ABE")
cor(testdata$BE.LFC,testdata$SM.netgr_obs_mean)
[1] 0.2889848
table=contab_maker(testdata)
table
     PP_unique PN_unique
[1,]        12        36
[2,]         6        34
fisher.test(table)

    Fisher's Exact Test for Count Data

data:  table
p-value = 0.296
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.5727923 6.8107397
sample estimates:
odds ratio 
  1.875528 
##################################
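For reference, the odds ratio that `fisher.test` reports is a conditional maximum-likelihood estimate, so it differs slightly from the simple cross-product ratio of the table. The sketch below recomputes both for the "ABE all" counts printed above; the object names are chosen for this example only.

```r
# Counts copied from the "ABE all" contingency table above.
abe_all <- matrix(c(36, 17,
                    87, 69), nrow = 2, byrow = TRUE)

# Sample (cross-product) odds ratio: (36 * 69) / (17 * 87).
or_sample <- (abe_all[1, 1] * abe_all[2, 2]) / (abe_all[1, 2] * abe_all[2, 1])
or_sample

# Conditional MLE of the odds ratio and the two-sided p-value.
ft <- fisher.test(abe_all)
ft$estimate
ft$p.value
```

The cross-product ratio is 2484/1479, a little above the conditional estimate that `fisher.test` printed for the same table; the 95 percent confidence interval includes 1 either way.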


############# Of the sgRNAs that score, how often do any of the predicted variants score (when considered as single mutations.) How does this break down by PAM sequence?
############## When a variant is "covered" by an sgRNA in the library, AND it scores by Saturation Mutagenesis, How often does the sgRNA score in the base editor screen? How does this break down by PAM?
table=contab_maker(bedata_inner_simple%>%filter(Type%in%"ABE"))
fisher.test(table)

    Fisher's Exact Test for Count Data

data:  table
p-value = 0.1464
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.8342767 3.4669174
sample estimates:
odds ratio 
  1.675423 
table=contab_maker(bedata_inner_simple%>%filter(Type%in%"CBE"))
fisher.test(table)

    Fisher's Exact Test for Count Data

data:  table
p-value = 0.8817
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.570784 2.004210
sample estimates:
odds ratio 
  1.069446 
############## Do multi-nucleotide variants score more often than single-nucleotide variants in the base editor library? Is this regardless of PAM sequence?

############## Are multinucleotide variant hits flagging regions of the protein that have more phenotypes than average? Less than average? Or just average? How does this break down by PAM?

############## If a region is negative by sgRNA how often is it negative by saturation mutagenesis? How is this affected by PAM others?
# bedata_filtered_forexport=bedata_inner_simple_filtered%>%select(-c(distance,))
# write.csv(bedata_inner_simple_filtered%>%select(-c(multi_mutant_sgRNA)),"data/BE_ABL_Merged/BE_SatMut_Screen_Kinase_Filtered_031823.csv")

# write.csv(bedata_inner_simple,"data/BE_ABL_Merged/BE_SatMut_Screen_Kinase_All_031823.csv")

sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS  10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] RColorBrewer_1.1-2 doParallel_1.0.15  iterators_1.0.12   foreach_1.5.0     
[5] tictoc_1.0         plotly_4.9.2.1     ggplot2_3.3.3      dplyr_1.0.6       
[9] stringr_1.4.0     

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0  xfun_0.31         bslib_0.3.1       purrr_0.3.4      
 [5] colorspace_1.4-1  vctrs_0.3.8       generics_0.0.2    htmltools_0.5.2  
 [9] viridisLite_0.3.0 yaml_2.2.1        utf8_1.1.4        rlang_0.4.11     
[13] jquerylib_0.1.4   later_1.0.0       pillar_1.6.1      glue_1.4.1       
[17] withr_2.4.2       DBI_1.1.0         lifecycle_1.0.0   munsell_0.5.0    
[21] gtable_0.3.0      workflowr_1.6.2   htmlwidgets_1.5.1 codetools_0.2-16 
[25] evaluate_0.14     labeling_0.3      knitr_1.28        fastmap_1.1.0    
[29] httpuv_1.5.2      fansi_0.4.1       Rcpp_1.0.4.6      promises_1.1.0   
[33] backports_1.1.7   scales_1.1.1      jsonlite_1.7.2    farver_2.0.3     
[37] fs_1.4.1          digest_0.6.25     stringi_1.7.5     rprojroot_1.3-2  
[41] grid_4.0.0        tools_4.0.0       magrittr_2.0.1    sass_0.4.1       
[45] lazyeval_0.2.2    tibble_3.1.2      crayon_1.4.1      tidyr_1.1.3      
[49] pkgconfig_2.0.3   ellipsis_0.3.2    data.table_1.12.8 assertthat_0.2.1 
[53] rmarkdown_2.14    httr_1.4.2        R6_2.4.1          git2r_0.27.1     
[57] compiler_4.0.0