Last updated: 2021-05-28

Knit directory: MINTIE-paper-analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20200415) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version ed3d2b6. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/cache/
    Ignored:    data/.DS_Store
    Ignored:    data/RCH_B-ALL/
    Ignored:    data/leucegene/.DS_Store
    Ignored:    data/leucegene/KMT2A-PTD_results/.DS_Store
    Ignored:    data/leucegene/normals_ncontrols_test_results/.DS_Store
    Ignored:    data/leucegene/normals_ncontrols_test_results/ncon0/.DS_Store
    Ignored:    data/leucegene/normals_ncontrols_test_results/ncon1/.DS_Store
    Ignored:    data/leucegene/salmon_out/
    Ignored:    data/leucegene/sample_info/KMT2A-PTD_8-2.fa.xls
    Ignored:    data/leucegene/validation_results/.DS_Store
    Ignored:    data/simu/.DS_Store
    Ignored:    data/simu/results/.DS_Store
    Ignored:    data/simu/results/MINTIE/.DS_Store
    Ignored:    data/simu/results/MINTIE/varying_dispersion/.DS_Store
    Ignored:    output/Leucegene_gene_counts.tsv
    Ignored:    packrat/lib-R/
    Ignored:    packrat/lib-ext/
    Ignored:    packrat/lib/

Untracked files:
    Untracked:  data/leucegene/validation_results/TAP/

Unstaged changes:
    Modified:   .Rprofile

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Leucegene_Validation.Rmd) and HTML (docs/Leucegene_Validation.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd ed3d2b6 Marek Cmero 2021-05-28 Figures tweaks and reordering.
html c60c3b4 Marek Cmero 2021-05-18 Build site.
Rmd c46b2c5 Marek Cmero 2021-05-18 Added summary figure of variants found in Leucegene cohort
html 4206f12 Marek Cmero 2021-04-30 Build site.
Rmd dde9f5b Marek Cmero 2021-04-30 wflow_publish(files = list.files(pattern = "*Rmd"))
Rmd 9595530 Marek Cmero 2021-04-30 Updated analyses
html 4b8113e Marek Cmero 2020-07-03 Build site.
Rmd 42ce21b Marek Cmero 2020-07-03 Added CICERO results
html 42ce21b Marek Cmero 2020-07-03 Added CICERO results
html 379b944 Marek Cmero 2020-06-26 Build site.
Rmd 5448658 Marek Cmero 2020-06-26 Update with missing leucegene sample
html e9e4917 Marek Cmero 2020-06-24 Build site.
Rmd 9434bfe Marek Cmero 2020-06-24 Updated results with latest MINTIE run. Fixed bug with KMT2A PTD checking in different controls. Added leucegene
html 0b21347 Marek Cmero 2020-06-11 Build site.
Rmd fa6bf0c Marek Cmero 2020-06-11 Updated with new results; improved tables
html fa6bf0c Marek Cmero 2020-06-11 Updated with new results; improved tables
html 3702862 Marek Cmero 2020-05-18 Removed MLM samples from final B-ALL results
html a166ab8 Marek Cmero 2020-05-08 Build site.
html a600688 Marek Cmero 2020-05-07 Build site.
Rmd 0fde0b8 Marek Cmero 2020-05-07 Added RCH B-ALL analysis
html 1c40e33 Marek Cmero 2020-05-07 Build site.
Rmd bbc278a Marek Cmero 2020-05-07 Refactoring
html 87b4e62 Marek Cmero 2020-05-07 Build site.
Rmd af503f2 Marek Cmero 2020-05-07 Refactoring
html 5c045b5 Marek Cmero 2020-05-07 Build site.
Rmd d8d5b96 Marek Cmero 2020-05-07 Added Leucegene variant validation
html 90c7fd9 Marek Cmero 2020-05-06 Build site.
Rmd 44d8c37 Marek Cmero 2020-05-06 Build leucegene validation notebook.
Rmd ff4b1dc Marek Cmero 2020-05-06 Leucegene results

# util
library(data.table)
library(dplyr)
library(here)
library(stringr)

# plotting/tables
library(ggplot2)
library(gt)

# bioinformatics
library(GenomicRanges)
options(stringsAsFactors = FALSE)
source(here("code/leucegene_helper.R"))
source(here("code/simu_helper.R"))

Leucegene Validation

Here we analyse the results of MINTIE run on a number of Leucegene samples, including the effect of controls on a cohort with KMT2A-PTD variants. We also check whether MINTIE has called known variants within the cohort.

# load SRX to patient ID lookup table
kmt2a_patient_lookup <- read.delim(here("data/leucegene/sample_info/KMT2A-PTD_samples.txt"),
                                   header = FALSE,
                                   col.names = c("sample", "patient"))

kmt2a_results_dir <- here("data/leucegene/KMT2A-PTD_results")

# load KMT2A cohort comparisons against all other controls
kmt2a_results <- load_controls_comparison(kmt2a_results_dir)
kmt2a_results <- inner_join(kmt2a_results, kmt2a_patient_lookup, by = "sample")

# load other validation results and truth table
truth <- read.delim(here("data/leucegene/sample_info/variant_validation_table.tsv"), sep = "\t")
leucegene_results_dir <- here("data/leucegene/validation_results/MINTIE/")
validation <- list.files(leucegene_results_dir, full.names = TRUE) %>%
                lapply(., read.delim) %>%
                rbindlist(fill = TRUE) %>%
                filter(logFC > 5)
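
The loading step above stacks one MINTIE results table per sample and then keeps only rows with logFC above 5. As a rough illustration of what `rbindlist(fill = TRUE)` followed by `filter(logFC > 5)` does, the base R sketch below pads missing columns with NA before stacking. The sample names, contig IDs, the `note` column and all values are invented for the example and are not taken from the real results.

```r
# Hypothetical per-sample result tables; the second has an extra column.
per_sample <- list(
    data.frame(sample = "S1", contig_id = c("k49_1", "k49_2"), logFC = c(7.2, 3.1)),
    data.frame(sample = "S2", contig_id = "k49_9", logFC = 6.0, note = "extra")
)

# Pad each table with NA for any column it lacks (the effect of fill = TRUE).
all_cols <- unique(unlist(lapply(per_sample, names)))
padded <- lapply(per_sample, function(x) {
    for (col in setdiff(all_cols, names(x))) x[[col]] <- NA
    x[all_cols]
})

# Stack the tables, then keep rows above the logFC threshold used above.
combined <- do.call(rbind, padded)
kept <- combined[combined$logFC > 5, ]
print(kept)
```

Filtering after stacking means the same threshold is applied uniformly to every sample, whatever columns each individual file happens to contain.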

KMT2A-PTD controls comparison

MINTIE paper Supplementary Figure 6. Shows the number of variant genes found in the Leucegene cohort containing KMT2A PTDs.

results_summary <- get_results_summary(mutate(kmt2a_results,
                                              sample = patient,
                                              group_var = controls),
                                       group_var_name = "controls")

# build table
results_summary %>%
    group_by(controls) %>%
    summarise(min = min(V1), median = median(V1), max = max(V1)) %>%
    data.frame() %>%
    gt() %>%
     tab_header(
        title = md("**Total variant genes called using different controls**")
    ) %>%
    tab_options(
        table.font.size = 12
    ) %>%
    cols_label(
        controls = md("**Controls**"),
        min = md("**Min**"),
        median = md("**Median**"),
        max = md("**Max**")
    ) 

Total variant genes called using different controls

| Controls | Min | Median | Max |
|---|---|---|---|
| AML_controls | 130 | 211.5 | 1852 |
| normal_controls | 264 | 645.5 | 2265 |
| normal_controls_reduced | 490 | 913.0 | 2552 |
| no_controls | 4750 | 9255.5 | 11796 |

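
To put the medians in the table above into perspective, the short sketch below expresses each control set's median as a fold change relative to the 13 AML controls. The numbers are copied from the table; the variable names are only illustrative.

```r
# Median variant-gene counts per control set, copied from the table above.
median_calls <- c(AML_controls = 211.5,
                  normal_controls = 645.5,
                  normal_controls_reduced = 913.0,
                  no_controls = 9255.5)

# Fold increase in the median relative to the 13 AML controls.
fold_vs_aml <- median_calls / median_calls["AML_controls"]
print(round(fold_vs_aml, 1))
```

On these figures, running without any controls raises the median number of variant genes more than forty-fold compared with the AML controls.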
ggplot(results_summary, aes(sample, V1, fill = controls)) +
    geom_bar(position = position_dodge2(width = 0.9, preserve = "single"), stat = "identity") +
    theme_bw() +
    xlab("") +
    ylab("Genes with variants") +
    coord_flip() +
    theme(legend.position = "bottom") +
    scale_fill_brewer(palette="Dark2",
                      labels =  c("AML_controls" = "13 AMLs",
                                  "normal_controls" = "13 normals",
                                  "normal_controls_reduced" = "3 normals",
                                  "no_controls" = "No controls"))

KMT2A variants found in cohort

MINTIE paper Supplementary Table 1. Shows whether MINTIE found a KMT2A SV in each sample for the given control group. Coverage values come from the Audemard et al. spreadsheet of Leucegene results, which must be manually added to data/leucegene/sample_info to run the code.

# load results from km paper for coverage of KMT2A PTDs
kmt2a_lgene_km_results <- read.csv(here("data/leucegene/sample_info/KMT2A-PTD_8-2.fa.xls"), sep="\t") %>%
                            mutate(patient = Sample) %>%
                            group_by(patient) %>%
                            summarise(coverage = max(Min.coverage))

# check whether MINTIE found a KMT2A SV in each control set
found_using_cancon <- get_samples_with_kmt2a_sv(kmt2a_results, "AML_controls")
found_using_norcon <- get_samples_with_kmt2a_sv(kmt2a_results, "normal_controls")
found_using_redcon <- get_samples_with_kmt2a_sv(kmt2a_results, "normal_controls_reduced")
found_using_nocon  <- get_samples_with_kmt2a_sv(kmt2a_results, "no_controls")

# make the table
kmt2a_control_comp <- inner_join(kmt2a_patient_lookup, kmt2a_lgene_km_results, by = "patient") %>%
                        arrange(desc(coverage))
kmt2a_control_comp$`13_AMLs` <- kmt2a_control_comp$sample %in% found_using_cancon
kmt2a_control_comp$`13_normals` <- kmt2a_control_comp$sample %in% found_using_norcon
kmt2a_control_comp$`3_normals` <- kmt2a_control_comp$sample %in% found_using_redcon
kmt2a_control_comp$`no_controls` <- kmt2a_control_comp$sample %in% found_using_nocon

# build output table
kmt2a_control_comp %>%
    gt() %>% 
    cols_label(
        sample = md("**Sample**"),
        patient = md("**Patient**"),
        coverage = md("**Coverage**"),
        `13_AMLs` = md("**13 AMLs**"),
        `13_normals` = md("**13 Normals**"),
        `3_normals` = md("**3 Normals**"),
        `no_controls` = md("**No Controls**")
    ) %>%
    tab_header(
        title = md("**KMT2A PTDs found in Leucegene cohort**")
    ) %>%
    tab_options(
        table.font.size = 12
    ) %>%
    tab_style(
        style = cell_fill(color = "lightgreen"),
        locations = cells_body(
            columns = vars(`13_AMLs`),
            rows = `13_AMLs`)
    ) %>%
    tab_style(
        style = cell_fill(color = "lightgreen"),
        locations = cells_body(
            columns = vars(`13_normals`),
            rows = `13_normals`)
    ) %>%
    tab_style(
        style = cell_fill(color = "lightgreen"),
        locations = cells_body(
            columns = vars(`3_normals`),
            rows = `3_normals`)
    ) %>%
    tab_style(
        style = cell_fill(color = "lightgreen"),
        locations = cells_body(
            columns = vars(`no_controls`),
            rows = `no_controls`)
    )

KMT2A PTDs found in Leucegene cohort

| Sample | Patient | Coverage | 13 AMLs | 13 Normals | 3 Normals | No Controls |
|---|---|---|---|---|---|---|
| SRX958906 | 07H152 | 158 | FALSE | TRUE | TRUE | TRUE |
| SRX332646 | 09H115 | 125 | FALSE | TRUE | TRUE | TRUE |
| SRX957230 | 06H146 | 87 | TRUE | TRUE | TRUE | TRUE |
| SRX957223 | 05H111 | 79 | TRUE | TRUE | TRUE | TRUE |
| SRX332659 | 11H021 | 63 | FALSE | TRUE | TRUE | TRUE |
| SRX332633 | 05H050 | 58 | TRUE | TRUE | TRUE | TRUE |
| SRX959061 | 13H150 | 58 | FALSE | TRUE | FALSE | TRUE |
| SRX959044 | 13H048 | 57 | TRUE | TRUE | TRUE | TRUE |
| SRX958974 | 10H070 | 53 | TRUE | TRUE | TRUE | TRUE |
| SRX958963 | 10H007 | 50 | TRUE | TRUE | TRUE | TRUE |
| SRX958959 | 09H106 | 49 | TRUE | TRUE | TRUE | TRUE |
| SRX959060 | 13H141 | 45 | TRUE | TRUE | TRUE | TRUE |
| SRX958945 | 09H058 | 29 | TRUE | TRUE | TRUE | TRUE |
| SRX958907 | 07H155 | 23 | FALSE | TRUE | TRUE | TRUE |
| SRX381854 | 08H112 | 22 | TRUE | TRUE | TRUE | TRUE |
| SRX332645 | 09H113 | 17 | TRUE | TRUE | TRUE | TRUE |
| SRX959001 | 11H183 | 16 | FALSE | FALSE | FALSE | FALSE |
| SRX381852 | 08H012 | 15 | FALSE | FALSE | FALSE | FALSE |
| SRX958932 | 08H138 | 15 | FALSE | FALSE | FALSE | TRUE |
| SRX381865 | 11H008 | 13 | FALSE | FALSE | FALSE | TRUE |
| SRX958873 | 06H048 | 10 | FALSE | FALSE | FALSE | TRUE |
| SRX958922 | 08H063 | 6 | FALSE | FALSE | FALSE | FALSE |
| SRX958961 | 10H001 | 6 | FALSE | FALSE | FALSE | TRUE |
| SRX958844 | 04H111 | 3 | FALSE | FALSE | FALSE | FALSE |
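
The TRUE/FALSE columns in the table above are built by testing each sample ID for membership in the set of samples where a KMT2A SV was found. The sketch below shows that pattern on three samples from the table. It assumes that `get_samples_with_kmt2a_sv()` returns a character vector of sample IDs (the helper itself is not shown here), and the `found_*` vectors are written by hand to match the table rather than computed.

```r
# Three samples taken from the table above.
lookup <- data.frame(sample = c("SRX958906", "SRX957230", "SRX959001"),
                     patient = c("07H152", "06H146", "11H183"))

# Assumed shape of the helper's return value: a vector of sample IDs.
found_aml    <- c("SRX957230")
found_normal <- c("SRX958906", "SRX957230")

# %in% yields one logical per row, which becomes the table column.
lookup$`13_AMLs`    <- lookup$sample %in% found_aml
lookup$`13_normals` <- lookup$sample %in% found_normal
print(lookup)
```

Because `%in%` never returns NA, samples missing from a results set simply show up as FALSE, which is what the table colouring relies on.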

Leucegene variants found by MINTIE

# add KMT2A results run against the 13 normal controls to the validation set
validation <- filter(kmt2a_results, controls == "normal_controls") %>%
                select(-c(controls, patient)) %>%
                select(colnames(validation)) %>%
                rbind(., validation)

get_results_by_gene(validation) %>%
    group_by(sample) %>%
    summarise(vargenes = length(unique(gene))) %>%
    summarise(min = min(vargenes),
              median = median(vargenes),
              max = max(vargenes)) %>%
    data.frame() %>%
    gt() %>%
    tab_header(
        title = md("**Total MINTIE variant genes called by sample**")
    ) %>%
    tab_options(
        table.font.size = 12
    ) %>%
    cols_label(
        min = md("**Min**"),
        median = md("**Median**"),
        max = md("**Max**")
    )

Total MINTIE variant genes called by sample

| Min | Median | Max |
|---|---|---|
| 261 | 592 | 2265 |

Summary of variants found by MINTIE, Arriba, Squid and TAP

Figure 4A.

Note that the TAP results must be obtained from Supplementary Table 4 from Chiu et al. 2018.

# load other callers
arriba_results <- get_results(here("data/leucegene/validation_results/Arriba"))
squid_results <- get_results(here("data/leucegene/validation_results/Squid"))
tap_results <- read.delim(here("data/leucegene/validation_results/TAP/TAP_leucegene_results.txt"), sep = "\t")

# get variant gene locs (needed to check Squid results)
vargene_locs <- read.delim(here("data/leucegene/leucegene_vargene_locs.tsv"), sep = "\t")
vgx <- GRanges(seqnames = vargene_locs$chrom,
               ranges = IRanges(start = vargene_locs$start, end = vargene_locs$end),
               genes = vargene_locs$gene)

# make truth table
truth_table <- rowwise(truth) %>% 
    mutate(mintie_found = is_variant_in_mintie_results(Experiment, gene1, gene2, variant, validation)) %>%
    mutate(arriba_found = is_variant_in_results(Experiment, gene1, gene2, variant, "arriba", arriba_results)) %>%
    mutate(tap_found = is_variant_in_results(patient_ID, gene1, gene2, variant, "tap", tap_results)) %>%
    mutate(squid_found = is_variant_in_squid_results(Experiment, gene1, gene2, vgx, squid_results)) %>%
    data.frame()

gt(truth_table) %>%
     tab_header(
        title = md("**Variants found in Leucegene cohort**")
    ) %>%
    cols_label(
        patient_ID = md("**Patient**"),
        Experiment = md("**Experiment**"),
        gene1 = md("**Gene 1**"),
        gene2 = md("**Gene 2**"),
        variant = md("**Variant**"),
        cohort = md("**Cohort**"),
        mintie_found = md("**MINTIE Found**"),
        arriba_found = md("**Arriba Found**"),
        squid_found = md("**Squid Found**"),
        tap_found = md("**TAP Found**")
    ) %>%
    tab_options(
        table.font.size = 12
    ) %>%
    tab_style(
        style = cell_fill(color = "lightgreen"),
        locations = cells_body(
            columns = vars(mintie_found),
            rows = mintie_found)
    ) %>%
    tab_style(
        style = cell_fill(color = "lightgreen"),
        locations = cells_body(
            columns = vars(arriba_found),
            rows = arriba_found)
    ) %>%
    tab_style(
        style = cell_fill(color = "lightgreen"),
        locations = cells_body(
            columns = vars(squid_found),
            rows = squid_found)
    ) %>%
    tab_style(
        style = cell_fill(color = "lightgreen"),
        locations = cells_body(
            columns = vars(tap_found),
            rows = tap_found)
    )

Variants found in Leucegene cohort

| Patient | Experiment | Gene 1 | Gene 2 | Variant | Cohort | MINTIE Found | Arriba Found | TAP Found | Squid Found |
|---|---|---|---|---|---|---|---|---|---|
| 03H065 | SRX729615 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 03H083 | SRX729616 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 03H095 | SRX729602 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 03H109 | SRX729580 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 03H112 | SRX729581 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 03H112 | SRX729581 | FLT3 | | ITD | CBF | TRUE | FALSE | TRUE | FALSE |
| 04H030 | SRX729603 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 04H061 | SRX729582 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 04H091 | SRX729583 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 05H042 | SRX729617 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | FALSE |
| 05H099 | SRX958862 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 05H113 | SRX729604 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 05H118 | SRX729618 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 05H136 | SRX729605 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | FALSE |
| 05H184 | SRX729619 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 06H020 | SRX729606 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 06H035 | SRX729620 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 06H115 | SRX729607 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 07H099 | SRX381851 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 07H137 | SRX729621 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 07H144 | SRX729585 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 08H034 | SRX729622 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 08H042 | SRX729623 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 08H072 | SRX729624 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | FALSE |
| 08H072 | SRX729624 | FLT3 | | ITD | CBF | TRUE | TRUE | TRUE | FALSE |
| 08H081 | SRX729586 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 08H099 | SRX729608 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 09H016 | SRX729587 | CBFB | MYH11 | fusion | CBF | FALSE | TRUE | TRUE | TRUE |
| 09H040 | SRX729625 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 09H066 | SRX729588 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 10H008 | SRX729609 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 10H030 | SRX729626 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 10H119 | SRX729627 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 11H022 | SRX729610 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 11H022 | SRX729610 | FLT3 | | ITD | CBF | FALSE | FALSE | TRUE | FALSE |
| 11H104 | SRX729589 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 11H107 | SRX729628 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 11H179 | SRX729611 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 12H042 | SRX729590 | CBFB | MYH11 | fusion | CBF | FALSE | TRUE | TRUE | TRUE |
| 12H044 | SRX729591 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 12H045 | SRX729629 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | FALSE |
| 12H098 | SRX729630 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 12H165 | SRX729592 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 12H166 | SRX729631 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 12H180 | SRX729632 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 12H183 | SRX729633 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 13H066 | SRX729612 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 13H120 | SRX959058 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 13H169 | SRX959064 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
| 04H111 | SRX958844 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | TRUE |
| 05H050 | SRX332633 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
| 05H111 | SRX957223 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
| 06H048 | SRX958873 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | FALSE |
| 06H146 | SRX957230 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
| 07H152 | SRX958906 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
| 07H155 | SRX958907 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
| 08H012 | SRX381852 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | TRUE |
| 08H063 | SRX958922 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | FALSE |
| 08H112 | SRX381854 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | FALSE |
| 08H138 | SRX958932 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | TRUE |
| 09H058 | SRX958945 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
| 09H106 | SRX958959 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | TRUE | TRUE |
| 09H113 | SRX332645 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
| 09H115 | SRX332646 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | FALSE |
| 10H001 | SRX958961 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | FALSE |
| 10H007 | SRX958963 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
| 10H070 | SRX958974 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
| 11H008 | SRX381865 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | FALSE |
| 11H021 | SRX332659 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
| 11H183 | SRX959001 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | TRUE |
| 13H048 | SRX959044 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | FALSE |
| 13H141 | SRX959060 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | FALSE |
| 13H150 | SRX959061 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | FALSE |
| 03H041 | SRX332627 | NUP98 | NSD1 | fusion | NUP98-NSD1 | FALSE | TRUE | TRUE | TRUE |
| 03H041 | SRX332627 | FLT3 | | ITD | NUP98-NSD1 | TRUE | TRUE | TRUE | FALSE |
| 05H034 | SRX958856 | NUP98 | NSD1 | fusion | NUP98-NSD1 | TRUE | TRUE | TRUE | TRUE |
| 05H163 | SRX332635 | NUP98 | NSD1 | fusion | NUP98-NSD1 | TRUE | TRUE | TRUE | TRUE |
| 08H049 | SRX958915 | NUP98 | NSD1 | fusion | NUP98-NSD1 | TRUE | TRUE | TRUE | TRUE |
| 08H049 | SRX958915 | FLT3 | | ITD | NUP98-NSD1 | TRUE | FALSE | TRUE | FALSE |
| 10H038 | SRX381861 | NUP98 | NSD1 | fusion | NUP98-NSD1 | TRUE | TRUE | TRUE | TRUE |
| 11H027 | SRX958987 | NUP98 | NSD1 | fusion | NUP98-NSD1 | TRUE | TRUE | TRUE | TRUE |
| 11H027 | SRX958987 | FLT3 | | ITD | NUP98-NSD1 | TRUE | TRUE | TRUE | FALSE |
| 11H160 | SRX332667 | NUP98 | NSD1 | fusion | NUP98-NSD1 | TRUE | TRUE | TRUE | TRUE |
| 11H160 | SRX332667 | FLT3 | | ITD | NUP98-NSD1 | TRUE | FALSE | TRUE | FALSE |

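
The Squid comparison above is done through gene coordinates (`vgx`) rather than gene names. The helper `is_variant_in_squid_results()` is not shown on this page, so the following is only a sketch of the general idea under the assumption that each call provides two breakpoint positions: a call is counted as matching a fusion when one breakpoint falls inside each partner gene. The gene names, coordinates and function names are invented for illustration. With `GRanges` objects, `GenomicRanges::findOverlaps()` would be the usual way to do the interval test at scale.

```r
# Hypothetical gene intervals (coordinates are invented for illustration).
genes <- data.frame(gene = c("GENE_A", "GENE_B"),
                    chrom = c("chr1", "chr2"),
                    start = c(1000, 5000),
                    end = c(2000, 9000))

# TRUE if a breakpoint position falls inside the named gene's interval.
breakpoint_in_gene <- function(chrom, pos, gene, genes) {
    g <- genes[genes$gene == gene, ]
    any(g$chrom == chrom & g$start <= pos & pos <= g$end)
}

# A call matches when one breakpoint hits each partner gene, in either order.
call_matches_fusion <- function(chrom1, pos1, chrom2, pos2, gene1, gene2, genes) {
    (breakpoint_in_gene(chrom1, pos1, gene1, genes) &&
         breakpoint_in_gene(chrom2, pos2, gene2, genes)) ||
    (breakpoint_in_gene(chrom1, pos1, gene2, genes) &&
         breakpoint_in_gene(chrom2, pos2, gene1, genes))
}

# One breakpoint in each gene, then one breakpoint outside both genes.
hit  <- call_matches_fusion("chr1", 1500, "chr2", 6000, "GENE_A", "GENE_B", genes)
miss <- call_matches_fusion("chr1", 1500, "chr2", 100, "GENE_A", "GENE_B", genes)
print(c(hit = hit, miss = miss))
```

Checking both orientations keeps the match independent of which partner the caller happens to report first.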
# tally up detected variants into summary table
truth_summary <- truth_table %>%
    group_by(gene1, gene2, variant) %>%
    summarise(mintie_detected = sum(mintie_found),
              arriba_detected = sum(arriba_found),
              squid_detected = sum(squid_found),
              tap_detected = sum(tap_found),
              total = length(mintie_found)) %>%
    data.frame()

gt(truth_summary) %>%
    tab_header(
        title = md("**Summary of variants found in Leucegene cohort**")
    ) %>%
    cols_label(
        gene1 = md("**Gene 1**"),
        gene2 = md("**Gene 2**"),
        variant = md("**Variant**"),
        mintie_detected = md("**MINTIE Detected**"),
        arriba_detected = md("**Arriba Detected**"),
        squid_detected = md("**Squid Detected**"),
        tap_detected = md("**TAP Detected**"),
        total = md("**Total**")
    ) %>%
    tab_options(
        table.font.size = 12
    )
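
Summary of variants found in Leucegene cohort

| Gene 1 | Gene 2 | Variant | MINTIE Detected | Arriba Detected | Squid Detected | TAP Detected | Total |
|---|---|---|---|---|---|---|---|
| CBFB | MYH11 | fusion | 24 | 26 | 25 | 26 | 26 |
| FLT3 | | ITD | 6 | 3 | 0 | 7 | 7 |
| KMT2A | | PTD | 16 | 24 | 15 | 1 | 24 |
| NUP98 | NSD1 | fusion | 6 | 7 | 7 | 7 | 7 |
| RUNX1 | RUNX1T1 | fusion | 20 | 20 | 17 | 20 | 20 |
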
ts <- truth_summary %>%
        reshape2::melt(id.vars = c("gene1", "gene2", "variant")) %>%
        group_by(variant, variable) %>%
        summarise(detected = sum(value)) %>%
        data.frame()
ts$method <- gsub("_detected", "", ts$variable)

# reorder factors
ts$method <- factor(ts$method, levels = c("mintie", "arriba", "squid", "tap", "total"))
ts$variant <- factor(paste0(ts$variant, "s"), levels=c("fusions", "PTDs", "ITDs"))

ggplot(ts[ts$method != "total",], aes(method, detected)) +
    geom_bar(stat = "identity") +
    theme_bw() +
    xlab("Method") +
    ylab("Detected") +
    scale_x_discrete(labels=c("MINTIE", "Arriba", "SQUID", "TAP")) +
    scale_y_continuous(breaks = seq(0, 53, 5)) +
    geom_hline(data=ts[ts$method == "total",], aes(yintercept=detected), colour="salmon") +
    facet_grid(~variant)
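
The plot above pools the per-gene counts into three variant classes. As a worked check of that pooling, the sketch below re-enters the counts from the summary table by hand and computes, for each class, the fraction of known variants each method recovered. It uses base R `aggregate()` instead of the `melt()`/`summarise()` pipeline, and the column names are shortened for the example.

```r
# Counts copied from the summary table of variants found in the cohort.
truth_counts <- data.frame(
    variant = c("fusion", "ITD", "PTD", "fusion", "fusion"),
    mintie  = c(24, 6, 16, 6, 20),
    arriba  = c(26, 3, 24, 7, 20),
    squid   = c(25, 0, 15, 7, 17),
    tap     = c(26, 7, 1, 7, 20),
    total   = c(26, 7, 24, 7, 20)
)

# Sum counts within each variant class (fusions pooled across gene pairs).
by_class <- aggregate(. ~ variant, data = truth_counts, FUN = sum)

# Fraction of known variants recovered by each method, per class.
methods <- c("mintie", "arriba", "squid", "tap")
rates <- by_class[, methods] / by_class$total
rownames(rates) <- by_class$variant
print(round(rates, 2))
```

The pooled fusion total comes to 53, which is the upper bound used for the y axis in the plot.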

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] GenomicRanges_1.42.0 GenomeInfoDb_1.26.2  IRanges_2.24.1      
 [4] S4Vectors_0.28.1     BiocGenerics_0.36.0  gt_0.2.2            
 [7] ggplot2_3.3.3        stringr_1.4.0        here_1.0.1          
[10] dplyr_1.0.4          data.table_1.13.6    workflowr_1.6.2     

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0       xfun_0.21              reshape2_1.4.4        
 [4] purrr_0.3.4            colorspace_2.0-0       vctrs_0.3.6           
 [7] generics_0.1.0         htmltools_0.5.1.1      yaml_2.2.1            
[10] rlang_0.4.10           later_1.1.0.1          pillar_1.4.7          
[13] glue_1.4.2             withr_2.4.1            DBI_1.1.1             
[16] RColorBrewer_1.1-2     plyr_1.8.6             GenomeInfoDbData_1.2.4
[19] lifecycle_1.0.0        zlibbioc_1.36.0        commonmark_1.7        
[22] munsell_0.5.0          gtable_0.3.0           evaluate_0.14         
[25] labeling_0.4.2         knitr_1.31             httpuv_1.5.5          
[28] highr_0.8              Rcpp_1.0.6             promises_1.2.0.1      
[31] scales_1.1.1           backports_1.2.1        checkmate_2.0.0       
[34] XVector_0.30.0         farver_2.0.3           fs_1.5.0              
[37] digest_0.6.27          stringi_1.5.3          grid_4.0.3            
[40] rprojroot_2.0.2        tools_4.0.3            bitops_1.0-6          
[43] sass_0.3.1             magrittr_2.0.1         RCurl_1.98-1.2        
[46] tibble_3.0.6           crayon_1.4.1           whisker_0.4           
[49] pkgconfig_2.0.3        ellipsis_0.3.1         assertthat_0.2.1      
[52] rmarkdown_2.6          R6_2.5.0               git2r_0.28.0          
[55] compiler_4.0.3