Last updated: 2020-06-26

Knit directory: MINTIE-paper-analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.4.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20200415) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
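
As a minimal illustration of why the seed matters (this sketch is not part of the original analysis, and the variable names are made up), resetting the same seed before two random draws should give identical results:

```r
# Reset the seed before each draw so both calls start from the same RNG state.
set.seed(20200415)
first_draw <- sample(100, 5)

set.seed(20200415)
second_draw <- sample(100, 5)

# With identical seeds the two draws should match element for element.
same_with_seed <- identical(first_draw, second_draw)
print(same_with_seed)
```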

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/cache/
    Ignored:    data/RCH_B-ALL/
    Ignored:    data/leucegene/salmon_out/
    Ignored:    data/leucegene/sample_info/KMT2A-PTD_8-2.fa.xls
    Ignored:    output/Leucegene_gene_counts.tsv
    Ignored:    packrat/lib-R/
    Ignored:    packrat/lib-ext/
    Ignored:    packrat/lib/
    Ignored:    packrat/src/

Untracked files:
    Untracked:  update_results.sh

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd 5448658 Marek Cmero 2020-06-25 Update with missing leucegene sample
html e9e4917 Marek Cmero 2020-06-24 Build site.
Rmd 9434bfe Marek Cmero 2020-06-24 Updated results with latest MINTIE run. Fixed bug with KMT2A PTD checking in different controls. Added leucegene
html 0b21347 Marek Cmero 2020-06-11 Build site.
Rmd fa6bf0c Marek Cmero 2020-06-11 Updated with new results; improved tables
html fa6bf0c Marek Cmero 2020-06-11 Updated with new results; improved tables
html a166ab8 Marek Cmero 2020-05-08 Build site.
html a600688 Marek Cmero 2020-05-07 Build site.
Rmd 0fde0b8 Marek Cmero 2020-05-07 Added RCH B-ALL analysis
html 1c40e33 Marek Cmero 2020-05-07 Build site.
Rmd bbc278a Marek Cmero 2020-05-07 Refactoring
html 87b4e62 Marek Cmero 2020-05-07 Build site.
Rmd af503f2 Marek Cmero 2020-05-07 Refactoring
html 5c045b5 Marek Cmero 2020-05-07 Build site.
Rmd d8d5b96 Marek Cmero 2020-05-07 Added Leucegene variant validation
html 90c7fd9 Marek Cmero 2020-05-06 Build site.
Rmd 44d8c37 Marek Cmero 2020-05-06 Build leucegene validation notebook.
Rmd ff4b1dc Marek Cmero 2020-05-06 Leucegene results

# util
library(data.table)
library(dplyr)
library(here)
library(stringr)

# plotting/tables
library(ggplot2)
library(gt)
options(stringsAsFactors = FALSE)
source(here("code/leucegene_helper.R"))

Leucegene Validation

Here we analyse the results of running MINTIE on a number of Leucegene samples, including the effect of the choice of controls on a cohort with KMT2A-PTD variants. We also check whether MINTIE called the known variants within the cohort.

# load SRX to patient ID lookup table
kmt2a_patient_lookup <- read.delim(here("data/leucegene/sample_info/KMT2A-PTD_samples.txt"),
                                   header = FALSE,
                                   col.names = c("sample", "patient"))

kmt2a_results_dir <- here("data/leucegene/KMT2A-PTD_results")

# load KMT2A cohort comparisons against all other controls
kmt2a_results <- load_controls_comparison(kmt2a_results_dir)
kmt2a_results <- inner_join(kmt2a_results, kmt2a_patient_lookup, by = "sample")

# load other validation results and truth table
truth <- read.delim(here("data/leucegene/sample_info/variant_validation_table.tsv"), sep = "\t")
leucegene_results_dir <- here("data/leucegene/validation_results")
validation <- list.files(leucegene_results_dir, full.names = TRUE) %>%
                lapply(., read.delim) %>%
                rbindlist(fill = TRUE) %>%
                filter(logFC > 5)
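
The loading step above stacks one results table per file and keeps only rows with logFC above 5. The sketch below mimics that on two made-up tables in base R. Only the `logFC` column name comes from the analysis; the other columns, the values, and the `bind_fill` helper (a stand-in for `rbindlist(fill = TRUE)`) are assumptions for illustration.

```r
# Two made-up per-sample result tables; only logFC is a column named in the
# analysis, the other columns and all values are hypothetical.
res_a <- data.frame(sample = "A", logFC = c(2.1, 7.3), stringsAsFactors = FALSE)
res_b <- data.frame(sample = "B", logFC = c(6.0, 4.9), extra = c("x", "y"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: stack tables whose columns differ, filling gaps with NA
# (a base R stand-in for data.table::rbindlist(fill = TRUE)).
bind_fill <- function(tables) {
    all_cols <- unique(unlist(lapply(tables, colnames)))
    padded <- lapply(tables, function(tab) {
        for (col in setdiff(all_cols, colnames(tab))) {
            tab[[col]] <- NA
        }
        tab[, all_cols, drop = FALSE]
    })
    do.call(rbind, padded)
}

combined <- bind_fill(list(res_a, res_b))

# Keep only rows with logFC above 5, as filter(logFC > 5) does.
kept <- combined[combined$logFC > 5, ]
print(kept)
```

Filling with NA rather than dropping unmatched columns keeps every column that appears in any input table, which matters when result files do not all share the same header.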

KMT2A-PTD controls comparison

MINTIE paper Supplementary Figure 2. Shows the number of variant genes found in the Leucegene cohort containing KMT2A PTDs.

results_summary <- get_results_summary(mutate(kmt2a_results,
                                              sample = patient,
                                              group_var = controls),
                                       group_var_name = "controls")

# build table
results_summary %>%
    group_by(controls) %>%
    summarise(min = min(V1), median = median(V1), max = max(V1)) %>%
    data.frame() %>%
    gt() %>%
    tab_header(
        title = md("**Total variant genes called using different controls**")
    ) %>%
    tab_options(
        table.font.size = 12
    ) %>%
    cols_label(
        controls = md("**Controls**"),
        min = md("**Min**"),
        median = md("**Median**"),
        max = md("**Max**")
    ) 
Total variant genes called using different controls
Controls Min Median Max
AML_controls 105 177.5 1321
normal_controls 216 459.0 1680
normal_controls_reduced 337 643.5 2055
ggplot(results_summary, aes(sample, V1, fill = controls)) +
    geom_bar(position = position_dodge2(width = 0.9, preserve = "single"), stat = "identity") +
    theme_bw() +
    xlab("") +
    ylab("Genes with variants") +
    coord_flip() +
    theme(legend.position = "bottom") +
    scale_fill_brewer(palette="Dark2",
                      labels =  c("AML_controls" = "13 AMLs",
                                  "normal_controls" = "13 normals",
                                  "normal_controls_reduced" = "3 normals"))

Version Author Date
e9e4917 Marek Cmero 2020-06-24
fa6bf0c Marek Cmero 2020-06-11
87b4e62 Marek Cmero 2020-05-07
90c7fd9 Marek Cmero 2020-05-06

KMT2A variants found in cohort

MINTIE paper Supplementary Table 1. Shows whether MINTIE found a KMT2A SV in each sample for the given control group. Coverage is taken from the Audemard et al. spreadsheet of Leucegene results, which must be manually added to data/leucegene/sample_info to run the code.

# load results from km paper for coverage of KMT2A PTDs
kmt2a_lgene_km_results <- read.csv(here("data/leucegene/sample_info/KMT2A-PTD_8-2.fa.xls"), sep="\t") %>%
                            mutate(patient = Sample) %>%
                            group_by(patient) %>%
                            summarise(coverage = max(Min.coverage))

# check whether MINTIE found a KMT2A SV in each control set
found_using_cancon <- get_samples_with_kmt2a_sv(kmt2a_results, "AML_controls")
found_using_norcon <- get_samples_with_kmt2a_sv(kmt2a_results, "normal_controls")
found_using_redcon <- get_samples_with_kmt2a_sv(kmt2a_results, "normal_controls_reduced")

# make the table
kmt2a_control_comp <- inner_join(kmt2a_patient_lookup, kmt2a_lgene_km_results, by = "patient") %>%
                        arrange(desc(coverage))
kmt2a_control_comp$`13_AMLs` <- kmt2a_control_comp$sample %in% found_using_cancon
kmt2a_control_comp$`13_normals` <- kmt2a_control_comp$sample %in% found_using_norcon
kmt2a_control_comp$`3_normals` <- kmt2a_control_comp$sample %in% found_using_redcon

# build output table
kmt2a_control_comp %>%
    gt() %>% 
    cols_label(
        sample = md("**Sample**"),
        patient = md("**Patient**"),
        coverage = md("**Coverage**"),
        `13_AMLs` = md("**13 AMLs**"),
        `13_normals` = md("**13 Normals**"),
        `3_normals` = md("**3 Normals**")
    ) %>%
    tab_header(
        title = md("**KMT2A PTDs found in Leucegene cohort**")
    ) %>%
    tab_options(
        table.font.size = 12
    ) %>%
    tab_style(
        style = cell_fill(color = "lightgreen"),
        locations = cells_body(
            columns = vars(`13_AMLs`),
            rows = `13_AMLs`)
    ) %>%
    tab_style(
        style = cell_fill(color = "lightgreen"),
        locations = cells_body(
            columns = vars(`13_normals`),
            rows = `13_normals`)
    ) %>%
    tab_style(
        style = cell_fill(color = "lightgreen"),
        locations = cells_body(
            columns = vars(`3_normals`),
            rows = `3_normals`)
    )
KMT2A PTDs found in Leucegene cohort
Sample Patient Coverage 13 AMLs 13 Normals 3 Normals
SRX958906 07H152 158 FALSE TRUE TRUE
SRX332646 09H115 125 FALSE TRUE TRUE
SRX957230 06H146 87 TRUE TRUE TRUE
SRX957223 05H111 79 TRUE TRUE TRUE
SRX332659 11H021 63 FALSE TRUE TRUE
SRX332633 05H050 58 TRUE TRUE TRUE
SRX959061 13H150 58 FALSE TRUE FALSE
SRX959044 13H048 57 TRUE TRUE TRUE
SRX958974 10H070 53 TRUE TRUE TRUE
SRX958963 10H007 50 TRUE TRUE TRUE
SRX958959 09H106 49 TRUE TRUE TRUE
SRX959060 13H141 45 TRUE TRUE TRUE
SRX958945 09H058 29 TRUE TRUE TRUE
SRX958907 07H155 23 FALSE TRUE TRUE
SRX381854 08H112 22 TRUE TRUE TRUE
SRX332645 09H113 17 TRUE TRUE TRUE
SRX959001 11H183 16 FALSE FALSE FALSE
SRX381852 08H012 15 FALSE FALSE FALSE
SRX958932 08H138 15 FALSE FALSE FALSE
SRX381865 11H008 13 FALSE FALSE FALSE
SRX958873 06H048 10 FALSE FALSE FALSE
SRX958922 08H063 6 FALSE FALSE FALSE
SRX958961 10H001 6 FALSE FALSE FALSE
SRX958844 04H111 3 FALSE FALSE FALSE
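
The logical columns in the table above come from `%in%` membership tests against the samples returned for each control set. The sketch below reproduces that step in base R on five rows copied from the table. The `found_*` vectors are typed by hand for illustration, not produced by `get_samples_with_kmt2a_sv`.

```r
# Five samples and coverages copied from the table above.
comp <- data.frame(
    sample = c("SRX958906", "SRX957230", "SRX959061", "SRX959001", "SRX958844"),
    coverage = c(158, 87, 58, 16, 3),
    stringsAsFactors = FALSE
)

# Hand-typed stand-ins for the helper's output (samples with a KMT2A SV called).
found_aml <- c("SRX957230")
found_normal <- c("SRX958906", "SRX957230", "SRX959061")
found_reduced <- c("SRX958906", "SRX957230")

# Flag each sample by membership, mirroring the %in% step in the analysis.
comp$`13_AMLs` <- comp$sample %in% found_aml
comp$`13_normals` <- comp$sample %in% found_normal
comp$`3_normals` <- comp$sample %in% found_reduced

# Count detections per control set in this excerpt.
detected <- sapply(comp[, c("13_AMLs", "13_normals", "3_normals")], sum)
print(detected)
```

Applied to all 24 rows of the table, the same tally gives 11, 16 and 15 detections for the 13 AML, 13 normal and 3 normal control sets respectively.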

Leucegene variants found by MINTIE

# add KMT2A results against the 13 normal controls as validation
validation <- filter(kmt2a_results, controls == "normal_controls") %>%
                select(-c(controls, patient)) %>%
                select(colnames(validation)) %>%
                rbind(., validation)

truth_table <- rowwise(truth) %>% 
    mutate(found = is_variant_in_sample(Experiment, gene1, gene2, variant, validation)) %>%
    data.frame()

truth_table %>% 
    group_by(gene1, gene2, variant) %>%
    summarise(detected = sum(found),
              total = length(found)) %>%
    data.table() %>%
    gt() %>%
    tab_header(
        title = md("**Summary of variants found in Leucegene cohort**")
    ) %>%
    cols_label(
        gene1 = md("**Gene 1**"),
        gene2 = md("**Gene 2**"),
        variant = md("**Variant**"),
        detected = md("**Detected**"),
        total = md("**Total**")
    ) %>%
    tab_options(
        table.font.size = 12
    )
Summary of variants found in Leucegene cohort
Gene 1 Gene 2 Variant Detected Total
CBFB MYH11 fusion 24 26
FLT3 ITD 6 7
KMT2A PTD 16 24
NUP98 NSD1 fusion 7 7
RUNX11 RUNX1T1 fusion 20 20
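
To read the summary above as detection rates, the counts can be converted to percentages. This is a small worked calculation, not part of the original notebook; the counts are copied from the table (gene names as printed there) and the variable names are illustrative.

```r
# Detected and total counts copied from the summary table above
# (gene names as printed there).
summary_counts <- data.frame(
    variant = c("CBFB-MYH11 fusion", "FLT3 ITD", "KMT2A PTD",
                "NUP98-NSD1 fusion", "RUNX11-RUNX1T1 fusion"),
    detected = c(24, 6, 16, 7, 20),
    total = c(26, 7, 24, 7, 20),
    stringsAsFactors = FALSE
)

# Per-variant detection rate, as a percentage rounded to one decimal place.
summary_counts$rate_pct <- round(100 * summary_counts$detected / summary_counts$total, 1)

# Overall detection rate across all listed variants.
overall_pct <- round(100 * sum(summary_counts$detected) / sum(summary_counts$total), 1)

print(summary_counts)
print(overall_pct)
```

The KMT2A PTD row has the lowest rate (16 of 24), in line with the low-coverage samples in the controls comparison table where no KMT2A SV was found.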
gt(truth_table) %>%
    tab_header(
        title = md("**Variants found in Leucegene cohort**")
    ) %>%
    cols_label(
        patient_ID = md("**Patient**"),
        Experiment = md("**Experiment**"),
        gene1 = md("**Gene 1**"),
        gene2 = md("**Gene 2**"),
        variant = md("**Variant**"),
        cohort = md("**Cohort**"),
        found = md("**Found**")
    ) %>%
    tab_options(
        table.font.size = 12
    ) %>%
    tab_style(
        style = cell_fill(color = "lightgreen"),
        locations = cells_body(
            columns = vars(found),
            rows = found)
    )
Variants found in Leucegene cohort
Patient Experiment Gene 1 Gene 2 Variant Cohort Found
03H065 SRX729615 RUNX11 RUNX1T1 fusion CBF TRUE
03H083 SRX729616 RUNX11 RUNX1T1 fusion CBF TRUE
03H095 SRX729602 CBFB MYH11 fusion CBF TRUE
03H109 SRX729580 CBFB MYH11 fusion CBF TRUE
03H112 SRX729581 CBFB MYH11 fusion CBF TRUE
03H112 SRX729581 FLT3 ITD CBF TRUE
04H030 SRX729603 CBFB MYH11 fusion CBF TRUE
04H061 SRX729582 CBFB MYH11 fusion CBF TRUE
04H091 SRX729583 CBFB MYH11 fusion CBF TRUE
05H042 SRX729617 RUNX11 RUNX1T1 fusion CBF TRUE
05H099 SRX958862 CBFB MYH11 fusion CBF TRUE
05H113 SRX729604 CBFB MYH11 fusion CBF TRUE
05H118 SRX729618 RUNX11 RUNX1T1 fusion CBF TRUE
05H136 SRX729605 CBFB MYH11 fusion CBF TRUE
05H184 SRX729619 RUNX11 RUNX1T1 fusion CBF TRUE
06H020 SRX729606 CBFB MYH11 fusion CBF TRUE
06H035 SRX729620 RUNX11 RUNX1T1 fusion CBF TRUE
06H115 SRX729607 CBFB MYH11 fusion CBF TRUE
07H099 SRX381851 CBFB MYH11 fusion CBF TRUE
07H137 SRX729621 RUNX11 RUNX1T1 fusion CBF TRUE
07H144 SRX729585 CBFB MYH11 fusion CBF TRUE
08H034 SRX729622 RUNX11 RUNX1T1 fusion CBF TRUE
08H042 SRX729623 RUNX11 RUNX1T1 fusion CBF TRUE
08H072 SRX729624 RUNX11 RUNX1T1 fusion CBF TRUE
08H072 SRX729624 FLT3 ITD CBF TRUE
08H081 SRX729586 CBFB MYH11 fusion CBF TRUE
08H099 SRX729608 CBFB MYH11 fusion CBF TRUE
09H016 SRX729587 CBFB MYH11 fusion CBF FALSE
09H040 SRX729625 RUNX11 RUNX1T1 fusion CBF TRUE
09H066 SRX729588 CBFB MYH11 fusion CBF TRUE
10H008 SRX729609 CBFB MYH11 fusion CBF TRUE
10H030 SRX729626 RUNX11 RUNX1T1 fusion CBF TRUE
10H119 SRX729627 RUNX11 RUNX1T1 fusion CBF TRUE
11H022 SRX729610 CBFB MYH11 fusion CBF TRUE
11H022 SRX729610 FLT3 ITD CBF FALSE
11H104 SRX729589 CBFB MYH11 fusion CBF TRUE
11H107 SRX729628 RUNX11 RUNX1T1 fusion CBF TRUE
11H179 SRX729611 CBFB MYH11 fusion CBF TRUE
12H042 SRX729590 CBFB MYH11 fusion CBF FALSE
12H044 SRX729591 CBFB MYH11 fusion CBF TRUE
12H045 SRX729629 RUNX11 RUNX1T1 fusion CBF TRUE
12H098 SRX729630 RUNX11 RUNX1T1 fusion CBF TRUE
12H165 SRX729592 CBFB MYH11 fusion CBF TRUE
12H166 SRX729631 RUNX11 RUNX1T1 fusion CBF TRUE
12H180 SRX729632 RUNX11 RUNX1T1 fusion CBF TRUE
12H183 SRX729633 RUNX11 RUNX1T1 fusion CBF TRUE
13H066 SRX729612 CBFB MYH11 fusion CBF TRUE
13H120 SRX959058 CBFB MYH11 fusion CBF TRUE
13H169 SRX959064 RUNX11 RUNX1T1 fusion CBF TRUE
04H111 SRX958844 KMT2A PTD KMT2A-PTD FALSE
05H050 SRX332633 KMT2A PTD KMT2A-PTD TRUE
05H111 SRX957223 KMT2A PTD KMT2A-PTD TRUE
06H048 SRX958873 KMT2A PTD KMT2A-PTD FALSE
06H146 SRX957230 KMT2A PTD KMT2A-PTD TRUE
07H152 SRX958906 KMT2A PTD KMT2A-PTD TRUE
07H155 SRX958907 KMT2A PTD KMT2A-PTD TRUE
08H012 SRX381852 KMT2A PTD KMT2A-PTD FALSE
08H063 SRX958922 KMT2A PTD KMT2A-PTD FALSE
08H112 SRX381854 KMT2A PTD KMT2A-PTD TRUE
08H138 SRX958932 KMT2A PTD KMT2A-PTD FALSE
09H058 SRX958945 KMT2A PTD KMT2A-PTD TRUE
09H106 SRX958959 KMT2A PTD KMT2A-PTD TRUE
09H113 SRX332645 KMT2A PTD KMT2A-PTD TRUE
09H115 SRX332646 KMT2A PTD KMT2A-PTD TRUE
10H001 SRX958961 KMT2A PTD KMT2A-PTD FALSE
10H007 SRX958963 KMT2A PTD KMT2A-PTD TRUE
10H070 SRX958974 KMT2A PTD KMT2A-PTD TRUE
11H008 SRX381865 KMT2A PTD KMT2A-PTD FALSE
11H021 SRX332659 KMT2A PTD KMT2A-PTD TRUE
11H183 SRX959001 KMT2A PTD KMT2A-PTD FALSE
13H048 SRX959044 KMT2A PTD KMT2A-PTD TRUE
13H141 SRX959060 KMT2A PTD KMT2A-PTD TRUE
13H150 SRX959061 KMT2A PTD KMT2A-PTD TRUE
03H041 SRX332627 NUP98 NSD1 fusion NUP98-NSD1 TRUE
03H041 SRX332627 FLT3 ITD NUP98-NSD1 TRUE
05H034 SRX958856 NUP98 NSD1 fusion NUP98-NSD1 TRUE
05H163 SRX332635 NUP98 NSD1 fusion NUP98-NSD1 TRUE
08H049 SRX958915 NUP98 NSD1 fusion NUP98-NSD1 TRUE
08H049 SRX958915 FLT3 ITD NUP98-NSD1 TRUE
10H038 SRX381861 NUP98 NSD1 fusion NUP98-NSD1 TRUE
11H027 SRX958987 NUP98 NSD1 fusion NUP98-NSD1 TRUE
11H027 SRX958987 FLT3 ITD NUP98-NSD1 TRUE
11H160 SRX332667 NUP98 NSD1 fusion NUP98-NSD1 TRUE
11H160 SRX332667 FLT3 ITD NUP98-NSD1 TRUE

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /config/RStudio/R/3.6.1/lib64/R/lib/libRblas.so
LAPACK: /config/RStudio/R/3.6.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gt_0.2.1          ggplot2_3.3.1     stringr_1.4.0     here_0.1         
[5] dplyr_1.0.0       data.table_1.12.6

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2         RColorBrewer_1.1-2 pillar_1.4.4      
 [4] compiler_3.6.1     git2r_0.26.1       workflowr_1.4.0   
 [7] tools_3.6.1        digest_0.6.21      evaluate_0.14     
[10] lifecycle_0.2.0    tibble_3.0.1       gtable_0.3.0      
[13] checkmate_2.0.0    pkgconfig_2.0.3    rlang_0.4.6       
[16] commonmark_1.7     yaml_2.2.0         xfun_0.10         
[19] withr_2.1.2        knitr_1.25         generics_0.0.2    
[22] fs_1.4.1           vctrs_0.3.1        sass_0.2.0        
[25] rprojroot_1.3-2    grid_3.6.1         tidyselect_1.1.0  
[28] glue_1.4.1         R6_2.4.0           rmarkdown_1.16    
[31] farver_2.0.3       purrr_0.3.2        magrittr_1.5      
[34] whisker_0.4        backports_1.1.4    scales_1.1.1      
[37] ellipsis_0.3.0     htmltools_0.4.0    colorspace_1.4-1  
[40] labeling_0.3       stringi_1.4.3      munsell_0.5.0     
[43] crayon_1.3.4