Last updated: 2020-05-18

Checks: 7 passed, 0 failed

Knit directory: MINTIE-paper-analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.4.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20200415) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
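The following is a minimal base-R illustration of what setting the seed guarantees: re-seeding with the same value reproduces exactly the same random draw. Drawing five numbers from 1:100 is an arbitrary example and is not part of this analysis.

```r
# Seed the generator, then draw five numbers without replacement
set.seed(20200415)
first_draw <- sample(1:100, 5)

# Re-seeding with the same value restarts the same pseudo-random sequence
set.seed(20200415)
second_draw <- sample(1:100, 5)

print(identical(first_draw, second_draw))  # TRUE
```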

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, so it is up to you to know whether the analysis depends on other scripts or data files. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/cache/
    Ignored:    data/RCH_B-ALL/
    Ignored:    data/leucegene/salmon_out/
    Ignored:    data/leucegene/sample_info/KMT2A-PTD_8-2.fa.xls
    Ignored:    output/Leucegene_gene_counts.tsv
    Ignored:    packrat/lib-R/
    Ignored:    packrat/lib-ext/
    Ignored:    packrat/lib/
    Ignored:    packrat/src/

Untracked files:
    Untracked:  .here

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File  Version  Author       Date        Message
html  3aee926  Marek Cmero  2020-05-18  Build site.
html  bf478ec  Marek Cmero  2020-05-12  Build site.
html  a166ab8  Marek Cmero  2020-05-08  Build site.
html  a600688  Marek Cmero  2020-05-07  Build site.
Rmd   0fde0b8  Marek Cmero  2020-05-07  Added RCH B-ALL analysis
html  1c40e33  Marek Cmero  2020-05-07  Build site.
Rmd   bbc278a  Marek Cmero  2020-05-07  Refactoring
html  87b4e62  Marek Cmero  2020-05-07  Build site.
Rmd   af503f2  Marek Cmero  2020-05-07  Refactoring
html  5c045b5  Marek Cmero  2020-05-07  Build site.
Rmd   d8d5b96  Marek Cmero  2020-05-07  Added Leucegene variant validation
html  90c7fd9  Marek Cmero  2020-05-06  Build site.
Rmd   44d8c37  Marek Cmero  2020-05-06  Build leucegene validation notebook.
Rmd   ff4b1dc  Marek Cmero  2020-05-06  Leucegene results

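# util
library(data.table)
library(dplyr)
library(here)
library(stringr)

# plotting
library(ggplot2)

# keep strings as character vectors when reading tables
# (before R 4.0, the default converts them to factors)
options(stringsAsFactors = FALSE)

# load the helper functions used below
source(here("code/leucegene_helper.R"))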

Leucegene Validation

Here we analyse the results of running MINTIE on Leucegene samples from three cohorts: core-binding factor (CBF) AML, KMT2A-PTD and NUP98-NSD1. For the KMT2A-PTD cohort, we also examine how the choice of controls affects the number of variants called. Finally, we check whether MINTIE called the known variants in each cohort.

# load SRX to patient ID lookup table
kmt2a_patient_lookup <- read.delim(here("data/leucegene/sample_info/KMT2A-PTD_samples.txt"),
                                   header = FALSE,
                                   col.names = c("sample", "patient"))

kmt2a_results_dir <- here("data/leucegene/KMT2A-PTD_results")

# load KMT2A cohort comparisons against all other controls
kmt2a_results <- load_controls_comparison(kmt2a_results_dir)
kmt2a_results <- inner_join(kmt2a_results, kmt2a_patient_lookup, by = "sample")

# load the other validation results and the truth table
truth <- read.delim(here("data/leucegene/sample_info/variant_validation_table.tsv"), sep = "\t")
leucegene_results_dir <- here("data/leucegene/validation_results")

# read every results file, stack them into one table and keep variants with logFC > 5
validation <- list.files(leucegene_results_dir, full.names = TRUE) %>%
    lapply(read.delim) %>%
    rbindlist() %>%
    filter(logFC > 5)
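The loading step above follows a common pattern: list every results file in a directory, read each one, stack them into a single table and keep only rows above a log fold-change threshold. Below is a self-contained sketch of the same pattern on toy files written to a temporary directory. The file names, genes and fold changes are made up for illustration, and base R `do.call(rbind, ...)` and `subset()` stand in for `rbindlist()` and `filter()`.

```r
# Write two toy per-sample result tables (made-up values) to a temporary directory
results_dir <- file.path(tempdir(), "toy_validation_results")
dir.create(results_dir, showWarnings = FALSE)

write.table(data.frame(sample = "S1", gene = c("A", "B"), logFC = c(7.2, 2.1)),
            file.path(results_dir, "S1_results.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(sample = "S2", gene = c("C", "D"), logFC = c(5.5, 9.0)),
            file.path(results_dir, "S2_results.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# Read every results file, stack the tables, then keep variants with logFC > 5
result_files <- list.files(results_dir, pattern = "\\.tsv$", full.names = TRUE)
combined <- do.call(rbind, lapply(result_files, read.delim))
toy_validation <- subset(combined, logFC > 5)
print(toy_validation)
```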

KMT2A-PTD controls comparison

MINTIE paper Supplementary Figure 2: the number of genes with variants called by MINTIE in each sample of the Leucegene KMT2A-PTD cohort, using each of three control sets (13 AMLs, 13 normals and 3 normals).

results_summary <- get_results_summary(mutate(kmt2a_results,
                                              sample = patient,
                                              group_var = controls),
                                       group_var_name = "controls")

print("Total variant genes called using different controls:")
[1] "Total variant genes called using different controls:"
results_summary %>%
    group_by(controls) %>%
    summarise(min = min(V1), median = median(V1), max = max(V1)) %>%
    data.frame() %>%
    print()
                 controls min median  max
1            AML_controls 129  213.5 2093
2         normal_controls 266  508.0 2365
3 normal_controls_reduced 501  794.0 2562
ggplot(results_summary, aes(sample, V1, fill = controls)) +
    geom_bar(position = position_dodge2(width = 0.9, preserve = "single"), stat = "identity") +
    theme_bw() +
    xlab("") +
    ylab("Genes with variants") +
    coord_flip() +
    theme(legend.position = "bottom") +
    scale_fill_brewer(palette="Dark2",
                      labels =  c("AML_controls" = "13 AMLs",
                                  "normal_controls" = "13 normals",
                                  "normal_controls_reduced" = "3 normals"))

[Figure: MINTIE paper Supplementary Figure 2, genes with variants per patient, coloured by control set (13 AMLs, 13 normals, 3 normals).]
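The summary printed above reports, for each control set, the minimum, median and maximum number of variant genes across samples. The sketch below shows the same per-group calculation in base R with `split()` and `sapply()`. The counts are invented for illustration and are not the Leucegene results.

```r
# Toy per-sample counts of genes with variants (illustrative values only)
toy_summary <- data.frame(
  sample   = rep(c("P1", "P2", "P3", "P4"), times = 2),
  controls = rep(c("AML_controls", "normal_controls"), each = 4),
  V1       = c(120, 200, 250, 900, 300, 450, 600, 1000),
  stringsAsFactors = FALSE
)

# Split the counts by control set, then compute min, median and max per group
per_group <- split(toy_summary$V1, toy_summary$controls)
toy_stats <- data.frame(
  controls  = names(per_group),
  min       = sapply(per_group, min),
  median    = sapply(per_group, median),
  max       = sapply(per_group, max),
  row.names = NULL
)
print(toy_stats)
```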

KMT2A variants found in cohort

MINTIE paper Supplementary Table 1: whether MINTIE found a KMT2A SV in each sample, for each control set. The coverage of each KMT2A PTD comes from the Audemard et al. spreadsheet of Leucegene results, which must be added manually to data/leucegene/sample_info (as KMT2A-PTD_8-2.fa.xls) before this code can be run.
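The code that follows builds the table in three steps: it keeps the maximum coverage value per patient, sorts by decreasing coverage, and flags with `%in%` whether each sample is among those in which a KMT2A SV was called for each control set. A base-R sketch of that logic on toy data is shown here. The sample IDs, coverage values and "found" sets are hypothetical, and `aggregate()` and `order()` stand in for the dplyr verbs used in the analysis.

```r
# Hypothetical km-style rows: possibly several coverage values per sample
km_rows <- data.frame(sample       = c("S1", "S1", "S2", "S3", "S3", "S4"),
                      min_coverage = c(10, 15, 120, 40, 22, 3),
                      stringsAsFactors = FALSE)

# One row per sample, keeping the maximum coverage value
toy_table <- aggregate(min_coverage ~ sample, data = km_rows, FUN = max)
names(toy_table)[names(toy_table) == "min_coverage"] <- "coverage"

# Hypothetical sets of samples in which a KMT2A SV was called
found_with_amls    <- c("S2", "S3")
found_with_normals <- c("S1", "S2", "S3")

# Sort by decreasing coverage, then flag membership in each set with %in%
toy_table <- toy_table[order(toy_table$coverage, decreasing = TRUE), ]
toy_table$AMLs    <- toy_table$sample %in% found_with_amls
toy_table$normals <- toy_table$sample %in% found_with_normals
print(toy_table)
```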

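# load the km results (Audemard et al.) to get the coverage of each KMT2A PTD;
# despite its .xls extension, the file is read as tab-delimited text
kmt2a_lgene_km_results <- read.delim(here("data/leucegene/sample_info/KMT2A-PTD_8-2.fa.xls")) %>%
                            mutate(patient = Sample) %>%
                            group_by(patient) %>%
                            # one row per patient: the maximum of its Min.coverage values
                            summarise(coverage = max(Min.coverage))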

# check whether MINTIE found a KMT2A SV in each control set
found_using_cancon <- get_samples_with_kmt2a_sv(kmt2a_results, "AML_controls")
found_using_norcon <- get_samples_with_kmt2a_sv(kmt2a_results, "normal_controls")
found_using_redcon <- get_samples_with_kmt2a_sv(kmt2a_results, "normal_controls_reduced")

# make the table
kmt2a_control_comp <- inner_join(kmt2a_patient_lookup, kmt2a_lgene_km_results, by = "patient") %>%
                        arrange(desc(coverage))
kmt2a_control_comp$`13_AMLs` <- kmt2a_control_comp$sample %in% found_using_cancon
kmt2a_control_comp$`13_normals` <- kmt2a_control_comp$sample %in% found_using_norcon
kmt2a_control_comp$`3_normals` <- kmt2a_control_comp$sample %in% found_using_redcon

print(kmt2a_control_comp)
      sample patient coverage 13_AMLs 13_normals 3_normals
1  SRX958906  07H152      158    TRUE       TRUE      TRUE
2  SRX332646  09H115      125    TRUE       TRUE      TRUE
3  SRX957230  06H146       87    TRUE       TRUE      TRUE
4  SRX957223  05H111       79    TRUE       TRUE      TRUE
5  SRX332659  11H021       63    TRUE       TRUE      TRUE
6  SRX332633  05H050       58    TRUE       TRUE      TRUE
7  SRX959061  13H150       58    TRUE       TRUE      TRUE
8  SRX959044  13H048       57    TRUE       TRUE      TRUE
9  SRX958974  10H070       53    TRUE       TRUE      TRUE
10 SRX958963  10H007       50    TRUE       TRUE      TRUE
11 SRX958959  09H106       49    TRUE       TRUE      TRUE
12 SRX959060  13H141       45    TRUE       TRUE      TRUE
13 SRX958945  09H058       29    TRUE       TRUE      TRUE
14 SRX958907  07H155       23    TRUE       TRUE      TRUE
15 SRX381854  08H112       22    TRUE       TRUE      TRUE
16 SRX332645  09H113       17    TRUE       TRUE      TRUE
17 SRX959001  11H183       16   FALSE      FALSE     FALSE
18 SRX381852  08H012       15   FALSE      FALSE     FALSE
19 SRX958932  08H138       15   FALSE      FALSE     FALSE
20 SRX381865  11H008       13   FALSE      FALSE     FALSE
21 SRX958873  06H048       10   FALSE      FALSE     FALSE
22 SRX958922  08H063        6   FALSE      FALSE     FALSE
23 SRX958961  10H001        6   FALSE      FALSE     FALSE
24 SRX958844  04H111        3   FALSE      FALSE     FALSE

Leucegene variants found by MINTIE

# add KMT2A results against AML controls as validation
validation <- filter(kmt2a_results, controls == "AML_controls") %>%
                select(-c(controls, patient)) %>%
                rbind(., validation)

rowwise(truth) %>% 
    mutate(found = is_variant_in_sample(Experiment, gene1, gene2, variant, validation)) %>%
    data.frame() %>%
    print()
   patient_ID Experiment  gene1   gene2 variant     cohort found
1      03H065  SRX729615 RUNX11 RUNX1T1  fusion        CBF  TRUE
2      03H083  SRX729616 RUNX11 RUNX1T1  fusion        CBF  TRUE
3      03H095  SRX729602   CBFB   MYH11  fusion        CBF  TRUE
4      03H109  SRX729580   CBFB   MYH11  fusion        CBF  TRUE
5      03H112  SRX729581   CBFB   MYH11  fusion        CBF  TRUE
6      03H112  SRX729581   FLT3             ITD        CBF  TRUE
7      04H030  SRX729603   CBFB   MYH11  fusion        CBF  TRUE
8      04H061  SRX729582   CBFB   MYH11  fusion        CBF  TRUE
9      04H091  SRX729583   CBFB   MYH11  fusion        CBF  TRUE
10     05H042  SRX729617 RUNX11 RUNX1T1  fusion        CBF  TRUE
11     05H099  SRX958862   CBFB   MYH11  fusion        CBF  TRUE
12     05H113  SRX729604   CBFB   MYH11  fusion        CBF  TRUE
13     05H118  SRX729618 RUNX11 RUNX1T1  fusion        CBF  TRUE
14     05H136  SRX729605   CBFB   MYH11  fusion        CBF  TRUE
15     05H184  SRX729619 RUNX11 RUNX1T1  fusion        CBF  TRUE
16     06H020  SRX729606   CBFB   MYH11  fusion        CBF  TRUE
17     06H035  SRX729620 RUNX11 RUNX1T1  fusion        CBF  TRUE
18     06H115  SRX729607   CBFB   MYH11  fusion        CBF  TRUE
19     07H099  SRX381851   CBFB   MYH11  fusion        CBF  TRUE
20     07H137  SRX729621 RUNX11 RUNX1T1  fusion        CBF  TRUE
21     07H144  SRX729585   CBFB   MYH11  fusion        CBF  TRUE
22     08H034  SRX729622 RUNX11 RUNX1T1  fusion        CBF  TRUE
23     08H042  SRX729623 RUNX11 RUNX1T1  fusion        CBF  TRUE
24     08H072  SRX729624 RUNX11 RUNX1T1  fusion        CBF  TRUE
25     08H072  SRX729624   FLT3             ITD        CBF  TRUE
26     08H081  SRX729586   CBFB   MYH11  fusion        CBF  TRUE
27     08H099  SRX729608   CBFB   MYH11  fusion        CBF  TRUE
28     09H016  SRX729587   CBFB   MYH11  fusion        CBF FALSE
29     09H040  SRX729625 RUNX11 RUNX1T1  fusion        CBF  TRUE
30     09H066  SRX729588   CBFB   MYH11  fusion        CBF  TRUE
31     10H008  SRX729609   CBFB   MYH11  fusion        CBF  TRUE
32     10H030  SRX729626 RUNX11 RUNX1T1  fusion        CBF  TRUE
33     10H119  SRX729627 RUNX11 RUNX1T1  fusion        CBF  TRUE
34     11H022  SRX729610   CBFB   MYH11  fusion        CBF  TRUE
35     11H022  SRX729610   FLT3             ITD        CBF FALSE
36     11H104  SRX729589   CBFB   MYH11  fusion        CBF  TRUE
37     11H107  SRX729628 RUNX11 RUNX1T1  fusion        CBF  TRUE
38     11H179  SRX729611   CBFB   MYH11  fusion        CBF  TRUE
39     12H042  SRX729590   CBFB   MYH11  fusion        CBF FALSE
40     12H044  SRX729591   CBFB   MYH11  fusion        CBF  TRUE
41     12H045  SRX729629 RUNX11 RUNX1T1  fusion        CBF  TRUE
42     12H098  SRX729630 RUNX11 RUNX1T1  fusion        CBF  TRUE
43     12H165  SRX729592   CBFB   MYH11  fusion        CBF  TRUE
44     12H166  SRX729631 RUNX11 RUNX1T1  fusion        CBF  TRUE
45     12H180  SRX729632 RUNX11 RUNX1T1  fusion        CBF  TRUE
46     12H183  SRX729633 RUNX11 RUNX1T1  fusion        CBF  TRUE
47     13H066  SRX729612   CBFB   MYH11  fusion        CBF  TRUE
48     13H120  SRX959058   CBFB   MYH11  fusion        CBF  TRUE
49     13H169  SRX959064 RUNX11 RUNX1T1  fusion        CBF FALSE
50     04H111  SRX958844  KMT2A             PTD  KMT2A-PTD FALSE
51     05H050  SRX332633  KMT2A             PTD  KMT2A-PTD  TRUE
52     05H111  SRX957223  KMT2A             PTD  KMT2A-PTD  TRUE
53     06H048  SRX958873  KMT2A             PTD  KMT2A-PTD FALSE
54     06H146  SRX957230  KMT2A             PTD  KMT2A-PTD  TRUE
55     07H152  SRX958906  KMT2A             PTD  KMT2A-PTD FALSE
56     07H155  SRX958907  KMT2A             PTD  KMT2A-PTD FALSE
57     08H012  SRX381852  KMT2A             PTD  KMT2A-PTD FALSE
58     08H063  SRX958922  KMT2A             PTD  KMT2A-PTD FALSE
59     08H112  SRX381854  KMT2A             PTD  KMT2A-PTD  TRUE
60     08H138  SRX958932  KMT2A             PTD  KMT2A-PTD FALSE
61     09H058  SRX958945  KMT2A             PTD  KMT2A-PTD  TRUE
62     09H106  SRX958959  KMT2A             PTD  KMT2A-PTD  TRUE
63     09H113  SRX332645  KMT2A             PTD  KMT2A-PTD  TRUE
64     09H115  SRX332646  KMT2A             PTD  KMT2A-PTD FALSE
65     10H001  SRX958961  KMT2A             PTD  KMT2A-PTD FALSE
66     10H007  SRX958963  KMT2A             PTD  KMT2A-PTD  TRUE
67     10H070  SRX958974  KMT2A             PTD  KMT2A-PTD  TRUE
68     11H008  SRX381865  KMT2A             PTD  KMT2A-PTD FALSE
69     11H021  SRX332659  KMT2A             PTD  KMT2A-PTD FALSE
70     11H183  SRX959001  KMT2A             PTD  KMT2A-PTD FALSE
71     13H048  SRX959044  KMT2A             PTD  KMT2A-PTD  TRUE
72     13H141  SRX959060  KMT2A             PTD  KMT2A-PTD  TRUE
73     13H150  SRX959061  KMT2A             PTD  KMT2A-PTD FALSE
74     03H041  SRX332627  NUP98    NSD1  fusion NUP98-NSD1  TRUE
75     03H041  SRX332627   FLT3             ITD NUP98-NSD1  TRUE
76     05H034  SRX958856  NUP98    NSD1  fusion NUP98-NSD1  TRUE
77     05H163  SRX332635  NUP98    NSD1  fusion NUP98-NSD1  TRUE
78     08H049  SRX958915  NUP98    NSD1  fusion NUP98-NSD1  TRUE
79     08H049  SRX958915   FLT3             ITD NUP98-NSD1  TRUE
80     10H038  SRX381861  NUP98    NSD1  fusion NUP98-NSD1  TRUE
81     11H027  SRX958987  NUP98    NSD1  fusion NUP98-NSD1  TRUE
82     11H027  SRX958987   FLT3             ITD NUP98-NSD1  TRUE
83     11H160  SRX332667  NUP98    NSD1  fusion NUP98-NSD1  TRUE
84     11H160  SRX332667   FLT3             ITD NUP98-NSD1  TRUE
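`is_variant_in_sample()` is defined in `code/leucegene_helper.R`, which is not shown here. As a rough illustration only, the toy sketch below shows the general idea of a row-wise check of a truth table against a set of calls: for each truth-table row, test whether the expected gene(s) were called in that sample. The function `toy_variant_found()` and the call-table columns (`sample`, `gene`) are hypothetical assumptions and do not reproduce the helper's actual logic, which may also match on variant type. Here `mapply()` plays the role of `rowwise()` plus `mutate()`.

```r
# Hypothetical call table: one row per gene with a variant call in a sample
toy_calls <- data.frame(sample = c("X1", "X1", "X2"),
                        gene   = c("CBFB", "MYH11", "FLT3"),
                        stringsAsFactors = FALSE)

# Hypothetical truth table; gene2 is blank for single-gene variants (e.g. ITD, PTD)
toy_truth <- data.frame(Experiment = c("X1", "X2", "X3"),
                        gene1      = c("CBFB", "FLT3", "KMT2A"),
                        gene2      = c("MYH11", "", ""),
                        stringsAsFactors = FALSE)

# TRUE if every expected (non-blank) gene was called in the given sample
toy_variant_found <- function(sample, gene1, gene2, calls) {
  genes_called <- calls$gene[calls$sample == sample]
  expected <- c(gene1, gene2)
  expected <- expected[expected != ""]
  all(expected %in% genes_called)
}

# Apply the check to each truth-table row
toy_truth$found <- mapply(toy_variant_found,
                          toy_truth$Experiment, toy_truth$gene1, toy_truth$gene2,
                          MoreArgs = list(calls = toy_calls),
                          USE.NAMES = FALSE)
print(toy_truth)
```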

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /config/RStudio/R/3.6.1/lib64/R/lib/libRblas.so
LAPACK: /config/RStudio/R/3.6.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.2.1     stringr_1.4.0     here_0.1          dplyr_0.8.3      
[5] data.table_1.12.6

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2         knitr_1.25         whisker_0.4       
 [4] magrittr_1.5       workflowr_1.4.0    munsell_0.5.0     
 [7] tidyselect_0.2.5   colorspace_1.4-1   R6_2.4.0          
[10] rlang_0.4.0        tools_3.6.1        grid_3.6.1        
[13] gtable_0.3.0       xfun_0.10          withr_2.1.2       
[16] git2r_0.26.1       htmltools_0.3.6    lazyeval_0.2.2    
[19] yaml_2.2.0         rprojroot_1.3-2    digest_0.6.21     
[22] assertthat_0.2.1   tibble_2.1.3       crayon_1.3.4      
[25] RColorBrewer_1.1-2 purrr_0.3.2        fs_1.3.1          
[28] glue_1.3.1         evaluate_0.14      rmarkdown_1.16    
[31] labeling_0.3       stringi_1.4.3      compiler_3.6.1    
[34] pillar_1.4.2       scales_1.0.0       backports_1.1.4   
[37] pkgconfig_2.0.3