twinstrand_spikeins_data

Last updated: 2020-04-20

Checks: 7 passed, 0 warnings

Knit directory: duplex_sequencing_screen/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The reproducibility checks that were applied when the results were created are described below.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20200402)

The command set.seed(20200402) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 2bba93e

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  analysis/grant_fig.pdf
    Untracked:  analysis/grant_fig_v2.pdf
    Untracked:  data/Combined_data_frame_IC_Mutprob_abundance.csv
    Untracked:  data/IC50HeatMap.csv
    Untracked:  data/Twinstrand/
    Untracked:  data/gfpenrichmentdata.csv
    Untracked:  data/heatmap_concat_data.csv
    Untracked:  grant_fig.pdf
    Untracked:  grant_fig_v2.pdf
    Untracked:  output/archive/
    Untracked:  output/ic50data_all_conc.csv
    Untracked:  shinyapp/

Unstaged changes:
    Deleted:    data/README.md
    Modified:   output/twinstrand_maf_merge.csv
    Modified:   output/twinstrand_simple_melt_merge.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the R Markdown and HTML files:

File	Version	Author	Date	Message
Rmd	2bba93e	haiderinam	2020-04-20	wflow_publish("analysis/*.Rmd")
html	c2930d5	haiderinam	2020-04-03	Build site.
html	6af2cdc	haiderinam	2020-04-03	Build site.
Rmd	5cda9d3	haiderinam	2020-04-03	wflow_publish("analysis/*.Rmd")
html	0b9b87b	haiderinam	2020-04-02	Build site.
Rmd	fc5b9c0	haiderinam	2020-04-02	wflow_publish("analysis/*.Rmd")
html	2bce927	haiderinam	2020-04-02	Build site.
html	4ed9b35	haiderinam	2020-04-02	Build site.
html	1e2f469	haiderinam	2020-04-02	Build site.
Rmd	50aa720	haiderinam	2020-04-02	wflow_publish("analysis/*.Rmd")
html	e11eec5	haiderinam	2020-04-02	Build site.
html	b1cbbfa	haiderinam	2020-04-02	Build site.
Rmd	4d5a7ce	haiderinam	2020-04-02	wflow_publish("analysis/*.Rmd")

Please change the required directories in this chunk if compiling in R rather than R Markdown (Rmd).

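#Loading required packages (all appear under "other attached packages" in the sessionInfo() output at the end)
library(reshape2)      # melt(), dcast()
library(ggplot2)
library(plotly)        # ggplotly()
library(RColorBrewer)  # brewer.pal()
library(drc)           # drm(), LL.3()
library(dplyr)         # loaded after drc so that dplyr::select() is not masked by MASS::select()
#Note: the plots below add `cleanup`, a ggplot2 theme object that is not defined in this report; it is assumed to be defined in a setup chunk that is not shown

#Inputs:
conc_for_predictions=0.8
net_gr_wodrug=0.05
#Reading required tables
ic50data=read.csv("data/heatmap_concat_data.csv",header = T,stringsAsFactors = F)
# ic50data=read.csv("../data/heatmap_concat_data.csv",header = T,stringsAsFactors = F)

twinstrand_maf=read.table("data/Twinstrand/prj00053-2019-12-02.deliverables/all.mut",sep="\t",header = T,stringsAsFactors = F)
# twinstrand_maf=read.table("../data/Twinstrand/prj00053-2019-12-02.deliverables/all.mut",sep="\t",header = T,stringsAsFactors = F)

names=read.table("data/Twinstrand/prj00053-2019-12-02.deliverables/manifest.tsv",sep="\t",header = T,stringsAsFactors = F)
# names=read.table("../data/Twinstrand/prj00053-2019-12-02.deliverables/manifest.tsv",sep="\t",header = T,stringsAsFactors = F)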

Data Parsing – Dose Response Data

Importing BCR-ABL mutant dose responses (Chuan’s data)

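Also applying a 2-parameter logistic. The curve has the 4-parameter log-logistic form, but its lower and upper limits are fixed at 0 and 1, so only the slope (b) and the inflection point (e, which is then the IC50) are fitted. Fixing the limits may or may not be appropriate for every mutant.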
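As a sketch of the curve described above (a hypothetical helper, not code from the original analysis), the fitted model has the log-logistic form used by drc, y = c + (d - c) / (1 + exp(b(log(x) - log(e)))). The lower limit c is fixed at 0 and the upper limit d at 1. With those limits the response at x = e is exactly 0.5, so e is the IC50. The parameter values below are made up for illustration.

```r
# Hypothetical helper: log-logistic dose response with fixed limits.
# lower (c) is fixed at 0 and upper (d) at 1, as in LL.3(fixed = c(NA, 1, NA));
# b is the slope and e the inflection point (the IC50 when lower = 0 and upper = 1).
ll_response <- function(conc, b, e, lower = 0, upper = 1) {
  lower + (upper - lower) / (1 + exp(b * (log(conc) - log(e))))
}

# Made-up parameters: slope b = 2, IC50 e = 0.5
b <- 2
e <- 0.5
doses <- c(0, 0.25, 0.5, 1)
responses <- ll_response(doses, b = b, e = e)
responses  # 1.0 0.8 0.5 0.2
```

Because both limits are fixed, only b and e have to be estimated from each mutant's data. A dose of 0 gives log(0) = -Inf, which the formula maps to the upper limit of 1 when b > 0.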

#Deciding not to use nls() because it's a pain in the ...
#https://www.youtube.com/watch?v=aXpJE7IGiPY this has a nice overview of curve fitting
# library(dplyr)
# rm(list=ls())
####Getting effect of drug on growth rate####

ic50data=ic50data[c(1:10),]
ic50data_long=melt(ic50data,id.vars = "conc",variable.name = "species",value.name = "y")
#Removing unneeded mutants (for example, keeping only maxiprep samples and removing low-growth-rate mutants)
ic50data_long=ic50data_long%>%filter(species%in%c("Wt","V299L_H","E355A","D276G_maxi","H396R","F317L","F359I","E459K","G250E","F359C","F359V","M351T","L248V","E355G_maxi","Q252H_maxi","Y253F","F486S_maxi","H396P_maxi","E255K","Y253H","T315I","E255V"))



#Making standardized names
ic50data_long$mutant=ic50data_long$species
ic50data_long=ic50data_long%>%
  # filter(conc=="0.625")%>%
  # filter(conc=="1.25")%>%
  mutate(mutant=case_when(species=="F486S_maxi"~"F486S",
                          species=="H396P_maxi"~"H396P",
                          species=="Q252H_maxi"~"Q252H",
                          species=="E355G_maxi"~"E355G",
                          species=="D276G_maxi"~"D276G",
                          species=="V299L_H" ~ "V299L",
                          TRUE ~ as.character(species)))

# ic50data_long_625$species[order((ic50data_long_625$y),decreasing = T)]

#In the next step, I'm ordering mutants by decreasing response to the 625nM dose. Then I use this to change the levels of the species factor from more to less resistant. This helps with ggplot because now I can color the mutants by decreasing resistance
ic50data_long_625=ic50data_long%>%filter(conc==.625)
ic50data_long$species=factor(ic50data_long$species,levels = as.character(ic50data_long_625$species[order((ic50data_long_625$y),decreasing = T)]))

#Plotting the normalized dose response curves
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))

plotly=ggplot(ic50data_long,aes(x=log(conc),y=y,color=factor(species)))+
  facet_wrap(~factor(species))+
  geom_line()+
  geom_point()+
  cleanup+
  scale_color_manual(values = getPalette(length(unique(ic50data_long$species))))+
  theme(axis.text = element_blank(),
        axis.ticks = element_blank())
ggplotly(plotly)

Dose response curve fitting with 4-parameter logistic

First iteration: Get y_model for only the drug concentrations Chuan used. Essentially, this adds a y_model column to ic50data_long (the default was just y, the proportion alive). The lower and upper limits are fixed at 0 and 1, so only b and e are estimated.

########Four parameter logistic########
#Reference: https://journals.plos.org/plosone/article/file?type=supplementary&id=info:doi/10.1371/journal.pone.0146021.s001
#In short: for each species, fit a dose response curve and get the modeled response at each dose
# rm(list=ls())
ic50data_long_model=data.frame()
for (species_curr in sort(unique(ic50data_long$species))){
  ic50data_species_specific=ic50data_long%>%filter(species==species_curr)
  #Fitting the curve: LL.3 fixes the lower limit c at 0, and fixed=c(NA,1,NA) also fixes the upper limit d at 1, so only b (slope) and e (inflection point) are estimated
  ic50.ll4=drm(y~conc,data=ic50data_species_specific,fct=LL.3(fixed=c(NA,1,NA)))
  b=coef(ic50.ll4)[1]
  c=0
  d=1
  e=coef(ic50.ll4)[2]
  ###Getting predictions from the 4-parameter logistic formula with c=0 and d=1
  #The formula is vectorized, so no group_by() is needed; without it the result stays a plain data.frame and rbind() works directly
  ic50data_species_specific=ic50data_species_specific%>%mutate(y_model=c+((d-c)/(1+exp(b*(log(conc)-log(e))))))
  ic50data_long_model=rbind(ic50data_long_model,ic50data_species_specific)
}
ic50data_long=ic50data_long_model

#In the next step, I'm ordering mutants by decreasing response to the 625nM dose. Then I use this to change the levels of the species factor from more to less resistant. This helps with ggplot because now I can color the mutants by decreasing resistance
ic50data_long_625=ic50data_long%>%filter(conc==.625)
ic50data_long$species=factor(ic50data_long$species,levels = as.character(ic50data_long_625$species[order((ic50data_long_625$y_model),decreasing = T)]))

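#Adding drug effect: converting the modeled 72-hour response into a per-hour reduction in growth rate
##########Changed this on 2/20. Using y from 4 parameter logistic rather than raw values
ic50data_long=ic50data_long%>%
  filter(species!="Wt")%>%
  mutate(drug_effect=-log(y_model)/72)

#Adding net growth rate (net_gr_wodrug is the assumed growth rate without drug, 0.05 per hour)
ic50data_long$netgr_pred=net_gr_wodrug-ic50data_long$drug_effect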
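The drug_effect and netgr_pred columns in this chunk can be read as the following short derivation. This is a sketch under an assumption the report does not state explicitly. The assumption is that y_model is the fraction of cells remaining after the 72-hour dose-response assay relative to untreated cells, and that the drug lowers the per-hour net growth rate k by a constant amount δ over that period.

```latex
\frac{N_{\mathrm{drug}}(t)}{N_{\mathrm{no\ drug}}(t)}
  = \frac{N_0\, e^{(k-\delta)t}}{N_0\, e^{kt}}
  = e^{-\delta t} = y
\;\Longrightarrow\;
\delta = -\frac{\ln y}{t} = -\frac{\ln y}{72},
\qquad
k_{\mathrm{pred}} = k - \delta
```

Here δ is drug_effect, k is the growth rate without drug (0.05 per hour), and k_pred is netgr_pred. Under this reading, y = 1 gives δ = 0, and lower values of y give larger drug effects.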

Plotting modeled dose responses

getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
plotly=ggplot(ic50data_long,aes(x=log(conc),color=factor(species)))+
  facet_wrap(~factor(species))+
  geom_line(aes(y=y_model))+
  geom_point(aes(y=y))+
  cleanup+
  scale_color_manual(values = getPalette(length(unique(ic50data_long$species))))+
  theme(axis.text = element_blank(),
        axis.ticks = element_blank())
ggplotly(plotly)

Plotting how each species’ modeled response changes across concentrations

plotly=ggplot(ic50data_long,aes(x=species,y=y_model))+
  facet_wrap(~factor(conc))+
  geom_col(aes(fill=factor(species)))+
  cleanup+
  scale_fill_manual(values = getPalette(length(unique(ic50data_long$species))))+
  theme(axis.text = element_blank(),
        axis.ticks = element_blank())
ggplotly(plotly)

Dose response curve fitting with 4-parameter logistic

Second iteration: Get y_model for predefined concentration ranges of interest

conc.list=seq(.5,1.5,by=.1)
ic50.model.pred=data.frame()
for(species_curr in sort(unique(ic50data_long$mutant))){
  ic50data_species_specific=ic50data_long%>%filter(mutant==species_curr)
  #Fitting the dose response curve (lower limit fixed at 0, upper limit fixed at 1)
  ic50.ll4=drm(y~conc,data=ic50data_species_specific,fct=LL.3(fixed=c(NA,1,NA)))
  #Extracting coefficients
  b=coef(ic50.ll4)[1]
  c=0
  d=1
  e=coef(ic50.ll4)[2]
  #Predicting the response at every concentration in conc.list (the formula is vectorized, so no inner loop or rm() is needed)
  ic50.model.pred.species.specific=data.frame(mutant=species_curr,conc=conc.list,stringsAsFactors = F)
  ic50.model.pred.species.specific$y_model=c+((d-c)/(1+exp(b*(log(conc.list)-log(e)))))
  ic50.model.pred=rbind(ic50.model.pred,ic50.model.pred.species.specific)
}

#Adding drug effect
ic50.model.pred=ic50.model.pred%>%
  filter(mutant!="Wt")%>%
  mutate(drug_effect=-log(y_model)/72)

#Adding Net growth rate
# ic50.model.pred$netgr_pred=.05-ic50.model.pred$drug_effect
ic50data_long=ic50.model.pred
ic50data_all_conc=ic50data_long

Changing the format of the IC50 dataframe so that it matches the Twinstrand data labeling

Also converting dose responses to expected changes in growth rate

This requires estimating a growth rate without drug. Note that I am currently using k = 0.05 per hour (net_gr_wodrug), which corresponds to a doubling time of ln(2)/0.05 ≈ 14 hours.
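Below is a quick numeric check of this conversion. It assumes the same formulas as the code in this report: drug_effect = -log(y_model)/72 and netgr_pred = net_gr_wodrug - drug_effect. The y_model value 0.8162648 is the T315I value at conc = 0.8 from the head(twinstrand_simple_melt_merge) output in the Saving Dataframes section. The variable names assay_hours and doubling_time are illustrative.

```r
net_gr_wodrug <- 0.05   # assumed net growth rate without drug, per hour
assay_hours <- 72       # duration of the dose-response assay, in hours

# Doubling time implied by net_gr_wodrug
doubling_time <- log(2) / net_gr_wodrug

# Modeled fraction alive for T315I at conc = 0.8 (value copied from the printed output)
y_model <- 0.8162648
drug_effect <- -log(y_model) / assay_hours
netgr_pred <- net_gr_wodrug - drug_effect

round(c(doubling_time = doubling_time, drug_effect = drug_effect, netgr_pred = netgr_pred), 6)
```

The resulting drug_effect (about 0.00282 per hour) and netgr_pred (about 0.0472 per hour) agree with the drug_effect and netgr_pred columns printed for that row.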

#Variables when making predictions:
#Your assumed fitness without drug
ic50data_long$netgr_pred=net_gr_wodrug-ic50data_long$drug_effect
#Your assumed concentration
ic50data_long=ic50data_long%>%filter(conc==conc_for_predictions) ###Remove this filter to check how well predictions would match if there were a systematic difference between the concentrations Chuan used and the ones you used in your IC50s
##########Changed this on 2/20. Using y from 4 parameter logistic rather than raw values
# ic50data_formerge=ic50data_long%>%filter(!species=="Wt")%>%mutate(drug_effect=-log(y)/72)
# ic50data_formerge=ic50data_long%>%filter(!species=="Wt")%>%mutate(drug_effect=-log(y_model)/72)

Data Parsing – Duplex Sequencing Data

Importing the Twinstrand mutation calls dataframe

The Twinstrand dataframe contains sample IDs. I’m merging it with a ‘names’ dataframe (read from manifest.tsv) that describes what each sample ID means.

Here I also converted genomic coordinates and nucleotide changes to residue changes. I did this for all 20 of our spike-in mutants and for others that I could find.

The other mutants are unique mutants found in the ENU data, i.e. A397P, F311L, F359C, H214R, H396P, K285N, L324R.

In the future, I’d like to use Biomart or a similar package that can do this automatically, and ideally to convert the FASTA/BAM files into MAF files myself.

I got residues and positions from here: https://www.rcsb.org/pdb/chromosome.do?v=hg38&chromosome=chr9&pos=130862947

One thing that was tripping me up is that I was searching the database by start position rather than end position; the mutant calls below are matched on the End coordinate.

This NCBI tool is also a good resource: https://www.ncbi.nlm.nih.gov/genome/gdv/browser/genome/?id=GCF_000001405.39

However, this is probably the best tool to go straight from a genomic coordinate to a protein change: https://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/DisaStr/GetPage.pl?varmap=TRUE
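The coordinate-to-residue conversion in the next chunk is a long case_when() keyed on End and ALT. One alternative is to store the mapping as a data frame and left-join it onto the calls. This gives the same NA-for-unmatched behavior as case_when() while keeping the mapping in one editable table. The sketch below assumes a hypothetical lookup table, residue_lookup, and a few made-up calls. Only five of the chunk's mappings are reproduced.

```r
# Hypothetical lookup table: five (End, ALT) -> residue mappings copied from the case_when() below
residue_lookup <- data.frame(
  End    = c(130872896, 130862970, 130862977, 130874969, 130874969),
  ALT    = c("T",       "C",       "T",       "C",       "G"),
  mutant = c("T315I",   "Y253H",   "E255V",   "H396P",   "H396R"),
  stringsAsFactors = FALSE
)

# Toy mutation calls (made-up rows, not real data)
calls <- data.frame(
  Sample = c("s1", "s1", "s2"),
  End    = c(130872896, 130874969, 130872159),
  ALT    = c("T",       "G",       "G"),
  stringsAsFactors = FALSE
)

# Left join: calls with no entry in the lookup table get mutant = NA,
# just as case_when() returns NA when no condition matches
annotated <- merge(calls, residue_lookup, by = c("End", "ALT"), all.x = TRUE)
annotated <- annotated[order(annotated$Sample, annotated$End), ]
annotated
```

Because End and ALT are matched together, the two H396 variants at the same coordinate stay distinct.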

twinstrand_maf_merge=merge(twinstrand_maf,names,by.x = "Sample",by.y = "TwinstrandId")

#Of the 20 mutants, I don't see F486, F359C
twinstrand_maf_merge$mutant=0
twinstrand_maf_merge=twinstrand_maf_merge%>%
  mutate(mutant=case_when(End==130872896 & ALT=="T" ~ "T315I",
                          End==130862970 & ALT=="C" ~ "Y253H",
                          End==130862977 & ALT=="T" ~ "E255V",
                          End==130873004 & ALT=="C" ~ "M351T",
                          End==130862962 & ALT=="A" ~ "G250E",
                          End==130874969 & ALT=="C" ~ "H396P",
                          End==130862955 & ALT=="G" ~ "L248V",
                          End==130874969 & ALT=="G" ~ "H396R",
                          End==130862971 & ALT=="T" ~ "Y253F",
                          End==130862969 & ALT=="T" ~ "Q252H",
                          End==130862976 & ALT=="A" ~ "E255K",
                          End==130872901 & ALT=="C" ~ "F317L",
                          End==130873027 & ALT=="C" ~ "F359L",
                          End==130873027 & ALT=="G" ~ "F359V",
                          End==130873027 & ALT=="A" ~ "F359I",
                          End==130873016 & ALT=="G" ~ "E355G",
                          End==130873016 & ALT=="C" ~ "E355A",
                          End==130878519 & ALT=="A" ~ "E459K",
                          End==130872911 & ALT=="G" ~ "Y320C",
                          End==130872133 & ALT=="G" ~ "D276G",
                          End==130862969 & ALT=="C" ~ "Q252Hsyn", ###The mutants below were found only in the ENU mutagenized pools
                          End==130872885 & ALT=="G" ~ "F311L",
                          End==130873028 & ALT=="G" ~ "F359C",
                          End==130874971 & ALT=="C" ~ "A397P",
                          End==130862854 & ALT=="G" ~ "H214R",
                          End==130872146 & ALT=="C" ~ "V280syn",
                          End==130872161 & ALT=="T" ~ "K285N",
                          End==130872923 & ALT=="G" ~ "L324R",
                          End==130872983 & ALT=="T" ~ "A344D")) #Not observed on D6. Dropped out! Note that D276G looked like it was contaminant DNA because it was barely at detection threshold at D0

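#Ordering mutants by level of drug resistance. Since we don't know the level of drug resistance for the unique ENU mutants, they are appended at the end in no particular order. The synonymous variants (Q252Hsyn, V280syn) are not among the factor levels, so factor() turns them into NA.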
twinstrand_maf_merge$mutant=factor(twinstrand_maf_merge$mutant,levels = c("T315I","Y253H","E255V","M351T","G250E","H396P","L248V","H396R","Y253F","Q252H","E255K","F317L","F359L","F359V","F359I","E355G","E355A","E459K","Y320C","D276G","F311L","F359C","A397P","H214R","K285N","L324R","A344D"))



#Adding columns for experiment names, experiment frequencies, and time
##############Experiment Name#################
twinstrand_maf_merge$experiment[twinstrand_maf_merge$CustomerName%in%c("M3D0","M3D3","M3D6")]="M3"
twinstrand_maf_merge$experiment[twinstrand_maf_merge$CustomerName%in%c("M4D0","M4D3","M4D6")]="M4"
twinstrand_maf_merge$experiment[twinstrand_maf_merge$CustomerName%in%c("M5D0","M5D3","M5D6")]="M5"
twinstrand_maf_merge$experiment[twinstrand_maf_merge$CustomerName%in%c("M6D0","M6D3","M6D6")]="M6"
twinstrand_maf_merge$experiment[twinstrand_maf_merge$CustomerName%in%c("M7D0","M7D3","M7D6")]="M7"
twinstrand_maf_merge$experiment[twinstrand_maf_merge$CustomerName%in%c("Enu3_D3","Enu3_D6")]="Enu_3"
twinstrand_maf_merge$experiment[twinstrand_maf_merge$CustomerName%in%c("Enu4_D0","Enu4_D3","Enu4_D6")]="Enu_4"
##############Spike in frequency#################
twinstrand_maf_merge$Spike_in_freq[twinstrand_maf_merge$CustomerName%in%c("M3D0","M3D3","M3D6")]=1000
twinstrand_maf_merge$Spike_in_freq[twinstrand_maf_merge$CustomerName%in%c("M4D0","M4D3","M4D6")]=5000
twinstrand_maf_merge$Spike_in_freq[twinstrand_maf_merge$CustomerName%in%c("M5D0","M5D3","M5D6")]=1000
twinstrand_maf_merge$Spike_in_freq[twinstrand_maf_merge$CustomerName%in%c("M6D0","M6D3","M6D6")]=5000
twinstrand_maf_merge$Spike_in_freq[twinstrand_maf_merge$CustomerName%in%c("M7D0","M7D3","M7D6")]=1000
twinstrand_maf_merge$Spike_in_freq[twinstrand_maf_merge$CustomerName%in%c("Enu3_D3","Enu3_D6")]=1000
twinstrand_maf_merge$Spike_in_freq[twinstrand_maf_merge$CustomerName%in%c("Enu4_D0","Enu4_D3","Enu4_D6")]=1000
##############Time point#################
twinstrand_maf_merge$time_point[twinstrand_maf_merge$CustomerName%in%c("M3D0","M6D0","Enu4_D0")]="D0"
twinstrand_maf_merge$time_point[twinstrand_maf_merge$CustomerName%in%c("M3D3","M4D3","M5D3","M6D3","M7D3","Enu3_D3","Enu4_D3")]="D3"
twinstrand_maf_merge$time_point[twinstrand_maf_merge$CustomerName%in%c("M3D6","M4D6","M5D6","M6D6","M7D6","Enu3_D6","Enu4_D6")]="D6"
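The experiment, spike-in frequency and time-point columns above are filled in from explicit lists of sample names. As an alternative sketch, the same labels can be parsed from CustomerName with regular expressions. This assumes every relevant name follows the "M3D0" or "Enu4_D0" pattern. The spike-in frequencies are copied from the chunk, and the sample names are examples. One difference from the chunk is that this assigns a time point to every matching name, whereas the chunk's D0 list includes only M3D0, M6D0 and Enu4_D0.

```r
customer_names <- c("M3D0", "M4D3", "M7D6", "Enu3_D3", "Enu4_D0", "BCR-Abl Wt")

# Mixing experiments look like "M3D0"; ENU experiments look like "Enu4_D0"
is_mix <- grepl("^M[0-9]+D[0-9]+$", customer_names)
is_enu <- grepl("^Enu[0-9]+_D[0-9]+$", customer_names)

experiment <- rep(NA_character_, length(customer_names))
experiment[is_mix] <- sub("^(M[0-9]+)D[0-9]+$", "\\1", customer_names[is_mix])
experiment[is_enu] <- sub("^Enu([0-9]+)_D[0-9]+$", "Enu_\\1", customer_names[is_enu])

# The time point is the trailing "D<number>"
time_point <- rep(NA_character_, length(customer_names))
time_point[is_mix | is_enu] <- sub("^.*(D[0-9]+)$", "\\1", customer_names[is_mix | is_enu])

# Spike-in frequencies copied from the chunk above (M4 and M6 at 5000, all others at 1000)
spike_in_lookup <- c(M3 = 1000, M4 = 5000, M5 = 1000, M6 = 5000, M7 = 1000, Enu_3 = 1000, Enu_4 = 1000)
spike_in_freq <- unname(spike_in_lookup[experiment])

data.frame(customer_names, experiment, time_point, spike_in_freq)
```

Names that match neither pattern, such as the wild-type control, get NA in all three columns, as they do in the chunk above.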

Converting MAFs of all mutants to counts by using the flow cytometry count data for each experiment.

#Converting MAFs into 'total number of mutant cells' using the total cell count for each experiment and time point
##########M3##########
twinstrand_maf_merge$totalcells=0
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M3"&twinstrand_maf_merge$time_point=="D0"]=493000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M3"&twinstrand_maf_merge$time_point=="D3"]=1295000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M3"&twinstrand_maf_merge$time_point=="D6"]=13600000
##########M5##########
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M5"&twinstrand_maf_merge$time_point=="D0"]=588000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M5"&twinstrand_maf_merge$time_point=="D3"]=1299000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M5"&twinstrand_maf_merge$time_point=="D6"]=11294000
##########M7##########
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M7"&twinstrand_maf_merge$time_point=="D0"]=611000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M7"&twinstrand_maf_merge$time_point=="D3"]=857000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M7"&twinstrand_maf_merge$time_point=="D6"]=14568000
##########M4##########
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M4"&twinstrand_maf_merge$time_point=="D0"]=405000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M4"&twinstrand_maf_merge$time_point=="D3"]=980000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M4"&twinstrand_maf_merge$time_point=="D6"]=1959000
##########M6##########
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M6"&twinstrand_maf_merge$time_point=="D0"]=510000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M6"&twinstrand_maf_merge$time_point=="D3"]=798000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="M6"&twinstrand_maf_merge$time_point=="D6"]=5457000
##########ENU3##########
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="Enu_3"&twinstrand_maf_merge$time_point=="D0"]=166000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="Enu_3"&twinstrand_maf_merge$time_point=="D3"]=1282000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="Enu_3"&twinstrand_maf_merge$time_point=="D6"]=97200000
##########ENU4##########
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="Enu_4"&twinstrand_maf_merge$time_point=="D0"]=316000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="Enu_4"&twinstrand_maf_merge$time_point=="D3"]=1264000
twinstrand_maf_merge$totalcells[twinstrand_maf_merge$experiment=="Enu_4"&twinstrand_maf_merge$time_point=="D6"]=40000000

########Converting MAF to Total Count##########
twinstrand_maf_merge=twinstrand_maf_merge%>%mutate(totalmutant=AltDepth/Depth*totalcells)
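The last line of this chunk multiplies the mutant allele frequency (AltDepth/Depth) by the flow-cytometry cell count. This implicitly assumes that the allele frequency equals the fraction of cells carrying the mutation (for example, one BCR-ABL copy per cell). That assumption is my reading, not something the report states. Below is a minimal sketch with a hypothetical helper, maf_to_count(), using made-up read counts and the M3 D0 cell count from the chunk:

```r
# Hypothetical helper mirroring totalmutant = AltDepth/Depth*totalcells
maf_to_count <- function(alt_depth, depth, total_cells) {
  maf <- alt_depth / depth   # mutant allele frequency
  maf * total_cells          # estimated number of mutant cells
}

# Made-up reads: 20 mutant reads out of 20,000, in a culture of 493,000 cells (M3, D0)
maf_to_count(alt_depth = 20, depth = 20000, total_cells = 493000)  # 493
```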

Deriving growth rates from twinstrand_maf_merge

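#Re-attaching dplyr so that dplyr::select() masks MASS::select() (MASS is also attached in this session)
detach("package:dplyr", character.only = TRUE)
library("dplyr", character.only = TRUE)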


Attaching package: 'dplyr'

The following object is masked from 'package:MASS':

    select

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

twinstrand_simple=twinstrand_maf_merge%>%filter(tki_resistant_mutation=="True",!is.na(mutant),!is.na(experiment))
twinstrand_simple=twinstrand_simple%>%dplyr::select("mutant","experiment","Spike_in_freq","time_point","totalmutant")
twinstrand_simple_cast=dcast(twinstrand_simple,mutant+experiment+Spike_in_freq~time_point,value.var="totalmutant")

twinstrand_simple_cast$d0d3=log(twinstrand_simple_cast$D3/twinstrand_simple_cast$D0)/72
twinstrand_simple_cast$d3d6=log(twinstrand_simple_cast$D6/twinstrand_simple_cast$D3)/72
twinstrand_simple_cast$d0d6=log(twinstrand_simple_cast$D6/twinstrand_simple_cast$D0)/144
#Net growth rate assuming exponential growth, N(t)=N0*exp(k*t), so k=ln(final/initial)/time. Time is in hours (72 h = 3 days, 144 h = 6 days), so rates are per hour, matching net_gr_wodrug
#Dropping the raw count columns D0, D3 and D6 (columns 4:6 of the cast table) before melting
twinstrand_simple_melt=melt(twinstrand_simple_cast[,-c(4:6)],id.vars=c("mutant","experiment","Spike_in_freq"),variable.name = "duration",value.name = "netgr_obs") #The melted value is the observed net growth rate (netgr_obs); the observed drug effect (drug_effect_obs) is derived from it on the next line. Fixed 4/2/20
twinstrand_simple_melt$drug_effect_obs=net_gr_wodrug-twinstrand_simple_melt$netgr_obs

# twinstrand_simple_melt_merge=merge(twinstrand_simple_melt,ic50data_formerge,"mutant")
# twinstrand_simple_melt_merge=merge(twinstrand_simple_melt,ic50data_long,"mutant")
twinstrand_simple_melt_merge=merge(twinstrand_simple_melt,ic50data_long%>%filter(conc==conc_for_predictions),all.x = T)
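The growth rates in this chunk assume exponential growth between time points, N(t) = N0 exp(k t), so k = ln(N_t/N_0)/t with t in hours. Below is a minimal sketch using made-up counts (not data from this experiment) and a hypothetical helper, net_growth_rate():

```r
# Hypothetical helper: per-hour net growth rate between two counts,
# assuming exponential growth N(t) = N0 * exp(k * t)
net_growth_rate <- function(n_start, n_end, hours) {
  log(n_end / n_start) / hours
}

net_gr_wodrug <- 0.05  # assumed growth rate without drug, per hour (as in the Inputs chunk)

# Made-up mutant counts that grow at exactly 0.05 per hour for 72 hours
n_d0 <- 1000
n_d3 <- n_d0 * exp(0.05 * 72)

netgr_obs <- net_growth_rate(n_d0, n_d3, hours = 72)
drug_effect_obs <- net_gr_wodrug - netgr_obs
c(netgr_obs = netgr_obs, drug_effect_obs = drug_effect_obs)
```

A population growing exactly at the assumed drug-free rate gives a drug_effect_obs of zero (up to rounding). A negative value means the mutant grew faster than the assumed drug-free rate, as for T315I in M3 over d0d3 (-0.0117) in the output below.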

Saving Dataframes

head(twinstrand_maf_merge)

    Sample Chromosome     Start       End VariationType   REF ALT AltDepth
1 dna00762       chr9 130862900 130862905         indel CCCAA   C        2
2 dna00762       chr9 130872157 130872159         indel    GA   G        1
3 dna00762       chr9 130872199 130872200       snv/snp     G   A    20665
4 dna00762       chr9 130872205 130872206       snv/snp     G   A        1
5 dna00762       chr9 130872205 130872206       snv/snp     G   T        1
6 dna00762       chr9 130872938 130872939       snv/snp     G   C        2
  Depth tki_resistant_mutation tki_resistant_mutation_evidence CustomerName
1 27896                  False                                   BCR-Abl Wt
2 23301                  False                                   BCR-Abl Wt
3 20665                  False                                   BCR-Abl Wt
4 20982                  False                                   BCR-Abl Wt
5 20982                  False                                   BCR-Abl Wt
6 34493                  False                                   BCR-Abl Wt
                            Annotation mutant experiment Spike_in_freq
1 Wild type BCR-Abl Ba/F3- no spike in   <NA>       <NA>            NA
2 Wild type BCR-Abl Ba/F3- no spike in   <NA>       <NA>            NA
3 Wild type BCR-Abl Ba/F3- no spike in   <NA>       <NA>            NA
4 Wild type BCR-Abl Ba/F3- no spike in   <NA>       <NA>            NA
5 Wild type BCR-Abl Ba/F3- no spike in   <NA>       <NA>            NA
6 Wild type BCR-Abl Ba/F3- no spike in   <NA>       <NA>            NA
  time_point totalcells totalmutant
1       <NA>          0           0
2       <NA>          0           0
3       <NA>          0           0
4       <NA>          0           0
5       <NA>          0           0
6       <NA>          0           0

head(twinstrand_simple_melt_merge)

  mutant experiment Spike_in_freq duration  netgr_obs drug_effect_obs conc
1  T315I         M4          5000     d0d3         NA              NA  0.8
2  T315I         M5          1000     d0d3         NA              NA  0.8
3  T315I         M3          1000     d0d3 0.06165569    -0.011655692  0.8
4  T315I      Enu_4          1000     d3d6 0.05375515    -0.003755150  0.8
5  T315I         M3          1000     d3d6 0.05565321    -0.005653211  0.8
6  T315I         M4          5000     d3d6 0.05776078    -0.007760782  0.8
    y_model drug_effect netgr_pred
1 0.8162648 0.002819673 0.04718033
2 0.8162648 0.002819673 0.04718033
3 0.8162648 0.002819673 0.04718033
4 0.8162648 0.002819673 0.04718033
5 0.8162648 0.002819673 0.04718033
6 0.8162648 0.002819673 0.04718033

head(ic50data_all_conc)

  mutant conc    y_model drug_effect
1  D276G  0.5 0.22194952  0.02090702
2  D276G  0.6 0.17731828  0.02402512
3  D276G  0.7 0.14534373  0.02678686
4  D276G  0.8 0.12165238  0.02925816
5  D276G  0.9 0.10359142  0.03149029
6  D276G  1.0 0.08948725  0.03352304

write.csv(twinstrand_maf_merge,"twinstrand_maf_merge.csv")
write.csv(twinstrand_simple_melt_merge,"twinstrand_simple_melt_merge.csv")
# write.csv(ic50data_all_conc,"ic50data_all_conc.csv")

sessionInfo()

R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.15.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  grid      stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] dplyr_0.8.4         drc_3.0-1           MASS_7.3-51.5      
 [4] BiocManager_1.30.10 plotly_4.9.1        ggsignif_0.6.0     
 [7] devtools_2.2.1      usethis_1.5.1       RColorBrewer_1.1-2 
[10] reshape2_1.4.3      ggplot2_3.2.1       doParallel_1.0.15  
[13] iterators_1.0.12    foreach_1.4.7       VennDiagram_1.6.20 
[16] futile.logger_1.4.3 tictoc_1.0          knitr_1.27         
[19] workflowr_1.6.0    

loaded via a namespace (and not attached):
 [1] fs_1.3.1             httr_1.4.1           rprojroot_1.3-2     
 [4] tools_3.5.2          backports_1.1.5      R6_2.4.1            
 [7] lazyeval_0.2.2       colorspace_1.4-1     withr_2.1.2         
[10] tidyselect_1.0.0     prettyunits_1.1.1    processx_3.4.1      
[13] curl_4.3             compiler_3.5.2       git2r_0.26.1        
[16] cli_2.0.1            formatR_1.7          sandwich_2.5-1      
[19] desc_1.2.0           labeling_0.3         scales_1.1.0        
[22] mvtnorm_1.0-12       callr_3.4.1          stringr_1.4.0       
[25] digest_0.6.23        foreign_0.8-75       rmarkdown_2.1       
[28] rio_0.5.16           pkgconfig_2.0.3      htmltools_0.4.0     
[31] sessioninfo_1.1.1    plotrix_3.7-7        fastmap_1.0.1       
[34] htmlwidgets_1.5.1    rlang_0.4.4          readxl_1.3.1        
[37] shiny_1.4.0          zoo_1.8-7            jsonlite_1.6        
[40] crosstalk_1.0.0      gtools_3.8.1         zip_2.0.4           
[43] car_3.0-6            magrittr_1.5         Matrix_1.2-18       
[46] Rcpp_1.0.3           munsell_0.5.0        fansi_0.4.1         
[49] abind_1.4-5          lifecycle_0.1.0      multcomp_1.4-12     
[52] stringi_1.4.5        whisker_0.4          yaml_2.2.1          
[55] carData_3.0-3        pkgbuild_1.0.6       plyr_1.8.5          
[58] promises_1.1.0       forcats_0.4.0        crayon_1.3.4        
[61] lattice_0.20-38      splines_3.5.2        haven_2.2.0         
[64] hms_0.5.3            ps_1.3.0             pillar_1.4.3        
[67] codetools_0.2-16     pkgload_1.0.2        futile.options_1.0.1
[70] glue_1.3.1           evaluate_0.14        lambda.r_1.2.4      
[73] data.table_1.12.8    remotes_2.1.0        vctrs_0.2.2         
[76] httpuv_1.5.2         testthat_2.3.1       cellranger_1.1.0    
[79] gtable_0.3.0         purrr_0.3.3          tidyr_1.0.2         
[82] assertthat_0.2.1     xfun_0.12            openxlsx_4.1.4      
[85] mime_0.8             xtable_1.8-4         later_1.0.0         
[88] survival_3.1-8       viridisLite_0.3.0    tibble_2.1.3        
[91] memoise_1.1.0        TH.data_1.0-10       ellipsis_0.3.0

twinstrand_spikeins_data_generation

Haider Inam

4/2/2020