Last updated: 2018-09-06
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date 
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
 ✔ Environment: empty 
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
 ✔ Seed: 
set.seed(12345) 
The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
 ✔ Session information: recorded 
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
 ✔ Repository version: 98159a7 
wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    output/.DS_Store
Untracked files:
    Untracked:  analysis/ncbiRefSeq_sm.sort.mRNA.bed
    Untracked:  analysis/snake.config.notes.Rmd
    Untracked:  data/18486.genecov.txt
    Untracked:  data/APApeaksYL.total.inbrain.bed
    Untracked:  data/RNAkalisto/
    Untracked:  data/Totalpeaks_filtered_clean.bed
    Untracked:  data/YL-SP-18486-T-combined-genecov.txt
    Untracked:  data/YL-SP-18486-T_S9_R1_001-genecov.txt
    Untracked:  data/bedgraph_peaks/
    Untracked:  data/bin200.5.T.nuccov.bed
    Untracked:  data/bin200.Anuccov.bed
    Untracked:  data/bin200.nuccov.bed
    Untracked:  data/clean_peaks/
    Untracked:  data/comb_map_stats.csv
    Untracked:  data/comb_map_stats.xlsx
    Untracked:  data/combined_reads_mapped_three_prime_seq.csv
    Untracked:  data/gencov.test.csv
    Untracked:  data/gencov.test.txt
    Untracked:  data/gencov_zero.test.csv
    Untracked:  data/gencov_zero.test.txt
    Untracked:  data/gene_cov/
    Untracked:  data/joined
    Untracked:  data/leafcutter/
    Untracked:  data/merged_combined_YL-SP-threeprimeseq.bg
    Untracked:  data/nom_QTL/
    Untracked:  data/nom_QTL_opp/
    Untracked:  data/nuc6up/
    Untracked:  data/other_qtls/
    Untracked:  data/peakPerRefSeqGene/
    Untracked:  data/perm_QTL/
    Untracked:  data/perm_QTL_opp/
    Untracked:  data/reads_mapped_three_prime_seq.csv
    Untracked:  data/smash.cov.results.bed
    Untracked:  data/smash.cov.results.csv
    Untracked:  data/smash.cov.results.txt
    Untracked:  data/smash_testregion/
    Untracked:  data/ssFC200.cov.bed
    Untracked:  data/temp.file1
    Untracked:  data/temp.file2
    Untracked:  data/temp.gencov.test.txt
    Untracked:  data/temp.gencov_zero.test.txt
    Untracked:  output/picard/
    Untracked:  output/plots/
    Untracked:  output/qual.fig2.pdf
Unstaged changes:
    Modified:   analysis/28ind.peak.explore.Rmd
    Modified:   analysis/cleanupdtseq.internalpriming.Rmd
    Modified:   analysis/dif.iso.usage.leafcutter.Rmd
    Modified:   analysis/explore.filters.Rmd
    Modified:   analysis/peak.cov.pipeline.Rmd
    Modified:   analysis/peakOverlap_oppstrand.Rmd
    Modified:   analysis/pheno.leaf.comb.Rmd
    Modified:   analysis/test.max2.Rmd
    Modified:   code/Snakefile
| File | Version | Author | Date | Message | 
|---|---|---|---|---|
| Rmd | 98159a7 | Briana Mittleman | 2018-09-06 | alpha .3 | 
| html | febabe4 | Briana Mittleman | 2018-09-06 | Build site. | 
| Rmd | 57005d9 | Briana Mittleman | 2018-09-06 | make qqplot | 
| html | 584548c | Briana Mittleman | 2018-09-06 | Build site. | 
| Rmd | e2f5e81 | Briana Mittleman | 2018-09-06 | fix run code | 
| html | f92a58f | Briana Mittleman | 2018-09-06 | Build site. | 
| Rmd | 46b7343 | Briana Mittleman | 2018-09-06 | add overlap analysis with code to subset | 
I will use this to overlap my QTLs with the other molecular QTLs already identified in the same individuals. First pass I will subset my nuclear and total nomial qtls by the snps with pvals less than .05 in each of the sets and make a qqplot.
I want to create a python script that takes in which type of qtl and a pvalue and subsets the full file for snps that pass that filter.
subset_qtls.py
def main(inFile, outFile, qtl, cutoff):
    fout=open(outFile, "w")
    ifile=open(inFile, "r")
    cutoff=float(cutoff)
    qtl_types= ['4su_30', '4su_60', 'RNAseq', 'RNAseqGeuvadis', 'ribo', 'prot']
    if qtl not in qtl_types:
         raise NameError("QTL arg must be 4su_30, 4su_60, RNAseq, RNAseqGeuvadis, ribo, or prot") 
    elif qtl=="4su_30":
        target=4
    elif qtl=="4su_60":
        target=5
    elif qtl=="RNAseq":
        target=6
    elif qtl=="RNAseqGeuvadis":
        target=7
    elif qtl=="ribo":
        target =8
    elif qtl=="prot":
        target=9
    for num,ln in enumerate(ifile):
        if num > 0 :
            line_list = ln.split()
            chrom=line_list[0][3:]
            pos=line_list[1]
            rsid=line_list[2]
            geneID=line_list[3]
            val = line_list[target].split(":")[0]
            if val == "NA":
              continue
            else:
                val = float(val)
                if val <= cutoff:
                    fout.write("%s:%s\t%s\t%s\t%f\n"%(chrom, pos, rsid, geneID,val))
    
if __name__ == "__main__":
    import sys
    qtl = sys.argv[1]
    cutoff= sys.argv[2]
    
    inFile = "/project2/gilad/briana/threeprimeseq/data/otherQTL/summary_betas_ste_100kb.txt"
    outFile = "/project2/gilad/briana/threeprimeseq/data/otherQTL/summary_betas_ste_100kb.%s%s.txt"%(qtl, cutoff)
    main(inFile, outFile, qtl, cutoff)I can run this to subset by each qtl at .05
run_subsetQTLs05.sh
#!/bin/bash
#SBATCH --job-name=run_subsetqtl05
#SBATCH --account=pi-gilad
#SBATCH --time=24:00:00
#SBATCH --output=run_subsetqtl05.out
#SBATCH --error=run_subsetqtl05.err
#SBATCH --partition=gilad
#SBATCH --mem=12G
#SBATCH --mail-type=END
module load Anaconda3
source activate three-prime-env
#qtls=('4su_30', '4su_60', 'RNAseq', 'RNAseqGeuvadis', 'ribo', 'prot')  
for i in 4su_30 4su_60 RNAseq RNAseqGeuvadis ribo prot; do
    python subset_qtls.py $i .05 
done
library(tidyverse)── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──✔ ggplot2 3.0.0     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.6
✔ tidyr   0.8.1     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()library(workflowr)This is workflowr version 1.1.1
Run ?workflowr for help getting startedlibrary(reshape2)
Attaching package: 'reshape2'The following object is masked from 'package:tidyr':
    smithslibrary(readr)nuc.nom=read.table("../data/nom_QTL_opp/filtered_APApeaks_merged_allchrom_refseqGenes_pheno_Nuclear_NomRes_onetenth.txt", header = F, stringsAsFactors = F)
colnames(nuc.nom)= c("peakID", "snpID", "dist", "nuc_pval", "slope")
QTL_names=c("snpID", "snpid2","Gene", "pval")
fourSU30= read.table("../data/other_qtls/summary_betas_ste_100kb.4su_30.05.txt", header=F, stringsAsFactors = F, col.names = QTL_names)
fourSU60=read.table("../data/other_qtls/summary_betas_ste_100kb.4su_60.05.txt", header=F, stringsAsFactors = F, col.names = QTL_names)
RNAseq=read.table("../data/other_qtls/summary_betas_ste_100kb.RNAseq.05.txt", header=F, stringsAsFactors = F, col.names = QTL_names)
guevardis=read.table("../data/other_qtls/summary_betas_ste_100kb.RNAseqGeuvadis.05.txt", header=F, stringsAsFactors = F, col.names = QTL_names)
ribo=read.table("../data/other_qtls/summary_betas_ste_100kb.ribo.05.txt", header=F, stringsAsFactors = F, col.names = QTL_names)
prot=read.table("../data/other_qtls/summary_betas_ste_100kb.prot.05.txt", header=F, stringsAsFactors = F, col.names = QTL_names)Overlap the files:
fourSU30AndNuc= fourSU30 %>% inner_join(nuc.nom, by="snpID") %>% select(snpID, nuc_pval)
fourSU30_unif=runif(nrow(fourSU30AndNuc))
fourSU60AndNuc= fourSU60 %>% inner_join(nuc.nom, by="snpID") %>% select(snpID, nuc_pval)
fourSU60_unif=runif(nrow(fourSU60AndNuc))
RNAAndNuc= RNAseq %>% inner_join(nuc.nom, by="snpID") %>% select(snpID, nuc_pval)
RNAseq_unif=runif(nrow(RNAAndNuc))
GuevAndNuc= guevardis %>% inner_join(nuc.nom, by="snpID") %>% select(snpID, nuc_pval)
guev_unif=runif(nrow(GuevAndNuc))
riboAndNuc= ribo %>% inner_join(nuc.nom, by="snpID") %>% select(snpID, nuc_pval)
ribo_unif=runif(nrow(riboAndNuc))
protAndNuc= prot %>% inner_join(nuc.nom, by="snpID") %>% select(snpID, nuc_pval)
prot_unif=runif(nrow(protAndNuc))Plot results:
qqplot(-log10(runif(nrow(nuc.nom))), -log10(nuc.nom$nuc_pval),ylab="-log10 Nuclear nominal pvalue", xlab="Uniform expectation", main="Nuclear Nominal pvalues for all snps")
points(sort(-log10(fourSU30_unif)), sort(-log10(fourSU30AndNuc$nuc_pval)), col="Red", alpha=.3)Warning in plot.xy(xy.coords(x, y), type = type, ...): "alpha" is not a
graphical parameterpoints(sort(-log10(fourSU60_unif)), sort(-log10(fourSU60AndNuc$nuc_pval)), col="Orange",alpha=.3)Warning in plot.xy(xy.coords(x, y), type = type, ...): "alpha" is not a
graphical parameterpoints(sort(-log10(RNAseq_unif)), sort(-log10(RNAAndNuc$nuc_pval)), col="Yellow",alpha=.3)Warning in plot.xy(xy.coords(x, y), type = type, ...): "alpha" is not a
graphical parameterpoints(sort(-log10(guev_unif)), sort(-log10(GuevAndNuc$nuc_pval)), col="Green",alpha=.3)Warning in plot.xy(xy.coords(x, y), type = type, ...): "alpha" is not a
graphical parameterpoints(sort(-log10(ribo_unif)), sort(-log10(riboAndNuc$nuc_pval)), col="Blue",alpha=.3)Warning in plot.xy(xy.coords(x, y), type = type, ...): "alpha" is not a
graphical parameterpoints(sort(-log10(prot_unif)), sort(-log10(protAndNuc$nuc_pval)), col="Purple",alpha=.3)Warning in plot.xy(xy.coords(x, y), type = type, ...): "alpha" is not a
graphical parameterabline(0,1)
legend("topleft", legend=c("All SNPs", "4su 30", "4su 60", "RNAseq", "Guevadis RNA", "Ribo", "Protein"), col=c("black", "red", "orange", "yellow", "green", "blue", "purple"), pch=19)
| Version | Author | Date | 
|---|---|---|
| febabe4 | Briana Mittleman | 2018-09-06 | 
sessionInfo()R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1] reshape2_1.4.3  workflowr_1.1.1 forcats_0.3.0   stringr_1.3.1  
 [5] dplyr_0.7.6     purrr_0.2.5     readr_1.1.1     tidyr_0.8.1    
 [9] tibble_1.4.2    ggplot2_3.0.0   tidyverse_1.2.1
loaded via a namespace (and not attached):
 [1] tidyselect_0.2.4  haven_1.1.2       lattice_0.20-35  
 [4] colorspace_1.3-2  htmltools_0.3.6   yaml_2.2.0       
 [7] rlang_0.2.2       R.oo_1.22.0       pillar_1.3.0     
[10] glue_1.3.0        withr_2.1.2       R.utils_2.7.0    
[13] modelr_0.1.2      readxl_1.1.0      bindrcpp_0.2.2   
[16] bindr_0.1.1       plyr_1.8.4        munsell_0.5.0    
[19] gtable_0.2.0      cellranger_1.1.0  rvest_0.3.2      
[22] R.methodsS3_1.7.1 evaluate_0.11     knitr_1.20       
[25] broom_0.5.0       Rcpp_0.12.18      scales_1.0.0     
[28] backports_1.1.2   jsonlite_1.5      hms_0.4.2        
[31] digest_0.6.16     stringi_1.2.4     grid_3.5.1       
[34] rprojroot_1.3-2   cli_1.0.0         tools_3.5.1      
[37] magrittr_1.5      lazyeval_0.2.1    crayon_1.3.4     
[40] whisker_0.3-2     pkgconfig_2.0.2   xml2_1.2.0       
[43] lubridate_1.7.4   assertthat_0.2.0  rmarkdown_1.10   
[46] httr_1.3.1        rstudioapi_0.7    R6_2.2.2         
[49] nlme_3.1-137      git2r_0.23.0      compiler_3.5.1   
This reproducible R Markdown analysis was created with workflowr 1.1.1