• Inclusive
  • Address multimaping problem
    • Try the primary map read

Last updated: 2020-03-28

Checks: 7 0

Knit directory: Comparative_APA/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20190902) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    code/chimp_log/
    Ignored:    code/human_log/
    Ignored:    data/.DS_Store
    Ignored:    data/TrialFiltersMeta.txt.sb-9845453e-R58Y0Q/
    Ignored:    data/mediation_prot/
    Ignored:    data/metadata_HCpanel.txt.sb-a5794dd2-i594qs/
    Ignored:    output/.DS_Store

Untracked files:
    Untracked:  ._.DS_Store
    Untracked:  Chimp/
    Untracked:  Human/
    Untracked:  analysis/CrossChimpThreePrime.Rmd
    Untracked:  analysis/DiffTransProtvsExpression.Rmd
    Untracked:  analysis/DiffUsedUTR.Rmd
    Untracked:  analysis/GvizPlots.Rmd
    Untracked:  analysis/HandC.TvN
    Untracked:  analysis/PhenotypeOverlap10.Rmd
    Untracked:  analysis/annotationBias.Rmd
    Untracked:  analysis/assessReadQual.Rmd
    Untracked:  analysis/diffExpressionPantro6.Rmd
    Untracked:  analysis/pol2.Rmd
    Untracked:  code/._BothFCMM.sh
    Untracked:  code/._BothFCMMPrim.sh
    Untracked:  code/._BothFCnewOInclusive.sh
    Untracked:  code/._ChimpStarMM2.sh
    Untracked:  code/._ClassifyLeafviz.sh
    Untracked:  code/._Config_chimp.yaml
    Untracked:  code/._Config_chimp_full.yaml
    Untracked:  code/._Config_human.yaml
    Untracked:  code/._ConvertJunc2Bed.sh
    Untracked:  code/._CountNucleotides.py
    Untracked:  code/._CrossMapChimpRNA.sh
    Untracked:  code/._CrossMapThreeprime.sh
    Untracked:  code/._DiffSplice.sh
    Untracked:  code/._DiffSplicePlots.sh
    Untracked:  code/._DiffSplicePlots_gencode.sh
    Untracked:  code/._DiffSplice_gencode.sh
    Untracked:  code/._DiffSplice_removebad.sh
    Untracked:  code/._FindIntronForDomPAS.sh
    Untracked:  code/._FindIntronForDomPAS_DF.sh
    Untracked:  code/._GetMAPQscore.py
    Untracked:  code/._GetSecondaryMap.py
    Untracked:  code/._Lift5perPAS.sh
    Untracked:  code/._LiftFinalChimpJunc2Human.sh
    Untracked:  code/._LiftOrthoPAS2chimp.sh
    Untracked:  code/._MapBadSamples.sh
    Untracked:  code/._PAS_ATTAAA.sh
    Untracked:  code/._PAS_ATTAAA_df.sh
    Untracked:  code/._PAS_seqExpanded.sh
    Untracked:  code/._PASsequences.sh
    Untracked:  code/._PASsequences_DF.sh
    Untracked:  code/._PlotNuclearUsagebySpecies.R
    Untracked:  code/._PlotNuclearUsagebySpecies_DF.R
    Untracked:  code/._QuantMergedClusters.sh
    Untracked:  code/._RNATranscriptDTplot.sh
    Untracked:  code/._ReverseLiftFilter.R
    Untracked:  code/._RunFixLeafCluster.sh
    Untracked:  code/._RunNegMCMediation.sh
    Untracked:  code/._RunNegMCMediationDF.sh
    Untracked:  code/._RunPosMCMediationDF.err
    Untracked:  code/._RunPosMCMediationDF.sh
    Untracked:  code/._SAF2Bed.py
    Untracked:  code/._Snakefile
    Untracked:  code/._SnakefilePAS
    Untracked:  code/._SnakefilePASfilt
    Untracked:  code/._SortIndexBadSamples.sh
    Untracked:  code/._StarMM2.sh
    Untracked:  code/._TestFC.sh
    Untracked:  code/._assignPeak2Intronicregion
    Untracked:  code/._assignPeak2Intronicregion.sh
    Untracked:  code/._bed215upbed.py
    Untracked:  code/._bed2Bedbothstrand.py
    Untracked:  code/._bed2SAF_gen.py
    Untracked:  code/._buildIndecpantro5
    Untracked:  code/._buildIndecpantro5.sh
    Untracked:  code/._buildLeafviz.sh
    Untracked:  code/._buildLeafviz_leadAnno.sh
    Untracked:  code/._buildStarIndex.sh
    Untracked:  code/._chimpChromprder.sh
    Untracked:  code/._chimpMultiCov.sh
    Untracked:  code/._chimpMultiCov255.sh
    Untracked:  code/._chimpMultiCovInclusive.sh
    Untracked:  code/._chooseSignalSite.py
    Untracked:  code/._cleanbed2saf.py
    Untracked:  code/._cluster.json
    Untracked:  code/._cluster2bed.py
    Untracked:  code/._clusterLiftReverse.sh
    Untracked:  code/._clusterLiftReverse_removebad.sh
    Untracked:  code/._clusterLiftprimary.sh
    Untracked:  code/._clusterLiftprimary_removebad.sh
    Untracked:  code/._converBam2Junc.sh
    Untracked:  code/._converBam2Junc_removeBad.sh
    Untracked:  code/._extraSnakefiltpas
    Untracked:  code/._extractPhyloReg.py
    Untracked:  code/._extractPhyloRegGene.py
    Untracked:  code/._extractPhylopGeneral.ph
    Untracked:  code/._extractPhylopGeneral.py
    Untracked:  code/._extractPhylopReg200down.py
    Untracked:  code/._extractPhylopReg200up.py
    Untracked:  code/._filter5percPAS.py
    Untracked:  code/._filterNumChroms.py
    Untracked:  code/._filterPASforMP.py
    Untracked:  code/._filterPostLift.py
    Untracked:  code/._filterPrimaryread.py
    Untracked:  code/._filterSecondaryread.py
    Untracked:  code/._fixExonFC.py
    Untracked:  code/._fixLeafCluster.py
    Untracked:  code/._fixLiftedJunc.py
    Untracked:  code/._fixUTRexonanno.py
    Untracked:  code/._formathg38Anno.py
    Untracked:  code/._formatpantro6Anno.py
    Untracked:  code/._getRNAseqMapStats.sh
    Untracked:  code/._hg19MapStats.sh
    Untracked:  code/._humanChromorder.sh
    Untracked:  code/._humanMultiCov.sh
    Untracked:  code/._humanMultiCov255.sh
    Untracked:  code/._humanMultiCov_inclusive.sh
    Untracked:  code/._intersectLiftedPAS.sh
    Untracked:  code/._liftJunctionFiles.sh
    Untracked:  code/._liftPAS19to38.sh
    Untracked:  code/._liftedchimpJunc2human.sh
    Untracked:  code/._makeNuclearDapaplots.sh
    Untracked:  code/._makeNuclearDapaplots_DF.sh
    Untracked:  code/._makeSamplyGroupsHuman_TvN.py
    Untracked:  code/._mapRNAseqhg19.sh
    Untracked:  code/._mapRNAseqhg19_newPipeline.sh
    Untracked:  code/._maphg19.sh
    Untracked:  code/._maphg19_subjunc.sh
    Untracked:  code/._mediation_test.R
    Untracked:  code/._mergeChimp3prime_inhg38.sh
    Untracked:  code/._mergeandBWRNAseq.sh
    Untracked:  code/._mergedBam2BW.sh
    Untracked:  code/._nameClusters.py
    Untracked:  code/._negativeMediation_montecarlo.R
    Untracked:  code/._negativeMediation_montecarloDF.R
    Untracked:  code/._numMultimap.py
    Untracked:  code/._overlapMMandOrthoexon.sh
    Untracked:  code/._overlapapaQTLPAS.sh
    Untracked:  code/._parseHg38.py
    Untracked:  code/._postiveMediation_montecarlo_DF.R
    Untracked:  code/._prepareCleanLiftedFC_5perc4LC.py
    Untracked:  code/._prepareLeafvizAnno.sh
    Untracked:  code/._preparePAS4lift.py
    Untracked:  code/._primaryLift.sh
    Untracked:  code/._processhg38exons.py
    Untracked:  code/._quantJunc.sh
    Untracked:  code/._quantJunc_TEST.sh
    Untracked:  code/._quantJunc_removeBad.sh
    Untracked:  code/._quantMerged_seperatly.sh
    Untracked:  code/._recLiftchim2human.sh
    Untracked:  code/._revLiftPAShg38to19.sh
    Untracked:  code/._reverseLift.sh
    Untracked:  code/._runCheckReverseLift.sh
    Untracked:  code/._runChimpDiffIso.sh
    Untracked:  code/._runCountNucleotides.sh
    Untracked:  code/._runFilterNumChroms.sh
    Untracked:  code/._runHumanDiffIso.sh
    Untracked:  code/._runNuclearDiffIso_DF.sh
    Untracked:  code/._runNuclearDifffIso.sh
    Untracked:  code/._runTotalDiffIso.sh
    Untracked:  code/._run_chimpverifybam.sh
    Untracked:  code/._run_verifyBam.sh
    Untracked:  code/._snakemake.batch
    Untracked:  code/._snakemakePAS.batch
    Untracked:  code/._snakemakePASchimp.batch
    Untracked:  code/._snakemakePAShuman.batch
    Untracked:  code/._snakemake_chimp.batch
    Untracked:  code/._snakemake_human.batch
    Untracked:  code/._snakemakefiltPAS.batch
    Untracked:  code/._snakemakefiltPAS_chimp
    Untracked:  code/._snakemakefiltPAS_chimp.sh
    Untracked:  code/._snakemakefiltPAS_human.sh
    Untracked:  code/._spliceSite2Fasta.py
    Untracked:  code/._submit-snakemake-chimp.sh
    Untracked:  code/._submit-snakemake-human.sh
    Untracked:  code/._submit-snakemakePAS-chimp.sh
    Untracked:  code/._submit-snakemakePAS-human.sh
    Untracked:  code/._submit-snakemakefiltPAS-chimp.sh
    Untracked:  code/._submit-snakemakefiltPAS-human.sh
    Untracked:  code/._subset_diffisopheno_Nuclear_HvC.py
    Untracked:  code/._subset_diffisopheno_Nuclear_HvC_DF.py
    Untracked:  code/._subset_diffisopheno_Total_HvC.py
    Untracked:  code/._threeprimeOrthoFC.sh
    Untracked:  code/._transcriptDTplotsNuclear.sh
    Untracked:  code/._verifyBam4973.sh
    Untracked:  code/._verifyBam4973inHuman.sh
    Untracked:  code/._wrap_chimpverifybam.sh
    Untracked:  code/._wrap_verifyBam.sh
    Untracked:  code/._writeMergecode.py
    Untracked:  code/.snakemake/
    Untracked:  code/BothFCMM.err
    Untracked:  code/BothFCMM.out
    Untracked:  code/BothFCMM.sh
    Untracked:  code/BothFCMMPrim.err
    Untracked:  code/BothFCMMPrim.out
    Untracked:  code/BothFCMMPrim.sh
    Untracked:  code/BothFCnewOInclusive.sh
    Untracked:  code/BothFCnewOInclusive.sh.err
    Untracked:  code/BothFCnewOInclusive.sh.out
    Untracked:  code/ChimpStarMM2.err
    Untracked:  code/ChimpStarMM2.out
    Untracked:  code/ChimpStarMM2.sh
    Untracked:  code/ClassifyLeafviz.sh
    Untracked:  code/Config_chimp.yaml
    Untracked:  code/Config_chimp_full.yaml
    Untracked:  code/Config_human.yaml
    Untracked:  code/ConvertJunc2Bed.err
    Untracked:  code/ConvertJunc2Bed.out
    Untracked:  code/ConvertJunc2Bed.sh
    Untracked:  code/CountNucleotides.py
    Untracked:  code/CrossMapChimpRNA.sh
    Untracked:  code/CrossMapThreeprime.sh
    Untracked:  code/CrossmapChimp3prime.err
    Untracked:  code/CrossmapChimp3prime.out
    Untracked:  code/CrossmapChimpRNA.err
    Untracked:  code/CrossmapChimpRNA.out
    Untracked:  code/DiffSplice.err
    Untracked:  code/DiffSplice.out
    Untracked:  code/DiffSplice.sh
    Untracked:  code/DiffSplicePlots.err
    Untracked:  code/DiffSplicePlots.out
    Untracked:  code/DiffSplicePlots.sh
    Untracked:  code/DiffSplicePlots_gencode.sh
    Untracked:  code/DiffSplice_gencode.sh
    Untracked:  code/DiffSplice_removebad.err
    Untracked:  code/DiffSplice_removebad.out
    Untracked:  code/DiffSplice_removebad.sh
    Untracked:  code/FilterReverseLift.err
    Untracked:  code/FilterReverseLift.out
    Untracked:  code/FindIntronForDomPAS.err
    Untracked:  code/FindIntronForDomPAS.out
    Untracked:  code/FindIntronForDomPAS.sh
    Untracked:  code/FindIntronForDomPAS_DF.sh
    Untracked:  code/GencodeDiffSplice.err
    Untracked:  code/GencodeDiffSplice.out
    Untracked:  code/GetMAPQscore.py
    Untracked:  code/GetSecondaryMap.py
    Untracked:  code/HchromOrder.err
    Untracked:  code/HchromOrder.out
    Untracked:  code/IntersectMMandOrtho.err
    Untracked:  code/IntersectMMandOrtho.out
    Untracked:  code/JunctionLift.err
    Untracked:  code/JunctionLift.out
    Untracked:  code/JunctionLiftFinalChimp.err
    Untracked:  code/JunctionLiftFinalChimp.out
    Untracked:  code/Lift5perPAS.sh
    Untracked:  code/Lift5perPASbed.err
    Untracked:  code/Lift5perPASbed.out
    Untracked:  code/LiftClustersFirst.err
    Untracked:  code/LiftClustersFirst.out
    Untracked:  code/LiftClustersFirst_remove.err
    Untracked:  code/LiftClustersFirst_remove.out
    Untracked:  code/LiftClustersSecond.err
    Untracked:  code/LiftClustersSecond.out
    Untracked:  code/LiftClustersSecond_remove.err
    Untracked:  code/LiftClustersSecond_remove.out
    Untracked:  code/LiftFinalChimpJunc2Human.sh
    Untracked:  code/LiftOrthoPAS2chimp.sh
    Untracked:  code/LiftorthoPAS.err
    Untracked:  code/LiftorthoPASt.out
    Untracked:  code/Log.out
    Untracked:  code/MapBadSamples.err
    Untracked:  code/MapBadSamples.out
    Untracked:  code/MapBadSamples.sh
    Untracked:  code/MapStats.err
    Untracked:  code/MapStats.out
    Untracked:  code/MaxEntCode/
    Untracked:  code/MergeClusters.err
    Untracked:  code/MergeClusters.out
    Untracked:  code/MergeClusters.sh
    Untracked:  code/PAS_ATTAAA.err
    Untracked:  code/PAS_ATTAAA.out
    Untracked:  code/PAS_ATTAAA.sh
    Untracked:  code/PAS_ATTAAADF.err
    Untracked:  code/PAS_ATTAAADF.out
    Untracked:  code/PAS_ATTAAA_df.sh
    Untracked:  code/PAS_seqExpanded.sh
    Untracked:  code/PAS_sequence.err
    Untracked:  code/PAS_sequence.out
    Untracked:  code/PAS_sequenceDF.err
    Untracked:  code/PAS_sequenceDF.out
    Untracked:  code/PASexpanded_sequenceDF.err
    Untracked:  code/PASexpanded_sequenceDF.out
    Untracked:  code/PASsequences.sh
    Untracked:  code/PASsequences_DF.sh
    Untracked:  code/PlotNuclearUsagebySpecies.R
    Untracked:  code/PlotNuclearUsagebySpecies_DF.R
    Untracked:  code/QuantMergeClusters
    Untracked:  code/QuantMergeClusters.err
    Untracked:  code/QuantMergeClusters.out
    Untracked:  code/QuantMergedClusters.sh
    Untracked:  code/RNATranscriptDTplot.err
    Untracked:  code/RNATranscriptDTplot.out
    Untracked:  code/RNATranscriptDTplot.sh
    Untracked:  code/Rev_liftoverPAShg19to38.err
    Untracked:  code/Rev_liftoverPAShg19to38.out
    Untracked:  code/ReverseLiftFilter.R
    Untracked:  code/RunFixCluster.err
    Untracked:  code/RunFixCluster.out
    Untracked:  code/RunFixLeafCluster.sh
    Untracked:  code/RunNegMCMediation.err
    Untracked:  code/RunNegMCMediation.sh
    Untracked:  code/RunNegMCMediationDF.err
    Untracked:  code/RunNegMCMediationDF.out
    Untracked:  code/RunNegMCMediationDF.sh
    Untracked:  code/RunNegMCMediationr.out
    Untracked:  code/RunPosMCMediation.err
    Untracked:  code/RunPosMCMediation.sh
    Untracked:  code/RunPosMCMediationDF.err
    Untracked:  code/RunPosMCMediationDF.out
    Untracked:  code/RunPosMCMediationDF.sh
    Untracked:  code/RunPosMCMediationr.out
    Untracked:  code/SAF215upbed_gen.py
    Untracked:  code/SAF2Bed.py
    Untracked:  code/Snakefile
    Untracked:  code/SnakefilePAS
    Untracked:  code/SnakefilePASfilt
    Untracked:  code/SortIndexBadSamples.err
    Untracked:  code/SortIndexBadSamples.out
    Untracked:  code/SortIndexBadSamples.sh
    Untracked:  code/StarMM2.err
    Untracked:  code/StarMM2.out
    Untracked:  code/StarMM2.sh
    Untracked:  code/TestFC.err
    Untracked:  code/TestFC.out
    Untracked:  code/TestFC.sh
    Untracked:  code/TotalTranscriptDTplot.err
    Untracked:  code/TotalTranscriptDTplot.out
    Untracked:  code/Upstream10Bases_general.py
    Untracked:  code/apaQTLsnake.err
    Untracked:  code/apaQTLsnake.out
    Untracked:  code/apaQTLsnakePAS.err
    Untracked:  code/apaQTLsnakePAS.out
    Untracked:  code/apaQTLsnakePAShuman.err
    Untracked:  code/assignPeak2Intronicregion.err
    Untracked:  code/assignPeak2Intronicregion.out
    Untracked:  code/assignPeak2Intronicregion.sh
    Untracked:  code/bam2junc.err
    Untracked:  code/bam2junc.out
    Untracked:  code/bam2junc_remove.err
    Untracked:  code/bam2junc_remove.out
    Untracked:  code/bed215upbed.py
    Untracked:  code/bed2Bedbothstrand.py
    Untracked:  code/bed2SAF_gen.py
    Untracked:  code/bed2saf.py
    Untracked:  code/bg_to_cov.py
    Untracked:  code/buildIndecpantro5
    Untracked:  code/buildIndecpantro5.sh
    Untracked:  code/buildLeafviz.err
    Untracked:  code/buildLeafviz.out
    Untracked:  code/buildLeafviz.sh
    Untracked:  code/buildLeafviz_leadAnno.sh
    Untracked:  code/buildLeafviz_leafanno.err
    Untracked:  code/buildLeafviz_leafanno.out
    Untracked:  code/buildStarIndex.sh
    Untracked:  code/callPeaksYL.py
    Untracked:  code/chimpChromprder.sh
    Untracked:  code/chimpMultiCov.err
    Untracked:  code/chimpMultiCov.out
    Untracked:  code/chimpMultiCov.sh
    Untracked:  code/chimpMultiCov255.sh
    Untracked:  code/chimpMultiCovInclusive.err
    Untracked:  code/chimpMultiCovInclusive.out
    Untracked:  code/chimpMultiCovInclusive.sh
    Untracked:  code/chooseAnno2Bed.py
    Untracked:  code/chooseAnno2SAF.py
    Untracked:  code/chooseSignalSite.py
    Untracked:  code/chromOrder.err
    Untracked:  code/chromOrder.out
    Untracked:  code/classifyLeafviz.err
    Untracked:  code/classifyLeafviz.out
    Untracked:  code/cleanbed2saf.py
    Untracked:  code/cluster.json
    Untracked:  code/cluster2bed.py
    Untracked:  code/clusterLiftReverse.sh
    Untracked:  code/clusterLiftReverse_removebad.sh
    Untracked:  code/clusterLiftprimary.sh
    Untracked:  code/clusterLiftprimary_removebad.sh
    Untracked:  code/clusterPAS.json
    Untracked:  code/clusterfiltPAS.json
    Untracked:  code/comands2Mege.sh
    Untracked:  code/converBam2Junc.sh
    Untracked:  code/converBam2Junc_removeBad.sh
    Untracked:  code/convertNumeric.py
    Untracked:  code/environment.yaml
    Untracked:  code/extraSnakefiltpas
    Untracked:  code/extractPhyloReg.py
    Untracked:  code/extractPhyloRegGene.py
    Untracked:  code/extractPhylopGeneral.py
    Untracked:  code/extractPhylopReg200down.py
    Untracked:  code/extractPhylopReg200up.py
    Untracked:  code/filter5perc.R
    Untracked:  code/filter5percPAS.py
    Untracked:  code/filter5percPheno.py
    Untracked:  code/filterBamforMP.pysam2_gen.py
    Untracked:  code/filterJuncChroms.err
    Untracked:  code/filterJuncChroms.out
    Untracked:  code/filterMissprimingInNuc10_gen.py
    Untracked:  code/filterNumChroms.py
    Untracked:  code/filterPASforMP.py
    Untracked:  code/filterPostLift.py
    Untracked:  code/filterPrimaryread.py
    Untracked:  code/filterSAFforMP_gen.py
    Untracked:  code/filterSecondaryread.py
    Untracked:  code/filterSortBedbyCleanedBed_gen.R
    Untracked:  code/filterpeaks.py
    Untracked:  code/fixExonFC.py
    Untracked:  code/fixFChead.py
    Untracked:  code/fixFChead_bothfrac.py
    Untracked:  code/fixLeafCluster.py
    Untracked:  code/fixLiftedJunc.py
    Untracked:  code/fixUTRexonanno.py
    Untracked:  code/formathg38Anno.py
    Untracked:  code/generateStarIndex.err
    Untracked:  code/generateStarIndex.out
    Untracked:  code/generateStarIndexHuman.err
    Untracked:  code/generateStarIndexHuman.out
    Untracked:  code/getRNAseqMapStats.sh
    Untracked:  code/hg19MapStats.err
    Untracked:  code/hg19MapStats.out
    Untracked:  code/hg19MapStats.sh
    Untracked:  code/humanChromorder.sh
    Untracked:  code/humanFiles
    Untracked:  code/humanMultiCov.err
    Untracked:  code/humanMultiCov.out
    Untracked:  code/humanMultiCov.sh
    Untracked:  code/humanMultiCov255.err
    Untracked:  code/humanMultiCov255.out
    Untracked:  code/humanMultiCov255.sh
    Untracked:  code/humanMultiCovInclusive.err
    Untracked:  code/humanMultiCovInclusive.out
    Untracked:  code/humanMultiCov_inclusive.sh
    Untracked:  code/intersectAnno.err
    Untracked:  code/intersectAnno.out
    Untracked:  code/intersectAnnoExt.err
    Untracked:  code/intersectAnnoExt.out
    Untracked:  code/intersectLiftedPAS.sh
    Untracked:  code/leafcutter_merge_regtools_redo.py
    Untracked:  code/liftJunctionFiles.sh
    Untracked:  code/liftPAS19to38.sh
    Untracked:  code/liftoverPAShg19to38.err
    Untracked:  code/liftoverPAShg19to38.out
    Untracked:  code/log/
    Untracked:  code/make5percPeakbed.py
    Untracked:  code/makeFileID.py
    Untracked:  code/makeNuclearDapaplots.sh
    Untracked:  code/makeNuclearDapaplots_DF.sh
    Untracked:  code/makeNuclearPlots.err
    Untracked:  code/makeNuclearPlots.out
    Untracked:  code/makeNuclearPlotsDF.err
    Untracked:  code/makeNuclearPlotsDF.out
    Untracked:  code/makePheno.py
    Untracked:  code/makeSamplyGroupsChimp_TvN.py
    Untracked:  code/makeSamplyGroupsHuman_TvN.py
    Untracked:  code/mapRNAseqhg19.sh
    Untracked:  code/mapRNAseqhg19_newPipeline.sh
    Untracked:  code/maphg19.err
    Untracked:  code/maphg19.out
    Untracked:  code/maphg19.sh
    Untracked:  code/maphg19_new.err
    Untracked:  code/maphg19_new.out
    Untracked:  code/maphg19_sub.err
    Untracked:  code/maphg19_sub.out
    Untracked:  code/maphg19_subjunc.sh
    Untracked:  code/mediation_test.R
    Untracked:  code/merge.err
    Untracked:  code/mergeChimp3prime_inhg38.sh
    Untracked:  code/merge_leafcutter_clusters_redo.py
    Untracked:  code/mergeandBWRNAseq.sh
    Untracked:  code/mergeandsort_ChimpinHuman.err
    Untracked:  code/mergeandsort_ChimpinHuman.out
    Untracked:  code/mergedBam2BW.sh
    Untracked:  code/mergedbam2bw.err
    Untracked:  code/mergedbam2bw.out
    Untracked:  code/mergedbamRNAand2bw.err
    Untracked:  code/mergedbamRNAand2bw.out
    Untracked:  code/nameClusters.py
    Untracked:  code/namePeaks.py
    Untracked:  code/negativeMediation_montecarlo.R
    Untracked:  code/negativeMediation_montecarloDF.R
    Untracked:  code/nuclearTranscriptDTplot.err
    Untracked:  code/nuclearTranscriptDTplot.out
    Untracked:  code/numMultimap.py
    Untracked:  code/overlapMMandOrthoexon.sh
    Untracked:  code/overlapPAS.err
    Untracked:  code/overlapPAS.out
    Untracked:  code/overlapapaQTLPAS.sh
    Untracked:  code/overlapapaQTLPAS_extended.sh
    Untracked:  code/overlapapaQTLPAS_samples.sh
    Untracked:  code/parseHg38.py
    Untracked:  code/peak2PAS.py
    Untracked:  code/pheno2countonly.R
    Untracked:  code/postiveMediation_montecarlo.R
    Untracked:  code/postiveMediation_montecarlo_DF.R
    Untracked:  code/prepareAnnoLeafviz.err
    Untracked:  code/prepareAnnoLeafviz.out
    Untracked:  code/prepareCleanLiftedFC_5perc4LC.py
    Untracked:  code/prepareLeafvizAnno.sh
    Untracked:  code/preparePAS4lift.py
    Untracked:  code/prepare_phenotype_table.py
    Untracked:  code/primaryLift.err
    Untracked:  code/primaryLift.out
    Untracked:  code/primaryLift.sh
    Untracked:  code/processhg38exons.py
    Untracked:  code/quantJunc.sh
    Untracked:  code/quantJunc_TEST.sh
    Untracked:  code/quantJunc_removeBad.sh
    Untracked:  code/quantLiftedPAS.err
    Untracked:  code/quantLiftedPAS.out
    Untracked:  code/quantLiftedPAS.sh
    Untracked:  code/quatJunc.err
    Untracked:  code/quatJunc.out
    Untracked:  code/recChimpback2Human.err
    Untracked:  code/recChimpback2Human.out
    Untracked:  code/recLiftchim2human.sh
    Untracked:  code/revLift.err
    Untracked:  code/revLift.out
    Untracked:  code/revLiftPAShg38to19.sh
    Untracked:  code/reverseLift.sh
    Untracked:  code/runCheckReverseLift.sh
    Untracked:  code/runChimpDiffIso.sh
    Untracked:  code/runChimpDiffIsoDF.sh
    Untracked:  code/runCountNucleotides.err
    Untracked:  code/runCountNucleotides.out
    Untracked:  code/runCountNucleotides.sh
    Untracked:  code/runCountNucleotidesPantro6.err
    Untracked:  code/runCountNucleotidesPantro6.out
    Untracked:  code/runCountNucleotides_pantro6.sh
    Untracked:  code/runFilterNumChroms.sh
    Untracked:  code/runHumanDiffIso.sh
    Untracked:  code/runHumanDiffIsoDF.sh
    Untracked:  code/runNuclearDiffIso_DF.sh
    Untracked:  code/runNuclearDifffIso.sh
    Untracked:  code/runTotalDiffIso.sh
    Untracked:  code/run_Chimpleafcutter_ds.err
    Untracked:  code/run_Chimpleafcutter_ds.out
    Untracked:  code/run_Chimpverifybam.err
    Untracked:  code/run_Chimpverifybam.out
    Untracked:  code/run_Humanleafcutter_dF.err
    Untracked:  code/run_Humanleafcutter_dF.out
    Untracked:  code/run_Humanleafcutter_ds.err
    Untracked:  code/run_Humanleafcutter_ds.out
    Untracked:  code/run_Nuclearleafcutter_ds.err
    Untracked:  code/run_Nuclearleafcutter_ds.out
    Untracked:  code/run_Nuclearleafcutter_dsDF.err
    Untracked:  code/run_Nuclearleafcutter_dsDF.out
    Untracked:  code/run_Totalleafcutter_ds.err
    Untracked:  code/run_Totalleafcutter_ds.out
    Untracked:  code/run_chimpverifybam.sh
    Untracked:  code/run_verifyBam.sh
    Untracked:  code/run_verifybam.err
    Untracked:  code/run_verifybam.out
    Untracked:  code/slurm-62824013.out
    Untracked:  code/slurm-62825841.out
    Untracked:  code/slurm-62826116.out
    Untracked:  code/slurm-64108209.out
    Untracked:  code/slurm-64108521.out
    Untracked:  code/slurm-64108557.out
    Untracked:  code/snakePASChimp.err
    Untracked:  code/snakePASChimp.out
    Untracked:  code/snakePAShuman.out
    Untracked:  code/snakemake.batch
    Untracked:  code/snakemakeChimp.err
    Untracked:  code/snakemakeChimp.out
    Untracked:  code/snakemakeHuman.err
    Untracked:  code/snakemakeHuman.out
    Untracked:  code/snakemakePAS.batch
    Untracked:  code/snakemakePASFiltChimp.err
    Untracked:  code/snakemakePASFiltChimp.out
    Untracked:  code/snakemakePASFiltHuman.err
    Untracked:  code/snakemakePASFiltHuman.out
    Untracked:  code/snakemakePASchimp.batch
    Untracked:  code/snakemakePAShuman.batch
    Untracked:  code/snakemake_chimp.batch
    Untracked:  code/snakemake_human.batch
    Untracked:  code/snakemakefiltPAS.batch
    Untracked:  code/snakemakefiltPAS_chimp.sh
    Untracked:  code/snakemakefiltPAS_human.sh
    Untracked:  code/spliceSite2Fasta.py
    Untracked:  code/submit-snakemake-chimp.sh
    Untracked:  code/submit-snakemake-human.sh
    Untracked:  code/submit-snakemakePAS-chimp.sh
    Untracked:  code/submit-snakemakePAS-human.sh
    Untracked:  code/submit-snakemakefiltPAS-chimp.sh
    Untracked:  code/submit-snakemakefiltPAS-human.sh
    Untracked:  code/subset_diffisopheno.py
    Untracked:  code/subset_diffisopheno_Chimp_tvN.py
    Untracked:  code/subset_diffisopheno_Chimp_tvN_DF.py
    Untracked:  code/subset_diffisopheno_Huma_tvN.py
    Untracked:  code/subset_diffisopheno_Huma_tvN_DF.py
    Untracked:  code/subset_diffisopheno_Nuclear_HvC.py
    Untracked:  code/subset_diffisopheno_Nuclear_HvC_DF.py
    Untracked:  code/subset_diffisopheno_Total_HvC.py
    Untracked:  code/test
    Untracked:  code/test.txt
    Untracked:  code/threeprimeOrthoFC.out
    Untracked:  code/threeprimeOrthoFC.sh
    Untracked:  code/threeprimeOrthoFCcd.err
    Untracked:  code/transcriptDTplotsNuclear.sh
    Untracked:  code/transcriptDTplotsTotal.sh
    Untracked:  code/verifyBam4973.sh
    Untracked:  code/verifyBam4973inHuman.sh
    Untracked:  code/verifybam4973.err
    Untracked:  code/verifybam4973.out
    Untracked:  code/verifybam4973HumanMap.err
    Untracked:  code/verifybam4973HumanMap.out
    Untracked:  code/wrap_Chimpverifybam.err
    Untracked:  code/wrap_Chimpverifybam.out
    Untracked:  code/wrap_chimpverifybam.sh
    Untracked:  code/wrap_verifyBam.sh
    Untracked:  code/wrap_verifybam.err
    Untracked:  code/wrap_verifybam.out
    Untracked:  code/writeMergecode.py
    Untracked:  data/._.DS_Store
    Untracked:  data/._HC_filenames.txt
    Untracked:  data/._HC_filenames.txt.sb-4426323c-IKIs0S
    Untracked:  data/._HC_filenames.xlsx
    Untracked:  data/._MapPantro6_meta.txt
    Untracked:  data/._MapPantro6_meta.txt.sb-a5794dd2-Cskmlm
    Untracked:  data/._MapPantro6_meta.xlsx
    Untracked:  data/._OppositeSpeciesMap.txt
    Untracked:  data/._OppositeSpeciesMap.txt.sb-a5794dd2-mayWJf
    Untracked:  data/._OppositeSpeciesMap.xlsx
    Untracked:  data/._RNASEQ_metadata.txt
    Untracked:  data/._RNASEQ_metadata.txt.sb-4426323c-TE4ns3
    Untracked:  data/._RNASEQ_metadata.txt.sb-51f67ae1-HXp7Gq
    Untracked:  data/._RNASEQ_metadata_2Removed.txt
    Untracked:  data/._RNASEQ_metadata_2Removed.txt.sb-4426323c-a4lBwx
    Untracked:  data/._RNASEQ_metadata_2Removed.xlsx
    Untracked:  data/._RNASEQ_metadata_stranded.txt
    Untracked:  data/._RNASEQ_metadata_stranded.txt.sb-a5794dd2-D659m2
    Untracked:  data/._RNASEQ_metadata_stranded.txt.sb-a5794dd2-ImNMoY
    Untracked:  data/._RNASEQ_metadata_stranded.txt.sb-e4bf31f0-ZGnGgl
    Untracked:  data/._RNASEQ_metadata_stranded.xlsx
    Untracked:  data/._TrialFiltersMeta.txt
    Untracked:  data/._TrialFiltersMeta.txt.sb-9845453e-R58Y0Q
    Untracked:  data/._metadata_HCpanel.txt
    Untracked:  data/._metadata_HCpanel.txt.sb-a3d92a2d-b9cYoF
    Untracked:  data/._metadata_HCpanel.txt.sb-a5794dd2-i594qs
    Untracked:  data/._metadata_HCpanel.txt.sb-f4823d1e-qihGek
    Untracked:  data/._metadata_HCpanel.xlsx
    Untracked:  data/._metadata_HCpanel_frompantro5.xlsx
    Untracked:  data/._~$RNASEQ_metadata.xlsx
    Untracked:  data/._~$metadata_HCpanel.xlsx
    Untracked:  data/._.xlsx
    Untracked:  data/BaseComp/
    Untracked:  data/CompapaQTLpas/
    Untracked:  data/DNDS/
    Untracked:  data/DTmatrix/
    Untracked:  data/DiffExpression/
    Untracked:  data/DiffIso_Nuclear/
    Untracked:  data/DiffIso_Nuclear_DF/
    Untracked:  data/DiffIso_Total/
    Untracked:  data/DiffSplice/
    Untracked:  data/DiffSplice_liftedJunc/
    Untracked:  data/DiffSplice_removeBad/
    Untracked:  data/DominantPAS/
    Untracked:  data/DominantPAS_DF/
    Untracked:  data/EvalPantro5/
    Untracked:  data/HC_filenames.txt
    Untracked:  data/HC_filenames.xlsx
    Untracked:  data/Khan_prot/
    Untracked:  data/Li_eqtls/
    Untracked:  data/MapPantro6_meta.txt
    Untracked:  data/MapPantro6_meta.xlsx
    Untracked:  data/MapStats/
    Untracked:  data/NormalizedClusters/
    Untracked:  data/NuclearHvC/
    Untracked:  data/NuclearHvC_DF/
    Untracked:  data/OppositeSpeciesMap.txt
    Untracked:  data/OppositeSpeciesMap.xlsx
    Untracked:  data/OrthoExonBed/
    Untracked:  data/OverlapBenchmark/
    Untracked:  data/OverlappingPAS/
    Untracked:  data/PAS/
    Untracked:  data/PAS_SAF/
    Untracked:  data/PAS_doubleFilter/
    Untracked:  data/Peaks_5perc/
    Untracked:  data/Pheno_5perc/
    Untracked:  data/Pheno_5perc_DF_nuclear/
    Untracked:  data/Pheno_5perc_nuclear/
    Untracked:  data/Pheno_5perc_nuclear_old/
    Untracked:  data/Pheno_5perc_total/
    Untracked:  data/PhyloP/
    Untracked:  data/RNASEQ_metadata.txt
    Untracked:  data/RNASEQ_metadata_2Removed.txt
    Untracked:  data/RNASEQ_metadata_2Removed.xlsx
    Untracked:  data/RNASEQ_metadata_stranded.txt
    Untracked:  data/RNASEQ_metadata_stranded.txt.sb-e4bf31f0-ZGnGgl/
    Untracked:  data/RNASEQ_metadata_stranded.xlsx
    Untracked:  data/SignalSites/
    Untracked:  data/SignalSites_doublefilter/
    Untracked:  data/SpliceSite/
    Untracked:  data/TestMM2/
    Untracked:  data/TestMM2_PrimaryRead/
    Untracked:  data/TestMM2_SeondaryRead/
    Untracked:  data/TestMM2_quality/
    Untracked:  data/Test_FC_methods/
    Untracked:  data/Threeprime2Ortho/
    Untracked:  data/TotalHvC/
    Untracked:  data/TrialFiltersMeta.txt
    Untracked:  data/TwoBadSampleAnalysis/
    Untracked:  data/Wang_ribo/
    Untracked:  data/apaQTLGenes/
    Untracked:  data/bioGRID/
    Untracked:  data/chainFiles/
    Untracked:  data/cleanPeaks_anno/
    Untracked:  data/cleanPeaks_byspecies/
    Untracked:  data/cleanPeaks_lifted/
    Untracked:  data/files4viz_nuclear/
    Untracked:  data/files4viz_nuclear_DF/
    Untracked:  data/gviz/
    Untracked:  data/leafviz/
    Untracked:  data/liftover_files/
    Untracked:  data/mediation/
    Untracked:  data/mediation_DF/
    Untracked:  data/metadata_HCpanel.txt
    Untracked:  data/metadata_HCpanel.xlsx
    Untracked:  data/metadata_HCpanel_frompantro5.txt
    Untracked:  data/metadata_HCpanel_frompantro5.xlsx
    Untracked:  data/multimap/
    Untracked:  data/primaryLift/
    Untracked:  data/reverseLift/
    Untracked:  data/testQuant/
    Untracked:  data/~$RNASEQ_metadata.xlsx
    Untracked:  data/~$metadata_HCpanel.xlsx
    Untracked:  data/.xlsx
    Untracked:  output/._.DS_Store
    Untracked:  output/dtPlots/
    Untracked:  projectNotes.Rmd
    Untracked:  proteinModelSet.Rmd

Unstaged changes:
    Modified:   analysis/DiffUsedIntronic.Rmd
    Modified:   analysis/ExploredAPA.Rmd
    Modified:   analysis/ExploredAPA_DF.Rmd
    Modified:   analysis/OppositeMap.Rmd
    Modified:   analysis/SpliceSiteStrength.Rmd
    Modified:   analysis/TotalVNuclearBothSpecies.Rmd
    Modified:   analysis/annotationInfo.Rmd
    Modified:   analysis/changeMisprimcut.Rmd
    Modified:   analysis/comp2apaQTLPAS.Rmd
    Modified:   analysis/correlationPhenos.Rmd
    Modified:   analysis/dAPAandapaQTL_DF.Rmd
    Modified:   analysis/establishCutoffs.Rmd
    Modified:   analysis/investigatePantro5.Rmd
    Modified:   analysis/multiMap.Rmd
    Modified:   analysis/speciesSpecific.Rmd
    Modified:   analysis/upsetter_DF.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd 706fa0d brimittleman 2020-03-28 add mm exp code
html 3d4dd47 brimittleman 2020-03-27 Build site.
Rmd f92b89c brimittleman 2020-03-26 add ortho exon and new mm
html 1fad12e brimittleman 2020-03-25 Build site.
Rmd 4e6bd5a brimittleman 2020-03-25 fix mm
html 478a504 brimittleman 2020-03-24 Build site.
Rmd 646844c brimittleman 2020-03-24 add write out
html f0c8049 brimittleman 2020-03-23 Build site.
Rmd f11a628 brimittleman 2020-03-23 plots for multimap
html cd2bced brimittleman 2020-03-23 Build site.
Rmd 0e55318 brimittleman 2020-03-23 diagnose multimap
html 02f5c26 brimittleman 2020-03-20 Build site.
Rmd a4d172b brimittleman 2020-03-20 add mm res
html 246d60c brimittleman 2020-03-19 Build site.
Rmd b6ccdbc brimittleman 2020-03-19 add new quat

library(tidyverse)
── Attaching packages ────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.1       ✔ purrr   0.3.2  
✔ tibble  2.1.1       ✔ dplyr   0.8.0.1
✔ tidyr   0.8.3       ✔ stringr 1.3.1  
✔ readr   1.3.1       ✔ forcats 0.3.0  
── Conflicts ───────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(ggpubr)
Loading required package: magrittr

Attaching package: 'magrittr'
The following object is masked from 'package:purrr':

    set_names
The following object is masked from 'package:tidyr':

    extract
library(VennDiagram)
Loading required package: grid
Loading required package: futile.logger

Attaching package: 'VennDiagram'
The following object is masked from 'package:ggpubr':

    rotate
library(Matrix)

Attaching package: 'Matrix'
The following object is masked from 'package:tidyr':

    expand

I am worried about using featureCounts. It looks like there are reads that are not counted. In this analysis, I will look at correlation between counts as well as the specific locations I have been worried about.

First, I will do this in human nuclear with the final PAS.

This is the fixed strand. The

Meta=read.table("../data/PAS_doubleFilter/PAS_10perc_either_HumanCoord_BothUsage_meta_doubleFilter.txt",header = T)
ChimpMeta=read.table("../data/PAS_doubleFilter/PAS_doublefilter_either_ChimpCoordChimpUsage.sort.bed", col.names = c("chr","start","end", "PAS", "score", "strand"),stringsAsFactors = F) %>% select(PAS)
HumanCountsFeatureCounts=read.table("../Human/data/CleanLiftedPeaks_FC/ALLPAS_postLift_LocParsed_Human_fixed.fc", header = T) %>%  separate(Geneid, into=c("disc", "PAS", "chr2", "start2", "end2", 'strand', 'geneid'), sep=":")  %>% select(-contains("_T")) %>% filter(PAS %in% Meta$PAS) 

ChimpCountsFeatureCounts=read.table("../Chimp/data/CleanLiftedPeaks_FC/ALLPAS_postLift_LocParsed_Chimp_fixed.fc", header = T) %>%  separate(Geneid, into=c("disc", "PAS", "chr2", "start2", "end2", 'strand', 'geneid'), sep=":")  %>% select(-contains("_T")) %>% right_join(ChimpMeta,by="PAS")

First use the bedtools method:

mkdir ../data/testQuant

sbatch humanMultiCov.sh

sbatch chimpMultiCov.sh
HumanDFMulticov=read.table("../data/testQuant/Human_DF_PAS.txt")

Compare:

# cor(a,b)

HumanCountsFeatureCounts_mat= as.matrix(HumanCountsFeatureCounts %>% select(contains("_N")))

HumanDFMulticov_mat=as.matrix(HumanDFMulticov[,7:12])
#PAS correlation 
Human_DFcor= cor(HumanCountsFeatureCounts_mat,HumanDFMulticov_mat)

I care about the diagonal for the matrix.

Human_DFcordiag= diag(Human_DFcor)

Human_DFcordiag
[1] 0.9839297 0.9551134 0.9668683 0.9742895 0.9781832 0.9821528

There are the correlations for individual.

ChimpDFMulticov=read.table("../data/testQuant/Chimp_DF_PAS.txt")

ChimpCountsFeatureCounts_mat= as.matrix(ChimpCountsFeatureCounts %>% select(contains("_N")))

ChimpDFMulticov_mat=as.matrix(ChimpDFMulticov[,7:12])
#PAS correlation 
Chimp_DFcor= cor(ChimpCountsFeatureCounts_mat,ChimpDFMulticov_mat)

Chimp_DFcordiag= diag(Chimp_DFcor)

Chimp_DFcordiag
[1] 0.9900335 0.9727290 0.9819526 0.9805695 0.9751067 0.9772205

Compare species

DF_Multi=as.data.frame(cbind(Chimp=Chimp_DFcordiag,Human=Human_DFcordiag)) %>% gather("Species", "Correlation")


ggplot(DF_Multi,aes(x=Species, y=Correlation, fill=Species)) + geom_boxplot() + geom_jitter() + stat_compare_means() + labs(title="Double filtered Final PAS correlation \nbetween feature counts and multiCov")

Version Author Date
246d60c brimittleman 2020-03-19

Difference in the matricies:

HumanDiff= HumanCountsFeatureCounts_mat-HumanDFMulticov_mat
ChimpDiff=ChimpCountsFeatureCounts_mat-ChimpDFMulticov_mat

Extract the non

nnzero(HumanDiff)
[1] 110127
sum(HumanDiff)
[1] -1787591
HumanDiff_g=as.data.frame(HumanDiff) %>% gather("Ind", "count")

ggplot(HumanDiff_g, aes(x=Ind, y=count,col=Ind))+ geom_boxplot() + labs(y="FeatureCount - Bedtools", title="Human Difference in Counts for final PAS")+ theme(legend.position = "none")

Version Author Date
1fad12e brimittleman 2020-03-25
02f5c26 brimittleman 2020-03-20

Look at the examples with the min values.

HumanDiff_rowS= rowSums(HumanDiff)
#min:  
minDiff=which(HumanDiff_rowS==min(HumanDiff_rowS))

#36964 (look at this row in the df)
HumanCountsFeatureCounts[minDiff,]
      disc         PAS chr2  start2    end2 strand    geneid  Chr   Start
36964 Both human299134 chr7 5527138 5527338      + ACTB_utr3 chr7 5527138
          End Strand Length X18498_N X18499_N X18502_N X18504_N X18510_N
36964 5527338      +    201     3833     4635     3328     2983     3460
      X18523_N
36964     4132
HumanDFMulticov[minDiff,]
        V1      V2      V3          V4        V5 V6    V7    V8    V9
36964 chr7 5527138 5527338 human299134 0.9833333  - 36243 45135 31796
        V10   V11   V12
36964 32458 33604 39187
nnzero(ChimpDiff)
[1] 112291
sum(ChimpDiff)
[1] -1904816
ChimpDiff_g=as.data.frame(ChimpDiff) %>% gather("Ind", "count")

ggplot(ChimpDiff_g, aes(x=Ind, y=count,col=Ind))+ geom_boxplot() + labs(y="FeatureCount - Bedtools", title="Chimp Difference in Counts for final PAS") + theme(legend.position = "none")

Version Author Date
1fad12e brimittleman 2020-03-25
02f5c26 brimittleman 2020-03-20

Look at the examples with the min values.

ChimpDiff_rowS= rowSums(ChimpDiff)
#min:  
minDiffChimp=which(ChimpDiff_rowS==min(ChimpDiff_rowS))

ChimpCountsFeatureCounts[minDiffChimp,]
      disc         PAS chr2  start2    end2 strand    geneid  Chr   Start
36910 Both human299134 chr7 5527138 5527338      + ACTB_utr3 chr7 5663709
          End Strand Length X18358_N X3622_N X3659_N X4973_N pt30_N pt91_N
36910 5663909      +    201     3626    7146    5153    6572   8693   7306
ChimpDFMulticov[minDiffChimp,]
        V1      V2      V3          V4    V5 V6    V7    V8    V9   V10
36910 chr7 5663709 5663909 human299134 0.985  - 31910 63880 44470 49514
        V11   V12
36910 73935 60607

This shows that in general counts are lower in the features counts.

Examples:

human188040

Meta %>% filter(PAS=="human188040")
          PAS  disc  gene loc  chr     start       end Chimp Human
1 human188040 Human PLCL1 cds chr2 198084933 198085133 0.095 0.025
  strandFix
1         +
ChimpDFMulticov %>% filter(V4=="human188040")
     V1       V2       V3          V4    V5 V6 V7 V8 V9 V10 V11 V12
1 chr2B 84660923 84661123 human188040 0.095  +  0  0  5   1   0   0
ChimpCountsFeatureCounts %>% filter(PAS=="human188040")
   disc         PAS chr2    start2      end2 strand    geneid   Chr
1 Human human188040 chr2 198084933 198085133      - PLCL1_cds chr2B
     Start      End Strand Length X18358_N X3622_N X3659_N X4973_N pt30_N
1 84660923 84661123      -    201        0       0       5       1      0
  pt91_N
1      0
HumanDFMulticov %>% filter(V4=="human188040")
    V1        V2        V3          V4    V5 V6 V7 V8 V9 V10 V11 V12
1 chr2 198084933 198085133 human188040 0.025  +  0  2  6   3   0   2
HumanCountsFeatureCounts %>% filter(PAS=="human188040")
   disc         PAS chr2    start2      end2 strand    geneid  Chr
1 Human human188040 chr2 198084933 198085133      - PLCL1_cds chr2
      Start       End Strand Length X18498_N X18499_N X18502_N X18504_N
1 198084933 198085133      -    201        0        2        6        3
  X18510_N X18523_N
1        0        2

Same counts here. This is why it was not found in chimp. It seems it wouldnt pass the non zero filter.
chimp171173

Meta %>% filter(PAS=="chimp171173")
          PAS  disc gene  loc  chr   start     end     Chimp        Human
1 chimp171173 Chimp ADI1 utr3 chr2 3498436 3498639 0.5358333 0.0008333333
  strandFix
1         -
HumanDFMulticov %>% filter(V4=="chimp171173") 
    V1      V2      V3          V4           V5 V6  V7 V8  V9 V10 V11 V12
1 chr2 3498436 3498639 chimp171173 0.0008333333  - 140 87 105  47 250 525
HumanCountsFeatureCounts %>% filter(PAS=="chimp171173")
   disc         PAS chr2  start2    end2 strand    geneid  Chr   Start
1 Chimp chimp171173 chr2 3498436 3498639      + ADI1_utr3 chr2 3498436
      End Strand Length X18498_N X18499_N X18502_N X18504_N X18510_N
1 3498639      +    204        0        0        0        0        0
  X18523_N
1        0

chimp171172

Meta %>% filter(PAS=="chimp171172")
          PAS  disc gene  loc  chr   start     end Chimp Human strandFix
1 chimp171172 Chimp ADI1 utr3 chr2 3497907 3498106 0.155     0         -
HumanDFMulticov %>% filter(V4=="chimp171172") 
    V1      V2      V3          V4 V5 V6 V7 V8 V9 V10 V11 V12
1 chr2 3497907 3498106 chimp171172  0  - 24  9 10  15  27 110
HumanCountsFeatureCounts %>% filter(PAS=="chimp171172")
   disc         PAS chr2  start2    end2 strand    geneid  Chr   Start
1 Chimp chimp171172 chr2 3497907 3498106      + ADI1_utr3 chr2 3497907
      End Strand Length X18498_N X18499_N X18502_N X18504_N X18510_N
1 3498106      +    200        0        0        0        0        0
  X18523_N
1        0

This is fixed. (ADI1)

LAT human124355

Meta %>% filter(PAS=="human124355")
          PAS  disc gene  loc   chr    start      end     Chimp Human
1 human124355 Human  LAT utr3 chr16 28990265 28990465 0.4241667  0.52
  strandFix
1         +
ChimpDFMulticov %>% filter(V4=="human124355")
                    V1     V2     V3          V4        V5 V6 V7 V8 V9 V10
1 chrUn_NW_019937196v1 380529 380729 human124355 0.4241667  - 19 26 43  23
  V11 V12
1  34  25
ChimpCountsFeatureCounts %>% filter(PAS=="human124355")
   disc         PAS  chr2   start2     end2 strand   geneid
1 Human human124355 chr16 28990265 28990465      - LAT_utr3
                   Chr  Start    End Strand Length X18358_N X3622_N
1 chrUn_NW_019937196v1 380529 380729      +    201       19      23
  X3659_N X4973_N pt30_N pt91_N
1      43      23     32     24
HumanDFMulticov %>% filter(V4=="human124355")
     V1       V2       V3          V4   V5 V6 V7 V8 V9 V10 V11 V12
1 chr16 28990265 28990465 human124355 0.52  + 50 10 28  18  33  32
HumanCountsFeatureCounts %>% filter(PAS=="human124355")
   disc         PAS  chr2   start2     end2 strand   geneid   Chr    Start
1 Human human124355 chr16 28990265 28990465      - LAT_utr3 chr16 28990265
       End Strand Length X18498_N X18499_N X18502_N X18504_N X18510_N
1 28990465      -    201       50       10       28       16       32
  X18523_N
1       31

Makes sense. Discovered only in human because on contig. Counting is ok.

Compare the PAS discovered in human:

humanDisc=Meta %>% filter(disc=="Human")

ChimpFC_human= ChimpCountsFeatureCounts %>% filter(PAS %in% humanDisc$PAS)
ChimpMC_human= ChimpDFMulticov %>% filter(V4 %in% humanDisc$PAS)

#correlation  
ChimpFC_human_mat= as.matrix(ChimpFC_human[,13:18])
ChimpMC_human_mat=as.matrix(ChimpMC_human[,7:12])
#PAS correlation 
Chimp_DFcor_Humandisc= cor(ChimpFC_human_mat,ChimpMC_human_mat)

Chimp_DFcor_Humandiscdiag= diag(Chimp_DFcor_Humandisc)

Chimp_DFcor_Humandiscdiag
[1] 0.9994462 0.9949149 0.9990672 0.9974558 0.9984791 0.9988820

Is the multicov higher?

ChimpHumanDiscDiff= ChimpFC_human_mat-ChimpMC_human_mat

nnzero(ChimpHumanDiscDiff)
[1] 12166
sum(ChimpHumanDiscDiff)
[1] -212818
summary(ChimpHumanDiscDiff)
    X18358_N            X3622_N             X3659_N         
 Min.   :-2248.000   Min.   :-2444.000   Min.   :-1658.000  
 1st Qu.:   -1.000   1st Qu.:   -2.000   1st Qu.:   -1.000  
 Median :    0.000   Median :    0.000   Median :    0.000  
 Mean   :   -8.554   Mean   :   -8.259   Mean   :   -6.238  
 3rd Qu.:    0.000   3rd Qu.:    0.000   3rd Qu.:    0.000  
 Max.   :  345.000   Max.   :  198.000   Max.   :  213.000  
    X4973_N             pt30_N              pt91_N         
 Min.   :-2526.00   Min.   :-2935.000   Min.   :-2713.000  
 1st Qu.:   -2.00   1st Qu.:   -2.000   1st Qu.:   -2.000  
 Median :    0.00   Median :    0.000   Median :    0.000  
 Mean   :   -6.81   Mean   :   -8.015   Mean   :   -8.682  
 3rd Qu.:    0.00   3rd Qu.:    0.000   3rd Qu.:    0.000  
 Max.   :  192.00   Max.   :  134.000   Max.   :  119.000  
ChimpHumanDiscDiff_rowS= rowSums(ChimpHumanDiscDiff)
#min:  
minDiffChimpinH=which(ChimpHumanDiscDiff_rowS==min(ChimpHumanDiscDiff_rowS))

ChimpFC_human[minDiffChimpinH,]
     disc       PAS chr2   start2     end2 strand    geneid  Chr    Start
113 Human human8304 chr1 44778707 44778907      - RPS8_utr3 chr1 43588450
         End Strand Length X18358_N X3622_N X3659_N X4973_N pt30_N pt91_N
113 43588650      -    201      400     421     239     448    627    602
ChimpMC_human[minDiffChimpinH,]
      V1       V2       V3        V4  V5 V6   V7   V8   V9  V10  V11  V12
113 chr1 43588450 43588650 human8304 0.5  + 2648 2534 1757 2974 3562 3315

chr 1 (.5X) RPS8 (snoRNAs near- could be problem )

Redo with different map quality

-q 255 this will not count multimapping reads.

sbatch chimpMultiCov255.sh
sbatch humanMultiCov255.sh
HumanDFMulticov255=read.table("../data/testQuant/Human_DF_PAS_255.txt")
ChimpDFMulticov255=read.table("../data/testQuant/Chimp_DF_PAS_255.txt")


HumanDFMulticov255_mat=as.matrix(HumanDFMulticov255[,7:12])


#compare this to the earlier one    

HumanDiffMulti=HumanDFMulticov_mat- HumanDFMulticov255_mat

HumanDiffMulti_g=as.data.frame(HumanDiffMulti) %>% gather("Ind", "Count")

ggplot(HumanDiffMulti_g,aes(x=Ind, y=Count, col=Ind)) + geom_boxplot() + labs(y="With Multimap - No multimap", title="Difference in PAS read count with bedtools multimap filters", x="Human Individuals") +theme(legend.position = "none")

Version Author Date
1fad12e brimittleman 2020-03-25
02f5c26 brimittleman 2020-03-20
#compare to FC  

HumanDiffw225= HumanCountsFeatureCounts_mat - HumanDFMulticov255_mat
HumanDiffw225_g=as.data.frame(HumanDiffw225) %>% gather("Ind", "Count")
ggplot(HumanDiffw225_g,aes(x=Ind, y=Count, col=Ind)) + geom_boxplot() + labs(x="Human Individual", y="Feature counts = Bedtools (no multimap)", title="Difference in PAS read count between featurecounts\n and bedtools without multimap") + theme(legend.position = "none")

Version Author Date
1fad12e brimittleman 2020-03-25

Check examples:

Meta %>% filter(PAS=="chimp171172")
          PAS  disc gene  loc  chr   start     end Chimp Human strandFix
1 chimp171172 Chimp ADI1 utr3 chr2 3497907 3498106 0.155     0         -
HumanDFMulticov %>% filter(V4=="chimp171172") 
    V1      V2      V3          V4 V5 V6 V7 V8 V9 V10 V11 V12
1 chr2 3497907 3498106 chimp171172  0  - 24  9 10  15  27 110
HumanDFMulticov255 %>% filter(V4=="chimp171172") 
    V1      V2      V3          V4 V5 V6 V7 V8 V9 V10 V11 V12
1 chr2 3497907 3498106 chimp171172  0  -  1  3  0   0   0   0
HumanCountsFeatureCounts %>% filter(PAS=="chimp171172")
   disc         PAS chr2  start2    end2 strand    geneid  Chr   Start
1 Chimp chimp171172 chr2 3497907 3498106      + ADI1_utr3 chr2 3497907
      End Strand Length X18498_N X18499_N X18502_N X18504_N X18510_N
1 3498106      +    200        0        0        0        0        0
  X18523_N
1        0
ChimpDFMulticov %>% filter(V4=="chimp171172") 
     V1      V2      V3          V4    V5 V6 V7 V8 V9 V10 V11 V12
1 chr2A 3477781 3477981 chimp171172 0.155  - 35 25 32  15  31  24
ChimpDFMulticov255 %>% filter(V4=="chimp171172") 
     V1      V2      V3          V4    V5 V6 V7 V8 V9 V10 V11 V12
1 chr2A 3477781 3477981 chimp171172 0.155  - 35 25 32  15  31  24
ChimpCountsFeatureCounts %>% filter(PAS=="chimp171172")
   disc         PAS chr2  start2    end2 strand    geneid   Chr   Start
1 Chimp chimp171172 chr2 3497907 3498106      + ADI1_utr3 chr2A 3477781
      End Strand Length X18358_N X3622_N X3659_N X4973_N pt30_N pt91_N
1 3477981      +    201       35      25      32      15     30     24

Ok it seems like maybe multimapping accounts for some of the differece.

Try including the multi map reads in feature counts. I will test the -M flag and the –primary flag to compare. The SAF file has all of the PAS.

sbatch BothFCMM.sh 

Grep the results: Chimp:chimp171172:chr2:3497907:3498106:+:ADI1_utr3 chr2 3497907 3498106 + 200 23 6 10 15 27 110

So this is the problem. The IGV.

Inclusive

Test also on the original calls. To see if it can handle this one.

I need to change bed to include both positive and negative. I can do this with python.

python bed2Bedbothstrand.py ../Human/data/inclusivePeaks/human_APApeaks.ALLChrom.bed ../data/testQuant/Human_ALLpeaks.Bothstrand.bed

python bed2Bedbothstrand.py ../Chimp/data/inclusivePeaks/chimp_APApeaks.ALLChrom.bed ../data/testQuant/Chimp_ALLpeaks.Bothstrand.bed

sbatch humanMultiCov_inclusive.sh
sbatch chimpMultiCovInclusive.sh


#look at difference with the -O flag  


sbatch BothFCnewOInclusive.sh

All counts are 0. Fix this.

Address multimaping problem

it looks like the problem is multimapping. I want to get rid of PAS that multi map in one species only. To do this I want to compare the featurecounts with and without the -M flag.


sbatch BothFCMM.sh 


python fixFChead_bothfrac.py ../data/testQuant/ALLPAS_postLift_LocParsed_Human_BothFrac_Mflag ../data/testQuant/ALLPAS_postLift_LocParsed_Human_BothFrac_Mflag_fixed.fc  

Only look at the PAS in the double filtered for now. I want to know how many I would lose in both frac.

HumanCountsNoMM=read.table("../Human/data/CleanLiftedPeaks_FC/ALLPAS_postLift_LocParsed_Human_fixed.fc", header = T) %>%  separate(Geneid, into=c("disc", "PAS", "chr2", "start2", "end2", 'strand', 'geneid'), sep=":")%>% filter(PAS %in% Meta$PAS) 

HC_noMM=rowSums(as.matrix(HumanCountsNoMM[,13:24]) )

HumanCountMM=read.table("../data/testQuant/ALLPAS_postLift_LocParsed_Human_BothFrac_Mflag_fixed.fc", header = T)%>%  separate(Geneid, into=c("disc", "PAS", "chr2", "start2", "end2", 'strand', 'geneid'), sep=":") %>% filter(PAS %in% Meta$PAS) 


HC_MM=rowSums(as.matrix(HumanCountMM[,13:24]) )

HC_diff=HC_MM-HC_noMM
plot(HC_diff,  main="Difference between featurecounts with and without -M flag", ylab="Sum Human multimap - Sum Human no multimap")

Version Author Date
1fad12e brimittleman 2020-03-25

12 libraries. Lets say it needs to have at least 10 in each or 120

sum(HC_diff>120)
[1] 999
ChimpCountsNoMM=read.table("../Chimp/data/CleanLiftedPeaks_FC/ALLPAS_postLift_LocParsed_Chimp_fixed.fc", header = T)  %>%  separate(Geneid, into=c("disc", "PAS", "chr2", "start2", "end2", 'strand', 'geneid'), sep=":") %>% right_join(ChimpMeta,by="PAS")

CC_noMM=rowSums(as.matrix(ChimpCountsNoMM[,13:24]) )

ChimpCountMM=read.table("../data/testQuant/ALLPAS_postLift_LocParsed_Chimp_BothFrac_Mflag", header = T) %>%  separate(Geneid, into=c("disc", "PAS", "chr2", "start2", "end2", 'strand', 'geneid'), sep=":")  %>% right_join(ChimpMeta,by="PAS")


CC_MM=rowSums(as.matrix(ChimpCountMM[,13:24]) )



CC_diff=CC_MM-CC_noMM
plot(CC_diff,main="Difference between featurecounts with and without -M flag", ylab="Sum Chimp multimap - Sum Chimp no multimap")

Version Author Date
1fad12e brimittleman 2020-03-25
sum(HC_diff>120)
[1] 999
humanMM=Human=HumanCountsNoMM[which(HC_diff>120),2]
chimpMM=ChimpCountsNoMM[which(CC_diff>120),2]

bothMM=intersect(humanMM,chimpMM)

overlap <- intersect(humanMM, chimpMM) 
H.unique <- setdiff(humanMM,chimpMM) # n 1st NOT 2nd
C.unique <- setdiff(chimpMM,humanMM)
full.list <- unique(c(humanMM,chimpMM))
grid.newpage()
venn.plot <- draw.pairwise.venn(area1 = length(humanMM),
                           area2 = length(chimpMM),
                           cross.area = length(overlap),
                           c("Human", "Chimp"), scaled = TRUE,
                           fill = c("green", "blue"),
                           cex = 1.5,
                           cat.cex = 1.5,
                           cat.pos = c(320, 25),
                           cat.dist = .05) 

Version Author Date
1fad12e brimittleman 2020-03-25

total remove if this filter:

length(full.list)
[1] 1528

Test a range of filters:

I will test how many PAS we would lose if we filtered at each value. I will see if there is natural cuttoff.

filterRange=seq(0,500000,by=100)
ChimpDiffN=c()
HumanDiffN=c()

for (i in filterRange){
  ChimpDiffN= c(ChimpDiffN, sum(CC_diff>i))
  HumanDiffN= c(HumanDiffN, sum(HC_diff>i))
}


LostPASdf=as.data.frame(cbind(Filter=filterRange, Chimp=ChimpDiffN, Human=HumanDiffN))
#%>% mutate(All=42318)
LostPASdf_g= LostPASdf %>% gather("Species","PAS_lost", -Filter)

Plot:

ggplot(LostPASdf_g, aes(x=Filter, y=log10(PAS_lost),col=Species)) + geom_point(alpha=.5) + labs(y="log10(PAS lost due to filter)", x="Filter for difference in read count due to multimapping", title="PAS lost due to multimap filter by different filter") + scale_color_brewer(palette = "Dark2")+geom_vline(xintercept = 20000) + annotate("text", label="Difference of 20,000 \ncounts across individuals", x=120000,y=3)

Version Author Date
1fad12e brimittleman 2020-03-25
ggplot(LostPASdf_g, aes(x=log10(Filter), y=log10(PAS_lost),col=Species)) + geom_point(alpha=.5) + labs(y="log10(PAS lost due to filter)", x="log10(Filter for difference in read count due to multimapping)", title="PAS lost due to multimap filter by different filter") + scale_color_brewer(palette = "Dark2")

Version Author Date
1fad12e brimittleman 2020-03-25
cd2bced brimittleman 2020-03-23

Look at proportion.

LostPASdfPROP=LostPASdf_g%>% mutate(All=42318, propLeft=PAS_lost/All, PropRemain=1-propLeft)
ggplot(LostPASdfPROP, aes(x=Filter, y=PropRemain,col=Species)) + geom_point(alpha=.3)

Version Author Date
1fad12e brimittleman 2020-03-25
cd2bced brimittleman 2020-03-23

Zoom in. Look at filter over 1000

LostPASdfPROP_filt=LostPASdfPROP %>% filter(Filter>1000)


ggplot(LostPASdfPROP_filt, aes(x=Filter, y=PropRemain,col=Species)) + geom_point(alpha=.3) + geom_vline(xintercept = 20000) +  annotate("text", label="Difference of 20,000 \ncounts across individuals", x=120000,y=.998) + scale_color_brewer(palette = "Dark2") + labs(x="Filter for difference in reads", y="Proportion of PAS remaining", title="Proportion of PAS remaining by species after multimap filter")

Version Author Date
1fad12e brimittleman 2020-03-25

Use this filter to make the boxplot

humanMM2=Human=HumanCountsNoMM[which(HC_diff>20000),2]
chimpMM2=ChimpCountsNoMM[which(CC_diff>20000),2]

bothMM2=intersect(humanMM2,chimpMM2)
overlap2 <- intersect(humanMM2, chimpMM2) 
grid.newpage()
venn.plot <- draw.pairwise.venn(area1 = length(humanMM2),
                           area2 = length(chimpMM2),
                           cross.area = length(overlap2),
                           c("Human", "Chimp"), scaled = TRUE,
                           fill = c("orange", "green"),
                           cex = 1.5,
                           cat.cex = 1.5,
                           cat.pos = c(320, 25),
                           cat.dist = .05) 

Version Author Date
1fad12e brimittleman 2020-03-25

expect to loose chimp171173 in human

HumanCountsNoMM[HumanCountsNoMM$PAS=="chimp171173",]
       disc         PAS chr2  start2    end2 strand    geneid  Chr   Start
22206 Chimp chimp171173 chr2 3498436 3498639      + ADI1_utr3 chr2 3498436
          End Strand Length X18498_N X18498_T X18499_N X18499_T X18502_N
22206 3498639      +    204        0        0        0        0        0
      X18502_T X18504_N X18504_T X18510_N X18510_T X18523_N X18523_T
22206        0        0        0        0        1        0        0
HC_diff[22206]
[1] 3808

Should it be the proportion of reads lost?

I need the sum for the reads with and without multimapping at each PAS. Proportion of reads in the mulimap set

HumanPASMultiMap=as.data.frame(cbind(PAS=HumanCountMM$PAS,WithMultiMap=HC_MM, NoMultiMap=HC_noMM)) 
HumanPASMultiMap$WithMultiMap=as.numeric(as.character(HumanPASMultiMap$WithMultiMap))
HumanPASMultiMap$NoMultiMap=as.numeric(as.character(HumanPASMultiMap$NoMultiMap))
HumanPASMultiMapProp= HumanPASMultiMap%>% mutate(Difference=WithMultiMap-NoMultiMap, PercentLost=Difference/WithMultiMap)

check chimp171173

HumanPASMultiMapProp %>% filter(PAS=="chimp171173")
          PAS WithMultiMap NoMultiMap Difference PercentLost
1 chimp171173         3809          1       3808   0.9997375

Plot
Color top 10%

x=quantile(HumanPASMultiMapProp$PercentLost, seq(0,1, by=.05),na.rm = T)[20]

HumanPASMultiMapPropcol=HumanPASMultiMapProp %>% mutate(color=ifelse(PercentLost>=x, "red", "black")) %>% arrange(PercentLost)
plot(HumanPASMultiMapPropcol$PercentLost,col=HumanPASMultiMapPropcol$color,ylab = "Percent of reads multimapping",main="Percent of reads lost to multimap filter in each human PAS",xlab="42,318 PAS Sorted")
legend("topleft", legend=c("Bottom 95%", "Top 5%"),col=c("black", "red"), pch=16,bty = 'n')

Version Author Date
1fad12e brimittleman 2020-03-25

Look at this in chimp

ChimpPASMultiMap=as.data.frame(cbind(PAS=ChimpCountMM$PAS,WithMultiMap=CC_MM, NoMultiMap=CC_noMM)) 
ChimpPASMultiMap$WithMultiMap=as.numeric(as.character(ChimpPASMultiMap$WithMultiMap))
ChimpPASMultiMap$NoMultiMap=as.numeric(as.character(ChimpPASMultiMap$NoMultiMap))
ChimpPASMultiMapProp= ChimpPASMultiMap%>% mutate(Difference=WithMultiMap-NoMultiMap, PercentLost=Difference/WithMultiMap)

Plot
Color top 5%

y=quantile(ChimpPASMultiMapProp$PercentLost, seq(0,1, by=.05),na.rm = T)[20]

ChimpPASMultiMapPropcol=ChimpPASMultiMapProp %>% mutate(color=ifelse(PercentLost>=y, "red", "black")) %>% arrange(PercentLost)
plot(ChimpPASMultiMapPropcol$PercentLost,col=ChimpPASMultiMapPropcol$color,ylab = "Percent of reads multimapping",main="Percent of reads lost to multimap filter in each chimp PAS",xlab="42,318 PAS Sorted")
legend("topleft", legend=c("Bottom 95%", "Top 5%"),col=c("black", "red"), pch=16,bty = 'n')

Version Author Date
1fad12e brimittleman 2020-03-25
cd2bced brimittleman 2020-03-23

Are these the same PAS :

Look at example.

ChimpPASMultiMapProp %>% filter(PAS=="chimp171173")
          PAS WithMultiMap NoMultiMap Difference PercentLost
1 chimp171173         2219       2216          3  0.00135196

So this shows that this PAS would not be lost in chimp

ChimpTop5=ChimpPASMultiMapPropcol %>% filter(color=="red") %>% select(PAS)
HumanTop5=HumanPASMultiMapPropcol %>% filter(color=="red") %>% select(PAS)

bothLostProp=intersect(HumanTop5$PAS,ChimpTop5$PAS)
OverlapProp <-intersect(HumanTop5$PAS, ChimpTop5$PAS) 
grid.newpage()
venn.plot <- draw.pairwise.venn(area1 = length(HumanTop5$PAS),
                           area2 = length(ChimpTop5$PAS),
                           cross.area = length(OverlapProp),
                           c("Human", "Chimp"), scaled = TRUE,
                           fill = c("darkorange", "darkgreen"),
                           cex = 1.5,
                           cat.cex = 1.5,
                           cat.pos = c(320, 25),
                           cat.dist = .05) 

Version Author Date
1fad12e brimittleman 2020-03-25
cd2bced brimittleman 2020-03-23

This shows that multimapping is not equivalent between.

Color top 2% (get up to 13% difference )

y=quantile(ChimpPASMultiMapProp$PercentLost, seq(0,1, by=.01),na.rm = T)[99]

ChimpPASMultiMapPropcol2=ChimpPASMultiMapProp %>% mutate(color=ifelse(PercentLost>=y, "red", "black")) %>% arrange(PercentLost)

x=quantile(HumanPASMultiMapProp$PercentLost, seq(0,1, by=.01),na.rm = T)[99]

HumanPASMultiMapPropcol2=HumanPASMultiMapProp %>% mutate(color=ifelse(PercentLost>=y, "red", "black")) %>% arrange(PercentLost)
plot(HumanPASMultiMapPropcol2$PercentLost,col=HumanPASMultiMapPropcol2$color,ylab = "Percent of reads multimapping",main="Percent of reads lost to multimap filter in each human PAS",xlab="42,318 PAS Sorted")
legend("topleft", legend=c("Bottom 98%", "Top 2%"),col=c("black", "red"), pch=16,bty = 'n')

Version Author Date
1fad12e brimittleman 2020-03-25
cd2bced brimittleman 2020-03-23
plot(ChimpPASMultiMapPropcol2$PercentLost,col=ChimpPASMultiMapPropcol2$color,ylab = "Percent of reads multimapping",main="Percent of reads lost to multimap filter in each chimp PAS",xlab="42,318 PAS Sorted")
legend("topleft", legend=c("Bottom 98%", "Top 2%"),col=c("black", "red"), pch=16,bty = 'n')

Version Author Date
1fad12e brimittleman 2020-03-25
f0c8049 brimittleman 2020-03-23
ChimpTop2=ChimpPASMultiMapPropcol2 %>% filter(color=="red") %>% select(PAS)
HumanTop2=HumanPASMultiMapPropcol2 %>% filter(color=="red") %>% select(PAS)

bothLostProp=intersect(HumanTop2$PAS,ChimpTop2$PAS)
OverlapProp <-intersect(HumanTop2$PAS, ChimpTop2$PAS) 
grid.newpage()
venn.plot <- draw.pairwise.venn(area1 = length(HumanTop2$PAS),
                           area2 = length(ChimpTop2$PAS),
                           cross.area = length(OverlapProp),
                           c("Human", "Chimp"), scaled = TRUE,
                           fill = c("darkorange", "darkgreen"),
                           cex = 1.5,
                           cat.cex = 1.5,
                           cat.pos = c(320, 25),
                           cat.dist = .05) 

Version Author Date
1fad12e brimittleman 2020-03-25
f0c8049 brimittleman 2020-03-23

In this set there are more on the chimp side.

Look at where these are to see if this could explain the different distribution:

Chimp2PerMeta= Meta %>% filter(PAS %in% ChimpTop2$PAS) %>% mutate(MultiMap="Chimp")

ggplot(Chimp2PerMeta,aes(x=loc,fill=loc)) +geom_bar(stat="count")

Version Author Date
1fad12e brimittleman 2020-03-25
Human2PerMeta= Meta %>% filter(PAS %in% HumanTop2$PAS) %>% mutate(MultiMap="Human")

ggplot(Human2PerMeta,aes(x=loc,fill=loc)) +geom_bar(stat="count")

Version Author Date
1fad12e brimittleman 2020-03-25
f0c8049 brimittleman 2020-03-23

Filterout the PAS with multimapping in both:

Human.unique <- setdiff(HumanTop2$PAS, ChimpTop2$PAS) # n 1st NOT 2nd

Chimp.unique <- setdiff(ChimpTop2$PAS, HumanTop2$PAS) # n 1st NOT 2nd

plot

Human2PerUniqMeta= Meta %>% filter(PAS %in% Human.unique) %>% mutate(MultiMap="Human")

Chimp2PerUniqMeta= Meta %>% filter(PAS %in% Chimp.unique) %>%  mutate(MultiMap="Chimp")

Both2PerUniq=Human2PerUniqMeta %>% bind_rows(Chimp2PerUniqMeta)


ggplot(Both2PerUniq, aes(x=loc, by=MultiMap, fill=MultiMap)) + geom_bar(stat="count", position = "dodge") + scale_fill_brewer(palette = "Dark2") + labs(y="Number of PAS", title="PAS likely wrong due to multimapping in 1 species \n 2% filter")

Version Author Date
1fad12e brimittleman 2020-03-25
f0c8049 brimittleman 2020-03-23

Plot average usage in each species for these:

Both2PerUniq_g= Both2PerUniq %>%select(Chimp, Human, MultiMap, loc) %>% gather("Species", "Usage", -MultiMap, -loc )
ggplot(Both2PerUniq_g, aes(x=loc, y=Usage, by=Species, fill=Species)) + geom_boxplot() + facet_grid(~MultiMap) + scale_fill_brewer(palette = "Dark2")

Version Author Date
1fad12e brimittleman 2020-03-25

Where were they discovered:

ggplot(Both2PerUniq, aes(x=loc, by=disc,fill=disc)) + geom_bar(stat="count", position="dodge")+ facet_grid(~MultiMap) + scale_fill_brewer(palette = "Dark2")

Version Author Date
1fad12e brimittleman 2020-03-25

Check to see if these are in the dominant set

Get the dominant and allow ties:

allPAS= read.table("../data/PAS_doubleFilter/PAS_10perc_either_HumanCoord_BothUsage_meta_doubleFilter.txt", header = T) 
ChimpPASwMean =allPAS %>% dplyr::select(-Human)
HumanPASwMean =allPAS %>% dplyr::select(-Chimp)

Chimp_Dom2= ChimpPASwMean %>%
  group_by(gene) %>%
  top_n(1,Chimp) %>% 
  mutate(nPer=n()) 
nrow(Chimp_Dom2%>% filter(nPer>1) )
[1] 198
Human_Dom2= HumanPASwMean %>%
  group_by(gene) %>%
  top_n(1,Human) %>% 
  mutate(nPer=n()) 

Is there an enrichment for dominant in these:

allPAS_annoDomandMM= allPAS %>% mutate(DomHuman=ifelse(PAS %in% Human_Dom2$PAS, "Yes", "No"),DomChimp=ifelse(PAS %in% Chimp_Dom2$PAS, "Yes", "No"), MM_human=ifelse(PAS %in% HumanTop2$PAS, "Yes", "No"), MM_Chimp=ifelse(PAS %in% ChimpTop2$PAS, "Yes", "No"), MMuniq_human=ifelse(PAS %in% Human.unique, "Yes", "No"),MMuniq_chimp=ifelse(PAS %in% Chimp.unique, "Yes", "No"))

Enrichments:

Dom Human and MM human uniq

x= nrow(allPAS_annoDomandMM %>% filter(DomHuman =="Yes",MMuniq_chimp=="Yes" ))
m= nrow(allPAS_annoDomandMM %>% filter(DomHuman =="Yes"))
n=nrow(allPAS_annoDomandMM%>% filter(DomHuman =="No"))
k=nrow(allPAS_annoDomandMM%>% filter(MMuniq_chimp =="Yes"))
x
[1] 162
#expected
which(grepl(max(dhyper(1:x, m, n, k)), dhyper(1:x, m, n, k)))
[1] 144
phyper(x,m, n , k,lower.tail=F)
[1] 0.04561002
x= nrow(allPAS_annoDomandMM %>% filter(DomChimp =="Yes",MMuniq_human=="Yes" ))
m= nrow(allPAS_annoDomandMM %>% filter(DomChimp =="Yes"))
n=nrow(allPAS_annoDomandMM%>% filter(DomChimp =="No"))
k=nrow(allPAS_annoDomandMM%>% filter(MMuniq_human =="Yes"))
x
[1] 163
#expected
which(grepl(max(dhyper(1:x, m, n, k)), dhyper(1:x, m, n, k)))
[1] 132
phyper(x,m, n , k,lower.tail=F)
[1] 0.001039761

Not enriched but 79 and 62 still means something. Lets see where these are.

ChimpDomandMM=allPAS_annoDomandMM %>% filter(DomChimp =="Yes",MMuniq_human=="Yes") %>% mutate(DominantSpecies ="Chimp")
HumanDomandMM=allPAS_annoDomandMM %>% filter(DomHuman =="Yes",MMuniq_chimp=="Yes")%>% mutate(DominantSpecies="Human")


bothDomandMM= ChimpDomandMM %>% bind_rows(HumanDomandMM)
#plot the location of dominant 

ggplot(bothDomandMM,aes(x=loc, fill=DominantSpecies,by=DominantSpecies)) + geom_bar(stat="count", position = "dodge") + scale_fill_brewer(palette = "Dark2") + labs(x="", y="Number of PAS", title="Location of Dominant PAS conditioned on multimapping problem in other species ")

Version Author Date
1fad12e brimittleman 2020-03-25

Write out these PAS to pull out in other analysis.

mkdir ../data/multimap
allPAS_overlap= allPAS %>% filter(PAS %in% OverlapProp) %>% mutate(MultiMap="Both")


write.table(Chimp2PerUniqMeta, "../data/multimap/Chimp_Uniq_multimapPAS.txt", col.names=T, row.names=F, quote=F)

write.table(Human2PerUniqMeta, "../data/multimap/Human_Uniq_multimapPAS.txt", col.names=T, row.names=F, quote=F)
write.table(allPAS_overlap, "../data/multimap/Both_multimapPAS.txt", col.names=T, row.names=F, quote=F)

Try the primary map read

python fixFChead_bothfrac.py ../data/testQuant/ALLPAS_postLift_LocParsed_Human_Primary ../data/testQuant/ALLPAS_postLift_LocParsed_Human_Primary.fixed.fc 
python fixFChead_bothfrac.py ../data/testQuant/ALLPAS_postLift_LocParsed_Chimp_Primary ../data/testQuant/ALLPAS_postLift_LocParsed_Chimp_Primary.fixed.fc 
HumanNPrimary=read.table("../data/testQuant/ALLPAS_postLift_LocParsed_Human_Primary.fixed.fc",header = T, stringsAsFactors = F) %>% separate(Geneid, into=c("disc", "PAS", "chr2", "start2", "end2", 'strand', 'geneid'), sep=":")%>% filter(PAS %in% Meta$PAS) 
ChimpNPrimary=read.table("../data/testQuant/ALLPAS_postLift_LocParsed_Chimp_Primary.fixed.fc", header = T, stringsAsFactors = F) %>% separate(Geneid, into=c("disc", "PAS", "chr2", "start2", "end2", 'strand', 'geneid'), sep=":")%>% filter(PAS %in% ChimpMeta$PAS) 

Compare this to the original FC.

#HumanCountsFeatureCounts_mat

HumanPrimary_mat=as.matrix(HumanNPrimary %>% select(contains("_N")))

HumanFCP=HumanPrimary_mat- HumanCountsFeatureCounts_mat



HumanFCP_g=as.data.frame(HumanFCP) %>% gather("Ind", "count")

ggplot(HumanFCP_g, aes(x=Ind, y=count,col=Ind))+ geom_boxplot() + labs(y="Primary - No MM", title="Human Difference in Counts for final PAS")+ theme(legend.position = "none")

Version Author Date
3d4dd47 brimittleman 2020-03-27

Ok so the primary flag doesn’t change anything for these.

try priamry and -m

sbatch BothFCMMPrim.sh
python fixFChead_bothfrac.py ../data/testQuant/ALLPAS_postLift_LocParsed_Human_PrimaryandM ../data/testQuant/ALLPAS_postLift_LocParsed_Human_PrimaryandM.fixed.fc 
python fixFChead_bothfrac.py ../data/testQuant/ALLPAS_postLift_LocParsed_Chimp_PrimaryandM ../data/testQuant/ALLPAS_postLift_LocParsed_Chimp_PrimaryandM.fixed.fc 
HumanNPrimaryandM=read.table("../data/testQuant/ALLPAS_postLift_LocParsed_Human_PrimaryandM.fixed.fc",header = T, stringsAsFactors = F) %>% separate(Geneid, into=c("disc", "PAS", "chr2", "start2", "end2", 'strand', 'geneid'), sep=":")%>% filter(PAS %in% Meta$PAS) 

HumanPrimaryandM_mat=as.matrix(HumanNPrimaryandM %>% select(contains("_N")))

HumanFCPM=HumanPrimaryandM_mat- HumanCountsFeatureCounts_mat



HumanFCPM_g=as.data.frame(HumanFCPM) %>% gather("Ind", "count")

ggplot(HumanFCPM_g, aes(x=Ind, y=count,col=Ind))+ geom_boxplot() + labs(y="Primary with M - No MM", title="Human Difference in Counts for final PAS")+ theme(legend.position = "none")

Version Author Date
3d4dd47 brimittleman 2020-03-27

Look at how many are not zero here. This tells me how many PAS have multimap.

nrow(HumanFCPM)
[1] 42318
sum(rowSums(HumanFCPM) !=0)
[1] 10693
10693/42318
[1] 0.2526821

25% have some MM.

I need the scores for the primary and secondary maps.

I will look at the mean for each PAS across individuals in each set. Then I can see the differences between the 3 sets. regular, -m, –primary

#reg
HumanOriginal=HumanCountsFeatureCounts %>% select(PAS,contains("_N")) %>% gather("Ind", "Counts", -PAS) %>%group_by(PAS) %>%  summarise(meanCount=mean(Counts)) %>% mutate(Set="Original")

#multimap  

HumanPAS_MM=HumanCountMM %>% select(PAS,contains("_N")) %>% gather("Ind", "Counts", -PAS) %>%group_by(PAS) %>%  summarise(meanCount=mean(Counts)) %>% mutate(Set="MultiMap")


#Primary  

HumanPAS_primary=HumanNPrimaryandM  %>% select(PAS,contains("_N")) %>% gather("Ind", "Counts", -PAS) %>%group_by(PAS) %>%  summarise(meanCount=mean(Counts)) %>% mutate(Set="Primary")


ThreeCriteriaHuman=HumanOriginal %>% bind_rows(HumanPAS_MM) %>% bind_rows(HumanPAS_primary) %>% spread(Set, meanCount) 

ThreeCriteriaHuman_var= HumanOriginal %>% bind_rows(HumanPAS_MM) %>% bind_rows(HumanPAS_primary) %>% group_by(PAS) %>% summarise(VarSet=var(meanCount))

ThreeCriteriaHuman_varnot0= ThreeCriteriaHuman_var %>%  filter(VarSet!=0)
plot(sort(ThreeCriteriaHuman_varnot0$VarSet))

ggplot(ThreeCriteriaHuman,aes(x=Original,y=Primary)) + geom_point() + geom_abline(slope=1, intercept = 0) + geom_density2d(col="blue")+ coord_cartesian(xlim=c(0,2500),ylim=c(0,2500))

ggplot(ThreeCriteriaHuman,aes(x=Original,y=MultiMap)) + geom_point() + geom_abline(slope=1, intercept = 0) + geom_density2d(col="blue")+ coord_cartesian(xlim=c(0,2500),ylim=c(0,2500))

ggplot(ThreeCriteriaHuman,aes(x=Primary,y=MultiMap)) + geom_point() + geom_abline(slope=1, intercept = 0) + geom_density2d(col="blue")+ coord_cartesian(xlim=c(0,2500),ylim=c(0,2500))

I only let one location read. Multimap will be the same as primary.


sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] Matrix_1.2-15       VennDiagram_1.6.20  futile.logger_1.4.3
 [4] ggpubr_0.2          magrittr_1.5        forcats_0.3.0      
 [7] stringr_1.3.1       dplyr_0.8.0.1       purrr_0.3.2        
[10] readr_1.3.1         tidyr_0.8.3         tibble_2.1.1       
[13] ggplot2_3.1.1       tidyverse_1.2.1    

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5     reshape2_1.4.3       haven_1.1.2         
 [4] lattice_0.20-38      colorspace_1.3-2     generics_0.0.2      
 [7] htmltools_0.3.6      yaml_2.2.0           rlang_0.4.0         
[10] later_0.7.5          pillar_1.3.1         glue_1.3.0          
[13] withr_2.1.2          RColorBrewer_1.1-2   lambda.r_1.2.3      
[16] modelr_0.1.2         readxl_1.1.0         plyr_1.8.4          
[19] munsell_0.5.0        gtable_0.2.0         workflowr_1.6.0     
[22] cellranger_1.1.0     rvest_0.3.2          evaluate_0.12       
[25] labeling_0.3         knitr_1.20           httpuv_1.4.5        
[28] broom_0.5.1          Rcpp_1.0.2           formatR_1.5         
[31] promises_1.0.1       scales_1.0.0         backports_1.1.2     
[34] jsonlite_1.6         fs_1.3.1             hms_0.4.2           
[37] digest_0.6.18        stringi_1.2.4        rprojroot_1.3-2     
[40] cli_1.1.0            tools_3.5.1          lazyeval_0.2.1      
[43] futile.options_1.0.1 crayon_1.3.4         whisker_0.3-2       
[46] pkgconfig_2.0.2      MASS_7.3-51.1        xml2_1.2.0          
[49] lubridate_1.7.4      assertthat_0.2.0     rmarkdown_1.10      
[52] httr_1.3.1           rstudioapi_0.10      R6_2.3.0            
[55] nlme_3.1-137         git2r_0.26.1         compiler_3.5.1