Last updated: 2020-01-26
Checks: 7 passed, 0 failed
Knit directory: Comparative_APA/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.5.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20190902) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: code/chimp_log/
Ignored: code/human_log/
Ignored: data/.DS_Store
Ignored: data/mediation_prot/
Ignored: data/metadata_HCpanel.txt.sb-a5794dd2-i594qs/
Untracked files:
Untracked: ._.DS_Store
Untracked: Chimp/
Untracked: Human/
Untracked: analysis/CrossChimpThreePrime.Rmd
Untracked: analysis/DiffTransProtvsExpression.Rmd
Untracked: analysis/assessReadQual.Rmd
Untracked: analysis/diffExpressionPantro6.Rmd
Untracked: code/._ClassifyLeafviz.sh
Untracked: code/._Config_chimp.yaml
Untracked: code/._Config_chimp_full.yaml
Untracked: code/._Config_human.yaml
Untracked: code/._ConvertJunc2Bed.sh
Untracked: code/._CountNucleotides.py
Untracked: code/._CrossMapChimpRNA.sh
Untracked: code/._CrossMapThreeprime.sh
Untracked: code/._DiffSplice.sh
Untracked: code/._DiffSplicePlots.sh
Untracked: code/._DiffSplicePlots_gencode.sh
Untracked: code/._DiffSplice_gencode.sh
Untracked: code/._DiffSplice_removebad.sh
Untracked: code/._FindIntronForDomPAS.sh
Untracked: code/._FindIntronForDomPAS_DF.sh
Untracked: code/._GetMAPQscore.py
Untracked: code/._GetSecondaryMap.py
Untracked: code/._Lift5perPAS.sh
Untracked: code/._LiftFinalChimpJunc2Human.sh
Untracked: code/._LiftOrthoPAS2chimp.sh
Untracked: code/._MapBadSamples.sh
Untracked: code/._PAS_ATTAAA.sh
Untracked: code/._PAS_ATTAAA_df.sh
Untracked: code/._PAS_seqExpanded.sh
Untracked: code/._PASsequences.sh
Untracked: code/._PASsequences_DF.sh
Untracked: code/._PlotNuclearUsagebySpecies.R
Untracked: code/._PlotNuclearUsagebySpecies_DF.R
Untracked: code/._QuantMergedClusters.sh
Untracked: code/._ReverseLiftFilter.R
Untracked: code/._RunFixLeafCluster.sh
Untracked: code/._RunNegMCMediation.sh
Untracked: code/._RunNegMCMediationDF.sh
Untracked: code/._RunPosMCMediationDF.err
Untracked: code/._RunPosMCMediationDF.sh
Untracked: code/._Snakefile
Untracked: code/._SnakefilePAS
Untracked: code/._SnakefilePASfilt
Untracked: code/._SortIndexBadSamples.sh
Untracked: code/._bed215upbed.py
Untracked: code/._bed2SAF_gen.py
Untracked: code/._buildIndecpantro5
Untracked: code/._buildIndecpantro5.sh
Untracked: code/._buildLeafviz.sh
Untracked: code/._buildLeafviz_leadAnno.sh
Untracked: code/._buildStarIndex.sh
Untracked: code/._chimpChromprder.sh
Untracked: code/._chooseSignalSite.py
Untracked: code/._cleanbed2saf.py
Untracked: code/._cluster.json
Untracked: code/._cluster2bed.py
Untracked: code/._clusterLiftReverse.sh
Untracked: code/._clusterLiftReverse_removebad.sh
Untracked: code/._clusterLiftprimary.sh
Untracked: code/._clusterLiftprimary_removebad.sh
Untracked: code/._converBam2Junc.sh
Untracked: code/._converBam2Junc_removeBad.sh
Untracked: code/._extraSnakefiltpas
Untracked: code/._extractPhyloReg.py
Untracked: code/._extractPhyloRegGene.py
Untracked: code/._filter5percPAS.py
Untracked: code/._filterNumChroms.py
Untracked: code/._filterPASforMP.py
Untracked: code/._filterPostLift.py
Untracked: code/._fixExonFC.py
Untracked: code/._fixLeafCluster.py
Untracked: code/._fixLiftedJunc.py
Untracked: code/._fixUTRexonanno.py
Untracked: code/._formathg38Anno.py
Untracked: code/._formatpantro6Anno.py
Untracked: code/._getRNAseqMapStats.sh
Untracked: code/._hg19MapStats.sh
Untracked: code/._humanChromorder.sh
Untracked: code/._intersectLiftedPAS.sh
Untracked: code/._liftJunctionFiles.sh
Untracked: code/._liftPAS19to38.sh
Untracked: code/._liftedchimpJunc2human.sh
Untracked: code/._makeNuclearDapaplots.sh
Untracked: code/._makeNuclearDapaplots_DF.sh
Untracked: code/._makeSamplyGroupsHuman_TvN.py
Untracked: code/._mapRNAseqhg19.sh
Untracked: code/._mapRNAseqhg19_newPipeline.sh
Untracked: code/._maphg19.sh
Untracked: code/._maphg19_subjunc.sh
Untracked: code/._mediation_test.R
Untracked: code/._mergeChimp3prime_inhg38.sh
Untracked: code/._mergedBam2BW.sh
Untracked: code/._nameClusters.py
Untracked: code/._negativeMediation_montecarlo.R
Untracked: code/._negativeMediation_montecarloDF.R
Untracked: code/._numMultimap.py
Untracked: code/._overlapapaQTLPAS.sh
Untracked: code/._parseHg38.py
Untracked: code/._postiveMediation_montecarlo_DF.R
Untracked: code/._prepareCleanLiftedFC_5perc4LC.py
Untracked: code/._prepareLeafvizAnno.sh
Untracked: code/._preparePAS4lift.py
Untracked: code/._primaryLift.sh
Untracked: code/._processhg38exons.py
Untracked: code/._quantJunc.sh
Untracked: code/._quantJunc_TEST.sh
Untracked: code/._quantJunc_removeBad.sh
Untracked: code/._quantMerged_seperatly.sh
Untracked: code/._recLiftchim2human.sh
Untracked: code/._revLiftPAShg38to19.sh
Untracked: code/._reverseLift.sh
Untracked: code/._runCheckReverseLift.sh
Untracked: code/._runChimpDiffIso.sh
Untracked: code/._runCountNucleotides.sh
Untracked: code/._runFilterNumChroms.sh
Untracked: code/._runHumanDiffIso.sh
Untracked: code/._runNuclearDiffIso_DF.sh
Untracked: code/._runNuclearDifffIso.sh
Untracked: code/._runTotalDiffIso.sh
Untracked: code/._run_chimpverifybam.sh
Untracked: code/._run_verifyBam.sh
Untracked: code/._snakemake.batch
Untracked: code/._snakemakePAS.batch
Untracked: code/._snakemakePASchimp.batch
Untracked: code/._snakemakePAShuman.batch
Untracked: code/._snakemake_chimp.batch
Untracked: code/._snakemake_human.batch
Untracked: code/._snakemakefiltPAS.batch
Untracked: code/._snakemakefiltPAS_chimp
Untracked: code/._snakemakefiltPAS_chimp.sh
Untracked: code/._snakemakefiltPAS_human.sh
Untracked: code/._submit-snakemake-chimp.sh
Untracked: code/._submit-snakemake-human.sh
Untracked: code/._submit-snakemakePAS-chimp.sh
Untracked: code/._submit-snakemakePAS-human.sh
Untracked: code/._submit-snakemakefiltPAS-chimp.sh
Untracked: code/._submit-snakemakefiltPAS-human.sh
Untracked: code/._subset_diffisopheno_Nuclear_HvC.py
Untracked: code/._subset_diffisopheno_Nuclear_HvC_DF.py
Untracked: code/._subset_diffisopheno_Total_HvC.py
Untracked: code/._threeprimeOrthoFC.sh
Untracked: code/._transcriptDTplotsNuclear.sh
Untracked: code/._verifyBam4973.sh
Untracked: code/._verifyBam4973inHuman.sh
Untracked: code/._wrap_chimpverifybam.sh
Untracked: code/._wrap_verifyBam.sh
Untracked: code/._writeMergecode.py
Untracked: code/.snakemake/
Untracked: code/ClassifyLeafviz.sh
Untracked: code/Config_chimp.yaml
Untracked: code/Config_chimp_full.yaml
Untracked: code/Config_human.yaml
Untracked: code/ConvertJunc2Bed.err
Untracked: code/ConvertJunc2Bed.out
Untracked: code/ConvertJunc2Bed.sh
Untracked: code/CountNucleotides.py
Untracked: code/CrossMapChimpRNA.sh
Untracked: code/CrossMapThreeprime.sh
Untracked: code/CrossmapChimp3prime.err
Untracked: code/CrossmapChimp3prime.out
Untracked: code/CrossmapChimpRNA.err
Untracked: code/CrossmapChimpRNA.out
Untracked: code/DiffSplice.err
Untracked: code/DiffSplice.out
Untracked: code/DiffSplice.sh
Untracked: code/DiffSplicePlots.err
Untracked: code/DiffSplicePlots.out
Untracked: code/DiffSplicePlots.sh
Untracked: code/DiffSplicePlots_gencode.sh
Untracked: code/DiffSplice_gencode.sh
Untracked: code/DiffSplice_removebad.err
Untracked: code/DiffSplice_removebad.out
Untracked: code/DiffSplice_removebad.sh
Untracked: code/FilterReverseLift.err
Untracked: code/FilterReverseLift.out
Untracked: code/FindIntronForDomPAS.err
Untracked: code/FindIntronForDomPAS.out
Untracked: code/FindIntronForDomPAS.sh
Untracked: code/FindIntronForDomPAS_DF.sh
Untracked: code/GencodeDiffSplice.err
Untracked: code/GencodeDiffSplice.out
Untracked: code/GetMAPQscore.py
Untracked: code/GetSecondaryMap.py
Untracked: code/HchromOrder.err
Untracked: code/HchromOrder.out
Untracked: code/JunctionLift.err
Untracked: code/JunctionLift.out
Untracked: code/JunctionLiftFinalChimp.err
Untracked: code/JunctionLiftFinalChimp.out
Untracked: code/Lift5perPAS.sh
Untracked: code/Lift5perPASbed.err
Untracked: code/Lift5perPASbed.out
Untracked: code/LiftClustersFirst.err
Untracked: code/LiftClustersFirst.out
Untracked: code/LiftClustersFirst_remove.err
Untracked: code/LiftClustersFirst_remove.out
Untracked: code/LiftClustersSecond.err
Untracked: code/LiftClustersSecond.out
Untracked: code/LiftClustersSecond_remove.err
Untracked: code/LiftClustersSecond_remove.out
Untracked: code/LiftFinalChimpJunc2Human.sh
Untracked: code/LiftOrthoPAS2chimp.sh
Untracked: code/LiftorthoPAS.err
Untracked: code/LiftorthoPASt.out
Untracked: code/Log.out
Untracked: code/MapBadSamples.err
Untracked: code/MapBadSamples.out
Untracked: code/MapBadSamples.sh
Untracked: code/MapStats.err
Untracked: code/MapStats.out
Untracked: code/MergeClusters.err
Untracked: code/MergeClusters.out
Untracked: code/MergeClusters.sh
Untracked: code/PAS_ATTAAA.err
Untracked: code/PAS_ATTAAA.out
Untracked: code/PAS_ATTAAA.sh
Untracked: code/PAS_ATTAAADF.err
Untracked: code/PAS_ATTAAADF.out
Untracked: code/PAS_ATTAAA_df.sh
Untracked: code/PAS_seqExpanded.sh
Untracked: code/PAS_sequence.err
Untracked: code/PAS_sequence.out
Untracked: code/PAS_sequenceDF.err
Untracked: code/PAS_sequenceDF.out
Untracked: code/PASexpanded_sequenceDF.err
Untracked: code/PASexpanded_sequenceDF.out
Untracked: code/PASsequences.sh
Untracked: code/PASsequences_DF.sh
Untracked: code/PlotNuclearUsagebySpecies.R
Untracked: code/PlotNuclearUsagebySpecies_DF.R
Untracked: code/QuantMergeClusters
Untracked: code/QuantMergeClusters.err
Untracked: code/QuantMergeClusters.out
Untracked: code/QuantMergedClusters.sh
Untracked: code/Rev_liftoverPAShg19to38.err
Untracked: code/Rev_liftoverPAShg19to38.out
Untracked: code/ReverseLiftFilter.R
Untracked: code/RunFixCluster.err
Untracked: code/RunFixCluster.out
Untracked: code/RunFixLeafCluster.sh
Untracked: code/RunNegMCMediation.err
Untracked: code/RunNegMCMediation.sh
Untracked: code/RunNegMCMediationDF.err
Untracked: code/RunNegMCMediationDF.out
Untracked: code/RunNegMCMediationDF.sh
Untracked: code/RunNegMCMediationr.out
Untracked: code/RunPosMCMediation.err
Untracked: code/RunPosMCMediation.sh
Untracked: code/RunPosMCMediationDF.err
Untracked: code/RunPosMCMediationDF.out
Untracked: code/RunPosMCMediationDF.sh
Untracked: code/RunPosMCMediationr.out
Untracked: code/SAF215upbed_gen.py
Untracked: code/Snakefile
Untracked: code/SnakefilePAS
Untracked: code/SnakefilePASfilt
Untracked: code/SortIndexBadSamples.err
Untracked: code/SortIndexBadSamples.out
Untracked: code/SortIndexBadSamples.sh
Untracked: code/TotalTranscriptDTplot.err
Untracked: code/TotalTranscriptDTplot.out
Untracked: code/Upstream10Bases_general.py
Untracked: code/apaQTLsnake.err
Untracked: code/apaQTLsnake.out
Untracked: code/apaQTLsnakePAS.err
Untracked: code/apaQTLsnakePAS.out
Untracked: code/apaQTLsnakePAShuman.err
Untracked: code/bam2junc.err
Untracked: code/bam2junc.out
Untracked: code/bam2junc_remove.err
Untracked: code/bam2junc_remove.out
Untracked: code/bed215upbed.py
Untracked: code/bed2SAF_gen.py
Untracked: code/bed2saf.py
Untracked: code/bg_to_cov.py
Untracked: code/buildIndecpantro5
Untracked: code/buildIndecpantro5.sh
Untracked: code/buildLeafviz.err
Untracked: code/buildLeafviz.out
Untracked: code/buildLeafviz.sh
Untracked: code/buildLeafviz_leadAnno.sh
Untracked: code/buildLeafviz_leafanno.err
Untracked: code/buildLeafviz_leafanno.out
Untracked: code/buildStarIndex.sh
Untracked: code/callPeaksYL.py
Untracked: code/chimpChromprder.sh
Untracked: code/chooseAnno2Bed.py
Untracked: code/chooseAnno2SAF.py
Untracked: code/chooseSignalSite.py
Untracked: code/chromOrder.err
Untracked: code/chromOrder.out
Untracked: code/classifyLeafviz.err
Untracked: code/classifyLeafviz.out
Untracked: code/cleanbed2saf.py
Untracked: code/cluster.json
Untracked: code/cluster2bed.py
Untracked: code/clusterLiftReverse.sh
Untracked: code/clusterLiftReverse_removebad.sh
Untracked: code/clusterLiftprimary.sh
Untracked: code/clusterLiftprimary_removebad.sh
Untracked: code/clusterPAS.json
Untracked: code/clusterfiltPAS.json
Untracked: code/comands2Mege.sh
Untracked: code/converBam2Junc.sh
Untracked: code/converBam2Junc_removeBad.sh
Untracked: code/convertNumeric.py
Untracked: code/environment.yaml
Untracked: code/extraSnakefiltpas
Untracked: code/extractPhyloReg.py
Untracked: code/extractPhyloRegGene.py
Untracked: code/filter5perc.R
Untracked: code/filter5percPAS.py
Untracked: code/filter5percPheno.py
Untracked: code/filterBamforMP.pysam2_gen.py
Untracked: code/filterJuncChroms.err
Untracked: code/filterJuncChroms.out
Untracked: code/filterMissprimingInNuc10_gen.py
Untracked: code/filterNumChroms.py
Untracked: code/filterPASforMP.py
Untracked: code/filterPostLift.py
Untracked: code/filterSAFforMP_gen.py
Untracked: code/filterSortBedbyCleanedBed_gen.R
Untracked: code/filterpeaks.py
Untracked: code/fixExonFC.py
Untracked: code/fixFChead.py
Untracked: code/fixFChead_bothfrac.py
Untracked: code/fixLeafCluster.py
Untracked: code/fixLiftedJunc.py
Untracked: code/fixUTRexonanno.py
Untracked: code/formathg38Anno.py
Untracked: code/generateStarIndex.err
Untracked: code/generateStarIndex.out
Untracked: code/generateStarIndexHuman.err
Untracked: code/generateStarIndexHuman.out
Untracked: code/getRNAseqMapStats.sh
Untracked: code/hg19MapStats.err
Untracked: code/hg19MapStats.out
Untracked: code/hg19MapStats.sh
Untracked: code/humanChromorder.sh
Untracked: code/humanFiles
Untracked: code/intersectAnno.err
Untracked: code/intersectAnno.out
Untracked: code/intersectAnnoExt.err
Untracked: code/intersectAnnoExt.out
Untracked: code/intersectLiftedPAS.sh
Untracked: code/leafcutter_merge_regtools_redo.py
Untracked: code/liftJunctionFiles.sh
Untracked: code/liftPAS19to38.sh
Untracked: code/liftoverPAShg19to38.err
Untracked: code/liftoverPAShg19to38.out
Untracked: code/log/
Untracked: code/make5percPeakbed.py
Untracked: code/makeFileID.py
Untracked: code/makeNuclearDapaplots.sh
Untracked: code/makeNuclearDapaplots_DF.sh
Untracked: code/makeNuclearPlots.err
Untracked: code/makeNuclearPlots.out
Untracked: code/makeNuclearPlotsDF.err
Untracked: code/makeNuclearPlotsDF.out
Untracked: code/makePheno.py
Untracked: code/makeSamplyGroupsChimp_TvN.py
Untracked: code/makeSamplyGroupsHuman_TvN.py
Untracked: code/mapRNAseqhg19.sh
Untracked: code/mapRNAseqhg19_newPipeline.sh
Untracked: code/maphg19.err
Untracked: code/maphg19.out
Untracked: code/maphg19.sh
Untracked: code/maphg19_new.err
Untracked: code/maphg19_new.out
Untracked: code/maphg19_sub.err
Untracked: code/maphg19_sub.out
Untracked: code/maphg19_subjunc.sh
Untracked: code/mediation_test.R
Untracked: code/merge.err
Untracked: code/mergeChimp3prime_inhg38.sh
Untracked: code/merge_leafcutter_clusters_redo.py
Untracked: code/mergeandsort_ChimpinHuman.err
Untracked: code/mergeandsort_ChimpinHuman.out
Untracked: code/mergedBam2BW.sh
Untracked: code/mergedbam2bw.err
Untracked: code/mergedbam2bw.out
Untracked: code/nameClusters.py
Untracked: code/namePeaks.py
Untracked: code/negativeMediation_montecarlo.R
Untracked: code/negativeMediation_montecarloDF.R
Untracked: code/nuclearTranscriptDTplot.err
Untracked: code/nuclearTranscriptDTplot.out
Untracked: code/numMultimap.py
Untracked: code/overlapPAS.err
Untracked: code/overlapPAS.out
Untracked: code/overlapapaQTLPAS.sh
Untracked: code/overlapapaQTLPAS_extended.sh
Untracked: code/overlapapaQTLPAS_samples.sh
Untracked: code/parseHg38.py
Untracked: code/peak2PAS.py
Untracked: code/pheno2countonly.R
Untracked: code/postiveMediation_montecarlo.R
Untracked: code/postiveMediation_montecarlo_DF.R
Untracked: code/prepareAnnoLeafviz.err
Untracked: code/prepareAnnoLeafviz.out
Untracked: code/prepareCleanLiftedFC_5perc4LC.py
Untracked: code/prepareLeafvizAnno.sh
Untracked: code/preparePAS4lift.py
Untracked: code/prepare_phenotype_table.py
Untracked: code/primaryLift.err
Untracked: code/primaryLift.out
Untracked: code/primaryLift.sh
Untracked: code/processhg38exons.py
Untracked: code/quantJunc.sh
Untracked: code/quantJunc_TEST.sh
Untracked: code/quantJunc_removeBad.sh
Untracked: code/quantLiftedPAS.err
Untracked: code/quantLiftedPAS.out
Untracked: code/quantLiftedPAS.sh
Untracked: code/quatJunc.err
Untracked: code/quatJunc.out
Untracked: code/recChimpback2Human.err
Untracked: code/recChimpback2Human.out
Untracked: code/recLiftchim2human.sh
Untracked: code/revLift.err
Untracked: code/revLift.out
Untracked: code/revLiftPAShg38to19.sh
Untracked: code/reverseLift.sh
Untracked: code/runCheckReverseLift.sh
Untracked: code/runChimpDiffIso.sh
Untracked: code/runCountNucleotides.err
Untracked: code/runCountNucleotides.out
Untracked: code/runCountNucleotides.sh
Untracked: code/runCountNucleotidesPantro6.err
Untracked: code/runCountNucleotidesPantro6.out
Untracked: code/runCountNucleotides_pantro6.sh
Untracked: code/runFilterNumChroms.sh
Untracked: code/runHumanDiffIso.sh
Untracked: code/runNuclearDiffIso_DF.sh
Untracked: code/runNuclearDifffIso.sh
Untracked: code/runTotalDiffIso.sh
Untracked: code/run_Chimpleafcutter_ds.err
Untracked: code/run_Chimpleafcutter_ds.out
Untracked: code/run_Chimpverifybam.err
Untracked: code/run_Chimpverifybam.out
Untracked: code/run_Humanleafcutter_ds.err
Untracked: code/run_Humanleafcutter_ds.out
Untracked: code/run_Nuclearleafcutter_ds.err
Untracked: code/run_Nuclearleafcutter_ds.out
Untracked: code/run_Nuclearleafcutter_dsDF.err
Untracked: code/run_Nuclearleafcutter_dsDF.out
Untracked: code/run_Totalleafcutter_ds.err
Untracked: code/run_Totalleafcutter_ds.out
Untracked: code/run_chimpverifybam.sh
Untracked: code/run_verifyBam.sh
Untracked: code/run_verifybam.err
Untracked: code/run_verifybam.out
Untracked: code/slurm-62824013.out
Untracked: code/slurm-62825841.out
Untracked: code/slurm-62826116.out
Untracked: code/slurm-64108209.out
Untracked: code/slurm-64108521.out
Untracked: code/slurm-64108557.out
Untracked: code/snakePASChimp.err
Untracked: code/snakePASChimp.out
Untracked: code/snakePAShuman.out
Untracked: code/snakemake.batch
Untracked: code/snakemakeChimp.err
Untracked: code/snakemakeChimp.out
Untracked: code/snakemakeHuman.err
Untracked: code/snakemakeHuman.out
Untracked: code/snakemakePAS.batch
Untracked: code/snakemakePASFiltChimp.err
Untracked: code/snakemakePASFiltChimp.out
Untracked: code/snakemakePASFiltHuman.err
Untracked: code/snakemakePASFiltHuman.out
Untracked: code/snakemakePASchimp.batch
Untracked: code/snakemakePAShuman.batch
Untracked: code/snakemake_chimp.batch
Untracked: code/snakemake_human.batch
Untracked: code/snakemakefiltPAS.batch
Untracked: code/snakemakefiltPAS_chimp.sh
Untracked: code/snakemakefiltPAS_human.sh
Untracked: code/submit-snakemake-chimp.sh
Untracked: code/submit-snakemake-human.sh
Untracked: code/submit-snakemakePAS-chimp.sh
Untracked: code/submit-snakemakePAS-human.sh
Untracked: code/submit-snakemakefiltPAS-chimp.sh
Untracked: code/submit-snakemakefiltPAS-human.sh
Untracked: code/subset_diffisopheno.py
Untracked: code/subset_diffisopheno_Chimp_tvN.py
Untracked: code/subset_diffisopheno_Huma_tvN.py
Untracked: code/subset_diffisopheno_Nuclear_HvC.py
Untracked: code/subset_diffisopheno_Nuclear_HvC_DF.py
Untracked: code/subset_diffisopheno_Total_HvC.py
Untracked: code/test
Untracked: code/threeprimeOrthoFC.out
Untracked: code/threeprimeOrthoFC.sh
Untracked: code/threeprimeOrthoFCcd.err
Untracked: code/transcriptDTplotsNuclear.sh
Untracked: code/transcriptDTplotsTotal.sh
Untracked: code/verifyBam4973.sh
Untracked: code/verifyBam4973inHuman.sh
Untracked: code/verifybam4973.err
Untracked: code/verifybam4973.out
Untracked: code/verifybam4973HumanMap.err
Untracked: code/verifybam4973HumanMap.out
Untracked: code/wrap_Chimpverifybam.err
Untracked: code/wrap_Chimpverifybam.out
Untracked: code/wrap_chimpverifybam.sh
Untracked: code/wrap_verifyBam.sh
Untracked: code/wrap_verifybam.err
Untracked: code/wrap_verifybam.out
Untracked: code/writeMergecode.py
Untracked: data/._.DS_Store
Untracked: data/._HC_filenames.txt
Untracked: data/._HC_filenames.txt.sb-4426323c-IKIs0S
Untracked: data/._HC_filenames.xlsx
Untracked: data/._MapPantro6_meta.txt
Untracked: data/._MapPantro6_meta.txt.sb-a5794dd2-Cskmlm
Untracked: data/._MapPantro6_meta.xlsx
Untracked: data/._OppositeSpeciesMap.txt
Untracked: data/._OppositeSpeciesMap.txt.sb-a5794dd2-mayWJf
Untracked: data/._OppositeSpeciesMap.xlsx
Untracked: data/._RNASEQ_metadata.txt
Untracked: data/._RNASEQ_metadata.txt.sb-4426323c-TE4ns3
Untracked: data/._RNASEQ_metadata.txt.sb-51f67ae1-HXp7Gq
Untracked: data/._RNASEQ_metadata_2Removed.txt
Untracked: data/._RNASEQ_metadata_2Removed.txt.sb-4426323c-a4lBwx
Untracked: data/._RNASEQ_metadata_2Removed.xlsx
Untracked: data/._RNASEQ_metadata_stranded.txt
Untracked: data/._RNASEQ_metadata_stranded.txt.sb-a5794dd2-D659m2
Untracked: data/._RNASEQ_metadata_stranded.txt.sb-a5794dd2-ImNMoY
Untracked: data/._RNASEQ_metadata_stranded.txt.sb-e4bf31f0-ZGnGgl
Untracked: data/._RNASEQ_metadata_stranded.xlsx
Untracked: data/._metadata_HCpanel.txt
Untracked: data/._metadata_HCpanel.txt.sb-a3d92a2d-b9cYoF
Untracked: data/._metadata_HCpanel.txt.sb-a5794dd2-i594qs
Untracked: data/._metadata_HCpanel.txt.sb-f4823d1e-qihGek
Untracked: data/._metadata_HCpanel.xlsx
Untracked: data/._metadata_HCpanel_frompantro5.xlsx
Untracked: data/._~$RNASEQ_metadata.xlsx
Untracked: data/._~$metadata_HCpanel.xlsx
Untracked: data/._.xlsx
Untracked: data/CompapaQTLpas/
Untracked: data/DNDS/
Untracked: data/DTmatrix/
Untracked: data/DiffExpression/
Untracked: data/DiffIso_Nuclear/
Untracked: data/DiffIso_Nuclear_DF/
Untracked: data/DiffIso_Total/
Untracked: data/DiffSplice/
Untracked: data/DiffSplice_liftedJunc/
Untracked: data/DiffSplice_removeBad/
Untracked: data/DominantPAS/
Untracked: data/DominantPAS_DF/
Untracked: data/EvalPantro5/
Untracked: data/HC_filenames.txt
Untracked: data/HC_filenames.xlsx
Untracked: data/Khan_prot/
Untracked: data/Li_eqtls/
Untracked: data/MapPantro6_meta.txt
Untracked: data/MapPantro6_meta.xlsx
Untracked: data/MapStats/
Untracked: data/NormalizedClusters/
Untracked: data/NuclearHvC/
Untracked: data/NuclearHvC_DF/
Untracked: data/OppositeSpeciesMap.txt
Untracked: data/OppositeSpeciesMap.xlsx
Untracked: data/OverlapBenchmark/
Untracked: data/PAS/
Untracked: data/PAS_doubleFilter/
Untracked: data/Peaks_5perc/
Untracked: data/Pheno_5perc/
Untracked: data/Pheno_5perc_DF_nuclear/
Untracked: data/Pheno_5perc_nuclear/
Untracked: data/Pheno_5perc_nuclear_old/
Untracked: data/Pheno_5perc_total/
Untracked: data/PhyloP/
Untracked: data/RNASEQ_metadata.txt
Untracked: data/RNASEQ_metadata_2Removed.txt
Untracked: data/RNASEQ_metadata_2Removed.xlsx
Untracked: data/RNASEQ_metadata_stranded.txt
Untracked: data/RNASEQ_metadata_stranded.txt.sb-e4bf31f0-ZGnGgl/
Untracked: data/RNASEQ_metadata_stranded.xlsx
Untracked: data/SignalSites/
Untracked: data/SignalSites_doublefilter/
Untracked: data/Threeprime2Ortho/
Untracked: data/TotalHvC/
Untracked: data/TwoBadSampleAnalysis/
Untracked: data/Wang_ribo/
Untracked: data/apaQTLGenes/
Untracked: data/chainFiles/
Untracked: data/cleanPeaks_anno/
Untracked: data/cleanPeaks_byspecies/
Untracked: data/cleanPeaks_lifted/
Untracked: data/files4viz_nuclear/
Untracked: data/files4viz_nuclear_DF/
Untracked: data/leafviz/
Untracked: data/liftover_files/
Untracked: data/mediation/
Untracked: data/mediation_DF/
Untracked: data/metadata_HCpanel.txt
Untracked: data/metadata_HCpanel.xlsx
Untracked: data/metadata_HCpanel_frompantro5.txt
Untracked: data/metadata_HCpanel_frompantro5.xlsx
Untracked: data/primaryLift/
Untracked: data/reverseLift/
Untracked: data/~$RNASEQ_metadata.xlsx
Untracked: data/~$metadata_HCpanel.xlsx
Untracked: data/.xlsx
Untracked: output/dtPlots/
Untracked: projectNotes.Rmd
Unstaged changes:
Modified: analysis/ExploredAPA.Rmd
Modified: analysis/OppositeMap.Rmd
Modified: analysis/annotationInfo.Rmd
Modified: analysis/comp2apaQTLPAS.Rmd
Modified: analysis/correlationPhenos.Rmd
Modified: analysis/dAPAandapaQTL_DF.Rmd
Modified: analysis/establishCutoffs.Rmd
Modified: analysis/investigatePantro5.Rmd
Modified: analysis/multiMap.Rmd
Modified: analysis/speciesSpecific.Rmd
Modified: analysis/speciesSpecific_DF.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 9e489ef | brimittleman | 2020-01-26 | update plot for CM |
html | 4f5d492 | brimittleman | 2020-01-14 | Build site. |
Rmd | 3cce3da | brimittleman | 2020-01-14 | start mediation analysis |
html | c74c3df | brimittleman | 2020-01-14 | Build site. |
Rmd | e1c6d25 | brimittleman | 2020-01-14 | add analysis on species spec PAS |
html | b963392 | brimittleman | 2020-01-13 | Build site. |
Rmd | 1121102 | brimittleman | 2020-01-13 | add dpau and de overlap with dom intronic |
html | 326f6e2 | brimittleman | 2020-01-10 | Build site. |
Rmd | 61bcd28 | brimittleman | 2020-01-10 | prepare DE volcano |
html | e5dff29 | brimittleman | 2020-01-06 | Build site. |
Rmd | 767ca26 | brimittleman | 2020-01-06 | add eQTL enrichment for eQTLs |
html | c453931 | brimittleman | 2020-01-05 | Build site. |
Rmd | 894a755 | brimittleman | 2020-01-05 | update DE new Orthoexon file |
html | 3ee8382 | brimittleman | 2019-12-08 | Build site. |
Rmd | f04a804 | brimittleman | 2019-12-08 | add DE vs DS analysis |
html | 2c9b5b4 | brimittleman | 2019-12-06 | Build site. |
Rmd | 6d93e4d | brimittleman | 2019-12-06 | update stranded |
html | a288a29 | brimittleman | 2019-12-04 | Build site. |
Rmd | 558b39f | brimittleman | 2019-12-04 | add current error for splice and write out DE genes |
html | 8fca47f | brimittleman | 2019-11-22 | Build site. |
Rmd | f13781e | brimittleman | 2019-11-22 | fixed mapping and indivs |
html | 25971ed | brimittleman | 2019-11-21 | Build site. |
Rmd | db0484c | brimittleman | 2019-11-21 | add PC corr |
html | 712106e | brimittleman | 2019-11-19 | Build site. |
Rmd | 8dc9ea0 | brimittleman | 2019-11-19 | first pipeline for de |
html | 586c9ec | brimittleman | 2019-11-13 | Build site. |
Rmd | bedfa41 | brimittleman | 2019-11-13 | question PCA methods |
html | a22bae9 | brimittleman | 2019-11-13 | Build site. |
Rmd | a52c26d | brimittleman | 2019-11-13 | look at pca and tech factors |
html | da4bab0 | brimittleman | 2019-11-12 | Build site. |
Rmd | 98d7f9b | brimittleman | 2019-11-12 | add cpm pca |
html | 32b435b | brimittleman | 2019-11-12 | Build site. |
Rmd | 1ce8433 | brimittleman | 2019-11-12 | start normalization |
html | 2c02d70 | brimittleman | 2019-11-12 | Build site. |
Rmd | 53642f7 | brimittleman | 2019-11-12 | add mapp stats |
html | dc91b0a | brimittleman | 2019-11-11 | Build site. |
Rmd | b5ba82e | brimittleman | 2019-11-11 | add diff expression and diff splicing |
library(workflowr)
This is workflowr version 1.5.0
Run ?workflowr for help getting started
library(tidyverse)
── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.1 ✔ purrr 0.3.2
✔ tibble 2.1.1 ✔ dplyr 0.8.0.1
✔ tidyr 0.8.3 ✔ stringr 1.3.1
✔ readr 1.3.1 ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
library("scales")
Attaching package: 'scales'
The following object is masked from 'package:purrr':
discard
The following object is masked from 'package:readr':
col_factor
library("gplots")
Attaching package: 'gplots'
The following object is masked from 'package:stats':
lowess
library("edgeR")
Loading required package: limma
library("R.utils")
Loading required package: R.oo
Loading required package: R.methodsS3
R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.
R.oo v1.22.0 (2018-04-21) successfully loaded. See ?R.oo for help.
Attaching package: 'R.oo'
The following objects are masked from 'package:methods':
getClasses, getMethods
The following objects are masked from 'package:base':
attach, detach, gc, load, save
R.utils v2.7.0 successfully loaded. See ?R.utils for help.
Attaching package: 'R.utils'
The following object is masked from 'package:tidyr':
extract
The following object is masked from 'package:utils':
timestamp
The following objects are masked from 'package:base':
cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library("limma")
library("VennDiagram")
Loading required package: grid
Loading required package: futile.logger
library("RColorBrewer")
library(reshape2)
Attaching package: 'reshape2'
The following object is masked from 'package:tidyr':
smiths
For this analysis I do the preprocessing with a Snakemake pipeline, which maps the RNA-seq reads and quantifies them over orthologous exons.
From FastQC:
There does not appear to be any adapter contamination.
No reads were flagged as poor quality.
Assess mapping:
metaData=read.table("../data/RNASEQ_metadata_stranded.txt", header = TRUE, stringsAsFactors = FALSE)
metaData$Species=as.factor(metaData$Species)
metaData$Collection=as.factor(metaData$Collection)
readInfo=metaData %>% mutate(AAUnMapped= Reads-Mapped, ABNotOrtho= Mapped-AssignedOrtho) %>% dplyr::select(Line, Species, AAUnMapped, ABNotOrtho, AssignedOrtho) %>% gather(key="Category", value="Number", -Line, -Species)
ggplot(readInfo, aes(x=Line,y=Number, fill=Category)) + geom_bar(stat="identity") + scale_fill_brewer(palette = "Dark2",name = "Type", labels = c("Unmapped", "Mapped not ortho", "Assigned Ortho Exon"))+theme(axis.text.x = element_text( hjust = 0,vjust = 1, size = 6, angle = 90)) + labs(y="Reads", title="Human and chimp read statistics")
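The stacked bars above split each library's reads into three mutually exclusive categories. A minimal Python sketch of that bookkeeping follows. The read counts are made up for illustration; only the column names (`Reads`, `Mapped`, `AssignedOrtho`) and the derived category names come from the R code. It also illustrates what is presumably the reason for the `AA`/`AB` prefixes: the legend labels are supplied by position, so the category names need to sort in the intended order.

```python
def read_breakdown(reads, mapped, assigned_ortho):
    """Split a library's reads into three non-overlapping categories."""
    return {
        "AAUnMapped": reads - mapped,           # never aligned
        "ABNotOrtho": mapped - assigned_ortho,  # aligned, outside orthologous exons
        "AssignedOrtho": assigned_ortho,        # counted in an orthologous exon
    }


# Hypothetical numbers for one library, for illustration only.
example = read_breakdown(reads=1000, mapped=900, assigned_ortho=600)

# The three categories partition the library.
assert sum(example.values()) == 1000

# Sorting the category names gives the order in which the legend labels apply.
legend = ["Unmapped", "Mapped not ortho", "Assigned Ortho Exon"]
for name, label in zip(sorted(example), legend):
    print(f"{name:>13} -> {label}: {example[name]}")
```

If a category were renamed so that it sorted differently, the positional labels would silently attach to the wrong bars, which is the main thing to watch for with this pattern.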
Proportion of reads in each category:
readProp=metaData %>% mutate(Aunmapped=1-percentMapped, MappednotOrtho=percentMapped-percentOrtho) %>% dplyr::select(Line,Species, percentOrtho, MappednotOrtho, Aunmapped) %>% gather(key="Category", value="Proportion", -Line, -Species)
ggplot(readProp, aes(x=Line,y=Proportion, fill=Category)) + geom_bar(stat="identity") + scale_fill_brewer(palette = "Dark2", name="", labels = c("Unmapped", "Mapped not ortho", "Assigned Ortho Exon"))+theme(axis.text.x = element_text( hjust = 0,vjust = 1, size = 6, angle = 90)) + labs(y="Proportion of reads", title="Human and chimp read proportions")
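As a sanity check on the proportion plot, the three categories should sum to 1 for every library. The sketch below assumes, as the `1-percentMapped` expression suggests, that `percentMapped` and `percentOrtho` are stored as fractions of total reads rather than percentages; the input values are hypothetical.

```python
import math


def read_proportions(percent_mapped, percent_ortho):
    """Mirror the mutate() step: fractions of reads per category."""
    return {
        "Aunmapped": 1 - percent_mapped,
        "MappednotOrtho": percent_mapped - percent_ortho,
        "percentOrtho": percent_ortho,
    }


# Hypothetical fractions for one library, for illustration only.
props = read_proportions(percent_mapped=0.9, percent_ortho=0.6)

# Compare with a tolerance, since the subtraction is done in floating point.
assert math.isclose(sum(props.values()), 1.0)
print(props)
```

The category strings are reused later as axis breaks, so they have to match exactly, including case.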
By species:
ggplot(readInfo,aes(x=Category, y=Number, by=Species, fill=Species)) + geom_boxplot() +scale_x_discrete( breaks=c("AAUnMapped","ABNotOrtho","AssignedOrtho"),labels=c("Unmapped", "Not in OrthoExon", "Assigned to OrthoExon")) + scale_fill_brewer(palette = "Dark2") + labs(title="Mapped reads by Species", y="Reads", x="")
ggplot(readProp,aes(x=Category, y=Proportion, by=Species, fill=Species)) + geom_boxplot() + scale_fill_brewer(palette = "Dark2") + labs(title="Map Proportion by Species", y="Proportion", x="") + scale_x_discrete( breaks=c("Aunmapped","MappednotOrtho","percentOrtho"),labels=c("Unmapped", "Not in OrthoExon", "Assigned to OrthoExon"))
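The three categories plotted above partition each library: unmapped, mapped but outside an orthologous exon, and assigned to an orthologous exon. A minimal sketch with invented numbers (the column names follow metaData, the values are hypothetical) shows that the per-library proportions sum to 1:

```r
# Toy metadata; values are invented for illustration only
toy <- data.frame(
  Line          = c("A", "B"),
  Reads         = c(1000, 2000),
  Mapped        = c(900, 1700),
  AssignedOrtho = c(600, 1000)
)

# Same partition as in readInfo: unmapped, mapped but not orthologous, orthologous
toy$UnMapped <- toy$Reads - toy$Mapped
toy$NotOrtho <- toy$Mapped - toy$AssignedOrtho

# Convert each category to a per-library proportion
props <- data.frame(
  UnMapped      = toy$UnMapped / toy$Reads,
  NotOrtho      = toy$NotOrtho / toy$Reads,
  AssignedOrtho = toy$AssignedOrtho / toy$Reads,
  row.names     = toy$Line
)
print(props)
```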
Code originally from Lauren Blake (http://lauren-blake.github.io/Reg_Evo_Primates/analysis/Normalization_plots.html)
Fix header for fc files:
python fixExonFC.py /project2/gilad/briana/Comparative_APA/Human/data/RNAseq/ExonCounts/RNAseqOrthoExon.fc /project2/gilad/briana/Comparative_APA/Human/data/RNAseq/ExonCounts/RNAseqOrthoExon.fixed.fc
python fixExonFC.py /project2/gilad/briana/Comparative_APA/Chimp/data/RNAseq/ExonCounts/RNAseqOrthoExon.fc /project2/gilad/briana/Comparative_APA/Chimp/data/RNAseq/ExonCounts/RNAseqOrthoExon.fixed.fc
HumanCounts=read.table("../Human/data/RNAseq/ExonCounts/RNAseqOrthoExon.fixed.fc", header = T, stringsAsFactors = F) %>% dplyr::select(-Chr,-Start,-End,-Strand, -Length)
ChimpCounts=read.table("../Chimp/data/RNAseq/ExonCounts/RNAseqOrthoExon.fixed.fc", header = T, stringsAsFactors = F) %>% dplyr::select(-Chr,-Start,-End,-Strand, -Length)
counts_genes=HumanCounts %>% inner_join(ChimpCounts,by="Geneid") %>% column_to_rownames(var="Geneid")
head(counts_genes)
NA18498 NA18504 NA18510 NA18523 NA18499 NA18502 NA4973
ENSG00000000003 0 3 4 1 2 2 13
ENSG00000000005 0 0 0 0 0 0 0
ENSG00000000419 911 742 1202 867 979 1023 580
ENSG00000000457 529 421 613 408 623 555 716
ENSG00000000460 448 287 545 159 560 481 546
ENSG00000000938 5186 3676 7197 12054 2916 3675 2865
NAPT30 NAPT91 NA3622 NA3659 NA18358
ENSG00000000003 3 1 18 13 0
ENSG00000000005 0 0 0 0 0
ENSG00000000419 451 533 530 563 501
ENSG00000000457 656 905 712 727 531
ENSG00000000460 410 237 510 526 245
ENSG00000000938 230 369 427 2307 3742
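The join above keeps only genes present in both species' count tables. A small base R sketch of the same idea, using merge() in place of dplyr's inner_join (the gene and sample names here are hypothetical):

```r
# Toy count tables keyed by Geneid; values are invented
human <- data.frame(Geneid = c("g1", "g2", "g3"), H1 = c(5, 0, 9),
                    stringsAsFactors = FALSE)
chimp <- data.frame(Geneid = c("g2", "g3", "g4"), C1 = c(1, 7, 2),
                    stringsAsFactors = FALSE)

# merge() performs an inner join by default: only shared Geneid values survive
both <- merge(human, chimp, by = "Geneid")

# Move the gene ID into the row names, as column_to_rownames() does
rownames(both) <- both$Geneid
both$Geneid <- NULL
print(both)
```

Genes found in only one table (g1 and g4 here) are dropped silently, so comparing nrow() before and after the join is a cheap sanity check.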
# Load colors
colors <- colorRampPalette(c(brewer.pal(9, "Blues")[1],brewer.pal(9, "Blues")[9]))(100)
pal <- c(brewer.pal(9, "Set1"), brewer.pal(8, "Set2"), brewer.pal(12, "Set3"))
labels <- paste(metaData$Species,metaData$Line, sep=" ")
#PCA function (original code from Julien Roux)
#Load in the plot_scores function
plot_scores <- function(pca, scores, n, m, cols, points=F, pchs =20, legend=F){
xmin <- min(scores[,n]) - (max(scores[,n]) - min(scores[,n]))*0.05
if (legend == T){ ## let some room (50%) for a legend
xmax <- max(scores[,n]) + (max(scores[,n]) - min(scores[,n]))*0.50
}
else {
xmax <- max(scores[,n]) + (max(scores[,n]) - min(scores[,n]))*0.05
}
ymin <- min(scores[,m]) - (max(scores[,m]) - min(scores[,m]))*0.05
ymax <- max(scores[,m]) + (max(scores[,m]) - min(scores[,m]))*0.05
plot(scores[,n], scores[,m], xlab=paste("PC", n, ": ", round(summary(pca)$importance[2,n],3)*100, "% variance explained", sep=""), ylab=paste("PC", m, ": ", round(summary(pca)$importance[2,m],3)*100, "% variance explained", sep=""), xlim=c(xmin, xmax), ylim=c(ymin, ymax), type="n")
if (points == F){
text(scores[,n],scores[,m], rownames(scores), col=cols, cex=1)
}
else {
points(scores[,n],scores[,m], col=cols, pch=pchs, cex=1.3)
}
}
# Clustering (original code from Julien Roux)
cors <- cor(counts_genes, method="spearman", use="pairwise.complete.obs")
heatmap.2( cors, scale="none", col = colors, margins = c(12, 12), trace='none', denscol="white", labCol=labels, ColSideColors=pal[as.integer(as.factor(metaData$Species))], RowSideColors=pal[as.integer(as.factor(metaData$Collection))+9], cexCol = 0.2 + 1/log10(15), cexRow = 0.2 + 1/log10(15))
select <- counts_genes
summary(apply(select, 1, var) == 0)
Mode FALSE TRUE
logical 18209 1164
# Perform PCA
pca_genes <- prcomp(t(counts_genes), scale = F)
scores <- pca_genes$x
#Make PCA plots with the factors colored by species
### PCs 1 and 2 Raw Data
for (n in 1:1){
col.v <- pal[as.integer(metaData$Species)]
plot_scores(pca_genes, scores, n, n+1, col.v)
}
### PCs 3 and 4 Raw Data
for (n in 3:3){
col.v <- pal[as.integer(metaData$Species)]
plot_scores(pca_genes, scores, n, n+1, col.v)
}
Plot density for raw data:
density_plot_18504 <- ggplot(counts_genes, aes(x = NA18504)) + geom_density() + labs(title = "Density plot of raw gene counts of NA18504") + labs(x = "Raw counts for each gene")
density_plot_18504
Convert to log2
log_counts_genes <- as.data.frame(log2(counts_genes))
head(log_counts_genes)
NA18498 NA18504 NA18510 NA18523 NA18499
ENSG00000000003 -Inf 1.584963 2.000000 0.000000 1.000000
ENSG00000000005 -Inf -Inf -Inf -Inf -Inf
ENSG00000000419 9.831307 9.535275 10.231221 9.759888 9.935165
ENSG00000000457 9.047124 8.717676 9.259743 8.672425 9.283088
ENSG00000000460 8.807355 8.164907 9.090112 7.312883 9.129283
ENSG00000000938 12.340406 11.843921 12.813180 13.557224 11.509775
NA18502 NA4973 NAPT30 NAPT91 NA3622 NA3659
ENSG00000000003 1.000000 3.700440 1.584963 0.000000 4.169925 3.700440
ENSG00000000005 -Inf -Inf -Inf -Inf -Inf -Inf
ENSG00000000419 9.998590 9.179909 8.816984 9.057992 9.049849 9.136991
ENSG00000000457 9.116344 9.483816 9.357552 9.821774 9.475733 9.505812
ENSG00000000460 8.909893 9.092757 8.679480 7.888743 8.994353 9.038919
ENSG00000000938 11.843529 11.484319 7.845490 8.527477 8.738092 11.171802
NA18358
ENSG00000000003 -Inf
ENSG00000000005 -Inf
ENSG00000000419 8.968667
ENSG00000000457 9.052568
ENSG00000000460 7.936638
ENSG00000000938 11.869594
density_plot_18504 <- ggplot(log_counts_genes, aes(x = NA18504)) + geom_density()
density_plot_18504 + labs(title = "Density plot of log2 counts of 18504") + labs(x = "Log2 counts for each gene") + geom_vline(xintercept = 1)
plotDensities(log_counts_genes, col=pal[as.numeric(metaData$Species)], legend="topright")
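The -Inf entries in the table above come from taking log2 of a zero count. A short sketch, with hypothetical counts and an arbitrary pseudocount of 0.25 chosen only for illustration:

```r
# Toy counts including a zero (hypothetical values)
counts <- c(0, 1, 3, 911)

# log2 of a zero count is -Inf, which is why all-zero genes show -Inf above
raw_log <- log2(counts)

# Adding a small pseudocount (0.25 is an assumption, not the analysis value)
# keeps every value finite
pseudo_log <- log2(counts + 0.25)

print(raw_log)
print(pseudo_log)
```

Density estimates cannot use non-finite values, which is one reason the later steps work with log2(CPM) computed with a prior count instead of plain log2 counts.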
Convert to CPM
cpm <- cpm(counts_genes, log=TRUE)
head(cpm)
NA18498 NA18504 NA18510 NA18523 NA18499
ENSG00000000003 -3.214119 -1.687803 -1.678332 -2.557786 -2.259918
ENSG00000000005 -3.214119 -3.214119 -3.214119 -3.214119 -3.214119
ENSG00000000419 5.579570 5.650403 5.945220 5.752976 5.631088
ENSG00000000457 4.797732 4.835162 4.976165 4.668752 4.980802
ENSG00000000460 4.558974 4.284933 4.807150 3.318750 4.827551
ENSG00000000938 8.085987 7.956576 8.525076 9.547634 7.203612
NA18502 NA4973 NAPT30 NAPT91 NA3622
ENSG00000000003 -2.244077 -0.508794 -1.907969 -2.610028 0.08159357
ENSG00000000005 -3.214119 -3.214119 -3.214119 -3.214119 -3.21411905
ENSG00000000419 5.726859 4.736398 4.582972 4.905709 4.81211618
ENSG00000000457 4.847086 5.039196 5.121512 5.667358 5.23658578
ENSG00000000460 4.641466 4.649609 4.446117 3.742924 4.75683807
ENSG00000000938 7.569677 7.036149 3.617698 4.377498 4.50169416
NA3659 NA18358
ENSG00000000003 -0.4625636 -3.214119
ENSG00000000005 -3.2141190 -3.214119
ENSG00000000419 4.7478795 5.024297
ENSG00000000457 5.1153942 5.107928
ENSG00000000460 4.6502143 3.997251
ENSG00000000938 6.7783101 7.921081
plotDensities(cpm, col=pal[as.numeric(metaData$Species)], legend="topright")
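For reference, counts per million can be computed by hand. This is a simplified sketch on a toy matrix; the log version uses a fixed pseudocount, whereas edgeR's cpm(log=TRUE) scales its prior count by library size, so the numbers will not match edgeR exactly:

```r
# Toy count matrix: 3 genes x 2 samples (hypothetical values)
counts <- matrix(c(10, 0, 90,
                   40, 10, 150),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Total counts per sample: 100 and 200
lib_size <- colSums(counts)

# Plain CPM: divide each column by its library size, then scale to one million
cpm_manual <- t(t(counts) / lib_size) * 1e6

# log2-CPM with a fixed pseudocount of 0.5 (a simplification of edgeR's prior)
log_cpm_manual <- log2(t(t(counts + 0.5) / (lib_size + 1)) * 1e6)

print(cpm_manual)
```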
TMM/log2(CPM)
## Create edgeR object (dge) to calculate TMM normalization
dge_original <- DGEList(counts=as.matrix(counts_genes), genes=rownames(counts_genes), group = as.character(t(labels)))
dge_original <- calcNormFactors(dge_original)
tmm_cpm <- cpm(dge_original, normalized.lib.sizes=TRUE, log=TRUE, prior.count = 0.25)
head(cpm)
NA18498 NA18504 NA18510 NA18523 NA18499
ENSG00000000003 -3.214119 -1.687803 -1.678332 -2.557786 -2.259918
ENSG00000000005 -3.214119 -3.214119 -3.214119 -3.214119 -3.214119
ENSG00000000419 5.579570 5.650403 5.945220 5.752976 5.631088
ENSG00000000457 4.797732 4.835162 4.976165 4.668752 4.980802
ENSG00000000460 4.558974 4.284933 4.807150 3.318750 4.827551
ENSG00000000938 8.085987 7.956576 8.525076 9.547634 7.203612
NA18502 NA4973 NAPT30 NAPT91 NA3622
ENSG00000000003 -2.244077 -0.508794 -1.907969 -2.610028 0.08159357
ENSG00000000005 -3.214119 -3.214119 -3.214119 -3.214119 -3.21411905
ENSG00000000419 5.726859 4.736398 4.582972 4.905709 4.81211618
ENSG00000000457 4.847086 5.039196 5.121512 5.667358 5.23658578
ENSG00000000460 4.641466 4.649609 4.446117 3.742924 4.75683807
ENSG00000000938 7.569677 7.036149 3.617698 4.377498 4.50169416
NA3659 NA18358
ENSG00000000003 -0.4625636 -3.214119
ENSG00000000005 -3.2141190 -3.214119
ENSG00000000419 4.7478795 5.024297
ENSG00000000457 5.1153942 5.107928
ENSG00000000460 4.6502143 3.997251
ENSG00000000938 6.7783101 7.921081
pca_genes <- prcomp(t(tmm_cpm), scale = F)
scores <- pca_genes$x
for (n in 1:2){
col.v <- pal[as.integer(metaData$Species)]
plot_scores(pca_genes, scores, n, n+1, col.v)
}
# Plot library size
boxplot_library_size <- ggplot(dge_original$samples, aes(x=metaData$Species, y = dge_original$samples$lib.size, fill = metaData$Species)) + geom_boxplot()
boxplot_library_size + labs(title = "Library size by Species") + labs(y = "Library size") + labs(x = "Species") + guides(fill=guide_legend(title="Species"))
plotDensities(tmm_cpm, col=pal[as.numeric(metaData$Species)], legend="topright")
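The TMM step changes CPM only through the effective library size, which is the raw library size multiplied by the normalization factor from calcNormFactors. A toy sketch of that arithmetic; the factors below are made up rather than estimated by TMM:

```r
# Hypothetical library sizes and normalization factors for two samples.
# calcNormFactors scales its factors so that they multiply to 1.
lib_size     <- c(s1 = 1e6, s2 = 2e6)
norm_factors <- c(s1 = 0.8, s2 = 1.25)

# Effective library size used when normalized.lib.sizes = TRUE
eff_lib <- lib_size * norm_factors

# One gene with counts proportional to raw library size
gene_counts <- c(s1 = 100, s2 = 200)

cpm_raw <- gene_counts / lib_size * 1e6   # identical across samples
cpm_tmm <- gene_counts / eff_lib * 1e6    # shifted by the factors

print(rbind(cpm_raw, cpm_tmm))
```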
Filter based on log2 cpm
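Keep genes with log2(CPM) > 1 in more than 8 of the 12 samples. Note that the code uses > 8, so a gene must pass in at least 9 samples, which is slightly stricter than 2/3 of the samples.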
#filter counts
keep.exprs=rowSums(tmm_cpm>1) >8
counts_filtered= counts_genes[keep.exprs,]
plotDensities(counts_filtered, col=pal[as.numeric(metaData$Species)], legend="topright")
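The expression threshold can be read two ways, and the readings differ only for genes that pass in exactly 8 samples. A minimal sketch on a hypothetical 3-gene, 12-sample matrix:

```r
# Toy log2-CPM matrix: 3 genes x 12 samples (hypothetical values)
n_samples <- 12
log_cpm <- rbind(
  geneA = c(rep(2, 9), rep(0, 3)),   # above 1 in 9 samples
  geneB = c(rep(2, 8), rep(0, 4)),   # above 1 in exactly 8 samples
  geneC = rep(0, n_samples)          # never above 1
)

# Number of samples in which each gene exceeds the cutoff
n_pass <- rowSums(log_cpm > 1)

keep_strict <- n_pass > 8    # as in the analysis code: at least 9 samples
keep_loose  <- n_pass >= 8   # literal "at least 8 samples"

print(cbind(n_pass, keep_strict, keep_loose))
```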
labels <- paste(metaData$Species, metaData$Line, sep=" ")
dge_in_cutoff <- DGEList(counts=as.matrix(counts_filtered), genes=rownames(counts_filtered), group = as.character(t(labels)))
dge_in_cutoff <- calcNormFactors(dge_in_cutoff)
cpm_in_cutoff <- cpm(dge_in_cutoff, normalized.lib.sizes=TRUE, log=TRUE, prior.count = 0.25)
head(cpm_in_cutoff)
NA18498 NA18504 NA18510 NA18523 NA18499 NA18502
ENSG00000000419 5.605914 5.653296 5.916152 5.845268 5.644714 5.728449
ENSG00000000457 4.822027 4.836001 4.944992 4.758197 4.992866 4.846521
ENSG00000000460 4.582386 4.283559 4.775443 3.399813 4.839132 4.640177
ENSG00000000938 8.114674 7.961624 8.497834 9.642281 7.219058 7.573114
ENSG00000001036 5.888276 5.743706 5.726636 5.927711 5.758006 5.063235
ENSG00000001084 4.763617 4.061338 4.530956 5.004519 3.991180 4.092598
NA4973 NAPT30 NAPT91 NA3622 NA3659 NA18358
ENSG00000000419 4.776243 4.670620 5.016289 4.838002 4.798818 5.096826
ENSG00000000457 5.080011 5.210943 5.779817 5.263708 5.167477 5.180694
ENSG00000000460 4.689137 4.533195 3.847813 4.782534 4.700797 4.065408
ENSG00000000938 7.080071 3.699882 4.486049 4.526414 6.833086 7.997246
ENSG00000001036 4.068473 3.761218 4.405707 3.489676 4.878534 5.234014
ENSG00000001084 5.065843 4.396833 4.623827 4.751093 4.730640 4.640130
GenesCutoff=rownames(cpm_in_cutoff)
NormalizedGenesCuttoff=as.data.frame(cbind(Gene_stable_ID=GenesCutoff, cpm_in_cutoff))
hist(cpm_in_cutoff, xlab = "Log2(CPM)", main = "Log2(CPM) values for genes meeting the filtering criteria", breaks = 100 )
Voom transformation:
Species <- factor(metaData$Species)
design <- model.matrix(~ 0 + Species)
head(design)
SpeciesChimp SpeciesHuman
1 1 0
2 1 0
3 1 0
4 1 0
5 1 0
6 1 0
colnames(design) <- gsub("Species", "", dput(colnames(design)))
c("SpeciesChimp", "SpeciesHuman")
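A no-intercept design gives one indicator column per species, so each coefficient is that species' mean expression. A small standalone sketch with hypothetical labels:

```r
# Hypothetical species labels for six samples
Species <- factor(c("Chimp", "Chimp", "Chimp", "Human", "Human", "Human"))

# No-intercept design: one 0/1 column per species
design <- model.matrix(~ 0 + Species)

# Strip the "Species" prefix so the columns read "Chimp" and "Human"
colnames(design) <- gsub("Species", "", colnames(design))

print(design)
```

The dput() call in the original only prints the column names as a side effect; the renaming works the same without it.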
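Voom estimates the mean-variance trend of the log2(CPM) values and assigns a precision weight to each observation. No random effect is fit in this call.
# Voom with quantile normalization; individual is not modeled as a random effect here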
cpm.voom<- voom(counts_filtered, design, normalize.method="quantile", plot=T)
boxplot(cpm.voom$E, col = pal[as.numeric(metaData$Species)],las=2)
plotDensities(cpm.voom, col = pal[as.numeric(metaData$Species)], legend = "topleft")
It looks like there is still a skew on the lower side of the distribution.
# PCA
pca_genes <- prcomp(t(cpm.voom$E), scale = T)
scores <- pca_genes$x
eigsGene <- pca_genes$sdev^2
proportionG = eigsGene/sum(eigsGene)
plot(proportionG)
for (n in 1:2){
col.v <- pal[as.integer(metaData$Species)]
plot_scores(pca_genes, scores, n, n+1, col.v)
}
#Clustering (original code from Julien Roux)
cors <- cor(cpm.voom$E, method="spearman", use="pairwise.complete.obs")
heatmap.2( cors, scale="none", col = colors, margins = c(12, 12), trace='none', denscol="white", labCol=labels, ColSideColors=pal[as.integer(as.factor(metaData$Species))], RowSideColors=pal[as.integer(as.factor(metaData$Species))+9], cexCol = 0.2 + 1/log10(15), cexRow = 0.2 + 1/log10(15))
One thing I can do is look at the correlation between the PCs and other factors in the data.
# PCA
pca_genes <- prcomp(t(cpm.voom$E), scale = F)
scores <- pca_genes$x
for (n in 1:2){
col.v <- pal[as.integer(metaData$Collection)]
plot_scores(pca_genes, scores, n, n+1, col.v)
}
metaData$Extraction=as.factor(metaData$Extraction)
for (n in 1:2){
col.v <- pal[as.integer(metaData$Extraction)]
plot_scores(pca_genes, scores, n, n+1, col.v)
}
The samples do not appear to cluster by batch (neither by who collected the sample nor by extraction date).
cols = brewer.pal(9, "Blues")
palC = colorRampPalette(cols)
metaData$UndilutedAverageorder = findInterval(metaData$UndilutedAverage, sort(metaData$UndilutedAverage))
for (n in 1:2){
col.v <- palC(nrow(metaData))[metaData$UndilutedAverageorder]
plot_scores(pca_genes, scores, n, n+1, col.v)
}
metaData$BioAConcorder = findInterval(metaData$BioAConc, sort(metaData$BioAConc))
for (n in 1:2){
col.v <- palC(nrow(metaData))[metaData$BioAConcorder]
plot_scores(pca_genes, scores, n, n+1, col.v)
}
metaData$RinConcorder = findInterval(metaData$Rin, sort(metaData$Rin))
for (n in 1:2){
col.v <- palC(nrow(metaData))[metaData$RinConcorder]
plot_scores(pca_genes, scores, n, n+1, col.v)
}
The samples do not cluster by collection concentration, RNA RIN score, or RNA concentration.
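The coloring used in these plots maps a continuous covariate to a color gradient through its rank. A minimal sketch with hypothetical values, using a two-color ramp from base grDevices instead of the RColorBrewer palette:

```r
# Hypothetical continuous covariate (e.g. an RNA quality score) for 5 samples
rin <- c(8.1, 9.5, 7.2, 9.0, 8.6)

# Position of each value within the sorted vector (1 = lowest)
ord <- findInterval(rin, sort(rin))

# One color per position, from light to dark
ramp <- colorRampPalette(c("#F7FBFF", "#08306B"))
cols <- ramp(length(rin))[ord]

print(data.frame(rin, ord, cols))
```

With tied values, findInterval gives every tied sample the same (highest) index, so tied samples share a color.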
metaData$AssignedOrthoorder = findInterval(metaData$AssignedOrtho, sort(metaData$AssignedOrtho))
for (n in 1:2){
col.v <- palC(nrow(metaData))[metaData$AssignedOrthoorder]
plot_scores(pca_genes, scores, n, n+1, col.v)
}
They also do not cluster by number of reads mapping to ortho exons.
PCA heatmap: Code from Michelle Ward:
x.pca <- pca_genes
tech_factors <- metaData
tech_factors_sum <- tech_factors[,c(2:15)] %>% dplyr::select(-CollectionDate,-percentOrtho)
p_comps <- 1:6
pc_cov_cor <- matrix(nrow = ncol(tech_factors_sum), ncol = length(p_comps),
dimnames = list(colnames(tech_factors_sum), colnames(x.pca$x)[p_comps]))
for (pc in p_comps) {
for (covariate in 1:ncol(tech_factors_sum)) {
lm_result <- lm(x.pca$x[, pc] ~ tech_factors_sum[, covariate])
r2 <- summary(lm_result)$r.squared
pc_cov_cor[covariate, pc] <- r2
}
}
pc_cov_pval <- matrix(nrow = ncol(tech_factors_sum), ncol = length(p_comps),
dimnames = list(colnames(tech_factors_sum), colnames(x.pca$x)[p_comps]))
for (pc in p_comps) {
for (covariate_2 in 1:ncol(tech_factors_sum)) {
lm_result_2 <- lm(x.pca$x[, pc] ~ tech_factors_sum[, covariate_2])
pval <- anova(lm_result_2)$'Pr(>F)'[1]
pc_cov_pval[covariate_2, pc] <- pval
}
}
PCs <- c("PC1", "PC2", "PC3", "PC4", "PC5", "PC6")
Tech_fac <- colnames(tech_factors_sum)
#Tech_fac <- c("Species", "Individual", "O2.", "Condition" , "Sex", "RIN" , "CO2", "Purity_high", "Purity_med" ,
#"Expt_Batch", "RNA_Batch", "Library_Batch", "Seq_pool", "Episomal_integration" )
heatmap.2(as.matrix(pc_cov_cor[Tech_fac,PCs]),col=brewer.pal(4, "Greens"), trace="none",
Rowv=FALSE, Colv=FALSE, key=T, main="Cor. PCs & tech factors", dendrogram="none",
key.title=NA, cexRow=0.9, cexCol=0.9)
log10_pc_cov_pval <- -log10(pc_cov_pval)
heatmap.2(as.matrix(log10_pc_cov_pval[Tech_fac,PCs]), col=brewer.pal(9, "Greens"), trace="none",
Rowv=FALSE, Colv=FALSE, key=T, main="-log10 pval of cor. PCs & tech factors", dendrogram="none",
key.title=NA, cexRow=0.9, cexCol=0.9)
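Each cell of the two heatmaps comes from a single regression of one PC on one covariate. A standalone sketch of one such cell, with made-up PC scores and a hypothetical two-level batch factor:

```r
# Hypothetical PC scores and one technical covariate for 8 samples
pc_scores <- c(-3.1, -2.4, -2.9, -2.2, 2.5, 3.0, 2.3, 2.8)
batch <- factor(c("a", "a", "a", "a", "b", "b", "b", "b"))

fit <- lm(pc_scores ~ batch)

# Fraction of the PC's variance explained by the covariate
r2 <- summary(fit)$r.squared

# P-value of the F test for the covariate
pval <- anova(fit)$"Pr(>F)"[1]

# -log10 transform, as plotted in the p-value heatmap
neg_log10_p <- -log10(pval)

print(c(r2 = r2, pval = pval, neg_log10_p = neg_log10_p))
```

Because many PC-by-covariate pairs are tested, individual p-values in the heatmap are best read as a screen rather than as formal tests.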
Plot PVE by each PC:
eigs <- pca_genes$sdev^2
proportion = eigs/sum(eigs)
plot(proportion, main="PVE by each PC", xlab="PC")
Version | Author | Date |
---|---|---|
c453931 | brimittleman | 2020-01-05 |
proportion[4]
[1] 0.09109573
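The proportion of variance explained (PVE) is each squared PC standard deviation divided by their sum. A small sketch on a hypothetical 5-sample, 3-gene matrix:

```r
# Toy expression matrix: 5 samples (rows) x 3 genes (columns), invented values
x <- matrix(c(1, 2, 3, 4, 5,
              2, 1, 4, 3, 6,
              5, 3, 1, 2, 4), ncol = 3)

pca <- prcomp(x, scale. = FALSE)

# Eigenvalues of the covariance matrix are the squared PC standard deviations
eigs <- pca$sdev^2

# Proportion of total variance carried by each PC
pve <- eigs / sum(eigs)

print(round(pve, 3))
```

Without scaling, the eigenvalues sum to the total of the per-gene variances, so highly variable genes dominate the leading PCs.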
fit.cpm.voom = lmFit(cpm.voom, design, plot=T)
head(coef(fit.cpm.voom))
Chimp Human
ENSG00000000419 5.729563 4.863479
ENSG00000000457 4.862504 5.280516
ENSG00000000460 4.452581 4.461012
ENSG00000000938 8.146155 5.749239
ENSG00000001036 5.682363 4.278239
ENSG00000001084 4.398539 4.711799
testgenes= rownames(coef(fit.cpm.voom))
#df for average normalized expression
avgExp=as.data.frame(cbind(genes=testgenes,coef(fit.cpm.voom) ))
contr <- makeContrasts(Chimp - Human, levels = colnames(coef(fit.cpm.voom)))
contr
Contrasts
Levels Chimp - Human
Chimp 1
Human -1
tmp <- contrasts.fit(fit.cpm.voom, contr)
tmp <- eBayes(tmp)
top.table <- topTable(tmp, sort.by = "P", n = Inf)
head(top.table, 20)
logFC AveExpr t P.Value adj.P.Val
ENSG00000105372 5.630358 8.110130 59.52351 1.485694e-16 1.508722e-12
ENSG00000204463 -6.197558 5.041060 -45.74372 3.777266e-15 1.142820e-11
ENSG00000205531 -5.940439 5.011790 -45.62134 3.903519e-15 1.142820e-11
ENSG00000142937 5.589075 8.074735 45.09454 4.501505e-15 1.142820e-11
ENSG00000137818 7.114468 6.794576 43.68229 6.651200e-15 1.350859e-11
ENSG00000071082 5.413729 7.296312 35.45376 8.574484e-14 1.451231e-10
ENSG00000186298 3.352417 6.680918 32.22943 2.748792e-13 3.974110e-10
ENSG00000148303 4.640610 7.595442 31.88765 3.130762e-13 3.974110e-10
ENSG00000116478 3.694113 6.385183 30.76256 4.852201e-13 5.474900e-10
ENSG00000147604 -7.064441 5.783547 -30.38644 5.637299e-13 5.724678e-10
ENSG00000088038 -4.999454 3.948451 -29.72131 7.382438e-13 6.815333e-10
ENSG00000072864 -6.857923 4.399064 -29.44551 8.270260e-13 6.998707e-10
ENSG00000183020 -5.130305 4.151035 -29.13161 9.423275e-13 7.361028e-10
ENSG00000128731 -3.587995 5.679996 -28.54326 1.207989e-12 8.762231e-10
ENSG00000161654 -3.666330 4.603805 -27.92183 1.578929e-12 1.040331e-09
ENSG00000179950 -5.983934 4.402584 -27.78236 1.678068e-12 1.040331e-09
ENSG00000167615 -6.313264 4.881549 -27.69762 1.741569e-12 1.040331e-09
ENSG00000145741 2.897300 7.633998 26.95215 2.426347e-12 1.368864e-09
ENSG00000105640 -2.953215 8.019379 -25.12854 5.677714e-12 2.925081e-09
ENSG00000049656 -2.675713 5.440187 -25.09842 5.760868e-12 2.925081e-09
B
ENSG00000105372 27.35947
ENSG00000204463 22.47175
ENSG00000205531 22.58641
ENSG00000142937 24.66798
ENSG00000137818 23.42239
ENSG00000071082 21.98589
ENSG00000186298 20.97083
ENSG00000148303 20.85556
ENSG00000116478 20.39436
ENSG00000147604 19.44754
ENSG00000088038 18.72823
ENSG00000072864 18.27882
ENSG00000183020 18.65049
ENSG00000128731 19.44355
ENSG00000161654 18.88385
ENSG00000179950 18.10333
ENSG00000167615 18.27649
ENSG00000145741 18.88448
ENSG00000105640 18.03812
ENSG00000049656 17.99535
length(which(top.table$adj.P.Val < 0.05))
[1] 3796
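The adj.P.Val column comes from Benjamini-Hochberg adjustment, which is topTable's default. A sketch of the same adjustment with base R's p.adjust on hypothetical p-values:

```r
# Hypothetical raw p-values for five genes
p <- c(0.001, 0.008, 0.039, 0.041, 0.20)

# Benjamini-Hochberg adjustment
adj <- p.adjust(p, method = "BH")

# Number of genes called at a 5% false discovery rate
n_sig <- sum(adj < 0.05)

print(data.frame(p, adj))
print(n_sig)
```

In this toy example two raw p-values below 0.05 are no longer significant after adjustment.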
Make a table to plot:
-log10(BH-adjusted p-value) vs logFC (log2 fold change)
top.table2=top.table %>% mutate(Species=ifelse(logFC > 1 & adj.P.Val<.05, "Chimp", ifelse(logFC < -1 & adj.P.Val< .05, "Human", "Neither")))
ggplot(top.table2, aes(x=logFC, y= -log10(adj.P.Val))) + geom_point(aes(col=Species), alpha=.3) + scale_color_brewer(palette = "Dark2") + labs(title="LCL differential Expression")
Version | Author | Date |
---|---|---|
326f6e2 | brimittleman | 2020-01-10 |
summary(decideTests(tmp))
Chimp - Human
Down 1895
NotSig 6359
Up 1901
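The Down and Up counts from decideTests (1895 + 1901 = 3796) match the number of genes with adjusted p < 0.05, because that call applies no fold-change cutoff. The plot labels are stricter: they also require an absolute logFC above 1. Since the contrast is Chimp - Human, a positive logFC means higher expression in chimp. A sketch of the difference on hypothetical results:

```r
# Hypothetical differential expression results for five genes
res <- data.frame(
  logFC     = c(2.5, 0.4, -3.0, -0.2, 1.5),
  adj.P.Val = c(0.001, 0.01, 0.0005, 0.6, 0.2)
)

# Same labeling rule as the plot: significant and |logFC| > 1
res$Species <- ifelse(res$logFC > 1 & res$adj.P.Val < 0.05, "Chimp",
               ifelse(res$logFC < -1 & res$adj.P.Val < 0.05, "Human", "Neither"))

# Significance alone, with no fold-change cutoff
n_fdr <- sum(res$adj.P.Val < 0.05)

print(res)
print(n_fdr)
```

The second gene is significant but falls under "Neither" in the plot because its fold change is small.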
deGenes=as.data.frame(row.names(top.table[top.table$adj.P.Val < 0.05,]))
#mkdir ../data/DiffExpression
geneNames=as.data.frame(row.names(top.table))
colnames(geneNames)="genes"
deGenes_witheffect= cbind(geneNames,top.table) %>% filter(adj.P.Val < 0.05)
tested_witheffect=cbind(geneNames,top.table)
write.table(avgExp, "../data/DiffExpression/NoramalizedExpression.txt", col.names = T, row.names = F, quote = F)
write.table(deGenes,"../data/DiffExpression/DE_genes.txt", col.names = F, row.names = F, quote = F)
write.table(tested_witheffect,"../data/DiffExpression/DEtested_allres.txt", col.names = F, row.names = F, quote = F)
write.table(deGenes_witheffect,"../data/DiffExpression/DE_results5fdr.txt", col.names = T, row.names = F, quote = F)
write.table(testgenes,"../data/DiffExpression/DE_Testedgenes.txt", col.names = F, row.names = F, quote = F)
write.table(NormalizedGenesCuttoff, "../data/DiffExpression/NormalizedExpressionPassCutoff.txt", col.names=T, row.names=F, quote = F)
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] reshape2_1.4.3 RColorBrewer_1.1-2 VennDiagram_1.6.20
[4] futile.logger_1.4.3 R.utils_2.7.0 R.oo_1.22.0
[7] R.methodsS3_1.7.1 edgeR_3.24.0 limma_3.38.2
[10] gplots_3.0.1 scales_1.0.0 forcats_0.3.0
[13] stringr_1.3.1 dplyr_0.8.0.1 purrr_0.3.2
[16] readr_1.3.1 tidyr_0.8.3 tibble_2.1.1
[19] ggplot2_3.1.1 tidyverse_1.2.1 workflowr_1.5.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 locfit_1.5-9.1 lubridate_1.7.4
[4] lattice_0.20-38 gtools_3.8.1 assertthat_0.2.0
[7] rprojroot_1.3-2 digest_0.6.18 R6_2.3.0
[10] cellranger_1.1.0 plyr_1.8.4 futile.options_1.0.1
[13] backports_1.1.2 evaluate_0.12 httr_1.3.1
[16] pillar_1.3.1 rlang_0.4.0 lazyeval_0.2.1
[19] readxl_1.1.0 rstudioapi_0.10 gdata_2.18.0
[22] whisker_0.3-2 rmarkdown_1.10 labeling_0.3
[25] munsell_0.5.0 broom_0.5.1 compiler_3.5.1
[28] httpuv_1.4.5 modelr_0.1.2 pkgconfig_2.0.2
[31] htmltools_0.3.6 tidyselect_0.2.5 crayon_1.3.4
[34] withr_2.1.2 later_0.7.5 bitops_1.0-6
[37] nlme_3.1-137 jsonlite_1.6 gtable_0.2.0
[40] formatR_1.5 git2r_0.26.1 magrittr_1.5
[43] KernSmooth_2.23-15 cli_1.1.0 stringi_1.2.4
[46] fs_1.3.1 promises_1.0.1 xml2_1.2.0
[49] generics_0.0.2 lambda.r_1.2.3 tools_3.5.1
[52] glue_1.3.0 hms_0.4.2 yaml_2.2.0
[55] colorspace_1.3-2 caTools_1.17.1.1 rvest_0.3.2
[58] knitr_1.20 haven_1.1.2