Last updated: 2019-04-30

Checks: 6 0

Knit directory: HiCiPSC/

This reproducible R Markdown analysis was created with workflowr (version 1.2.0). The Report tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20190311) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    analysis/.DS_Store
    Ignored:    code/.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    output/.DS_Store

Untracked files:
    Untracked:  Rplot.jpeg
    Untracked:  Rplot001.jpeg
    Untracked:  Rplot002.jpeg
    Untracked:  Rplot003.jpeg
    Untracked:  Rplot004.jpeg
    Untracked:  Rplot005.jpeg
    Untracked:  Rplot006.jpeg
    Untracked:  Rplot007.jpeg
    Untracked:  Rplot008.jpeg
    Untracked:  Rplot009.jpeg
    Untracked:  Rplot010.jpeg
    Untracked:  Rplot011.jpeg
    Untracked:  Rplot012.jpeg
    Untracked:  Rplot013.jpeg
    Untracked:  Rplot014.jpeg
    Untracked:  Rplot015.jpeg
    Untracked:  Rplot016.jpeg
    Untracked:  Rplot017.jpeg
    Untracked:  Rplot018.jpeg
    Untracked:  Rplot019.jpeg
    Untracked:  Rplot020.jpeg
    Untracked:  Rplot021.jpeg
    Untracked:  Rplot022.jpeg
    Untracked:  Rplot023.jpeg
    Untracked:  Rplot024.jpeg
    Untracked:  Rplot025.jpeg
    Untracked:  Rplot026.jpeg
    Untracked:  Rplot027.jpeg
    Untracked:  Rplot028.jpeg
    Untracked:  Rplot029.jpeg
    Untracked:  Rplot030.jpeg
    Untracked:  Rplot031.jpeg
    Untracked:  Rplot032.jpeg
    Untracked:  Rplot033.jpeg
    Untracked:  Rplot034.jpeg
    Untracked:  Rplot035.jpeg
    Untracked:  Rplot036.jpeg
    Untracked:  Rplot037.jpeg
    Untracked:  Rplot038.jpeg
    Untracked:  Rplot039.jpeg
    Untracked:  Rplot040.jpeg
    Untracked:  Rplot041.jpeg
    Untracked:  Rplot042.jpeg
    Untracked:  Rplot043.jpeg
    Untracked:  Rplot044.jpeg
    Untracked:  Rplot045.jpeg
    Untracked:  Rplot046.jpeg
    Untracked:  Rplot047.jpeg
    Untracked:  Rplot048.jpeg
    Untracked:  Rplot049.jpeg
    Untracked:  Rplot050.jpeg
    Untracked:  Rplot051.jpeg
    Untracked:  Rplot052.jpeg
    Untracked:  Rplot053.jpeg
    Untracked:  Rplot054.jpeg
    Untracked:  Rplot055.jpeg
    Untracked:  Rplot056.jpeg
    Untracked:  Rplot057.jpeg
    Untracked:  Rplot058.jpeg
    Untracked:  Rplot059.jpeg
    Untracked:  Rplot060.jpeg
    Untracked:  Rplot061.jpeg
    Untracked:  Rplot062.jpeg
    Untracked:  Rplot063.jpeg
    Untracked:  Rplot064.jpeg
    Untracked:  Rplot065.jpeg
    Untracked:  Rplot066.jpeg
    Untracked:  Rplot067.jpeg
    Untracked:  Rplot068.jpeg
    Untracked:  Rplot069.jpeg
    Untracked:  Rplot070.jpeg
    Untracked:  Rplot071.jpeg
    Untracked:  Rplot072.jpeg
    Untracked:  Rplot073.jpeg
    Untracked:  Rplot074.jpeg
    Untracked:  Rplot075.jpeg
    Untracked:  Rplot076.jpeg
    Untracked:  Rplot077.jpeg
    Untracked:  Rplot078.jpeg
    Untracked:  Rplot079.jpeg
    Untracked:  Rplot080.jpeg
    Untracked:  Rplot081.jpeg
    Untracked:  Rplot082.jpeg
    Untracked:  Rplot083.jpeg
    Untracked:  Rplot084.jpeg
    Untracked:  Rplot085.jpeg
    Untracked:  Rplot086.jpeg
    Untracked:  Rplot087.jpeg
    Untracked:  Rplot088.jpeg
    Untracked:  Rplot089.jpeg
    Untracked:  Rplot090.jpeg
    Untracked:  Rplot091.jpeg
    Untracked:  Rplot092.jpeg
    Untracked:  Rplot093.jpeg
    Untracked:  Rplot094.jpeg
    Untracked:  Rplot095.jpeg
    Untracked:  Rplot096.jpeg
    Untracked:  Rplot097.jpeg
    Untracked:  Rplot098.jpeg
    Untracked:  Rplot099.jpeg
    Untracked:  Rplot100.jpeg
    Untracked:  Rplot101.jpeg
    Untracked:  Rplot102.jpeg
    Untracked:  Rplot103.jpeg
    Untracked:  Rplot104.jpeg
    Untracked:  Rplot105.jpeg
    Untracked:  Rplot106.jpeg
    Untracked:  Rplot107.jpeg
    Untracked:  Rplot108.jpeg
    Untracked:  Rplot109.jpeg
    Untracked:  Rplot110.jpeg
    Untracked:  Rplot111.jpeg
    Untracked:  Rplot112.jpeg
    Untracked:  Rplot113.jpeg
    Untracked:  Rplot114.jpeg
    Untracked:  Rplot115.jpeg
    Untracked:  Rplot116.jpeg
    Untracked:  Rplot117.jpeg
    Untracked:  Rplot118.jpeg
    Untracked:  Rplot119.jpeg
    Untracked:  Rplot120.jpeg
    Untracked:  Rplot121.jpeg
    Untracked:  Rplot122.jpeg
    Untracked:  Rplot123.jpeg
    Untracked:  Rplot124.jpeg
    Untracked:  Rplot125.jpeg
    Untracked:  Rplot126.jpeg
    Untracked:  Rplot127.jpeg
    Untracked:  Rplot128.jpeg
    Untracked:  Rplot129.jpeg
    Untracked:  Rplot130.jpeg
    Untracked:  Rplot131.jpeg
    Untracked:  Rplot132.jpeg
    Untracked:  Rplot133.jpeg
    Untracked:  Rplot134.jpeg
    Untracked:  Rplot135.jpeg
    Untracked:  Rplot136.jpeg
    Untracked:  Rplot137.jpeg
    Untracked:  Rplot138.jpeg
    Untracked:  Rplot139.jpeg
    Untracked:  Rplot140.jpeg
    Untracked:  Rplot141.jpeg
    Untracked:  Rplot142.jpeg
    Untracked:  Rplot143.jpeg
    Untracked:  Rplot144.jpeg
    Untracked:  Rplot145.jpeg
    Untracked:  Rplot146.jpeg
    Untracked:  Rplot147.jpeg
    Untracked:  Rplot148.jpeg
    Untracked:  Rplot149.jpeg
    Untracked:  Rplot150.jpeg
    Untracked:  Rplot151.jpeg
    Untracked:  Rplot152.jpeg
    Untracked:  Rplot153.jpeg
    Untracked:  Rplot154.jpeg
    Untracked:  Rplot155.jpeg
    Untracked:  Rplot156.jpeg
    Untracked:  Rplot157.jpeg
    Untracked:  Rplot158.jpeg
    Untracked:  Rplot159.jpeg
    Untracked:  Rplot160.jpeg
    Untracked:  Rplot161.jpeg
    Untracked:  Rplot162.jpeg
    Untracked:  Rplot163.jpeg
    Untracked:  Rplot164.jpeg
    Untracked:  Rplot165.jpeg
    Untracked:  Rplot166.jpeg
    Untracked:  Rplot167.jpeg
    Untracked:  Rplot168.jpeg
    Untracked:  Rplot169.jpeg
    Untracked:  Rplot170.jpeg
    Untracked:  Rplot171.jpeg
    Untracked:  Rplot172.jpeg
    Untracked:  Rplot173.jpeg
    Untracked:  Rplot174.jpeg
    Untracked:  Rplot175.jpeg
    Untracked:  Rplot176.jpeg
    Untracked:  Rplot177.jpeg
    Untracked:  Rplot178.jpeg
    Untracked:  Rplot179.jpeg
    Untracked:  Rplot180.jpeg
    Untracked:  Rplot181.jpeg
    Untracked:  Rplot182.jpeg
    Untracked:  Rplot183.jpeg
    Untracked:  Rplot184.jpeg
    Untracked:  Rplot185.jpeg
    Untracked:  Rplot186.jpeg
    Untracked:  Rplot187.jpeg
    Untracked:  Rplot188.jpeg
    Untracked:  Rplot189.jpeg
    Untracked:  Rplot190.jpeg
    Untracked:  Rplot191.jpeg
    Untracked:  Rplot192.jpeg
    Untracked:  Rplot193.jpeg
    Untracked:  Rplot194.jpeg
    Untracked:  Rplot195.jpeg
    Untracked:  Rplot196.jpeg
    Untracked:  Rplot197.jpeg
    Untracked:  Rplot198.jpeg
    Untracked:  Rplot199.jpeg
    Untracked:  Rplot200.jpeg
    Untracked:  Rplot201.jpeg
    Untracked:  Rplot202.jpeg
    Untracked:  Rplot203.jpeg
    Untracked:  Rplot204.jpeg
    Untracked:  Rplot205.jpeg
    Untracked:  Rplot206.jpeg
    Untracked:  Rplot207.jpeg
    Untracked:  Rplot208.jpeg
    Untracked:  Rplot209.jpeg
    Untracked:  Rplot210.jpeg
    Untracked:  Rplot211.jpeg
    Untracked:  Rplot212.jpeg
    Untracked:  Rplot213.jpeg
    Untracked:  Rplot214.jpeg
    Untracked:  Rplot215.jpeg
    Untracked:  Rplot216.jpeg
    Untracked:  Rplot217.jpeg
    Untracked:  Rplot218.jpeg
    Untracked:  Rplot219.jpeg
    Untracked:  Rplot220.jpeg
    Untracked:  Rplot221.jpeg
    Untracked:  Rplot222.jpeg
    Untracked:  Rplot223.jpeg
    Untracked:  Rplot224.jpeg
    Untracked:  Rplot225.jpeg
    Untracked:  Rplot226.jpeg
    Untracked:  Rplot227.jpeg
    Untracked:  Rplot228.jpeg
    Untracked:  Rplot229.jpeg
    Untracked:  Rplot230.jpeg
    Untracked:  Rplot231.jpeg
    Untracked:  Rplot232.jpeg
    Untracked:  Rplot233.jpeg
    Untracked:  Rplot234.jpeg
    Untracked:  Rplot235.jpeg
    Untracked:  Rplot236.jpeg
    Untracked:  Rplot237.jpeg
    Untracked:  Rplot238.jpeg
    Untracked:  Rplot239.jpeg
    Untracked:  Rplot240.jpeg
    Untracked:  Rplot241.jpeg
    Untracked:  Rplot242.jpeg
    Untracked:  Rplot243.jpeg
    Untracked:  Rplot244.jpeg
    Untracked:  Rplot245.jpeg
    Untracked:  Rplot246.jpeg
    Untracked:  Rplot247.jpeg
    Untracked:  Rplot248.jpeg
    Untracked:  Rplot249.jpeg
    Untracked:  Rplot250.jpeg
    Untracked:  Rplot251.jpeg
    Untracked:  Rplot252.jpeg
    Untracked:  Rplot253.jpeg
    Untracked:  Rplot254.jpeg
    Untracked:  Rplot255.jpeg
    Untracked:  Rplot256.jpeg
    Untracked:  Rplot257.jpeg
    Untracked:  Rplot258.jpeg
    Untracked:  Rplot259.jpeg
    Untracked:  Rplot260.jpeg
    Untracked:  Rplot261.jpeg
    Untracked:  Rplot262.jpeg
    Untracked:  Rplot263.jpeg
    Untracked:  Rplot264.jpeg
    Untracked:  Rplot265.jpeg
    Untracked:  Rplot266.jpeg
    Untracked:  Rplot267.jpeg
    Untracked:  Rplot268.jpeg
    Untracked:  Rplot269.jpeg
    Untracked:  Rplot270.jpeg
    Untracked:  Rplot271.jpeg
    Untracked:  Rplot272.jpeg
    Untracked:  Rplot273.jpeg
    Untracked:  Rplot274.jpeg
    Untracked:  Rplot275.jpeg
    Untracked:  Rplot276.jpeg
    Untracked:  Rplot277.jpeg
    Untracked:  Rplot278.jpeg
    Untracked:  Rplot279.jpeg
    Untracked:  Rplot280.jpeg
    Untracked:  Rplot281.jpeg
    Untracked:  Rplot282.jpeg
    Untracked:  Rplot283.jpeg
    Untracked:  Rplot284.jpeg
    Untracked:  Rplot285.jpeg
    Untracked:  Rplot286.jpeg
    Untracked:  Rplot287.jpeg
    Untracked:  Rplot288.jpeg
    Untracked:  Rplot289.jpeg
    Untracked:  Rplot290.jpeg
    Untracked:  Rplot291.jpeg
    Untracked:  Rplot292.jpeg
    Untracked:  Rplot293.jpeg
    Untracked:  Rplot294.jpeg
    Untracked:  Rplot295.jpeg
    Untracked:  Rplot296.jpeg
    Untracked:  Rplot297.jpeg
    Untracked:  Rplot298.jpeg
    Untracked:  Rplot299.jpeg
    Untracked:  Rplot300.jpeg
    Untracked:  Rplot301.jpeg
    Untracked:  Rplot302.jpeg
    Untracked:  Rplot303.jpeg
    Untracked:  Rplot304.jpeg
    Untracked:  S2A.jpeg
    Untracked:  S2B.jpeg
    Untracked:  analysis/juicer_enrichment.Rmd
    Untracked:  code/mediate.test.regressing.R
    Untracked:  data/Chimp_orthoexon_extended_info.txt
    Untracked:  data/Human_orthoexon_extended_info.txt
    Untracked:  data/Meta_data.txt
    Untracked:  data/TADs/
    Untracked:  data/chimp_lengths.txt
    Untracked:  data/counts_iPSC.txt
    Untracked:  data/epigenetic_enrichments/
    Untracked:  data/final.10kb.homer.df
    Untracked:  data/final.juicer.10kb.KR
    Untracked:  data/final.juicer.10kb.VC
    Untracked:  data/hic_gene_overlap/
    Untracked:  data/human_lengths.txt
    Untracked:  data/old_mediation_permutations/
    Untracked:  docs/figure/juicer_initial_QC.Rmd/
    Untracked:  output/DC_regions.txt
    Untracked:  output/IEE.RPKM.RDS
    Untracked:  output/IEE_voom_object.RDS
    Untracked:  output/data.4.filtered.lm.QC
    Untracked:  output/data.4.fixed.init.LM
    Untracked:  output/data.4.fixed.init.QC
    Untracked:  output/data.4.init.LM
    Untracked:  output/data.4.init.QC
    Untracked:  output/data.4.lm.QC
    Untracked:  output/full.data.10.init.LM
    Untracked:  output/full.data.10.init.QC
    Untracked:  output/full.data.10.lm.QC
    Untracked:  output/full.data.annotations.RDS
    Untracked:  output/gene.hic.filt.RDS
    Untracked:  output/juicer.filt.KR
    Untracked:  output/juicer.filt.KR.lm
    Untracked:  output/juicer.filt.VC
    Untracked:  output/juicer.filt.VC.lm

Unstaged changes:
    Modified:   analysis/TADs.Rmd
    Modified:   analysis/gene_expression.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd 8380e8b Ittai Eres 2019-04-30 Add variety of juicer analyses into website files.

###Linear Modeling Having performed some basic quality control and filtering on the data, I now move to quantify how well Hi-C interaction frequency values can be predicted from species identity. To accomplish this, I utilize limma to linearly model Hi-C interaction frequencies as a function of all the metadata about each sample’s species, sex, and batch. I then add information from linear modeling to the full dataframe, including log-fold changes between species, p-values, betas, and variances. I then make a plot of the mean vs. the log-fold change as an MA plot proxy for quality control. I move on to look at distributions of all the p-values and betas for each of the terms, make some QQ plots to check for normality in the p-values for each term, and represent the log-fold change and p-values in a volcano plot of the data.

###Read in data, normalized in initial_QC document, with condition of at least 2 individuals for discovery.
filt.KR <- fread("output/juicer.filt.KR", header = TRUE, data.table=FALSE, stringsAsFactors = FALSE, showProgress = FALSE)
filt.VC <- fread("output/juicer.filt.VC", header=TRUE, data.table=FALSE, stringsAsFactors=FALSE, showProgress = FALSE)

###Reassign metadata here, just for quick reference.
meta.data <- data.frame("SP"=c("H", "H", "C", "C", "H", "H", "C", "C"), "SX"=c("F", "M", "M", "F", "M", "F", "M", "F"), "Batch"=c(1, 1, 1, 1, 2, 2, 2, 2))

###Now to move on to linear modeling, doing it in both the full data set and the | 4 individuals condition. I utilize limma to make this quick, parllelizable, and simple. First make the model, then do the actual fitting, and finally do multiple testing adjustment with topTable.
design <- model.matrix(~1+meta.data$SP+meta.data$SX+meta.data$Batch) #Parameterize the linear model, with an intercept, and fixed effect terms for species, sex, and library prep batch. If you prefer to think in contrasts, my contrast is humans minus chimps. I prefer to think of one species as the baseline in the linear model, and in this case that's chimps (so chimps get a 0 for species, and humans get a 1).
lmFit(filt.KR[,112:119], design) %>% eBayes(.) -> model.KR
lmFit(filt.VC[,112:119], design) %>% eBayes(.) -> model.VC
volc.KR <- topTable(model.KR, coef = 2, sort.by = "none", number = Inf)
volc.VC <- topTable(model.VC, coef=2, sort.by="none", number=Inf)

###Now append the information extracted from linear modeling (p-values and betas for species, sex, and batch) to the 2 data frames:
filt.KR$sp_beta <- volc.KR$logFC #Beta already is logFC since values are log2(obs/exp) Hi-C reads.
filt.KR$sp_pval <- volc.KR$P.Value #Unadjusted P-val
filt.KR$sp_BH_pval <- volc.KR$adj.P.Val #Benjamini-Hochberg adjusted p-values
filt.KR$avg_IF <- volc.KR$AveExpr #Average Interaction Frequency value across all individuals. Useful later for a variety of plots.
filt.KR$t_statistic <- volc.KR$t #t-statistic from limma topTable
filt.KR$B_statistic <- volc.KR$B #B-statistic (LOD) from limma topTable
filt.KR$sx_pval <- model.KR$p.value[,3]
filt.KR$btc_pval <- model.KR$p.value[,4]
filt.KR$sx_beta <- model.KR$coefficients[,3]
filt.KR$btc_beta <- model.KR$coefficients[,4]
filt.KR$SE <- sqrt(model.KR$s2.post)*model.KR$stdev.unscaled[,2]

filt.VC$sp_beta <- volc.VC$logFC #Beta already is logFC since values are log2(obs/exp) Hi-C reads.
filt.VC$sp_pval <- volc.VC$P.Value #Unadjusted P-val
filt.VC$sp_BH_pval <- volc.VC$adj.P.Val #Benjamini-Hochberg adjusted p-values
filt.VC$avg_IF <- volc.VC$AveExpr #Average Interaction Frequency value across all individuals. Useful later for a variety of plots.
filt.VC$t_statistic <- volc.VC$t #t-statistic from limma topTable
filt.VC$B_statistic <- volc.VC$B #B-statistic (LOD) from limma topTable
filt.VC$sx_pval <- model.VC$p.value[,3]
filt.VC$btc_pval <- model.VC$p.value[,4]
filt.VC$sx_beta <- model.VC$coefficients[,3]
filt.VC$btc_beta <- model.VC$coefficients[,4]
filt.VC$SE <- sqrt(model.VC$s2.post)*model.VC$stdev.unscaled[,2]

####NONE of these figures are in any supplement so WHO CURRS

###Before moving to any actual assessment of the linear modeling, do a QC check by producing MA plots for both the full set and | 2 individuals. The MA plot here is the mean of Hi-C contact frequencies (avg_IF) on the x-axis, and the logFC (species beta, in this case) on the y-axis. What we are generally looking for is a fairly uniform cloud around 0 stretching across the x-axis.
ggplot(data=filt.KR[,c('avg_IF','sp_beta')], aes(x=avg_IF, y=sp_beta)) + geom_point() + xlab("Means of Hi-C Contact Frequencies") + ylab("Species Beta (log ratio H/C)") + ggtitle("MA Plot, KR Data") + coord_cartesian(xlim=c(-5, 500), ylim=c(-50, 50))

ggplot(data=filt.VC[,c('avg_IF','sp_beta')], aes(x=avg_IF, y=sp_beta)) + geom_point() + xlab("Means of Hi-C Contact Frequencies") + ylab("Species Beta (log ratio H/C)") + ggtitle("MA Plot, VC Data") + coord_cartesian(xlim=c(-5, 500), ylim=c(-50, 50))

###Check some of the more basic linear modeling issues: distributions of betas, p-vals, QQplot, volcano plot, etc.
##Check p-vals and betas for all covariates in the full model, on all the data.
ggplot(data=filt.KR[,120:140], aes(x=sp_pval)) + geom_histogram(binwidth=0.01) + ggtitle("P-vals for Species, KR") + xlab("P-value") + coord_cartesian(ylim=c(0, 1000))

ggplot(data=filt.KR[,120:140], aes(x=sx_pval)) + geom_histogram(binwidth=0.01) + ggtitle("P-vals for Sex, KR") + xlab("P-value") + coord_cartesian(ylim=c(0, 1000))

ggplot(data=filt.KR[,120:140], aes(x=btc_pval)) + geom_histogram(binwidth=0.01) + ggtitle("P-vals for Batch, KR") + xlab("P-value") + coord_cartesian(ylim=c(0, 1000))

ggplot(data=filt.KR[,120:140], aes(x=sp_beta)) + geom_histogram(binwidth=0.01) + ggtitle("Betas for Species, KR") + xlab("Beta")  + coord_cartesian(xlim=c(-4.5, 4.5), ylim=c(0, 200))

ggplot(data=filt.KR[,120:140], aes(x=sx_beta)) + geom_histogram(binwidth=0.01) + ggtitle("Betas for Sex, KR") + xlab("Beta")  + coord_cartesian(xlim=c(-4.5, 4.5), ylim=c(0, 200))

ggplot(data=filt.KR[,120:140], aes(x=btc_beta)) + geom_histogram(binwidth=0.01) + ggtitle("Betas for Batch, KR") + xlab("Beta")  + coord_cartesian(xlim=c(-4.5, 4.5), ylim=c(0, 200))

ggplot(data=filt.VC[,120:140], aes(x=sp_pval)) + geom_histogram(binwidth=0.01) + ggtitle("P-vals for Species, VC") + xlab("P-value") + coord_cartesian(ylim=c(0, 1500))

ggplot(data=filt.VC[,120:140], aes(x=sx_pval)) + geom_histogram(binwidth=0.01) + ggtitle("P-vals for Sex, VC") + xlab("P-value") + coord_cartesian(ylim=c(0, 1500))

ggplot(data=filt.VC[,120:140], aes(x=btc_pval)) + geom_histogram(binwidth=0.01) + ggtitle("P-vals for Batch, VC") + xlab("P-value") + coord_cartesian(ylim=c(0, 1500))

ggplot(data=filt.VC[,120:140], aes(x=sp_beta)) + geom_histogram(binwidth=0.01) + ggtitle("Betas for Species, VC") + xlab("Beta")  + coord_cartesian(xlim=c(-4.5, 4.5), ylim=c(0, 210))

ggplot(data=filt.VC[,120:140], aes(x=sx_beta)) + geom_histogram(binwidth=0.01) + ggtitle("Betas for Sex, VC") + xlab("Beta")  + coord_cartesian(xlim=c(-4.5, 4.5), ylim=c(0, 210))

ggplot(data=filt.VC[,120:140], aes(x=btc_beta)) + geom_histogram(binwidth=0.01) + ggtitle("Betas for Batch, VC") + xlab("Beta")  + coord_cartesian(xlim=c(-4.5, 4.5), ylim=c(0, 210))

#These all look pretty good, with p-val distributions for sex and batch being fairly uniform, and betas all around being fairly normally distributed. We see distinct enrichment for significant p-values for the species term, which is what we were hoping for! Also of note is the fact that the beta distribution for species looks wider than those for batch and sex, reassuring us that species is a driving factor in differential Hi-C contacts.

###Now, to double-check on the p-values with some QQplots. First, I define a function for creating a qqplot easily and coloring elements of the distribution in order to understand where most of the density on the plot is:
newqqplot=function(pvals, quant, title){  
  len = length(pvals)
  res=qqplot(-log10((1:len)/(1+len)),pvals,plot.it=F)
  plot(res$x,res$y, main=title, xlab="Theoretical", ylab="Actual", col=ifelse(res$y>as.numeric(quantile(res$y, quant[1])), ifelse(res$y>as.numeric(quantile(res$y, quant[2])), "red", "blue"), "black"))
  abline(0, 1)
}

##Start QQplotting some of these actual values.
newqqplot(-log10(filt.KR$sp_pval), c(0.5, 0.75), "QQ Plot, Species P-vals, KR Data")

newqqplot(-log10(filt.KR$sx_pval), c(0.5, 0.75), "QQ Plot, Sex P-vals, KR Data")

newqqplot(-log10(filt.KR$btc_pval), c(0.5, 0.75), "QQ Plot, Batch P-vals, KR Data")

newqqplot(-log10(filt.VC$sp_pval), c(0.5, 0.75), "QQ Plot, Species P-vals, VC Data")

newqqplot(-log10(filt.VC$sx_pval), c(0.5, 0.75), "QQ Plot, Sex P-vals, VC Data")

newqqplot(-log10(filt.VC$btc_pval), c(0.5, 0.75), "QQ Plot, Batch P-vals, VC Data")

###MAIN PAPER FIG 2
#Volcano plots of the data colored by species of discovery:
volcplot.KR <- data.frame(pval=-log10(filt.KR$sp_pval), beta=filt.KR$sp_beta, species=filt.KR$disc_species)
ggplot(data=volcplot.KR, aes(x=beta, y=pval, color=species)) + geom_point() + xlab("Log2 Fold Change in KR Contact Frequency") + ylab("P-values") + ggtitle("KR Contact Frequency Differences Colored by Species of Discovery") + coord_cartesian(xlim=c(-6, 6))

#ggplot(data=volcplot.KR, aes(x=beta, y=pval)) + geom_point() + xlab("Log2 Fold Change in Contact Frequency") + ylab("-log10 BH-corrected p-values") + ggtitle("Contact Frequency Differences, KR") + coord_cartesian(xlim=c(-6, 6))

volcplot.VC <- data.frame(pval=-log10(filt.VC$sp_pval), beta=filt.VC$sp_beta, species=filt.VC$disc_species)
ggplot(data=volcplot.VC, aes(x=beta, y=pval, color=species)) + geom_point() + xlab("Log2 Fold Change in VC Contact Frequency") + ylab("P-values") + ggtitle("VC Contact Frequency Differences Colored by Species of Discovery") + coord_cartesian(xlim=c(-6, 6))

#ggplot(data=volcplot.VC, aes(x=beta, y=pval)) + geom_point() + xlab("Log2 Fold Change in Contact Frequency") + ylab("-log10 BH-corrected p-values") + ggtitle("Contact Frequency Differences, VC") + coord_cartesian(xlim=c(-6, 6))

#Write out the data with the new columns added on!
fwrite(filt.KR, "output/juicer.filt.KR.lm", quote = TRUE, sep = "\t", row.names = FALSE, col.names = TRUE, na="NA", showProgress = FALSE)
fwrite(filt.VC, "output/juicer.filt.VC.lm", quote=TRUE, sep="\t", row.names=FALSE, col.names=TRUE, na="NA", showProgress=FALSE)

From this analysis two concerns emerge for further exploration: the QQ plots for the species p-values seem inflated, and the volcano plot seems highly asymmetrical. I’ll examine this further in the next analysis on linear modeling QC.



sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] compiler  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] bedr_1.0.6         forcats_0.4.0      purrr_0.3.2       
 [4] readr_1.3.1        tibble_2.1.1       tidyverse_1.2.1   
 [7] edgeR_3.20.9       RColorBrewer_1.1-2 heatmaply_0.15.2  
[10] viridis_0.5.1      viridisLite_0.3.0  stringr_1.4.0     
[13] gplots_3.0.1.1     Hmisc_4.2-0        Formula_1.2-3     
[16] survival_2.44-1    lattice_0.20-38    dplyr_0.8.0.1     
[19] plotly_4.8.0       cowplot_0.9.4      ggplot2_3.1.0     
[22] reshape2_1.4.3     data.table_1.12.0  tidyr_0.8.3       
[25] plyr_1.8.4         limma_3.34.9      

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1     class_7.3-15         modeltools_0.2-22   
  [4] mclust_5.4.3         rprojroot_1.3-2      htmlTable_1.13.1    
  [7] futile.logger_1.4.3  base64enc_0.1-3      fs_1.2.7            
 [10] rstudioapi_0.10      flexmix_2.3-15       mvtnorm_1.0-8       
 [13] lubridate_1.7.4      xml2_1.2.0           R.methodsS3_1.7.1   
 [16] codetools_0.2-16     splines_3.4.0        robustbase_0.92-8   
 [19] knitr_1.22           jsonlite_1.6         workflowr_1.2.0     
 [22] broom_0.5.1          cluster_2.0.7-1      kernlab_0.9-27      
 [25] R.oo_1.22.0          httr_1.4.0           backports_1.1.3     
 [28] assertthat_0.2.1     Matrix_1.2-15        lazyeval_0.2.2      
 [31] cli_1.1.0            formatR_1.6          acepack_1.4.1       
 [34] htmltools_0.3.6      tools_3.4.0          gtable_0.3.0        
 [37] glue_1.3.1           Rcpp_1.0.1           cellranger_1.1.0    
 [40] trimcluster_0.1-2.1  gdata_2.18.0         nlme_3.1-137        
 [43] iterators_1.0.10     fpc_2.1-11.1         xfun_0.5            
 [46] testthat_2.0.1       rvest_0.3.2          gtools_3.8.1        
 [49] dendextend_1.10.0    DEoptimR_1.0-8       MASS_7.3-51.1       
 [52] scales_1.0.0         TSP_1.1-6            hms_0.4.2           
 [55] parallel_3.4.0       lambda.r_1.2.3       yaml_2.2.0          
 [58] gridExtra_2.3        rpart_4.1-13         latticeExtra_0.6-28 
 [61] stringi_1.4.3        gclus_1.3.2          foreach_1.4.4       
 [64] checkmate_1.9.1      seriation_1.2-3      caTools_1.17.1.2    
 [67] rlang_0.3.3          pkgconfig_2.0.2      prabclus_2.2-7      
 [70] bitops_1.0-6         evaluate_0.13        labeling_0.3        
 [73] htmlwidgets_1.3      tidyselect_0.2.5     magrittr_1.5        
 [76] R6_2.4.0             generics_0.0.2       pillar_1.3.1        
 [79] haven_2.1.0          whisker_0.3-2        foreign_0.8-71      
 [82] withr_2.1.2          nnet_7.3-12          modelr_0.1.4        
 [85] crayon_1.3.4         futile.options_1.0.1 KernSmooth_2.23-15  
 [88] rmarkdown_1.12       locfit_1.5-9.1       grid_3.4.0          
 [91] readxl_1.3.1         git2r_0.25.2         digest_0.6.18       
 [94] diptest_0.75-7       webshot_0.5.1        VennDiagram_1.6.20  
 [97] R.utils_2.8.0        stats4_3.4.0         munsell_0.5.0       
[100] registry_0.5-1