Last updated: 2019-03-14
Checks: 6 0
Knit directory: HiCiPSC/
This reproducible R Markdown analysis was created with workflowr (version 1.2.0). The Report tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20190311)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: code/.DS_Store
Ignored: data/.DS_Store
Ignored: output/.DS_Store
Untracked files:
Untracked: Rplot.jpeg
Untracked: S2A.jpeg
Untracked: S2B.jpeg
Untracked: data/Chimp_orthoexon_extended_info.txt
Untracked: data/Human_orthoexon_extended_info.txt
Untracked: data/Meta_data.txt
Untracked: data/chimp_lengths.txt
Untracked: data/counts_iPSC.txt
Untracked: data/final.10kb.homer.df
Untracked: data/final.juicer.10kb.KR
Untracked: data/final.juicer.10kb.VC
Untracked: data/human_lengths.txt
Untracked: output/DC_regions.txt
Untracked: output/IEE.RPKM.RDS
Untracked: output/IEE_voom_object.RDS
Untracked: output/data.4.filtered.lm.QC
Untracked: output/data.4.fixed.init.LM
Untracked: output/data.4.fixed.init.QC
Untracked: output/data.4.init.LM
Untracked: output/data.4.init.QC
Untracked: output/data.4.lm.QC
Untracked: output/full.data.10.init.LM
Untracked: output/full.data.10.init.QC
Untracked: output/full.data.10.lm.QC
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | a6451a3 | Ittai Eres | 2019-03-14 | Add gene expression overlay (still needs more modification); update index |
First, load necessary libraries: limma, plyr, tidyr, data.table, reshape2, cowplot, plotly, dplyr, Hmisc, gplots, stringr, heatmaply, RColorBrewer, edgeR, tidyverse, and compiler
Here I read in a dataframe of counts summarized at the gene-level, then do some pre-processing and normalization to obtain a voom object to later run linear modeling analyses on.
#Read in counts data, create DGEList object out of them and convert to log counts per million (CPM). Also read in metadata.
setwd("/Users/ittaieres/HiCiPSC")
counts <- fread("data/counts_iPSC.txt", header=TRUE, data.table=FALSE, stringsAsFactors = FALSE, na.strings=c("NA",""))
colnames(counts) <- c("genes", "C-3649", "G-3624", "H-3651", "D-40300", "F-28834", "B-28126", "E-28815", "A-21792")
rownames(counts) <- counts$genes
counts <- counts[,-1]
dge <- DGEList(counts, genes=rownames(counts))
#Now, convert counts into RPKM to account for gene length differences between species. First load in and re-organize metadata, then the gene lengths for both species, and then the function to convert counts to RPKM.
meta_data <- fread("data/Meta_data.txt", sep="\t",stringsAsFactors = FALSE,header=T,na.strings=c("NA",""))
meta_data$fullID <- c("C-3649", "H-3651", "B-28126", "D-40300", "G-3624", "A-21792", "E-28815", "F-28834")
ord <- data.frame(fullID=colnames(counts)) #Pull order of samples from expression object
left_join(ord, meta_data, by="fullID") -> group_ref #left join meta data to this to make sure sample IDs correct
Warning: Column `fullID` joining factor and character vector, coercing into
character vector
#Read in human and chimp gene lengths for the RPKM function:
human_lengths<- fread("data/human_lengths.txt", sep="\t",stringsAsFactors = FALSE,header=T,na.strings=c("NA",""))
chimp_lengths<- fread("data/chimp_lengths.txt", sep="\t",stringsAsFactors = FALSE,header=T,na.strings=c("NA",""))
#The function for RPKM conversion.
vRPKM <- function(expr.obj,chimp_lengths,human_lengths,meta4) {
if (is.null(expr.obj$E)) {
meta4%>%filter(SP=="C" & fullID %in% colnames(counts))->chimp_meta
meta4%>%filter(SP=="H" & fullID %in% colnames(counts))->human_meta
#using RPKM function:
#Put genes in correct order:
expr.obj$genes %>%select(Geneid=genes)%>%
left_join(.,chimp_lengths,by="Geneid")%>%select(Geneid,ch.length)->chlength
expr.obj$genes %>%select(Geneid=genes)%>%
left_join(.,human_lengths,by="Geneid")%>%select(Geneid,hu.length)->hulength
#Chimp RPKM
expr.obj$genes$Length<-(chlength$ch.length)
RPKMc=rpkm(expr.obj,normalized.lib.sizes=TRUE, log=TRUE)
RPKMc[,colnames(RPKMc) %in% chimp_meta$fullID]->rpkm_chimp
#Human RPKM
expr.obj$genes$Length<-hulength$hu.length
RPKMh=rpkm(expr.obj,normalized.lib.sizes=TRUE, log=TRUE)
RPKMh[,colnames(RPKMh) %in% human_meta$fullID]->rpkm_human
cbind(rpkm_chimp,rpkm_human)->allrpkm
expr.obj$E <- allrpkm
return(expr.obj)
}
else {
#Pull out gene order from voom object and add in gene lengths from feature counts file
#Put genes in correct order:
expr.obj$genes %>%select(Geneid=genes)%>%
left_join(.,chimp_lengths,by="Geneid")%>%select(Geneid,ch.length)->chlength
expr.obj$genes %>%select(Geneid=genes)%>%
left_join(.,human_lengths,by="Geneid")%>%select(Geneid,hu.length)->hulength
#Filter meta data to be able to separate human and chimp
meta4%>%filter(SP=="C")->chimp_meta
meta4%>%filter(SP=="H")->human_meta
#Pull out the expression data in cpm to convert to RPKM
expr.obj$E->forRPKM
forRPKM[,colnames(forRPKM) %in% chimp_meta$fullID]->rpkm_chimp
forRPKM[,colnames(forRPKM) %in% human_meta$fullID]->rpkm_human
#Make log2 in KB:
row.names(chlength)=chlength$Geneid
chlength %>% select(-Geneid)->chlength
as.matrix(chlength)->chlength
row.names(hulength)=hulength$Geneid
hulength %>% select(-Geneid)->hulength
as.matrix(hulength)->hulength
log2(hulength/1000)->l2hulength
log2(chlength/1000)->l2chlength
#Subtract out log2 kb:
sweep(rpkm_chimp, 1,l2chlength,"-")->chimp_rpkm
sweep(rpkm_human, 1,l2hulength,"-")->human_rpkm
colnames(forRPKM)->column_order
cbind(chimp_rpkm,human_rpkm)->vRPKMS
#Put RPKMS back into the VOOM object:
expr.obj$E <- (vRPKMS[,colnames(vRPKMS) %in% column_order])
return(expr.obj)
}
}
dge <- vRPKM(dge, chimp_lengths, human_lengths, group_ref) #Normalize via log2 RPKM.
#A typical low-expression filtering step: use default prior count adding (0.25), and filtering out anything that has fewer than half the individuals within each species having logCPM less than 1.5 (so want 2 humans AND 2 chimps with log2CPM >= 1.5)
lcpms <- cpm(dge$counts, log=TRUE) #Obtain log2CPM!
good.chimps <- which(rowSums(lcpms[,1:4]>=1.5)>=2) #Obtain good chimp indices
good.humans <- which(rowSums(lcpms[,5:8]>=1.5)>=2) #Obtain good human indices
filt <- good.humans[which(good.humans %in% good.chimps)] #Subsets us down to a solid 11,292 genes--will go for a similar percentage with RPKM cutoff vals! (25.6% of total)
#Repeat filtering step, this time on RPKMs. 0.4 was chosen as a cutoff as it obtains close to the same results as 1.5 lcpm (in terms of percentage of genes retained)
good.chimps <- which(rowSums(dge$E[,1:4]>=0.4)>=2) #Obtain good chimp indices.
good.humans <- which(rowSums(dge$E[,5:8]>=0.4)>=2) #Obtain good human indices.
RPKM_filt <- good.humans[which(good.humans %in% good.chimps)] #Still leaves us with 11,946 genes (27.1% of total)
#Do the actual filtering.
dge_filt <- dge[RPKM_filt,]
dge_filt$E <- dge$E[RPKM_filt,]
dge_filt$counts <- dge$counts[RPKM_filt,]
dge_filt$lcpm_counts <- cpm(dge$counts, log=TRUE)[RPKM_filt,] #Add this in to be able to look at log cpms later
dge_final <- calcNormFactors(dge_filt, method="TMM") #Calculate normalization factors with trimmed mean of M-values (TMM).
dge_norm <- calcNormFactors(dge, method="TMM") #Calculate normalization factors with TMM on dge before filtering out lowly expressed genes, for normalization visualization.
#Quick visualization of the filtering I've just performed:
col <- brewer.pal(8, "Paired")
par(mfrow=c(1,2))
plot(density(dge$E[,1]), col=col[1], lwd=2, ylim=c(0,0.35), las=2,
main="", xlab="")
title(main="A. Raw data", xlab="Log-RPKM")
abline(v=1, lty=3)
for (i in 2:8){
den <- density(dge$E[,i])
lines(den$x, den$y, col=col[i], lwd=2)
}
legend("topright", colnames(dge$E[,1:8]), text.col=col, bty="n")
plot(density(dge_final$E[,1]), col=col[1], lwd=2, ylim=c(0,0.35), las=2,
main="", xlab="")
title(main="B. Filtered data", xlab="Log-RPKM")
abline(v=1, lty=3)
for (i in 2:8){
den <- density(dge_final$E[,i])
lines(den$x, den$y, col=col[i], lwd=2)
}
legend("topright", colnames(dge_final[,1:8]), text.col=col, bty="n")
#Quick visualization of the normalization on the whole set of genes.
col <- brewer.pal(8, "Paired")
raw <- as.data.frame(dge$E[,1:8])
normed <- as.data.frame(dge_norm$E[,1:8])
par(mfrow=c(1,2))
boxplot(raw, las=2, col=col, main="")
title(main="Unnormalized data",ylab="Log-RPKM")
boxplot(normed, las=2, col=col, main="")
title(main="Normalized data",ylab="Log-RPKM")
#Now, observe normalization on the filtered set of genes.
col <- brewer.pal(8, "Paired")
raw <- as.data.frame(dge_filt$E[,1:8])
normed <- as.data.frame(dge_final$E[,1:8])
par(mfrow=c(1,2))
boxplot(raw, las=2, col=col, main="")
title(main="Unnormalized data",ylab="Log-RPKM")
boxplot(normed, las=2, col=col, main="")
title(main="Normalized data",ylab="Log-RPKM")
#Now, do some quick MDS plotting to make sure this expression data separates out species properly.
species <- c("C", "C", "C", "C", "H", "H", "H", "H")
color <- c(rep("red", 4), rep("blue", 4))
par(mfrow=c(1,1))
plotMDS(dge_final$E[,1:8], labels=species, col=color, main="MDS Plot") #Shows separation of the species along the logFC dimension representing the majority of the variance--orthogonal check to PCA, and looks great!
###Now, apply voom to get quality weights.
meta.exp.data <- data.frame("SP"=c("C", "C", "C", "C", "H", "H", "H", "H"), "SX"=c("M","M" ,"F","F","F", "M","M","F"))
SP <- factor(meta.exp.data$SP,levels = c("H","C"))
exp.design <- model.matrix(~0+SP)#(~1+meta.exp.data$SP+meta.exp.data$SX)
colnames(exp.design) <- c("Human", "Chimp")
weighted.data <- voom(dge_final, exp.design, plot=TRUE, normalize.method = "cyclicloess")
##Obtain rest of LM results, with particular eye to DE table!
vfit <- lmFit(weighted.data, exp.design)
efit <- eBayes(vfit)
mycon <- makeContrasts(HvC = Human-Chimp, levels = exp.design)
diff_species <- contrasts.fit(efit, mycon)
finalfit <- eBayes(diff_species)
detable <- topTable(finalfit, coef = 1, adjust.method = "BH", number = Inf, sort.by="none")
plotSA(efit, main="Final model: Mean−variance trend")
#Get lists of the DE and non-DE genes so I can run separate analyses on them at any point.
DEgenes <- detable$genes[which(detable$adj.P.Val<=0.05)]
nonDEgenes <- detable$genes[-which(detable$adj.P.Val<=0.05)]
#Rearrange RPKM and weight columns in voom object to be similar to the rest of my setup throughout in other dataframes.
weighted.data$E <- weighted.data$E[,c(8, 6, 1, 4, 7, 5, 2, 3)]
weighted.data$weights <- weighted.data$weights[,c(8, 6, 1, 4, 7, 5, 2, 3)]
RPKM <- weighted.data$E
rownames(RPKM) <- NULL #Just to match what I had before on midway2, about to write this out.
saveRDS(RPKM, file="output/IEE.RPKM.RDS")
saveRDS(weighted.data, file="output/IEE_voom_object.RDS") #write this object out, can then be read in with readRDS.
In this section I find the overlap between the final filtered set of Hi-C significant hits and genes picked up on by an orthogonal RNA-seq experiment in the same set of cell lines. I utilize an in-house curated set of orthologous genes between humans and chimpanzees. Given that the resolution of the data is 10kb, I choose a simple and conservative approach and use a 1-nucleotide interval at the start of each gene as a proxy for the promoter. I then take a conservative pass and only use genes that had direct overlap with a bin from the Hi-C significant hits data, with more motivation explained below.
#Now, read in filtered data from linear_modeling_QC.Rmd.
data.filtered <- fread("output/data.4.filtered.lm.QC", header=TRUE, data.table=FALSE, stringsAsFactors = FALSE, showProgress=FALSE)
meta.data <- data.frame("SP"=c("H", "H", "C", "C", "H", "H", "C", "C"), "SX"=c("F", "M", "M", "F", "M", "F", "M", "F"), "Batch"=c(1, 1, 1, 1, 2, 2, 2, 2))
#TABLES1
#Write out data.filtered regions that are DC for supplementary table:
TABLES1 <- select(data.filtered, H1, H2, C1, C2, sp_BH_pval, sp_beta) %>% filter(., sp_BH_pval<=0.05)
TABLES1$Hchr <- gsub("-.*", "", TABLES1$H1)
TABLES1$H1start <- as.numeric(gsub(".*-", "", TABLES1$H1))
TABLES1$H1end <- TABLES1$H1start+10000
TABLES1$H2start <- as.numeric(gsub(".*-", "", TABLES1$H2))
TABLES1$H2end <- TABLES1$H2start+10000
TABLES1$Cchr <- gsub("-.*", "", TABLES1$C1)
TABLES1$C1start <- as.numeric(gsub(".*-", "", TABLES1$C1))
TABLES1$C1end <- TABLES1$C1start+10000
TABLES1$C2start <- as.numeric(gsub(".*-", "", TABLES1$C2))
TABLES1$C2end <- TABLES1$C2start+10000
select(TABLES1, Hchr, H1start, H1end, H2start, H2end, Cchr, C1start, C1end, C2start, C2end, sp_BH_pval, sp_beta) -> TABLES1
fwrite(TABLES1, "output/DC_regions.txt", col.names = TRUE, row.names = FALSE, sep = "\t")
#####GENE Hi-C Hit overlap: First, I obtain and rearrange the necessary files to get genes from both species and their overlaps with Hi-C bins.
#Read in necessary files: human and chimp orthologous genes from the meta ortho exon trios file. Then rearrange the columns of humgenes and chimpgenes to use group_by on them.
humgenes <- fread("data/Human_orthoexon_extended_info.txt", stringsAsFactors = FALSE, header=TRUE, data.table=FALSE)
chimpgenes <- fread("data/Chimp_orthoexon_extended_info.txt", stringsAsFactors = FALSE, header=TRUE, data.table=FALSE)
humgenes <- as.data.frame(humgenes[,c(1,5:8)])
chimpgenes <- as.data.frame(chimpgenes[,c(1,5:8)])
colnames(humgenes) <- c("genes", "Hchr", "Hstart", "Hend", "Hstrand")
colnames(chimpgenes) <- c("genes", "Cchr", "Cstart", "Cend", "Cstrand")
humgenes$Hchr <- paste("chr", humgenes$Hchr, sep="") #All properly formatted now!
#Bedtools groupby appears to have not worked properly to create files of single TSSs for genes from the meta ortho exons file, but this totally does! I've also utilized dplyr's group_by on the original file as well to ensure the same results. Now just making gene BED files that are 1-nt overlap at the very beginning of the first exon. Since I maintain strand information and will utilize it in bedtools closest-to, I am not concerned about whether the nt overlap goes the right direction or not. Note that if plyr is accidentally loaded AFTER dplyr, this will have issues (periods added at the end of start coords):
group_by(humgenes, genes) %>% summarise(Hchr=unique(Hchr), Hstart=as.numeric(min(Hstart)), Hstrand=unique(Hstrand)) -> humgenes
group_by(chimpgenes, genes) %>% summarise(Cchr=unique(Cchr), Cstart=min(Cstart), Cstrand=unique(Cstrand)) -> chimpgenes
#Quickly create a bed file to extract the genes that may overlap the region I'm pulling out for the paper example (chr5:147210000:147750000)
options(scipen = 999)
mytest <- filter(humgenes, Hchr=="chr5")
mytest <- mytest[(mytest$genes %in% weighted.data$genes$genes),]
mean.species.expression <- as.data.frame(weighted.data$E) #Now, need to pull out the mean expression values for all those genes!
mean.species.expression$genes <- rownames(mean.species.expression)
mean.species.expression$Hmean <- rowMeans(mean.species.expression[,c(1:2,5:6)])
mean.species.expression$Cmean <- rowMeans(mean.species.expression[,c(3:4,7:8)])
mean.species.expression <- mean.species.expression[,(-1:-8)]
#Now, prep bed files from the filtered data for each bin, in order to run bedtools-closest on them with the human and chimp gene data. This is for getting each bin's proximity to TSS by overlapping with the dfs just made (humgenes and chimpgenes). In the end this set of bedfiles is fairly useless, because really it would be preferable to get rid of duplicates so that I can merely group_by on a given bin afterwards and left_join as necessary. So somewhat deprecated, but I keep it here still:
hbin1 <- data.frame(chr=data.filtered$Hchr, start=as.numeric(gsub("chr.*-", "", data.filtered$H1)), end=as.numeric(gsub("chr.*-", "", data.filtered$H1))+10000)
hbin2 <- data.frame(chr=data.filtered$Hchr, start=as.numeric(gsub("chr.*-", "", data.filtered$H2)), end=as.numeric(gsub("chr.*-", "", data.filtered$H2))+10000)
cbin1 <- data.frame(chr=data.filtered$Cchr, start=as.numeric(gsub("chr.*-", "", data.filtered$C1)), end=as.numeric(gsub("chr.*-", "", data.filtered$C1))+10000)
cbin2 <- data.frame(chr=data.filtered$Cchr, start=as.numeric(gsub("chr.*-", "", data.filtered$C2)), end=as.numeric(gsub("chr.*-", "", data.filtered$C2))+10000)
#In most analyses, it will make more sense to have a single bed file for both sets of bins, and remove all duplicates. I create that here:
hbins <- rbind(hbin1[!duplicated(hbin1),], hbin2[!duplicated(hbin2),])
hbins <- hbins[!duplicated(hbins),]
cbins <- rbind(cbin1[!duplicated(cbin1),], cbin2[!duplicated(cbin2),])
cbins <- cbins[!duplicated(cbins),]
#Now, write all of these files out for analysis with bedtools.
write.table(hbin1, "~/Desktop/Hi-C/gene_expression/10kb_filt_overlaps/unsorted/hbin1.bed", quote = FALSE, sep="\t", row.names = FALSE, col.names=FALSE)
write.table(hbin2, "~/Desktop/Hi-C/gene_expression/10kb_filt_overlaps/unsorted/hbin2.bed", quote = FALSE, sep="\t", row.names = FALSE, col.names=FALSE)
write.table(cbin1, "~/Desktop/Hi-C/gene_expression/10kb_filt_overlaps/unsorted/cbin1.bed", quote = FALSE, sep="\t", row.names = FALSE, col.names=FALSE)
write.table(cbin2, "~/Desktop/Hi-C/gene_expression/10kb_filt_overlaps/unsorted/cbin2.bed", quote = FALSE, sep="\t", row.names = FALSE, col.names=FALSE)
write.table(humgenes, "~/Desktop/Hi-C/gene_expression/10kb_filt_overlaps/unsorted/humgenes.bed", quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
write.table(chimpgenes, "~/Desktop/Hi-C/gene_expression/10kb_filt_overlaps/unsorted/chimpgenes.bed", quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
write.table(hbins, "~/Desktop/Hi-C/gene_expression/10kb_filt_overlaps/unsorted/hbins.bed", quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
write.table(cbins, "~/Desktop/Hi-C/gene_expression/10kb_filt_overlaps/unsorted/cbins.bed", quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
options(scipen=0)
#Read in new, simpler bedtools closest files for genes. This is after running two commands, after sorting the files w/ sort -k1,1 -k2,2n in.bed > out.bed:
#bedtools closest -D a -a cgenes.sorted.bed -b cbins.sorted.bed > cgene.hic.overlap
#bedtools closest -D a -a hgenes.sorted.bed -b hbins.sorted.bed > hgene.hic.overlap
hgene.hic <- fread("~/Desktop/Hi-C/gene_expression/10kb_filt_overlaps/hgene.hic.overlap", header=FALSE, stringsAsFactors = FALSE, data.table=FALSE)
cgene.hic <- fread("~/Desktop/Hi-C/gene_expression/10kb_filt_overlaps/cgene.hic.overlap", header=FALSE, stringsAsFactors = FALSE, data.table=FALSE)
#Visualize the overlap of genes with bins and see how many genes we get back!
hum.genelap <- data.frame(overlap=seq(0, 100000, 1000), perc.genes = NA, tot.genes=NA)
for(row in 1:nrow(hum.genelap)){
hum.genelap$perc.genes[row] <- sum(abs(hgene.hic$V10)<=hum.genelap$overlap[row])/length(hgene.hic$V10)
hum.genelap$tot.genes[row] <- sum(abs(hgene.hic$V10)<=hum.genelap$overlap[row])
}
c.genelap <- data.frame(overlap=seq(0, 100000, 1000), perc.genes=NA, tot.genes=NA)
for(row in 1:nrow(c.genelap)){
c.genelap$perc.genes[row] <- sum(abs(cgene.hic$V10)<=c.genelap$overlap[row])/length(cgene.hic$V10)
c.genelap$tot.genes[row] <- sum(abs(cgene.hic$V10)<=c.genelap$overlap[row])
}
c.genelap$type <- "chimp"
hum.genelap$type <- "human"
#Examine what the potential gains are here if we are more lenient about the overlap/closeness to a TSS...
ggoverlap <- rbind(hum.genelap, c.genelap)
ggplot(data=ggoverlap) + geom_line(aes(x=overlap, y=perc.genes*100, color=type)) + ggtitle("Percent of Total Genes Picked Up | Min. Distance from TSS") + xlab("Distance from TSS") + ylab("Percentage of genes in ortho exon trios file (~44k)") + scale_color_discrete(guide=guide_legend(title="Species")) + coord_cartesian(xlim=c(0, 30000)) + scale_x_continuous(breaks=seq(0, 30000, 5000))
ggplot(data=ggoverlap) + geom_line(aes(x=overlap, y=tot.genes, color=type)) + ggtitle("Total # Genes Picked Up | Min. Distance from TSS") + xlab("Distance from TSS") + ylab("Total # of Genes Picked up On (of ~44k)") + scale_color_discrete(guide=guide_legend(title="Species")) + coord_cartesian(xlim=c(0, 30000)) + scale_x_continuous(breaks=seq(0, 30000, 5000))
#Start with a conservative pass--only take those genes that had an actual overlap with a bin, not ones that were merely close to one. Allowing some leeway to include genes that are within 1kb, 2kb, 3kb etc. of a Hi-C bin adds an average of ~800 genes per 1kb. We can also examine the distribution manually to motivate this decision:
quantile(abs(hgene.hic$V10), probs=seq(0, 1, 0.025))
0% 2.5% 5% 7.5% 10% 12.5% 15%
0.0 0.0 0.0 0.0 0.0 0.0 0.0
17.5% 20% 22.5% 25% 27.5% 30% 32.5%
0.0 0.0 0.0 0.0 0.0 0.0 0.0
35% 37.5% 40% 42.5% 45% 47.5% 50%
0.0 0.0 0.0 0.0 0.0 0.0 0.0
52.5% 55% 57.5% 60% 62.5% 65% 67.5%
0.0 0.0 0.0 435.4 1741.0 2968.2 4324.0
70% 72.5% 75% 77.5% 80% 82.5% 85%
6226.2 8868.0 12336.0 16610.0 22034.4 29271.5 38300.2
87.5% 90% 92.5% 95% 97.5% 100%
50248.0 67698.4 92599.2 135125.0 243068.9 7581822.0
quantile(abs(cgene.hic$V10), probs=seq(0, 1, 0.025))
0% 2.5% 5% 7.5% 10% 12.5%
0.0 0.0 0.0 0.0 0.0 0.0
15% 17.5% 20% 22.5% 25% 27.5%
0.0 0.0 0.0 0.0 0.0 0.0
30% 32.5% 35% 37.5% 40% 42.5%
0.0 0.0 0.0 0.0 0.0 0.0
45% 47.5% 50% 52.5% 55% 57.5%
0.0 0.0 0.0 0.0 0.0 0.0
60% 62.5% 65% 67.5% 70% 72.5%
237.4 1507.0 2800.0 4237.4 6125.6 8750.4
75% 77.5% 80% 82.5% 85% 87.5%
12181.0 16648.0 22317.4 29420.0 38490.0 50656.0
90% 92.5% 95% 97.5% 100%
67470.4 92415.3 136417.4 243285.0 21022081.0
#Note I looked at proportion of overlap with DE and with non-DE genes just for curiosity, and roughly 66% of the DE genes have overlap with a Hi-C bin while roughly 70% of the non-DE genes do. Since this result isn't particularly interesting I have collapsed that analysis here.
#Also are interested in seeing how this differs for DE and non-DE genes.
dehgene.hic <- hgene.hic[which(hgene.hic$V4 %in% DEgenes),]
decgene.hic <- cgene.hic[which(cgene.hic$V4 %in% DEgenes),]
nondehgene.hic <- hgene.hic[which(hgene.hic$V4 %in% nonDEgenes),]
nondecgene.hic <- cgene.hic[which(cgene.hic$V4 %in% nonDEgenes),]
sum(dehgene.hic$V10==0)
[1] 1538
sum(nondehgene.hic$V10==0)
[1] 6231
quantile(abs(dehgene.hic$V10), probs=seq(0, 1, 0.025))
0% 2.5% 5% 7.5% 10% 12.5%
0.00 0.00 0.00 0.00 0.00 0.00
15% 17.5% 20% 22.5% 25% 27.5%
0.00 0.00 0.00 0.00 0.00 0.00
30% 32.5% 35% 37.5% 40% 42.5%
0.00 0.00 0.00 0.00 0.00 0.00
45% 47.5% 50% 52.5% 55% 57.5%
0.00 0.00 0.00 0.00 0.00 0.00
60% 62.5% 65% 67.5% 70% 72.5%
0.00 0.00 0.00 0.00 1061.10 2339.00
75% 77.5% 80% 82.5% 85% 87.5%
3465.25 4832.80 6883.20 10197.35 14622.95 20465.88
90% 92.5% 95% 97.5% 100%
30342.90 44341.90 65604.90 129936.58 5262467.00
quantile(abs(nondehgene.hic$V10), probs=seq(0, 1, 0.025))
0% 2.5% 5% 7.5% 10% 12.5%
0.000 0.000 0.000 0.000 0.000 0.000
15% 17.5% 20% 22.5% 25% 27.5%
0.000 0.000 0.000 0.000 0.000 0.000
30% 32.5% 35% 37.5% 40% 42.5%
0.000 0.000 0.000 0.000 0.000 0.000
45% 47.5% 50% 52.5% 55% 57.5%
0.000 0.000 0.000 0.000 0.000 0.000
60% 62.5% 65% 67.5% 70% 72.5%
0.000 0.000 0.000 0.000 0.000 813.875
75% 77.5% 80% 82.5% 85% 87.5%
2008.500 3090.750 4453.000 6491.250 10166.000 15596.750
90% 92.5% 95% 97.5% 100%
23375.000 35249.500 54960.750 104309.875 6845725.000
And we can see that the majority of the genes (57.5%) have direct overlap with a bin. I’ll thus start with one very conservative set with only genes that have direct overlap with a bin. Later I may return to this and add and another slightly more lenient bin capturing ~10% more of the genes by allowing +/- 5kb of wiggle room.
In this next section I simply add information obtained from linear modeling on the Hi-C interaction frequencies to the appropriate genes having overlap with Hi-C bins. Because one Hi-C bin frequently shows up many times in the data, this means I must choose some kind of summary for Hi-C contact frequencies and linear modeling annotations for each gene. I toy with a variety of these summaries here, including choosing the minimum FDR contact, the maximum beta contact, the upstream contact, or summarizing all a bin’s contacts with the weighted Z-combine method or median FDR values.
hgene.hic.overlap <- filter(hgene.hic, V10==0) #Still leaves a solid ~26k genes.
cgene.hic.overlap <- filter(cgene.hic, V10==0) #Still leaves a solid ~26k genes.
#Add a column to both dfs indicating where along a bin the gene in question is found (from 0-10k):
hgene.hic.overlap$bin_pos <- abs(hgene.hic.overlap$V8-hgene.hic.overlap$V2)
cgene.hic.overlap$bin_pos <- abs(cgene.hic.overlap$V8-cgene.hic.overlap$V2)
#Rearrange columns and create another column of the bin ID.
hgene.hic.overlap <- hgene.hic.overlap[,c(4, 7:9, 6, 11, 1:2)]
hgene.hic.overlap$HID <- paste(hgene.hic.overlap$V7, hgene.hic.overlap$V8, sep="-")
cgene.hic.overlap <- cgene.hic.overlap[,c(4, 7:9, 6, 11, 1:2)]
cgene.hic.overlap$CID <- paste(cgene.hic.overlap$V7, cgene.hic.overlap$V8, sep="-")
colnames(hgene.hic.overlap) <- c("genes", "HiC_chr", "H1start", "H1end", "Hstrand", "bin_pos", "genechr", "genepos", "HID")
colnames(cgene.hic.overlap) <- c("genes", "HiC_chr", "C1start", "C1end", "Cstrand", "bin_pos", "genechr", "genepos", "CID")
#Gets me an hfinal table with a lot of the information concatenated together--now need the same thing for chimps, only to get the n contacts (since this could differ!)
hbindf <- select(data.filtered, "H1", "H2", "ALLvar", "SE", "sp_beta", "sp_pval", "sp_BH_pval", "Hdist")
names(hbindf) <- c("HID", "HID2", "ALLvar", "SE", "sp_beta", "sp_pval", "sp_BH_pval", "distance") #I have confirmed that all the HID2s are higher numbered coordinates than the HID1s, the only instance in which this isn't the case is when the two bins are identical (this should have been filtered out long before now).
hbindf <- hbindf[(which(hbindf$dist!=0)),] #Removes pairs where the same bin represents both mates. These instances occur exclusively when liftOver of the genomic coordinates from one species to another, and the subsequent rounding to the nearest 10kb, results in a contact between adjacent bins in one species being mapped as a contact between the same bin in the other species. Because there are less than 50 instances of this total in the dataset I simply remove it here without further worry.
#Remembering that all the first mates in the pair are lower coordinates than the second mates:
#This works for getting the FDR of closest downstream hits for the first column. Technically this would also make these the closest upstream hits for any bins that are UNIQUE and NOT REPEATED to the second column. For unique bins in this column, they have no upstream hits, and this gets their downstream hits. I can do the same thing but on the second set of IDs to get the potential upstream hits for any bins, then do a full_join on the two to get everything! Many of these metrics need to be done on a duplicated df to ensure I have all copies of a bin in one column, but the upstream and downstream analyses need to be run separately.
group_by(hbindf, HID) %>% summarise(DS_bin=HID2[which.min(distance)], DS_FDR=sp_BH_pval[which.min(distance)], DS_dist=distance[which.min(distance)]) -> hbin1.downstream
group_by(hbindf, HID2) %>% summarise(US_bin=HID[which.min(distance)], US_FDR=sp_BH_pval[which.min(distance)], US_dist=distance[which.min(distance)]) -> hbin2.upstream
colnames(hbin2.upstream) <- c("HID", "US_bin", "US_FDR", "US_dist")
Hstreams <- full_join(hbin1.downstream, hbin2.upstream, by="HID")
#Now, need to create a df with all the hits duplicated (but columns reversed) to account for duplicated bins in each column. This is better for any analysis that shares information across the interactions (minimums, sums, means, medians, etc.)
hbindf.flip <- hbindf[,c(2, 1, 3:7)]
colnames(hbindf.flip)[1:2] <- c("HID", "HID2")
hbindf_x2 <- rbind(hbindf[,1:7], hbindf.flip) #It's worth noting that a version of this with duplicates removed would be very useful for enrichment analyses...
#Now, use group_by from dplyr to combine information for a given bin across all its Hi-C contacts. Here I'll be pulling out contact, p-value, and bin with minimum FDR, its beta; the median FDR; the maximum beta and its FDR and bin; and a weighted combination method for p-values for species from linear modeling. This is based off of (http://onlinelibrary.wiley.com/doi/10.1111/j.1420-9101.2005.00917.x/full) under the assumption $s2.post is the actual error variance. (forthcoming)
group_by(hbindf_x2, HID) %>% summarise(min_FDR_bin=HID2[which.min(sp_BH_pval)], min_FDR=min(sp_BH_pval), min_FDR_pval=sp_pval[which.min(sp_BH_pval)], min_FDR_B=sp_beta[which.min(sp_BH_pval)], median_FDR=median(sp_BH_pval), weighted_Z.ALLvar=pnorm((sum((1/ALLvar)*((qnorm(1-sp_pval))))/sqrt(sum((1/ALLvar)^2))), lower.tail=FALSE), weighted_Z.s2post=pnorm(sum((1/(SE^2))*qnorm(1-sp_pval))/sqrt(sum(1/SE^2)), lower.tail=FALSE), fisher=-2*sum(log(sp_pval)), numcontacts=n(), max_B_bin=HID2[which.max(abs(sp_beta))], max_B_FDR=sp_BH_pval[which.max(abs(sp_beta))], max_B=sp_beta[which.max(abs(sp_beta))]) -> hbin.info
#Now, full_join the hbin.info and Hstreams dfs, incorporating all the information about the hbins in my data:
full_join(hbin.info, Hstreams, by="HID") -> hbin.full.info
#Now I combine the gene overlap tables and the full information tables for the human genes and bins!
left_join(hgene.hic.overlap, hbin.full.info, by="HID") -> humgenes.hic.full
colnames(humgenes.hic.full)[1:5] <- c("genes", "Hchr", "Hstart", "Hend", "Hstrand") #Fix column names for what was just created
###Now, do the whole thing over again for chimps, and then combine with the human gene overlap before joining on detable!
#Gets me a cfinal table with a lot of the information concatenated together.
cbindf <- select(data.filtered, "C1", "C2", "ALLvar", "SE", "sp_beta", "sp_pval", "sp_BH_pval", "Cdist") #Pulling out cols C1, C2, ALLvar, SE, sp_beta, sp_pval, sp_BH_pval, and Cdist
names(cbindf) <- c("CID", "CID2", "ALLvar", "SE", "sp_beta", "sp_pval", "sp_BH_pval", "distance") #I have confirmed that all the CID2s are higher numbered coordinates than the CID1s, the only instance in which this isn't the case is when the two bins are identical (this should have been filtered out long before now).
cbindf <- cbindf[(which(cbindf$dist!=0)),] #Removes rows from the cbindf that REALLY shouldn't be there to begin with. There are ~620 hits like this.
#Remembering that all the first mates in the pair are lower coordinates than the second mates:
#This works for getting the FDR of closest downstream hits for the first column. Technically this would also make these the closest upstream hits for any bins that are UNIQUE and NOT REPEATED to the second column. For unique bins in this column, they have no upstream hits, and this gets their downstream hits. I can do the same thing but on the second set of IDs to get the potential upstream hits for any bins, then do a full_join on the two to get everything! Many of these metrics need to be done on a duplicated df to ensure I have all copies of a bin in one column, but the upstream and downstream analyses need to be run separately.
group_by(cbindf, CID) %>% summarise(DS_bin=CID2[which.min(distance)], DS_FDR=sp_BH_pval[which.min(distance)], DS_dist=distance[which.min(distance)]) -> cbin1.downstream
group_by(cbindf, CID2) %>% summarise(US_bin=CID[which.min(distance)], US_FDR=sp_BH_pval[which.min(distance)], US_dist=distance[which.min(distance)]) -> cbin2.upstream
colnames(cbin2.upstream) <- c("CID", "US_bin", "US_FDR", "US_dist")
Cstreams <- full_join(cbin1.downstream, cbin2.upstream, by="CID")
#Now, need to create a df with all the hits duplicated (but columns reversed) to account for duplicated bins in each column. This is better for any analysis that shares information across the interactions (minimums, sums, means, medians, etc.)
cbindf.flip <- cbindf[,c(2, 1, 3:7)]
colnames(cbindf.flip)[1:2] <- c("CID", "CID2")
cbindf_x2 <- rbind(cbindf[,1:7], cbindf.flip)
#Group by again for chimp hits as was done for humans above.
group_by(cbindf_x2, CID) %>% summarise(min_FDR_bin=CID2[which.min(sp_BH_pval)], min_FDR=min(sp_BH_pval), min_FDR_B=sp_beta[which.min(sp_BH_pval)], median_FDR=median(sp_BH_pval), weighted_Z.ALLvar=pnorm((sum((1/ALLvar)*((qnorm(1-sp_pval))))/sqrt(sum((1/ALLvar)^2))), lower.tail=FALSE), weighted_Z.s2post=pnorm(sum((1/(SE^2))*qnorm(1-sp_pval))/sqrt(sum(1/SE^2)), lower.tail=FALSE), fisher=-2*sum(log(sp_pval)), numcontacts=n(), max_B_bin=CID2[which.max(abs(sp_beta))], max_B_FDR=sp_BH_pval[which.max(abs(sp_beta))], max_B=sp_beta[which.max(abs(sp_beta))]) -> cbin.info
#Now, full_join the cbin.info and Cstreams dfs, incorporating all the information about the cbins in my data:
full_join(cbin.info, Cstreams, by="CID") -> cbin.full.info
#Now I combine the gene overlap tables and the full information tables for the human genes and bins!
left_join(cgene.hic.overlap, cbin.full.info, by="CID") -> chimpgenes.hic.full
colnames(chimpgenes.hic.full)[1:5] <- c("genes", "Hchr", "Hstart", "Hend", "Hstrand") #Fix column names for what was just created
#Now, combine chimpgenes.hic.full and humgenes.hic.full before a final left_join on detable:
full_join(humgenes.hic.full, chimpgenes.hic.full, by="genes", suffix=c(".H", ".C")) -> genes.hic.info
left_join(detable, genes.hic.info, by="genes") -> gene.hic.overlap.info
#Clean this dataframe up, removing rows where there is absolutely no Hi-C information for the gene.
filt.indices <- rowSums(is.na(gene.hic.overlap.info)) #51 NA values are found when there is absolutely no Hi-C information.
filt.indices <- which(filt.indices==51)
gene.hic.filt <- gene.hic.overlap.info[-filt.indices,] #Still leaves a solid 8,174 genes. Note that I will have to choose human or chimp values here for many of these columns, as not all of the values are the same (and many are missing in one species relative to the other). In some cases, may be able to just take minimum or maximum value from either in order to get at what I want.
saveRDS(gene.hic.filt, "~/Desktop/Hi-C/gene.hic.filt.RDS")
Now I look for enrichment of DHi-C in DE genes using a variety of different metrics to call DHi-C. I now look to see if genes that are differentially expressed are also differential in Hi-C contacts (DHi-C). That is to say, are differentially expressed genes enriched in their overlapping bins for Hi-C contacts that are also differential between the species? To do this I utilize p-values from my prior linear modeling as well as previous RNA-seq analysis. I construct a function to calculate proportions of DE and DHi-C genes, as well as a function to plot this out in a variety of different ways.
####Enrichment analyses!
#A function for calculating proportion of DE genes that are DHi-C under a variety of different paradigms. Accounts for when no genes are DHi-C and when all genes are DHi-C. Returns the proportion of DE genes that are also DHi-C, as well as the expected proportion based on conditional probability alone.
prop.calculator <- function(de.vec, hic.vec, i, k){
my.result <- data.frame(prop=NA, exp.prop=NA, chisq.p=NA, Dneither=NA, DE=NA, DHiC=NA, Dboth=NA)
bad.indices <- which(is.na(hic.vec)) #First obtain indices where Hi-C info is missing, if there are any, then remove from both vectors.
if(length(bad.indices>0)){
de.vec <- de.vec[-bad.indices]
hic.vec <- hic.vec[-bad.indices]}
de.vec <- ifelse(de.vec<=i, 1, 0)
hic.vec <- ifelse(hic.vec<=k, 1, 0)
if(sum(hic.vec, na.rm=TRUE)==0){#The case where no genes show up as DHi-C.
my.result[1,] <- c(0, 0, 0, sum(de.vec==0, na.rm=TRUE), sum(de.vec==1, na.rm=TRUE), 0, 0) #Since no genes are DHi-C, the proportion is 0 and our expectation is 0, set p-val=0 since it's irrelevant.
}
else if(sum(hic.vec)==length(hic.vec)){ #The case where every gene shows up as DHi-C
my.result[1,] <- c(1, 1, 0, 0, 0, sum(hic.vec==1&de.vec==0, na.rm = TRUE), sum(de.vec==1&hic.vec==1, na.rm=TRUE)) #If every gene is DHi-C, the observed proportion of DE genes DHi-C is 1, and the expected proportion of DE genes also DHi-C would also be 1 (all DE genes are DHi-C, since all genes are). Again set p-val to 0 since irrelevant comparison.
}
else{#The typical case, where we get an actual table
mytable <- table(as.data.frame(cbind(de.vec, hic.vec)))
my.result[1,1] <- mytable[2,2]/sum(mytable[2,]) #The observed proportion of DE genes that are also DHi-C. # that are both/total # DE
my.result[1,2] <- (((sum(mytable[2,])/sum(mytable))*((sum(mytable[,2])/sum(mytable))))*sum(mytable))/sum(mytable[2,]) #The expected proportion: (p(DE) * p(DHiC)) * total # genes / # DE genes
my.result[1,3] <- chisq.test(mytable)$p.value
my.result[1,4] <- mytable[1,1]
my.result[1,5] <- mytable[2,1]
my.result[1,6] <- mytable[1,2]
my.result[1,7] <- mytable[2,2]
}
return(my.result)
}
#This is a function that computes observed and expected proportions of DE and DHiC enrichments, and spits out a variety of different visualizations for them. As input it takes a dataframe, the names of its DHiC and DE p-value columns, and a name to represent the type of Hi-C contact summary for the gene that ends up on the x-axis of all the plots.
enrichment.plotter <- function(df, HiC_col, DE_col, xlab, xmax=0.3, i=c(0.01, 0.025, 0.05, 0.075, 0.1), k=seq(0.01, 1, 0.01)){
enrich.table <- data.frame(DEFDR = c(rep(i[1], 100), rep(i[2], 100), rep(i[3], 100), rep(i[4], 100), rep(i[5], 100)), DHICFDR=rep(k, 5), prop.obs=NA, prop.exp=NA, chisq.p=NA, Dneither=NA, DE=NA, DHiC=NA, Dboth=NA)
for(de.FDR in i){
for(hic.FDR in k){
enrich.table[which(enrich.table$DEFDR==de.FDR&enrich.table$DHICFDR==hic.FDR), 3:9] <- prop.calculator(df[,DE_col], df[,HiC_col], de.FDR, hic.FDR)
}
}
des.enriched <- ggplot(data=enrich.table, aes(x=DHICFDR, y=prop.obs, group=as.factor(DEFDR), color=as.factor(DEFDR))) +geom_line()+ geom_line(aes(y=prop.exp), linetype="dashed", size=0.5) + ggtitle("Enrichment of DC in DE Genes") + xlab(xlab) + ylab("Proportion of DE genes that are DC") + guides(color=guide_legend(title="FDR for DE Genes"))
dhics.enriched <- ggplot(data=enrich.table, aes(x=DHICFDR, y=Dboth/(Dboth+DHiC), group=as.factor(DEFDR), color=as.factor(DEFDR))) + geom_line() + geom_line(aes(y=(((((DE+Dboth)/(Dneither+DE+DHiC+Dboth))*((DHiC+Dboth)/(Dneither+DE+DHiC+Dboth)))*(Dneither+DE+DHiC+Dboth))/(DHiC+Dboth))), linetype="dashed") + ylab("Proportion of DC genes that are DE") +xlab(xlab) + ggtitle("Enrichment of DE in DC Genes") + coord_cartesian(xlim=c(0, xmax)) + guides(color=guide_legend(title="DE FDR"))
joint.enriched <- ggplot(data=enrich.table, aes(x=DHICFDR, y=Dboth/(Dneither+DE+DHiC+Dboth), group=as.factor(DEFDR), color=as.factor(DEFDR))) + geom_line() + ylab("Proportion of ALL Genes both DE & DHi-C") + xlab(xlab) + geom_line(aes(y=((DE+Dboth)/(Dneither+DE+DHiC+Dboth))*((DHiC+Dboth)/(Dneither+DE+DHiC+Dboth))), linetype="dashed") + ggtitle("Enrichment of Joint DE & DHi-C in All Genes")
chisq.p <- ggplot(data=enrich.table, aes(x=DHICFDR, y=-log10(chisq.p), group=as.factor(DEFDR), color=as.factor(DEFDR))) + geom_point() + geom_hline(yintercept=-log10(0.05), color="red") + ggtitle("Chi-squared Test P-values for Enrichment of DC in DE Genes") + xlab(xlab) + ylab("-log10(chi-squared p-values)") + coord_cartesian(xlim=c(0, xmax)) + guides(color=guide_legend(title="DE FDR"))
print(des.enriched)
print(dhics.enriched)
print(joint.enriched)
print(chisq.p)
print(enrich.table[which(enrich.table$DEFDR==0.025),]) #Added to figure out comparison for the paper.
}
#Visualization of enrichment of DE/DC in one another. For most of these, using the gene.hic.filt df is sufficient as their Hi-C FDR numbers are the same. For the upstream genes it's a little more complicated because gene.hic.filt doesn't incorporate strand information on the genes, so use the specific US dfs for that, with the USFDR column.
#FIG4A
enrichment.plotter(gene.hic.filt, "min_FDR.H", "adj.P.Val", "Minimum FDR of Hi-C Contacts Overlapping Gene", xmax=1) #FIG4A
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
DEFDR DHICFDR prop.obs prop.exp chisq.p Dneither DE DHiC Dboth
101 0.025 0.01 0.04602109 0.04598145 1.00000000 6412 995 309 48
102 0.025 0.02 0.05848514 0.05963421 0.92180398 6319 982 402 61
103 0.025 0.03 0.06807287 0.06942298 0.90534608 6253 972 468 71
104 0.025 0.04 0.08533078 0.08037094 0.56728856 6186 954 535 89
105 0.025 0.05 0.09779482 0.09222050 0.54104946 6107 941 614 102
106 0.025 0.06 0.11217641 0.10419887 0.39427305 6029 926 692 117
107 0.025 0.07 0.12751678 0.11424523 0.16276235 5967 910 754 133
108 0.025 0.08 0.14477469 0.12660999 0.06488555 5889 892 832 151
109 0.025 0.09 0.15915628 0.14026275 0.06567647 5798 877 923 166
110 0.025 0.10 0.17449664 0.15108192 0.02621767 5730 861 991 182
111 0.025 0.11 0.18408437 0.16280268 0.05048324 5649 851 1072 192
112 0.025 0.12 0.19558965 0.17709943 0.10149612 5550 839 1171 204
113 0.025 0.13 0.20517737 0.18907779 0.16615517 5467 829 1254 214
114 0.025 0.14 0.22051774 0.20260175 0.13211565 5378 813 1343 230
115 0.025 0.15 0.23489933 0.21586811 0.11753927 5290 798 1431 245
116 0.025 0.16 0.24736337 0.22797527 0.11770441 5209 785 1512 258
117 0.025 0.17 0.26749760 0.24252962 0.04734506 5117 764 1604 279
118 0.025 0.18 0.27996165 0.25540958 0.05535409 5030 751 1691 292
119 0.025 0.19 0.29242570 0.27009274 0.08755444 4929 738 1792 305
120 0.025 0.20 0.30488974 0.28657908 0.17104689 4814 725 1907 318
121 0.025 0.21 0.31255992 0.29752705 0.26916534 4737 717 1984 326
122 0.025 0.22 0.32981783 0.31491499 0.28108959 4620 699 2101 344
123 0.025 0.23 0.34228188 0.32843895 0.32330574 4528 686 2193 357
124 0.025 0.24 0.35858102 0.34453890 0.32185392 4420 669 2301 374
125 0.025 0.25 0.36912752 0.36128284 0.59458342 4301 658 2420 385
126 0.025 0.26 0.38638543 0.37854199 0.59818411 4185 640 2536 403
127 0.025 0.27 0.40076702 0.39657393 0.79215580 4060 625 2661 418
128 0.025 0.28 0.41514861 0.41409067 0.96747895 3939 610 2782 433
129 0.025 0.29 0.42761266 0.42928903 0.93310346 3834 597 2887 446
130 0.025 0.30 0.44774688 0.44448738 0.84601732 3737 576 2984 467
131 0.025 0.31 0.46692234 0.46032973 0.67030073 3634 556 3087 487
132 0.025 0.32 0.47651007 0.47359608 0.86559920 3541 546 3180 497
133 0.025 0.33 0.49760307 0.49330242 0.79077540 3410 524 3311 519
134 0.025 0.34 0.51006711 0.50901597 0.96833309 3301 511 3420 532
135 0.025 0.35 0.52348993 0.52357032 1.00000000 3202 497 3519 546
136 0.025 0.36 0.53307766 0.53464709 0.93953415 3126 487 3595 556
137 0.025 0.37 0.55033557 0.55229263 0.91784465 3007 469 3714 574
138 0.025 0.38 0.56759348 0.57109737 0.83201100 2879 451 3842 592
139 0.025 0.39 0.58581016 0.58861412 0.86975309 2762 432 3959 611
140 0.025 0.40 0.60115053 0.60497166 0.81243992 2651 416 4070 627
141 0.025 0.41 0.61265580 0.61862442 0.69485565 2557 404 4164 639
142 0.025 0.42 0.63374880 0.63369397 1.00000000 2462 382 4259 661
143 0.025 0.43 0.64813039 0.64837713 1.00000000 2363 367 4358 676
144 0.025 0.44 0.66155321 0.66138588 1.00000000 2276 353 4445 690
145 0.025 0.45 0.67497603 0.67928903 0.77557036 2151 339 4570 704
146 0.025 0.46 0.68264621 0.69178259 0.51520095 2062 331 4659 712
147 0.025 0.47 0.69319271 0.70376095 0.44310203 1980 320 4741 723
148 0.025 0.48 0.70565676 0.71470891 0.50990075 1908 307 4813 736
149 0.025 0.49 0.71716203 0.72501288 0.56661085 1840 295 4881 748
150 0.025 0.50 0.73250240 0.73634724 0.79091116 1768 279 4953 764
151 0.025 0.51 0.74304890 0.74510562 0.90002284 1711 268 5010 775
152 0.025 0.52 0.75551294 0.75643998 0.97112240 1636 255 5085 788
153 0.025 0.53 0.76605944 0.76738794 0.94438418 1562 244 5159 799
154 0.025 0.54 0.76989453 0.77460072 0.72549294 1510 240 5211 803
155 0.025 0.55 0.78044104 0.78554869 0.69549200 1436 229 5285 814
156 0.025 0.56 0.78427613 0.79224626 0.52158901 1388 225 5333 818
157 0.025 0.57 0.79098754 0.79894384 0.51727543 1343 218 5378 825
158 0.025 0.58 0.79674017 0.80757342 0.36193260 1282 212 5439 831
159 0.025 0.59 0.80536913 0.81543019 0.39127591 1230 203 5491 840
160 0.025 0.60 0.81975072 0.82290057 0.80815024 1187 188 5534 855
161 0.025 0.61 0.82166826 0.82972694 0.48396946 1136 186 5585 857
162 0.025 0.62 0.82933845 0.83681092 0.51126608 1089 178 5632 865
163 0.025 0.63 0.83413231 0.84312210 0.41664566 1045 173 5676 870
164 0.025 0.64 0.84276127 0.84981968 0.52267074 1002 164 5719 879
165 0.025 0.65 0.84659636 0.85793405 0.28032538 943 160 5778 883
166 0.025 0.66 0.85330777 0.86476043 0.26536782 897 153 5824 890
167 0.025 0.67 0.86097795 0.87107161 0.31933472 856 145 5865 898
168 0.025 0.68 0.86673058 0.87493560 0.41755218 832 139 5889 904
169 0.025 0.69 0.87440077 0.88060278 0.54014141 796 131 5925 912
170 0.025 0.70 0.89165868 0.88730036 0.67026956 762 113 5959 930
171 0.025 0.71 0.89741131 0.89167955 0.55745201 734 107 5987 936
172 0.025 0.72 0.90220518 0.89760433 0.63700890 693 102 6028 941
173 0.025 0.73 0.90508150 0.90069552 0.65025245 672 99 6049 944
174 0.025 0.74 0.90795781 0.90494590 0.76438601 642 96 6079 947
175 0.025 0.75 0.91083413 0.90868109 0.84017248 616 93 6105 950
176 0.025 0.76 0.91658677 0.91228748 0.63926204 594 87 6127 956
177 0.025 0.77 0.92138063 0.91834106 0.74554778 552 82 6169 961
178 0.025 0.78 0.93096836 0.92323545 0.34427129 524 72 6197 971
179 0.025 0.79 0.93288591 0.92658423 0.43842128 500 70 6221 973
180 0.025 0.80 0.93288591 0.93044822 0.78931050 470 70 6251 973
181 0.025 0.81 0.93672100 0.93379701 0.73289298 448 66 6273 977
182 0.025 0.82 0.93959732 0.93766100 0.83431392 421 63 6300 980
183 0.025 0.83 0.94343241 0.94268418 0.96797718 386 59 6335 984
184 0.025 0.84 0.95014382 0.94654817 0.63059228 363 52 6358 991
185 0.025 0.85 0.95493768 0.94938176 0.42149849 346 47 6375 996
186 0.025 0.86 0.95877277 0.95260175 0.35249852 325 43 6396 1000
187 0.025 0.87 0.95973154 0.95646574 0.63551112 296 42 6425 1001
188 0.025 0.88 0.96260786 0.95981453 0.68256049 273 39 6448 1004
189 0.025 0.89 0.96644295 0.96277692 0.55902110 254 35 6467 1008
190 0.025 0.90 0.96931927 0.96586811 0.56994919 233 32 6488 1011
191 0.025 0.91 0.97219559 0.96857290 0.53172936 215 29 6506 1014
192 0.025 0.92 0.97507191 0.97256569 0.66668311 187 26 6534 1017
193 0.025 0.93 0.97794823 0.97617208 0.76788983 162 23 6559 1020
194 0.025 0.94 0.97986577 0.97913447 0.95121777 141 21 6580 1022
195 0.025 0.95 0.98561841 0.98196806 0.40814652 125 15 6596 1028
196 0.025 0.96 0.99232982 0.98596084 0.08227945 101 8 6620 1035
197 0.025 0.97 0.99424736 0.98969603 0.16162088 74 6 6647 1037
198 0.025 0.98 0.99616491 0.99291602 0.25169704 51 4 6670 1039
199 0.025 0.99 0.99712368 0.99639361 0.88458574 25 3 6696 1040
200 0.025 1.00 1.00000000 1.00000000 0.00000000 0 0 6721 1043
enrichment.plotter(gene.hic.filt, "min_FDR.C", "adj.P.Val", "Minimum FDR of Hi-C Contacts Overlapping Gene, Chimp")
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
DEFDR DHICFDR prop.obs prop.exp chisq.p Dneither DE DHiC Dboth
101 0.025 0.01 0.04339441 0.04602134 0.72328641 6429 992 313 45
102 0.025 0.02 0.05785921 0.06029053 0.77697904 6333 977 409 60
103 0.025 0.03 0.06557377 0.06877491 0.71016249 6275 969 467 68
104 0.025 0.04 0.08100289 0.08073017 1.00000000 6198 953 544 84
105 0.025 0.05 0.09257473 0.09114282 0.90912239 6129 941 613 96
106 0.025 0.06 0.10800386 0.10309808 0.61482727 6052 925 690 112
107 0.025 0.07 0.11957570 0.11261088 0.47810469 5990 913 752 124
108 0.025 0.08 0.13404050 0.12366628 0.29861816 5919 898 823 139
109 0.025 0.09 0.14850530 0.13793547 0.31158300 5823 883 919 154
110 0.025 0.10 0.16682739 0.14924798 0.09697646 5754 864 988 173
111 0.025 0.11 0.17839923 0.15953207 0.08243235 5686 852 1056 185
112 0.025 0.12 0.19479267 0.17380126 0.06118745 5592 835 1150 202
113 0.025 0.13 0.19961427 0.18524232 0.21619429 5508 830 1234 207
114 0.025 0.14 0.21504339 0.19822599 0.15638153 5423 814 1319 223
115 0.025 0.15 0.23047252 0.21262373 0.14205899 5327 798 1415 239
116 0.025 0.16 0.23722276 0.22355058 0.27346858 5249 791 1493 246
117 0.025 0.17 0.25940212 0.24051935 0.13643216 5140 768 1602 269
118 0.025 0.18 0.27290260 0.25388867 0.14079902 5050 754 1692 283
119 0.025 0.19 0.28640309 0.26828641 0.16859407 4952 740 1790 297
120 0.025 0.20 0.29701061 0.28486952 0.37158366 4834 729 1908 308
121 0.025 0.21 0.30568949 0.29605348 0.48793635 4756 720 1986 317
122 0.025 0.22 0.32497589 0.31302224 0.39216772 4644 700 2098 337
123 0.025 0.23 0.34040501 0.32536316 0.28239517 4564 684 2178 353
124 0.025 0.24 0.36065574 0.34156061 0.17458186 4459 663 2283 374
125 0.025 0.25 0.37415622 0.35878648 0.28298069 4339 649 2403 388
126 0.025 0.26 0.39247830 0.37652655 0.26941668 4220 630 2522 407
127 0.025 0.27 0.40597878 0.39400951 0.41612044 4098 616 2644 421
128 0.025 0.28 0.42237223 0.41136393 0.45934370 3980 599 2762 438
129 0.025 0.29 0.43297975 0.42666152 0.68315673 3872 588 2870 449
130 0.025 0.30 0.45033751 0.44375884 0.67123277 3757 570 2985 467
131 0.025 0.31 0.46865959 0.45879933 0.51504613 3659 551 3083 486
132 0.025 0.32 0.47733848 0.47229721 0.75208694 3563 542 3179 495
133 0.025 0.33 0.49951784 0.49157989 0.60594389 3436 519 3306 518
134 0.025 0.34 0.51205400 0.50803445 0.80664861 3321 506 3421 531
135 0.025 0.35 0.52169720 0.52230364 0.99313349 3220 496 3522 541
136 0.025 0.36 0.53519769 0.53451600 0.98896077 3139 482 3603 555
137 0.025 0.37 0.55737705 0.54968505 0.61618466 3044 459 3698 578
138 0.025 0.38 0.56509161 0.56742512 0.89715287 2914 451 3828 586
139 0.025 0.39 0.58919961 0.58645070 0.87350012 2791 426 3951 611
140 0.025 0.40 0.60559306 0.60264816 0.86180029 2682 409 4060 628
141 0.025 0.41 0.61812922 0.61588893 0.90049677 2592 396 4150 641
142 0.025 0.42 0.63934426 0.63092943 0.56960568 2497 374 4245 663
143 0.025 0.43 0.65188042 0.64622702 0.70832401 2391 361 4351 676
144 0.025 0.44 0.66441659 0.65972490 0.75859835 2299 348 4443 689
145 0.025 0.45 0.67695275 0.67630801 0.99041010 2183 335 4559 702
146 0.025 0.46 0.68370299 0.68903458 0.71706379 2091 328 4651 709
147 0.025 0.47 0.69334619 0.70111840 0.58172916 2007 318 4735 719
148 0.025 0.48 0.70973963 0.71294511 0.83504903 1932 301 4810 736
149 0.025 0.49 0.72420444 0.72310066 0.96167390 1868 286 4874 751
150 0.025 0.50 0.73674060 0.73415606 0.86924988 1795 273 4947 764
151 0.025 0.51 0.74445516 0.74238334 0.89994085 1739 265 5003 772
152 0.025 0.52 0.75795564 0.75408150 0.78526707 1662 251 5080 786
153 0.025 0.53 0.76374156 0.76385139 1.00000000 1592 245 5150 792
154 0.025 0.54 0.76759884 0.77143592 0.78226707 1537 241 5205 796
155 0.025 0.55 0.77724204 0.78133436 0.76256334 1470 231 5272 806
156 0.025 0.56 0.78302797 0.78891888 0.64661275 1417 225 5325 812
157 0.025 0.57 0.78784957 0.79637486 0.48963611 1364 220 5378 817
158 0.025 0.58 0.79749277 0.80460213 0.56316613 1310 210 5432 827
159 0.025 0.59 0.80520733 0.81218666 0.56500294 1259 202 5483 835
160 0.025 0.60 0.81870781 0.81951408 0.97674471 1216 188 5526 849
161 0.025 0.61 0.82160077 0.82529888 0.76955046 1174 185 5568 852
162 0.025 0.62 0.83124397 0.83185499 0.99049083 1133 175 5609 862
163 0.025 0.63 0.83413693 0.83879676 0.69433091 1082 172 5660 865
164 0.025 0.64 0.84378014 0.84560998 0.89734367 1039 162 5703 875
165 0.025 0.65 0.85053038 0.85229464 0.90052882 994 155 5748 882
166 0.025 0.66 0.85728062 0.85936496 0.87334096 946 148 5796 889
167 0.025 0.67 0.86306654 0.86502121 0.88150393 908 142 5834 895
168 0.025 0.68 0.87078110 0.86913485 0.90496213 884 134 5858 903
169 0.025 0.69 0.87463838 0.87401980 0.98865360 850 130 5892 907
170 0.025 0.70 0.89006750 0.88031881 0.32339288 817 114 5925 923
171 0.025 0.71 0.89296046 0.88520375 0.42989909 782 111 5960 926
172 0.025 0.72 0.89778206 0.89137421 0.51007701 739 106 6003 931
173 0.025 0.73 0.90067502 0.89535930 0.58490805 711 103 6031 934
174 0.025 0.74 0.90356798 0.89973004 0.69915301 680 100 6062 937
175 0.025 0.75 0.90646095 0.90448644 0.86058638 646 97 6096 940
176 0.025 0.76 0.91128255 0.90847153 0.77996746 620 92 6122 945
177 0.025 0.77 0.91803279 0.91399923 0.66127072 584 85 6158 952
178 0.025 0.78 0.92381871 0.91862707 0.55129117 554 79 6188 958
179 0.025 0.79 0.92574735 0.92248361 0.71898993 526 77 6216 960
180 0.025 0.80 0.92671167 0.92646870 1.00000000 496 76 6246 961
181 0.025 0.81 0.92864031 0.92929682 0.98122857 476 74 6266 963
182 0.025 0.82 0.93346191 0.93366757 1.00000000 447 69 6295 968
183 0.025 0.83 0.93828351 0.93906672 0.96527679 410 64 6332 973
184 0.025 0.84 0.94214079 0.94215195 1.00000000 390 60 6352 977
185 0.025 0.85 0.94792671 0.94575138 0.79596869 368 54 6374 983
186 0.025 0.86 0.95178399 0.95037923 0.88316524 336 50 6406 987
187 0.025 0.87 0.95274831 0.95410721 0.88476661 308 49 6434 988
188 0.025 0.88 0.95756991 0.95834940 0.95894465 280 44 6462 993
189 0.025 0.89 0.96046287 0.96143463 0.92991609 259 41 6483 996
190 0.025 0.90 0.96528447 0.96490551 1.00000000 237 36 6505 1001
191 0.025 0.91 0.96914176 0.96734799 0.79850802 222 32 6520 1005
192 0.025 0.92 0.97396336 0.97159018 0.69379510 194 27 6548 1010
193 0.025 0.93 0.97685632 0.97544672 0.83578036 167 24 6575 1013
194 0.025 0.94 0.97782064 0.97827484 1.00000000 146 23 6596 1014
195 0.025 0.95 0.98457088 0.98213138 0.60929212 123 16 6619 1021
196 0.025 0.96 0.99035680 0.98624502 0.28105941 97 10 6645 1027
197 0.025 0.97 0.99228544 0.99010156 0.55211311 69 8 6673 1029
198 0.025 0.98 0.99421408 0.99305823 0.77896570 48 6 6694 1031
199 0.025 0.99 0.99807136 0.99652912 0.53296052 25 2 6717 1035
200 0.025 1.00 1.00000000 1.00000000 0.00000000 0 0 6742 1037
#enrichment.plotter(h_US, "USFDR", "adj.P.Val", "FDR for Closest Upstream Hi-C Contact Overlapping Gene, Human") #These two are ugly, and can't be run anyway until next chunk is complete to create their DFs. It's OK without them.
#enrichment.plotter(c_US, "USFDR", "adj.P.Val", "FDR for Closest Upstream Hi-C Contact Overlapping Gene, Chimp")
enrichment.plotter(gene.hic.filt, "max_B_FDR.H", "adj.P.Val", "FDR for Hi-C Contact Overlapping Gene w/ Strongest Effect Size, Human")
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
DEFDR DHICFDR prop.obs prop.exp chisq.p Dneither DE DHiC Dboth
101 0.025 0.01 0.04410355 0.04482226 0.96797521 6419 997 302 46
102 0.025 0.02 0.05848514 0.05873261 1.00000000 6326 982 395 61
103 0.025 0.03 0.06807287 0.06774858 1.00000000 6266 972 455 71
104 0.025 0.04 0.08341323 0.07792375 0.51648247 6203 956 518 87
105 0.025 0.05 0.09491850 0.08861412 0.47678815 6132 944 589 99
106 0.025 0.06 0.10834132 0.09994848 0.35975878 6058 930 663 113
107 0.025 0.07 0.12176414 0.10935085 0.18439223 5999 916 722 127
108 0.025 0.08 0.13614573 0.12017002 0.09807805 5930 901 791 142
109 0.025 0.09 0.14765101 0.13227718 0.12700571 5848 889 873 154
110 0.025 0.10 0.16011505 0.14013395 0.05116591 5800 876 921 167
111 0.025 0.11 0.16778523 0.15108192 0.11583942 5723 868 998 175
112 0.025 0.12 0.17833174 0.16280268 0.15707440 5643 857 1078 186
113 0.025 0.13 0.18696069 0.17349304 0.23382277 5569 848 1152 195
114 0.025 0.14 0.19942474 0.18495621 0.21106396 5493 835 1228 208
115 0.025 0.15 0.21380633 0.19706337 0.15584384 5414 820 1307 223
116 0.025 0.16 0.22722915 0.20852653 0.11946554 5339 806 1382 237
117 0.025 0.17 0.24640460 0.22230809 0.04865798 5252 786 1469 257
118 0.025 0.18 0.25790988 0.23402885 0.05504120 5173 774 1548 269
119 0.025 0.19 0.26845638 0.24729521 0.09612592 5081 763 1640 280
120 0.025 0.20 0.28283797 0.26313756 0.12973070 4973 748 1748 295
121 0.025 0.21 0.28859060 0.27228233 0.21708427 4908 742 1813 301
122 0.025 0.22 0.30680729 0.28851108 0.17225277 4801 723 1920 320
123 0.025 0.23 0.32118888 0.30203503 0.15801141 4711 708 2010 335
124 0.025 0.24 0.33652924 0.31736218 0.16342653 4608 692 2113 351
125 0.025 0.25 0.34611697 0.33191654 0.31182231 4505 682 2216 361
126 0.025 0.26 0.36241611 0.35097888 0.42548848 4374 665 2347 378
127 0.025 0.27 0.37967402 0.36810922 0.42496919 4259 647 2462 396
128 0.025 0.28 0.39213806 0.38446677 0.60783602 4145 634 2576 409
129 0.025 0.29 0.40076702 0.39966512 0.96481406 4036 625 2685 418
130 0.025 0.30 0.41994247 0.41486347 0.74590232 3938 605 2783 438
131 0.025 0.31 0.43720038 0.42967543 0.62128531 3841 587 2880 456
132 0.025 0.32 0.44487057 0.44165379 0.84825697 3756 579 2965 464
133 0.025 0.33 0.46500479 0.46097372 0.80466066 3627 558 3094 485
134 0.025 0.34 0.47938639 0.47578568 0.82825249 3527 543 3194 500
135 0.025 0.35 0.48993289 0.48969603 1.00000000 3430 532 3291 511
136 0.025 0.36 0.49952061 0.50167439 0.90746178 3347 522 3374 521
137 0.025 0.37 0.51773730 0.52073673 0.86100265 3218 503 3503 540
138 0.025 0.38 0.53691275 0.53941267 0.88810084 3093 483 3628 560
139 0.025 0.39 0.55225312 0.55692942 0.76931887 2973 467 3748 576
140 0.025 0.40 0.56279962 0.57264297 0.51115662 2862 456 3859 587
141 0.025 0.41 0.57526366 0.58629572 0.45702754 2769 443 3952 600
142 0.025 0.42 0.59539789 0.60226687 0.65043398 2666 422 4055 621
143 0.025 0.43 0.60786194 0.61720762 0.52662942 2563 409 4158 634
144 0.025 0.44 0.62224353 0.63008758 0.59645414 2478 394 4243 649
145 0.025 0.45 0.63854267 0.64799073 0.51451041 2356 377 4365 666
146 0.025 0.46 0.64621285 0.66164348 0.27270818 2258 369 4463 674
147 0.025 0.47 0.65963567 0.67555384 0.25234643 2164 355 4557 688
148 0.025 0.48 0.67114094 0.68650180 0.26550979 2091 343 4630 700
149 0.025 0.49 0.68168744 0.69577537 0.30455679 2030 332 4691 711
150 0.025 0.50 0.69606903 0.70852653 0.36023834 1946 317 4775 726
151 0.025 0.51 0.71332694 0.71818650 0.73539655 1889 299 4832 744
152 0.025 0.52 0.72962608 0.73042246 0.98021707 1811 282 4910 761
153 0.025 0.53 0.73921381 0.74227202 0.83783935 1729 272 4992 771
154 0.025 0.54 0.74496644 0.75128800 0.63897896 1665 266 5056 777
155 0.025 0.55 0.75551294 0.76339516 0.54543469 1582 255 5139 788
156 0.025 0.56 0.76510067 0.77241113 0.57171214 1522 245 5199 798
157 0.025 0.57 0.76989453 0.77988150 0.42573110 1469 240 5252 803
158 0.025 0.58 0.78044104 0.78967027 0.45612742 1404 229 5317 814
159 0.025 0.59 0.78811122 0.79778465 0.42687172 1349 221 5372 822
160 0.025 0.60 0.80345158 0.80654302 0.81845530 1297 205 5424 838
161 0.025 0.61 0.80632790 0.81465739 0.48315154 1237 202 5484 841
162 0.025 0.62 0.81783317 0.82264297 0.69393543 1187 190 5534 853
163 0.025 0.63 0.82262704 0.82934055 0.56516250 1140 185 5581 858
164 0.025 0.64 0.83125599 0.83719732 0.60757686 1088 176 5633 867
165 0.025 0.65 0.83604986 0.84621329 0.35143663 1023 171 5698 872
166 0.025 0.66 0.84372004 0.85394127 0.33832325 971 163 5750 880
167 0.025 0.67 0.84947267 0.86038125 0.29626238 927 157 5794 886
168 0.025 0.68 0.85522531 0.86450283 0.37223280 901 151 5820 892
169 0.025 0.69 0.86385427 0.87107161 0.48523775 859 142 5862 901
170 0.025 0.70 0.87919463 0.87815559 0.95264298 820 126 5901 917
171 0.025 0.71 0.88398849 0.88330757 0.98261612 785 121 5936 922
172 0.025 0.72 0.89357622 0.89116435 0.82946532 734 111 5987 932
173 0.025 0.73 0.89837009 0.89528594 0.76777009 707 106 6014 937
174 0.025 0.74 0.90124640 0.90043792 0.96956620 670 103 6051 940
175 0.025 0.75 0.90604027 0.90443071 0.89384996 644 98 6077 945
176 0.025 0.76 0.91179291 0.90803709 0.69390656 622 92 6099 951
177 0.025 0.77 0.91658677 0.91447707 0.83964058 577 87 6144 956
178 0.025 0.78 0.93000959 0.92014426 0.22940588 547 73 6174 970
179 0.025 0.79 0.93192713 0.92387944 0.32187030 520 71 6201 972
180 0.025 0.80 0.93192713 0.92787223 0.63140801 489 71 6232 972
181 0.025 0.81 0.93576222 0.93134982 0.58926313 466 67 6255 976
182 0.025 0.82 0.93863854 0.93572901 0.73087002 435 64 6286 979
183 0.025 0.83 0.94247363 0.94062339 0.84042744 401 60 6320 983
184 0.025 0.84 0.94822627 0.94474498 0.64834678 375 54 6346 989
185 0.025 0.85 0.95302013 0.94757857 0.43962883 358 49 6363 994
186 0.025 0.86 0.95781400 0.95105616 0.31244262 336 44 6385 999
187 0.025 0.87 0.95877277 0.95492014 0.57252310 307 43 6414 1000
188 0.025 0.88 0.96260786 0.95878413 0.55924440 281 39 6440 1004
189 0.025 0.89 0.96644295 0.96251932 0.52906158 256 35 6465 1008
190 0.025 0.90 0.96836050 0.96548171 0.64823182 235 33 6486 1010
191 0.025 0.91 0.97123682 0.96805770 0.59409550 218 30 6503 1013
192 0.025 0.92 0.97507191 0.97217929 0.61051223 190 26 6531 1017
193 0.025 0.93 0.97698945 0.97591448 0.89273525 163 24 6558 1019
194 0.025 0.94 0.97890700 0.97887687 1.00000000 142 22 6579 1021
195 0.025 0.95 0.98465964 0.98183926 0.54283717 125 16 6596 1027
196 0.025 0.96 0.99232982 0.98596084 0.08227945 101 8 6620 1035
197 0.025 0.97 0.99424736 0.98969603 0.16162088 74 6 6647 1037
198 0.025 0.98 0.99616491 0.99291602 0.25169704 51 4 6670 1039
199 0.025 0.99 0.99712368 0.99639361 0.88458574 25 3 6696 1040
200 0.025 1.00 1.00000000 1.00000000 0.00000000 0 0 6721 1043
enrichment.plotter(gene.hic.filt, "max_B_FDR.C", "adj.P.Val", "FDR for Hi-C Contact Overlapping Gene w/ Strongest Effect Size, Chimp")
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
DEFDR DHICFDR prop.obs prop.exp chisq.p Dneither DE DHiC Dboth
101 0.025 0.01 0.04050145 0.04422162 0.58588865 6440 995 302 42
102 0.025 0.02 0.05785921 0.05913356 0.90751480 6342 977 400 60
103 0.025 0.03 0.06557377 0.06710374 0.88482114 6288 969 454 68
104 0.025 0.04 0.08003857 0.07828770 0.87022587 6216 954 526 83
105 0.025 0.05 0.09161041 0.08792904 0.69596818 6153 942 589 95
106 0.025 0.06 0.10607522 0.09949865 0.48126631 6078 927 664 110
107 0.025 0.07 0.11571842 0.10824013 0.43602319 6020 917 722 120
108 0.025 0.08 0.12825458 0.11813858 0.30187099 5956 904 786 133
109 0.025 0.09 0.13886210 0.13022239 0.40178621 5873 893 869 144
110 0.025 0.10 0.15332690 0.13909243 0.16922958 5819 878 923 159
111 0.025 0.11 0.16393443 0.14899087 0.16007538 5753 867 989 170
112 0.025 0.12 0.17647059 0.16068903 0.14957291 5675 854 1067 183
113 0.025 0.13 0.18225651 0.17058748 0.30360077 5604 848 1138 189
114 0.025 0.14 0.19672131 0.18151433 0.18635635 5534 833 1208 204
115 0.025 0.15 0.21215043 0.19449801 0.13347864 5449 817 1293 220
116 0.025 0.16 0.22179364 0.20439645 0.14680121 5382 807 1360 230
117 0.025 0.17 0.24204436 0.21943695 0.06442748 5286 786 1456 251
118 0.025 0.18 0.25168756 0.23113511 0.09958999 5205 776 1537 261
119 0.025 0.19 0.26422372 0.24424733 0.11653410 5116 763 1626 274
120 0.025 0.20 0.27386692 0.26005913 0.29336155 5003 753 1739 284
121 0.025 0.21 0.27965284 0.26905772 0.43022442 4939 747 1803 290
122 0.025 0.22 0.29990357 0.28474097 0.26049382 4838 726 1904 311
123 0.025 0.23 0.31629701 0.29708189 0.15619392 4759 709 1983 328
124 0.025 0.24 0.33461909 0.31263659 0.10864494 4657 690 2085 347
125 0.025 0.25 0.34619094 0.32754853 0.18074156 4553 678 2189 359
126 0.025 0.26 0.36162006 0.34657411 0.28977852 4421 662 2321 375
127 0.025 0.27 0.37512054 0.36264301 0.38810623 4310 648 2432 389
128 0.025 0.28 0.39054966 0.37871192 0.41806836 4201 632 2541 405
129 0.025 0.29 0.39922854 0.39375241 0.72366660 4093 623 2649 414
130 0.025 0.30 0.41562199 0.41046407 0.74231587 3980 606 2762 431
131 0.025 0.31 0.43105111 0.42434760 0.66326352 3888 590 2854 447
132 0.025 0.32 0.44165863 0.43733128 0.78860016 3798 579 2944 458
133 0.025 0.33 0.45998071 0.45532845 0.77208369 3677 560 3065 477
134 0.025 0.34 0.47348120 0.47062604 0.86937694 3572 546 3170 491
135 0.025 0.35 0.48216008 0.48412392 0.91831516 3476 537 3266 500
136 0.025 0.36 0.49469624 0.49697905 0.90086173 3389 524 3353 513
137 0.025 0.37 0.51591128 0.51394781 0.91834496 3279 502 3463 535
138 0.025 0.38 0.52555448 0.53194498 0.68211302 3149 492 3593 545
139 0.025 0.39 0.54773385 0.55084201 0.85509928 3025 469 3717 568
140 0.025 0.40 0.55930569 0.56588250 0.67058671 2920 457 3822 580
141 0.025 0.41 0.57377049 0.58015169 0.67927824 2824 442 3918 595
142 0.025 0.42 0.59402122 0.59634915 0.89646267 2719 421 4023 616
143 0.025 0.43 0.60559306 0.61164674 0.69252726 2612 409 4130 628
144 0.025 0.44 0.62005786 0.62630158 0.68037591 2513 394 4229 643
145 0.025 0.45 0.63645130 0.64378455 0.62069306 2394 377 4348 660
146 0.025 0.46 0.64416586 0.65702532 0.36710736 2299 369 4443 668
147 0.025 0.47 0.65380906 0.67026610 0.23983117 2206 359 4536 678
148 0.025 0.48 0.66730955 0.68222137 0.28372763 2127 345 4615 692
149 0.025 0.49 0.68081003 0.69070575 0.48112428 2075 331 4667 706
150 0.025 0.50 0.69334619 0.70407507 0.43745365 1984 318 4758 719
151 0.025 0.51 0.70877531 0.71358786 0.74039302 1926 302 4816 735
152 0.025 0.52 0.72613308 0.72695719 0.97881899 1840 284 4902 753
153 0.025 0.53 0.73288332 0.73736984 0.75294610 1766 277 4976 760
154 0.025 0.54 0.74252652 0.74701118 0.75012509 1701 267 5041 770
155 0.025 0.55 0.75699132 0.75870935 0.92041380 1625 252 5117 785
156 0.025 0.56 0.76470588 0.76809359 0.81177745 1560 244 5182 793
157 0.025 0.57 0.76759884 0.77657797 0.48043032 1497 241 5245 796
158 0.025 0.58 0.78013500 0.78621931 0.63645097 1435 228 5307 809
159 0.025 0.59 0.78881389 0.79431804 0.66736204 1381 219 5361 818
160 0.025 0.60 0.80327869 0.80331662 1.00000000 1326 204 5416 833
161 0.025 0.61 0.80810029 0.81090114 0.83771287 1272 199 5470 838
162 0.025 0.62 0.82063645 0.81822856 0.86286875 1228 186 5514 851
163 0.025 0.63 0.82256509 0.82568454 0.80997545 1172 184 5570 853
164 0.025 0.64 0.83124397 0.83404037 0.82963969 1116 175 5626 862
165 0.025 0.65 0.83992285 0.84188199 0.88863812 1064 166 5678 871
166 0.025 0.66 0.84763742 0.84985217 0.86676202 1010 158 5732 879
167 0.025 0.67 0.85342334 0.85576552 0.85469454 970 152 5772 885
168 0.025 0.68 0.86017358 0.86013626 1.00000000 943 145 5799 892
169 0.025 0.69 0.86499518 0.86592107 0.96407055 903 140 5839 897
170 0.025 0.70 0.88138862 0.87273428 0.39632701 867 123 5875 914
171 0.025 0.71 0.88331726 0.87787633 0.60037691 829 121 5913 916
172 0.025 0.72 0.89006750 0.88546086 0.65416282 777 114 5965 923
173 0.025 0.73 0.89392478 0.88996015 0.70028750 746 110 5996 927
174 0.025 0.74 0.89681774 0.89535930 0.91214970 707 107 6035 930
175 0.025 0.75 0.90163934 0.90037280 0.92781737 673 102 6069 935
176 0.025 0.76 0.90646095 0.90461499 0.87241173 645 97 6097 940
177 0.025 0.77 0.91224687 0.91027124 0.85655670 607 91 6135 946
178 0.025 0.78 0.92285439 0.91567039 0.40414591 576 80 6166 957
179 0.025 0.79 0.92478303 0.92016969 0.59802288 543 78 6199 959
180 0.025 0.80 0.92574735 0.92428333 0.89784562 512 77 6230 960
181 0.025 0.81 0.92767599 0.92736856 1.00000000 490 75 6252 962
182 0.025 0.82 0.93249759 0.93238205 1.00000000 456 70 6286 967
183 0.025 0.83 0.93731919 0.93778121 1.00000000 419 65 6323 972
184 0.025 0.84 0.94117647 0.94099499 1.00000000 398 61 6344 976
185 0.025 0.85 0.94696239 0.94433732 0.74646237 378 55 6364 982
186 0.025 0.86 0.95081967 0.94935082 0.87631133 343 51 6399 986
187 0.025 0.87 0.95178399 0.95269315 0.94453337 318 50 6424 987
188 0.025 0.88 0.95756991 0.95732099 1.00000000 288 44 6454 993
189 0.025 0.89 0.96046287 0.96117753 0.96679200 261 41 6481 996
190 0.025 0.90 0.96432015 0.96451986 1.00000000 239 37 6503 1000
191 0.025 0.91 0.96817743 0.96696233 0.88720191 224 33 6518 1004
192 0.025 0.92 0.97396336 0.97120453 0.63770366 197 27 6545 1010
193 0.025 0.93 0.97685632 0.97531816 0.81387761 168 24 6574 1013
194 0.025 0.94 0.97782064 0.97814629 1.00000000 147 23 6595 1014
195 0.025 0.95 0.98457088 0.98213138 0.60929212 123 16 6619 1021
196 0.025 0.96 0.99035680 0.98624502 0.28105941 97 10 6645 1027
197 0.025 0.97 0.99228544 0.99010156 0.55211311 69 8 6673 1029
198 0.025 0.98 0.99421408 0.99305823 0.77896570 48 6 6694 1031
199 0.025 0.99 0.99807136 0.99652912 0.53296052 25 2 6717 1035
200 0.025 1.00 1.00000000 1.00000000 0.00000000 0 0 6742 1037
enrichment.plotter(gene.hic.filt, "median_FDR.H", "adj.P.Val", "Median FDR of Hi-C Contacts Overlapping Gene, Human")
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
DEFDR DHICFDR prop.obs prop.exp chisq.p Dneither DE DHiC
101 0.025 0.01 0.02013423 0.02099433 0.92655649 6579 1022 142
102 0.025 0.02 0.02588686 0.02743431 0.82045151 6535 1016 186
103 0.025 0.03 0.02876318 0.02910871 1.00000000 6525 1013 196
104 0.025 0.04 0.03259827 0.03232870 1.00000000 6504 1009 217
105 0.025 0.05 0.03547459 0.03464709 0.94732058 6489 1006 232
106 0.025 0.06 0.04026846 0.03735188 0.65549902 6473 1001 248
107 0.025 0.07 0.04122723 0.03915507 0.77561650 6460 1000 261
108 0.025 0.08 0.04410355 0.04070067 0.60755904 6451 997 270
109 0.025 0.09 0.04793864 0.04211747 0.35593456 6444 993 277
110 0.025 0.10 0.04985618 0.04314786 0.28728637 6438 991 283
111 0.025 0.11 0.05273250 0.04456466 0.19589493 6430 988 291
112 0.025 0.12 0.05465005 0.04662545 0.21415801 6416 986 305
113 0.025 0.13 0.05560882 0.04752705 0.21487034 6410 985 311
114 0.025 0.14 0.05560882 0.04829985 0.26885388 6404 985 317
115 0.025 0.15 0.05656759 0.05010304 0.34094463 6391 984 330
116 0.025 0.16 0.05944391 0.05280783 0.33930638 6373 981 348
117 0.025 0.17 0.06327900 0.05577022 0.28765695 6354 977 367
118 0.025 0.18 0.06519655 0.05692942 0.24335327 6347 975 374
119 0.025 0.19 0.06519655 0.05989181 0.48026922 6324 975 397
120 0.025 0.20 0.06903164 0.06272540 0.40419343 6306 971 415
121 0.025 0.21 0.07286673 0.06555899 0.33825584 6288 967 433
122 0.025 0.22 0.07478428 0.06890778 0.45954080 6264 965 457
123 0.025 0.23 0.07861937 0.07122617 0.35078791 6250 961 471
124 0.025 0.24 0.08149569 0.07444616 0.38495587 6228 958 493
125 0.025 0.25 0.08341323 0.07766615 0.49449973 6205 956 516
126 0.025 0.26 0.08341323 0.08088614 0.79433669 6180 956 541
127 0.025 0.27 0.08533078 0.08397733 0.91289378 6158 954 563
128 0.025 0.28 0.09300096 0.08925811 0.69114225 6125 946 596
129 0.025 0.29 0.09971237 0.09312210 0.46544355 6102 939 619
130 0.025 0.30 0.10354746 0.09763009 0.52481008 6071 935 650
131 0.025 0.31 0.10642378 0.10110768 0.57759985 6047 932 674
132 0.025 0.32 0.10930010 0.10432767 0.60991646 6025 929 696
133 0.025 0.33 0.11409396 0.11038125 0.72022923 5983 924 738
134 0.025 0.34 0.11984660 0.11591963 0.70854110 5946 918 775
135 0.025 0.35 0.12464046 0.12313241 0.91346815 5895 913 826
136 0.025 0.36 0.13039310 0.12995878 1.00000000 5848 907 873
137 0.025 0.37 0.14093960 0.13691396 0.72027961 5805 896 916
138 0.025 0.38 0.14765101 0.14502834 0.83267369 5749 889 972
139 0.025 0.39 0.15915628 0.15262751 0.55929943 5702 877 1019
140 0.025 0.40 0.16490892 0.15984029 0.66378368 5652 871 1069
141 0.025 0.41 0.17353787 0.16834106 0.66165589 5595 862 1126
142 0.025 0.42 0.18791946 0.17851623 0.41858665 5531 847 1190
143 0.025 0.43 0.19463087 0.18572901 0.45219080 5482 840 1239
144 0.025 0.44 0.20230105 0.19448738 0.52009610 5422 832 1299
145 0.025 0.45 0.21284756 0.20672334 0.62849152 5338 821 1383
146 0.025 0.46 0.22243528 0.21612571 0.62295233 5275 811 1446
147 0.025 0.47 0.24065197 0.22784647 0.30770530 5203 792 1518
148 0.025 0.48 0.25023969 0.23647604 0.27784331 5146 782 1575
149 0.025 0.49 0.26174497 0.24742401 0.26553000 5073 770 1648
150 0.025 0.50 0.27804410 0.26146316 0.20341754 4981 753 1740
151 0.025 0.51 0.29434324 0.27472952 0.13677284 4895 736 1826
152 0.025 0.52 0.30872483 0.28619268 0.09034106 4821 721 1900
153 0.025 0.53 0.32502397 0.29933024 0.05599162 4736 704 1985
154 0.025 0.54 0.33557047 0.31259660 0.09210379 4644 693 2077
155 0.025 0.55 0.35091083 0.32599176 0.07032863 4556 677 2165
156 0.025 0.56 0.36625120 0.33977331 0.05673441 4465 661 2256
157 0.025 0.57 0.38446788 0.35561566 0.03965277 4361 642 2360
158 0.025 0.58 0.40460211 0.37107161 0.01755911 4262 621 2459
159 0.025 0.59 0.41323106 0.38588357 0.05538974 4156 612 2565
160 0.025 0.60 0.43432407 0.40211231 0.02467845 4052 590 2669
161 0.025 0.61 0.44774688 0.41666667 0.03120024 3953 576 2768
162 0.025 0.62 0.46692234 0.43611540 0.03377046 3822 556 2899
163 0.025 0.63 0.48705657 0.45749614 0.04274404 3677 535 3044
164 0.025 0.64 0.49952061 0.47707367 0.12685051 3538 522 3183
165 0.025 0.65 0.51294343 0.49497682 0.22472401 3413 508 3308
166 0.025 0.66 0.52732502 0.51481195 0.40328379 3274 493 3447
167 0.025 0.67 0.54170662 0.53310149 0.57183835 3147 478 3574
168 0.025 0.68 0.56759348 0.55074704 0.25339840 3037 451 3684
169 0.025 0.69 0.58389262 0.56826378 0.28839546 2918 434 3803
170 0.025 0.70 0.61073826 0.58990211 0.15082515 2778 406 3943
171 0.025 0.71 0.63374880 0.60754766 0.06748084 2665 382 4056
172 0.025 0.72 0.65004794 0.63060278 0.17256823 2503 365 4218
173 0.025 0.73 0.67305849 0.65236991 0.14074334 2358 341 4363
174 0.025 0.74 0.68744008 0.67014426 0.21441160 2235 326 4486
175 0.025 0.75 0.70373921 0.68920659 0.29189070 2104 309 4617
176 0.025 0.76 0.72962608 0.70955693 0.13416802 1973 282 4748
177 0.025 0.77 0.74304890 0.72964967 0.31262534 1831 268 4890
178 0.025 0.78 0.76318313 0.75115920 0.35399504 1685 247 5036
179 0.025 0.79 0.78044104 0.76944874 0.38627201 1561 229 5160
180 0.025 0.80 0.79578140 0.78696548 0.47973730 1441 213 5280
181 0.025 0.81 0.80441035 0.80667182 0.87552975 1297 204 5424
182 0.025 0.82 0.82646213 0.82457496 0.89776758 1181 181 5540
183 0.025 0.83 0.84276127 0.84067491 0.87886192 1073 164 5648
184 0.025 0.84 0.86385427 0.85806285 0.59725740 960 142 5761
185 0.025 0.85 0.87823586 0.87287481 0.61097697 860 127 5861
186 0.025 0.86 0.89741131 0.88794436 0.32265851 763 107 5958
187 0.025 0.87 0.91658677 0.90056672 0.07144170 685 87 6036
188 0.025 0.88 0.92713327 0.91228748 0.07792306 605 76 6116
189 0.025 0.89 0.93767977 0.92490984 0.10548658 518 65 6203
190 0.025 0.90 0.94630872 0.93495621 0.12590115 449 56 6272
191 0.025 0.91 0.95781400 0.94345698 0.03701412 395 44 6326
192 0.025 0.92 0.96452541 0.95337455 0.07893351 325 37 6396
193 0.025 0.93 0.97027804 0.96161772 0.13938057 267 31 6454
194 0.025 0.94 0.97698945 0.96831530 0.10438754 222 24 6499
195 0.025 0.95 0.98465964 0.97514168 0.04389353 177 16 6544
196 0.025 0.96 0.99041227 0.98261206 0.05188634 125 10 6596
197 0.025 0.97 0.99232982 0.98737764 0.16431524 90 8 6631
198 0.025 0.98 0.99328859 0.99149923 0.62039809 59 7 6662
199 0.025 0.99 0.99616491 0.99600721 1.00000000 27 4 6694
200 0.025 1.00 1.00000000 1.00000000 0.00000000 0 0 6721
Dboth
101 21
102 27
103 30
104 34
105 37
106 42
107 43
108 46
109 50
110 52
111 55
112 57
113 58
114 58
115 59
116 62
117 66
118 68
119 68
120 72
121 76
122 78
123 82
124 85
125 87
126 87
127 89
128 97
129 104
130 108
131 111
132 114
133 119
134 125
135 130
136 136
137 147
138 154
139 166
140 172
141 181
142 196
143 203
144 211
145 222
146 232
147 251
148 261
149 273
150 290
151 307
152 322
153 339
154 350
155 366
156 382
157 401
158 422
159 431
160 453
161 467
162 487
163 508
164 521
165 535
166 550
167 565
168 592
169 609
170 637
171 661
172 678
173 702
174 717
175 734
176 761
177 775
178 796
179 814
180 830
181 839
182 862
183 879
184 901
185 916
186 936
187 956
188 967
189 978
190 987
191 999
192 1006
193 1012
194 1019
195 1027
196 1033
197 1035
198 1036
199 1039
200 1043
enrichment.plotter(gene.hic.filt, "median_FDR.C", "adj.P.Val", "Median FDR of Hi-C Contacts Overlapping Gene, Chimp")
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
DEFDR DHICFDR prop.obs prop.exp chisq.p Dneither DE DHiC
101 0.025 0.01 0.01928640 0.02082530 0.797970136 6600 1017 142
102 0.025 0.02 0.02507232 0.02686721 0.778846246 6559 1011 183
103 0.025 0.03 0.02796528 0.02879547 0.942611913 6547 1008 195
104 0.025 0.04 0.03182257 0.03226636 1.000000000 6524 1004 218
105 0.025 0.05 0.03471553 0.03432318 1.000000000 6511 1001 231
106 0.025 0.06 0.03857281 0.03689420 0.826221764 6495 997 247
107 0.025 0.07 0.04050145 0.03817971 0.739847555 6487 995 255
108 0.025 0.08 0.04243009 0.03959378 0.676247109 6478 993 264
109 0.025 0.09 0.04435873 0.04062219 0.568523068 6472 991 270
110 0.025 0.10 0.04532305 0.04139350 0.549417045 6467 990 275
111 0.025 0.11 0.04918033 0.04280756 0.314126703 6460 986 282
112 0.025 0.12 0.05110897 0.04486438 0.335599946 6446 984 296
113 0.025 0.13 0.05110897 0.04537858 0.383085940 6442 984 300
114 0.025 0.14 0.05110897 0.04679265 0.530018036 6431 984 311
115 0.025 0.15 0.05207329 0.04859236 0.629502431 6418 983 324
116 0.025 0.16 0.05400193 0.05077773 0.665723121 6403 981 339
117 0.025 0.17 0.05882353 0.05360586 0.467075368 6386 976 356
118 0.025 0.18 0.06075217 0.05489137 0.414020852 6378 974 364
119 0.025 0.19 0.06364513 0.05810515 0.454553955 6356 971 386
120 0.025 0.20 0.06557377 0.06016197 0.473307851 6342 969 400
121 0.025 0.21 0.06846673 0.06260445 0.442357075 6326 966 416
122 0.025 0.22 0.06943105 0.06556113 0.635894198 6304 965 438
123 0.025 0.23 0.07232401 0.06800360 0.597935076 6288 962 454
124 0.025 0.24 0.07618129 0.07108883 0.534875220 6268 958 474
125 0.025 0.25 0.07907425 0.07391696 0.536513687 6249 955 493
126 0.025 0.26 0.08003857 0.07751639 0.791868051 6222 954 520
127 0.025 0.27 0.08293153 0.08034452 0.788816252 6203 951 539
128 0.025 0.28 0.08968177 0.08432961 0.544371671 6179 944 563
129 0.025 0.29 0.09450338 0.08780049 0.447054748 6157 939 585
130 0.025 0.30 0.09739634 0.09307109 0.647272374 6119 936 623
131 0.025 0.31 0.09932498 0.09615632 0.752597931 6097 934 645
132 0.025 0.32 0.10318226 0.10091271 0.837370379 6064 930 678
133 0.025 0.33 0.10896818 0.10618331 0.795983400 6029 924 713
134 0.025 0.34 0.11282546 0.11158247 0.933384713 5991 920 751
135 0.025 0.35 0.11668274 0.11801003 0.927800480 5945 916 797
136 0.025 0.36 0.11957570 0.12379483 0.694700141 5903 913 839
137 0.025 0.37 0.13018322 0.13086515 0.983652974 5859 902 883
138 0.025 0.38 0.13693346 0.13934953 0.846830347 5800 895 942
139 0.025 0.39 0.14946962 0.14680550 0.831124383 5755 882 987
140 0.025 0.40 0.15621986 0.15323306 0.809930128 5712 875 1030
141 0.025 0.41 0.16586307 0.16171744 0.730717294 5656 865 1086
142 0.025 0.42 0.18129219 0.17084458 0.359733668 5601 849 1141
143 0.025 0.43 0.18322083 0.17714359 0.612211918 5554 847 1188
144 0.025 0.44 0.18997107 0.18549942 0.722571949 5496 840 1246
145 0.025 0.45 0.20250723 0.19732613 0.682973826 5417 827 1325
146 0.025 0.46 0.21311475 0.20606762 0.574506386 5360 816 1382
147 0.025 0.47 0.22661524 0.21827998 0.510792104 5279 802 1463
148 0.025 0.48 0.23722276 0.22663581 0.403781437 5225 791 1517
149 0.025 0.49 0.25265188 0.23679136 0.210824127 5162 775 1580
150 0.025 0.50 0.27000964 0.25118910 0.143571408 5068 757 1674
151 0.025 0.51 0.28447445 0.26442988 0.124954421 4980 742 1762
152 0.025 0.52 0.30183221 0.27612804 0.051007253 4907 724 1835
153 0.025 0.53 0.31340405 0.28936881 0.072395763 4816 712 1926
154 0.025 0.54 0.32979749 0.30273814 0.045398665 4729 695 2013
155 0.025 0.55 0.34233365 0.31572182 0.051827427 4641 682 2101
156 0.025 0.56 0.35679846 0.32767708 0.034806659 4563 667 2179
157 0.025 0.57 0.37222758 0.34323178 0.037769227 4458 651 2284
158 0.025 0.58 0.38862102 0.35801517 0.029745580 4360 634 2382
159 0.025 0.59 0.39826422 0.37189870 0.063958841 4262 624 2480
160 0.025 0.60 0.42044359 0.38925312 0.029365639 4150 601 2592
161 0.025 0.61 0.43780135 0.40532202 0.024172176 4043 583 2699
162 0.025 0.62 0.45708775 0.42383340 0.021791292 3919 563 2823
163 0.025 0.63 0.47926712 0.44453015 0.017102275 3781 540 2961
164 0.025 0.64 0.49373192 0.46227021 0.031608716 3658 525 3084
165 0.025 0.65 0.50916104 0.47923898 0.041505924 3542 509 3200
166 0.025 0.66 0.52362584 0.49903587 0.095354196 3403 494 3339
167 0.025 0.67 0.54676953 0.51831855 0.052841878 3277 470 3465
168 0.025 0.68 0.56894889 0.53708703 0.029489819 3154 447 3588
169 0.025 0.69 0.59016393 0.55636971 0.020376133 3026 425 3716
170 0.025 0.70 0.61427194 0.57950893 0.016299069 2871 400 3871
171 0.025 0.71 0.63837994 0.59802031 0.004902755 2752 375 3990
172 0.025 0.72 0.65284474 0.62077388 0.024320797 2590 360 4152
173 0.025 0.73 0.67695275 0.64237048 0.013856895 2447 335 4295
174 0.025 0.74 0.68948891 0.65959635 0.031797388 2326 322 4416
175 0.025 0.75 0.70202507 0.67900758 0.094980543 2188 309 4554
176 0.025 0.76 0.72613308 0.69983288 0.051354684 2051 284 4691
177 0.025 0.77 0.74445516 0.72142949 0.081954268 1902 265 4840
178 0.025 0.78 0.76759884 0.74302610 0.056516589 1758 241 4984
179 0.025 0.79 0.78592093 0.76179458 0.054866081 1631 222 5111
180 0.025 0.80 0.80038573 0.77902044 0.081681332 1512 207 5230
181 0.025 0.81 0.80906461 0.79778892 0.352602815 1375 198 5367
182 0.025 0.82 0.82642237 0.81578609 0.364911907 1253 180 5489
183 0.025 0.83 0.84185149 0.83262630 0.417870153 1138 164 5604
184 0.025 0.84 0.85920926 0.84869521 0.332863839 1031 146 5711
185 0.025 0.85 0.87367406 0.86617817 0.476100546 910 131 5832
186 0.025 0.86 0.89392478 0.88301838 0.261901153 800 110 5942
187 0.025 0.87 0.91031823 0.89613061 0.120210055 715 93 6027
188 0.025 0.88 0.91610415 0.90821442 0.374829047 627 87 6115
189 0.025 0.89 0.92960463 0.92132665 0.316532659 539 73 6203
190 0.025 0.90 0.93828351 0.93148220 0.386917713 469 64 6273
191 0.025 0.91 0.94889103 0.94073788 0.261101994 408 53 6334
192 0.025 0.92 0.95949855 0.95192184 0.251333678 332 42 6410
193 0.025 0.93 0.96432015 0.96040622 0.542694407 271 37 6471
194 0.025 0.94 0.96721311 0.96747654 1.000000000 219 34 6523
195 0.025 0.95 0.97878496 0.97544672 0.523234415 169 22 6573
196 0.025 0.96 0.98553520 0.98225993 0.464227705 123 15 6619
197 0.025 0.97 0.98842816 0.98753053 0.896956228 85 12 6657
198 0.025 0.98 0.98939248 0.99138707 0.571288857 56 11 6686
199 0.025 0.99 0.99614272 0.99601491 1.000000000 27 4 6715
200 0.025 1.00 1.00000000 1.00000000 0.000000000 0 0 6742
Dboth
101 20
102 26
103 29
104 33
105 36
106 40
107 42
108 44
109 46
110 47
111 51
112 53
113 53
114 53
115 54
116 56
117 61
118 63
119 66
120 68
121 71
122 72
123 75
124 79
125 82
126 83
127 86
128 93
129 98
130 101
131 103
132 107
133 113
134 117
135 121
136 124
137 135
138 142
139 155
140 162
141 172
142 188
143 190
144 197
145 210
146 221
147 235
148 246
149 262
150 280
151 295
152 313
153 325
154 342
155 355
156 370
157 386
158 403
159 413
160 436
161 454
162 474
163 497
164 512
165 528
166 543
167 567
168 590
169 612
170 637
171 662
172 677
173 702
174 715
175 728
176 753
177 772
178 796
179 815
180 830
181 839
182 857
183 873
184 891
185 906
186 927
187 944
188 950
189 964
190 973
191 984
192 995
193 1000
194 1003
195 1015
196 1022
197 1025
198 1026
199 1033
200 1037
enrichment.plotter(gene.hic.filt, "weighted_Z.s2post.H", "adj.P.Val", "FDR for Weighted p-val Combine of Hi-C Contacts Overlapping Gene, Human")
DEFDR DHICFDR prop.obs prop.exp chisq.p Dneither DE DHiC Dboth
101 0.025 0.01 0.6931927 0.6813498 0.3972617 2154 320 4567 723
102 0.025 0.02 0.7037392 0.6926842 0.4262433 2077 309 4644 734
103 0.025 0.03 0.7085331 0.6971922 0.4119093 2047 304 4674 739
104 0.025 0.04 0.7142857 0.7028594 0.4057078 2009 298 4712 745
105 0.025 0.05 0.7190796 0.7064657 0.3549975 1986 293 4735 750
106 0.025 0.06 0.7238734 0.7102009 0.3127680 1962 288 4759 755
107 0.025 0.07 0.7248322 0.7122617 0.3538870 1947 287 4774 756
108 0.025 0.08 0.7257910 0.7145801 0.4094726 1930 286 4791 757
109 0.025 0.09 0.7286673 0.7172849 0.4006753 1912 283 4809 760
110 0.025 0.10 0.7286673 0.7206337 0.5589481 1886 283 4835 760
111 0.025 0.11 0.7305849 0.7217929 0.5196454 1879 281 4842 762
112 0.025 0.12 0.7315436 0.7237249 0.5688617 1865 280 4856 763
113 0.025 0.13 0.7315436 0.7252705 0.6523268 1853 280 4868 763
114 0.025 0.14 0.7344199 0.7269449 0.5857314 1843 277 4878 766
115 0.025 0.15 0.7392138 0.7299073 0.4901449 1825 272 4896 771
116 0.025 0.16 0.7411314 0.7314529 0.4712415 1815 270 4906 773
117 0.025 0.17 0.7430489 0.7344152 0.5215973 1794 268 4927 775
118 0.025 0.18 0.7430489 0.7349304 0.5479962 1790 268 4931 775
119 0.025 0.19 0.7449664 0.7362184 0.5148587 1782 266 4939 777
120 0.025 0.20 0.7468840 0.7373776 0.4764457 1775 264 4946 779
121 0.025 0.21 0.7478428 0.7382792 0.4731672 1769 263 4952 780
122 0.025 0.22 0.7478428 0.7390520 0.5112206 1763 263 4958 780
123 0.025 0.23 0.7497603 0.7402112 0.4728104 1756 261 4965 782
124 0.025 0.24 0.7516779 0.7411128 0.4241548 1751 259 4970 784
125 0.025 0.25 0.7516779 0.7422720 0.4786914 1742 259 4979 784
126 0.025 0.26 0.7516779 0.7434312 0.5370215 1733 259 4988 784
127 0.025 0.27 0.7535954 0.7452344 0.5300933 1721 257 5000 786
128 0.025 0.28 0.7555129 0.7465224 0.4970437 1713 255 5008 788
129 0.025 0.29 0.7564717 0.7471664 0.4808983 1709 254 5012 789
130 0.025 0.30 0.7574305 0.7479392 0.4712536 1704 253 5017 790
131 0.025 0.31 0.7583893 0.7489696 0.4741843 1697 252 5024 791
132 0.025 0.32 0.7593480 0.7506440 0.5093372 1685 251 5036 792
133 0.025 0.33 0.7593480 0.7510304 0.5292214 1682 251 5039 792
134 0.025 0.34 0.7593480 0.7520608 0.5842127 1674 251 5047 792
135 0.025 0.35 0.7593480 0.7528336 0.6272298 1668 251 5053 792
136 0.025 0.36 0.7593480 0.7536064 0.6716460 1662 251 5059 792
137 0.025 0.37 0.7603068 0.7545080 0.6679048 1656 250 5065 793
138 0.025 0.38 0.7622244 0.7552808 0.6017349 1652 248 5069 795
139 0.025 0.39 0.7622244 0.7559248 0.6381185 1647 248 5074 795
140 0.025 0.40 0.7622244 0.7563112 0.6604139 1644 248 5077 795
141 0.025 0.41 0.7631831 0.7572128 0.6566643 1638 247 5083 796
142 0.025 0.42 0.7679770 0.7587584 0.4783072 1631 242 5090 801
143 0.025 0.43 0.7689358 0.7599176 0.4877400 1623 241 5098 802
144 0.025 0.44 0.7698945 0.7604328 0.4650896 1620 240 5101 803
145 0.025 0.45 0.7698945 0.7608192 0.4842679 1617 240 5104 803
146 0.025 0.46 0.7698945 0.7609480 0.4907593 1616 240 5105 803
147 0.025 0.47 0.7718121 0.7614632 0.4214972 1614 238 5107 805
148 0.025 0.48 0.7737296 0.7630088 0.4031619 1604 236 5117 807
149 0.025 0.49 0.7737296 0.7636528 0.4329490 1599 236 5122 807
150 0.025 0.50 0.7737296 0.7642968 0.4640294 1594 236 5127 807
151 0.025 0.51 0.7756472 0.7649408 0.4024955 1591 234 5130 809
152 0.025 0.52 0.7756472 0.7653272 0.4202417 1588 234 5133 809
153 0.025 0.53 0.7756472 0.7664863 0.4762909 1579 234 5142 809
154 0.025 0.54 0.7756472 0.7667439 0.4893072 1577 234 5144 809
155 0.025 0.55 0.7785235 0.7677743 0.3985457 1572 231 5149 812
156 0.025 0.56 0.7794823 0.7689335 0.4069948 1564 230 5157 813
157 0.025 0.57 0.7804410 0.7700927 0.4155751 1556 229 5165 814
158 0.025 0.58 0.7823586 0.7713807 0.3855222 1548 227 5173 816
159 0.025 0.59 0.7833174 0.7721535 0.3765925 1543 226 5178 817
160 0.025 0.60 0.7833174 0.7727975 0.4055625 1538 226 5183 817
161 0.025 0.61 0.7833174 0.7736991 0.4483874 1531 226 5190 817
162 0.025 0.62 0.7833174 0.7749871 0.5140273 1521 226 5200 817
163 0.025 0.63 0.7833174 0.7761463 0.5773590 1512 226 5209 817
164 0.025 0.64 0.7833174 0.7769191 0.6216599 1506 226 5215 817
165 0.025 0.65 0.7842761 0.7780783 0.6328810 1498 225 5223 818
166 0.025 0.66 0.7861937 0.7791087 0.5804679 1492 223 5229 820
167 0.025 0.67 0.7871524 0.7796239 0.5549829 1489 222 5232 821
168 0.025 0.68 0.7871524 0.7803967 0.5987092 1483 222 5238 821
169 0.025 0.69 0.7881112 0.7818135 0.6248479 1473 221 5248 822
170 0.025 0.70 0.7890700 0.7821999 0.5909646 1471 220 5250 823
171 0.025 0.71 0.7900288 0.7827151 0.5651334 1468 219 5253 824
172 0.025 0.72 0.7900288 0.7831015 0.5870885 1465 219 5256 824
173 0.025 0.73 0.7900288 0.7842607 0.6553850 1456 219 5265 824
174 0.025 0.74 0.7900288 0.7852911 0.7188717 1448 219 5273 824
175 0.025 0.75 0.7900288 0.7864503 0.7929415 1439 219 5282 824
176 0.025 0.76 0.7909875 0.7883823 0.8566349 1425 218 5296 825
177 0.025 0.77 0.7909875 0.7892839 0.9170080 1418 218 5303 825
178 0.025 0.78 0.7929051 0.7904431 0.8657246 1411 216 5310 827
179 0.025 0.79 0.7938639 0.7916023 0.8789435 1403 215 5318 828
180 0.025 0.80 0.7948226 0.7931479 0.9184080 1392 214 5329 829
181 0.025 0.81 0.7957814 0.7944359 0.9406938 1383 213 5338 830
182 0.025 0.82 0.7957814 0.7953375 1.0000000 1376 213 5345 830
183 0.025 0.83 0.7976989 0.7962391 0.9326620 1371 211 5350 832
184 0.025 0.84 0.7986577 0.7972694 0.9374527 1364 210 5357 833
185 0.025 0.85 0.7986577 0.7977846 0.9728592 1360 210 5361 833
186 0.025 0.86 0.8015340 0.7994590 0.8899851 1350 207 5371 836
187 0.025 0.87 0.8034516 0.8007470 0.8466728 1342 205 5379 838
188 0.025 0.88 0.8072867 0.8015198 0.6454091 1340 201 5381 842
189 0.025 0.89 0.8092042 0.8029366 0.6134962 1331 199 5390 844
190 0.025 0.90 0.8111218 0.8037094 0.5445943 1327 197 5394 846
191 0.025 0.91 0.8130393 0.8048686 0.5005267 1320 195 5401 848
192 0.025 0.92 0.8168744 0.8080886 0.4640761 1299 191 5422 852
193 0.025 0.93 0.8168744 0.8102782 0.5881470 1282 191 5439 852
194 0.025 0.94 0.8187919 0.8129830 0.6351886 1263 189 5458 854
195 0.025 0.95 0.8207095 0.8162030 0.7181698 1240 187 5481 856
196 0.025 0.96 0.8226270 0.8201958 0.8599597 1211 185 5510 858
197 0.025 0.97 0.8274209 0.8247038 0.8381277 1181 180 5540 863
198 0.025 0.98 0.8350911 0.8297269 0.6519196 1150 172 5571 871
199 0.025 0.99 0.8427613 0.8362957 0.5744072 1107 164 5614 879
200 0.025 1.00 1.0000000 1.0000000 0.0000000 0 0 6721 1043
enrichment.plotter(gene.hic.filt, "weighted_Z.s2post.C", "adj.P.Val", "FDR for Weighted p-val Combine of Hi-C Contacts Overlapping Gene, Chimp")
DEFDR DHICFDR prop.obs prop.exp chisq.p Dneither DE DHiC Dboth
101 0.025 0.01 0.6856316 0.6751510 0.46021414 2201 326 4541 711
102 0.025 0.02 0.6962392 0.6855637 0.44760222 2131 315 4611 722
103 0.025 0.03 0.7020251 0.6898059 0.38011907 2104 309 4638 728
104 0.025 0.04 0.7078110 0.6949479 0.35230133 2070 303 4672 734
105 0.025 0.05 0.7135969 0.6986759 0.27636813 2047 297 4695 740
106 0.025 0.06 0.7164899 0.7018897 0.28570217 2025 294 4717 743
107 0.025 0.07 0.7193828 0.7040751 0.26122974 2011 291 4731 746
108 0.025 0.08 0.7222758 0.7063890 0.24198531 1996 288 4746 749
109 0.025 0.09 0.7242044 0.7085744 0.24888459 1981 286 4761 751
110 0.025 0.10 0.7242044 0.7119167 0.36720698 1955 286 4787 751
111 0.025 0.11 0.7261331 0.7129451 0.33129286 1949 284 4793 753
112 0.025 0.12 0.7270974 0.7146163 0.35805795 1937 283 4805 754
113 0.025 0.13 0.7280617 0.7162874 0.38623713 1925 282 4817 755
114 0.025 0.14 0.7299904 0.7180872 0.37991961 1913 280 4829 757
115 0.025 0.15 0.7338476 0.7213009 0.35197339 1892 276 4850 761
116 0.025 0.16 0.7367406 0.7232292 0.31376976 1880 273 4862 764
117 0.025 0.17 0.7377049 0.7254146 0.36009552 1864 272 4878 765
118 0.025 0.18 0.7386692 0.7261859 0.35187595 1859 271 4883 766
119 0.025 0.19 0.7396336 0.7273428 0.35902517 1851 270 4891 767
120 0.025 0.20 0.7405979 0.7279856 0.34573331 1847 269 4895 768
121 0.025 0.21 0.7415622 0.7288855 0.34267463 1841 268 4901 769
122 0.025 0.22 0.7415622 0.7295282 0.36835975 1836 268 4906 769
123 0.025 0.23 0.7434908 0.7306852 0.33658296 1829 266 4913 771
124 0.025 0.24 0.7473481 0.7319707 0.24473439 1823 262 4919 775
125 0.025 0.25 0.7473481 0.7336419 0.30077161 1810 262 4932 775
126 0.025 0.26 0.7473481 0.7353130 0.36502563 1797 262 4945 775
127 0.025 0.27 0.7492768 0.7373698 0.36917169 1783 260 4959 777
128 0.025 0.28 0.7512054 0.7383983 0.33203979 1777 258 4965 779
129 0.025 0.29 0.7531340 0.7394267 0.29732325 1771 256 4971 781
130 0.025 0.30 0.7550627 0.7403265 0.26079136 1766 254 4976 783
131 0.025 0.31 0.7560270 0.7414835 0.26659559 1758 253 4984 784
132 0.025 0.32 0.7569913 0.7432832 0.29495148 1745 252 4997 785
133 0.025 0.33 0.7569913 0.7436689 0.30901872 1742 252 5000 785
134 0.025 0.34 0.7579556 0.7444402 0.30132508 1737 251 5005 786
135 0.025 0.35 0.7579556 0.7454686 0.34043709 1729 251 5013 786
136 0.025 0.36 0.7589200 0.7463684 0.33728650 1723 250 5019 787
137 0.025 0.37 0.7608486 0.7473968 0.30183959 1717 248 5025 789
138 0.025 0.38 0.7627772 0.7480396 0.25603078 1714 246 5028 791
139 0.025 0.39 0.7627772 0.7486823 0.27768691 1709 246 5033 791
140 0.025 0.40 0.7627772 0.7490680 0.29128122 1706 246 5036 791
141 0.025 0.41 0.7637416 0.7497108 0.27929964 1702 245 5040 792
142 0.025 0.42 0.7685632 0.7512534 0.17813969 1695 240 5047 797
143 0.025 0.43 0.7695275 0.7521532 0.17595841 1689 239 5053 798
144 0.025 0.44 0.7704918 0.7526674 0.16442762 1686 238 5056 799
145 0.025 0.45 0.7704918 0.7531816 0.17699899 1682 238 5060 799
146 0.025 0.46 0.7704918 0.7533102 0.18025366 1681 238 5061 799
147 0.025 0.47 0.7724204 0.7539530 0.14862023 1678 236 5064 801
148 0.025 0.48 0.7733848 0.7553670 0.15823145 1668 235 5074 802
149 0.025 0.49 0.7733848 0.7563954 0.18345675 1660 235 5082 802
150 0.025 0.50 0.7743491 0.7571667 0.17791938 1655 234 5087 803
151 0.025 0.51 0.7762777 0.7579380 0.14927166 1651 232 5091 805
152 0.025 0.52 0.7762777 0.7585808 0.16408545 1646 232 5096 805
153 0.025 0.53 0.7762777 0.7598663 0.19708411 1636 232 5106 805
154 0.025 0.54 0.7762777 0.7606376 0.21914754 1630 232 5112 805
155 0.025 0.55 0.7782064 0.7616660 0.19233559 1624 230 5118 807
156 0.025 0.56 0.7791707 0.7623088 0.18317497 1620 229 5122 808
157 0.025 0.57 0.7801350 0.7633372 0.18423861 1613 228 5129 809
158 0.025 0.58 0.7810993 0.7643656 0.18530647 1606 227 5136 810
159 0.025 0.59 0.7820636 0.7652655 0.18298868 1600 226 5142 811
160 0.025 0.60 0.7820636 0.7657797 0.19683399 1596 226 5146 811
161 0.025 0.61 0.7830280 0.7666795 0.19441671 1590 225 5152 812
162 0.025 0.62 0.7830280 0.7678365 0.22816994 1581 225 5161 812
163 0.025 0.63 0.7830280 0.7686078 0.25294590 1575 225 5167 812
164 0.025 0.64 0.7839923 0.7695076 0.25011205 1569 224 5173 813
165 0.025 0.65 0.7849566 0.7707932 0.26020650 1560 223 5182 814
166 0.025 0.66 0.7859209 0.7715645 0.25297990 1555 222 5187 815
167 0.025 0.67 0.7878496 0.7720787 0.20742390 1553 220 5189 817
168 0.025 0.68 0.7888139 0.7728500 0.20120588 1548 219 5194 818
169 0.025 0.69 0.7897782 0.7740069 0.20605049 1540 218 5202 819
170 0.025 0.70 0.7907425 0.7745211 0.19265044 1537 217 5205 820
171 0.025 0.71 0.7917068 0.7749068 0.17653650 1535 216 5207 821
172 0.025 0.72 0.7917068 0.7752925 0.18671530 1532 216 5210 821
173 0.025 0.73 0.7926712 0.7765780 0.19484369 1523 215 5219 822
174 0.025 0.74 0.7926712 0.7770922 0.20958350 1519 215 5223 822
175 0.025 0.75 0.7926712 0.7783777 0.25004309 1509 215 5233 822
176 0.025 0.76 0.7936355 0.7799203 0.26922711 1498 214 5244 823
177 0.025 0.77 0.7936355 0.7806916 0.29752312 1492 214 5250 823
178 0.025 0.78 0.7945998 0.7817200 0.29919696 1485 213 5257 824
179 0.025 0.79 0.7945998 0.7827484 0.34025411 1477 213 5265 824
180 0.025 0.80 0.7974928 0.7841625 0.28002460 1469 210 5273 827
181 0.025 0.81 0.7984571 0.7857051 0.30098146 1458 209 5284 828
182 0.025 0.82 0.7984571 0.7866050 0.33708380 1451 209 5291 828
183 0.025 0.83 0.8003857 0.7878905 0.30940098 1443 207 5299 830
184 0.025 0.84 0.8003857 0.7885332 0.33546641 1438 207 5304 830
185 0.025 0.85 0.8003857 0.7891760 0.36296578 1433 207 5309 830
186 0.025 0.86 0.8032787 0.7907186 0.30441762 1424 204 5318 833
187 0.025 0.87 0.8042430 0.7926469 0.34299283 1410 203 5332 834
188 0.025 0.88 0.8100289 0.7943180 0.19249185 1403 197 5339 840
189 0.025 0.89 0.8129219 0.7959892 0.15792902 1393 194 5349 843
190 0.025 0.90 0.8138862 0.7963749 0.14352988 1391 193 5351 844
191 0.025 0.91 0.8158149 0.7975318 0.12544667 1384 191 5358 846
192 0.025 0.92 0.8196721 0.8001028 0.09875858 1368 187 5374 850
193 0.025 0.93 0.8196721 0.8025453 0.14808855 1349 187 5393 850
194 0.025 0.94 0.8235294 0.8051163 0.11739041 1333 183 5409 854
195 0.025 0.95 0.8254581 0.8082016 0.14055283 1311 181 5431 856
196 0.025 0.96 0.8302797 0.8119296 0.11372824 1287 176 5455 861
197 0.025 0.97 0.8379942 0.8175858 0.07429587 1251 168 5491 869
198 0.025 0.98 0.8437801 0.8227279 0.06244267 1217 162 5525 875
199 0.025 0.99 0.8534233 0.8295411 0.03135694 1174 152 5568 885
200 0.025 1.00 1.0000000 1.0000000 0.00000000 0 0 6742 1037
#FIG4B
enrichment.plotter(gene.hic.filt, "weighted_Z.ALLvar.H", "adj.P.Val", "FDR for Weighted p-val Combine of Hi-C Contacts Overlapping Gene", xmax=1) #FIG4B
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
DEFDR DHICFDR prop.obs prop.exp chisq.p Dneither DE DHiC Dboth
101 0.025 0.01 0.1073826 0.0951829 0.16566657 6094 931 627 112
102 0.025 0.02 0.1323106 0.1209428 0.24639993 5920 905 801 138
103 0.025 0.03 0.1591563 0.1448995 0.17427317 5762 877 959 166
104 0.025 0.04 0.1792905 0.1664091 0.24774781 5616 856 1105 187
105 0.025 0.05 0.2032598 0.1846986 0.10579031 5499 831 1222 212
106 0.025 0.06 0.2224353 0.2005410 0.06338830 5396 811 1325 232
107 0.025 0.07 0.2310642 0.2153529 0.19837106 5290 802 1431 241
108 0.025 0.08 0.2540748 0.2315817 0.07008123 5188 778 1533 265
109 0.025 0.09 0.2713327 0.2461360 0.04639849 5093 760 1628 283
110 0.025 0.10 0.2905081 0.2592736 0.01485043 5011 740 1710 303
111 0.025 0.11 0.3029722 0.2753735 0.03509085 4899 727 1822 316
112 0.025 0.12 0.3135187 0.2881247 0.05619163 4811 716 1910 327
113 0.025 0.13 0.3202301 0.2982998 0.10363917 4739 709 1982 334
114 0.025 0.14 0.3326942 0.3082174 0.07124345 4675 696 2046 347
115 0.025 0.15 0.3394056 0.3212262 0.18825806 4581 689 2140 354
116 0.025 0.16 0.3518696 0.3323029 0.15956077 4508 676 2213 367
117 0.025 0.17 0.3643337 0.3444101 0.15549829 4427 663 2294 380
118 0.025 0.18 0.3748802 0.3563885 0.19173623 4345 652 2376 391
119 0.025 0.19 0.3844679 0.3633436 0.13624023 4301 642 2420 401
120 0.025 0.20 0.3969319 0.3733900 0.09792680 4236 629 2485 414
121 0.025 0.21 0.4103547 0.3858836 0.08713408 4153 615 2568 428
122 0.025 0.22 0.4189837 0.3956723 0.10507811 4086 606 2635 437
123 0.025 0.23 0.4266539 0.4045595 0.12634723 4025 598 2696 445
124 0.025 0.24 0.4333653 0.4149923 0.20746512 3951 591 2770 452
125 0.025 0.25 0.4467881 0.4234930 0.10898156 3899 577 2822 466
126 0.025 0.26 0.4544583 0.4328954 0.13966930 3834 569 2887 474
127 0.025 0.27 0.4621285 0.4434570 0.20369707 3760 561 2961 482
128 0.025 0.28 0.4736337 0.4545337 0.19426645 3686 549 3035 494
129 0.025 0.29 0.4803452 0.4645801 0.28740313 3615 542 3106 501
130 0.025 0.30 0.4908917 0.4733385 0.23523257 3558 531 3163 512
131 0.025 0.31 0.4985618 0.4841577 0.33345645 3482 523 3239 520
132 0.025 0.32 0.5043145 0.4945904 0.52098840 3407 517 3314 526
133 0.025 0.33 0.5167785 0.5038640 0.38797232 3348 504 3373 539
134 0.025 0.34 0.5234899 0.5117208 0.43305399 3294 497 3427 546
135 0.025 0.35 0.5311601 0.5204791 0.47844162 3234 489 3487 554
136 0.025 0.36 0.5378715 0.5278207 0.50573082 3184 482 3537 561
137 0.025 0.37 0.5455417 0.5350335 0.48521579 3136 474 3585 569
138 0.025 0.38 0.5512943 0.5421175 0.54454660 3087 468 3634 575
139 0.025 0.39 0.5627996 0.5508758 0.42449961 3031 456 3690 587
140 0.025 0.40 0.5685523 0.5580886 0.48526872 2981 450 3740 593
141 0.025 0.41 0.5752637 0.5659454 0.53590184 2927 443 3794 600
142 0.025 0.42 0.5829338 0.5739310 0.54964267 2873 435 3848 608
143 0.025 0.43 0.5925216 0.5829469 0.52198812 2813 425 3908 618
144 0.025 0.44 0.6011505 0.5914477 0.51485525 2756 416 3965 627
145 0.025 0.45 0.6116970 0.5999485 0.42461271 2701 405 4020 638
146 0.025 0.46 0.6232023 0.6079341 0.29304502 2651 393 4070 650
147 0.025 0.47 0.6299137 0.6159196 0.33479652 2596 386 4125 657
148 0.025 0.48 0.6356663 0.6239052 0.41884916 2540 380 4181 663
149 0.025 0.49 0.6500479 0.6322772 0.21322226 2490 365 4231 678
150 0.025 0.50 0.6558006 0.6385884 0.22666331 2447 359 4274 684
151 0.025 0.51 0.6605944 0.6455435 0.29034101 2398 354 4323 689
152 0.025 0.52 0.6730585 0.6534003 0.16184419 2350 341 4371 702
153 0.025 0.53 0.6816874 0.6625451 0.17067444 2288 332 4433 711
154 0.025 0.54 0.6893576 0.6700155 0.16378147 2238 324 4483 719
155 0.025 0.55 0.7008629 0.6789026 0.11027142 2181 312 4540 731
156 0.025 0.56 0.7056568 0.6863730 0.15947943 2128 307 4593 736
157 0.025 0.57 0.7152445 0.6946162 0.12887998 2074 297 4647 746
158 0.025 0.58 0.7219559 0.7022154 0.14372584 2022 290 4699 753
159 0.025 0.59 0.7286673 0.7108449 0.18423758 1962 283 4759 760
160 0.025 0.60 0.7363375 0.7174137 0.15505243 1919 275 4802 768
161 0.025 0.61 0.7468840 0.7246265 0.09059411 1874 264 4847 779
162 0.025 0.62 0.7535954 0.7317105 0.09355014 1826 257 4895 786
163 0.025 0.63 0.7593480 0.7396960 0.12935779 1770 251 4951 792
164 0.025 0.64 0.7651007 0.7465224 0.14868182 1723 245 4998 798
165 0.025 0.65 0.7737296 0.7545080 0.13063361 1670 236 5051 807
166 0.025 0.66 0.7813998 0.7613344 0.11073551 1625 228 5096 815
167 0.025 0.67 0.7909875 0.7690623 0.07733338 1575 218 5146 825
168 0.025 0.68 0.8024928 0.7774343 0.04026349 1522 206 5199 837
169 0.025 0.69 0.8130393 0.7856775 0.02296910 1469 195 5252 848
170 0.025 0.70 0.8187919 0.7921175 0.02504537 1425 189 5296 854
171 0.025 0.71 0.8226270 0.7980422 0.03714207 1383 185 5338 858
172 0.025 0.72 0.8283797 0.8070582 0.06675188 1319 179 5402 864
173 0.025 0.73 0.8379674 0.8143998 0.03927027 1272 169 5449 874
174 0.025 0.74 0.8427613 0.8207110 0.05094645 1228 164 5493 879
175 0.025 0.75 0.8542665 0.8274086 0.01539372 1188 152 5533 891
176 0.025 0.76 0.8571429 0.8339773 0.03432323 1140 149 5581 894
177 0.025 0.77 0.8609779 0.8414477 0.07022857 1086 145 5635 898
178 0.025 0.78 0.8657718 0.8482741 0.09964101 1038 140 5683 903
179 0.025 0.79 0.8686481 0.8571613 0.27485605 972 137 5749 906
180 0.025 0.80 0.8744008 0.8642452 0.32680957 923 131 5798 912
181 0.025 0.81 0.8791946 0.8691396 0.32434551 890 126 5831 917
182 0.025 0.82 0.8820709 0.8760948 0.56252048 839 123 5882 920
183 0.025 0.83 0.8878236 0.8824060 0.59464098 796 117 5925 926
184 0.025 0.84 0.8954938 0.8898764 0.56886945 746 109 5975 934
185 0.025 0.85 0.9022052 0.8973467 0.61649707 695 102 6026 941
186 0.025 0.86 0.9127517 0.9048171 0.37788876 648 91 6073 952
187 0.025 0.87 0.9242570 0.9128027 0.17692208 598 79 6123 964
188 0.025 0.88 0.9319271 0.9188563 0.10945665 559 71 6162 972
189 0.025 0.89 0.9367210 0.9264554 0.19313941 505 66 6216 977
190 0.025 0.90 0.9424736 0.9339258 0.25956552 453 60 6268 983
191 0.025 0.91 0.9491850 0.9406234 0.23519006 408 53 6313 990
192 0.025 0.92 0.9530201 0.9479650 0.47452447 355 49 6366 994
193 0.025 0.93 0.9568552 0.9544049 0.74295051 309 45 6412 998
194 0.025 0.94 0.9645254 0.9612313 0.61277921 264 37 6457 1006
195 0.025 0.95 0.9712368 0.9687017 0.68194799 213 30 6508 1013
196 0.025 0.96 0.9760307 0.9741113 0.75294672 176 25 6545 1018
197 0.025 0.97 0.9817833 0.9802937 0.80080639 134 19 6587 1024
198 0.025 0.98 0.9932886 0.9875064 0.09749270 90 7 6631 1036
199 0.025 0.99 0.9961649 0.9920144 0.15222995 58 4 6663 1039
200 0.025 1.00 1.0000000 1.0000000 0.00000000 0 0 6721 1043
enrichment.plotter(gene.hic.filt, "weighted_Z.ALLvar.C", "adj.P.Val", "FDR for Weighted p-val Combine of Hi-C Contacts Overlapping Gene, Chimp")
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
DEFDR DHICFDR prop.obs prop.exp chisq.p Dneither DE DHiC Dboth
101 0.025 0.01 0.1070395 0.09139992 0.068853196 6142 926 600 111
102 0.025 0.02 0.1340405 0.11633886 0.063214019 5976 898 766 139
103 0.025 0.03 0.1581485 0.13857822 0.056002329 5828 873 914 164
104 0.025 0.04 0.1783992 0.15876077 0.069806446 5692 852 1050 185
105 0.025 0.05 0.2005786 0.17688649 0.035375055 5574 829 1168 208
106 0.025 0.06 0.2189007 0.19346960 0.028910832 5464 810 1278 227
107 0.025 0.07 0.2275796 0.20825299 0.108432406 5358 801 1384 236
108 0.025 0.08 0.2507232 0.22445044 0.032498272 5256 777 1486 260
109 0.025 0.09 0.2709740 0.23961949 0.012356986 5159 756 1583 281
110 0.025 0.10 0.2892960 0.25414578 0.005880942 5065 737 1677 300
111 0.025 0.11 0.2979749 0.26867207 0.024510965 4961 728 1781 309
112 0.025 0.12 0.3066538 0.28242705 0.068081758 4863 719 1879 318
113 0.025 0.13 0.3124397 0.29155418 0.120444739 4798 713 1944 324
114 0.025 0.14 0.3220829 0.30003857 0.103628174 4742 703 2000 334
115 0.025 0.15 0.3326905 0.31147962 0.121550116 4664 692 2078 345
116 0.025 0.16 0.3461909 0.32202083 0.079496527 4596 678 2146 359
117 0.025 0.17 0.3587271 0.33243347 0.058057354 4528 665 2214 372
118 0.025 0.18 0.3751205 0.34567425 0.035150264 4442 648 2300 389
119 0.025 0.19 0.3866924 0.35338732 0.017543076 4394 636 2348 401
120 0.025 0.20 0.3963356 0.36290012 0.017759159 4330 626 2412 411
121 0.025 0.21 0.4069431 0.37446973 0.022229963 4251 615 2491 422
122 0.025 0.22 0.4156220 0.38513948 0.032967045 4177 606 2565 431
123 0.025 0.23 0.4233365 0.39478082 0.046961821 4110 598 2632 439
124 0.025 0.24 0.4358727 0.40416506 0.027734682 4050 585 2692 452
125 0.025 0.25 0.4435873 0.41174958 0.027538008 3999 577 2743 460
126 0.025 0.26 0.4522662 0.42126237 0.032500938 3934 568 2808 469
127 0.025 0.27 0.4599807 0.43141792 0.049857681 3863 560 2879 477
128 0.025 0.28 0.4667310 0.44183057 0.088975524 3789 553 2953 484
129 0.025 0.29 0.4734812 0.45262887 0.156893824 3712 546 3030 491
130 0.025 0.30 0.4840887 0.46149891 0.125033582 3654 535 3088 502
131 0.025 0.31 0.4956606 0.47114025 0.095761121 3591 523 3151 514
132 0.025 0.32 0.5014465 0.48168145 0.181907573 3515 517 3227 520
133 0.025 0.33 0.5120540 0.49145134 0.163873995 3450 506 3292 531
134 0.025 0.34 0.5207329 0.50096413 0.182115992 3385 497 3357 540
135 0.025 0.35 0.5294118 0.51009127 0.192397158 3323 488 3419 549
136 0.025 0.36 0.5342334 0.51690449 0.243557982 3275 483 3467 554
137 0.025 0.37 0.5429122 0.52526032 0.234305586 3219 474 3523 563
138 0.025 0.38 0.5496625 0.53130222 0.215250651 3179 467 3563 570
139 0.025 0.39 0.5593057 0.54017226 0.195493243 3120 457 3622 580
140 0.025 0.40 0.5631630 0.54801388 0.308020850 3063 453 3679 584
141 0.025 0.41 0.5737705 0.55739812 0.268459804 3001 442 3741 595
142 0.025 0.42 0.5795564 0.56433989 0.304004528 2953 436 3789 601
143 0.025 0.43 0.5882353 0.57269572 0.292398396 2897 427 3845 610
144 0.025 0.44 0.5949855 0.58233706 0.393477739 2829 420 3913 617
145 0.025 0.45 0.6046287 0.59107854 0.357864220 2771 410 3971 627
146 0.025 0.46 0.6152363 0.59917727 0.271564608 2719 399 4023 638
147 0.025 0.47 0.6210222 0.60663324 0.324751980 2667 393 4075 644
148 0.025 0.48 0.6287367 0.61563183 0.369405642 2605 385 4137 652
149 0.025 0.49 0.6412729 0.62218794 0.184442385 2567 372 4175 665
150 0.025 0.50 0.6480231 0.63002957 0.209614647 2513 365 4229 672
151 0.025 0.51 0.6528447 0.63787119 0.296965338 2457 360 4285 677
152 0.025 0.52 0.6653809 0.64507006 0.151735391 2414 347 4328 690
153 0.025 0.53 0.6730955 0.65394010 0.174531621 2353 339 4389 698
154 0.025 0.54 0.6808100 0.66113896 0.160817050 2305 331 4437 706
155 0.025 0.55 0.6914176 0.67026610 0.128305720 2245 320 4497 717
156 0.025 0.56 0.6962392 0.67656511 0.155857310 2201 315 4541 722
157 0.025 0.57 0.7078110 0.68466384 0.091549698 2150 303 4592 734
158 0.025 0.58 0.7126326 0.69250546 0.140865172 2094 298 4648 739
159 0.025 0.59 0.7193828 0.70124695 0.182157727 2033 291 4709 746
160 0.025 0.60 0.7280617 0.70728885 0.122942868 1995 282 4747 755
161 0.025 0.61 0.7377049 0.71525903 0.092283471 1943 272 4799 765
162 0.025 0.62 0.7483124 0.72207225 0.046712056 1901 261 4841 776
163 0.025 0.63 0.7589200 0.73029952 0.028298505 1848 250 4894 787
164 0.025 0.64 0.7627772 0.73736984 0.050088057 1797 246 4945 791
165 0.025 0.65 0.7714561 0.74636843 0.050442569 1736 237 5006 800
166 0.025 0.66 0.7762777 0.75279599 0.065152590 1691 232 5051 805
167 0.025 0.67 0.7897782 0.76269443 0.030549193 1628 218 5114 819
168 0.025 0.68 0.7984571 0.77027896 0.022759294 1578 209 5164 828
169 0.025 0.69 0.8081003 0.77773493 0.012912256 1530 199 5212 838
170 0.025 0.70 0.8138862 0.78377684 0.012794712 1489 193 5253 844
171 0.025 0.71 0.8167792 0.78956164 0.023283223 1447 190 5295 847
172 0.025 0.72 0.8235294 0.79778892 0.029609167 1390 183 5352 854
173 0.025 0.73 0.8360656 0.80614475 0.009997835 1338 170 5404 867
174 0.025 0.74 0.8408872 0.81244376 0.013222715 1294 165 5448 872
175 0.025 0.75 0.8495661 0.81810001 0.005465053 1259 156 5483 881
176 0.025 0.76 0.8534233 0.82568454 0.012949356 1204 152 5538 885
177 0.025 0.77 0.8572806 0.83314051 0.028176325 1150 148 5592 889
178 0.025 0.78 0.8659595 0.84021082 0.017067390 1104 139 5638 898
179 0.025 0.79 0.8688525 0.84946651 0.067461266 1035 136 5707 901
180 0.025 0.80 0.8784957 0.85743669 0.041772820 983 126 5759 911
181 0.025 0.81 0.8842816 0.86412135 0.046984011 937 120 5805 917
182 0.025 0.82 0.8881389 0.87157732 0.096419716 883 116 5859 921
183 0.025 0.83 0.8919961 0.87736213 0.135609405 842 112 5900 925
184 0.025 0.84 0.8977821 0.88494665 0.180521043 789 106 5953 931
185 0.025 0.85 0.9026037 0.89253117 0.284116150 735 101 6007 936
186 0.025 0.86 0.9112825 0.90011570 0.217726259 685 92 6057 945
187 0.025 0.87 0.9189971 0.90898573 0.251799329 624 84 6118 953
188 0.025 0.88 0.9267117 0.91489909 0.160143564 586 76 6156 961
189 0.025 0.89 0.9324976 0.92196940 0.195124167 537 70 6205 967
190 0.025 0.90 0.9382835 0.92955393 0.264922383 484 64 6258 973
191 0.025 0.91 0.9450338 0.93726700 0.298730434 431 57 6311 980
192 0.025 0.92 0.9469624 0.94575138 0.911375049 367 55 6375 982
193 0.025 0.93 0.9488910 0.95192184 0.680273268 321 53 6421 984
194 0.025 0.94 0.9566056 0.95963491 0.654391822 269 45 6473 992
195 0.025 0.95 0.9633558 0.96657668 0.598155106 222 38 6520 999
196 0.025 0.96 0.9729990 0.97261859 1.000000000 185 28 6557 1009
197 0.025 0.97 0.9787850 0.97891760 1.000000000 142 22 6600 1015
198 0.025 0.98 0.9893925 0.98675922 0.515072513 92 11 6650 1026
199 0.025 0.99 0.9942141 0.99190127 0.479870616 57 6 6685 1031
200 0.025 1.00 1.0000000 1.00000000 0.000000000 0 0 6742 1037
#Custom function for doing this for the paper:
pap.enrichment.plotter <- function(df, HiC_col, DE_col, xlab, xmax=0.3, i=c(0.01, 0.025, 0.05, 0.075, 0.1), k=seq(0.01, 1, 0.01), significance=FALSE){
enrich.table <- data.frame(DEFDR = c(rep(i[1], 100), rep(i[2], 100), rep(i[3], 100), rep(i[4], 100), rep(i[5], 100)), DHICFDR=rep(k, 5), prop.obs=NA, prop.exp=NA, chisq.p=NA, Dneither=NA, DE=NA, DHiC=NA, Dboth=NA)
for(de.FDR in i){
for(hic.FDR in k){
enrich.table[which(enrich.table$DEFDR==de.FDR&enrich.table$DHICFDR==hic.FDR), 3:9] <- prop.calculator(df[,DE_col], df[,HiC_col], de.FDR, hic.FDR)
}
}
des.enriched <- ggplot(data=enrich.table, aes(x=DHICFDR, y=prop.obs, group=as.factor(DEFDR), color=as.factor(DEFDR))) +geom_line()+ geom_line(aes(y=prop.exp), linetype="dashed", size=0.5) + ggtitle("Enrichment of DC in DE Genes") + xlab(xlab) + ylab("Proportion of DE genes that are DC") + guides(color=guide_legend(title="FDR for DE Genes"))
dhics.enriched <- ggplot(data=enrich.table, aes(x=DHICFDR, y=Dboth/(Dboth+DHiC), group=as.factor(DEFDR), color=as.factor(DEFDR))) + geom_line() + geom_line(aes(y=(((((DE+Dboth)/(Dneither+DE+DHiC+Dboth))*((DHiC+Dboth)/(Dneither+DE+DHiC+Dboth)))*(Dneither+DE+DHiC+Dboth))/(DHiC+Dboth))), linetype="dashed") + ylab("Proportion of DC genes that are DE") +xlab(xlab) + ggtitle("Enrichment of DE in DC Genes") + coord_cartesian(xlim=c(0, xmax), ylim=c(0.05, 0.32)) + guides(color=FALSE)
joint.enriched <- ggplot(data=enrich.table, aes(x=DHICFDR, y=Dboth/(Dneither+DE+DHiC+Dboth), group=as.factor(DEFDR), color=as.factor(DEFDR))) + geom_line() + ylab("Proportion of ALL Genes both DE & DHi-C") + xlab(xlab) + geom_line(aes(y=((DE+Dboth)/(Dneither+DE+DHiC+Dboth))*((DHiC+Dboth)/(Dneither+DE+DHiC+Dboth))), linetype="dashed") + ggtitle("Enrichment of Joint DE & DHi-C in All Genes")
chisq.p <- ggplot(data=enrich.table, aes(x=DHICFDR, y=-log10(chisq.p), group=as.factor(DEFDR), color=as.factor(DEFDR))) + geom_point() + geom_hline(yintercept=-log10(0.05), color="red") + ggtitle("Chi-squared Test P-values") + xlab(xlab) + ylab("-log10(chi-squared p-values)") + coord_cartesian(xlim=c(0, xmax), ylim=c(0, 3.2)) + guides(color=guide_legend(title="DE FDR"))
if(significance==TRUE){
return(chisq.p)
}
else{
return(dhics.enriched)
}
}
#FIG4
FIG4A <- pap.enrichment.plotter(gene.hic.filt, "min_FDR.H", "adj.P.Val", "Minimum FDR of Contacts", xmax=1) #FIG4A
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
FIG4B <- pap.enrichment.plotter(gene.hic.filt, "min_FDR.H", "adj.P.Val", "Minimum FDR of Contacts", xmax=1, significance = TRUE) #FIG4B
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
FIG4C <- pap.enrichment.plotter(gene.hic.filt, "weighted_Z.ALLvar.H", "adj.P.Val", "FDR for Weighted P-val Combination", xmax=1) #FIG4C
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
FIG4D <- pap.enrichment.plotter(gene.hic.filt, "weighted_Z.ALLvar.H", "adj.P.Val", "FDR for Weighted P-val Combination", xmax=1, significance = TRUE) #FIG4D
Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
FIG4 <- plot_grid(FIG4A, FIG4B, FIG4C, FIG4D, labels=c("A", "B", "C", "D"), align="h", rel_widths=c(1, 1.2))
save_plot("~/Desktop/FIG4.png", FIG4, nrow=2, ncol=2)
Now that I’ve seen some enrichment of differential contact (DC) in differential expression (DE) based on my linear modeling results from before, I would like to further quantify this effect. In order to do so, I now extract log2 observed/expected homer-normalized contact frequency values and RPKM expression values for each gene/bin set, so I can look at correlations of these values and assess their explanatory power.
In this section, I proceed to create a function in order to extract the Hi-C interaction frequency values for the different types of summaries I’ve made gene overlaps with above. This allows for operations to be performed separately utilizing different summaries of Hi-C contacts.
#Get a df with the H and C coordinates of the hits, and the IF values from homer. This subset df makes things easier to extract.
contacts <- data.frame(h1=data.filtered$H1, h2=data.filtered$H2, c1=data.filtered$C1, c2=data.filtered$C2, A_21792_HIC=data.filtered$`A-21792_norm`, B_28126_HIC=data.filtered$`B-28126_norm`, C_3649_HIC=data.filtered$`C-3649_norm`, D_40300_HIC=data.filtered$`D-40300_norm`, E_28815_HIC=data.filtered$`E-28815_norm`, F_28834_HIC=data.filtered$`F-28834_norm`, G_3624_HIC=data.filtered$`G-3624_norm`, H_3651_HIC=data.filtered$`H-3651_norm`, stringsAsFactors = FALSE)
#Now ensure first member of a pair is always lower than second:
newH1 <- as.numeric(gsub(".*-", "", contacts$h1))
newH2 <- as.numeric(gsub(".*-", "", contacts$h2))
lower.HID <- ifelse(newH1<newH2, contacts$h1, contacts$h2)
higher.HID2 <- ifelse(newH1<newH2, contacts$h2, contacts$h1)
contacts$hpair <- paste(lower.HID, higher.HID2, sep="_")
newC1 <- as.numeric(gsub(".*-", "", contacts$c1))
newC2 <- as.numeric(gsub(".*-", "", contacts$c2))
lower.CID <- ifelse(newC1<newC2, contacts$c1, contacts$c2)
higher.CID2 <- ifelse(newC1<newC2, contacts$c2, contacts$c1)
contacts$cpair <- paste(lower.CID, higher.CID2, sep="_")
#A function that takes a dataframe (like gene.hic.filt) and two columns from the dataframe to create a pair vector for the given interaction. First ensures the first bin in a pair is always lowest to make this easier. Then extracts the IF values for that vector from the contacts df created above. This provides me with the appropriate Hi-C data values for the different bin classes we're examining here, so that I can later test them with linear modeling to quantify their effect on expression.
IF.extractor <- function(dataframe, col1, col2, contacts, species, strand=FALSE){
new1 <- as.numeric(gsub(".*-", "", dataframe[,col1]))
if(strand==FALSE){#In the case where I'm not worried about strand, I just work with the second column selected.
new2 <- as.numeric(gsub(".*-", "", dataframe[,col2]))
lower1 <- ifelse(new1<new2, dataframe[,col1], dataframe[,col2])
higher2 <- ifelse(new1<new2, dataframe[,col2], dataframe[,col1]) #Fix all the columns first
if(species=="H"){ #Then depending on species create the pair column and merge to contact info.
dataframe[,"hpair"] <- paste(lower1, higher2, sep="_")
finaldf <- left_join(dataframe, contacts[,c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC", "hpair")], by="hpair")
}
else if(species=="C"){
dataframe[,"cpair"] <- paste(lower1, higher2, sep="_")
finaldf <- left_join(dataframe, contacts[,c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC", "cpair")], by="cpair")
}
}
else if(strand==TRUE){#If dealing with upstream and downstream hits, need to do things separately for genes on the + and - strand.
if(species=="H"){
US <- ifelse(dataframe[,"Hstrand.H"]=="+", dataframe[,"US_bin.H"], dataframe[,"DS_bin.H"]) #Obtain upstream bins depending on strand.
new2 <- as.numeric(gsub(".*-", "", US)) #Now rearrange the pairs to ensure regardless of stream we can find the pair (first mate lower coordinates than 2nd).
lower1 <- ifelse(new1<new2, dataframe[,col1], US)
higher2 <- ifelse(new1<new2, US, dataframe[,col1])
dataframe[,"hpair"] <- paste(lower1, higher2, sep="_")
dataframe[,"USFDR"] <- ifelse(dataframe[,"Hstrand.H"]=="+", dataframe[,"US_FDR.H"], dataframe[,"DS_FDR.H"])
finaldf <- left_join(dataframe, contacts[,c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC", "hpair")], by="hpair")
}
else if(species=="C"){
US <- ifelse(dataframe[,"Hstrand.C"]=="+", dataframe[,"US_bin.C"], dataframe[,"DS_bin.C"]) #Obtain upstream bins depending on strand.
new2 <- as.numeric(gsub(".*-", "", US)) #Now rearrange the pairs to ensure regardless of stream we can find the pair (first mate lower coordinates than 2nd).
lower1 <- ifelse(new1<new2, dataframe[,col1], US)
higher2 <- ifelse(new1<new2, US, dataframe[,col1])
dataframe[,"cpair"] <- paste(lower1, higher2, sep="_")
dataframe[,"USFDR"] <- ifelse(dataframe[,"Hstrand.C"]=="+", dataframe[,"US_FDR.C"], dataframe[,"DS_FDR.C"])
finaldf <- left_join(dataframe, contacts[,c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC", "cpair")], by="cpair")
}
}
#before finally returning, remove rows where we don't have full Hi-C data.
finaldf <- finaldf[complete.cases(finaldf[,c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC")]),]
return(finaldf)
}
#Now I can use the IF.extractor function to make a number of different dataframes for actually testing different IF values with the RPKM expression values.
h_minFDR <- IF.extractor(gene.hic.filt, "HID", "min_FDR_bin.H", contacts, "H")
c_minFDR <- IF.extractor(gene.hic.filt, "CID", "min_FDR_bin.C", contacts, "C")
h_maxB <- IF.extractor(gene.hic.filt, "HID", "max_B_bin.H", contacts, "H")
c_maxB <- IF.extractor(gene.hic.filt, "CID", "max_B_bin.C", contacts, "C")
h_US <- IF.extractor(gene.hic.filt, "HID", "US_bin.H", contacts, "H", TRUE)
c_US <- IF.extractor(gene.hic.filt, "CID", "US_bin.C", contacts, "C", TRUE)
#Write these out so they can be permuted upon on midway2.
fwrite(h_minFDR, "~/Desktop/Hi-C/HiC_covs/h_minFDR", quote = FALSE, sep = "\t", na = "NA", col.names = TRUE)
fwrite(c_minFDR, "~/Desktop/Hi-C/HiC_covs/c_minFDR", quote = FALSE, sep = "\t", na = "NA", col.names = TRUE)
fwrite(h_maxB, "~/Desktop/Hi-C/HiC_covs/h_maxB", quote = FALSE, sep = "\t", na = "NA", col.names = TRUE)
fwrite(c_maxB, "~/Desktop/Hi-C/HiC_covs/c_maxB", quote = FALSE, sep = "\t", na = "NA", col.names = TRUE)
fwrite(h_US, "~/Desktop/Hi-C/HiC_covs/h_US", quote = FALSE, sep = "\t", na = "NA", col.names = TRUE)
fwrite(c_US, "~/Desktop/Hi-C/HiC_covs/c_US", quote = FALSE, sep = "\t", na = "NA", col.names = TRUE)
#Now the same thing, but subsetting down to ONLY the genes that show evidence for DE at 5% FDR.
h_minFDR_DE <- filter(h_minFDR, adj.P.Val<=0.05)
c_minFDR_DE <- filter(c_minFDR, adj.P.Val<=0.05)
h_maxB_DE <- filter(h_maxB, adj.P.Val<=0.05)
c_maxB_DE <- filter(c_maxB, adj.P.Val<=0.05)
h_US_DE <- filter(h_US, adj.P.Val<=0.05)
c_US_DE <- filter(c_US, adj.P.Val<=0.05)
fwrite(h_minFDR_DE, "~/Desktop/Hi-C/HiC_covs/h_minFDR_DE", quote = FALSE, sep = "\t", na = "NA", col.names = TRUE)
fwrite(c_minFDR_DE, "~/Desktop/Hi-C/HiC_covs/c_minFDR_DE", quote = FALSE, sep = "\t", na = "NA", col.names = TRUE)
fwrite(h_maxB_DE, "~/Desktop/Hi-C/HiC_covs/h_maxB_DE", quote = FALSE, sep = "\t", na = "NA", col.names = TRUE)
fwrite(c_maxB_DE, "~/Desktop/Hi-C/HiC_covs/c_maxB_DE", quote = FALSE, sep = "\t", na = "NA", col.names = TRUE)
fwrite(h_US_DE, "~/Desktop/Hi-C/HiC_covs/h_US_DE", quote = FALSE, sep = "\t", na = "NA", col.names = TRUE)
fwrite(c_US_DE, "~/Desktop/Hi-C/HiC_covs/c_US_DE", quote = FALSE, sep = "\t", na = "NA", col.names = TRUE)
Now I use these data frames and join each again on the RPKM table by genes, in order to obtain RPKM expression values for each gene-Hi-C bin set.
In this next section I join each of the data frames created above on the RPKM table by genes, in order to have a merged table with expression values and contact frequencies for each gene-Hi-C bin set. I move then to explore correlations between the values.
#Join all of the previously-made Hi-C interaction frequency tables to the RPKM table by genes, to obtain RPKM values in concert with contact frequency values.
RPKM <- as.data.frame(weighted.data$E)
RPKM$genes <- rownames(RPKM)
hmin <- left_join(h_minFDR, RPKM, by="genes")
cmin <- left_join(c_minFDR, RPKM, by="genes")
hmaxB <- left_join(h_maxB, RPKM, by="genes")
cmaxB <- left_join(c_maxB, RPKM, by="genes")
hUS <- left_join(h_US, RPKM, by="genes")
cUS <- left_join(c_US, RPKM, by="genes")
#Get the same thing filtered for only DE genes.
hminDE <- filter(hmin, adj.P.Val<=0.05)
cminDE <- filter(cmin, adj.P.Val<=0.05)
hmaxBDE <- filter(hmaxB, adj.P.Val<=0.05)
cmaxBDE <- filter(cmaxB, adj.P.Val <=0.05)
hUSDE <- filter(hUS, adj.P.Val<=0.05)
cUSDE <- filter(cUS, adj.P.Val<=0.05)
hmin_noDE <- filter(hmin, adj.P.Val>0.05) #Pull out specific set of non-DE hits.
#Extract contacts and expression for the contact with the lowest FDR from linear modeling.
mycontacts <- hmin[,61:68]
myexprs <- hmin[,69:76]
DEcontacts <- hminDE[,61:68]
DEexprs <- hminDE[,69:76]
nonDEcontacts <- hmin_noDE[,61:68]
nonDEexprs <- hmin_noDE[,69:76]
#Extract contacts and expression stratified by mean interspecies expression quantiles.
quant1contactexpr <- filter(hmin, AveExpr<=quantile(hmin$AveExpr)[2]) %>% select(., A_21792_HIC, B_28126_HIC, C_3649_HIC, D_40300_HIC, E_28815_HIC, F_28834_HIC, G_3624_HIC, H_3651_HIC, "A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651")
quant2contactexpr <- filter(hmin, AveExpr>quantile(hmin$AveExpr)[2]&AveExpr<=quantile(hmin$AveExpr)[3]) %>% select(., A_21792_HIC, B_28126_HIC, C_3649_HIC, D_40300_HIC, E_28815_HIC, F_28834_HIC, G_3624_HIC, H_3651_HIC, "A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651")
quant3contactexpr <- filter(hmin, AveExpr>quantile(hmin$AveExpr)[3]&AveExpr<=quantile(hmin$AveExpr)[4]) %>% select(., A_21792_HIC, B_28126_HIC, C_3649_HIC, D_40300_HIC, E_28815_HIC, F_28834_HIC, G_3624_HIC, H_3651_HIC, "A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651")
quant4contactexpr <- filter(hmin, AveExpr>quantile(hmin$AveExpr)[4]&AveExpr<=quantile(hmin$AveExpr)[5]) %>% select(., A_21792_HIC, B_28126_HIC, C_3649_HIC, D_40300_HIC, E_28815_HIC, F_28834_HIC, G_3624_HIC, H_3651_HIC, "A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651")
#Function for calculating gene-wise correlations.
cor.calc <- function(hicdata, exprdata){
cor.exp <- vector(length=nrow(hicdata))
for(i in 1:nrow(hicdata)){
cor.exp[i] <- cor(as.numeric(hicdata[i,]), as.numeric(exprdata[i,]))
}
return(cor.exp)
}
#Calculate correlations for different sets! I've commented out the species-specific calculations here because they just aren't that interesting. If you look at DE dynamics within each species you do NOT get a gorgeous bimodal as you do for across--looks more uniform and messy.
fullcors <- data.frame(cor=cor.calc(mycontacts, myexprs), type="all")
fullDEcors <- data.frame(cor=cor.calc(DEcontacts, DEexprs), type="DE")
fullnoDEcors <- data.frame(cor=cor.calc(nonDEcontacts, nonDEexprs), type="non-DE")
quant1cor <- data.frame(cor=cor.calc(quant1contactexpr[,1:8], quant1contactexpr[,9:16]), type="quant1")
quant2cor <- data.frame(cor=cor.calc(quant2contactexpr[,1:8], quant2contactexpr[,9:16]), type="quant2")
quant3cor <- data.frame(cor=cor.calc(quant3contactexpr[,1:8], quant3contactexpr[,9:16]), type="quant3")
quant4cor <- data.frame(cor=cor.calc(quant4contactexpr[,1:8], quant4contactexpr[,9:16]), type="quant4")
# humcors <- data.frame(cor=cor.calc(mycontacts[,c(1:2, 5:6)], myexprs[,c(1:2, 5:6)]), type="h_ALL")
# humDEcors <- data.frame(cor=cor.calc(DEcontacts[,c(1:2, 5:6)], DEexprs[,c(1:2, 5:6)]), type="h_DE")
# humnoDEcors <- data.frame(cor=cor.calc(nonDEcontacts[,c(1:2, 5:6)], nonDEexprs[,c(1:2, 5:6)]), type="h_non-DE")
# chimpcors <- data.frame(cor=cor.calc(mycontacts[,c(3:4, 7:8)], myexprs[,c(3:4, 7:8)]), type="c_ALL")
# chimpDEcors <- data.frame(cor=cor.calc(DEcontacts[,c(3:4, 7:8)], DEexprs[,c(3:4, 7:8)]), type="c_DE")
# chimpnoDEcors <- data.frame(cor=cor.calc(nonDEcontacts[,c(3:4, 7:8)], nonDEexprs[,c(3:4, 7:8)]), type="c_non-DE")
#Combine these dfs to plot in one gorgeous ggplot!
ggcors <- rbind(fullcors, fullDEcors, fullnoDEcors, quant1cor, quant2cor, quant3cor, quant4cor)#, humcors, humDEcors, humnoDEcors, chimpcors, chimpDEcors, chimpnoDEcors)
#First without the expression quantiles, then with.
ggplot(data=filter(ggcors, type=="all"|type=="DE"|type=="non-DE")) + stat_density(aes(x=cor, group=type, color=type, y=..scaled..), position="identity", geom="line") + ggtitle("Correlation b/t RPKM Expression and Hi-C Contact Frequency") + xlab("Pearson Correlations b/t RPKM Expression and Hi-C Contact Frequency") + ylab("Density") + scale_color_manual("Gene Set", values=c("red", "blue", "green"), labels=c("All genes", "DE genes", "non-DE genes"))
ggplot(data=ggcors) + stat_density(aes(x=cor, group=type, color=type, y=..scaled..), position="identity", geom="line") + ggtitle("Correlation b/t RPKM Expression and Hi-C Contact Frequency") + xlab("Pearson Correlations b/t RPKM Expression and Hi-C Contact Frequency") + ylab("Density") + scale_color_manual("Gene Set", values=c("red", "blue", "green", "purple", "orange", "yellow", "brown"), labels=c("All genes", "DE genes", "non-DE genes", "expression.quant1", "expression.quant2", "expression.quant3", "expression.quant4"))
ggplot(data=filter(ggcors, type=="DE"|type=="non-DE")) + stat_density(aes(x=cor, group=type, color=type, y=..scaled..), position="identity", geom="line") + ggtitle("Correlations between Expression and Hi-C Contact Frequency") + xlab("Pearson Correlations, Expression and Hi-C Contact Frequency") + ylab("Density") + scale_color_manual("Gene Set", values=c("red", "blue"), labels=c("DE genes", "non-DE genes")) + theme(plot.title=element_text(hjust=0.2))
###This looks great, showing strong bimodal distribution for the DE genes and broader distributions with a peak at 0 for the non-DE set and the full set. Also see no difference in terms of the distribution of correlations on individual sets of genes stratified by quantiles of expression (mean b/t species). Note that I've done that on just DE genes as well and see slightly shorter peaks on the bimodality for the middle quantiles of expression, which makes sense (extreme unlikely to be as severe b/t species). Also, to assess if this is legit at all, I now re-run this correlation analysis after permuting the Hi-C values. I shuffle sample IDs on a gene-by-gene basis to accomplish this:
cor.permuter <- function(hicdata, exprdata, nperm){
result <- data.frame(cor=NA, type=rep(1:nperm, each=nrow(exprdata)))
for(perm in 1:nperm){
permute <- hicdata
for(row in 1:nrow(hicdata)){
permute[row,] <- sample(hicdata[row,])
}
myindices <- which(result$type==perm)
result[myindices,1] <- cor.calc(permute, exprdata)
}
return(result)
}
#Just do it with 10 permutations to see the general effect quickly:
full.perm <- cor.permuter(mycontacts, myexprs, 10)
DE.perm <- cor.permuter(DEcontacts, DEexprs, 10)
nonDE.perm <- cor.permuter(nonDEcontacts, nonDEexprs, 10)
#Now visualize.
#FIGS4
ggplot(data=filter(ggcors, type=="all"|type=="DE"|type=="non-DE")) + stat_density(aes(x=cor, group=type, color=type, y=..scaled..), position="identity", geom="line") + stat_density(data=full.perm, aes(x=cor, group=type), geom="line", linetype="dotted", position="identity") + stat_density(data=DE.perm, aes(x=cor, group=type), geom="line", linetype="dashed", position="identity") + stat_density(data=nonDE.perm, aes(x=cor, group=type), geom="line", linetype="twodash", position="identity") + ggtitle("Correlation b/t RPKM Expression and Hi-C Contact Frequency") + xlab("Pearson Correlations, RPKM Expression & Hi-C Contact Frequency") + ylab("Density") + scale_color_manual("Gene Set", values=c("red", "blue", "green"), labels=c("All genes", "DE genes", "non-DE genes")) #FIGS4
#See that permuted datasets have a tighter correlation distribution with a strong peak at 0. Reassuring. In the future I will repeat this analysis on Midway2, running 10000 permutations to see the full range of permuted data possible.
####Deprecated, for use in species-specific analysis.
# plotfull <- data.frame(fullcors=fullcors, Hcors=humcors, Ccors=chimpcors)
# plotDE <- data.frame(fullcors=fullDEcors[,1], Hcors=humDEcors[,1], Ccors=chimpDEcors[,1])
#
# ggplot(data=plotfull) + stat_density(aes(x=fullcors), geom="line") + xlab("Pearson Correlations b/t RPKM Expression and Hi-C Contact Frequency") + ylab("Density") + ggtitle("Correlations between Hi-C and Expression, all genes")
# ggplot(data=plotfull) + stat_density(aes(x=Hcors), geom="line") + xlab("Pearson Correlations b/t RPKM Expression and Hi-C Contact Frequency") + ylab("Density") + ggtitle("Correlations between Hi-C and Expression, all genes, Humans")
# ggplot(data=plotfull) + stat_density(aes(x=Ccors), geom="line") + xlab("Pearson Correlations b/t RPKM Expression and Hi-C Contact Frequency") + ylab("Density") + ggtitle("Correlations between Hi-C and Expression, all genes, Chimps")
#
# ggplot(data=plotDE) + stat_density(aes(x=fullcors), geom="line") + xlab("Pearson Correlations b/t RPKM Expression and Hi-C Contact Frequency") + ylab("Density") + ggtitle("Correlations between Hi-C and Expression, DE genes")
# ggplot(data=plotDE) + stat_density(aes(x=Hcors), geom="line") + xlab("Pearson Correlations b/t RPKM Expression and Hi-C Contact Frequency") + ylab("Density") + ggtitle("Correlations between Hi-C and Expression, DE genes, Humans")
# ggplot(data=plotDE) + stat_density(aes(x=Ccors), geom="line") + xlab("Pearson Correlations b/t RPKM Expression and Hi-C Contact Frequency") + ylab("Density") + ggtitle("Correlations between Hi-C and Expression, DE genes, Chimps")
Now that I’ve seen some nice effects in the correlation between expression and contact for different sets of genes, and for permutations on those sets, I move to see what kind of quantitiative explanatory power differential contact (DC) might actually have for differential expression (DE).
In this next section I “regress out” the effect of Hi-C contacts from their overlapping genes’ RPKM expression values, comparing a linear model run on the base values to one run on the residuals of expression after regressing out Hi-C data. Comparing the p-values before and after this regression can give some sense of whether the DE is being driven by differential Hi-C contacts (DC).
###A function to calculate gene-wise correlations between Hi-C data and expression data, but for spearman correlations. Pearson correlation calculator function is in the chunk above.
cor.calc.spear <- function(hicdata, exprdata){
cor.exp <- vector(length=nrow(hicdata))
for(i in 1:nrow(hicdata)){
cor.exp[i] <- cor(as.numeric(hicdata[i,]), as.numeric(exprdata[i,]), method="spearman")
}
return(cor.exp)
}
###A function to permute a Hi-C df by going gene-by-gene (row-by-row) and shuffling all sample IDs.
shuffler <- function(hicdata){
for(row in 1:nrow(hicdata)){
hicdata[row,] <- sample(hicdata[row,])
}
return(hicdata)
}
###A function that runs a linear model, both with and without Hi-C corrected expression values, and returns a dataframe of hit classes (DE or not before and after correction). Also spits back out p-values before and after correction, as well as correlations between Hi-C data and expression data. Since the former is a one-row data frame of 4 points and the latter is a 3-column data frame with the number of genes rows, returns a list.
lmcorrect <- function(voom.obj, exprs, cov_matrix, meta_df){
mygenes <- cov_matrix$genes #Pull out the relevant genes here; used for subsetting the voom.obj in a bit.
cov_matrix <- cov_matrix[, -9] #Remove genes from the cov matrix
hic_present <- sapply(1:nrow(cov_matrix), function(i) !any(is.na(cov_matrix[i,]))) #First, remove any rows w/ missing Hi-C data.
exprs <- data.matrix(exprs[hic_present,]) #Filter expression with this
cov_matrix <- data.matrix(cov_matrix[hic_present,]) #Filter Hi-C data with this
#Now, prepare to run the actual models. First run a model w/ Hi-C as a covariate to evaluate
SP <- factor(meta_df$SP,levels = c("H","C"))
design <- model.matrix(~0+SP)
colnames(design) <- c("Human", "Chimp")
resid_hic <- array(0, dim=c(nrow(exprs), ncol(cov_matrix))) #Initialize a dataframe for storing the residuals.
for(i in 1:nrow(exprs)){#Loop through rows of the expression df, running linear modeling w/ Hi-C to obtain residuals.
resid_hic[i,] <- lm(exprs[i,]~cov_matrix[i,])$resid
}
mycon <- makeContrasts(HvC = Human-Chimp, levels = design)
#Filter the voom object to only contain genes that had Hi-C information here.
good.indices <- which(rownames(voom.obj$E) %in% mygenes)
voom.obj <- voom.obj[good.indices,]
#Now, replace the RPKM values in the voom object for after linear modeling with the residuals.
voom.obj.after <- voom.obj
voom.obj.after$E <- resid_hic
lmFit(voom.obj, design=design) %>% eBayes(.) %>% contrasts.fit(., mycon) %>% eBayes(.) %>% topTable(., coef = 1, adjust.method = "BH", number = Inf, sort.by="none") -> fit_before
lmFit(voom.obj.after, design=design) %>% eBayes(.) %>% contrasts.fit(., mycon) %>% eBayes(.) %>% topTable(., coef = 1, adjust.method = "BH", number = Inf, sort.by="none") -> fit_after
result.DE.cats <- data.frame(DEneither=sum(fit_before$adj.P.Val>0.05&fit_after$adj.P.Val>0.05), DEbefore=sum(fit_before$adj.P.Val<=0.05&fit_after$adj.P.Val>0.05), DEafter=sum(fit_before$adj.P.Val>0.05&fit_after$adj.P.Val<=0.05), DEboth=sum(fit_before$adj.P.Val<=0.05&fit_after$adj.P.Val<=0.05))
result.DE.stats <- data.frame(cor.pear=cor.calc(cov_matrix, exprs), cor.spear=cor.calc.spear(cov_matrix, exprs), pval.before=fit_before$adj.P.Val, pval.after=fit_after$adj.P.Val)
result <- list("categories"=result.DE.cats, "stats"=result.DE.stats)
return(result)
}
#Proceed to use the function on the different dataframes I've created with different sets of overlapping Hi-C contacts:
h_minFDR_pvals <- lmcorrect(weighted.data, hmin[,c("A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651")], hmin[,c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC", "genes")], meta.data)
c_minFDR_pvals <- lmcorrect(weighted.data, cmin[,c("A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651")], cmin[,c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC", "genes")], meta.data)
h_maxB_pvals <- lmcorrect(weighted.data, hmaxB[,c("A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651")], hmaxB[,c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC", "genes")], meta.data)
c_maxB_pvals <- lmcorrect(weighted.data, cmaxB[,c("A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651")], cmaxB[,c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC", "genes")], meta.data)
h_US_pvals <- lmcorrect(weighted.data, hUS[,c("A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651")], hUS[,c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC", "genes")], meta.data)
c_US_pvals <- lmcorrect(weighted.data, cUS[,c("A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651")], cUS[,c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC", "genes")], meta.data)
#I now proceed to visualize the difference in p-values for expression before and after "correcting" for the Hi-C data. I am hoping to see many hits in the bottom right quadrant of the following plots, indicating genes that showed up as DE before Hi-C correction, but not after. I also expect that p-values falling farther away from the null line of expectation for p-values being identical between models will have higher correlations, which I color here for the pearson correlation (results look similar for spearman).
ggplot(data=h_minFDR_pvals$stats, aes(x=-log10(pval.before), y=-log10(pval.after), color=abs(cor.pear))) + geom_point(size=0.01) + geom_hline(yintercept=-log10(0.05), color="red") + geom_vline(xintercept=-log10(0.05), color="red") + geom_abline(slope=1, intercept=0, color="green", linetype="dashed") + ggtitle("Evidence for DE before vs. after regressing out Hi-C contact frequency") + xlab("-log10(p-value of DE before Hi-C regression)") + ylab("-log10(p-value of DE after Hi-C regression")
ggplot(data=h_minFDR_pvals$stats, aes(x=-log10(pval.before), y=-log10(pval.after))) + geom_point(size=0.01) + geom_hline(yintercept=-log10(0.05), color="red") + geom_vline(xintercept=-log10(0.05), color="red") + geom_abline(slope=1, intercept=0, color="green", linetype="dashed") + ggtitle("Evidence for DE Before vs. After Regressing out Hi-C Contact Frequency") + xlab("-log10(p-value of DE before Hi-C regression)") + ylab("-log10(p-value of DE after Hi-C regression)") + theme(plot.title=element_text(hjust=1)) #Slightly cleaner, for MindBytes 2018 poster
ggplot(data=h_maxB_pvals$stats, aes(x=-log10(pval.before), y=-log10(pval.after), color=abs(cor.pear))) + geom_point(size=0.01) + geom_hline(yintercept=-log10(0.05), color="red") + geom_vline(xintercept=-log10(0.05), color="red") + geom_abline(slope=1, intercept=0, color="green", linetype="dashed") + ggtitle("Evidence for DE before vs. after regressing out Hi-C max beta (H)") + xlab("-log10(p-value of DE before Hi-C regression)") + ylab("-log10(p-value of DE after Hi-C regression")
ggplot(data=h_US_pvals$stats, aes(x=-log10(pval.before), y=-log10(pval.after), color=abs(cor.spear))) + geom_point(size=0.01) + geom_hline(yintercept=-log10(0.05), color="red") + geom_vline(xintercept=-log10(0.05), color="red") + geom_abline(slope=1, intercept=0, color="green", linetype="dashed") + ggtitle("Evidence for DE before vs. after regressing out Hi-C US bin FDR (H)") + xlab("-log10(p-value of DE before Hi-C regression)") + ylab("-log10(p-value of DE after Hi-C regression")
ggplot(data=c_minFDR_pvals$stats, aes(x=-log10(pval.before), y=-log10(pval.after), color=abs(cor.pear))) + geom_point(size=0.01) + geom_hline(yintercept=-log10(0.05), color="red") + geom_vline(xintercept=-log10(0.05), color="red") + geom_abline(slope=1, intercept=0, color="green", linetype="dashed") + ggtitle("Evidence for DE before vs. after regressing out Hi-C min. FDR (C)") + xlab("-log10(p-value of DE before Hi-C regression)") + ylab("-log10(p-value of DE after Hi-C regression")
ggplot(data=c_maxB_pvals$stats, aes(x=-log10(pval.before), y=-log10(pval.after), color=abs(cor.pear))) + geom_point(size=0.01) + geom_hline(yintercept=-log10(0.05), color="red") + geom_vline(xintercept=-log10(0.05), color="red") + geom_abline(slope=1, intercept=0, color="green", linetype="dashed") + ggtitle("Evidence for DE before vs. after regressing out Hi-C max beta (C)") + xlab("-log10(p-value of DE before Hi-C regression)") + ylab("-log10(p-value of DE after Hi-C regression")
ggplot(data=c_US_pvals$stats, aes(x=-log10(pval.before), y=-log10(pval.after), color=abs(cor.pear))) + geom_point(size=0.01) + geom_hline(yintercept=-log10(0.05), color="red") + geom_vline(xintercept=-log10(0.05), color="red") + geom_abline(slope=1, intercept=0, color="green", linetype="dashed") + ggtitle("Evidence for DE before vs. after regressing out Hi-C US bin FDR (C)") + xlab("-log10(p-value of DE before Hi-C regression)") + ylab("-log10(p-value of DE after Hi-C regression")
Critically, here I see that “regressing out” Hi-C data and trying to model expression again moves many genes from being significantly differentially expressed (at FDR of 5%) to no longer showing differential expression. Seeing most of the hits in the bottom right corner of these visualizations is what confirms this. Reassuringly, I also see stronger correlations between RPKM expression values and normalized Hi-C interaction frequency values for points farther away from the diagonal green line of expectation. However, since this is not a statistical test, I have no assessment of significance. To accomplish this, I compare these results to running the same kind of analysis on permuted data in the next section.
Here, I visualize the difference in classes of DE hits changing (either gaining, losing, or maintaining DE status) after “regressing out” the effect of Hi-C from expression. I show distributions of these percentages across 10000 permutations of the data, as compared to the observed percentages.
##Visualization of the permutations!
perm.vis <- function(categories.df.file, df, metadata, hictype, DE=FALSE){
if(DE==TRUE){expected1 <- readRDS(paste("~/Desktop/Hi-C/HiC_covs/results/new_results/DE/batch1/", categories.df.file, sep=""))
expected2 <- readRDS(paste("~/Desktop/Hi-C/HiC_covs/results/new_results/DE/batch2/", categories.df.file, sep=""))
observed <- lmcorrect(weighted.data, df[which(df$adj.P.Val<=0.05), c("A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651")], df[which(df$adj.P.Val<=0.05), c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC", "genes")], metadata)
}
else{expected1 <- readRDS(paste("~/Desktop/Hi-C/HiC_covs/results/new_results/batch1/", categories.df.file, sep=""))
expected2 <- readRDS(paste("~/Desktop/Hi-C/HiC_covs/results/new_results/batch2/", categories.df.file, sep=""))
observed <- lmcorrect(weighted.data, df[,c("A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651")], df[,c("A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC", "genes")], metadata)
}
expected <- c(expected1[1:5000], expected2[1:5000])
expected[[10001]] <- rbind(expected1[[5001]], expected2[[5001]])
expected[[10001]] <- ((expected[[10001]])/sum(expected[[10001]][1,]))*100
for(i in 1:(length(expected)-1)){
expected[[i]]$type <- i
expected[[i]]$data <- "expected"
}
observed$categories <- (observed$categories/sum(observed$categories[1,]))*100
observed$stats$type <- 10001
observed$stats$data <- "observed"
expected[[10001]]$data <- "permutations"
expected[[10001]][10001,] <- c(observed$categories, "observation")
deneitherbox <- ggplot(data=expected[[10001]], aes(x="", y=DEneither)) + geom_boxplot(aes(color="Expected"), show.legend=FALSE) + geom_point(aes(y=observed$categories$DEneither, color="Observed"), size=3) + ggtitle("Percent of genes with no evidence for DE regardless") + ylab("% genes not DE in either") + xlab(paste("10000 Permutations of Hi-C data ", hictype, sep="")) + scale_color_manual(values=c("blue", "red"), guide=FALSE) +guides(color=guide_legend("Data", override.aes = list(shape=c(16, 16))))
debeforebox <- ggplot(data=expected[[10001]], aes(x="", y=DEbefore)) + geom_boxplot(aes(color="Expected"), show.legend=FALSE) + geom_point(aes(y=observed$categories$DEbefore, color="Observed"), size=3) + ggtitle("Percent of genes with reduced evidence for DE after Hi-C correction") + ylab("% genes DE before, but not after, Hi-C correction") + xlab(paste("10000 Permutations of Hi-C data ", hictype, sep="")) + scale_color_manual(values=c("blue", "red"), guide=FALSE) +guides(color=guide_legend("Data", override.aes = list(shape=c(16, 16))))
deafterbox <- ggplot(data=expected[[10001]], aes(x="", y=DEafter)) + geom_boxplot(aes(color="Expected"), show.legend=FALSE) + geom_point(aes(y=observed$categories$DEafter, color="Observed"), size=3) + ggtitle("Percent of genes with increased evidence for DE after Hi-C correction") + ylab("% genes DE after, but not before, Hi-C correction") + xlab(paste("10000 Permutations of Hi-C data ", hictype, sep="")) + scale_color_manual(values=c("blue", "red"), guide=FALSE) +guides(color=guide_legend("Data", override.aes = list(shape=c(16, 16))))
debothbox <- ggplot(data=expected[[10001]], aes(x="", y=DEboth)) + geom_boxplot(aes(color="Expected"), show.legend=FALSE) + geom_point(aes(y=observed$categories$DEboth, color="Observed"), size=3) + ggtitle("Percent of genes with evidence for DE regardless") + ylab("% genes DE in both") + xlab(paste("10000 Permutations of Hi-C data ", hictype, sep="")) + scale_color_manual(values=c("blue", "red"), guide=FALSE) +guides(color=guide_legend("Data", override.aes = list(shape=c(16, 16))))
print(deneitherbox)
print(debeforebox)
print(deafterbox)
print(debothbox)
}
perm.vis("permout_h_minFDR", hmin, meta.data, "(min FDR contact, Humans)")
perm.vis("permout_h_maxB", hmaxB, meta.data, "(max beta contact, Humans)")
perm.vis("permout_h_US", hUS, meta.data, "(upstream contact, Humans)")
perm.vis("permout_c_minFDR", cmin, meta.data, "(min FDR contact, Chimps)")
perm.vis("permout_c_maxB", cmaxB, meta.data, "(max beta contact, Chimps)")
perm.vis("permout_c_US", cUS, meta.data, "(upstream contact, Chimps)")
perm.vis("permout_h_minFDR_DE", hminDE, meta.data, "(min FDR contact, Humans)", DE=TRUE)
perm.vis("permout_h_maxB_DE", hmaxBDE, meta.data, "(max beta contact, Humans)", DE=TRUE)
perm.vis("permout_h_US_DE", hUSDE, meta.data, "(upstream contact, Humans)", DE=TRUE)
perm.vis("permout_c_minFDR_DE", cminDE, meta.data, "(min FDR contact, Chimps)", DE=TRUE)
perm.vis("permout_c_maxB_DE", cmaxBDE, meta.data, "(max beta contact, Chimps)", DE=TRUE)
perm.vis("permout_c_US_DE", cUSDE, meta.data, "(upstream contact, Chimps)", DE=TRUE)
The permutations reveal that the observed data does appear pretty significant: it falls well outside the range of the distributions in each of the categories. The most important is showing that many more genes lose their DE status after regressing out Hi-C data in the observed case as compared to the permutations.
Here, I look at A/B compartment calls from HOMER and see if DE and/or DC genes are more likely to be in different compartments between species.
###Start with A/B compartments; much simpler. Read them in!
A_AB <- fread("~/Desktop/Hi-C/AB_compartments/50kb/A-21792.PC1.txt", header = TRUE, data.table=FALSE)
B_AB <- fread("~/Desktop/Hi-C/AB_compartments/50kb/B-28126.PC1.txt", header = TRUE, data.table=FALSE)
E_AB <- fread("~/Desktop/Hi-C/AB_compartments/50kb/E-28815.PC1.txt", header = TRUE, data.table=FALSE)
F_AB <- fread("~/Desktop/Hi-C/AB_compartments/50kb/F-28834.PC1.txt", header = TRUE, data.table=FALSE)
C_AB <- fread("~/Desktop/Hi-C/AB_compartments/50kb/C-3649.PC1.txt", header = TRUE, data.table=FALSE)
D_AB <- fread("~/Desktop/Hi-C/AB_compartments/50kb/D-40300.PC1.txt", header = TRUE, data.table=FALSE)
G_AB <- fread("~/Desktop/Hi-C/AB_compartments/50kb/G-3624.PC1.txt", header = TRUE, data.table=FALSE)
H_AB <- fread("~/Desktop/Hi-C/AB_compartments/50kb/H-3651.PC1.txt", header = TRUE, data.table=FALSE)
colnames(A_AB) <- colnames(B_AB) <- colnames(C_AB) <- colnames(D_AB) <- colnames(E_AB) <- colnames(F_AB) <- colnames(G_AB) <- colnames(H_AB) <- c("peakID", "chr", "start", "end", "strand", "PC1")
#Interestingly, within a species they're not all the same length (which doesn't make much sense as I thought these were identical genomes). A and F are females, B and E are males, so that's why B and E are larger--just remove chrY data to deal with that. But why aren't B and E identical, or A and F identical? Perhaps this is because not every individual gets enough reads at each of the regions to call compartments? Not really clear to me...for now, use ALL regions I can by combining these DFs, taking the mean PC1 value across individuals for those hits which ARE repeated (should be the vast majority).
humans.AB <- rbind(A_AB, B_AB, E_AB, F_AB)
humans.AB <- group_by(humans.AB, peakID) %>% summarise(chr=unique(chr), start=unique(start), end=unique(end), strand=unique(strand), PC1=mean(PC1))
chimps.AB <- rbind(C_AB, D_AB, G_AB, H_AB)
chimps.AB <- group_by(chimps.AB, peakID) %>% summarise(chr=unique(chr), start=unique(start), end=unique(end), strand=unique(strand), PC1=mean(PC1))
#Now, I want to assign a compartment (A or B) to each of these loci, so that they can then be examined WRT the Hi-C bins that might overlap them.
humans.AB$compartment <- ifelse(humans.AB$PC1>0, "A", "B")
chimps.AB$compartment <- ifelse(chimps.AB$PC1>0, "A", "B")
#Really, only want to look at genes we can actually call DE on based on expression levels, so only care about those genes in detable object. Can extract their exact positions from hgene.hic and cgene.hic and create peakIDs for them that match the AB compartment peak names. Basically, I just want to call which of the AB compartment peak bins they'll fall into. Note that those compartments were made 50kb at a time, so I need to round to lower 50k (or higher, depending on strand and proximity to boundary!) to figure out which compartment a gene belongs to.
colnames(hgene.hic) <- c("Hchr", "Hstart", "Hend", "genes", "Hplace", "Hstrand", "binchr", "binstart", "binend", "bindist.TSS")
colnames(cgene.hic) <- c("Cchr", "Cstart", "Cend", "genes", "Cplace", "Cstrand", "binchr", "binstart", "binend", "bindist.TSS")
#Add in lengths for appropriate PCA bin assignment in next part.
colnames(human_lengths) <- colnames(chimp_lengths) <- c("genes", "length")
hgene.PCA <- left_join(hgene.hic, human_lengths, by="genes")
cgene.PCA <- left_join(cgene.hic, chimp_lengths, by="genes")
#Assign genes to A/B compartment bins based on their position along the 50kb interval and their gene length, as well as strand orientation (trying to get the gene body in the actual 50kb interval).
hgene.PCA$HID <- paste(hgene.PCA$Hchr, ifelse(hgene.PCA$Hstrand=="+", ifelse(hgene.PCA$Hstart%%50000>=(50000-hgene.PCA$length), as.integer(50000*ceiling(hgene.PCA$Hstart/50000)), as.integer(50000*floor(hgene.PCA$Hstart/50000))), ifelse(hgene.PCA$Hstart%%50000<=hgene.PCA$length, as.integer((50000*floor(hgene.PCA$Hstart/50000)-50000)), as.integer(50000*floor(hgene.PCA$Hstart/50000)))), sep="-")
cgene.PCA$CID <- paste(cgene.PCA$Cchr, ifelse(cgene.PCA$Cstrand=="+", ifelse(cgene.PCA$Cstart%%50000>=(50000-cgene.PCA$length), as.integer(50000*ceiling(cgene.PCA$Cstart/50000)), as.integer(50000*floor(cgene.PCA$Cstart/50000))), ifelse(cgene.PCA$Cstart%%50000<=cgene.PCA$length, as.integer((50000*floor(cgene.PCA$Cstart/50000)-50000)), as.integer(50000*floor(cgene.PCA$Cstart/50000)))), sep="-")
#hgene.hic$HID <- ifelse(hgene.hic$Hstrand=="+", paste(hgene.hic$Hchr, as.integer(50000*floor(hgene.hic$Hstart/50000)), sep="-"))
#cgene.hic$CID <- paste(cgene.hic$Cchr, as.integer(50000*floor(cgene.hic$Cstart/50000)), sep="-")
hgenes.compartments <- select(hgene.PCA, Hchr, Hstart, genes, Hstrand, HID)
cgenes.compartments <- select(cgene.PCA, Cchr, Cstart, genes, Cstrand, CID)
compartment.genes <- left_join(detable, hgenes.compartments, by="genes") %>% left_join(., cgenes.compartments, by="genes")
#See how many genes fall within 5 kb of one of the boundaries here (at every 50kb interval): Though this was only considering the lower end, not if they fall on the other edge of the boundary (45k modulo remainder is equivalent and comparable #s!). REALLY what I should care about are positive strand genes falling at the upper edge of a boundary (implying their gene body may lie in the next compartment) and negative strand genes falling at the lower edge of a boundary (implying their gene body may lie in the previous compartment). If I have positive strand genes near the beginning or negative strand genes near the end that's not really an issue as their gene bodies should lie entirely within the compartment. It's really just those two specific cases.
sum((compartment.genes$Hstart%%50000)<=5000, na.rm=TRUE)
[1] 1065
sum((compartment.genes$Cstart%%50000)<=5000, na.rm=TRUE)
[1] 1094
#In both species, roughly 1k of the genes we can consider at all fall near (w/in 5kb) one of the A/B boundaries we've established here. Could there be issues with pos/neg strand and how close they are to the boundary? I can check on this...or can also assume it's balanced across both issues. If the results from a naive analysis don't look good I can go back and fix this part of that.
#Now, merge compartment.genes and the humans.AB and chimps.AB compartment assignment DFs!
HAB <- select(humans.AB, peakID, compartment, PC1)
CAB <- select(chimps.AB, peakID, compartment, PC1)
colnames(HAB) <- c("HID", "compartment.H", "PC.H")
colnames(CAB) <- c("CID", "compartment.C", "PC.C")
final.AB <- left_join(compartment.genes, HAB, by="HID") %>% left_join(., CAB, by="CID")
final.AB$comp.diff <- ifelse(final.AB$compartment.H==final.AB$compartment.C, "no", "yes") #Note that 311 of these rows have missing data from at least one species so this determination can't be made? Often b/c of being on a scaffold in chimps.
final.AB$PC.diff <- abs(final.AB$PC.H-final.AB$PC.C)
final.AB <- final.AB[order(final.AB$adj.P.Val),]
final.AB$DE <- ifelse(final.AB$adj.P.Val<=0.05, "yes", "no")
table(final.AB$DE, final.AB$comp.diff, dnn=c("DE?", "Different Compartment b/t Species?"))
Different Compartment b/t Species?
DE? no yes
no 7861 699
yes 2026 175
chisq.test(table(final.AB$DE, final.AB$comp.diff, dnn=c("DE?", "Same Compartment b/t Species?")))
Pearson's Chi-squared test with Yates' continuity correction
data: table(final.AB$DE, final.AB$comp.diff, dnn = c("DE?", "Same Compartment b/t Species?"))
X-squared = 0.081518, df = 1, p-value = 0.7753
#So roughly 7.9% of DE genes are in different compartments between species, whereas ~8.1% of non-DE genes are in different compartments between species...shitty! And not significant!
#Would be nice to know if the differential compartments between species are at least concentrated in those differential contacts b/t species?! Add a column of the DC status between species.
final.AB$DC <- ifelse(final.AB$genes %in% filter(gene.hic.filt, min_FDR.H<=0.05|min_FDR.C<=0.05)$genes, "yes", "no")
table(final.AB$DC, final.AB$comp.diff, dnn=c("DC?", "Different Compartment b/t Species?"))
Different Compartment b/t Species?
DC? no yes
no 9267 831
yes 620 43
chisq.test(table(final.AB$DC, final.AB$comp.diff, dnn=c("DC?", "Different Compartment b/t Species?")))
Pearson's Chi-squared test with Yates' continuity correction
data: table(final.AB$DC, final.AB$comp.diff, dnn = c("DC?", "Different Compartment b/t Species?"))
X-squared = 2.3066, df = 1, p-value = 0.1288
#Again, not significant...can see that the genes which are DC have ~6.5% of them in different compartments b/t species, while the non-DC ones have 8.2% of them in different compartments b/t species. Note that this is also the exact opposite result I was hoping for...
#So a simple classification system is not sufficient for A/B compartment changes between the species. What if I actually took the mean PC value and plotted the difference bewtween species vs decreasing significance? The final.AB df is already ordered by DE p-val, so can do this real quick simply:
ggplot(data=final.AB) + geom_line(aes(x=adj.P.Val, y=PC.diff)) + ggtitle("Absolute Difference of PC1 Ordered by DE Significance") + xlab("Decreasing DE Significance") + ylab("|PC1 Difference b/t Species|")
t.test(filter(final.AB, adj.P.Val<=0.05)$PC.diff, filter(final.AB, adj.P.Val>0.05)$PC.diff)
Welch Two Sample t-test
data: filter(final.AB, adj.P.Val <= 0.05)$PC.diff and filter(final.AB, adj.P.Val > 0.05)$PC.diff
t = -0.41452, df = 3592.6, p-value = 0.6785
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.01694799 0.01103238
sample estimates:
mean of x mean of y
0.2271048 0.2300626
ggplot(data=final.AB) + geom_violin(aes(group=DE, x=DE, y=PC.diff), draw_quantiles = c(0.25, 0.5, 0.75)) + ggtitle("Absolute Difference of PC1 | DE") + xlab("DE Status") + ylab("|PC1 Difference b/t Species|")
Warning: Removed 313 rows containing non-finite values (stat_ydensity).
#None of this is particularly encouraging...but what if I did it on DC status? Can just check on binary status here to see if it'll even be significant or not:
t.test(filter(final.AB, DC=="yes")$PC.diff, filter(final.AB, DC=="no")$PC.diff)
Welch Two Sample t-test
data: filter(final.AB, DC == "yes")$PC.diff and filter(final.AB, DC == "no")$PC.diff
t = 1.1368, df = 750.12, p-value = 0.256
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.01033890 0.03878608
sample estimates:
mean of x mean of y
0.2428049 0.2285813
#Aaaaand it's not. My inclination is to not spend any more time on the PCA element of this since the results don't seem to be good at all...
#Just a quick check to see if not using absolute difference would be better?
final.AB$PC.hdiff <- final.AB$PC.C-final.AB$PC.H
ggplot(data=final.AB) + geom_line(aes(x=adj.P.Val, y=PC.hdiff)) + ggtitle("Difference of PC1 Ordered by DE Significance") + xlab("Decreasing DE Significance") + ylab("PC1 Difference b/t Species (H-C)")
t.test(filter(final.AB, adj.P.Val<=0.05)$PC.hdiff, filter(final.AB, adj.P.Val>0.05)$PC.hdiff)
Welch Two Sample t-test
data: filter(final.AB, adj.P.Val <= 0.05)$PC.hdiff and filter(final.AB, adj.P.Val > 0.05)$PC.hdiff
t = -0.77796, df = 3544.3, p-value = 0.4366
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.02450552 0.01058275
sample estimates:
mean of x mean of y
-0.03009098 -0.02312960
ggplot(data=final.AB) + geom_violin(aes(group=DE, x=DE, y=PC.hdiff), draw_quantiles = c(0.25, 0.5, 0.75)) + ggtitle("Difference of PC1 | DE") + xlab("DE Status") + ylab("PC1 Difference b/t Species (H-C)")
Warning: Removed 313 rows containing non-finite values (stat_ydensity).
#None of this is particularly encouraging...but what if I did it on DC status? Can just check on binary status here to see if it'll even be significant or not:
t.test(filter(final.AB, DC=="yes")$PC.hdiff, filter(final.AB, DC=="no")$PC.hdiff) #Interestingly, if starting with humans as baseline here, this p-val is ALMOST significant!!!! Same if starting with chimps--just not using absolute value makes this almost significant...
Welch Two Sample t-test
data: filter(final.AB, DC == "yes")$PC.hdiff and filter(final.AB, DC == "no")$PC.hdiff
t = -1.9217, df = 747.95, p-value = 0.05503
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.0609932583 0.0006511334
sample estimates:
mean of x mean of y
-0.05286562 -0.02269456
#Note that I also tried coloring the differences of PC1 plots by DC status and saw lines of both short and long variety, no clear trend of a bigger difference b/t species there, so no point in putting in the work to look at DC on x-axis.
#With the finished AB compartment DFs, should be as easy as just assigning Hi-C bins to A or B! Note that this current paradigm is limited to the genes that had direct overlap with a significant Hi-C bin; there's no reason for this! FIX! gene.hic.filt may not be the best data frame for this to work on then, since it doesn't include position information on all genes--get that!
# ABall <- select(gene.hic.filt, genes, logFC, AveExpr, adj.P.Val, B, HID, min_FDR_bin.H, CID, min_FDR_bin.C)
# hcomps <- select(humans.AB, peakID, compartment)
# ccomps <- select(chimps.AB, peakID, compartment)
# colnames(hcomps) <- c("HID", "compartment.H")
# colnames(ccomps) <- c("CID", "compartment.C")
# left_join(ABall, hcomps, by="HID") %>% left_join(., ccomps, by="CID") -> AB.all
#
# head(gene.hic.filt)
#Exploration of why they're not identically sized, not important. Also show some within-species exploration of the variance in these values; probably also not important. Collapsed.
# good.indices <- which(B_AB$`#peakID` %in% E_AB$`#peakID`)
# test <- B_AB[-good.indices,]
# good.indices <- which(E_AB$`#peakID` %in% B_AB$`#peakID`)
# test2 <- E_AB[-good.indices,]
#See how much within-species variance exists in these, after standardizing them to be identical across individuals from same species (remove X chromosome?). Or just do left joins on one and see what variance is like in that subset--most
# left_join(A_AB, B_AB, by="peakID") %>% left_join(., E_AB, by="peakID") %>% left_join(., F_AB, by="peakID") -> newtest
# newtest <- select(newtest, peakID, PC1.x, PC1.y, PC1.x.x, PC1.y.y)
# colnames(newtest) <- c("peakID", "A", "B", "E", "F")
# newtest$var <- apply(newtest[,-1], 1, var)
# ggplot(data=newtest) + geom_violin(aes(x="", y=var))
# sum(newtest$var>1, na.rm=TRUE) #Returns ~1000, so of about 54k rows there are 1000 with high variance between individuals. Leave it be for now, just something to think about in case this looks wonky later.
#Look at some candidates of the strongest compartment differences real quick to see if anything interesting here?
quantile(final.AB$PC.diff, probs=c(0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.99), na.rm = TRUE)
50% 60% 70% 80% 85% 90% 95%
0.1283966 0.1742264 0.2370739 0.3225847 0.3867206 0.4989470 0.7753319
99%
1.7327811
theonepercent <- which(final.AB$PC.diff>=1.7327811)
extremes <- final.AB[theonepercent,]
#I went back with some of this to look...nothing looks great TBH. For what it's worth, at least there are more instances in compartment differences where it's open in humans and closed in chimps than vice versa:
sum(final.AB$comp.diff=="yes"&final.AB$compartment.H=="A"&final.AB$compartment.C=="B", na.rm=TRUE) #Open in humans, closed in chimps
[1] 513
sum(final.AB$comp.diff=="yes"&final.AB$compartment.H=="B"&final.AB$compartment.C=="A", na.rm=TRUE) #Open in chimps, closed in humans
[1] 361
####25kb PCA A/B compartment analysis. Repeat this entire analysis utilizing 25kb as the window size for PCA calls.
###Start with A/B compartments; much simpler. Read them in!
A_AB <- fread("~/Desktop/Hi-C/AB_compartments/25kb/A-21792.PC1.txt", header = TRUE, data.table=FALSE)
B_AB <- fread("~/Desktop/Hi-C/AB_compartments/25kb/B-28126.PC1.txt", header = TRUE, data.table=FALSE)
E_AB <- fread("~/Desktop/Hi-C/AB_compartments/25kb/E-28815.PC1.txt", header = TRUE, data.table=FALSE)
F_AB <- fread("~/Desktop/Hi-C/AB_compartments/25kb/F-28834.PC1.txt", header = TRUE, data.table=FALSE)
C_AB <- fread("~/Desktop/Hi-C/AB_compartments/25kb/C-3649.PC1.txt", header = TRUE, data.table=FALSE)
D_AB <- fread("~/Desktop/Hi-C/AB_compartments/25kb/D-40300.PC1.txt", header = TRUE, data.table=FALSE)
G_AB <- fread("~/Desktop/Hi-C/AB_compartments/25kb/G-3624.PC1.txt", header = TRUE, data.table=FALSE)
H_AB <- fread("~/Desktop/Hi-C/AB_compartments/25kb/H-3651.PC1.txt", header = TRUE, data.table=FALSE)
colnames(A_AB) <- colnames(B_AB) <- colnames(C_AB) <- colnames(D_AB) <- colnames(E_AB) <- colnames(F_AB) <- colnames(G_AB) <- colnames(H_AB) <- c("peakID", "chr", "start", "end", "strand", "PC1")
#Interestingly, within a species they're not all the same length (which doesn't make much sense as I thought these were identical genomes). A and F are females, B and E are males, so that's why B and E are larger--just remove chrY data to deal with that. But why aren't B and E identical, or A and F identical? Perhaps this is because not every individual gets enough reads at each of the regions to call compartments? Not really clear to me...for now, use ALL regions I can by combining these DFs, taking the mean PC1 value across individuals for those hits which ARE repeated (should be the vast majority).
humans.AB <- rbind(A_AB, B_AB, E_AB, F_AB)
humans.AB <- group_by(humans.AB, peakID) %>% summarise(chr=unique(chr), start=unique(start), end=unique(end), strand=unique(strand), PC1=mean(PC1))
chimps.AB <- rbind(C_AB, D_AB, G_AB, H_AB)
chimps.AB <- group_by(chimps.AB, peakID) %>% summarise(chr=unique(chr), start=unique(start), end=unique(end), strand=unique(strand), PC1=mean(PC1))
#Now, I want to assign a compartment (A or B) to each of these loci, so that they can then be examined WRT the Hi-C bins that might overlap them.
humans.AB$compartment <- ifelse(humans.AB$PC1>0, "A", "B")
chimps.AB$compartment <- ifelse(chimps.AB$PC1>0, "A", "B")
#Really, only want to look at genes we can actually call DE on based on expression levels, so only care about those genes in detable object. Can extract their exact positions from hgene.hic and cgene.hic and create peakIDs for them that match the AB compartment peak names. Basically, I just want to call which of the AB compartment peak bins they'll fall into. Note that those compartments were made 50kb at a time, so I need to round to lower 50k (or higher, depending on strand and proximity to boundary!) to figure out which compartment a gene belongs to.
colnames(hgene.hic) <- c("Hchr", "Hstart", "Hend", "genes", "Hplace", "Hstrand", "binchr", "binstart", "binend", "bindist.TSS")
colnames(cgene.hic) <- c("Cchr", "Cstart", "Cend", "genes", "Cplace", "Cstrand", "binchr", "binstart", "binend", "bindist.TSS")
#Add in lengths for appropriate PCA bin assignment in next part.
colnames(human_lengths) <- colnames(chimp_lengths) <- c("genes", "length")
hgene.PCA <- left_join(hgene.hic, human_lengths, by="genes")
cgene.PCA <- left_join(cgene.hic, chimp_lengths, by="genes")
#Assign genes to A/B compartment bins based on their position along the 50kb interval and their gene length, as well as strand orientation (trying to get the gene body in the actual 50kb interval).
hgene.PCA$HID <- paste(hgene.PCA$Hchr, ifelse(hgene.PCA$Hstrand=="+", ifelse(hgene.PCA$Hstart%%25000>=(25000-hgene.PCA$length), as.integer(25000*ceiling(hgene.PCA$Hstart/25000)), as.integer(25000*floor(hgene.PCA$Hstart/25000))), ifelse(hgene.PCA$Hstart%%25000<=hgene.PCA$length, as.integer((25000*floor(hgene.PCA$Hstart/25000)-25000)), as.integer(25000*floor(hgene.PCA$Hstart/25000)))), sep="-")
cgene.PCA$CID <- paste(cgene.PCA$Cchr, ifelse(cgene.PCA$Cstrand=="+", ifelse(cgene.PCA$Cstart%%25000>=(25000-cgene.PCA$length), as.integer(25000*ceiling(cgene.PCA$Cstart/25000)), as.integer(25000*floor(cgene.PCA$Cstart/25000))), ifelse(cgene.PCA$Cstart%%25000<=cgene.PCA$length, as.integer((25000*floor(cgene.PCA$Cstart/25000)-25000)), as.integer(25000*floor(cgene.PCA$Cstart/25000)))), sep="-")
#hgene.hic$HID <- ifelse(hgene.hic$Hstrand=="+", paste(hgene.hic$Hchr, as.integer(25000*floor(hgene.hic$Hstart/25000)), sep="-"))
#cgene.hic$CID <- paste(cgene.hic$Cchr, as.integer(25000*floor(cgene.hic$Cstart/25000)), sep="-")
hgenes.compartments <- select(hgene.PCA, Hchr, Hstart, genes, Hstrand, HID)
cgenes.compartments <- select(cgene.PCA, Cchr, Cstart, genes, Cstrand, CID)
compartment.genes <- left_join(detable, hgenes.compartments, by="genes") %>% left_join(., cgenes.compartments, by="genes")
#See how many genes fall within 5 kb of one of the boundaries here (at every 50kb interval): Though this was only considering the lower end, not if they fall on the other edge of the boundary (45k modulo remainder is equivalent and comparable #s!). REALLY what I should care about are positive strand genes falling at the upper edge of a boundary (implying their gene body may lie in the next compartment) and negative strand genes falling at the lower edge of a boundary (implying their gene body may lie in the previous compartment). If I have positive strand genes near the beginning or negative strand genes near the end that's not really an issue as their gene bodies should lie entirely within the compartment. It's really just those two specific cases.
sum((compartment.genes$Hstart%%25000)<=5000, na.rm=TRUE)
[1] 2167
sum((compartment.genes$Cstart%%25000)<=5000, na.rm=TRUE)
[1] 2182
#In both species, roughly 1k of the genes we can consider at all fall near (w/in 5kb) one of the A/B boundaries we've established here. Could there be issues with pos/neg strand and how close they are to the boundary? I can check on this...or can also assume it's balanced across both issues. If the results from a naive analysis don't look good I can go back and fix this part of that.
#Now, merge compartment.genes and the humans.AB and chimps.AB compartment assignment DFs!
HAB <- select(humans.AB, peakID, compartment, PC1)
CAB <- select(chimps.AB, peakID, compartment, PC1)
colnames(HAB) <- c("HID", "compartment.H", "PC.H")
colnames(CAB) <- c("CID", "compartment.C", "PC.C")
final.AB <- left_join(compartment.genes, HAB, by="HID") %>% left_join(., CAB, by="CID")
final.AB$comp.diff <- ifelse(final.AB$compartment.H==final.AB$compartment.C, "no", "yes") #Note that 311 of these rows have missing data from at least one species so this determination can't be made? Often b/c of being on a scaffold in chimps.
final.AB$PC.diff <- abs(final.AB$PC.H-final.AB$PC.C)
final.AB <- final.AB[order(final.AB$adj.P.Val),]
final.AB$DE <- ifelse(final.AB$adj.P.Val<=0.05, "yes", "no")
table(final.AB$DE, final.AB$comp.diff, dnn=c("DE?", "Different Compartment b/t Species?"))
Different Compartment b/t Species?
DE? no yes
no 7968 570
yes 2048 145
chisq.test(table(final.AB$DE, final.AB$comp.diff, dnn=c("DE?", "Same Compartment b/t Species?")))
Pearson's Chi-squared test with Yates' continuity correction
data: table(final.AB$DE, final.AB$comp.diff, dnn = c("DE?", "Same Compartment b/t Species?"))
X-squared = 0.0035226, df = 1, p-value = 0.9527
#So roughly 6.61% of DE genes are in different compartments between species, whereas 6.68% of non-DE genes are in different compartments between species...shitty! And not significant!
#Would be nice to know if the differential compartments between species are at least concentrated in those differential contacts b/t species?! Add a column of the DC status between species.
final.AB$DC <- ifelse(final.AB$genes %in% filter(gene.hic.filt, min_FDR.H<=0.05|min_FDR.C<=0.05)$genes, "yes", "no")
table(final.AB$DC, final.AB$comp.diff, dnn=c("DC?", "Different Compartment b/t Species?"))
Different Compartment b/t Species?
DC? no yes
no 9407 674
yes 609 41
chisq.test(table(final.AB$DC, final.AB$comp.diff, dnn=c("DC?", "Different Compartment b/t Species?")))
Pearson's Chi-squared test with Yates' continuity correction
data: table(final.AB$DC, final.AB$comp.diff, dnn = c("DC?", "Different Compartment b/t Species?"))
X-squared = 0.086185, df = 1, p-value = 0.7691
#Again, not significant...can see that the genes which are DC have ~6.3% of them in different compartments b/t species, while the non-DC ones have 6.7% of them in different compartments b/t species. Note that this is also the exact opposite result I was hoping for...
#So a simple classification system is not sufficient for A/B compartment changes between the species. What if I actually took the mean PC value and plotted the difference bewtween species vs decreasing significance? The final.AB df is already ordered by DE p-val, so can do this real quick simply:
ggplot(data=final.AB) + geom_line(aes(x=adj.P.Val, y=PC.diff)) + ggtitle("Absolute Difference of PC1 Ordered by DE Significance") + xlab("Decreasing DE Significance") + ylab("|PC1 Difference b/t Species|")
t.test(filter(final.AB, DE=="yes")$PC.diff, filter(final.AB, DE=="no")$PC.diff)
Welch Two Sample t-test
data: filter(final.AB, DE == "yes")$PC.diff and filter(final.AB, DE == "no")$PC.diff
t = 0.25904, df = 3489.3, p-value = 0.7956
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.009105945 0.011878442
sample estimates:
mean of x mean of y
0.2175951 0.2162088
ggplot(data=final.AB) + geom_violin(aes(group=DE, x=DE, y=PC.diff), draw_quantiles = c(0.25, 0.5, 0.75)) + ggtitle("Absolute Difference of PC1 | DE") + xlab("DE Status") + ylab("|PC1 Difference b/t Species|")
Warning: Removed 343 rows containing non-finite values (stat_ydensity).
#None of this is particularly encouraging...but what if I did it on DC status? Can just check on binary status here to see if it'll even be significant or not:
t.test(filter(final.AB, DC=="yes")$PC.diff, filter(final.AB, DC=="no")$PC.diff)
Welch Two Sample t-test
data: filter(final.AB, DC == "yes")$PC.diff and filter(final.AB, DC == "no")$PC.diff
t = 2.6375, df = 726.52, p-value = 0.008531
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.006501246 0.044361584
sample estimates:
mean of x mean of y
0.2403831 0.2149517
#This time it is!...
ggplot(data=final.AB) + geom_violin(aes(group=DC, x=DC, y=PC.diff), draw_quantiles = c(0.25, 0.5, 0.75)) + ggtitle("Absolute Difference of PC1 | DC") + xlab("DC Status") + ylab("|PC1 Difference b/t Species|") + annotate("text", x=1.5, y=1, label="8.5 e - 3")
Warning: Removed 343 rows containing non-finite values (stat_ydensity).
#table(final.AB$DC, final.AB$PC.diff, dnn=c("DC?", "Different Compartment b/t Species?")) Doesn't make sense here
#Just a quick check to see if not using absolute difference would be better?
final.AB$PC.hdiff <- final.AB$PC.C-final.AB$PC.H
ggplot(data=final.AB) + geom_line(aes(x=adj.P.Val, y=PC.hdiff)) + ggtitle("Difference of PC1 Ordered by DE Significance") + xlab("Decreasing DE Significance") + ylab("PC1 Difference b/t Species (H-C)")
t.test(filter(final.AB, adj.P.Val<=0.05)$PC.hdiff, filter(final.AB, adj.P.Val>0.05)$PC.hdiff)
Welch Two Sample t-test
data: filter(final.AB, adj.P.Val <= 0.05)$PC.hdiff and filter(final.AB, adj.P.Val > 0.05)$PC.hdiff
t = -0.67607, df = 3442.5, p-value = 0.499
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.019637088 0.009566943
sample estimates:
mean of x mean of y
-0.02171785 -0.01668278
ggplot(data=final.AB) + geom_violin(aes(group=DE, x=DE, y=PC.hdiff), draw_quantiles = c(0.25, 0.5, 0.75)) + ggtitle("Difference of PC1 | DE") + xlab("DE Status") + ylab("PC1 Difference b/t Species (H-C)")
Warning: Removed 343 rows containing non-finite values (stat_ydensity).
#None of this is particularly encouraging...but what if I did it on DC status? Can just check on binary status here to see if it'll even be significant or not:
t.test(filter(final.AB, DC=="yes")$PC.hdiff, filter(final.AB, DC=="no")$PC.hdiff)
Welch Two Sample t-test
data: filter(final.AB, DC == "yes")$PC.hdiff and filter(final.AB, DC == "no")$PC.hdiff
t = -3.2522, df = 724.08, p-value = 0.001198
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.07021267 -0.01735265
sample estimates:
mean of x mean of y
-0.05884241 -0.01505974
#Overall, most of the A/B compartment calling results look rather atrocious...nothing much interesting to be gleaned here, unfortuantely.
This next section would be to look at differences in effect size and sign changes when doing the two separate lms (one before and one after regressing out Hi-C data from RPKM expression). Unfortunately the ash results I get have all their svalues as 1 so this is boring/not interesting right now. Hence it is not included. But let’s give it another go!
#######FRESH CODE 6-21-18#########
design <- model.matrix(~meta.data$SP)
exprs <- select(hmin, "A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651", genes) #Should be done on just DE genes, filter with adj.P.Val<=0.05
hic <- select(hmin,"A_21792_HIC", "B_28126_HIC", "C_3649_HIC", "D_40300_HIC", "E_28815_HIC", "F_28834_HIC", "G_3624_HIC", "H_3651_HIC")
present <- sapply(1:nrow(hic), function(i) !any(is.na(hic[i,])))
genes <- exprs$genes
DEgenes <- filter(hmin, adj.P.Val<=0.05) %>% select(., genes)
myDEindices <- which(genes %in% DEgenes[,1])
exprs <- exprs[,-9]
exprs <- data.matrix(exprs[present,])
hic <- data.matrix(hic[present,])
designy <- as.factor(c("human", "human", "chimp", "chimp", "human", "human", "chimp", "chimp"))
colnames(hic) <- colnames(exprs)
hic <- as.data.table(hic)
exprs <- as.data.table(exprs)
test <- mediate.test.regressing(exprs, designy, hic)
Attaching package: 'assertthat'
The following object is masked from 'package:tibble':
has_name
#Shrink dat variance!
vshrDE <- vash(test$d_se[myDEindices], df=6)
vshrnonDE <- vash(test$d_se[-myDEindices], df=6)
vshrink <- vash(test$d_se, df=6)
#Separate out DE from non-DE genes:
DEtest <- test[myDEindices,]
nonDEtest <- test[-myDEindices,]
#Ash on DE genes
ash_reg <- ash(as.vector(DEtest$d), as.vector(vshrDE$sd.post), mixcompdist = "normal") #Now it works with this design coding. Use 6 df, as 2-group comparison and 8 individual samples (samples - 2)
summary(ash_reg$result$svalue<=0.05) #Incredible result, looks too good to be true.
Mode FALSE TRUE
logical 1244 293
ind_sig <- (ash_reg$result$svalue < 0.05)
ind_sig[ind_sig == TRUE] <- "red"
ind_sig[ind_sig == FALSE] <- "black"
#FIG5A
plot(x=ash_reg$result$betahat, y=ash_reg$result$sebetahat, ylab="Standard error of the difference", xlab="Difference in Effect Size from 2 Linear Models", main="Difference of Effect Size in DE Genes", pch=16, cex=0.6, col=ind_sig, xlim=c(-4.5, 4.5), ylim=c(0, 3))
legend("topleft", legend=c("s < 0.05", "s >= 0.05"), col=c("red", "black"), pch=16:16, cex=0.8)
plot(ash_reg$result$betahat, -log10(ash_reg$result$svalue))
###### START EXAMPLE
#Figure out where some of these are to pull out good examples, try to find a dense region of them
myhits <- which(ash_reg$result$svalue<=0.05) #These SHOULD line up with DEtest indices...
realhits <- myDEindices[myhits] #Now go back to exprs original with these indices to get the gene names!
exprs <- select(hmin, "A-21792", "B-28126", "C-3649", "D-40300", "E-28815", "F-28834", "G-3624", "H-3651", genes) #Should be done on just DE genes, filter with adj.P.Val<=0.05
demgenes <- exprs[realhits,9]
testcases <- filter(gene.hic.filt, genes %in% demgenes)
table(testcases$genechr.H)
chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2
21 17 16 16 3 9 13 25 37 4 21 7
chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX
9 3 7 17 12 8 18 7 7 12 4
testcases$sval <- ash_reg$result$svalue[myhits]
testcases <- testcases[order(testcases$sval),]
#Now, pull out expression values for those suckers:
options(scipen = 999)
mytest <- filter(humgenes, genes %in% testcases$genes)
mytest <- mytest[(mytest$genes %in% weighted.data$genes$genes),]
mean.species.expression <- as.data.frame(weighted.data$E) #Now, need to pull out the mean expression values for all those genes!
mean.species.expression$genes <- rownames(mean.species.expression)
mean.species.expression$Hmean <- rowMeans(mean.species.expression[,c(1:2,5:6)])
mean.species.expression$Cmean <- rowMeans(mean.species.expression[,c(3:4,7:8)])
mean.species.expression <- mean.species.expression[,(-1:-8)]
left_join(testcases, mean.species.expression, by="genes") -> filt.example
filter(filt.example, min_FDR.H<=0.05|min_FDR.C<=0.05) -> final.example
final.example$Hmean-final.example$Cmean
[1] 4.5328579 1.2367448 1.4489860 2.4987542 -1.2978160 -1.4169804
[7] -3.1946646 -1.9931026 -1.9871103 1.1738427 -5.2288365 1.2336727
[13] -2.3053998 -1.0226005 0.9405034 -1.1595681 -0.8852151 -1.1076224
[19] 2.7855279 -1.4640895 -1.1098327 1.1607247 -1.0098363 -1.6114598
[25] -1.3898762 -0.8417933 1.8813176 -1.0449806 -1.1797100 1.3543069
[31] -1.5268885 -0.9104789 1.0591123 -3.0020201 -1.0473910 1.0138926
[37] -0.6466907 -4.7380406 -1.1886107 -0.6727022 4.2174090 0.8670641
[43] 0.8224821 -0.7521561 -0.6123475 0.6108439 -1.6004895 -0.4286268
[49] 1.0320064 0.7067310 0.7565189 -1.1045673 0.7209822 0.7714959
[55] 0.5101153 0.7215741 0.7052795 -0.8612434
#gonna end up pulling 48th row example from final.example. Pull out its contact frequency vals for arc diagram creation:
final.example[48,]
genes Length logFC AveExpr t P.Value
48 ENSG00000105486 6343 -0.4282247 6.929213 -3.293358 0.007383284
adj.P.Val B Hchr.H Hstart.H Hend.H Hstrand.H bin_pos.H
48 0.04020062 -3.034007 chr19 48110000 48120000 - 5444
genechr.H genepos.H HID min_FDR_bin.H min_FDR.H
48 chr19 48115444 chr19-48110000 chr19-47960000 0.0003698497
min_FDR_pval min_FDR_B.H median_FDR.H weighted_Z.ALLvar.H
48 0.000002316065 1.932498 0.2929028 0.09896256
weighted_Z.s2post.H fisher.H numcontacts.H max_B_bin.H max_B_FDR.H
48 0 69.32522 7 chr19-47960000 0.0003698497
max_B.H DS_bin.H DS_FDR.H DS_dist.H US_bin.H US_FDR.H
48 1.932498 chr19-48120000 0.1804779 10000 chr19-47970000 0.001113113
US_dist.H Hchr.C Hstart.C Hend.C Hstrand.C bin_pos.C genechr.C
48 140000 chr19 50210000 50220000 - 3062 chr19
genepos.C CID min_FDR_bin.C min_FDR.C min_FDR_B.C
48 50213062 chr19-50210000 chr19-50050000 0.0003698497 1.932498
median_FDR.C weighted_Z.ALLvar.C weighted_Z.s2post.C fisher.C
48 0.2929028 0.09896256 0 69.32522
numcontacts.C max_B_bin.C max_B_FDR.C max_B.C DS_bin.C
48 7 chr19-50050000 0.0003698497 1.932498 chr19-50220000
DS_FDR.C DS_dist.C US_bin.C US_FDR.C US_dist.C sval
48 0.1804779 10000 chr19-50060000 0.001113113 150000 0.03554588
Hmean Cmean
48 6.7149 7.143526
which((data.filtered$H1=="chr17-64470000"|data.filtered$H2=="chr17-64470000")&(data.filtered$C1=="chr17-63970000"|data.filtered$C2=="chr17-63970000"))
[1] 47406 49167 206246 207015 207030 251993 282314 283043
which((data.filtered$H1=="chr17-64470000"&data.filtered$H2=="chr17-64430000")|(data.filtered$H2=="chr17-64470000"&data.filtered$H1=="chr17-64430000"))
[1] 282314
#Check on what other contacts might be significant in the region
QC <- filter(data.filtered, Hchr=="Chr. 17"&sp_BH_pval<=0.05)
QC$H1 <- gsub(".*-", "", QC$H1)
QC$H2 <- gsub(".*-", "", QC$H2)
which((QC$H1>=64400000&QC$H1<=64500000)&(QC$H2>=64400000&QC$H2<=64500000))
[1] 218 323 397
QC[c(218,323,397),]
num-A H_chr1-A Hstart1-A Hend1-A Hstrand1-A H_chr2-A Hstart2-A Hend2-A
218 NA <NA> NA NA <NA> <NA> NA NA
323 NA <NA> NA NA <NA> <NA> NA NA
397 NA <NA> NA NA <NA> <NA> NA NA
Hstrand2-A C_chr1-A Cstart1-A Cend1-A Cstrand1-A C_chr2-A Cstart2-A
218 <NA> <NA> NA NA <NA> <NA> NA
323 <NA> <NA> NA NA <NA> <NA> NA
397 <NA> <NA> NA NA <NA> <NA> NA
Cend2-A Cstrand2-A InteractionID-A PeakID(1)-A chr(1)-A start(1)-A
218 NA <NA> <NA> <NA> <NA> NA
323 NA <NA> <NA> <NA> <NA> NA
397 NA <NA> <NA> <NA> <NA> NA
end(1)-A strand(1)-A Total_Reads(1)-A PeakID(2)-A chr(2)-A start(2)-A
218 NA <NA> NA <NA> <NA> NA
323 NA <NA> NA <NA> <NA> NA
397 NA <NA> NA <NA> <NA> NA
end(2)-A strand(2)-A Total_Reads(2)-A Distance-A Interaction_Reads-A
218 NA <NA> NA NA NA
323 NA <NA> NA NA NA
397 NA <NA> NA NA NA
Expected_Reads-A Z-score-A LogP-A FDR-A Circos_Thickness-A
218 NA NA NA NA NA
323 NA NA NA NA NA
397 NA NA NA NA NA
HCpair num-B
218 chr17-64430000_chr17-64470000/chr17-63920000_chr17-63970000 NA
323 chr17-64430000_chr17-64490000/chr17-63920000_chr17-63990000 NA
397 chr17-64420000_chr17-64500000/chr17-63910000_chr17-64000000 NA
H_chr1-B Hstart1-B Hend1-B Hstrand1-B H_chr2-B Hstart2-B Hend2-B
218 <NA> NA NA <NA> <NA> NA NA
323 <NA> NA NA <NA> <NA> NA NA
397 <NA> NA NA <NA> <NA> NA NA
Hstrand2-B C_chr1-B Cstart1-B Cend1-B Cstrand1-B C_chr2-B Cstart2-B
218 <NA> <NA> NA NA <NA> <NA> NA
323 <NA> <NA> NA NA <NA> <NA> NA
397 <NA> <NA> NA NA <NA> <NA> NA
Cend2-B Cstrand2-B InteractionID-B PeakID(1)-B chr(1)-B start(1)-B
218 NA <NA> <NA> <NA> <NA> NA
323 NA <NA> <NA> <NA> <NA> NA
397 NA <NA> <NA> <NA> <NA> NA
end(1)-B strand(1)-B Total_Reads(1)-B PeakID(2)-B chr(2)-B start(2)-B
218 NA <NA> NA <NA> <NA> NA
323 NA <NA> NA <NA> <NA> NA
397 NA <NA> NA <NA> <NA> NA
end(2)-B strand(2)-B Total_Reads(2)-B Distance-B Interaction_Reads-B
218 NA <NA> NA NA NA
323 NA <NA> NA NA NA
397 NA <NA> NA NA NA
Expected_Reads-B Z-score-B LogP-B FDR-B Circos_Thickness-B num-E
218 NA NA NA NA NA NA
323 NA NA NA NA NA NA
397 NA NA NA NA NA NA
H_chr1-E Hstart1-E Hend1-E Hstrand1-E H_chr2-E Hstart2-E Hend2-E
218 <NA> NA NA <NA> <NA> NA NA
323 <NA> NA NA <NA> <NA> NA NA
397 <NA> NA NA <NA> <NA> NA NA
Hstrand2-E C_chr1-E Cstart1-E Cend1-E Cstrand1-E C_chr2-E Cstart2-E
218 <NA> <NA> NA NA <NA> <NA> NA
323 <NA> <NA> NA NA <NA> <NA> NA
397 <NA> <NA> NA NA <NA> <NA> NA
Cend2-E Cstrand2-E InteractionID-E PeakID(1)-E chr(1)-E start(1)-E
218 NA <NA> <NA> <NA> <NA> NA
323 NA <NA> <NA> <NA> <NA> NA
397 NA <NA> <NA> <NA> <NA> NA
end(1)-E strand(1)-E Total_Reads(1)-E PeakID(2)-E chr(2)-E start(2)-E
218 NA <NA> NA <NA> <NA> NA
323 NA <NA> NA <NA> <NA> NA
397 NA <NA> NA <NA> <NA> NA
end(2)-E strand(2)-E Total_Reads(2)-E Distance-E Interaction_Reads-E
218 NA <NA> NA NA NA
323 NA <NA> NA NA NA
397 NA <NA> NA NA NA
Expected_Reads-E Z-score-E LogP-E FDR-E Circos_Thickness-E num-F
218 NA NA NA NA NA NA
323 NA NA NA NA NA NA
397 NA NA NA NA NA NA
H_chr1-F Hstart1-F Hend1-F Hstrand1-F H_chr2-F Hstart2-F Hend2-F
218 <NA> NA NA <NA> <NA> NA NA
323 <NA> NA NA <NA> <NA> NA NA
397 <NA> NA NA <NA> <NA> NA NA
Hstrand2-F C_chr1-F Cstart1-F Cend1-F Cstrand1-F C_chr2-F Cstart2-F
218 <NA> <NA> NA NA <NA> <NA> NA
323 <NA> <NA> NA NA <NA> <NA> NA
397 <NA> <NA> NA NA <NA> <NA> NA
Cend2-F Cstrand2-F InteractionID-F PeakID(1)-F chr(1)-F start(1)-F
218 NA <NA> <NA> <NA> <NA> NA
323 NA <NA> <NA> <NA> <NA> NA
397 NA <NA> <NA> <NA> <NA> NA
end(1)-F strand(1)-F Total_Reads(1)-F PeakID(2)-F chr(2)-F start(2)-F
218 NA <NA> NA <NA> <NA> NA
323 NA <NA> NA <NA> <NA> NA
397 NA <NA> NA <NA> <NA> NA
end(2)-F strand(2)-F Total_Reads(2)-F Distance-F Interaction_Reads-F
218 NA <NA> NA NA NA
323 NA <NA> NA NA NA
397 NA <NA> NA NA NA
Expected_Reads-F Z-score-F LogP-F FDR-F Circos_Thickness-F num-C
218 NA NA NA NA NA 236246
323 NA NA NA NA NA 238387
397 NA NA NA NA NA 241076
H_chr1-C Hstart1-C Hend1-C Hstrand1-C H_chr2-C Hstart2-C Hend2-C
218 chr17 64473558 64483581 + chr17 64425334 64435328
323 chr17 64493060 64502533 + chr17 64425334 64435328
397 chr17 64502533 64512511 + chr17 64415312 64425334
Hstrand2-C C_chr1-C Cstart1-C Cend1-C Cstrand1-C C_chr2-C Cstart2-C
218 + chr17 63970000 63980000 + chr17 63920000
323 + chr17 63990000 64000000 + chr17 63920000
397 + chr17 64000000 64010000 + chr17 63910000
Cend2-C Cstrand2-C InteractionID-C PeakID(1)-C chr(1)-C start(1)-C
218 63930000 + interaction1690 chr17-63970000 chr17 63970196
323 63930000 + interaction3831 chr17-63990000 chr17 63989573
397 63920000 + interaction6520 chr17-64000000 chr17 63998969
end(1)-C strand(1)-C Total_Reads(1)-C PeakID(2)-C chr(2)-C
218 63980196 + 4054 chr17-63920000 chr17
323 63999573 + 4108 chr17-63920000 chr17
397 64008969 + 4323 chr17-63910000 chr17
start(2)-C end(2)-C strand(2)-C Total_Reads(2)-C Distance-C
218 63920046 63930046 + 4096 50150
323 63919163 63929163 + 4096 70410
397 63910481 63920481 + 3824 88488
Interaction_Reads-C Expected_Reads-C Z-score-C LogP-C FDR-C
218 109 45.60180 3.071184 -34.82025 0.000000
323 83 36.13381 2.561925 -25.00079 0.000000
397 66 29.30642 2.237250 -19.51321 0.000018
Circos_Thickness-C num-D H_chr1-D Hstart1-D Hend1-D Hstrand1-D
218 10 276268 chr17 64473558 64483581 +
323 6 267998 chr17 64493060 64502533 +
397 4 272586 chr17 64502533 64512511 +
H_chr2-D Hstart2-D Hend2-D Hstrand2-D C_chr1-D Cstart1-D Cend1-D
218 chr17 64425334 64435328 + chr17 63970000 63980000
323 chr17 64425334 64435328 + chr17 63990000 64000000
397 chr17 64415312 64425334 + chr17 64000000 64010000
Cstrand1-D C_chr2-D Cstart2-D Cend2-D Cstrand2-D InteractionID-D
218 + chr17 63920000 63930000 + interaction10406
323 + chr17 63920000 63930000 + interaction2136
397 + chr17 63910000 63920000 + interaction6724
PeakID(1)-D chr(1)-D start(1)-D end(1)-D strand(1)-D
218 chr17-63970000 chr17 63970104 63980104 +
323 chr17-63990000 chr17 63989692 63999692 +
397 chr17-64000000 chr17 63998786 64008786 +
Total_Reads(1)-D PeakID(2)-D chr(2)-D start(2)-D end(2)-D
218 4438 chr17-63920000 chr17 63920229 63930229
323 4138 chr17-63920000 chr17 63919951 63929951
397 4609 chr17-63910000 chr17 63910226 63920226
strand(2)-D Total_Reads(2)-D Distance-D Interaction_Reads-D
218 + 4234 49875 74
323 + 4234 69741 76
397 + 4070 88560 59
Expected_Reads-D Z-score-D LogP-D FDR-D Circos_Thickness-D
218 39.91519 2.246405 -14.06295 0.002596 4
323 29.69296 2.986874 -27.98229 0.000000 8
397 26.54339 2.275882 -17.18421 0.000177 4
num-G H_chr1-G Hstart1-G Hend1-G Hstrand1-G H_chr2-G Hstart2-G
218 245433 chr17 64473558 64483581 + chr17 64425334
323 244589 chr17 64493060 64502533 + chr17 64425334
397 252103 chr17 64502533 64512511 + chr17 64415312
Hend2-G Hstrand2-G C_chr1-G Cstart1-G Cend1-G Cstrand1-G C_chr2-G
218 64435328 + chr17 63970000 63980000 + chr17
323 64435328 + chr17 63990000 64000000 + chr17
397 64425334 + chr17 64000000 64010000 + chr17
Cstart2-G Cend2-G Cstrand2-G InteractionID-G PeakID(1)-G chr(1)-G
218 63920000 63930000 + interaction2061 chr17-63970000 chr17
323 63920000 63930000 + interaction1217 chr17-63990000 chr17
397 63910000 63920000 + interaction8731 chr17-64000000 chr17
start(1)-G end(1)-G strand(1)-G Total_Reads(1)-G PeakID(2)-G
218 63970383 63980383 + 3709 chr17-63920000
323 63990457 64000457 + 4083 chr17-63920000
397 63998577 64008577 + 4402 chr17-63910000
chr(2)-G start(2)-G end(2)-G strand(2)-G Total_Reads(2)-G Distance-G
218 chr17 63919745 63929745 + 3841 50638
323 chr17 63919818 63929818 + 3841 70639
397 chr17 63911113 63921113 + 3362 87464
Interaction_Reads-G Expected_Reads-G Z-score-G LogP-G FDR-G
218 96 43.12241 2.837183 -26.95338 0.000000
323 93 37.05095 2.868164 -32.73931 0.000000
397 59 29.00601 1.984718 -14.34447 0.002335
Circos_Thickness-G num-H H_chr1-H Hstart1-H Hend1-H Hstrand1-H
218 8 237722 chr17 64473558 64483581 +
323 8 235932 chr17 64493060 64502533 +
397 4 236640 chr17 64502533 64512511 +
H_chr2-H Hstart2-H Hend2-H Hstrand2-H C_chr1-H Cstart1-H Cend1-H
218 chr17 64425334 64435328 + chr17 63970000 63980000
323 chr17 64425334 64435328 + chr17 63990000 64000000
397 chr17 64415312 64425334 + chr17 64000000 64010000
Cstrand1-H C_chr2-H Cstart2-H Cend2-H Cstrand2-H InteractionID-H
218 + chr17 63920000 63930000 + interaction2814
323 + chr17 63920000 63930000 + interaction1024
397 + chr17 63910000 63920000 + interaction1732
PeakID(1)-H chr(1)-H start(1)-H end(1)-H strand(1)-H
218 chr17-63970000 chr17 63970192 63980192 +
323 chr17-63990000 chr17 63990164 64000164 +
397 chr17-64000000 chr17 63998479 64008479 +
Total_Reads(1)-H PeakID(2)-H chr(2)-H start(2)-H end(2)-H
218 3929 chr17-63920000 chr17 63919979 63929979
323 4147 chr17-63920000 chr17 63919418 63929418
397 4593 chr17-63910000 chr17 63910629 63920629
strand(2)-H Total_Reads(2)-H Distance-H Interaction_Reads-H
218 + 3943 50213 92
323 + 3943 70746 94
397 + 3770 87850 80
Expected_Reads-H Z-score-H LogP-H FDR-H Circos_Thickness-H Hchr
218 42.68019 2.790360 -24.22792 0 6 Chr. 17
323 35.59823 3.097276 -36.03307 0 10 Chr. 17
397 31.38126 2.685398 -29.18700 0 8 Chr. 17
H1 H2 Cchr C1 C2 A-21792_norm
218 64430000 64470000 chr17 chr17-63920000 chr17-63970000 0.3205138
323 64430000 64490000 chr17 chr17-63920000 chr17-63990000 0.4236885
397 64420000 64500000 chr17 chr17-63910000 chr17-64000000 0.1031776
B-28126_norm C-3649_norm D-40300_norm E-28815_norm F-28834_norm
218 0.5313236 1.264110 0.8994624 0.6280133 0.4838557
323 0.3712357 1.204258 1.3677771 0.7631368 0.3003590
397 -0.1003274 1.181793 1.1611072 0.5462269 -0.2464347
G-3624_norm H-3651_norm Hmean Cmean ALLmean Hvar
218 1.148654 1.131268 0.49092659 1.110873 0.8009000 0.01650506
323 1.315510 1.429335 0.46460500 1.329220 0.8969125 0.04216332
397 1.017858 1.360370 0.07566058 1.180282 0.6279712 0.11896900
Cvar ALLvar found_in_H found_in_C disc_species tot_indiv_IDd
218 0.023339734 0.1268861 0 4 C 4
323 0.009104341 0.2355602 0 4 C 4
397 0.019727711 0.4080667 0 4 C 4
sp_beta sp_pval sp_BH_pval avg_IF t_statistic B_statistic
218 -0.6199468 0.001295658 0.03657103 0.8009000 -5.403267 -1.0625188
323 -0.8646150 0.000688142 0.02206974 0.8969125 -6.058465 -0.3645291
397 -1.1046213 0.001222219 0.03489531 0.6279712 -5.461487 -0.9981794
sx_pval btc_pval sx_beta btc_beta SE H1diff H2diff
218 0.1557771 0.4412829 0.18425022 0.09409531 0.1147355 23 6
323 0.8229855 0.4666735 0.03324526 0.11034517 0.1427119 527 6
397 0.7514995 0.6945329 0.06683261 0.08306741 0.2022565 22 22
C1diff C2diff Hdist Cdist dist_diff
218 NA NA 40000 50000 10000
323 NA NA 60000 70000 10000
397 NA NA 80000 90000 10000
#filter(example, Hstart>=140500000&Hstart<=141500000) -> filt.example
hgenebed <- select(filt.example, genechr.H, genepos.H, Hmean)
cgenebed <- select(filt.example, genechr.C, genepos.C, Cmean)
hgenebed$end <- hgenebed$genepos.H+5000
cgenebed$end <- cgenebed$genepos.C+5000
hgenebed <- hgenebed[,c(1,2,4,3)]
cgenebed <- cgenebed[,c(1,2,4,3)]
cgenebed <- cgenebed[complete.cases(cgenebed),]
fwrite(hgenebed, "~/Desktop/Hi-C/pyGenomeTracks/filt.h.genes.bed", quote=FALSE, sep="\t", col.names =FALSE)
fwrite(cgenebed, "~/Desktop/Hi-C/pyGenomeTracks/filt.c.genes.bed", quote=FALSE, sep="\t", col.names =FALSE)
#Write out the appropriate columns from data.filtered to make an "arcs" file for visualization:
harcs <- filter(data.filtered, sp_beta>0, sp_BH_pval<=0.05) %>% select(., H1, H2, Hmean) #Note chr notation different due to other things changed; check on this later.
harcs$chr <- gsub("-.*", "", harcs$H1)
harcs$H1start <- as.numeric(gsub(".*-", "", harcs$H1))
harcs$H1end <- harcs$H1start+10000
harcs$H2start <- as.numeric(gsub(".*-", "", harcs$H2))
harcs$H2end <- harcs$H2start+10000
harcs <- harcs[,c(4:6, 4, 7:8, 3)]
harcs <- format(harcs, scientific=FALSE)
fwrite(harcs, "~/Desktop/Hi-C/pyGenomeTracks/RNAseq/humans.arcs", quote=FALSE, sep="\t", col.names=FALSE)
#Prep the same thing for chimpanzees:
carcs <- filter(data.filtered, sp_beta<0, sp_BH_pval<=0.05) %>% select(., C1, C2, Cmean) #Note chr notation different due to other things changed; check on this later.
carcs$chr <- gsub("-.*", "", carcs$C1)
carcs$C1start <- as.numeric(gsub(".*-", "", carcs$C1))
carcs$C1end <- carcs$C1start+10000
carcs$C2start <- as.numeric(gsub(".*-", "", carcs$C2))
carcs$C2end <- carcs$C2start+10000
carcs <- carcs[,c(4:6, 4, 7:8, 3)]
carcs <- format(carcs, scientific=FALSE)
fwrite(carcs, "~/Desktop/Hi-C/pyGenomeTracks/RNAseq/chimps.arcs", quote=FALSE, sep="\t", col.names=FALSE)
#Now add in an extra column placeholder so strand is in the right column for bedtools, and rearrange columns as well for bedtools' ease.
humgenes$placeholder <- "."
chimpgenes$placeholder <- "."
humgenes$Hend <- humgenes$Hstart+1
chimpgenes$Cend <- chimpgenes$Cstart+1
humgenes <- humgenes[,c(2:3, 6, 1, 5, 4)]
chimpgenes <- chimpgenes[,c(2:3, 6, 1, 5, 4)]
#Turn off scientific notation for writing the file out for bedtools to be able to use correctly.
options(scipen=999)
####### END EXAMPLE
#Now on non-DE genes
ash_reg_nonDE <- ash(as.vector(nonDEtest$d), as.vector(vshrnonDE$sd.post), mixcompdist = "normal") #Now it works with this design coding. Use 6 df, as 2-group comparison and 8 individual samples (samples - 2)
Warning in estimate_mixprop(data, g, prior, optmethod = optmethod, control
= control, : Optimization failed to converge. Results may be unreliable.
Try increasing maxiter and rerunning.
summary(ash_reg_nonDE$result$svalue<=0.05) #Incredible result, looks too good to be true.
Mode FALSE
logical 6227
ind_sig <- (ash_reg_nonDE$result$svalue < 0.05)
ind_sig[ind_sig == TRUE] <- "red"
ind_sig[ind_sig == FALSE] <- "black"
#FIG5B
plot(x=ash_reg_nonDE$result$betahat, y=ash_reg_nonDE$result$sebetahat, ylab="Standard error of the difference", xlab="Difference in Effect Size from 2 Linear Models", main="Difference of Effect Size in non-DE Genes", pch=16, cex=0.6, col=ind_sig, xlim=c(-4.5, 4.5), ylim=c(0, 3))
legend("topleft", legend=c("s < 0.05", "s >= 0.05"), col=c("red", "black"), pch=16:16, cex=0.8)
plot(ash_reg_nonDE$result$betahat, -log10(ash_reg_nonDE$result$svalue))
#Now on the full set, for comparison's sake.
ash_reg_full <- ash(as.vector(test$d), as.vector(vshrink$sd.post), mixcompdist = "normal") #Now it works with this design coding. Use 6 df, as 2-group comparison and 8 individual samples (samples - 2)
summary(ash_reg_full$result$svalue<=0.05) #Incredible result, looks too good to be true.
Mode FALSE TRUE
logical 7714 50
ind_sig <- (ash_reg_full$result$svalue < 0.05)
ind_sig[ind_sig == TRUE] <- "red"
ind_sig[ind_sig == FALSE] <- "black"
plot(x=ash_reg_full$result$betahat, y=ash_reg_full$result$sebetahat, ylab="Standard error of the difference", xlab="Difference in effect size from 2 lms", main="Difference of effect size (DE genes)", pch=16, cex=0.6, col=ind_sig)#, xlim=c(-2.5, 2.5), ylim=c(0, 1.3))
legend("topleft", legend=c("s < 0.05", "s >= 0.05"), col=c("red", "black"), pch=16:16, cex=0.8)
plot(ash_reg_full$result$betahat, -log10(ash_reg_full$result$svalue))
#Look at effect size shrinkage in full set ash vs. separated DE/nonDE ash:
plot(c(ash_reg$result$PosteriorMean, ash_reg_nonDE$result$PosteriorMean), ash_reg_full$result$PosteriorMean)
#Look at this just in DE:
plot(ash_reg$result$PosteriorMean, ash_reg_full$result$PosteriorMean[myDEindices])
#Look at this just in non-DE:
plot(ash_reg_nonDE$result$PosteriorMean, ash_reg_full$result$PosteriorMean[-myDEindices])
{r, include=FALSE, echo=FALSE} # #Simple plotting of logFC vs logFC. A positive logFC from the expression data indicates higher expression in humans. # #First, perform for entire set of genes, on non-DE and DE genes separately. # mydat <- select(gene.hic.filt, logFC, min_FDR_B.H, adj.P.Val, AveExpr) # ggplot(mydat) + geom_point(aes(x=logFC, y=min_FDR_B.H, color=adj.P.Val)) # ggplot(filter(mydat, adj.P.Val<=0.05)) + geom_point(aes(x=logFC, y=min_FDR_B.H)) + xlab("LogFC, Expression") + ylab("LogFC, Contact") + ggtitle("LogFC Expression vs. LogFC Contact for DE Genes") # ggplot(filter(mydat, adj.P.Val>0.05)) + geom_point(aes(x=logFC, y=min_FDR_B.H)) + xlab("LogFC, Expression") + ylab("LogFC, Contact") + ggtitle("LogFC Expression vs. LogFC Contact for non-DE Genes") # # #Now, take a look at what happens if I stratify things by which species it's more highly expressed in, and its DE status, and calculate R-squareds. Should do this with actual mean expression values per species # human.high.DE <- filter(mydat, logFC>0, adj.P.Val<=0.05) # human.high.nonDE <- filter(mydat, logFC>0, adj.P.Val>0.05) # chimp.high.DE <- filter(mydat, logFC<0, adj.P.Val<=0.05) # chimp.high.nonDE <- filter(mydat, logFC<0, adj.P.Val>0.05) # summary(lm(human.high.nonDE$logFC~human.high.nonDE$min_FDR_B.H))$adj.r.squared # summary(lm(human.high.DE$logFC~human.high.DE$min_FDR_B.H))$adj.r.squared # summary(lm(chimp.high.nonDE$logFC~chimp.high.nonDE$min_FDR_B.H))$adj.r.squared # summary(lm(chimp.high.DE$logFC~chimp.high.DE$min_FDR_B.H))$adj.r.squared # # # #Now, try out the suggested regression. # new.weighted.data <- weighted.data # new.weighted.data$E <- dge_final$lcpm_counts[,c(8, 6, 1, 4, 7, 5, 2, 3)] # # #Now, need to subset this down to the genes we have contact frequency for (from hmin table, created far below) # my.hmin <- select(hmin, genes, A_21792_HIC, B_28126_HIC, C_3649_HIC, D_40300_HIC, E_28815_HIC, F_28834_HIC, G_3624_HIC, H_3651_HIC) # deez.genes <- left_join(new.weighted.data$genes, my.hmin) # good.indices <- which(complete.cases(deez.genes)) # new.weighted.data <- new.weighted.data[good.indices,] # new.weighted.data$HiC <- deez.genes[good.indices, 3:10] # # meta.exp.data <- data.frame("SP"=c("C", "C", "C", "C", "H", "H", "H", "H"), "SX"=c("M","M" ,"F","F","F", "M","M","F")) # SP <- factor(meta.exp.data$SP,levels = c("H","C")) # # # for(row in 1:nrow(new.weighted.data$E)){ # lmFit() # } # exp.design <- model.matrix(~0+SP+new.weighted.data$HiC+SP:new.weighted.data$HiC)#(~1+meta.exp.data$SP+meta.exp.data$SX) But where does freq come from? # colnames(exp.design) <- c("Human", "Chimp") # weighted.data <- voom(new.weighted.data, exp.design, plot=TRUE, normalize.method = "cyclicloess") # # ##Obtain rest of LM results, with particular eye to DE table! # vfit <- lmFit(weighted.data, exp.design) # efit <- eBayes(vfit) # # mycon <- makeContrasts(HvC = Human-Chimp, levels = exp.design) # diff_species <- contrasts.fit(efit, mycon) # finalfit <- eBayes(diff_species) # detable <- topTable(finalfit, coef = 1, adjust.method = "BH", number = Inf, sort.by="none") #
sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid compiler stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] assertthat_0.2.0 VennDiagram_1.6.20 futile.logger_1.4.3
[4] glmnet_2.0-16 foreach_1.4.4 Matrix_1.2-15
[7] medinome_0.0.1 vashr_0.99.1 qvalue_2.10.0
[10] SQUAREM_2017.10-1 ashr_2.2-32 forcats_0.4.0
[13] purrr_0.3.1 readr_1.3.1 tibble_2.0.1
[16] tidyverse_1.2.1 edgeR_3.20.9 RColorBrewer_1.1-2
[19] heatmaply_0.15.2 viridis_0.5.1 viridisLite_0.3.0
[22] stringr_1.4.0 gplots_3.0.1.1 Hmisc_4.2-0
[25] Formula_1.2-3 survival_2.43-3 lattice_0.20-38
[28] dplyr_0.8.0.1 plotly_4.8.0 cowplot_0.9.4
[31] ggplot2_3.1.0 reshape2_1.4.3 data.table_1.12.0
[34] tidyr_0.8.3 plyr_1.8.4 limma_3.34.9
loaded via a namespace (and not attached):
[1] colorspace_1.4-0 class_7.3-15 modeltools_0.2-22
[4] mclust_5.4.2 rprojroot_1.3-2 htmlTable_1.13.1
[7] base64enc_0.1-3 fs_1.2.6 rstudioapi_0.9.0
[10] flexmix_2.3-15 mvtnorm_1.0-8 lubridate_1.7.4
[13] xml2_1.2.0 codetools_0.2-16 splines_3.4.0
[16] pscl_1.5.2 doParallel_1.0.14 robustbase_0.92-8
[19] knitr_1.22 jsonlite_1.6 workflowr_1.2.0
[22] broom_0.5.1 cluster_2.0.7-1 kernlab_0.9-27
[25] httr_1.4.0 backports_1.1.3 lazyeval_0.2.1
[28] cli_1.0.1 formatR_1.6 acepack_1.4.1
[31] htmltools_0.3.6 tools_3.4.0 gtable_0.2.0
[34] glue_1.3.0 Rcpp_1.0.0 cellranger_1.1.0
[37] trimcluster_0.1-2.1 gdata_2.18.0 nlme_3.1-137
[40] iterators_1.0.10 fpc_2.1-11.1 xfun_0.5
[43] rvest_0.3.2 gtools_3.8.1 dendextend_1.9.0
[46] DEoptimR_1.0-8 MASS_7.3-51.1 scales_1.0.0
[49] TSP_1.1-6 hms_0.4.2 parallel_3.4.0
[52] lambda.r_1.2.3 yaml_2.2.0 gridExtra_2.3
[55] rpart_4.1-13 latticeExtra_0.6-28 stringi_1.3.1
[58] gclus_1.3.2 checkmate_1.9.1 seriation_1.2-3
[61] caTools_1.17.1.2 truncnorm_1.0-8 rlang_0.3.1
[64] pkgconfig_2.0.2 prabclus_2.2-7 bitops_1.0-6
[67] evaluate_0.13 labeling_0.3 htmlwidgets_1.3
[70] tidyselect_0.2.5 magrittr_1.5 R6_2.4.0
[73] generics_0.0.2 pillar_1.3.1 haven_2.1.0
[76] whisker_0.3-2 foreign_0.8-71 withr_2.1.2
[79] mixsqp_0.1-97 nnet_7.3-12 modelr_0.1.4
[82] crayon_1.3.4 futile.options_1.0.1 KernSmooth_2.23-15
[85] rmarkdown_1.11 locfit_1.5-9.1 readxl_1.3.0
[88] git2r_0.24.0 digest_0.6.18 diptest_0.75-7
[91] webshot_0.5.1 stats4_3.4.0 munsell_0.5.0
[94] registry_0.5-1