Last updated: 2023-04-06

Checks: 6 1

Knit directory: ChromatinSplicingQTLs/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20191126) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 3b44dfd. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.Rhistory
    Ignored:    code/.DS_Store
    Ignored:    code/.RData
    Ignored:    code/._report.html
    Ignored:    code/.ipynb_checkpoints/
    Ignored:    code/.snakemake/
    Ignored:    code/APA_Processing/
    Ignored:    code/Alignments/
    Ignored:    code/ChromHMM/
    Ignored:    code/ENCODE/
    Ignored:    code/ExpressionAnalysis/
    Ignored:    code/FastqFastp/
    Ignored:    code/FastqFastpSE/
    Ignored:    code/FastqSE/
    Ignored:    code/FineMapping/
    Ignored:    code/Genotypes/
    Ignored:    code/H3K36me3_CutAndTag.pdf
    Ignored:    code/IntronSlopes/
    Ignored:    code/LR.bed
    Ignored:    code/LR.seq.bed
    Ignored:    code/LongReads/
    Ignored:    code/Metaplots/
    Ignored:    code/Misc/
    Ignored:    code/MiscCountTables/
    Ignored:    code/Multiqc/
    Ignored:    code/Multiqc_chRNA/
    Ignored:    code/NonCodingRNA/
    Ignored:    code/NonCodingRNA_annotation/
    Ignored:    code/PairwisePi1Traits.P.all.txt.gz
    Ignored:    code/PeakCalling/
    Ignored:    code/Phenotypes/
    Ignored:    code/PlotGruberQTLs/
    Ignored:    code/PlotQTLs/
    Ignored:    code/ProCapAnalysis/
    Ignored:    code/QC/
    Ignored:    code/QTL_SNP_Enrichment/
    Ignored:    code/QTLs/
    Ignored:    code/RPKM_tables/
    Ignored:    code/ReadLengthMapExperiment/
    Ignored:    code/ReadLengthMapExperimentResults/
    Ignored:    code/ReadLengthMapExperimentSpliceCounts/
    Ignored:    code/ReferenceGenome/
    Ignored:    code/Rplots.pdf
    Ignored:    code/Session.vim
    Ignored:    code/SmallMolecule/
    Ignored:    code/SplicingAnalysis/
    Ignored:    code/TODO
    Ignored:    code/Tehranchi/
    Ignored:    code/bigwigs/
    Ignored:    code/bigwigs_FromNonWASPFilteredReads/
    Ignored:    code/config/.DS_Store
    Ignored:    code/config/._.DS_Store
    Ignored:    code/config/.ipynb_checkpoints/
    Ignored:    code/config/config.local.yaml
    Ignored:    code/dag.pdf
    Ignored:    code/dag.png
    Ignored:    code/dag.svg
    Ignored:    code/debug.ipynb
    Ignored:    code/debug_python.ipynb
    Ignored:    code/deepTools/
    Ignored:    code/featureCounts/
    Ignored:    code/featureCountsBasicGtf/
    Ignored:    code/gwas_summary_stats/
    Ignored:    code/hyprcoloc/
    Ignored:    code/igv_session.xml
    Ignored:    code/isoseqbams/
    Ignored:    code/log
    Ignored:    code/logs/
    Ignored:    code/notebooks/.ipynb_checkpoints/
    Ignored:    code/pi1/
    Ignored:    code/rules/.ipynb_checkpoints/
    Ignored:    code/rules/OldRules/
    Ignored:    code/rules/notebooks/
    Ignored:    code/scratch/
    Ignored:    code/scripts/.ipynb_checkpoints/
    Ignored:    code/scripts/GTFtools_0.8.0/
    Ignored:    code/scripts/__pycache__/
    Ignored:    code/scripts/liftOverBedpe/liftOverBedpe.py
    Ignored:    code/snakemake.dryrun.log
    Ignored:    code/snakemake.log
    Ignored:    code/snakemake.sbatch.log
    Ignored:    code/snakemake_profiles/slurm/__pycache__/
    Ignored:    code/test.introns.bed
    Ignored:    code/test.introns2.bed
    Ignored:    code/test.log
    Ignored:    code/tracks.xml
    Ignored:    data/.DS_Store
    Ignored:    data/._.DS_Store
    Ignored:    data/._20220414203249_JASPAR2022_combined_matrices_25818_jaspar.txt
    Ignored:    data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-10.csv
    Ignored:    data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-11.csv
    Ignored:    data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-2.csv
    Ignored:    data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-3.csv
    Ignored:    data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-4.csv
    Ignored:    data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-5.csv
    Ignored:    data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-6.csv
    Ignored:    data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-7.csv
    Ignored:    data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-8.csv
    Ignored:    data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022.csv
    Ignored:    data/Metaplots/.DS_Store

Untracked files:
    Untracked:  code/scripts/CombinePermutationPassSNPsIntoBed.R

Unstaged changes:
    Modified:   analysis/20230324_MakeFigs.Rmd
    Modified:   code/config/ColocRunWildcards.tsv
    Modified:   code/rules/GWAS_PrepForColoc.smk
    Modified:   code/rules/QTLTools.smk
    Modified:   code/rules/QTL_SNP_Enrichment.smk
    Modified:   code/scripts/GenometracksByGenotype
    Modified:   code/scripts/hyprcoloc_gwas2.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/20230324_MakeFigs.Rmd) and HTML (docs/20230324_MakeFigs.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 3b44dfd Benjmain Fair 2023-03-30 add fig nb
html 3b44dfd Benjmain Fair 2023-03-30 add fig nb
Rmd ddbf222 Benjmain Fair 2023-03-27 add pi1 heatmaps nb
html ddbf222 Benjmain Fair 2023-03-27 add pi1 heatmaps nb

knitr::opts_chunk$set(echo = TRUE, warning = F, message = F)

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(RColorBrewer)
library(data.table)

Attaching package: 'data.table'
The following objects are masked from 'package:dplyr':

    between, first, last
The following object is masked from 'package:purrr':

    transpose
library(edgeR)
Loading required package: limma
# Set theme
theme_set(
  theme_classic() +
  theme(text=element_text(size=16,  family="Helvetica")))

# I use layer a lot, to rotate long x-axis labels
Rotate_x_labels <- theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))

#test plot
ggplot(mtcars, aes(x=mpg, y=cyl)) +
  geom_point()

Version Author Date
ddbf222 Benjmain Fair 2023-03-27
ExpectedP <- function(vec){
  L <- length(vec)
  Interval = 1/(L+1)
  return(seq(Interval, 1-Interval, Interval))
}

ExpectedP(c(0.1, 0.2, 0.3, 0.9))
[1] 0.2 0.4 0.6 0.8

Intro

I notebook to make figure panels for publication…

Figures to make:

  1. Placeholder, Nascent RNA reveals abundance of NMD-destined transcripts

  2. sQTLs, eQTLs

a. pi heatmap across promoter hQTLs, H3K36me3, chRNA, 4sU, polyA. Number of QTLs in margin

b. Enrichment of annotations in transcriptional vs post-transcriptional eQTLs. Classify transcriptional vs post-transcriptional by comparing hQTL to eQTL significance (or effect size).

c. QQ-plot of eQTL signal, grouped by hQTLs, sQTLs that affect productive junctions, and sQTLs that affect unproductive junctions

d. polyA sQTL beta vs polyA eQTL beta scatter, faceted by stable vs unstable.

e. Beta vs beta spearman slope with different lines for chRNA, 4sU, polyA, faceted by stable vs unstable

  1. Contribution of splicing-mediated decay to complex trait

a. A single GWAS trait (ie MS) QQ plot, colored by splicing-mediated eQTLs, transcription-mediated QTLs, productive-AS sQTLs

b. Heatmap of similar enrichment in A, but across many traits

c. Example locus showing colocalization eQTL/sQTL/GWAS scatter, and IGV screenshot of the splicing and eQTL effect

  1. The effect of splice-modifying drugs on gene expression

a. Cartoon showing inset with risdiplam bulge repair, along with dose:response trend showing the tested doses and the increasing response in GA|GT but not AG|GT splicing.

b. Scatter that shows splicing vs expression effect, faeted by productive unproductive. Or something like that

c. Something that shows the 4B effect is not in chRNA

d. An illustrative example effect in IGV.

Supplemental Figures to make

  1. Placeholder

  2. sQTLs, eQTLs

a. Total number of QTLs in each type of type of trait. Bargraph, with multiple lines for different FDR levels

b. Correlation of effect sizes across promoter hQTLs, H3K36me3, chRNA, 4sU, polyA

i. Need to select set of eQTLs to plot… Previous versions of this plot I tend to just pick colocalized things. That would be easiest.

c. sQTL QQ plot, with different colors for post-transcriptional eQTLs to transcriptional eQTLs (like what Carlos has done in red and black)

d. eQTL QQ plot, with different colors for H3K27AC QTLs and sQTLs

e. lead gene-wise sQTLs are not strongly enriched for eQTLs compared to control SNPs. eQTL QQ-plot with top gene-wise sQTL SNP, compared to some controls. Or do this with a pi1, and can use various thresholds

f. enrichment of eQTL signals is explained by both primary and non-primary eQTL signals (FigX). More precisely, we used the SuSiE finemapping approach which does not assume a single causal SNP for each expression trait, and can thus be used to finemap multiple eQTL signals per gene. We find that approximately x% of primary signals could be overlapped with sQTLs, compared to y% of primary signals that we could overlap to hQTLs. Furthermore, x% and y% of non primary signals could be mapped sQTLs and hQTLs, respectively

g. More extensive version of 2d, so we can explain the nuances in filtering out transcriptional effects, and there are some lingering non-NMD effects that are degraded in cytoplasm

  1. Contribution of splicing-mediated decay to complex trait

a. Colocalization framework cartoon

b. Version of 3b that uses colocalization. Shown as barplots similar to Phoenix’s paper

  1. The effect of splice-modifying drugs on gene expression

a. In total, we observe x% of observed non-canonincal GA-GT splice sites are activated with x criteria… ecdf of spearman correlation coefficient, for different groups of 5’ss.

Figure2

A

pi heatmap of gene-feature QTLs. Num QTLs in margin.

Only include hQTLs at promoter, chRNA, 4sU, and polyA.

library(qvalue)
library(viridis)

PeaksToTSS <- Sys.glob("../code/Misc/PeaksClosestToTSS/*_assigned.tsv.gz") %>%
  setNames(str_replace(., "../code/Misc/PeaksClosestToTSS/(.+?)_assigned.tsv.gz", "\\1")) %>%
  lapply(read_tsv) %>%
  bind_rows(.id="ChromatinMark") %>%
  mutate(GenePeakPair = paste(gene, peak, sep = ";")) %>%
  distinct(ChromatinMark, peak, gene, .keep_all=T)

PC1.filter <- c("Expression.Splicing", "MetabolicLabelled.30min", "MetabolicLabelled.60min", "H3K36ME3", "H3K27AC", "H3K4ME3", "H3K4ME1", "Expression.Splicing.Subset_YRI", "chRNA.Expression.Splicing")

dat <- fread("../code/pi1/PairwisePi1Traits.P.all.txt.gz") %>%
  filter((PC1 %in% PC1.filter) & (PC2 %in% PC1.filter))

dat$PC1 %>% unique()
[1] "Expression.Splicing"            "Expression.Splicing.Subset_YRI"
[3] "chRNA.Expression.Splicing"      "MetabolicLabelled.30min"       
[5] "MetabolicLabelled.60min"        "H3K27AC"                       
[7] "H3K4ME3"                        "H3K4ME1"                       
[9] "H3K36ME3"                      
PhenotypeAliases <- read_tsv("../data/Phenotypes_recode_for_Plotting.txt")

Many genes have multiple promoters that should all be accounted for as potential transcriptional effects. So calculating pi1 can be tricky. I want to ask the question, “What fraction of eQTLs have a promoter hQTL in at least one of the promoters?”… I have already solved this problem… For each gene, get lowest P value for across tests (ie all the promoters) and use that a a test statistic to come up with a new Pvalue, to calculate pi1. The test statistic distribution under the null can be approximate by drawing from uniform n times.

First we need to approximate that null using runif

MaxSampleSizeToCreateANull <- 250
NSamplesToEstimateDistribution <- 10000
NullSimulatedTestStats <- matrix(nrow=MaxSampleSizeToCreateANull, ncol=NSamplesToEstimateDistribution)
rownames(NullSimulatedTestStats) <- paste0("runif_samplesize", 1:MaxSampleSizeToCreateANull)
colnames(NullSimulatedTestStats) <- paste0("Sample_", 1:NSamplesToEstimateDistribution)


for (i in 1:MaxSampleSizeToCreateANull){
  SampleSizeFromUniform <- i
  SampledDat <- matrix(runif(SampleSizeFromUniform*NSamplesToEstimateDistribution), nrow=NSamplesToEstimateDistribution)
  SampleDatNullTestStatistics <- -log10(apply(SampledDat, 1, min))
  NullSimulatedTestStats[i,] <- SampleDatNullTestStatistics
}


NullSimulatedTestStats %>%
  as.data.frame() %>%
  rownames_to_column("runif_samplesize") %>%
  slice(1:250) %>%
  mutate(runif_samplesize = as.numeric(str_remove(runif_samplesize, "runif_samplesize"))) %>%
  gather(key="Sample", value="value", -runif_samplesize) %>%
  ggplot(aes(x=value, color=runif_samplesize, group=runif_samplesize)) +
  geom_density() +
  scale_color_viridis_c() +
  theme_bw() +
  labs(x="Test statistic under null\n-log10(MinP) across n draws from uniform", color="n")

Version Author Date
ddbf222 Benjmain Fair 2023-03-27
ecdf.functions <- apply(NullSimulatedTestStats, 1, ecdf)
ecdf.functions[[1]](1)
[1] 0.9034

Make a function to calculate pi1

# pi0est function breaks sometimes. this wrapper function returns 1 when that happens
CalculatePi1 <- function (dat.in, ...) {
  return(tryCatch(1-pi0est(dat.in$Pvals.For.Pi1, ...)$pi0, error=function(e) 1))
}

And now, recode peaks as promoters, and calculate test statistics.

Recode.PCs = c("ActivatingMark_HostGenePromoter"="Promoter marks*", "H3K36ME3"="H3K36ME3", "chRNA.Expression.Splicing"="chRNA", "MetabolicLabelled.30min"="4sU 30min", "MetabolicLabelled.60min"="4sU 60min", "Expression.Splicing.Subset_YRI"="polyA YRI", "Expression.Splicing"="polyA")

ActivatingMarksToConsider <- c("H3K27AC")

ProcessPi1HeatmapDat <- function(FDR.cutoff=0.1){
  dat.split <- dat %>%
  filter(FDR.x < FDR.cutoff) %>%
  mutate(PC1 = case_when(
    paste(GeneLocus, P1, sep=";") %in% PeaksToTSS$GenePeakPair & PC1 %in% ActivatingMarksToConsider ~ "ActivatingMark_HostGenePromoter",
    P1 %in% PeaksToTSS$peak ~ "ActivatingMark_OtherGenePromoter",
    PC1 %in% c("H3K4ME1", "H3K4ME3", "H3K27AC") ~ "ActivatingMark_Enhancer",
    TRUE ~ PC1
  )) %>%
  mutate(PC2 = case_when(
    paste(GeneLocus, P2, sep=";") %in% PeaksToTSS$GenePeakPair & PC2 %in% ActivatingMarksToConsider ~ "ActivatingMark_HostGenePromoter",
    P2 %in% PeaksToTSS$peak ~ "ActivatingMark_OtherGenePromoter",
    PC2 %in% c("H3K4ME1", "H3K4ME3", "H3K27AC") ~ "ActivatingMark_Enhancer",
    TRUE ~ PC2
  )) %>%
  group_by(PC1, P1, PC2) %>%
  mutate(test.stat.obs = -log10(min(trait.x.p.in.y))) %>%
  ungroup() %>%
  add_count(PC1, P1, PC2) %>%
  filter(n<=250) %>%
  group_by(PC1, PC2) %>%
  rowwise() %>%
  mutate(Pvals.For.Pi1 = 1-ecdf.functions[[n]](test.stat.obs)) %>%
  ungroup() %>%
  select(PC1, PC2, Pvals.For.Pi1) %>%
  filter(!PC1==PC2) %>%
  split(paste(.$PC1, .$PC2, sep = ";"))
dat.out <- lapply(dat.split, CalculatePi1, pi0.method="bootstrap") %>%
  unlist() %>%
  data.frame(pi1=.) %>%
  rownames_to_column("PC1_PC2") %>%
  separate(PC1_PC2, into=c("PC1", "PC2"), sep=';') %>%
  filter((PC1%in% c(PC1.filter, "ActivatingMark_HostGenePromoter") & (PC2%in% c(PC1.filter, "ActivatingMark_HostGenePromoter") ))) %>%
  mutate(TextColor = case_when(
    pi1 >= 0.7 ~ "white",
    TRUE ~ "black"
  )) %>%
mutate(PC1 = recode(PC1, !!!Recode.PCs)) %>%
mutate(PC2 = recode(PC2, !!!Recode.PCs)) %>%
mutate(PC1 = factor(PC1, levels=Recode.PCs)) %>%
mutate(PC2 = factor(PC2, levels=Recode.PCs))
P <- ggplot(dat.out, aes(x=PC1, y=PC2, fill=pi1, color=TextColor)) +
  geom_raster() +
  geom_text(aes(label=signif(pi1*100, 2))) +
  scale_color_identity() +
  scale_fill_viridis_c(option="A", direction = -1, limits=c(0,1)) +
  coord_flip() +
  scale_x_discrete(limits=rev) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  labs(x="Discovery QTL phenotype", y="Phenotype assessed for overlap", caption=paste("*Promoter marks include", paste(ActivatingMarksToConsider, collapse=",")))
return(list(dat=dat, P=P))
}

ProcessPi1HeatmapDat()$P

Version Author Date
3b44dfd Benjmain Fair 2023-03-30
ddbf222 Benjmain Fair 2023-03-27

Redo, but also consider both H3K27AC and H3K4ME3 at promoter…

ActivatingMarksToConsider <- c("H3K27AC", "H3K4ME3")

ProcessPi1HeatmapDat()$P

Version Author Date
3b44dfd Benjmain Fair 2023-03-30
ddbf222 Benjmain Fair 2023-03-27

Redo but consider H3K4ME3, H3K4ME1, and H3K27AC

ActivatingMarksToConsider <- c("H3K27AC", "H3K4ME3", "H3K4ME1")

ProcessPi1HeatmapDat()$P

Version Author Date
3b44dfd Benjmain Fair 2023-03-30
ddbf222 Benjmain Fair 2023-03-27

In theory things should only go up as you include H3K4ME1, but that is not the case. This phenomena is something I’ve noticed before, and I think it’s because adding non-overlapping phenotypes to promoter marks only modestly increases the lowest P-value, but more greatly can increase the n number of features used to determine the null.

So, let’s now include H3K4ME1. This is more thought of as an enhancer rather than promoter mark anyway.

So now let’s redo, but only include H3K27AC and H3K4ME3 but use lower FDR cutoffs.

ActivatingMarksToConsider <- c("H3K27AC", "H3K4ME3")

ProcessPi1HeatmapDat(FDR.cutoff = 0.05)$P

Version Author Date
3b44dfd Benjmain Fair 2023-03-30
ddbf222 Benjmain Fair 2023-03-27
ProcessPi1HeatmapDat(FDR.cutoff = 0.01)$P

Version Author Date
3b44dfd Benjmain Fair 2023-03-30
ddbf222 Benjmain Fair 2023-03-27

Ok, cool. I mean slightly different versions. I’ll leave it to Yang as to which version should be the main figure, and perhaps we can include alternate versions as supplements. In every case though I notice that the top right corner is dark, suggesting histone effects can be picked up pretty well in downstream assays (ie polyA), but the bottom left corner is light, because of post-transcriptional mechanisms of gene regulation.

  • save final prettier versions, with axes flipped (yang likes discovery on x-axis)
  • save alternate versions for supplement, with different FDR thresholds

B

Now I need a way to classify eQTLs as transcriptional or post-transcriptional. Carlos has previously used effect sizes in polyA vs chRNA. I may do something similar. First let’s some scatter plots of effect sizes

FDR.cutoff <- 0.1
ActivatingMarksToConsider <- c("H3K27AC", "H3K4ME3")

dat.recoded <- dat %>%
  filter(FDR.x < FDR.cutoff) %>%
  mutate(PC1 = case_when(
    paste(GeneLocus, P1, sep=";") %in% PeaksToTSS$GenePeakPair & PC1 %in% ActivatingMarksToConsider ~ PC1,
    P1 %in% PeaksToTSS$peak ~ "ActivatingMark_OtherGenePromoter",
    PC1 %in% c("H3K4ME1", "H3K4ME3", "H3K27AC") ~ "ActivatingMark_Enhancer",
    TRUE ~ PC1
  )) %>%
  mutate(PC2 = case_when(
    paste(GeneLocus, P2, sep=";") %in% PeaksToTSS$GenePeakPair & PC2 %in% ActivatingMarksToConsider ~ PC2,
    P1 %in% PeaksToTSS$peak ~ "ActivatingMark_OtherGenePromoter",
    PC2 %in% c("H3K4ME1", "H3K4ME3", "H3K27AC") ~ "ActivatingMark_Enhancer",
    TRUE ~ PC2
  ))

dat.recoded %>%
  filter(PC1 == "Expression.Splicing") %>%
  filter(!str_detect(PC2, "ActivatingMark_Enhancer")) %>%
  filter(!str_detect(PC2, "ActivatingMark_OtherGenePromoter")) %>%
  group_by(PC1, P1, PC2) %>%
  slice(which.min(trait.x.p.in.y)) %>%
  ungroup() %>%
  ggplot(aes(x=beta.x, y=x.beta.in.y)) +
  geom_point(alpha=0.1) +
  facet_wrap(~PC2) +
  labs(x="eQTL beta", y="facetQTL beta")

Version Author Date
3b44dfd Benjmain Fair 2023-03-30

I think what I want to do is to first consider eQTLs in polyA, then perform one-sided test for significant differences in beta for corresponding feature in chRNA/promoterMarks. For simplicity, I think I can use Z-test, like this, but first, to check that this approach is reasonable, I think I’m going to check that Z-scores from random trait pairs closely follow normal distribution as they should under null. Also, I note that winners curse will bias things since I am conditioning on significant QTLs for polyA, then assessing difference in betas between polyA vs chRNA/promoter. But since only the polyA beta is subject to winners curse bias, it will turn out being overly convervative for the SNP enrichments I’m trying to highlight (because some transcriptional QTLs might erronesouly be classified as non-trancriptional) so i think that’s acceptable.

PC1.filter = c("Expression.Splicing", "Expression.Splicing.Subset_YRI")

dat.ZsFromAllTraitPairs <- dat.recoded %>%
  filter(PC1 %in% PC1.filter) %>%
  filter(!str_detect(PC2, "ActivatingMark_Enhancer")) %>%
  filter(!str_detect(PC2, "ActivatingMark_OtherGenePromoter")) %>%
  group_by(PC1, P1, PC2) %>%
  slice(which.min(trait.x.p.in.y)) %>%
  ungroup() %>%
  filter(!is.na(x.beta.in.y)) %>%
  distinct(PC1, P1, PC2, .keep_all=T) %>%
  mutate(Z = (abs(beta.x)-abs(x.beta.in.y)*sign(beta.x)*sign(x.beta.in.y))/sqrt(x.beta_se.in.y**2 + beta_se.x**2)) %>%
  mutate(OneSidedZTest.P = pnorm(Z, lower.tail=F)) %>%
  group_by(PC1, PC2) %>%
  mutate(q = qvalue(OneSidedZTest.P)$qvalues) %>%
  mutate(PolyASpecific = cut(q, breaks=c(0, 0.001, 0.01, 0.1, 1), include.lowest=T, ordered_result = T))


dat.ZsFromAllTraitPairs %>%
ggplot(aes(x=beta.x, y=x.beta.in.y, color=PolyASpecific)) +
  geom_point(alpha=0.1) +
  scale_color_viridis_d(direction=-1) +
  facet_grid(PC1~PC2) +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  labs(color="PolyA-specific FDR") +
  theme(legend.position="bottom")

Version Author Date
3b44dfd Benjmain Fair 2023-03-30
dat.ZsFromAllTraitPairs %>%
ggplot(aes(x=beta.x, y=x.beta.in.y, color=PolyASpecific=="[0,0.001]")) +
  geom_point(alpha=0.1) +
  facet_grid(PC1~PC2) +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  labs(color="PolyA-specific FDR < 0.001") +
  theme(legend.position="bottom")

Version Author Date
3b44dfd Benjmain Fair 2023-03-30
dat.ZsFromAllTraitPairs %>%
  count(PC1, PC2, PolyASpecific) %>%
  ggplot(aes(x=PC2, y=n, fill=fct_rev(PolyASpecific))) +
  geom_col(position="stack") +
  facet_wrap(~PC1) +
  scale_fill_viridis_d(direction=1) +
  Rotate_x_labels

Version Author Date
3b44dfd Benjmain Fair 2023-03-30
dat.ZsFromAllTraitPairs %>%
  count(PC1, PC2, PolyASpecific) %>%
  ggplot(aes(x=PC2, y=n, fill=fct_rev(PolyASpecific))) +
  geom_col(position="fill") +
  facet_wrap(~PC1) +
  scale_fill_viridis_d(direction=1) +
  Rotate_x_labels

Version Author Date
3b44dfd Benjmain Fair 2023-03-30

Okay, keep in mind that there is both biology and power issues at play… Biologically, I would expect very few polyA-specific eQTLs when comparing to 4sU, yet there are basically the same amount of polyA-specific eQTLs in that comparison as when comparing to chRNA. One could chalk that up to the 4sU dataset being so underpowered, that a lot more eQTLs would be considered polyA specific. Similarly, that could explain why so many are polyA-specific when comparing to the relatively underpowered H3K36me3 dataset.

Let’s explore Carlos’ susie finemapping output

susie.finemap <- fread("/project2/yangili1/cfbuenabadn/ChromatinSplicingQTLs/code/FineMapping/susie_runs_Geuvadis/susie_output.tab")
 
susie.finemap %>%
  distinct(cs_gene, cs_names) %>%
  count(cs_gene) %>%
  ggplot(aes(x=n)) +
  geom_bar() +
  labs(y="Num genes", x="Num eQTL signals")

Version Author Date
3b44dfd Benjmain Fair 2023-03-30
susie.finemap %>%
  group_by(cs_gene, cs_names) %>%
  summarise(PIP.sum = sum(cs_pip)) %>%
  ungroup() %>%
  ggplot(aes(x=PIP.sum)) +
  stat_ecdf() +
  labs(x="Sum PIP in CI", title="These are 95% CIs") +
  coord_cartesian(xlim=c(0.9,1))

Version Author Date
3b44dfd Benjmain Fair 2023-03-30
susie.finemap %>%
  count(cs_gene, cs_names) %>%
  ggplot(aes(x=n)) +
  stat_ecdf() +
  coord_cartesian(xlim=c(0,20)) +
  labs(x="Num SNPs in CI")

Version Author Date
3b44dfd Benjmain Fair 2023-03-30

Ok great, next I will intersect the finemapped SNPs with annotations, and sum up the number of SNPs in each annotation (weighted by PIP), and compare transcriptional vs post-transcriptional eQTLs. I did some work in the snakemake to intersect Carlos’s finemap output with SNP annotations and now let’s explore that output.

finemappip.annotation.intersections <- fread("../code/QTL_SNP_Enrichment/FinemapIntersections_Susie/YRI.bed.gz", col.names=c("chrom", "start", "stop", "name", "pip","featChrom", "featStart", "featStop","Annotation", "Overlap")) %>%
  separate(name, into=c("snp", "eGene", "Signal"), sep = "_") %>%
  mutate(Annotation = str_replace_all(Annotation, "_0$", "_UnannotatedButObserved")) %>%
  mutate(Annotation = str_replace_all(Annotation, "_1$", "_Annotated")) %>%
  mutate(Annotation = str_replace_all(Annotation, "^\\d+?_", "ChromHMM:")) %>%
  filter(!is.na(Annotation)) %>%
  mutate(Annotation = if_else(Annotation==".", "No annotation", Annotation)) %>%
  filter(!str_detect(Annotation, "^ncRNA")) %>%
  filter(!Annotation=="No annotation")

transcriptional.PC2s <- c("chRNA.Expression.Splicing", "H3K27AC", "H3K4ME3")

transcriptional.eQTLs <- dat.ZsFromAllTraitPairs %>%
  filter(PC2 %in% transcriptional.PC2s) %>%
  mutate(IsPolyASpecific = if_else(q < 0.001, "PostTxn", "Txn"))

count(transcriptional.eQTLs,PC1, PC2, IsPolyASpecific)
# A tibble: 12 × 4
# Groups:   PC1, PC2 [6]
   PC1                            PC2                      IsPolyASpecific     n
   <chr>                          <chr>                    <chr>           <int>
 1 Expression.Splicing            H3K27AC                  PostTxn           523
 2 Expression.Splicing            H3K27AC                  Txn              7751
 3 Expression.Splicing            H3K4ME3                  PostTxn           473
 4 Expression.Splicing            H3K4ME3                  Txn              8447
 5 Expression.Splicing            chRNA.Expression.Splici… PostTxn           565
 6 Expression.Splicing            chRNA.Expression.Splici… Txn              9275
 7 Expression.Splicing.Subset_YRI H3K27AC                  PostTxn           760
 8 Expression.Splicing.Subset_YRI H3K27AC                  Txn              2610
 9 Expression.Splicing.Subset_YRI H3K4ME3                  PostTxn           643
10 Expression.Splicing.Subset_YRI H3K4ME3                  Txn              2935
11 Expression.Splicing.Subset_YRI chRNA.Expression.Splici… PostTxn           746
12 Expression.Splicing.Subset_YRI chRNA.Expression.Splici… Txn              3167
finemappip.annotation.intersections %>%
  filter(Signal == "L1") %>%
  left_join(transcriptional.eQTLs, by=c("eGene"="P1")) %>%
  filter(!is.na(IsPolyASpecific)) %>%
  group_by(Annotation, IsPolyASpecific, PC1, PC2) %>%
  summarise(SumInAnnotation = sum(pip, na.rm=T)) %>%
  ungroup() %>%
  group_by(IsPolyASpecific, PC1, PC2) %>%
  mutate(TotalPIP = sum(SumInAnnotation, na.rm=T)) %>%
  ungroup() %>%
  mutate(FractionPIP_in_Annotation = SumInAnnotation/TotalPIP ) %>%
  dplyr::select(FractionPIP_in_Annotation, PC1, PC2, IsPolyASpecific, Annotation) %>%
  pivot_wider(names_from = "IsPolyASpecific", values_from = c("FractionPIP_in_Annotation")) %>%
  mutate(Enrichment = log2(PostTxn/Txn)) %>%
  ggplot(aes(x=Annotation, y=Enrichment)) +
  geom_col() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  coord_flip() +
  facet_grid(PC2~PC1) +
  labs(x = "Annotation", y="Finemapping posterior mass fold enrichment\n(Txn <-- --> PostTxn)")

Version Author Date
3b44dfd Benjmain Fair 2023-03-30
set.seed(0)
results <- list()
for (i in 1:100){
  print(paste("Bootstrap resample", i))
  transcriptional.eQTLs.sample <- transcriptional.eQTLs %>%
  group_by(PC1, PC2, IsPolyASpecific) %>%
  sample_frac(replace=T)
  results[[i]] <-
  finemappip.annotation.intersections %>%
  filter(Signal == "L1") %>%
  left_join(transcriptional.eQTLs.sample, by=c("eGene"="P1")) %>%
  filter(!is.na(IsPolyASpecific)) %>%
  group_by(Annotation, IsPolyASpecific, PC1, PC2) %>%
  summarise(SumInAnnotation = sum(pip, na.rm=T)) %>%
  ungroup() %>%
  group_by(IsPolyASpecific, PC1, PC2) %>%
  mutate(TotalPIP = sum(SumInAnnotation, na.rm=T)) %>%
  ungroup() %>%
  mutate(FractionPIP_in_Annotation = SumInAnnotation/TotalPIP ) %>%
  dplyr::select(FractionPIP_in_Annotation, PC1, PC2, IsPolyASpecific, Annotation) %>%
  pivot_wider(names_from = "IsPolyASpecific", values_from = c("FractionPIP_in_Annotation")) %>%
  mutate(Enrichment = log2(PostTxn/Txn)) %>%
  mutate(i=i)
}
[1] "Bootstrap resample 1"
[1] "Bootstrap resample 2"
[1] "Bootstrap resample 3"
[1] "Bootstrap resample 4"
[1] "Bootstrap resample 5"
[1] "Bootstrap resample 6"
[1] "Bootstrap resample 7"
[1] "Bootstrap resample 8"
[1] "Bootstrap resample 9"
[1] "Bootstrap resample 10"
[1] "Bootstrap resample 11"
[1] "Bootstrap resample 12"
[1] "Bootstrap resample 13"
[1] "Bootstrap resample 14"
[1] "Bootstrap resample 15"
[1] "Bootstrap resample 16"
[1] "Bootstrap resample 17"
[1] "Bootstrap resample 18"
[1] "Bootstrap resample 19"
[1] "Bootstrap resample 20"
[1] "Bootstrap resample 21"
[1] "Bootstrap resample 22"
[1] "Bootstrap resample 23"
[1] "Bootstrap resample 24"
[1] "Bootstrap resample 25"
[1] "Bootstrap resample 26"
[1] "Bootstrap resample 27"
[1] "Bootstrap resample 28"
[1] "Bootstrap resample 29"
[1] "Bootstrap resample 30"
[1] "Bootstrap resample 31"
[1] "Bootstrap resample 32"
[1] "Bootstrap resample 33"
[1] "Bootstrap resample 34"
[1] "Bootstrap resample 35"
[1] "Bootstrap resample 36"
[1] "Bootstrap resample 37"
[1] "Bootstrap resample 38"
[1] "Bootstrap resample 39"
[1] "Bootstrap resample 40"
[1] "Bootstrap resample 41"
[1] "Bootstrap resample 42"
[1] "Bootstrap resample 43"
[1] "Bootstrap resample 44"
[1] "Bootstrap resample 45"
[1] "Bootstrap resample 46"
[1] "Bootstrap resample 47"
[1] "Bootstrap resample 48"
[1] "Bootstrap resample 49"
[1] "Bootstrap resample 50"
[1] "Bootstrap resample 51"
[1] "Bootstrap resample 52"
[1] "Bootstrap resample 53"
[1] "Bootstrap resample 54"
[1] "Bootstrap resample 55"
[1] "Bootstrap resample 56"
[1] "Bootstrap resample 57"
[1] "Bootstrap resample 58"
[1] "Bootstrap resample 59"
[1] "Bootstrap resample 60"
[1] "Bootstrap resample 61"
[1] "Bootstrap resample 62"
[1] "Bootstrap resample 63"
[1] "Bootstrap resample 64"
[1] "Bootstrap resample 65"
[1] "Bootstrap resample 66"
[1] "Bootstrap resample 67"
[1] "Bootstrap resample 68"
[1] "Bootstrap resample 69"
[1] "Bootstrap resample 70"
[1] "Bootstrap resample 71"
[1] "Bootstrap resample 72"
[1] "Bootstrap resample 73"
[1] "Bootstrap resample 74"
[1] "Bootstrap resample 75"
[1] "Bootstrap resample 76"
[1] "Bootstrap resample 77"
[1] "Bootstrap resample 78"
[1] "Bootstrap resample 79"
[1] "Bootstrap resample 80"
[1] "Bootstrap resample 81"
[1] "Bootstrap resample 82"
[1] "Bootstrap resample 83"
[1] "Bootstrap resample 84"
[1] "Bootstrap resample 85"
[1] "Bootstrap resample 86"
[1] "Bootstrap resample 87"
[1] "Bootstrap resample 88"
[1] "Bootstrap resample 89"
[1] "Bootstrap resample 90"
[1] "Bootstrap resample 91"
[1] "Bootstrap resample 92"
[1] "Bootstrap resample 93"
[1] "Bootstrap resample 94"
[1] "Bootstrap resample 95"
[1] "Bootstrap resample 96"
[1] "Bootstrap resample 97"
[1] "Bootstrap resample 98"
[1] "Bootstrap resample 99"
[1] "Bootstrap resample 100"
finemappip.annotation.intersections$Annotation %>% unique()
 [1] "ChromHMM:Weak_Promoter"                        
 [2] "ChromHMM:Active_Promoter"                      
 [3] "ChromHMM:Heterochrom/lo"                       
 [4] "ChromHMM:Repressed"                            
 [5] "SpliceBranchpointRegion_UnannotatedButObserved"
 [6] "ChromHMM:Txn_Elongation"                       
 [7] "SpliceAcceptor_UnannotatedButObserved"         
 [8] "SpliceDonor_UnannotatedButObserved"            
 [9] "ChromHMM:Txn_Transition"                       
[10] "SpliceDonor_Annotated"                         
[11] "ChromHMM:Strong_Enhancer"                      
[12] "ChromHMM:Weak_Enhancer"                        
[13] "ChromHMM:Weak_Txn"                             
[14] "SpliceAcceptor_Annotated"                      
[15] "SpliceBranchpointRegion_Annotated"             
[16] "PAS_Region"                                    
[17] "ChromHMM:Repetitive/CNV"                       
[18] "ChromHMM:Poised_Promoter"                      
[19] "ChromHMM:Insulator"                            
[20] "miRNA_BS"                                      
bind_rows(results) %>%
  # filter(PC1 == "Expression.Splicing" & PC2 == "H3K27AC" & Annotation == "ChromHMM:Active_Promoter")
  group_by(Annotation, PC1, PC2) %>%
  summarise(mean = mean(Enrichment, na.rm=T),
            Percentile10 = quantile(Enrichment, probs = 0.05, na.rm=T),
            Percentile90 = quantile(Enrichment, probs = 0.95, na.rm=T)) %>%
  mutate(color = case_when(
    str_detect(Annotation, "^Splic") ~ "Splice region",
    str_detect(Annotation, "Enhancer") ~ "Enhancer",
    str_detect(Annotation, "Promoter") ~ "Promoter",
    str_detect(Annotation, "Txn") ~ "Transcribed",
    str_detect(Annotation, "ChromHMM") ~ "Other ChromHMM",
    TRUE ~ "Other"
  )) %>%
  ggplot(aes(x=Annotation, y=mean, color=color)) +
    geom_hline(yintercept = 0) +
    geom_point() +
    geom_errorbar(aes(ymin=Percentile10, ymax=Percentile90)) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
    coord_flip() +
    facet_grid(PC2~PC1) +
    labs(x = "Annotation", y="Finemapping posterior mass fold enrichment\n(ChromatinQTL|eQTL <-- --> sQTL)",
         caption = "Bootsrapped 90% confidence interval")

Version Author Date
3b44dfd Benjmain Fair 2023-03-30

Hmm, the results aren’t quite what I was expecting, and they aren’t so consistent across the different ways to classify eQTLs as post txn vs txn (H3K27AC, H3K4ME3, chRNA). I wonder how consistent the classifications are…

transcriptional.eQTLs <- dat.ZsFromAllTraitPairs %>%
  filter(PC2 %in% transcriptional.PC2s) %>%
  mutate(IsPolyASpecific = if_else(q < 0.01, "PostTxn", "Txn"))

transcriptional.eQTLs %>%
  group_by(PC1) %>%
  add_count(P1) %>%
  filter(n==3) %>%
  ungroup() %>%
  mutate(IsPolyASpecific = IsPolyASpecific=="Txn") %>%
  group_by(P1, PC1) %>%
  summarise(TxnQTL_In_HowManyOf3 = sum(IsPolyASpecific)) %>%
  ggplot(aes(x=TxnQTL_In_HowManyOf3)) +
  geom_bar() +
  facet_wrap(~PC1) +
  labs("How many of the 3 methods is a gene classified as txn eQTL")

Version Author Date
3b44dfd Benjmain Fair 2023-03-30
transcriptional.eQTLs <- dat.ZsFromAllTraitPairs %>%
  filter(PC2 %in% transcriptional.PC2s) %>%
  mutate(IsPolyASpecific = if_else(q < 0.001, "PostTxn", "Txn"))

transcriptional.eQTLs %>%
  group_by(PC1) %>%
  add_count(P1) %>%
  filter(n==3) %>%
  ungroup() %>%
  mutate(IsPolyASpecific = IsPolyASpecific=="Txn") %>%
  group_by(P1, PC1) %>%
  summarise(TxnQTL_In_HowManyOf3 = sum(IsPolyASpecific)) %>%
  ggplot(aes(x=TxnQTL_In_HowManyOf3)) +
  geom_bar() +
  facet_wrap(~PC1) +
  labs("How many of the 3 methods is a gene classified as txn eQTL")

transcriptional.eQTLs <- dat.ZsFromAllTraitPairs %>%
  filter(PC2 %in% transcriptional.PC2s) %>%
  mutate(IsPolyASpecific = if_else(q < 0.0001, "PostTxn", "Txn"))

transcriptional.eQTLs %>%
  group_by(PC1) %>%
  add_count(P1) %>%
  filter(n==3) %>%
  ungroup() %>%
  mutate(IsPolyASpecific = IsPolyASpecific=="Txn") %>%
  group_by(P1, PC1) %>%
  summarise(TxnQTL_In_HowManyOf3 = sum(IsPolyASpecific)) %>%
  ggplot(aes(x=TxnQTL_In_HowManyOf3)) +
  geom_bar() +
  facet_wrap(~PC1) +
  labs("How many of the 3 methods is a gene classified as txn eQTL")

Ok so playing with the thresholds still doesn’t get a nice clean/consistent classification between the different methods… It’s not like there are a lot of genes that are in txn class in all 3 and in non-txn class in all 3. There are a lot of in-betweens. Let’s maybe just try reclassifying them as txn if they are significant any of the three…

transcriptional.eQTLs <- dat.ZsFromAllTraitPairs %>%
  filter(PC2 %in% transcriptional.PC2s) %>%
  group_by(PC1) %>%
  add_count(P1) %>%
  filter(n==3) %>%
  ungroup() %>%
  group_by(P1, PC1) %>%
  mutate(IsPolyASpecific = if_else(any(q < 0.001), "PostTxn", "Txn")) %>%
  ungroup()

count(transcriptional.eQTLs,PC1, PC2, IsPolyASpecific)
# A tibble: 12 × 4
   PC1                            PC2                      IsPolyASpecific     n
   <chr>                          <chr>                    <chr>           <int>
 1 Expression.Splicing            H3K27AC                  PostTxn          1027
 2 Expression.Splicing            H3K27AC                  Txn              7107
 3 Expression.Splicing            H3K4ME3                  PostTxn          1027
 4 Expression.Splicing            H3K4ME3                  Txn              7107
 5 Expression.Splicing            chRNA.Expression.Splici… PostTxn          1027
 6 Expression.Splicing            chRNA.Expression.Splici… Txn              7107
 7 Expression.Splicing.Subset_YRI H3K27AC                  PostTxn          1274
 8 Expression.Splicing.Subset_YRI H3K27AC                  Txn              2014
 9 Expression.Splicing.Subset_YRI H3K4ME3                  PostTxn          1274
10 Expression.Splicing.Subset_YRI H3K4ME3                  Txn              2014
11 Expression.Splicing.Subset_YRI chRNA.Expression.Splici… PostTxn          1274
12 Expression.Splicing.Subset_YRI chRNA.Expression.Splici… Txn              2014
set.seed(0)
results <- list()
for (i in 1:100){
  print(paste("Bootstrap resample", i))
  transcriptional.eQTLs.sample <- transcriptional.eQTLs %>%
  group_by(PC1, PC2, IsPolyASpecific) %>%
  sample_frac(replace=T)
  results[[i]] <-
  finemappip.annotation.intersections %>%
  filter(Signal == "L1") %>%
  inner_join(transcriptional.eQTLs.sample, by=c("eGene"="P1")) %>%
  filter(!is.na(IsPolyASpecific)) %>%
  group_by(Annotation, IsPolyASpecific, PC1, PC2) %>%
  summarise(SumInAnnotation = sum(pip, na.rm=T)) %>%
  ungroup() %>%
  group_by(IsPolyASpecific, PC1, PC2) %>%
  mutate(TotalPIP = sum(SumInAnnotation, na.rm=T)) %>%
  ungroup() %>%
  mutate(FractionPIP_in_Annotation = SumInAnnotation/TotalPIP ) %>%
  dplyr::select(FractionPIP_in_Annotation, PC1, PC2, IsPolyASpecific, Annotation) %>%
  pivot_wider(names_from = "IsPolyASpecific", values_from = c("FractionPIP_in_Annotation")) %>%
  mutate(Enrichment = log2(PostTxn/Txn)) %>%
  mutate(i=i)
}
[1] "Bootstrap resample 1"
[1] "Bootstrap resample 2"
[1] "Bootstrap resample 3"
[1] "Bootstrap resample 4"
[1] "Bootstrap resample 5"
[1] "Bootstrap resample 6"
[1] "Bootstrap resample 7"
[1] "Bootstrap resample 8"
[1] "Bootstrap resample 9"
[1] "Bootstrap resample 10"
[1] "Bootstrap resample 11"
[1] "Bootstrap resample 12"
[1] "Bootstrap resample 13"
[1] "Bootstrap resample 14"
[1] "Bootstrap resample 15"
[1] "Bootstrap resample 16"
[1] "Bootstrap resample 17"
[1] "Bootstrap resample 18"
[1] "Bootstrap resample 19"
[1] "Bootstrap resample 20"
[1] "Bootstrap resample 21"
[1] "Bootstrap resample 22"
[1] "Bootstrap resample 23"
[1] "Bootstrap resample 24"
[1] "Bootstrap resample 25"
[1] "Bootstrap resample 26"
[1] "Bootstrap resample 27"
[1] "Bootstrap resample 28"
[1] "Bootstrap resample 29"
[1] "Bootstrap resample 30"
[1] "Bootstrap resample 31"
[1] "Bootstrap resample 32"
[1] "Bootstrap resample 33"
[1] "Bootstrap resample 34"
[1] "Bootstrap resample 35"
[1] "Bootstrap resample 36"
[1] "Bootstrap resample 37"
[1] "Bootstrap resample 38"
[1] "Bootstrap resample 39"
[1] "Bootstrap resample 40"
[1] "Bootstrap resample 41"
[1] "Bootstrap resample 42"
[1] "Bootstrap resample 43"
[1] "Bootstrap resample 44"
[1] "Bootstrap resample 45"
[1] "Bootstrap resample 46"
[1] "Bootstrap resample 47"
[1] "Bootstrap resample 48"
[1] "Bootstrap resample 49"
[1] "Bootstrap resample 50"
[1] "Bootstrap resample 51"
[1] "Bootstrap resample 52"
[1] "Bootstrap resample 53"
[1] "Bootstrap resample 54"
[1] "Bootstrap resample 55"
[1] "Bootstrap resample 56"
[1] "Bootstrap resample 57"
[1] "Bootstrap resample 58"
[1] "Bootstrap resample 59"
[1] "Bootstrap resample 60"
[1] "Bootstrap resample 61"
[1] "Bootstrap resample 62"
[1] "Bootstrap resample 63"
[1] "Bootstrap resample 64"
[1] "Bootstrap resample 65"
[1] "Bootstrap resample 66"
[1] "Bootstrap resample 67"
[1] "Bootstrap resample 68"
[1] "Bootstrap resample 69"
[1] "Bootstrap resample 70"
[1] "Bootstrap resample 71"
[1] "Bootstrap resample 72"
[1] "Bootstrap resample 73"
[1] "Bootstrap resample 74"
[1] "Bootstrap resample 75"
[1] "Bootstrap resample 76"
[1] "Bootstrap resample 77"
[1] "Bootstrap resample 78"
[1] "Bootstrap resample 79"
[1] "Bootstrap resample 80"
[1] "Bootstrap resample 81"
[1] "Bootstrap resample 82"
[1] "Bootstrap resample 83"
[1] "Bootstrap resample 84"
[1] "Bootstrap resample 85"
[1] "Bootstrap resample 86"
[1] "Bootstrap resample 87"
[1] "Bootstrap resample 88"
[1] "Bootstrap resample 89"
[1] "Bootstrap resample 90"
[1] "Bootstrap resample 91"
[1] "Bootstrap resample 92"
[1] "Bootstrap resample 93"
[1] "Bootstrap resample 94"
[1] "Bootstrap resample 95"
[1] "Bootstrap resample 96"
[1] "Bootstrap resample 97"
[1] "Bootstrap resample 98"
[1] "Bootstrap resample 99"
[1] "Bootstrap resample 100"
bind_rows(results) %>%
  # filter(PC1 == "Expression.Splicing" & PC2 == "H3K27AC" & Annotation == "ChromHMM:Active_Promoter")
  group_by(Annotation, PC1, PC2) %>%
  summarise(mean = mean(Enrichment, na.rm=T),
            Percentile10 = quantile(Enrichment, probs = 0.05, na.rm=T),
            Percentile90 = quantile(Enrichment, probs = 0.95, na.rm=T)) %>%
  mutate(color = case_when(
    str_detect(Annotation, "^Splic") ~ "Splice region",
    str_detect(Annotation, "Enhancer") ~ "Enhancer",
    str_detect(Annotation, "Promoter") ~ "Promoter",
    str_detect(Annotation, "Txn") ~ "Transcribed",
    str_detect(Annotation, "ChromHMM") ~ "Other ChromHMM",
    TRUE ~ "Other"
  )) %>%
  ggplot(aes(x=Annotation, y=mean, color=color)) +
    geom_hline(yintercept = 0) +
    geom_point() +
    geom_errorbar(aes(ymin=Percentile10, ymax=Percentile90)) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
    coord_flip() +
    facet_grid(PC2~PC1) +
    labs(x = "Annotation", y="Finemapping posterior mass fold enrichment\n(ChromatinQTL|eQTL <-- --> sQTL)",
         caption = "Bootsrapped 90% confidence interval")

FigS2C

sQTL QQ plot, with different colors for post-transcriptional eQTLs to transcriptional eQTLs (like what Carlos has done in red and black)

transcriptional.PC2s <- c("chRNA.Expression.Splicing", "H3K27AC", "H3K4ME3")

transcriptional.eQTLs <- dat.ZsFromAllTraitPairs %>%
  filter(PC2 %in% transcriptional.PC2s) %>%
  mutate(IsPolyASpecific = q < 0.01 & trait.x.p.in.y>0.01 )

count(transcriptional.eQTLs, IsPolyASpecific, PC2)
# A tibble: 12 × 4
# Groups:   PC1, PC2 [6]
   PC1                            PC2                      IsPolyASpecific     n
   <chr>                          <chr>                    <lgl>           <int>
 1 Expression.Splicing            H3K27AC                  FALSE            7294
 2 Expression.Splicing            H3K27AC                  TRUE              980
 3 Expression.Splicing            H3K4ME3                  FALSE            8021
 4 Expression.Splicing            H3K4ME3                  TRUE              899
 5 Expression.Splicing            chRNA.Expression.Splici… FALSE            8943
 6 Expression.Splicing            chRNA.Expression.Splici… TRUE              897
 7 Expression.Splicing.Subset_YRI H3K27AC                  FALSE            1965
 8 Expression.Splicing.Subset_YRI H3K27AC                  TRUE             1405
 9 Expression.Splicing.Subset_YRI H3K4ME3                  FALSE            2217
10 Expression.Splicing.Subset_YRI H3K4ME3                  TRUE             1361
11 Expression.Splicing.Subset_YRI chRNA.Expression.Splici… FALSE            2326
12 Expression.Splicing.Subset_YRI chRNA.Expression.Splici… TRUE             1587
PC1.filter <- c("Expression.Splicing", "Expression.Splicing.Subset_YRI")

PC1.sQTL.filter <- c("polyA.Splicing", "polyA.Splicing.Subset_YRI", "chRNA.Splicing")


dat.forsQTLQQ <- fread("../code/pi1/PairwisePi1Traits.P.all.txt.gz") %>%
  filter((PC1 %in% PC1.filter) & (PC2 %in% PC1.sQTL.filter))

dat.forsQTLQQ %>%
  inner_join(
    transcriptional.eQTLs %>%
    dplyr::select(ClassificationPC = PC2, IsPolyASpecific, P1, PC1)
  ) %>%
  group_by(PC1, PC2, IsPolyASpecific, ClassificationPC ) %>%
  mutate(ExpectedP = percent_rank(trait.x.p.in.y)) %>%
  ungroup() %>%
  ggplot(aes(x=-log10(ExpectedP), y=-log10(trait.x.p.in.y), color=IsPolyASpecific)) +
  # geom_abline(slope=1, intercept=1)
  geom_point() +
  geom_abline(slope=1) +
  facet_grid(PC2~ClassificationPC, scales = "free") +
  labs(x="Expected -log10P", y="Observed -log10P", title="sQTL QQ for top eQTL SNPs", caption="Column facets: criteria to determine polyA-specific eQTLs\nRow facets: sQTL dataset")

Fig2c

The eQTL QQplot of stable sQTLs, unstable sQTLs, and H3K27AC QTLs

PC2.filter <- c("Expression.Splicing", "Expression.Splicing.Subset_YRI")
PC2.SignificanceFilter <- c("H3K27AC", "H3K4ME3", "H3K36ME3")
PC1.filter <- c("polyA.Splicing", "H3K27AC", "H3K4ME1", "H3K4ME3")
PC1.filter.Splicing <- PC1.filter[str_detect(PC1.filter, "Splicing")]
PC1.filter.NonSplicing <- PC1.filter[!str_detect(PC1.filter, "Splicing")]

dat.foreQTLQQ <- fread("../code/pi1/PairwisePi1Traits.P.all.txt.gz") %>%
  filter((PC1 %in% PC1.filter))

IntronAnnotatins <- read_tsv("../data/IntronAnnotationsFromYang.tsv.gz") %>%
  mutate(chrom = str_remove_all(chrom, "chr")) %>%
  mutate(Intron = paste(chrom, start, end, sep=":")) %>%
  filter(!str_detect(SuperAnnotation, "NoncodingGene"))

IntronAnnotatins$SuperAnnotation %>% unique()
[1] "AnnotatedJunc_UnproductiveCodingGene"  
[2] "UnannotatedJunc_UnproductiveCodingGene"
[3] "AnnotatedJunc_ProductiveCodingGene"    
[4] "UnannotatedJunc_ProductiveCodingGene"  
RandomTestSNPs <- paste0("../code/QTLs/QTLTools/", PC2.filter, "/NominalPassForColoc.RandomSamplePvals.txt.gz") %>%
  setNames(PC2.filter) %>%
  lapply(fread) %>%
  bind_rows(.id="PC2") %>%
  dplyr::select(trait.x.p.in.y=V1, PC2) %>%
  mutate(SNP_group="Random test SNPs") %>%
  group_by(PC2) %>%
  sample_n(50000)


dat.foreQTLQQ.sQTLs <- dat.foreQTLQQ %>%
    filter(PC1 %in% PC1.filter.Splicing) %>%
    group_by(PC1, P1) %>%
    filter(!any((PC2 %in% PC2.SignificanceFilter) & (trait.x.p.in.y < 0.01))) %>%
    ungroup() %>%
    filter(PC2 %in% PC2.filter) %>%
    separate(P1, into=c("Intron", "Cluster"), sep=":clu", remove=F) %>%
    inner_join(
      IntronAnnotatins %>%
        dplyr::select(Intron, SuperAnnotation),
      by="Intron") %>%
    group_by(PC1, PC2, Cluster) %>%
    mutate(SNP_group = case_when(
      all(str_detect(SuperAnnotation, "Productive")) ~ "Productive sQTL cluster",
      any(str_detect(SuperAnnotation, "Unproductive")) ~ "Unproductive sQTL cluster",
      TRUE ~ "sQTL Other"
    )) %>%
    # sample_n(1) %>%
    slice(which.min(p_permutation.x)) %>%
    ungroup()

bind_rows(
  dat.foreQTLQQ.sQTLs,
  dat.foreQTLQQ %>%
    filter(PC2 %in% PC2.filter) %>%
    filter(PC1 %in% PC1.filter.NonSplicing) %>%
    mutate(SNP_group = PC1),
  RandomTestSNPs
) %>%
  group_by(SNP_group, PC2) %>%
  mutate(MyRank = rank(trait.x.p.in.y, ties.method='random')) %>%
  mutate(ExpectedP = MyRank/(max(MyRank) + 1)) %>%
  ungroup() %>%
  ggplot(aes(x=-log10(ExpectedP), y=-log10(trait.x.p.in.y), color=SNP_group)) +
  geom_abline(slope=1, intercept=0) +
  geom_point() +
  facet_wrap(~PC2, scales="free") +
  labs(y="-log10(ObservedP)", caption="sQTL clusters containing any significant unproductive sQTL intron are grouped as Unproductive sQTL\nP value from only one significant sQTL intron from cluster selected at random from cluster\nsQTLs with nominal P<0.01 for H3K36me3, H3K27Ac, or H3K4me3 removed", title="eQTL QQ plot")

dat.foreQTLQQ.sQTLs %>%
  count(SNP_group, PC2)
# A tibble: 4 × 3
  SNP_group                 PC2                                n
  <chr>                     <chr>                          <int>
1 Productive sQTL cluster   Expression.Splicing             1472
2 Productive sQTL cluster   Expression.Splicing.Subset_YRI  1472
3 Unproductive sQTL cluster Expression.Splicing             1791
4 Unproductive sQTL cluster Expression.Splicing.Subset_YRI  1791
# pi0est function breaks sometimes. this wrapper function returns 1 when that happens
CalculatePi1_2 <- function (dat.in, ...) {
  return(tryCatch(1-pi0est(dat.in$trait.x.p.in.y, ...)$pi0, error=function(e) 1))
}

labels <- dat.foreQTLQQ.sQTLs %>%
  split(paste(.$SNP_group, .$PC2, sep=";")) %>%
  lapply(CalculatePi1_2, pi0.method="bootstrap") %>%
  unlist() %>%
  data.frame(pi1=.) %>%
  rownames_to_column("sQTLGroup_eQTLDataset") %>%
  separate(sQTLGroup_eQTLDataset, into=c("SNP_group", "PC2"), sep=";") %>%
  mutate(label=paste0("pi=", signif(pi1, digits=3)))

  
dat.foreQTLQQ.sQTLs %>%
  ggplot(aes(x=trait.x.p.in.y)) +
  geom_histogram() +
  geom_text(data = labels, aes(label=label), x=Inf, y=Inf, vjust=1, hjust=1) +
  facet_grid(SNP_group ~ PC2) +
  labs(x="P", title="eQTL P value historgram", caption="Vertical facets: eQTL dataset\nHorizontal facets: sQTL cluster type") +
  theme(strip.text = element_text(size = 8))

Fig3

Fig3, exploring connection to complex traits using gwas. There’s a few ways I did this. Firstly I took all molQTL top SNPs and got the corresponding gwas P-values, to compare to global gwas P-values and plot enrichment in QQplot form. I currently don’t have any good control sets of SNPs. Later I can come up with a control, and also come up with a systematic way to quantify enrichment of molQTLs for GWAS SNPs (ie lambda inflation factor, compared to control SNPs, or something like that) and think of a way to present these results not just for one GWAS but for many… The second way I am considering this problem is with hyprcoloc… I attempt to colocalize all molQTLs (chromatinQTLs, expressionQTLs, sQTLs, APA QTLs, etc) within a 1MB window centered around top SNP at each GWAS loci.

First let’s try to make the QQ plot.

gwas.trait.accession <- "IMSGC2019"

molQTL.gwas.P <- fread(paste0("../code/gwas_summary_stats/MolQTLIntersections/", gwas.trait.accession, ".bed.gz"), col.names = c("chrom", "varPos", "GWAS.P", "molQTL.name", "molQTL.P", "strand", "molQTL.beta", "molQTL.se", "molQTL.q", "varID", "overlap")) %>%
  separate(molQTL.name, into=c("PhenotypeClass", "MolPhenotypeName"), sep = ";")

gwas.P <- fread(paste0("../code/gwas_summary_stats/sorted_index_summarystat_hg38beds/", gwas.trait.accession, ".bed.gz"), select = 4, col.names = "GWAS.P") %>%
  mutate(PhenotypeClass = "All SNPs")

molQTL.gwas.P$PhenotypeClass %>% unique()
 [1] "APA_Nuclear"                      "H3K27AC"                         
 [3] "MetabolicLabelled.30min.Splicing" "H3K4ME1"                         
 [5] "polyA.Splicing"                   "polyA.Splicing.Subset_YRI"       
 [7] "MetabolicLabelled.60min.Splicing" "chRNA.Splicing"                  
 [9] "chRNA.Splicing.Order"             "CTCF"                            
[11] "APA_Total"                        "chRNA.Expression_ncRNA"          
[13] "DNaseISensitivity"                "H3K36ME3_ncRNA"                  
[15] "chRNA.IER"                        "polyA.IER"                       
[17] "H3K36ME3"                         "Expression.Splicing"             
[19] "MetabolicLabelled.30min"          "polyA.IER.Subset_YRI"            
[21] "ProCap"                           "H3K4ME3"                         
[23] "MetabolicLabelled.60min"          "chRNA.Expression.Splicing"       
[25] "Expression.Splicing.Subset_YRI"   "MetabolicLabelled.60min.IER"     
[27] "chRNA.Slopes"                     "MetabolicLabelled.30min.IER"     
PhenotypeClass.filter <- c("Expression.Splicing.Subset_YRI", "Expression.Splicing", "polyA.Splicing", "H3K27AC")



QQ.gwas <- bind_rows(
  molQTL.gwas.P %>%
    filter(PhenotypeClass %in% PhenotypeClass.filter) %>%
    filter(molQTL.q < 0.1) %>%
    dplyr::select(PhenotypeClass, GWAS.P),
  gwas.P %>%
    sample_n(1E5)
) %>%
dplyr::select(SNP_group = PhenotypeClass, GWAS.P) %>%
group_by(SNP_group) %>%
  mutate(MyRank = rank(GWAS.P, ties.method='random')) %>%
  mutate(ExpectedP = MyRank/(max(MyRank) + 1)) %>%
  ungroup() %>%
  mutate(SNP_group = relevel(factor(SNP_group), "All SNPs")) %>%
  arrange(SNP_group) %>%
  ggplot(aes(x=-log10(ExpectedP), y=-log10(GWAS.P), color=SNP_group)) +
  geom_abline(slope=1, intercept=0) +
  geom_point() +
  labs(y="-log10(ObservedP)", title="GWAS QQ plot", caption="GWAS SNPs sub-sampled to 100K for plotting speed")
QQ.gwas

QQ.gwas +
  coord_cartesian(ylim=c(0,20))

Ok, it is nice to see some enrichment, though just plotting different axis limits can really visually change the story. I will need to be careful here. Let’s distinguish the sQTLs now, between productive and non-productive…

# sQTL.permutationPass.Classified <- read_delim("../code/QTLs/QTLTools/polyA.Splicing/PermutationPass.FDR_Added.txt.gz", delim=' ') %>%
#   # head() %>%
#   separate(phe_id, into=c("intron", "cluster"), sep=":clu", remove=F) %>%
#   mutate(IntronName = str_replace(phe_id, "^(.+?:.+?:.+?:)clu_.+?_([+-])$", "chr\\1\\2")) %>%
#   inner_join(Intron.Annotations, by="IntronName") %>%
#   filter(q < 0.1) %>%
#   group_by(cluster) %>%
#   mutate(SNP_group = case_when(
#       all(str_detect(SuperAnnotation, "Productive")) ~ "Productive sQTL cluster",
#       any(str_detect(SuperAnnotation, "Unproductive")) ~ "Unproductive sQTL cluster",
#       TRUE ~ "sQTL Other"
#     )) %>%
#     slice(which.min(nom_pval)) %>%
#     ungroup()
# 
# sQTL.permutationPass.Classified %>%
#   count(SNP_group)

SplicingPhenotypeFilter <- PhenotypeClass.filter[str_detect(PhenotypeClass.filter, "Splicing") & !str_detect(PhenotypeClass.filter, "Expression")]

 

QQ.gwas <- bind_rows(
  molQTL.gwas.P %>%
    filter((PhenotypeClass %in% PhenotypeClass.filter) & !(PhenotypeClass %in% SplicingPhenotypeFilter)) %>%
    filter(molQTL.q < 0.1) %>%
    dplyr::select(PhenotypeClass, GWAS.P),
  molQTL.gwas.P %>%
    filter(PhenotypeClass %in% SplicingPhenotypeFilter) %>%
    filter(molQTL.q < 0.1) %>%
    dplyr::select(GWAS.P, MolPhenotypeName) %>%
    inner_join(
      dat.foreQTLQQ.sQTLs %>%
        dplyr::select(MolPhenotypeName=P1, PhenotypeClass=SNP_group) %>%
        distinct(),
      by=c("MolPhenotypeName")) %>%
    filter(!PhenotypeClass=="sQTL Other"),
  gwas.P %>%
    sample_n(1E5)
) %>%
dplyr::select(SNP_group = PhenotypeClass, GWAS.P) %>%
group_by(SNP_group) %>%
  mutate(MyRank = rank(GWAS.P, ties.method='random')) %>%
  mutate(ExpectedP = MyRank/(max(MyRank) + 1)) %>%
  ungroup() %>%
  mutate(SNP_group = relevel(factor(SNP_group), "All SNPs")) %>%
  arrange(SNP_group) %>%
  ggplot(aes(x=-log10(ExpectedP), y=-log10(GWAS.P), color=SNP_group)) +
  geom_abline(slope=1, intercept=0) +
  geom_point() +
  labs(y="-log10(ObservedP)", title="GWAS QQ plot", caption="GWAS SNPs sub-sampled to 100K for plotting speed")
QQ.gwas

QQ.gwas +
  coord_cartesian(ylim=c(0,20))

So from this the enrichment is similar whether looking at productive or unproductive sQTLs. Now let me go back and add some control SNPs. One easy control to include would be like what Yang did in his 2016 paper: Include molQTL test SNPs within peaks (or within introns) may work for sQTLs. Also, no need to include the YRI subset anymore. I was just including it in my exploratory analysis but it is unnecessary/distracting for a main figure. Let’s also take that out…

ControlSNPs <- fread(paste0("../code/gwas_summary_stats/MolQTLIntersections_ControlSNPs/", gwas.trait.accession, "/ALL.txt.gz"), col.names=c("GWAS.P", "PhenotypeClassControl"))

ControlSNPs$PhenotypeClassControl %>% unique()
 [1] "Expression.Splicing"              "Expression.Splicing.Subset_YRI"  
 [3] "chRNA.Expression.Splicing"        "MetabolicLabelled.30min"         
 [5] "MetabolicLabelled.60min"          "CTCF"                            
 [7] "H3K27AC"                          "H3K4ME3"                         
 [9] "H3K4ME1"                          "H3K36ME3"                        
[11] "H3K36ME3_ncRNA"                   "ProCap"                          
[13] "polyA.Splicing"                   "polyA.Splicing.Subset_YRI"       
[15] "chRNA.Splicing"                   "MetabolicLabelled.30min.Splicing"
[17] "MetabolicLabelled.60min.Splicing" "chRNA.Expression_ncRNA"          
[19] "APA_Nuclear"                      "APA_Total"                       
[21] "polyA.IER"                        "polyA.IER.Subset_YRI"            
[23] "chRNA.IER"                        "MetabolicLabelled.30min.IER"     
[25] "MetabolicLabelled.60min.IER"      "chRNA.Slopes"                    
[27] "chRNA.Splicing.Order"             "DNaseISensitivity"               
QQ.gwas <- bind_rows(
  molQTL.gwas.P %>%
    filter((PhenotypeClass %in% PhenotypeClass.filter) & !(PhenotypeClass %in% SplicingPhenotypeFilter)) %>%
    filter(!PhenotypeClass == "Expression.Splicing.Subset_YRI") %>%
    filter(molQTL.q < 0.1) %>%
    dplyr::select(PhenotypeClass, GWAS.P),
  molQTL.gwas.P %>%
    filter(PhenotypeClass %in% SplicingPhenotypeFilter) %>%
    filter(molQTL.q < 0.1) %>%
    dplyr::select(GWAS.P, MolPhenotypeName) %>%
    inner_join(
      dat.foreQTLQQ.sQTLs %>%
        dplyr::select(MolPhenotypeName=P1, PhenotypeClass=SNP_group) %>%
        distinct(),
      by=c("MolPhenotypeName")) %>%
    filter(!PhenotypeClass=="sQTL Other"),
  ControlSNPs %>%
    filter(PhenotypeClassControl %in% c("polyA.Splicing", "H3K27AC", "Expression.Splicing")) %>%
    mutate(PhenotypeClass = recode(PhenotypeClassControl, "polyA.Splicing"="Intronic SNPS", "H3K27AC"="H3K27AC peak SNPs", "Expression.Splicing"="genic SNPs")) %>%
    dplyr::select(-PhenotypeClassControl),
  gwas.P %>%
    sample_n(1E5)
) %>%
dplyr::select(SNP_group = PhenotypeClass, GWAS.P) %>%
group_by(SNP_group) %>%
  mutate(MyRank = rank(GWAS.P, ties.method='random')) %>%
  mutate(ExpectedP = MyRank/(max(MyRank) + 1)) %>%
  ungroup() %>%
  mutate(SNP_group = relevel(factor(SNP_group), "All SNPs")) %>%
  arrange(SNP_group) %>%
  ggplot(aes(x=-log10(ExpectedP), y=-log10(GWAS.P), color=SNP_group)) +
  geom_abline(slope=1, intercept=0) +
  geom_point() +
  labs(y="-log10(ObservedP)", title="GWAS QQ plot", caption="GWAS SNPs sub-sampled to 100K for plotting speed")
QQ.gwas

QQ.gwas +
  coord_cartesian(ylim=c(0,20))

Hmm, ok there seems to be subtle shift compared to the control SNPs. These may not be the best controls. Will have to think about this more later.

Now Let’s expore the hypercoloc results.

Note that I did the hyprcoloc a number of different ways, each with different sets of molQTLs attempted… Let’s explore the most inclusive hyprcoloc iteration.

gwas.traits <- read_tsv("../code/config/gwas_table.tsv")


hyprcoloc.results <- read_tsv("../code/hyprcoloc/Results/ForGWASColoc/GWASColoc_ChromatinAPAAndRNA/results.txt.gz") %>%
  dplyr::rename(GWAS.Loci = GWASLeadSnpChrom_Pos_RefAllele_AltAllele_rsID_trait) %>%
  separate(GWAS.Loci, into=c("GWAS.LeadSNP.Chrom", "GWAS.LeadSNP.Pos", "GWAS.accession"), sep="_", remove=F) %>%
  separate_rows(ColocalizedTraits, sep = ",") %>%
  mutate(IsColocalizedWithSomething = !ColocalizedTraits == "None") %>%
  mutate(Trait = if_else(IsColocalizedWithSomething, ColocalizedTraits, DroppedTrait)) %>%
  dplyr::select(-DroppedTrait, -ColocalizedTraits) %>%
  mutate(Trait = if_else(Trait == GWAS.Loci, paste("GWAS",GWAS.Loci, sep=";"),Trait)) %>%
  separate(Trait, into=c("PhenotypeClass", "Phenotype"), sep=";")


# hyprcoloc.results %>%
#   distinct(GWAS.Loci, .keep_all=T) %>%
#   count(GWAS.accession)

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C           
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] viridis_0.6.2      viridisLite_0.4.0  qvalue_2.28.0      edgeR_3.38.4      
 [5] limma_3.52.4       data.table_1.14.2  RColorBrewer_1.1-3 forcats_0.5.1     
 [9] stringr_1.4.0      dplyr_1.0.9        purrr_0.3.4        readr_2.1.2       
[13] tidyr_1.2.0        tibble_3.1.7       ggplot2_3.3.6      tidyverse_1.3.1   

loaded via a namespace (and not attached):
 [1] fs_1.5.2          lubridate_1.8.0   bit64_4.0.5       httr_1.4.3       
 [5] rprojroot_2.0.3   tools_4.2.0       backports_1.4.1   bslib_0.3.1      
 [9] utf8_1.2.2        R6_2.5.1          DBI_1.1.2         colorspace_2.0-3 
[13] withr_2.5.0       tidyselect_1.1.2  gridExtra_2.3     bit_4.0.4        
[17] compiler_4.2.0    git2r_0.30.1      cli_3.3.0         rvest_1.0.2      
[21] xml2_1.3.3        labeling_0.4.2    sass_0.4.1        scales_1.2.0     
[25] digest_0.6.29     rmarkdown_2.14    R.utils_2.11.0    pkgconfig_2.0.3  
[29] htmltools_0.5.2   dbplyr_2.1.1      fastmap_1.1.0     highr_0.9        
[33] rlang_1.0.2       readxl_1.4.0      rstudioapi_0.13   jquerylib_0.1.4  
[37] farver_2.1.0      generics_0.1.2    jsonlite_1.8.0    vroom_1.5.7      
[41] R.oo_1.24.0       magrittr_2.0.3    Rcpp_1.0.8.3      munsell_0.5.0    
[45] fansi_1.0.3       lifecycle_1.0.1   R.methodsS3_1.8.1 stringi_1.7.6    
[49] whisker_0.4       yaml_2.3.5        plyr_1.8.7        grid_4.2.0       
[53] parallel_4.2.0    promises_1.2.0.1  crayon_1.5.1      lattice_0.20-45  
[57] haven_2.5.0       splines_4.2.0     hms_1.1.1         locfit_1.5-9.7   
[61] knitr_1.39        pillar_1.7.0      reshape2_1.4.4    reprex_2.0.1     
[65] glue_1.6.2        evaluate_0.15     modelr_0.1.8      vctrs_0.4.1      
[69] tzdb_0.3.0        httpuv_1.6.5      cellranger_1.1.0  gtable_0.3.0     
[73] assertthat_0.2.1  xfun_0.30         broom_0.8.0       later_1.3.0      
[77] workflowr_1.7.0   ellipsis_0.3.2