Explore GWAS colocalizations

Last updated: 2022-01-18

Checks: 6 passed, 1 warning

Knit directory: ChromatinSplicingQTLs/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility, it’s best to always run the code in an empty environment.

Seed: set.seed(20191126)

The command set.seed(20191126) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: c7ac752

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version c7ac752. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    ._.DS_Store
    Ignored:    analysis/.Rhistory
    Ignored:    code/.DS_Store
    Ignored:    code/.Rhistory
    Ignored:    code/._.DS_Store
    Ignored:    code/._README.md
    Ignored:    code/._dag.pdf
    Ignored:    code/._report.html
    Ignored:    code/.ipynb_checkpoints/
    Ignored:    code/.snakemake/
    Ignored:    code/Alignments/
    Ignored:    code/ENCODE/
    Ignored:    code/ExpressionAnalysis/
    Ignored:    code/Fastq/
    Ignored:    code/FastqFastp/
    Ignored:    code/FastqFastpSE/
    Ignored:    code/FastqSE/
    Ignored:    code/Gather_gwas_summary_stats/
    Ignored:    code/Genotypes/
    Ignored:    code/Multiqc/
    Ignored:    code/Multiqc_chRNA/
    Ignored:    code/PeakCalling/
    Ignored:    code/Phenotypes/
    Ignored:    code/PlotGruberQTLs/
    Ignored:    code/ProCapAnalysis/
    Ignored:    code/QC/
    Ignored:    code/QTLs/
    Ignored:    code/ReferenceGenome/
    Ignored:    code/Session.vim
    Ignored:    code/SplicingAnalysis/
    Ignored:    code/TODO
    Ignored:    code/bigwigs/
    Ignored:    code/bigwigs_FromNonWASPFilteredReads/
    Ignored:    code/config/.DS_Store
    Ignored:    code/config/._.DS_Store
    Ignored:    code/config/ExternalFastqDataAccessions/
    Ignored:    code/config/OldSamplesConfig/
    Ignored:    code/dag.pdf
    Ignored:    code/featureCounts/
    Ignored:    code/gwas_summary_stats/
    Ignored:    code/hyprcoloc/
    Ignored:    code/logs/
    Ignored:    code/notebooks/.ipynb_checkpoints/
    Ignored:    code/out.hap.ld
    Ignored:    code/out.log
    Ignored:    code/report.html
    Ignored:    code/rules/OldRules/
    Ignored:    code/scratch/
    Ignored:    code/scripts/GTFtools_0.8.0/
    Ignored:    code/scripts/__pycache__/
    Ignored:    code/scripts/liftOverBedpe/liftOverBedpe.py
    Ignored:    code/snakemake.log
    Ignored:    code/snakemake.sbatch.log
    Ignored:    data/._PRJEB1350_RunTable.Ding_etal_CTCF.txt
    Ignored:    data/._igsr_samples.tsv
    Ignored:    data/._list_gwas_summary_statistics_PMID27863252.csv
    Ignored:    data/GrowthNotes/._20210830_GrowthNotes_chRNA.ConcentrationsToReplate.txt

Untracked files:
    Untracked:  analysis/20220114_ColocalizationEffectSizeCorrelations.Rmd
    Untracked:  code/scripts/TidyGenewiseColocs.R
    Untracked:  code/snakemake_profiles/slurm/__pycache__/
    Untracked:  output/hyprcoloc_results/ForColoc/hyprcoloc.results.OnlyColocalized.Stats.txt.gz

Unstaged changes:
    Modified:   code/Snakefile
    Modified:   code/rules/Coloc.smk
    Modified:   code/scripts/GenometracksByGenotype
    Modified:   code/scripts/hyprcoloc_genewise.R
    Modified:   output/hyprcoloc_results/ForColoc/hyprcoloc.results.txt.gz

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.

Introduction

I have previously colocalized molecular traits gene-wise using hyprcoloc and saved the important summary stats to file, including the beta value of each molecular trait at the top SNP of its hyprcoloc cluster. Here I aim to check how those beta values correlate between different trait types among colocalized traits. For example, among H3K27Ac QTL peaks that colocalize with gene expression QTLs in polyA RNA-seq, are the beta values correlated? Since the paradigm is that H3K27Ac enhancers are activating, such that an increase in enhancer signal results in an increase in target gene expression, I expect a generally positive correlation. More interesting will be to check these kinds of correlations for splicing phenotypes. For example, there are some reports that splicing (via the spliceosome) activates nearby H3K4me3 writers (the EMATS mechanism), though Li et al. did not see enrichment of sQTLs among H3K4me3 QTLs. So it would be interesting to see whether or not we see these correlations among colocalized traits.

library(tidyverse)

── Attaching packages ────────────────────────────────── tidyverse 1.3.0 ──

✓ ggplot2 3.3.3     ✓ purrr   0.3.4
✓ tibble  2.1.3     ✓ dplyr   0.8.3
✓ tidyr   1.1.0     ✓ stringr 1.4.0
✓ readr   1.3.1     ✓ forcats 0.4.0

── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

library(gplots)


Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

dat <- read_tsv("../output/hyprcoloc_results/ForColoc/hyprcoloc.results.OnlyColocalized.Stats.txt.gz") %>%
  select(-iteration) %>%
  unite(Locus_snp, Locus, snp) %>%
  mutate(pheno_class = str_replace(phenotype_full, "(.+?);.+$", "\\1"))

Parsed with column specification:
cols(
  snp = col_character(),
  beta = col_double(),
  beta_se = col_double(),
  p = col_double(),
  Locus = col_character(),
  phenotype_full = col_character(),
  iteration = col_double(),
  ColocPr = col_double(),
  RegionalPr = col_double(),
  TopSNPFinemapPr = col_double()
)

head(dat)

# A tibble: 6 x 9
  Locus_snp   beta beta_se       p phenotype_full ColocPr RegionalPr
  <chr>      <dbl>   <dbl>   <dbl> <chr>            <dbl>      <dbl>
1 ENSG0000…  0.708  0.110  1.74e-8 MetabolicLabe…   0.838      0.962
2 ENSG0000…  0.580  0.114  3.59e-6 MetabolicLabe…   0.838      0.962
3 ENSG0000… -0.549  0.0880 2.95e-8 H3K27AC;H3K27…   0.377      0.690
4 ENSG0000… -0.538  0.128  7.30e-5 H3K27AC;H3K27…   0.377      0.690
5 ENSG0000… -0.605  0.104  1.75e-7 H3K27AC;H3K27…   0.377      0.690
6 ENSG0000… -0.455  0.121  3.57e-4 H3K4ME3;H3K4M…   0.377      0.690
# … with 2 more variables: TopSNPFinemapPr <dbl>, pheno_class <chr>
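
As a quick sanity check on the `pheno_class` regex, here is a minimal sketch of what the pattern does. The `phenotype_full` strings below are invented for illustration (the real identifiers are truncated in the printed tibble), and I use base R `sub()` with `perl = TRUE` instead of `str_replace()` only to keep the sketch dependency-free.

```r
# Hypothetical phenotype_full values; only the "class;feature" shape is
# taken from the data, the feature names after ";" are made up.
phenotype_full <- c(
  "H3K27AC;peak_1",
  "Expression.Splicing;gene_A",
  "polyA.Splicing.Subset_YRI;chr1:100:200:clu_1"
)

# Same pattern as in the chunk above: lazily capture everything before the
# first ";" and replace the whole string with that capture.
pheno_class <- sub("(.+?);.+$", "\\1", phenotype_full, perl = TRUE)
print(pheno_class)

# A string without ";" does not match the pattern and is returned unchanged.
no_sep <- sub("(.+?);.+$", "\\1", "NoSeparator", perl = TRUE)
print(no_sep)
```

Because the capture is lazy, any further `;` or `:` characters in the feature name do not end up in the class label.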

Ok, now let’s do some data tidying to match up all pairs of colocalized traits within each cluster, along with their beta values.

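dat.tidy.for.cor <- dat %>%
  # filter(Locus_snp == "ENSG00000002822.15_7:1998126:G:A") %>%
  # Self-join on the cluster-level columns to enumerate all pairs of traits in a cluster
  left_join(., ., by = c("Locus_snp", "ColocPr", "RegionalPr", "TopSNPFinemapPr")) %>%
  # Drop pairs of a trait with itself
  filter(phenotype_full.x != phenotype_full.y) %>%
  rowwise() %>%
  # Order-independent pair label, so that (A, B) and (B, A) collapse to one row
  mutate(name = toString(sort(c(phenotype_full.x,phenotype_full.y)))) %>%
  distinct(Locus_snp, name, .keep_all = TRUE) %>%
  # Same order-independent label at the phenotype-class level
  mutate(pheno_class_name = toString(sort(c(pheno_class.x,pheno_class.y))))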


dat.tidy.for.cor %>%
  pull(pheno_class_name) %>% unique()

 [1] "MetabolicLabelled.30min, MetabolicLabelled.60min"         
 [2] "H3K27AC, H3K27AC"                                         
 [3] "H3K27AC, H3K4ME3"                                         
 [4] "H3K4ME3, H3K4ME3"                                         
 [5] "Expression.Splicing, MetabolicLabelled.30min"             
 [6] "Expression.Splicing, Expression.Splicing.Subset_YRI"      
 [7] "Expression.Splicing.Subset_YRI, MetabolicLabelled.30min"  
 [8] "Expression.Splicing, H3K27AC"                             
 [9] "Expression.Splicing, H3K4ME3"                             
[10] "Expression.Splicing, polyA.Splicing.Subset_YRI"           
[11] "Expression.Splicing.Subset_YRI, H3K27AC"                  
[12] "H3K27AC, polyA.Splicing.Subset_YRI"                       
[13] "Expression.Splicing.Subset_YRI, H3K4ME3"                  
[14] "H3K4ME3, polyA.Splicing.Subset_YRI"                       
[15] "Expression.Splicing.Subset_YRI, polyA.Splicing.Subset_YRI"
[16] "polyA.Splicing.Subset_YRI, polyA.Splicing.Subset_YRI"     
[17] "H3K4ME3, MetabolicLabelled.30min"                         
[18] "MetabolicLabelled.30min, polyA.Splicing.Subset_YRI"       
[19] "Expression.Splicing, MetabolicLabelled.60min"             
[20] "H3K4ME3, MetabolicLabelled.60min"                         
[21] "H3K27AC, MetabolicLabelled.30min"                         
[22] "H3K27AC, MetabolicLabelled.60min"                         
[23] "Expression.Splicing.Subset_YRI, MetabolicLabelled.60min"  
[24] "MetabolicLabelled.60min, polyA.Splicing.Subset_YRI"       
[25] "chRNA.Expression.Splicing, H3K27AC"                       
[26] "chRNA.Expression.Splicing, H3K4ME3"                       
[27] "chRNA.Expression.Splicing, MetabolicLabelled.30min"       
[28] "chRNA.Expression.Splicing, polyA.Splicing.Subset_YRI"     
[29] "chRNA.Expression.Splicing, Expression.Splicing"           
[30] "chRNA.Expression.Splicing, MetabolicLabelled.60min"       
[31] "chRNA.Expression.Splicing, Expression.Splicing.Subset_YRI"
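
One thing to keep in mind with the pairing above: `pheno_class_name` is sorted, but `distinct()` keeps whichever orientation of a pair appears first, so it looks like `beta.x` can belong to either of the two classes named in the label. That does not change a correlation coefficient (it is symmetric), but it does affect which trait is on which axis in a scatter plot. Below is a minimal sketch of a pairing that fixes the orientation. It uses an invented toy table and base R `merge()` rather than the `left_join()` used above, so it is an illustration of the idea, not the code that produced these results.

```r
# Toy stand-in for `dat`: one locus/SNP with three colocalized traits.
# All names and values are invented for illustration.
toy <- data.frame(
  Locus_snp      = "geneA_snp1",
  phenotype_full = c("H3K27AC;peak_1", "Expression.Splicing;gene_A", "H3K4ME3;peak_9"),
  pheno_class    = c("H3K27AC", "Expression.Splicing", "H3K4ME3"),
  beta           = c(0.5, 0.4, -0.1),
  stringsAsFactors = FALSE
)

# Self-join on the cluster key: every trait paired with every trait (9 rows).
pairs <- merge(toy, toy, by = "Locus_snp", suffixes = c(".x", ".y"))

# Keep each unordered pair once, oriented so that .x is the alphabetically
# first class (ties within a class are broken by the full phenotype name).
keep <- pairs$pheno_class.x < pairs$pheno_class.y |
  (pairs$pheno_class.x == pairs$pheno_class.y &
     pairs$phenotype_full.x < pairs$phenotype_full.y)
pairs <- pairs[keep, ]

# With this orientation, beta.x always belongs to the first class in the label.
pairs$pheno_class_name <- paste(pairs$pheno_class.x, pairs$pheno_class.y, sep = ", ")
print(pairs[, c("pheno_class_name", "beta.x", "beta.y")])
```

With three traits in the toy cluster this leaves three rows, one per unordered pair.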

Ok, now let’s plot the correlation of beta values for certain pairs of phenotype classes (e.g. expression QTLs and H3K27Ac QTLs), and then more generally for all pairs of phenotype classes.

dat.tidy.for.cor %>%
  filter(pheno_class_name == "Expression.Splicing, H3K27AC") %>%
  ggplot(aes(x=beta.x, y=beta.y)) +
  geom_point() +
  facet_wrap(~pheno_class_name) +
  theme_classic()

dat.tidy.for.cor %>%
  ggplot(aes(x=beta.x, y=beta.y)) +
  geom_point(alpha=0.05) +
  facet_wrap(~pheno_class_name) +
  theme_classic()

Ok wow. That is nice. H3K27Ac beta values are very strongly positively correlated with eQTL betas, as expected. We can look at other pairs of traits too, and if there is an effect, I think this is a very powerful way to assess it. Unsurprisingly, though, with splicing quantified by leafcutter as it is here, splicing is not obviously correlated with anything: leafcutter intron excision ratios (PSI) are computed within a cluster, so a ratio that goes up for one intron necessarily means that the ratio of some other intron in the cluster goes down. Perhaps if we looked at intron retention, the effects would be more interpretable. That will go on the future to-do list once we get better chRNA-seq data.
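
To make the zero-sum point about within-cluster ratios concrete, here is a small worked example. The read counts are invented, and the sketch only uses raw ratios (each intron's share of its cluster's junction reads); it ignores any normalization applied before QTL mapping.

```r
# Invented junction read counts for one cluster with three introns,
# under two genotypes.
counts_ref <- c(intron1 = 50, intron2 = 30, intron3 = 20)
counts_alt <- c(intron1 = 20, intron2 = 48, intron3 = 32)

# Intron excision ratio: each intron's share of the cluster's reads.
ratio_ref <- counts_ref / sum(counts_ref)
ratio_alt <- counts_alt / sum(counts_alt)

# Per-intron change in ratio between genotypes.
delta <- ratio_alt - ratio_ref
print(delta)

# Ratios sum to 1 within a cluster, so the changes sum to zero
# (up to floating-point error): one intron down, the others up.
print(sum(delta))
```

So the sign of an sQTL beta depends entirely on which intron of the cluster was tested, which is one reason the splicing betas are hard to interpret next to expression or chromatin betas.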

Also, let’s replot these as a clustered heatmap of the matrix of correlation coefficients.

library(RColorBrewer)
# my_palette <- colorRampPalette(c("blue", "black", "yellow"))(n = 1000)

dat.tidy.for.cor %>%
  group_by(pheno_class_name) %>%
  # NOTE: despite its name, the `cor` column here holds the Pearson test p-value, not the coefficient
  summarise(cor = cor.test(beta.x, beta.y, method="pearson")$p.value) %>%
  separate(pheno_class_name, into=c("PhenotypeClass1", "PhenotypeClass2"), sep = ", ")

# A tibble: 31 x 3
   PhenotypeClass1           PhenotypeClass2                      cor
   <chr>                     <chr>                              <dbl>
 1 chRNA.Expression.Splicing Expression.Splicing            1.27e-  5
 2 chRNA.Expression.Splicing Expression.Splicing.Subset_YRI 3.07e-  5
 3 chRNA.Expression.Splicing H3K27AC                        2.18e-  6
 4 chRNA.Expression.Splicing H3K4ME3                        1.05e-  3
 5 chRNA.Expression.Splicing MetabolicLabelled.30min        1.79e-  8
 6 chRNA.Expression.Splicing MetabolicLabelled.60min        3.96e-  7
 7 chRNA.Expression.Splicing polyA.Splicing.Subset_YRI      6.29e-  1
 8 Expression.Splicing       Expression.Splicing.Subset_YRI 0.       
 9 Expression.Splicing       H3K27AC                        0.       
10 Expression.Splicing       H3K4ME3                        2.47e-177
# … with 21 more rows

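# Spearman correlation coefficient per pair of phenotype classes
# (note that the p-values above came from a Pearson test)
dat.cor <- dat.tidy.for.cor %>%
  group_by(pheno_class_name) %>%
  summarise(cor = cor(beta.x, beta.y, method="spearman")) %>%
  separate(pheno_class_name, into=c("PhenotypeClass1", "PhenotypeClass2"), sep = ", ")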
  
dat.cor %>%
  rename(PhenotypeClass1=PhenotypeClass2, PhenotypeClass2=PhenotypeClass1) %>%
  bind_rows(dat.cor) %>%
  distinct(.keep_all=T) %>%
  # mutate(cor = as.numeric(cor)) %>%
  pivot_wider(names_from = "PhenotypeClass1", values_from = "cor", values_fill=NA, names_sort=T) %>%
  column_to_rownames("PhenotypeClass2") %>%
  as.matrix() %>%
  # heatmap.2(trace="none", col=brewer.pal(51,"Spectral"))
  heatmap.2(trace="none", col=colorpanel(75, "blue", "black", "yellow"), cexRow = 0.5, cexCol = 0.5)
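
The swap-and-bind step above is just a way to turn one row per unordered class pair into a symmetric matrix. Here is a minimal base R sketch of the same idea on an invented three-class table (`dat_cor_toy` and its values are hypothetical), using two-column character indexing instead of `pivot_wider()`.

```r
# Invented long-format table: one coefficient per unordered class pair.
dat_cor_toy <- data.frame(
  PhenotypeClass1 = c("A", "A", "B"),
  PhenotypeClass2 = c("B", "C", "C"),
  cor             = c(0.8, -0.1, 0.3),
  stringsAsFactors = FALSE
)

# Empty square matrix with one row and one column per class.
classes <- sort(unique(c(dat_cor_toy$PhenotypeClass1, dat_cor_toy$PhenotypeClass2)))
m <- matrix(NA_real_, nrow = length(classes), ncol = length(classes),
            dimnames = list(classes, classes))

# Fill both triangles: (class1, class2) and the mirrored (class2, class1).
m[cbind(dat_cor_toy$PhenotypeClass1, dat_cor_toy$PhenotypeClass2)] <- dat_cor_toy$cor
m[cbind(dat_cor_toy$PhenotypeClass2, dat_cor_toy$PhenotypeClass1)] <- dat_cor_toy$cor
print(m)
```

Diagonal cells stay `NA` wherever the table has no within-class pair, which is also what `values_fill=NA` does in the pipeline above.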

The slight negative correlation between leafcutter splicing and other phenotypes is interesting. I wonder if there is a technical explanation. For example, perhaps many of the colocalizing sQTLs are actually alternative TSSs: when a downstream TSS is used (shorter isoform), the measured expression decreases while most within-cluster introns appear upregulated. This would occur if the cluster had a few introns present only in the long isoform, which go down, so that the other introns necessarily go up.

sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2 gplots_3.0.1.1     forcats_0.4.0     
 [4] stringr_1.4.0      dplyr_0.8.3        purrr_0.3.4       
 [7] readr_1.3.1        tidyr_1.1.0        tibble_2.1.3      
[10] ggplot2_3.3.3      tidyverse_1.3.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5         lubridate_1.7.4    lattice_0.20-38   
 [4] gtools_3.8.1       assertthat_0.2.1   rprojroot_2.0.2   
 [7] digest_0.6.20      utf8_1.1.4         R6_2.4.0          
[10] cellranger_1.1.0   backports_1.1.4    reprex_0.3.0      
[13] evaluate_0.14      httr_1.4.1         pillar_1.4.2      
[16] rlang_0.4.10       readxl_1.3.1       rstudioapi_0.10   
[19] gdata_2.18.0       rmarkdown_1.13     labeling_0.3      
[22] munsell_0.5.0      broom_0.5.2        compiler_3.6.1    
[25] httpuv_1.5.1       modelr_0.1.8       xfun_0.8          
[28] pkgconfig_2.0.2    htmltools_0.3.6    tidyselect_1.1.0  
[31] workflowr_1.6.2    fansi_0.4.0        crayon_1.3.4      
[34] dbplyr_1.4.2       withr_2.4.1        later_0.8.0       
[37] bitops_1.0-6       grid_3.6.1         nlme_3.1-140      
[40] jsonlite_1.6       gtable_0.3.0       lifecycle_0.1.0   
[43] DBI_1.1.0          git2r_0.26.1       magrittr_1.5      
[46] scales_1.1.0       KernSmooth_2.23-15 cli_2.2.0         
[49] stringi_1.4.3      farver_2.1.0       fs_1.3.1          
[52] promises_1.0.1     xml2_1.3.2         ellipsis_0.2.0.1  
[55] generics_0.0.2     vctrs_0.3.1        tools_3.6.1       
[58] glue_1.3.1         hms_0.5.3          yaml_2.2.0        
[61] colorspace_1.4-1   caTools_1.17.1.2   rvest_0.3.5       
[64] knitr_1.23         haven_2.3.1