Last updated: 2022-01-18

I have previously colocalized molecular traits gene-wise using hyprcoloc and saved the important summary stats to file, including the beta value of each molecular trait for the top SNP of each hyprcoloc cluster. Here I aim to check how those beta values correlate among different trait types for colocalized traits. For example, among H3K27Ac QTL peaks that colocalize with gene expression QTLs in polyA RNA-seq, are the beta values correlated? Since the paradigm is that H3K27Ac enhancers are activating, such that an increase in enhancer signal results in an increase in target gene expression, I expect a generally positive correlation. More interesting will be to check these kinds of correlations for splicing phenotypes. For example, there are some reports that splicing (via the spliceosome) activates nearby H3K4me3 writers (the EMATS mechanism), though Li et al. did not see enrichment of sQTLs among H3K4me3 QTLs. So it would be interesting to see whether or not we see these correlations among colocalized traits.

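library(tidyverse)
library(gplots)  # heatmap.2() and colorpanel(), used for the heatmap below
# Read per-trait summary stats for colocalized clusters; pheno_class is the text before the first ";"
dat <- read_tsv("../output/hyprcoloc_results/ForColoc/hyprcoloc.results.OnlyColocalized.Stats.txt.gz") %>%
  select(-iteration) %>%
  unite(Locus_snp, Locus, snp) %>%
  mutate(pheno_class = str_replace(phenotype_full, "(.+?);.+$", "\\1"))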
Parsed with column specification:
cols(
  snp = col_character(),
  beta = col_double(),
  beta_se = col_double(),
  p = col_double(),
  Locus = col_character(),
  phenotype_full = col_character(),
  iteration = col_double(),
  ColocPr = col_double(),
  RegionalPr = col_double(),
  TopSNPFinemapPr = col_double()
)
head(dat)
# A tibble: 6 x 9
  Locus_snp   beta beta_se       p phenotype_full ColocPr RegionalPr
  <chr>      <dbl>   <dbl>   <dbl> <chr>            <dbl>      <dbl>
1 ENSG0000…  0.708  0.110  1.74e-8 MetabolicLabe…   0.838      0.962
2 ENSG0000…  0.580  0.114  3.59e-6 MetabolicLabe…   0.838      0.962
3 ENSG0000… -0.549  0.0880 2.95e-8 H3K27AC;H3K27…   0.377      0.690
4 ENSG0000… -0.538  0.128  7.30e-5 H3K27AC;H3K27…   0.377      0.690
5 ENSG0000… -0.605  0.104  1.75e-7 H3K27AC;H3K27…   0.377      0.690
6 ENSG0000… -0.455  0.121  3.57e-4 H3K4ME3;H3K4M…   0.377      0.690
# … with 2 more variables: TopSNPFinemapPr <dbl>, pheno_class <chr>
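
The `pheno_class` column is derived from `phenotype_full` with a lazy regex that keeps everything before the first semicolon. Here is a minimal sketch of that step in base R, using `sub()` in place of `str_replace()`. The example strings are made up for illustration, since the real values are truncated in the printed tibble.

```r
# Hypothetical phenotype_full values; the real ones are truncated in the output above.
phenotype_full <- c("H3K27AC;peak_1", "Expression.Splicing;gene_A", "NoSemicolon")

# "(.+?);.+$" lazily captures the text before the first ";" and drops the rest.
# Strings without a ";" do not match and are returned unchanged.
pheno_class <- sub("(.+?);.+$", "\\1", phenotype_full, perl = TRUE)
print(pheno_class)
```

Strings with no semicolon pass through untouched, so an unexpected `phenotype_full` format would show up as an odd-looking class name rather than as an `NA`.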

Ok, now let's do some data tidying to match up all pairs of within-cluster colocalized traits and their beta values.

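dat.tidy.for.cor <- dat %>%
  # filter(Locus_snp == "ENSG00000002822.15_7:1998126:G:A") %>%
  # Self-join within each cluster to get every ordered pair of traits
  left_join(., ., by = c("Locus_snp", "ColocPr", "RegionalPr", "TopSNPFinemapPr")) %>% 
  filter(phenotype_full.x != phenotype_full.y) %>% 
  rowwise() %>%
  # Order-independent pair name, so (A, B) and (B, A) collapse to one row
  mutate(name = toString(sort(c(phenotype_full.x,phenotype_full.y)))) %>% 
  distinct(Locus_snp, name, .keep_all = TRUE) %>%
  # Note: beta.x/beta.y follow the retained row's order, not the sorted class order
  mutate(pheno_class_name = toString(sort(c(pheno_class.x,pheno_class.y))))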

dat.tidy.for.cor %>%
  pull(pheno_class_name) %>% unique()
 [1] "MetabolicLabelled.30min, MetabolicLabelled.60min"         
 [2] "H3K27AC, H3K27AC"                                         
 [3] "H3K27AC, H3K4ME3"                                         
 [4] "H3K4ME3, H3K4ME3"                                         
 [5] "Expression.Splicing, MetabolicLabelled.30min"             
 [6] "Expression.Splicing, Expression.Splicing.Subset_YRI"      
 [7] "Expression.Splicing.Subset_YRI, MetabolicLabelled.30min"  
 [8] "Expression.Splicing, H3K27AC"                             
 [9] "Expression.Splicing, H3K4ME3"                             
[10] "Expression.Splicing, polyA.Splicing.Subset_YRI"           
[11] "Expression.Splicing.Subset_YRI, H3K27AC"                  
[12] "H3K27AC, polyA.Splicing.Subset_YRI"                       
[13] "Expression.Splicing.Subset_YRI, H3K4ME3"                  
[14] "H3K4ME3, polyA.Splicing.Subset_YRI"                       
[15] "Expression.Splicing.Subset_YRI, polyA.Splicing.Subset_YRI"
[16] "polyA.Splicing.Subset_YRI, polyA.Splicing.Subset_YRI"     
[17] "H3K4ME3, MetabolicLabelled.30min"                         
[18] "MetabolicLabelled.30min, polyA.Splicing.Subset_YRI"       
[19] "Expression.Splicing, MetabolicLabelled.60min"             
[20] "H3K4ME3, MetabolicLabelled.60min"                         
[21] "H3K27AC, MetabolicLabelled.30min"                         
[22] "H3K27AC, MetabolicLabelled.60min"                         
[23] "Expression.Splicing.Subset_YRI, MetabolicLabelled.60min"  
[24] "MetabolicLabelled.60min, polyA.Splicing.Subset_YRI"       
[25] "chRNA.Expression.Splicing, H3K27AC"                       
[26] "chRNA.Expression.Splicing, H3K4ME3"                       
[27] "chRNA.Expression.Splicing, MetabolicLabelled.30min"       
[28] "chRNA.Expression.Splicing, polyA.Splicing.Subset_YRI"     
[29] "chRNA.Expression.Splicing, Expression.Splicing"           
[30] "chRNA.Expression.Splicing, MetabolicLabelled.60min"       
[31] "chRNA.Expression.Splicing, Expression.Splicing.Subset_YRI"
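
To make the pairing step concrete, here is a small base R sketch of the same idea on a toy cluster: self-join on the cluster key, drop self-pairs, then keep one row per unordered pair. The data frame `toy` and all of its values are hypothetical, and `merge()` stands in for the `left_join()` used above.

```r
# Toy cluster: one top SNP, three colocalized traits (all values are made up).
toy <- data.frame(
  Locus_snp = "geneA_snp1",
  phenotype = c("H3K27AC;p1", "Expression.Splicing;g1", "H3K4ME3;p2"),
  beta = c(0.5, 0.4, -0.1),
  stringsAsFactors = FALSE
)

# Self-join on the cluster key gives every ordered pair, including self-pairs.
pairs <- merge(toy, toy, by = "Locus_snp", suffixes = c(".x", ".y"))
pairs <- pairs[pairs$phenotype.x != pairs$phenotype.y, ]

# An order-independent key makes (A, B) and (B, A) identical, so one copy is kept.
pairs$name <- mapply(function(a, b) toString(sort(c(a, b))),
                     pairs$phenotype.x, pairs$phenotype.y)
pairs <- pairs[!duplicated(pairs[, c("Locus_snp", "name")]), ]

print(nrow(pairs))  # 3 traits give choose(3, 2) = 3 unordered pairs
```

A cluster with n traits contributes choose(n, 2) rows, so large clusters weigh more heavily in the downstream correlations than small ones.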

Ok, now let’s plot the correlation of beta values for certain classes of phenotypes (e.g., expression QTLs and H3K27Ac QTLs), and more generally for all pairs of phenotype classes.

dat.tidy.for.cor %>%
  filter(pheno_class_name == "Expression.Splicing, H3K27AC") %>%
  ggplot(aes(x=beta.x, y=beta.y)) +
  geom_point() +
  facet_wrap(~pheno_class_name)

dat.tidy.for.cor %>%
  ggplot(aes(x=beta.x, y=beta.y)) +
  geom_point(alpha=0.05) +
  facet_wrap(~pheno_class_name)

Ok wow. That is nice. H3K27Ac beta values are very strongly positively correlated with eQTL betas, as expected. We can look at other pairs of traits too, and if there is an effect, I think this is a very powerful way to assess it. But unsurprisingly, given the way we have quantified splicing here with leafcutter, splicing is not obviously correlated with anything: a leafcutter intron excision ratio (PSI) that goes up for one intron necessarily means that other introns in the same cluster went down. Perhaps if we looked at intron retention, the effects would be more interpretable. That will be on the future to-do list once we get better chRNA-seq data.
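
The compositional constraint can be shown with a toy cluster. This is only a sketch: the read counts are invented, and the ratio below is a plain within-cluster proportion rather than leafcutter's full quantification and normalization.

```r
# Hypothetical junction read counts for one 3-intron cluster, by genotype.
counts_ref <- c(intronA = 50, intronB = 30, intronC = 20)
counts_alt <- c(intronA = 80, intronB = 12, intronC = 8)

# Intron excision ratio: each intron's share of its cluster's reads.
psi <- function(counts) counts / sum(counts)
delta_psi <- psi(counts_alt) - psi(counts_ref)

print(delta_psi)       # intronA goes up, intronB and intronC go down
print(sum(delta_psi))  # changes within a cluster cancel (sum to ~0)
```

Because the changes within a cluster cancel, pooling all introns of a cluster mixes positive and negative betas, which would dilute any correlation with a directional trait such as expression.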

Also, let’s replot these as a clustered heatmap of the matrix of correlation coefficients.

# my_palette <- colorRampPalette(c("blue", "black", "yellow"))(n = 1000)

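dat.tidy.for.cor %>%
  group_by(pheno_class_name) %>%
  # NB: the column named `cor` holds the Pearson test p-value, not the coefficient
  # (the heatmap below uses Spearman coefficients instead)
  summarise(cor = cor.test(beta.x, beta.y, method="pearson")$p.value) %>%
  separate(pheno_class_name, into=c("PhenotypeClass1", "PhenotypeClass2"), sep = ", ")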
# A tibble: 31 x 3
   PhenotypeClass1           PhenotypeClass2                      cor
   <chr>                     <chr>                              <dbl>
 1 chRNA.Expression.Splicing Expression.Splicing            1.27e-  5
 2 chRNA.Expression.Splicing Expression.Splicing.Subset_YRI 3.07e-  5
 3 chRNA.Expression.Splicing H3K27AC                        2.18e-  6
 4 chRNA.Expression.Splicing H3K4ME3                        1.05e-  3
 5 chRNA.Expression.Splicing MetabolicLabelled.30min        1.79e-  8
 6 chRNA.Expression.Splicing MetabolicLabelled.60min        3.96e-  7
 7 chRNA.Expression.Splicing polyA.Splicing.Subset_YRI      6.29e-  1
 8 Expression.Splicing       Expression.Splicing.Subset_YRI 0.       
 9 Expression.Splicing       H3K27AC                        0.       
10 Expression.Splicing       H3K4ME3                        2.47e-177
# … with 21 more rows
# Spearman correlation of betas for each pair of phenotype classes
dat.cor <- dat.tidy.for.cor %>%
  group_by(pheno_class_name) %>%
  summarise(cor = cor(beta.x, beta.y, method="spearman")) %>%
  separate(pheno_class_name, into=c("PhenotypeClass1", "PhenotypeClass2"), sep = ", ")
dat.cor %>%
  # Swap the two class columns and stack, so the matrix is filled on both sides of the diagonal
  rename(PhenotypeClass1=PhenotypeClass2, PhenotypeClass2=PhenotypeClass1) %>%
  bind_rows(dat.cor) %>%
  # Same-class pairs (e.g. "H3K27AC, H3K27AC") appear twice after stacking; keep one
  distinct() %>%
  # mutate(cor = as.numeric(cor)) %>%
  pivot_wider(names_from = "PhenotypeClass1", values_from = "cor", values_fill=NA, names_sort=TRUE) %>%
  column_to_rownames("PhenotypeClass2") %>%
  as.matrix() %>%
  # heatmap.2(trace="none", col=brewer.pal(51,"Spectral"))
  heatmap.2(trace="none", col=colorpanel(75, "blue", "black", "yellow"), cexRow = 0.5, cexCol = 0.5)
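
The reshaping above turns a long table of unordered class pairs into a square, symmetric matrix. Here is a base R sketch of the same idea on made-up numbers. The class names `A`, `B`, `C` and the correlations are placeholders, not results.

```r
# Made-up correlations for three unordered class pairs (not real results).
long <- data.frame(
  PhenotypeClass1 = c("A", "A", "B"),
  PhenotypeClass2 = c("B", "C", "C"),
  cor = c(0.8, -0.1, 0.3),
  stringsAsFactors = FALSE
)

# Add the mirrored pairs so that both (A, B) and (B, A) are present.
mirrored <- data.frame(
  PhenotypeClass1 = long$PhenotypeClass2,
  PhenotypeClass2 = long$PhenotypeClass1,
  cor = long$cor,
  stringsAsFactors = FALSE
)
both <- unique(rbind(long, mirrored))

# Fill a square matrix by name; pairs that were never observed stay NA.
classes <- sort(unique(both$PhenotypeClass1))
m <- matrix(NA_real_, length(classes), length(classes),
            dimnames = list(classes, classes))
m[cbind(both$PhenotypeClass2, both$PhenotypeClass1)] <- both$cor
print(m)
```

In this toy example the diagonal stays `NA` because no same-class pairs were supplied. In the real table, same-class pairs such as "H3K27AC, H3K27AC" fill some of the diagonal cells.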

The slight negative correlation between leafcutter splicing and other phenotypes is interesting. I wonder if there is a technical explanation. For example, perhaps many of the colocalizing sQTLs actually reflect alternative TSS usage. If use of a downstream TSS (the shorter isoform) registers as a decrease in measured expression, and the cluster has a few introns present only in the long isoform, then those introns go down and the other introns in the cluster necessarily go up, which would look like a general upregulation of within-cluster introns.
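
A toy version of this alternative-TSS scenario, to check that the arithmetic behaves as described. Everything here is hypothetical: the isoform abundances, the number of introns, and the assumption that junction reads are simply proportional to the abundance of the isoforms containing each intron.

```r
# Hypothetical cluster: introns 1-2 exist only in the long isoform,
# introns 3-5 are shared by the long and short (downstream TSS) isoforms.
long_only <- c(TRUE, TRUE, FALSE, FALSE, FALSE)

# Made-up isoform abundances for two alleles. The alternate allele both
# lowers total expression and shifts usage toward the short isoform.
ref <- c(long = 80, short = 20)
alt <- c(long = 20, short = 40)

# Junction reads: long-only introns see long-isoform reads, shared introns see both.
junction_reads <- function(iso) ifelse(long_only, iso["long"], sum(iso))
psi <- function(iso) junction_reads(iso) / sum(junction_reads(iso))

delta_expr <- sum(alt) - sum(ref)
delta_psi <- psi(alt) - psi(ref)
print(delta_expr)       # negative: expression goes down
print(sign(delta_psi))  # two introns down, three up
```

Under these assumptions most introns in the cluster move in the opposite direction to expression, which is the sign pattern that would produce a weak negative correlation. Whether real colocalizing sQTLs look like this would need to be checked against TSS annotations.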

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2 gplots_3.0.1.1     forcats_0.4.0     
 [4] stringr_1.4.0      dplyr_0.8.3        purrr_0.3.4       
 [7] readr_1.3.1        tidyr_1.1.0        tibble_2.1.3      
[10] ggplot2_3.3.3      tidyverse_1.3.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5         lubridate_1.7.4    lattice_0.20-38   
 [4] gtools_3.8.1       assertthat_0.2.1   rprojroot_2.0.2   
 [7] digest_0.6.20      utf8_1.1.4         R6_2.4.0          
[10] cellranger_1.1.0   backports_1.1.4    reprex_0.3.0      
[13] evaluate_0.14      httr_1.4.1         pillar_1.4.2      
[16] rlang_0.4.10       readxl_1.3.1       rstudioapi_0.10   
[19] gdata_2.18.0       rmarkdown_1.13     labeling_0.3      
[22] munsell_0.5.0      broom_0.5.2        compiler_3.6.1    
[25] httpuv_1.5.1       modelr_0.1.8       xfun_0.8          
[28] pkgconfig_2.0.2    htmltools_0.3.6    tidyselect_1.1.0  
[31] workflowr_1.6.2    fansi_0.4.0        crayon_1.3.4      
[34] dbplyr_1.4.2       withr_2.4.1        later_0.8.0       
[37] bitops_1.0-6       grid_3.6.1         nlme_3.1-140      
[40] jsonlite_1.6       gtable_0.3.0       lifecycle_0.1.0   
[43] DBI_1.1.0          git2r_0.26.1       magrittr_1.5      
[46] scales_1.1.0       KernSmooth_2.23-15 cli_2.2.0         
[49] stringi_1.4.3      farver_2.1.0       fs_1.3.1          
[52] promises_1.0.1     xml2_1.3.2         ellipsis_0.2.0.1  
[55] generics_0.0.2     vctrs_0.3.1        tools_3.6.1       
[58] glue_1.3.1         hms_0.5.3          yaml_2.2.0        
[61] colorspace_1.4-1   caTools_1.17.1.2   rvest_0.3.5       
[64] knitr_1.23         haven_2.3.1