Last updated: 2024-09-03

Checks: 5 2

Knit directory: oxygen_eqtl/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20220621) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute relative
/project2/gilad/umans/oxygen_eqtl/topicqtl/outputs/topics15/all_genes_merged_fine_fasttopics_15_topics.cellregmap.sighits.tsv topicqtl/outputs/topics15/all_genes_merged_fine_fasttopics_15_topics.cellregmap.sighits.tsv
/project2/gilad/umans/oxygen_eqtl/topicqtl/outputs/topics15/fasttopics_fine_15_topics. topicqtl/outputs/topics15/fasttopics_fine_15_topics.
/project2/gilad/umans/oxygen_eqtl/topicqtl/pseudocell_loadings_k15.tsv topicqtl/pseudocell_loadings_k15.tsv

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Untracked files:
    Untracked:  .DS_Store
    Untracked:  18519_calling.Rmd
    Untracked:  TOM/
    Untracked:  _site.yml
    Untracked:  about.Rmd
    Untracked:  batch_variability.Rmd
    Untracked:  batch_variability.knit.md
    Untracked:  batch_variability.nb.html
    Untracked:  cache/
    Untracked:  cellregmap.Rmd
    Untracked:  celltype.Rmd
    Untracked:  celltype2.Rmd
    Untracked:  celltype2.nb.html
    Untracked:  construct_seurat.Rmd
    Untracked:  construct_seurat2.Rmd
    Untracked:  cormotif_eqtl.Rmd
    Untracked:  de.Rmd
    Untracked:  de2.Rmd
    Untracked:  de2.nb.html
    Untracked:  de_finalized.Rmd
    Untracked:  de_finalized_reharmonized.Rmd
    Untracked:  disease_gene_overlap.Rmd
    Untracked:  disease_gene_overlap_EE.Rmd
    Untracked:  disease_gene_overlap_EE_reharmonized.Rmd
    Untracked:  disease_gene_overlap_EE_reharmonized_fine.Rmd
    Untracked:  disease_gene_overlap_EE_reharmonized_fine_filter10.Rmd
    Untracked:  docs/
    Untracked:  figure1.Rmd
    Untracked:  figure2.Rmd
    Untracked:  figure3.Rmd
    Untracked:  figure4.Rmd
    Untracked:  figures_for_poster.R
    Untracked:  for_yunqi_mash.rmd
    Untracked:  gsea.Rmd
    Untracked:  gsea.nb.html
    Untracked:  gsea_reharmonized.Rmd
    Untracked:  hgwgcna.Rmd
    Untracked:  hgwgcna.nb.html
    Untracked:  hippo_eqtl.Rmd
    Untracked:  index.Rmd
    Untracked:  index_old.Rmd
    Untracked:  license.Rmd
    Untracked:  mash_EE.R
    Untracked:  mash_EE_PC.R
    Untracked:  mash_de.Rmd
    Untracked:  mash_for_peter.r
    Untracked:  matrixEQTL.Rmd
    Untracked:  matrixEQTL.nb.html
    Untracked:  matrixEQTL_reharmonized.Rmd
    Untracked:  ncell_permtesting.R
    Untracked:  plot_eqtl.Rmd
    Untracked:  prep_apex.Rmd
    Untracked:  qtltools.Rmd
    Untracked:  seurat.export.library1.h5Seurat
    Untracked:  shared_functions_style_items.R
    Untracked:  test.rmd
    Untracked:  topics.R
    Untracked:  topics.Rmd
    Untracked:  topics_all.R
    Untracked:  topics_pseudocell.R
    Untracked:  topicsde.R
    Untracked:  voxhunt.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.


Introduction

This page describes steps used to compare eQTLs to disease gene results.

library(Seurat)
Attaching SeuratObject
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.4.4     ✔ purrr   1.0.2
✔ tibble  3.2.1     ✔ dplyr   1.1.4
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.4     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(pals)
library(RColorBrewer)
library(mashr)
Loading required package: ashr
library(udr)
library(knitr)
library(ggrepel)
library(gridExtra)

Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':

    combine
library(MatrixEQTL)
library(vroom)

Attaching package: 'vroom'
The following objects are masked from 'package:readr':

    as.col_spec, col_character, col_date, col_datetime, col_double,
    col_factor, col_guess, col_integer, col_logical, col_number,
    col_skip, col_time, cols, cols_condense, cols_only, date_names,
    date_names_lang, date_names_langs, default_locale, fwf_cols,
    fwf_empty, fwf_positions, fwf_widths, locale, output_column,
    problems, spec
source("analysis/shared_functions_style_items.R")

Process GWAS results

Import GWAS Catalog data and separate the mapped genes.

gwas <- read.delim(file = "/project/gilad/umans/references/gwas/gwas_catalog_v1.0.2-associations_e110_r2023-09-10.tsv", quote = "", fill = FALSE) %>%
  dplyr::select(!c("DATE.ADDED.TO.CATALOG", "PUBMEDID" ,"FIRST.AUTHOR", "DATE", "JOURNAL","LINK", "STUDY", "INITIAL.SAMPLE.SIZE", "REPLICATION.SAMPLE.SIZE", "PLATFORM..SNPS.PASSING.QC.", "P.VALUE..TEXT.", "X95..CI..TEXT.", "GENOTYPING.TECHNOLOGY", "MAPPED_TRAIT_URI"))

gwas_formatted <- gwas %>% 
  mutate(MAPPED_GENE=if_else(str_detect(MAPPED_GENE, pattern = "No mapped genes"), "", MAPPED_GENE)) %>% 
  rowwise() %>% 
  mutate(gene=strsplit(MAPPED_GENE, split="[ - ]")[[1]][1], upstream_gene=strsplit(MAPPED_GENE, split=" - ")[[1]][1],  downstream_gene=strsplit(MAPPED_GENE, split=" - ")[[1]][2]) %>% 
  mutate(nearest_gene= case_when(UPSTREAM_GENE_DISTANCE<DOWNSTREAM_GENE_DISTANCE ~ upstream_gene, 
                                 UPSTREAM_GENE_DISTANCE>DOWNSTREAM_GENE_DISTANCE ~ downstream_gene,
                                 is.na(UPSTREAM_GENE_DISTANCE)|is.na(DOWNSTREAM_GENE_DISTANCE) ~ gene)) %>% 
  mutate(nearest_gene=gsub("([a-zA-Z]),", "\\1 ", nearest_gene)) %>% 
  mutate(nearest_gene=gsub(",", "", nearest_gene)) %>% 
  mutate(nearest_gene=str_trim(nearest_gene)) %>% 
  mutate(trait=strsplit(DISEASE.TRAIT, split=", ")[[1]][1]) %>% 
  add_count(trait) %>% 
  ungroup()
gwas_formatted <- readRDS(file = "/project/gilad/umans/references/gwas/gwas_catalog_v1.0.2-associations_e110_r2023-09-10.RDS")

In cases where the GWAS reported a gene for a given SNP, use the reported gene. Otherwise, assign the nearest gene as the one that corresponds to the associated SNP.

gwas_formatted <- gwas_formatted %>%  
  mutate(`REPORTED.GENE.S.`=if_else(str_detect(`REPORTED.GENE.S.`, pattern = "NR"), NA, `REPORTED.GENE.S.`)) %>% 
  mutate(`REPORTED.GENE.S.`=if_else(str_detect(`REPORTED.GENE.S.`, pattern = "intergenic"), NA, `REPORTED.GENE.S.`)) %>% 
  mutate(`REPORTED.GENE.S.`= na_if(`REPORTED.GENE.S.`, "")) %>% 
  mutate(nearest_gene= case_when(!is.na(`REPORTED.GENE.S.`) ~ `REPORTED.GENE.S.`,
                                 UPSTREAM_GENE_DISTANCE<DOWNSTREAM_GENE_DISTANCE ~ upstream_gene, 
                                 UPSTREAM_GENE_DISTANCE>DOWNSTREAM_GENE_DISTANCE ~ downstream_gene,
                                 is.na(UPSTREAM_GENE_DISTANCE)|is.na(DOWNSTREAM_GENE_DISTANCE) ~ gene
                                 )) %>% 
     mutate(nearest_gene=str_extract(nearest_gene, "[^, ]+")) %>% 
  mutate(nearest_gene=gsub(",", "", nearest_gene)) %>% 
  mutate(nearest_gene=str_trim(nearest_gene)) 

I next curated traits that were reasonably brain-related. This includes addiction-related traits, but not dietary preference traits.

filtered_traits <- read_csv(file = "/project/gilad/umans/references/gwas/gwas_catalog_v1.0.2-associations_e110_r2023-09-10_organoid-related_n15.csv", col_names = FALSE) %>% pull(X1)
Rows: 512 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): X1

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

I next exclude traits with fewer than 15 associations or more than 500 associations, and further filter consider only genome-wide significant associations.

gwas_gene_lists <- split(gwas_formatted %>% filter(trait %in% filtered_traits) %>% filter(n>15) %>% filter(n<500) %>% filter(PVALUE_MLOG>7.3)  %>% pull(nearest_gene),  gwas_formatted %>% filter(trait %in% filtered_traits) %>% filter(n>15) %>% filter(n<500) %>% filter(PVALUE_MLOG>7.3) %>%  pull(trait))
gwas_snps <- gwas_formatted %>% filter(trait %in% filtered_traits) %>% filter(n>15) %>% filter(n<500) %>%
  filter(PVALUE_MLOG>7.3) %>% unite("snp", CHR_ID:CHR_POS, sep = ":") %>% pull(snp) %>% unique()

Note that for some common traits there are many GWAS studies:

gwas_formatted %>% 
  filter(trait %in% filtered_traits) %>% 
  filter(n>15) %>% filter(n<500) %>% filter(PVALUE_MLOG>7.3)  %>%
  select(STUDY.ACCESSION, trait) %>% distinct() %>% group_by(trait) %>% summarize(studies=n()) %>%
  arrange(desc(studies)) %>% head()
# A tibble: 6 × 2
  trait                            studies
  <chr>                              <int>
1 Alzheimer's disease                   35
2 Parkinson's disease                   24
3 Bipolar disorder                      18
4 Amyotrophic lateral sclerosis         15
5 Major depressive disorder             14
6 Alzheimer's disease (late onset)      12

And, of course, there are a number of very similar traits in this large compendium.

Next, I import significant eGenes from GTEx cerebral cortex tissue samples.

gtex_cortex_signif <- read.table(file = "/project/gilad/umans/references/gtex/GTEx_Analysis_v8_eQTL/Brain_Cortex.v8.egenes.txt.gz", header = TRUE, sep = "\t", stringsAsFactors = FALSE) %>% filter(qval<0.05) %>% pull(gene_name)
gtex_frontalcortex_signif <- read.table(file = "/project/gilad/umans/references/gtex/GTEx_Analysis_v8_eQTL/Brain_Frontal_Cortex_BA9.v8.egenes.txt.gz", header = TRUE, sep = "\t", stringsAsFactors = FALSE) %>% filter(qval<0.05) %>% pull(gene_name)
gtex_cortex_signif <- unique(c(gtex_cortex_signif, gtex_frontalcortex_signif))

And import the eQTL results.

mash_by_condition_output <- readRDS(file = "output/combined_mash-by-condition_EE_fine_reharmonized_032024.rds") %>% ungroup() 

Compare to fine-clustered eQTL results

I classified eGenes from the organoid dataset by whether they were significant under normoxia and whether they are responsive to manipulating oxygen. These two binary classifications result in 4 groups: (1) shared effects in all conditions, detectable under normoxia; (2) dynamic and detectable under normoxia; (3) dynamic and not detectable under normoxia; and (4) shared effects under all conditions but not detectable under normoxia. Implicitly, group 4 effects needed additional treatment conditions to detect them not because they’re responsive to treatment but because of the additional power we get.

Here, I plot the number of eGenes of each these four classes that are the nearest genes to a significant GWAS variant in each cell type.

mash_by_condition_output %>% 
  filter(sig_anywhere) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
  group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  ggplot(aes(x=factor(class, levels = c("class1", "class4", "class2", "class3")), y=egene_gwas)) + geom_boxplot(outlier.shape = NA, aes(color=class)) + 
  xlab("eGene class (see note)") + 
  ylab("Number of eGenes in GWAS") + 
  theme_light() + 
  geom_point(aes(group=source, color=source), position = position_jitter(width = 0.2, height = 0)) +
  scale_color_manual(values=c(manual_palette_fine, class_colors)) +
  xlab("eGene class (see note)") + 
  theme(legend.position="none")
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.

Here, I show the numbers plotted above and perform a Wilcoxon test comparing the dynamic/not-detected-in-normoxia (“class3”) GWAS-associated eGenes with the non-dynamic/detected-in-normoxia (“class1”) GWAS-associated eGenes.

mash_by_condition_output %>% 
  filter(sig_anywhere) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
   group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  group_by(class) %>% 
  summarize(median(egene_gwas)) %>% kable()
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
class median(egene_gwas)
class1 215.0
class2 125.0
class3 258.5
class4 131.5
wilcox.test(mash_by_condition_output %>% 
  filter(sig_anywhere) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
   group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  pivot_wider(names_from = class, values_from = egene_gwas) %>% pull(class1),
  mash_by_condition_output %>% 
  filter(sig_anywhere) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
   group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  pivot_wider(names_from = class, values_from = egene_gwas) %>% pull(class3),
  paired = TRUE)
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.

    Wilcoxon signed rank exact test

data:  mash_by_condition_output %>% filter(sig_anywhere) %>% mutate(class = case_when(control10_lfsr < 0.05 & allsharing ~ "class1", control10_lfsr < 0.05 & !allsharing ~ "class2", sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3", sig_anywhere & allsharing ~ "class4")) %>% group_by(source, class) %>% summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% pivot_wider(names_from = class, values_from = egene_gwas) %>% pull(class1) and mash_by_condition_output %>% filter(sig_anywhere) %>% mutate(class = case_when(control10_lfsr < 0.05 & allsharing ~ "class1", control10_lfsr < 0.05 & !allsharing ~ "class2", sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3", sig_anywhere & allsharing ~ "class4")) %>% group_by(source, class) %>% summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% pivot_wider(names_from = class, values_from = egene_gwas) %>% pull(class3)
V = 31, p-value = 0.1937
alternative hypothesis: true location shift is not equal to 0

Overall (ie, not per-cell type):

mash_by_condition_output %>% 
  filter(sig_anywhere) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
  select(gene, class) %>% 
  distinct() %>% 
   group_by(class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  kable()
class egene_gwas
class1 1411
class2 1242
class3 1745
class4 1091

The same numbers can be calculated excluding genes that have already been identified as eGenes in GTEx cerebral cortex tissue.

mash_by_condition_output %>% 
  filter(sig_anywhere) %>% 
  filter(gene %not_in% gtex_cortex_signif) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
   group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  group_by(class) %>% 
  summarize(median(egene_gwas))
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
# A tibble: 4 × 2
  class  `median(egene_gwas)`
  <chr>                 <dbl>
1 class1                130. 
2 class2                 73.5
3 class3                158. 
4 class4                 75  

Overall (ie, not per-cell type):

mash_by_condition_output %>% 
  filter(sig_anywhere) %>% 
  filter(gene %not_in% gtex_cortex_signif) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
  select(gene, class) %>% 
  distinct() %>% 
   group_by(class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  kable()
class egene_gwas
class1 825
class2 747
class3 1014
class4 634

To test whether the “dynamic” vs “standard” eGenes (defined solely by comparisons of effect size) are more likely to be present in GWAS gene lists, we group and test them as we did for comparison to GTEx. Note this yields a similar conclusion as above, where we broke eGenes down by whether they were dynamic and detectable at baseline.

wilcox.test(mash_by_condition_output %>% 
  filter(sig_anywhere) %>% 
 mutate(class=case_when(allsharing ~ "standard",
                        hypoxia_normoxia_shared & sharing_contexts==1 ~ "dynamic",
                        hyperoxia_normoxia_shared & sharing_contexts==1 ~ "dynamic",
                        hypoxia_hyperoxia_shared & sharing_contexts==1 ~ "dynamic",
                        sharing_contexts==2 ~ "standard", 
                        sharing_contexts==0 ~ "dynamic"
                        )) %>% 
   group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
   filter(class %in% c("dynamic")) %>% pull(egene_gwas), 
  mash_by_condition_output %>% 
  filter(sig_anywhere) %>% 
 mutate(class=case_when(allsharing ~ "standard",
                        hypoxia_normoxia_shared & sharing_contexts==1 ~ "dynamic",
                        hyperoxia_normoxia_shared & sharing_contexts==1 ~ "dynamic",
                        hypoxia_hyperoxia_shared & sharing_contexts==1 ~ "dynamic",
                        sharing_contexts==2 ~ "standard", 
                        sharing_contexts==0 ~ "dynamic"
                        )) %>% 
   group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
   filter(class %in% c("standard")) %>% pull(egene_gwas), paired = TRUE, alternative = "two.sided")
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.

    Wilcoxon signed rank exact test

data:  mash_by_condition_output %>% filter(sig_anywhere) %>% mutate(class = case_when(allsharing ~ "standard", hypoxia_normoxia_shared & sharing_contexts == 1 ~ "dynamic", hyperoxia_normoxia_shared & sharing_contexts == 1 ~ "dynamic", hypoxia_hyperoxia_shared & sharing_contexts == 1 ~ "dynamic", sharing_contexts == 2 ~ "standard", sharing_contexts == 0 ~ "dynamic")) %>% group_by(source, class) %>% summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% filter(class %in% c("dynamic")) %>%      pull(egene_gwas) and mash_by_condition_output %>% filter(sig_anywhere) %>% mutate(class = case_when(allsharing ~ "standard", hypoxia_normoxia_shared & sharing_contexts == 1 ~ "dynamic", hyperoxia_normoxia_shared & sharing_contexts == 1 ~ "dynamic", hypoxia_hyperoxia_shared & sharing_contexts == 1 ~ "dynamic", sharing_contexts == 2 ~ "standard", sharing_contexts == 0 ~ "dynamic")) %>% group_by(source, class) %>% summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% filter(class %in% c("standard")) %>%      pull(egene_gwas)
V = 42, p-value = 0.5416
alternative hypothesis: true location shift is not equal to 0

Compare to coarse-clustered results

mash_by_condition_output_coarse <- readRDS(file = "output/combined_mash-by-condition_EE_coarse_reharmonized_032024.rds") %>% ungroup() %>% filter(source != "all")

Again, I plot the number of eGenes of each these four classes that are the nearest genes to a significant GWAS variant in each cell type.

mash_by_condition_output_coarse %>% 
  filter(sig_anywhere) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
  group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  ggplot(aes(x=factor(class, levels = c("class1", "class4", "class2", "class3")), y=egene_gwas)) + geom_boxplot(outlier.shape = NA, aes(color=class)) + 
  xlab("eGene class (see note)") + 
  ylab("Number of eGenes in GWAS") + 
  theme_light() + 
  geom_point(aes(group=source, color=source), position = position_jitter(width = 0.2, height = 0)) +
  scale_color_manual(values=c(manual_palette_coarse, class_colors)) +
  xlab("eGene class (see note)") + 
  theme(legend.position="none")
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.

Here, I show the numbers plotted above and perform a Wilcoxon test comparing the dynamic/not-detected-in-normoxia (“class3”) GWAS-associated eGenes with the non-dynamic/detected-in-normoxia (“class1”) GWAS-associated eGenes.

mash_by_condition_output_coarse %>% 
  filter(sig_anywhere) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
  group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  group_by(class) %>% 
  summarize(median(egene_gwas)) %>% kable()
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
class median(egene_gwas)
class1 271.0
class2 96.5
class3 218.0
class4 158.5
wilcox.test(mash_by_condition_output_coarse %>% 
  filter(sig_anywhere) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
   group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  pivot_wider(names_from = class, values_from = egene_gwas) %>% pull(class1),
  mash_by_condition_output_coarse %>% 
  filter(sig_anywhere) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
   group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  pivot_wider(names_from = class, values_from = egene_gwas) %>% pull(class3),
  paired = TRUE)
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.

    Wilcoxon signed rank exact test

data:  mash_by_condition_output_coarse %>% filter(sig_anywhere) %>% mutate(class = case_when(control10_lfsr < 0.05 & allsharing ~ "class1", control10_lfsr < 0.05 & !allsharing ~ "class2", sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3", sig_anywhere & allsharing ~ "class4")) %>% group_by(source, class) %>% summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% pivot_wider(names_from = class, values_from = egene_gwas) %>% pull(class1) and mash_by_condition_output_coarse %>% filter(sig_anywhere) %>% mutate(class = case_when(control10_lfsr < 0.05 & allsharing ~ "class1", control10_lfsr < 0.05 & !allsharing ~ "class2", sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3", sig_anywhere & allsharing ~ "class4")) %>% group_by(source, class) %>% summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% pivot_wider(names_from = class, values_from = egene_gwas) %>% pull(class3)
V = 18, p-value = 1
alternative hypothesis: true location shift is not equal to 0

Note that, in the coarsely-clustered data, more GWAS-associated eGenes appear non-dynamic (ie, have equivalent responses across oxygen conditions) and fewer appear context-specific.

The same numbers can be calculated excluding genes that have already been identified as eGenes in GTEx cerebral cortex tissue.

mash_by_condition_output_coarse %>% 
  filter(sig_anywhere) %>% 
  filter(gene %not_in% gtex_cortex_signif) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
  group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  group_by(class) %>% 
  summarize(median(egene_gwas))
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
# A tibble: 4 × 2
  class  `median(egene_gwas)`
  <chr>                 <dbl>
1 class1                156. 
2 class2                 52  
3 class3                122. 
4 class4                 92.5

To test whether the “dynamic” vs “standard” eGenes (defined solely by comparisons of effect size) are more likely to be present in GWAS gene lists, we group and test them as we did for comparison to GTEx. Note this yields a similar conclusion as above, where we broke eGenes down by whether they were dynamic and detectable at baseline.

wilcox.test(mash_by_condition_output_coarse %>% 
  filter(sig_anywhere) %>% 
 mutate(class=case_when(allsharing ~ "standard",
                        hypoxia_normoxia_shared & sharing_contexts==1 ~ "dynamic",
                        hyperoxia_normoxia_shared & sharing_contexts==1 ~ "dynamic",
                        hypoxia_hyperoxia_shared & sharing_contexts==1 ~ "dynamic",
                        sharing_contexts==2 ~ "standard", 
                        sharing_contexts==0 ~ "dynamic"
                        )) %>% 
   group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% filter(class %in% c("dynamic")) %>% pull(egene_gwas), 
  mash_by_condition_output_coarse %>% 
  filter(sig_anywhere) %>% 
 mutate(class=case_when(allsharing ~ "standard",
                        hypoxia_normoxia_shared & sharing_contexts==1 ~ "dynamic",
                        hyperoxia_normoxia_shared & sharing_contexts==1 ~ "dynamic",
                        hypoxia_hyperoxia_shared & sharing_contexts==1 ~ "dynamic",
                        sharing_contexts==2 ~ "standard", 
                        sharing_contexts==0 ~ "dynamic"
                        )) %>% 
   group_by(source, class) %>% 
  summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% filter(class %in% c("standard")) %>% pull(egene_gwas), paired = TRUE, alternative = "two.sided")
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.

    Wilcoxon signed rank exact test

data:  mash_by_condition_output_coarse %>% filter(sig_anywhere) %>% mutate(class = case_when(allsharing ~ "standard", hypoxia_normoxia_shared & sharing_contexts == 1 ~ "dynamic", hyperoxia_normoxia_shared & sharing_contexts == 1 ~ "dynamic", hypoxia_hyperoxia_shared & sharing_contexts == 1 ~ "dynamic", sharing_contexts == 2 ~ "standard", sharing_contexts == 0 ~ "dynamic")) %>% group_by(source, class) %>% summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% filter(class %in%      c("dynamic")) %>% pull(egene_gwas) and mash_by_condition_output_coarse %>% filter(sig_anywhere) %>% mutate(class = case_when(allsharing ~ "standard", hypoxia_normoxia_shared & sharing_contexts == 1 ~ "dynamic", hyperoxia_normoxia_shared & sharing_contexts == 1 ~ "dynamic", hypoxia_hyperoxia_shared & sharing_contexts == 1 ~ "dynamic", sharing_contexts == 2 ~ "standard", sharing_contexts == 0 ~ "dynamic")) %>% group_by(source, class) %>% summarise(egene_gwas = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% filter(class %in%      c("standard")) %>% pull(egene_gwas)
V = 5, p-value = 0.07813
alternative hypothesis: true location shift is not equal to 0

Topic-interacting eQTL results

Finally, compare the topic-interacting eQTL genes to the GWAS gene list. Obtain the topic 7-correlated eGenes the same as shown previously:

crm_signif <- vroom("/project2/gilad/umans/oxygen_eqtl/topicqtl/outputs/topics15/all_genes_merged_fine_fasttopics_15_topics.cellregmap.sighits.tsv")
Rows: 289 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): GENE_HGNC, VARIANT_ID
dbl (3): P_CELLREGMAP, P_BONF, q

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
crm_iegenes <- crm_signif$GENE_HGNC

crm_betas <- vroom(paste0("/project2/gilad/umans/oxygen_eqtl/topicqtl/outputs/topics15/fasttopics_fine_15_topics.", crm_iegenes, ".cellregmap.betas.tsv"))
Rows: 3094323 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): PSEUDOCELL, GENE_HGNC, VARIANT_ID
dbl (2): BETA_GXC, BETA_G

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
crm_betas_wide <- dplyr::select(crm_betas, c(PSEUDOCELL, BETA_GXC, GENE_HGNC, VARIANT_ID)) %>%
  unite(TOPIC_QTL, GENE_HGNC, VARIANT_ID, sep="_") %>%
  pivot_wider(id_cols=PSEUDOCELL, names_from=TOPIC_QTL, values_from=BETA_GXC)

topic_loadings <- vroom("/project2/gilad/umans/oxygen_eqtl/topicqtl/pseudocell_loadings_k15.tsv")
Rows: 10707 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr  (1): pseudocell
dbl (15): k1, k2, k3, k4, k5, k6, k7, k8, k9, k10, k11, k12, k13, k14, k15

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
crm_betas_loadings <- left_join(crm_betas_wide, topic_loadings, by=c("PSEUDOCELL"="pseudocell"))

beta_topic_corrs_matrix <- cor(dplyr::select(crm_betas_loadings, -c(PSEUDOCELL, paste0("k", seq(15)))),
                        dplyr::select(crm_betas_loadings, paste0("k", seq(15))))

beta_topic_corrs_p <- apply(dplyr::select(crm_betas_loadings, -c(PSEUDOCELL, paste0("k", seq(15)))) %>% as.matrix(), MARGIN = 2, FUN = function(x) cor.test(x, crm_betas_loadings$k7)$p.value )

k7_egenes <- beta_topic_corrs_matrix %>% as.data.frame() %>% 
  dplyr::select(k7) %>% 
  bind_cols(beta_topic_corrs_p) %>% 
  rownames_to_column(var = "gene_snp") %>%  
   separate(gene_snp, into = c("genename", "snp"), sep = "_") %>% 
    mutate(snp_short = str_sub(snp, end = -5)) %>% 
  filter(`...2` < (0.05/289)) %>% pull(genename)
New names:
• `` -> `...2`

Now, intersect with GWAS genes:

sum(k7_egenes %in% (unlist(gwas_gene_lists) %>% unique()))
[1] 55

If we exclude the genes in GTEx cortex samples:

length(intersect(setdiff(k7_egenes, gtex_cortex_signif), (unlist(gwas_gene_lists) %>% unique())))
[1] 31

Process rare variant results

Here, I import datasets downloaded from SCHEMA, BipEx, Epi25, DDD, and SFARI Genes.

schema <- read.table(file = "data/external/SCHEMA_gene_results.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# convert to gene symbols
bipex <- read.table(file = "data/external/BipEx_gene_results_new.tsv", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# convert to gene symbols
epi25 <- read.table(file = "data/external/Epi25_gene_results.tsv", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# convert to gene symbols
ddd <- read.csv(file = "data/external/DDD_gene_results.csv", header = TRUE, stringsAsFactors = FALSE)
sfari <- read.csv(file = "data/external/SFARI-Gene_genes.csv", header = TRUE,  stringsAsFactors = FALSE)

Change the ensembl gene names in the SCHEMA, BipEx, and Epi25 sets:

library(GenomicRanges)
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: 'BiocGenerics'
The following objects are masked from 'package:MatrixEQTL':

    colnames, rownames
The following object is masked from 'package:gridExtra':

    combine
The following objects are masked from 'package:dplyr':

    combine, intersect, setdiff, union
The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
    as.data.frame, basename, cbind, colnames, dirname, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
    pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
    tapply, union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors

Attaching package: 'S4Vectors'
The following objects are masked from 'package:dplyr':

    first, rename
The following object is masked from 'package:tidyr':

    expand
The following objects are masked from 'package:base':

    I, expand.grid, unname
Loading required package: IRanges

Attaching package: 'IRanges'
The following objects are masked from 'package:dplyr':

    collapse, desc, slice
The following object is masked from 'package:purrr':

    reduce
Loading required package: GenomeInfoDb
gene.gr <- readRDS('/project/gilad/umans/brainchromatin/data_files/rds/AllGenes_GenomicRanges.RDS')

convertGeneIDs <- function(genelist, id.mapping.table) {
  return(mapValues(x = genelist, mapping.table = id.mapping.table))
}
mapValues <- function(x,mapping.table) {
  out <- mapping.table[x]
  if(any(is.na(out))) { 
    na <- sum(is.na(out)) 
    cat(sprintf("Warning: mapping returned %s NA values\n", as.character(na)))
  }
  return(out)
}

name2ensembl <- names(gene.gr)

# Something funny about this one, unsure why yet
name2ensembl['CDR1'] <- 'ENSG00000281508'

names(name2ensembl) <- elementMetadata(gene.gr)[ ,'gene_name']

ensembl2name <- elementMetadata(gene.gr)[ ,'gene_name']
names(ensembl2name) <- names(gene.gr)

schema$gene_name <- unname(convertGeneIDs(schema$gene_id, ensembl2name))
Warning: mapping returned 222 NA values
bipex$gene_name <- unname(convertGeneIDs(bipex$gene_id, ensembl2name))
Warning: mapping returned 684 NA values
epi25$gene_name <- unname(convertGeneIDs(epi25$gene_id, ensembl2name))
Warning: mapping returned 28 NA values

Compile results into a list.

rare_gene_list <- list(bipex= bipex %>% 
                         filter(group=="Bipolar Disorder") %>% 
                         filter(damaging_missense_fisher_gnom_non_psych_pval<0.05|ptv_fisher_gnom_non_psych_pval<0.05) %>% 
                         drop_na(gene_name) %>% 
                         pull(gene_name), 
                       epi25 = epi25 %>% filter(group=="EPI") %>% 
                         filter(ptv_pval<0.05) %>%  
                         drop_na(gene_name) %>% 
                         pull(gene_name), 
                       schema = schema %>% filter(Qmeta<0.05) %>%  
                         drop_na(gene_name) %>% 
                         pull(gene_name), 
                       sfari_asd = sfari %>% filter(gene.score==1) %>% 
                         pull(gene.symbol), 
                       sfari_syndromic = sfari %>% filter(syndromic==1) %>% 
                         pull(gene.symbol),
                       ddd= ddd %>% filter(significant) %>% pull(symbol)
                       ) 

Pleiotropy in psychiatric genetics is well documented. Are the genes pulled from each of these sources largely unique?

length(unique(unlist(rare_gene_list)))/length(unlist(rare_gene_list))
[1] 0.8132296

Yes, about 81% of the genes here are not redundant across studies.

Compile the same results into a data frame.

rare_gene_frame <- rbind(data.frame(source= "bipex", gene= bipex %>% 
                         filter(group=="Bipolar Disorder") %>% 
                         filter(damaging_missense_fisher_gnom_non_psych_pval<0.05|ptv_fisher_gnom_non_psych_pval<0.05) %>% 
                         drop_na(gene_name) %>% 
                         pull(gene_name)),
                         data.frame(source="epi25", gene = epi25 %>% filter(group=="EPI") %>% 
                         filter(ptv_pval<0.05) %>%  
                         drop_na(gene_name) %>% 
                         pull(gene_name)), 
                         data.frame(source= "schema", gene = schema %>% filter(Qmeta<0.05) %>%  
                         drop_na(gene_name) %>% 
                         pull(gene_name)), 
                         data.frame(source= "sfari_asd", gene = sfari %>% filter(gene.score==1) %>% 
                         pull(gene.symbol)), 
                       data.frame(source="sfari_syndromic", gene = sfari %>% filter(syndromic==1) %>% 
                         pull(gene.symbol)),
                       data.frame(source= "ddd", gene= ddd %>% filter(significant) %>% pull(symbol)))

Now, just like above for GWAS genes, compare each eGene class to rare variant genes in each cell type.

mash_by_condition_output %>% 
  filter(sig_anywhere) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>%
   group_by(source, class) %>% 
  summarise(egene_rare = sum(gene %in% (unlist(rare_gene_frame$gene) %>% unique()))) %>% 
  ggplot(aes(x=factor(class, levels = c("class1", "class4", "class2", "class3")), y=egene_rare)) + 
  geom_boxplot(outlier.shape = NA, aes(color=class)) +  
  ylab("Rare variant genes with eQTLs") + 
  theme_light() + 
  geom_point(aes(group=source, color=source), position = position_jitter(width = 0.2, height = 0)) +
  scale_color_manual(values=c(manual_palette_fine, class_colors)) +
  xlab("eGene class (see note)") + 
  theme(legend.position="none")
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.

# additionally, compare to GWAS results as above, but first restricting to rare variant-associated genes.  
mash_by_condition_output %>% 
  filter(sig_anywhere) %>% 
  filter(gene %in% rare_gene_frame$gene) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>%
  group_by(source, class) %>% 
  summarise(egene_rare = sum(gene %in% (unlist(gwas_gene_lists) %>% unique()))) %>% 
  ggplot(aes(x=factor(class, levels = c("class1", "class4", "class2", "class3")), y=egene_rare)) + 
  geom_boxplot(outlier.shape = NA, aes(color=class)) +  
  xlab("eGene class (see note") + 
  ylab("Rare variant genes with GWAS eQTLs") + 
  theme_light() + 
  scale_color_manual(values=c(manual_palette_fine, class_colors)) + 
  theme(legend.position="none") +
  geom_point(aes(group=source, color=source), position = position_jitter(width = 0.2, height = 0))
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.

The total number of rare variant genes with an eQTL that correspond to one of our GWAS traits is:

mash_by_condition_output %>% 
  filter(sig_anywhere) %>% 
  filter(gene %in% rare_gene_frame$gene) %>% 
  filter(gene %in% (unlist(gwas_gene_lists) %>% unique())) %>% 
  pull(gene) %>% 
  unique() %>% 
  length()
[1] 349

Rare variant genes that are eGenes in our dataset allow us to ask about the phenotypic consequences of modulating expression levels of genes for which loss of function has an identified phenotype, linked together by a SNP. Here, we can join those together, bridging the GWAS results and the rare variant results by our eQTL results. Note that this is of course an underestimate of this analysis, since we’re only using those SNPs chosen as lead variants by mash which are themselves GWAS hits. There may be other equivalent or secondary SNPs masked by mash here that still provide information.

inner_join(x=mash_by_condition_output %>% 
filter(sig_anywhere) %>% 
  filter(gene %in% rare_gene_frame$gene) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
  separate(gene_snp, into = c("genename", "snp"), sep = "_") %>% 
    mutate(snp_short = str_sub(snp, end = -5)) %>% 
    select(gene, snp, snp_short, source, control10_lfsr:stim21pct_beta, class), 
  y=gwas_formatted %>% filter(trait %in% filtered_traits) %>% filter(n>15) %>% filter(n<500) %>%
  filter(PVALUE_MLOG>7.3) %>% unite("snp", CHR_ID:CHR_POS, sep = ":") %>% select(nearest_gene, REPORTED.GENE.S., trait, n, snp, STUDY.ACCESSION, PVALUE_MLOG, UPSTREAM_GENE_DISTANCE, DOWNSTREAM_GENE_DISTANCE) , 
  by=join_by(snp_short==snp), relationship = "many-to-many")  %>% 
  mutate(across(ends_with("lfsr"), signif)) %>% 
  mutate(across(ends_with("beta"), signif)) %>% 
  # drop_na() %>% 
  select(gene, source, class, snp, nearest_gene, STUDY.ACCESSION, trait) %>% 
  left_join(x=., y=rare_gene_frame, by=join_by(gene==gene), relationship = "many-to-many") 
       gene    source.x  class              snp nearest_gene STUDY.ACCESSION
1      GNB1     Choroid class1    1:1921745:T:C       CFAP74      GCST010085
2    ATP2A2 CorticalHem class4 12:110285440:C:T       ATP2A2      GCST010645
3    ATP2A2 CorticalHem class4 12:110285440:C:T       ATP2A2      GCST008595
4    ATP2A2 CorticalHem class4 12:110285440:C:T       ATP2A2      GCST009600
5    ATP2A2 CorticalHem class4 12:110285440:C:T       ATP2A2    GCST90016620
6    ATP2A2 CorticalHem class4 12:110285440:C:T       ATP2A2    GCST90016619
7    NUP160 CorticalHem class2  11:47886089:A:G         DDB2      GCST006943
8      APOE    GliaProg class1  19:44917843:G:A        APOC1      GCST005921
9      APOE    GliaProg class1  19:44917843:G:A        APOC1      GCST005922
10   POU3F3    GliaProg class1  2:104849547:G:A       PANTR1    GCST90012894
11   POU3F3    GliaProg class1  2:104849547:G:A       PANTR1    GCST90012894
12   POU3F3    GliaProg class1  2:104849547:G:A       PANTR1    GCST90131905
13   POU3F3    GliaProg class1  2:104849547:G:A       PANTR1    GCST90131905
14   ATP2A2        Glut class1 12:110285440:C:T       ATP2A2      GCST010645
15   ATP2A2        Glut class1 12:110285440:C:T       ATP2A2      GCST008595
16   ATP2A2        Glut class1 12:110285440:C:T       ATP2A2      GCST009600
17   ATP2A2        Glut class1 12:110285440:C:T       ATP2A2    GCST90016620
18   ATP2A2        Glut class1 12:110285440:C:T       ATP2A2    GCST90016619
19    NR2F1        Glut class1   5:93597828:T:G       VANGL2      GCST010700
20    NR2F1        Glut class1   5:93597828:T:G       VANGL2      GCST010700
21    NRXN3        Glut class1  14:78164146:C:G        NRXN3    GCST90267280
22  ZSCAN26        Glut class4   6:28253486:G:A      ZKSCAN4    GCST90085697
23    CELF2     GlutNTS class4  10:10811722:A:G        CELF2    GCST90267280
24    CELF2     GlutNTS class4  10:10811722:A:G        CELF2    GCST90270074
25     NPC1    Immature class1  18:23546117:A:C         NPC1      GCST008595
26    PTCH1    Immature class1   9:95499896:G:A        PTCH1    GCST90027240
27    PTCH1    Immature class1   9:95499896:G:A        PTCH1    GCST90027244
28   ATP2A2         Inh class4 12:110285440:C:T       ATP2A2      GCST010645
29   ATP2A2         Inh class4 12:110285440:C:T       ATP2A2      GCST008595
30   ATP2A2         Inh class4 12:110285440:C:T       ATP2A2      GCST009600
31   ATP2A2         Inh class4 12:110285440:C:T       ATP2A2    GCST90016620
32   ATP2A2         Inh class4 12:110285440:C:T       ATP2A2    GCST90016619
33    HIP1R         Inh class3 12:122811747:G:A       CCDC62      GCST000959
34     CUL3     InhGNRH class3  2:224608546:C:T         CUL3    GCST90239693
35    IGF1R     InhGNRH class4  15:98652883:G:A        IGF1R    GCST90100569
36    IGF1R     InhGNRH class4  15:98652883:G:A        IGF1R      GCST007085
37    NR2F1     InhGNRH class3   5:93597828:T:G       VANGL2      GCST010700
38    NR2F1     InhGNRH class3   5:93597828:T:G       VANGL2      GCST010700
39  CTTNBP2 InhThalamic class3  7:117883655:A:G      CTTNBP2      GCST005324
40  CTTNBP2 InhThalamic class3  7:117883655:A:G      CTTNBP2      GCST008810
41  CTTNBP2 InhThalamic class3  7:117883655:A:G      CTTNBP2      GCST007474
42  CTTNBP2 InhThalamic class3  7:117883655:A:G      CTTNBP2      GCST007462
43  CTTNBP2 InhThalamic class3  7:117883655:A:G      CTTNBP2      GCST007085
44   BCL11B          IP class3  14:99268647:A:G       BCL11B      GCST007472
45   BCL11B          IP class3  14:99268647:A:G       BCL11B    GCST90131904
46    SCN1A NeuronOther class1  2:166142257:C:G        SCN3A      GCST007343
47    SCN1A NeuronOther class1  2:166142257:C:G        SCN3A      GCST007343
48    SCN1A NeuronOther class1  2:166142257:C:G        SCN3A      GCST007343
49    SCN1A NeuronOther class1  2:166142257:C:G        SCN3A      GCST007343
50    EP300          RG class3  22:41062780:C:G       WBP2NL      GCST006952
51    EP300          RG class3  22:41062780:C:G       WBP2NL      GCST006952
52    EP300          RG class3  22:41062780:C:G       WBP2NL      GCST006952
53    GNAO1          RG class1  16:56227517:T:G        GNAO1      GCST007982
54 HIST1H1E          RG class1   6:26122705:G:T        H2BC4    GCST90132260
55 HIST1H1E          RG class1   6:26122705:G:T        H2BC4    GCST90132260
56  NECTIN2          RG class1  19:44860443:A:G      NECTIN2      GCST005922
57  NECTIN2          RG class1  19:44860443:A:G      NECTIN2      GCST005922
58    VAMP2          RG class1   17:8154049:T:C         PER1      GCST007983
59    VAMP2          RG class1   17:8154049:T:C         PER1      GCST007983
60    VAMP2          RG class1   17:8154049:T:C         PER1      GCST007083
61    VAMP2          RG class1   17:8154049:T:C         PER1      GCST007083
62   ATP2A2   RGcycling class4 12:110285440:C:T       ATP2A2      GCST010645
63   ATP2A2   RGcycling class4 12:110285440:C:T       ATP2A2      GCST008595
64   ATP2A2   RGcycling class4 12:110285440:C:T       ATP2A2      GCST009600
65   ATP2A2   RGcycling class4 12:110285440:C:T       ATP2A2    GCST90016620
66   ATP2A2   RGcycling class4 12:110285440:C:T       ATP2A2    GCST90016619
67     GNB1   RGcycling class3    1:1921745:T:C       CFAP74      GCST010085
68    NR2F1   RGcycling class1   5:93597828:T:G       VANGL2      GCST010700
69    NR2F1   RGcycling class1   5:93597828:T:G       VANGL2      GCST010700
70   POU3F3   RGcycling class1  2:104849547:G:A       PANTR1    GCST90012894
71   POU3F3   RGcycling class1  2:104849547:G:A       PANTR1    GCST90012894
72   POU3F3   RGcycling class1  2:104849547:G:A       PANTR1    GCST90131905
73   POU3F3   RGcycling class1  2:104849547:G:A       PANTR1    GCST90131905
                                                                                                 trait
1                                                           Leisure sedentary behaviour (computer use)
2                                                                                 Schizophrenia (MTAG)
3                                                                                    Cognitive ability
4                                                                                     Anorexia nervosa
5                             Schizophrenia vs autism spectrum disorder (ordinary least squares (OLS))
6                                     Schizophrenia vs anorexia nervosa (ordinary least squares (OLS))
7                                                                                    Feeling miserable
8                                                                Family history of Alzheimer's disease
9                                         Alzheimer's disease or family history of Alzheimer's disease
10                                                                            Brain shape (segment 15)
11                                                                            Brain shape (segment 15)
12                                Whole brain restricted directional diffusion (multivariate analysis)
13                                Whole brain restricted directional diffusion (multivariate analysis)
14                                                                                Schizophrenia (MTAG)
15                                                                                   Cognitive ability
16                                                                                    Anorexia nervosa
17                            Schizophrenia vs autism spectrum disorder (ordinary least squares (OLS))
18                                    Schizophrenia vs anorexia nervosa (ordinary least squares (OLS))
19                                                                        Cortical thickness (MOSTest)
20                                                                        Cortical thickness (MOSTest)
21                                                Age when finished full-time education (standard GWA)
22                                                Major depressive disorder or stress-related disorder
23                                                Age when finished full-time education (standard GWA)
24                                      Personality traits or cognitive traits (multivariate analysis)
25                                                                                   Cognitive ability
26 Neuroticism conditioned on Townsend deprivation index (multi-trait conditioning and joint analysis)
27    Neuroticism conditioned on self-rated math ability (multi-trait conditioning and joint analysis)
28                                                                                Schizophrenia (MTAG)
29                                                                                   Cognitive ability
30                                                                                    Anorexia nervosa
31                            Schizophrenia vs autism spectrum disorder (ordinary least squares (OLS))
32                                    Schizophrenia vs anorexia nervosa (ordinary least squares (OLS))
33                                                                                 Parkinson's disease
34                                                        Risk-taking behavior (multivariate analysis)
35                                                                                    Lifetime smoking
36                                                                                      Smoking status
37                                                                        Cortical thickness (MOSTest)
38                                                                        Cortical thickness (MOSTest)
39                                                                                 Depressive symptoms
40                                                  Smoking initiation (ever regular vs never regular)
41                                                  Smoking initiation (ever regular vs never regular)
42                                                                    Age of smoking initiation (MTAG)
43                                                                                      Smoking status
44                                                        Alcohol consumption (drinks per week) (MTAG)
45                                  Whole brain restricted isotropic diffusion (multivariate analysis)
46                                                                                            Epilepsy
47                                                                                            Epilepsy
48                                                                                            Epilepsy
49                                                                                            Epilepsy
50                                                                                       Feeling tense
51                                                                                       Feeling tense
52                                                                                       Feeling tense
53                                                                                      Sleep duration
54                         Alzheimer’s disease polygenic risk score (upper quantile vs lower quantile)
55                         Alzheimer’s disease polygenic risk score (upper quantile vs lower quantile)
56                                        Alzheimer's disease or family history of Alzheimer's disease
57                                        Alzheimer's disease or family history of Alzheimer's disease
58                                                                                         Morningness
59                                                                                         Morningness
60                                                                                      Morning person
61                                                                                      Morning person
62                                                                                Schizophrenia (MTAG)
63                                                                                   Cognitive ability
64                                                                                    Anorexia nervosa
65                            Schizophrenia vs autism spectrum disorder (ordinary least squares (OLS))
66                                    Schizophrenia vs anorexia nervosa (ordinary least squares (OLS))
67                                                          Leisure sedentary behaviour (computer use)
68                                                                        Cortical thickness (MOSTest)
69                                                                        Cortical thickness (MOSTest)
70                                                                            Brain shape (segment 15)
71                                                                            Brain shape (segment 15)
72                                Whole brain restricted directional diffusion (multivariate analysis)
73                                Whole brain restricted directional diffusion (multivariate analysis)
          source.y
1              ddd
2            bipex
3            bipex
4            bipex
5            bipex
6            bipex
7            bipex
8            bipex
9            bipex
10 sfari_syndromic
11             ddd
12 sfari_syndromic
13             ddd
14           bipex
15           bipex
16           bipex
17           bipex
18           bipex
19 sfari_syndromic
20             ddd
21       sfari_asd
22           epi25
23 sfari_syndromic
24 sfari_syndromic
25           bipex
26             ddd
27             ddd
28           bipex
29           bipex
30           bipex
31           bipex
32           bipex
33           epi25
34       sfari_asd
35           bipex
36           bipex
37 sfari_syndromic
38             ddd
39           epi25
40           epi25
41           epi25
42           epi25
43           epi25
44             ddd
45             ddd
46           epi25
47       sfari_asd
48 sfari_syndromic
49             ddd
50       sfari_asd
51 sfari_syndromic
52             ddd
53             ddd
54          schema
55             ddd
56           bipex
57           epi25
58 sfari_syndromic
59             ddd
60 sfari_syndromic
61             ddd
62           bipex
63           bipex
64           bipex
65           bipex
66           bipex
67             ddd
68 sfari_syndromic
69             ddd
70 sfari_syndromic
71             ddd
72 sfari_syndromic
73             ddd

Finally, we can look specifically for cases in which our data link a GWAS SNP to a different gene than was either reported in the GWAS or closest to the associated locus.

full_join(x=mash_by_condition_output %>% 
filter(sig_anywhere) %>% 
  mutate(class=case_when(control10_lfsr < 0.05 & allsharing ~ "class1",
                         control10_lfsr < 0.05 &  !allsharing ~ "class2",
                         sig_anywhere & control10_lfsr > 0.05 & !allsharing ~ "class3",
                         sig_anywhere & allsharing ~ "class4")) %>% 
  separate(gene_snp, into = c("genename", "snp"), sep = "_") %>% 
    mutate(snp = str_sub(snp, end = -5)) %>% 
  filter(snp %in% unique(gwas_snps)) %>% 
    select(gene, snp, source, control10_lfsr:stim21pct_beta, class), 
  y=gwas_formatted %>% filter(trait %in% filtered_traits) %>% filter(n>15) %>% filter(n<500) %>%
  filter(PVALUE_MLOG>7.3) %>% unite("snp", CHR_ID:CHR_POS, sep = ":") %>% select(nearest_gene, REPORTED.GENE.S., trait, n, snp, STUDY.ACCESSION, PVALUE_MLOG) , 
  by=join_by(snp==snp), relationship = "many-to-many")  %>% 
  mutate(across(ends_with("lfsr"), signif)) %>% 
  mutate(across(ends_with("beta"), signif)) %>% 
  filter(gene != nearest_gene) %>%
  select(gene) %>% 
  distinct() %>% # remove this to get the number of associations (some pleiotoropic genes will be duplicated)
  dim()
[1] 76  1
# this gets the number of genes with a changed interpretation

Plot example eGenes

Here, I plot a couple of example eGenes identified above as cases of changed or updated interpretation of a GWAS SNP’s regulatory target based on our data.

genotypes <- read_table("data/MatrixEQTL/snps/YRI_genotypes_maf10hwee-6_full/chr1_for_matrixeqtl.snps") %>% pivot_longer(cols=starts_with("NA"), names_to = "individual", values_to = "snp") 

── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.
for (i in 2:22){
  genotypes <- rbind(genotypes, read_table(paste0("data/MatrixEQTL/snps/YRI_genotypes_maf10hwee-6_full/chr",i,"_for_matrixeqtl.snps", sep="")) %>% pivot_longer(cols=starts_with("NA"), names_to = "individual", values_to = "snp"))
}

── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.
genotypes <- genotypes %>% mutate(ID = str_sub(ID, end = -5)) 

pseudo_data <- readRDS(file = "output/pseudo_fine_quality_filtered_nomap_qtl_20240305.RDS")
pseudo_data <- filter.pseudobulk(pseudo_data, threshold = 20)

As a first example, the SNP located at 17:75826364 (rs2008012), which is associated with a white matter structural phenotype (uncinate fasciculus thickness). The nearest gene to this SNP is UNK.

p1 <- make_boxplot(celltype = "Immature", condition = "control10", testsnp = "17:75826364", testgene = "H3F3B") + geom_boxplot(alpha=0) + theme_classic() + theme(legend.position="none") + ylab("")

── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.
p2 <- make_boxplot(celltype = "Immature", condition = "stim1pct", testsnp = "17:75826364", testgene = "H3F3B") + geom_boxplot(alpha=0) + theme_classic() + theme(legend.position="none") + ylab("")

── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.
p3 <- make_boxplot(celltype = "Immature", condition = "stim21pct", testsnp = "17:75826364", testgene = "H3F3B") + geom_boxplot(alpha=0) + theme_classic() + theme(legend.position="none") + ylab("")

── Column specification ────────────────────────────────────────────────────────
cols(
  ID = col_character(),
  NA19153 = col_double(),
  NA19207 = col_double(),
  NA18913 = col_double(),
  NA19098 = col_double(),
  NA18853 = col_double(),
  NA18856 = col_double(),
  NA19093 = col_double(),
  NA19210 = col_double(),
  NA19190 = col_double(),
  NA18508 = col_double(),
  NA18517 = col_double(),
  NA18507 = col_double(),
  NA18519 = col_double(),
  NA18501 = col_double(),
  NA18489 = col_double(),
  NA19102 = col_double(),
  NA18511 = col_double(),
  NA19128 = col_double()
)
grid.arrange(p1, p2, p3, nrow=1)

p1 <- make_boxplot(celltype = "IPcycling", condition = "control10", testsnp = "17:75826364", testgene = "H3F3B") + geom_boxplot(alpha=0) + theme_classic() + theme(legend.position="none") + ylab("")

── Column specification ────────────────────────────────────────────────────────
cols(
  ID = col_character(),
  NA19153 = col_double(),
  NA19207 = col_double(),
  NA18913 = col_double(),
  NA18853 = col_double(),
  NA18856 = col_double(),
  NA19093 = col_double(),
  NA19210 = col_double(),
  NA19190 = col_double(),
  NA18502 = col_double(),
  NA18517 = col_double(),
  NA18507 = col_double(),
  NA18501 = col_double(),
  NA18489 = col_double(),
  NA19102 = col_double(),
  NA19128 = col_double()
)
p2 <- make_boxplot(celltype = "IPcycling", condition = "stim1pct", testsnp = "17:75826364", testgene = "H3F3B") + geom_boxplot(alpha=0) + theme_classic() + theme(legend.position="none") + ylab("")

── Column specification ────────────────────────────────────────────────────────
cols(
  ID = col_character(),
  NA19153 = col_double(),
  NA19207 = col_double(),
  NA18913 = col_double(),
  NA18853 = col_double(),
  NA18856 = col_double(),
  NA19093 = col_double(),
  NA19144 = col_double(),
  NA19210 = col_double(),
  NA19190 = col_double(),
  NA18502 = col_double(),
  NA18517 = col_double(),
  NA18507 = col_double(),
  NA18501 = col_double(),
  NA18489 = col_double(),
  NA19102 = col_double(),
  NA19128 = col_double()
)
p3 <- make_boxplot(celltype = "IPcycling", condition = "stim21pct", testsnp = "17:75826364", testgene = "H3F3B") + geom_boxplot(alpha=0) + theme_classic() + theme(legend.position="none") + ylab("")

── Column specification ────────────────────────────────────────────────────────
cols(
  ID = col_character(),
  NA19153 = col_double(),
  NA19207 = col_double(),
  NA18913 = col_double(),
  NA18853 = col_double(),
  NA18856 = col_double(),
  NA19093 = col_double(),
  NA19210 = col_double(),
  NA19190 = col_double(),
  NA18502 = col_double(),
  NA18517 = col_double(),
  NA18507 = col_double(),
  NA18519 = col_double(),
  NA18501 = col_double(),
  NA18489 = col_double(),
  NA19102 = col_double(),
  NA19128 = col_double()
)
grid.arrange(p1, p2, p3, nrow=1)

p1 <- make_boxplot(celltype = "RG", condition = "control10", testsnp = "22:41062780", testgene = "EP300") + geom_boxplot(alpha=0) + theme_classic() + theme(legend.position="none") + ylab("")

── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.
p2 <- make_boxplot(celltype = "RG", condition = "stim1pct", testsnp = "22:41062780", testgene = "EP300") + geom_boxplot(alpha=0) + theme_classic() + theme(legend.position="none") + ylab("")

── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.
p3 <- make_boxplot(celltype = "RG", condition = "stim21pct", testsnp = "22:41062780", testgene = "EP300") + geom_boxplot(alpha=0) + theme_classic() + theme(legend.position="none") + ylab("")

── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.
grid.arrange(p1, p2, p3, nrow=1)

library(Gviz)
Loading required package: grid
Warning: replacing previous import 'utils::download.file' by
'restfulr::download.file' when loading 'rtracklayer'
Warning: no function found corresponding to methods exports from 'BSgenome'
for: 'releaseName'
library(TxDb.Hsapiens.UCSC.hg38.refGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Attaching package: 'AnnotationDbi'
The following object is masked from 'package:dplyr':

    select
snptrack <- AnnotationTrack(start = 75826364, width = 1, chromosome = 17, genome = "hg38", name = "rs2008012")
gtrack <- GenomeAxisTrack()

getOption("Gviz.scheme")
[1] "default"
scheme <- getScheme()
scheme$GeneRegionTrack$fill <- "blue"
scheme$GeneRegionTrack$col <- NULL
scheme$GeneRegionTrack$transcriptAnnotation <- NULL  #"symbol"
addScheme(scheme, "myScheme")
options(Gviz.scheme = "myScheme")


grtrack <- GeneRegionTrack(TxDb.Hsapiens.UCSC.hg38.refGene, genome = "hg38", chromosome = 17, 
                           name = "Gene Model",
                           start = 75770000, end = 75880000)

plotTracks(list(gtrack, grtrack, snptrack), collapseTranscripts = "longest", from = 75775000, to = 75880000)

As a second example, consider a SNP located at 22:39579302 (rs5757736), which is associated with cognitive/educational traits (with all those caveats…) as well as schizophrenia risk. The nearest gene to this SNP is CACNA1I. In our dataset, it is also associated with expression of RPS19BP1 under hypoxia in immature neurons.

p1 <- make_boxplot(celltype = "Immature", condition = "stim1pct", testsnp = "22:39579302", testgene = "RPS19BP1") + geom_boxplot(alpha=0) + theme_classic() + theme(legend.position="none") + ylab("")

── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.
p2 <- make_boxplot(celltype = "Immature", condition = "stim21pct", testsnp = "22:39579302", testgene = "RPS19BP1") + geom_boxplot(alpha=0) + theme_classic() + theme(legend.position="none")+ ylab("")

── Column specification ────────────────────────────────────────────────────────
cols(
  ID = col_character(),
  NA19153 = col_double(),
  NA19207 = col_double(),
  NA18913 = col_double(),
  NA19098 = col_double(),
  NA18853 = col_double(),
  NA18856 = col_double(),
  NA19093 = col_double(),
  NA19210 = col_double(),
  NA19190 = col_double(),
  NA18508 = col_double(),
  NA18517 = col_double(),
  NA18507 = col_double(),
  NA18519 = col_double(),
  NA18501 = col_double(),
  NA18489 = col_double(),
  NA19102 = col_double(),
  NA18511 = col_double(),
  NA19128 = col_double()
)
p3 <- make_boxplot(celltype = "Immature", condition = "control10", testsnp = "22:39579302", testgene = "RPS19BP1") + geom_boxplot(alpha=0) + theme_classic() + theme(legend.position="none")+ ylab("")

── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  ID = col_character()
)
ℹ Use `spec()` for the full column specifications.
grid.arrange(p3, p1, p2, nrow=1)

library(Gviz)

snptrack <- AnnotationTrack(start = 39579302, width = 1, chromosome = 22, genome = "hg38", name = "rs5757736")
gtrack <- GenomeAxisTrack()


getOption("Gviz.scheme")
[1] "myScheme"
scheme <- getScheme()
scheme$GeneRegionTrack$fill <- "blue"
scheme$GeneRegionTrack$col <- NULL
scheme$GeneRegionTrack$transcriptAnnotation <- NULL  #"symbol"
addScheme(scheme, "myScheme")
options(Gviz.scheme = "myScheme")


grtrack <- GeneRegionTrack(TxDb.Hsapiens.UCSC.hg38.refGene, genome = "hg38", chromosome = 22, 
                           name = "Gene Model",
                           start = 39433403, end = 39729699)

plotTracks(list(gtrack, grtrack, snptrack), collapseTranscripts = "longest", from = 39433403, to = 39729699)


sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C           
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg38.refGene_3.15.0
 [2] GenomicFeatures_1.48.4                
 [3] AnnotationDbi_1.60.2                  
 [4] Biobase_2.58.0                        
 [5] Gviz_1.40.1                           
 [6] GenomicRanges_1.50.2                  
 [7] GenomeInfoDb_1.34.9                   
 [8] IRanges_2.32.0                        
 [9] S4Vectors_0.36.2                      
[10] BiocGenerics_0.44.0                   
[11] vroom_1.6.4                           
[12] MatrixEQTL_2.3                        
[13] gridExtra_2.3                         
[14] ggrepel_0.9.4                         
[15] knitr_1.45                            
[16] udr_0.3-153                           
[17] mashr_0.2.79                          
[18] ashr_2.2-63                           
[19] RColorBrewer_1.1-3                    
[20] pals_1.8                              
[21] forcats_0.5.1                         
[22] stringr_1.5.0                         
[23] dplyr_1.1.4                           
[24] purrr_1.0.2                           
[25] readr_2.1.4                           
[26] tidyr_1.3.0                           
[27] tibble_3.2.1                          
[28] ggplot2_3.4.4                         
[29] tidyverse_1.3.1                       
[30] SeuratObject_4.1.4                    
[31] Seurat_4.4.0                          
[32] workflowr_1.7.0                       

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3              rtracklayer_1.56.0         
  [3] scattermore_1.2             bit64_4.0.5                
  [5] irlba_2.3.5.1               DelayedArray_0.24.0        
  [7] rpart_4.1.16                data.table_1.14.8          
  [9] AnnotationFilter_1.20.0     KEGGREST_1.38.0            
 [11] RCurl_1.98-1.13             generics_0.1.3             
 [13] callr_3.7.3                 cowplot_1.1.1              
 [15] RSQLite_2.3.3               RANN_2.6.1                 
 [17] future_1.33.1               bit_4.0.5                  
 [19] tzdb_0.4.0                  spatstat.data_3.0-3        
 [21] xml2_1.3.3                  lubridate_1.9.3            
 [23] httpuv_1.6.5                SummarizedExperiment_1.28.0
 [25] assertthat_0.2.1            xfun_0.41                  
 [27] hms_1.1.3                   jquerylib_0.1.4            
 [29] evaluate_0.23               promises_1.2.0.1           
 [31] fansi_1.0.5                 restfulr_0.0.14            
 [33] progress_1.2.2              dbplyr_2.4.0               
 [35] readxl_1.4.0                igraph_1.5.1               
 [37] DBI_1.1.3                   htmlwidgets_1.6.2          
 [39] spatstat.geom_3.2-7         ellipsis_0.3.2             
 [41] backports_1.4.1             biomaRt_2.52.0             
 [43] deldir_1.0-6                MatrixGenerics_1.10.0      
 [45] vctrs_0.6.4                 ensembldb_2.20.2           
 [47] ROCR_1.0-11                 abind_1.4-5                
 [49] cachem_1.0.8                withr_2.5.2                
 [51] BSgenome_1.64.0             progressr_0.14.0           
 [53] checkmate_2.1.0             sctransform_0.4.1          
 [55] GenomicAlignments_1.32.0    prettyunits_1.1.1          
 [57] goftest_1.2-3               cluster_2.1.3              
 [59] lazyeval_0.2.2              crayon_1.5.2               
 [61] spatstat.explore_3.0-6      pkgconfig_2.0.3            
 [63] labeling_0.4.3              ProtGenerics_1.28.0        
 [65] nlme_3.1-157                nnet_7.3-17                
 [67] rlang_1.1.3                 globals_0.16.2             
 [69] lifecycle_1.0.4             miniUI_0.1.1.1             
 [71] filelock_1.0.2              BiocFileCache_2.6.1        
 [73] modelr_0.1.8                dichromat_2.0-0.1          
 [75] invgamma_1.1                cellranger_1.1.0           
 [77] rprojroot_2.0.3             polyclip_1.10-0            
 [79] matrixStats_1.1.0           lmtest_0.9-40              
 [81] Matrix_1.6-3                zoo_1.8-12                 
 [83] base64enc_0.1-3             reprex_2.0.1               
 [85] whisker_0.4.1               ggridges_0.5.3             
 [87] processx_3.8.2              png_0.1-8                  
 [89] viridisLite_0.4.2           rjson_0.2.21               
 [91] bitops_1.0-7                getPass_0.2-2              
 [93] KernSmooth_2.23-20          Biostrings_2.66.0          
 [95] blob_1.2.4                  mixsqp_0.3-48              
 [97] SQUAREM_2021.1              parallelly_1.37.0          
 [99] spatstat.random_3.1-3       jpeg_0.1-10                
[101] rmeta_3.0                   scales_1.2.1               
[103] memoise_2.0.1               magrittr_2.0.3             
[105] plyr_1.8.9                  ica_1.0-3                  
[107] zlibbioc_1.44.0             compiler_4.2.0             
[109] BiocIO_1.6.0                fitdistrplus_1.1-8         
[111] Rsamtools_2.12.0            cli_3.6.1                  
[113] XVector_0.38.0              listenv_0.9.1              
[115] patchwork_1.1.3             pbapply_1.5-0              
[117] ps_1.7.5                    htmlTable_2.4.0            
[119] Formula_1.2-4               MASS_7.3-56                
[121] tidyselect_1.2.0            stringi_1.8.1              
[123] highr_0.9                   yaml_2.3.5                 
[125] latticeExtra_0.6-30         VariantAnnotation_1.42.1   
[127] sass_0.4.7                  tools_4.2.0                
[129] timechange_0.2.0            future.apply_1.11.1        
[131] parallel_4.2.0              rstudioapi_0.13            
[133] foreign_0.8-82              git2r_0.30.1               
[135] farver_2.1.1                Rtsne_0.16                 
[137] digest_0.6.34               shiny_1.7.5.1              
[139] Rcpp_1.0.12                 broom_1.0.5                
[141] later_1.3.0                 RcppAnnoy_0.0.19           
[143] httr_1.4.7                  biovizBase_1.44.0          
[145] colorspace_2.1-0            rvest_1.0.2                
[147] XML_3.99-0.15               fs_1.6.3                   
[149] tensor_1.5                  reticulate_1.34.0          
[151] truncnorm_1.0-9             splines_4.2.0              
[153] uwot_0.1.16                 spatstat.utils_3.0-4       
[155] sp_2.1-3                    mapproj_1.2.11             
[157] plotly_4.10.0               xtable_1.8-4               
[159] jsonlite_1.8.7              R6_2.5.1                   
[161] Hmisc_5.1-1                 pillar_1.9.0               
[163] htmltools_0.5.7             mime_0.12                  
[165] glue_1.6.2                  fastmap_1.1.1              
[167] BiocParallel_1.32.6         codetools_0.2-18           
[169] maps_3.4.0                  mvtnorm_1.2-3              
[171] utf8_1.2.4                  lattice_0.20-45            
[173] bslib_0.5.1                 spatstat.sparse_3.0-3      
[175] curl_5.1.0                  leiden_0.4.2               
[177] interp_1.1-4                survival_3.3-1             
[179] rmarkdown_2.26              munsell_0.5.0              
[181] GenomeInfoDbData_1.2.9      haven_2.5.0                
[183] reshape2_1.4.4              gtable_0.3.4