Log2022

Last updated: 2022-08-24

Knit directory: GradLog/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0).

R Markdown file: up-to-date
Environment: empty
Seed: set.seed(20201014)
Session information: recorded
Cache: none
File paths: relative
Repository version: ff45d00

Status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/.Rhistory

Untracked files:
    Untracked:  analysis/week_log.Rmd

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Log2022.Rmd) and HTML (docs/Log2022.html) files.

File	Version	Author	Date	Message
Rmd	ff45d00	liliw-w	2022-08-24	add new week
html	56c966a	liliw-w	2022-08-18	Build site.
Rmd	31d1e52	liliw-w	2022-08-18	test side by side figures
html	0ca4da8	liliw-w	2022-08-17	Build site.
Rmd	985ba11	liliw-w	2022-08-17	add a new week
html	a85e562	liliw-w	2022-08-11	Build site.
Rmd	4ea6502	liliw-w	2022-08-11	adjust figures
html	d3ffddd	liliw-w	2022-08-11	Build site.
Rmd	06cbd57	liliw-w	2022-08-11	New week
html	4edeb7e	liliw-w	2022-07-20	Build site.
Rmd	4870a2e	liliw-w	2022-07-20	add new week
html	dbaa236	liliw-w	2022-06-29	Build site.
Rmd	ba5c42b	liliw-w	2022-06-29	Add new weeks
html	07720fc	liliw-w	2022-06-11	Build site.
Rmd	41a531b	liliw-w	2022-06-11	update theme
html	dacdae7	liliw-w	2022-06-11	Build site.
html	e6a2199	liliw-w	2022-06-10	Build site.
Rmd	ed7392a	liliw-w	2022-06-10	update rmd
html	f60b0b9	liliw-w	2022-06-09	Build site.
Rmd	06afdff	liliw-w	2022-06-09	update error bar from se to 0.95 CI
html	0155a0f	liliw-w	2022-06-08	Build site.
Rmd	518a1c0	liliw-w	2022-06-08	add more details
html	607ff2b	liliw-w	2022-06-08	Build site.
Rmd	397fb52	liliw-w	2022-06-08	add a new week
html	21ae027	liliw-w	2022-06-08	Build site.
Rmd	0db9d74	liliw-w	2022-06-08	add a new week
html	0835a9c	liliw-w	2022-04-27	Build site.
Rmd	6d481ca	liliw-w	2022-04-27	fix a typo
html	1b8143d	liliw-w	2022-04-27	Build site.
Rmd	0626baa	liliw-w	2022-04-27	add a new week
html	59c3d71	liliw-w	2022-04-25	Build site.
Rmd	9d92b99	liliw-w	2022-04-25	Add a new week
html	2e6c039	liliw-w	2022-04-14	Build site.
Rmd	fae1f4e	liliw-w	2022-04-14	adjust fig position
html	3395a26	liliw-w	2022-04-14	Build site.
Rmd	d0a08e6	liliw-w	2022-04-14	adjust fig position
html	8b9071e	liliw-w	2022-04-14	Build site.
Rmd	424015b	liliw-w	2022-04-14	add new week
html	72025bf	liliw-w	2022-02-04	Build site.
Rmd	089c609	liliw-w	2022-02-04	new week
html	1274e24	llw	2022-01-19	Build site.
Rmd	9976af6	llw	2022-01-19	new week
html	bd69a21	llw	2022-01-13	Build site.
Rmd	28d0a2b	llw	2022-01-13	new week
html	ec0a2af	llw	2022-01-04	Build site.
Rmd	9e5c2e0	llw	2022-01-04	new week
html	97b4ca1	llw	2022-01-03	Build site.
html	4c90bd8	llw	2022-01-03	Build site.
Rmd	6dc3f9c	llw	2022-01-03	add a new year

If any figures don’t show, try opening in Safari.

Aug 17 & Aug 24

1. Merged coloc regions

How?

If two trans-eQTL regions under the previous definition have lead SNPs within 200 kb of each other, they are merged into one region. As a result, the 255 regions were merged into 179 regions.
I counted a merged region as a coloc region if any of its sub-regions has a coloc signal.
I calculated the coloc proportion as #coloc/(255 or 179), instead of using the “candidate regions” as the denominator, as I did previously.
I looked at 29 blood-related traits, autoimmune diseases, and some other traits (including height).
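
The merging rule and the coloc proportion described above can be sketched as follows. This is a minimal Python illustration, not the code used for the analysis: the function names, the toy positions, and the choice to chain consecutive lead SNPs (so a merged region can span more than 200 kb in total) are all assumptions.

```python
def merge_regions(lead_snps, max_gap=200_000):
    """Group lead SNPs into merged regions.

    lead_snps: iterable of (chrom, pos) tuples, one per original region.
    Returns a list of merged regions, each a list of (chrom, pos) tuples.
    Assumption: consecutive lead SNPs are chained if they are <= max_gap apart.
    """
    merged = []
    for chrom, pos in sorted(lead_snps):
        last = merged[-1] if merged else None
        # Join the previous group if it is on the same chromosome and its
        # most recently added lead SNP is within max_gap of this one.
        if last and last[-1][0] == chrom and pos - last[-1][1] <= max_gap:
            last.append((chrom, pos))
        else:
            merged.append([(chrom, pos)])
    return merged


def coloc_proportion(merged, coloc_snps):
    """Fraction of merged regions with at least one coloc sub-region."""
    coloc_snps = set(coloc_snps)
    n_coloc = sum(any(snp in coloc_snps for snp in region) for region in merged)
    return n_coloc / len(merged)


# Toy input (made-up positions, not taken from the analysis).
toy = [("3", 100_000), ("3", 250_000), ("3", 900_000), ("7", 120_000)]
regions = merge_regions(toy)
print(len(regions))
print(coloc_proportion(regions, [("3", 250_000)]))
```

In this toy example the first two lead SNPs on chromosome 3 are 150 kb apart and are merged, so four original regions become three, and the merged region counts as a coloc region because one of its sub-regions does.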

Figures & Observations

Figure: Coloc proportion of blood traits, autoimmune diseases, and other traits, based on 255 originally defined regions and 179 merged regions. Blue: original non-merged regions. Green: merged regions. Dark: coloc.

Version	Author	Date
985ba11	liliw-w	2022-08-17

Other files:

For the numerical results behind the figure, see
- For blood, /project2/xuanyao/llw/coloc/ukbb_coloc_blood_traits/data/coloc_region_prop_merged.txt.
- For autoimmune, /project2/xuanyao/llw/coloc/immune_traits/pmid_all/coloc_region_prop_merged.txt.
- For other ukbb traits, /project2/xuanyao/llw/coloc/ukbb_coloc_more_traits/all_trait/data/coloc_region_prop_merged.txt.
For the detailed merged coloc regions of each trait (e.g. which merged region each original coloc region was merged into), see
- For blood, /project2/xuanyao/llw/coloc/ukbb_coloc_blood_traits/data/pheno*.coloc_reg_w_merged.txt.
- For autoimmune, /project2/xuanyao/llw/coloc/immune_traits/pmid*/data/coloc_reg_w_merged.txt.
- For other ukbb traits, /project2/xuanyao/llw/coloc/ukbb_coloc_more_traits/ukbb*/data/coloc_reg_w_merged.txt.

For height, modules: M153, M156, M25, M51. Locus: 3:101044144. Similar for autoimmune traits.

Other updated figures

Figure: Number of coloc regions for pairs of (module, trait) for all traits. Left: non-merged coloc regions. Right: merged coloc regions. Colors: Traits of various types.

Version	Author	Date
985ba11	liliw-w	2022-08-17

Figure: Number of coloc regions for pairs of (module, trait) for all traits. Left: non-merged coloc regions. Right: merged coloc regions. Colors: Traits of various types.

Version	Author	Date
985ba11	liliw-w	2022-08-17

2. S-LDSC for “other traits”

All modules for all blood traits

/project2/xuanyao/llw/ldsc/h2_enrich_comb/M*_blood_traits.results
All modules for all autoimmune diseases and all other traits

/project2/xuanyao/llw/ldsc/h2_enrich_comb/T_*_all_modules.results
Figures for all
- Module based (blood traits)
/project2/xuanyao/llw/ldsc/h2_enrich_comb/plots/M*_all_traits.results.pdf
- Trait based (autoimmune diseases & other traits)
/project2/xuanyao/llw/ldsc/plots/T_*_all_modules.results.pdf
- Enrichment p-value vs. module vs. trait
/project2/xuanyao/llw/ldsc/plots/h2_enrich_all.pdf

Figure: Heatmap of coloc regions and h2 enrichment for pairs of module and trait.

Version	Author	Date
b3ba297	liliw-w	2022-08-24

3. Covariates

117 modules

Using $10^{-7}$ as the signal cutoff:

2058 (trans-eQTL, module) pairs
46 modules
1871 unique trans-eQTLs

Figure: Heatmap of coloc regions and h2 enrichment for pairs of module and trait.

Version	Author	Date
b3ba297	liliw-w	2022-08-24

Figure: Heatmap of coloc regions and h2 enrichment for pairs of module and trait.

Version	Author	Date
b3ba297	liliw-w	2022-08-24

Aug 10

1. How many trans-eQTL Loci are cis to the trans target gene module?

How? and result files

I used two ways:

Whether a signal’s nearest gene (or any cis-gene within 1 Mb) is in its trans-target module.
What the distance is between a signal and the genes in the module, i.e. whether any gene in the module lies within 1 Mb of the signal.
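
The two checks above could look roughly like this for a single (SNP, module) pair. This is only a sketch: measuring distance to a gene's TSS, the function names, and the toy coordinates are assumptions, not details from the analysis.

```python
def nearest_gene(snp_pos, genes):
    """Name of the gene whose TSS is closest to snp_pos.

    genes: dict mapping gene name -> TSS position on the SNP's chromosome.
    """
    return min(genes, key=lambda g: abs(genes[g] - snp_pos))


def genes_within(snp_pos, genes, window):
    """Names of genes whose TSS lies within `window` bp of snp_pos."""
    return {g for g, tss in genes.items() if abs(tss - snp_pos) <= window}


def cis_summary(snp_pos, genes, module, window=1_000_000):
    """Apply both checks for one (SNP, module) pair."""
    nearby = genes_within(snp_pos, genes, window)
    return {
        # Check 1: nearest gene, or any cis-gene within the window, in the module.
        "nearest_in_module": nearest_gene(snp_pos, genes) in module,
        "any_cis_in_module": bool(nearby & module),
        # Check 2: smallest distance from the SNP to a module gene on this chromosome.
        "min_dist_to_module": min(
            (abs(genes[g] - snp_pos) for g in module if g in genes),
            default=None,
        ),
    }


# Toy data (hypothetical genes and positions on one chromosome).
genes = {"A": 1_000_000, "B": 1_400_000, "C": 5_000_000}
module = {"B", "C"}
summary = cis_summary(1_100_000, genes, module)
print(summary)
```

Here the nearest gene (A) is not in the module, but gene B is both in the module and within 1 Mb, so the pair would count under the "at least one gene within 1 Mb" criterion only.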

Result file:

/project2/xuanyao/llw/DGN_no_filter_on_mappability/postanalysis/cis_genes_in_module.txt
/project2/xuanyao/llw/DGN_no_filter_on_mappability/postanalysis/dis_to_genes_in_module.txt

Numerical results

Among 3899 (SNP, module) signal pairs in DGN,

For 79 (372) (SNP, module) pairs, the nearest gene of the trans-eQTL (at least one gene within 1 Mb of the trans-eQTL) is in the trans-target module.
For 186 (372) (SNP, module) pairs, the trans-eQTL is within 100 kb (1 Mb) of at least one gene in the trans-target module.

2. Simulation when beta has same correlation as $\Sigma$

Goal

To see if PC1 has any power in this case.

Signals in the previous simulations have large p-values (max ~0.05) under the null distribution.
The simulated null distribution has large p-values (0.001% are < 1e-5, 0.01% are < 1e-4, 0.1% are < 1e-3).
Signals in DGN have a largest p-value of ~1e-7.
Why does the simulated null distribution have large p-values?
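
One observation that may help with this question: the three percentages quoted above equal the thresholds themselves (1e-5 = 0.001%, 1e-4 = 0.01%, 1e-3 = 0.1%), which is what Uniform(0, 1) null p-values would give. The Python sketch below checks the tail fractions of simulated uniform p-values. It does not use the actual simulation output; the sample size and seed are arbitrary choices.

```python
import random


def tail_fractions(pvals, thresholds=(1e-5, 1e-4, 1e-3)):
    """Fraction of p-values strictly below each threshold."""
    n = len(pvals)
    return {t: sum(p < t for p in pvals) / n for t in thresholds}


random.seed(0)
# Stand-in for null p-values: uniform draws (an assumption, not real output).
null_p = [random.random() for _ in range(200_000)]
fractions = tail_fractions(null_p)
for t, frac in fractions.items():
    print(f"p < {t:g}: observed {100 * frac:.4f}%, expected {100 * t:g}%")
```

If the observed fractions track the thresholds like this, the null is behaving as a calibrated null should, and p-values as small as those seen in DGN (~1e-7) would need far more null draws to appear by chance.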

Procedure

To look into how $cor(\beta)$ affects the power, I considered the following parameter settings,

$cor(\beta)=\Sigma$, $caus=0.3$, $N=800$
$cor(\beta)=\Sigma$, $caus=1$, $N=800$
$cor(\beta)=I$, $caus=1$, $N=800$
$cor(\beta)=I$, $caus=0.3$, $N=800$ (previous setting)
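
To make the four settings concrete, here is a minimal Python sketch of one way to draw $\beta$ with a given correlation matrix and causal proportion. It is an illustration under assumptions, not the simulation code itself: the toy $\Sigma$, the value `var_b=0.001`, the use of a Cholesky factor, and zeroing out non-causal genes after the correlated draw are all choices made for this example.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a small positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simulate_beta(cor, caus, var_b, rng):
    """Draw beta ~ N(0, var_b * cor), then keep a random `caus` fraction nonzero."""
    k = len(cor)
    L = cholesky(cor)
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    beta = [math.sqrt(var_b) * sum(L[i][j] * z[j] for j in range(i + 1))
            for i in range(k)]
    n_causal = max(1, round(caus * k))
    causal = set(rng.sample(range(k), n_causal))
    return [b if i in causal else 0.0 for i, b in enumerate(beta)]


K = 4  # toy module size
sigma = [[1.0 if i == j else 0.5 for j in range(K)] for i in range(K)]
identity = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]
settings = [("Sigma", sigma, 0.3), ("Sigma", sigma, 1.0),
            ("I", identity, 1.0), ("I", identity, 0.3)]

rng = random.Random(20201014)
for name, cor, caus in settings:
    beta = simulate_beta(cor, caus, var_b=0.001, rng=rng)
    print(name, caus, sum(b != 0.0 for b in beta))
```

The sample size $N$ does not enter the draw of $\beta$; it would only matter when genotypes and expression are simulated from these effects.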

Observation

Figure: coloc between trans and cis loci.

Version	Author	Date
1b724ae	liliw-w	2022-08-11

PC1 still has low power.
Using $\Sigma$ as $cor(\beta)$ has little effect on power under $caus=0.3$.
Using $\Sigma$ as $cor(\beta)$ increases power under $caus=1$.
Increasing $caus$ has little effect on power under independent $cor(\beta)$.
Increasing $caus$ increases power under $cor(\beta)=\Sigma$.

3. Cell type analysis

Aggregate p-values, Bonferroni, signals. Except M1 & M2.

Run z.sh for M1 & M2.

4. Re-define coloc regions

Previously, a region for coloc was defined as a region of length 100 kb centered at a lead SNP. As a result, the lead SNPs of two coloc regions can be as close as 50 kb. In most cases (only one causal variant), it doesn’t make much sense to treat these as two different regions.

Therefore, I need to merge these kinds of regions. How?

I focus on the regions with significant coloc. If two regions’ lead SNPs are within 200kb, then these regions are counted as one.

See result file: /project2/xuanyao/llw/coloc/ukbb_coloc_blood_traits/res_coloc_reg_merged.txt

Aug 03

1. Check enriched $h^2$ of all modules in all three autoimmune diseases

To check whether any module shows $h^2$ enrichment in the three autoimmune diseases (i.e. cd, ibd, allergy), I ran S-LDSC for all modules.

See h2 enrich for cd, h2 enrich for ibd, h2 enrich for allergy.

2. Run Trans-PCO in DGN while keeping cell type proportions and expression PCs

This is a suggestion from the committee meeting. We observed a high proportion of coloc between trans-eQTLs and blood traits. We interpreted it as the enrichment of trans-eQTLs in cell type proportion.

However, since we attempted to remove cell type proportions by including them as covariates, the above interpretation doesn’t send a good message.

So, the suggestion was to not remove cell type proportions at all, and see what happens. Therefore, I need to run trans-PCO without regressing out cell type proportions (or expression PCs, since they are also related to cell type proportions), and see how the gene modules, signals, and coloc with blood traits change.

Progress: running trans-PCO on DGN.

3. Update eQTLGen using ratio 100 (instead of 50)

Numbers

108 (previously, 130) modules are used for signal detection.
There are 5610 significant (trans-eQTL, module) pairs.
There are 101 modules that have at least one trans-eQTL.
There are 1697 eQTLGen SNPs that are significant trans-eQTLs for at least one module.

Signal figures

For the updated figures, see /project2/xuanyao/llw/eQTLGen_est_Sigma/plot.

Update: DGN replication in eQTLGen

Out of 3899 DGN (SNP, module) signal pairs, 33 of them are also analyzed in eQTLGen.
Out of 33 DGN signals, 33 are replicated in eQTLGen.

Update: eQTLGen replication in DGN

Out of 5610 eQTLGen (SNP, module) signal pairs, 5230 of them are also analyzed in DGN.
Out of 5230 eQTLGen signals, 69 are replicated in DGN.

Add buffer region to DGN signals to see replication

eQTLGen specific signal (SNP, module) pairs not in DGN

Out of 5230 eQTLGen (SNP, module) signal pairs that were also analyzed by DGN,

145, 190, 212, and 236 are near at least one DGN signal at the same locus, within a distance of 1e+05, 2e+05, 5e+05, and 1e+06 bp, respectively.

4672 are not close to any DGN signals.
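
The buffer counts above can be computed along these lines. This Python sketch is an assumption about the procedure, not the script that produced the numbers: it compares SNP positions only and ignores the module, and the function name and toy positions are made up.

```python
from bisect import bisect_left


def count_near(query, reference, buffers=(100_000, 200_000, 500_000, 1_000_000)):
    """For each buffer, count query SNPs within that distance of any reference SNP.

    query, reference: lists of (chrom, pos) tuples. Returns {buffer: count}.
    """
    by_chrom = {}
    for chrom, pos in reference:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    def min_dist(chrom, pos):
        positions = by_chrom.get(chrom)
        if not positions:
            return float("inf")
        # Only the reference SNPs just below and just above pos can be nearest.
        i = bisect_left(positions, pos)
        neighbours = positions[max(i - 1, 0): i + 1]
        return min(abs(pos - p) for p in neighbours)

    dists = [min_dist(c, p) for c, p in query]
    return {b: sum(d <= b for d in dists) for b in buffers}


# Toy signals (hypothetical positions).
dgn = [("1", 1_000_000), ("2", 5_000_000)]
eqtlgen = [("1", 1_050_000), ("1", 1_400_000), ("2", 6_500_000), ("3", 10)]
counts = count_near(eqtlgen, dgn)
print(counts)
print(len(eqtlgen) - counts[1_000_000])  # not near any DGN signal at 1 Mb
```

To restrict the comparison to signals for the same module, the grouping key could be changed from the chromosome to a (chromosome, module) pair.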
eQTLGen specific signal SNPs not in DGN

Out of 1558 eQTLGen signal SNPs that were also analyzed by DGN,

153, 216, 263, and 378 are near at least one DGN signal at the same locus, within a distance of 1e+05, 2e+05, 5e+05, and 1e+06 bp, respectively.

0 are not close to any DGN signals.
Interesting examples detected in eQTLGen but missed in DGN

In an earlier section, where I used ratio 50 to check replication, I gave a few interesting examples of signals that were detected in eQTLGen but missed in DGN and that have more significant z-scores.

After updating the ratio to 100, those examples still hold.

4. h2 enrichment heatmap

Figure: Heatmap of coloc regions and h2 enrichment for pairs of module and trait.

Version	Author	Date
1b724ae	liliw-w	2022-08-11

Figure: Heatmap of coloc regions and h2 enrichment for pairs of module and trait.

Version	Author	Date
1b724ae	liliw-w	2022-08-11

Add other non-blood traits?

June 29 & July 06

1. coloc between trans- and cis-QTLs

With cis-sQTLs

255 trans- regions in total.
163 candidate coloc regions, i.e. regions which have at least one intron with min p < 1e-5. (Light green regions).
66 regions are coloced with at least one intron. (Dark green regions).

With cis-eQTLs (updates)

When I was looking into coloc with cis-s, I found that some of the coloc numbers with cis-e looked odd. So I re-analyzed cis-e coloc with trans. The numbers are updated below.

255 trans- regions in total.
Candidate coloc regions: 120 –> 222.
Coloc regions: 54 –> 93.

With either cis-e or cis-s

Upset plot

Figure: coloc between trans and cis loci.

Version	Author	Date
521ea43	liliw-w	2022-06-29

2. Module g:profiler enrichment

Enrichment files for each of the 166 modules: /project2/xuanyao/llw/DGN_no_filter_on_mappability/module_enrich.

3. Simulations

Simulation with large N, caus, var

Aim

To look at the power of PC1 and PCO, and in particular whether PC1 has any power in this case.

Parameters

var.b: 0.2 (0.001 by default), caus: 1 (0.3 by default), N: 800 (500 by default). 10^3 simulations, 10^4 samples.

I looked at only one scenario.
Figures & Observations
- Trans-PCO and MinP both have power of 1 across all simulations.
- PC1 tends to be either extremely powerful or powerless.

Figure: Simulation power when the parameters N, caus, and var are large. Green point is mean ± 95% CI.

Version	Author	Date
26e2dad	liliw-w	2022-07-20

Simulation where PC1 has the highest power

Aim

When the true betas align with the first eigenvector, what does the power look like for PC1 and for the other two methods?

Parameters, and how?

To give beta the same direction as $u_1$, I first set beta equal to $u_1$ (which has length 1), then scale beta by a constant to set its genetic variance.

I first set beta to have the same length as the beta generated under the default setting, i.e. var.b=0.001, caus=0.3, N=500. However, in this case the power of all three methods is low, even for the oracle test, which indicates that power in this case is intrinsically low.

This may be due to the smaller effect on each individual gene, given a fixed var and more genes with nonzero effects.

Therefore, I increased beta’s length by setting $\beta = u_1 \times \|\beta^{default}\| / 0.3 \times 1.2$, i.e. making it 4 times longer.
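
As a quick check of the scaling above, the Python sketch below builds a beta along a unit vector $u_1$ and confirms that dividing by 0.3 and multiplying by 1.2 makes it 4 times as long as the default beta. The vectors are toy values and the parameter names `shrink` and `boost` are hypothetical labels for the two constants.

```python
import math


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def beta_along_u1(u1, beta_default, shrink=0.3, boost=1.2):
    """Return u1 * ||beta_default|| / shrink * boost."""
    scale = norm(beta_default) / shrink * boost
    return [scale * x for x in u1]


u1 = [0.6, 0.8, 0.0]              # toy unit-length eigenvector
beta_default = [0.03, 0.0, 0.04]  # toy default beta, length 0.05
beta = beta_along_u1(u1, beta_default)
ratio = norm(beta) / norm(beta_default)
print(ratio)
```

The ratio depends only on the two constants (1.2 / 0.3), not on the particular vectors, as long as $u_1$ has length 1.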

I looked at power at various N’s.
Figures & Observations
- PC1 has the highest power.

Figure: Simulation power when PC1 has the highest power. Error bar is mean ± 95% CI.

Version	Author	Date
26e2dad	liliw-w	2022-07-20

Simulation 3

4. trans- target module of gene KLF14

Goal

KLF14 relates to T1D, an autoimmune disease. Check if its trans- target module is enriched in T1D related pathways.

Observations

Signal is (7:130536491, module30).
Genomic location of the trans-eQTL and KLF14, see genome browser. The distance is 119,090 bp.
GO enrichment of the module 30

See enrichment here.

Module 30 is enriched in two GO molecular functions:
- pantetheine hydrolase activity
The Vanin genes are a family that encodes pantetheinases (pantetheine hydrolases).

VNN1, VNN2, VNN3 are in module 30.
- CXCR chemokine receptor binding