Last updated: 2022-07-25
Before you begin:
The R template to do the exercises is here.
Note: if you are on the online server, set your R working directory to your home directory.
The data files are in the folder /data/SISG2022M15/data/
We will analyze a simulated data set that contains sample structure, to better understand the impact such structure can have on GWAS analyses when it is not accounted for. We will perform a GWAS on a quantitative phenotype that was simulated to be highly heritable and highly polygenic.
The file “sim_rels_pheno.txt” contains the phenotype measurements for a set of individuals. The file “sim_rels_geno.bed” is a binary file in PLINK BED format, with accompanying BIM and FAM files, that contains the genotype data at null variants (i.e. variants simulated as not associated with the phenotype).
What should we expect the QQ and Manhattan plots to look like under this scenario?
Here are some things to try:
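Use PLINK to perform a GWAS of the phenotype using the sim_rels_pheno.txt phenotype file and the sim_rels_geno.{bed,bim,fam} genotype files. Only perform the association test on SNPs that pass the quality control threshold filters (minimum MAF, maximum missingness, and HWE p-value threshold). The basic command would look like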
system("plink2 --bfile /data/SISG2022M15/data/sim_rels_geno --pheno /data/SISG2022M15/data/sim_rels_pheno.txt --pheno-name <pheno_name> --maf <min_MAF> --geno <max_miss> --hwe <hwe_p_thresh> --glm allow-no-covars --out <output_prefix>")
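Once PLINK has finished, the results need to be loaded into R for plotting. The following is a minimal sketch, not the course's own code. It assumes PLINK 2 writes the linear regression results to a file named `<output_prefix>.<pheno_name>.glm.linear`, with a header line starting with `#CHROM` and a p-value column named `P`. The helper name `read_plink_glm` and the mock file contents are made up for illustration, so that the sketch runs without real PLINK output.

```r
# Read a PLINK 2 --glm results table. The header line starts with "#CHROM",
# so comment.char must be disabled for read.table() to keep it.
# (Assumed column layout; check the header of your own output file.)
read_plink_glm <- function(path) {
  res <- read.table(path, header = TRUE, comment.char = "",
                    check.names = FALSE, stringsAsFactors = FALSE)
  names(res)[1] <- sub("^#", "", names(res)[1])  # "#CHROM" -> "CHROM"
  res[!is.na(res$P), ]                           # drop variants with no test result
}

# Tiny mock file with made-up values, standing in for real PLINK output.
mock_glm <- tempfile(fileext = ".glm.linear")
writeLines(c(
  "#CHROM\tPOS\tID\tREF\tALT\tA1\tTEST\tOBS_CT\tBETA\tSE\tT_STAT\tP",
  "1\t1000\trs1\tA\tG\tG\tADD\t500\t0.02\t0.05\t0.4\t0.69",
  "1\t2000\trs2\tC\tT\tT\tADD\t500\tNA\tNA\tNA\tNA",
  "2\t1500\trs3\tG\tA\tA\tADD\t498\t-0.11\t0.05\t-2.2\t0.028"
), mock_glm)

plink_res <- read_plink_glm(mock_glm)
print(plink_res[, c("CHROM", "ID", "P")])
```

The `P` and `CHROM` columns of the resulting data frame are what the plotting functions in the next steps take as input.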
Make a Manhattan plot of the association results using the manhattanPlot() R function. The basic command would look like
manhattanPlot(
  p = <pvalues>,
  chromosome = <chromosomes>,
  thinThreshold = 1e-4,
  main = <title>
)
Make a QQ plot of the association results using the qqPlot() R function. The basic command would look like
qqPlot(
  pval = <pvalues>,
  thinThreshold = 1e-4,
  main = <title>
)
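To make concrete what a QQ plot of p-values shows, here is a minimal base R sketch. It is a stand-in for illustration, not the qqPlot() function used in the exercise. It assumes the p-values are uniform on [0, 1] under the null and uses simulated p-values rather than the exercise data. The helper name `qq_points` is hypothetical.

```r
# Expected vs. observed -log10(p) under the null (p-values uniform on [0, 1]).
qq_points <- function(pval) {
  pval <- sort(pval[!is.na(pval)])
  n <- length(pval)
  data.frame(
    expected = -log10(ppoints(n)),  # ppoints() gives evenly spaced uniform quantiles
    observed = -log10(pval)
  )
}

set.seed(1)
pts <- qq_points(runif(5000))       # simulated null p-values, illustration only

plot(pts$expected, pts$observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)",
     main = "QQ plot (simulated null p-values)")
abline(0, 1, col = "red")           # points close to this line suggest no inflation
```

Points that rise above the diagonal across most of the range, rather than only in the extreme tail, are the usual visual sign of inflated test statistics.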
Compute the genomic control inflation factor \(\lambda_{GC}\) based on the p-values. (Hint: convert the p-values to \(\chi^2_1\) test statistics using the R function qchisq().) Is there evidence of possible inflation due to confounding?
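One common way to compute \(\lambda_{GC}\) is to divide the median of the observed \(\chi^2_1\) statistics by the median of the \(\chi^2_1\) distribution. The sketch below follows the hint and uses simulated uniform p-values rather than the exercise results. The function name `lambda_gc` is hypothetical.

```r
# Genomic control inflation factor from a vector of p-values.
lambda_gc <- function(pval) {
  pval <- pval[!is.na(pval)]
  # Upper-tail quantile: small p-values map to large chi-square statistics.
  chisq <- qchisq(pval, df = 1, lower.tail = FALSE)
  # Median of the chi-square(1) distribution, approximately 0.455.
  median(chisq) / qchisq(0.5, df = 1)
}

set.seed(1)
lambda_null <- lambda_gc(runif(100000))  # simulated null p-values
print(lambda_null)                       # should be close to 1 under the null
```

Values well above 1 indicate that the test statistics are larger than expected under the null across the genome, which for null variants points to confounding such as unmodelled sample structure.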
Now use REGENIE to perform a GWAS of the phenotype using a whole genome regression model.
We want to use high-quality variants in the Step 1 null model fitting. Using PLINK, apply QC filters to remove variants with MAF below 5%, missingness above 1%, HWE p-value below 0.001, or minor allele count (MAC) below 20. (Hint: use --write-snplist to store the list of variants passing QC without making a new BED file.)
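The QC step above could be sketched as follows, in the same system() style as the other commands. This is one possible command rather than the official solution. The output prefix `qc_pass` is a hypothetical name, and the sketch assumes that PLINK writes the passing variant IDs to `<out_prefix>.snplist`.

```r
data_dir   <- "/data/SISG2022M15/data"
out_prefix <- "qc_pass"   # hypothetical output prefix

# Build the PLINK 2 command from the thresholds stated in the exercise.
qc_cmd <- paste(
  "plink2",
  "--bfile", file.path(data_dir, "sim_rels_geno"),
  "--maf 0.05",        # remove variants with MAF below 5%
  "--geno 0.01",       # remove variants with missingness above 1%
  "--hwe 0.001",       # remove variants with HWE p-value below 0.001
  "--mac 20",          # remove variants with minor allele count below 20
  "--write-snplist",   # write passing variant IDs, no new BED file
  "--out", out_prefix
)
print(qc_cmd)

# Only run the command if plink2 is available on this machine.
if (nzchar(Sys.which("plink2"))) {
  system(qc_cmd)
}
```

The resulting snplist file is what REGENIE Step 1 takes through its --extract option.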
Run REGENIE Step 1 to fit the null model and obtain polygenic predictions using a leave-one-chromosome-out (LOCO) scheme. The basic command would look like
system("regenie --bed /data/SISG2022M15/data/sim_rels_geno --phenoFile /data/SISG2022M15/data/sim_rels_pheno.txt --step 1 --loocv --bsize 1000 --qt --extract <plink_QC_pass_snplist> --out <output_prefix_step1>")
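Run REGENIE Step 2 to perform the association tests, using the LOCO predictions from Step 1. The basic command would look like
system("regenie --bed /data/SISG2022M15/data/sim_rels_geno --phenoFile /data/SISG2022M15/data/sim_rels_pheno.txt --step 2 --bsize 400 --qt --pred <output_prefix_step1>_pred.list --extract <plink_GWAS_snplist> --out <output_prefix_step2>")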
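To compare the REGENIE results with the PLINK results, the Step 2 output needs to be read into R and converted to p-values. The following is a minimal sketch under stated assumptions: that Step 2 writes a whitespace-delimited file named `<output_prefix_step2>_<pheno_name>.regenie`, and that it reports significance in a `LOG10P` column holding \(-\log_{10}\) of the p-value. Check the header of your own output file before relying on these names. The helper `read_regenie` and the mock values are made up for illustration.

```r
# Read REGENIE Step 2 results and add a plain p-value column.
# (Assumed column names; verify against your own output file.)
read_regenie <- function(path) {
  res <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
  res$P <- 10^(-res$LOG10P)   # LOG10P is assumed to hold -log10(p)
  res
}

# Tiny mock file with made-up values, standing in for real REGENIE output.
mock_regenie <- tempfile(fileext = ".regenie")
writeLines(c(
  "CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ N TEST BETA SE CHISQ LOG10P EXTRA",
  "1 1000 rs1 A G 0.21 500 ADD 0.02 0.05 0.16 0.161 NA",
  "2 1500 rs3 G A 0.35 498 ADD -0.11 0.05 4.84 1.556 NA"
), mock_regenie)

regenie_res <- read_regenie(mock_regenie)
print(regenie_res[, c("CHROM", "ID", "LOG10P", "P")])
```

The `P` and `CHROM` columns can then be passed to the same Manhattan and QQ plotting functions, and to the \(\lambda_{GC}\) calculation, that were used for the PLINK results.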
