Estimating genetic (co)variance and heritability

Proportion of variance in fitness explained by ‘DGRP line’:

Correlations in the line means

Use gcta64 to estimate the genetic variances and heritabilities for each fitness trait

Make a genomic relatedness matrix for all the lines
Run 4 univariate GREML models to find genetic variances and heritabilities
Run 6 bivariate GREML models to find the genetic covariances/correlations

Last updated: 2021-10-01

Checks: 7 0

Knit directory: fitnessGWAS/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20180914)

The command set.seed(20180914) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 01226ab

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 01226ab. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rapp.history
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .httr-oauth
    Ignored:    .pversion
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/correlations_SNP_effects_cache/
    Ignored:    code/.DS_Store
    Ignored:    code/Drosophila_GWAS.Rmd
    Ignored:    data/.DS_Store
    Ignored:    data/derived/
    Ignored:    data/input/.DS_Store
    Ignored:    data/input/.pversion
    Ignored:    data/input/dgrp.fb557.annot.txt
    Ignored:    data/input/dgrp2.bed
    Ignored:    data/input/dgrp2.bim
    Ignored:    data/input/dgrp2.fam
    Ignored:    data/input/huang_transcriptome/
    Ignored:    figures/.DS_Store
    Ignored:    figures/fig1_inkscape.svg
    Ignored:    figures/figure1a.pdf
    Ignored:    figures/figure1b.pdf

Untracked files:
    Untracked:  big_model.rds
    Untracked:  code/quant_gen_1.R
    Untracked:  data/input/genomic_relatedness_matrix.rds
    Untracked:  old_analyses/

Unstaged changes:
    Modified:   .gitignore
    Modified:   figures/GWAS_stats_figure.pdf
    Modified:   figures/SNP_effect_ED.pdf
    Modified:   figures/TWAS_stats_figure.pdf
    Modified:   figures/antagonism_ratios.pdf
    Modified:   figures/boyle_plot.pdf
    Modified:   figures/composite_mixture_figure.pdf
    Modified:   figures/eff_size_histos.pdf

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/gcta_quant_genetics.Rmd) and HTML (docs/gcta_quant_genetics.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
html	8d14298	lukeholman	2021-09-26	Build site.
Rmd	af15dd6	lukeholman	2021-09-26	Commit Sept 2021

library(tidyverse)
library(glue)
library(kableExtra)
library(brms)
gcta <- file.path(getwd(), "code/gcta64")
options(readr.show_col_types = FALSE)


# helper function to pass commands to the terminal
# Note that we set `intern = TRUE`, and pass the result of `system()` to `cat()`,
# ensuring that the Terminal output will be printed in this knitr report.
run_command <- function(shell_command, wd = getwd(), path = ""){
  cat(system(glue("cd ", wd, path, "\n", shell_command), intern = TRUE), sep = '\n')
}

# Load the line mean fitness values calculated by brms
line_means <- read_csv("data/derived/predicted_line_means.csv") %>% 
  mutate(line = factor(str_remove_all(line, "_"))) %>%
  filter(!is.na(male.fitness.early)) %>%
  mutate(`Female fitness early` = female.fitness.early,
         `Female fitness late` = female.fitness.late,
         `Male fitness early` = male.fitness.early,
         `Male fitness late` = male.fitness.late) %>%
  rename(female_fitness_early = female.fitness.early,
         female_fitness_late = female.fitness.late,
         male_fitness_early = male.fitness.early,
         male_fitness_late = male.fitness.late)

Proportion of variance in fitness explained by ‘DGRP line’:

The table shows the posterior estimate of the intra-class correlation coefficient (ICC, which measures the proportion of variance within and between lines), using the models in the script get_predicted_line_means.Rmd to adjust for block effects (and one covariate in the male late-life fitness analysis). ICC was calculated using ICC::ICCbare. This value approximates the broad-sense heritability; fitness appears highly heritable, except for male late-life fitness.

read.csv('data/derived/ICC.csv') %>% 
  select(Fitness_trait, Estimate, Est.Error, Q2.5, Q97.5) %>%
  rename(`Fitness trait` = Fitness_trait, `Est. Error` = Est.Error, `Lower 95% CI` = Q2.5, `Upper 95% CI` = Q97.5) %>% 
  kable(digits = 2) %>% kable_styling(full_width = FALSE)

Fitness trait	Estimate	Est. Error	Lower 95% CI	Upper 95% CI
Female early-life fitness	0.63	0.01	0.61	0.64
Female late-life fitness	0.56	0.01	0.54	0.58
Male early-life fitness	0.45	0.01	0.43	0.46
Male late-life fitness	0.14	0.01	0.12	0.15

Correlations in the line means

line_means %>% select(contains(" ")) %>% cor() %>% kable(digits =3) %>% kable_styling()

	Female fitness early	Female fitness late	Male fitness early	Male fitness late
Female fitness early	1.000	0.766	0.228	0.172
Female fitness late	0.766	1.000	0.317	0.317
Male fitness early	0.228	0.317	1.000	0.864
Male fitness late	0.172	0.317	0.864	1.000

Use `gcta64` to estimate the genetic variances and heritabilities for each fitness trait

Make a genomic relatedness matrix for all the lines

This uses the gcta64 command --make-grm to make a genomic relatedness matrix (GRM) for the DGRP SNP data. This matrix is needed to run the GREML models below.

run_command(glue("{gcta} --bfile dgrp2_QC_all_lines",
                 " --make-grm",
                 " --out gcta_GRM"), path = "/data/derived/")

*******************************************************************
* Genome-wide Complex Trait Analysis (GCTA)
* version 1.93.2 beta Mac
* (C) 2010-present, Jian Yang, The University of Queensland
* Please report bugs to Jian Yang <jian.yang.qt@gmail.com>
*******************************************************************
Analysis started at 16:32:06 BST on Fri Oct 01 2021.
Hostname: 6300D-132190-M.local

Options: 
 
--bfile dgrp2_QC_all_lines 
--make-grm 
--out gcta_GRM 

Note: GRM is computed using the SNPs on the autosome.
Reading PLINK FAM file from [dgrp2_QC_all_lines.fam]...
205 individuals to be included from FAM file.
205 individuals to be included. 0 males, 205 females, 0 unknown.
Reading PLINK BIM file from [dgrp2_QC_all_lines.bim]...
1648723 SNPs to be included from BIM file(s).
Computing the genetic relationship matrix (GRM) v2 ...
Subset 1/1, no. subject 1-205
  205 samples, 1648723 markers, 21115 GRM elements
IDs for the GRM file has been saved in the file [gcta_GRM.grm.id]
Computing GRM...
  100% finished in 3.2 sec
1648723 SNPs have been processed.
  Used 1648723 valid SNPs.
The GRM computation is completed.
Saving GRM...
GRM has been saved in the file [gcta_GRM.grm.bin]
Number of SNPs in each pair of individuals has been saved in the file [gcta_GRM.grm.N.bin]

Analysis finished at 16:32:12 BST on Fri Oct 01 2021
Overall computational time: 5.78 sec.

Run 4 univariate GREML models to find genetic variances and heritabilities

The below uses the gcta64 command --reml to estimate the genetic variance (V(G)), i.e. the variance explained by the GRM), environmental variance (V(e)), their sum (Vp), and the so-called “SNP heritability” (V(G)/Vp). Note that the genetic variance is very low for male late-life fitness. This is a bit puzzling because the same is not true for male early-life fitness, which is strongly correlated across lines with male late-life fitness.

do_univariate_greml <- function(trait1){

  # First make 'focal_data', a 2-column data frame with the line and the focal phenotype value
  focal_data <- line_means %>%
    select(line, !! trait1)
  names(focal_data)[2] <- c("focal_pheno")
  
  # Write a file with the line and phenotype data called phenotype.txt, for gcta64
  pheno_data <- focal_data %>%
    # filter(!(line %in% lines_to_prune)) %>% 
    mutate(line_copy = line) %>%
    select(line, line_copy, focal_pheno) %>%
    as.matrix()
  
  pheno_data %>%
    write.table(row.names = FALSE, col.names = FALSE,
                file = "data/derived/phenotype.txt", quote = FALSE)
  
  console_output <- suppressWarnings(capture.output(
    run_command(glue("{gcta}  --reml --reml-maxit 10000 --grm gcta_GRM  --pheno phenotype.txt  --out delete_me"), 
                path = "/data/derived/")))
  
  errors <- console_output[str_detect(console_output, "[Ee]rror")]
  if(length(errors) == 0) errors <- NA
  if(length(errors) > 1) errors <- paste(errors, sep = "; ")

  rbind(read.delim("data/derived/delete_me.hsq", sep = "\t") , 
        data.frame(Source = "Errors", Variance = errors, SE = NA)) %>% 
    mutate(Trait = trait1) %>% select(Trait, everything())
}

bind_rows(do_univariate_greml("Female fitness early"),
          do_univariate_greml("Female fitness late"),
          do_univariate_greml("Male fitness early"),
          do_univariate_greml("Male fitness late")) %>% 
  filter(Source %in% c("V(G)", "V(e)", "Vp", "V(G)/Vp")) %>% 
  mutate(Variance = as.numeric(Variance)) %>% 
  rename(Parameter = Source) %>% 
  kable(digits = 3) %>% 
  kable_styling(full_width = FALSE)

Trait	Parameter	Variance	SE
Female fitness early	V(G)	0.121	0.221
Female fitness early	V(e)	0.765	0.443
Female fitness early	Vp	0.886	0.240
Female fitness early	V(G)/Vp	0.137	0.281
Female fitness late	V(G)	0.388	0.202
Female fitness late	V(e)	0.226	0.362
Female fitness late	Vp	0.614	0.183
Female fitness late	V(G)/Vp	0.632	0.489
Male fitness early	V(G)	0.192	0.194
Male fitness early	V(e)	0.613	0.381
Male fitness early	Vp	0.805	0.207
Male fitness early	V(G)/Vp	0.238	0.293
Male fitness late	V(G)	0.000	0.233
Male fitness late	V(e)	1.019	0.485
Male fitness late	Vp	1.019	0.268
Male fitness late	V(G)/Vp	0.000	0.229

Run 6 bivariate GREML models to find the genetic covariances/correlations

The below uses the gcta64 command --reml-bivar to estimate the genetic covariances (C(G)_tr12), i.e. the co-variance explained by the GRM), and the genetic correlation (rG), in addition to the univariate parameters for trait 1 and trait 2 mentioned above. Note that because the genetic variance is low for male late-life fitness, the model cannot reliably estimate genetic covariance or correlations for pairs of traits that involve male late-life fitness.

do_bivar_greml <- function(trait1, trait2){

  # First make 'focal_data', a 2-column data frame with the line and the focal phenotype value
  focal_data <- line_means %>%
    select(line, !! trait1, !! trait2)
  names(focal_data)[2:3] <- c("focal_pheno1", "focal_pheno2")
  
  # Write a file with the line and phenotype data called phenotype.txt, for gcta64
  pheno_data <- focal_data %>%
    # filter(!(line %in% lines_to_prune)) %>% 
    mutate(line_copy = line) %>%
    select(line, line_copy, focal_pheno1, focal_pheno2) %>%
    as.matrix()
  
  pheno_data %>%
    write.table(row.names = FALSE, col.names = FALSE,
                file = "data/derived/phenotype_pair.txt", quote = FALSE)
  
  # Note that the warnings come when gcta could not fit the model properly. I save these errors especially, so no need to print them.
  console_output <- suppressWarnings(capture.output(
    run_command(glue("{gcta}  --reml-bivar --reml-bivar-lrt-rg 0 --reml-maxit 10000 --grm gcta_GRM  --pheno phenotype_pair.txt  --out delete_me"), 
                path = "/data/derived/")))
  
  errors <- console_output[str_detect(console_output, "[Ee]rror")]
  if(length(errors) == 0) errors <- NA
  if(length(errors) > 1) errors <- paste(errors, sep = "; ")
  traits <- sort(c(trait1, trait2))
  
  rbind(read.delim("data/derived/delete_me.hsq", sep = "\t") , 
        data.frame(Source = "Errors", Variance = errors, SE = NA)) %>% 
    mutate(`Trait 1` = traits[1], `Trait 2` = traits[2]) %>% select(`Trait 1`, `Trait 2`, everything())
}

results <- bind_rows(do_bivar_greml("Female fitness early", "Female fitness late"),
          do_bivar_greml("Male fitness early", "Male fitness late"),
          do_bivar_greml("Female fitness early", "Male fitness early"),
          do_bivar_greml("Female fitness late", "Male fitness late"),
          do_bivar_greml("Female fitness early", "Male fitness late"),
          do_bivar_greml("Female fitness late", "Male fitness early"))

unlink(c("data/derived/delete_me.log", "data/derived/phenotype_pair.txt", "data/derived/delete_me.hsq"))
unlink(list.files("data/derived/", pattern = "gcta_GRM", full.names = TRUE))

results %>% 
  filter(Source %in% c("V(G)_tr1", "V(e)_tr1", "Vp_tr1", "V(G)/Vp_tr1",
                       "V(G)_tr2", "V(e)_tr2", "Vp_tr2", "V(G)/Vp_tr2",
                       "C(G)_tr12", "rG")) %>% 
  mutate(Variance = as.numeric(Variance)) %>% 
  rename(Parameter = Source) %>% 
  kable(digit = 3) %>% 
  kable_styling(full_width = FALSE)

Trait 1	Trait 2	Parameter	Variance	SE
Female fitness early	Female fitness late	V(G)_tr1	0.167	0.208
Female fitness early	Female fitness late	V(G)_tr2	0.433	0.170
Female fitness early	Female fitness late	C(G)_tr12	0.223	0.173
Female fitness early	Female fitness late	V(e)_tr1	0.668	0.410
Female fitness early	Female fitness late	V(e)_tr2	0.153	0.292
Female fitness early	Female fitness late	Vp_tr1	0.836	0.221
Female fitness early	Female fitness late	Vp_tr2	0.586	0.151
Female fitness early	Female fitness late	V(G)/Vp_tr1	0.200	0.294
Female fitness early	Female fitness late	V(G)/Vp_tr2	0.739	0.439
Female fitness early	Female fitness late	rG	0.827	0.230
Male fitness early	Male fitness late	V(G)_tr1	0.110	0.186
Male fitness early	Male fitness late	V(G)_tr2	0.001	0.234
Male fitness early	Male fitness late	C(G)_tr12	-0.010	0.197
Male fitness early	Male fitness late	V(e)_tr1	0.781	0.383
Male fitness early	Male fitness late	V(e)_tr2	1.001	0.486
Male fitness early	Male fitness late	Vp_tr1	0.891	0.216
Male fitness early	Male fitness late	Vp_tr2	1.002	0.268
Male fitness early	Male fitness late	V(G)/Vp_tr1	0.124	0.234
Male fitness early	Male fitness late	V(G)/Vp_tr2	0.001	0.234
Male fitness early	Male fitness late	rG	-1.000	147.954
Female fitness early	Male fitness early	V(G)_tr1	0.123	0.222
Female fitness early	Male fitness early	V(G)_tr2	0.192	0.195
Female fitness early	Male fitness early	C(G)_tr12	0.047	0.153
Female fitness early	Male fitness early	V(e)_tr1	0.760	0.445
Female fitness early	Male fitness early	V(e)_tr2	0.612	0.382
Female fitness early	Male fitness early	Vp_tr1	0.883	0.241
Female fitness early	Male fitness early	Vp_tr2	0.805	0.208
Female fitness early	Male fitness early	V(G)/Vp_tr1	0.139	0.284
Female fitness early	Male fitness early	V(G)/Vp_tr2	0.239	0.294
Female fitness early	Male fitness early	rG	0.306	0.888
Female fitness late	Male fitness late	V(G)_tr1	0.351	0.140
Female fitness late	Male fitness late	V(G)_tr2	0.071	0.282
Female fitness late	Male fitness late	C(G)_tr12	0.158	0.139
Female fitness late	Male fitness late	V(e)_tr1	0.119	0.244
Female fitness late	Male fitness late	V(e)_tr2	0.963	0.570
Female fitness late	Male fitness late	Vp_tr1	0.470	0.122
Female fitness late	Male fitness late	Vp_tr2	1.034	0.306
Female fitness late	Male fitness late	V(G)/Vp_tr1	0.747	0.459
Female fitness late	Male fitness late	V(G)/Vp_tr2	0.069	0.291
Female fitness late	Male fitness late	rG	1.000	2.158
Female fitness early	Male fitness late	V(G)_tr1	0.124	0.218
Female fitness early	Male fitness late	V(G)_tr2	0.001	0.233
Female fitness early	Male fitness late	C(G)_tr12	-0.012	0.159
Female fitness early	Male fitness late	V(e)_tr1	0.756	0.437
Female fitness early	Male fitness late	V(e)_tr2	1.016	0.484
Female fitness early	Male fitness late	Vp_tr1	0.880	0.237
Female fitness early	Male fitness late	Vp_tr2	1.017	0.267
Female fitness early	Male fitness late	V(G)/Vp_tr1	0.141	0.281
Female fitness early	Male fitness late	V(G)/Vp_tr2	0.001	0.229
Female fitness early	Male fitness late	rG	-1.000	95.966
Female fitness late	Male fitness early	V(G)_tr1	0.495	0.186
Female fitness late	Male fitness early	V(G)_tr2	0.241	0.164
Female fitness late	Male fitness early	C(G)_tr12	0.266	0.122
Female fitness late	Male fitness early	V(e)_tr1	0.062	0.311
Female fitness late	Male fitness early	V(e)_tr2	0.490	0.311
Female fitness late	Male fitness early	Vp_tr1	0.557	0.154
Female fitness late	Male fitness early	Vp_tr2	0.731	0.170
Female fitness late	Male fitness early	V(G)/Vp_tr1	0.889	0.531
Female fitness late	Male fitness early	V(G)/Vp_tr2	0.329	0.285
Female fitness late	Male fitness early	rG	0.770	0.353

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] brms_2.14.4      Rcpp_1.0.4.6     kableExtra_1.3.4 glue_1.4.2      
 [5] forcats_0.5.0    stringr_1.4.0    dplyr_1.0.0      purrr_0.3.4     
 [9] readr_2.0.0      tidyr_1.1.0      tibble_3.0.1     ggplot2_3.3.2   
[13] tidyverse_1.3.0  workflowr_1.6.2 

loaded via a namespace (and not attached):
  [1] readxl_1.3.1         backports_1.1.7      systemfonts_0.2.2   
  [4] plyr_1.8.6           igraph_1.2.5         splines_4.0.3       
  [7] crosstalk_1.1.0.1    TH.data_1.0-10       rstantools_2.1.1    
 [10] inline_0.3.15        digest_0.6.25        htmltools_0.5.0     
 [13] rsconnect_0.8.16     fansi_0.4.1          magrittr_2.0.1      
 [16] tzdb_0.1.2           modelr_0.1.8         RcppParallel_5.0.1  
 [19] matrixStats_0.56.0   vroom_1.5.3          xts_0.12-0          
 [22] sandwich_2.5-1       svglite_1.2.3        prettyunits_1.1.1   
 [25] colorspace_1.4-1     blob_1.2.1           rvest_0.3.5         
 [28] haven_2.3.1          xfun_0.22            callr_3.4.3         
 [31] crayon_1.3.4         jsonlite_1.7.0       lme4_1.1-23         
 [34] survival_3.2-7       zoo_1.8-8            gtable_0.3.0        
 [37] emmeans_1.4.7        webshot_0.5.2        V8_3.4.0            
 [40] pkgbuild_1.0.8       rstan_2.21.2         abind_1.4-5         
 [43] scales_1.1.1         mvtnorm_1.1-0        DBI_1.1.0           
 [46] miniUI_0.1.1.1       viridisLite_0.3.0    xtable_1.8-4        
 [49] bit_1.1-15.2         stats4_4.0.3         StanHeaders_2.21.0-3
 [52] DT_0.13              htmlwidgets_1.5.1    httr_1.4.1          
 [55] threejs_0.3.3        ellipsis_0.3.1       pkgconfig_2.0.3     
 [58] loo_2.3.1            dbplyr_1.4.4         tidyselect_1.1.0    
 [61] rlang_0.4.6          reshape2_1.4.4       later_1.0.0         
 [64] munsell_0.5.0        cellranger_1.1.0     tools_4.0.3         
 [67] cli_2.0.2            generics_0.0.2       broom_0.5.6         
 [70] ggridges_0.5.2       evaluate_0.14        fastmap_1.0.1       
 [73] yaml_2.2.1           bit64_0.9-7          processx_3.4.2      
 [76] knitr_1.32           fs_1.4.1             nlme_3.1-149        
 [79] whisker_0.4          mime_0.9             projpred_2.0.2      
 [82] xml2_1.3.2           compiler_4.0.3       bayesplot_1.7.2     
 [85] shinythemes_1.1.2    rstudioapi_0.11      gamm4_0.2-6         
 [88] curl_4.3             reprex_0.3.0         statmod_1.4.34      
 [91] stringi_1.5.3        highr_0.8            ps_1.3.3            
 [94] Brobdingnag_1.2-6    gdtools_0.2.2        lattice_0.20-41     
 [97] Matrix_1.2-18        nloptr_1.2.2.1       markdown_1.1        
[100] shinyjs_1.1          vctrs_0.3.0          pillar_1.4.4        
[103] lifecycle_0.2.0      bridgesampling_1.0-0 estimability_1.3    
[106] httpuv_1.5.3.1       R6_2.4.1             promises_1.1.0      
[109] gridExtra_2.3        codetools_0.2-16     boot_1.3-25         
[112] colourpicker_1.0     MASS_7.3-53          gtools_3.8.2        
[115] assertthat_0.2.1     rprojroot_1.3-2      withr_2.2.0         
[118] shinystan_2.5.0      multcomp_1.4-13      mgcv_1.8-33         
[121] parallel_4.0.3       hms_0.5.3            grid_4.0.3          
[124] coda_0.19-3          minqa_1.2.4          rmarkdown_2.5       
[127] git2r_0.27.1         shiny_1.4.0.2        lubridate_1.7.10    
[130] base64enc_0.1-3      dygraphs_1.1.1.6