Estimating the mean phenotype for each DGRP line

Estimate the line means for relative fitness

Define functions for prediction
Predict line means and save the results
Summary of model used to predict female line means:
Summary of model used to predict male line means:
Percent of variance in fitness explained by ‘DGRP line’:

Estimate genetic (co)variance from the genomic relatedness matrix

Last updated: 2021-03-04

Checks: 6 1

Knit directory: fitnessGWAS/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20180914)

The command set.seed(20180914) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: absolute

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute	relative
~/Rprojects/fitnessGWAS/	.

Repository version: b1836d7

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version b1836d7. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rapp.history
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .httr-oauth
    Ignored:    .pversion
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/correlations_SNP_effects_cache/
    Ignored:    analysis/plot_models_variant_effects_cache/
    Ignored:    code/.DS_Store
    Ignored:    code/Drosophila_GWAS.Rmd
    Ignored:    data/.DS_Store
    Ignored:    data/derived/
    Ignored:    data/input/.DS_Store
    Ignored:    data/input/.pversion
    Ignored:    data/input/dgrp.fb557.annot.txt
    Ignored:    data/input/dgrp2.bed
    Ignored:    data/input/dgrp2.bim
    Ignored:    data/input/dgrp2.fam
    Ignored:    data/input/huang_transcriptome/
    Ignored:    figures/.DS_Store

Untracked files:
    Untracked:  Rplots.pdf
    Untracked:  analysis/GO_KEGG_enrichment.Rmd
    Untracked:  analysis/TWAS_tables.Rmd
    Untracked:  analysis/plot_models_variant_effects.Rmd
    Untracked:  code/GO_and_KEGG_gsea.R
    Untracked:  code/eQTL_analysis.Rmd
    Untracked:  code/make_sites_files_for_ARGweaver.R
    Untracked:  code/plotting_results.Rmd
    Untracked:  code/run_argweaver.R
    Untracked:  code/run_mashr.R
    Untracked:  data/argweaver/
    Untracked:  data/input/TF_binding_sites.csv
    Untracked:  figures/GWAS_mixture_proportions.rds
    Untracked:  figures/GWAS_stats_figure.pdf
    Untracked:  figures/TWAS_mixture_proportions.rds
    Untracked:  figures/TWAS_stats_figure.pdf
    Untracked:  figures/antagonism_ratios.pdf
    Untracked:  figures/composite_mixture_figure.pdf
    Untracked:  figures/eff_size_histos.pdf
    Untracked:  figures/stats_figure.pdf
    Untracked:  manuscript/

Unstaged changes:
    Modified:   .gitignore
    Modified:   README.md
    Modified:   analysis/GWAS_tables.Rmd
    Modified:   analysis/TWAS.Rmd
    Modified:   analysis/_site.yml
    Deleted:    analysis/about.Rmd
    Modified:   analysis/checking_mashr_results.Rmd
    Deleted:    analysis/eQTL_analysis.Rmd
    Modified:   analysis/gwas_adaptive_shrinkage.Rmd
    Modified:   analysis/license.Rmd
    Modified:   analysis/make_annotation_database.Rmd
    Modified:   analysis/plot_line_means.Rmd
    Modified:   analysis/plotting_results.Rmd
    Deleted:    code/gwas_adaptive_shrinkage.R
    Modified:   data/input/female_fitness.csv
    Modified:   data/input/female_fitness_CLEANED.csv
    Modified:   data/input/male_fitness.csv
    Modified:   data/input/male_fitness_CLEANED.csv
    Modified:   figures/figure1.eps
    Deleted:    figures/figure1.png
    Modified:   figures/figure2.eps

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/get_predicted_line_means.Rmd) and HTML (docs/get_predicted_line_means.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	b1836d7	lukeholman	2021-03-04	big first commit 2021
Rmd	8d54ea5	Luke Holman	2018-12-23	Initial commit
html	8d54ea5	Luke Holman	2018-12-23	Initial commit

library(tidyverse)
library(brms)
library(mice)
library(ICC)
library(future)
library(future.apply)
library(kableExtra)

Estimate the line means for relative fitness

Most studies of the DGRP calculate “line means” simply by averaging the phenotypes of several individuals per line. Instead, we used Bayesian generalised linear mixed model (GLMMs) to estimate the true line means for each of the 4 fitness traits, implemented in the brms package.

One notable advantage of using a model instead of a simple average is that a model allows one to statistically remove confounding factors such as block effects, which would otherwise bias the line means. Note that we intentionally measured line 352 in every block in our experiments, in order to improve our precision to estimate block effects.

Additionally, we must properly account for our experimental design in order to avoid pseudoreplication. Our design involved repeatedly measuring the same individuals to determine early- and late-life fitness.

For the progeny count data used to measure early and late female fitness, we used multivariate model with two components, one for early- and one for late-life fitness, which had the following formulae (in brms-like pseudocode):

Progeny count (early) = Intercept + (1 | p | line) + (1 | q | block) + (1 | r | vial)

Progeny count (late) = Intercept + (1 | p | line) + (1 | q | block) + (1 | r | vial)

To explain, the terms with the form (1 | x | y) represent random intercepts for line, block, and vial ID, where the letter between the pipe operator indicates which random intercepts should be assumed to be (potentially) correlated with which other random intercepts. We therefore assume that the effect of line, block, and vial ID on each observation are potentially correlated for both of the response variables, and estiamte that correlation while fitting the multivariate model.

The progeny count data were assumed to follow a Poisson distribution (with the standard log link function).

Similarly, for our measure of male fitness (proportion of offspring sired), we used a multivariate model consisting of a pair of binomial GLMMs, with the following formula:

Proportion of progeny sired (early) = Intercept + (1 | p | line) + (1 | q | block) + (1 | r | vial)

Proportion of progeny sired (late) = Intercept + num.GFP.males + (1 | p | line) + (1 | q | block) + (1 | r | vial)

Where the response variable was a Bernoulli outcome describing the paternity of each offspring observed. Note that in the late life fitness model, we included the number of surviving rival (GFP) males as an additional predictor. We reasoned that the rival males might die at random with respect to the genotype of the focal males, which lowers competition on the focal males, and so we can get an improved estimate of the focal males’ siring success by statistically correcting for their deaths via the model.

Define functions for prediction

load_and_tidy_fitness_data <- function(){
  female_fitness <- read_csv("data/input/female_fitness.csv") %>% 
    arrange(line) %>%
    split(paste(.$block, .$line)) %>%
    map(~ .x %>% mutate(replicate = 1:n())) %>%
    bind_rows() %>%
    mutate(vial = 1:n(),
           vial = paste("F", line, block, replicate, sep = "_"),
           line = paste("line_", line, sep = "")) %>% 
    dplyr::select(-num.F.late1, -num.F.late2, -num.laying.F.early, -replicate) %>%
    mutate(female.fitness.early = replace(female.fitness.early, is.na(female.fitness.early), 0),
           female.fitness.late = replace(female.fitness.late, is.na(female.fitness.late), 0)) 
  
  male_fitness <- read_csv("data/input/male_fitness.csv") %>% 
    arrange(line) %>%
    split(paste(.$block, .$line)) %>%
    map(~ .x %>% mutate(replicate = 1:n())) %>%
    bind_rows() %>%
    mutate(vial = paste("M", line, block, replicate, sep = "_"),
           line = paste("line_", line, sep = "")) %>%
    dplyr::select(-num.DGRP.males, -replicate)
  
  # Remove the small number of observations where the females had no offspring, so we cannot measure the
  # rival vs focal male competitive abilities
  # NOTE: Not needed any more, because now I impute this as missing data
  # exclude_early <- male_fitness$early.male.rival + male_fitness$early.male.focal == 0
  # exclude_late <- male_fitness$late.male.rival + male_fitness$late.male.focal == 0
  # male_fitness$early.male.rival[exclude_early] <- NA
  # male_fitness$early.male.focal[exclude_early] <- NA
  # male_fitness$late.male.rival[exclude_late] <- NA
  # male_fitness$late.male.focal[exclude_late] <- NA
  
  # Save these tidy files, for data archiving
  write.csv(female_fitness, file = "data/input/female_fitness_CLEANED.csv", row.names = FALSE)
  write.csv(male_fitness, file = "data/input/male_fitness_CLEANED.csv", row.names = FALSE)
  
  list(female_fitness = female_fitness, male_fitness = male_fitness)
}


# This function calculates the traditional line means (i.e. simple averages) for each line
get_line_means <- function(){
  left_join(
    load_and_tidy_fitness_data()[[1]] %>% 
      select(line, female.fitness.early, female.fitness.late) %>% 
      group_by(line) %>%
      summarise(female.fitness.early = mean(female.fitness.early), 
                female.fitness.late = mean(female.fitness.late)),
    load_and_tidy_fitness_data()[[2]] %>% 
      group_by(line) %>%
      summarise(early.male.rival = sum(early.male.rival),
                early.male.focal = sum(early.male.focal),
                late.male.rival = sum(late.male.rival),
                late.male.focal = sum(late.male.focal),
                male.fitness.early = early.male.focal / (early.male.focal + early.male.rival),
                male.fitness.late = late.male.focal / (late.male.focal + late.male.rival)) %>%
      select(line, male.fitness.early, male.fitness.late),
    by = "line") %>% as.data.frame()
}

# A function to calculate each line's expected fitness, using brms models to adjust for block + vial random effects
# For the male late life assay, we also adjust for the number of competitor males that died 
# (this assumes that their deaths happened randomly with respect to which line we were measuring)
get_predicted_line_means <- function(overwrite = FALSE){
  
  # If it's already calculated, just load up the file. Otherwise, make the file
  out.file <- "data/derived/predicted_line_means.csv"
  if(file.exists(out.file) & !overwrite) return(read_csv(out.file))
  
  ####### Load the individual-level data for females
  female_fitness <- load_and_tidy_fitness_data()[[1]] 
  
  ####### Load the individual-level data for males, and impute missing values
  
  # For one vial, there is no early-life measurement because the females did not produce offspring. 
  # Therefore we impute that datapoint and make 5 imputed datasets using the package 'mice'
  # See this link for explanation: https://cran.r-project.org/web/packages/brms/vignettes/brms_missings.html
  male_fitness <- load_and_tidy_fitness_data()[[2]]
  male_fitness <- male_fitness %>% 
    mutate(early.male.rival = replace(early.male.rival, early.male.rival + early.male.focal == 0, NA),
           early.male.focal = replace(early.male.focal, is.na(early.male.rival), NA)) 
  male_fitness_imputed <- mice(male_fitness, m = 5, print = FALSE, seed = 1)
  male_fitness_imputed <- lapply(1:5, function(i) { # for each imputed dataset, 
    complete(male_fitness_imputed, i) %>% 
      mutate(total1 = early.male.rival + early.male.focal, # tally up the focal+rival offspring for binomial model
             total2 = late.male.rival + late.male.focal,
             num.GFP.males = as.numeric(scale(num.GFP.males))) %>% # scale this covariate
      # For late-life assays where the 5 focal males all died before reaching age 14, 
      # we need to assign them zero fitness. We did this by assuming that the rivals sired 100/100 offspring.
      # This scenario happened in 47 vials (out of 810)
      mutate(late.male.focal = replace(late.male.focal, total2 == 0, 100),
             total2 = replace(total2, total2 == 0, 100))
  })

  ####### Run model on the female data 
  
  prior_females <- 
    c(set_prior("cauchy(0, 0.1)", class = "sd", resp = 'femalefitnessearly', group = "line"),
      set_prior("cauchy(0, 0.1)", class = "sd", resp = 'femalefitnessearly', group = "block"),
      set_prior("cauchy(0, 0.1)", class = "sd", resp = 'femalefitnessearly', group = "vial"),
      set_prior("cauchy(0, 0.1)", class = "sd", resp = 'femalefitnesslate', group = "line"),
      set_prior("cauchy(0, 0.1)", class = "sd", resp = 'femalefitnesslate', group = "block"),
      set_prior("cauchy(0, 0.1)", class = "sd", resp = 'femalefitnesslate', group = "vial"))
  
  prior_males <- 
    c(set_prior("cauchy(0, 0.1)", class = "sd", resp = 'earlymalefocal', group = "line"),
      set_prior("cauchy(0, 0.1)", class = "sd", resp = 'earlymalefocal', group = "block"),
      set_prior("cauchy(0, 0.1)", class = "sd", resp = 'earlymalefocal', group = "vial"),
      set_prior("cauchy(0, 0.1)", class = "sd", resp = 'latemalefocal', group = "line"),
      set_prior("cauchy(0, 0.1)", class = "sd", resp = 'latemalefocal', group = "block"),
      set_prior("cauchy(0, 0.1)", class = "sd", resp = 'latemalefocal', group = "vial"))
      
   
  
  
  female1 <- bf(female.fitness.early ~ (1 | p | line) + (1 | q | block) + (1 | r | vial))
  female2 <- bf(female.fitness.late ~  (1 | p | line) + (1 | q | block) + (1 | r | vial))
  female_model <- brm(female1 + female2, 
                      family = "poisson", data = female_fitness,
                      prior = prior_females,
                      iter = 5000, chains = 4, cores = 4,
                      seed = 1, control = list(adapt_delta = 0.999, max_treedepth = 15)) 
  writeLines(capture.output(summary(female_model)), con = "data/derived/model_summary_females.txt")
  
  saveRDS(female_model, "data/derived/female_fitness_prediction_model.rds") # maybe delete later - probably not needed?
  
  
  ### Run model on male data 
  # Fit num.GFP.males, but only for the late life data
  male1 <- bf(early.male.focal | trials(total1) ~ (1 | p | line) + (1 | q | block) + (1 | r | vial))
  male2 <- bf(late.male.focal  | trials(total2) ~ num.GFP.males + (1 | p | line) + (1 | q | block) + (1 | r | vial))
  
  # Run 5 models, on each of the 5 imputed datasets, and combine the results
  male_model <- brm_multiple(male1 + male2,
                             family = "binomial", 
                             prior = prior_males,
                             data = male_fitness_imputed,
                             iter = 5000, chains = 4, cores = 4,
                             seed = 1, control = list(adapt_delta = 0.999, max_treedepth = 15)) 
  writeLines(capture.output(summary(male_model)), con = "data/derived/model_summary_males.txt")
  saveRDS(male_model, "data/derived/male_fitness_prediction_model.rds") # maybe delete later - probably not needed?
  male_model <- readRDS("data/derived/male_fitness_prediction_model.rds") 
  
  ####### Define new data for prediction
  new.data.female <- female_fitness %>% 
    select(line) %>% distinct() %>% arrange(line)
  
  new.data.male <- male_fitness %>% 
    select(line) %>% distinct() %>% arrange(line) %>% 
    mutate(total1 = 100, # I checked, and changing this number does not affect the line means, as expected
           total2 = 100, 
           num.GFP.males = male_fitness %>% .$num.GFP.males %>% mean(na.rm=TRUE))
  
  # Get predicted lines means for early and late life *female* fitness on linear predictor scale (i.e. fairly normal)
  female_line_preds <- data.frame(new.data.female, 
                                  fitted(female_model, 
                                         newdata = new.data.female, 
                                         re_formula = "~ ( 1 | line)", 
                                         scale = "linear")) %>%
    select(line, Estimate.femalefitnessearly, Estimate.femalefitnesslate) %>% 
    rename(female.fitness.early = Estimate.femalefitnessearly,
           female.fitness.late = Estimate.femalefitnesslate) %>%
    mutate(female.fitness.early = as.numeric(scale(female.fitness.early)), # scale the line means
           female.fitness.late = as.numeric(scale(female.fitness.late)))
  
  # Get predicted lines means for early and late life *male* fitness on linear predictor scale (i.e. fairly normal)
  male_line_preds <- data.frame(new.data.male, 
                                fitted(male_model, 
                                       newdata = new.data.male, 
                                       re_formula = "~ (1 | line)", 
                                       scale = "linear")) %>%
    select(line, Estimate.earlymalefocal, Estimate.latemalefocal) %>% 
    rename(male.fitness.early = Estimate.earlymalefocal,
           male.fitness.late = Estimate.latemalefocal) %>%
    mutate(male.fitness.early = as.numeric(scale(male.fitness.early)), # scale the line means
           male.fitness.late = as.numeric(scale(male.fitness.late))) 

  ## Define a function to get the ICC for line, for each posterior sample of predicted line means
  get_ICC_line <- function(predictions){
    cols <- str_detect(names(predictions), "X")
    first_posterior_column <- which(cols)[1] 
    n_draws <- sum(cols)
    plan("multiprocess")
    future_lapply(0:(n_draws - 1), function(i){
       ICCbare(predictions$line, predictions[, i + first_posterior_column])
    }) %>% unlist()
  }
  
  # Calculate ICC for each sample, separately for old and young females
  # For females, this is done on the estimated line mean progeny produced (i.e. response scale)
  predicted <- fitted(female_model, summary = FALSE)
  Line_ICC_female_early <- get_ICC_line(data.frame(female_fitness, t(predicted[, , 1])))
  Line_ICC_female_late <- get_ICC_line(data.frame(female_fitness, t(predicted[, , 2])))
  
  # Do the same for males
  # For males, this is done on the estimated proportion progeny sired (i.e. response scale)
  new_data <- male_fitness %>% 
    select(block, line, vial.ID) %>% 
    mutate(total1 = 100, 
           total2 = 100, 
           num.GFP.males = male_fitness %>% .$num.GFP.males %>% mean(na.rm=TRUE))
  predicted <- fitted(male_model, summary = FALSE, newdata = new_data)
  Line_ICC_male_early <- get_ICC_line(data.frame(new_data, t(predicted[, , 1])))
  Line_ICC_male_late <- get_ICC_line(data.frame(new_data, t(predicted[, , 2])))
  
  ICC <- list(Line_ICC_female_early, 
              Line_ICC_female_late,
              Line_ICC_male_early,
              Line_ICC_male_late) %>% lapply(posterior_summary) %>% 
    do.call("rbind", .) %>% as_tibble() %>%
    mutate(Fitness_trait = c("Female early-life fitness", "Female late-life fitness",
                             "Male early-life fitness", "Male late-life fitness"))
  write.csv(ICC, file = "data/derived/ICC.csv", row.names = TRUE)
  

  # Record which block(s) each line was measured in. Generally one block, except for ref line (all blocks)
  line_blocks <- rbind(female_fitness %>% select(block, line), 
                       male_fitness   %>% select(block, line)) %>% 
    mutate(block = replace(block, line == "352", "reference_line")) %>% 
    distinct() %>% 
    group_by(line) %>% summarise(block = paste0(block, collapse = ", ")) 
  
  predicted_line_means <- female_line_preds %>%
    left_join(male_line_preds, by = "line") %>% 
    left_join(line_blocks, by = "line") %>%
    arrange(line) 
  
  write.csv(predicted_line_means, out.file, row.names = FALSE)
}

Predict line means and save the results

The object line_means is the raw line means: unscaled, arithmetic means and in the original units.

The object predicted_line_means is the estimated line means from the Bayesian models. These means are on the scale of the linear predictor (i.e. Poisson for females, binomial for males) rather than on the scale of the original response. They have been scaled to have a mean of 0 and a variance of 1.

line_means <- get_line_means()
predicted_line_means <- get_predicted_line_means(overwrite = FALSE)

Summary of model used to predict female line means:

cat(readLines('data/derived/model_summary_females.txt'), sep = '\n')

 Family: MV(poisson, poisson) 
  Links: mu = log
         mu = log 
Formula: female.fitness.early ~ (1 | p | line) + (1 | q | block) + (1 | r | vial) 
         female.fitness.late ~ (1 | p | line) + (1 | q | block) + (1 | r | vial) 
   Data: female_fitness (Number of observations: 816) 
Samples: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup samples = 10000

Group-Level Effects: 
~block (Number of levels: 10) 
                                                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(femalefitnessearly_Intercept)                                  0.29      0.09     0.16     0.50 1.00     6955     7704
sd(femalefitnesslate_Intercept)                                   1.21      0.35     0.70     2.05 1.00     7505     8048
cor(femalefitnessearly_Intercept,femalefitnesslate_Intercept)    -0.42      0.29    -0.88     0.22 1.00     8010     7082

~line (Number of levels: 125) 
                                                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(femalefitnessearly_Intercept)                                  0.58      0.05     0.50     0.68 1.00     5014     7236
sd(femalefitnesslate_Intercept)                                   1.82      0.17     1.51     2.18 1.00     5993     7883
cor(femalefitnessearly_Intercept,femalefitnesslate_Intercept)     0.68      0.07     0.53     0.79 1.00     7663     7679

~vial (Number of levels: 816) 
                                                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(femalefitnessearly_Intercept)                                  0.52      0.02     0.49     0.55 1.00     4219     6935
sd(femalefitnesslate_Intercept)                                   1.54      0.07     1.42     1.68 1.00     4543     6625
cor(femalefitnessearly_Intercept,femalefitnesslate_Intercept)     0.21      0.05     0.12     0.30 1.00     7886     7637

Population-Level Effects: 
                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
femalefitnessearly_Intercept     4.40      0.11     4.19     4.62 1.00    11285     6886
femalefitnesslate_Intercept      0.70      0.44    -0.16     1.57 1.00    13404     7848

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Summary of model used to predict male line means:

cat(readLines('data/derived/model_summary_males.txt'), sep = '\n')

 Family: MV(binomial, binomial) 
  Links: mu = logit
         mu = logit 
Formula: early.male.focal | trials(total1) ~ (1 | p | line) + (1 | q | block) + (1 | r | vial) 
         late.male.focal | trials(total2) ~ num.GFP.males + (1 | p | line) + (1 | q | block) + (1 | r | vial) 
   Data: male_fitness_imputed (Number of observations: 810) 
Samples: 20 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup samples = 50000

Group-Level Effects: 
~block (Number of levels: 10) 
                                                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(earlymalefocal_Intercept)                              0.74      0.20     0.46     1.21 1.00    25462    31760
sd(latemalefocal_Intercept)                               0.34      0.22     0.01     0.83 1.00     6387    19201
cor(earlymalefocal_Intercept,latemalefocal_Intercept)     0.36      0.40    -0.62     0.94 1.00    30792    28410

~line (Number of levels: 124) 
                                                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(earlymalefocal_Intercept)                              0.54      0.05     0.45     0.65 1.00    17106    27248
sd(latemalefocal_Intercept)                               0.85      0.20     0.45     1.22 1.01     2221     2337
cor(earlymalefocal_Intercept,latemalefocal_Intercept)     0.68      0.15     0.36     0.95 1.01     2784     3159

~vial (Number of levels: 810) 
                                                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(earlymalefocal_Intercept)                              0.81      0.03     0.76     0.86 1.01     1459    30181
sd(latemalefocal_Intercept)                               2.79      0.11     2.58     3.01 1.00     9096    21491
cor(earlymalefocal_Intercept,latemalefocal_Intercept)     0.18      0.04     0.10     0.26 1.01     2437    14178

Population-Level Effects: 
                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
earlymalefocal_Intercept        0.15      0.25    -0.34     0.66 1.00    23930    27025
latemalefocal_Intercept        -0.81      0.19    -1.17    -0.42 1.00    25469    27873
latemalefocal_num.GFP.males    -0.13      0.12    -0.37     0.10 1.00    13718    23568

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Percent of variance in fitness explained by ‘DGRP line’:

The table shows the posterior estimate of the intra-class correlation coefficient (ICC, which measures the proportion of variance within and between lines), using the above models to adjust for block effects (and one covariate in the male late-life fitness analysis). ICC was calculated using ICC::ICCbare.

read.csv('data/derived/ICC.csv') %>% 
  select(Fitness_trait, Estimate, Est.Error, Q2.5, Q97.5) %>%
  rename(`Fitness trait` = Fitness_trait, `Est. Error` = Est.Error, `Lower 95% CI` = Q2.5, `Upper 95% CI` = Q97.5) %>% 
  mutate(across(where(is.numeric), ~ .x * 100)) %>%
  kable(digits = 2) %>% kable_styling(full_width = FALSE)

Fitness trait	Estimate	Est. Error	Lower 95% CI	Upper 95% CI
Female early-life fitness	62.53	0.67	61.23	63.85
Female late-life fitness	55.99	0.99	54.09	57.96
Male early-life fitness	44.38	0.97	42.47	46.26
Male late-life fitness	9.26	6.74	-1.53	19.89

Estimate genetic (co)variance from the genomic relatedness matrix

The following fits a multivariate mixed model using brms, to estimate the amount of variance explained by the DGRP line, accounting for the genomic relatedness matrix. The response variables are the predicted line means for the four phenotypes, as estimated in the previous section. We fit line as a random effect, specifying a covariance structure among the line effects that is equal to the genomic relatedness matrix (GRM), which we estimate using the SNP genotypes of the full sample of 205 DGRP lines. We use a somewhat informative prior to help the model converge; the prior is that the line and residual variances are not large (which must be true, given that the response variables have overall variance of 1), and that only 10% of the phenotype variance is explained by line (the posterior estimates suggest much higher line variance, indicating strong signal in the data).

The table below indicates that much of the variance in fitness is due to additive genetic effects, and that there is a positive geneti

# Run quant gen model using brms
if(!file.exists("data/derived/brms_quant_gen.rds")){
  
  library(bigsnpr)
  library(MCMCglmm)
  library(sommer)
  
  # Get the data
  mcmc_dat <- predicted_line_means %>% 
    mutate(line = factor(str_remove_all(line, "_"))) %>%
    filter(!is.na(male.fitness.early)) %>%
    rename(femalefitnessearly = female.fitness.early,
           femalefitnesslate = female.fitness.late,
           malefitnessearly = male.fitness.early,
           malefitnesslate = male.fitness.late)
  
  # Make inverse genomic relatedness matrix
  bigsnp <- snp_readBed("data/derived/dgrp2_QC_all_lines.bed", backingfile = tempfile())
  bigsnp_attached <- snp_attach(bigsnp)
  G <- A.mat(bigsnp_attached$genotypes[1:nrow(bigsnp_attached$genotypes), 
                                       1:ncol(bigsnp_attached$genotypes)] - 1) 
  colnames(G) <- bigsnp_attached$fam[,1]
  rownames(G) <- bigsnp_attached$fam[,1]
  focal_lines <- mcmc_dat %>% pull(line)
  G <- G[rownames(G) %in% focal_lines, 
         colnames(G) %in% focal_lines]
  G <- cov2cor(G)
  
  brms_quant_gen <- brm(
    bf(femalefitnessearly ~ 0 + (1 | p | gr(line, cov = G))) +
   #   bf(femalefitnesslate ~ 0 + (1 | p | gr(line, cov = G))) +
      bf(malefitnessearly ~ 0 + (1 | p | gr(line, cov = G))) +
   #   bf(malefitnesslate ~ 0 + (1 | p | gr(line, cov = G))) +
      set_rescor(FALSE),
    data = mcmc_dat, data2 = list(G = G), 
    iter = 30000, chains = 4, cores = 1, warmup = 20000,
    control = list(adapt_delta = 0.99999, max_treedepth = 20),
    
    prior = c(
      prior(normal(0.5, 0.3), "sd", resp = 'femalefitnessearly', group = "line"),
      # prior(normal(0.5, 0.12), "sd", resp = 'femalefitnesslate', group = "line"),
      prior(normal(0.5, 0.3), "sd", resp = 'malefitnessearly', group = "line"),
      # prior(normal(0.5, 0.12), "sd", resp = 'malefitnesslate', group = "line"),
      prior(normal(0.5, 0.3), "sigma", resp = 'femalefitnessearly'),
      # prior(normal(0.5, 0.12), "sigma", resp = 'femalefitnesslate'),
      prior(normal(0.5, 0.3), "sigma", resp = 'malefitnessearly'),
      prior(lkj(0.5), class = "cor", group = "line")#,
      # prior(normal(0.5, 0.12), "sigma", resp = 'malefitnesslate')
    )
    
    # prior = c( # worth a try too...
    #   prior(Gamma(4,7), "sd", resp = 'femalefitnessearly', group = "line"),
    #   prior(Gamma(4,7), "sd", resp = 'femalefitnesslate', group = "line"),
    #   prior(Gamma(4,7), "sd", resp = 'malefitnessearly', group = "line"),
    #   prior(Gamma(4,7), "sd", resp = 'malefitnesslate', group = "line"),
    #   prior(Gamma(4,7), "sigma", resp = 'femalefitnessearly'),
    #   prior(Gamma(4,7), "sigma", resp = 'femalefitnesslate'),
    #   prior(Gamma(4,7), "sigma", resp = 'malefitnessearly'),
    #   prior(Gamma(4,7), "sigma", resp = 'malefitnesslate')
    # )
    
    
    # prior = c(
    #   # prior(cauchy(0.5, 1), class = "sigma", resp = 'femalefitnessearly'),
    #   # prior(cauchy(0.5, 1), class = "sigma", resp = 'femalefitnesslate'),
    #   # prior(cauchy(0.5, 1), class = "sigma", resp = 'malefitnessearly'),
    #   # prior(cauchy(0.5, 1), class = "sigma", resp = 'malefitnesslate'),
    #   prior(cauchy(0.5, 1), class = "sd", resp = 'femalefitnessearly', group = "line"),
    #   prior(cauchy(0.5, 1), class = "sd", resp = 'femalefitnesslate', group = "line"),
    #   prior(cauchy(0.5, 1), class = "sd", resp = 'malefitnessearly', group = "line"),
    #   prior(cauchy(0.5, 1), class = "sd", resp = 'malefitnesslate', group = "line"),
    #   prior(lkj(0.5), class = "cor", group = "line")
    )
  )
  saveRDS(brms_quant_gen, "data/derived/brms_quant_gen.rds")
} else {
  brms_quant_gen <- readRDS("data/derived/brms_quant_gen.rds")
}

p <- posterior_samples(brms_quant_gen)
p <- tibble(
  VA_FE = sqrt(p[, "sd_line__femalefitnessearly_Intercept"]),
  VA_FL = sqrt(p[, "sd_line__femalefitnesslate_Intercept"]),
  VA_ME = sqrt(p[, "sd_line__malefitnessearly_Intercept"]),
  VA_ML = sqrt(p[, "sd_line__malefitnesslate_Intercept"]),
  h2_FE = VA_FE / (VA_FE + sqrt(p[, "sigma_femalefitnessearly"])),
  h2_FL = VA_FL / (VA_FL + sqrt(p[, "sigma_femalefitnesslate"])),
  h2_ME = VA_ME / (VA_ME + sqrt(p[, "sigma_malefitnessearly"])),
  h2_ML = VA_ML / (VA_ML + sqrt(p[, "sigma_malefitnesslate"])),
  corr_FE_FL = p[,"cor_line__femalefitnessearly_Intercept__femalefitnesslate_Intercept"],
  corr_FE_ME = p[,"cor_line__femalefitnessearly_Intercept__malefitnessearly_Intercept"],
  corr_FL_ME = p[,"cor_line__femalefitnesslate_Intercept__malefitnessearly_Intercept"],
  corr_FE_ML = p[,"cor_line__femalefitnessearly_Intercept__malefitnesslate_Intercept"],
  corr_FL_ML = p[,"cor_line__femalefitnesslate_Intercept__malefitnesslate_Intercept"],
  corr_ME_ML = p[,"cor_line__malefitnessearly_Intercept__malefitnesslate_Intercept"],
  cov_FE_FL = corr_FE_FL * VA_FE^2 * VA_FL^2,
  cov_FE_ME = corr_FE_ME * VA_FE^2 * VA_ME^2,
  cov_FL_ME = corr_FL_ME * VA_FL^2 * VA_ME^2,
  cov_FE_ML = corr_FE_ML * VA_FE^2 * VA_ML^2,
  cov_FL_ML = corr_FL_ML * VA_FL^2 * VA_ML^2,
  cov_ME_ML = corr_ME_ML * VA_ME^2 * VA_ML^2,
  corr_FE_ME_minus_corr_FL_ML = corr_FE_ME - corr_FL_ML,
  corr_FE_FL_minus_corr_ME_ML = corr_FE_FL - corr_ME_ML
) %>% gather(Parameter, value)

p %>%
  group_by(Parameter) %>%
  summarise(summ = list(posterior_summary(value)),
            Estimate = map_dbl(summ, ~ .x[[1]]),
            `Est. Error` = map_dbl(summ, ~ .x[[2]]),
            `Lower 95% CI` = map_dbl(summ, ~ .x[[3]]),
            `Upper 95% CI` = map_dbl(summ, ~ .x[[4]]),
            .groups = "drop") %>%
  dplyr::select(-summ) %>%
  mutate(sp = str_split(Parameter, "_"),
         Type = factor(map_chr(sp, ~ .x[[1]]), c("h2", "corr", "VA", "cov")),
         Parameter = map_chr(sp, ~ paste0(.x[-1], collapse = " - "))) %>%
  arrange(Type) %>%
  dplyr::select(-sp, -Type) %>%
  kable(digits = 3) %>%
  kable_styling(full_width = FALSE) %>%
  pack_rows("Heritabilities", 1, 4) %>%
  pack_rows("Genetic correlations", 5, 10) %>%
  pack_rows("Additive genetic variances", 11, 14) %>%
  pack_rows("Additive genetic covariances", 15, 20)

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] kableExtra_1.1.0   future.apply_1.5.0 future_1.17.0      ICC_2.3.0         
 [5] mice_3.9.0         brms_2.14.4        Rcpp_1.0.4.6       forcats_0.5.0     
 [9] stringr_1.4.0      dplyr_1.0.0        purrr_0.3.4        readr_1.3.1       
[13] tidyr_1.1.0        tibble_3.0.1       ggplot2_3.3.2      tidyverse_1.3.0   
[17] knitrhooks_0.0.4   knitr_1.30         workflowr_1.6.2   

loaded via a namespace (and not attached):
  [1] readxl_1.3.1         backports_1.1.7      plyr_1.8.6          
  [4] igraph_1.2.5         splines_4.0.3        listenv_0.8.0       
  [7] crosstalk_1.1.0.1    TH.data_1.0-10       rstantools_2.1.1    
 [10] inline_0.3.15        digest_0.6.25        htmltools_0.5.0     
 [13] rsconnect_0.8.16     fansi_0.4.1          magrittr_2.0.1      
 [16] globals_0.12.5       modelr_0.1.8         RcppParallel_5.0.1  
 [19] matrixStats_0.56.0   xts_0.12-0           sandwich_2.5-1      
 [22] prettyunits_1.1.1    colorspace_1.4-1     blob_1.2.1          
 [25] rvest_0.3.5          haven_2.3.1          xfun_0.19           
 [28] callr_3.4.3          crayon_1.3.4         jsonlite_1.7.0      
 [31] lme4_1.1-23          survival_3.2-7       zoo_1.8-8           
 [34] glue_1.4.2           gtable_0.3.0         emmeans_1.4.7       
 [37] webshot_0.5.2        V8_3.4.0             pkgbuild_1.0.8      
 [40] rstan_2.21.2         abind_1.4-5          scales_1.1.1        
 [43] mvtnorm_1.1-0        DBI_1.1.0            miniUI_0.1.1.1      
 [46] viridisLite_0.3.0    xtable_1.8-4         stats4_4.0.3        
 [49] StanHeaders_2.21.0-3 DT_0.13              htmlwidgets_1.5.1   
 [52] httr_1.4.1           threejs_0.3.3        ellipsis_0.3.1      
 [55] pkgconfig_2.0.3      loo_2.3.1            dbplyr_1.4.4        
 [58] tidyselect_1.1.0     rlang_0.4.6          reshape2_1.4.4      
 [61] later_1.0.0          munsell_0.5.0        cellranger_1.1.0    
 [64] tools_4.0.3          cli_2.0.2            generics_0.0.2      
 [67] broom_0.5.6          ggridges_0.5.2       evaluate_0.14       
 [70] fastmap_1.0.1        yaml_2.2.1           processx_3.4.2      
 [73] fs_1.4.1             nlme_3.1-149         whisker_0.4         
 [76] mime_0.9             projpred_2.0.2       xml2_1.3.2          
 [79] compiler_4.0.3       bayesplot_1.7.2      shinythemes_1.1.2   
 [82] rstudioapi_0.11      gamm4_0.2-6          curl_4.3            
 [85] reprex_0.3.0         statmod_1.4.34       stringi_1.5.3       
 [88] highr_0.8            ps_1.3.3             Brobdingnag_1.2-6   
 [91] lattice_0.20-41      Matrix_1.2-18        nloptr_1.2.2.1      
 [94] markdown_1.1         shinyjs_1.1          vctrs_0.3.0         
 [97] pillar_1.4.4         lifecycle_0.2.0      bridgesampling_1.0-0
[100] estimability_1.3     httpuv_1.5.3.1       R6_2.4.1            
[103] promises_1.1.0       gridExtra_2.3        codetools_0.2-16    
[106] boot_1.3-25          colourpicker_1.0     MASS_7.3-53         
[109] gtools_3.8.2         assertthat_0.2.1     rprojroot_1.3-2     
[112] withr_2.2.0          shinystan_2.5.0      multcomp_1.4-13     
[115] mgcv_1.8-33          parallel_4.0.3       hms_0.5.3           
[118] grid_4.0.3           coda_0.19-3          minqa_1.2.4         
[121] rmarkdown_2.5        git2r_0.27.1         shiny_1.4.0.2       
[124] lubridate_1.7.8      base64enc_0.1-3      dygraphs_1.1.1.6