Last updated: 2019-05-06

Checks: 6 0

Knit directory: W_shredder/

This reproducible R Markdown analysis was created with workflowr (version 1.3.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20180716) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    Proc_B_manuscript/.DS_Store
    Ignored:    data/sim_results/
    Ignored:    docs/figure/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd 97abdcf lukeholman 2019-04-24 writing and new data finally done
Rmd 5aea3e0 Luke Holman 2019-03-06 tweak future library
Rmd 3b6cc7a Luke Holman 2019-02-05 Making graphs, fixed combine file
Rmd 30a3a54 Luke Holman 2019-01-03 Fixed bug with combining results script
Rmd d07c759 Luke Holman 2018-12-30 fix mclapply typo
Rmd 6cbc988 Luke Holman 2018-12-29 Merge branch ‘master’ of https://github.com/lukeholman/W_shredder
Rmd 09b03be Luke Holman 2018-12-29 tweaks on desktop
Rmd 4109a65 Luke Holman 2018-12-28 better file combiner?
Rmd 9eb5c95 Luke Holman 2018-12-28 minor tweaks
Rmd 53e047e Luke Holman 2018-12-21 added a check for a rare error
Rmd 84db6b2 Luke Holman 2018-12-03 No procedure for checking completed runs
Rmd 40f47e1 Luke Holman 2018-11-30 Tweak combine script
Rmd 2efe487 Luke Holman 2018-11-30 New script to combine files and some writing
Rmd 0a7c263 Luke Holman 2018-11-23 bug fixed??
Rmd dc4d673 Luke Holman 2018-11-23 typo
Rmd 8b21bb0 Luke Holman 2018-11-23 check for corrupted file
Rmd bb8119e Luke Holman 2018-11-23 while loop added
Rmd 1971a25 Luke Holman 2018-11-23 new try
Rmd 1561ebd Luke Holman 2018-11-23 tweak
Rmd 52b42b1 Luke Holman 2018-11-23 continuing
Rmd f45dd96 Luke Holman 2018-11-23 baffling bug - server is weird?
Rmd d1c12ac Luke Holman 2018-11-23 found bug, testing
Rmd 52be32d Luke Holman 2018-11-23 bug hunt
Rmd e60d0d1 Luke Holman 2018-11-23 bug hunt
Rmd 95857d7 Luke Holman 2018-11-23 more bug hunt
Rmd fd8f8be Luke Holman 2018-11-23 bug hunt
Rmd 50c5c82 Luke Holman 2018-11-22 Added readr library
Rmd 79a4634 Luke Holman 2018-11-21 Change counting of old parameters
Rmd 4fd5699 Luke Holman 2018-11-21 silly bug
Rmd 7bf9997 Luke Holman 2018-11-20 added print to find bug
Rmd 6c9cb6f Luke Holman 2018-11-15 Fix bugs
Rmd 4e630f4 Luke Holman 2018-11-15 Make combine_results_files.R
Rmd 2c0e001 Luke Holman 2018-11-15 Bug fix
Rmd e4c137b Luke Holman 2018-11-15 tweak
Rmd 4be43ee Luke Holman 2018-11-14 fixing bugs
Rmd 1d21bf4 Luke Holman 2018-11-14 fixing wd issues
Rmd 7c0f4de Luke Holman 2018-11-14 Fix gene conversion NHEJ bug
Rmd 96fd5b1 Luke Holman 2018-11-14 Fix bug
Rmd 6a905d0 Luke Holman 2018-11-14 return output as well as saving
Rmd 1cafc43 Luke Holman 2018-11-14 Added fitness to density calculation
Rmd a166609 Luke Holman 2018-11-13 rcpp woes
Rmd 774f266 Luke Holman 2018-11-13 rcpp on cluster issue
Rmd 80dcce8 Luke Holman 2018-11-13 Fix stringr issue
Rmd e45a2c8 Luke Holman 2018-11-13 link to stringr
Rmd 1a4ca77 Luke Holman 2018-11-13 test random_picker
Rmd 14d82d2 Luke Holman 2018-11-13 Push to new repo
Rmd de9e0ff Luke Holman 2018-11-13 Added slurm capacity
Rmd 99e93c7 Luke Holman 2018-11-03 Many bug fixes with density dependence
html 99e93c7 Luke Holman 2018-11-03 Many bug fixes with density dependence

Load R libraries

packages <- c("dplyr", "purrr", "tidyr", "stringr", "reshape2", 
              "parallel", "Rcpp", "rslurm", "readr", "tibble")
shh <- suppressMessages(lapply(packages, library, character.only = TRUE, quietly = TRUE))
Warning: package 'dplyr' was built under R version 3.5.2
Warning: package 'purrr' was built under R version 3.5.2

Make a table of mating types and the zygotes they produce

For every combination of male and female genotypes (i.e. mating types), we can calculate the types of zygotes that can be produced, as well as the relative frequency of each zygote type. It’s computationally efficient to do this before running the simulation and store all the information in a big table, and then look it up later as needed. The types of zygotes produced by each mating type depend on three parameters which are passed as arguments, namely:

  1. W_shredding_rate, the rate at which Z* chromosomes shred susceptible W+ chromosomes in females
  2. Z_conversion_rate, the rate at which Z* chromosomes convert susceptible Z+ chromosomes in males
  3. Zr_creation_rate, the rate at which Z* chromosomes convert susceptible Z+ chromosomes to resistant Zr chromosomes in males, for example by non-homologous end joining (NHEJ)
make_mating_type_table <- function(W_shredding_rate, 
                                   Z_conversion_rate,
                                   Zr_creation_rate){
  
  # Step 1: write out the possible alleles at all 3 loci 
  # (the sex chrs, and the two autosomal resistance loci)
  males_sex   <- c("Z+Z+", "Z*Z*", "ZrZr", "Z*Z+", "Z*Zr", "Z+Zr")
  females_sex <- c("Z+W+", "Z*W+", "ZrW+", "Z+Wr", "Z*Wr", "ZrWr")
  autosomal_W_rescue <- c("aa", "Aa", "AA") # Big A is rescue allele for W-shredding
  autosomal_Z_rescue <- c("bb", "Bb", "BB") # Big B is rescue allele for Z-conversion
  
  # Step 2: Combine the 3 loci's alleles to find all possible male and female genotypes
  get_possible_genotypes <- function(sex_chr){
    expand.grid(sex_chr, 
                autosomal_W_rescue, 
                autosomal_Z_rescue, 
                stringsAsFactors = FALSE) %>% 
      apply(1, paste0, collapse = "")
  }
  male_genotypes   <- get_possible_genotypes(sex_chr = males_sex)
  female_genotypes <- get_possible_genotypes(sex_chr = females_sex)
  
  # Step 3: Find all possible mating types
  mating_types <- expand.grid(male_genotypes, 
                              female_genotypes, 
                              stringsAsFactors = FALSE) %>% 
    as.data.frame() %>%
    mutate(male = Var1, female = Var2) %>% 
    select(female, male) 
  
  # Step 4: Find the types and frequencies of gametes produced by each mating type
  
  ## 4a: Identify mating types that experience W-shredding and/or Z-conversion
  ## Note that the A and B alleles are dominant - one copy confers protection
  W.shredding <- grepl("Z[*]W[+]", mating_types$female) &
    !(grepl("A", mating_types$female))
  Z.conversion <- grepl("Z[*]Z[+]", mating_types$male) & 
    !(grepl("B", mating_types$male))
  
  ## 4b: Find the proportions of each type of sex chromosome in the gametes, 
  ## allowing for W shredding and Z conversion
  female_gametes_sex <- data.frame(
    type1 = substr(mating_types$female, 1, 2),
    type2 = substr(mating_types$female, 3, 4),
    prop1 = 0.5,
    prop2 = 0.5,
    stringsAsFactors = FALSE) %>%
    mutate(prop1 = replace(prop1, W.shredding, 0.5 + (0.5 * W_shredding_rate)),
           prop2 = 1 - prop1)
 
  ### NB: Z*Z+ males can create Zr gametes by NHEJ, hence 'type3' column
  male_gametes_sex <- data.frame(type1 = substr(mating_types$male, 1, 2),
                                 type2 = substr(mating_types$male, 3, 4),
                                 type3 = "Zr", 
                                 prop1 = 0.5, prop2 = 0, prop3 = 0,
                                 stringsAsFactors = FALSE) %>%
    mutate(prop1 = replace(prop1, Z.conversion, 
                           0.5 + 0.5 * Z_conversion_rate), # fair meiosis + gene conversion
           prop3 = replace(prop3, Z.conversion, 
                           0.5 * Z_conversion_rate * Zr_creation_rate), # gene conversion * Zr_creation_rate
           prop1 = prop1 - prop3, # The new Zr chromosomes are taken out of those that undergo gene conversion
           prop2 = 1 - prop1 - prop3) # The Z+ chromosomes escaping conversion to Z* or NHEJ to Zr
  
  male_gametes_sex[male_gametes_sex$prop3 == 0, 
                   names(male_gametes_sex) %in% c("type3", "prop3")] <- NA
  
  ## 4c: Find the proportions of each autosomal allele in the gametes
  get_gametes_autosomes <- function(genotype_column){
    lapply(strsplit(substr(genotype_column, 5, 8), split = ""), function(x){
      c(paste(x[1], x[3], sep = ""), # NB this bit assumes A and B loci are unlinked
        paste(x[1], x[4], sep = ""),
        paste(x[2], x[3], sep = ""),
        paste(x[2], x[4], sep = "")
      )
    }) %>% do.call("rbind", .)
  }
  female_gametes_autosomes <- get_gametes_autosomes(mating_types$female)
  male_gametes_autosomes   <- get_gametes_autosomes(mating_types$male)
  
  ## 4d: Combine sex and autosomal alleles to find complete 
  ## gametic genotypes, and their corresponding frequencies
  get_complete_gametes <- function(gametes_sex, 
                                   gametes_autosomes,
                                   male){ # logical: are we doing males?
    
    # Make vector showing if there are 1, 2 or 4 possible 
    # autosome genotypes in the gametes, for each of the mating types
    possible.autosomes <- lapply(1:nrow(gametes_autosomes), 
                                 function(i) unique(gametes_autosomes[i, ]))
    n_unique_autosomes <- sapply(possible.autosomes, length)
    
    # Count if there are 2 or 3 possible sex chromosomes in the sperm (from NHEJ)
    # Note that for simpler coding, I treat sex chr homozygotes as having 2 alleles
    # (not 1) for this section (it is quickly fixed later by 'simplying')
    sex <- rep(2, nrow(gametes_sex)) # always 2 for females, and for most males
    if(male) sex[!is.na(gametes_sex$type3)] <- 3 # 3 sex chromosomes in Z*Z+ male sperm
    
    # Paste the sex and autosomes together
    paster <- function(i){ # i is the mating type index
      sex <- sex[i] # 2 or 3 sex chrs in the focal sex for this mating type
      possible.sex.chr <- as.character(gametes_sex[i, 1:sex]) # columns 1-2 or 1-3
      sex.chr.freqs <- as.numeric(gametes_sex[i, 2 + as.numeric(male) + (1:sex)]) # cols 3-4 or 4-6
      possible.autosomes <- possible.autosomes[[i]] # 1, 2 or 4 autosome genotypes
      
      expand.grid(sex = possible.sex.chr, # combine and paste genotypes
                  auto = possible.autosomes, 
                  stringsAsFactors = FALSE) %>%
        mutate(genotype = map2_chr(sex, auto, paste0, collapse = "")) %>%
        mutate(prop = rep(sex.chr.freqs * 1 / n_unique_autosomes[i], # find corresponding freqs
                          n_unique_autosomes[i])) %>%
        group_by(genotype) %>%
        summarise(prop = sum(prop))
    }
    lapply(1:nrow(gametes_sex), paster) # returns a list
  } # end of get_complete_gametes()
  complete_female_gametes <- get_complete_gametes(female_gametes_sex,
                                                  female_gametes_autosomes,
                                                  male = FALSE)
  complete_male_gametes   <- get_complete_gametes(male_gametes_sex, 
                                                  male_gametes_autosomes,
                                                  male = TRUE)
  
  # Step 5: find the zygote frequencies for each mating type
  get_zygotes <- function(mating_types, 
                          complete_female_gametes, 
                          complete_male_gametes){
    
    n_mating_types <- length(complete_female_gametes)
    
    # Combine the gametes to make zygotes
    zygotes <- lapply(1:n_mating_types, function(i){
      
      combos <- expand.grid(complete_female_gametes[[i]]$genotype, 
                            complete_male_gametes[[i]]$genotype, 
                            stringsAsFactors = FALSE)
      
      props <- expand.grid(complete_female_gametes[[i]]$prop,
                           complete_male_gametes[[i]]$prop)
      combos$prop <- props[,1] * props[,2]
      
      # Discard info on parental origin of alleles
      for(j in 1:nrow(combos)) combos[j, 1:2] <- sort(combos[j, 1:2]) 
      
      # Combine probabilities of identical genotypes
      combos %>% 
        group_by(Var1, Var2) %>%
        summarise(prop = sum(prop)) %>%
        mutate(mating_type = i)
    }) %>% 
      bind_rows() %>% as.data.frame() %>% mutate(genotype = NA)
    
    for(i in 1:nrow(zygotes)){
      A_geno <- c(substr(zygotes$Var1[i], 3, 3),
                  substr(zygotes$Var2[i], 3, 3)) %>%
        sort(decreasing = TRUE)
      
      B_geno <- c(substr(zygotes$Var1[i], 4, 4),
                  substr(zygotes$Var2[i], 4, 4)) %>%
        sort(decreasing = TRUE)
      sex_geno <- c(substr(zygotes$Var1[i], 1, 2),
                    substr(zygotes$Var2[i], 1, 2)) 
      forwards <- paste0(sex_geno, collapse = "")
      if(forwards %in% c(males_sex, females_sex)){ # I define Z*Z+ as a 'real' genotype but not Z+Z*
        zygotes$genotype[i] <- paste0(c(forwards, 
                                        A_geno,
                                        B_geno),
                                      collapse = "")
      } else {                           # if genotype is not real, paste it the other way around
        zygotes$genotype[i] <- paste0(c(paste0(rev(sex_geno), collapse = ""), 
                                        A_geno,
                                        B_geno),
                                      collapse = "")
      }
    } # end of for loop
    data.frame(mating_type = zygotes$mating_type,
               mother = mating_types$female[zygotes$mating_type],
               father = mating_types$male[zygotes$mating_type],
               zygote = zygotes$genotype,
               prop = zygotes$prop, stringsAsFactors = FALSE) 
  } # end of get_zygotes()
  get_zygotes(mating_types, complete_female_gametes, complete_male_gametes)
}

Function to add de novo resistance mutations

This function supplements a mating_type_table with all the extra progeny classes that can appear by a mutation of one of the normal progeny classes. We assume that Z+ chromosomes can mutate to Zr and vice versa, and W+ to Wr and vice versa. We assume that mutations are equally likely in both directions, meaning that mutation will not consistently affect the allele frequencies. Also, in simulations where the Zr_mutation_rate and Wr_mutation_rate are non-zero, we set the initial frequency of Zr and/or Wr to the mutation rate (so that resistance mutations will generally already be present at the start of the burn in phase)

add_mutants <- function(mating_type_table, Zr_mutation_rate, Wr_mutation_rate){
  
  zygote_has_two_Z <- grepl("Z[+]Z[+]", mating_type_table$zygote)
  zygote_has_one_Z <- grepl("Z[+]", mating_type_table$zygote) & !zygote_has_two_Z
  zygote_has_two_Zr <- grepl("ZrZr", mating_type_table$zygote)
  zygote_has_one_Zr <- grepl("Zr", mating_type_table$zygote) & !zygote_has_two_Zr
  zygote_has_W <- grepl("W[+]", mating_type_table$zygote)
  zygote_has_Wr <- grepl("Wr", mating_type_table$zygote)
  
  # Create the mutant zygotes...
  Z_mutants1 <- mating_type_table %>% # single Z+ -> Zr mutants in Z+/- individuals
    filter(zygote_has_one_Z) %>%
    mutate(zygote = str_replace_all(zygote, "Z[+]", "Zr"),
           prop = prop * Zr_mutation_rate)
  
  Z_mutants2 <- mating_type_table %>% # single Z+ -> Zr mutants in Z+Z+ individuals
    filter(zygote_has_two_Z) %>%
    mutate(zygote = str_replace_all(zygote, "Z[+]Z[+]", "Z+Zr"),
           prop = prop * Zr_mutation_rate * 2)
  
  Z_mutants3 <- mating_type_table %>% # double Z+ -> Zr mutants in Z+Z+ individuals
    filter(zygote_has_two_Z) %>%
    mutate(zygote = str_replace_all(zygote, "Z[+]Z[+]", "ZrZr"),
           prop = prop * Zr_mutation_rate * Zr_mutation_rate)
  
  Zr_mutants1 <- mating_type_table %>% # single Zr -> Z+ mutants in Zr/- individuals
    filter(zygote_has_one_Zr) %>%
    mutate(zygote = str_replace_all(zygote, "Zr", "Z+"),
           prop = prop * Zr_mutation_rate)
  
  Zr_mutants2 <- mating_type_table %>% # single Zr -> Z+ mutants in ZrZr individuals
    filter(zygote_has_two_Zr) %>%
    mutate(zygote = str_replace_all(zygote, "ZrZr", "Z+Zr"),
           prop = prop * Zr_mutation_rate * 2)
  
  Zr_mutants3 <- mating_type_table %>% # double Zr -> Z+ mutants in ZrZr individuals
    filter(zygote_has_two_Zr) %>%
    mutate(zygote = str_replace_all(zygote, "ZrZr", "Z+Z+"),
           prop = prop * Zr_mutation_rate * Zr_mutation_rate)
  
  W_mutants <- mating_type_table %>% 
    filter(zygote_has_W) %>%
    mutate(zygote = str_replace_all(zygote, "W[+]", "Wr"),
           prop = prop * Wr_mutation_rate)
  
  Wr_mutants <- mating_type_table %>% 
    filter(zygote_has_Wr) %>%
    mutate(zygote = str_replace_all(zygote, "Wr", "W+"),
           prop = prop * Wr_mutation_rate)
  
  # Subtract the mutants from the non-mutants' frequencies
  mating_type_table$prop[zygote_has_one_Z] <- 
    mating_type_table$prop[zygote_has_one_Z] - Z_mutants1$prop
  mating_type_table$prop[zygote_has_two_Z] <- 
    mating_type_table$prop[zygote_has_two_Z] - Z_mutants2$prop - Z_mutants3$prop
  mating_type_table$prop[zygote_has_one_Zr] <- 
    mating_type_table$prop[zygote_has_one_Zr] - Zr_mutants1$prop
  mating_type_table$prop[zygote_has_two_Zr] <- 
    mating_type_table$prop[zygote_has_two_Zr] - Zr_mutants2$prop - Zr_mutants3$prop
  mating_type_table$prop[zygote_has_W] <- 
    mating_type_table$prop[zygote_has_W] - W_mutants$prop
  mating_type_table$prop[zygote_has_Wr] <- 
    mating_type_table$prop[zygote_has_Wr] - Wr_mutants$prop
  
  rbind(mating_type_table, # bind mutants onto the mating type table, combine and simplify
        Z_mutants1, Z_mutants2, Z_mutants3,
        Zr_mutants1, Zr_mutants2, Zr_mutants3, 
        W_mutants, Wr_mutants) %>%
    group_by(mating_type, mother, father, zygote) %>%
    summarise(prop = sum(prop)) %>%
    as.data.frame() %>%
    arrange(mating_type, mother, father, -prop) %>%
    mutate(mating_type = as.character(mating_type),
           mother = as.character(mother),
           father = as.character(father))
}

# Helper function to convert the zygote frequencies to a cumulative sum.
# To create each offspring, we later roll random numbers between 0 and 1 with runif(),
# then see which is the largest index of 'prop' that is less than the random number.
# This will efficiently determine which zygote is produced, allowing us to roll all the 
# random numbers needed in one step only (making use of vectorisation for speed)
convert_probabilities <- function(mating_type_table){
  
  # Remove zygote classes with zero probability (e.g. because drive is 100% and Z+ or W+ are missing)
  mating_type_table <- mating_type_table %>% filter(prop != 0) 
  
  # Arrange probs from biggest to smallest within each mating type
  mating_type_table <- mating_type_table %>% 
    split(mating_type_table$mating_type) %>%
    map(~ .x %>% arrange(-prop)) %>%
    bind_rows()
  
  # convert to cumsum probabilities; for example,
  # The classic 1:2:1 Mendelian ratio has genotype frequencies of .25, .5, and .25
  # This is represented as c(0, 0.25, 0.75). 
  # Rolling runif() 0 to 0.25 yields geno 1, >.25 gives genotype 2, and >.75 gives genotype 3
  mating_type_table$prop <- mating_type_table$prop %>% 
    split(mating_type_table$mating_type) %>% # Get the vector of zygote probs for each mating type
    map(function(focal_props){
      focal_props <- cumsum(focal_props) 
      c(0, focal_props[-length(focal_props)]) 
    }) %>% combine()
  
  # Remove numbers so close 1 they are converted to 1 by floating point arithmetic
  # Effectively, we assume that sufficiently rare progeny classes never appear (<10^-16 I think)
  mating_type_table %>% filter(prop != 1)
}

Function to make a table of every genotype’s fitness

We assume that individuals with wild type sex chromosomes (Z+ and W+), as well as non-resistant autosomal alleles (genotype aabb), have a fitness of 1. The resistant alleles (Zr, Wr, A, and B), as well as the driving Z* allele, all potentially incur costs. These costs are multiplicative, such that the fitness cost \(c\) of e.g. having both Zr and A is \(c = 1 {\times} (1 - c_{Zr}) {\times} (1 - c_{A})\). The costs are also all dominant: having one copy of Z*, Zr, Wr, A, and B is equally costly as having two.

make_fitness_table <- function(cost_Zdrive_female,
                               cost_Zdrive_male,
                               cost_Wr,
                               cost_Zr,
                               cost_A,
                               cost_B,
                               mating_type_table){
  
  female_fitnesses <- mating_type_table %>% 
    select(mother) %>% distinct() %>%
    rename(genotype = mother) %>%
    mutate(w = 1,
           w = replace(w, grepl("Wr", genotype), (1 - cost_Wr)))
  
  female_fitnesses$w[grep("Z[*]", female_fitnesses$genotype)] <- 
    female_fitnesses$w[grep("Z[*]", female_fitnesses$genotype)] * (1 - cost_Zdrive_female)
  
  male_fitnesses <- mating_type_table %>% 
    select(father) %>% distinct() %>%
    rename(genotype = father) %>%
    mutate(w = 1,
           w = replace(w, grepl("Z[*]", genotype), (1 - cost_Zdrive_male)))
  
  fitness <- rbind(female_fitnesses, male_fitnesses)
  
  fitness$w[grep("Zr", fitness$genotype)] <- 
    fitness$w[grep("Zr", fitness$genotype)] * (1 - cost_Zr) 
  fitness$w[grep("A", fitness$genotype)] <- 
    fitness$w[grep("A", fitness$genotype)] * (1 - cost_A)
  fitness$w[grep("B", fitness$genotype)] <- 
    fitness$w[grep("B", fitness$genotype)] * (1 - cost_B)
  
  fitness %>% rename(fitness = w) 
}

Function to create the starting population

This function creates a population with a specified size, spread evenly across a specified number of patches, with specified allele frequencies in Hardy-Weinberg genotype frequencies.

make_initial_population <- function(initial_Zdrive, 
                                    initial_Zr,
                                    initial_Wr,
                                    initial_A,
                                    initial_B,
                                    initial_pop_size,
                                    n_patches,
                                    fitness_table){
  
  initial_freqs <- fitness_table %>% 
    rename(freq = fitness) %>% 
    mutate(sex = substr(genotype, 1, 4),
           auto1 = substr(genotype, 5, 6),
           auto2 = substr(genotype, 7, 8),
           freq = 0) 
  Z <- (1 - initial_Zdrive - initial_Zr)
  Zdrive <- initial_Zdrive
  Zr <- initial_Zr
  W <-  (1 - initial_Wr)
  Wr <- initial_Wr
  initial_freqs$sex[initial_freqs$sex == "Z+Z+"] <- Z ^ 2
  initial_freqs$sex[initial_freqs$sex == "Z*Z*"] <- Zdrive ^ 2
  initial_freqs$sex[initial_freqs$sex == "ZrZr"] <- Zr ^ 2
  initial_freqs$sex[initial_freqs$sex == "Z*Z+"] <- 2 * Zdrive * Z
  initial_freqs$sex[initial_freqs$sex == "Z+Zr"] <- 2 * Z * Zr
  initial_freqs$sex[initial_freqs$sex == "Z*Zr"] <- 2 * Zdrive * Zr
  initial_freqs$sex[initial_freqs$sex == "Z+W+"] <- Z * W
  initial_freqs$sex[initial_freqs$sex == "Z*W+"] <- Zdrive * W
  initial_freqs$sex[initial_freqs$sex == "ZrW+"] <- Zr * W
  initial_freqs$sex[initial_freqs$sex == "Z+Wr"] <- Z * Wr
  initial_freqs$sex[initial_freqs$sex == "Z*Wr"] <- Zdrive * Wr
  initial_freqs$sex[initial_freqs$sex == "ZrWr"] <- Zr * Wr
  
  initial_freqs[initial_freqs == "aa"] <- (1 - initial_A) ^ 2
  initial_freqs[initial_freqs == "Aa"] <- 2 * initial_A * (1 - initial_A)
  initial_freqs[initial_freqs == "AA"] <- initial_A ^ 2
  
  initial_freqs[initial_freqs == "bb"] <- (1 - initial_B) ^ 2
  initial_freqs[initial_freqs == "Bb"] <- 2 * initial_B * (1 - initial_B)
  initial_freqs[initial_freqs == "BB"] <- initial_B ^ 2
  for(i in 2:ncol(initial_freqs)) initial_freqs[, i] <- as.numeric(initial_freqs[, i])
  initial_freqs$freq <- with(initial_freqs, sex * auto1 * auto2)
  
  data.frame(genotype = sample(initial_freqs$genotype,
                               initial_pop_size,
                               prob = initial_freqs$freq,
                               replace = TRUE),
             patch = sample(n_patches, initial_pop_size, replace = TRUE),
             stringsAsFactors = FALSE) %>%
    group_by(patch, genotype) %>%
    summarise(n = n()) %>% arrange(patch, -n) %>%
    ungroup()
}
# test_pop <- make_initial_population(0.1, 0.1, 0.1, 0.1, 0.1, initial_pop_size = 1000, n_patches = 10, fitness_table = make_fitness_table())

Function to release the Z* individuals

This functions adds release_size Z*Z* males to the population, simulating a release of the engineered Z chromosome. We assume the release is localised, such that patch 1 gets most of it, then patch 2, and so on exponentially. For \(k\) patches, the proportion of the release to patch \(i\) is given by \(p_i = exp(i)^{-1} / (\sum_{i=1}^{k}exp(i)^{-1}/k)\).

release_Zd_individuals <- function(parent_list, n_patches, fitness_table, release_size, release_strategy){
  
  if(release_strategy == "one_patch"){
    
    # Release drive males in most productive patch
    patch_productivities <- parent_list[[1]] %>%
      group_by(patch) %>%
      summarise(productivity = sum(fecundity))

    released_males <- data.frame(
      patch = with(patch_productivities, patch[which.max(productivity)]), 
      genotype = "Z*Z*aabb", 
      n = release_size, 
      fitness = fitness_table$fitness[fitness_table$genotype == "Z*Z*aabb"],
      stringsAsFactors = FALSE)
  }
  
  # Scatter the release randomly and evenly across all patches
  if(release_strategy == "all_patches"){
    
    released_males <- data.frame(
      patch = 1:n_patches, 
      genotype = "Z*Z*aabb", 
      n = c(rmultinom(1, release_size, rep(1 / n_patches, n_patches))), 
      fitness = fitness_table$fitness[fitness_table$genotype == "Z*Z*aabb"],
      stringsAsFactors = FALSE) 
  }
  
  list(
    parent_list[[1]], # females
    rbind(parent_list[[2]] %>% select(-is_female) %>% as.data.frame(), 
          released_males) %>% # original+released males
      group_by(patch, genotype, fitness) %>%
      summarise(n = sum(n)) %>% 
      ungroup() %>% as.data.frame()
  )
}

Functions to calculate density-dependent fecundity

Calculate density at the global scale

# females are weighted by their fitness, 
# males are considered all the same, but are weighted by male_weighting 
calc_global_density <- function(pop, male_weighting){
  sum(pop$n[pop$is_female] * pop$fitness[pop$is_female]) + 
    sum(pop$n[!pop$is_female]) * male_weighting
}

Calculate density at the local scale

calc_patch_densities <- function(patch_densities, 
                                 n_patches, male_weighting){
  data.frame( 
    patch = patch_densities$patch,
    density = n_patches * 
      (patch_densities$weighted_number_of_females + 
                   male_weighting * patch_densities$number_of_males)
  )
}

Function to pick the parents of the next generation

This function is run once per generation in the simulation. It randomly picks the male and female parents of each new offspring, where parents with high fitness genotypes (e.g. Z+Z+aabb) are more likely to be picked than parents with low fitness genotypes (e.g. Z*Z*AABB). We assume a promiscuous mating system, such that individuals of both sexes potentially produce offspring with multiple partners. The fitness of each individual is stochastic, so each generation there will be some males and females that do not breed, and others that reproduce a lot.

We assume that all mating occurs within patches. Males thus always compete locally, meaning that alleles in males always experience “soft” selection (resulting from differences in lifetime mating success). For females, we model global and local competition separately (corresponding to hard and soft selection respectively). Selection on females results from differences in the lifetime number of offspring produced (this could be fecundity and/or viability selection).

We assume that under ideal conditions (e.g. in a near-empty patch or meta-population), a female with genotypic fitness \(w_j\) has an expected fecundity of \(w_j * max_fecundity\), while under crowded conditions, each females’ expected fecundity tends towards \(0\).

Under soft selection, a female in a PATCH containing \(n_f\) females has an expected fecundity of \(w_j * max_fecundity / (n_patches * ∑_i n_f w_i)\) offspring.

Under hard selection, a female in a METAPOPULATION containing \(n_f\) females has an expected fecundity of \(w_j * max_fecundity / (∑_i n_f w_i)\) offspring.

The actual number of progeny per female is generated stochastically by drawing from a Poisson distribution.

# Helper function
# given a vector of pre-rolled uniform random numbers (rand), a vector of cumulative sum probabilities,
# and a vector of the genotypes to which probability corresponds, pick the genotypes
random_picker <- function(rand, probabilities, genotypes){
  sapply(1:length(rand), function(i) genotypes[sum(rand[i] > probabilities)])
}
# cppFunction('CharacterVector random_picker(
#     NumericVector rand,
#     NumericVector probabilities,
#     CharacterVector genotypes) {
#   int nRand = rand.size();
#   int nGenotypes = genotypes.size();
#   CharacterVector out(nRand);
# 
#   /* First check if there is only one genotype: no random picking needed */
#   if(nGenotypes == 1){
#     for(int i = 0; i < nRand; ++i) {
#       out[i] = genotypes[0];
#     }
#     return out;
#   }
# 
#   int stoppingPoint = nGenotypes - 1;
#   LogicalVector booleans(nGenotypes);
#   booleans[0] = true;
#   int picked = 0;
#   bool keepGoing = true;
# 
#   /* Loop over the random numbers in rand */
#   for(int i = 0; i < nRand; ++i) {
# 
#     /* Find which elements of probabilities are < rand[i] */
#     for(int j = 1; j < nGenotypes; ++j) {
#       booleans[j] = rand[i] > probabilities[j];
#     }
# 
#     picked = 0;
#     keepGoing = true;
# 
#     /* Loop over the booleans, and find the last element that is true */
#     while(keepGoing == true) {
#        if(booleans[picked + 1] == true) {
#           picked += 1;
#           if(picked == stoppingPoint) keepGoing = false;
#        } else keepGoing = false;
#     }
# 
#     out[i] = genotypes[picked];
#   }
#   return out;
# }')


# # 
# x <- runif(100)
# random_picker(x, c(0, 0.5, 0.9, 0.95), c("A", "B", "C", "D"))
# random_picker_R(x, c(0, 0.5, 0.9, 0.95), c("A", "B", "C", "D"))
# microbenchmark::microbenchmark(C = random_picker(x, c(0, 0.5, 0.9, 0.95), c("A", "B", "C", "D")),
#                R = random_picker_R(x, c(0, 0.5, 0.9, 0.95), c("A", "B", "C", "D")), times = 10000)



  # Calculate (global) density-dependent fecundity accrording to Richards model (Fowler 1981):
  # Females of genotype i have expected fecundity of w_i * max_fecundity * (Richards density regulation)
calc_fecundities <- function(females, 
                             max_fecundity, 
                             carrying_capacity, 
                             density_dependence_shape){
  
  # Under softness > 0, it is possible to get patch densities > carrying capacity
  # Deal with this by setting density = carrying capacity
  females$density[females$density > carrying_capacity] <- carrying_capacity
  
  # Calculate expected fecundity for each female patch/genotype combination
  fecundity <- females$n * (1 + females$fitness * max_fecundity * 
     (1 - (females$density / carrying_capacity) ^ density_dependence_shape))
  
  # Roll Poisson random numbers using expected values
  fecundity <- rpois(nrow(females), fecundity)
  
  # Check if the number of progeny exceeds the carrying capacity
  if(sum(fecundity) > carrying_capacity){

    # If it does, randomly select carrying_capacity - 1 surviving offspring
    # This makes a df with col1 = rows in `females`, col2 is their fecundity
    to_keep <- melt(table(sample(rep(1:nrow(females), 
                          times = fecundity),
                      carrying_capacity)), 
                    value.name = "fecundity")
    females$fecundity <- 0
    females$fecundity[to_keep$Var1] <- to_keep$fecundity
    
  } else {
    females$fecundity <- fecundity
  }
  females
}

pick_mothers <- function(pop, fitness_table, n_patches,
                         softness, 
                         male_weighting, 
                         max_fecundity,
                         carrying_capacity,
                         density_dependence_shape){
  
  pop <- pop %>% 
    filter(n != 0) %>% # double check there are no empty genotypes....
    left_join(fitness_table, by = "genotype") %>% # add fitness column
    mutate(is_female = grepl("W", genotype)) # Add column of TRUE/FALSE for sex

  # Make df of the patch-specific male and female counts
  # (needed later as well as immediately)
  patch_densities <- pop %>% 
    group_by(patch) %>% 
    summarise(weighted_number_of_females = sum(n[is_female] * fitness[is_female]),
              number_of_males = sum(n[!is_female])) 
  
  
  # Calculate global density, unless selection is 100% soft
  # Note that all-male and all-female patches still contribute to the global density
  if(softness != 1) {
    global_density <- calc_global_density(pop, male_weighting)
  }
  
  # here are the parents eligible to breed (trim off all single-sex patches)
  patch_densities <- patch_densities %>%
    filter(weighted_number_of_females != 0 & number_of_males != 0)
  
  # Return NULL if pop has gone extinct
  if(nrow(patch_densities) == 0) return(NULL) 
  
  pop     <- pop %>% filter(patch %in% patch_densities$patch)
  females <- pop %>% filter(is_female)
  males   <- pop %>% filter(!is_female) 

  # under totally hard selection, we can skip calculation of the patch-specific densities
  if(softness == 0){ 
    females$density <- rep(global_density, nrow(females))
  } else { # under (partially or fully) soft selection, 
           # calculate the patch-specific densities 
    patch_densities <- calc_patch_densities(
      patch_densities, n_patches, male_weighting)
    
    if(softness != 1){ # softness == 1, only local density matters
      # for 0 < softness < 1, we need both local and global density
      patch_densities$density <-
        softness * patch_densities$density + 
        (1 - softness) * global_density
    } 
    females <- left_join(females, patch_densities, by = "patch") 
  }
  
  # Randomly generate fecundity from the combined expected reproductive 
  # output of each genotype/patch combination. e.g. 10 ZWaabb females,
  # with expected fecundity of 2.5 each, have a total of rpois(1, 25) offspring
  females <- calc_fecundities(females, 
                              max_fecundity, 
                              carrying_capacity, 
                              density_dependence_shape)
  
  # Number of offspring born, by mother genotype and patch: 
  females <- females %>% 
    select(patch, genotype, fecundity)
  
  # This is `parent_list` for the next function
  return(list(females, males))
}

pick_fathers <- function(parent_list){
  
  if(is.null(parent_list)) return(NULL) # Return NULL if pop has gone extinct
  
  offspring_per_patch <- parent_list[[1]] %>%
    group_by(patch) %>% 
    summarise(offspring = sum(fecundity)) %>%
    filter(offspring != 0)

  # Generate total_offspring random numbers to determine each offspring's father,
  # And split these numbers by patch
  rand <- runif(sum(offspring_per_patch$offspring)) %>% 
    split(rep(offspring_per_patch$patch, 
              times = offspring_per_patch$offspring))
  
  if(length(rand) < 1) print("This one is bad")
  
  # Now loop over all the patches with >0 offspring, and find fathers 
  # for individual offspring (therefore, we assume promiscuity)
  lapply(1:nrow(offspring_per_patch), function(i){
    
    # get the numbers of each male and female genotype in the focal patch
    females_in_patch <- parent_list[[1]] %>% 
      filter(patch == offspring_per_patch$patch[i])
    males_in_patch   <- parent_list[[2]] %>% 
      filter(patch == offspring_per_patch$patch[i])

    # Find the siring probability of each male genotype, convert to cumsum
    father_probabilities <- males_in_patch$n * males_in_patch$fitness
    
    father_probabilities <- cumsum(father_probabilities / sum(father_probabilities))
    
    data.frame(patch = offspring_per_patch$patch[i],
               parents = paste(rep(females_in_patch$genotype,
                                   times = females_in_patch$fecundity), 
                               random_picker(rand[[i]], 
                                             c(0, father_probabilities[-length(father_probabilities)]), 
                                             males_in_patch$genotype)), 
               stringsAsFactors = FALSE)
  }) %>% do.call("rbind", .) %>%
    group_by(patch, parents) %>% 
    summarise(n = n()) %>% as.data.frame()
}

Function to create offspring by random meiosis and mutation

Most of the hard work has already been done, when we created the mating_type_table above. This function takes a data frame of parents (giving the number of offspring produced by each different mating type in each patch), looks up the expected zygote frequencies in mating_type_table, and then randomly generates the zygotes.

make_offspring <- function(parents, mating_type_table){
  
  if(is.null(parents) || sum(parents$n) == 0) return(NULL) # Return NULL if pop has gone extinct
  
  # Number of offspring from each mating type, summed across patches
  offspring_per_mating_type <- parents %>%
    group_by(parents) %>%
    summarise(n = sum(n))
  
  # Restrict the big mating_type_table, for faster searching
  mating_type_table <- mating_type_table %>% 
    filter(parents %in% offspring_per_mating_type$parents)
  
  # Generate a random number for every offspring, and split between each of the i mating types
  rand <- runif(sum(parents$n)) %>% 
    split(rep(offspring_per_mating_type$parents,
                    times = offspring_per_mating_type$n))
  
  if(length(rand) < 1) print("THIS one is bad")

  # Loop over all the mating types 
  lapply(1:nrow(offspring_per_mating_type), function(i) {
    focal.parents <- parents %>% # Get the patch ids for each offspring (same as parents')
      filter(parents == offspring_per_mating_type$parents[i])
    
    focal.zygotes <- mating_type_table %>% # Get the zygote proportions for the focal mating type
      filter(parents == offspring_per_mating_type$parents[i])
    
    data.frame(genotype = random_picker(rand[[i]], 
                                        focal.zygotes$prop,
                                        focal.zygotes$zygote),
               patch = rep(focal.parents$patch,
                           times = focal.parents$n),
               stringsAsFactors = FALSE) %>%
      group_by(patch, genotype) %>%
      summarise(n = n())
  }) %>% bind_rows() %>%
    group_by(patch, genotype) %>%
    summarise(n = sum(n)) %>%
    ungroup() 
}

Function implementing migration

This function moves newly-born indiviuals to a different randomly-selected patch. Males and females potentially migrate at different rates.

migrate_population <- function(pop, 
                               n_patches,
                               patch_landing_probabilities, # passed as arg to save time
                               male_migration_prob,
                               female_migration_prob,
                               migration_type){
  
  if(is.null(pop)) return(NULL) # Return NULL if pop has gone extinct
  if(n_patches == 1) return(pop) # don't do anything if there is only one patch
  
  nrow_pop <- nrow(pop)
  migration_probs <- rep(male_migration_prob, nrow_pop)
  migration_probs[grep("W", pop$genotype)] <- female_migration_prob
  
  # For efficiency, the call to rmultinom() below allows migrants 
  # to return to their home patch, which would mean they are not really migrants!
  # To make sure that male_migration_prob and female_migration_prob correctly
  # give the % individuals that move to another patch, the function
  # internally boosts the migration rates by a constant, c, which varies
  # based on how many patches there are. 
  # To understand how I derived c, here's the algebra.
  
  # The problem is that the following inequality is always true if there is a finite number of patches
  # real_migration_rate < stated_migration_rate * (1 - proportion.returning.to.same.patch)
  
  # To make real_migration_rate = stated_migration_rate, we can add a constant c,
  # And solve for c when real_migration_rate = stated_migration_rate:
  # real_migration_rate = (stated_migration_rate + c) * 
  #                                           (1 - proportion.returning.to.same.patch)
  # Rearranging, we find that c is:
  # c = (stated_migration_rate / (1 - fraction.returning)) - stated_migration_rate
  
  if(migration_type == "global") {
    migration_probs <- migration_probs + # Add on c. Not needed for local dispersal.
      (migration_probs / (1 - patch_landing_probabilities[1])) - migration_probs
  }
  
  n_migrants <- rbinom(nrow_pop, pop$n, prob = migration_probs) 
  if(sum(n_migrants) == 0) return(pop) # return pop if there are no migrants
  
  pop$n <- pop$n - n_migrants # remove migrants from their original patches
  
  migrants <- pop %>% # Find the number of migrants for each genotype
    mutate(n = n_migrants) %>%
    filter(n != 0) 
  
  if(migration_type == "global"){

    migrants <- migrants %>% # tally the migrants' genotypes
    group_by(genotype) %>%
      summarise(n = sum(n))
    
    pop <- rbind(pop, # Find new patches for the migrants, and return them to the pop
                 lapply(1:nrow(migrants), function(i){ # for each genotype in the migrant pool...
                   data.frame(patch = 1:n_patches,
                              genotype = migrants$genotype[i],
                              n = rmultinom(1, 
                                            size = migrants$n[i], 
                                            prob = patch_landing_probabilities)[,1],
                              stringsAsFactors = FALSE)
                 }) %>% bind_rows() %>% filter(n != 0)) 
  } else { # else for LOCAL migration...
    pop <- rbind(
      pop,
      data.frame(patch = rep(migrants$patch, migrants$n),
                 genotype = rep(migrants$genotype, migrants$n),
                 n = 1,
                 stringsAsFactors = FALSE) %>%
        mutate(patch = patch + sample(c(-1, 1), n(), replace = TRUE), # add -1 or +1 to patch address
               patch = replace(patch, patch == 0, n_patches),  # going to patch 0 brings you to patch k
               patch = replace(patch, patch > n_patches, 1)))  # going to patch k+1 brings you to patch 1 
  }
  pop %>%
    group_by(patch, genotype) %>%
    summarise(n = sum(n)) %>%
    ungroup()
}

Function to calculate allele frequencies in the population

get_allele_freqs <- function(pop, generation){
  pop_size <- sum(pop$n)
  n_alleles <- 2 * pop_size
  
  freqs <- data.frame(
      Z = pop$n * str_count(pop$genotype, "Z[+]") / n_alleles, 
      Zd = pop$n * str_count(pop$genotype, "Z[*]") / n_alleles,
      Zr = pop$n * str_count(pop$genotype, "Zr") / n_alleles,
      W = pop$n * str_count(pop$genotype, "W[+]") / n_alleles,
      Wr = pop$n * str_count(pop$genotype, "Wr") / n_alleles,
      A = pop$n * str_count(pop$genotype, "A") / n_alleles,
      B = pop$n * str_count(pop$genotype, "B") / n_alleles,
      females = 0, # placeholder
      N = 0
    ) %>% summarise_all(sum) %>% gather(allele, frequency)
  freqs$frequency[8] <- (freqs$frequency[4] + freqs$frequency[5]) * 2 # females
  freqs$frequency[9] <- pop_size # pop size
  freqs$frequency[1:3] <- freqs$frequency[1:3] / sum(freqs$frequency[1:3]) # Z alleles
  freqs$frequency[4:5] <- freqs$frequency[4:5] / sum(freqs$frequency[4:5]) # W alleles
  freqs %>% mutate(frequency  = round(frequency, 3), # round to save disk space
                   generation = generation)
}

Function to set up several mating type tables

If running the simulation on many parameter spaces, we will first need to set up several mating_type_tables, each of which takes several seconds of compute time. To speed this up, this function identifies all the unique mating_type_tables that are implied by the table parameters, and generates them all in parallel using mclapply().

set_up_mating_type_tables <- function(parameters, 
                                      cores, 
                                      wd,
                                      overwrite = FALSE){
  
  path_to_tables <- file.path(wd, "data/mating_type_tables/")
  if(overwrite) unlink(list.files(path_to_tables, full.names = TRUE))
  
  parameters <- parameters %>%
    mutate(mating_table = paste(W_shredding_rate,
                                Z_conversion_rate, 
                                Zr_creation_rate,
                                Zr_mutation_rate,
                                Wr_mutation_rate, sep = "_")) 
  
  uniques    <- unique(parameters$mating_table)
  file.names <- paste(uniques, ".rds", sep = "")
  uniques    <- uniques[!(gsub(path_to_tables, "", file.names) %in% list.files(path_to_tables))]
  
  if(length(uniques) > 0){
    print("Setting up the mating type tables...")
    file.names <- paste(
      path_to_tables, 
      file.names[!(file.names %in% list.files(path_to_tables))], sep = "")
    
    uniques <- lapply(strsplit(uniques, split = "_"), as.numeric)
    mclapply(1:length(uniques), function(i){
      make_mating_type_table(W_shredding_rate  = uniques[[i]][1], 
                             Z_conversion_rate = uniques[[i]][2],
                             Zr_creation_rate  = uniques[[i]][3]) %>%
        add_mutants(Zr_mutation_rate = uniques[[i]][4], 
                    Wr_mutation_rate = uniques[[i]][5]) %>%
        convert_probabilities() %>%
        mutate(parents = paste(mother, father)) %>%
        select(mother, father, parents, zygote, prop) %>%
        saveRDS(file = file.names[i])
      return(NULL)
    }, mc.cores = cores)
  }
  uniques <- unique(parameters$mating_table)
  
  # I added this section to fix a "bug", but I think it was just an issue with the Spartan server because the bug mysteriously fixed itself!! I think it is never actually triggered now

  files <- paste(path_to_tables, uniques, ".rds", sep = "")
  mating_tables <- vector(length(files), mode = "list")
  for(i in 1:length(mating_tables)){
    mating_tables[[i]] <- NA
    while(is.na(mating_tables[[i]])){
      try(mating_tables[[i]] <- readRDS(files[[i]]))
      # if a mating table was not loadable, delete it and try again...
      if(is.na(mating_tables[[i]])){
        unlink(files[[i]])
        corrupted <- as.numeric(strsplit(gsub("[.]rds", "", tail(strsplit(files[[i]], split = "/")[[1]], 1)), split = "_")[[1]])
        
        print("A file was corrupted, deleted it and trying again...")
        print(corrupted)
        
        make_mating_type_table(W_shredding_rate  = corrupted[1], 
                               Z_conversion_rate = corrupted[2],
                               Zr_creation_rate  = corrupted[3]) %>%
          add_mutants(Zr_mutation_rate = corrupted[4], 
                      Wr_mutation_rate = corrupted[5]) %>%
          convert_probabilities() %>%
          mutate(parents = paste(mother, father)) %>%
          select(mother, father, parents, zygote, prop) %>%
          saveRDS(file = files[[i]])
        print("The corrupted file has been re-made")
      }
    }
  }
  print("done them all")
  
  parameters$mating_table <- (1:length(uniques))[match(parameters$mating_table, uniques)]
  print("Mating tables are all complete; ready to start")
  return(list(parameters, mating_tables))
}

Function to run the simulation

This function runs the simulation, for a single parameter space specified by parameters[row, ], using a pre-made mating_type_table.

run_simulation <- function(row,
                           parameters,
                           mating_type_table,
                           wd){
  
  output.file <- sample(99999999999999, 1)
  set.seed(as.integer(substr(output.file, 1, 5))) # Note that the file name is also the seed
  output.file <- paste(file.path(wd, "data/sim_results/"),
                       output.file, ".rds", sep = "")
  
  # Keep only the correct mating table from the list of tables made by set_up_mating_type_tables()
  mating_type_table <- mating_type_table[[parameters$mating_table[row]]]

  # Make a table holding the quality/fitness of each genotype
  fitness_table <- make_fitness_table(
    cost_Zdrive_female = parameters$cost_Zdrive_female[row],
    cost_Zdrive_male = parameters$cost_Zdrive_male[row],
    cost_Wr = parameters$cost_Wr[row],
    cost_Zr = parameters$cost_Zr[row],
    cost_A = parameters$cost_A[row],
    cost_B = parameters$cost_B[row],
    mating_type_table = mating_type_table)
  
  # Make the initial population
  pop <- make_initial_population(
    initial_Zdrive = parameters$initial_Zdrive[row], 
    initial_Zr = parameters$initial_Zr[row],
    initial_Wr = parameters$initial_Wr[row],
    initial_A = parameters$initial_A[row],
    initial_B = parameters$initial_B[row],
    initial_pop_size = parameters$initial_pop_size[row],
    n_patches = parameters$n_patches[row],
    fitness_table = fitness_table)
  
  # Assign remaining parameters to variables for faster access in the while() loops
  release_size <- parameters$release_size[row]
  burn_in <- parameters$burn_in[row]
  softness <- parameters$softness[row]
  male_weighting <- parameters$male_weighting[row]
  carrying_capacity <- parameters$carrying_capacity[row]
  density_dependence_shape <- parameters$density_dependence_shape[row]
  male_migration_prob <- parameters$male_migration_prob[row]
  female_migration_prob <- parameters$female_migration_prob[row]
  migration_type <- parameters$migration_type[row]
  n_patches <- parameters$n_patches[row]
  max_fecundity <- parameters$max_fecundity[row]
  patch_landing_probabilities <- rep(1/parameters$n_patches[row], 
                                     parameters$n_patches[row])
  allele_freqs <- vector(burn_in + parameters$generations[row], mode = "list")
  i <- 0L
  generation_extinct <- NA # Declare outputs
  generation_Zd_extinct <- NA
  generation_W_extinct <- NA # wild-type W, aka W+
  generation_Zd_fixed <- NA #If shredding rate is less than 100%, pop can sometimes persist with the Z*
  outcome <- "Timer expired" 
  final_pop <- "placeholder"
  
  # Do a few generations of burn-in, to allow population size and distribution to stabilise
  while(i < burn_in){
    i <- i + 1
    # Pick the parents, make the offspring, migrate them, then repeat
    pop <- pick_mothers(pop = pop,
                 fitness_table = fitness_table,
                 n_patches = n_patches,
                 softness = softness,
                 male_weighting = male_weighting,
                 max_fecundity = max_fecundity,
                 carrying_capacity = carrying_capacity,
                 density_dependence_shape = density_dependence_shape) %>%
      pick_fathers() %>% 
      make_offspring(mating_type_table = mating_type_table) %>%
      migrate_population(n_patches = n_patches,
                         patch_landing_probabilities = patch_landing_probabilities,
                         male_migration_prob = male_migration_prob,
                         female_migration_prob = female_migration_prob, 
                         migration_type = migration_type)
    
    # 1. Check if the pop has gone extinct,
    if(is.null(pop)){ 
      generation_extinct <- NA
      outcome <- "Didn't survive burn-in"
      final_pop <- NA
      i <- Inf
    } else allele_freqs[[i]] <- get_allele_freqs(pop, i) # record allele freqs
         
    # print(allele_freqs[[i]])
  }
  
  # If pop survived burn-in, add Zd individuals and begin iterating over generations
  if(!is.null(pop)){
    i <- 0L

    # Add the Zd individuals, AFTER calculating density-dependent fecundity (or pop may crash due to the released individuals breaking the carrying capacity)
    pop <- pick_mothers(pop = pop,
                 fitness_table = fitness_table,
                 n_patches = n_patches,
                 softness = softness,
                 male_weighting = male_weighting,
                 max_fecundity = max_fecundity,
                 carrying_capacity = carrying_capacity,
                 density_dependence_shape = density_dependence_shape) %>%
      release_Zd_individuals(n_patches = n_patches, 
                             fitness_table = fitness_table,
                             release_size = release_size,
                             release_strategy = parameters$release_strategy[row]) %>%
      pick_fathers() %>%
      make_offspring(mating_type_table = mating_type_table) %>%
      migrate_population(n_patches = n_patches,
                         patch_landing_probabilities = patch_landing_probabilities,
                         male_migration_prob = male_migration_prob,
                         female_migration_prob = female_migration_prob, 
                         migration_type = migration_type)
  } else i <- Inf # otherwise, skip the iterations and go to the end
  
  zero_one <- c(0,1) # Declare outside the loop
  Zd_fixed_counter <- 0 

  while(i < parameters$generations[row]){
    i <- i + 1
    # Pick the parents, make the offspring, migrate them, then repeat
    pop <- pick_mothers(pop = pop,
                 fitness_table = fitness_table,
                 n_patches = n_patches,
                 softness = softness,
                 male_weighting = male_weighting,
                 max_fecundity = max_fecundity,
                 carrying_capacity = carrying_capacity,
                 density_dependence_shape = density_dependence_shape) %>%
      pick_fathers() %>% 
      make_offspring(mating_type_table = mating_type_table) %>%
      migrate_population(n_patches = n_patches,
                         patch_landing_probabilities = patch_landing_probabilities,
                         male_migration_prob = male_migration_prob,
                         female_migration_prob = female_migration_prob, 
                         migration_type = migration_type)
    
    # 1. Check if the pop has gone extinct,
    if(is.null(pop)){ 
      generation_extinct <- i
      outcome <- "Population extinct"
      final_pop <- NA
      i <- Inf
    } else { 
      # 2. Check if females have gone extinct (also mark as extinct)
      if (!any(str_detect(pop$genotype, "W"))) { 
        generation_extinct <- i
        outcome <- "Population extinct"
        final_pop <- NA
        i <- Inf
      } else {
        # 3. record the allele frequencies
        i_burnin <- i + burn_in
        allele_freqs[[i_burnin]] <- get_allele_freqs(pop, i_burnin)
        # print(allele_freqs[[i_burnin]])
        
        # 4. check if driving Z has gone extinct or reached fixation,
        # or if wild-type W has been completely replaced by Wr 
        if (allele_freqs[[i_burnin]]$frequency[2] %in% zero_one | 
            allele_freqs[[i_burnin]]$frequency[4] == 0) { 
          if (allele_freqs[[i_burnin]]$frequency[2] == 0){
            generation_Zd_extinct <- i_burnin
            outcome <- "Zd extinct"
            final_pop <- NA
            i <- Inf
          } else if (allele_freqs[[i_burnin]]$frequency[2] == 1) { 
            if(Zd_fixed_counter == 0) generation_Zd_fixed <- i
            Zd_fixed_counter <- Zd_fixed_counter + 1
            if(Zd_fixed_counter == 20) { # If Zd has been fixed for 20 generations without causing a crash...
              outcome <- "Zd fixed without extinction"
              final_pop <- NA
              i <- Inf
            }
          } else if (allele_freqs[[i_burnin]]$frequency[4] == 0) {   
            generation_W_extinct <- i_burnin
            outcome <- "Wr fixed"
            final_pop <- NA
            i <- Inf
          }
        }
      }
    } # End of "If not extinct..."
  } # End of while()
  if(!is.na(final_pop)) final_pop <- pop
  print(outcome)
  
  list(  # return a list:               
    parameters[row, ] %>% # [[1]]: A 1-row dataframe holding the parameter space and outcome
      cbind(data.frame(generation_extinct, 
                       generation_Zd_extinct, 
                       generation_W_extinct, 
                       generation_Zd_fixed,
                       outcome, 
                       stringsAsFactors = FALSE)), 
    
    allele_freqs %>%      # [[2]]: The allele frequencies, for each generation
      bind_rows() %>% 
      select(generation, allele, frequency),
    
    final_pop) -> output         # [[3]] final population (if the generation timer ran out; otherwise NA)
    saveRDS(output, file = output.file)
    output
}

Function to run the simulation on all parameter spaces

This is a wrapper function that calls run_simulation on all rows in parameters, using a specified number of CPU cores to run them in parallel using mclapply().

do_all_parameters <- function(parameters, over_write, cores, wd = NULL){
  
  if(!is.null(wd)) setwd(wd)
  else wd <- getwd()
  
  mating_type_tables <- set_up_mating_type_tables(
    parameters, cores = cores, 
    overwrite = FALSE, # Change this manually if needed
    wd = wd) 
  print("check this too")
  parameters <- mating_type_tables[[1]] # annotate the parameters with the mating tables numbers
  mating_type_tables <- mating_type_tables[[2]] # Get the mating tables themselves
  print("Launching multicore job...")
  mclapply(1:nrow(parameters), 
           function(i){
             run_simulation(
               row = i, 
               parameters = parameters,
               mating_type_table = mating_type_tables,
               wd = wd)
           }, mc.cores = cores)
}

Combine all the results files

combine_results_files <- function(chunk, all_files, 
                                  wd = getwd()){
  
  setwd(wd)
  print(paste("starting on", chunk, "of", length(all_files)))
  vector_of_file_names <- all_files[[chunk]]
  seeds <- lapply(strsplit(vector_of_file_names, split = "/"), function(x) x[length(x)])
  
  # Get the filesnames/seeds for each of the 10,000 files in the chunk
  filenames <- gsub("[.]rds", "", seeds)
  
  # Open a file, or delete it if the server saved a corrupted file
  attempt_to_open <- function(file){
    tryCatch(
      # 1. Try to open the RDS file
      expr = readRDS(file),
      
      # 2. If it doesn't open, delete the file and print its name, and return NA
      error = function(cond) {
        print(file)
        unlink(file)
        return(NA)})
  }
  
  # Open all 10,000 files in the chunk, and store as a list. Each element contains either a 3-element list, or NA
  list_of_lists <- lapply(vector_of_file_names, attempt_to_open)

  # Only keep the elements and file names for files that opened properly
  to_keep <- sapply(list_of_lists, function(x) length(x) == 3)
  filenames <- filenames[to_keep]
  list_of_lists <- list_of_lists[to_keep]
  rm(to_keep)

  data.frame(id = filenames,
             lapply(list_of_lists, function(x) x[[1]]) %>%
               do.call("rbind", .)) %>% 
  saveRDS(paste("data/results_", chunk, ".rds", sep = ""))
  
  allele_freqs <- lapply(list_of_lists, function(x) x[[2]])
  names(allele_freqs) <- filenames
  saveRDS(allele_freqs, paste("data/allele_freqs_", chunk, ".rds", sep = ""))
  
  final_pop <- lapply(list_of_lists, function(x) x[[3]])
  names(final_pop) <- filenames
  saveRDS(final_pop, paste("data/final_pop_", chunk, ".rds", sep = ""))
}

Parse the results files for plotting

add_parameters()

This function takes a dataframe extracted from the big results file (df), and uses left_join() to add the parameter spaces used to generate those data. For example, one can grab a subset of the time-to-extinction results using pluck(results, "generation_extinct") %>% filter(), and then add the parameter values corresponding to each result using this function.

add_parameters <- function(df){
  df %>% # Join a results table with all the variable para values
    left_join(results$parameters[, names(results$parameter) %in% 
                                   names(results$parameters %>% 
                                           map(function(x) length(unique(x))))], 
              by = "id")
}

remove_nonvariable_parameters()

This function removes all columns that do not vary at all (e.g. parameters like the maximum number of generations, which was set to a single value in all simulations).

remove_nonvariable_parameters <- function(df){
  df[, apply(df, 2, function(x) length(unique(x))) > 1]
}
# Find parameter spaces that DID vary in the parameter sheet but were in 100% of the crashes
find_failures <- function(){
  if(!exists("results")) results <- readRDS("data/all_results.rds")

  varies_in_results <- remove_nonvariable_parameters(results[[1]] %>% select(-id)) %>% names()
  
  remove_VARIABLE_parameters <- function(df){
  df[, apply(df, 2, function(x) length(unique(x))) == 1]
}
  
  list_of_lists <- mclapply(list.files("data/sim_results", full.names = TRUE), 
                            readRDS, mc.cores = 4)
  
  failed_runs <- which(map_lgl(list_of_lists, function(i) is.null(i[[3]])))
  failed_paras <- lapply(list_of_lists[failed_runs], function(x) x[[1]]) %>% bind_rows() %>% remove_VARIABLE_parameters()
  failed_paras[, names(failed_paras) %in% varies_in_results]
}
# find_failures() # softness = 0 and cost_Zdrive_female = 0.01?
pluck_allele_freqs <- function(model_id, allele_freqs_list){
  allele_freqs_list[as.character(model_id)][[1]]
}

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] tibble_2.0.99.9000 readr_1.1.1        rslurm_0.4.0      
[4] Rcpp_1.0.0         reshape2_1.4.3     stringr_1.3.1     
[7] tidyr_0.8.2        purrr_0.3.1        dplyr_0.8.0.1     

loaded via a namespace (and not attached):
 [1] knitr_1.20        whisker_0.3-2     magrittr_1.5     
 [4] workflowr_1.3.0   hms_0.4.2         tidyselect_0.2.5 
 [7] R6_2.4.0          rlang_0.3.1       plyr_1.8.4       
[10] tools_3.5.1       git2r_0.23.0      htmltools_0.3.6  
[13] yaml_2.2.0        rprojroot_1.3-2   digest_0.6.18    
[16] assertthat_0.2.0  crayon_1.3.4      fs_1.2.6         
[19] glue_1.3.0.9000   evaluate_0.11     rmarkdown_1.10   
[22] stringi_1.3.1     compiler_3.5.1    pillar_1.3.1.9000
[25] backports_1.1.2   pkgconfig_2.0.2