Last updated: 2021-08-31
Checks: 7 0
Knit directory: BreedingSchemeOpt/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20210422)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 211ab47. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/images/.DS_Store
Ignored: data/.DS_Store
Ignored: output/.DS_Store
Untracked files:
Untracked: code/scrap.R
Untracked: data/baselineScheme.gsheet
Untracked: data/siErrorVarEst_byTrialType_directApproach_2021Aug25.rds
Untracked: output/benchmark_sim.rds
Untracked: output/benchmark_sims5.rds
Untracked: output/burnInSims_bsp1_iita_2021Aug27.rds
Untracked: output/burnInSims_bsp2_iita_2021Aug27.rds
Untracked: output/burnInSims_bsp3_iita_2021Aug27.rds
Untracked: output/burnInSims_iita_2021Aug27.rds
Untracked: output/burnIn_test.rds
Untracked: output/postBurnInGS_test.rds
Untracked: output/postBurnIn_test.rds
Untracked: output/test_burnIn_sim.rds
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/baselineSim.Rmd
) and HTML (docs/baselineSim.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 211ab47 | wolfemd | 2021-08-31 | Small test simulations with post burn-in GS vs. PS |
Rmd | 8d65f27 | wolfemd | 2021-08-30 | Post burn-in test of new GS functions underway |
html | e0d20bd | wolfemd | 2021-08-27 | Build site. |
Rmd | 9d369ee | wolfemd | 2021-08-27 | Publish burnInSims with the toy example completed and the full analysis almost ready to run. |
html | e210a1f | wolfemd | 2021-08-19 | Build site. |
Rmd | 5914d8d | wolfemd | 2021-08-19 | Publish initial sims towards a baseline set of sims using runBreedingScheme_wBurnIn |
Rmd | c2e379e | wolfemd | 2021-08-13 | Rebuild repo removing the “Group” part. Project is to contain BOTH the “Group” and a separate section just for the actual simulation analyses. |
Empirical estimate of TrialType-specific error variances in terms of the IITA selection index (SELIND). See that analysis here.
This is optional and can be skipped.
If you want the advantage of multi-threaded BLAS to speed up predictions within the simulations, you need an R instance that is linked to OpenBLAS (another example is Microsoft R Open). For CBSU, the recommended approach is currently to use singularity shells provided by the “rocker” project. They even come pre-installed with tidyverse :). Linked to OpenBLAS, using a simple function RhpcBLASctl::blas_set_num_threads()
I can add arguments to functions to control this feature. For optimal performance, it is import to balance the number of threads each R session uses for BLAS against any other form of parallel processing being used and considering total available system resources.
# 0) Pull a singularity image with OpenBLAS enabled R + tidyverse from rocker/
# singularity pull ~/rocker2.sif docker://rocker/tidyverse:latest;
# 1) start a screen shell
screen;
# 2) reserve interactive slurm
salloc -n 25 --mem 60G;
# 3) start the singularity Linux shell inside that
singularity shell ~/rocker2.sif;
# Project directory, so R will use as working dir.
cd /home/mw489/BreedingSchemeOpt/;
# 3) Start R
export OMP_NUM_THREADS=1;
R
# Install genomicMateSelectR to user-accessible libPath
### In a singularity shell, sintall as follows:
<-"/home/mw489/R/x86_64-pc-linux-gnu-library/4.1" # should be YOUR libPath
libPath::with_libpaths(new=libPath, devtools::install_github("wolfemd/genomicMateSelectR", ref = 'master'))
withr### Else, simply
::install_github("wolfemd/genomicMateSelectR", ref = 'master')
devtools# Install my own forked repo of AlphaSimHlpR
::with_libpaths(new=libPath, install.packages("Rcpp"))
withr::with_libpaths(new=libPath, install.packages("AlphaSimR"))
withr::with_libpaths(new=libPath, install.packages("optiSel"))
withr::with_libpaths(new=libPath, install.packages("rgl"))
withr::with_libpaths(new=libPath, devtools::install_github("wolfemd/AlphaSimHlpR", ref = 'master', force=T))
withr# devtools::install_github("wolfemd/AlphaSimHlpR", ref = 'master', force=T)
Objective: develop a hopefully faster GS simulator which uses a fixed number of clones for predictions… hoping to constrain compute requirements better than using the trainingPopCycles
parameter, without sacrificing much prediction accuracy.
For specifying a newBSP
, there are some considerations. The bsp
from previous run contains an entry bsp$checks
specifying a pop-object with randomly chosen checks from the burn-in phase. For now, I will ensure the burn-in checks are used post burn-in. Keep it simple. For that reason, newBSP
should not alter the nChks
value. If it does, might cause code to break…
Set-up my own population improvement function
Added some options to bsp
objects to control which phenotype records are used, which clones are includedin the TP, and which clones are considered selection candidates
JL’s original popImprov1Cyc
drew candidates
from the full records$F1@id
(excluding potentially indivs only scored during the current year, if useCurrentPhenoTrain=FALSE
).
My changes:
nTrainPopCycles
: draw training pop clones only from this number of recent cycles.
nYrsAsCandidates
: candidates for selection only from this number of recent years
maxTrainingPopSize
: From the lines in the most recent cycles (indicated by nTrainPopCycles
), subsample this number of lines for training data.
bsp$checks@id
) and the lines indicates as selection candidates
according to the setting of nYrsAsCandidates
.candidates
will also be included in any predictions.bsp$trainingPopCycles
, which will be unused in this pipeline, but not deleted from the package.suppressMessages(library(AlphaSimHlpR))
suppressMessages(library(tidyverse))
suppressMessages(library(genomicMateSelectR))
<- dplyr::select
select
<-read.csv(here::here("data","baselineScheme - Test.csv"),
schemeDFheader = T, stringsAsFactors = F)
source(here::here("code","runSchemesPostBurnIn.R"))
<-readRDS(here::here("output","burnIn_test.rds")) simulations
<-specifyBSP(schemeDF = schemeDF,
newBSPnChr = 3,effPopSize = 100,quickHaplo = F,
segSites = 400, nQTL = 40, nSNP = 100, genVar = 40,
gxeVar = NULL, gxyVar = 15, gxlVar = 10,gxyxlVar = 5,
meanDD = 0.5,varDD = 0.01,relAA = 0.5,
stageToGenotype = "PYT",
nParents = 10, nCrosses = 4, nProgeny = 50,nClonesToNCRP = 3,
phenoF1toStage1 = T,errVarPreStage1 = 500,
useCurrentPhenoTrain = F,
nCyclesToKeepRecords = 30,
selCritPipeAdv = selCritIID,
selCritPopImprov = selCritIID,
nTrainPopCycles=6,
nYrsAsCandidates=2,
maxTrainingPopSize=500)
<-proc.time()[3]
start<-runSchemesPostBurnIn(simulations = simulations,
postBurnInGS_testnewBSP=newBSP,
nPostBurnInCycles=10,
selCritPop="parentSelCritGEBV",
selCritPipe="selCritIID",
productFunc="productPipeline",
popImprovFunc="popImprovByParentSel",
ncores=4,
nBLASthreads=4)
saveRDS(postBurnInGS_test,file = here::here("output","postBurnInGS_test.rds"))
<-proc.time()[3]; print(paste0((end-start)/60," mins elapsed"))
end# [1] "3.8908 mins elapsed"
<-readRDS(here::here("output","postBurnInGS_test.rds"))
postBurnInGS<-readRDS(here::here("output","postBurnIn_test.rds"))
postBurnInPS
<-postBurnInGS %>%
forSimPlotmutate(PostBurnIn="GS") %>%
bind_rows(postBurnInPS %>%
mutate(PostBurnIn="PS")) %>%
unnest_wider(SimOutput) %>%
select(SimRep,PostBurnIn,records) %>%
unnest_wider(records) %>%
select(SimRep,PostBurnIn,stageOutputs) %>%
unnest() %>%
filter(stage=="F1") %>%
mutate(YearPostBurnIn=year-10)
library(patchwork)
<-forSimPlot %>%
meanGplotgroup_by(PostBurnIn,YearPostBurnIn,year,stage) %>%
summarize(meanGenMean=mean(genValMean),
seGenMean=sd(genValMean)/n()) %>%
ggplot(.,aes(x=YearPostBurnIn)) +
geom_ribbon(aes(ymin = meanGenMean - seGenMean,
ymax = meanGenMean + seGenMean,
fill = PostBurnIn),
alpha=0.75) +
geom_line(aes(y = meanGenMean, color=PostBurnIn))
<-forSimPlot %>%
sdGplotgroup_by(PostBurnIn,YearPostBurnIn,year,stage) %>%
summarize(meanGenSD=mean(genValSD),
seGenSD=sd(genValSD)/n()) %>%
ggplot(.,aes(x=YearPostBurnIn)) +
geom_ribbon(aes(ymin = meanGenSD - seGenSD,
ymax = meanGenSD + seGenSD,
fill = PostBurnIn),
alpha=0.75) +
geom_line(aes(y = meanGenSD, group=PostBurnIn))
| sdGplot) & theme_bw() & geom_vline(xintercept = 0, color='darkred') (meanGplot
# burnInSim<-simulations$burnInSim[[1]]
# selCritPop="parentSelCritGEBV";
# selCritPipe="selCritIID";
# productFunc="productPipeline";
# popImprovFunc="popImprov1Cyc";
# newBSP=burnInSim$bsp
# newBSP[["maxTrainingPopSize"]]<-500
# newBSP[["nYrsAsCandidates"]]<-2
# newBSP[["nTrainPopCycles"]]<-10
# newBSP$checks<-NULL
# runSchemesPostBurnIn<-function(simulations,
# newBSP=NULL, # so you can change the scheme entirely after burn-in
# nPostBurnInCycles,
# selCritPop="selCritIID",
# selCritPipe="selCritIID",
# productFunc="productPipeline",
# popImprovFunc="popImprovByParentSel",
# ncores=1,
# nBLASthreads=NULL,nThreadsMacs2=NULL){
#
# require(furrr); plan(multisession, workers = ncores)
# options(future.globals.maxSize=+Inf); options(future.rng.onMisuse="ignore")
#
# simulations<-simulations %>%
# mutate(SimOutput=future_map2(SimRep,burnInSim,function(SimRep,burnInSim,...){
# # debug
# # burnInSim<-simulations$burnInSim[[1]]
# if(!is.null(nBLASthreads)) { RhpcBLASctl::blas_set_num_threads(nBLASthreads) }
# cat("******", SimRep, "\n")
#
# # This CONTINUES where previous sims left off
# ## no initialize step
# ## Keep burn-in stage sim params "SP"
# SP<-burnInSim$SP
# ## specify a potentially new bsp object
# ## (keep checks stored in burn-in stage's bsp)
# if(!is.null(newBSP)){
# bsp<-newBSP; bsp$checks<-burnInSim$bsp$checks
# } else { bsp<-burnInSim$bsp }
# ## 'historical' records from burn-in
# records<-burnInSim$records
# ## override burn-in specified product and population improvement funcs
# bsp[["productPipeline"]] <- get(productFunc)
# bsp[["populationImprovement"]] <- get(popImprovFunc)
# bsp[["selCritPipeAdv"]] <- get(selCritPipe)
# bsp[["selCritPopImprov"]] <- get(selCritPop)
#
# # Post burn-in cycles
# cat("\n"); cat("Post burn-in cycles"); cat("\n")
# for (cycle in 1:nPostBurnInCycles){
# cat(cycle, " ")
# records <- bsp$productPipeline(records, bsp, SP)
# records <- bsp$populationImprovement(records, bsp, SP)
# }
#
# # Finalize the stageOutputs
# records <- AlphaSimHlpR:::lastCycStgOut(records, bsp, SP)
#
# return(list(records=records,
# bsp=bsp,
# SP=SP))
# },
# nPostBurnInCycles=nPostBurnInCycles,
# selCritPop=selCritPop,
# selCritPipe=selCritPipe,
# productFunc=productFunc,
# popImprovFunc=popImprovFunc,
# nBLASthreads=nBLASthreads,
# nThreadsMacs2=nThreadsMacs2))
# plan(sequential)
# return(simulations)
# }
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] patchwork_1.1.1 genomicMateSelectR_0.2.0 forcats_0.5.1
[4] stringr_1.4.0 purrr_0.3.4 readr_2.0.1
[7] tidyr_1.1.3 tibble_3.1.4 ggplot2_3.3.5
[10] tidyverse_1.3.1 AlphaSimHlpR_0.2.1 dplyr_1.0.7
[13] AlphaSimR_1.0.3 R6_2.5.1 workflowr_1.6.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 here_1.0.1 lubridate_1.7.10 assertthat_0.2.1
[5] rprojroot_2.0.2 digest_0.6.27 utf8_1.2.2 cellranger_1.1.0
[9] backports_1.2.1 reprex_2.0.1 evaluate_0.14 highr_0.9
[13] httr_1.4.2 pillar_1.6.2 rlang_0.4.11 readxl_1.3.1
[17] rstudioapi_0.13 whisker_0.4 jquerylib_0.1.4 rmarkdown_2.10
[21] labeling_0.4.2 munsell_0.5.0 broom_0.7.9 compiler_4.1.0
[25] httpuv_1.6.2 modelr_0.1.8 xfun_0.25 pkgconfig_2.0.3
[29] htmltools_0.5.2 tidyselect_1.1.1 fansi_0.5.0 crayon_1.4.1
[33] tzdb_0.1.2 dbplyr_2.1.1 withr_2.4.2 later_1.3.0
[37] grid_4.1.0 jsonlite_1.7.2 gtable_0.3.0 lifecycle_1.0.0
[41] DBI_1.1.1 git2r_0.28.0 magrittr_2.0.1 scales_1.1.1
[45] cli_3.0.1 stringi_1.7.4 farver_2.1.0 fs_1.5.0
[49] promises_1.2.0.1 xml2_1.3.2 bslib_0.2.5.1 ellipsis_0.3.2
[53] generics_0.1.0 vctrs_0.3.8 tools_4.1.0 glue_1.4.2
[57] hms_1.1.0 fastmap_1.1.0 yaml_2.2.1 colorspace_2.0-2
[61] rvest_1.0.1 knitr_1.33 haven_2.4.3 sass_0.4.0