Last updated: 2023-10-27

Checks: 7 0

Knit directory: Genomic-Selection-for-Drought-Tolerance-Using-Genome-Wide-SNPs-in-Casava/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20221020) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 02143bc. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/allchrAR08.txt

Untracked files:
    Untracked:  GETGV.RDS
    Untracked:  data/Article.docx
    Untracked:  data/Article2.docx
    Untracked:  data/Articles/
    Untracked:  output/BLUPS2.csv
    Untracked:  output/BLUPS3.csv
    Untracked:  output/BLUPS_drgBLUPs__boxplot_med.png
    Untracked:  output/BLUPS_drgBLUPs_density_med.png
    Untracked:  output/BLUPS_drgBLUPs_med.png
    Untracked:  output/BLUPS_par_mmer.Rdata
    Untracked:  output/Figuras_article.rar
    Untracked:  output/GD_matrix.csv
    Untracked:  output/GEBVS_G_BLUP_BGLR.RDS
    Untracked:  output/GEBVS_RR_BLUP.RDS
    Untracked:  output/GEBVxBLUPS.tiff
    Untracked:  output/GEBVxGETV.tiff
    Untracked:  output/GEBVxGETV2.tiff
    Untracked:  output/G_matrix.csv
    Untracked:  output/G_matrix.rds
    Untracked:  output/H2.csv
    Untracked:  output/Tabelas.xlsx
    Untracked:  output/accuracy_all_methods.tiff
    Untracked:  output/correlation_blups.tiff
    Untracked:  output/density_blups.tiff
    Untracked:  output/drgBLUP.csv
    Untracked:  output/indice_selection3.tiff
    Untracked:  output/kappa.tiff
    Untracked:  output/mean_pheno.csv
    Untracked:  output/result_mixed_model.RData
    Untracked:  output/result_mixed_model2.RData
    Untracked:  output/result_mixed_model3.RData
    Untracked:  output/result_sommer.RData
    Untracked:  output/results_cv_BayesA.csv
    Untracked:  output/results_cv_BayesB.csv
    Untracked:  output/results_cv_DOM.csv
    Untracked:  output/results_cv_GETGV.csv
    Untracked:  output/results_cv_G_BLUP.csv
    Untracked:  output/results_cv_G_BLUP2.csv
    Untracked:  output/results_cv_G_BLUP_BGRL.csv
    Untracked:  output/results_cv_G_BLUP_BLUPS2.csv
    Untracked:  output/results_cv_G_BLUP_BLUPS3.csv
    Untracked:  output/results_cv_G_BLUP_sommer.csv
    Untracked:  output/results_cv_RF.csv
    Untracked:  output/results_cv_RKHS.csv
    Untracked:  output/results_cv_RR_BLUP.csv
    Untracked:  output/results_kappa.csv
    Untracked:  output/results_kappa2.csv
    Untracked:  output/varcomp.tiff
    Untracked:  output/varcomp2.csv

Unstaged changes:
    Deleted:    data/Genomic Selection for Drought Tolerance Using Genome-Wide SNPs in Maize.pdf
    Deleted:    data/convet_haplo_diplo.txt
    Deleted:    output/BLUPS.RDS
    Modified:   output/BLUPS.csv
    Deleted:    output/BLUPS_Multi.csv
    Modified:   output/BLUPS_par.Rdata
    Deleted:    output/BLUPS_par_ind.Rdata
    Deleted:    output/H2_Multi.csv
    Deleted:    output/accuracy.png
    Deleted:    output/accuracy2.png
    Deleted:    output/accuracy_multi.png
    Deleted:    output/blups_med.png
    Deleted:    output/herdabilidade.csv
    Modified:   output/media_pheno.csv
    Deleted:    output/media_pheno_Multi.csv
    Deleted:    output/resultMM.Rdata
    Deleted:    output/result_G_BLUP.csv
    Modified:   output/results_cv.csv
    Modified:   output/varcomp.csv
    Deleted:    output/varcomp_multi.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/phenotype.Rmd) and HTML (docs/phenotype.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 286b492 Weverton Gomes 2023-10-27 Update Scripts and README
Rmd 90dc112 WevertonGomesCosta 2022-11-17 Update
Rmd 6cc4d23 WevertonGomesCosta 2022-11-17 Update
html 6cc4d23 WevertonGomesCosta 2022-11-17 Update
html d930880 WevertonGomesCosta 2022-11-11 Update
Rmd 5988c27 WevertonGomesCosta 2022-11-11 Update
html 5988c27 WevertonGomesCosta 2022-11-11 Update
Rmd bf7b1d3 WevertonGomesCosta 2022-11-11 Update
html bf7b1d3 WevertonGomesCosta 2022-11-11 Update

Data and libraries

Load Libraries

library(kableExtra)
library(tidyverse)
require(ComplexHeatmap)
library(data.table)
library(readxl)
library(metan)
library(DataExplorer)
library(ggthemes)
library(GGally)
library(furrr)
library(future)
plan(multisession)
theme_set(theme_bw())

Data import and manipulation

Let’s import the phenotypic dataset, excluding the traits without information and the traits Local (redundant with Year) and Treatment (only one observation).

pheno <- read_excel("data/Phenotyping.xlsx", na = "NA") %>%
  select_if( ~ !all(is.na(.))) %>%  # Deleting traits without information 
  select(-c("Local", "Tratamento"))

We will perform some manipulations to adjust our database and to facilitate the visualization of the exploratory analysis.

First, let’s convert the traits that are character into factors. Then we will convert the traits that refer to the grades to integers and then into factors. After that, let’s create the trait ANo.Bloco for nesting in the model to obtain the BLUPs.

pheno <- pheno %>%
  mutate(Clone = as.factor(Clone),
         Ano = as.factor(Ano),
         Bloco = as.factor(Bloco))   # Convert Clone, Ano,Bloco in factors

Exploratory Data Analysis

Introductory analysis of the entire dataset

introduce(pheno) %>% 
  t() %>% 
  kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
rows 2336
columns 27
discrete_columns 3
continuous_columns 24
all_missing_columns 0
total_missing_values 16875
complete_rows 440
total_observations 63072
memory_usage 516592

We don’t have any columns that have all of the missing observations, but we do have a lot of missing values in every dataset. Some manipulations should be performed to improve the quality of the data.

Year Analysis

Let’s produce a heatmap to check the clone amount each year. I’m going to create another dataset with the Year and Clone count. Then I will create the objects corresponding to the clones and years array. Finally, I created the matrix that represents the presence and absence of the clone in the year.

pheno2 <- pheno %>%
  count(Ano, Clone)

genmat <- model.matrix(~ -1 + Clone, data = pheno2)
envmat <- model.matrix(~ -1 + Ano, data = pheno2)
genenvmat <- t(envmat) %*% genmat
genenvmat_ch <- ifelse(genenvmat == 1, "Present", "Abscent")

Heatmap(
  genenvmat_ch,
  col = c("white", "tomato"),
  show_column_names = F,
  heatmap_legend_param = list(title = ""),
  column_title = "Genotypes",
  row_title = "Environments"
)

Version Author Date
bf7b1d3 WevertonGomesCosta 2022-11-11
rm(pheno2, genmat, envmat, genenvmat, genenvmat_ch)

From the heatmap, it is clear that the year 2016 has very few observations. So, we must eliminate it.

pheno <- pheno %>% 
  filter(Ano != 2016) %>% 
  droplevels()

Just for reference, let’s re-view the clone heatmap by year.

pheno2<- pheno %>% 
  count(Ano, Clone)
  
genmat = model.matrix( ~ -1 + Clone, data = pheno2)
envmat = model.matrix( ~ -1 + Ano, data = pheno2)
genenvmat = t(envmat) %*% genmat
genenvmat_ch = ifelse(genenvmat == 1, "Present", "Abscent")

Heatmap(
  genenvmat_ch,
  col = c("white", "tomato"),
  show_column_names = F,
  heatmap_legend_param = list(title = ""),
  column_title = "Genotypes",
  row_title = "Environments"
)

Version Author Date
bf7b1d3 WevertonGomesCosta 2022-11-11
rm(pheno2, genmat, envmat, genenvmat, genenvmat_ch)

We can check how many clones we have in common between the years and also note that the years differ in the number of clones evaluated:

pheno2<- pheno %>% 
  count(Ano, Clone)
  
genmat = model.matrix( ~ -1 + Clone, data = pheno2)
envmat = model.matrix( ~ -1 + Ano, data = pheno2)
genenvmat = t(envmat) %*% genmat

genenvmat %*% t(genenvmat) %>%
  kbl(escape = F,
      align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
Ano2017 Ano2018 Ano2019 Ano2020
Ano2017 165 42 22 14
Ano2018 42 138 39 16
Ano2019 22 39 133 29
Ano2020 14 16 29 138
rm(pheno2, genmat, envmat, genenvmat)

The year 2020 has a lower number of clones in common, however, we will keep it for the analysis.

Here, it is possible to observe that our dataset has clones that were evaluated in just one year. Let’s visualize this, to see how many clones were evaluated according to the number of years.

pheno %>%
  count(Ano, Clone) %>% 
  count(Clone) %>%
  count(n) %>%
  kbl(
    escape = F,
    align = 'c',
    col.names = c("N of Environment", "N of genotypes")
  ) %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
Storing counts in `nn`, as `n` already present in input
ℹ Use `name = "new_name"` to pick a new name.
N of Environment N of genotypes
1 350
2 72
3 20
4 5

Only 5 clones were evaluated in all years, this will possibly decrease our model accuracy.

Another factor that reduces the accuracy, and therefore adopting mixed models via REML in the analysis is the most suitable for obtaining BLUPs.

Analysis of traits

Now, we will analyze the frequency for each discrete feature.

plot_bar(pheno, ncol = 4)

Mite Incidence and Flowering have little information for some levels and many NA’s, we will also exclude these traits.

pheno <- pheno  %>% 
  select(-c(Incidence_Mites, Flowering))

Let’s just look at the missing values now, to check the proportions.

plot_missing(pheno)

Version Author Date
bf7b1d3 WevertonGomesCosta 2022-11-11

We have a high missing value ratio for Vigor, Leaf_Lenght, Canopy_Width and Canopy_Lenght, I’ll exclude those too.

pheno <- pheno %>% 
  select(-c(Vigor, Leaf_Lenght, Canopy_Width, Canopy_Lenght))

Let’s check the distribution of traits by year now and let’s look at the histograms of the quantitative traits:

plot_histogram(pheno, ncol = 5)

Version Author Date
bf7b1d3 WevertonGomesCosta 2022-11-11

For Branching, Leaf.Ret, Mite, PltArc, Stand6MAP and Staygreen don’t have normal distribution. To get the BLUPs we will have to remove that traits from the database.

pheno <- pheno %>% 
  select(-c(Branching, Leaf.Ret, Mite, PltArc, Stand6MAP, Staygreen))

Analisys of Clone

First, let’s check the amount of missing values for each clone by year. We are filtering the Clones with the average bigger than 2 missing values by year.

pheno2 <- pheno %>% 
  select(-Bloco) %>% 
  group_by(Clone, Ano) %>% 
  summarise_all(.funs = list(~sum(is.na(.)))) %>% 
  ungroup() %>% 
  select_numeric_cols() %>%
  mutate(mean = rowMeans(.),
         Clone.Ano = unique(interaction(pheno$Clone, pheno$Ano))) %>%
  filter(mean > 2) %>%
  droplevels()

nlevels(pheno2$Clone.Ano) %>% 
  kbl(
    escape = F,
    align = 'c',
    col.names = c("N of genotypes")
  ) %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
N of genotypes
54
rm(pheno2)

54 clones presented many missing values, i.e., they were evaluated in less than two blocks by year. Therefore, they should be excluded from future analyses, according to the year.

Let’s evaluate the descriptive statistics of the combination between clone and year for the traits.

ge_details(pheno, Ano, Clone, resp = everything()) %>% 
  t() %>% 
kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
Parameters Mean SE SD CV Min Max MinENV MaxENV MinGEN MaxGEN
N_Roots 4.29 0.06 2.51 58.56 0.12 (BGM-1031 in 2017) 15.67 (2012-107-002 in 2019) 2018 (1.62) 2019 (5.76) BGM-0411 (0.33) 2012-107-002 (11.33)
FRY 4.95 0.09 4.04 81.79 0.12 (BGM-0886 in 2017) 22.2 (BGM-1267 in 2018) 2017 (2.75) 2020 (6.52) BGM-1488 (0.34) IAC-14 (14.07)
ShY 14.23 0.22 10.16 71.45 0.69 (BGM-0996 in 2017) 61.17 (BGM-2124 in 2020) 2017 (8.47) 2020 (25.87) BGM-0048 (1.52) BGM-2124 (54.33)
DMC 29.06 0.17 6.1 21 11.98 (BGM-0626 in 2020) 48.34 (BGM-1015 in 2020) 2020 (26.21) 2018 (35.38) BGM-0626 (14.98) BGM-1015 (45.02)
StY 1.52 0.04 1.27 84.01 0.02 (BGM-0340 in 2019) 8.87 (BGM-0396 in 2018) 2019 (1.32) 2018 (1.84) BGM-0089 (0.06) BGM-1023 (4.76)
Plant.Height 1.19 0.01 0.33 27.43 0.36 (BGM-0426 in 2020) 3.03 (BR-11-24-156 in 2020) 2017 (1) 2019 (1.48) Jatobá (0.58) BGM-1200 (1.91)
HI 24.56 0.27 11.89 48.42 1.57 (BGM-1159 in 2017) 71.97 (BGM-1315 in 2018) 2020 (18.61) 2018 (32.36) BGM-0961 (2.5) Mata_Fome_Branca (52.78)
StC 24.42 0.17 6.11 25.05 7.33 (BGM-0626 in 2020) 43.69 (BGM-1015 in 2020) 2020 (21.56) 2018 (30.78) BGM-0626 (10.33) BGM-1015 (40.37)
Root.Le 23.21 0.13 5.8 24.99 7 (BGM-1574 in 2017) 47.33 (BGM-0396 in 2018) 2017 (19.72) 2019 (27.14) Jatobá (8.5) BGM-1956 (35.5)
Root.Di 28.88 0.18 7.9 27.35 6.12 (BGM-2142 in 2019) 63.3 (BRS Mulatinha in 2018) 2017 (24.49) 2018 (34.74) BGM-0089 (12.14) BGM-1956 (48.91)
Stem.D 2.11 0.01 0.38 17.9 1.01 (BGM-0592 in 2018) 4.37 (BRS Tapioqueira in 2020) 2018 (2.02) 2017 (2.16) BGM-0048 (1.25) BGM-1523 (2.93)
Nstem.Plant 2.13 0.03 0.95 44.53 1 (BGM-0036 in 2018) 6.67 (BGM-0714 in 2019) 2018 (1.44) 2019 (2.71) BGM-0066 (1) BGM-0451 (4.44)

Apparently we no longer have a genotype that could harm our analysis. Now we must evaluate the clone-only descriptive statistics for the traits. Some traits presented hight cv, StY and FRY, but we will go continue.

desc_stat(pheno, by = Ano) %>%
  kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
Ano variable cv max mean median min sd.amo se ci.t n.valid
2017 DMC NA -Inf NaN NA Inf 0.0000 NA NaN 0
2017 FRY 66.8379 9.6700 2.7517 2.4310 0.1160 1.8392 0.0743 0.1459 613
2017 HI 48.2942 66.6827 23.5691 22.7255 1.5744 11.3825 0.4612 0.9058 609
2017 N_Roots 48.8708 12.0000 4.3530 4.3330 0.1250 2.1273 0.0860 0.1689 612
2017 Nstem.Plant NA -Inf NaN NA Inf 0.0000 NA NaN 0
2017 Plant.Height 21.0398 1.6767 1.0023 1.0000 0.4500 0.2109 0.0084 0.0165 632
2017 Root.Di 21.0039 39.6900 24.4889 24.4733 8.9867 5.1436 0.2077 0.4080 613
2017 Root.Le 21.5823 31.6667 19.7236 19.6667 7.0000 4.2568 0.1719 0.3376 613
2017 ShY 45.4973 22.2220 8.4677 8.0750 0.6940 3.8526 0.1542 0.3029 624
2017 StC NA -Inf NaN NA Inf 0.0000 NA NaN 0
2017 StY NA -Inf NaN NA Inf 0.0000 NA NaN 0
2017 Stem.D 16.7774 3.3333 2.1581 2.1667 1.2000 0.3621 0.0144 0.0284 629
2018 DMC 13.6086 46.0784 35.3819 36.0000 18.1818 4.8150 0.2785 0.5480 299
2018 FRY 86.0073 22.2000 5.6640 4.2500 0.3330 4.8714 0.2776 0.5462 308
2018 HI 41.5442 71.9673 32.3554 32.6094 5.8772 13.4418 0.7672 1.5096 307
2018 N_Roots 63.2913 6.2500 1.6249 1.4085 0.1900 1.0284 0.0588 0.1157 306
2018 Nstem.Plant 33.3849 3.3333 1.4397 1.3333 1.0000 0.4806 0.0248 0.0487 376
2018 Plant.Height 28.0409 1.9967 1.0953 1.0600 0.4867 0.3071 0.0159 0.0313 373
2018 Root.Di 23.3943 63.3033 34.7356 34.4400 14.1300 8.1262 0.4707 0.9264 298
2018 Root.Le 27.1625 47.3333 23.6102 23.0000 9.5000 6.4131 0.3715 0.7311 298
2018 ShY 69.0625 31.6000 8.7610 7.2000 1.0000 6.0506 0.3022 0.5940 401
2018 StC 15.6937 41.4284 30.7820 31.3500 13.5318 4.8309 0.2794 0.5498 299
2018 StY 91.8742 8.8669 1.8356 1.3539 0.0839 1.6864 0.0984 0.1936 294
2018 Stem.D 22.3941 3.7317 2.0158 1.9785 1.0133 0.4514 0.0234 0.0460 372
2019 DMC 13.8212 40.5941 28.2826 28.2828 15.3846 3.9090 0.1745 0.3428 502
2019 FRY 70.0326 22.2000 5.5039 4.6670 0.1430 3.8545 0.1712 0.3363 507
2019 HI 41.8572 60.1626 27.2545 27.3973 1.7258 11.4080 0.5066 0.9954 507
2019 N_Roots 47.2357 15.6670 5.7620 5.6670 0.3330 2.7217 0.1187 0.2331 526
2019 Nstem.Plant 37.3740 6.6667 2.7113 2.6667 1.0000 1.0133 0.0441 0.0867 527
2019 Plant.Height 19.1125 2.4233 1.4843 1.4633 0.7400 0.2837 0.0123 0.0243 528
2019 Root.Di 24.3155 58.9267 33.4940 33.6500 6.1200 8.1442 0.3596 0.7064 513
2019 Root.Le 21.1089 45.6667 27.1361 27.0000 11.0000 5.7281 0.2529 0.4969 513
2019 ShY 47.4901 47.5710 13.4607 12.7735 1.2860 6.3925 0.2782 0.5465 528
2019 StC 16.5407 35.9441 23.6326 23.6328 10.7346 3.9090 0.1745 0.3428 502
2019 StY 72.3243 5.4021 1.3238 1.1095 0.0154 0.9574 0.0428 0.0841 500
2019 Stem.D 15.7037 3.3203 2.1061 2.0963 1.1197 0.3307 0.0144 0.0283 528
2020 DMC 22.5059 48.3360 26.2094 26.0400 11.9760 5.8986 0.2569 0.5048 527
2020 FRY 68.4455 20.5670 6.5246 5.9000 0.4000 4.4658 0.1934 0.3800 533
2020 HI 43.3204 45.9340 18.6129 18.0486 1.9608 8.0632 0.3496 0.6867 532
2020 N_Roots 47.0839 10.3330 4.3060 4.3330 0.3330 2.0274 0.0881 0.1730 530
2020 Nstem.Plant 37.1507 4.6667 2.0463 2.0000 1.0000 0.7602 0.0326 0.0641 543
2020 Plant.Height 24.0927 3.0333 1.1945 1.1600 0.3600 0.2878 0.0124 0.0243 543
2020 Root.Di 20.5858 43.2733 26.2101 26.1800 11.8533 5.3956 0.2337 0.4591 533
2020 Root.Le 18.9730 36.6667 23.2345 23.3333 9.0000 4.4083 0.1909 0.3751 533
2020 ShY 42.7475 61.1670 25.8653 25.9165 3.3330 11.0568 0.4794 0.9417 532
2020 StC 27.3600 43.6860 21.5594 21.3900 7.3260 5.8986 0.2569 0.5048 527
2020 StY 81.2220 6.0724 1.5211 1.2326 0.0483 1.2354 0.0540 0.1060 524
2020 Stem.D 17.5042 4.3733 2.1321 2.1207 1.0547 0.3732 0.0160 0.0315 543

Again, some traits were not computed for the year 2017, so we have to eliminate that year when performing the analysis for these traits.

What draws attention in this table are the high cv for some traits, especially: HI, Nstem.Plant, N_Roots, ShY, StY and FRY.

This may be due to the presence of outliers, let’s inspect the entire dataset to assess whether there are outliers:

inspect(pheno %>%
          select_if( ~ !is.factor(.)), verbose = FALSE, plot =T) %>%
  kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
Variable Class Missing Levels Valid_n Min Median Max Outlier Text
N_Roots numeric Yes
1974 0.12 4.00 15.67 16 NA
FRY numeric Yes
1961 0.12 3.75 22.20 71 NA
ShY numeric Yes
2085 0.69 11.17 61.17 89 NA
DMC numeric Yes
1328 11.98 28.80 48.34 6 NA
StY numeric Yes
1318 0.02 1.21 8.87 49 NA
Plant.Height numeric Yes
2076 0.36 1.16 3.03 23 NA
HI numeric Yes
1955 1.57 23.35 71.97 15 NA
StC numeric Yes
1328 7.33 24.15 43.69 6 NA
Root.Le numeric Yes
1957 7.00 23.00 47.33 19 NA
Root.Di numeric Yes
1957 6.12 28.17 63.30 32 NA
Stem.D numeric Yes
2072 1.01 2.10 4.37 22 NA
Nstem.Plant numeric Yes
1446 1.00 2.00 6.67 30 NA

Confirming what was described before, most traits with high cv have many outliers and therefore we will exclude them in the loop to obtain the blups.

General Inspection

Now let’s just perform a general inspection of the data to finish the manipulations.

corr_plot(pheno, col.by = Ano)

StC with DMC and StY with FRY show high correlation.

Furthermore most of the traits apparently show normal distribution of phenotypic data. So let’s save the clean data and move on to getting the blups.

write.csv(pheno, "data/pheno_clean.csv", row.names = F, quote = F)

pheno<-read.csv("data/pheno_clean.csv") %>% 
  mutate(Clone = as.factor(Clone),
         Ano = as.factor(Ano),
         Bloco = as.factor(Bloco))

Genotype-environment analysis by mixed-effect models

First, I’m going to create a function to get the blups and some parameters from our model.

BLUPS_par_mmer <- function(model, trait) {
  # log likelihood of the model, AIC, convergence T/F
  modelfit <- summary(model)$logo
  # number of groups for factors, could be used to compute DF
  groups <- summary(model)$groups
  # variance components
  varcomp <- summary(model)$varcomp
  
  # variance components
  Vg <- model$sigma$Clone %>% as.vector()
  Ve <- model$sigma$units %>% as.vector()
  
  # Narrow-sense heritability
  H2_narrow <- Vg / (Vg + Ve)
  
  # Broad-sense heritability
  Va <- model$sigma$`Ano:Clone` %>% as.vector()
  Ve <- model$sigma$units %>% as.vector()
  n.ano <- length(model$U$Ano[[trait]])
  H2_Broad<- (Vg) / (Vg + (Va/n.ano) + (Ve/(2*n.ano)))  
  
  # Extract the BLUPs and PEVs, compute Reliability (REL),
  # de-regressed BLUPs and weights for downstream analysis
  blups <- model$U$Clone %>%
    unlist() %>%
    tibble(Clone = names(.), BLUP = .) %>%
    mutate(
      Clone = gsub("Clone", "", Clone),
      PEV = diag(as.matrix(model$PevU$Clone[[1]])),
      # prediction error variance
      REL = 1 - PEV / Vg,
      # Reliability
      drgBLUP = BLUP / REL,
      # De-regressed BLUP
      WT = (1 - H2_narrow) / ((0.1 + (1 - REL) / REL) * H2_narrow) # weight for downstream
    )
  
  # Combine all outputs into one object the function can return()
  out <- list(
    Trait = trait,
    H2_Broad = H2_Broad,
    H2_narrow=H2_narrow,
    modelfit = list(modelfit),
    groups = list(groups),
    blups = list(blups),
    varcomp = list(varcomp)
  )
  
  return(out)
}

save(BLUPS_par_mmer, file = "output/BLUPS_par_mmer.Rdata")

The BLUP model

Here we have to remember that we have outliers for some traits and also that we must exclude the year 2017 for some.

traits <- colnames(pheno)[4:ncol(pheno)]
pheno <- pheno %>%
  mutate_at(traits, as.numeric)

Now let’s perform the mixed model analysis to get the BLUPs.

load("output/BLUPS_par_mmer.Rdata")

resultMM <- future_map(traits, function(i) {
      data <- pheno %>%
        select(1:3,all_of(i)) %>% 
        na.omit() %>% 
        droplevels()
    
    model <- sommer::mmer(fixed = as.formula(paste(i, "~", "Ano:Bloco")),
      random = ~ Clone + Ano + Ano:Clone,
      rcov= ~ units,
      data = data,
      verbose = FALSE) # Clone:Ano and Ano.Bloco are random and Clone and Ano is fixed
    
    result <- BLUPS_par_mmer(model, i)
  }
)

names(resultMM) <- traits
save(resultMM, file = "output/result_sommer.RData")

BLUPS for Clone

First, I will add the average of the traits with the BLUPs for better interpretation.

mean_pheno <- as.data.frame(pheno %>%
                               summarise_if(is.numeric, mean, na.rm = TRUE))

write.csv(mean_pheno, "output/mean_pheno.csv", row.names = F)

As I used “future_map” to run each stage 1 analysis in parallel, each trait is in a separate element of a list. We need to process the resultMM object into a data.frame or matrix for further analysis.

BLUPS <- bind_rows(lapply(traits, function(i) {
  data.frame(
    trait = i,
    Clone = str_split_i(resultMM[[i]][["blups"]][[1]][["Clone"]], pattern = "[.]", -1),
    resultMM[[i]][["blups"]][[1]][-1],
    BLUPS_mean = resultMM[[i]][["blups"]][[1]][["BLUP"]] + mean_pheno[, i],
    drgBLUP_mean = resultMM[[i]][["blups"]][[1]][["drgBLUP"]] + mean_pheno[, i]
  )
}))

H2 <- bind_rows(lapply(traits, function(i) {
  data.frame(trait = i,
             H2_Broad =  resultMM[[i]][["H2_Broad"]],
             H2_narrow =  resultMM[[i]][["H2_narrow"]])
}))

varcomp <- bind_rows(lapply(traits, function(i) {
  data.frame(
    trait = i,
    grp = c("Clone", "Ano", "Ano:Clone", "units"),
    resultMM[[i]][["varcomp"]][[1]]
  )
}))

Saving the results of BLUPs and parameters

write.csv(BLUPS,
          "output/BLUPS.csv",
          row.names = F,
          quote = F)

write.csv(H2,
          "output/H2.csv",
          row.names = F,
          quote = F)

write.csv(varcomp,
          "output/varcomp.csv",
          row.names = F,
          quote = F)

Create a figure about de variance components

varcomp |> 
  ggplot(aes(x = trait, y =  VarComp, fill = fct_reorder(grp, VarComp), by = trait)) +
  geom_col(position = "fill")+ 
  labs(y="Variance Component (%)",
       x="Traits",
       fill = "Var. Comp.")+
  scale_fill_gdocs()+
  theme_minimal()+
  theme(text = element_text(size = 25, face = "bold"),
    axis.text.x = element_text(
      size = 20,
      angle = 45,
      hjust = 1,
      vjust = 1
    )) 

Version Author Date
bf7b1d3 WevertonGomesCosta 2022-11-11
ggsave("output/varcomp.tiff", width = 16, height = 8)

Ploting BLUPS for all traits

Let’s plot the density and boxplots figures of the traits.

BLUPS %>%
  pivot_longer(8:9, names_to = "Method", values_to = "Values") %>%
  ggplot(aes(x = Values, fill = Method)) +
  geom_boxplot(alpha = 0.8) +
  facet_wrap(trait~., ncol = 4, scales = "free") +
  theme_bw() + 
  theme(text = element_text(size = 15)) +
  scale_fill_gdocs() +
  scale_color_gdocs() +
  labs(x = "", y = "")

Version Author Date
bf7b1d3 WevertonGomesCosta 2022-11-11
ggsave("output/BLUPS_drgBLUPs__boxplot_med.png", width = 16, height = 10, dpi =300)

Here we will only evaluate the distribution of BLUPs without the mean.

BLUPS %>%
  pivot_longer(8:9, names_to = "Method", values_to = "Values") %>%
  ggplot(aes(x = Values, fill = Method, color = Method)) +
  geom_density(alpha = 0.8) +
  facet_wrap(trait~., ncol = 4, scales = "free") +
  theme_bw() + 
  theme(text = element_text(size = 15)) +
  scale_fill_gdocs() +
  scale_color_gdocs() +
  labs(x = "", y = "")

Version Author Date
6cc4d23 WevertonGomesCosta 2022-11-17
bf7b1d3 WevertonGomesCosta 2022-11-11
ggsave("output/BLUPS_drgBLUPs_density_med.png", width = 16, height = 10, dpi =300)

Apparently most BLUPs for the traits follow normal distribution and can be applied to GWS by conventional methods.

Then, now we go to execute the Genomic Wide Selection script: GWS.Rmd


sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Brazil.utf8  LC_CTYPE=Portuguese_Brazil.utf8   
[3] LC_MONETARY=Portuguese_Brazil.utf8 LC_NUMERIC=C                      
[5] LC_TIME=Portuguese_Brazil.utf8    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] furrr_0.3.1           future_1.33.0         GGally_2.1.2         
 [4] ggthemes_4.2.4        DataExplorer_0.8.2    metan_1.18.0         
 [7] readxl_1.4.3          data.table_1.14.8     ComplexHeatmap_2.14.0
[10] lubridate_1.9.2       forcats_1.0.0         stringr_1.5.0        
[13] dplyr_1.1.2           purrr_1.0.2           readr_2.1.4          
[16] tidyr_1.3.0           tibble_3.2.1          ggplot2_3.4.3        
[19] tidyverse_2.0.0       kableExtra_1.3.4     

loaded via a namespace (and not attached):
  [1] minqa_1.2.6         colorspace_2.1-0    rjson_0.2.21       
  [4] rprojroot_2.0.3     circlize_0.4.15     GlobalOptions_0.1.2
  [7] fs_1.6.3            clue_0.3-65         rstudioapi_0.15.0  
 [10] listenv_0.9.0       farver_2.1.1        ggrepel_0.9.3      
 [13] fansi_1.0.4         mathjaxr_1.6-0      xml2_1.3.5         
 [16] codetools_0.2-19    splines_4.2.3       doParallel_1.0.17  
 [19] cachem_1.0.8        knitr_1.44          polyclip_1.10-4    
 [22] jsonlite_1.8.7      workflowr_1.7.1     nloptr_2.0.3       
 [25] cluster_2.1.4       png_0.1-8           ggforce_0.4.1      
 [28] compiler_4.2.3      httr_1.4.7          Matrix_1.6-1       
 [31] fastmap_1.1.1       cli_3.6.1           later_1.3.1        
 [34] tweenr_2.0.2        htmltools_0.5.6     tools_4.2.3        
 [37] lmerTest_3.1-3      igraph_1.5.1        gtable_0.3.4       
 [40] glue_1.6.2          Rcpp_1.0.11         cellranger_1.1.0   
 [43] jquerylib_0.1.4     vctrs_0.6.3         svglite_2.1.1      
 [46] nlme_3.1-163        iterators_1.0.14    xfun_0.40          
 [49] globals_0.16.2      networkD3_0.4       lme4_1.1-34        
 [52] rvest_1.0.3         timechange_0.2.0    lifecycle_1.0.3    
 [55] MASS_7.3-58.2       scales_1.2.1        ragg_1.2.6         
 [58] hms_1.1.3           promises_1.2.1      parallel_4.2.3     
 [61] RColorBrewer_1.1-3  yaml_2.3.7          gridExtra_2.3      
 [64] sass_0.4.7          reshape_0.8.9       stringi_1.7.12     
 [67] S4Vectors_0.36.2    foreach_1.5.2       BiocGenerics_0.44.0
 [70] boot_1.3-28.1       shape_1.4.6         rlang_1.1.1        
 [73] pkgconfig_2.0.3     systemfonts_1.0.4   matrixStats_1.0.0  
 [76] evaluate_0.22       lattice_0.21-8      labeling_0.4.3     
 [79] patchwork_1.1.3     htmlwidgets_1.6.2   tidyselect_1.2.0   
 [82] parallelly_1.36.0   plyr_1.8.8          magrittr_2.0.3     
 [85] R6_2.5.1            IRanges_2.32.0      generics_0.1.3     
 [88] mgcv_1.9-0          pillar_1.9.0        whisker_0.4.1      
 [91] withr_2.5.1         crayon_1.5.2        utf8_1.2.3         
 [94] tzdb_0.4.0          rmarkdown_2.25      GetoptLong_1.0.5   
 [97] git2r_0.32.0        digest_0.6.33       webshot_0.5.5      
[100] httpuv_1.6.11       numDeriv_2016.8-1.1 textshaping_0.3.7  
[103] stats4_4.2.3        munsell_0.5.0       viridisLite_0.4.2  
[106] bslib_0.5.1        

  1. Weverton Gomes da Costa, Pós-Doutorando, Embrapa Mandioca e Fruticultura, ↩︎