gws-multi-enviroment

Last updated: 2022-11-17

Checks: 6 1

Knit directory: Genomic-Selection-for-Drought-Tolerance-Using-Genome-Wide-SNPs-in-Casava/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20221020)

The command set.seed(20221020) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 0f3154e

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 0f3154e. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/
    Ignored:    data/allchrAR08.txt

Untracked files:
    Untracked:  analysis/figure/
    Untracked:  analysis/gws-multi-enviroment.Rmd
    Untracked:  analysis/phenotype_multi.Rmd
    Untracked:  data/genomic selection.pptx
    Untracked:  data/pheno_clean.csv
    Untracked:  output/BLUPS_Multi.csv
    Untracked:  output/BLUPS_par_ind.Rdata
    Untracked:  output/accuracy_multi.png
    Untracked:  output/media_pheno_Multi.csv
    Untracked:  output/result_G_BLUP.csv
    Untracked:  output/varcomp.csv

Unstaged changes:
    Modified:   analysis/GWS.Rmd
    Modified:   analysis/_site.yml
    Modified:   analysis/phenotype.Rmd
    Modified:   output/BLUPS_par.Rdata
    Modified:   output/resultMM.Rdata
    Modified:   output/results_cv.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.

To perform the analyses, we will need the following packages:

library(readxl)
require(tidyverse)
library(kableExtra)
library(janitor)
library(genomicMateSelectR)
library(AGHmatrix)
require(ComplexHeatmap)
require(rrBLUP)
library(ggpmisc)
library(cvTools)
theme_set(theme_bw())

Data

The data set is based in genotypes evalueted in five years (2016 to 2020), each year was considered as environment.

Names marker data

Primeiro vamos buscar os ID dos marcadores para cada clone.

names <-
  read_excel("data/Phenotyping.xlsx", sheet = "GBS") %>%
  rename(Clone = `Names trials Petrolina`,
         ID_Clone = `Nome GBS`) %>%
  mutate(ID_Clone = str_replace_all(ID_Clone, ":", ".")) %>%
  select(Clone, ID_Clone)

names %>%
  head() %>%
  kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )

Clone	ID_Clone
4271	4271.250494233
9624-09	962409.250370255
Aipim Abacate	ERETA.250437577
Alagoana	Alagoana363.250437472
BGM-0004	CNPMF4.250370278
BGM-0019	CNPMF19.250370327

Phenotypic data

Agora vamos agrupar os IDs dos marcadores com os nomes dos clones.

pheno <- read.csv("output/BLUPS_Multi.csv") %>%
  inner_join(names) %>% # Join Phenotypic data with Genotypics names (ID_Clones) of the Clones
  mutate(Clone = as.factor(Clone),
         ID_Clone = as.factor(ID_Clone),
         Ano = as.factor(Ano))

Joining, by = "Clone"

pheno %>%
  head() %>%
  kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )

Clone	Ano	NR.P	PTR	PPA	AP	HI	RF	CR	DR	DCaule	Ácaro	MS	PROD.AMD	AMD	Porte	Nº.Hastes	Stand_Final	Branching_Level	Staygreen	ID_Clone
Alagoana	2017	-0.6795277	0.1656796	0.3485062	-0.0284596	0.5013653	-0.0667096	0.4854822	2.9140433	0.0044243	-0.1333563	NA	NA	NA	NA	NA	NA	NA	NA	Alagoana363.250437472
BGM-0032	2017	1.4149888	0.3046166	0.8963395	0.1214955	0.4290333	0.4001097	4.5063002	-0.6829637	0.4800358	0.2080576	NA	NA	NA	NA	NA	NA	NA	NA	CNPMF32.250370305
BGM-0036	2017	-1.3540323	-0.8137316	8.0983401	0.2122890	-11.3135067	0.0967386	2.1977370	-0.5302431	0.2737945	-0.4231997	NA	NA	NA	NA	NA	NA	NA	NA	CNPMF36.250370328
BGM-0046	2017	-0.3322988	2.4625915	4.9096154	0.1085703	4.3032975	0.0967386	3.3122158	3.0876838	0.1252696	0.0502433	NA	NA	NA	NA	NA	NA	NA	NA	CNPMF46.250370283
BGM-0056	2017	1.9146996	2.3963149	0.4202654	0.1178382	12.1574116	-0.1420580	3.7627115	3.3581355	0.3203126	0.0733478	NA	NA	NA	NA	NA	NA	NA	NA	CNPMF56.250370248
BGM-0065	2017	0.7642085	0.1967549	3.6785055	0.0919147	-3.9200178	-0.2301578	1.5136325	3.1620003	0.0105003	-0.2696599	NA	NA	NA	NA	NA	NA	NA	NA	CNPMF65.250370285

Genotypic data

Com o filtro de MAF a 1% restaram 22779 marcadores. Vou carregar a matriz genotipica agora.

geno <- readRDS("data/SNPs.rds")

geno[1:5,1:5]

                      S1_84637 S1_84843 S1_126260 S1_126261 S1_126264
ERETA.250437577              2        2         2         2         2
Alagoana363.250437472        2        2         2         2         2
CNPMF4.250370278             2        2         2         2         2
CNPMF19.250370327            1        2         2         2         2
CNPMF30.250370293            2        1         2         2         2

Building the G matrix

Again, we will use the AGHmatrix package [@amadeu_aghmatrix_2016] to build the G matrix:

G_matrix <- Gmatrix(geno,
                    method = "VanRaden",
                    ploidy = 2,
                    missingValue = NA)

Initial data: 
    Number of Individuals: 414 
    Number of Markers: 22779 

Missing data check: 
    Total SNPs: 22779 
     0 SNPs dropped due to missing data threshold of 1 
    Total of: 22779  SNPs 
MAF check: 
    No SNPs with MAF below 0 
Monomorphic check: 
    No monomorphic SNPs 
Summary check: 
    Initial:  22779 SNPs 
    Final:  22779  SNPs ( 0  SNPs removed) 
 
Completed! Time = 28.46  seconds

Now we have the whole G matrix (414 x 414), which we can represent using a heatmap:

Heatmap(
  G_matrix,
  show_row_names = F,
  show_column_names = F,
  heatmap_legend_param = list(title = "Res")
)

“Res” in the heatmap legend title is for “Resemblance”.

Genomic selection

1. Restraining the genotype means - only adj. means and genotyped:

For this purpose, we will use only individuals with BLUps and SNPs available.

pheno <- pheno %>%
  filter(ID_Clone %in% rownames(geno)) %>%
  droplevels()

2. Building the genotype and environment design matrices:

envmat = model.matrix( ~ -1 + Ano, data = pheno)

envmat[1:5, 1:4] %>% kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  ) %>% footnote("Dimension: 534 $\\times$ 4", general_title = "")

Ano2017	Ano2018	Ano2019	Ano2020
1	0	0	0
1	0	0	0
1	0	0	0
1	0	0	0
1	0	0	0
Dimension: 534 \(\times\) 4

genmat = model.matrix( ~ -1 + Clone, data = pheno)

genmat[1:5, 1:5] %>% kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  ) %>% footnote("Dimension: 534 $\\times$ 414", general_title = "")

CloneAipim Abacate	CloneAlagoana	CloneBGM-0004	CloneBGM-0019	CloneBGM-0030
0	1	0	0	0
0	0	0	0	0
0	0	0	0	0
0	0	0	0	0
0	0	0	0	0
Dimension: 534 \(\times\) 414

3. Building the environmental and genetic covariance matrices:

G = tcrossprod(tcrossprod(genmat, G_matrix), genmat)

G[1:5, 1:5] %>% kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  ) %>% footnote("Dimension: 534 $\\times$ 534", general_title = "")

1	2	3	4	5
1.0404654	-0.0759830	0.0667313	0.0020280	0.0188232
-0.0759830	1.2906265	-0.0615798	-0.0413384	-0.0153453
0.0667313	-0.0615798	0.9818020	0.0750449	0.1472075
0.0020280	-0.0413384	0.0750449	0.9779189	0.0018877
0.0188232	-0.0153453	0.1472075	0.0018877	1.0308067
Dimension: 534 \(\times\) 534

E = tcrossprod(envmat)

E[1:5, 1:5] %>% kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  ) %>% footnote("Dimension: 534 $\\times$ 534", general_title = "")

1	2	3	4	5
1	1	1	1	1
1	1	1	1	1
1	1	1	1	1
1	1	1	1	1
1	1	1	1	1
Dimension: 534 \(\times\) 534

4. Building the interaction matrix:

GE = G * E

GE[1:5, 1:5] %>% kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  ) %>% footnote("Dimension: 534 $\\times$ 534", general_title = "")

1	2	3	4	5
1.0404654	-0.0759830	0.0667313	0.0020280	0.0188232
-0.0759830	1.2906265	-0.0615798	-0.0413384	-0.0153453
0.0667313	-0.0615798	0.9818020	0.0750449	0.1472075
0.0020280	-0.0413384	0.0750449	0.9779189	0.0018877
0.0188232	-0.0153453	0.1472075	0.0018877	1.0308067
Dimension: 534 \(\times\) 534

GBLUP

In the GBLUP, we will use the G matrix instead of the SNPs matrix. Thus, we will obtain the GEBV directly. Note that we will need to build the G matrix again, since some genotypes were dropped after our filtering. The rrBLUP package has a function called “A.mat” that build the Additive Genomic Matrix from a SNP matrix with “-1,0,1” codification:

year <- levels(pheno$Ano)

traits <- colnames(pheno[3:20])

result <- data.frame()
result_G_BLUP <- data.frame()

for (i in traits) {
  
  for (j in year) {

    data <- pheno %>%
      filter(Ano == j) %>%
      droplevels()
    
    if (is.na(mean(data[[i]], na.rm = TRUE))) {
      next
    }
    
    index = rownames(geno) %in% data$ID_Clone
    data_gen = geno[index,]
    
    
    GBLUP <- mixed.solve(data[[i]], K = A.mat(data_gen - 1))
    if (dim(result)[1] == 0) {
      result <- data.frame(
        ID_Clone = rownames(GBLUP$u),
        Ano = j,
        stringsAsFactors = F
      )
      result[, i] <- data.frame(GBLUP$u)
    } else {
      
      result1 <- data.frame(
        ID_Clone = rownames(GBLUP$u),
        Ano = j,
        stringsAsFactors = F
      )
      
      result1[, i] <- data.frame(GBLUP$u)
      
      result <- rbind(result1, result)
    }
  }
  
  if (i == "NR.P") {
    result_G_BLUP <- result
  } else{
    
    if (dim(result)[1] == 0) {
      next
    }
    
    result_G_BLUP1 <- result
    
    result_G_BLUP <- result_G_BLUP %>% 
    full_join(result_G_BLUP1)
  }
    result<-data.frame()
}

Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")
Joining, by = c("ID_Clone", "Ano")

write.csv(result_G_BLUP, "output/result_G_BLUP.csv" , row.names = F,
  quote = F)

Aqui, iremos adicionar novamente os valores médios dos valores fenotípicos aos GEBVs obtidos pelo G-BLUP.

media_pheno_Multi <- read.csv("output/media_pheno_Multi.csv") %>% 
  mutate(Ano = as.factor(Ano))

phen<-
  data.frame(pheno %>% 
               group_by(ID_Clone, Ano) %>% 
               group_keys() %>% 
               arrange(ID_Clone, .by_group = TRUE),
             stringsAsFactors = F)

media_pheno_Multi <- full_join(phen, media_pheno_Multi)

Joining, by = "Ano"

traits <- colnames(result_G_BLUP[3:ncol(result_G_BLUP)])

result_G_BLUP <- result_G_BLUP %>% 
  arrange(ID_Clone, .by_group = TRUE)

for (i in traits) {
  phen[,i] <- result_G_BLUP[,i] + media_pheno_Multi[,i]
}

GEBVS_G_BLUP <- result_G_BLUP %>%
  pivot_longer(NR.P:Staygreen,
               names_to = "Variable",
               values_to = "GEBVS")

BLUPS_MED <- result_G_BLUP %>% 
  arrange(ID_Clone, .by_group = TRUE)

for (i in traits) {
  BLUPS_MED[,i] <- BLUPS_MED[,i] + media_pheno_Multi[, i]
}

BLUPS_MED <- BLUPS_MED %>%
  pivot_longer(NR.P:Staygreen,
               names_to = "Variable",
               values_to = "BLUPS")

data_gws_G_BLUP <- GEBVS_G_BLUP %>%
  full_join(BLUPS_MED)

Joining, by = c("ID_Clone", "Ano", "Variable")

Agora vamos produzir os gráficos para cada variável comparando a correlação entre os GEBVs e BLUPs.

data_gws_G_BLUP %>%
  ggplot(aes(x = BLUPS, y = GEBVS)) +
  geom_point(aes(color = GEBVS), show.legend = F) +
  geom_smooth(method = lm, se = F) +
  stat_poly_eq(formula = y ~ x, label.y = "bottom") +
  scale_color_gradient(low = '#c80000', high = '#FFFF00') +
  facet_wrap(. ~ Variable, ncol = 6, scales = "free") +
  expand_limits(y = 0, x = 0) +
  ggtitle("G-BLUP")

`geom_smooth()` using formula 'y ~ x'

Agora vamos comparar os modelos por meio de um boxplot.

data_gws_G_BLUP %>%
  pivot_longer(4:5, names_to = "Method", values_to = "Values") %>%
  ggplot(aes(x = , y = Values, fill = Method)) +
  geom_boxplot() +
  facet_wrap(. ~ Variable, ncol = 6, scales = "free")  +
  expand_limits(y = 0, x = 0) +
  ggtitle("G-BLUP")

Cross-validation

To prove that the prediction is accurate, we should perform a cross-validation (CV) scheme. For this purpose, we divide the data into a training set and a validation set.

First we separate the data into k folds. Then, we attribute NA for one fold and try to predict the data from this fold based on the others. When selecting the number of folds, one must prioritize the balance between the number of observations in each fold.

In addition, this process should be repeated for further validation. The step-by-step below will guide the CV in the data we are analysing.

1. Determine the number of folds and repetitions

nfolds = 5
nrept = 5

Since we defined 5 folds, our data will be divided into 5 parts with 83 observations each.

2. Match the data and the rows of the SNP matrix

all(rownames(geno) %in% pheno$ID_Clone)

[1] TRUE

all(pheno$ID_Clone %in% rownames(geno))

[1] TRUE

3. Croos-Validation

set.seed(100)

results <- list()
results_cv <- data.frame()

for (j in traits) {
  for (a in year) {
    data <- pheno %>%
      filter(Ano == a) %>%
      droplevels()
    
    if (is.na(mean(data[[j]], na.rm = TRUE))) {
      next
    }
    
    index = rownames(geno) %in% data$ID_Clone
    data_gen = geno[index, ]
    
    data$ID <- factor(1:nrow(data))
    
    sort <- list()
    
    for (r in 1:nrept) {
      for (f in 1:nfolds) {
        folds <- cvFolds(nlevels(data$ID),
                         type = "random",
                         K = 5,
                         R = 1)
        Sample <- cbind(folds$which, folds$subsets)
        cv <- split(Sample[, 2], f = Sample[, 1])
      }
      sort[[r]] <- cv
    }
    
    rm(r, folds, f, cv, Sample)
    
    fold.list <- sort
    
    for (z in 1:length(fold.list)) {
      for (i in 1:nfolds) {
        # Training set
        train_data <- data
        
        # Validation set
        train_data[train_data$ID %in% fold.list[[z]][[i]], j] <- NA
        
        # Fitting model
        GBLUP <-
          mixed.solve(train_data[[j]], K = A.mat(data_gen - 1))
        
        # GEBV
        Pred <- data.frame(Yhat = GBLUP$u, G = data$ID)
        
        rownames(Pred) <- rownames(data_gen)
        
        # Predicted GEBV
        
        results[[i]] <- Pred[Pred[, "G"] %in% fold.list[[z]][[i]],]
        
        # Remove
        rm(GBLUP, Pred, train_data)
      }
      
      GEBV <- do.call(rbind, results)
      GEBV <- GEBV[order(GEBV$G),]
      
      # Log
      log <- all(GEBV$G == data$ID)
      
      # Results
      result_cv <- data.frame(
        Ano = a,
        Trait = j,
        Rep = z,
        Log = log,
        Ac = round(cor(GEBV$Yhat, data[[j]], use = "na.or.complete"), 3),
        MSPE = round(mean(((GEBV$Yhat - data[[j]]) ^ 2
        ), na.rm = T), 3)
      )
      
      results_cv <-
        rbind(results_cv, result_cv)
    }
  }
}

results_cv %>%
  kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
    
  )

Ano	Trait	Rep	Log	Ac	MSPE
2017	NR.P	1	TRUE	0.069	2.715
2017	NR.P	2	TRUE	0.101	2.687
2017	NR.P	3	TRUE	0.162	2.642
2017	NR.P	4	TRUE	-0.024	2.784
2017	NR.P	5	TRUE	-0.002	2.785
2018	NR.P	1	TRUE	-0.075	0.487
2018	NR.P	2	TRUE	0.072	0.457
2018	NR.P	3	TRUE	0.057	0.462
2018	NR.P	4	TRUE	0.022	0.463
2018	NR.P	5	TRUE	0.054	0.459
2019	NR.P	1	TRUE	0.131	2.596
2019	NR.P	2	TRUE	0.048	2.699
2019	NR.P	3	TRUE	0.121	2.610
2019	NR.P	4	TRUE	0.229	2.520
2019	NR.P	5	TRUE	0.175	2.563
2020	NR.P	1	TRUE	0.566	1.339
2020	NR.P	2	TRUE	0.502	1.488
2020	NR.P	3	TRUE	0.598	1.262
2020	NR.P	4	TRUE	0.551	1.372
2020	NR.P	5	TRUE	0.492	1.511
2017	PTR	1	TRUE	0.147	1.810
2017	PTR	2	TRUE	0.224	1.754
2017	PTR	3	TRUE	0.220	1.756
2017	PTR	4	TRUE	0.108	1.856
2017	PTR	5	TRUE	0.097	1.862
2018	PTR	1	TRUE	0.098	2.186
2018	PTR	2	TRUE	0.055	2.243
2018	PTR	3	TRUE	0.091	2.209
2018	PTR	4	TRUE	0.053	2.231
2018	PTR	5	TRUE	0.108	2.179
2019	PTR	1	TRUE	0.196	5.646
2019	PTR	2	TRUE	0.078	6.027
2019	PTR	3	TRUE	0.069	6.034
2019	PTR	4	TRUE	0.134	5.806
2019	PTR	5	TRUE	0.071	6.055
2020	PTR	1	TRUE	0.439	10.569
2020	PTR	2	TRUE	0.465	10.164
2020	PTR	3	TRUE	0.466	10.105
2020	PTR	4	TRUE	0.483	9.895
2020	PTR	5	TRUE	0.456	10.285
2017	PPA	1	TRUE	0.240	9.100
2017	PPA	2	TRUE	0.308	8.717
2017	PPA	3	TRUE	0.343	8.463
2017	PPA	4	TRUE	0.270	8.992
2017	PPA	5	TRUE	0.423	8.015
2018	PPA	1	TRUE	0.228	2.781
2018	PPA	2	TRUE	0.191	2.848
2018	PPA	3	TRUE	0.227	2.786
2018	PPA	4	TRUE	0.249	2.746
2018	PPA	5	TRUE	0.223	2.794
2019	PPA	1	TRUE	0.073	8.448
2019	PPA	2	TRUE	0.178	8.042
2019	PPA	3	TRUE	0.128	8.215
2019	PPA	4	TRUE	0.064	8.461
2019	PPA	5	TRUE	0.179	8.032
2020	PPA	1	TRUE	0.368	64.976
2020	PPA	2	TRUE	0.325	68.539
2020	PPA	3	TRUE	0.286	71.632
2020	PPA	4	TRUE	0.408	62.612
2020	PPA	5	TRUE	0.384	64.212
2017	AP	1	TRUE	0.388	0.022
2017	AP	2	TRUE	0.381	0.023
2017	AP	3	TRUE	0.381	0.022
2017	AP	4	TRUE	0.389	0.022
2017	AP	5	TRUE	0.396	0.022
2018	AP	1	TRUE	0.043	0.009
2018	AP	2	TRUE	0.037	0.009
2018	AP	3	TRUE	-0.083	0.010
2018	AP	4	TRUE	-0.069	0.010
2018	AP	5	TRUE	-0.131	0.010
2019	AP	1	TRUE	0.153	0.014
2019	AP	2	TRUE	0.076	0.014
2019	AP	3	TRUE	0.047	0.015
2019	AP	4	TRUE	0.093	0.014
2019	AP	5	TRUE	0.152	0.014
2020	AP	1	TRUE	0.315	0.025
2020	AP	2	TRUE	0.260	0.026
2020	AP	3	TRUE	0.254	0.026
2020	AP	4	TRUE	0.210	0.027
2020	AP	5	TRUE	0.365	0.024
2017	HI	1	TRUE	0.133	60.564
2017	HI	2	TRUE	0.101	61.948
2017	HI	3	TRUE	0.153	60.296
2017	HI	4	TRUE	0.153	60.382
2017	HI	5	TRUE	0.171	59.205
2018	HI	1	TRUE	0.150	38.174
2018	HI	2	TRUE	0.029	39.828
2018	HI	3	TRUE	0.204	37.183
2018	HI	4	TRUE	0.097	38.903
2018	HI	5	TRUE	0.058	40.630
2019	HI	1	TRUE	-0.014	73.880
2019	HI	2	TRUE	-0.134	76.365
2019	HI	3	TRUE	-0.115	77.022
2019	HI	4	TRUE	-0.131	77.614
2019	HI	5	TRUE	-0.127	77.601
2020	HI	1	TRUE	0.408	39.800
2020	HI	2	TRUE	0.427	38.740
2020	HI	3	TRUE	0.454	37.223
2020	HI	4	TRUE	0.369	41.692
2020	HI	5	TRUE	0.472	36.383
2017	RF	1	TRUE	0.217	0.079
2017	RF	2	TRUE	0.206	0.080
2017	RF	3	TRUE	0.174	0.081
2017	RF	4	TRUE	0.259	0.078
2017	RF	5	TRUE	0.177	0.081
2018	RF	1	TRUE	0.401	0.143
2018	RF	2	TRUE	0.367	0.149
2018	RF	3	TRUE	0.396	0.144
2018	RF	4	TRUE	0.335	0.155
2018	RF	5	TRUE	0.409	0.143
2020	RF	1	TRUE	0.342	0.012
2020	RF	2	TRUE	0.389	0.012
2020	RF	3	TRUE	0.424	0.012
2020	RF	4	TRUE	0.310	0.013
2020	RF	5	TRUE	0.330	0.013
2017	CR	1	TRUE	0.200	5.347
2017	CR	2	TRUE	0.219	5.281
2017	CR	3	TRUE	0.289	5.097
2017	CR	4	TRUE	0.143	5.505
2017	CR	5	TRUE	0.177	5.428
2018	CR	1	TRUE	0.119	1.033
2018	CR	2	TRUE	0.109	1.037
2018	CR	3	TRUE	0.045	1.052
2018	CR	4	TRUE	0.016	1.072
2018	CR	5	TRUE	0.073	1.044
2019	CR	1	TRUE	0.276	3.833
2019	CR	2	TRUE	0.236	3.974
2019	CR	3	TRUE	0.326	3.691
2019	CR	4	TRUE	0.188	4.077
2019	CR	5	TRUE	0.254	3.883
2020	CR	1	TRUE	0.356	7.249
2020	CR	2	TRUE	0.357	7.245
2020	CR	3	TRUE	0.363	7.186
2020	CR	4	TRUE	0.364	7.205
2020	CR	5	TRUE	0.348	7.353
2017	DR	1	TRUE	0.203	8.381
2017	DR	2	TRUE	0.137	8.616
2017	DR	3	TRUE	0.168	8.513
2017	DR	4	TRUE	0.200	8.389
2017	DR	5	TRUE	0.173	8.499
2018	DR	1	TRUE	0.366	6.054
2018	DR	2	TRUE	0.203	6.834
2018	DR	3	TRUE	0.341	6.152
2018	DR	4	TRUE	0.304	6.307
2018	DR	5	TRUE	0.276	6.443
2019	DR	1	TRUE	0.129	14.179
2019	DR	2	TRUE	0.114	14.038
2019	DR	3	TRUE	0.229	13.356
2019	DR	4	TRUE	0.039	14.774
2019	DR	5	TRUE	0.213	13.399
2020	DR	1	TRUE	0.480	11.290
2020	DR	2	TRUE	0.436	11.921
2020	DR	3	TRUE	0.435	11.961
2020	DR	4	TRUE	0.469	11.446
2020	DR	5	TRUE	0.385	12.679
2017	DCaule	1	TRUE	0.259	0.048
2017	DCaule	2	TRUE	0.246	0.048
2017	DCaule	3	TRUE	0.235	0.048
2017	DCaule	4	TRUE	0.237	0.048
2017	DCaule	5	TRUE	0.212	0.049
2018	DCaule	1	TRUE	0.168	0.024
2018	DCaule	2	TRUE	0.023	0.025
2018	DCaule	3	TRUE	0.134	0.024
2018	DCaule	4	TRUE	0.094	0.024
2018	DCaule	5	TRUE	0.140	0.024
2019	DCaule	1	TRUE	0.282	0.019
2019	DCaule	2	TRUE	0.176	0.020
2019	DCaule	3	TRUE	0.128	0.020
2019	DCaule	4	TRUE	0.238	0.019
2019	DCaule	5	TRUE	0.292	0.019
2020	DCaule	1	TRUE	0.341	0.031
2020	DCaule	2	TRUE	0.178	0.035
2020	DCaule	3	TRUE	0.302	0.033
2020	DCaule	4	TRUE	0.417	0.029
2020	DCaule	5	TRUE	0.388	0.030
2017	Ácaro	1	TRUE	0.233	0.074
2017	Ácaro	2	TRUE	0.325	0.069
2017	Ácaro	3	TRUE	0.188	0.075
2017	Ácaro	4	TRUE	0.319	0.070
2017	Ácaro	5	TRUE	0.236	0.073
2018	Ácaro	1	TRUE	0.279	0.106
2018	Ácaro	2	TRUE	0.330	0.101
2018	Ácaro	3	TRUE	0.236	0.108
2018	Ácaro	4	TRUE	0.325	0.102
2018	Ácaro	5	TRUE	0.295	0.104
2020	Ácaro	1	TRUE	0.608	0.070
2020	Ácaro	2	TRUE	0.555	0.076
2020	Ácaro	3	TRUE	0.570	0.074
2020	Ácaro	4	TRUE	0.532	0.079
2020	Ácaro	5	TRUE	0.582	0.073
2018	MS	1	TRUE	0.256	3.738
2018	MS	2	TRUE	0.069	3.951
2018	MS	3	TRUE	0.064	3.962
2018	MS	4	TRUE	0.080	3.957
2018	MS	5	TRUE	0.120	3.889
2019	MS	1	TRUE	0.317	3.378
2019	MS	2	TRUE	0.298	3.430
2019	MS	3	TRUE	0.230	3.618
2019	MS	4	TRUE	0.305	3.395
2019	MS	5	TRUE	0.340	3.308
2020	MS	1	TRUE	0.156	19.956
2020	MS	2	TRUE	0.188	19.558
2020	MS	3	TRUE	0.144	20.024
2020	MS	4	TRUE	0.193	19.513
2020	MS	5	TRUE	0.180	19.591
2018	PROD.AMD	1	TRUE	0.045	0.175
2018	PROD.AMD	2	TRUE	0.121	0.167
2018	PROD.AMD	3	TRUE	0.183	0.162
2018	PROD.AMD	4	TRUE	0.134	0.166
2018	PROD.AMD	5	TRUE	0.077	0.176
2019	PROD.AMD	1	TRUE	0.105	0.246
2019	PROD.AMD	2	TRUE	0.129	0.244
2019	PROD.AMD	3	TRUE	0.063	0.251
2019	PROD.AMD	4	TRUE	0.126	0.245
2019	PROD.AMD	5	TRUE	0.174	0.239
2020	PROD.AMD	1	TRUE	0.376	0.760
2020	PROD.AMD	2	TRUE	0.380	0.753
2020	PROD.AMD	3	TRUE	0.351	0.785
2020	PROD.AMD	4	TRUE	0.352	0.780
2020	PROD.AMD	5	TRUE	0.411	0.727
2018	AMD	1	TRUE	0.106	3.859
2018	AMD	2	TRUE	0.126	3.821
2018	AMD	3	TRUE	0.179	3.751
2018	AMD	4	TRUE	0.039	4.016
2018	AMD	5	TRUE	0.138	3.828
2019	AMD	1	TRUE	0.256	3.556
2019	AMD	2	TRUE	0.393	3.188
2019	AMD	3	TRUE	0.293	3.432
2019	AMD	4	TRUE	0.315	3.382
2019	AMD	5	TRUE	0.253	3.533
2020	AMD	1	TRUE	0.166	19.855
2020	AMD	2	TRUE	0.279	18.558
2020	AMD	3	TRUE	0.182	19.655
2020	AMD	4	TRUE	0.203	19.435
2020	AMD	5	TRUE	0.213	19.264
2018	Porte	1	TRUE	0.264	0.057
2018	Porte	2	TRUE	0.344	0.053
2018	Porte	3	TRUE	0.283	0.055
2018	Porte	4	TRUE	0.357	0.052
2018	Porte	5	TRUE	0.383	0.050
2019	Porte	1	TRUE	0.396	0.432
2019	Porte	2	TRUE	0.346	0.449
2019	Porte	3	TRUE	0.383	0.437
2019	Porte	4	TRUE	0.301	0.463
2019	Porte	5	TRUE	0.367	0.442
2020	Porte	1	TRUE	0.263	0.182
2020	Porte	2	TRUE	0.284	0.179
2020	Porte	3	TRUE	0.258	0.184
2020	Porte	4	TRUE	0.267	0.182
2020	Porte	5	TRUE	0.265	0.183
2018	Nº.Hastes	1	TRUE	0.179	0.039
2018	Nº.Hastes	2	TRUE	0.182	0.039
2018	Nº.Hastes	3	TRUE	0.134	0.040
2018	Nº.Hastes	4	TRUE	0.014	0.042
2018	Nº.Hastes	5	TRUE	0.104	0.040
2019	Nº.Hastes	1	TRUE	-0.028	0.349
2019	Nº.Hastes	2	TRUE	0.024	0.346
2019	Nº.Hastes	3	TRUE	0.007	0.345
2019	Nº.Hastes	4	TRUE	0.096	0.336
2019	Nº.Hastes	5	TRUE	0.127	0.334
2020	Nº.Hastes	1	TRUE	0.314	0.148
2020	Nº.Hastes	2	TRUE	0.335	0.145
2020	Nº.Hastes	3	TRUE	0.318	0.147
2020	Nº.Hastes	4	TRUE	0.382	0.140
2020	Nº.Hastes	5	TRUE	0.326	0.146
2018	Stand_Final	1	TRUE	0.073	0.004
2018	Stand_Final	2	TRUE	0.203	0.003
2018	Stand_Final	3	TRUE	0.153	0.003
2018	Stand_Final	4	TRUE	0.091	0.004
2018	Stand_Final	5	TRUE	0.199	0.003
2019	Stand_Final	1	TRUE	0.014	0.126
2019	Stand_Final	2	TRUE	-0.010	0.125
2019	Stand_Final	3	TRUE	-0.082	0.128
2019	Stand_Final	4	TRUE	-0.093	0.128
2019	Stand_Final	5	TRUE	-0.063	0.125
2020	Stand_Final	1	TRUE	0.249	0.534
2020	Stand_Final	2	TRUE	0.193	0.549
2020	Stand_Final	3	TRUE	0.295	0.527
2020	Stand_Final	4	TRUE	0.206	0.545
2020	Stand_Final	5	TRUE	0.136	0.563
2018	Branching_Level	1	TRUE	0.033	0.034
2018	Branching_Level	2	TRUE	0.092	0.033
2018	Branching_Level	3	TRUE	0.096	0.033
2018	Branching_Level	4	TRUE	0.118	0.033
2018	Branching_Level	5	TRUE	0.159	0.033
2019	Branching_Level	1	TRUE	0.182	0.281
2019	Branching_Level	2	TRUE	0.143	0.286
2019	Branching_Level	3	TRUE	0.183	0.281
2019	Branching_Level	4	TRUE	0.193	0.280
2019	Branching_Level	5	TRUE	0.202	0.277
2020	Branching_Level	1	TRUE	0.436	0.150
2020	Branching_Level	2	TRUE	0.465	0.144
2020	Branching_Level	3	TRUE	0.468	0.143
2020	Branching_Level	4	TRUE	0.452	0.146
2020	Branching_Level	5	TRUE	0.529	0.132
2018	Staygreen	1	TRUE	-0.029	0.001
2018	Staygreen	2	TRUE	-0.130	0.001
2018	Staygreen	3	TRUE	-0.145	0.001
2018	Staygreen	4	TRUE	-0.003	0.001
2018	Staygreen	5	TRUE	0.032	0.001

write.csv(results_cv, "output/results_cv.csv",    row.names = F,
  quote = F)

The object “result_cv” is divided by repetition. In the “result_cv” objects for each repetition, “Rep” is the number of the repetition, “Log” is a diagnostic indicating if the order of the predicted breeding values matches the order of the adjusted means, “Ac” is the prediction accuracy (correlation between the GEBV and adjusted means), and “MSPE” is the mean square prediction error (the lower, the better).

Agora vamos plotar os resultados de acurácias para cada característica

results_cv %>%
  group_by(Ano, Trait) %>%
  summarise(
    mean.Ac = mean(Ac),
    sd.Ac = sd(Ac),
    mean.MSPE = mean(MSPE),
    sd.MSPE = sd(MSPE)
  ) %>%
  ggplot(aes(x = Ano, y = mean.Ac , fill = Trait)) +
  geom_col(alpha = 0.8,
           width = 0.5,
           show.legend = F) +
  geom_errorbar(
    aes(ymin = mean.Ac - sd.Ac, ymax = mean.Ac + sd.Ac),
    width = .2,
    position = position_dodge(.9)
  ) +
  facet_wrap(. ~ Trait, ncol = 6) +
  expand_limits(y = c(0,1)) +
   theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1),
         text = element_text(size = 15)) +
  labs(y = "Accuracy")

`summarise()` has grouped output by 'Ano'. You can override using the `.groups`
argument.

ggsave("output/accuracy_multi.png", width = 16, height = 8, dpi =300)

sessionInfo()

R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252   
[3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C                      
[5] LC_TIME=Portuguese_Brazil.1252    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] cvTools_0.3.2            robustbase_0.95-0        lattice_0.20-45         
 [4] ggpmisc_0.5.1            ggpp_0.4.5               rrBLUP_4.6.1            
 [7] ComplexHeatmap_2.10.0    AGHmatrix_2.0.4          genomicMateSelectR_0.2.0
[10] janitor_2.1.0            kableExtra_1.3.4         forcats_0.5.2           
[13] stringr_1.4.1            dplyr_1.0.10             purrr_0.3.4             
[16] readr_2.1.2              tidyr_1.2.1              tibble_3.1.8            
[19] ggplot2_3.3.6            tidyverse_1.3.2          readxl_1.4.1            

loaded via a namespace (and not attached):
  [1] googledrive_2.0.0   colorspace_2.0-3    rjson_0.2.21       
  [4] ellipsis_0.3.2      rprojroot_2.0.3     circlize_0.4.15    
  [7] snakecase_0.11.0    GlobalOptions_0.1.2 fs_1.5.2           
 [10] clue_0.3-61         rstudioapi_0.14     farver_2.1.1       
 [13] MatrixModels_0.5-1  fansi_1.0.3         lubridate_1.8.0    
 [16] xml2_1.3.3          splines_4.1.3       codetools_0.2-18   
 [19] doParallel_1.0.17   cachem_1.0.6        confintr_0.2.0     
 [22] knitr_1.40          polynom_1.4-1       jsonlite_1.8.0     
 [25] workflowr_1.7.0     broom_1.0.1         cluster_2.1.2      
 [28] dbplyr_2.2.1        png_0.1-7           compiler_4.1.3     
 [31] httr_1.4.4          backports_1.4.1     assertthat_0.2.1   
 [34] Matrix_1.5-1        fastmap_1.1.0       gargle_1.2.1       
 [37] cli_3.3.0           later_1.3.0         htmltools_0.5.3    
 [40] quantreg_5.94       tools_4.1.3         gtable_0.3.1       
 [43] glue_1.6.2          Rcpp_1.0.9          cellranger_1.1.0   
 [46] jquerylib_0.1.4     vctrs_0.4.1         nlme_3.1-159       
 [49] svglite_2.1.0       iterators_1.0.14    xfun_0.32          
 [52] rvest_1.0.3         lifecycle_1.0.3     googlesheets4_1.0.1
 [55] DEoptimR_1.0-11     MASS_7.3-58.1       zoo_1.8-11         
 [58] scales_1.2.1        ragg_1.2.3          hms_1.1.2          
 [61] promises_1.2.0.1    parallel_4.1.3      SparseM_1.81       
 [64] RColorBrewer_1.1-3  yaml_2.3.5          sass_0.4.2         
 [67] stringi_1.7.6       highr_0.9           S4Vectors_0.32.4   
 [70] foreach_1.5.2       BiocGenerics_0.40.0 boot_1.3-28        
 [73] shape_1.4.6         rlang_1.0.6         pkgconfig_2.0.3    
 [76] systemfonts_1.0.4   matrixStats_0.62.0  evaluate_0.17      
 [79] labeling_0.4.2      tidyselect_1.2.0    magrittr_2.0.3     
 [82] R6_2.5.1            magick_2.7.3        IRanges_2.28.0     
 [85] generics_0.1.3      DBI_1.1.3           mgcv_1.8-40        
 [88] pillar_1.8.1        haven_2.5.1         withr_2.5.0        
 [91] survival_3.4-0      modelr_0.1.9        crayon_1.5.2       
 [94] utf8_1.2.2          tzdb_0.3.0          rmarkdown_2.17     
 [97] GetoptLong_1.0.5    git2r_0.30.1        reprex_2.0.2       
[100] digest_0.6.29       webshot_0.5.4       httpuv_1.6.5       
[103] textshaping_0.3.6   stats4_4.1.3        munsell_0.5.0      
[106] viridisLite_0.4.1   bslib_0.4.0

Weverton Gomes da Costa, Pós-Doutorando, Embrapa Mandioca e Fruticultura, wevertonufv@gmail.com ↩︎