Last updated: 2019-08-31

Checks: 7 passed, 0 failed

Knit directory: 2019-feature-selection/

This reproducible R Markdown analysis was created with workflowr (version 1.4.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20190522) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
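A minimal base-R sketch of why this works: resetting the same seed reproduces the same pseudo-random draws, for example when drawing a subsample. The vector `1:100` and the subsample size of 5 are arbitrary choices for illustration.

```r
# Draw a random subsample after setting the seed used by workflowr
set.seed(20190522)
first_draw <- sample(1:100, size = 5)

# Reset the same seed and draw again
set.seed(20190522)
second_draw <- sample(1:100, size = 5)

# Identical seeds yield identical draws within the same R version
identical(first_draw, second_draw)  # TRUE
```

Note that the draws are only guaranteed to match under the same R version and random number generator settings; this is one reason the session information is recorded at the end of the report.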

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .Ruserdata/
    Ignored:    .drake/
    Ignored:    analysis/rosm.cache/
    Ignored:    data/
    Ignored:    inst/Benchmark for Filter Methods for Feature Selection in High-Dimensional  Classification Data.pdf
    Ignored:    inst/study-area-map/study-area.qgs~
    Ignored:    log/
    Ignored:    packrat/lib-R/
    Ignored:    packrat/lib-ext/
    Ignored:    packrat/lib/
    Ignored:    reviews/
    Ignored:    rosm.cache/
    Ignored:    tests/

Untracked files:
    Untracked:  .drake_history/

Unstaged changes:
    Modified:   _drake.R
    Modified:   analysis/eval-performance.Rmd
    Modified:   code/05-modeling/paper/tune-wrapper.R
    Modified:   code/06-benchmark-matrix.R
    Modified:   code/061-aggregate.R
    Modified:   code/98-paper/ieee/pdf/correlation-filter-nri-1.pdf
    Modified:   code/98-paper/ieee/pdf/correlation-nbins-1.pdf
    Modified:   code/98-paper/journal/defoliation-distribution-plot-1.pdf
    Modified:   code/move-figures.R
    Modified:   docs/figure/eda.Rmd/defoliation-distribution-plot-1.pdf
    Modified:   docs/figure/eval-performance.Rmd/performance-results-1.pdf
    Modified:   docs/figure/filter-correlation.Rmd/correlation-filter-nri-1.pdf
    Modified:   docs/figure/filter-correlation.Rmd/correlation-nbins-1.pdf

Staged changes:
    Modified:   analysis/response-normality.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
html df85aba pat-s 2019-07-12 Build site.
html 3a44a95 pat-s 2019-07-10 Build site.
html db3955e pat-s 2019-06-27 Build site.
html f389b09 pat-s 2019-06-27 Build site.
Rmd ca7205f pat-s 2019-06-27 add new report
html 07cbfb6 pat-s 2019-05-29 Build site.
Rmd 8f54a19 pat-s 2019-05-29 wflow_publish(knitr_in(“analysis/report-defoliation.Rmd”), view =
html 49da4b7 pat-s 2019-05-27 Build site.
html 47a5b39 pat-s 2019-05-25 Build site.
html 5653514 pat-s 2019-05-24 Build site.
html 473adec pat-s 2019-05-24 Build site.
Rmd 9cc304d pat-s 2019-05-24 wflow_publish(knitr_in(“analysis/report-defoliation.Rmd”), view =
html ffa39d4 pat-s 2019-05-24 Build site.
html f3b12f5 pat-s 2019-05-24 Build site.
html b5a9ba4 pat-s 2019-05-24 Build site.
html 9a62e32 pat-s 2019-05-24 Build site.
Rmd 64e0c53 pat-s 2019-05-24 wflow_publish(“analysis/report-defoliation.Rmd”)
html 33132af pat-s 2019-05-23 Build site.
Rmd 826acd8 pat-s 2019-05-23 wflow_publish(“analysis/report-defoliation.Rmd”)
html 6517dc7 pat-s 2019-05-23 Build site.
html cd4346d pat-s 2019-05-23 Build site.
Rmd 6f93d87 pat-s 2019-05-23 wflow_publish(all = T)
html 22eb53d pat-s 2019-05-23 Build site.
html 66ad991 pat-s 2019-05-23 Build site.
Rmd a7ae8f1 pat-s 2019-05-23 wflow_publish(all = T)
html a7ae8f1 pat-s 2019-05-23 wflow_publish(all = T)
html 39ea046 pat-s 2019-05-22 Build site.
Rmd b02b9da pat-s 2019-05-22 restructure for workflowr

Predicted defoliation (absolute)

Predicted defoliation by XGBOOST at 20 m spatial resolution in the Basque Country.

The model reaches an RMSE of 36 % (response range 0-100 %). It achieves this acceptable test-set performance by concentrating its predictions around the mean of the response range.
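Why shrinking predictions toward the mean keeps the RMSE bounded can be shown with a small base-R sketch. It uses synthetic, uniformly distributed values on 0-100 as an assumption for illustration (not the defoliation data): among all constant predictions, the mean minimizes the RMSE, which then equals the (population) standard deviation of the response.

```r
# Root mean squared error between observed and predicted values
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Synthetic response on a 0-100 scale (illustrative only)
set.seed(1)
observed <- runif(200, min = 0, max = 100)

# A "model" that always predicts the mean of the response
mean_only <- rep(mean(observed), length(observed))
rmse_mean <- rmse(observed, mean_only)

# Population standard deviation of the response: equal to rmse_mean
sd_pop <- sqrt(mean((observed - mean(observed))^2))

# Any other constant does worse, e.g. predicting 20 everywhere
rmse_const20 <- rmse(observed, rep(20, length(observed)))

c(rmse_mean = rmse_mean, sd_pop = sd_pop, rmse_const20 = rmse_const20)
```

The reason is that for a constant prediction `c`, the mean squared error decomposes into the variance of the response plus `(mean - c)^2`, so any deviation from the mean only adds error.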

The following vegetation indices were used as predictors; they were selected based on the internal variable importance of XGBOOST:

  • EVI
  • GDVI4
  • GDVI3
  • GDVI2
  • D1
  • mNDVI
  • mSR
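To make two of the listed indices concrete, a minimal sketch of their commonly used formulas in base R follows. It assumes generic broadband reflectances scaled to 0-1 as inputs; the exact wavelengths used for these indices in this analysis are not stated in this report, and D1, mNDVI and mSR are left out because their definitions vary between sources. The function names `evi()` and `gdvi()` are hypothetical helpers for illustration.

```r
# Enhanced Vegetation Index (standard coefficients G = 2.5, C1 = 6,
# C2 = 7.5, L = 1); inputs are reflectances on a 0-1 scale
evi <- function(nir, red, blue) {
  2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)
}

# Generalized Difference Vegetation Index with power n;
# GDVI2, GDVI3 and GDVI4 correspond to n = 2, 3 and 4,
# and n = 1 reduces to the classic NDVI
gdvi <- function(nir, red, n) {
  (nir^n - red^n) / (nir^n + red^n)
}

# Example with made-up reflectances
evi(nir = 0.45, red = 0.05, blue = 0.03)
gdvi(nir = 0.45, red = 0.05, n = 2:4)  # GDVI2, GDVI3, GDVI4
```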

Optimization path of hyperparameter tuning

The optimization path (RMSE) of the XGBOOST hyperparameter tuning is shown below. It covers 20 model-based optimization (MBO) iterations, run after an initial design of 30 points.

 [1] 43.17171 44.46635 43.42093 43.82827 44.53159 43.86555 41.70989
 [8] 37.23756 37.77271 37.66704 39.38304 40.50224 36.66701 36.22678
[15] 38.25126 36.93972 39.12485 42.45889 38.05913 39.47527
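The path is not monotone because every MBO iteration evaluates a newly proposed configuration; what matters for tuning is the best value found so far. A base-R sketch that summarizes the printed path, assuming each of the 20 printed values corresponds to one MBO iteration:

```r
# RMSE per MBO iteration, copied from the optimization path above
opt_path_rmse <- c(43.17171, 44.46635, 43.42093, 43.82827, 44.53159,
                   43.86555, 41.70989, 37.23756, 37.77271, 37.66704,
                   39.38304, 40.50224, 36.66701, 36.22678, 38.25126,
                   36.93972, 39.12485, 42.45889, 38.05913, 39.47527)

# Best RMSE found so far after each iteration (non-increasing)
best_so_far <- cummin(opt_path_rmse)

# Iteration with the lowest RMSE and its value
best_iter <- which.min(opt_path_rmse)
best_iter                  # 14
opt_path_rmse[best_iter]   # 36.22678
```

Plotting `best_so_far` against the iteration number is a quick way to judge whether more MBO iterations would likely have paid off; here the running best does not improve after iteration 14.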

Some more details on the optimization path are presented below:

Predicted defoliation (relative)

Because these absolute values do not reflect actual defoliation levels, we derived a relative index from the predicted absolute values.

This was done by calling

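# Divide each value by max/100: the maximum becomes 100 and all values
# keep their proportion to it (scale() returns a one-column matrix)
scale(data$defoliation, center = FALSE,
      scale = max(data$defoliation, na.rm = TRUE)/100)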

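which rescales the data so that the largest predicted value equals 100; for non-negative predictions, all values then lie between 0 and 100. The code was adapted from a Stack Overflow answer.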
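A short base-R sketch of what this call does, using a small made-up vector in place of `data$defoliation` (illustrative values, not the study data). It shows that the call is equivalent to dividing by the maximum and multiplying by 100, and, for comparison only, a min-max alternative that would map the minimum to exactly 0; the report does not use this alternative.

```r
# Made-up predicted absolute defoliation values, including a missing value
defoliation <- c(30, 42, 55, NA, 61)

# Rescaling as in the report; as.numeric() drops the matrix structure
relative <- as.numeric(scale(defoliation, center = FALSE,
                             scale = max(defoliation, na.rm = TRUE) / 100))

# Equivalent plain arithmetic: the maximum maps to 100, NA stays NA
relative_manual <- defoliation / max(defoliation, na.rm = TRUE) * 100

# Alternative (not used in the report): min-max scaling to exactly 0-100
rng <- range(defoliation, na.rm = TRUE)
min_max <- (defoliation - rng[1]) / (rng[2] - rng[1]) * 100
```

With the report's approach the smallest value here becomes about 49 rather than 0, so the relative index preserves ratios to the maximum instead of stretching the values over the full 0-100 range.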

Defoliation evaluation

Absolute defoliation

Maps

2017

2018

Histograms

2017

2018

Relative defoliation

Maps

2017

2018

Histograms

2017

2018

R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/R/3.5.2/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/libopenblaso-r0.3.3.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] gridExtra_2.3        tidyselect_0.2.5     ggcorrplot_0.1.3    
 [4] kernlab_0.9-27       mRMRe_2.0.9          igraph_1.2.4        
 [7] survival_2.43-3      praznik_5.0.0        workflowr_1.4.0     
[10] here_0.1             ggpubr_0.1.6         ggspatial_1.0.3     
[13] gdalUtils_2.0.1.7    getSpatialData_0.0.4 knitr_1.23          
[16] rgenoud_5.8-3.0      parallelMap_1.4      mlrMBO_1.1.2        
[19] smoof_1.5.1          checkmate_1.9.1      BBmisc_1.11         
[22] mapview_2.3.0        stringr_1.4.0        fs_1.2.6            
[25] curl_3.3             mlrCPO_0.3.4-2       mlr_2.15.0          
[28] ParamHelpers_1.12    data.table_1.12.0    magrittr_1.5        
[31] future.apply_1.1.0   future.callr_0.4.0   furrr_0.1.0.9002    
[34] future_1.11.1.1      R.utils_2.8.0        R.oo_1.22.0         
[37] R.methodsS3_1.7.1    glue_1.3.0           purrr_0.3.0         
[40] sf_0.7-7             dplyr_0.8.0.1        hsdar_0.5.2         
[43] caret_6.0-81         ggplot2_3.1.0        lattice_0.20-38     
[46] signal_0.7-6         rootSolve_1.7        raster_2.8-4        
[49] rgdal_1.4-4          sp_1.3-1             drake_7.5.2         

loaded via a namespace (and not attached):
  [1] backports_1.1.3    fastmatch_1.1-0    mapedit_0.4.3     
  [4] plyr_1.8.4         lazyeval_0.2.1     splines_3.5.2     
  [7] storr_1.2.1        crosstalk_1.0.0    listenv_0.7.0     
 [10] leaflet_2.0.2      usethis_1.4.0      digest_0.6.18     
 [13] foreach_1.4.4      htmltools_0.3.6    memoise_1.1.0     
 [16] base64url_1.4      remotes_2.0.2      recipes_0.1.4     
 [19] globals_0.12.4     gower_0.1.2        prettyunits_1.0.2 
 [22] colorspace_1.4-0   xfun_0.5           DiceKriging_1.5.6 
 [25] callr_3.1.1        crayon_1.3.4       jsonlite_1.6      
 [28] iterators_1.0.10   gtable_0.2.0       ipred_0.9-8       
 [31] webshot_0.5.1      pkgbuild_1.0.2     abind_1.4-5       
 [34] scales_1.0.0       GGally_1.4.0       DBI_1.0.0         
 [37] Rcpp_1.0.0         viridisLite_0.3.0  xtable_1.8-3      
 [40] units_0.6-2        txtq_0.1.4         stats4_3.5.2      
 [43] lava_1.6.5         prodlim_2018.04.18 htmlwidgets_1.3   
 [46] httr_1.4.0         RColorBrewer_1.1-2 reshape_0.8.8     
 [49] pkgconfig_2.0.2    nnet_7.3-12        RJSONIO_1.3-1.1   
 [52] labeling_0.3       rlang_0.3.4        reshape2_1.4.3    
 [55] later_0.8.0        munsell_0.5.0      tools_3.5.2       
 [58] cli_1.1.0          generics_0.0.2     devtools_2.0.1    
 [61] evaluate_0.13      yaml_2.2.0         ModelMetrics_1.2.2
 [64] processx_3.2.1     satellite_1.0.1    nlme_3.1-137      
 [67] whisker_0.3-2      mime_0.6           xml2_1.2.0        
 [70] compiler_3.5.2     prettymapr_0.2.2   plotly_4.8.0      
 [73] filelock_1.0.2     png_0.1-7          testthat_2.0.1    
 [76] e1071_1.7-0.1      tibble_2.0.1       lhs_1.0.1         
 [79] stringi_1.3.1      ps_1.3.0           desc_1.2.0        
 [82] plot3D_1.1.1       Matrix_1.2-15      classInt_0.3-1    
 [85] rosm_0.2.4         pillar_1.3.1       httpuv_1.4.5.1    
 [88] R6_2.4.0           promises_1.0.1     sessioninfo_1.1.1 
 [91] codetools_0.2-16   MASS_7.3-51.1      assertthat_0.2.0  
 [94] pkgload_1.0.2      rprojroot_1.3-2    withr_2.1.2       
 [97] parallel_3.5.2     rpart_4.1-13       timeDate_3043.102 
[100] tidyr_0.8.2        class_7.3-15       rmarkdown_1.13    
[103] misc3d_0.8-4       mco_1.0-15.1       git2r_0.24.0      
[106] getPass_0.2-2      shiny_1.2.0        lubridate_1.7.4   
[109] base64enc_0.1-3