Last updated: 2019-12-10

Checks: 7 passed, 0 failed

Knit directory: 2019-feature-selection/

This reproducible R Markdown analysis was created with workflowr (version 1.5.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20190522) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
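As a minimal illustration of why the seed matters (a sketch, not code from the analysis; the `sample()` call merely stands in for any random step such as subsampling):

```r
# Re-running a random step after resetting the same seed gives identical draws.
set.seed(20190522)
first_draw <- sample(1:100, 5)

set.seed(20190522)
second_draw <- sample(1:100, 5)

# Both draws match because the random number generator started from the same state.
identical(first_draw, second_draw)
```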

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .Ruserdata/
    Ignored:    .drake/
    Ignored:    analysis/rosm.cache/
    Ignored:    data/
    Ignored:    inst/Benchmark for Filter Methods for Feature Selection in High-Dimensional  Classification Data.pdf
    Ignored:    inst/study-area-map/study-area.qgs~
    Ignored:    log/
    Ignored:    renv/library/
    Ignored:    renv/staging/
    Ignored:    reviews/
    Ignored:    rosm.cache/

Unstaged changes:
    Modified:   _drake.R
    Modified:   analysis/report-defoliation.Rmd
    Modified:   code/07-reports.R
    Modified:   code/98-paper/ieee/pdf/correlation-filter-nri-1.pdf
    Modified:   code/98-paper/ieee/pdf/correlation-nbins-1.pdf
    Modified:   code/98-paper/ieee/pdf/defoliation-distribution-plot-1.pdf
    Modified:   code/98-paper/ieee/pdf/spectral-signatures-1.pdf
    Modified:   code/98-paper/ieee/performance-best-per-learner.tex
    Modified:   code/98-paper/ieee/performance-top-20.tex
    Modified:   code/98-paper/journal/defoliation-distribution-plot-1.pdf
    Modified:   code/98-paper/presentation/table-best-learner-per-task.rda
    Modified:   code/98-paper/presentation/table-perf.rda
    Modified:   code/99-packages.R
    Modified:   slurm_clustermq.tmpl

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd 951e98c pat-s 2019-12-10 wflow_publish(knitr_in("analysis/eval-performance.Rmd"), view = FALSE, verbose =
Rmd b27c623 pat-s 2019-12-09 Standard Error -> SE
Rmd 559a59d pat-s 2019-12-09 add scatterplots to vis BM perf
html 482a158 pat-s 2019-11-01 Build site.
html becf5ea pat-s 2019-11-01 Build site.
html bd7c7f5 pat-s 2019-10-31 Build site.
html 62ff96f pat-s 2019-10-07 Build site.
html a947654 pat-s 2019-10-02 Build site.
html 49da171 pat-s 2019-09-22 Build site.
html c6317a8 pat-s 2019-09-19 Build site.
Rmd d7c72a8 pat-s 2019-09-19 wflow_publish(knitr_in("analysis/eval-performance.Rmd"), view = FALSE,
html 7fd40ca pat-s 2019-09-18 Build site.
Rmd 44ff84b pat-s 2019-09-18 wflow_publish(knitr_in("analysis/eval-performance.Rmd"), view = FALSE,
html 41aae14 pat-s 2019-09-12 Build site.
html ff340b8 pat-s 2019-09-03 Build site.
Rmd a524819 pat-s 2019-09-03 wflow_publish(knitr_in("analysis/eval-performance.Rmd"), view = FALSE,
html b181c52 pat-s 2019-09-02 Build site.
Rmd cf6e820 pat-s 2019-09-02 wflow_publish("analysis/eval-performance.Rmd")
Rmd 1bec10d pat-s 2019-09-01 no timestamps in latex tables
html 4e363ac pat-s 2019-09-01 Build site.
Rmd 518d0cb pat-s 2019-09-01 style files using tidyverse style
html 8e7e4fe pat-s 2019-09-01 Build site.
Rmd 8941bca pat-s 2019-09-01 wflow_publish(knitr_in("analysis/eval-performance.Rmd"), view = FALSE,
Rmd 297ed93 pat-s 2019-08-31 add filter vs no filter comparison plot
html 7582c67 pat-s 2019-08-31 Build site.
html abd531f pat-s 2019-08-31 Build site.
Rmd 9117eee pat-s 2019-08-31 wflow_publish(knitr_in("analysis/eval-performance.Rmd"), view = FALSE,
Rmd 5b629cb pat-s 2019-08-19 add new tasks to performance eval report
html 1ec8768 pat-s 2019-08-17 Build site.
html df85aba pat-s 2019-07-12 Build site.
html 3a44a95 pat-s 2019-07-10 Build site.
html c238ce4 pat-s 2019-07-10 Build site.
Rmd e98cb01 pat-s 2019-07-10 wflow_publish(knitr_in("analysis/eval-performance.Rmd"), view = FALSE,
Rmd 24e318f pat-s 2019-07-01 update reports
Rmd ca5c5bc pat-s 2019-06-28 add eval-performance report

Warning: Expected 2 pieces. Additional pieces discarded in 72 rows [21, 22,
23, 24, 57, 58, 59, 60, 93, 94, 95, 96, 145, 146, 147, 148, 181, 182, 183,
184, ...].
Warning: Expected 2 pieces. Missing pieces filled with `NA` in 168 rows [1,
2, 3, 4, 37, 38, 39, 40, 73, 74, 75, 76, 109, 110, 111, 112, 113, 114, 115,
116, ...].

Aggregate performances and add standard error column.
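The aggregation step can be sketched in base R as follows. This is an illustration under stated assumptions, not the report's actual code: the data frame `perf`, its column names, and the per-fold values are hypothetical, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of folds.

```r
# Hypothetical per-fold RMSE values; the real analysis uses its benchmark results.
perf <- data.frame(
  task    = rep("HR", 8),
  learner = rep(c("RF", "SVM"), each = 4),
  filter  = rep("No Filter", 8),
  rmse    = c(30, 34, 38, 42, 20, 30, 40, 50)
)

# Assumed definition: standard error of the mean = sd / sqrt(n).
std_error <- function(x) sd(x) / sqrt(length(x))

# Mean RMSE and its standard error per task/learner/filter combination.
agg <- aggregate(
  rmse ~ task + learner + filter,
  data = perf,
  FUN = function(x) c(mean = mean(x), se = std_error(x))
)

# aggregate() returns a matrix column here; flatten it into plain columns.
agg <- do.call(data.frame, agg)
names(agg)[names(agg) == "rmse.mean"] <- "rmse"
names(agg)[names(agg) == "rmse.se"] <- "se"

# Order from best (lowest RMSE) to worst.
agg <- agg[order(agg$rmse), ]
```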

(Table) All learner/filter/task combinations ordered by performance.

Overall leaderboard across all settings, sorted from best (lowest RMSE) to worst.

Task       Model      Filter     RMSE          SE
HR         XGBoost    Borda      30.557        13.373
HR         XGBoost    CMIM       30.829        13.898
HR-NRI     XGBoost    Info       31.033        15.955
HR-NRI-VI  XGBoost    CMIM       31.171        14.778
HR         RF         MRMR       31.438        15.076
HR         XGBoost    Relief     31.438        15.076
HR         XGBoost    MRMR       31.438        15.076
HR-NRI-VI  SVM        Relief     31.584        15.602
NRI        SVM        Car        31.613        15.581
HR-NRI-VI  RF         Car        31.640        14.892
HR-NRI-VI  SVM        Info       31.666        15.491
HR-NRI     SVM        Borda      31.703        15.621
NRI        SVM        Borda      31.712        15.309
VI         SVM        Pearson    31.745        15.077
HR-NRI     SVM        Relief     31.778        15.390
NRI        SVM        Info       31.803        15.370
VI         SVM        CMIM       31.812        15.509
HR-VI      SVM        Relief     31.821        15.358
HR-VI      SVM        Borda      31.828        15.238
HR-NRI-VI  SVM        Borda      31.833        15.269
HR-NRI-VI  SVM        Pearson    31.837        15.281
HR         SVM        Borda      31.866        15.312
HR         SVM        Pearson    31.866        15.312
HR         SVM        Relief     31.876        15.310
HR         SVM        MRMR       31.876        15.310
HR-VI      SVM        MRMR       31.876        15.310
HR-VI      SVM        CMIM       31.876        15.310
HR-VI      SVM        Car        31.896        15.372
VI         SVM        Info       31.901        15.380
HR-VI      SVM        Info       31.904        15.384
HR         SVM        Info       31.911        15.379
HR-VI      SVM        Pearson    31.921        15.420
HR         SVM        PCA        32.008        15.200
HR-NRI     SVM        Info       32.014        15.224
HR-NRI     SVM        Pearson    32.016        15.230
HR-NRI     SVM        MRMR       32.022        15.191
HR-NRI-VI  SVM        MRMR       32.025        15.183
HR-NRI-VI  SVM        CMIM       32.025        15.175
VI         SVM        PCA        32.027        15.178
HR-VI      SVM        PCA        32.027        15.181
HR-NRI-VI  SVM        PCA        32.027        15.179
HR-NRI     SVM        PCA        32.028        15.178
HR-NRI     SVM        CMIM       32.028        15.179
HR         SVM        No Filter  32.030        15.178
VI         SVM        No Filter  32.030        15.178
HR-NRI     SVM        No Filter  32.030        15.178
HR-VI      SVM        No Filter  32.030        15.178
HR-NRI-VI  SVM        No Filter  32.030        15.178
VI         SVM        Relief     32.030        15.178
NRI        SVM        PCA        32.033        15.177
HR         SVM        CMIM       32.040        15.175
VI         SVM        Car        32.044        15.162
VI         SVM        Borda      32.054        15.274
HR         SVM        Car        32.062        15.239
HR-NRI-VI  XGBoost    No Filter  32.081        15.497
VI         SVM        MRMR       32.086        15.252
NRI        RF         CMIM       32.265        13.908
HR-NRI     XGBoost    CMIM       32.284        14.053
NRI        XGBoost    Pearson    32.457        15.837
HR-NRI     XGBoost    MRMR       32.527        14.933
NRI        SVM        Relief     32.602        14.700
NRI        XGBoost    CMIM       32.613        14.048
NRI        XGBoost    MRMR       32.690        14.171
NRI        XGBoost    No Filter  32.762        13.870
HR-NRI     XGBoost    Car        32.939        13.227
HR-NRI     XGBoost    Borda      33.141        13.419
HR-NRI-VI  XGBoost    Pearson    33.166        14.958
NRI        XGBoost    Car        33.183        13.859
NRI        Ridge-CV   No Filter  33.225        8.223
NRI        XGBoost    Borda      33.281        13.889
HR-NRI-VI  XGBoost    Borda      33.370        12.937
HR-NRI     XGBoost    Relief     33.509        14.108
HR-NRI-VI  XGBoost    Car        33.552        12.619
HR-NRI     XGBoost    No Filter  33.837        13.018
HR-NRI-VI  XGBoost    Info       33.876        12.562
HR-NRI     XGBoost    Pearson    33.978        12.171
NRI        RF         Car        34.581        10.616
HR-NRI-VI  SVM        Car        34.607        14.299
HR-NRI     SVM        Car        34.744        13.942
NRI        XGBoost    Info       34.806        14.270
HR-NRI-VI  XGBoost    MRMR       34.860        12.701
NRI        XGBoost    Relief     34.991        10.419
HR         XGBoost    Car        35.029        12.492
HR-NRI     RF         PCA        35.143        13.403
HR-NRI-VI  RF         PCA        35.331        13.246
HR-NRI     RF         CMIM       35.379        12.442
HR-NRI     RF         Relief     35.841        14.634
HR-NRI-VI  RF         CMIM       35.895        12.120
HR-NRI     RF         Borda      35.923        12.315
HR-VI      RF         Relief     35.969        12.595
HR-NRI     RF         No Filter  35.980        12.392
HR-NRI-VI  RF         No Filter  36.063        11.997
VI         XGBoost    Relief     36.130        12.493
HR         RF         Info       36.164        13.129
VI         RF         Pearson    36.327        12.762
HR-NRI-VI  RF         Borda      36.416        12.820
NRI        RF         Info       36.420        12.771
HR-VI      RF         Borda      36.436        12.888
NRI        RF         MRMR       36.468        13.339
HR-NRI     RF         MRMR       36.560        12.945
NRI        RF         No Filter  36.584        12.407
HR-NRI-VI  XGBoost    Relief     36.584        8.425
HR-NRI     RF         Info       36.594        12.358
VI         RF         Car        36.602        13.027
HR         XGBoost    Info       36.676        13.275
NRI        RF         Pearson    36.685        12.843
VI         XGBoost    Borda      36.741        13.035
HR-NRI-VI  RF         Pearson    36.751        12.783
HR-NRI-VI  RF         Info       36.783        12.859
HR         RF         Pearson    36.786        13.239
VI         XGBoost    MRMR       36.902        13.234
NRI        RF         Borda      36.969        12.949
HR-NRI     RF         Pearson    37.042        12.783
VI         XGBoost    Pearson    37.054        13.434
VI         XGBoost    CMIM       37.108        13.449
HR-NRI     RF         Car        37.128        7.416
VI         XGBoost    Info       37.142        13.466
VI         XGBoost    Car        37.167        13.679
HR         XGBoost    Pearson    37.201        13.726
HR         RF         CMIM       37.572        14.306
HR         RF         Car        37.693        14.396
HR         RF         Borda      37.727        14.480
VI         XGBoost    No Filter  37.839        13.149
HR-NRI-VI  RF         MRMR       38.201        14.641
NRI        RF         PCA        38.435        12.195
HR         RF         Relief     38.510        11.990
VI         RF         No Filter  38.515        11.361
VI         RF         Info       38.755        11.033
NRI        RF         Relief     38.874        7.507
NRI        SVM        No Filter  38.894        15.169
NRI        XGBoost    PCA        38.967        13.883
NRI        SVM        MRMR       39.030        15.294
NRI        SVM        CMIM       39.178        15.828
HR-NRI-VI  RF         Relief     39.250        7.618
HR-VI      XGBoost    PCA        39.314        9.186
NRI        SVM        Pearson    39.531        15.623
HR-VI      RF         MRMR       39.715        9.516
HR-NRI     Ridge-CV   No Filter  39.728        11.222
HR-VI      RF         Car        39.802        9.864
HR-VI      RF         No Filter  39.829        9.562
VI         RF         MRMR       39.831        8.528
VI         RF         CMIM       39.844        8.584
HR-VI      RF         Info       39.977        9.552
HR         Ridge-CV   No Filter  39.996        11.509
VI         RF         Borda      40.120        8.687
HR-VI      XGBoost    MRMR       40.227        9.158
VI         RF         Relief     40.248        9.037
HR-VI      RF         CMIM       40.362        8.464
HR-VI      RF         Pearson    40.677        8.112
VI         RF         PCA        41.052        8.887
HR         RF         No Filter  41.752        7.561
HR         XGBoost    PCA        41.834        6.703
HR-VI      XGBoost    Info       41.919        9.705
HR-VI      XGBoost    Relief     41.972        10.466
HR-NRI     XGBoost    PCA        42.276        11.034
HR-VI      RF         PCA        42.561        6.304
HR         RF         PCA        43.510        4.685
VI         XGBoost    PCA        44.486        6.181
HR-VI      XGBoost    CMIM       45.349        5.260
HR-NRI-VI  XGBoost    PCA        45.560        5.618
HR-VI      XGBoost    No Filter  45.748        6.062
HR-VI      XGBoost    Car        45.994        5.286
HR-VI      XGBoost    Pearson    46.956        3.088
HR         Lasso-CV   No Filter  47.220        20.520
HR-VI      XGBoost    Borda      47.386        3.021
HR-NRI     Lasso-CV   No Filter  47.533        20.878
NRI        Lasso-CV   No Filter  47.946        20.686
HR         XGBoost    No Filter  49.261        1.282
HR-NRI-VI  Lasso-CV   No Filter  54.259        24.509
VI         Lasso-CV   No Filter  54.465        24.107
HR-VI      Lasso-CV   No Filter  54.465        24.107
HR         Lasso-MBO  No Filter  55.113        22.839
VI         Lasso-MBO  No Filter  55.113        22.839
NRI        Lasso-MBO  No Filter  55.113        22.839
HR-NRI     Lasso-MBO  No Filter  55.113        22.839
HR-VI      Lasso-MBO  No Filter  55.113        22.839
HR-NRI-VI  Lasso-MBO  No Filter  55.113        22.839
HR         Ridge-MBO  No Filter  55.113        22.839
VI         Ridge-MBO  No Filter  55.113        22.839
NRI        Ridge-MBO  No Filter  55.113        22.839
HR-NRI     Ridge-MBO  No Filter  55.113        22.839
HR-VI      Ridge-MBO  No Filter  55.113        22.839
HR-NRI-VI  Ridge-MBO  No Filter  55.113        22.839
HR-NRI-VI  Ridge-CV   No Filter  25251329.518  50502526.510
HR-VI      Ridge-CV   No Filter  27517498.064  55034863.601
VI         Ridge-CV   No Filter  27834900.992  55669669.458

(Table) Best learner/filter/task combination

Learners: On which task, and with which filter, did each learner achieve its best result?

*CV: Penalized regression (L1 for Lasso, L2 for Ridge) tuned with the internal 10-fold CV of the glmnet package.

*MBO: Penalized regression (L1 for Lasso, L2 for Ridge) using MBO for hyperparameter optimization.

Task       Model      Filter     RMSE    SE
HR         XGBoost    Borda      30.557  13.373
HR         RF         MRMR       31.438  15.076
HR-NRI-VI  SVM        Relief     31.584  15.602
NRI        Ridge-CV   No Filter  33.225  8.223
HR         Lasso-CV   No Filter  47.220  20.520
HR         Lasso-MBO  No Filter  55.113  22.839
HR         Ridge-MBO  No Filter  55.113  22.839
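Selecting the best row per learner from the leaderboard can be sketched as follows. This is a base R illustration, not the report's own code; the data frame `leaderboard` holds only a handful of rows copied from the leaderboard table, and its column names are assumptions.

```r
# A few rows copied from the leaderboard table, for illustration only.
leaderboard <- data.frame(
  task   = c("HR", "HR", "HR", "HR-NRI-VI", "NRI", "NRI"),
  model  = c("XGBoost", "XGBoost", "RF", "SVM", "SVM", "RF"),
  filter = c("Borda", "CMIM", "MRMR", "Relief", "Car", "CMIM"),
  rmse   = c(30.557, 30.829, 31.438, 31.584, 31.613, 32.265),
  stringsAsFactors = FALSE
)

# Sort by RMSE so that the best (lowest) value of each model comes first ...
sorted <- leaderboard[order(leaderboard$rmse), ]

# ... then keep only the first occurrence of each model.
best_per_model <- sorted[!duplicated(sorted$model), ]
```

For these rows, `best_per_model` keeps XGBoost with Borda, RF with MRMR, and SVM with Relief, matching the first three rows of the table above.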

(Plot) Best learner/filter combinations for all tasks

Version Author Date
482a158 pat-s 2019-11-01
becf5ea pat-s 2019-11-01
bd7c7f5 pat-s 2019-10-31
62ff96f pat-s 2019-10-07
a947654 pat-s 2019-10-02
49da171 pat-s 2019-09-22
41aae14 pat-s 2019-09-12
b181c52 pat-s 2019-09-02
8e7e4fe pat-s 2019-09-01
7582c67 pat-s 2019-08-31
abd531f pat-s 2019-08-31

(Plot) Best filter combination of each learner vs. no filter per task vs. Borda

Shows the net effect of applying feature selection to a learner for each task. The further left a filter appears relative to the purple dot (No Filter) for a given task, the more that learner benefits from feature selection on that task.
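The comparison the plot makes can also be expressed numerically. The sketch below uses RMSE values for the HR task taken from the leaderboard table; the data frame layout and column names are hypothetical, and the differences ignore the (large) standard errors reported alongside each value.

```r
# RMSE of the best filter vs. "No Filter" on the HR task (values from the leaderboard).
hr <- data.frame(
  model            = c("XGBoost", "RF", "SVM"),
  best_filter      = c("Borda", "MRMR", "Borda"),
  rmse_best_filter = c(30.557, 31.438, 31.866),
  rmse_no_filter   = c(49.261, 41.752, 32.030),
  stringsAsFactors = FALSE
)

# Positive values mean the filter lowered the RMSE relative to using no filter.
hr$rmse_gain <- hr$rmse_no_filter - hr$rmse_best_filter
```

On these rows the gain is largest for XGBoost and close to zero for SVM.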

(Plot) Best filter combination of each learner vs. Borda filter

Shows the net effect of the ensemble Borda filter compared with the best-scoring simple filter.
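For readers unfamiliar with the idea behind a Borda ensemble, here is a minimal sketch of generic Borda-style rank aggregation in base R. It is an assumption-laden illustration, not the implementation used in this analysis: the feature names, filter scores, and the choice to sum ranks are all hypothetical.

```r
# Hypothetical filter scores (higher = more important) for four features.
scores <- matrix(
  c(0.9, 0.2, 0.5, 0.1,   # scores from a "pearson" filter
    0.7, 0.3, 0.8, 0.1,   # scores from an "info" filter
    0.6, 0.1, 0.9, 0.2),  # scores from a "relief" filter
  nrow = 4,
  dimnames = list(c("b1", "b2", "b3", "b4"), c("pearson", "info", "relief"))
)

# Rank features within each filter; rank 1 is the highest-scoring feature.
ranks <- apply(-scores, 2, rank)

# Borda-style aggregation: sum the ranks across filters, lowest total first.
borda <- sort(rowSums(ranks))
```

Here `names(borda)` gives the aggregated feature order, from which the top-ranked features would be kept.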

Version Author Date
482a158 pat-s 2019-11-01
becf5ea pat-s 2019-11-01
bd7c7f5 pat-s 2019-10-31
62ff96f pat-s 2019-10-07
a947654 pat-s 2019-10-02
49da171 pat-s 2019-09-22
7fd40ca pat-s 2019-09-18
41aae14 pat-s 2019-09-12
ff340b8 pat-s 2019-09-03
b181c52 pat-s 2019-09-02

(Plot) Performances of all filter methods

Version Author Date
482a158 pat-s 2019-11-01
becf5ea pat-s 2019-11-01
bd7c7f5 pat-s 2019-10-31
62ff96f pat-s 2019-10-07
a947654 pat-s 2019-10-02
49da171 pat-s 2019-09-22
7fd40ca pat-s 2019-09-18
41aae14 pat-s 2019-09-12
ff340b8 pat-s 2019-09-03
b181c52 pat-s 2019-09-02

(Plot) Scatterplots of filter methods vs. no filter for each learner and task

Showing the final effect of applying feature selection to a learner for each task. All filters are summarized into a boxplot whereas using “no filter” appears as a standalone observation.

(Plot) Scatterplots of filter methods vs. Borda for each learner and task

Showing the final effect of applying feature selection to a learner for each task. All filters are summarized into a boxplot whereas using “no filter” appears as a standalone observation.


R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /opt/spack/opt/spack/linux-centos7-x86_64/gcc-9.2.0/r-3.6.1-j25wr6zcofibs2zfjwg37357rjj26lqb/rlib/R/lib/libRblas.so
LAPACK: /opt/spack/opt/spack/linux-centos7-x86_64/gcc-9.2.0/r-3.6.1-j25wr6zcofibs2zfjwg37357rjj26lqb/rlib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] xtable_1.8-3      flextable_0.5.6   ggbeeswarm_0.7.0 
 [4] ggpubr_0.1.6      ggsci_2.9         ggrepel_0.8.0    
 [7] ggplot2_3.2.1     dplyr_0.8.0.1     magrittr_1.5     
[10] mlr_2.16.0.9000   ParamHelpers_1.12 tidyselect_0.2.5 

loaded via a namespace (and not attached):
 [1] httr_1.4.0         tidyr_0.8.2        jsonlite_1.6      
 [4] viridisLite_0.3.0  splines_3.6.1      R.utils_2.8.0     
 [7] assertthat_0.2.0   base64url_1.4      vipor_0.4.5       
[10] yaml_2.2.0         gdtools_0.2.1      mlrMBO_1.1.2      
[13] pillar_1.3.1       backports_1.1.5    lattice_0.20-38   
[16] glue_1.3.0         uuid_0.1-2         digest_0.6.22     
[19] RColorBrewer_1.1-2 promises_1.0.1     checkmate_1.9.1   
[22] colorspace_1.4-0   htmltools_0.3.6    httpuv_1.4.5.1    
[25] Matrix_1.2-15      R.oo_1.22.0        pkgconfig_2.0.3   
[28] lhs_1.0.1          misc3d_0.8-4       drake_7.7.0       
[31] purrr_0.3.0        scales_1.0.0       parallelMap_1.4   
[34] whisker_0.3-2      later_0.8.0        officer_0.3.5     
[37] mco_1.0-15.1       git2r_0.26.1       tibble_2.0.1      
[40] txtq_0.1.4         withr_2.1.2        lazyeval_0.2.1    
[43] survival_2.43-3    RJSONIO_1.3-1.1    crayon_1.3.4      
[46] evaluate_0.13      storr_1.2.1        R.methodsS3_1.7.1 
[49] fs_1.3.1           xml2_1.2.0         beeswarm_0.2.3    
[52] tools_3.6.1        data.table_1.12.6  BBmisc_1.11       
[55] stringr_1.4.0      plotly_4.8.0       munsell_0.5.0     
[58] zip_2.0.0          compiler_3.6.1     systemfonts_0.1.1 
[61] rlang_0.4.1        plot3D_1.1.1       grid_3.6.1        
[64] htmlwidgets_1.3    igraph_1.2.4.1     labeling_0.3      
[67] base64enc_0.1-3    rmarkdown_1.13     gtable_0.2.0      
[70] smoof_1.5.1        R6_2.4.1           knitr_1.23        
[73] fastmatch_1.1-0    filelock_1.0.2     workflowr_1.5.0   
[76] rprojroot_1.3-2    stringi_1.3.1      parallel_3.6.1    
[79] Rcpp_1.0.0         xfun_0.5