Last updated: 2019-09-01

Checks: 7 passed, 0 failed

Knit directory: 2019-feature-selection/

This reproducible R Markdown analysis was created with workflowr (version 1.4.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20190522) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
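As a small illustration of why the seed matters, the sketch below draws the same random subsample twice after resetting the seed. Only the seed value is taken from this report; the vector being sampled and the sample size are made up for the example.

```r
# Seed value reported above; the sampled vector and size are illustrative only.
set.seed(20190522)
first_draw <- sample(1:100, 5)

# Resetting the seed replays the same random number stream.
set.seed(20190522)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)  # TRUE
```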

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .Ruserdata/
    Ignored:    .drake/
    Ignored:    analysis/rosm.cache/
    Ignored:    data/
    Ignored:    inst/Benchmark for Filter Methods for Feature Selection in High-Dimensional  Classification Data.pdf
    Ignored:    inst/study-area-map/study-area.qgs~
    Ignored:    log/
    Ignored:    packrat/lib-R/
    Ignored:    packrat/lib-ext/
    Ignored:    packrat/lib/
    Ignored:    reviews/
    Ignored:    rosm.cache/
    Ignored:    tests/

Untracked files:
    Untracked:  .drake_history/

Unstaged changes:
    Modified:   _drake.R
    Modified:   code/98-paper/ieee/pdf/correlation-filter-nri-1.pdf
    Modified:   code/98-paper/ieee/pdf/correlation-nbins-1.pdf
    Modified:   code/98-paper/ieee/pdf/defoliation-distribution-plot-1.pdf
    Modified:   code/98-paper/ieee/pdf/spectral-signatures-1.pdf
    Modified:   code/98-paper/journal/defoliation-distribution-plot-1.pdf
    Deleted:    docs/figure/eda.Rmd/defoliation-distribution-plot-1.pdf
    Deleted:    docs/figure/eval-performance.Rmd/filter-effect-1.pdf
    Deleted:    docs/figure/eval-performance.Rmd/performance-results-1.pdf
    Deleted:    docs/figure/filter-correlation.Rmd/correlation-filter-nri-1.pdf
    Deleted:    docs/figure/filter-correlation.Rmd/correlation-nbins-1.pdf
    Deleted:    docs/figure/spectral-signatures.Rmd/spectral-signatures-1.pdf

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd 518d0cb pat-s 2019-09-01 style files using tidyverse style
html 8e7e4fe pat-s 2019-09-01 Build site.
Rmd 8941bca pat-s 2019-09-01 wflow_publish(knitr_in(“analysis/eval-performance.Rmd”), view = FALSE,
Rmd 297ed93 pat-s 2019-08-31 add filter vs no filter comparison plot
html 7582c67 pat-s 2019-08-31 Build site.
html abd531f pat-s 2019-08-31 Build site.
Rmd 9117eee pat-s 2019-08-31 wflow_publish(knitr_in(“analysis/eval-performance.Rmd”), view = FALSE,
html 1ec8768 pat-s 2019-08-17 Build site.
html df85aba pat-s 2019-07-12 Build site.
html 3a44a95 pat-s 2019-07-10 Build site.
html c238ce4 pat-s 2019-07-10 Build site.
Rmd e98cb01 pat-s 2019-07-10 wflow_publish(knitr_in(“analysis/eval-performance.Rmd”), view = FALSE,
Rmd 24e318f pat-s 2019-07-01 update reports
Rmd ca5c5bc pat-s 2019-06-28 add eval-performance report

(Table) All learner/filter/task combinations ordered by performance.

Overall leaderboard across all settings, sorted from best (lowest RMSE) to worst.

| Learner Group | Task | Filter | RMSE |
|---|---|---|---|
| XGBoost | HR-NRI-VI | No Filter | 33.950 |
| RF | NRI | Car | 33.958 |
| Ridge-CV | NRI | No Filter | 33.980 |
| SVM | HR-VI | Borda | 34.254 |
| SVM | VI | Car | 34.328 |
| SVM | VI | Relief | 34.473 |
| SVM | VI | Borda | 34.478 |
| SVM | NRI | Info | 34.480 |
| XGBoost | HR-NRI-VI | CMIM | 34.541 |
| SVM | HR-NRI | Info | 34.562 |
| SVM | HR-VI | Info | 34.596 |
| SVM | HR-VI | Pearson | 34.601 |
| SVM | HR-NRI-VI | Car | 34.606 |
| SVM | VI | Info | 34.608 |
| SVM | HR-NRI | Car | 34.609 |
| SVM | HR-NRI-VI | Pearson | 34.614 |
| SVM | HR-VI | MRMR | 34.615 |
| SVM | VI | PCA | 34.620 |
| SVM | HR-NRI-VI | PCA | 34.620 |
| SVM | HR-NRI-VI | MRMR | 34.621 |
| SVM | HR-NRI | PCA | 34.621 |
| SVM | VI | No Filter | 34.622 |
| SVM | HR-VI | No Filter | 34.622 |
| SVM | HR-NRI-VI | No Filter | 34.622 |
| SVM | HR-NRI | No Filter | 34.622 |
| SVM | HR | No Filter | 34.622 |
| SVM | HR-VI | CMIM | 34.622 |
| SVM | HR | Pearson | 34.622 |
| SVM | HR-NRI-VI | CMIM | 34.622 |
| SVM | HR | MRMR | 34.622 |
| SVM | HR-NRI | Relief | 34.622 |
| SVM | HR-NRI | CMIM | 34.622 |
| SVM | HR-VI | PCA | 34.623 |
| SVM | HR | Info | 34.623 |
| SVM | HR | Borda | 34.623 |
| SVM | NRI | PCA | 34.624 |
| SVM | HR | CMIM | 34.632 |
| SVM | HR | PCA | 34.635 |
| SVM | VI | MRMR | 34.639 |
| SVM | HR-VI | Car | 34.640 |
| SVM | VI | Pearson | 34.667 |
| SVM | HR-NRI | MRMR | 34.672 |
| XGBoost | HR-NRI | No Filter | 34.681 |
| XGBoost | NRI | No Filter | 34.707 |
| SVM | HR | Relief | 34.738 |
| XGBoost | NRI | CMIM | 34.769 |
| RF | HR-NRI-VI | Car | 34.879 |
| XGBoost | HR-NRI-VI | Relief | 34.882 |
| XGBoost | HR-NRI-VI | Car | 35.056 |
| XGBoost | HR-NRI | Relief | 35.263 |
| XGBoost | HR-NRI-VI | MRMR | 35.310 |
| XGBoost | NRI | MRMR | 35.364 |
| XGBoost | HR-NRI | Car | 35.414 |
| XGBoost | NRI | Car | 35.418 |
| XGBoost | HR-NRI | CMIM | 35.507 |
| XGBoost | NRI | Info | 35.743 |
| XGBoost | HR-NRI | MRMR | 35.876 |
| XGBoost | HR-NRI-VI | Info | 36.163 |
| RF | HR-NRI | Car | 36.243 |
| XGBoost | HR | CMIM | 36.368 |
| SVM | HR-NRI-VI | Info | 36.460 |
| SVM | HR-NRI | Pearson | 36.615 |
| RF | HR-NRI | PCA | 36.756 |
| RF | HR-NRI-VI | PCA | 36.772 |
| SVM | VI | CMIM | 36.891 |
| RF | NRI | Relief | 36.944 |
| XGBoost | HR-NRI | Info | 36.965 |
| RF | NRI | PCA | 36.977 |
| RF | HR-NRI-VI | Relief | 36.999 |
| XGBoost | NRI | Pearson | 37.008 |
| RF | HR-NRI-VI | CMIM | 37.222 |
| RF | HR | Info | 37.338 |
| RF | NRI | CMIM | 37.383 |
| RF | HR-NRI | CMIM | 37.455 |
| SVM | HR | Car | 37.485 |
| RF | HR-NRI-VI | Pearson | 37.512 |
| XGBoost | HR | Pearson | 37.568 |
| RF | NRI | Info | 37.589 |
| RF | NRI | No Filter | 37.609 |
| XGBoost | HR-NRI | PCA | 37.616 |
| RF | HR-NRI-VI | No Filter | 37.661 |
| RF | HR-NRI | Relief | 37.669 |
| RF | HR-NRI | No Filter | 37.796 |
| RF | HR-NRI | MRMR | 37.854 |
| RF | HR-NRI | Info | 37.890 |
| RF | HR-NRI-VI | Info | 37.904 |
| XGBoost | HR | Car | 37.998 |
| RF | NRI | MRMR | 37.999 |
| RF | HR-NRI-VI | MRMR | 38.047 |
| RF | HR-NRI-VI | Borda | 38.080 |
| RF | HR-NRI | Borda | 38.259 |
| XGBoost | NRI | PCA | 38.293 |
| RF | HR | Pearson | 38.321 |
| RF | NRI | Borda | 38.366 |
| XGBoost | HR-NRI | Pearson | 38.382 |
| RF | HR-NRI | Pearson | 38.473 |
| SVM | HR-NRI-VI | Borda | 38.487 |
| XGBoost | NRI | Relief | 38.504 |
| SVM | HR-NRI | Borda | 38.521 |
| RF | HR | CMIM | 38.528 |
| RF | NRI | Pearson | 38.544 |
| RF | HR | MRMR | 38.661 |
| RF | VI | Borda | 38.716 |
| XGBoost | HR-NRI-VI | PCA | 38.831 |
| RF | HR | Borda | 39.001 |
| RF | VI | Car | 39.049 |
| RF | HR-VI | Info | 39.479 |
| RF | VI | No Filter | 39.541 |
| RF | VI | Info | 39.685 |
| XGBoost | HR | Info | 39.730 |
| XGBoost | HR-NRI-VI | Pearson | 39.890 |
| XGBoost | HR-VI | Relief | 39.992 |
| RF | HR-VI | No Filter | 40.330 |
| SVM | NRI | MRMR | 40.331 |
| SVM | NRI | Car | 40.516 |
| RF | HR-VI | Relief | 40.527 |
| RF | VI | CMIM | 40.564 |
| XGBoost | VI | Car | 40.596 |
| RF | HR | Car | 40.600 |
| RF | VI | MRMR | 40.626 |
| SVM | NRI | No Filter | 40.632 |
| RF | VI | Pearson | 40.658 |
| SVM | NRI | Pearson | 40.692 |
| RF | HR-VI | Borda | 40.713 |
| XGBoost | VI | No Filter | 40.718 |
| RF | VI | Relief | 40.744 |
| Ridge-CV | HR-NRI | No Filter | 40.883 |
| RF | HR-VI | MRMR | 40.996 |
| RF | HR-VI | CMIM | 41.019 |
| RF | HR-VI | Pearson | 41.090 |
| XGBoost | HR-VI | PCA | 41.125 |
| Ridge-CV | HR | No Filter | 41.136 |
| XGBoost | HR-VI | Pearson | 41.196 |
| XGBoost | VI | Info | 41.237 |
| XGBoost | VI | Pearson | 41.321 |
| XGBoost | HR | MRMR | 41.350 |
| RF | HR-VI | Car | 41.449 |
| RF | VI | PCA | 41.458 |
| SVM | NRI | CMIM | 41.476 |
| RF | HR | Relief | 41.492 |
| XGBoost | VI | CMIM | 41.528 |
| XGBoost | HR-VI | No Filter | 41.563 |
| SVM | NRI | Borda | 41.651 |
| XGBoost | VI | PCA | 41.790 |
| XGBoost | VI | Relief | 41.860 |
| XGBoost | VI | MRMR | 41.914 |
| XGBoost | HR-VI | Info | 41.953 |
| XGBoost | HR-VI | MRMR | 41.962 |
| RF | HR | No Filter | 42.114 |
| XGBoost | HR | Relief | 42.127 |
| SVM | NRI | Relief | 42.300 |
| RF | HR-VI | PCA | 42.791 |
| SVM | HR-VI | Relief | 42.805 |
| XGBoost | HR-VI | CMIM | 42.912 |
| RF | HR | PCA | 43.776 |
| XGBoost | HR | No Filter | 43.898 |
| XGBoost | HR-VI | Car | 44.434 |
| XGBoost | HR | PCA | 45.335 |
| SVM | HR-NRI-VI | Relief | 46.288 |
| Lasso-CV | HR | No Filter | 50.453 |
| Lasso-CV | HR-NRI | No Filter | 50.855 |
| Lasso-CV | NRI | No Filter | 51.184 |
| Lasso-CV | HR-NRI-VI | No Filter | 58.263 |
| Lasso-CV | VI | No Filter | 58.329 |
| Lasso-CV | HR-VI | No Filter | 58.329 |
| Ridge-MBO | HR-VI | No Filter | 58.555 |
| Ridge-MBO | VI | No Filter | 58.555 |
| Ridge-MBO | HR-NRI-VI | No Filter | 58.555 |
| Lasso-MBO | HR | No Filter | 58.555 |
| Lasso-MBO | NRI | No Filter | 58.555 |
| Lasso-MBO | HR-NRI | No Filter | 58.555 |
| Ridge-MBO | HR | No Filter | 58.555 |
| Ridge-MBO | NRI | No Filter | 58.555 |
| Ridge-MBO | HR-NRI | No Filter | 58.555 |
| Lasso-MBO | VI | No Filter | 58.555 |
| Lasso-MBO | HR-VI | No Filter | 58.555 |
| Lasso-MBO | HR-NRI-VI | No Filter | 58.555 |
| Ridge-CV | HR-NRI-VI | No Filter | 50502559.642 |
| Ridge-CV | HR-VI | No Filter | 55034896.733 |
| Ridge-CV | VI | No Filter | 55669702.590 |
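A leaderboard like the one above can be produced by ordering the benchmark results by RMSE. The following is a minimal base R sketch, not the code used for this report: the data frame `results` and its column names are hypothetical, and only four rows are copied from the table, deliberately out of order.

```r
# Four rows copied from the leaderboard above, deliberately shuffled.
# The data frame and its column names are hypothetical.
results <- data.frame(
  learner_group = c("SVM", "Ridge-CV", "XGBoost", "RF"),
  task          = c("HR-VI", "NRI", "HR-NRI-VI", "NRI"),
  filter        = c("Borda", "No Filter", "No Filter", "Car"),
  rmse          = c(34.254, 33.980, 33.950, 33.958),
  stringsAsFactors = FALSE
)

# Lower RMSE is better, so ascending order puts the best combination first.
leaderboard <- results[order(results$rmse), ]
rownames(leaderboard) <- NULL
leaderboard
```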

(Table) Best learner/filter/task combination

Learners: On which task and with which filter did each learner achieve its best result?

-CV: penalized regression (L1 for Lasso, L2 for Ridge) tuned with the internal 10-fold CV of the glmnet package
-MBO: penalized regression (L1 for Lasso, L2 for Ridge) using MBO for hyperparameter optimization

| Task | Learner Group | Filter | RMSE |
|---|---|---|---|
| defoliation-all-plots-HR-NRI-VI | XGBoost | No Filter | 33.950 |
| defoliation-all-plots-NRI | RF | Car | 33.958 |
| defoliation-all-plots-NRI | Ridge-CV | No Filter | 33.980 |
| defoliation-all-plots-HR-VI | SVM | Borda | 34.254 |
| defoliation-all-plots-HR | Lasso-CV | No Filter | 50.453 |
| defoliation-all-plots-HR-VI | Ridge-MBO | No Filter | 58.555 |
| defoliation-all-plots-HR | Lasso-MBO | No Filter | 58.555 |
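The per-learner selection shown above amounts to taking, for each learner, the row with the lowest RMSE. A minimal base R sketch of that step follows; it is an assumption about how such a summary could be computed, not the report's own code. The data frame `results` and its column names are hypothetical, the task names are shortened, and the six rows are copied from the full leaderboard.

```r
# Two result rows per learner, copied from the full leaderboard.
# Data frame and column names are hypothetical.
results <- data.frame(
  task          = c("HR-NRI-VI", "HR-NRI-VI", "NRI", "NRI", "HR-VI", "VI"),
  learner_group = c("XGBoost", "XGBoost", "RF", "RF", "SVM", "SVM"),
  filter        = c("No Filter", "CMIM", "Car", "Relief", "Borda", "Car"),
  rmse          = c(33.950, 34.541, 33.958, 36.944, 34.254, 34.328),
  stringsAsFactors = FALSE
)

# Split by learner and keep the row with the smallest RMSE in each group.
best_per_learner <- do.call(
  rbind,
  lapply(split(results, results$learner_group),
         function(d) d[which.min(d$rmse), ])
)

# Order the winners from best to worst, as in the table above.
best_per_learner <- best_per_learner[order(best_per_learner$rmse), ]
rownames(best_per_learner) <- NULL
best_per_learner
```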

(Plot) Best learner/filter combinations for all tasks

Version Author Date
8e7e4fe pat-s 2019-09-01
7582c67 pat-s 2019-08-31
abd531f pat-s 2019-08-31

(Plot) Best filter combination of each learner vs. no filter per task

Shows the net effect of applying feature selection to a learner for each task. The further to the left a filter appears relative to the purple dot (No Filter), the more effective feature selection was for that learner on that task.

Version Author Date
8e7e4fe pat-s 2019-09-01
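The comparison behind this plot can also be expressed numerically as the RMSE of the unfiltered learner minus the RMSE of its best filter. The base R sketch below is illustrative only and is not the plotting code of this report. The data frame `results`, its column names, and the derived `gain` column are hypothetical, and only a subset of the filters from the leaderboard is included for each learner/task pair.

```r
# Subset of leaderboard rows: two filters plus "No Filter" per learner/task.
# Data frame and column names are hypothetical.
results <- data.frame(
  learner_group = rep(c("SVM", "RF", "XGBoost"), each = 3),
  task          = rep(c("VI", "NRI", "HR-NRI-VI"), each = 3),
  filter        = c("Car", "Relief", "No Filter",
                    "Car", "Relief", "No Filter",
                    "CMIM", "Relief", "No Filter"),
  rmse          = c(34.328, 34.473, 34.622,
                    33.958, 36.944, 37.609,
                    34.541, 34.882, 33.950),
  stringsAsFactors = FALSE
)

# Baseline: the unfiltered run of each learner/task pair.
is_baseline <- results$filter == "No Filter"
baseline <- results[is_baseline, c("learner_group", "task", "rmse")]
names(baseline)[3] <- "rmse_no_filter"

# Best (lowest) RMSE among the filtered runs of each learner/task pair.
best_filter <- aggregate(rmse ~ learner_group + task,
                         data = results[!is_baseline, ], FUN = min)
names(best_filter)[3] <- "rmse_best_filter"

# Positive gain: the best filter beat the unfiltered learner,
# i.e. its dot sits to the left of the "No Filter" dot.
effect <- merge(best_filter, baseline, by = c("learner_group", "task"))
effect$gain <- effect$rmse_no_filter - effect$rmse_best_filter
effect
```

In this subset, filtering helps RF on NRI and SVM on VI, while XGBoost on HR-NRI-VI does better without a filter.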

R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/R/3.5.2/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/libopenblaso-r0.3.3.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] xtable_1.8-3      flextable_0.5.5   ggbeeswarm_0.7.0 
 [4] ggpubr_0.1.6      ggsci_2.9         ggrepel_0.8.0    
 [7] ggplot2_3.1.0     dplyr_0.8.0.1     magrittr_1.5     
[10] mlr_2.15.0        ParamHelpers_1.12 tidyselect_0.2.5 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0        txtq_0.1.4        lattice_0.20-38  
 [4] tidyr_0.8.2       assertthat_0.2.0  rprojroot_1.3-2  
 [7] digest_0.6.18     R6_2.4.0          plyr_1.8.4       
[10] backports_1.1.3   drake_7.5.2       evaluate_0.13    
[13] pillar_1.3.1      gdtools_0.1.7     rlang_0.3.4      
[16] lazyeval_0.2.1    uuid_0.1-2        data.table_1.12.0
[19] whisker_0.3-2     R.utils_2.8.0     R.oo_1.22.0      
[22] Matrix_1.2-15     checkmate_1.9.1   rmarkdown_1.13   
[25] labeling_0.3      splines_3.5.2     stringr_1.4.0    
[28] igraph_1.2.4      munsell_0.5.0     compiler_3.5.2   
[31] vipor_0.4.5       xfun_0.5          pkgconfig_2.0.2  
[34] base64enc_0.1-3   BBmisc_1.11       htmltools_0.3.6  
[37] tibble_2.0.1      workflowr_1.4.0   crayon_1.3.4     
[40] withr_2.1.2       R.methodsS3_1.7.1 grid_3.5.2       
[43] gtable_0.2.0      git2r_0.24.0      storr_1.2.1      
[46] scales_1.0.0      zip_2.0.0         cli_1.1.0        
[49] stringi_1.3.1     fs_1.2.6          parallelMap_1.4  
[52] xml2_1.2.0        filelock_1.0.2    fastmatch_1.1-0  
[55] tools_3.5.2       glue_1.3.0        beeswarm_0.2.3   
[58] officer_0.3.5     purrr_0.3.0       parallel_3.5.2   
[61] survival_2.43-3   yaml_2.2.0        colorspace_1.4-0 
[64] base64url_1.4     knitr_1.23