Last updated: 2021-02-19

Checks: 7 passed, 0 failed

Knit directory: 2019-feature-selection/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20190522) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
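As a minimal illustration of why the seed matters, the base R sketch below draws the same random sample twice after resetting the seed. The variable names and the call to sample() are only illustrative and are not part of this project's code.

```r
# Reset the seed before each draw so both draws use the same random stream.
set.seed(20190522)
first_draw <- sample(1:100, 5)

set.seed(20190522)
second_draw <- sample(1:100, 5)

# Both draws are identical because the generator started from the same state.
identical(first_draw, second_draw)
```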

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 4621357. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .Ruserdata/
    Ignored:    .drake/
    Ignored:    .vscode/
    Ignored:    analysis/rosm.cache/
    Ignored:    data/
    Ignored:    inst/Benchmark for Filter Methods for Feature Selection in High-Dimensional  Classification Data.pdf
    Ignored:    inst/study-area-map/._study-area.qgs
    Ignored:    inst/study-area-map/study-area.qgs~
    Ignored:    log/
    Ignored:    renv/library/
    Ignored:    renv/staging/
    Ignored:    reviews/
    Ignored:    rosm.cache/

Unstaged changes:
    Modified:   .Rprofile
    Modified:   2019-feature-selection.Rproj
    Modified:   README.md
    Modified:   _drake.R
    Modified:   analysis/style.css
    Modified:   code/090-reports.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/about.Rmd) and HTML (docs/about.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 4621357 pat-s 2021-02-19 workflowr::wflow_publish("analysis/about.Rmd")
html 27be6dc pat-s 2021-02-19 Build site.
html a953299 pat-s 2020-08-12 Build site.
html 8b5e422 pat-s 2020-08-05 Build site.
html 9fe93ea pat-s 2020-08-05 Build site.
Rmd 9fef275 pat-s 2020-08-05 workflowr::wflow_publish("analysis/about.Rmd")
html 90d099f pat-s 2020-08-05 Build site.
Rmd 944c037 pat-s 2020-08-05 update about.Rmd
html d513e78 pat-s 2020-03-22 Build site.
Rmd fc147e4 pat-s 2020-03-22 workflowr::wflow_publish("analysis/about.Rmd")
html 649886e pat-s 2020-03-08 Build site.
Rmd db07056 pat-s 2020-02-24 add jose to authors
html 7582c67 pat-s 2019-08-31 Build site.
html df85aba pat-s 2019-07-12 Build site.
html 3a44a95 pat-s 2019-07-10 Build site.
html db3955e pat-s 2019-06-27 Build site.
Rmd 4cf247a pat-s 2019-06-18 fix git
html 4cf247a pat-s 2019-06-18 fix git
html 4c3422d pat-s 2019-06-15 Build site.
html 45cbebd pat-s 2019-05-27 Build site.
Rmd 9a83d5a pat-s 2019-05-27 wflow_publish("analysis/about.Rmd")
html 47a5b39 pat-s 2019-05-25 Build site.
html 37da161 pat-s 2019-05-24 Build site.
html ad4c6c1 pat-s 2019-05-23 Build site.
html 6517dc7 pat-s 2019-05-23 Build site.
html 22eb53d pat-s 2019-05-23 Build site.
html a7ae8f1 pat-s 2019-05-23 wflow_publish(all = T)
html ad7d595 pat-s 2019-05-22 Build site.
Rmd 7ad5791 pat-s 2019-05-22 first version
html 7ad5791 pat-s 2019-05-22 first version
Rmd 1cbc8de pat-s 2019-05-22 Start workflowr project.

Modeling defoliation as a proxy for tree health: Comparison of feature-selection methods across multiple feature sets derived from hyperspectral data

Authors

Patrick Schratz

Jannes Muenchow

Eugenia Iturritxa

José Cortés

Bernd Bischl

Alexander Brenning

Contents

Paper

This repository contains the research compendium of our work on comparing algorithms across multiple feature sets and filtering methods (including ensemble filter methods).

  • keywords

    • hyperspectral imagery
    • forest health monitoring
    • machine learning
    • feature selection
    • feature effects
    • model comparison
    • filter
    • imaging spectroscopy
  • Using machine-learning algorithms to model defoliation of Pinus radiata trees.

  • Compare filtering methods (including ensemble filter methods) across various algorithms and datasets

  • Predict defoliation for all available plots (24) and for the whole Basque Country (at 200 m resolution)

The following scripts and directories belong to this project:

  • code/01-download.R
  • code/02-hyperspectral-processing.R
  • code/04-data-processing.R
  • code/05-modeling/
  • code/06-benchmark-matrix/
  • code/07-reports/

Other Content

In addition, this repo contains the workflow for an analysis related to the LIFE 14 ENV/ES/000179 LIFE HEALTHY FOREST project: Predicting tree defoliation for the Basque Country (for the years 2017 and 2018) using Sentinel-2 data.

The target defoliation_maps_wfr builds the subsequent targets that are necessary for the final results report.

How to use

Reading the code, accessing the data

See the code directory on GitHub for the source code that generated the figures and statistical results contained in the manuscript. See the data directory for instructions on how to access the raw data used in the manuscript.

Installing the R package

This repository is organized as an R package, providing functions and raw data to reproduce and extend the analysis reported in the publication. Note that this package has been written explicitly for this project and is not suited for more general use.

This project is set up with a drake workflow, ensuring reproducibility. Intermediate targets/objects will be stored in a hidden .drake directory.

The R library of this project is managed by renv. This makes sure that the exact same package versions are used when recreating the project. When calling renv::restore(), all required packages will be installed with their specific version.

Please note that this project was built with R version 4.0.3 on a CentOS 7 operating system. Some packages from this project are not compatible with R versions prior to version 3.6.0.
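A quick way to check the local R version against this lower bound before restoring the library is sketched below. It uses base R only; the variable names are illustrative and the check is not part of the project code.

```r
# Compare the running R version against the minimum version mentioned above.
minimum_version <- "3.6.0"
current_version <- getRversion()
r_is_recent_enough <- current_version >= minimum_version

if (r_is_recent_enough) {
  message("R ", current_version, " meets the minimum of ", minimum_version, ".")
} else {
  message("R ", current_version, " is older than ", minimum_version,
          "; some packages may fail to install.")
}
```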

To clone the project, a working installation of git is required. Open a terminal in the directory of your choice and execute:

git clone https://github.com/pat-s/2019-feature-selection.git

Then start R in the cloned project directory and run:

renv::restore()
r_make()

Creating targets with {drake}

Calling r_make() will create targets specified in drake_config(targets = <target>) in _drake.R with the additional drake settings specified.
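To illustrate the mechanism, a heavily simplified _drake.R could look like the sketch below. The plan and its target names (raw_data, model, r_squared) and the use of the built-in mtcars data are placeholders chosen for illustration and do not correspond to this project's actual plan; only drake_plan(), drake_config() and r_make() are functions of the drake package.

```r
# _drake.R (hypothetical sketch, not the file shipped with this project)
library(drake)

# A toy plan with three targets; each target is built from the ones it uses.
plan <- drake_plan(
  raw_data = mtcars,
  model = lm(mpg ~ wt, data = raw_data),
  r_squared = summary(model)$r.squared
)

# r_make() sources this file in a fresh R session and expects the
# drake_config() call to come last. The `targets` argument restricts
# the build to the named target and its upstream dependencies.
drake_config(plan, targets = "r_squared")
```

With such a file in the project root, calling r_make() would build r_squared together with raw_data and model, and skip any targets that are already up to date.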

Out of the 400+ targets in this project, the following targets are important:

Note that most reports require some or all of the fitted models. Creating these (e.g. target benchmark_no_models_new_buffer2) is a costly process that takes several days on an HPC system and considerably longer on a single machine.

Notes and resources


sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /opt/spack/opt/spack/linux-centos7-x86_64/gcc-9.2.0/r-4.0.3-hkh5nywkwodhye5qvukisarvhbj264ob/rlib/R/lib/libRblas.so
LAPACK: /opt/spack/opt/spack/linux-centos7-x86_64/gcc-9.2.0/r-4.0.3-hkh5nywkwodhye5qvukisarvhbj264ob/rlib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] usethis_2.0.0  magrittr_2.0.1 drake_7.13.1  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        git2r_0.28.0      later_1.1.0.1     compiler_4.0.3   
 [5] pillar_1.4.7      workflowr_1.6.2   prettyunits_1.1.1 tools_4.0.3      
 [9] progress_1.2.2    digest_0.6.27     evaluate_0.14     lifecycle_0.2.0  
[13] tibble_3.0.6      pkgconfig_2.0.3   rlang_0.4.10      igraph_1.2.6     
[17] cli_2.3.0         rstudioapi_0.13   filelock_1.0.2    yaml_2.2.1       
[21] parallel_4.0.3    xfun_0.20         stringr_1.4.0     storr_1.2.5      
[25] knitr_1.31        fs_1.5.0          vctrs_0.3.6       hms_1.0.0        
[29] rprojroot_2.0.2   tidyselect_1.1.0  glue_1.4.2        R6_2.5.0         
[33] base64url_1.4     rmarkdown_2.6     txtq_0.2.3        whisker_0.4      
[37] purrr_0.3.4       promises_1.1.1    backports_1.2.1   ellipsis_0.3.1   
[41] htmltools_0.5.1.1 assertthat_0.2.1  renv_0.12.5-39    httpuv_1.5.5     
[45] stringi_1.5.3     crayon_1.4.0