Preparation

Darius Goergen

Last updated: 2019-08-19

Reference Data

For this project we used a data base published online by Primpke et al. (2018). It is available as electronic supplementary material of the article; the URL is given in the code below. The authors state that the samples were measured with a Bruker Tensor 27 FTIR spectrometer over the spectral range 4000 to 40 1/cm. Additionally, some spectra of polymer-based fibers and of materials of biological origin were obtained from the Bremer Faserinstitut. During preprocessing, the authors applied a concave rubberband correction with 10 iterations and 64 baseline points. They also excluded the CO2 band between 2420 and 2200 1/cm by setting the data points to 0. This should be kept in mind, because any additional reference sample must undergo the same procedure for the data base to stay consistent. The data provided by Primpke et al. (2018) have a spectral resolution of 2.1 1/cm. Additional reference samples therefore need to be resampled to the same spectral resolution and given the same baseline correction.
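The CO2 exclusion described above can be sketched in a few lines of base R. This is only an illustration: the helper name `zero_co2_band` and the toy spectrum are hypothetical, and treating both band limits as inclusive is an assumption, since the source does not say how the authors handled the endpoints.

```r
# Hypothetical helper: set all intensities inside the CO2 band to 0.
# Assumption: both band limits (2200 and 2420 1/cm) are inclusive.
zero_co2_band <- function(intensity, wavenumbers, lower = 2200, upper = 2420) {
  in_band <- wavenumbers >= lower & wavenumbers <= upper
  intensity[in_band] <- 0
  intensity
}

# toy spectrum with five data points (made-up values)
wvn  <- c(2500, 2420, 2300, 2200, 2100)
spec <- c(0.5, 0.4, 0.3, 0.2, 0.1)
spec_clean <- zero_co2_band(spec, wvn)
print(spec_clean)
```

Applying the same function to every additional reference spectrum would keep new samples comparable to the published data in this region.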

To ensure consistency, the data base was read into R and the wavenumbers were saved to a separate file so that additional reference spectra can later be resampled to the same grid.

library(openxlsx)
url = "https://static-content.springer.com/esm/art%3A10.1007%2Fs00216-018-1156-x/MediaObjects/216_2018_1156_MOESM2_ESM.xlsx"

data = openxlsx::read.xlsx(url)
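# extract wavenumbers from the column names (header row); column 1 is the index
wavenumbers = as.numeric(names(data)[2:1864])
# save wavenumbers to the reference sample directory
# (`ref` holds that directory path and is not defined on this page)
saveRDS(wavenumbers, paste0(ref, "wavenumbers.rds"))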
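Once the wavenumbers are stored, a new spectrum can be brought onto the same grid by interpolation. The following is a minimal sketch using linear interpolation with base R's `approx()`; the helper name `resample_spectrum` and the toy grids are hypothetical, and linear interpolation is a choice made for this illustration, not necessarily the method used later in the project.

```r
# Hypothetical helper: linearly interpolate a spectrum onto a reference grid.
# rule = 2 repeats the nearest value outside the measured range
# instead of returning NA.
resample_spectrum <- function(wvn_in, intensity, wvn_ref) {
  stats::approx(x = wvn_in, y = intensity, xout = wvn_ref, rule = 2)$y
}

# In the project the grid would come from the saved file, e.g.
# wvn_ref <- readRDS(paste0(ref, "wavenumbers.rds"))
wvn_ref   <- seq(4000, 3990, by = -2)   # toy reference grid
wvn_new   <- seq(4000, 3990, by = -1)   # toy measurement on a finer grid
intensity <- 0.01 * (4000 - wvn_new)    # linear toy signal

resampled <- resample_spectrum(wvn_new, intensity, wvn_ref)
print(resampled)
```

Because the toy signal is linear, the resampled values coincide with the signal evaluated at the reference wavenumbers, which makes the result easy to check.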

An important feature of any data base is the distribution of its classes. Here, we print only the 10 most common classes, since many reference samples occur only once or twice in the data base.

data$Abbreviation = as.factor(data$Abbreviation)
summary(data$Abbreviation)[1:10]
  PES    PP  LDPE  HDPE   PET    PE Nylon    PA    PS   PUR 
   15    12    11    10     9     8     7     7     7     7 
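Note that `summary()` on a factor only orders the counts by frequency when the number of levels exceeds its `maxsum` argument (100 by default); otherwise the counts come back in level order. A sketch of a ranking that does not depend on the number of levels, using made-up class labels:

```r
# toy class labels (not taken from the data base)
abbreviation <- factor(c("PES", "PP", "PES", "PA", "PP", "PES", "wood"))

# count each class and sort explicitly by decreasing frequency
counts <- sort(table(abbreviation), decreasing = TRUE)

# keep the two most common classes
top <- head(counts, 2)
print(top)
```

Sorting the table explicitly keeps the selection of major classes stable if rare classes are later added to or removed from the data base.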

Construction of the Data Base

We are interested in assigning the correct class to potential plastic particles. The classes of the data base most important to us are thus those of synthetic polymer origin. However, particles of biological origin will sometimes also undergo spectral analysis, because they resemble microplastics in environmental samples. A machine learning algorithm trained only on plastic reference samples would assign a plastic class to these particles as well, since it can only choose the most similar class among those it has learned. This leads to false positive errors. To reduce false positives, we include some of the samples of biological origin as well and summarize them into broader classes.

# furs and wools
indexFur = grep("fur", data$Abbreviation)
indexWool = grep("wool", data$Abbreviation)
furs = data[c(indexFur, indexWool), ]
furs = furs[ , c(2:1864)] # leave out index column
names(furs) = paste("wvn", wavenumbers, sep="")
furs$class = "FUR"

# fibres
indexFibre = grep("fibre", data$Abbreviation)
fibre = data[indexFibre, ]
fibre = fibre[ , c(2:1864)] # leave out index column
names(fibre) = paste("wvn", wavenumbers, sep="")
fibre$class = "FIBRE"

# wood
indexWood = grep("wood", data$Abbreviation)
wood = data[indexWood, ]
wood = wood[ , c(2:1864)] # leave out index column
names(wood) = paste("wvn", wavenumbers, sep="")
wood$class = "WOOD"

# synthetic polymers
polyIndex = which(data$`Natural./Synthetic` =="synthetic polymer")
syntPolymer = data[polyIndex,]
counts = summary(syntPolymer$Abbreviation)
polyNames = names(counts)[1:10] # only major polymers
syntPolymer = syntPolymer[which(syntPolymer$Abbreviation %in%  polyNames) , ]
classes = droplevels(syntPolymer$Abbreviation)
syntPolymer = syntPolymer[ , c(2:1864)] # leave out index column
names(syntPolymer) = paste("wvn",wavenumbers,sep="")
syntPolymer$class = as.character(classes)

# let's group together some synthetic polymer classes
syntPolymer$class[grep("Nylon",syntPolymer$class)] = "PA"
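The biological groups above all follow the same pattern: select rows by a pattern in `Abbreviation`, keep the spectral columns, rename them, and attach a class label. As an illustration only, this could be wrapped in a small helper. The function `extract_class`, its arguments, and the toy data frame are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: select rows whose abbreviation matches `pattern`,
# keep the spectral columns, rename them and attach a class label.
extract_class <- function(db, pattern, label, wavenumbers, spec_cols) {
  idx <- grep(pattern, db$Abbreviation)
  out <- db[idx, spec_cols, drop = FALSE]
  names(out) <- paste0("wvn", wavenumbers)
  # rep() also works when no row matches (zero-row result)
  out$class <- rep(label, nrow(out))
  out
}

# toy data base with two spectral columns (made-up values)
toy <- data.frame(
  Abbreviation = c("sheep wool", "fox fur", "beech wood", "PE"),
  a = c(0.1, 0.2, 0.3, 0.4),
  b = c(0.5, 0.6, 0.7, 0.8),
  stringsAsFactors = FALSE
)
toy_wvn <- c(4000, 3998)

# furs and wools in one call via a regular expression alternative
furs_toy <- extract_class(toy, "fur|wool", "FUR", toy_wvn, 2:3)
print(furs_toy)
```

With the real data, `spec_cols` would be `2:1864` and `wavenumbers` the vector extracted earlier.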

Class Distribution

We now bind the reference samples together and look at the distribution of classes in the resulting data frame, which serves as the data base for the following calculations.

data = rbind(furs,wood,fibre,syntPolymer) 
data$class = as.factor(data$class)
summary(data$class)
FIBRE   FUR  HDPE  LDPE    PA    PE   PES   PET    PP    PS   PUR  WOOD 
   27    23    10    11    14     8    15     9    12     7     7     4 

In total, the data base contains 93 (63%) reference samples of plastic polymers and 54 (37%) of biological origin. The plastic samples are fairly balanced, with no class having fewer than 7 samples. For the samples of biological origin, however, the class FIBRE dominates the distribution. This could prove a disadvantage if a machine learning algorithm exploits this imbalance, minimizing its error rate simply by predicting the FIBRE class more frequently. At this point, we leave the resulting data base as it is and save it to disk. We save the data in individual files per class as well as in one comprehensive data base in csv format. This way, later extensions to the data base are easy to manage.
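The shares of plastic and biological samples can be recomputed from the class distribution printed above. The counts in this short check are copied from that output; the grouping of FIBRE, FUR and WOOD as biological follows the construction of the data base.

```r
# class counts copied from the printed class distribution
counts <- c(FIBRE = 27, FUR = 23, HDPE = 10, LDPE = 11, PA = 14, PE = 8,
            PES = 15, PET = 9, PP = 12, PS = 7, PUR = 7, WOOD = 4)

biological <- c("FIBRE", "FUR", "WOOD")
n_bio     <- sum(counts[biological])   # samples of biological origin
n_plastic <- sum(counts) - n_bio       # samples of synthetic polymers

# shares in percent, rounded to whole numbers
shares <- round(100 * c(plastic = n_plastic, biological = n_bio) / sum(counts))
print(c(n_plastic = n_plastic, n_bio = n_bio))
print(shares)
```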

write.csv(data, file = paste0(ref, "reference_database.csv"), row.names=FALSE)

# writing class control file
classIndex = as.character(unique(data$class))

for (class in classIndex){
  tmp = data[data$class==class , ]
  write.csv(tmp, file = paste0(ref, "reference_", class, ".csv"), row.names=FALSE)
}

write(classIndex, paste0(ref, "classes.txt"))
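When the data base is extended later, it can be useful to check that the class control file still matches the comprehensive csv file. The sketch below writes a toy data base to a temporary directory and reads it back; the temporary `ref_tmp` path and the toy values are stand-ins for this illustration, while the file names follow the code above.

```r
# stand-in for the reference directory used in the project
ref_tmp <- paste0(tempdir(), "/")

# toy data base with two spectral columns (made-up values)
toy <- data.frame(
  wvn4000 = c(0.1, 0.2, 0.3),
  wvn3998 = c(0.4, 0.5, 0.6),
  class   = c("PE", "PE", "WOOD"),
  stringsAsFactors = FALSE
)

# write the data base and the class control file as in the code above
write.csv(toy, file = paste0(ref_tmp, "reference_database.csv"), row.names = FALSE)
write(as.character(unique(toy$class)), paste0(ref_tmp, "classes.txt"))

# read both files back and compare the class sets
db      <- read.csv(paste0(ref_tmp, "reference_database.csv"), stringsAsFactors = FALSE)
classes <- readLines(paste0(ref_tmp, "classes.txt"))
consistent <- setequal(classes, unique(db$class))
print(consistent)
```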

Citations on this page

Primpke, Sebastian, Marisa Wirth, Claudia Lorenz, and Gunnar Gerdts. 2018. “Reference database design for the automated analysis of microplastic samples based on Fourier transform infrared (FTIR) spectroscopy.” Analytical and Bioanalytical Chemistry 410 (21): 5131–41. https://doi.org/10.1007/s00216-018-1156-x.


sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19.1

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tensorflow_1.14.0         abind_1.4-5              
 [3] e1071_1.7-2               keras_2.2.4.1            
 [5] workflowr_1.4.0.9001      baseline_1.2-1           
 [7] gridExtra_2.3             stringr_1.4.0            
 [9] prospectr_0.1.3           RcppArmadillo_0.9.600.4.0
[11] openxlsx_4.1.0.1          magrittr_1.5             
[13] ggplot2_3.2.0             reshape2_1.4.3           
[15] dplyr_0.8.3              

loaded via a namespace (and not attached):
 [1] reticulate_1.13  tidyselect_0.2.5 xfun_0.8         purrr_0.3.2     
 [5] lattice_0.20-38  colorspace_1.4-1 generics_0.0.2   htmltools_0.3.6 
 [9] base64enc_0.1-3  yaml_2.2.0       rlang_0.4.0      later_0.8.0     
[13] pillar_1.4.2     glue_1.3.1       withr_2.1.2      foreach_1.4.7   
[17] plyr_1.8.4       munsell_0.5.0    gtable_0.3.0     zip_2.0.3       
[21] codetools_0.2-16 evaluate_0.14    knitr_1.24       SparseM_1.77    
[25] tfruns_1.4       httpuv_1.5.1     class_7.3-15     highr_0.8       
[29] Rcpp_1.0.2       xtable_1.8-4     promises_1.0.1   scales_1.0.0    
[33] backports_1.1.4  jsonlite_1.6     mime_0.7         fs_1.3.1        
[37] digest_0.6.20    stringi_1.4.3    shiny_1.3.2      grid_3.6.1      
[41] rprojroot_1.3-2  tools_3.6.1      lazyeval_0.2.2   tibble_2.1.3    
[45] crayon_1.3.4     whisker_0.3-2    pkgconfig_2.0.2  zeallot_0.1.0   
[49] Matrix_1.2-17    assertthat_0.2.1 rmarkdown_1.14   iterators_1.0.12
[53] R6_2.4.0         git2r_0.26.1     compiler_3.6.1