Last updated: 2021-04-30
Knit directory: CassavaNIRS/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has staged changes. To know which version of the R Markdown file created these results, first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 759b463. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that all relevant files for the analysis need to be committed to Git before generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
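Because workflowr only checks the R Markdown file, a quick check of the whole working tree can catch uncommitted scripts or data notes before building. The following is a minimal sketch, not part of workflowr or this repository: repo_is_clean is a hypothetical helper that wraps git status --porcelain and succeeds only when nothing is staged, modified, or untracked.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not a workflowr function):
# succeed only if the Git working tree at "$1" has no staged, unstaged,
# or untracked changes. Ignored files are not reported by --porcelain.
repo_is_clean() {
  # Return 2 if git itself fails (for example, "$1" is not a repository).
  status=$(git -C "${1:-.}" status --porcelain) || return 2
  # An empty status listing means there is nothing left to commit.
  [ -z "$status" ]
}
```

Run from the repository root, repo_is_clean . returns a non-zero status whenever the tree has pending changes, as it would for the status report that follows.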
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: Hershberger_CassavaNIRS_2021.zip
Ignored: analysis/.DS_Store
Ignored: code/.DS_Store
Ignored: data/.DS_Store
Ignored: data/Cassavabase_phenotypes_20210419.csv
Ignored: data/Corrected_metadata/
Ignored: data/README.html
Ignored: data/README.txt
Ignored: data/Spectra/
Ignored: data/TrialNameKey.csv
Ignored: data/raw_pheno.csv
Ignored: data/raw_scans.csv
Ignored: output/.DS_Store
Ignored: output/Figure2_DMC_distributions.png
Ignored: output/Figure4_within_predictions.png
Ignored: output/Figure5_Subsamples.png
Ignored: output/Figure6_RF_Importance.png
Ignored: output/Figure7_CV_predictions.png
Ignored: output/FigureS2_within_trial_prediction_all.png
Ignored: output/S1_overlapping_accession_counts.csv
Ignored: output/S3_removed_scans.csv
Ignored: output/Table2_DMC_statistics.csv
Ignored: output/Table3_performance_summary.csv
Ignored: output/TableS2_within_trial_predictions.csv
Ignored: output/TableS4_cv_results.csv
Ignored: output/cv_base.png
Ignored: output/cv_results.csv
Ignored: output/full_filtered_plots.csv
Ignored: output/full_filtered_subsamples.csv
Ignored: output/full_filtered_unaggregated.csv
Ignored: output/subsampling_prediction_results_2021.csv
Ignored: output/within_trial_waves_PLSR.csv
Ignored: output/within_trial_waves_RF.csv
Ignored: output/within_trial_waves_RF_importance.csv
Ignored: output/within_trial_waves_SVM.csv
Staged changes:
Modified: .gitignore
New: analysis/README.Rmd
Modified: analysis/_site.yml
Modified: analysis/filter_aggregate.Rmd
Modified: analysis/index.Rmd
Modified: analysis/manuscript_predictions.Rmd
Modified: analysis/manuscript_subsampling.Rmd
Modified: analysis/manuscript_summary_figures.Rmd
Modified: code/server_CV.R
Modified: code/server_within_trial_predictions_PLSR_RF_SVM.R
Modified: data/README.md
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/index.Rmd) and HTML (docs/index.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 759b463 | Jenna Hershberger | 2021-04-22 | Update curation code |
Rmd | 8f143af | Jenna Hershberger | 2021-04-21 | Add content |
html | 8f143af | Jenna Hershberger | 2021-04-21 | Add content |
Rmd | fecea09 | Jenna Hershberger | 2021-04-19 | Start workflowr project. |
This repository documents all analyses, summaries, tables, and figures associated with the following PREPRINT: Low-cost, handheld near-infrared spectroscopy for root dry matter content prediction in cassava
Over 800 million people across the tropics rely on cassava as a major source of calories. While the root dry matter content (RDMC) of this starchy root crop is important for both producers and consumers, characterization of RDMC by traditional methods is time-consuming and laborious for breeding programs. Alternate phenotyping methods have been proposed but lack the accuracy, cost, or speed ultimately needed for cassava breeding programs. For this reason, we investigated the use of a low-cost, handheld NIR spectrometer for field-based RDMC prediction in cassava. Oven-dried measurements of RDMC were paired with 21,044 scans of roots of 376 diverse clones from 10 field trials in Nigeria and grouped into training and test sets based on cross-validation schemes relevant to plant breeding programs. Mean partial least squares regression model performance ranged from R2p = 0.62 - 0.89 for within-trial predictions, which is within the range achieved with laboratory-grade spectrometers in previous studies. Relative to other factors, model performance was highly impacted by the inclusion of samples from the same environment in both the training and test sets. Random forest variable importance analysis of root spectra revealed increased importance in a region previously identified as predictive of water content in plants (~950 - 990 nm). With appropriate model calibration, the tested spectrometer will allow for field-based collection of spectral data with a smartphone for accurate RDMC prediction and potentially other quality traits, a step that could be easily integrated into existing harvesting workflows of cassava breeding programs.
The R package workflowr was used to document this study reproducibly.
Much of the supporting data and output from the analyses documented here are too large for GitHub.
The raw data for this repository is stored on Cyverse. Download this folder and add the contents to the /data folder in this repository to run the analysis code. When running the code, follow the order listed below.
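Before running the analysis code, it can help to confirm that the downloaded files ended up in the data folder. The sketch below assumes the expected contents match the data/ entries shown in the Git status report on this page; the actual Cyverse folder may contain more or fewer files. check_data is a hypothetical helper, not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: report which expected inputs are missing from a data
# directory. The names are taken from the data/ entries in the Git status
# report on this page (an assumption about what the download contains).
check_data() {
  dir="${1:-data}"
  missing=0
  # Individual files expected directly under the data directory.
  for f in Cassavabase_phenotypes_20210419.csv TrialNameKey.csv raw_pheno.csv raw_scans.csv; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=$((missing + 1))
    fi
  done
  # Sub-directories expected under the data directory.
  for d in Corrected_metadata Spectra; do
    if [ ! -d "$dir/$d" ]; then
      echo "missing: $dir/$d/"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when nothing was reported missing.
  [ "$missing" -eq 0 ]
}
```

Calling check_data data from the repository root prints one line per missing path and returns a non-zero status if anything is absent.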
Some of the analyses in this manuscript were more efficiently run from the command line on a server with more memory than is common on desktop/laptop machines. The scripts for these analyses are located in the code/ sub-folder of this repository with names starting with “server”. Results from these analyses are used in subsequent html / Rmd files to generate figures and tables for the manuscript.
code/server_within_trial_predictions_PLSR_RF_SVM.R: Command line script that performs within-trial predictions with plot mean scans
code/server_within_trial_predictions_RF_var_importance.R: Command line script that calculates within-trial random forest variable importance
code/run_subsampling.sh: Command line shell script that calls code/server_subsampling_generalized.R to subsample sets of scans within each plot and then performs within-trial predictions on those sets with code/server_subsample_plsr.R. Utilizes functions from code/subsampling_functions.R.
code/server_CV.R: Command line R script that performs predictions according to four cross-validation schemes relevant to plant breeding
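The scripts above could be launched in the listed order with a small wrapper like the one below. This is only a sketch under stated assumptions: that each script runs from the repository root without arguments, that Rscript is the right interpreter for the .R files, and that run_subsampling.sh can be run with sh. None of this is confirmed by the repository, and run_server_scripts is a hypothetical name. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: run the server-side analyses in the order listed on this page.
# Assumptions (not confirmed by the source): no command-line arguments are
# needed, and the working directory is the repository root.
run_server_scripts() {
  for script in \
    code/server_within_trial_predictions_PLSR_RF_SVM.R \
    code/server_within_trial_predictions_RF_var_importance.R \
    code/run_subsampling.sh \
    code/server_CV.R
  do
    # Pick an interpreter from the file extension.
    case "$script" in
      *.sh) cmd="sh $script" ;;
      *)    cmd="Rscript $script" ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: only show what would be executed.
      echo "$cmd"
    else
      echo "running: $cmd"
      # Unquoted on purpose so the interpreter and path split into two words;
      # stop at the first failing script.
      $cmd || return 1
    fi
  done
}
```

A dry run first is a cheap way to confirm the order and paths before committing server time to the full analyses.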