Last updated: 2018-04-19
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date 
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
 ✔ Environment: empty 
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
 ✔ Seed: 
set.seed(20180411) 
The command set.seed(20180411) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
 ✔ Session information: recorded 
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
 ✔ Repository version: bb95415 
wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
Untracked files:
    Untracked:  analysis/pca_cell_cycle.Rmd
    Untracked:  analysis/ridge_mle.Rmd
    Untracked:  data/zip.test.txt
    Untracked:  data/zip.train.txt
    Untracked:  docs/figure/
    Untracked:  svd_flash_zip.Rmd
Unstaged changes:
    Modified:   analysis/cell_cycle.Rmd
    Modified:   analysis/density_est_cell_cycle.Rmd
    Modified:   analysis/eb_vs_soft.Rmd
    Modified:   analysis/eight_schools.Rmd
    Modified:   analysis/glmnet_intro.Rmd
| File | Version | Author | Date | Message | 
|---|---|---|---|---|
| Rmd | bb95415 | stephens999 | 2018-04-19 | wflow_publish(“analysis/svd_zip.Rmd”) | 
Here we read in the “zipcode” training data, and extract the 2s and 3s.
z = read.table("data/zip.train.txt")
sub = (z[,1] == 2) | (z[,1]==3)
z23 = as.matrix(z[sub,])Now we run svd (excluding the first column which are the labels)
z23.svd = svd(z23[,-1])Plot the first two two singular vectors, colored by group, we see the second sv separates the groups reasonably well.
plot(z23.svd$u[,1],z23.svd$u[,2],col=z23[,1]) 
And a histogram suggests a mixture of two Gaussians might be a reasonable start:
hist(z23.svd$u[,2],breaks=seq(-0.07,0.07,length=20)) 
sessionInfo()R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
loaded via a namespace (and not attached):
 [1] workflowr_0.11.0.9000 Rcpp_0.12.14          digest_0.6.13        
 [4] rprojroot_1.3-2       R.methodsS3_1.7.1     backports_1.1.2      
 [7] git2r_0.21.0          magrittr_1.5          evaluate_0.10.1      
[10] stringi_1.1.6         whisker_0.3-2         R.oo_1.21.0          
[13] R.utils_2.6.0         rmarkdown_1.8         tools_3.3.2          
[16] stringr_1.2.0         yaml_2.1.16           htmltools_0.3.6      
[19] knitr_1.18           This reproducible R Markdown analysis was created with workflowr 0.11.0.9000