Last updated: 2022-04-05

Knit directory: stat34800/analysis/

I took this code illustrating regularization from

My intention here is to run the code and show the figures it produced (since the plots are not shown in the html above). I had to move the require(ggplot2) command to the top of the code to get it to run. I also added a plot of the L1 coefficients because that was missing from the code. I downloaded the data from

Unfortunately the graphical results do not seem to match the description in the text in the html. For example, there does not seem to be any need for regularization: the lowest error in the CV occurs at or near the least squares solution. I do not know the reason for this.

Loading required package: data.table
Warning: package 'data.table' was built under R version 4.1.1
Loading required package: glmnet
Warning: package 'glmnet' was built under R version 4.1.1
Loading required package: Matrix
Loaded glmnet 4.1-3
Loading required package: ggplot2
Warning: package 'ggplot2' was built under R version 4.1.1
###reading data
##Removing non numeric var

###Splitting data (training sample of size nrow(housingData)*0.005, indexed by indexTrain)
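
The splitting code itself did not survive in the output above; only the fragment `nrow(housingData)*0.005` and the later use of `indexTrain` remain. The regression summary is consistent with a 0.5% sample: 94 residual degrees of freedom plus 14 estimated coefficients gives 108 training rows. The following is a minimal sketch of such a split, assuming `sample.int` is the sampling call (not confirmed by the source) and using a simulated table, `toyData`, in place of the housing data. The seed is also an assumption, added so the split is reproducible.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the split can be reproduced

# Simulated stand-in for the housing data (hypothetical columns)
toyData <- data.table(price = rnorm(2000, 5e5, 1e5),
                      sqft  = rnorm(2000, 2000, 500))

# Draw 0.5% of the row indices for training; the remaining rows form the test set
nTrain <- floor(nrow(toyData) * 0.005)
indexTrain <- sample.int(nrow(toyData), nTrain)

trainSet <- toyData[indexTrain]
testSet  <- toyData[-indexTrain]
c(train = nrow(trainSet), test = nrow(testSet))
```

With so few training rows, the fitted coefficients and the test error can change noticeably from one random split to the next.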

##no Reg

Call:
lm(formula = price ~ ., data = housingData[indexTrain])

Residuals:
    Min      1Q  Median      3Q     Max 
-329868 -109393  -19223   78141  891663 

Coefficients: (1 not defined because of singularities)
                Estimate Std. Error t value Pr(>|t|)   
(Intercept)    4.617e+06  1.997e+06   2.312  0.02296 * 
bedrooms      -3.024e+04  3.179e+04  -0.951  0.34389   
bathrooms      7.125e+04  6.078e+04   1.172  0.24405   
sqft_living    4.155e+01  6.850e+01   0.607  0.54564   
sqft_lot      -1.188e+00  1.016e+00  -1.170  0.24493   
floors         4.427e+04  5.359e+04   0.826  0.41080   
waterfront            NA         NA      NA       NA   
view           9.330e+04  3.734e+04   2.499  0.01421 * 
condition      3.881e+04  3.003e+04   1.292  0.19944   
grade          9.142e+04  3.126e+04   2.925  0.00432 **
sqft_above     4.036e+01  5.998e+01   0.673  0.50267   
yr_built      -2.710e+03  1.016e+03  -2.668  0.00899 **
yr_renovated   1.234e+02  4.521e+01   2.730  0.00755 **
sqft_living15  7.727e+01  5.408e+01   1.429  0.15634   
sqft_lot15    -7.130e-01  1.981e+00  -0.360  0.71968   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 189400 on 94 degrees of freedom
Multiple R-squared:  0.6416,    Adjusted R-squared:  0.592 
F-statistic: 12.94 on 13 and 94 DF,  p-value: 8.307e-16
Warning in predict.lm(lmNoReg, housingData[-indexTrain]): prediction from a
rank-deficient fit may be misleading
[1] 239795.3
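
The `NA` row for `waterfront` and the rank-deficiency warning would be expected if that predictor takes a single value in the small training sample; this is one plausible explanation, not something the output confirms. The number printed last is presumably the root mean squared error on the held-out rows. A minimal sketch of both effects on simulated data follows; `toy`, `flag`, `fitNoReg`, and `rmseTest` are hypothetical names.

```r
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  flag = rbinom(n, 1, 0.02))   # rare binary predictor
toy$y <- 1 + 2 * toy$x1 - toy$x2 + 3 * toy$flag + rnorm(n)

# Pick a training subset in which `flag` happens to be constant (all zero)
train <- head(which(toy$flag == 0), 50)
fitNoReg <- lm(y ~ ., data = toy[train, ])
coef(fitNoReg)["flag"]   # NA: a constant column is collinear with the intercept

# predict() warns about the rank-deficient fit but still returns predictions
pred <- suppressWarnings(predict(fitNoReg, toy[-train, ]))
rmseTest <- sqrt(mean((pred - toy$y[-train])^2))
rmseTest
```

Because the aliased coefficient is dropped, the test predictions ignore `flag` entirely, which is why the warning says they may be misleading.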
 coeff=melt(coeff,id.vars = 'name')
 ggplot(coeff,aes(x=variable,y=value,color=name))+geom_line()+xlab(paste0(type,' regularisation'))+ylab('Value of coefficient')+scale_x_log10()

##Different L1 regularisation
fit = glmnet(as.matrix(housingData[indexTrain, -c('price'), with=FALSE]),
             as.matrix(housingData[indexTrain]$price), family="gaussian", alpha=1)

plotCoeffEvolution(fit,'L1') # I added this line
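
Only the `melt()` and `ggplot()` lines of `plotCoeffEvolution` survive in the output above. The sketch below shows one way such a helper could be written around those two lines; everything else in the function body is an assumption, and the simulated `x`, `y`, and `fitToy` are hypothetical stand-ins for the housing data and its fit.

```r
library(glmnet)
library(data.table)
library(ggplot2)

# Hypothetical reconstruction: only the melt() and ggplot() calls come from
# the surviving output; the remaining lines are assumptions.
plotCoeffEvolution <- function(penalizedGlm, type = 'L1') {
  lambda <- penalizedGlm$lambda
  coeff <- as.matrix(penalizedGlm$beta)         # rows: variables, columns: lambdas
  coeff <- data.table(name = rownames(coeff), coeff)
  coeff <- melt(coeff, id.vars = 'name')        # long format: one row per (name, lambda)
  # melt() stacks the columns in order, so each lambda repeats once per variable
  coeff[, variable := rep(lambda, each = length(unique(name)))]
  ggplot(coeff, aes(x = variable, y = value, color = name)) + geom_line() +
    xlab(paste0(type, ' regularisation')) + ylab('Value of coefficient') +
    scale_x_log10()
}

set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5, dimnames = list(NULL, paste0('x', 1:5)))
y <- x[, 1] - 2 * x[, 2] + rnorm(100)
fitToy <- glmnet(x, y, family = "gaussian", alpha = 1)
p <- plotCoeffEvolution(fitToy, 'L1')
```

Printing `p` draws one line per predictor, showing how each coefficient shrinks as lambda grows.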

ggplot(DF_plot,aes(x=lambda,y=rmse))+geom_line()+ggtitle("Evolution of test error vs lambda value")+scale_x_log10()
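
The code that builds `DF_plot` is not shown in the output. A minimal sketch of how a table of test RMSE per lambda could be assembled from a `glmnet` fit is given below, assuming the error is computed on held-out rows; the simulated data and the names `fitToy`, `train`, and `predTest` are hypothetical.

```r
library(glmnet)
library(ggplot2)

set.seed(1)
x <- matrix(rnorm(300 * 5), 300, 5)
y <- x[, 1] - 2 * x[, 2] + rnorm(300)
train <- 1:100                                  # hypothetical train/test split

fitToy <- glmnet(x[train, ], y[train], family = "gaussian", alpha = 1)

# predict() returns one column of predictions per lambda on the path
predTest <- predict(fitToy, newx = x[-train, ])
rmse <- sqrt(colMeans((predTest - y[-train])^2))

DF_plot <- data.frame(lambda = fitToy$lambda, rmse = rmse)
p <- ggplot(DF_plot, aes(x = lambda, y = rmse)) + geom_line() +
  ggtitle("Evolution of test error vs lambda value") + scale_x_log10()
DF_plot[which.min(DF_plot$rmse), ]              # lambda with the lowest test RMSE
```

If the minimum sits at the smallest lambda on the path, the penalised fit is not improving on least squares for that split, which is the pattern described in the notes above.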

Loading required package: plotly
Warning: package 'plotly' was built under R version 4.1.1

Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout
## L2 regularisation
fit = glmnet(as.matrix(housingData[indexTrain, -c('price'), with=FALSE]),
             as.matrix(housingData[indexTrain]$price), family="gaussian", alpha=0)


ggplot(DF_plot,aes(x=lambda,y=rmse))+geom_line()+ggtitle("Evolution of test error vs lambda value")+scale_x_log10()

##Different L1L2 regularisation
fit = glmnet(as.matrix(housingData[indexTrain, -c('price'), with=FALSE]),
             as.matrix(housingData[indexTrain]$price), family="gaussian", alpha=0.03)


ggplot(DF_plot,aes(x=lambda,y=rmse))+geom_line()+ggtitle("Evolution of test error vs lambda value")+scale_x_log10()
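
The three `glmnet` fits above differ only in `alpha`. For reference, the objective that `glmnet` documents for `family="gaussian"` can be written as follows, where $N$ is the number of training rows; the notation is taken from the glmnet documentation rather than from the original code.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
\;+\;
\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right]
```

Setting $\alpha = 1$ gives the L1 (lasso) penalty, $\alpha = 0$ gives the L2 (ridge) penalty, and $\alpha = 0.03$ gives a mixture dominated by the L2 term.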

R version 4.1.0 Patched (2021-07-20 r80657)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] plotly_4.10.0     ggplot2_3.3.5     glmnet_4.1-3      Matrix_1.3-4     
[5] data.table_1.14.2

loaded via a namespace (and not attached):
 [1] shape_1.4.6       tidyselect_1.1.1  xfun_0.28         purrr_0.3.4      
 [5] splines_4.1.0     lattice_0.20-45   colorspace_2.0-2  vctrs_0.3.8      
 [9] generics_0.1.1    viridisLite_0.4.0 htmltools_0.5.2   yaml_2.2.1       
[13] utf8_1.2.2        survival_3.2-13   rlang_0.4.12      jquerylib_0.1.4  
[17] later_1.3.0       pillar_1.6.4      glue_1.5.0        withr_2.4.2      
[21] DBI_1.1.1         bit64_4.0.5       foreach_1.5.1     lifecycle_1.0.1  
[25] stringr_1.4.0     munsell_0.5.0     gtable_0.3.0      workflowr_1.7.0  
[29] htmlwidgets_1.5.4 codetools_0.2-18  evaluate_0.14     labeling_0.4.2   
[33] knitr_1.36        fastmap_1.1.0     httpuv_1.6.3      fansi_0.5.0      
[37] highr_0.9         Rcpp_1.0.7        promises_1.2.0.1  scales_1.1.1     
[41] jsonlite_1.7.2    farver_2.1.0      bit_4.0.4         fs_1.5.0         
[45] digest_0.6.28     stringi_1.7.5     dplyr_1.0.7       grid_4.1.0       
[49] rprojroot_2.0.2   tools_4.1.0       magrittr_2.0.2    lazyeval_0.2.2   
[53] tibble_3.1.6      tidyr_1.1.4       crayon_1.4.2      whisker_0.4      
[57] pkgconfig_2.0.3   ellipsis_0.3.2    httr_1.4.2        assertthat_0.2.1 
[61] rmarkdown_2.11    iterators_1.0.13  R6_2.5.1          git2r_0.29.0     
[65] compiler_4.1.0