vtreat
mgcv
standardize
PROBLEM: Create a statistical regression model that relates acute hospital profit per day to operational influences
DATA: Operational, Revenue, Cost and Efficiency metrics for 43 acute hospitals over 3 years i.e. 129 data rows
INTENTION: Parsimonious model that effectively caters for the diversity of experience across the hospitals
Isolate the metrics that are statistically significant drivers of profit per day
Ascertain the influence of the Hospital Manager vs Head Office on these metrics
Use the results and appropriate presentation to convince financial executives of the “many-model approach” to business strategy!
Analyse the potential explanatory variables against the dependent variable
Evaluate a linear regression model to identify the statistically significant variables
Evaluate a Generalised Additive Model with the same variables
Decide on how to present results
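A minimal sketch of the two fits in the steps above, assuming the linear formula uses the same three predictors as the GAM formula shown in the output. HP_Data is not reproduced here, so the data frame below is synthetic and its value ranges are invented purely to make the code run. The 86/43 split is built by hand; the `split$train` naming in the output suggests a fold from `vtreat::kWayCrossValidation()`, but that is an assumption. The hold-out RMSE comparison is also an illustrative addition, not something the source reports.

```r
library(mgcv)  # gam() and s()

set.seed(42)

# Synthetic stand-in for HP_Data: 43 hospitals x 3 years = 129 rows.
# All values are made up for illustration; they are NOT the source data.
n <- 129
HP_Data <- data.frame(
  Occupancy   = runif(n, 0.4, 0.9),
  Med_PPD.Rt  = runif(n, 0.2, 0.6),
  Net_Rev.PPD = runif(n, 8000, 16000)
)
HP_Data$OpEB_PPD <- 150 + 2100 * HP_Data$Occupancy -
  1300 * HP_Data$Med_PPD.Rt + 0.15 * HP_Data$Net_Rev.PPD +
  rnorm(n, sd = 300)

# One fold of a 3-way split: 86 training rows, 43 held-out rows.
# Assumption: mirrors the $train / $app structure of a vtreat split.
idx   <- sample(n)
split <- list(train = sort(idx[1:86]), app = sort(idx[87:n]))

# Assumed linear formula: same predictors as the GAM, without smooths.
fmla.lin <- OpEB_PPD ~ Occupancy + Med_PPD.Rt + Net_Rev.PPD
fmla.gam <- OpEB_PPD ~ s(Occupancy) + s(Med_PPD.Rt) + s(Net_Rev.PPD)

model.lin <- lm(fmla.lin, data = HP_Data[split$train, ])
model.gam <- gam(fmla.gam, family = gaussian, data = HP_Data[split$train, ])

# Root mean squared error on the held-out rows.
rmse <- function(model) {
  pred <- as.numeric(predict(model, newdata = HP_Data[split$app, ]))
  sqrt(mean((HP_Data$OpEB_PPD[split$app] - pred)^2))
}

c(linear = rmse(model.lin), gam = rmse(model.gam))
```

With only 129 rows, a single fold gives a noisy comparison; repeating it over all three folds would be the natural next step.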
##
## Call:
## lm(formula = fmla.lin, data = HP_Data[split$train, ])
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1356.82  -176.03     4.02   173.02   700.42
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.510e+02  4.251e+02   0.355    0.723
## Occupancy    2.146e+03  2.733e+02   7.852 1.37e-11 ***
## Med_PPD.Rt  -1.339e+03  2.780e+02  -4.815 6.63e-06 ***
## Net_Rev.PPD  1.494e-01  2.474e-02   6.039 4.34e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 314.2 on 82 degrees of freedom
## Multiple R-squared: 0.8376, Adjusted R-squared: 0.8317
## F-statistic: 141 on 3 and 82 DF, p-value: < 2.2e-16
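As a quick consistency check, the summary lines of the linear model can be re-derived from its reported R-squared and degrees of freedom. The sketch below uses only numbers printed in the output above; the variable names are illustrative.

```r
# Values copied from the lm() summary above.
r2 <- 0.8376       # Multiple R-squared
p  <- 3            # number of predictors
df <- 82           # residual degrees of freedom
n  <- df + p + 1   # 86 training rows

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df

# Overall F-statistic on p and df degrees of freedom.
f_stat <- (r2 / p) / ((1 - r2) / df)

round(adj_r2, 4)   # 0.8317
round(f_stat)      # 141
```

The 86 training rows are two thirds of the 129 data rows, which is consistent with one fold of a three-way split.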
##
## Family: gaussian
## Link function: identity
##
## Formula:
## OpEB_PPD ~ s(Occupancy) + s(Med_PPD.Rt) + s(Net_Rev.PPD)
##
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1846.75      31.42   58.77   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##                  edf Ref.df      F  p-value
## s(Occupancy)   2.330  2.889 18.956 4.19e-08 ***
## s(Med_PPD.Rt)  8.736  8.970  2.826  0.00504 **
## s(Net_Rev.PPD) 1.000  1.000 16.084  0.00014 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.856 Deviance explained = 87.6%
## GCV = 1.0014e+05 Scale est. = 84924 n = 86
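The GAM summary can be read against the linear model with a little arithmetic. The sketch below uses only the printed numbers. The GCV relationship it checks (GCV = scale x n / (n - total edf) for a Gaussian model fitted with the default GCV criterion) is my reading of how mgcv reports these quantities, so treat it as an assumption.

```r
# Values copied from the gam() summary above.
edf <- c(intercept   = 1,
         Occupancy   = 2.330,
         Med_PPD.Rt  = 8.736,
         Net_Rev.PPD = 1.000)
scale_est <- 84924   # "Scale est."
n         <- 86

total_edf <- sum(edf)                          # 13.066
resid_sd  <- sqrt(scale_est)                   # about 291.4
gcv       <- scale_est * n / (n - total_edf)   # about 1.0014e+05

c(total_edf = total_edf, resid_sd = resid_sd, gcv = gcv)
```

Three points follow. The residual standard deviation drops from 314.2 (linear) to about 291.4 (GAM), at a cost of roughly 13 effective degrees of freedom instead of 4. An edf of 1.000 for s(Net_Rev.PPD) means the smooth has collapsed to a straight line, so that variable could be entered as a plain linear term. An edf of 8.736 for s(Med_PPD.Rt) is close to the maximum of 9 allowed by the default basis size, which may indicate overfitting on 86 rows.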
Drop outliers, in particular loss-making hospitals, and re-evaluate the models
Experiment with interaction terms for the GAM
Present the results in a non-offensive manner
Decide whether it is a model to
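The interaction experiment listed above could start from something like the following sketch. It uses synthetic data again (invented values, 86 rows to match the training set size), and the choice of which pair of variables to interact is arbitrary here. `ti()` is mgcv's tensor product interaction smooth, which adds an interaction on top of the separate main-effect smooths.

```r
library(mgcv)

set.seed(1)

# Synthetic training data; values are made up and are NOT the source data.
n <- 86
d <- data.frame(
  Occupancy   = runif(n, 0.4, 0.9),
  Med_PPD.Rt  = runif(n, 0.2, 0.6),
  Net_Rev.PPD = runif(n, 8000, 16000)
)
d$OpEB_PPD <- 150 + 2100 * d$Occupancy - 1300 * d$Med_PPD.Rt +
  0.15 * d$Net_Rev.PPD -
  3000 * (d$Occupancy - 0.65) * (d$Med_PPD.Rt - 0.4) +
  rnorm(n, sd = 300)

# Main effects only; Net_Rev.PPD enters linearly (its smooth had edf = 1).
gam.main <- gam(OpEB_PPD ~ s(Occupancy) + s(Med_PPD.Rt) + Net_Rev.PPD,
                data = d)

# Same model plus a tensor product interaction between two predictors.
gam.int <- gam(OpEB_PPD ~ s(Occupancy) + s(Med_PPD.Rt) +
                 ti(Occupancy, Med_PPD.Rt) + Net_Rev.PPD,
               data = d)

# Compare the fits; a lower AIC favours that model.
AIC(gam.main, gam.int)

# Approximate significance of each smooth, including the interaction.
summary(gam.int)$s.table
```

With so few rows, an interaction term should earn its place through a clear AIC improvement and a plausible operational story, not through significance alone.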