Tutorial ridge regression
This tutorial gives a brief introduction to ridge regression. The tutorial makes use of the acetylene dataset from "Marquardt, D. W., and Snee, R. D. (1975). “Ridge Regression in Practice.” American Statistician 29:3–20.", and the follow the same outline as the SAS documentation
First, creating the dataset.
We create the dataset with the help of the DataFrames.jl
package.
using DataFrames
x1 = [1300, 1300, 1300, 1300, 1300, 1300, 1200, 1200, 1200, 1200, 1200, 1200, 1100, 1100, 1100, 1100]
x2 = [7.5, 9.0, 11.0, 13.5, 17.0, 23.0, 5.3, 7.5, 11.0, 13.5, 17.0, 23.0, 5.3, 7.5, 11.0, 17.0]
x3 = [0.012, 0.012, 0.0115, 0.013, 0.0135, 0.012, 0.04, 0.038, 0.032, 0.026, 0.034, 0.041, 0.084, 0.098, 0.092, 0.086]
y = [49.0, 50.2, 50.5, 48.5, 47.5, 44.5, 28.0, 31.5, 34.5, 35.0, 38.0, 38.5, 15.0, 17.0, 20.5, 29.5]
df = DataFrame(x1= x1, x2= x2, x3= x3, y= y)
Row | x1 | x2 | x3 | y |
---|---|---|---|---|
Int64 | Float64 | Float64 | Float64 | |
1 | 1300 | 7.5 | 0.012 | 49.0 |
2 | 1300 | 9.0 | 0.012 | 50.2 |
3 | 1300 | 11.0 | 0.0115 | 50.5 |
4 | 1300 | 13.5 | 0.013 | 48.5 |
5 | 1300 | 17.0 | 0.0135 | 47.5 |
6 | 1300 | 23.0 | 0.012 | 44.5 |
7 | 1200 | 5.3 | 0.04 | 28.0 |
8 | 1200 | 7.5 | 0.038 | 31.5 |
9 | 1200 | 11.0 | 0.032 | 34.5 |
10 | 1200 | 13.5 | 0.026 | 35.0 |
11 | 1200 | 17.0 | 0.034 | 38.0 |
12 | 1200 | 23.0 | 0.041 | 38.5 |
13 | 1100 | 5.3 | 0.084 | 15.0 |
14 | 1100 | 7.5 | 0.098 | 17.0 |
15 | 1100 | 11.0 | 0.092 | 20.5 |
16 | 1100 | 17.0 | 0.086 | 29.5 |
Second, make a least square regression
We make a ordinary least squares (OLS) regression for comparison.
using LinearRegressionKit, StatsModels
using VegaLite
f = @formula(y ~ x1 + x2 + x3 + x1 & x2 + x1^2)
lm, ps = regress(f, df, "all", req_stats=["default", "vif"])
lm
Model definition: y ~ 1 + x1 + x2 + x3 + :(x1 ^ 2) + x1 & x2
Used observations: 16
Model statistics:
R²: 0.993708 Adjusted R²: 0.990562
MSE: 1.33625 RMSE: 1.15596
σ̂²: 1.33625
F Value: 315.86 with degrees of freedom 5 and 10, Pr > F (p-value): 1.14778e-10
Confidence interval: 95%
Coefficients statistics:
Terms ╲ Stats │ Coefs Std err t Pr(>|t|) code low ci high ci VIF
──────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
(Intercept) │ 390.538 211.523 1.84632 0.0946146 . -80.7641 861.841 0.0
x1 │ -0.776761 0.324481 -2.39385 0.0377095 * -1.49975 -0.0537716 7682.37
x2 │ 10.1735 0.943009 10.7883 7.89586e-7 *** 8.07235 12.2747 320.022
x3 │ -121.626 69.0175 -1.76225 0.108507 -275.407 32.1545 53.5246
x1 ^ 2 │ 0.000398306 0.000125283 3.17925 0.00983192 ** 0.000119158 0.000677454 6643.32
x1 & x2 │ -0.0080548 0.000772088 -10.4325 1.07686e-6 *** -0.00977512 -0.00633448 344.545
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
We observe that the VIF for the coefficients are highs, and hence indicate likely multicollinearity.
Ridge regression
Ridge regression requires a parameter (k), while there are methods to numerically identify k. It is also possible to trace the coefficients and VIFs values to let the analyst choose a k. Here we are going to trace for the k between 0.0 and 0.1 by increment of 0.0005. We display only the results for the first 5 k.
rdf, ps = ridge(f, df, 0.0:0.0005:0.1, traceplots=true)
rdf[1:5 , :]
Row | k | MSE | RMSE | R2 | ADJR2 | (Intercept) | x1 | x2 | x3 | x1 ^ 2 | x1 & x2 | vif (Intercept) | vif x1 | vif x2 | vif x3 | vif x1 ^ 2 | vif x1 & x2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | |
1 | 0.0 | 1.33625 | 1.15596 | 0.993708 | 0.990562 | 390.538 | -0.776761 | 10.1735 | -121.626 | 0.000398306 | -0.0080548 | 0.0 | 7682.37 | 320.022 | 53.5246 | 6643.32 | 344.545 |
2 | 0.0005 | 3.07744 | 1.75426 | 0.985509 | 0.978264 | -90.3826 | -0.00657257 | 7.63994 | 4.35999 | 8.80983e-5 | -0.00598285 | 0.0 | 118.261 | 179.734 | 12.0955 | 104.157 | 193.496 |
3 | 0.001 | 4.67342 | 2.16181 | 0.977994 | 0.966991 | -105.723 | 0.0353659 | 6.12837 | 0.960535 | 6.41509e-5 | -0.00474541 | 0.0 | 35.5156 | 114.986 | 11.3358 | 32.4431 | 123.78 |
4 | 0.0015 | 6.08707 | 2.4672 | 0.971338 | 0.957006 | -105.785 | 0.0477596 | 5.12211 | -4.3978 | 5.41688e-5 | -0.00392159 | 0.0 | 17.787 | 79.8763 | 10.9935 | 17.007 | 85.9757 |
5 | 0.002 | 7.27496 | 2.69721 | 0.965744 | 0.948616 | -103.388 | 0.0529205 | 4.40401 | -9.06516 | 4.844e-5 | -0.00333371 | 0.0 | 11.1773 | 58.7312 | 10.7444 | 11.218 | 63.2076 |
Here is the default trace plot for the coefficients:
ps["coefs traceplot"]
It is an issue that the &
in the variable names are replaced by ampersand
because otherwise it would cause issue either on the web display or on the SVG display. This only affect the plot generate with the library. One can directly use the DataFrame to generate its own plots.
And here is the default trace plot for the VIFs:
ps["vifs traceplot"]
As it is difficult to see the VIFs traces, it is also possible to request the version of hte plot with the y-axis log scaled.
ps["vifs traceplot log"]
Once a k
has been selected (in this case 0.004) a regular ridge regression with this value can executed.
rlm = ridge(f, df, 0.004)
Ridge Regression
Constant k: 0.004
Model definition: y ~ 1 + x1 + x2 + x3 + :(x1 ^ 2) + x1 & x2
Used observations: 16
Model statistics:
R²: 0.951075 Adjusted R²: 0.926612
MSE: 10.3903 RMSE: 3.2234
Coefficients statistics:
Terms ╲ Stats │ Coefs VIF
──────────────┼─────────────────────────
(Intercept) │ -93.797 0.0
x1 │ 0.0578795 4.35597
x2 │ 2.83932 23.9391
x3 │ -21.3375 9.99599
x1 ^ 2 │ 3.82194e-5 5.1494
x1 & x2 │ -0.00205277 25.7445
From there the regular predict_*
functions can be used, although only the predicted
and potentially the residuals
statistics will be calculated.
res = predict_in_sample(rlm, df)
Row | x1 | x3 | x2 | y | predicted | residuals |
---|---|---|---|---|---|---|
Int64 | Float64 | Float64 | Float64 | Float64 | Float64 | |
1 | 1300 | 0.012 | 7.5 | 49.0 | 47.0616 | 1.93845 |
2 | 1300 | 0.012 | 9.0 | 50.2 | 47.3176 | 2.88236 |
3 | 1300 | 0.0115 | 11.0 | 50.5 | 47.6698 | 2.83024 |
4 | 1300 | 0.013 | 13.5 | 48.5 | 48.0646 | 0.435441 |
5 | 1300 | 0.0135 | 17.0 | 47.5 | 48.6514 | -1.15142 |
6 | 1300 | 0.012 | 23.0 | 44.5 | 49.7078 | -5.20777 |
7 | 1200 | 0.04 | 5.3 | 28.0 | 31.8337 | -3.83367 |
8 | 1200 | 0.038 | 7.5 | 31.5 | 32.7035 | -1.20355 |
9 | 1200 | 0.032 | 11.0 | 34.5 | 34.1476 | 0.352423 |
10 | 1200 | 0.026 | 13.5 | 35.0 | 35.2156 | -0.215604 |
11 | 1200 | 0.034 | 17.0 | 38.0 | 36.3609 | 1.63909 |
12 | 1200 | 0.041 | 23.0 | 38.5 | 38.4675 | 0.0324516 |
13 | 1100 | 0.084 | 5.3 | 15.0 | 17.4044 | -2.40437 |
14 | 1100 | 0.098 | 7.5 | 17.0 | 18.3845 | -1.38446 |
15 | 1100 | 0.092 | 11.0 | 20.5 | 20.547 | -0.0469559 |
16 | 1100 | 0.086 | 17.0 | 29.5 | 24.1626 | 5.33735 |