Find out why you should choose regularized regression
Recall that fitting a linear regression model minimizes a loss function to choose a coefficient, a, for each feature variable, and the intercept, b.
If we allow these coefficients to be very large, we can get overfitting. Therefore, it is common practice to alter the loss function of a traditional regression model so that it penalizes large coefficients. This is called regularization.
Ridge Regression
The first type of regularized regression that we’ll look at is called ridge. With ridge, we use the Ordinary Least Squares (OLS) loss function plus the squared value of each coefficient, multiplied by a constant, alpha.
So, when minimizing the loss function, models are penalized for coefficients with large positive or negative values. When using ridge, we need to choose the alpha value in order to fit and predict. Essentially, we can select the alpha for which our model performs best. Picking alpha for ridge is similar to picking k in KNN.
Alpha in ridge is known as a hyperparameter: a variable we choose that governs how the model’s parameters are selected. Alpha controls model complexity.
When alpha equals zero, we are performing OLS, where large coefficients are not penalized and overfitting may occur. A high alpha means that large coefficients are significantly penalized, which can lead to underfitting.
Ridge regression in scikit-learn
To perform ridge regression in scikit-learn, we import Ridge from sklearn.linear_model.
To highlight the impact of different alpha values, we create an empty list for our scores, then loop through a list of different alpha values. Inside the for loop we instantiate Ridge, setting the alpha keyword argument equal to the iterator, also called alpha.
We fit on the training data, and predict on the test data. We save the model’s R-squared value to the scores list. Finally, outside of the loop, we print the scores for the models with five different alpha values. We see performance gets worse as alpha increases.
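Here is a minimal sketch of that loop. The dataset is a synthetic stand-in built with make_regression, and the five alpha values are illustrative, not the course's exact grid:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the course's dataset (an assumption)
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

scores = []
for alpha in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    ridge = Ridge(alpha=alpha)      # penalty strength for this iteration
    ridge.fit(X_train, y_train)     # fit on the training data
    y_pred = ridge.predict(X_test)  # predict on the test data
    scores.append(ridge.score(X_test, y_test))  # R-squared on the test set
print(scores)
```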
LASSO Regression
There is another type of regularized regression called lasso, where our loss function is the OLS loss function plus the absolute value of each coefficient multiplied by some constant, alpha.
Lasso regression in scikit-learn
To use lasso, we import Lasso from sklearn.linear_model. The method for performing lasso regression in scikit-learn mirrors ridge regression, as we can see below. Performance drops substantially as alpha goes over 20!
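A sketch of the same loop with Lasso, reusing the train/test split from the ridge example above; the alpha grid, including values above 20, is illustrative:

```python
from sklearn.linear_model import Lasso

# Reuses X_train, X_test, y_train, y_test from the ridge sketch
scores = []
for alpha in [0.01, 1.0, 10.0, 20.0, 50.0]:
    lasso = Lasso(alpha=alpha)   # penalty strength for this iteration
    lasso.fit(X_train, y_train)  # fit on the training data
    scores.append(lasso.score(X_test, y_test))  # R-squared on the test set
print(scores)
```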
Comparing LASSO and Ridge Equations
Both loss functions start from the same OLS term and differ only in the penalty:

Ridge:
$$ J_{\text{ridge}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} a_j^2 $$

Lasso:
$$ J_{\text{lasso}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} |a_j| $$

Ridge’s squared (L2) penalty shrinks coefficients smoothly toward zero, while lasso’s absolute-value (L1) penalty can shrink some coefficients exactly to zero.
LASSO Regression for Feature Selection
Lasso regression can actually be used to assess feature importance. This is because it tends to shrink the coefficients of less important features to zero. The features whose coefficients are not shrunk to zero are selected by the lasso algorithm.
Let’s check this out in practice.
LASSO for feature selection in scikit-learn
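A sketch of lasso-based feature selection, plotting each feature’s fitted coefficient. The DataFrame name diabetes_df and its glucose target column are assumptions standing in for the course’s dataset:

```python
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso

# "diabetes_df" and its "glucose" column are hypothetical names for a
# health DataFrame whose target is blood glucose level
X = diabetes_df.drop("glucose", axis=1).values
y = diabetes_df["glucose"].values
feature_names = diabetes_df.drop("glucose", axis=1).columns

lasso = Lasso(alpha=0.1)            # alpha=0.1 is an illustrative choice
lasso_coef = lasso.fit(X, y).coef_  # fitted coefficients, one per feature

# Features whose coefficients were not shrunk to zero are "selected"
plt.bar(feature_names, lasso_coef)
plt.xticks(rotation=45)
plt.show()
```

The bar chart makes it easy to spot which coefficients lasso shrank to zero and which features carry real predictive weight.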
We can see that the most important predictor for our target variable, blood glucose levels, is the binary value for whether an individual has diabetes or not! This is not surprising, but is a great sanity check.
This type of feature selection is very important because it allows us to communicate results to non-technical audiences. It is also useful for identifying which factors are important predictors for various physical phenomena.