Find out why you should choose regularized regression
Recall that fitting a linear regression model minimizes a loss function to choose a coefficient, a, for each feature variable, and the intercept, b.
If we allow these coefficients to be very large, we can get overfitting. Therefore, it is common practice to alter the loss function of a traditional regression model so that it penalizes large coefficients. This is called regularization.
Ridge Regression
The first type of regularized regression that we’ll look at is called ridge. With ridge, we use the Ordinary Least Squares (OLS) loss function plus the squared value of each coefficient, multiplied by a constant, alpha.
So, when minimizing the loss function, models are penalized for coefficients with large positive or negative values. When using ridge, we need to choose the alpha value in order to fit and predict. Essentially, we can select the alpha for which our model performs best. Picking alpha for ridge is similar to picking k in KNN.
Alpha in ridge is known as a hyperparameter: a variable we choose that governs how the model’s parameters are selected. Alpha controls model complexity.
When alpha equals zero, we are performing OLS, where large coefficients are not penalized and overfitting may occur. A high alpha means that large coefficients are significantly penalized, which can lead to underfitting.
Ridge regression in scikit-learn
To perform ridge regression in scikit-learn, we import Ridge from sklearn.linear_model.
To highlight the impact of different alpha values, we create an empty list for our scores, then loop through a list of different alpha values. Inside the for loop we instantiate Ridge, setting the alpha keyword argument equal to the iterator, also called alpha.
We fit on the training data, and predict on the test data. We save the model’s R-squared value to the scores list. Finally, outside of the loop, we print the scores for the models with five different alpha values. We see performance gets worse as alpha increases.
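Here is a minimal sketch of that loop. The dataset is a synthetic stand-in built with make_regression, and the five alpha values are illustrative, not the course's exact grid:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the course's dataset (an assumption)
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

scores = []
for alpha in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    ridge = Ridge(alpha=alpha)      # penalty strength for this iteration
    ridge.fit(X_train, y_train)     # fit on the training data
    y_pred = ridge.predict(X_test)  # predict on the test data
    scores.append(ridge.score(X_test, y_test))  # R-squared on the test set
print(scores)
```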
LASSO Regression
There is another type of regularized regression called lasso, where our loss function is the OLS loss function plus the absolute value of each coefficient multiplied by some constant, alpha.
Lasso regression in scikit-learn
To use lasso, we import Lasso from sklearn.linear_model. The method for performing lasso regression in scikit-learn mirrors ridge regression, as we can see below. Performance drops substantially as alpha goes over 20!
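A sketch of the same loop with Lasso, reusing the train/test split from the ridge example above; the alpha grid, including values above 20, is illustrative:

```python
from sklearn.linear_model import Lasso

# Reuses X_train, X_test, y_train, y_test from the ridge sketch
scores = []
for alpha in [0.01, 1.0, 10.0, 20.0, 50.0]:
    lasso = Lasso(alpha=alpha)   # penalty strength for this iteration
    lasso.fit(X_train, y_train)  # fit on the training data
    scores.append(lasso.score(X_test, y_test))  # R-squared on the test set
print(scores)
```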
Comparing LASSO and Ridge Equations
Both loss functions start from the same OLS term and differ only in the penalty:

Ridge:
$$ J_{\text{ridge}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} a_j^2 $$

Lasso:
$$ J_{\text{lasso}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} |a_j| $$

Ridge’s squared (L2) penalty shrinks coefficients smoothly toward zero, while lasso’s absolute-value (L1) penalty can shrink some coefficients exactly to zero.
LASSO Regression for Feature Selection
Lasso regression can actually be used to assess feature importance. This is because it tends to shrink the coefficients of less important features to zero. The features whose coefficients are not shrunk to zero are selected by the lasso algorithm.
Let’s check this out in practice.
LASSO for feature selection in scikit-learn
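A sketch of lasso-based feature selection, plotting each feature’s fitted coefficient. The DataFrame name diabetes_df and its glucose target column are assumptions standing in for the course’s dataset:

```python
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso

# "diabetes_df" and its "glucose" column are hypothetical names for a
# health DataFrame whose target is blood glucose level
X = diabetes_df.drop("glucose", axis=1).values
y = diabetes_df["glucose"].values
feature_names = diabetes_df.drop("glucose", axis=1).columns

lasso = Lasso(alpha=0.1)            # alpha=0.1 is an illustrative choice
lasso_coef = lasso.fit(X, y).coef_  # fitted coefficients, one per feature

# Features whose coefficients were not shrunk to zero are "selected"
plt.bar(feature_names, lasso_coef)
plt.xticks(rotation=45)
plt.show()
```

The bar chart makes it easy to spot which coefficients lasso shrank to zero and which features carry real predictive weight.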
We can see that the most important predictor for our target variable, blood glucose levels, is the binary value for whether an individual has diabetes or not! This is not surprising, but is a great sanity check.
This type of feature selection is very important because it allows us to communicate results to non-technical audiences. It is also useful for identifying which factors are important predictors for various physical phenomena.