Lasso — L1 regularized regression

Lasso (Least Absolute Shrinkage and Selection Operator) adds an L1 penalty to the OLS objective. Unlike Ridge, Lasso can shrink some coefficients exactly to zero, so it performs variable selection automatically and yields a sparse, more interpretable model.

When to use

Use Lasso when you have many regressors and want to select the important subset. When groups of variables are highly correlated, consider Elastic Net.


Model specification

$$\min_{\beta} \; \sum_{i=1}^{n} (Y_i - X_i \beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

The L1 penalty ($\sum_{j} |\beta_j|$) produces corner solutions ⇒ many $\beta_j = 0$. $\lambda$ controls the sparsity level: a larger $\lambda$ sets more coefficients to zero.
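To see where the exact zeros come from, it may help to look at a special case. Assume, purely for illustration, an orthonormal design ($X^\top X = I$), which the page itself does not require. The objective above then separates coordinate by coordinate, and the Lasso solution is a soft-thresholded version of the OLS estimate. Ridge (penalty $\lambda \sum_j \beta_j^2$) only rescales it:

```latex
% Assumption (illustration only): orthonormal regressors, X^\top X = I
\hat\beta_j^{\text{lasso}}
  = \operatorname{sign}\!\big(\hat\beta_j^{\text{OLS}}\big)\,
    \max\!\big(|\hat\beta_j^{\text{OLS}}| - \tfrac{\lambda}{2},\; 0\big)
\qquad\text{vs.}\qquad
\hat\beta_j^{\text{ridge}} = \frac{\hat\beta_j^{\text{OLS}}}{1+\lambda}
```

For example, with $\lambda = 2$ the threshold is $\lambda/2 = 1$: an OLS coefficient of 0.8 becomes exactly 0 under Lasso (about 0.27 under Ridge), while an OLS coefficient of 3 becomes 2 under Lasso (1 under Ridge). Any coefficient whose OLS magnitude falls below the threshold is dropped, which is the corner solution in this simplified setting.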


Notes

  • Choose $\lambda$ by cross-validation; standardize variables first.
  • With highly correlated variables, Lasso tends to pick one and drop the rest, and which one it keeps can change from sample to sample (unstable) ⇒ Elastic Net mitigates this.
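  • Post-selection inference requires care: standard errors and p-values from a naive refit on the selected variables ignore the selection step and are generally too optimistic.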
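One possible way to handle the post-selection inference point in Stata (16 or later) is the double-selection estimator `dsregress`. The sketch below assumes, only for illustration, that `x1` is the regressor of interest and `x2-x20` are candidate controls. The page does not single out any variable, so adapt the roles to your own question. The seed value is arbitrary.

```stata
* Sketch only: treating x1 as the variable of interest is an assumption
use "macro_data.dta", clear

* Double-selection lasso: lassos pick controls among x2-x20, then y is
* regressed on x1 plus the selected controls, with standard errors for x1
* that account for the selection step
dsregress y x1, controls(x2-x20) selection(cv) rseed(12345)

* Summarize the lassos that were run internally
lassoinfo
```

Only the coefficient on `x1` is reported with inference. The controls are treated as nuisance variables, so this answers a narrower question than the plain Lasso fit.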

Running in EcoLab

  1. Modeling module → Regularized regression family → Lasso.
  2. Select $Y$ and the $X$ variables; enable standardization; choose $\lambda$ (CV).
  3. Read the retained variables (non-zero coefficients) and the path; export the replication code.

Replication code

* ---- Lasso with cross-validation ----
use "macro_data.dta", clear

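* rseed() fixes the random CV folds so the run is reproducible (seed value is arbitrary)
lasso linear y x1-x20, selection(cv) rseed(12345)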

* Display selected variables and coefficients
lassocoef, display(coef, standardized)
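
To read the path and the cross-validation result mentioned in the EcoLab steps, Stata's lasso postestimation commands can be used after the fit. The following is a minimal sketch that repeats the fit so it runs on its own. The seed value is an arbitrary choice, and whether EcoLab's exported code includes these commands is not stated on this page.

```stata
* Refit so the block is self-contained (same model as the replication code)
use "macro_data.dta", clear
lasso linear y x1-x20, selection(cv) rseed(12345)

* Cross-validation function plotted against lambda
cvplot

* Coefficient paths across the lambda grid
coefpath

* Table of knots: the lambdas at which variables enter or leave the model
lassoknots
```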

Limitations

  • Unstable when variables are highly correlated.
  • Selects at most $n$ variables when $p > n$.
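
Where the instability under correlated regressors is a concern, an Elastic Net fit is one alternative to try. The sketch below uses Stata's `elasticnet` command on the same variables as the replication code. The `alpha()` grid and the seed are arbitrary illustrative choices, not recommendations from this page.

```stata
* Sketch: elastic net mixes L1 and L2 penalties
* (in Stata, alpha = 1 is lasso and alpha = 0 is ridge)
use "macro_data.dta", clear

* Cross-validate over lambda and a small grid of alpha values
elasticnet linear y x1-x20, alpha(0.25 0.5 0.75) selection(cv) rseed(12345)

* Selected variables and standardized coefficients
lassocoef, display(coef, standardized)
```

Comparing the selected set here with the Lasso selection gives a rough sense of how much the choice among correlated variables depends on the penalty.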

Video tutorial

Video Tutorial: Running Lasso regression in EcoLab

See also