Lasso — L1 regularized regression
Lasso (Least Absolute Shrinkage and Selection Operator) adds an L1 penalty to the OLS objective. Unlike Ridge, Lasso can shrink some coefficients exactly to zero, so it performs variable selection automatically and yields a sparse, more interpretable model.
When to use
Use Lasso when you have many regressors and want to select the important subset. When groups of variables are highly correlated, consider Elastic Net.
Model specification
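Lasso estimates the coefficients by minimizing the residual sum of squares plus an L1 penalty:
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i'\beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
Software may rescale the loss (for example by $1/(2n)$), which only rescales $\lambda$.
The L1 penalty ($\lambda \sum_j |\beta_j|$) produces corner solutions ⇒ many $\hat{\beta}_j = 0$. $\lambda \ge 0$ controls the sparsity level.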
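To make the corner-solution point concrete, here is a minimal sketch under a simplifying assumption that is not part of the page: an orthonormal design ($X'X = I$) with the loss scaled by 1/2. In that special case the Lasso solution is the soft-thresholded OLS estimate, while Ridge (with penalty $(\lambda/2)\sum_j \beta_j^2$) only rescales it. The coefficient values and the names `soft_threshold` and `ridge_shrink` are hypothetical, chosen for illustration.

```python
import numpy as np


def soft_threshold(b_ols, lam):
    """Lasso solution per coefficient, assuming an orthonormal design."""
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)


def ridge_shrink(b_ols, lam):
    """Ridge solution per coefficient under the same assumption."""
    return b_ols / (1.0 + lam)


# Hypothetical OLS estimates, used only to illustrate the two shrinkage rules
b_ols = np.array([2.5, -1.2, 0.4, -0.1, 0.0])
lam = 0.5

b_lasso = soft_threshold(b_ols, lam)
b_ridge = ridge_shrink(b_ols, lam)

print("OLS  :", b_ols)
print("Lasso:", b_lasso)
print("Ridge:", b_ridge)
print("Non-zero (Lasso):", np.count_nonzero(b_lasso))
print("Non-zero (Ridge):", np.count_nonzero(b_ridge))
```

Any OLS coefficient whose absolute value is below `lam` is set exactly to zero by the Lasso rule, whereas the Ridge rule shrinks every non-zero coefficient but never removes one. With a general (non-orthonormal) design there is no closed form, but the same thresholding step appears inside coordinate-descent solvers.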
Notes
- Choose $\lambda$ by cross-validation; standardize variables first.
- With highly correlated variables, Lasso tends to pick one and drop the rest, and which one it picks can change across samples (unstable) ⇒ Elastic Net mitigates this.
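- Post-selection inference requires care: standard errors and p-values from a naive refit on the selected variables ignore the selection step.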
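The note on correlated regressors can be explored with a small simulation. This is a sketch on synthetic data, not EcoLab output: the two near-duplicate regressors, the penalty values (`alpha=0.1`, `l1_ratio=0.5`), and the seed are all assumptions made for illustration. In scikit-learn, `alpha` is the penalty strength (the $\lambda$ of this page).

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
n = 200

# Two regressors that are almost copies of the same latent factor z
z = rng.standard_normal(n)
x1 = z + 0.01 * rng.standard_normal(n)
x2 = z + 0.01 * rng.standard_normal(n)
X = np.column_stack([x1, x2])
y = 2.0 * z + rng.standard_normal(n)

# Same penalty strength for both estimators (illustrative values)
lasso = Lasso(alpha=0.1, max_iter=100_000).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=100_000).fit(X, y)

print("Correlation(x1, x2):", np.corrcoef(x1, x2)[0, 1])
print("Lasso coefficients      :", lasso.coef_)
print("Elastic Net coefficients:", enet.coef_)
```

The quantity to compare is how each estimator splits the total effect between `x1` and `x2`. The ridge component of Elastic Net pulls the two coefficients toward each other, while the Lasso split depends on small sample differences between the two columns; rerunning with a different seed is a quick way to see how stable each split is.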
Running in EcoLab
- Modeling module → Regularized regression family → Lasso.
- Select $y$, the $X$ variables; enable standardization; choose $\lambda$ (CV).
- Read the retained variables (non-zero coefficients) and the coefficient path; export the replication code.
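Outside EcoLab, the retained variables and the coefficient path can be inspected along these lines. This is a sketch with scikit-learn's `lasso_path` on synthetic data; the data-generating process and the variable labels `x1`, `x2`, ... are assumptions, and EcoLab's own output format may differ. Since `lasso_path` fits no intercept, the regressors are standardized and `y` is centered first.

```python
import numpy as np
from sklearn.linear_model import lasso_path
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 200, 10

# Synthetic data: only the first three regressors matter (illustrative)
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.standard_normal(n)

X_std = StandardScaler().fit_transform(X)
y_c = y - y.mean()

# alphas is a decreasing grid of penalties; coefs has shape (p, len(alphas))
alphas, coefs, _ = lasso_path(X_std, y_c)

# Number of retained (non-zero) coefficients at each penalty value
n_active = (coefs != 0).sum(axis=0)

# Print the retained variables at a few points along the path
for k in range(0, len(alphas), 20):
    retained = np.flatnonzero(coefs[:, k])
    print(f"lambda={alphas[k]:.4f}  retained={[f'x{j + 1}' for j in retained]}")
```

Each column of `coefs` is the coefficient vector at one penalty value, so plotting the rows of `coefs` against `alphas` gives the usual path plot, and `n_active` shows how the retained set grows as the penalty decreases.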
Replication code
- Stata
- R
- Python
* ---- Stata: Lasso with cross-validation ----
use "macro_data.dta", clear
* rseed() fixes the random CV folds so the result can be replicated
lasso linear y x1-x20, selection(cv) rseed(12345)
* Display selected variables and coefficients
lassocoef, display(coef, standardized)
# ---- R: Lasso (glmnet mixing parameter alpha = 1) with cross-validation ----
library(glmnet)
# Load and prepare data (illustrative)
df <- read.csv("macro_data.csv")
X <- as.matrix(df[, paste0("x", 1:20)])
y <- df$y
# Lasso with CV (glmnet standardizes X by default)
set.seed(123)  # make the random CV fold assignment reproducible
cv_lasso <- cv.glmnet(X, y, alpha = 1)
plot(cv_lasso)
# Best lambda and non-zero coefficients
best_lambda <- cv_lasso$lambda.min
coef(cv_lasso, s = best_lambda)
# ---- Python: Lasso with cross-validation ----
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
import pandas as pd
# Load data (illustrative)
df = pd.read_csv("macro_data.csv")
X = df[[f"x{i}" for i in range(1, 21)]]
y = df["y"]
# Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# LassoCV selects the penalty by CV; scikit-learn calls lambda "alpha"
model = LassoCV(cv=5).fit(X_scaled, y)
print(f"Best alpha (lambda): {model.alpha_}")
print(f"Non-zero coefficients: {int((model.coef_ != 0).sum())}")
print(f"Coefficients: {model.coef_}")
Limitations
- Variable selection is unstable when regressors are highly correlated.
- Selects at most $n$ variables when $p > n$ ($n$ observations, $p$ regressors).
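The last limitation can be checked numerically. The sketch below uses synthetic data with more regressors than observations; the dimensions, the penalty `alpha=0.1`, and the tight solver tolerance are assumptions chosen for illustration, and the bound strictly applies to the exact Lasso solution rather than to an unconverged iterate.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 25, 100  # fewer observations than regressors

# Synthetic data: three relevant regressors among 100 (illustrative)
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.standard_normal(n)

# Tight tolerance so the coordinate-descent solution is close to exact
model = Lasso(alpha=0.1, tol=1e-6, max_iter=200_000).fit(X, y)

n_selected = np.count_nonzero(model.coef_)
print(f"n = {n}, p = {p}, selected variables = {n_selected}")
```

However small the penalty, the number of selected variables in such a setting should not exceed `n`, so if more than `n` regressors are truly relevant, Lasso cannot retain all of them.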