Heckman — Sample selection model (Heckit)

The Heckman model (Heckit) corrects sample selection bias, which arises when whether an outcome is observed at all depends on factors related to that outcome. Example: wages are observed only for people who work; the working sample is non-random, so OLS on that sample is biased.

When to use

Use Heckman when the outcome sample is endogenously selected (e.g. wage ↔ labor-force participation). You need an exclusion restriction: a variable that affects being selected but not the outcome directly.
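The wage example can be illustrated with simulated data. The sketch below is not taken from EcoLab: the sample size, coefficients, and error correlation (0.5) are assumed values chosen only for illustration, and the variable names follow the replication code on this page. It compares OLS on workers only with the Heckman two-step estimate.

```stata
* Illustrative simulation; all numeric values are assumptions
clear
set seed 2024
set obs 5000

* Selection error u and outcome error e, correlated with rho = 0.5
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm u e, corr(C)

generate educ    = rnormal(12, 2)
generate exper   = runiform(0, 30)
generate married = runiform() < 0.6
generate kids    = rpoisson(1)

* Selection rule: married and kids affect working but not the wage
generate working = (-1 + 0.1*educ + 0.02*exper + 0.4*married - 0.5*kids + u > 0)

* The wage is observed only for workers (true educ coefficient is 0.08)
generate lnwage = 1 + 0.08*educ + 0.02*exper + e if working

* OLS on the selected sample ignores the selection rule
regress lnwage educ exper

* Heckman two-step uses all observations in the selection equation
heckman lnwage educ exper, select(working = educ exper married kids) twostep
```

Because the data-generating process is known here, the two sets of coefficients can be compared against the assumed true values (0.08 for educ, 0.02 for exper).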


Two-equation structure

  • Selection equation: $S_i = 1[Z_i \gamma + u_i > 0]$ (Probit).
  • Outcome equation: $Y_i = X_i \beta + \rho \sigma_\varepsilon \, \lambda(Z_i \gamma) + \xi_i$, where $\lambda(\cdot)$ is the inverse Mills ratio (IMR).

A statistically significant coefficient on the IMR (equivalently, rejecting ρ = 0) indicates that sample selection bias is present and that the Heckman correction is warranted.
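For reference, the IMR term in the outcome equation follows from the conditional mean of the outcome among selected observations. The derivation below is a sketch under the standard assumptions, which this page does not spell out: a latent outcome error $\varepsilon_i$, jointly normal errors, and the selection error variance normalized to 1.

```latex
\begin{aligned}
Y_i &= X_i\beta + \varepsilon_i, \qquad
(u_i, \varepsilon_i) \sim \mathcal{N}\!\left(0,\
\begin{pmatrix} 1 & \rho\sigma_\varepsilon \\ \rho\sigma_\varepsilon & \sigma_\varepsilon^2 \end{pmatrix}\right) \\
E[Y_i \mid X_i, Z_i, S_i = 1]
  &= X_i\beta + E[\varepsilon_i \mid u_i > -Z_i\gamma] \\
  &= X_i\beta + \rho\sigma_\varepsilon\, E[u_i \mid u_i > -Z_i\gamma] \\
  &= X_i\beta + \rho\sigma_\varepsilon\, \lambda(Z_i\gamma),
  \qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{aligned}
```

Here $\phi$ and $\Phi$ are the standard normal density and distribution function. If $\rho = 0$ the IMR term drops out and OLS on the selected sample is unbiased, which is why the IMR coefficient serves as a test for selection bias.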


Two estimation approaches

Approach | Description
Two-step (Heckit) | Step 1: Probit selection equation → compute the IMR; step 2: OLS outcome equation with the IMR as a regressor
MLE | Estimate both equations jointly (more efficient)
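The two-step approach can also be carried out by hand, which makes its mechanics explicit. The following is a minimal sketch on simulated data (all numeric values are assumptions for illustration; `zg` and `imr` are hypothetical variable names). The manual second step gives the same point estimates as `heckman ..., twostep`, but its standard errors are not corrected for the estimated IMR, so the built-in command should be used for inference.

```stata
* Simulated data; all numeric values are assumptions
clear
set seed 2024
set obs 5000
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm u e, corr(C)
generate educ    = rnormal(12, 2)
generate exper   = runiform(0, 30)
generate married = runiform() < 0.6
generate kids    = rpoisson(1)
generate working = (-1 + 0.1*educ + 0.02*exper + 0.4*married - 0.5*kids + u > 0)
generate lnwage  = 1 + 0.08*educ + 0.02*exper + e if working

* Step 1: Probit for selection on the full sample
probit working educ exper married kids

* Linear index Z*gamma and the inverse Mills ratio phi(Zg)/Phi(Zg)
predict double zg, xb
generate double imr = normalden(zg) / normal(zg)

* Step 2: OLS on the selected sample with the IMR as a regressor
* (lnwage is missing for non-workers, so they drop out automatically)
regress lnwage educ exper imr

* Built-in two-step estimator for comparison (corrected standard errors)
heckman lnwage educ exper, select(working = educ exper married kids) twostep
```

The coefficient on `imr` in the manual regression estimates $\rho \sigma_\varepsilon$ and corresponds to `lambda` in the `heckman` output.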

Running in EcoLab

  1. Modeling module → Limited dependent variable family → Heckman.
  2. Declare the outcome equation ($Y$, $X$) and the selection equation ($Z$, including the exclusion variable).
  3. Choose two-step or MLE; run; read the IMR coefficient ($\lambda = \rho \sigma_\varepsilon$) to confirm bias; export the replication code.

Replication code

* ===== Heckman Selection Model (Heckit) =====
* Outcome eq.:   lnwage  = f(educ, exper)
* Selection eq.: working = f(educ, exper, married, kids)
*   married and kids enter only the selection eq. ← exclusion restriction

* Two-step estimator (requires the twostep option)
heckman lnwage educ exper, select(working = educ exper married kids) twostep

* MLE estimator (Stata's default when twostep is omitted; more efficient)
heckman lnwage educ exper, select(working = educ exper married kids)

* Key output:
* - mills (lambda): significant → selection bias present
* - rho: correlation between selection and outcome errors
* - sigma: std. deviation of outcome error
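
The key quantities listed above can also be read programmatically from the results that `heckman` stores. The sketch below uses simulated data (all numeric values are assumptions for illustration) and the stored scalars `e(lambda)`, `e(selambda)`, `e(rho)`, `e(sigma)`, `e(chi2_c)`, and `e(p_c)`.

```stata
* Simulated data; all numeric values are assumptions
clear
set seed 2024
set obs 5000
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm u e, corr(C)
generate educ    = rnormal(12, 2)
generate exper   = runiform(0, 30)
generate married = runiform() < 0.6
generate kids    = rpoisson(1)
generate working = (-1 + 0.1*educ + 0.02*exper + 0.4*married - 0.5*kids + u > 0)
generate lnwage  = 1 + 0.08*educ + 0.02*exper + e if working

* Two-step: coefficient on the IMR and its z statistic
heckman lnwage educ exper, select(working = educ exper married kids) twostep
display "lambda = " e(lambda) "   se = " e(selambda)
display "z      = " e(lambda) / e(selambda)
display "rho    = " e(rho) "   sigma = " e(sigma)

* MLE: test of independent equations (H0: rho = 0)
heckman lnwage educ exper, select(working = educ exper married kids)
display "chi2(1) = " e(chi2_c) "   p = " e(p_c)
```

With the default variance estimator, the MLE comparison test is a likelihood-ratio test of $\rho = 0$; a small p-value points to selection bias.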

Limitations

  • Heavily depends on a valid exclusion restriction; without it, the model is poorly identified (collinearity with the IMR).
  • Sensitive to the bivariate-normal error assumption.
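
Two informal checks on the exclusion restriction can be sketched as follows, again on simulated data with assumed values; `zg` and `imr` are hypothetical variable names. The first tests whether the excluded variables matter in the selection equation. The second measures how much of the IMR is explained by the outcome regressors. Neither check can establish that the excluded variables have no direct effect on the outcome; that part of the restriction rests on substantive argument.

```stata
* Simulated data; all numeric values are assumptions
clear
set seed 2024
set obs 5000
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm u e, corr(C)
generate educ    = rnormal(12, 2)
generate exper   = runiform(0, 30)
generate married = runiform() < 0.6
generate kids    = rpoisson(1)
generate working = (-1 + 0.1*educ + 0.02*exper + 0.4*married - 0.5*kids + u > 0)
generate lnwage  = 1 + 0.08*educ + 0.02*exper + e if working

* Check 1: are the excluded variables jointly relevant for selection?
probit working educ exper married kids
test married kids

* Check 2: how collinear is the IMR with the outcome regressors?
predict double zg, xb
generate double imr = normalden(zg) / normal(zg)
regress imr educ exper if working
display "R-squared of IMR on outcome regressors = " e(r2)
```

An R-squared close to 1 in the second regression suggests that the IMR adds little independent variation, so the second-step estimates are likely to be imprecise.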

Video tutorial

Video Tutorial: Guide to running Heckman selection model in EcoLab
