IV / 2SLS — Instrumental Variables & Two-Stage Least Squares
IV/2SLS addresses endogeneity: a regressor that is correlated with the error term (because of omitted variables, measurement error, or simultaneity). In that case OLS is biased and inconsistent. IV uses an instrument to isolate the exogenous part of the variation in the endogenous variable.
Valid-instrument conditions
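A valid instrument must be: (1) relevant, meaning correlated with the endogenous variable; (2) exogenous (exclusion restriction), meaning it affects the outcome only through the endogenous variable, not directly, and is uncorrelated with the error. A weak instrument, one only faintly correlated with the endogenous variable, biases 2SLS toward the OLS estimate and makes standard errors and tests unreliable.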
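The two conditions can be written compactly for the simplest case of one endogenous regressor and one instrument. The notation below ($y_i$, $x_i$, $z_i$, $u_i$) is an assumption introduced here for illustration and is not taken from EcoLab.

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 x_i + u_i, & \operatorname{Cov}(x_i, u_i) &\neq 0 && \text{(endogeneity)} \\
& & \operatorname{Cov}(z_i, x_i) &\neq 0 && \text{(relevance)} \\
& & \operatorname{Cov}(z_i, u_i) &= 0 && \text{(exogeneity)} \\
\hat{\beta}_1^{\mathrm{IV}} &= \frac{\widehat{\operatorname{Cov}}(z_i, y_i)}{\widehat{\operatorname{Cov}}(z_i, x_i)}
\end{align*}
```

When the sample covariance in the denominator is close to zero (a weak instrument), small violations of exogeneity or small sampling noise in the numerator are magnified, which is why weak instruments are so damaging.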
Two-stage mechanism
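In the standard textbook description, stage 1 regresses the endogenous variable on the instruments and the exogenous regressors, and stage 2 regresses the outcome on the exogenous regressors and the stage-1 fitted values. The following is a minimal sketch of that idea on simulated data, assuming NumPy only; the function name `two_stage_least_squares` and the data-generating process are hypothetical and are not what EcoLab runs internally.

```python
import numpy as np

def two_stage_least_squares(y, X_exog, x_endog, Z):
    """Manual 2SLS point estimates. X_exog must already contain a constant column."""
    # Stage 1: project the endogenous regressor on exogenous regressors + instruments.
    W = np.column_stack([X_exog, Z])
    gamma, *_ = np.linalg.lstsq(W, x_endog, rcond=None)
    x_hat = W @ gamma
    # Stage 2: regress the outcome on exogenous regressors + stage-1 fitted values.
    X2 = np.column_stack([X_exog, x_hat])
    beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return beta  # last element is the coefficient on the endogenous regressor

# Simulated example: u is an unobserved confounder, so x is endogenous.
rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # omitted variable
x = 0.8 * z + u + rng.normal(size=n)          # endogenous regressor
y = 1.0 + 0.5 * x + u + rng.normal(size=n)    # true effect of x is 0.5
const = np.ones((n, 1))

beta_iv = two_stage_least_squares(y, const, x, z[:, None])
beta_ols, *_ = np.linalg.lstsq(np.column_stack([const, x]), y, rcond=None)
print("2SLS slope:", beta_iv[-1])
print("OLS slope: ", beta_ols[-1])
```

The sketch returns point estimates only. Standard errors taken from a naive second-stage regression are incorrect because they ignore that the fitted values are estimated, so a dedicated IV routine should be used for inference.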
Required tests
- Weak instrument: first-stage F-statistic (rule of thumb: F > 10).
- Endogeneity: Durbin-Wu-Hausman test (is IV needed?).
- Overidentification: Sargan/Hansen J test (available only when there are more instruments than endogenous variables).
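To make the weak-instrument check concrete, the first-stage F-statistic can be computed by hand as an F-test that the coefficients on the excluded instruments are jointly zero in the first-stage regression. The sketch below assumes NumPy only and homoskedastic errors; the helper names `rss` and `first_stage_f` and the simulated data are hypothetical.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def first_stage_f(x_endog, X_exog, Z):
    """F-statistic for excluding the instruments Z from the first stage.

    X_exog: (n, k1) exogenous regressors including a constant column.
    Z:      (n, q)  excluded instruments.
    """
    n = len(x_endog)
    W = np.column_stack([X_exog, Z])      # unrestricted first stage
    q = Z.shape[1]                        # number of restrictions tested
    k = W.shape[1]                        # parameters in the unrestricted model
    rss_restricted = rss(x_endog, X_exog)
    rss_unrestricted = rss(x_endog, W)
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))

rng = np.random.default_rng(1)
n = 2_000
const = np.ones((n, 1))
z = rng.normal(size=(n, 1))
noise = rng.normal(size=n)

x_strong = 0.8 * z[:, 0] + noise   # instrument clearly relevant
x_weak = noise                     # instrument irrelevant by construction

f_strong = first_stage_f(x_strong, const, z)
f_weak = first_stage_f(x_weak, const, z)
print("strong instrument F:", f_strong)
print("irrelevant instrument F:", f_weak)
```

With heteroskedastic or clustered errors a robust version of this statistic is more appropriate, and the F > 10 rule of thumb is only a rough guide.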
Running in EcoLab
- Modeling module → IV & simultaneous equations family → IV/2SLS.
- Declare the dependent variable, the exogenous variables, the endogenous variable(s), and the instrument(s).
- Run; read the first-stage F, 2SLS coefficients, Sargan/Hansen; export the replication code.
Replication code
- Stata
- R
- Python
* ── IV / 2SLS estimation ──────────────────────────
* Endogenous: educ | Instruments: near_college, parent_educ
ivregress 2sls lnwage exper (educ = near_college parent_educ), first
* ── Post-estimation diagnostics ───────────────────
estat firststage // First-stage F (rule of thumb: F > 10)
estat endogenous // Durbin-Wu-Hausman endogeneity test
estat overid // Overidentification test (Sargan and Basmann statistics with the default VCE)
# ── IV / 2SLS estimation ──────────────────────────
library(AER)
model_iv <- ivreg(
  lnwage ~ educ + exper | near_college + parent_educ + exper,
  data = df
)
# ── Summary with built-in diagnostics ────────────
# Reports: Weak instruments, Wu-Hausman, Sargan
summary(model_iv, diagnostics = TRUE)
# ── IV / 2SLS estimation ──────────────────────────
from linearmodels.iv import IV2SLS
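# Define variables
dep = df["lnwage"]                           # Dependent variable
exog = df[["exper"]].assign(const=1.0)       # Exogenous regressors plus intercept (linearmodels does not add one)
endog = df[["educ"]]                         # Endogenous regressor
instr = df[["near_college", "parent_educ"]]  # Instruments
model = IV2SLS(dep, exog, endog, instr)
results = model.fit(cov_type="robust")
print(results)
# Diagnostics are not part of the printed summary; request them explicitly
print(results.first_stage.diagnostics)  # First-stage partial F-statistic
print(results.wu_hausman())             # Wu-Hausman endogeneity test
print(results.sargan)                   # Sargan overidentification test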
Limitations
- Weak or invalid instruments can make IV estimates more biased and less precise than OLS.
- Finding good instruments is usually hard; needs strong theoretical justification.
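The first limitation can be illustrated with a small Monte Carlo experiment. The sketch below assumes NumPy only and a hypothetical data-generating process with one instrument and a true coefficient of 0.5; the function name `median_abs_errors` is introduced here for illustration. It compares the median absolute estimation error of IV and OLS when the instrument is nearly irrelevant and when it is strong.

```python
import numpy as np

def median_abs_errors(pi, n=200, reps=1000, beta=0.5, seed=0):
    """Median absolute error of IV and OLS slopes across simulated samples.

    pi is the first-stage coefficient on the instrument (instrument strength).
    """
    rng = np.random.default_rng(seed)
    iv_err, ols_err = [], []
    for _ in range(reps):
        z = rng.normal(size=n)                    # instrument
        u = rng.normal(size=n)                    # unobserved confounder
        x = pi * z + u + rng.normal(size=n)       # endogenous regressor
        y = beta * x + u + rng.normal(size=n)     # outcome
        zc, xc, yc = z - z.mean(), x - x.mean(), y - y.mean()
        # With one instrument and one regressor, 2SLS reduces to this ratio.
        iv_err.append(abs((zc @ yc) / (zc @ xc) - beta))
        ols_err.append(abs((xc @ yc) / (xc @ xc) - beta))
    return float(np.median(iv_err)), float(np.median(ols_err))

weak_iv, weak_ols = median_abs_errors(pi=0.02)
strong_iv, strong_ols = median_abs_errors(pi=1.0)
print(f"weak instrument:   IV error {weak_iv:.3f}, OLS error {weak_ols:.3f}")
print(f"strong instrument: IV error {strong_iv:.3f}, OLS error {strong_ols:.3f}")
```

Under these assumptions, IV should be clearly closer to the truth than OLS when the instrument is strong, and typically further from it when the instrument is nearly irrelevant. Median absolute error is used because the IV estimator has very heavy tails in small samples with weak instruments.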