Returns to education on wages (Mincer, OLS)
This example illustrates EcoLab's 5-step workflow with the most classic model in labor economics: the Mincer wage equation estimated by OLS. The numbers shown only illustrate the output format; they are not real estimates.
Summary: regress log wage on years of schooling and experience to estimate the "return to education" (the payoff to each additional year of schooling).
Step 1 — Ideation
- Question: by what percentage does one more year of schooling raise wages?
Step 2 — Literature Review
Human-capital theory (Becker, Mincer); standardize citations; clarify variables and the log-lin form.
Step 3 — Data Collection
| Variable | Symbol | Measurement | Source |
|---|---|---|---|
| Log wage | lnwage | log of hourly or monthly wage | VHLSS; labor survey |
| Years of schooling | educ | years | VHLSS |
| Experience | exper, exper2 | years; years squared | computed as age − educ − 6 (potential experience) |
| Controls | gender, region | binary | VHLSS |
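The experience variables in the table are derived rather than observed. A minimal sketch of that derivation, assuming a school-entry age of 6 as in the table. The function name `potential_experience`, the sample respondents, and the floor at zero are illustrative assumptions, not part of EcoLab or VHLSS.

```python
def potential_experience(age, educ, school_start_age=6):
    """Potential experience = age - years of schooling - school start age.

    Flooring at zero is an assumption made here to avoid negative values
    for respondents who are still young relative to their schooling.
    """
    return max(age - educ - school_start_age, 0)


# Hypothetical respondents: (age, years of schooling)
people = [(30, 12), (45, 16), (19, 14)]

rows = []
for age, educ in people:
    exper = potential_experience(age, educ)
    rows.append({"educ": educ, "exper": exper, "exper2": exper ** 2})

for row in rows:
    print(row)
```

Whether to floor negative values at zero or drop those observations is a judgment call that depends on the sample.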
Step 4 — Modeling
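Mincer form (log-lin, with squared experience to capture concavity):

$$\text{lnwage}_i = \beta_0 + \beta_1\,\text{educ}_i + \beta_2\,\text{exper}_i + \beta_3\,\text{exper}_i^2 + \gamma_1\,\text{gender}_i + \gamma_2\,\text{region}_i + \varepsilon_i$$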
Choose the Classical linear regression family → OLS, with robust standard errors (heteroskedasticity is common in micro data).
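To make "robust standard errors" concrete, here is a minimal sketch of the HC1 (heteroskedasticity-robust) standard error for the slope, simplified to a regression with a single regressor plus an intercept. The function name `ols_slope_hc1` and the toy data are hypothetical; the full Mincer model has several regressors and is better estimated with the statistical packages, but the sandwich logic is the same.

```python
import math


def ols_slope_hc1(x, y):
    """OLS of y on x (with intercept) and the HC1 robust SE of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # "Meat" of the sandwich: each squared residual is weighted by its own
    # squared deviation of x, so the error variance may differ across rows.
    meat = sum((xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, resid))
    # HC1 applies the degrees-of-freedom correction n / (n - k), with k = 2.
    var_hc1 = (n / (n - 2)) * meat / sxx ** 2
    return slope, intercept, math.sqrt(var_hc1)


# Toy data (hypothetical): years of schooling and log wage
educ = [8, 10, 12, 14, 16]
lnwage = [1.0, 1.2, 1.3, 1.6, 1.7]

slope, intercept, se_robust = ols_slope_hc1(educ, lnwage)
print(f"slope = {slope:.3f}, robust SE = {se_robust:.4f}")
```

The conventional formula assumes one common error variance for all observations. The robust version drops that assumption, which is why it is the safer default for household survey data.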
Illustrative results (format — not real results):
| Variable | Coefficient | SE (robust) | p-value |
|---|---|---|---|
| educ | 0.082 | 0.005 | 0.000 |
| exper | 0.031 | 0.004 | 0.000 |
| exper2 | −0.0005 | 0.0001 | 0.000 |
| R² | 0.38 | | |
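Sample interpretation: the coefficient on educ is 0.082 ⇒ each extra year of schooling is associated with about 8.2% higher wages (an approximation that holds for small coefficients in the log-lin form); the positive exper and negative exper2 coefficients imply a concave experience profile (rising, then flattening).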
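The arithmetic behind this interpretation can be checked directly. The sketch below uses the illustrative coefficients from the results table (format examples, not real estimates) to compute the exact percentage effect implied by a log-lin coefficient and the experience level at which the fitted wage profile stops rising. The variable names are chosen here for illustration.

```python
import math

# Illustrative coefficients copied from the results table (not real estimates)
b_educ, b_exper, b_exper2 = 0.082, 0.031, -0.0005

# Exact percentage effect of one more year of schooling in a log-lin model
exact_pct = (math.exp(b_educ) - 1) * 100

# Experience at which the fitted profile stops rising:
# d lnwage / d exper = b_exper + 2 * b_exper2 * exper = 0
peak_exper = -b_exper / (2 * b_exper2)

print(f"approximate effect: {b_educ * 100:.1f}%")
print(f"exact effect: {exact_pct:.1f}%")
print(f"profile peaks at about {peak_exper:.0f} years of experience")
```

With these numbers the exact effect is about 8.5% rather than 8.2%, and the profile peaks at roughly 31 years of experience. The gap between the approximate and exact effect widens as the coefficient grows.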
Stata:

```stata
* ---- Mincer wage equation (OLS, robust SE) ----
use "vhlss_wage.dta", clear
gen exper2 = exper^2
regress lnwage educ exper exper2 gender region, vce(robust)
```

R:

```r
# ---- Mincer wage equation (OLS, robust SE) ----
library(lmtest)
library(sandwich)
df <- read.csv("vhlss_wage.csv")
model <- lm(lnwage ~ educ + exper + I(exper^2) + gender + region,
            data = df)
# Robust standard errors (HC1, to match Stata's vce(robust);
# vcovHC() defaults to HC3 otherwise)
coeftest(model, vcov = vcovHC(model, type = "HC1"))
```

Python:

```python
# ---- Mincer wage equation (OLS, robust SE) ----
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("vhlss_wage.csv")
df["exper2"] = df["exper"] ** 2  # squared experience term
X = sm.add_constant(df[["educ", "exper", "exper2", "gender", "region"]])
# HC1 robust covariance, matching Stata's vce(robust)
model = sm.OLS(df["lnwage"], X).fit(cov_type="HC1")
print(model.summary())
```
Step 5 — Reporting
Export an APA/Harvard… report with replication code.
educ may be endogenous (unobserved ability affects both schooling and wages) ⇒ OLS may be biased. See the instrumental-variables fix in IV example: Returns to education.
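A small simulation can make the endogeneity concern concrete. The sketch below is purely illustrative: every parameter (a true return of 0.08, the effect of ability on schooling and on wages, the noise levels) is an assumption chosen here, not an estimate from VHLSS or any study. Because ability raises both schooling and wages by construction, omitting it pushes the OLS slope above the true return.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_RETURN = 0.08  # hypothetical true return to one year of schooling
n = 5000

educ, lnwage = [], []
for _ in range(n):
    ability = random.gauss(0, 1)                # unobserved by the researcher
    e = 12 + 2 * ability + random.gauss(0, 1)   # schooling rises with ability
    w = 1 + TRUE_RETURN * e + 0.1 * ability + random.gauss(0, 0.3)
    educ.append(e)
    lnwage.append(w)

# OLS slope from regressing lnwage on educ alone (ability omitted)
e_bar = sum(educ) / n
w_bar = sum(lnwage) / n
slope = (sum((e - e_bar) * (w - w_bar) for e, w in zip(educ, lnwage))
         / sum((e - e_bar) ** 2 for e in educ))

print(f"true return: {TRUE_RETURN:.3f}, OLS estimate: {slope:.3f}")
```

In real data the direction and size of the bias are not known in advance, which is the motivation for the instrumental-variables approach.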
Video tutorial
Video Tutorial: Running the Mincer wage equation (OLS) in EcoLab
See also
- OLS · IV example · Catalog