Probit — Normal binary regression
Probit models the probability of a binary outcome through the standard normal cumulative distribution function $\Phi$. Empirically, Probit and Logit usually lead to similar conclusions; they differ in the assumed error distribution (normal vs. logistic).
Logit or Probit?
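Results are usually very close. Logit is convenient for its odds ratios; Probit is preferred when a normal error assumption is more reasonable or when the equation is part of an extended model built on normality (Heckman selection, bivariate probit). The raw coefficients of the two models are on different scales, so always compare them through marginal effects.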
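To illustrate why the two models tend to agree once the scale difference is accounted for, here is a minimal sketch using only the Python standard library. It rescales the probit index by the ratio of the two densities at zero (about 1.6, a common rule of thumb) and measures the largest gap between the two response curves. The function names `probit_cdf` and `logit_cdf` and the grid are illustrative choices, not part of EcoLab.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1


def probit_cdf(z: float) -> float:
    """Standard normal CDF, the probit response function."""
    return std_normal.cdf(z)


def logit_cdf(z: float) -> float:
    """Logistic CDF, the logit response function."""
    return 1.0 / (1.0 + math.exp(-z))


# Ratio of the two densities at z = 0: phi(0) / lambda(0), with lambda(0) = 0.25.
# Multiplying a probit index by this factor equalises the two slopes at the centre.
scale = std_normal.pdf(0.0) / 0.25

# Largest gap between the two curves on a grid from -4 to 4 (illustrative range).
grid = [i / 100 for i in range(-400, 401)]
max_gap = max(abs(probit_cdf(z) - logit_cdf(scale * z)) for z in grid)

print(f"scale factor: {scale:.3f}")
print(f"largest gap in predicted probability: {max_gap:.4f}")
```

The gap stays small over the whole grid, which is why the raw coefficients differ by a roughly constant factor while predicted probabilities and marginal effects tend to be similar.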
Model specification
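$$P(y_i = 1 \mid x_i) = \Phi(x_i'\beta)$$
where $\Phi$ is the standard normal CDF, $x_i$ is the vector of explanatory variables, and $\beta$ is the coefficient vector. The parameters are estimated by maximum likelihood (MLE).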
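The likelihood that MLE maximizes can be sketched in a few lines, assuming the usual specification in which the probability of y = 1 is the standard normal CDF evaluated at the linear index x'β. The toy data and the helper names `probit_prob` and `log_likelihood` below are hypothetical; real estimation would use the routines in the replication code rather than this hand-rolled version.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()


def probit_prob(x, beta):
    """P(y = 1 | x) = Phi(x'beta) for one observation."""
    index = sum(b * v for b, v in zip(beta, x))
    return std_normal.cdf(index)


def log_likelihood(beta, X, y):
    """Sum over observations of y*log(p) + (1 - y)*log(1 - p).

    No guard against p reaching exactly 0 or 1; fine for a sketch, not for production.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = probit_prob(x_i, beta)
        total += y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total


# Hypothetical toy data: the first column is the constant.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 1, 0, 1]

ll_null = log_likelihood([0.0, 0.0], X, y)    # every probability equals 0.5
ll_trial = log_likelihood([-0.5, 1.0], X, y)  # a candidate with a positive slope

print(ll_null, ll_trial)
```

An optimizer would search over `beta` for the value with the highest log-likelihood; here the trial vector already improves on the all-zero one.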
Interpretation
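- Coefficients give the sign and significance of an effect but not its size on the probability scale; use marginal effects (AME, the average marginal effect, or MEM, the marginal effect at the mean).
- Model fit: Pseudo-$R^2$, classification table, AUC.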
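The difference between the two marginal-effect summaries can be shown with a small sketch. For a continuous regressor, the marginal effect on the probability is the standard normal density at the linear index times the coefficient; AME averages this over the sample, while MEM evaluates it once at the sample means. The coefficient vector and data below are hypothetical, and the helper names `ame` and `mem` are illustrative. For a dummy regressor a discrete difference in predicted probabilities would be used instead.

```python
from statistics import NormalDist, fmean

std_normal = NormalDist()


def linear_index(x, beta):
    """Linear index x'beta for one observation."""
    return sum(b * v for b, v in zip(beta, x))


def ame(X, beta, k):
    """Average marginal effect of continuous regressor k:
    the sample mean of phi(x_i'beta), times beta_k."""
    densities = [std_normal.pdf(linear_index(x, beta)) for x in X]
    return fmean(densities) * beta[k]


def mem(X, beta, k):
    """Marginal effect at the mean: phi(xbar'beta) * beta_k."""
    x_bar = [fmean(col) for col in zip(*X)]
    return std_normal.pdf(linear_index(x_bar, beta)) * beta[k]


# Hypothetical data (first column is the constant) and hypothetical estimates.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
beta_hat = [-0.5, 1.0]

ame_x1 = ame(X, beta_hat, k=1)
mem_x1 = mem(X, beta_hat, k=1)
print(f"AME: {ame_x1:.4f}  MEM: {mem_x1:.4f}")
```

In this toy case the mean observation sits exactly where the density peaks, so MEM exceeds AME; in real samples the two are often close but need not coincide.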
Running in EcoLab
- Modeling module → Limited dependent variable family → Probit.
- Select the binary dependent variable ($y$) and the explanatory variables ($X$).
- Run; read marginal effects; compare with Logit; export the replication code.
Replication code
- Stata
- R
- Python
* ===== Probit — Normal binary regression =====
* Estimate the probit model
probit y x1 x2 x3
* Average marginal effects (AME)
margins, dydx(*)
* Predicted probabilities
predict phat, pr
* Classification table
estat classification
* Pseudo-R² is shown in the estimation output and stored in e(r2_p)
display e(r2_p)
# ===== Probit — Normal binary regression =====
# Estimate the probit model
model <- glm(y ~ x1 + x2 + x3,
data = df,
family = binomial(link = "probit"))
summary(model)
# Average marginal effects (AME)
library(margins)
mfx <- margins(model)
summary(mfx)
# Predicted probabilities
df$phat <- predict(model, type = "response")
# McFadden Pseudo-R²
null_model <- glm(y ~ 1, data = df, family = binomial(link = "probit"))
# as.numeric() drops the logLik attributes so a plain number is printed
as.numeric(1 - logLik(model) / logLik(null_model))
# ===== Probit — Normal binary regression =====
import statsmodels.api as sm
# Prepare data
X = sm.add_constant(df[["x1", "x2", "x3"]])
y = df["y"]
# Estimate the probit model
model = sm.Probit(y, X).fit()
print(model.summary())
# Average marginal effects (AME)
mfx = model.get_margeff()
print(mfx.summary())
# Predicted probabilities
df["phat"] = model.predict(X)
# McFadden Pseudo-R² is shown in model.summary()
print("Pseudo R²:", model.prsquared)
Limitations
- Coefficients have no odds-ratio interpretation, unlike Logit.
- Relies on the same underlying assumptions (exogeneity, correct specification) as other binary-choice models.