Poisson — Count regression
Poisson regression models count data (non-negative integers: number of patents, doctor visits, accidents…). It uses a log link to keep the expected value positive.
When to use
Use Poisson when the outcome Y is a count. The core assumption is equidispersion: the conditional mean equals the conditional variance. If the variance exceeds the mean (overdispersion), switch to Negative Binomial.
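A quick simulation can make the equidispersion assumption concrete. The sketch below draws a Poisson sample and an overdispersed (negative binomial) sample with the same mean and compares their variance-to-mean ratios. The seed, sample size, and parameter values are illustrative assumptions, not EcoLab defaults.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary fixed seed for reproducibility
n_obs = 100_000
mean = 3.0

# Poisson counts: the variance should be close to the mean
y_pois = rng.poisson(lam=mean, size=n_obs)

# Negative binomial counts with the same mean but extra variance.
# NumPy uses (r, p) with mean = r*(1-p)/p and variance = mean/p.
r = 2.0                          # illustrative dispersion parameter
p = r / (r + mean)
y_nb = rng.negative_binomial(r, p, size=n_obs)

ratio_pois = y_pois.var() / y_pois.mean()
ratio_nb = y_nb.var() / y_nb.mean()
print(f"Poisson variance/mean: {ratio_pois:.2f}")
print(f"NegBin  variance/mean: {ratio_nb:.2f}")
```

A ratio near 1 is consistent with equidispersion; a ratio well above 1 (here about 2.5 by construction) indicates overdispersion.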
Model specification
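The model is E[Y | X] = exp(Xβ), estimated by maximum likelihood (MLE). Coefficients are interpreted via the incidence rate ratio IRR = exp(β): a one-unit increase in a regressor multiplies the expected count by exp(β).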
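As a worked illustration of reading an IRR, the snippet below converts a coefficient on the log scale into an IRR and a percentage change in the expected count. The coefficient value 0.05 is hypothetical and chosen only for the arithmetic.

```python
import math

beta = 0.05                    # hypothetical coefficient on the log scale
irr = math.exp(beta)           # incidence rate ratio for a one-unit increase
pct_change = (irr - 1) * 100   # percentage change in the expected count

print(f"IRR = {irr:.3f}, i.e. a {pct_change:.1f}% change in the expected count")
```

An IRR above 1 means the expected count rises with the regressor, and an IRR below 1 means it falls.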
Diagnostics
- Overdispersion: test whether Var(Y) > E[Y] (e.g. Cameron-Trivedi test). If present ⇒ SE understated ⇒ use NegBin or Poisson with robust SE/QMLE.
- Excess zeros (more zeros than the fitted Poisson predicts) ⇒ ZIP (zero-inflated Poisson).
Running in EcoLab
- Modeling module → Count data family → Poisson.
- Select the count outcome Y and the explanatory variables X; add an exposure/offset if needed (e.g. population, to model rates per capita).
- Run; read IRRs; check overdispersion; export the replication code.
Replication code
- Stata
- R
- Python
* ===== Stata: Poisson count regression =====
* Estimate with robust standard errors
poisson patents rd_spend firm_size, vce(robust)
* Incidence Rate Ratios (IRR)
poisson patents rd_spend firm_size, vce(robust) irr
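* Goodness-of-fit test (deviance and Pearson chi2); a significant result
* suggests a poor Poisson fit, often due to overdispersion (Var(Y) >> E[Y])
* Refit with default standard errors so the test is based on the plain ML fit
quietly poisson patents rd_spend firm_size
estat gof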
* Predicted counts
predict yhat, n
# ===== R: Poisson count regression =====
model <- glm(patents ~ rd_spend + firm_size,
             data = df,
             family = poisson(link = "log"))
summary(model)
# Incidence Rate Ratios (IRR)
exp(coef(model))
exp(confint(model))
# Overdispersion test
library(AER)
dispersiontest(model)
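# If overdispersed, quasi-Poisson rescales the SEs by the estimated dispersion
# (for sandwich robust SEs, see sandwich::vcovHC with lmtest::coeftest)
model_qp <- glm(patents ~ rd_spend + firm_size,
                data = df,
                family = quasipoisson())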
summary(model_qp)
# ===== Python: Poisson count regression =====
import statsmodels.api as sm
import numpy as np
# Prepare data (df is assumed to be a pandas DataFrame with these columns)
X = sm.add_constant(df[["rd_spend", "firm_size"]])
y = df["patents"]
# Estimate Poisson with robust SE
model = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC1")
print(model.summary())
# Incidence Rate Ratios (IRR)
print("IRR:")
print(np.exp(model.params))
# Check overdispersion: Pearson chi² / df_resid >> 1 ⇒ overdispersed
print("Dispersion:", model.pearson_chi2 / model.df_resid)
Limitations
- The equidispersion assumption is often violated in practice.
- Excess zeros make Poisson fit poorly ⇒ use zero-inflated/hurdle models.
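One simple way to spot excess zeros is to compare the observed share of zeros with the share a fitted Poisson model implies. The sketch below does this for an intercept-only model, where the implied zero probability is exp(-mean); the zero-inflated data are simulated, and the 30% structural-zero share is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
n_obs = 50_000

# Illustrative zero-inflated data: 30% structural zeros, otherwise Poisson(2)
structural_zero = rng.random(n_obs) < 0.3
y = np.where(structural_zero, 0, rng.poisson(2.0, size=n_obs))

# For an intercept-only Poisson model the fitted mean is the sample mean
mu_hat = y.mean()
observed_zero_share = np.mean(y == 0)
poisson_zero_share = np.exp(-mu_hat)   # P(Y = 0) under Poisson(mu_hat)

print(f"Observed zeros:        {observed_zero_share:.3f}")
print(f"Poisson-implied zeros: {poisson_zero_share:.3f}")
```

With covariates, the same comparison can be made by averaging exp(-mu_i) over the fitted means. An observed share well above the implied share is a sign that a zero-inflated or hurdle model may fit better.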
Video tutorial
Video Tutorial: Guide to running Poisson regression in EcoLab
See also
- Negative Binomial · ZIP · ZINB · Catalog