
Poisson — Count regression

Poisson regression models count data (non-negative integers: number of patents, doctor visits, accidents…). It uses a log link to keep the expected value positive.

When to use

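Use Poisson when $Y$ is a count. The core assumption is equidispersion: conditional on the regressors, the variance equals the mean. If the variance exceeds the mean (overdispersion), switch to Negative Binomial or keep Poisson with robust standard errors (see Diagnostics).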
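As a rough first look at dispersion, the raw mean and variance of the outcome can be compared before fitting anything. The sketch below reuses `patents`, the example outcome from the replication code further down. It checks the unconditional moments only, whereas the Poisson assumption concerns the variance conditional on the regressors, so treat the result as a hint rather than a test.

```stata
* Raw mean, variance, and sample size of the count outcome
* (patents is the example outcome used in the replication code)
tabstat patents, statistics(mean variance n)
```

A variance far above the mean suggests running a formal overdispersion check after estimation.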


Model specification

$$E[Y_i \mid X_i] = \exp(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}), \qquad Y_i \sim \text{Poisson}(\mu_i)$$

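The model is estimated by maximum likelihood (MLE). Coefficients are interpreted through the incidence rate ratio (IRR) $e^{\beta_j}$: the factor by which the expected count is multiplied when $X_j$ increases by one unit, holding the other regressors fixed.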
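To see why $e^{\beta_j}$ is a rate ratio, compare the expected count before and after a one-unit increase in $X_j$ with all other regressors held fixed. The value $\hat\beta_j = 0.05$ in the second line is purely illustrative and does not come from any estimated model.

```latex
\frac{E[Y_i \mid X_{ji} + 1]}{E[Y_i \mid X_{ji}]}
  = \frac{\exp\bigl(\beta_0 + \dots + \beta_j (X_{ji} + 1) + \dots + \beta_k X_{ki}\bigr)}
         {\exp\bigl(\beta_0 + \dots + \beta_j X_{ji} + \dots + \beta_k X_{ki}\bigr)}
  = e^{\beta_j}

\hat\beta_j = 0.05 \;\Rightarrow\; e^{0.05} \approx 1.051
```

With this illustrative coefficient, a one-unit increase in $X_j$ is associated with an expected count roughly 5.1% higher.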


Diagnostics

  • Overdispersion: test whether $\text{Var}(Y) > E[Y]$ (e.g. Cameron-Trivedi test). If present ⇒ SE understated ⇒ use NegBin or Poisson with robust SE/QMLE.
  • Excess zeros ⇒ zero-inflated Poisson (ZIP).
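Both checks can be sketched by hand in Stata. The block below is a minimal version of the Cameron-Trivedi auxiliary regression, assuming the alternative $\text{Var}(Y) = \mu + \alpha\mu^2$ and reusing the variable names from the replication code. The names `mu_hat` and `ct_z` are made up for this illustration.

```stata
* Fit the Poisson model and store the fitted means
poisson patents rd_spend firm_size
predict double mu_hat, n

* Auxiliary dependent variable: ((y - mu)^2 - y) / mu
generate double ct_z = ((patents - mu_hat)^2 - patents) / mu_hat

* Regress on mu_hat without a constant; the t test on mu_hat
* tests H0: equidispersion (alpha = 0) against Var(Y) = mu + alpha*mu^2
regress ct_z mu_hat, noconstant

* Number of zero outcomes, as a first look at excess zeros
count if patents == 0
```

A significantly positive coefficient on `mu_hat` points to overdispersion. The zero count is only descriptive; it is more informative when set against the share of zeros that the fitted Poisson model predicts.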

Running in EcoLab

  1. Modeling module → Count data family → Poisson.
  2. Select the count $Y$ and the $X$ variables; add an exposure/offset if needed (e.g. population, when the counts should be modeled as rates per capita).
  3. Run; read IRRs; check overdispersion; export the replication code.
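In Stata terms, the exposure choice in step 2 might look like the sketch below. Here `population` is a hypothetical exposure variable that is not part of the replication example, and it must be strictly positive. The code EcoLab actually exports may differ.

```stata
* Exposure: Stata adds ln(population) with its coefficient fixed at 1
poisson patents rd_spend firm_size, exposure(population) vce(robust) irr

* Equivalent form with an explicit offset variable
generate double ln_pop = ln(population)
poisson patents rd_spend firm_size, offset(ln_pop) vce(robust) irr
```

With an exposure in the model, the IRRs describe changes in the rate per unit of exposure rather than in the raw count.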

Replication code

* ===== Poisson — Count regression =====
* Estimate with robust standard errors
poisson patents rd_spend firm_size, vce(robust)

* Incidence Rate Ratios (IRR)
poisson patents rd_spend firm_size, vce(robust) irr

* Goodness-of-fit tests (deviance and Pearson)
* A significant statistic signals poor fit, often caused by overdispersion (Var(Y) >> E[Y])
estat gof

* Predicted counts
predict yhat, n

Limitations

  • The equidispersion assumption is often violated in practice.
  • Excess zeros make Poisson fit poorly ⇒ use zero-inflated/hurdle models.
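The alternatives named above have direct Stata counterparts. The following sketch fits them on the same example variables as the replication code. Using the same regressors in `inflate()` is only one possible choice for the zero equation, not a recommendation.

```stata
* Negative binomial: relaxes the equidispersion assumption
nbreg patents rd_spend firm_size, vce(robust) irr

* Zero-inflated Poisson; inflate() lists the variables that
* predict membership in the "always zero" group
zip patents rd_spend firm_size, inflate(rd_spend firm_size) vce(robust)
```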

Video tutorial

Video Tutorial: Guide to running Poisson regression in EcoLab

See also