Negative Binomial regression

Negative Binomial (NegBin) regression is a count model that handles overdispersion, i.e. a conditional variance larger than the conditional mean. Overdispersion is very common in practice, and the Poisson model cannot describe it because it forces the variance to equal the mean. NegBin adds a dispersion parameter $\alpha$ that relaxes this equidispersion constraint.
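A quick way to see overdispersion is to simulate it. The sketch below assumes the standard gamma-Poisson mixture view of NB2: each unit gets a latent multiplier $\nu$ with mean 1 and variance $\alpha$, and $Y \mid \nu$ is Poisson with mean $\mu\nu$. The values of `mu` and `alpha` are illustrative only.

```python
import numpy as np

# Gamma-Poisson mixture (illustrative assumption): nu ~ Gamma(shape=1/alpha, scale=alpha)
# has E[nu] = 1 and Var(nu) = alpha, and Y | nu ~ Poisson(mu * nu).
# This gives E[Y] = mu and Var(Y) = mu + alpha * mu**2.
rng = np.random.default_rng(42)
mu, alpha, n = 3.0, 1.0, 200_000

nu = rng.gamma(shape=1 / alpha, scale=alpha, size=n)
y_nb = rng.poisson(mu * nu)
y_pois = rng.poisson(mu, size=n)  # equidispersed benchmark

# Poisson: variance close to the mean; NegBin: variance close to 3 + 1 * 3**2 = 12
print(f"Poisson: mean={y_pois.mean():.2f}, var={y_pois.var():.2f}")
print(f"NegBin:  mean={y_nb.mean():.2f}, var={y_nb.var():.2f}")
```

Both samples have roughly the same mean, but only the mixture's variance is far above it, which is the pattern NegBin is built to model.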

When to use

Use NegBin when $Y$ is a count that is overdispersed, $\text{Var}(Y \mid X) > E[Y \mid X]$. As $\alpha \to 0$, NegBin reduces to Poisson.
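The limit $\alpha \to 0$ can be checked numerically. This sketch writes the NB2 probability mass function with mean $\mu$ and size $r = 1/\alpha$ (the helper names `nb2_pmf` and `poisson_pmf` are hypothetical) and measures how far it is from the Poisson pmf as $\alpha$ shrinks.

```python
from math import exp, lgamma, log

def nb2_pmf(y, mu, alpha):
    """NB2 probability P(Y = y) with mean mu and Var = mu + alpha * mu**2."""
    r = 1.0 / alpha  # "size" parameter
    log_p = (lgamma(y + r) - lgamma(r) - lgamma(y + 1)
             + r * log(r / (r + mu)) + y * log(mu / (r + mu)))
    return exp(log_p)

def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) with mean mu."""
    return exp(-mu + y * log(mu) - lgamma(y + 1))

mu = 2.5
gaps = {}
for alpha in (1.0, 0.1, 0.01, 0.0001):
    # Largest absolute pmf difference over the support points 0..29
    gaps[alpha] = max(abs(nb2_pmf(y, mu, alpha) - poisson_pmf(y, mu)) for y in range(30))
    print(f"alpha={alpha:<7} max |NB2 - Poisson| = {gaps[alpha]:.2e}")
```

The gap shrinks roughly in proportion to $\alpha$, so a very small estimated $\alpha$ means the NegBin fit is essentially a Poisson fit.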

Model specification

$$E[Y_i \mid X_i] = \mu_i = \exp(X_i \beta), \qquad \text{Var}(Y_i \mid X_i) = \mu_i + \alpha \, \mu_i^2$$
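As a worked check of the mean and variance functions, the sketch below evaluates them for a small hypothetical design (the values of `beta`, `alpha` and `X` are made up) and compares them with SciPy's negative binomial, using the mapping $n = 1/\alpha$, $p = 1/(1 + \alpha\mu)$ between NB2 and SciPy's $(n, p)$ parameterization.

```python
import numpy as np
from scipy import stats

# Hypothetical design: intercept plus one regressor, with illustrative beta and alpha
beta = np.array([0.5, 0.3])
alpha = 0.8
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])

mu = np.exp(X @ beta)              # E[Y | X] = exp(X beta)
var_formula = mu + alpha * mu**2   # NB2 variance function

# Same distribution in SciPy's (n, p) form: n = 1/alpha, p = 1/(1 + alpha * mu)
n_size = 1.0 / alpha
p = 1.0 / (1.0 + alpha * mu)
mean_sp, var_sp = stats.nbinom.stats(n_size, p, moments="mv")

for m, v_f, v_s in zip(mu, var_formula, var_sp):
    print(f"mu={m:.3f}  Var (formula)={v_f:.3f}  Var (scipy)={v_s:.3f}")
```

The variance grows quadratically in $\mu$, so the gap between NegBin and Poisson widens for observations with large expected counts.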

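The parameter $\alpha > 0$ measures the degree of overdispersion; this quadratic variance function is the NB2 form. $\beta$ and $\alpha$ are estimated jointly by maximum likelihood (MLE).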
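To make the MLE step concrete, here is a minimal sketch that maximizes the NB2 log-likelihood directly with `scipy.optimize.minimize` on simulated data. It is not how Stata, R or statsmodels implement the estimator; the data, the true parameter values and the helper `nb2_negloglik` are all assumptions for illustration. $\alpha$ is optimized on the log scale so it stays positive.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def nb2_negloglik(theta, y, X):
    """Average negative NB2 log-likelihood; theta = (beta..., log_alpha).
    Averaging instead of summing leaves the maximizer unchanged."""
    beta, log_alpha = theta[:-1], theta[-1]
    alpha = np.exp(log_alpha)          # log scale keeps alpha > 0
    r = 1.0 / alpha
    mu = np.exp(X @ beta)
    ll = (gammaln(y + r) - gammaln(r) - gammaln(y + 1)
          + r * np.log(r / (r + mu)) + y * np.log(mu / (r + mu)))
    return -ll.mean()

# Simulated data with known parameters (illustrative only)
rng = np.random.default_rng(0)
n = 5_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true, alpha_true = np.array([0.5, 0.4]), 0.7
mu = np.exp(X @ beta_true)
y = rng.poisson(mu * rng.gamma(1 / alpha_true, alpha_true, size=n))

start = np.zeros(X.shape[1] + 1)       # beta = 0, alpha = 1
res = minimize(nb2_negloglik, start, args=(y, X), method="BFGS")
beta_hat, alpha_hat = res.x[:-1], np.exp(res.x[-1])
print("beta_hat =", beta_hat.round(3), " alpha_hat =", round(float(alpha_hat), 3))
```

With a few thousand observations the estimates land close to the simulated values; the packaged routines add standard errors and more robust optimization on top of this idea.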

Diagnostics

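Likelihood-ratio test of $H_0: \alpha = 0$ (Poisson) against NegBin: rejection ⇒ NegBin is more appropriate. Because $\alpha = 0$ lies on the boundary of the parameter space, the p-value is half the usual $\chi^2(1)$ tail probability; Stata reports this statistic as chibar2(01).
If the model still predicts fewer zeros than are observed ⇒ consider zero-inflated NegBin (ZINB).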
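Both diagnostics can be computed by hand. The sketch below takes two log-likelihood values (hypothetical numbers, not from a real fit) and applies the halved $\chi^2(1)$ p-value, then computes the number of zeros NB2 expects, $\sum_i (1 + \alpha\mu_i)^{-1/\alpha}$, for comparison with the observed count of zeros. The function names are illustrative helpers.

```python
import numpy as np
from scipy import stats

def lr_test_alpha(llf_poisson, llf_negbin):
    """LR test of H0: alpha = 0. Since alpha = 0 is on the boundary, the null
    distribution is a 50:50 mixture of chi2(0) and chi2(1), so the p-value
    is half the chi2(1) tail probability."""
    lr = max(2.0 * (llf_negbin - llf_poisson), 0.0)
    p_value = 0.5 * stats.chi2.sf(lr, df=1)
    return lr, p_value

def expected_zeros_nb2(mu, alpha):
    """Expected number of zero counts under NB2: sum of (1 + alpha*mu)^(-1/alpha)."""
    mu = np.asarray(mu, dtype=float)
    return float(np.sum((1.0 + alpha * mu) ** (-1.0 / alpha)))

# Hypothetical log-likelihoods of the Poisson and NegBin fits
lr, p = lr_test_alpha(llf_poisson=-1250.4, llf_negbin=-1190.2)
print(f"LR = {lr:.1f}, p = {p:.3g}")

# Compare with the observed number of zeros, e.g. (y == 0).sum();
# a large shortfall of predicted zeros points towards ZINB
print(expected_zeros_nb2(mu=[0.5, 1.0, 2.0], alpha=0.8))
```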

Running in EcoLab

Modeling module → Count data family → Negative Binomial.
Select the count outcome $Y$, the $X$ variables and, if needed, an offset (e.g. log exposure).
Run; read the IRRs and $\alpha$; compare AIC/BIC with the Poisson model; export the replication code.
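The IRR and offset follow directly from the mean equation. In this sketch the coefficients `b0` and `b1` and the helper `expected_count` are hypothetical: the offset enters the linear index as $\log(\text{exposure})$ with coefficient fixed at 1, and the IRR $e^{\beta_1}$ is the factor by which the expected count changes for a one-unit increase in $x$, whatever the exposure.

```python
from math import exp, log

# Hypothetical fitted NB2 coefficients: intercept and one regressor
b0, b1 = -1.2, 0.35

def expected_count(x, exposure=1.0):
    # The offset log(exposure) has coefficient 1, so mu = exposure * exp(b0 + b1 * x)
    return exp(log(exposure) + b0 + b1 * x)

irr = exp(b1)  # incidence rate ratio for a one-unit increase in x
print(f"IRR = {irr:.3f}")
for t in (1.0, 5.0):
    ratio = expected_count(2.0, t) / expected_count(1.0, t)
    print(f"exposure={t}: mu(x=2)/mu(x=1) = {ratio:.3f}")
```

The ratio equals the IRR at every exposure level, while the exposure scales the level of the expected count.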

Replication code

Stata, R and Python versions follow, in that order.

* ===== Negative Binomial Regression =====
* Estimate with robust standard errors
nbreg patents rd_spend firm_size, vce(robust)

* Incidence Rate Ratios (IRR)
nbreg patents rd_spend firm_size, vce(robust) irr

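* Test alpha = 0 (NegBin vs Poisson)
* The LR test of alpha = 0 (chibar2(01)) is shown at the bottom of the output
* only with the default VCE; Stata omits it under vce(robust)
nbreg patents rd_spend firm_size
* p < 0.05 ⇒ NegBin preferred over Poisson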

* Compare AIC/BIC
estimates store nb
quietly poisson patents rd_spend firm_size, vce(robust)
estimates store pois
estimates stats pois nb

# ===== Negative Binomial Regression =====
library(MASS)

model <- glm.nb(patents ~ rd_spend + firm_size, data = df)

summary(model)

# Incidence Rate Ratios (IRR)
exp(coef(model))
exp(confint(model))

# Dispersion parameter (theta); alpha = 1/theta
cat("alpha (overdispersion) =", 1 / model$theta, "\n")

# Compare AIC with Poisson
model_pois <- glm(patents ~ rd_spend + firm_size,
                   data = df, family = poisson())
AIC(model_pois, model)

# ===== Negative Binomial Regression =====
import statsmodels.api as sm
import numpy as np

# Prepare data
X = sm.add_constant(df[["rd_spend", "firm_size"]])
y = df["patents"]

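# Estimate Negative Binomial (NB2: Var = mu + alpha * mu^2).
# sm.NegativeBinomial estimates alpha by MLE; sm.GLM with
# families.NegativeBinomial() would hold alpha fixed (default 1.0).
model = sm.NegativeBinomial(y, X, loglike_method="nb2").fit()
print(model.summary())

# Incidence Rate Ratios (IRR); the last parameter is alpha, not a coefficient
print("IRR:")
print(np.exp(model.params.drop("alpha")))
print(f"alpha (overdispersion) = {model.params['alpha']:.4f}")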

# Compare AIC with Poisson
model_pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(f"AIC Poisson: {model_pois.aic:.1f}")
print(f"AIC NegBin:  {model.aic:.1f}")
# Lower AIC ⇒ better fit
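The AIC/BIC comparison can also be reproduced from the log-likelihoods: $\text{AIC} = 2k - 2\ln L$ and $\text{BIC} = k \ln n - 2\ln L$, where NegBin counts one more parameter ($\alpha$) than Poisson. The sample size and log-likelihood values below are hypothetical.

```python
from math import log

def aic(llf, k):
    """Akaike information criterion from log-likelihood llf and k parameters."""
    return 2 * k - 2 * llf

def bic(llf, k, n):
    """Bayesian information criterion for n observations."""
    return k * log(n) - 2 * llf

# Hypothetical values: 3 coefficients (const, rd_spend, firm_size);
# NegBin estimates one extra parameter (alpha)
n = 500
llf_pois, llf_nb = -1250.4, -1190.2
k_pois, k_nb = 3, 4
print(f"AIC Poisson {aic(llf_pois, k_pois):.1f} vs NegBin {aic(llf_nb, k_nb):.1f}")
print(f"BIC Poisson {bic(llf_pois, k_pois, n):.1f} vs NegBin {bic(llf_nb, k_nb, n):.1f}")
```

The extra parameter costs 2 AIC points (or $\ln n$ BIC points), so NegBin wins only when its log-likelihood gain outweighs that penalty.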

Limitations

Fits poorly when excess zeros arise from a separate process ⇒ use a zero-inflated model (ZINB).
Needs a sufficiently large sample to estimate $\alpha$ stably.
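The sample-size point can be illustrated by simulation. For simplicity this sketch swaps in a method-of-moments estimator, $\hat\alpha = (s^2 - \bar y)/\bar y^2$, instead of the MLE (an assumption made only to keep the code short), and shows how much the estimates scatter across repeated NB2 samples of different sizes. The values of `mu`, `alpha` and `reps` are illustrative.

```python
import numpy as np

def alpha_moment(y):
    """Method-of-moments estimate of alpha for iid NB2 counts: (s^2 - ybar) / ybar^2."""
    ybar, s2 = y.mean(), y.var(ddof=1)
    return (s2 - ybar) / ybar**2

rng = np.random.default_rng(1)
mu, alpha, reps = 2.0, 0.5, 500
spread = {}
for n in (30, 300, 3000):
    # Draw `reps` gamma-Poisson samples of size n and estimate alpha in each
    est = [alpha_moment(rng.poisson(mu * rng.gamma(1 / alpha, alpha, size=n)))
           for _ in range(reps)]
    spread[n] = np.std(est)
    print(f"n={n:5d}: sd of alpha estimates = {spread[n]:.3f}")
```

At $n = 30$ the estimates are very noisy and can even come out negative; the spread falls roughly like $1/\sqrt{n}$.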

Video tutorial

Video Tutorial: Guide to running Negative Binomial regression in EcoLab
