Negative Binomial regression
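Negative Binomial (NegBin) regression is a count model that handles overdispersion, meaning the variance of the outcome exceeds its mean. Overdispersion is very common in practice, and Poisson regression cannot describe it correctly because it forces the conditional variance to equal the conditional mean (equidispersion); its standard errors are then typically too small. NegBin adds a dispersion parameter $\alpha$ that relaxes the equidispersion constraint.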
When to use
Use NegBin when $y$ is a count with overdispersion ($\mathrm{Var}(y \mid x) > \mathrm{E}(y \mid x)$). As $\alpha \to 0$, NegBin reduces to Poisson.
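A quick, covariate-free first look is to compare the sample variance of the outcome with its sample mean. The sketch below assumes the counts are already in a Python list; the function name dispersion_ratio and the toy data are hypothetical and only illustrate the idea. A ratio well above 1 hints at overdispersion, but the formal check is the test on $\alpha$ after fitting, because covariates may explain part of the spread.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean (about 1.0 under Poisson equidispersion)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; the ratio is undefined")
    return variance(counts) / m

# Hypothetical patent counts for ten firms (illustrative data, not real)
patents = [0, 0, 1, 0, 3, 12, 0, 2, 7, 25]
ratio = dispersion_ratio(patents)
print(f"variance / mean = {ratio:.2f}")  # values far above 1 suggest overdispersion
```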
Model specification
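$$\ln \mu_i = x_i'\beta, \qquad \mathrm{E}(y_i \mid x_i) = \mu_i, \qquad \mathrm{Var}(y_i \mid x_i) = \mu_i + \alpha\mu_i^2 \quad \text{(NB2)}$$
The parameter $\alpha \ge 0$ measures overdispersion; $\beta$ and $\alpha$ are estimated jointly by maximum likelihood (MLE).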
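To see what $\alpha$ does in the NB2 form, where the variance is $\mu + \alpha\mu^2$, one can simulate NegBin counts as a gamma-Poisson mixture and compare the sample variance with that formula. This is an illustrative sketch assuming NumPy is available; the helper names and the values $\mu = 4$, $\alpha = 0.5$ are arbitrary choices. Calling nb2_variance with $\alpha = 0$ returns $\mu$, the Poisson case.

```python
import numpy as np

def nb2_variance(mu, alpha):
    """NB2 conditional variance: mu + alpha * mu**2 (equals mu when alpha = 0)."""
    return mu + alpha * mu**2

def simulate_nb2(mu, alpha, size, rng):
    """Draw NB2 counts as a gamma-Poisson mixture.

    lam ~ Gamma(shape=1/alpha, scale=alpha*mu) has mean mu and variance alpha*mu**2,
    so y ~ Poisson(lam) has mean mu and variance mu + alpha*mu**2.
    """
    lam = rng.gamma(shape=1.0 / alpha, scale=alpha * mu, size=size)
    return rng.poisson(lam)

rng = np.random.default_rng(42)
mu, alpha = 4.0, 0.5
y = simulate_nb2(mu, alpha, size=200_000, rng=rng)
print(f"sample mean     = {y.mean():.2f} (theory {mu})")
print(f"sample variance = {y.var():.2f} (theory {nb2_variance(mu, alpha)})")
```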
Diagnostics
- Likelihood-ratio test of $H_0: \alpha = 0$ (NegBin vs Poisson): rejection ⇒ NegBin is more appropriate.
- If excess zeros remain ⇒ zero-inflated negative binomial (ZINB).
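Because $\alpha = 0$ lies on the boundary of the parameter space, the statistic $LR = 2(\ln L_{NB} - \ln L_{P})$ is usually compared with a 50:50 mixture of $\chi^2_0$ and $\chi^2_1$ (Stata reports it as chibar2(01)), which halves the usual $\chi^2_1$ p-value. The sketch below uses only the standard library; the function name lr_test_alpha and the log-likelihood values are hypothetical.

```python
import math

def lr_test_alpha(llf_poisson, llf_negbin):
    """LR test of H0: alpha = 0 (Poisson) against alpha > 0 (NegBin).

    Returns (statistic, p_value) using the boundary-corrected reference
    distribution 0.5*chi2(0) + 0.5*chi2(1).
    """
    stat = max(0.0, 2.0 * (llf_negbin - llf_poisson))
    if stat == 0.0:
        return stat, 1.0
    # P(chi2_1 > x) = erfc(sqrt(x / 2)); halve it for the boundary mixture
    p_value = 0.5 * math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods from a Poisson and a NegBin fit of the same model
stat, p = lr_test_alpha(llf_poisson=-1520.4, llf_negbin=-1310.7)
print(f"LR = {stat:.1f}, p = {p:.3g}")  # p < 0.05 => prefer NegBin
```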
Running in EcoLab
- Modeling module → Count data family → Negative Binomial.
- Select the count outcome $y$, the explanatory variables $X$, and an offset if needed.
- Run; read the IRRs and $\alpha$; compare AIC/BIC with Poisson; export the replication code.
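IRRs are the exponentiated coefficients, $\mathrm{IRR}_k = e^{\beta_k}$: a one-unit increase in $x_k$ multiplies the expected count by $\mathrm{IRR}_k$, holding the other regressors fixed. The sketch below converts coefficients into IRRs and percentage changes; the helper irr_table and the coefficient values are hypothetical, not EcoLab output.

```python
import math

def irr_table(coefs):
    """Map each coefficient to (IRR, % change in expected count per one-unit increase)."""
    return {name: (math.exp(b), 100.0 * (math.exp(b) - 1.0)) for name, b in coefs.items()}

# Hypothetical NegBin coefficients (log scale), for illustration only
coefs = {"rd_spend": 0.25, "firm_size": -0.10}
for name, (irr, pct) in irr_table(coefs).items():
    print(f"{name}: IRR = {irr:.3f} ({pct:+.1f}% per unit)")
```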
Replication code
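The same model is estimated below in Stata, R, and Python, in that order, with outcome patents and regressors rd_spend and firm_size.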
* ===== Negative Binomial Regression =====
* Estimate with robust standard errors
nbreg patents rd_spend firm_size, vce(robust)
* Incidence Rate Ratios (IRR)
nbreg patents rd_spend firm_size, vce(robust) irr
estimates store nb
* Test alpha = 0 (NegBin vs Poisson)
* With vce(robust), nbreg does not report the LR test of alpha,
* so re-estimate with the default VCE; the test is at the bottom of the output
nbreg patents rd_spend firm_size
* p < 0.05 ⇒ NegBin preferred over Poisson
* Compare AIC/BIC
quietly poisson patents rd_spend firm_size, vce(robust)
estimates store pois
estimates stats pois nb
# ===== Negative Binomial Regression =====
library(MASS)  # provides glm.nb()
# df: a data frame with columns patents, rd_spend, firm_size (assumed)
model <- glm.nb(patents ~ rd_spend + firm_size, data = df)
summary(model)
# Incidence Rate Ratios (IRR) and their 95% confidence intervals
exp(coef(model))
exp(confint(model))
# Dispersion parameter (theta); alpha = 1/theta
cat("alpha (overdispersion) =", 1 / model$theta, "\n")
# Compare AIC with Poisson (lower AIC ⇒ better fit)
model_pois <- glm(patents ~ rd_spend + firm_size,
                  data = df, family = poisson())
AIC(model_pois, model)
# ===== Negative Binomial Regression =====
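import numpy as np
import statsmodels.api as sm
# Prepare data (df: a pandas DataFrame with patents, rd_spend, firm_size)
X = sm.add_constant(df[["rd_spend", "firm_size"]])
y = df["patents"]
# Estimate Negative Binomial (NB2 parameterization); alpha is estimated by MLE.
# Note: sm.GLM with family=sm.families.NegativeBinomial() keeps alpha fixed at 1.0.
model = sm.NegativeBinomial(y, X, loglike_method="nb2").fit(disp=0)
print(model.summary())
# Incidence Rate Ratios (IRR); the last parameter is alpha, not a coefficient
print("IRR:")
print(np.exp(model.params.drop("alpha")))
print(f"alpha (overdispersion) = {model.params['alpha']:.3f}")
# Compare AIC with Poisson
model_pois = sm.Poisson(y, X).fit(disp=0)
print(f"AIC Poisson: {model_pois.aic:.1f}")
print(f"AIC NegBin: {model.aic:.1f}")
# Lower AIC ⇒ better fit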
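The criteria are $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k\ln n - 2\ln L$, where $k$ counts estimated parameters. NegBin has one more parameter ($\alpha$) than Poisson, so it wins on AIC only if it raises the log-likelihood by more than 1. A small sketch with hypothetical log-likelihoods and a hypothetical sample size:

```python
import math

def aic(llf, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * llf

def bic(llf, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * llf

# Hypothetical fits: 3 coefficients (const, rd_spend, firm_size); NegBin adds alpha
n = 500
fits = {"Poisson": (-1520.4, 3), "NegBin": (-1310.7, 4)}
for name, (llf, k) in fits.items():
    print(f"{name}: AIC = {aic(llf, k):.1f}, BIC = {bic(llf, k, n):.1f}")
```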
Limitations
- Still fits poorly when the excess zeros come from a separate process (e.g., firms that never patent) ⇒ use a zero-inflated model such as ZINB.
- Needs a reasonably large sample for $\alpha$ to be estimated stably.
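One informal check for excess zeros compares the observed share of zeros with the share the fitted NB2 model predicts, $\Pr(y_i = 0) = (1 + \alpha\hat\mu_i)^{-1/\alpha}$, averaged over observations. The sketch assumes you already have fitted means $\hat\mu_i$ and $\hat\alpha$; the helper names and all numbers are hypothetical. A clear shortfall in predicted zeros points toward a zero-inflated model, although a formal comparison still requires fitting ZINB.

```python
def nb2_prob_zero(mu, alpha):
    """NB2 probability of a zero count; tends to exp(-mu) (Poisson) as alpha -> 0."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)

def zero_check(y, mu_hat, alpha_hat):
    """Return (observed share of zeros, mean predicted NB2 probability of zero)."""
    observed = sum(1 for v in y if v == 0) / len(y)
    predicted = sum(nb2_prob_zero(m, alpha_hat) for m in mu_hat) / len(mu_hat)
    return observed, predicted

# Hypothetical outcomes and fitted means (illustrative, not real estimates)
y = [0, 0, 0, 0, 0, 0, 2, 5, 9, 14]
mu_hat = [1.5, 2.0, 2.5, 3.0, 1.0, 4.0, 3.5, 5.0, 7.0, 9.0]
obs, pred = zero_check(y, mu_hat, alpha_hat=0.8)
print(f"observed zeros: {obs:.2f}, predicted by NB2: {pred:.2f}")
```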