R&D and patent counts (count data)

This illustrates the count-data family: the number of patents a firm files (non-negative integer) as a function of R&D and size. Figures are illustrative.

Summary: because count data is often overdispersed, we compare Poisson and Negative Binomial and select the appropriate model.


Step 1 — Ideation

  • Question: how much does more R&D raise the patent count?

Step 2 — Literature Review

Economics of innovation and the knowledge production function, with the patent count as the outcome.

Step 3 — Data Collection

| Variable | Symbol | Measurement | Source |
|---|---|---|---|
| Patent count | patents | count/year | IP database |
| R&D spending | lnrd | log R&D | financials |
| Size | lnsize | log assets/labor | financials |

Step 4 — Modeling

Choose the count-data family: fit Poisson first and test for overdispersion; if it is present, switch to Negative Binomial. Both models share the same conditional mean:

$$E[\text{patents}_i \mid X_i] = \exp(\beta_0 + \beta_1\,\text{lnrd}_i + \beta_2\,\text{lnsize}_i)$$
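
The overdispersion parameter tested in this step can be made explicit. As a sketch, assuming the mean-dispersion (NB2) parameterization that Stata's nbreg uses by default, and writing \mu_i for the conditional mean above (a symbol introduced here for brevity), the two models differ only in the variance:

```latex
\begin{aligned}
\mu_i &= E[\text{patents}_i \mid X_i] \\
\text{Poisson:}\quad & \operatorname{Var}(\text{patents}_i \mid X_i) = \mu_i \\
\text{NegBin:}\quad & \operatorname{Var}(\text{patents}_i \mid X_i) = \mu_i + \alpha\,\mu_i^{2}, \qquad \alpha \ge 0
\end{aligned}
```

Setting \alpha = 0 collapses the Negative Binomial variance to the Poisson one, which is why testing \alpha = 0 is a test for overdispersion.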

Illustrative results (format — not real results):

| | Poisson | NegBin |
|---|---|---|
| lnrd (IRR) | 1.35*** | 1.31*** |
| lnsize (IRR) | 1.12** | 1.10** |
| Overdispersion α | | 0.42 (≠ 0) ⇒ choose NegBin |

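Sample interpretation: the IRR on lnrd exceeds 1, so higher R&D is associated with a higher expected patent count. Because lnrd is in logs, the IRR refers to a one-unit rise in log R&D, and the underlying coefficient ln(IRR) reads as an elasticity with respect to R&D. The test rejects α = 0, so NegBin fits better than Poisson.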
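
To make the size of the effect concrete, the illustrative NegBin IRR of 1.31 on lnrd can be converted back to a coefficient and applied to percentage changes in R&D. This is a worked sketch using the placeholder figures from the table, not an estimate; the ratio notation is introduced here only for the arithmetic:

```latex
\begin{aligned}
\beta_1 &= \ln(\mathrm{IRR}) = \ln(1.31) \approx 0.270 \\
\text{R\&D} \times 1.01:\quad & \frac{E[\text{patents} \mid \text{new}]}{E[\text{patents} \mid \text{old}]} = 1.01^{\,0.270} \approx 1.0027 \quad (\approx +0.27\%) \\
\text{R\&D} \times 1.10:\quad & 1.10^{\,0.270} \approx 1.026 \quad (\approx +2.6\%) \\
\text{R\&D} \times e:\quad & e^{\,0.270} = 1.31 \quad (\text{the IRR itself})
\end{aligned}
```

The last line shows that the IRR corresponds to a one-unit rise in lnrd, that is, R&D multiplied by about 2.72, holding lnsize fixed.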

* ===== Count Models — Patents & R&D =====
* Poisson with robust SE
poisson patents lnrd lnsize, vce(robust)
estimates store pois

* Negative Binomial with robust SE
nbreg patents lnrd lnsize, vce(robust)
estimates store nb

* Compare AIC/BIC
estimates stats pois nb

* IRR for interpretation
nbreg patents lnrd lnsize, vce(robust) irr

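* Overdispersion: the LR test of alpha = 0 at the bottom of nbreg output.
* Stata omits this test under vce(robust), so refit with the default VCE.
* p < 0.05 ⇒ NegBin preferred
nbreg patents lnrd lnsize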
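
Before relying on the formal test, a quick descriptive look can help. The following is a minimal sketch, assuming the same variable names as above; it compares the raw mean and variance of patents and runs the goodness-of-fit test after a Poisson fit with the default VCE. A raw variance above the mean is only suggestive, since overdispersion is defined conditional on the regressors.

```stata
* Raw mean vs. variance of the outcome (suggestive only)
quietly summarize patents
display "mean = " r(mean) "   variance = " r(Var)

* Deviance and Pearson goodness-of-fit tests after Poisson
* (fit with the default VCE for this check)
quietly poisson patents lnrd lnsize
estat gof
```

A small p-value from estat gof indicates that the Poisson model fits poorly, which is consistent with overdispersion but can also reflect other misspecification.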

Step 5 — Reporting

Export a report and the replication code. If there are excess zeros (many firms with 0 patents), consider a zero-inflated Negative Binomial (ZINB) model.
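
The excess-zeros check could be sketched as follows. This assumes the nb estimates stored in Step 4 are still in memory, and the variables placed in inflate() are an illustrative assumption; the right predictors of "never patents" depend on the data.

```stata
* Share of observations with zero patents
quietly count if patents == 0
display "share of zeros = " r(N) / _N

* Zero-inflated NegBin: inflate() lists the predictors of the
* always-zero regime (assumed here to be the same regressors)
zinb patents lnrd lnsize, inflate(lnrd lnsize) vce(robust)
estimates store zinb

* Compare information criteria against the plain NegBin
estimates stats nb zinb
```

Lower AIC/BIC for zinb would point toward the zero-inflated specification, though the comparison is informal when robust standard errors are used.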

Video Tutorial: Guide to running count models (Poisson/NegBin) in EcoLab