GMM for dynamic panel data
GMM (Generalized Method of Moments) for dynamic panels handles the case where a lagged dependent variable $y_{i,t-1}$ appears on the right-hand side and/or some regressors are endogenous. In that case FEM and REM are inconsistent. The within (FE) estimator suffers from Nickell bias, which is of order 1/T and therefore serious when T is small. REM fails because $y_{i,t-1}$ is correlated with the unit effect $\mu_i$. GMM uses internal instruments (lags of the model's own variables) to obtain consistent estimates. Two common variants are Difference GMM (Arellano–Bond, 1991) and System GMM (Arellano–Bover, 1995; Blundell–Bond, 1998).
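A minimal simulation can make the Nickell bias and the internal-instrument idea concrete. The sketch below assumes a pure AR(1) panel $y_{it} = \rho y_{i,t-1} + \mu_i + \varepsilon_{it}$ with $\rho = 0.5$, N = 3000 units and T = 6 observed periods. All of these values are chosen only for illustration. It compares the within (FE) estimator with the Anderson–Hsiao IV estimator, a just-identified precursor of Difference GMM that instruments $\Delta y_{i,t-1}$ with the level $y_{i,t-2}$. This is plain Python, not EcoLab output, and all function names are hypothetical.

```python
import random


def simulate_panel(n_units, n_periods, rho, sigma_mu=0.5, burn_in=50, seed=0):
    """Simulate y_it = rho * y_{i,t-1} + mu_i + eps_it with eps_it ~ N(0, 1)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        mu = rng.gauss(0.0, sigma_mu)          # unobserved unit effect
        y = mu / (1.0 - rho)                   # start at the unit's long-run mean
        series = []
        for t in range(burn_in + n_periods):
            y = rho * y + mu + rng.gauss(0.0, 1.0)
            if t >= burn_in:                   # keep only the last n_periods values
                series.append(y)
        panel.append(series)
    return panel


def within_estimate(panel):
    """FE (within) OLS slope of y_t on y_{t-1}; biased downward when T is small."""
    num = den = 0.0
    for y in panel:
        lhs, rhs = y[1:], y[:-1]
        mean_lhs, mean_rhs = sum(lhs) / len(lhs), sum(rhs) / len(rhs)
        for a, b in zip(lhs, rhs):
            num += (b - mean_rhs) * (a - mean_lhs)
            den += (b - mean_rhs) ** 2
    return num / den


def anderson_hsiao(panel):
    """IV on first differences: instrument dy_{t-1} with the level y_{t-2}."""
    num = den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            z = y[t - 2]                       # internal instrument
            num += z * (y[t] - y[t - 1])       # z * dy_t
            den += z * (y[t - 1] - y[t - 2])   # z * dy_{t-1}
    return num / den


panel = simulate_panel(n_units=3000, n_periods=6, rho=0.5)
print(f"within (FE):    {within_estimate(panel):.3f}")
print(f"Anderson-Hsiao: {anderson_hsiao(panel):.3f}")
```

With these settings the within estimate should fall well below 0.5, while the IV estimate should sit close to it. Difference GMM extends the same moment condition $E[y_{i,t-s}\,\Delta\varepsilon_{it}] = 0$ to all available lags $s \ge 2$ and weights them efficiently.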
In EcoLab, GMM belongs to the Panel Data group and generates reproducible Stata/R/Python code. See also: FEM and REM; Estimation & Modeling.
When should you use dynamic GMM?
- The model is dynamic: it contains $y_{i,t-1}$ (or several lags of $y$) among the regressors.
- The panel has large N, small T (many units, few periods) — typical of firm/country data.
- There are endogenous regressors correlated with the error, but no external instruments are available.
- You need to control for unobserved unit effects in a dynamic context.
System GMM is preferred when the series is highly persistent (close to a random walk). In that case, lagged levels are only weakly correlated with subsequent first differences, so they are weak instruments in Difference GMM.
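The weak-instrument point can be quantified. Assume a mean-stationary AR(1) panel with no other regressors, $y_{it} = \rho y_{i,t-1} + \mu_i + \varepsilon_{it}$, with $\operatorname{Var}(\mu_i) = \sigma_\mu^2$ and $\operatorname{Var}(\varepsilon_{it}) = \sigma_\varepsilon^2$. The sketch below computes the population correlation between the Difference GMM instrument $y_{i,t-2}$ and the regressor it instruments, $\Delta y_{i,t-1}$. The function name is hypothetical.

```python
import math


def first_stage_corr(rho, var_mu=1.0, var_eps=1.0):
    """Population corr(y_{i,t-2}, dy_{i,t-1}) in a mean-stationary AR(1) panel."""
    var_u = var_eps / (1.0 - rho ** 2)             # variance of the AR(1) deviation u_it
    var_level = var_mu / (1.0 - rho) ** 2 + var_u  # Var(y_{i,t-2}) = Var(mu_i/(1-rho) + u)
    var_diff = 2.0 * (1.0 - rho) * var_u           # Var(dy_{i,t-1})
    cov = -(1.0 - rho) * var_u                     # Cov(y_{i,t-2}, dy_{i,t-1})
    return cov / math.sqrt(var_level * var_diff)


for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"rho = {rho:.2f}   corr = {first_stage_corr(rho):+.4f}")
```

With $\sigma_\mu^2 = \sigma_\varepsilon^2$ the expression simplifies to $-(1-\rho)/2$. The correlation therefore falls from -0.25 at $\rho = 0.5$ to -0.005 at $\rho = 0.99$, which is the weak-instrument problem that motivates System GMM.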
Model specification
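The baseline dynamic model is
$$y_{it} = \rho\, y_{i,t-1} + x_{it}'\beta + \mu_i + \varepsilon_{it},$$
where $\mu_i$ is the unobserved unit effect and $\varepsilon_{it}$ is the idiosyncratic error.
- $y_{i,t-1}$: the lagged dependent variable. It is the source of dynamics, and it is endogenous because it is correlated with $\mu_i$ in levels and with $\Delta\varepsilon_{it}$ after differencing.
- Difference GMM: take first differences to remove $\mu_i$, then instrument $\Delta y_{i,t-1}$ with the lagged levels $y_{i,t-2}, y_{i,t-3}, \dots$
- System GMM: combine the differenced and level equations, adding lagged differences (such as $\Delta y_{i,t-1}$) as instruments for the level equation. This requires the extra assumption that those differences are uncorrelated with $\mu_i$.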
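The Difference GMM instrument set for one unit can be written out explicitly. The sketch below builds the Arellano–Bond instrument matrix $Z_i$ (one row per differenced equation) from a list of levels. It supports both the standard block-diagonal form and the collapsed form, which has one column per lag. It assumes a single GMM-style variable and a balanced series, and the function name is hypothetical. Packages such as xtabond2, pgmm or pydynpd may report slightly different counts once the constant, time dummies or level-equation instruments are included.

```python
def diff_gmm_instruments(y, min_lag=2, max_lag=None, collapse=False):
    """Arellano-Bond instrument matrix Z_i for one unit and one GMM-style variable.

    y holds the levels y_i1..y_iT (list indices 0..T-1). The differenced equation
    for index k needs dy at k-1, so it exists for k = 2..T-1 (one row each).
    Instruments are the levels at k - l for min_lag <= l <= max_lag, when in sample.
    """
    T = len(y)
    top = T - 1 if max_lag is None else min(max_lag, T - 1)
    lags = list(range(min_lag, top + 1))
    eq_index = range(2, T)
    if collapse:
        # One column per lag, shared by all equations (missing lags filled with 0).
        return [[y[k - l] if k - l >= 0 else 0.0 for l in lags] for k in eq_index]
    # Standard form: each equation gets its own block of columns (block-diagonal).
    blocks = [[l for l in lags if k - l >= 0] for k in eq_index]
    n_cols = sum(len(b) for b in blocks)
    rows, offset = [], 0
    for k, block in zip(eq_index, blocks):
        row = [0.0] * n_cols
        for j, l in enumerate(block):
            row[offset + j] = y[k - l]
        rows.append(row)
        offset += len(block)
    return rows


y = [float(v) for v in range(1, 16)]            # 15 periods of a toy series
full = diff_gmm_instruments(y, min_lag=2, max_lag=4)
collapsed = diff_gmm_instruments(y, min_lag=2, max_lag=4, collapse=True)
print(len(full), "equations x", len(full[0]), "columns (uncollapsed)")
# 13 equations x 36 columns (uncollapsed)
print(len(collapsed), "equations x", len(collapsed[0]), "columns (collapsed)")
# 13 equations x 3 columns (collapsed)
```

For 15 periods and lags 2 to 4, the uncollapsed matrix has 36 columns, while the collapsed one has 3. With all available lags (no upper limit), the uncollapsed count rises to 91. System GMM adds further columns for the level equation.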
Required assumptions and tests
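- AR(1)/AR(2) (Arellano–Bond test): tests for first- and second-order autocorrelation of the differenced residuals. AR(1) is normally rejected, because $\Delta\varepsilon_{it}$ and $\Delta\varepsilon_{i,t-1}$ share the term $\varepsilon_{i,t-1}$. AR(2) must not be rejected (p > 0.05); otherwise the lag-2 instruments are invalid.
- Hansen/Sargan: tests the overidentifying restrictions, i.e. the joint validity of the instrument set. A low p-value (< 0.05) rejects validity. A suspiciously high p-value (≈ 1.00) signals instrument proliferation. Sargan is not robust to heteroskedasticity; Hansen is, but it is weakened by many instruments.
- Number of instruments ≤ number of groups (N), as a rule of thumb: keep the instrument count small (collapse the instrument matrix and/or limit the lag range) so the Hansen test is not weakened.
- Distinguish exogenous, predetermined, and endogenous variables when declaring them. In the differenced equation, endogenous variables are instrumented by their levels from lag 2 onward, predetermined ones from lag 1, and strictly exogenous ones by themselves.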
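The reason AR(1) in differences is expected while AR(2) is not can be checked numerically. If $\varepsilon_{it}$ is serially uncorrelated, then $\Delta\varepsilon_{it}$ and $\Delta\varepsilon_{i,t-1}$ share $\varepsilon_{i,t-1}$, so their correlation is -0.5. By contrast, $\Delta\varepsilon_{it}$ and $\Delta\varepsilon_{i,t-2}$ share no terms. The sketch below simulates one long series of i.i.d. errors as a stand-in for pooled panel residuals. It shows the population pattern the Arellano–Bond test looks for, not the test statistic itself.

```python
import random


def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var


rng = random.Random(42)
eps = [rng.gauss(0.0, 1.0) for _ in range(200_000)]         # i.i.d. level errors
d_eps = [eps[t] - eps[t - 1] for t in range(1, len(eps))]   # first differences

print(f"AR(1) of differenced errors: {autocorr(d_eps, 1):+.3f}")  # near -0.5
print(f"AR(2) of differenced errors: {autocorr(d_eps, 2):+.3f}")  # near 0
```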
Running in EcoLab
- Data Collection module: prepare the dynamic panel (entity + time columns), making sure each unit has enough periods to form the required lags.
- Modeling module → Panel Data group → GMM (Arellano–Bond / Blundell–Bond).
- Declare the dependent variable $y_{it}$ and its lag terms, classify the regressors (exogenous/predetermined/endogenous), and choose Difference or System GMM.
- Run and read the Diagnostics tab: AR(1)/AR(2), Hansen, instrument count. Get the code from the Replication Code tab.
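The variable classification determines which lags serve as instruments. A common convention, assumed here, is that endogenous regressors are instrumented in the differenced equation by their levels from lag 2 onward, predetermined ones from lag 1, and strictly exogenous ones by themselves. The hypothetical helper below is not part of EcoLab. It turns such a classification into xtabond2-style gmm()/iv() options. Listing the dependent variable itself as endogenous yields gmm(growth, lag(2 4)), which uses the same instruments as gmm(L.growth, lag(1 3)).

```python
def xtabond2_instrument_options(classes, max_lag=None, collapse=True):
    """Build xtabond2-style instrument options from a {variable: class} mapping.

    Hypothetical helper. Classes: "endogenous" (levels from lag 2),
    "predetermined" (levels from lag 1), "exogenous" (iv-style, own values).
    """
    first_lag = {"endogenous": 2, "predetermined": 1}
    upper = "." if max_lag is None else str(max_lag)   # "." = all available lags
    suffix = " collapse" if collapse else ""
    options, exogenous = [], []
    for var, cls in classes.items():
        if cls == "exogenous":
            exogenous.append(var)
        elif cls in first_lag:
            options.append(f"gmm({var}, lag({first_lag[cls]} {upper}){suffix})")
        else:
            raise ValueError(f"unknown class for {var!r}: {cls!r}")
    if exogenous:
        options.append(f"iv({' '.join(exogenous)})")
    return " ".join(options)


print(xtabond2_instrument_options(
    {"growth": "endogenous", "invest": "endogenous", "open": "exogenous"},
    max_lag=4,
))
# gmm(growth, lag(2 4) collapse) gmm(invest, lag(2 4) collapse) iv(open)
```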
Input / output example
Input (illustrative): a panel of 30 countries × 15 years; growth is the dependent variable; growth_lag, invest, open are regressors.
Output (format only; illustrative figures, not real results; *** p < 0.01, ** p < 0.05):
| Term | Estimate | p-value / interpretation |
|---|---|---|
| growth_lag | 0.34*** | 0.000 |
| invest | 0.21** | 0.018 |
| AR(2) p-value | 0.41 | not rejected: valid |
| Hansen p-value | 0.28 | not rejected: instruments valid |
| Instruments / groups | 18 / 30 | safe: fewer instruments than groups |
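The reading rules behind the table can be encoded as a small checklist: AR(2) not rejected, a Hansen p-value neither below 0.05 nor close to 1.00, and no more instruments than groups. This is a hypothetical helper, not an EcoLab function, and the 0.99 cut-off for "close to 1.00" is an assumption.

```python
def gmm_diagnostic_warnings(ar2_p, hansen_p, n_instruments, n_groups,
                            alpha=0.05, hansen_upper=0.99):
    """Return warnings for the usual dynamic-GMM rules of thumb.

    hansen_upper is an assumed cut-off for a Hansen p-value 'close to 1.00'.
    """
    issues = []
    if ar2_p <= alpha:
        issues.append("AR(2) rejected: lagged-level instruments may be invalid")
    if hansen_p < alpha:
        issues.append("Hansen test rejected: instrument set looks invalid")
    elif hansen_p >= hansen_upper:
        issues.append("Hansen p-value near 1.00: suspect instrument proliferation")
    if n_instruments > n_groups:
        issues.append("more instruments than groups: collapse or limit lags")
    return issues


# Figures from the illustrative table above: no warnings expected
print(gmm_diagnostic_warnings(ar2_p=0.41, hansen_p=0.28, n_instruments=18, n_groups=30))
# []
```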
Replication code
Code is provided for Stata, R, and Python.
Stata
```stata
* ===== System GMM: Arellano-Bover / Blundell-Bond =====
* xtabond2 is user-written: ssc install xtabond2
* Panel setting
xtset country_id year
* Year dummies yr1, yr2, ... for common time effects
tabulate year, generate(yr)
* System GMM: two-step with Windmeijer-corrected robust SE
* gmm(): GMM-style instruments; lags 2-4 of growth instrument L.growth
*        (invest is treated as endogenous here, an assumption)
* collapse: one instrument column per lag, keeping the count low
* iv(): strictly exogenous variables, used as their own instruments
xtabond2 growth L.growth invest open yr*, ///
    gmm(growth invest, lag(2 4) collapse) iv(open yr*) ///
    twostep robust
* --- Diagnostics ---
* AR(2) p-value > 0.05: no second-order autocorrelation (valid)
* Hansen p-value: neither too low (< 0.05) nor too high (about 1.00)
* Number of instruments <= number of groups
* Difference GMM (Arellano-Bond) for comparison
xtabond2 growth L.growth invest open yr*, ///
    gmm(growth invest, lag(2 4) collapse) iv(open yr*) ///
    twostep robust noleveleq
```
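R
```r
# ===== System GMM: Arellano-Bover / Blundell-Bond =====
library(plm)
# Prepare panel data frame (df must contain country_id, year, growth, invest, open)
pdata <- pdata.frame(df, index = c("country_id", "year"))
# System GMM: two-step estimator
# GMM instruments: lags 2-4 of growth and of invest (invest treated as endogenous,
# an assumption); open is not in the GMM part, so pgmm uses it as a normal instrument
gmm_sys <- pgmm(
  growth ~ lag(growth, 1) + invest + open | lag(growth, 2:4) + lag(invest, 2:4),
  data = pdata,
  effect = "twoways",       # unit and time effects
  model = "twosteps",
  transformation = "ld",    # "ld" = System GMM; the default "d" is Difference GMM
  collapse = TRUE           # collapse the instrument matrix
)
# Robust summary (Windmeijer-corrected SE for the two-step estimator)
summary(gmm_sys, robust = TRUE)
# --- Diagnostics ---
# summary() also prints the Sargan (Hansen-type) test and the AR(1)/AR(2) tests
# Ensure: AR(2) p > 0.05, Sargan/Hansen p neither too low nor about 1.00
```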
Python
```python
# ===== System GMM with pydynpd =====
# Install: pip install pydynpd
import pandas as pd
from pydynpd import regression

# df must be a pandas DataFrame with columns: country_id, year, growth, invest, open
# The command string has three parts separated by "|":
#   1) dependent variable and regressors
#   2) instruments: gmm(var, minlag:maxlag) GMM-style, iv(var) standard
#   3) options (two-step GMM is the default)
# invest is treated as endogenous here (an assumption); open as exogenous
command_sys = (
    "growth L1.growth invest open | "
    "gmm(growth, 2:4) gmm(invest, 2:4) iv(open) | "
    "timedumm collapse"
)
result_sys = regression.abond(command_sys, df, ["country_id", "year"])

# Difference GMM for comparison: add the 'nolevel' option
command_diff = (
    "growth L1.growth invest open | "
    "gmm(growth, 2:4) gmm(invest, 2:4) iv(open) | "
    "timedumm collapse nolevel"
)
result_diff = regression.abond(command_diff, df, ["country_id", "year"])

# abond() prints coefficients, AR(1)/AR(2) and Hansen tests
# Verify: AR(2) p > 0.05, Hansen p in a reasonable range
```
Limitations and notes
- Instrument proliferation weakens the Hansen test (its p-value drifts toward 1.00); always report the instrument count and use collapse or limited lags.
- Difference GMM is weak with highly persistent series → consider System GMM.
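- Requires a large enough N. With small N, two-step standard errors are biased downward, so use the Windmeijer correction.
- Not appropriate when T is large: the instrument count grows quadratically with T, while the Nickell bias of FE shrinks (consider other dynamic-panel estimators).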
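The trade-off behind the last point can be tabulated. Consider a pure AR(1) panel $y_{it} = \rho y_{i,t-1} + \mu_i + \varepsilon_{it}$ with large N. Nickell's formula gives the asymptotic bias of the within estimator when T periods are used in estimation. Uncollapsed Difference GMM with all available lags of one variable uses T(T-1)/2 instruments when levels are observed for periods 0 to T. The sketch assumes $\rho = 0.5$ and no other regressors, and the function names are hypothetical.

```python
def nickell_bias(rho, T):
    """Large-N bias of the within estimator in y_it = rho*y_{i,t-1} + mu_i + eps_it."""
    a = (1.0 - rho ** T) / (T * (1.0 - rho))
    num = -(1.0 + rho) / (T - 1) * (1.0 - a)
    den = 1.0 - 2.0 * rho / ((1.0 - rho) * (T - 1)) * (1.0 - a)
    return num / den


def full_lag_instrument_count(T):
    """Uncollapsed Difference GMM, one variable, all lags >= 2; levels y_0..y_T."""
    return T * (T - 1) // 2


for T in (5, 10, 20, 40):
    print(f"T = {T:2d}   FE bias = {nickell_bias(0.5, T):+.3f}   "
          f"instruments = {full_lag_instrument_count(T)}")
```

The FE bias shrinks roughly like $-(1+\rho)/(T-1)$, while the instrument count grows with $T^2$. This is why dynamic GMM suits large-N, small-T panels.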