OLS — Ordinary Least Squares regression
OLS (Ordinary Least Squares) is the foundational linear regression model; it estimates coefficients by minimizing the sum of squared residuals. It is the starting point of most empirical analysis and the baseline against which more complex estimators are compared.
OLS suits cross-sectional data with a continuous dependent variable and a relationship that is linear in parameters. If its assumptions are violated (heteroskedasticity, endogeneity, panel structure, and so on), switch to an appropriate estimator.
Model specification
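The model is $y = X\beta + \varepsilon$. The OLS estimator (matrix form): $\hat{\beta} = (X'X)^{-1}X'y$, the solution to $\min_{\beta}\,(y - X\beta)'(y - X\beta)$.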
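As a minimal sketch of the matrix form, the snippet below solves the normal equations $(X'X)\hat{\beta} = X'y$ with NumPy and cross-checks the result against `np.linalg.lstsq`. The data are synthetic and the names `x1`, `x2` and the true coefficients are hypothetical, chosen only for illustration; this is not EcoLab's internal implementation.

```python
import numpy as np

# Synthetic data (hypothetical): y = 1.0 + 2.0*x1 - 0.5*x2 + noise
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
eps = rng.normal(scale=0.1, size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + eps

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# Normal equations (X'X) beta = X'y, solved without forming an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimum, residuals are orthogonal to every column of X
residuals = y - X @ beta_hat
print("beta_hat:", beta_hat)
print("X'e     :", X.T @ residuals)
```

Solving the linear system rather than inverting $X'X$ is generally the more numerically stable choice; for badly conditioned designs a QR- or SVD-based routine such as `lstsq` is preferable.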
Gauss-Markov assumptions
- Linear in parameters and correctly specified.
- Zero conditional mean: $E[\varepsilon \mid X] = 0$ (exogeneity).
- Homoskedasticity: $\mathrm{Var}(\varepsilon_i \mid X) = \sigma^2$.
- No autocorrelation among errors.
- No perfect multicollinearity among regressors.
When all five assumptions hold, OLS is BLUE (Best Linear Unbiased Estimator).
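A short derivation shows where each assumption enters. Writing the model as $y = X\beta + \varepsilon$ and the estimator as $\hat{\beta} = (X'X)^{-1}X'y$ (standard notation, assumed here), no perfect multicollinearity guarantees that $X'X$ is invertible, and substituting the model gives

$$
\hat{\beta} = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon .
$$

Taking expectations conditional on $X$ and applying the zero conditional mean assumption yields unbiasedness:

$$
E[\hat{\beta} \mid X] = \beta + (X'X)^{-1}X'\,E[\varepsilon \mid X] = \beta .
$$

Homoskedasticity and no autocorrelation together imply $\mathrm{Var}(\varepsilon \mid X) = \sigma^2 I$, which gives the classical covariance matrix:

$$
\mathrm{Var}(\hat{\beta} \mid X) = (X'X)^{-1}X'\,(\sigma^2 I)\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1} .
$$

If either of the last two assumptions fails, $\hat{\beta}$ remains unbiased but this variance formula no longer holds, which is why the standard errors need adjusting.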
Diagnostics & remedies
| Issue | Test | Remedy |
|---|---|---|
| Heteroskedasticity | Breusch-Pagan, White | Robust SE (HC0–HC3) |
| Autocorrelation | Durbin-Watson, Breusch-Godfrey | Newey-West / GLS |
| Multicollinearity | VIF | Drop variable / Ridge, Lasso |
| Endogeneity | Hausman | IV/2SLS |
| Non-normal residuals | Jarque-Bera | Transform / large sample |
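When heteroskedasticity is suspected, choose White robust standard errors (HC0–HC3); when errors may be correlated within groups, choose clustered SE. Either choice leaves the coefficients unchanged and replaces only the standard errors, giving more reliable t-stats and p-values. This is how EcoLab forms multiple estimators from the same model.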
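To make the robust correction concrete, here is a minimal NumPy sketch that computes the classical, HC0, and HC1 covariance matrices for the same fitted coefficients. The data-generating process is hypothetical (error spread grows with $|x|$), and the code illustrates the textbook sandwich formulas rather than EcoLab's own routines.

```python
import numpy as np

# Synthetic heteroskedastic data (hypothetical): error spread grows with |x|
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
eps = rng.normal(size=n) * (0.5 + np.abs(x))
y = 1.0 + 2.0 * x + eps

X = np.column_stack([np.ones(n), x])
k = X.shape[1]

# Point estimates are the same whichever covariance estimator is used
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Classical covariance: s^2 (X'X)^{-1}, valid only under homoskedasticity
s2 = resid @ resid / (n - k)
cov_classical = s2 * XtX_inv

# HC0 (White): (X'X)^{-1} [X' diag(e_i^2) X] (X'X)^{-1}
meat = (X * resid[:, None] ** 2).T @ X
cov_hc0 = XtX_inv @ meat @ XtX_inv

# HC1: HC0 scaled by n / (n - k), a degrees-of-freedom correction
cov_hc1 = cov_hc0 * n / (n - k)

se_classical = np.sqrt(np.diag(cov_classical))
se_hc1 = np.sqrt(np.diag(cov_hc1))
print("classical SE:", se_classical)
print("HC1 SE:      ", se_hc1)
```

In this design the HC1 standard error on the slope comes out larger than the classical one, because the noisiest observations are also the ones with the most leverage. The classical formula would therefore overstate the precision of the slope.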
Running in EcoLab
- Modeling module → Classical linear regression family → OLS.
- Select the dependent variable $y$ and the independent variables $X$.
- Choose the standard-error structure (Homoskedastic / Robust / Clustered).
- Run and read the Estimation, Diagnostics and Replication Code tabs.
Input / output example
Input (illustrative): wage regressed on educ (years of schooling) and exper (years of experience).
Output (format only; illustrative figures, not real results):
| Variable | Coefficient | SE (robust) | p-value |
|---|---|---|---|
| educ | 0.078 | 0.012 | 0.000 |
| exper | 0.021 | 0.006 | 0.001 |
| $R^2$ | 0.34 | | |
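The columns of such a table are linked by simple arithmetic: the t-statistic is the coefficient divided by its standard error, and an approximate 95% confidence interval is the coefficient plus or minus 1.96 standard errors. The sketch below applies this to the illustrative figures in the table above. The 1.96 critical value assumes a large sample (normal approximation), which is an assumption made here and not something the table states.

```python
# Illustrative figures copied from the table above (not real results):
# variable -> (coefficient, robust standard error)
rows = {
    "educ": (0.078, 0.012),
    "exper": (0.021, 0.006),
}

Z_95 = 1.96  # large-sample normal critical value for a 95% interval (assumed)

results = {}
for name, (coef, se) in rows.items():
    t_stat = coef / se            # test statistic for H0: coefficient = 0
    ci_low = coef - Z_95 * se     # lower bound of the approximate 95% CI
    ci_high = coef + Z_95 * se    # upper bound of the approximate 95% CI
    results[name] = (t_stat, ci_low, ci_high)
    print(f"{name}: t = {t_stat:.2f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

With these figures the t-statistics are 6.50 for educ and 3.50 for exper, and neither interval contains zero, in line with the small p-values shown.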
Replication code
- Stata
- R
- Python
```stata
* ---- OLS with robust standard errors ----
* Load data (illustrative)
use "wage_data.dta", clear
* Generate squared experience
gen exper2 = exper^2
* Fit with classical SEs first: estat hettest is not allowed after vce(robust)
regress lnwage educ exper exper2
* Diagnostics: Breusch-Pagan heteroskedasticity test
estat hettest
* Variance Inflation Factor (multicollinearity check)
estat vif
* Re-estimate with White robust standard errors
regress lnwage educ exper exper2, vce(robust)
```
```r
# ---- OLS with robust standard errors ----
library(lmtest)
library(sandwich)
library(car)
# Load data (illustrative)
df <- read.csv("wage_data.csv")
df$exper2 <- df$exper^2
# Fit OLS model
model <- lm(lnwage ~ educ + exper + exper2, data = df)
summary(model)
# Robust standard errors (HC1, equivalent to Stata's robust)
coeftest(model, vcov = vcovHC(model, type = "HC1"))
# Variance Inflation Factor
vif(model)
```
```python
# ---- OLS with robust standard errors ----
import pandas as pd
import statsmodels.api as sm

# Load data (illustrative)
df = pd.read_csv("wage_data.csv")
df["exper2"] = df["exper"] ** 2

# Define variables
X = sm.add_constant(df[["educ", "exper", "exper2"]])
y = df["lnwage"]

# Fit OLS with HC1 robust standard errors
model = sm.OLS(y, X).fit(cov_type="HC1")
print(model.summary())
```
Limitations
- Sensitive to outliers and functional-form misspecification.
- Not suitable when $y$ is discrete or censored (use Logit/Probit/Tobit) or for panel data (use FE/RE).