OLS — Ordinary Least Squares regression

OLS (Ordinary Least Squares) is the foundational linear regression model; it estimates coefficients by minimizing the sum of squared residuals. It is the starting point of most empirical analysis and the baseline against which more complex estimators are compared.

When to use

OLS suits cross-sectional data with a continuous dependent variable and a relationship that is linear in parameters. If its assumptions are violated (heteroskedasticity, endogeneity, panel structure…), switch to an appropriate estimator.


Model specification

$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \varepsilon_i$$

The OLS estimator in matrix form is $\hat{\beta} = (X'X)^{-1} X'Y$, the solution to $\min_{\beta} \sum_{i=1}^{n} \varepsilon_i^2$.
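As a minimal sketch of the matrix formula, the Stata/Mata snippet below computes $(X'X)^{-1}X'Y$ by hand and compares it with the built-in regress command. The data are simulated; the variable names (x1, x2, y), the seed and the true coefficients are assumptions made purely for illustration.

```stata
* Simulated data; names and true coefficients are illustrative only
clear
set seed 12345
set obs 200
gen x1 = rnormal()
gen x2 = rnormal()
gen y  = 1 + 2*x1 - 0.5*x2 + rnormal()

* Reference fit with the built-in command
regress y x1 x2
matrix b_reg = e(b)          // 1 x 3 row vector ordered x1, x2, _cons

* Same estimator by hand: beta_hat = (X'X)^{-1} X'Y
mata:
    y = st_data(., "y")
    X = st_data(., ("x1", "x2")), J(rows(y), 1, 1)   // constant as last column
    b = invsym(cross(X, X)) * cross(X, y)            // cross(X, X) is X'X
    st_matrix("b_mata", b')
end

matrix list b_reg
matrix list b_mata
display "largest relative difference: " mreldif(b_reg, b_mata)
```

The two coefficient vectors should agree up to floating-point error. In practice regress is preferable to explicit inversion because it also handles collinear regressors and reports standard errors.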


Gauss-Markov assumptions

  1. Linear in parameters and correctly specified.
  2. Zero conditional mean: $E[\varepsilon_i \mid X] = 0$ (exogeneity).
  3. Homoskedasticity: $\mathrm{Var}(\varepsilon_i) = \sigma^2$.
  4. No autocorrelation among errors.
  5. No perfect multicollinearity among regressors.

When 1–5 hold, OLS is BLUE (Best Linear Unbiased Estimator).
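The role of each assumption can be seen in a short derivation. This is a sketch in the matrix notation used above, reading assumptions 3 and 4 as conditional on $X$, so that together they give $\mathrm{Var}(\varepsilon \mid X) = \sigma^2 I$ (that conditional reading is an assumption of this sketch).

```latex
\begin{aligned}
\hat{\beta} &= (X'X)^{-1}X'Y
             = (X'X)^{-1}X'(X\beta + \varepsilon)
             = \beta + (X'X)^{-1}X'\varepsilon
  && \text{(assumptions 1 and 5)} \\
E[\hat{\beta} \mid X] &= \beta + (X'X)^{-1}X'\,E[\varepsilon \mid X] = \beta
  && \text{(assumption 2)} \\
\mathrm{Var}(\hat{\beta} \mid X) &= (X'X)^{-1}X'\,\mathrm{Var}(\varepsilon \mid X)\,X(X'X)^{-1}
             = \sigma^2 (X'X)^{-1}
  && \text{(assumptions 3 and 4)}
\end{aligned}
```

Assumption 5 guarantees that $X'X$ is invertible, assumption 2 delivers unbiasedness, and assumptions 3 and 4 give the variance formula. The Gauss-Markov theorem states that no other linear unbiased estimator has a smaller variance than this.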


Diagnostics & remedies

| Issue | Test | Remedy |
|---|---|---|
| Heteroskedasticity | Breusch-Pagan, White | Robust SE (HC0–HC3) |
| Autocorrelation | Durbin-Watson, Breusch-Godfrey | Newey-West / GLS |
| Multicollinearity | VIF | Drop variable / Ridge, Lasso |
| Endogeneity | Hausman | IV/2SLS |
| Non-normal residuals | Jarque-Bera | Transform / large sample |
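Most of these diagnostics are available as Stata postestimation commands. The sketch below runs them on simulated data; the variable names, the seed and the heteroskedastic error design are assumptions for illustration only. Two substitutions are worth flagging: official Stata does not appear to ship a Jarque-Bera command, so sktest (a related skewness-kurtosis normality test) stands in for it, and the endogeneity test is omitted because it needs an instrument.

```stata
* Simulated time-indexed data (illustrative only)
clear
set seed 2024
set obs 300
gen t = _n
tsset t                                   // required by the autocorrelation tests
gen x1 = rnormal()
gen x2 = 0.5*x1 + rnormal()
gen y  = 1 + 0.8*x1 - 0.3*x2 + rnormal()*exp(0.5*x1)   // error variance depends on x1

* Fit with conventional standard errors before running the tests
regress y x1 x2

estat hettest                 // Breusch-Pagan test
scalar bp_p = r(p)            // keep the p-value for later use
estat imtest, white           // White's general test
estat vif                     // variance inflation factors
estat dwatson                 // Durbin-Watson statistic
estat bgodfrey, lags(1)       // Breusch-Godfrey test, one lag

* Residual normality: skewness-kurtosis test used in place of Jarque-Bera
predict double ehat, residuals
sktest ehat

display "Breusch-Pagan p-value: " bp_p
```

A small p-value from estat hettest or estat imtest points toward the robust standard errors listed in the Remedy column.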

Robust standard errors

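When heteroskedasticity is suspected, choose White robust (HC0–HC3) or clustered standard errors for more reliable t-statistics and p-values. The coefficient estimates do not change; only the standard errors, and therefore the inference, differ. This is how EcoLab forms multiple estimators from the same model.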
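A minimal sketch of that point in Stata, on simulated data: the same regression is fitted under four standard-error structures. The variable names, the cluster identifier firm and the error design are hypothetical. Note that Stata's vce(robust) applies the HC1 small-sample correction, while vce(hc2) and vce(hc3) request the other variants.

```stata
* Simulated data with heteroskedastic errors (illustrative only)
clear
set seed 42
set obs 400
gen firm = ceil(_n/20)                      // 20 hypothetical clusters of 20 observations
gen x = rnormal()
gen y = 1 + 0.5*x + rnormal()*(1 + abs(x))  // error spread grows with |x|

regress y x                                 // conventional (homoskedastic) SE
scalar b_ols  = _b[x]
scalar se_ols = _se[x]

regress y x, vce(robust)                    // White robust SE (HC1)
scalar b_rob  = _b[x]
scalar se_rob = _se[x]

regress y x, vce(hc3)                       // HC3 variant
scalar se_hc3 = _se[x]

regress y x, vce(cluster firm)              // clustered SE
scalar se_cl = _se[x]

display "coefficient on x: " b_ols " (conventional), " b_rob " (robust)"
display "SE conventional: " se_ols
display "SE robust (HC1): " se_rob
display "SE HC3:          " se_hc3
display "SE clustered:    " se_cl
```

The coefficient on x is identical across all four fits; only the reported standard errors change.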


Running in EcoLab

  1. Modeling module → Classical linear regression family → OLS.
  2. Select the dependent variable $Y$ and the independent variables $X_1, \dots, X_k$.
  3. Choose the standard-error structure (Homoskedastic / Robust / Clustered).
  4. Run and read the Estimation, Diagnostics and Replication Code tabs.

Input / output example

Input (illustrative): wage regressed on educ (years of schooling) and exper (experience).

Output (format, illustrative figures — not real results):

| Variable | Coefficient | SE (robust) | p-value |
|---|---|---|---|
| educ | 0.078 | 0.012 | 0.000 |
| exper | 0.021 | 0.006 | 0.001 |
| $R^2$ | 0.34 | | |

Replication code

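* ---- OLS with robust standard errors ----
* Load data (illustrative file name)
use "wage_data.dta", clear

* Generate squared experience
gen exper2 = exper^2

* Baseline OLS with conventional standard errors
* (lnwage, the log of wage, is assumed to exist in the data;
*  estat hettest is not available after vce(robust), so run it here)
regress lnwage educ exper exper2

* Diagnostics: Breusch-Pagan heteroskedasticity test
estat hettest

* Variance inflation factors (multicollinearity check)
estat vif

* Re-estimate with White robust standard errors for inference
regress lnwage educ exper exper2, vce(robust)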

Limitations

  • Sensitive to outliers and functional-form misspecification.
  • Not suitable when $Y$ is discrete/censored (use Logit/Probit/Tobit) or for panel data (use FE/RE).
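The discrete-outcome limitation can be illustrated with a short Stata sketch on simulated data. The variable names and the data-generating process are assumptions. It fits OLS to a binary outcome (a linear probability model) and a logit to the same data, then counts how many OLS fitted values fall outside the unit interval.

```stata
* Simulated binary outcome (illustrative only)
clear
set seed 7
set obs 500
gen x = rnormal()
gen d = runiform() < invlogit(-0.5 + 2*x)   // d equals 1 with logistic probability

* OLS on a binary outcome: fitted values are not constrained to [0, 1]
regress d x, vce(robust)
predict double p_lpm, xb

* Logit: predicted probabilities always lie strictly between 0 and 1
logit d x
predict double p_logit, pr

count if p_lpm < 0 | p_lpm > 1              // OLS fitted values outside [0, 1]
summarize p_lpm p_logit
```

Any OLS fitted values below 0 or above 1 cannot be read as probabilities, which is one reason a logit or probit is usually preferred for binary outcomes.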

Video tutorial

Video Tutorial: Running OLS in EcoLab

See also