OLS — Ordinary Least Squares regression

OLS (Ordinary Least Squares) is the foundational linear regression model; it estimates coefficients by minimizing the sum of squared residuals. It is the starting point of most empirical analysis and the baseline against which more complex estimators are compared.

When to use

OLS suits cross-sectional data with a continuous dependent variable and a relationship that is linear in parameters. If an assumption is violated (heteroskedasticity, endogeneity, panel structure, and so on), switch to an estimator designed for that case.

Model specification

$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \varepsilon_i$$

The OLS estimator (matrix form): $\hat{\beta} = (X'X)^{-1} X'Y$, the solution to $\min_{\beta} (Y - X\beta)'(Y - X\beta)$, that is, the $\beta$ that minimizes the sum of squared residuals.
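As a minimal sketch of the matrix formula, the snippet below computes $\hat{\beta}$ from the normal equations with NumPy and cross-checks it against a least-squares solver. The data are simulated and the names (`beta_true`, `beta_hat`, `beta_lstsq`) are illustrative assumptions, not EcoLab output.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated design matrix with a constant column (hypothetical data)
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.5, -0.2])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations: solve (X'X) beta = X'Y instead of forming the inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Least-squares solver as a cross-check (usually the more stable route)
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# First-order condition: residuals are orthogonal to every column of X
residuals = Y - X @ beta_hat
print(beta_hat)
print(X.T @ residuals)
```

Solving the linear system rather than inverting $X'X$ gives the same estimate with better numerical behavior when regressors are nearly collinear.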

Gauss-Markov assumptions

1. Linear in parameters and correctly specified.
2. Zero conditional mean: $E[\varepsilon_i \mid X] = 0$ (exogeneity).
3. Homoskedasticity: $\mathrm{Var}(\varepsilon_i \mid X) = \sigma^2$.
4. No autocorrelation among errors: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j \mid X) = 0$ for $i \neq j$.
5. No perfect multicollinearity among regressors.

When 1–5 hold, OLS is BLUE (Best Linear Unbiased Estimator).
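The unbiasedness part of this statement can be illustrated with a small Monte Carlo sketch. It assumes simulated data that satisfy the assumptions by construction (fixed regressors, independent homoskedastic errors); the values in `beta_true` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 100, 2000
beta_true = np.array([1.0, 2.0])  # hypothetical intercept and slope

# Regressors held fixed across replications (results are conditional on X)
X = np.column_stack([np.ones(n), rng.normal(size=n)])

estimates = np.empty((reps, 2))
for r in range(reps):
    eps = rng.normal(scale=1.0, size=n)  # mean zero, homoskedastic, independent
    Y = X @ beta_true + eps
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ Y)

mean_estimate = estimates.mean(axis=0)
empirical_sd = estimates.std(axis=0)
# Sampling standard deviation implied by sigma^2 (X'X)^{-1} with sigma = 1
theoretical_sd = np.sqrt(np.diag(np.linalg.inv(X.T @ X)))

print(mean_estimate, empirical_sd, theoretical_sd)
```

Across replications the average estimate should sit close to `beta_true`, and the spread of the estimates should match the $\sigma^2 (X'X)^{-1}$ formula.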

Diagnostics & remedies

Issue	Test	Remedy
Heteroskedasticity	Breusch-Pagan, White	Robust SE (HC0–HC3)
Autocorrelation	Durbin-Watson, Breusch-Godfrey	Newey-West / GLS
Multicollinearity	VIF	Drop variable / Ridge, Lasso
Endogeneity	Hausman	IV/2SLS
Non-normal residuals	Jarque-Bera	Transform / large sample

Robust standard errors

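When heteroskedasticity is suspected, choose White robust standard errors (HC0–HC3); when errors may be correlated within groups, choose clustered standard errors. Either choice leaves the coefficient estimates unchanged and replaces only the variance estimator, which gives more reliable t-statistics and p-values. This is how EcoLab forms multiple estimators from the same model.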

Running in EcoLab

Modeling module → Classical linear regression family → OLS.
Select the dependent variable $Y$ and the independent variables $X_1, \dots, X_k$ .
Choose the standard-error structure (Homoskedastic / Robust / Clustered).
Run and read the Estimation, Diagnostics and Replication Code tabs.

Input / output example

Input (illustrative): regression of wage on educ (years of schooling) and exper (experience).

Output (format, illustrative figures — not real results):

Variable	Coefficient	SE (robust)	p-value
educ	0.078	0.012	0.000
exper	0.021	0.006	0.001
$R^2$	0.34

Replication code

The same model in Stata, R and Python, in that order:

* ---- Stata: OLS with robust standard errors ----
* Load data (illustrative)
use "wage_data.dta", clear

* Generate squared experience
gen exper2 = exper^2

* Fit with conventional SEs first: estat hettest is not available after vce(robust)
regress lnwage educ exper exper2

* Diagnostics: Breusch-Pagan heteroskedasticity test
estat hettest

* Variance Inflation Factor (multicollinearity check)
estat vif

* OLS with robust standard errors (Stata's vce(robust) corresponds to HC1)
regress lnwage educ exper exper2, vce(robust)

# ---- R: OLS with robust standard errors ----
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()
library(car)       # vif()

# Load data (illustrative)
df <- read.csv("wage_data.csv")
df$exper2 <- df$exper^2

# Fit OLS model
model <- lm(lnwage ~ educ + exper + exper2, data = df)
summary(model)

# Robust standard errors (HC1, equivalent to Stata's robust)
coeftest(model, vcov = vcovHC(model, type = "HC1"))

# Variance Inflation Factor
vif(model)

# ---- Python: OLS with robust standard errors ----
import pandas as pd
import statsmodels.api as sm

# Load data (illustrative)
df = pd.read_csv("wage_data.csv")
df["exper2"] = df["exper"] ** 2

# Define variables
X = sm.add_constant(df[["educ", "exper", "exper2"]])
y = df["lnwage"]

# Fit OLS with HC1 robust standard errors
model = sm.OLS(y, X).fit(cov_type="HC1")
print(model.summary())

Limitations

Sensitive to outliers and functional-form misspecification.
Not suitable when $Y$ is discrete/censored (use Logit/Probit/Tobit) or for panel data (use FE/RE).
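The sensitivity to outliers can be seen in a short sketch: one contaminated observation at a high-leverage point is enough to move the fitted slope noticeably. The data are simulated and `ols_slope` is a hypothetical helper written for this illustration.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 50)
y = 1.0 + 0.5 * x + rng.normal(scale=0.2, size=x.size)  # true slope 0.5

slope_clean = ols_slope(x, y)

# Contaminate a single observation at the largest x (a high-leverage point)
y_out = y.copy()
y_out[-1] += 30.0
slope_outlier = ols_slope(x, y_out)

print(f"slope without outlier: {slope_clean:.3f}")
print(f"slope with one outlier: {slope_outlier:.3f}")
```

Because squared residuals penalize large deviations heavily, the fit bends toward the contaminated point. Checking influence measures or comparing against a robust regression is a common follow-up.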

Video tutorial

Video Tutorial: Running OLS in EcoLab
