# The OLS model

Multilevel analysis can be seen as an extension of ordinary least squares (OLS) regression that allows for more complex error structures. It is therefore essential to refresh basic regression skills. We will do this by estimating and interpreting variations on a simple model for hourly wages among employees. The example has the advantage of a continuous dependent variable and explanatory variables that are familiar to everyone: years of education, age, and gender.

The analyses will be variations of the following multiple regression model with hourly wage (Y) as the dependent variable and years of education (X_{1}), age (X_{2}), and female gender (X_{3}) as independent variables. The regression constant (intercept) and the regression coefficients are denoted by Greek letters (beta) and the residual (error) term by "e". The index "i" applies to all employees (units) in the sample.

Y_{i} = β_{0} + β_{1}X_{1i} + β_{2}X_{2i} + β_{3}X_{3i} + e_{i}
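To make the notation concrete, the model can be estimated on simulated data. This is a minimal sketch: the sample size, the "true" coefficient values, and the residual standard deviation below are made up for illustration, not taken from the wage data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Hypothetical explanatory variables: years of education, age, 0/1 female indicator
educ = rng.integers(9, 21, size=n).astype(float)
age = rng.integers(18, 66, size=n).astype(float)
female = rng.integers(0, 2, size=n).astype(float)

# Assumed population coefficients beta_0..beta_3, chosen only for illustration
beta = np.array([5.0, 1.2, 0.15, -1.8])
e = rng.normal(0.0, 2.0, size=n)  # residual term with mean zero

# Design matrix with a leading column of ones for the constant beta_0
X = np.column_stack([np.ones(n), educ, age, female])
y = X @ beta + e

# OLS estimates of the betas, obtained by least squares
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)
```

With a sample of this size, the estimates land close to the assumed population coefficients.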

The regression model for the population rests on two sets of assumptions, about the specification of the model and about the residuals. Let us quickly review these assumptions:

A model should be correctly specified:

1. All relevant x-variables are included, and irrelevant ones excluded.
2. The relationships between the x-variables and Y are linear.
3. The model is additive, without statistical interaction between the x-variables.

Assumption 2 can be relaxed by building non-linearity into the model, for instance by adding polynomials. Assumption 3 can be relaxed by adding interaction terms to the model.
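Both relaxations amount to adding constructed columns to the design matrix. A sketch on made-up data: a squared age term builds in non-linearity, and an education-by-gender product term builds in an interaction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

educ = rng.integers(9, 21, size=n).astype(float)
age = rng.integers(18, 66, size=n).astype(float)
female = rng.integers(0, 2, size=n).astype(float)

# Made-up data-generating process with a curved age effect and an
# education-by-gender interaction
y = (5.0 + 1.2 * educ + 0.6 * age - 0.006 * age**2
     - 1.8 * female + 0.3 * educ * female + rng.normal(0.0, 2.0, size=n))

# Relaxing assumption 2: add a squared age term (polynomial)
# Relaxing assumption 3: add an education-by-gender product term (interaction)
X = np.column_stack([np.ones(n), educ, age, age**2, female, educ * female])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)
```

The model is still linear in its coefficients, which is why OLS can estimate it even though the relationship between age and Y is curved.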

The assumptions about the residuals:

- The residuals should have a mean (expected value) of zero in the population.
- The residuals should have equal variance across all values of the x-variables (homoscedasticity).
- The residuals are uncorrelated with each other and with the x-variables.
- The residuals should be normally distributed.
- Finally, an assumption about the x-variables themselves: they should not be perfectly correlated, pairwise or in linear combination (no perfect multicollinearity).
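Note that these assumptions concern the unobserved population errors: the in-sample OLS residuals have mean zero and zero correlation with the x-variables by construction, so those two properties cannot be tested on the residuals themselves. A quick numerical check on simulated data illustrates this.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Simulated data that satisfy the assumptions by construction
x = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0.0, 1.0, size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# In-sample OLS residuals always have mean (practically) zero and are
# orthogonal to every column of X, whatever the population looks like
print(resid.mean())
print(X.T @ resid)
```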

The regression equation for the sample is normally expressed in Roman letters:

Y_{i} = b_{0} + b_{1}X_{1i} + b_{2}X_{2i} + b_{3}X_{3i} + e_{i}
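Once the sample coefficients b_{0}..b_{3} are estimated, the equation produces predicted wages for given values of the x-variables. A sketch with made-up estimates (not results from the actual wage data):

```python
import numpy as np

# Hypothetical estimated sample coefficients b_0..b_3, for illustration only
b = np.array([4.5, 1.1, 0.12, -1.6])

# Predicted hourly wage for a female employee (X_3 = 1) with 16 years of
# education and age 40: Y-hat = b_0 + b_1*16 + b_2*40 + b_3*1
x_new = np.array([1.0, 16.0, 40.0, 1.0])
y_hat = b @ x_new
print(y_hat)  # 4.5 + 17.6 + 4.8 - 1.6 = 25.3
```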