The variance component model

The null model

Let us start with the simplest possible regression model, one without explanatory variables. The regression constant estimates the grand mean of the dependent variable, and the residuals are the individual deviations from that mean.

$$Y_i = \beta_0 + e_i$$

Because the constant is the only parameter, the variance of the residuals equals the sample variance of Y. The situation is illustrated in Figure 3.1.

Figure 3.1. Variance of the residuals = the sample variance of Y
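The equality between the residual variance and the sample variance can be checked numerically. The sketch below is only an illustration: the data are simulated, and the mean of 50 and standard deviation of 10 are arbitrary assumptions, not values from the text.

```python
import random
import statistics

random.seed(1)

# Hypothetical data: 200 simulated values standing in for the dependent variable Y.
y = [random.gauss(50, 10) for _ in range(200)]

# Without explanatory variables, the least-squares estimate of the
# regression constant is simply the mean of Y.
b0 = statistics.fmean(y)

# The residuals are the individual deviations from that mean.
residuals = [yi - b0 for yi in y]

var_y = statistics.variance(y)              # sample variance of Y
var_resid = statistics.variance(residuals)  # sample variance of the residuals

print(f"Var(Y) = {var_y:.4f}, Var(e) = {var_resid:.4f}")
```

The two printed values agree up to floating-point rounding, since subtracting a constant does not change the spread of a variable.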

Let us rewrite the regression equation to also include a contextual level (schools, firms, countries).

$$Y_{ij} = \beta_0 + u_{0j} + e_{ij}$$

$Y_{ij}$ could be the wage of individual $i$ in firm $j$. The regression constant has no subscript and is assumed to be fixed and equal to the grand mean. The new term in the model is the level 2 residual, $u_{0j}$, which represents the deviation of the mean of firm $j$ from the overall mean. This is illustrated in Figure 3.2 below, where the residuals of the former model are split into two components representing the variation at the two levels: the variation between firms ($u$) and the variation within firms ($e$). The variance in $Y$ can be expressed as

$$\mathrm{Var}(Y_{ij}) = \mathrm{Var}(u_{0j} + e_{ij}) = \mathrm{Var}(u_{0j}) + \mathrm{Var}(e_{ij}) + 2\,\mathrm{Cov}(u_{0j}, e_{ij})$$

Since we assume that the covariance between the level 1 and level 2 residuals is zero, the variance in $Y$ simplifies to the sum of the variances of the residuals:

$$\mathrm{Var}(Y_{ij}) = \mathrm{Var}(u_{0j}) + \mathrm{Var}(e_{ij}) = \sigma_u^2 + \sigma_e^2$$

Figure 3.2. Variance in Y = sum of the variances of the residuals
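A small simulation can make the decomposition concrete. The sketch below generates two-level data from the model with independent level 1 and level 2 residuals and compares the total variance of Y with the sum of the two variance components. All numbers (the standard deviations, the grand mean, the number of firms and employees) are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(42)

# Assumed values, for illustration only.
SIGMA_U = 3.0        # standard deviation of the level 2 residuals (between firms)
SIGMA_E = 4.0        # standard deviation of the level 1 residuals (within firms)
BETA0 = 100.0        # grand mean
N_FIRMS = 2000       # hypothetical number of level 2 units
N_PER_FIRM = 20      # hypothetical number of individuals per firm

y = []
for j in range(N_FIRMS):
    u0j = random.gauss(0, SIGMA_U)          # one draw per firm
    for i in range(N_PER_FIRM):
        eij = random.gauss(0, SIGMA_E)      # one draw per individual
        y.append(BETA0 + u0j + eij)

total_var = statistics.pvariance(y)
expected = SIGMA_U ** 2 + SIGMA_E ** 2      # sum of the two variance components

print(f"Var(Y) = {total_var:.2f}, sigma_u^2 + sigma_e^2 = {expected:.2f}")
```

Because the residuals are drawn independently, the covariance term vanishes and the simulated total variance should land close to the sum of the components, with some sampling noise.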

The natural next step is to define a coefficient showing the proportion of the variance in $Y$ that stems from the variation between the level 2 units. This is known as the Intraclass Correlation Coefficient (ICC) or as the Variance Partition Coefficient (VPC), symbolized by the Greek letter rho ($\rho$):

$$\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$$

The ICC from the null model is frequently used as a baseline for judging how much of the variance at each of the two levels can be explained by more complex models. It is also used to evaluate whether or not the level 2 variation is ignorable. It is hard to set a general threshold here, but as a rule of thumb the hierarchical data structure should not be ignored if the ICC is 0.05 or more. If the ICC is smaller, one-level models should be considered, perhaps with robust estimation of standard errors.
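The ICC is the level 2 variance divided by the sum of the level 1 and level 2 variances, so it is easy to compute once the two variance components have been estimated. The helper below is a minimal sketch; the function name `icc` and the example variances (9 and 16) are assumptions made for illustration, and the 0.05 cut-off is the rule of thumb mentioned above rather than a formal test.

```python
def icc(var_between: float, var_within: float) -> float:
    """Share of the total variance that lies between level 2 units."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("The total variance must be positive.")
    return var_between / total


# Hypothetical variance components from a null model.
rho = icc(var_between=9.0, var_within=16.0)

# Rule of thumb from the text: do not ignore the hierarchy if the ICC is 0.05 or more.
needs_multilevel = rho >= 0.05

print(f"ICC = {rho:.2f}, multilevel structure should be kept: {needs_multilevel}")
```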

Random versus fixed effects

The variance component model implies random effects, in that the variation in the intercepts is captured by the variance in the level 2 residuals. The model is repeated below with the assumption of normally distributed errors. Residuals can be correlated within levels but not across levels.

$$Y_{ij} = \beta_0 + u_{0j} + e_{ij}$$

$$u_{0j} \sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2)$$

This model is similar to a One-Way ANOVA, where the level 2 units are seen as a factor (categorical X).

$$Y_{ij} = \beta_0 + \beta_j + e_{ij}$$

$$e_{ij} \sim N(0, \sigma_e^2)$$

In the first model, the level 2 residuals are treated as a normally distributed random variable with variance $\sigma_u^2$, because we see the level 2 units (schools, firms) as a random sample from a population. In the ANOVA model, the level 2 units are seen as fixed, and can even be strategically chosen. The level 2 effects are captured by the coefficients ($\beta_j$) of a set of $J-1$ dummy variables.
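To see what the dummy-variable representation amounts to, consider a sketch with three hypothetical firms. In a regression that contains only a constant and J-1 dummies, the constant equals the mean of the reference unit and each dummy coefficient equals the difference between a unit's mean and the reference mean. The firm labels, the wages, and the choice of firm A as reference are all made up for illustration.

```python
import statistics

# Hypothetical wages for three firms (the level 2 units).
wages = {
    "A": [20.0, 22.0, 24.0],
    "B": [30.0, 31.0, 35.0],
    "C": [25.0, 27.0],
}

reference = "A"  # assumed reference category; it gets no dummy variable

# The constant is the mean of the reference firm.
beta0 = statistics.fmean(wages[reference])

# One coefficient per remaining firm: its mean minus the reference mean.
beta = {
    firm: statistics.fmean(values) - beta0
    for firm, values in wages.items()
    if firm != reference
}

print(f"beta0 = {beta0}, dummy coefficients = {beta}")
```

With J = 3 firms there are J-1 = 2 dummy coefficients, and no distributional assumption is made about them, in contrast to the random-effects formulation.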

In general, fixed-effects models represent an alternative to multilevel models. They are especially useful in (pooled) time series data, where they can be used to control for time-constant variables. In cross-sectional data, the fixed-effects model can be chosen when the number of level 2 units is small (<10) and when the units are strategically chosen rather than drawn as a random sample from a population. In cross-sectional data, however, the fixed-effects model cannot be combined with level 2 covariates, because the dummy variables already absorb all variation between the level 2 units. Multilevel analysis is the best choice in situations with many (15+) level 2 units, where the level 2 units constitute a random sample from a population and we want to include level 2 explanatory variables.

Models with level 1 explanatory variables

The OLS regression model with one explanatory variable can be our point of departure:

$$Y_i = \beta_0 + \beta_1 X_i + e_i$$

The two-level version can be expressed in one equation for each level or in one equation. Both have advantages in relation to understanding multilevel models. The first version with two equations shows the relationship to the slopes-as-outcome approach. In the second equation, the regression constants (intercepts) constitute the dependent variable.

$$Y_{ij} = \beta_{0j} + \beta_1 X_{ij} + e_{ij}$$

$$\beta_{0j} = \beta_0 + u_{0j}$$

$$u_{0j} \sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2)$$

The one-equation version is the result of substituting the right-hand side of the second equation for $\beta_{0j}$ in the first equation.

$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_{0j} + e_{ij}$$

As you may have seen, OLS regression models with dummy variables can be graphed as parallel lines, such as a wage equation with separate lines for men and women. Here, the level 2 residuals can be used in the same way to estimate parallel regression lines, one for each level 2 unit. The line for level 2 unit $j$ is $\hat{Y}_{ij} = (\beta_0 + u_{0j}) + \beta_1 X_{ij}$. Summing the two terms in the parentheses gives the unit-specific intercept, while the regression coefficient $\beta_1$ is common to all level 2 units. The individuals (students, employees, respondents) are spread around the line of their level 2 unit with a residual of $e_{ij}$. This is illustrated in Figure 3.3, with the mean regression line and lines for two level 2 units.

Let us stop to consider the terminology:

$\beta_0$ and $\beta_1$ are fixed regression coefficients.

$u_{0j}$ and $e_{ij}$ are random effects or multilevel residuals.

$\sigma_u^2$ and $\sigma_e^2$ are random parameters to be estimated along with the (fixed) regression coefficients.

Figure 3.3. Mean regression line and lines for two level 2 units
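The parallel lines in a random intercept model can be sketched in a few lines of code. Each level 2 unit gets its own intercept, the fixed constant plus that unit's level 2 residual, while the slope is shared. The coefficient values and the two residuals below are hypothetical, and `unit_line` is a helper name introduced only for this illustration.

```python
def unit_line(beta0: float, beta1: float, u0j: float):
    """Return the prediction function for one level 2 unit."""
    intercept = beta0 + u0j          # unit-specific intercept

    def predict(x: float) -> float:
        return intercept + beta1 * x  # the slope is common to all units

    return predict


# Hypothetical fixed coefficients.
BETA0, BETA1 = 10.0, 2.0

mean_line = unit_line(BETA0, BETA1, u0j=0.0)   # the mean regression line
firm_a = unit_line(BETA0, BETA1, u0j=3.0)      # a unit above the mean line
firm_b = unit_line(BETA0, BETA1, u0j=-1.5)     # a unit below the mean line

# The vertical distance to the mean line is the same at every value of X.
gaps = [firm_a(x) - mean_line(x) for x in (0.0, 5.0, 10.0)]
print(gaps)
```

The constant gap is what makes the lines parallel: it equals the level 2 residual of the unit, whatever the value of X.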

Adding explanatory variables at both levels

Variance component (random intercept) models can be extended with additional level 1 and level 2 explanatory variables without much extra complexity. Let us add $X_2$ at level 1, the individual level, and $Z$ at level 2, the contextual level.

$$Y_{ij} = \beta_{0j} + \beta_1 X_{1ij} + \beta_2 X_{2ij} + e_{ij}$$

$$\beta_{0j} = \beta_0 + \beta_3 Z_j + u_{0j}$$

We continue to use betas to symbolize all regression coefficients, also those related to contextual covariates, although some texts use Greek gammas for the latter. For the one-equation version, we need to substitute the regression constant in the first equation with the right-hand side of the second equation. Note that the random part of the model is unchanged since the intercept is still the only parameter that is free to vary among level 2 units.

$$Y_{ij} = (\beta_0 + \beta_3 Z_j + u_{0j}) + \beta_1 X_{1ij} + \beta_2 X_{2ij} + e_{ij}$$

Rearranging and removing the parentheses:

$$Y_{ij} = \beta_0 + \beta_1 X_{1ij} + \beta_2 X_{2ij} + \beta_3 Z_j + u_{0j} + e_{ij}$$
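The substitution can be verified with a short sketch that computes Y once through the two-equation form and once through the combined equation. The function names and all numerical values are assumptions chosen for illustration.

```python
import math


def y_two_equations(b0, b1, b2, b3, x1, x2, z, u0j, eij):
    """Level 2 equation first, then the level 1 equation."""
    beta0j = b0 + b3 * z + u0j                  # intercept of level 2 unit j
    return beta0j + b1 * x1 + b2 * x2 + eij


def y_one_equation(b0, b1, b2, b3, x1, x2, z, u0j, eij):
    """Combined equation: fixed part followed by the random part."""
    fixed_part = b0 + b1 * x1 + b2 * x2 + b3 * z
    random_part = u0j + eij
    return fixed_part + random_part


# Hypothetical coefficients, covariate values and residuals.
values = dict(b0=1.0, b1=0.5, b2=-2.0, b3=3.0,
              x1=4.0, x2=1.0, z=2.0, u0j=0.25, eij=-0.5)

y_a = y_two_equations(**values)
y_b = y_one_equation(**values)
print(y_a, y_b)
```

Both forms describe the same model. The combined form simply makes it easier to see that the random part still consists of one residual per level.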

Pseudo R squares

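Although we use the same symbols for the residuals as in the null model, we expect their variances to be smaller due to the two added explanatory variables. The differences between the residual variances for the null model and the current model can therefore be used to compute pseudo R squares. We add pseudo here to distinguish them from the R squares in OLS regression models. We can compute three pseudo R squares, one for each level and one for the two levels combined. Let us start by defining the R square for the individual level variation:

$$R_1^2 = \frac{\sigma_{e(\text{null})}^2 - \sigma_{e(\text{model})}^2}{\sigma_{e(\text{null})}^2}$$

The computation of the R square for the level 2 variation is quite similar:

$$R_2^2 = \frac{\sigma_{u(\text{null})}^2 - \sigma_{u(\text{model})}^2}{\sigma_{u(\text{null})}^2}$$

The total R square can be computed as follows:

$$R_{\text{total}}^2 = \frac{(\sigma_{u(\text{null})}^2 + \sigma_{e(\text{null})}^2) - (\sigma_{u(\text{model})}^2 + \sigma_{e(\text{model})}^2)}{\sigma_{u(\text{null})}^2 + \sigma_{e(\text{null})}^2}$$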

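Unlike the R square from OLS regression, these R squares can also be negative. This can happen when the variance estimates are close to zero, especially for the level 2 variation, since the estimate for the current model can then turn out slightly larger than the estimate for the null model.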
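The three pseudo R squares can be computed directly from the estimated variance components of the null model and the current model. The sketch below assumes the common definition of each pseudo R square as the proportional reduction in the relevant variance relative to the null model. The function name `pseudo_r2` and all variance values are hypothetical.

```python
def pseudo_r2(null_u: float, null_e: float, model_u: float, model_e: float) -> dict:
    """Proportional reduction in variance relative to the null model."""
    null_total = null_u + null_e
    model_total = model_u + model_e
    return {
        "level1": (null_e - model_e) / null_e,           # individual level
        "level2": (null_u - model_u) / null_u,           # contextual level
        "total": (null_total - model_total) / null_total,
    }


# Hypothetical variance components: both variances shrink by a quarter.
r2 = pseudo_r2(null_u=4.0, null_e=16.0, model_u=3.0, model_e=12.0)
print(r2)

# A case with a level 2 variance close to zero: the estimate for the
# current model is slightly larger, so the level 2 value is negative.
r2_small = pseudo_r2(null_u=0.10, null_e=16.0, model_u=0.12, model_e=12.0)
print(r2_small["level2"])
```

The second call illustrates why negative values tend to show up at level 2: when the between-unit variance is tiny, small changes in the estimate are large in relative terms.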