# Chapter 2: Basic concepts in multilevel analysis

In this chapter, the most important aspects of multilevel analysis will be explained, starting with a short history and ending by using the routines for multilevel analysis to estimate one-level regression models. Most of the chapter consists of a text for reading, with review questions for important concepts. Those who have some familiarity with multilevel analysis may want to skip this chapter or only answer the review questions.

# A short history of multilevel analysis

Multilevel analysis is a relatively new statistical technique in social science research, although its roots can be traced back to classical sociological studies, especially Durkheim’s study of suicide. Durkheim sought the causes of suicide, a very personal and individual phenomenon, in the social contexts of the individual. Multilevel analysis can be viewed as a modern way of addressing research questions concerning how outcomes at the individual level can be seen as the result of the interplay between individual and contextual factors.

The first step towards modern multilevel analysis was the rise of *contextual* analysis in the USA in the 1940s. Contextual analysis was introduced as a critique of the dominant micro-perspective in American sociology. It became more established in the 1960s, as the statistical techniques became more sophisticated and conceptual progress was made. Lazarsfeld’s [Lar59] concept of contextual propositions, Lazarsfeld and Menzel’s [Lar61] typology of variables by levels and Blau’s [Bla60] concept of structural effects were the most influential contributions.

Around 1970, contextual analysis was heavily criticized by Robert Hauser [Hau70]. He maintained that most alleged contextual effects lacked substance and were artefacts of inadequately specified individual-level models. Instead, the ‘contextual effects’ were grouped individual effects. Hauser used the term contextual fallacy to describe this phenomenon.

From the end of the 1970s, the crucial steps in developing multilevel analysis took place in school research. Educational data had mainly been analysed at the individual level, ignoring the schools. An innovative step was to analyse each school separately. The dependent variable could be an outcome variable such as the score in a mathematics test with explanatory variables at the individual level, such as gender and parents’ socioeconomic status. Estimating identical regression models for each school would yield a set of intercepts and regression coefficients that could show systematic variation by schools. This led to the slopes-as-outcomes approach. The slopes (regression coefficients) were seen as dependent variables in a school-level analysis, with explanatory variables at the school level. This approach can be viewed as a two-stage multiple regression design.
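The two-stage logic of the slopes-as-outcomes approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the school-level variable `class_size` and the helper names `fit_ols` and `slopes_as_outcomes` are hypothetical and do not come from the studies mentioned above.

```python
import numpy as np

def fit_ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

def slopes_as_outcomes(schools, school_var):
    """Stage 1: one regression per school.
    Stage 2: regress the estimated slopes on a school-level variable."""
    slopes = np.array([fit_ols(x, y)[1] for x, y in schools])
    stage2 = fit_ols(np.asarray(school_var, dtype=float), slopes)
    return slopes, stage2

# Hypothetical toy data: pupil SES (x) and test score (y) in three schools.
ses = np.array([0.0, 1.0, 2.0, 3.0])
schools = [
    (ses, 50 + 2.0 * ses),  # school A: slope 2.0
    (ses, 55 + 3.0 * ses),  # school B: slope 3.0
    (ses, 60 + 4.0 * ses),  # school C: slope 4.0
]
class_size = [30, 25, 20]   # hypothetical school-level explanatory variable

slopes, (b0, b1) = slopes_as_outcomes(schools, class_size)
print("stage 1 slopes:", slopes)
print(f"stage 2: slope = {b0:.2f} + ({b1:.2f}) * class_size")
```

In this sketch the second stage treats the estimated slopes as if they were observed without error, which is one of the statistical weaknesses of a two-stage design.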

In the 1980s, several variations of multilevel models were developed to avoid the statistical problems of the two-stage design. In Chicago, a group of researchers developed the HLM software for simultaneous estimation of ‘hierarchical linear models’ with two levels [Rau02]. In London, another group of educational researchers developed another software program for multilevel analysis, now known as MLwiN [Gol95].

# Individuals and contexts

In traditional survey research, the individual is often seen in isolation from his or her contexts, whereas qualitative research emphasizes contextualization. Multilevel analysis can be seen as a tool for contextualising quantitative statistical analysis. In many fields of research, the processes to be studied involve two or more levels of analysis. Learning takes place in schools and it does not seem right to ignore this fact when studying learning outcomes. Another example is from the field of public health, where the famous Wilkinson hypothesis implies that income inequality in a community can influence the health of the inhabitants. A third example is from the study of wage determination, where the wages of employees can be seen as dependent on the profitability of the firm as well as on the human capital of the employees. In these examples, research questions emerge from a combination of theories at the individual and contextual level.

### Hierarchical data structures

In multilevel analysis, the basic data structure is hierarchical. Units at a lower level are nested within units at higher levels. The table below lists four examples. In the first example, employees are nested within firms. In the next two examples, which could apply to the European Social Survey, respondents are nested within countries; first in a two-level model, then within regions and countries in a three-level model. The last example in the table is the structure of the multilevel model of change. The lowest level consists of measurements made on two or more occasions nested within students within schools. This model requires a repeated measurement or panel design, and it will not be covered in this package. It cannot be applied to data from the European Social Survey, since the ESS is based on a cross-sectional design. Several rounds of the ESS can, however, be analysed using multilevel models, but the estimation of change then only applies to the country level. Individual change is unobserved due to the cross-sectional design.

Level 3 | | | Countries | Schools |
---|---|---|---|---|
Level 2 | Firms | Countries | Regions | Students |
Level 1 | Employees | Respondents | Respondents | Occasions |

### Which variables can constitute a level?

Individuals, schools and countries are typical levels, but gender and social class are not candidates for levels. Why is this so? The technical difference is that the former variables identify the units in the data file, whereas gender and class are candidates for explanatory variables at the individual level. For a variable to become a level it is necessary that it identifies units sampled from a population, such as individuals, families, firms, schools, regions and countries. Ideally, both the level 1 and the level 2 units should be random samples from their respective populations. They will thus constitute random classifications, whereas gender and social class with only a few non-exchangeable values constitute fixed classifications. Regions and countries are commonly used as levels although they are seldom randomly sampled. This is a common practice, although it can be seen as a violation of one of the assumptions of multilevel models.

For some variables, there may be a choice. In longitudinal data, occasions (time) can be defined as a level or as an explanatory variable. Occupational groups can be defined as a level or collapsed into categories or scored and used as an explanatory variable. The same applies to industry classifications in studies of firms.

# The problem with ignoring the multilevel structure

Ignoring the multilevel structure of the data creates both conceptual and statistical problems. If we drop the contextual levels, such as schools in studies of learning and firms in the study of wage determination, we ignore the arenas for learning and wage determination. If we focus our analysis on only one level, we also run into statistical problems. Ignoring the contextual level and conducting the analysis at the individual level can lead to underestimated standard errors and thus to invalid statistical tests, especially for contextual variables. The opposite solution, aggregating the data to the contextual level and ignoring the individual level, opens the door to the ecological fallacy. The safest solutions are to correct for clustering and compute robust standard errors if the ‘contexts’ or clusters are artificial and of little theoretical interest, and to use multilevel analysis if the contexts are theoretically important.

### Comparative research and multilevel models

Comparative research has traditionally involved cross-cultural comparisons of two or more countries. A wider definition includes any comparison of cases in space or time. The cases can be any type of macro-unit: organizational units (firms, schools) or geographic units (communities, regions, countries).

A classification of comparative research is presented in Figure 2.1. The first dimension classifies the data structure. We can distinguish between the use of either the micro or the macro level, or both. In the latter situation, the micro-level units are embedded in the macro-units, such as respondents in countries in the European Social Survey. The other dimension classifies designs by the number of contexts (cases), i.e. countries in studies based on the European Social Survey.

Studies of a single case (country) may be made implicitly comparative by comparing them with studies of other countries with similar research questions. Comparative studies have traditionally involved comparison of two or a few cases. Here we can distinguish between comparative system analysis, separate micro-level analysis and pooled fixed-effects analysis. The first type is mainly qualitative, such as studies comparing political systems. The second type could involve the estimation of the same regression model separately for each of two countries, while the third would be based on a pooled data file with country fixed effects, i.e. a set of dummy variables to represent countries. This implies that the intercept in the regression model would vary between countries. The regression coefficients are assumed to be common to all countries, but they may be allowed to vary by adding country-by-variable interactions. The main advantages of the two latter approaches are that the countries can be chosen for theoretical reasons and that they will remain in focus throughout the statistical analysis.
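As a minimal sketch of the pooled fixed-effects idea, the following Python snippet fits one regression to a pooled file with a country dummy. The data are made up for illustration, and the variable names (`country`, `dummy_b`) are assumptions, not part of any ESS data file.

```python
import numpy as np

# Hypothetical pooled data for two countries (A and B), three respondents each.
country = np.array(["A", "A", "A", "B", "B", "B"])
x = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0, 4.0, 6.0, 8.0])  # A: y = 1 + 2x, B: y = 4 + 2x

# Country fixed effects: one dummy per country except the reference (A).
dummy_b = (country == "B").astype(float)
X = np.column_stack([np.ones_like(x), dummy_b, x])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept_a, shift_b, common_slope = coef

print(f"intercept A = {intercept_a:.2f}")
print(f"intercept B = {intercept_a + shift_b:.2f}")
print(f"common slope = {common_slope:.2f}")
```

The dummy lets the intercept differ between the two countries while the slope is shared. Adding the product `dummy_b * x` as a further column would allow the slope to vary as well.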

In the last column, designs based on many countries are presented. The first one is based on aggregated statistics and a longitudinal perspective can also be added. The two remaining designs are widely used in the analysis of European Social Survey data. The pooled fixed-effects analysis would use dummy variables to represent countries. The multilevel design differs from the fixed effects model in important ways. Firstly, the variation in the intercept and/or the regression coefficients (slopes) is captured by variance components that constitute the random parameters in the multilevel model. Secondly, it allows for explanatory variables at country level to be entered into the analysis.

Figure 2.1. A typology of comparative design

# Review questions

- Which research field in the social sciences was most central in developing multilevel analysis?
- Explain the slopes-as-outcomes approach.
- Explain the shortcomings of a one-level analysis.
- Can multilevel analysis be regarded as a comparative design?

1. Education, school research.

2. The regression coefficients for a sample of schools became the dependent variables in a two-stage OLS regression analysis.

3. The main alternative is to ignore the higher level and conduct the analysis without considering the contexts. This may yield underestimated standard errors and it ignores the contexts as arenas for the individual actions.

4. Yes, it can be seen as a large-N comparative design that follows Przeworski and Teune’s advice to replace countries (contexts) with their values for explanatory variables.

# Software and literature

In addition to special purpose software such as HLM and MLwiN, routines for estimating multilevel models have now been incorporated into general purpose software packages for statistical analysis. In SAS, the MIXED procedure can estimate a variety of linear multilevel models. In SPSS, the Linear Mixed Models procedure allows the estimation of multilevel models with several levels. The routine is restricted to continuous dependent variables, however. In other words, none of the versions of the multilevel logistic model (binary, multinomial, ordinal) can be estimated in SPSS. Heck, Thomas and Tabata (2010) have written an extended manual to show how multilevel models for cross-sectional and longitudinal data can be estimated in SPSS. Linear mixed models are generalizations of the linear OLS regression model that allow for correlated data and non-constant variability (heteroscedasticity). These routines enable variances and covariances to be modelled, in addition to estimating the regression coefficients.

Stata has several routines for estimating linear (XTREG, XTMIXED) and logistic (XTLOGIT, XTMELOGIT) multilevel models, in addition to the very powerful add-in program GLLAMM (Rabe-Hesketh and Skrondal 2012). At present, Stata has the most complete routines for multilevel modelling apart from the special purpose programs HLM and MLwiN. Rabe-Hesketh and Skrondal (2012) give examples of how the Stata routines can be used to estimate a variety of multilevel models.

The public domain program R is growing in popularity among advanced users. Routines for multilevel models have been developed for R. The web page developed in connection with Kreft and de Leeuw’s introductory book includes data sets from the book in formats for the most popular software packages.

Finally, some programs for Structural Equation Models (SEM) are able to estimate multilevel models with and without latent variables. Mplus and LISREL are two widely used SEM programs with this capability.

In linear multilevel models, all programs generally produce almost identical results. In non-linear models, such as variations of multilevel logistic models, the results are more prone to differ due to differences in the estimation algorithms.

In this learning module, we will estimate all models in the general purpose statistical packages Stata and SPSS.

### Introductory books on multilevel analysis

There is now a large number of introductory books on multilevel analysis. Below follows a short list of good introductory books, including the book by Rabe-Hesketh and Skrondal, which is better suited for the advanced user than for the novice. It is an excellent sourcebook, however, for Stata users working with multilevel models. I would recommend that the novice start with Hox’s book or the one by Kreft and de Leeuw.

Goldstein, H. (1995), Multilevel Statistical Models, 2nd edition. London: Edward Arnold.

Hox, J. (2002), Multilevel Analysis. Techniques and Applications, 2nd edition. London: Lawrence Erlbaum.

Kreft, I. and Leeuw, J. de (2000), Introducing multilevel modelling. London: Sage.

Rabe-Hesketh, Sophia and Anders Skrondal (2012), Multilevel and Longitudinal Modeling Using Stata. 3rd edition. Volume I: Continuous Responses. College Station, Texas: Stata Press.

Raudenbush, S.W. and Bryk, A.S. (2002), Hierarchical Linear Models. Applications and Data Analysis Methods, 2nd edition. Thousand Oaks: Sage.

Snijders, Tom A. B. and Roel J. Bosker (1999), Multilevel Analysis, London: Sage.

# Estimation of multilevel models

This non-technical description of the estimation procedures for multilevel models is largely based on Hox (2010, Chapter 3). Multilevel models are normally estimated by Maximum Likelihood (ML), Restricted Maximum Likelihood (RML, also abbreviated REML) or Iterative Generalized Least Squares (IGLS) algorithms. The main idea behind ML estimation is to find the estimates of the model parameters that are most likely to have produced the observed data, i.e. the covariances and the variances among the variables in the model. In large samples, ML estimates are reasonably robust against mild violations of assumptions such as non-normal errors.

In ML estimation, the likelihood function is maximized. In the full information ML method, both the regression coefficients and the variance components are included in the likelihood function. In the RML method, only the variance components are included in the likelihood function, and the regression coefficients are estimated in a second step. The RML method seems to produce less biased estimates of the variance components, especially in small samples. In practice, the difference between the results of the two estimation methods is normally negligible. The ML method is still used because it has some advantages over the RML method. It is computationally easier and, since the regression coefficients are included in the likelihood function, likelihood ratio tests can be used to compare nested models that differ in the fixed part, i.e. the number of regression coefficients.

ML estimation requires an iterative procedure with starting values for the regression coefficients taken from OLS regression estimates and with zeros for the variance components. In the first iteration, a complex iteration procedure is used to try to improve on the starting values. Then the likelihood function is evaluated and the second iteration is performed. This procedure continues until the process converges, i.e. until the changes in the estimates are below a small threshold. Sometimes, however, the models do not converge. The most common cause of this is the inclusion of variance components that are close to zero. The remedy for this is to simplify the random part of the model.
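The general shape of such an iterative procedure (a starting value, repeated updates and a convergence threshold) can be sketched in Python. The sketch assumes a very simple binomial likelihood with three defective items among five, the same situation that is used as an illustration later in this chapter. Newton-Raphson updating is chosen here only for illustration; it is not necessarily the algorithm used by multilevel software, and the function names are hypothetical.

```python
def score(p, n=5, x=3):
    """First derivative of the binomial log-likelihood with respect to p."""
    return x / p - (n - x) / (1 - p)

def hessian(p, n=5, x=3):
    """Second derivative of the binomial log-likelihood with respect to p."""
    return -x / p**2 - (n - x) / (1 - p) ** 2

def newton_ml(start, tol=1e-8, max_iter=50):
    """Update the estimate until the change falls below the threshold tol."""
    p = start
    for iteration in range(1, max_iter + 1):
        p_new = p - score(p) / hessian(p)  # Newton-Raphson update
        if abs(p_new - p) < tol:
            return p_new, iteration
        p = p_new
    raise RuntimeError("no convergence; try another starting value")

p_hat, n_iter = newton_ml(start=0.3)
print(f"estimate = {p_hat:.4f} after {n_iter} iterations")
```

With a single parameter and a well-behaved likelihood, convergence takes only a few iterations. Multilevel models have many parameters, and variance components close to zero put the estimate at the edge of the parameter space, which is where such procedures tend to run into trouble.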

Maximum likelihood estimation is a complex technical subject. An illustration of the principle of ML estimation in a very simple situation could therefore be helpful. More satisfactory explanations are found in Rabe-Hesketh and Skrondal (2012), Chapter 2.10-11, and in Raudenbush and Bryk (2002), Chapter 3.

# An illustration of Maximum Likelihood (ML) estimation

*The example is inspired by a similar example in [Won77]*.

Let us assume that we have been assigned the task of estimating the quality of a production line, in other words to estimate P(Defect), the probability that a randomly chosen product is defective. Assume further that we have drawn a sample of five products and found three to be defective. A common sense estimate of P is 3/5 or 0.6. Let us try to estimate P by using the maximum likelihood principle. First, we need the likelihood function that is necessary to evaluate the estimates. The number of defective products can be seen as resulting from a binomial experiment with n=5 trials. The probability distribution of the random variable X (number of defective products) is the binomial distribution:

$$\Pr(X = x \mid P) = \binom{n}{x} P^{x} (1 - P)^{n - x}$$

With n=5 trials and x=3 defective products, the likelihood function becomes

$$L(P) = \binom{5}{3} P^{3} (1 - P)^{2} = 10\,P^{3} (1 - P)^{2}$$

We can manually try out various values of P, choosing the one that returns the maximum value of the likelihood function. In statistical software, this is done by an iterative algorithm. Let us start with P=0.10. What is the probability of observing three defective products out of five, given P=0.10?

$$L(0.10) = 10 \times 0.1^{3} \times 0.9^{2} = 0.0081 \approx 0.008$$

Table 2.2 shows the results of the likelihood function for probabilities (P) varying from 0 to 1. The function peaks at P=0.6 as shown in Figure 2.2. Thus, the use of the maximum likelihood algorithm yields the same result as our common sense estimate. For more complex situations, there are no equivalents to the common sense estimate and the likelihood functions are much more complex than in our example.

P | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
---|---|---|---|---|---|---|---|---|---|---|---|
L(P) | 0 | 0.008 | 0.051 | 0.132 | 0.230 | 0.312 | 0.346 | 0.309 | 0.205 | 0.073 | 0 |

Figure 2.2. An illustration of maximum likelihood estimation
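The manual search over values of P can be reproduced with a short Python sketch. It evaluates the binomial likelihood for three defective products out of five on the grid P = 0, 0.1, ..., 1 and picks the value with the highest likelihood. The function name `binomial_likelihood` is chosen for this illustration only.

```python
from math import comb

def binomial_likelihood(p, n=5, x=3):
    """Likelihood of observing x defective items among n, given defect probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Evaluate the likelihood on the grid P = 0.0, 0.1, ..., 1.0.
grid = [i / 10 for i in range(11)]
likelihoods = [binomial_likelihood(p) for p in grid]

# The ML estimate on this grid is the value of P with the highest likelihood.
p_hat = max(grid, key=binomial_likelihood)

for p, lik in zip(grid, likelihoods):
    print(f"P = {p:.1f}  L(P) = {lik:.3f}")
print("ML estimate on this grid:", p_hat)
```

A grid search is only feasible because there is a single parameter. With many parameters, the number of grid points grows too quickly, which is why statistical software relies on iterative algorithms instead.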

## Estimating OLS models in statistical procedures for multilevel models

In SPSS, linear multilevel models can be estimated using the Linear Mixed Models procedure. For most purposes, writing the syntax is preferable to using the menus. Let us estimate the final model without age squared and the interaction term with the minimum of required text:

The structure of the command is different from REGRESSION. The dependent variable follows the MIXED keyword, and factors (categorical covariates) follow the BY keyword. This means that we do not have to create a set of dummy variables for classes manually, as long as we are satisfied with having the last class as the reference category. The continuous covariates (or dummy variables) follow the WITH keyword. An interaction term is included as in REGRESSION, but we do not need to compute the variable manually: instead of ‘edfem’, we could have written ‘edyears*female’. The FIXED subcommand defines the (fixed) regression coefficients and their order in the output table. The estimation method is restricted maximum likelihood (REML), and the last subcommand prints the solution. The most interesting tables are included below. The coefficients and the t-tests are identical to those from REGRESSION, but there are a few differences. Because of the difference in the estimation procedures, neither standardized coefficients nor R square are available in MIXED. The small table below shows the estimate of the total residual variance. In the multilevel models to come, this table will contain more interesting information.

Table 2.3. Estimates of Fixed Effects^{a} - SPSS output

Table 2.4. Estimates of Covariance Parameters^{a} - SPSS output
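Why a mixed-model routine without random effects reproduces the OLS results can be illustrated with a small Python sketch. It is not the SPSS procedure itself. It uses made-up data and relies on two standard results for the one-level linear model: the likelihood-based coefficient estimates equal the OLS solution, and the REML estimate of the residual variance equals the residual sum of squares divided by n - p, where p is the number of regression coefficients. All variable names are illustrative.

```python
import numpy as np

# Hypothetical toy data: outcome y and one continuous covariate x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept
n, p = X.shape

# Fixed effects: with no random effects, these are the OLS coefficients.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta
rss = float(residuals @ residuals)

sigma2_reml = rss / (n - p)  # REML residual variance, same as the OLS estimate
sigma2_ml = rss / n          # full ML residual variance, always smaller

# Standard errors and t-values based on the REML variance match OLS output.
cov_beta = sigma2_reml * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_beta))
t_values = beta / se

print("coefficients:", beta)
print("t-values:", t_values)
print(f"residual variance: REML = {sigma2_reml:.4f}, ML = {sigma2_ml:.4f}")
```

The gap between the two variance estimates shrinks as n grows relative to p, which is consistent with the remark that REML matters most in small samples.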

In Stata, two procedures, XTREG and XTMIXED, can be used to estimate multilevel regression models. For models of cross-sectional data, the XTMIXED procedure will be used throughout this module, since it is the more general one and allows the estimation of all ensuing models. The syntax for XTMIXED and the output are shown below. Note the ‘variance’ option at the end, which ensures that the variance components or random-effects parameters are reported on the variance scale rather than as the default standard deviations. The estimates are identical to the ones from SPSS.

Table 2.5. XTMIXED estimates - Stata output

# References

- [Bla60] Blau, P.M. (1960), Structural effects. American Sociological Review 25: 178-193.
- [Gol95] Goldstein, H. (1995), Multilevel Statistical Models, 2nd edition. London: Edward Arnold.
- [Hau70] Hauser, R. (1970), Context and consex: A cautionary tale. American Journal of Sociology 75: 645-664.
- [Lar59] Lazarsfeld, P.F. (1959), Problems in methodology. In Merton, R.K., Broom, L., Cottrell, L.S. (eds), Sociology Today. New York: Basic Books.
- [Lar61] Lazarsfeld, P.F. and Menzel, H. (1961), On the relations between individual and collective properties. In Etzioni, A. (ed.), Complex Organisations. New York: Holt, Rinehart & Winston.
- [Rau02] Raudenbush, S.W. and Bryk, A.S. (2002), Hierarchical Linear Models. Applications and Data Analysis Methods, 2nd edition. Thousand Oaks: Sage.
- [Won77] Wonnacott, T.H. and Wonnacott, R.J. (1977), *Introductory Statistics*, 2nd edition. New York: Wiley.