A short history of multilevel analysis

Multilevel analysis is a relatively new statistical technique in social science research, although its roots can be traced back to classical sociological studies, especially Durkheim’s study of suicide. Durkheim sought the causes of suicide, a very personal and individual phenomenon, in the social contexts of the individual. Multilevel analysis can be viewed as a modern way of addressing research questions concerning how outcomes at the individual level can be seen as the result of the interplay between individual and contextual factors.

The first step towards modern multilevel analysis was the rise of contextual analysis in the USA in the 1940s. Contextual analysis was introduced as a critique of the dominant micro-perspective in American sociology. It became more established in the 1960s, as the statistical techniques became more sophisticated and conceptual progress was made. Lazarsfeld’s [Lar59] concept of contextual propositions, Lazarsfeld and Menzel’s [Lar61] typology of variables by levels and Blau’s [Bla60] concept of structural effects were the most influential contributions.

Around 1970, contextual analysis was heavily criticized by Robert Hauser [Hau70]. He maintained that most alleged contextual effects lacked substance and were artefacts of inadequately specified individual-level models. Instead, the ‘contextual effects’ were grouped individual effects. Hauser used the term contextual fallacy to describe this phenomenon.

From the end of the 1970s, the crucial steps in developing multilevel analysis took place in school research. Educational data had mainly been analysed at the individual level, ignoring the schools. An innovative step was to analyse each school separately. The dependent variable could be an outcome variable such as the score in a mathematics test with explanatory variables at the individual level, such as gender and parents’ socioeconomic status. Estimating identical regression models for each school would yield a set of intercepts and regression coefficients that could show systematic variation by schools. This led to the slopes-as-outcomes approach. The slopes (regression coefficients) were seen as dependent variables in a school-level analysis, with explanatory variables at the school level. This approach can be viewed as a two-stage multiple regression design.
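The two-stage logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of any historical analysis: the data are simulated, there is a single pupil-level predictor and a single school-level predictor, and the names `fit_school`, `slopes_as_outcomes` and `z` are hypothetical.

```python
import numpy as np


def fit_school(x, y):
    """Stage 1: ordinary least squares of y on x within a single school.

    Returns the pair (intercept, slope).
    """
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


def slopes_as_outcomes(schools, z):
    """Stage 2: regress the school-specific slopes on a school-level variable z.

    schools: list of (x, y) array pairs, one pair per school.
    z:       array with one school-level value per school.
    Returns the stage-1 estimates (one row per school) and the stage-2
    coefficients (intercept, effect of z on the slope).
    """
    stage1 = np.array([fit_school(x, y) for x, y in schools])
    slopes = stage1[:, 1]
    Z = np.column_stack([np.ones_like(z), z])
    gamma, *_ = np.linalg.lstsq(Z, slopes, rcond=None)
    return stage1, gamma


# Simulated example (assumed values): 50 schools with 40 pupils each.
rng = np.random.default_rng(0)
n_schools, n_pupils = 50, 40
z = rng.normal(size=n_schools)  # hypothetical school-level predictor

schools = []
for j in range(n_schools):
    x = rng.normal(size=n_pupils)  # hypothetical pupil-level predictor
    # The true slope in school j depends on the school-level variable z.
    slope_j = 2.0 + 0.5 * z[j] + rng.normal(scale=0.1)
    y = 10.0 + slope_j * x + rng.normal(scale=1.0, size=n_pupils)
    schools.append((x, y))

stage1, gamma = slopes_as_outcomes(schools, z)
print("stage-2 intercept and effect of z on the slope:", gamma)
```

In this sketch the second stage treats the estimated slopes as if they were observed without error and gives every school the same weight, which is one way to see the kind of statistical problem that later simultaneous estimation was meant to avoid.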

In the 1980s, several variations of multilevel models were developed to avoid the statistical problems of the two-stage design. In Chicago, a group of researchers developed the HLM software for simultaneous estimation of ‘hierarchical linear models’ with two levels [Rau02]. In London, a group of educational researchers developed another software program for multilevel analysis, now known as MLwiN [Gol95].