Data structure in multilevel analysis

When analysing cross-sectional data, the data files will normally already have the desired format, which is a hierarchically sorted data file. With two levels, such as employees in firms or respondents in countries, we need to sort the file first by firm or country and then by individual. In data with three levels, such as students in classes in schools, we need to sort the file first by school, next by class, and then by student (although the final step is not actually necessary). This means that the file needs to contain variables that identify the units at each level.
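As a minimal sketch of the three-level sort, the following Python snippet orders a handful of made-up student records by school, then class, then student. The variable names (school, class, student) and all values are hypothetical and serve only as an illustration.

```python
from operator import itemgetter

# Hypothetical records: each row carries one identifier per level.
students = [
    {"school": 2, "class": 1, "student": 2},
    {"school": 1, "class": 2, "student": 1},
    {"school": 1, "class": 1, "student": 2},
    {"school": 2, "class": 1, "student": 1},
    {"school": 1, "class": 1, "student": 1},
]

# Sort by the highest level first, then by each lower level in turn.
students_sorted = sorted(students, key=itemgetter("school", "class", "student"))

for row in students_sorted:
    print(row["school"], row["class"], row["student"])
```

After sorting, all students of one class appear on adjacent lines, and all classes of one school appear together.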

In longitudinal data, the data can be available in a wide or a long format. The wide format is a rectangular file with one line for each individual (subject), with the occasions (time points for the measurements) represented by separate variables, such as income1, income2, etc. However, multilevel models need the long format, where the occasions are nested within the subjects, so that each subject has one line per occasion. In Stata, the reshape command can change a file from a wide to a long format and vice versa. The SPSS equivalent is the Restructure command, which is accessed from the Data menu. Hox (2010: 79-83) gives a more thorough description of the differences between the two data formats.
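The difference between the two formats can be illustrated with a small Python sketch. It is not a substitute for the Stata or SPSS commands; the helper wide_to_long, the variable names id and occasion, and the income figures are all assumptions made for this example.

```python
# Hypothetical wide file: one line per subject, one variable per occasion.
wide = [
    {"id": 1, "income1": 300, "income2": 320, "income3": 350},
    {"id": 2, "income1": 410, "income2": 400, "income3": 430},
]


def wide_to_long(rows, id_var, stub):
    """Turn variables named stub1, stub2, ... into one line per occasion."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            suffix = name[len(stub):]
            if name.startswith(stub) and suffix.isdigit():
                long_rows.append(
                    {id_var: row[id_var], "occasion": int(suffix), stub: value}
                )
    # Keep the occasions nested within subjects.
    long_rows.sort(key=lambda r: (r[id_var], r["occasion"]))
    return long_rows


long_format = wide_to_long(wide, "id", "income")
for row in long_format:
    print(row)
```

The two subjects with three occasions each become six lines, each identified by the subject and the occasion.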

In SPSS, Stata and MLwiN, as well as R, the contextual variables are added to the individual (level 1) record, with identical values for all individuals within a level 2 unit. If we want to include the Gross Domestic Product (GDP) of a country in a multilevel analysis, for example, the same GDP value must be added to all individuals within the same country.

How do we obtain and add explanatory variables at level 2? This of course depends on the level 2 units. To simplify, think of the European Social Survey, where the countries constitute the level 2 units. We can aggregate the data from within the data file to the country level. For example, in all rounds of the ESS, there is a question on how well households manage on their present income:

Which of the descriptions on this card comes closest to how you feel about your household’s income nowadays? Living comfortably on present income (1), coping on present income (2), finding it difficult on present income (3), finding it very difficult on present income (4).

We can define (3) and (4) as indicating difficulties managing on present income and recode these values to 1 and the remaining values to 0. Aggregating the new variable to the country level, by taking its mean within each country, would result in a contextual variable measuring the proportion in each country with income problems. How this is done varies between the software packages. In SPSS, Aggregate will do the job, with the choice of adding the aggregated variable to the current file or saving the aggregated variables in a contextual level file. In Stata, collapse aggregates variables in a similar way, replacing the data in memory with the aggregated data, which can then be saved as a new file.
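The recoding and aggregation steps can be sketched in Python as follows. The respondents, the country codes and the variable names income_feeling and difficult are invented for this illustration, and the sketch assumes there are no missing values, which real survey data would require handling.

```python
from collections import defaultdict

# Hypothetical respondents with the four-category income question.
respondents = [
    {"country": "NO", "income_feeling": 1},
    {"country": "NO", "income_feeling": 3},
    {"country": "NO", "income_feeling": 2},
    {"country": "NO", "income_feeling": 2},
    {"country": "SE", "income_feeling": 4},
    {"country": "SE", "income_feeling": 3},
    {"country": "SE", "income_feeling": 1},
    {"country": "SE", "income_feeling": 2},
]

# Recode: categories 3 and 4 become 1, all other categories become 0.
for r in respondents:
    r["difficult"] = 1 if r["income_feeling"] in (3, 4) else 0

# Aggregate: the mean of a 0/1 variable within a country is a proportion.
totals = defaultdict(lambda: [0, 0])  # country -> [sum of difficult, count]
for r in respondents:
    totals[r["country"]][0] += r["difficult"]
    totals[r["country"]][1] += 1

prop_difficult = {country: s / n for country, (s, n) in totals.items()}
print(prop_difficult)
```

The result holds one value per country, which corresponds to the contextual level file described above.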

Country level statistics are also available from the ESS website as well as from other sources such as Eurostat. From the ESS site, it is possible to add country level and regional level variables to the main ESS file (more about this in chapter 5).

If the aggregate variables are in a separate file, they will have to be added to the ESS main file. In SPSS, Merge files - Add variables in the Data menu will merge files with a common level 2 identifier, and, in Stata, the merge command will do this job. If the number of level 2 units is low, as in the ESS, it is also possible to add a contextual variable by recoding the country identification variable, although this is clearly inefficient.
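The logic of such a merge can be sketched in Python: each individual record looks up its country in the contextual file and receives that country's values. The identifiers, the variable name gdp and the figures are made up for the illustration and are not real statistics.

```python
# Hypothetical level 1 file: one line per individual.
individuals = [
    {"id": 1, "country": "NO"},
    {"id": 2, "country": "NO"},
    {"id": 3, "country": "SE"},
]

# Hypothetical level 2 file, keyed by the common country identifier.
country_level = {
    "NO": {"gdp": 100.0},  # made-up value
    "SE": {"gdp": 90.0},   # made-up value
}

# Add the country's variables to every individual in that country.
merged = [{**person, **country_level[person["country"]]} for person in individuals]

for row in merged:
    print(row)
```

Individuals 1 and 2 end up with the same gdp value because they belong to the same level 2 unit. A country missing from the contextual file would raise a KeyError in this sketch, which is a useful check that the identifiers match.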
