Chapter 3: Multigroup Factor Analysis
Introduction
A key aim of many social surveys is to measure the same constructs in different groups in order to make cross-group comparisons of the distributions of the constructs. This is clearly the case in cross-national surveys such as the ESS, where the populations of individuals in the different countries are the key groups of interest. A defining purpose of a cross-national survey is to provide data for making comparisons between countries, often in terms of the distributions of latent constructs which are measured by multiple indicators.
Since the latent variables in factor analysis are assumed to follow a multivariate normal distribution, such cross-national comparisons focus on the means, variances and covariances of the factors, which together fully define the multivariate normal distribution. Multigroup factor analysis can be used for such comparisons. It extends the standard (single-group) factor analysis model by allowing some parameters of the model to vary across the groups.
In order for cross-group comparisons to be meaningful, the variable of interest should be measured in the same way and on the same scale across the groups. In the case of a latent variable this requirement amounts to the condition that at least a sufficiently large part of the measurement model of the variable should have the same form and identical parameter values in all of the groups. If this is the case, we can say that those parts of the measurement model are invariant (or equivalent) across the groups, and that cross-group measurement invariance (or measurement equivalence) holds for them.
In the rest of this chapter we first describe multigroup factor analysis models where measurement invariance is assumed to hold for the entire measurement model, before discussing how and when this condition may be checked and perhaps partially relaxed by allowing partial noninvariance of measurement.
Multigroup models under full measurement invariance
Consider again a model for one or more factors η_{j}, which are measured by multiple indicators in a way described by a factor analysis measurement model. Suppose now that we have data on respondents from G known groups such as countries. In this section we assume that complete measurement invariance across the groups holds for the measurement model, so that this model is exactly the same in all the groups. There is then nothing new to say about the measurement model, which is defined and interpreted in the same ways as in the single-group situation discussed before. We can thus focus on the changes in how the distribution of the latent factors is specified.
For concreteness of notation, suppose that there are two factors η_{1} and η_{2}. We now assume that among individuals in each group g = 1, ..., G, the factors are jointly normally distributed with means
E(η_{1}) = κ_{1}^{(g)} and E(η_{2}) = κ_{2}^{(g)}
variances
var(η_{1}) = φ_{1}^{(g)} and var(η_{2}) = φ_{2}^{(g)}
and covariance
cov(η_{1}, η_{2}) = φ_{12}^{(g)}
In other words, this allows all of the parameters which describe the distribution of the factors to be different in different groups.
It is still necessary to impose some constraints on these parameters in order to identify the scales of the latent factors. This, however, now needs to be done only in one group (which can be chosen freely), leaving all the parameters free to be estimated in all the other groups. For example, we may choose group 1 as the reference group and fix the factor means κ_{1}^{(1)} and κ_{2}^{(1)} to be 0 and the factor variances φ_{1}^{(1)} and φ_{2}^{(1)} to be 1 in that group (the factor covariance φ_{12}^{(g)} can be freely estimated in all groups, including group 1). When this is done, the mean of 0 in group 1 becomes a benchmark against which the means κ_{1}^{(g)} and κ_{2}^{(g)} in the other groups, and values of the factors for individuals in all groups, can be compared, and similarly the value 1 in group 1 becomes a benchmark for the factor variances φ_{1}^{(g)} and φ_{2}^{(g)}.
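As a minimal sketch of what this specification implies, the following Python code computes the model-implied item means and covariances in two groups that share one measurement model but have different factor distributions. The function name `implied_moments` and all numeric values are illustrative assumptions, not estimates from the ESS data; group 1 is the reference group with factor means 0 and variances 1.

```python
import numpy as np

def implied_moments(nu, Lambda, Theta, kappa, Phi):
    """Model-implied mean vector and covariance matrix of the items in one group."""
    mu = nu + Lambda @ kappa                 # item means
    Sigma = Lambda @ Phi @ Lambda.T + Theta  # item covariance matrix
    return mu, Sigma

# Invariant measurement model (hypothetical values): three items per factor.
nu = np.array([2.0, 2.5, 3.0, 1.5, 2.0, 2.5])            # intercepts
Lambda = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.9], [0.0, 0.7], [0.0, 0.5]])   # loadings
Theta = np.diag([0.5, 0.6, 0.7, 0.4, 0.6, 0.8])           # error variances

# Reference group 1: means fixed at 0, variances fixed at 1, covariance free.
kappa_1 = np.zeros(2)
Phi_1 = np.array([[1.0, 0.35], [0.35, 1.0]])

# Group 2: factor means, variances and covariance all free (hypothetical values).
kappa_2 = np.array([0.5, -0.3])
Phi_2 = np.array([[1.5, 0.4], [0.4, 0.8]])

mu_1, Sigma_1 = implied_moments(nu, Lambda, Theta, kappa_1, Phi_1)
mu_2, Sigma_2 = implied_moments(nu, Lambda, Theta, kappa_2, Phi_2)

# Under full invariance, differences in item means between the groups
# arise only through the difference in factor means.
print(mu_2 - mu_1)
```

Because the measurement parameters are shared, every difference between the two groups' implied moments in this sketch is attributable to the factor means and the factor covariance matrix.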
Such models can be easily fitted in standard software, and estimates from them allow us to compare distributions of factors across countries. This is illustrated by Example 1 later in this chapter.
Models with some noninvariance of measurement
To introduce the key concepts related to noninvariance of measurement in factor analysis models, we focus on the simple case of a model with one factor η. In a multigroup context, the measurement model for any item y_{j} (j = 1, ..., p) for a respondent in group g = 1, ..., G can then be expressed as
y_{j} = ν_{j}^{(g)} + λ_{j}^{(g)}η + ε_{j}
where ε_{j} is normally distributed with mean 0 and variance θ_{j}^{(g)}. In other words, such a measurement model allows any or all of the measurement parameters for an item (intercept ν_{j}^{(g)}, loading λ_{j}^{(g)} and/or the error variance θ_{j}^{(g)}) to have different values in different groups g.
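As a hedged aside, the first two moments implied by this measurement model can be written out explicitly. The sketch below assumes, as is standard in factor analysis, that ε_{j} is independent of η and of the other error terms, and it writes κ^{(g)} and φ^{(g)} for the mean and variance of η in group g.

```latex
\begin{align*}
\mathrm{E}(y_j \mid g) &= \nu_j^{(g)} + \lambda_j^{(g)} \kappa^{(g)}, \\
\mathrm{var}(y_j \mid g) &= \bigl(\lambda_j^{(g)}\bigr)^2 \phi^{(g)} + \theta_j^{(g)}, \\
\mathrm{cov}(y_j, y_k \mid g) &= \lambda_j^{(g)} \lambda_k^{(g)} \phi^{(g)} \quad (j \neq k).
\end{align*}
```

Written this way, it is visible that a difference in the observed mean of an item between two groups can come from the intercept, the loading or the factor mean, which is why some measurement parameters must be held equal before factor means can be compared.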
The first question we should ask about comparability of measurement is whether the overall structure of the measurement model for the items is the same in all groups. In the one-factor case this is the question of whether a one-factor model is indeed adequate in all groups. More generally, it is the question of whether a measurement model with the same number of factors and the same pattern of zero and nonzero factor loadings is adequate in all groups. Example 2 in Chapter 2 was an illustration of this kind of analysis, there for a particular 2-factor confirmatory factor analysis model. If the same model is adequate in all groups in this sense, the measurement model is said to possess configural invariance (or construct invariance) across the groups. In essence this means that the items can be thought to measure the same latent constructs in each group, even if possibly with different exact values of the measurement parameters. If construct invariance does not hold, the items cannot really be used for meaningful comparisons of constructs between the groups. If it does hold, we can proceed to examine whether the parameters of the common measurement model also have equal values across groups.
If any of the parameters of the measurement model for item y_{j} do vary across the groups, the item is noninvariant across the groups. If, in contrast, each of the measurement parameters has the same value in all groups (i.e. ν_{j}^{(g)} = ν_{j}, λ_{j}^{(g)} = λ_{j} and θ_{j}^{(g)} = θ_{j} for g = 1, ..., G), full (or "strict") invariance of measurement holds for that item. When an item is fully invariant, it thus functions as a measure of the factor in exactly the same way in all of the groups. If full invariance holds for all items y_{1}, ..., y_{p} which are treated as measures of the factor η, the measurement of the factor itself is fully invariant.
We may also consider models which possess partial invariance, meaning that some but not all items for a factor and/or some but not all measurement parameters for an item are noninvariant. The following terms are often used to refer to specific kinds of partial invariance in terms of types of parameters:
 Scalar invariance (also known as "strong" factorial invariance) holds for an item if the intercepts ν_{j} and loadings λ_{j} are invariant across the groups, but the error variances θ_{j}^{(g)} are not. If this is the case for all the items, scalar invariance holds for the whole scale of measurement for the factor.
 Metric invariance (also known as "weak" factorial invariance) holds for an item if the loadings λ_{j} are invariant across the groups, but the intercepts ν_{j}^{(g)} and the error variances θ_{j}^{(g)} are not. If this is the case for all the items, metric invariance holds for the whole scale of measurement for the factor.
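The definitions above can be summarized as a small decision rule. The following Python sketch is illustrative only: the function name `invariance_level`, the tolerance and the example values are assumptions, and in practice equality of parameters is assessed by comparing fitted models rather than by comparing estimates directly.

```python
import numpy as np

def invariance_level(intercepts, loadings, error_vars, tol=1e-8):
    """Return the strongest named invariance level that holds for one item.

    Each argument is a sequence of length G giving the item's parameter
    value in each group.
    """
    def same(values):
        values = np.asarray(values, dtype=float)
        return bool(np.all(np.abs(values - values[0]) <= tol))

    nu_eq, lam_eq, theta_eq = same(intercepts), same(loadings), same(error_vars)
    if nu_eq and lam_eq and theta_eq:
        return "full"          # intercepts, loadings and error variances equal
    if nu_eq and lam_eq:
        return "scalar"        # intercepts and loadings equal
    if lam_eq:
        return "metric"        # loadings equal
    return "noninvariant"      # loadings differ across groups

# Hypothetical parameter values for one item in three groups.
print(invariance_level([2.0, 2.0, 2.0], [0.8, 0.8, 0.8], [0.5, 0.6, 0.7]))
```

In this example the intercepts and loadings are equal across the three groups but the error variances are not, so the item is classified as scalar invariant.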
Identification of multigroup models
The main reason for estimating a multigroup factor analysis model is typically that we wish to estimate and compare means, variances or covariances of the factors between the groups. The requirement for the identifiability of the model is then that it should be possible to uniquely identify distinct values for these parameters in the different groups.
If the measurement model has full invariance of measurement, these country-specific distributions of the factors are identified if the measurement model is such that it would be identified also for single-group factor analysis (as discussed in Chapter 2) and if the factor means and variances are fixed in one group as discussed earlier in this chapter.
The remaining question is then whether and when the multigroup model is identified if the measurement model includes some noninvariance of measurement. Here we give some conditions for this. Consider first models which have different types of partial noninvariance for all of the observed items which are used as indicators of a factor:
 Means, variances and covariances of all factors are identified separately in each group if full or scalar invariance holds for all the items.
 Variances and covariances of the factors are identified also if metric invariance holds for all the items.
 Correlations of the factors are identified even under complete noninvariance, i.e. when configural invariance holds but all measurement parameters are different across the groups. Example 2 of Chapter 2 gives an example of estimating such correlations from separate country-specific models.
In the case of models for one factor, we have the following results on partial noninvariance by item:
 Means and variances of the factor are identified if at least two observed items have full measurement invariance.
 Neither means nor variances of the factor are identified if only one item is fully invariant and all other items are fully noninvariant. All such models are equivalent to each other, whichever item is chosen to be the one invariant item, and also equivalent to an infinite number of models where all the items are fully noninvariant ([Asp14]). In other words, all such models fit the data equally well but give different conclusions about the means and variances of the latent factor across the groups.
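The equivalence described in the last point can be illustrated numerically. The Python sketch below starts from a group's parameters in a configural parameterization (factor mean 0, variance 1) and re-expresses them so that one chosen "anchor" item has the reference group's intercept and loading. The function names, the numeric values, and the simplification of anchoring only intercepts and loadings (leaving error variances group-specific) are assumptions made for illustration.

```python
import numpy as np

def reanchor(nu_ref, lam_ref, nu_g, lam_g, anchor):
    """Re-express group parameters so that item `anchor` matches the reference group."""
    phi = (lam_g[anchor] / lam_ref[anchor]) ** 2               # implied factor variance
    lam_new = lam_g / np.sqrt(phi)                             # rescaled loadings
    kappa = (nu_g[anchor] - nu_ref[anchor]) / lam_ref[anchor]  # implied factor mean
    nu_new = nu_g - lam_new * kappa                            # shifted intercepts
    return kappa, phi, nu_new, lam_new

def implied(nu, lam, kappa, phi):
    """Implied item means and common-factor part of the item covariance matrix."""
    return nu + lam * kappa, phi * np.outer(lam, lam)

# Hypothetical parameters: reference group and a second group (configural form).
nu_ref, lam_ref = np.array([2.0, 2.5, 3.0]), np.array([0.8, 0.7, 0.6])
nu_g, lam_g = np.array([2.4, 2.6, 2.7]), np.array([1.0, 0.7, 0.9])

target = implied(nu_g, lam_g, kappa=0.0, phi=1.0)
results = {}
for anchor in range(3):
    kappa, phi, nu_new, lam_new = reanchor(nu_ref, lam_ref, nu_g, lam_g, anchor)
    moments = implied(nu_new, lam_new, kappa, phi)
    same_fit = np.allclose(moments[0], target[0]) and np.allclose(moments[1], target[1])
    results[anchor] = (kappa, phi, same_fit)
    print(f"anchor item {anchor + 1}: mean={kappa:.3f}, variance={phi:.3f}, same fit={same_fit}")
```

Every choice of anchor item reproduces the same implied moments, yet each gives a different factor mean and variance for the second group, which is the sense in which these quantities are not identified by the data alone.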
Effects of noninvariance of measurement on conclusions about latent factors
While it is possible to fit multigroup models with different levels of invariance and noninvariance of measurement, each such model will in general give different results for the distributions of the latent factors which are the focus of interest in the analysis. For example, the following results are worth bearing in mind:
 If the measurement intercept of an item (ν_{j}^{(g)}) is noninvariant across the groups, that item will make little or no contribution to the estimation of the factor means (κ^{(g)}). In other words, conclusions on comparisons of factor means between the groups will in effect be determined by data on the invariant items only.
 If any measurement parameters are noninvariant, the formula for calculating a factor score (predicted value of the factor) will depend on the group. This means that two respondents who have exactly the same values of all the observed items, but who belong to different groups, will be assigned different values of the factor score.
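The second point can be illustrated with a short Python sketch. It assumes the regression (posterior mean) method of computing factor scores for a one-factor model, which is one common choice and not necessarily the one used elsewhere in this chapter; the function name `factor_score` and all parameter values are hypothetical.

```python
import numpy as np

def factor_score(y, nu, lam, theta, kappa, phi):
    """Regression-method factor score E(eta | y) for a one-factor model in one group."""
    mu = nu + lam * kappa                              # implied item means
    Sigma = phi * np.outer(lam, lam) + np.diag(theta)  # implied item covariances
    return kappa + phi * lam @ np.linalg.solve(Sigma, y - mu)

lam = np.array([0.8, 0.7, 0.6])     # loadings, equal in both groups
theta = np.array([0.5, 0.6, 0.7])   # error variances, equal in both groups
nu_1 = np.array([2.0, 2.5, 3.0])    # intercepts in group 1
nu_2 = np.array([2.5, 2.5, 3.0])    # group 2: intercept of item 1 differs

y = np.array([3.0, 3.0, 3.0])       # identical responses from two respondents

score_1 = factor_score(y, nu_1, lam, theta, kappa=0.0, phi=1.0)
score_2 = factor_score(y, nu_2, lam, theta, kappa=0.0, phi=1.0)
print(score_1, score_2)
```

The two respondents give identical answers, but the respondent in group 2 receives a lower score because the same answer to item 1 is less far above that group's higher intercept.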
Assessing levels of noninvariance of measurement
Goodness of fit of models with different levels of noninvariance of measurement can be examined and compared, to assess whether it would be necessary to allow for some noninvariance to achieve a good fit to the data. In such comparisons, the model with full measurement invariance is the most restricted model, and models with different levels of noninvariance are less restricted and thus fit the data at least as well.
Standard likelihood ratio tests can be used for such comparisons, for example to compare the full invariance model to partial noninvariance models, or to compare nested pairs of noninvariance models (e.g. scalar invariance vs. full noninvariance for a given item, or noninvariance for one vs. two items) to each other. These tests are often quite sensitive in practice, so it is common for them to reject the full invariance model. Because of this sensitivity, the tests may be supplemented by other methods of model assessment such as the AIC and BIC statistics. Examples of the use of these statistics and likelihood ratio tests for such comparisons of different measurement models are given in Example 2 of this chapter.
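For reference, the statistics mentioned above can be computed from the maximized log-likelihoods of two nested models as in the following Python sketch. The log-likelihood values, parameter counts and sample size are hypothetical, chosen only to show how the criteria can disagree; they are not taken from the example data.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion (smaller is better)."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def lr_statistic(loglik_restricted, loglik_general, k_restricted, k_general):
    """Likelihood ratio statistic and its degrees of freedom for nested models."""
    stat = 2.0 * (loglik_general - loglik_restricted)
    df = k_general - k_restricted
    return stat, df

# Hypothetical fits: a full invariance model nested in a less restricted model.
ll_full, k_full = -1000.0, 13
ll_free, k_free = -990.0, 19
n_obs = 500

stat, df = lr_statistic(ll_full, ll_free, k_full, k_free)
print("LR statistic:", stat, "df:", df)
print("AIC:", aic(ll_full, k_full), aic(ll_free, k_free))
print("BIC:", bic(ll_full, k_full, n_obs), bic(ll_free, k_free, n_obs))
```

The likelihood ratio statistic is referred to a chi-squared distribution with `df` degrees of freedom to obtain a p-value. In this made-up case the AIC prefers the less restricted model while the BIC, which penalizes additional parameters more heavily, prefers the full invariance model.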
Invariance vs. noninvariance models in practice: Sensitivity of main conclusions
Whatever methods we use for model assessment, in many applications it is a common conclusion that there is evidence of at least some noninvariance of measurement. This is certainly the case for large cross-national surveys of general populations, where it appears to be very rare that full invariance is formally judged to hold for any measurement scales with multiple items. This then raises the difficult question of what would be the best way to analyze latent constructs in such situations.
When the main purpose of a multigroup analysis is to obtain cross-group (e.g. cross-national) comparisons of the latent factors, the most relevant criterion for assessing the effect of noninvariance of measurement is how different specifications of the measurement models affect the main conclusions about distributions of the factors. If these conclusions are relatively insensitive in this respect, the choice of the measurement models does not matter much and, in particular, we can with confidence use the simplest choice, the model with full invariance of measurement. Such sensitivity analysis can be done by fitting models with different levels of noninvariance in turn, and comparing the parameter estimates for the distributions of the factors. This is illustrated in Example 2 of this chapter. An even simpler approach has been proposed by [Obe14]. His EPC-interest statistic (expected parameter change in parameters of interest) requires only that the full invariance model is fitted, and gives a good approximation of how much estimates of parameters of interest such as factor means would change if different measurement parameters were freed to be noninvariant across groups.
Ultimately the outcome of such sensitivity analyses may be that the choice matters, i.e. that conclusions about cross-national comparisons do depend on how much noninvariance of measurement we allow in the model specification. In this situation it might seem natural to use the results obtained from the best-fitting noninvariance model. However, this approach also has its problems, even apart from the fact that there may be no partial noninvariance model which both fits well and is identified. Any noninvariance model presents additional complications of interpretation, for example that (as noted previously) comparative conclusions about the means of latent factors are then really based only on those observed items which are specified as invariant. The opposite alternative approach is to base the conclusions on the full invariance model even when it does not fit well according to formal model selection criteria. This is easy to do and ensures that each item is treated in the same way, for all countries; however, it also ignores the observed evidence of noninvariance and thus in effect defines the latent constructs to be measured on a common scale across the countries. In short, all possible choices on how to treat items which are thought to be cross-nationally noninvariant have their disadvantages as well as different advantages. [Kuh15] present some further discussion of these conceptually difficult questions.
Example 1 on Multigroup factor analysis: A model under measurement invariance
Consider the data in our example for the questions D18-D23, using a 2-factor confirmatory factor analysis model where questions D18-D20 measure only the factor "obligation to obey the police" and questions D21-D23 only the factor "moral alignment with the police". Fit this as a multigroup model to data from all 27 countries, specifying the measurement model to have full invariance of measurement across the countries. Use the results of the model to compare the estimated means, variances and correlations of the factors between the countries.
Estimated means, variances and correlation of the two factors from the multigroup model are shown in Table 3.1 for each of the countries, and also in graphical form in Figures 3.1-3.4. Note that here the factor means are fixed at 0 and factor variances at 1 for the first country, Belgium. From these results, we may for example observe the following:
 There is a large amount of variation in the estimated means of the factors (see Figures 3.1 and 3.2). For both, the difference between the highest and the lowest means is around 2 units, in other words around two individual-level standard deviations of the factors. This means, for example, that almost all individuals in the countries where the average values of these factors are the highest have higher values than the average individual in the countries where the values are the lowest. The standard errors in the estimated means are fairly small, so that many of the differences between country means appear to be statistically significant.
 There are fairly clear geographic regularities in the levels of the means, although with exceptions. Levels of felt obligation to obey the police and moral alignment with the police tend to be higher in Northern and Western European countries than in Eastern and Southern Europe.
 The values of the two factors are positively correlated, both among individuals within the countries (Table 3.1) and between the country averages of the factors (Figure 3.3).
 The means and variances of the factors are negatively correlated (Figure 3.4). In other words, in countries where the average levels of felt obligation to obey the police or moral alignment with the police are highest, variation between individuals in these factors tends to be lowest.
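Table 3.1: Estimated means, variances and correlation of the two factors by country (Obey = obligation to obey the police; MoralAl = moral alignment with the police)
Country  Mean (Obey)  Mean (MoralAl)  Variance (Obey)  Variance (MoralAl)  Correlation 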
Belgium (BE)  0  0  1  1  0.35 
Bulgaria (BG)  0.53  0.37  2.56  1.81  0.46 
Switzerland (CH)  0.36  0.26  1.25  0.83  0.12 
Cyprus (CY)  0.45  0.22  1.33  1.55  0.45 
Czech Republic (CZ)  0.23  0.50  1.54  1.47  0.31 
Germany (DE)  0.31  0.36  1.17  0.84  0.35 
Denmark (DK)  0.87  0.41  0.82  0.75  0.41 
Estonia (EE)  0.23  0.12  1.74  0.82  0.24 
Spain (ES)  0.10  0.10  0.94  1.06  0.39 
Finland (FI)  0.75  0.57  0.60  0.62  0.50 
France (FR)  0.15  0.30  1.12  1.50  0.36 
United Kingdom (GB)  0.01  0.07  1.10  1.03  0.44 
Greece (GR)  0.23  0.62  1.42  1.72  0.55 
Croatia (HR)  0.36  0.33  1.99  1.23  0.42 
Hungary (HU)  0.38  0.33  1.53  1.33  0.37 
Ireland (IE)  0.18  0.17  1.30  1.44  0.50 
Israel (IL)  0.60  0.72  1.57  1.86  0.29 
Lithuania (LT)  0.03  0.21  1.64  0.96  0.40 
Netherlands (NL)  0.26  0.10  0.87  0.78  0.35 
Norway (NO)  0.39  0.41  0.94  0.67  0.52 
Poland (PL)  0.04  0.11  1.37  0.87  0.27 
Portugal (PT)  0.08  0.11  1.26  0.94  0.43 
Russia (RU)  0.89  0.98  1.77  1.71  0.47 
Sweden (SE)  0.54  0.30  0.98  0.57  0.44 
Slovenia (SI)  0.71  0.30  2.11  1.10  0.34 
Slovakia (SK)  0.10  0.11  1.76  1.14  0.23 
Ukraine (UA)  0.67  1.17  2.05  2.05  0.31 
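The remark above that a gap of about two standard deviations implies "almost all" individuals in the highest-mean countries lie above the average of the lowest-mean countries can be checked with a short calculation. The Python sketch below assumes a normally distributed factor with standard deviation 1 in the high-mean country, which is a simplification since the estimated variances in Table 3.1 differ between countries; the function name `share_above` is hypothetical.

```python
from statistics import NormalDist

def share_above(mean_high, sd_high, threshold):
    """Proportion of a normal population with the given mean and sd above a threshold."""
    return 1.0 - NormalDist(mu=mean_high, sigma=sd_high).cdf(threshold)

# Country with mean 2 units above a country whose mean is placed at 0.
share = share_above(mean_high=2.0, sd_high=1.0, threshold=0.0)
print(f"{share:.3f}")
```

Under these assumptions roughly 98% of individuals in the higher country have a factor value above the lower country's average.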
Example 2 on Multigroup factor analysis: Assessing noninvariance of measurement
Consider the data on the questions D18-D20 which are treated as measures of the factor "obligation to obey the police", for data from Denmark, Norway and Sweden. Fit multigroup models with one factor, and compare models with different specifications of measurement invariance and noninvariance in the items. How well do these different models fit the data, and how do they affect conclusions about cross-national comparisons of the mean of the factor?
Here we consider only three countries and one factor, to keep the command file in Stata relatively short (the commands for all 27 countries and more factors and items would be an obvious extension of these).
Results for the fitted models are summarized in Table 3.2. Here we consider for illustration seven different specifications for the measurement models across the three countries: the full invariance model; scalar invariance for all three items; scalar invariance, metric invariance and complete noninvariance for item D18; and complete noninvariance for item D19 and for item D20. Note that models with metric invariance in all the items or models with noninvariance in two or more items are not included, because such models would not allow the identification of distinct estimates of factor means for the different countries.
Considering first the goodness of fit of the models, likelihood ratio tests indicate that allowing for noninvariance in any one item would improve the fit compared to the full invariance model. The tests of partial invariance models shown for item D18 suggest that for this item at least the measurement intercepts in particular are significantly different between the countries. The AIC and BIC statistics indicate that the model preferred by each of them includes noninvariance of measurement. So in these data, even with just three items and three culturally and linguistically fairly similar countries, a multigroup analysis suggests significant deviations from exact invariance of measurement.
What matters most for substantive interpretation, however, is whether comparative conclusions about the constructs being measured are affected by different choices for the measurement models. Here they are not. Table 3.2 also shows that all the models considered here yield very similar estimates for the estimated means of the factor. According to all of them, the average level of felt obligation to obey the police is around 0.52 in Norway and around 0.35 in Sweden, on a scale where the mean in Denmark is fixed at 0, and the standard deviation in Denmark fixed at 1. The estimated standard errors of these estimates are around 0.04, so all the differences between the country means are statistically significant. Since all the models give similar results about the factors, here we could without difficulty focus on the simplest model which assumes invariance of measurement.
Table 3.2: Model comparison statistics and estimated mean (and standard error) of the factor "obligation to obey the police" under different measurement models. The factor mean in Denmark is constrained to 0.
Measurement model  LR test against full invariance model: p-value  AIC  BIC  Mean, Denmark (constrained)  Mean, Norway  Mean, Sweden 
Full invariance  n/a  57417  57501  0  0.52 (0.04)  0.36 (0.04) 
Scalar invariance (error variances free)  <0.001  57394  57516  0  0.51 (0.04)  0.36 (0.04) 
Scalar invariance for item D18  0.10  57417  57513  0  0.52 (0.04)  0.36 (0.04) 
Metric invariance for item D18 (error variance and intercept free)  <0.001  57384  57493  0  0.50 (0.04)  0.37 (0.04) 
Complete noninvariance for item D18  <0.001  57381  57503  0  0.50 (0.04)  0.37 (0.04) 
Complete noninvariance for item D19  <0.001  57397  57519  0  0.57 (0.05)  0.34 (0.04) 
Complete noninvariance for item D20  <0.001  57396  57518  0  0.52 (0.04)  0.37 (0.04) 
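The comparisons discussed in this example can be reproduced from the tabulated values with a few lines of Python. This is only a convenience sketch: the dictionary transcribes the AIC, BIC and estimated means as printed in Table 3.2, and the variable names are arbitrary.

```python
# (AIC, BIC, estimated mean for Norway, estimated mean for Sweden), as printed in Table 3.2
models = {
    "Full invariance": (57417, 57501, 0.52, 0.36),
    "Scalar invariance (error variances free)": (57394, 57516, 0.51, 0.36),
    "Scalar invariance for item D18": (57417, 57513, 0.52, 0.36),
    "Metric invariance for item D18": (57384, 57493, 0.50, 0.37),
    "Complete noninvariance for item D18": (57381, 57503, 0.50, 0.37),
    "Complete noninvariance for item D19": (57397, 57519, 0.57, 0.34),
    "Complete noninvariance for item D20": (57396, 57518, 0.52, 0.37),
}

# Model preferred by each criterion (smallest value).
best_aic = min(models, key=lambda name: models[name][0])
best_bic = min(models, key=lambda name: models[name][1])

# Spread of the estimated factor means across the seven specifications.
norway = [values[2] for values in models.values()]
sweden = [values[3] for values in models.values()]
norway_range = round(max(norway) - min(norway), 2)
sweden_range = round(max(sweden) - min(sweden), 2)

print("Preferred by AIC:", best_aic)
print("Preferred by BIC:", best_bic)
print("Range of estimates: Norway", norway_range, "Sweden", sweden_range)
```

Both criteria select a model with some noninvariance in item D18, while the estimated means vary across specifications by less than two of their standard errors, in line with the conclusions drawn in the text.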
 [Asp14] Asparouhov, T. and Muthén, B. (2014). Multiple-group factor analysis alignment. Structural Equation Modeling, 21, 495–508.
 [Kuh15] Kuha, J. and Moustaki, I. (2015). Nonequivalence of measurement in latent variable modeling of multigroup data: A sensitivity analysis. Psychological Methods, 20, 523–536.
 [Obe14] Oberski, D. L. (2014). Evaluating sensitivity of parameters of interest to measurement invariance in latent variable models. Political Analysis, 22, 45–60.