
# Chapter 5: Latent variable models with categorical indicators

So far in this module we have described only linear factor analysis and structural equation models. In these models, all observed and latent variables are taken to be continuous and normally distributed variables, and the structural and measurement models for them are specified as linear regression models (the one exception is observed variables used only as explanatory variables, which are not modelled and can be of any type).

These assumptions are not the only ones possible, and latent variable models can also be defined with different assumptions about some of the variables. In particular, in many applications it would be useful to consider models where some of the latent variables and/or their observed indicators are not continuous but categorical variables which can only take on two or more discrete possible values (categories). In this chapter we briefly discuss such models. We focus on the case where the latent variables are still continuous but the indicators are categorical. Such models may be called latent trait models or simply “factor analysis models for categorical items”. They are also known as *Item Response Theory* (IRT) models; this is the most common term in applications in educational or psychological testing, where these models are very widely used.

The discussion in this chapter is brief, and meant to give only a general introductory idea rather than a full description of latent trait models. Much more information on them can be found in the books listed below. Also described in these books are still further types of latent variable models which are not discussed in this module. One very important class of such models is that of latent class models where both latent variables and their indicators are categorical.

### References on general types of latent variable models

- Bartholomew, D. J., Knott, M. and Moustaki, I. (2011). Latent Variable Models and Factor Analysis: a unified approach (Third edition). Wiley.
- Bartholomew, D. J., Steele, F., Moustaki, I. and Galbraith, J. G. (2008). Analysis of multivariate social science data (Second edition). Chapman & Hall/CRC.
- de Ayala, R. J. (2009). The Theory and Practice of Item Response Theory. Guilford Press.
- Hagenaars, J. A. and McCutcheon, A. L. (Eds.). (2002). Applied Latent Class Analysis. Cambridge University Press.
- Skrondal, A. and Rabe-Hesketh, S. (2004) Generalized latent variable modeling: multilevel, longitudinal, and structural equation models. Chapman & Hall/CRC.


### One-factor model for binary items: Definition

To introduce the basic elements of latent trait (IRT) models in their simplest case, we consider first a model where *p* observed items *y*_{1}, ..., *y*_{p} are used as indicators of a single continuous latent factor *η*, and where each of the items is binary, meaning that it has only two possible values. We suppose that the two values of each item are coded as 0 and 1, for example 0 for “Disagree” and 1 for “Agree” for a survey question which has only these response options.

Just like in factor analysis, it is assumed that the factor *η* is normally distributed with mean *κ* and variance *φ*. To identify the latent scale, we again need to fix either the parameters of this distribution or one intercept and one loading in the measurement models at specific values. In the examples of this section we do the former, and fix *κ* = 0 and *φ* = 1. For the whole one-factor model for binary items to be identified, we must have at least *p* = 3 items.

While the specification of the latent factor *η* is the same as in factor analysis, the definition of the measurement model must change when the items are binary rather than continuous. We now assume that, conditional on *η*, each item *y*_{j} (*j* = 1, ..., *p*) follows a binomial (Bernoulli) distribution with probability parameter *π*_{j}(*η*) = P(*y*_{j} = 1|*η*) [and thus with 1 - *π*_{j}(*η*) = P(*y*_{j} = 0|*η*)]. The model for this probability given the factor should be a regression model which is appropriate for binary response variables. Here we use the standard binary logistic model

logit[*π*_{j}(*η*)] = log{*π*_{j}(*η*)/[1 − *π*_{j}(*η*)]} = *ν*_{j} + *λ*_{j}*η*

for each *j* = 1, ..., *p*. This can also be expressed as a model for the probability *π*_{j}(*η*) directly, as

*π*_{j}(*η*) = exp(*ν*_{j} + *λ*_{j}*η*)/[1 + exp(*ν*_{j} + *λ*_{j}*η*)].

In the same way as in factor analysis, it is most often assumed that the items are conditionally independent given the factor. When this is the case, these measurement models for the individual items, taken together, define the whole measurement model for how the items measure the factor.

A plot of the probabilities *π*_{j}(*η*) as a function of different values of the factor *η* is known as the *item response curve* of an item. Its shape depends on the parameters *ν*_{j} and *λ*_{j}. The intercept parameter *ν*_{j} is also known (in common IRT terminology) as the *difficulty parameter*. For any fixed value of *λ*_{j}*η*, higher values of *ν*_{j} give higher values of the item response probability *π*_{j}(*η*). In particular, when *η* = 0 (its average value, if we have fixed *κ* = 0), the probability depends only on the difficulty parameter, as *π*_{j}(0) = exp(*ν*_{j})/[1 + exp(*ν*_{j})].

The loading parameter *λ*_{j} describes the direction and strength of the association between the factor *η* and the item *y*_{j}. When *λ*_{j} > 0, higher values of the factor are associated with higher probabilities that *y*_{j} = 1, and when *λ*_{j} < 0, higher values of the factor are associated with lower probabilities that *y*_{j} = 1. This association is the stronger the higher is the absolute value of *λ*_{j} (i.e. the further *λ*_{j} is from 0). When the association is strong, even small differences in *η* translate into relatively large differences in the probabilities *π*_{j}(*η*), so the item “discriminates well” between individuals with different levels of *η*. For this reason, *λ*_{j} is also known as the *discrimination parameter* of the binary logistic measurement model.
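The item response probability under this logistic measurement model is straightforward to compute. A minimal sketch in Python (purely illustrative; the parameter values here are made up, not estimates from the module's data):

```python
import numpy as np

def item_response_prob(eta, nu, lam):
    """P(y_j = 1 | eta) under the binary logistic measurement model:
    expit(nu + lam * eta)."""
    return 1.0 / (1.0 + np.exp(-(nu + lam * eta)))

# A grid of factor values and one illustrative parameter setting
eta = np.linspace(-3.0, 3.0, 7)
print(item_response_prob(eta, nu=1.0, lam=2.0))  # rises towards 1 as eta grows

# When nu = 0 and eta = 0, the probability is exactly 0.5
print(item_response_prob(0.0, nu=0.0, lam=2.0))
```

With *λ*_{j} > 0 the curve is increasing in *η*, and at *η* = 0 it depends only on the difficulty parameter, as described above.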

Figure 5.1 displays the probability *π*_{j}(*η*) for different values of the factor and for different values of the difficulty parameter *ν*_{j} and the discrimination parameter *λ*_{j}.

*Figure 5.1: The probability of a binary item (item response curve of an item) for different values of ν and λ.*


### One-factor model for binary items: Estimation

The latent trait model implies also a joint distribution for the *p* observed items. When all the items are categorical variables, this distribution is defined by the probabilities of the cells in the *p*-variate contingency table of the items. When the items *y*_{1}, ..., *y*_{p} are all binary with values 0 and 1, this is a 2 × 2 × ... × 2 contingency table with a total of 2^{p} cells, where each cell corresponds to one combination of values (*k*_{1}, ..., *k*_{p}) for the items, with each *k*_{j} being either 0 or 1.

A one-factor latent trait model implies that the probabilities in this joint distribution are given by

P(*y*_{1} = *k*_{1}, ..., *y*_{p} = *k*_{p}) = ∫ ∏_{j=1}^{p} [*π*_{j}(*η*)]^{*k*_{j}} [1 − *π*_{j}(*η*)]^{1−*k*_{j}} *p*(*η*) d*η*
where the integral is over all the possible values of *η*, and *p*(*η*) denotes the distribution (more precisely, the probability density function) of *η* (which under the identification assumption stated in the previous section is a normal distribution with mean 0 and variance 1).

Maximum likelihood (ML) estimates of the parameters of the model are those values of the parameters which imply the closest match (in terms of the likelihood function for the data) between these model-implied probabilities and the sample probabilities in the observed contingency table of the items. These estimates can be found using an iterative computational algorithm, as was also the case for factor analysis and structural equation modelling. Unlike for those previous models, however, estimation of a latent trait model also requires that integrals over the distributions of the factors (of the kind shown above) be evaluated at each iteration using computer-intensive numerical methods of integration. Because of this numerical integration, latent trait models are much harder and slower to estimate than factor analysis and structural equation models. In practice this means that latent trait models with more than one or two latent factors may often be computationally too demanding to be easily and routinely used.
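The kind of calculation repeated at every iteration can be sketched in Python as follows. This is purely illustrative: it approximates the integral over the standard normal factor with a simple grid sum (real software uses more efficient quadrature methods), and the parameter values are made up:

```python
import numpy as np
from itertools import product

def cell_probability(pattern, nus, lams, grid_size=4001):
    """Approximate the model-implied probability of one response pattern
    (k_1, ..., k_p) by integrating over a N(0, 1) latent factor."""
    eta = np.linspace(-8.0, 8.0, grid_size)
    step = eta[1] - eta[0]
    density = np.exp(-0.5 * eta ** 2) / np.sqrt(2.0 * np.pi)  # N(0,1) density
    lik = np.ones_like(eta)
    for k, nu, lam in zip(pattern, nus, lams):
        pi = 1.0 / (1.0 + np.exp(-(nu + lam * eta)))
        lik = lik * (pi if k == 1 else 1.0 - pi)  # conditional independence
    return float(np.sum(lik * density) * step)

# Illustrative parameters for p = 3 binary items
nus, lams = [0.5, -0.2, 1.0], [1.5, 2.0, 1.0]
probs = [cell_probability(p, nus, lams) for p in product([0, 1], repeat=3)]
print(sum(probs))  # close to 1 over all 2^3 response patterns
```

The ML algorithm adjusts the parameters until these model-implied cell probabilities match the observed contingency table as closely as the likelihood allows.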


### One-factor model for binary items: Factor scores

The general idea of calculating factor scores, i.e. predicted values of a latent factor given the observed items, is the same for latent trait models as it was for factor analysis. Most commonly, the factor score for an individual respondent is calculated as the expected value of the factor given the observed values of the items. These scores are calculated by any computer software (such as Stata) which can be used to fit latent trait models.

To get a sense of what the factor scores tell us, it is useful to consider a simpler version of them, the “component score” *y*_{1}λ̂_{1} + *y*_{2}λ̂_{2} + ... + *y*_{p}λ̂_{p}. Since the values of the binary items *y*_{j} are coded as 0 or 1, this is simply the sum of the estimated loading parameters λ̂_{j} for those items for which a respondent’s value is 1 (e.g. all the items that the respondent agrees with, if the coding is 1 for “Agree” and 0 for “Disagree”). The largest possible value of this score is then obtained by a respondent who has the value 1 for all the items which have positive estimated loadings and 0 for all the items which have negative loadings. This score is not exactly equal to the expected value that standard software uses as the factor score, but it carries essentially the same information; in particular, the component scores and factor scores give exactly the same ranking of individuals in terms of the values of the latent factor. This result, and component scores and factor scores more generally, are discussed by [Bar08] and [Bar11].
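The component score is trivial to compute. A minimal sketch in Python (the loadings and responses here are illustrative, not estimates from the module's data):

```python
import numpy as np

# Estimated loadings for three items (illustrative values)
loadings = np.array([3.24, 3.52, 2.18])
answers = np.array([1, 0, 1])  # this respondent answered 1 on items 1 and 3

# Component score: the sum of the loadings of the items answered 1
component_score = float(answers @ loadings)
print(component_score)  # 3.24 + 2.18, i.e. about 5.42
```

With all loadings positive, the maximum score (here 3.24 + 3.52 + 2.18) is reached by answering 1 on every item.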

Factor scores derived from a latent trait model may be used as observed variables in other analyses. This approach is discussed further later in this chapter, and an illustration of it is given in Example 2.


### One-factor model for binary items: Assessment of model fit

Methods that can be used to assess the goodness of fit of a latent variable model have been discussed for factor analysis models in Chapter 2. Most of the general ideas introduced there, and some but not all of the methods, apply also to model assessment for latent trait models:

- An overall goodness of fit test can again be defined as a likelihood ratio test between the fitted and saturated models. Both of these are now models for the cell probabilities of the contingency table of the categorical items: the probabilities from the saturated model are estimated simply by the sample proportions of the cells, and the probabilities from the fitted model are implied by its estimated parameters. As in factor analysis, this test has high power when the sample size is even moderately large, so the test often rejects most models.
- Likelihood ratio tests can be used also to compare nested pairs of non-saturated models, in the same way and for similar purposes as in factor analysis.
- AIC and BIC “information criterion” statistics can be used to compare models (even non-nested ones) in the same way as in factor analysis.
- On the other hand, most of the model fit indices that have been developed specifically for factor analysis and linear structural equation models cannot be used for latent trait models. These include statistics such as RMSEA and CFI.
- A different class of methods which can be used to examine the goodness of fit of latent trait models is based on marginal residuals for the fitted model. These are calculated for lower-order marginal tables derived from the full contingency table of the *p* observed items. In particular, we may examine in turn all the two-way tables for each pair of two items, each aggregated over the other items. For each such table, we can calculate its cell probabilities in the table derived from the observed sample, and the probabilities implied by the fitted model. The (appropriately standardized) differences between these observed and fitted probabilities are the marginal residuals, which can be used to assess how well the model fits for each pair of items. Methods of using such residuals are discussed in more detail by [Bar08] and [Bar11]. They are, however, not yet routinely implemented by all software for fitting latent trait models.
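The observed side of these marginal residuals, i.e. the two-way sample proportions aggregated over the other items, can be computed as follows (a Python sketch with a toy 0/1 data matrix; the model-implied side would come from the fitted cell probabilities):

```python
import numpy as np

def twoway_margin(data, j, k):
    """Observed proportions in the 2x2 marginal table of items j and k,
    aggregated over all other items; data is an (n, p) 0/1 matrix."""
    table = np.zeros((2, 2))
    for a in (0, 1):
        for b in (0, 1):
            table[a, b] = np.mean((data[:, j] == a) & (data[:, k] == b))
    return table

# Toy data: four respondents, three binary items
data = np.array([[1, 1, 0],
                 [1, 0, 0],
                 [0, 1, 1],
                 [1, 1, 1]])
print(twoway_margin(data, 0, 1))  # the four proportions sum to 1
```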

If we consider a model with one latent factor and conclude that it does not fit well, this suggests that either some of the items could be omitted to obtain a model which fits better for the remainder of the items, or that a model with more than one factor should be considered. Models with multiple factors are discussed briefly in the next section, together with other extensions.


### More general models with continuous factors and categorical items

We have focused on the model with binary items and a single latent factor, in order to introduce the basic ideas of latent trait models in the simplest possible case. These models can be generalized in various ways, which we note very briefly here:

- Instead of being binary, categorical items may also have three or more categories, and these categories may be treated as being ordered (ordinal items) or unordered (nominal items). The measurement model of an item then needs to be specified in such a way that it is appropriate for a variable with multiple categories. The *multinomial logistic model* is used as the measurement model for nominal items, and the *ordinal logistic model* (proportional odds model) is commonly used for ordinal items.
- Instead of being all of one kind, different indicators for the same factor may be a mixture of binary, ordinal and nominal (and indeed even continuous) items. The measurement model of each individual item is then specified in whatever way is appropriate for that item. This situation of course assumes that we are dealing with items for which it still makes substantive sense to treat them as measures of the same latent variable.
- Instead of a single latent factor, a latent trait model can have two or more factors. The joint distribution of the factors is then assumed to be a multivariate normal distribution. This distribution and the assumptions required to identify the latent scales are specified in the same way as in factor analysis. The measurement model for multiple factors can again be an “exploratory” model where all items measure all factors, or a “confirmatory” model which imposes further constraints on the factor loadings. In an exploratory model, “rotation” of the factors again needs to be fixed, and this can be done in the same ways as in factor analysis.
- Instead of focusing only on measurement models, we can also define models which include structural models for associations and regressions among latent factors and observed explanatory and/or response variables. This is done in the same ways as in linear structural equation models (SEMs). In practice, however, the computational complexity of estimating latent trait measurement models for categorical items often imposes limits on the complexity of structural models that it is practicable to estimate. When estimation of combined measurement and structural models all in one step is infeasible, a practicable approach is often a “three-step” analysis where we (1) estimate the measurement model separately for each distinct factor, (2) use these models to calculate a factor score for each factor for each respondent, and (3) fit the structural model with the factor scores treated as observed values of each factor. This approach has advantages and disadvantages of its own, as discussed in the section on factor scores for factor analysis earlier in this module (Chapter 2) and for models for categorical items by, for example, [Bak13] (this article considers methods of three-step modelling for latent class models, but many of the comments there are also relevant for other latent variable models). An example of this approach is given in Example 2 later in this chapter.
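Step (3) of such a three-step analysis is an ordinary regression of the factor scores on the explanatory variables. A minimal sketch in Python (the scores and group labels are made up; group 0 is the reference group):

```python
import numpy as np

# Hypothetical factor scores and group memberships for six respondents
scores = np.array([0.2, 0.4, -0.1, -0.3, 0.9, 1.1])
group = np.array([0, 0, 1, 1, 2, 2])

# Design matrix: intercept plus dummies for the non-reference groups
X = np.column_stack([np.ones(len(scores)),
                     (group == 1).astype(float),
                     (group == 2).astype(float)])

# Ordinary least squares fit of the structural model
beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
print(beta)  # [reference-group mean, group-1 contrast, group-2 contrast]
```

The coefficients of the dummies estimate the differences in the factor mean between each group and the reference group, which is exactly the comparison made in Example 2 below.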


### Example 1 on Latent trait models for binary items: A measurement model

In this and the next example we continue to use the same data as in the rest of this module. Because these data do not have any items which are originally binary, we create such items by dichotomizing three of the existing items. These are items D15, D16 and D17, the three indicators of trust in the procedural fairness of the police. The items are dichotomized so that their original levels 1 and 2 (“Not at all often” and “Not very often”) are combined as the new level 0 (“Not often”) and levels 3 and 4 (“Often” and “Very often”) as the new level 1 (“Often”). The binary items derived from D15, D16 and D17 are labelled *respect*, *fair* and *explain* respectively. The commands included below show how they can be created in Stata.
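Whatever the software, the recoding logic is the same. A sketch in Python (the variable name d15 and the data values here are hypothetical):

```python
import numpy as np

# Hypothetical responses on the original 1-4 scale
d15 = np.array([1, 2, 3, 4, 2, 4])

# Levels 1-2 become 0 ("Not often"); levels 3-4 become 1 ("Often")
respect = (d15 >= 3).astype(int)
print(respect)  # [0 0 1 1 0 1]
```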

Fit a latent trait model for the three binary items *respect*, *fair* and *explain* given one latent factor, and using the data for all the countries together. Interpret the parameters of the estimated measurement model, and hence interpret the factor implied by this measurement model.

*Note*: Only Stata commands are included for this and the next example. In Stata, latent trait models can be fitted with the command *gsem*, which was first included in Stata Version 13. The R package *lavaan* does not currently include functions for fitting these models. There are other add-on packages in R for fitting latent trait models, but they are not used here.

The estimated parameters of the measurement models are shown in Table 5.1, and the item response curves implied by them are plotted in Figure 5.2. The curves show the probabilities of the response coded as 1 (i.e. “Often”) for the items, given different values of the latent factor. All the factor loadings (discrimination parameters) are positive. This means that higher values of the factor correspond to higher probabilities of the response “Often”, and thus that higher values of the factor indicate higher levels of trust in the procedural fairness of the police.

Comparing the measurement parameters between the different items, we can see that the item *explain* has the lowest value of the intercept (difficulty) parameter. This means that given the value 0 of the factor (i.e. its mean value), this item has the lowest probability of the “Often” response (around 0.6, compared to nearly 0.9 for the other two items). The item *explain* also has the lowest discrimination parameter, so its item response probabilities are a little less strongly associated with the factor than are the probabilities of the other two items (this can be seen in Figure 5.1, where the item response curve of this item is the least steep of the three). The measurement models of the items *respect* and *fair*, on the other hand, are very similar to each other.

*Table 5.1: Estimated parameters (and their standard errors) of the measurement model for binary items “respect”, “fair” and “explain”, given latent factor “Procedural fairness of the police”. The model is fitted to the pooled sample of all respondents in the ESS (n=50501).*

| Item | Intercept ν̂_{j} (“difficulty parameter”) | Loading λ̂_{j} (“discrimination parameter”) |
|---|---|---|
| respect | 1.99 (0.03) | 3.24 (0.05) |
| fair | 1.83 (0.03) | 3.52 (0.06) |
| explain | 0.44 (0.02) | 2.18 (0.03) |
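As a check, the probabilities quoted in the text can be recovered from the intercepts in Table 5.1 via π̂_{j}(0) = exp(ν̂_{j})/[1 + exp(ν̂_{j})]; a small Python sketch:

```python
import math

def expit(x):
    """Inverse logit: exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Probability of "Often" at the factor's mean (eta = 0), for each item
print(round(expit(0.44), 2))  # explain: about 0.61
print(round(expit(1.99), 2))  # respect: about 0.88
print(round(expit(1.83), 2))  # fair: about 0.86
```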

*Figure 5.2: Item response curves for the binary items “respect”, “fair” and “explain”, given latent factor “Procedural fairness of the police”, from the estimated measurement models in Table 5.1.*


### Example 2 on Latent trait models for binary items: Multigroup analysis for cross-national comparison of the factor means

*Note: The discussion of this example focuses on somewhat more advanced topics than most of the rest of this module.*

Using the same three binary indicators of trust in the procedural fairness of the police, estimate and compare the averages of this factor between the countries in the ESS.

This analysis expands that of Example 1 by adding a structural model where the country of the respondent is used as an explanatory variable for the factor. This is thus a multigroup analysis with country as the group and assuming cross-national equivalence of measurement for all of the observed items. Unlike for the *sem* command for factor analysis and structural equation modelling, the *gsem* command in Stata does not have separate command syntax for multigroup analysis. Instead, the group (here country) is simply specified as a categorical explanatory variable for the factor, and entered in the form of dummy variables for the groups (omitting the dummy variable for one reference group, which is here taken to be Belgium).

The Stata commands included above show two ways of carrying out this analysis:

- Method 1: A one-step analysis where the measurement model, and the structural model for the latent factor conditional on the country, are estimated together. This approach is comparable to fitting a multigroup structural equation model for continuous items and with country as the group.
- Method 2: A three-step analysis where we (i) fit the measurement model for all the data, ignoring country; (ii) assign factor scores for each respondent based on the model from (i), and (iii) fit a linear regression model for these factor scores given country. Steps (i) and (ii) were already done in Example 1 above.

These approaches have slightly different characteristics:

- Method 2 is computationally easier. However, it has the theoretical disadvantage that, since the value of a factor score is not exactly equal to the value of the unobserved latent factor for each respondent, using the factor score in the role of the factor can induce a measurement error bias in the estimated structural model (here, where the factor score is used as a response variable in a linear model, the bias arises because the factor score is not an *unbiased* prediction of the factor).
- Method 1 is computationally more demanding. It avoids the measurement error problem of Method 2 because estimating the measurement model together with the structural model correctly allows for measurement error in the individual items as measures of the factor. However, this approach has the arguable disadvantage that the estimated measurement model itself (and thus the implied definition of the factor) is affected by the inclusion of the explanatory variable for the factor. This can be seen by observing that the estimated parameters of the measurement model are here somewhat different from what they were in Example 1. The estimated measurement model would change again every time the structural model was changed, for example if we added respondent’s age and sex as explanatory variables.

Two further characteristics of Method 2 are worth noting:

- Unlike Method 1, Method 2 does not require fixing the intercept and residual variance of the structural model to identify the scale of the factor, so these parameters are also estimated.
- Method 2 (as it is used here) takes the estimated measurement model from the first step as known. This means that the standard errors of the parameters of the structural model will be underestimated to some extent.

The differences between these different ways of fitting the model matter ultimately only if they lead to meaningful differences in the main conclusions that we aim to draw from the analysis. In this example these questions of interest are the comparisons of average levels of the factor (trust in the procedural fairness of the police) between the countries. Figure 5.3 shows these country averages estimated using Methods 1 and 2. It is clear that both sets of estimates are very similar, in that both give essentially the same relative differences and rankings of the countries. Many of the differences between the countries are clearly statistically significant (the standard errors of the means from Method 1 are around 0.05). The ordering of the countries shows fairly consistent geographic regularities, with levels of trust in procedural fairness of the police mostly highest in the North and West of Europe and lower in the South and East.

*Figure 5.3: Averages of the factor “Procedural fairness of the police” in the countries in the ESS, as estimated in Example 2. The plot shows two estimates of these averages, from a joint model (“Method 1” as discussed in the text, on the horizontal axis), and from a linear model for factor scores derived from the measurement model fitted in Example 1 (“Method 2”, on the vertical axis). The main conclusion from this comparison is that both sets of estimates give very similar results about comparisons between the countries.*

- [Bak13] Bakk, Z., Tekle, F. B. and Vermunt, J. K. (2013). Estimating the association between latent class membership and external variables using bias-adjusted three-step approaches. Sociological Methodology, 43, 272-311.
- [Bar08] Bartholomew, D. J., Steele, F., Moustaki, I. and Galbraith, J. G. (2008). Analysis of multivariate social science data (Second edition). Chapman & Hall/CRC.
- [Bar11] Bartholomew, D. J., Knott, M. and Moustaki, I. (2011). Latent Variable Models and Factor Analysis: a unified approach (Third edition). Wiley.