Chapter 5: Latent variable models with categorical indicators

So far in this module we have described only linear factor analysis and structural equation models. In these models, all observed and latent variables are taken to be continuous and normally distributed, and the structural and measurement models for them are specified as linear regression models (the one exception being observed variables that are used only as explanatory variables; these are not modelled and can be of any type).

These assumptions are not the only ones possible, and latent variable models can also be defined with different assumptions about some of the variables. In particular, in many applications it would be useful to consider models where some of the latent variables and/or their observed indicators are not continuous but categorical variables, which take one of a discrete set of two or more possible values (categories). In this chapter we briefly discuss such models. We focus on the case where the latent variables are still continuous but the indicators are categorical. Such models may be called latent trait models or simply “factor analysis models for categorical items”. They are also known as Item Response Theory (IRT) models; this is the most common term in applications in educational or psychological testing, where these models are very widely used.

The discussion in this chapter is brief, and meant to give only a general introductory idea rather than a full description of latent trait models. Much more information on them can be found in the books listed below. Also described in these books are still further types of latent variable models which are not discussed in this module. One very important class of such models is that of latent class models where both latent variables and their indicators are categorical.

References on general types of latent variable models


One-factor model for binary items: Definition

To introduce the basic elements of latent trait (IRT) models in their simplest case, we consider first a model where p observed items y1, ..., yp are used as indicators of a single continuous latent factor η, and where each of the items is binary, meaning that it has only two possible values. We suppose that the two values of each item are coded as 0 and 1, for example 0 for “Disagree” and 1 for “Agree” for a survey question which has only these response options.

Just like in factor analysis, it is assumed that the factor η is normally distributed with mean κ and variance φ. To identify the latent scale, we again need to fix either the parameters of this distribution or one intercept and one loading in the measurement models at specific values. In the examples of this section we do the former, and fix κ = 0 and φ = 1. For the whole one-factor model for binary items to be identified, we must have at least p = 3 items.

While the specification of the latent factor η is the same as in factor analysis, the definition of the measurement model must change when the items are binary rather than continuous. We now assume that, conditional on η, each item yj (j = 1, ..., p) follows a binomial (Bernoulli) distribution with probability parameter πj(η) = P(yj = 1|η) [and thus with 1 - πj(η) = P(yj = 0|η)]. The model for this probability given the factor should be a regression model which is appropriate for binary response variables. Here we use the standard binary logistic model

log[πj(η) / (1 − πj(η))] = νj + λjη

for each j = 1, ..., p. This can also be expressed as a model for the probability πj(η) directly, as

πj(η) = exp(νj + λjη) / [1 + exp(νj + λjη)].

In the same way as in factor analysis, it is most often assumed that the items are conditionally independent given the factor. When this is the case, these measurement models for the individual items, taken together, define the whole measurement model for how the items measure the factor.

A plot of the probabilities πj(η) as a function of different values of the factor η is known as the item response curve of an item. Its shape depends on the parameters νj and λj. The intercept parameter νj is also known (in common IRT terminology) as the difficulty parameter. For any fixed value of λjη, higher values of νj give higher values of the item response probability πj(η). In particular, when η = 0 (its average value, if we have fixed κ = 0), the probability depends only on the difficulty parameter, as πj(0) = exp(νj)/[1 + exp(νj)].

The loading parameter λj describes the direction and strength of the association between the factor η and the item yj. When λj > 0, higher values of the factor are associated with higher probabilities that yj = 1, and when λj < 0, higher values of the factor are associated with lower probabilities that yj = 1. The association is stronger the further λj is from 0 (i.e. the larger its absolute value). When the association is strong, even small differences in η translate into relatively large differences in the probabilities πj(η), so the item “discriminates well” between individuals with different levels of η. For this reason, λj is also known as the discrimination parameter of the binary logistic measurement model.

Figure 5.1 displays the probability πj(η) for different values of the factor and for different values of the difficulty parameter νj and the discrimination parameter λj.

Figure 5.1: The probability of a binary item (item response curve of an item) for different values of ν and λ
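To make these dependencies concrete, here is a minimal sketch (in Python rather than Stata, purely as an illustration; the (ν, λ) values are hypothetical) which evaluates the item response probability πj(η) = exp(νj + λjη)/[1 + exp(νj + λjη)] over a grid of factor values. Plotting such probabilities against η traces out item response curves like those in Figure 5.1.

import numpy as np

def irf(eta, nu, lam):
    """Item response probability pi(eta) of the binary logistic model."""
    return 1.0 / (1.0 + np.exp(-(nu + lam * eta)))

eta = np.linspace(-4.0, 4.0, 9)  # a coarse grid of factor values
# Hypothetical (nu, lambda) pairs: two difficulties at the same
# discrimination, then a higher discrimination at the same difficulty
for nu, lam in [(0.0, 1.0), (1.0, 1.0), (0.0, 3.0)]:
    print(f"nu = {nu}, lambda = {lam}:", np.round(irf(eta, nu, lam), 2))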


One-factor model for binary items: Estimation

The latent trait model also implies a joint distribution for the p observed items. When all the items are categorical variables, this distribution is defined by the probabilities of the cells in the p-variate contingency table of the items. When the items y1, ..., yp are all binary with values 0 and 1, this is a 2 × 2 × ... × 2 contingency table with a total of 2^p cells, where each cell corresponds to one combination of values (k1, ..., kp) for the items, with each kj being either 0 or 1.

A one-factor latent trait model implies that the probabilities in this joint distribution are given by

P(y1 = k1, ..., yp = kp) = ∫ π1(η)^k1 [1 − π1(η)]^(1 − k1) × ... × πp(η)^kp [1 − πp(η)]^(1 − kp) p(η) dη,

where the integral is over all the possible values of η, and p(η) denotes the distribution (more precisely, the probability density function) of η (which under the identification assumption stated in the previous section is a normal distribution with mean 0 and variance 1).

Maximum likelihood (ML) estimates of the parameters of the model are those values of the parameters which imply the closest match (in terms of the likelihood function for the data) between these model-implied probabilities and the sample probabilities in the observed contingency table of the items. These estimates can be found using an iterative computational algorithm, as was also the case for factor analysis and structural equation modelling. Furthermore, and unlike for those previous models, estimation of a latent trait model also requires that integrals over the distributions of the factors (of the kind shown above) be evaluated at each iteration, using computationally intensive numerical methods of integration. Because of this numerical integration, latent trait models are much harder and slower to estimate than factor analysis and structural equation models. In practice this means that latent trait models with more than one or two latent factors may often be computationally too demanding to be easily and routinely used.
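To give a rough idea of what this numerical integration involves, the following minimal sketch (in Python, with hypothetical parameter values; this is not part of the module’s Stata workflow) approximates the model-implied probability of one response pattern using Gauss-Hermite quadrature, the same general method that the intmethod(ghermite) option requests in the Stata examples later in this chapter.

from itertools import product

import numpy as np

# Hypothetical measurement parameters for p = 3 binary items
nu  = np.array([0.5, -0.2, 1.0])   # intercepts (difficulty parameters)
lam = np.array([1.5,  2.0, 0.8])   # loadings (discrimination parameters)

def cell_probability(k, n_points=15):
    """Approximate P(y1 = k1, ..., yp = kp) under a one-factor latent
    trait model with eta ~ N(0, 1), using Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_points)
    eta = np.sqrt(2.0) * x      # rescale the nodes for the N(0, 1) density
    # pi[j, m] = P(y_j = 1 | eta_m) from the binary logistic model
    pi = 1.0 / (1.0 + np.exp(-(nu[:, None] + lam[:, None] * eta)))
    # Conditional independence: multiply the item probabilities together
    cond = np.prod(np.where(np.array(k)[:, None] == 1, pi, 1.0 - pi), axis=0)
    return np.sum(w * cond) / np.sqrt(np.pi)

print(cell_probability((1, 1, 0)))  # one cell of the 2^p contingency table
# The model-implied probabilities of all 2^p cells sum to (approximately) 1:
print(sum(cell_probability(k) for k in product((0, 1), repeat=3)))

In a real estimation routine, such cell probabilities are evaluated at every iteration, and the log-likelihood built from them is maximized over the parameters.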


One-factor model for binary items: Factor scores

The general idea of calculating factor scores, i.e. predicted values of a latent factor given the observed items, is the same for latent trait models as it was for factor analysis. Most commonly, the factor score for an individual respondent is calculated as the expected value of the factor given the observed values of the items. These scores can be calculated with any computer software (such as Stata) which can be used to fit latent trait models.

To get a sense of what the factor scores tell us, it is useful to consider a simpler version of them, the “component score” y1λ̂1 + y2λ̂2 + ... + ypλ̂p. Since the values of the binary items yj are coded as 0 or 1, this is simply the sum of the estimated loading parameters λ̂j for those items for which a respondent’s value is 1 (e.g. all the items that the respondent agrees with, if the coding is 1 for “Agree” and 0 for “Disagree”). The largest possible value of this score is then obtained by a respondent who has the value 1 for all the items with positive estimated loadings and 0 for all the items with negative loadings. This score is not exactly equal to the expected value that standard software uses as the factor score, but it carries essentially the same information; in particular, the component scores and the factor scores give exactly the same ranking of individuals in terms of the values of the latent factor. This result, and component scores and factor scores more generally, are discussed by [Bar08] and [Bar11].
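As a concrete (if artificial) illustration, the following sketch (in Python, with made-up response patterns; the loadings are those estimated later in this chapter, in Table 5.1) computes this component score for a few hypothetical respondents:

import numpy as np

# Estimated loadings of three items (the values from Table 5.1)
lam_hat = np.array([3.24, 3.52, 2.18])   # respect, fair, explain
# Hypothetical 0/1 responses of four respondents to the three items
y = np.array([[1, 1, 1],
              [1, 1, 0],
              [0, 1, 0],
              [0, 0, 0]])
# Component score = the sum of the loadings of the items with response 1
print(y @ lam_hat)                        # [8.94 6.76 3.52 0.  ]

Since all three loadings are positive here, the highest component score (8.94) goes to the respondent with the value 1 on all three items, and respondents with higher component scores would also receive higher factor scores.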

Factor scores derived from a latent trait model may be used as observed variables in other analyses. This approach is discussed further later in this chapter, and an illustration of it is given in Example 2.


One-factor model for binary items: Assessment of model fit

Methods that can be used to assess the goodness of fit of a latent variable model have been discussed for factor analysis models in Chapter 2. Most of the general ideas introduced there, and some but not all of the methods, apply also to model assessment for latent trait models:

If we consider a model with one latent factor and conclude that it does not fit well, this suggests that either some of the items could be omitted to obtain a model which fits better for the remainder of the items, or that a model with more than one factor should be considered. Models with multiple factors are discussed briefly in the next section, together with other extensions.


More general models with continuous factors and categorical items

We have focused on the model with binary items and a single latent factor, in order to introduce the basic ideas of latent trait models in the simplest possible case. These models can be generalized in various ways, which we note very briefly here:


Example 1 on Latent trait models for binary items: A measurement model

In this and the next example we continue to use the same data as in the rest of this module. Because these data do not have any items which are originally binary, we create such items by dichotomizing three of the existing items. These are items D15, D16 and D17, the three indicators of trust in the procedural fairness of the police. The items are dichotomized so that their original levels 1 and 2 (“Not at all often” and “Not very often”) are combined as the new level 0 (“Not often”) and levels 3 and 4 (“Often” and “Very often”) as the new level 1 (“Often”). The binary items derived from D15, D16 and D17 are labelled respect, fair and explain respectively. The commands included below show how they can be created in Stata.

Fit a latent trait model for the three binary items respect, fair and explain given one latent factor, and using the data for all the countries together. Interpret the parameters of the estimated measurement model, and hence interpret the factor implied by this measurement model.

Note: Only Stata commands are included for this and the next example. In Stata, latent trait models can be fitted with the command gsem, which was first included in Stata Version 13. The R package lavaan does not currently include functions for fitting these models. There are other add-on packages in R for fitting latent trait models, but they are not used here.

Stata commands:

// A one-factor latent trait model for binary items: Measurement model
// Creating the binary items used in this illustrative example:
tab1 plcrspc plcfrdc plcexdc // Checking the distributions of the
// original items
recode plcrspc (1 2 = 0 "Not often") (3 4 = 1 "Often") (missing=.), ///
gen(respect)
recode plcfrdc (1 2 = 0 "Not often") (3 4 = 1 "Often") (missing=.), ///
gen(fair)
recode plcexdc (1 2 = 0 "Not often") (3 4 = 1 "Often") (missing=.), ///
gen(explain)
tab1 respect fair explain // Checking the distributions of the
// dichotomized items
// Fitting the latent trait model:
gsem (ProcFair -> respect fair explain, logit) ///
, var(ProcFair@1) intmethod(ghermite)
matrix b=e(b) // Saving the parameter estimates for use in the next example
predict procFairScore , latent // Saving factor scores for each respondent,
// for use in the next example

Stata output

The estimated parameters of the measurement models are shown in Table 5.1, and the item response curves implied by them are plotted in Figure 5.2. The curves show the probabilities of the response coded as 1 (i.e. “Often”) for the items, given different values of the latent factor. All the factor loadings (discrimination parameters) are positive. This means that higher values of the factor correspond to higher probabilities of the response “Often”, and thus that higher values of the factor indicate higher levels of trust in the procedural fairness of the police.

Comparing the measurement parameters between the different items, we can see that the item explain has the lowest value of the intercept (difficulty) parameter. This means that at the value 0 of the factor (i.e. its mean value), this item has the lowest probability of the “Often” response (around 0.6, compared to nearly 0.9 for the other two items). The item explain also has the lowest discrimination parameter, so its item response probabilities are a little less strongly associated with the factor than are the probabilities of the other two items (this can be seen in Figure 5.2, where the item response curve of this item is the least steep of the three). The measurement models of the items respect and fair, on the other hand, are very similar to each other.

Table 5.1: Estimated parameters (and their standard errors) of the measurement model for binary items “respect”, “fair” and “explain”, given latent factor “Procedural fairness of the police”. The model is fitted to the pooled sample of all respondents in the ESS (n=50501).

Item       Intercept ν̂j ("difficulty parameter")    Loading λ̂j ("discrimination parameter")
respect    1.99 (0.03)                               3.24 (0.05)
fair       1.83 (0.03)                               3.52 (0.06)
explain    0.44 (0.02)                               2.18 (0.03)
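As a quick check on the interpretation above, the response probabilities at η = 0 follow directly from the estimated intercepts, as πj(0) = exp(ν̂j)/[1 + exp(ν̂j)]; a few lines of Python (any calculator would do) reproduce the values quoted in the text:

import numpy as np

nu_hat = {"respect": 1.99, "fair": 1.83, "explain": 0.44}  # from Table 5.1
for item, nu in nu_hat.items():
    print(item, round(float(1.0 / (1.0 + np.exp(-nu))), 2))
# respect 0.88, fair 0.86, explain 0.61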

Figure 5.2: Item response curves for the binary items “respect”, “fair” and “explain”, given latent factor “Procedural fairness of the police”, from the estimated measurement models in Table 5.1.


Example 2 on Latent trait models for binary items: Multigroup analysis for cross-national comparison of the factor means


Note: The discussion of this example focuses on somewhat more advanced topics than most of the rest of this module.

Using the same three binary indicators of trust in the procedural fairness of the police, estimate and compare the averages of this factor between the countries in the ESS.

This analysis expands that of Example 1 by adding a structural model where the country of the respondent is used as an explanatory variable for the factor. This is thus a multigroup analysis with country as the group and assuming cross-national equivalence of measurement for all of the observed items. Unlike for the sem command for factor analysis and structural equation modelling, the gsem command in Stata does not have separate command syntax for multigroup analysis. Instead, the group (here country) is simply specified as a categorical explanatory variable for the factor, and entered in the form of dummy variables for the groups (omitting the dummy variable for one reference group, which is here taken to be Belgium).

Stata commands:

// A one-factor latent trait model for binary items:
// Models which involve country as explanatory variable

// ** Version 1: Fitting measurement and structural models together:
// Initial fit with smaller number of integration points
// (faster but less accurate):
gsem (ProcFair -> respect fair explain, logit) ///
(ProcFair <- i.country) ///
, var(e.ProcFair@1) from(b,skip) intmethod(ghermite) intpoints(3)
matrix b2=e(b)
// Final fit with more integration points, with starting values from
// the first fit above:
gsem (ProcFair -> respect fair explain, logit) ///
(ProcFair <- i.country) ///
, var(e.ProcFair@1) from(b2) intmethod(ghermite)

// ** Version 2: A three-step model where the factor score from Example 1
// is used as the response variable
reg procFairScore i.country

Stata output and notes

The Stata commands included above show two ways of carrying out this analysis:

Method 1: fitting the measurement model and the structural model together, as a single joint model (Version 1 in the commands above);

Method 2: a three-step approach in which factor scores derived from the measurement model of Example 1 are used as the response variable in a linear regression model with country as the explanatory variable (Version 2 above).

These approaches have slightly different characteristics:

Two further characteristics are common to both of these methods:

The differences between these ways of fitting the model matter ultimately only if they lead to meaningful differences in the main conclusions that we aim to draw from the analysis. In this example the questions of interest are the comparisons of average levels of the factor (trust in the procedural fairness of the police) between the countries. Figure 5.3 shows these country averages estimated using Methods 1 and 2. It is clear that the two sets of estimates are very similar, in that both give essentially the same relative differences and rankings of the countries. Many of the differences between the countries are clearly statistically significant (the standard errors of the means from Method 1 are around 0.05). The ordering of the countries shows fairly consistent geographic regularities, with levels of trust in the procedural fairness of the police mostly highest in the North and West of Europe and lower in the South and East.

Figure 5.3: Averages of the factor “Procedural fairness of the police” in the countries in the ESS, as estimated in Example 2. The plot shows two estimates of these averages, from a joint model (“Method 1” as discussed in the text, on the horizontal axis), and from a linear model for factor scores derived from the measurement model fitted in Example 1 (“Method 2”, on the vertical axis). The main conclusion from this comparison is that both sets of estimates give very similar results about comparisons between the countries.
