Chapter 5: Latent variable models with categorical indicators

One-factor model for binary items: Definition

To introduce the basic elements of latent trait (IRT) models in their simplest case, we consider first a model where p observed items y1, ..., yp are used as indicators of a single continuous latent factor η, and where each of the items is binary, meaning that it has only two possible values. We suppose that the two values of each item are coded as 0 and 1, for example 0 for “Disagree” and 1 for “Agree” for a survey question which has only these response options.

Just like in factor analysis, it is assumed that the factor η is normally distributed with mean κ and variance φ. To identify the latent scale, we again need to fix either the parameters of this distribution or one intercept and one loading in the measurement models at specific values. In the examples of this section we do the former, and fix κ = 0 and φ = 1. For the whole one-factor model for binary items to be identified, we must have at least p = 3 items.
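
The requirement of at least three items can be motivated by a simple parameter count. The Python sketch below assumes that κ = 0 and φ = 1 are fixed as above, so each item contributes one intercept νj and one loading λj. It compares these 2p parameters with the 2^p - 1 independent probabilities of the possible response patterns. This count is only a necessary condition for identification, not a proof of it, and the function names are illustrative.

```python
# Parameter counting for the one-factor model for p binary items,
# assuming kappa = 0 and phi = 1 are fixed, so the factor adds no free parameters.

def n_pattern_probs(p: int) -> int:
    """Independent response-pattern probabilities: 2**p patterns that sum to 1."""
    return 2 ** p - 1

def n_params(p: int) -> int:
    """Free parameters: one intercept nu_j and one loading lambda_j per item."""
    return 2 * p

for p in range(1, 6):
    enough_info = n_params(p) <= n_pattern_probs(p)  # necessary, not sufficient
    print(f"p={p}: parameters={n_params(p)}, pattern probabilities={n_pattern_probs(p)}, "
          f"count condition met: {enough_info}")
```

With p = 2 there are 4 parameters but only 3 independent pattern probabilities, so the model cannot be identified; with p = 3 there are 6 parameters and 7 independent probabilities.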

While the specification of the latent factor η is the same as in factor analysis, the definition of the measurement model must change when the items are binary rather than continuous. We now assume that, conditional on η, each item yj (j = 1, ..., p) follows a binomial (Bernoulli) distribution with probability parameter πj(η) = P(yj = 1|η) [and thus with 1 - πj(η) = P(yj = 0|η)]. The model for this probability given the factor should be a regression model which is appropriate for binary response variables. Here we use the standard binary logistic model

$$\log\left[\frac{\pi_j(\eta)}{1-\pi_j(\eta)}\right] = \nu_j + \lambda_j \eta$$

for each j = 1, ..., p. This can also be expressed as a model for the probability πj(η) directly, as

$$\pi_j(\eta) = \frac{\exp(\nu_j + \lambda_j \eta)}{1 + \exp(\nu_j + \lambda_j \eta)}$$

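As a minimal sketch in Python (the function names and parameter values are hypothetical), the logistic measurement model for a single item can be computed from νj, λj and η. The loop checks that the log-odds of the computed probability πj(η) recovers the linear predictor νj + λjη, and that P(yj = 0|η) is 1 - πj(η).

```python
import math

def item_prob(eta: float, nu: float, lam: float) -> float:
    """P(y_j = 1 | eta) under the binary logistic model with intercept nu and loading lam."""
    linear = nu + lam * eta  # linear predictor: the log-odds of y_j = 1
    # exp(x) / (1 + exp(x)), evaluated in a form that avoids overflow for large |x|
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    z = math.exp(linear)
    return z / (1.0 + z)

def log_odds(prob: float) -> float:
    """log[prob / (1 - prob)]."""
    return math.log(prob / (1.0 - prob))

# Hypothetical parameters for one item.
nu, lam = -0.5, 1.2
for eta in (-2.0, 0.0, 2.0):
    p1 = item_prob(eta, nu, lam)
    print(f"eta={eta:+.1f}  P(y=1)={p1:.3f}  P(y=0)={1 - p1:.3f}  "
          f"log-odds={log_odds(p1):+.3f}  nu+lam*eta={nu + lam * eta:+.3f}")
```

The two branches in item_prob are algebraically the same expression; splitting on the sign of the linear predictor only keeps math.exp from overflowing for extreme values.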
In the same way as in factor analysis, it is most often assumed that the items are conditionally independent given the factor. When this is the case, these measurement models for the individual items, taken together, define the whole measurement model for how the items measure the factor.
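
Under conditional independence, the probability of a whole response pattern (y1, ..., yp) given η is the product of the individual item probabilities: πj(η) for items with yj = 1 and 1 - πj(η) for items with yj = 0. A short Python sketch with hypothetical parameter values for p = 3 items (the helper names are illustrative):

```python
import math
from itertools import product

def item_prob(eta, nu, lam):
    """P(y_j = 1 | eta) under the binary logistic measurement model."""
    return 1.0 / (1.0 + math.exp(-(nu + lam * eta)))

def pattern_prob(y, eta, nus, lams):
    """P(y_1, ..., y_p | eta), assuming conditional independence given eta."""
    prob = 1.0
    for y_j, nu, lam in zip(y, nus, lams):
        p_j = item_prob(eta, nu, lam)
        prob *= p_j if y_j == 1 else 1.0 - p_j  # Bernoulli probability of y_j
    return prob

# Hypothetical intercepts and loadings for p = 3 items.
nus = [0.5, 0.0, -1.0]
lams = [1.0, 1.5, 0.8]
eta = 0.3

for y in product([0, 1], repeat=len(nus)):
    print(y, round(pattern_prob(y, eta, nus, lams), 4))
total = sum(pattern_prob(y, eta, nus, lams) for y in product([0, 1], repeat=len(nus)))
print("sum over all patterns:", total)
```

For any fixed η, the probabilities of all 2^p patterns sum to 1 (up to rounding), as they should for a proper conditional distribution.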

A plot of the probability πj(η) as a function of the factor η is known as the item response curve of the item. Its shape depends on the parameters νj and λj. The intercept parameter νj is also known (in common IRT terminology) as the difficulty parameter. For any fixed value of λjη, higher values of νj give higher values of the item response probability πj(η); with this sign convention, a larger νj therefore makes the response yj = 1 more likely, that is, the item is “easier” to answer with 1. In particular, when η = 0 (its mean, if we have fixed κ = 0), the probability depends only on the difficulty parameter, as πj(0) = exp(νj)/[1 + exp(νj)].
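
The role of the intercept can also be checked numerically. In this Python sketch (hypothetical parameter values), the probability at η = 0 matches exp(νj)/[1 + exp(νj)] whatever the loading, and for a fixed η and loading it increases with νj:

```python
import math

def item_prob(eta, nu, lam):
    """P(y_j = 1 | eta) under the binary logistic measurement model."""
    return 1.0 / (1.0 + math.exp(-(nu + lam * eta)))

for nu in (-1.0, 0.0, 1.0):  # hypothetical intercepts
    closed_form = math.exp(nu) / (1.0 + math.exp(nu))  # pi_j(0) from the text
    # At eta = 0 the loading drops out, so two very different loadings agree.
    print(f"nu={nu:+.1f}  pi(0), lam=0.5: {item_prob(0.0, nu, 0.5):.3f}  "
          f"lam=2.0: {item_prob(0.0, nu, 2.0):.3f}  exp(nu)/(1+exp(nu)): {closed_form:.3f}")

# For a fixed eta and loading, a larger intercept gives a larger probability.
eta, lam = 0.7, 1.0
print([round(item_prob(eta, nu, lam), 3) for nu in (-1.0, 0.0, 1.0)])
```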

The loading parameter λj describes the direction and strength of the association between the factor η and the item yj. When λj > 0, higher values of the factor are associated with higher probabilities that yj = 1, and when λj < 0, higher values of the factor are associated with lower probabilities that yj = 1. The association is stronger the larger the absolute value of λj (i.e. the further λj is from 0). When the association is strong, even small differences in η translate into relatively large differences in the probabilities πj(η), so the item “discriminates well” between individuals with different levels of η. For this reason, λj is also known as the discrimination parameter of the binary logistic measurement model.
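
The effect of the loading can be illustrated by how much πj(η) changes when η moves from -0.5 to 0.5, with νj = 0 held fixed. These values and the helper prob_gap are hypothetical choices for illustration. In this setting the change is zero when λj = 0, positive when λj > 0, negative when λj < 0, and larger in magnitude for larger |λj|:

```python
import math

def item_prob(eta, nu, lam):
    """P(y_j = 1 | eta) under the binary logistic measurement model."""
    return 1.0 / (1.0 + math.exp(-(nu + lam * eta)))

def prob_gap(nu, lam, eta_low=-0.5, eta_high=0.5):
    """Change in P(y_j = 1 | eta) when eta moves from eta_low to eta_high."""
    return item_prob(eta_high, nu, lam) - item_prob(eta_low, nu, lam)

nu = 0.0  # hypothetical intercept, held fixed
for lam in (-2.0, -0.5, 0.0, 0.5, 2.0):  # hypothetical loadings
    print(f"lambda={lam:+.1f}  change in P(y=1): {prob_gap(nu, lam):+.3f}")
```

The sign of the change follows the sign of λj, and its size grows with |λj|, which is what it means for an item to discriminate well between nearby values of η.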

Figure 5.1 displays the probability πj(η) for different values of the factor and for different values of the difficulty parameter νj and the discrimination parameter λj.

Figure 5.1: The probability of a binary item (item response curve of an item) for different values of ν and λ
