# Chapter 5: Latent variable models with categorical indicators

### One-factor model for binary items: Definition

To introduce the basic elements of latent trait (IRT) models in their simplest case, we consider first a model where *p* observed items *y*_{1}, ..., *y*_{p} are used as indicators of a single continuous latent factor *η*, and where each of the items is binary, meaning that it has only two possible values. We suppose that the two values of each item are coded as 0 and 1, for example 0 for “Disagree” and 1 for “Agree” for a survey question which has only these response options.

Just like in factor analysis, it is assumed that the factor *η* is normally distributed with mean *κ* and variance φ. To identify the latent scale, we again need to fix either the parameters of this distribution or one intercept and one loading in the measurement models at specific values. In the examples of this section we do the former, and fix *κ* = 0 and φ = 1. For the whole one-factor model for binary items to be identified, we must have at least *p* = 3 items.
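The need for this constraint can be seen with a short derivation. It is a sketch that is not part of the original text, and it uses the linear predictor *ν*_{j} + *λ*_{j}*η* of the logistic measurement model introduced below. Write the factor as *η* = *κ* + √φ *z*, where *z* follows a standard normal distribution, and substitute:

```latex
\begin{aligned}
\nu_j + \lambda_j \eta
  &= \nu_j + \lambda_j\,(\kappa + \sqrt{\phi}\, z) \\
  &= (\nu_j + \lambda_j \kappa) + (\lambda_j \sqrt{\phi})\, z
   \;=\; \nu_j^{*} + \lambda_j^{*} z, \qquad z \sim N(0, 1).
\end{aligned}
```

The parameter sets (*ν*_{j}, *λ*_{j}, *κ*, φ) and (*ν*_{j}^{*}, *λ*_{j}^{*}, 0, 1) therefore imply the same item response probabilities for every item, so the data cannot distinguish between them. Fixing *κ* = 0 and φ = 1 selects one member of each such family. Fixing one intercept and one loading instead (for example *ν*_{1} = 0 and *λ*_{1} = 1, assuming *λ*_{1} ≠ 0) removes the same two degrees of freedom.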

While the specification of the latent factor *η* is the same as in factor analysis, the definition of the measurement model must change when the items are binary rather than continuous. We now assume that, conditional on *η*, each item *y*_{j} (*j* = 1, ..., *p*) follows a binomial (Bernoulli) distribution with probability parameter *π*_{j}(*η*) = P(*y*_{j} = 1|*η*) [and thus with 1 - *π*_{j}(*η*) = P(*y*_{j} = 0|*η*)]. The model for this probability given the factor should be a regression model which is appropriate for binary response variables. Here we use the standard binary logistic model

$$
\log\left[\frac{\pi_j(\eta)}{1-\pi_j(\eta)}\right] = \nu_j + \lambda_j \eta
$$

for each *j* = 1, ..., *p*. This can also be expressed as a model for the probability *π*_{j}(*η*) directly, as

$$
\pi_j(\eta) = \frac{\exp(\nu_j + \lambda_j \eta)}{1 + \exp(\nu_j + \lambda_j \eta)}.
$$
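The logistic measurement model, in which the log-odds log[*π*_{j}(*η*)/(1 - *π*_{j}(*η*))] equal the linear predictor *ν*_{j} + *λ*_{j}*η*, can be evaluated numerically. The sketch below is illustrative only: the function names and the parameter values are hypothetical choices, not taken from the text. It computes *π*_{j}(*η*) and confirms that taking the log-odds of the result recovers the linear predictor.

```python
import numpy as np


def item_response_prob(eta, nu, lam):
    """P(y_j = 1 | eta) under the binary logistic measurement model.

    eta: factor value(s); nu: intercept nu_j; lam: loading lambda_j.
    Works for scalars or NumPy arrays.
    """
    linear_predictor = nu + lam * np.asarray(eta, dtype=float)
    # exp(x) / (1 + exp(x)) rewritten as 1 / (1 + exp(-x)); same logistic function
    return 1.0 / (1.0 + np.exp(-linear_predictor))


def log_odds(p):
    """Inverse of the logistic function: log[p / (1 - p)]."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


# Hypothetical parameter values, chosen only for illustration.
nu_j, lam_j = -0.5, 1.5
eta_grid = np.array([-2.0, 0.0, 2.0])
probs = item_response_prob(eta_grid, nu_j, lam_j)
print(np.round(probs, 3))
# The log-odds of the probabilities equal nu_j + lam_j * eta.
print(np.allclose(log_odds(probs), nu_j + lam_j * eta_grid))
```

Writing the logistic function as 1/(1 + exp(-x)) is algebraically identical to exp(x)/[1 + exp(x)]; it is simply a common way of coding it.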

In the same way as in factor analysis, it is most often assumed that the items are conditionally independent given the factor. When this is the case, these measurement models for the individual items, taken together, define the whole measurement model for how the items measure the factor.
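Conditional independence means that, given *η*, the probability of a whole response pattern (*y*_{1}, ..., *y*_{p}) is the product over items of *π*_{j}(*η*)^{*y*_{j}} [1 - *π*_{j}(*η*)]^{1 - *y*_{j}}. The sketch below computes this product and, as an illustration under the identification choice *κ* = 0, φ = 1, averages it over the normal distribution of *η* to obtain the marginal probability of the pattern. The numerical integration by Gauss-Hermite quadrature (NumPy's `hermegauss`) is our choice for the example, not a method prescribed by the text, and the function names and parameter values are hypothetical.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def item_response_prob(eta, nu, lam):
    """P(y_j = 1 | eta) for the binary logistic measurement model."""
    return 1.0 / (1.0 + np.exp(-(nu + lam * eta)))


def pattern_prob_given_eta(y, eta, nu, lam):
    """P(y_1, ..., y_p | eta) under conditional independence: the product
    over items of pi_j(eta)^y_j * (1 - pi_j(eta))^(1 - y_j)."""
    y = np.asarray(y, dtype=float)
    pi = item_response_prob(eta, np.asarray(nu, dtype=float), np.asarray(lam, dtype=float))
    return float(np.prod(pi**y * (1.0 - pi) ** (1.0 - y)))


def pattern_prob_marginal(y, nu, lam, n_nodes=40):
    """P(y_1, ..., y_p) with eta ~ N(0, 1) integrated out by Gauss-Hermite
    quadrature (probabilists' form, weight function exp(-x^2 / 2))."""
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)  # rescale so the weights sum to 1
    return float(sum(w * pattern_prob_given_eta(y, x, nu, lam) for x, w in zip(nodes, weights)))


# Hypothetical intercepts and loadings for p = 3 items.
nu = [0.5, 0.0, -1.0]
lam = [1.0, 1.5, 2.0]
patterns = list(itertools.product([0, 1], repeat=len(nu)))
total = sum(pattern_prob_marginal(y, nu, lam) for y in patterns)
print(round(total, 6))
```

For every factor value, each item contributes *π*_{j} + (1 - *π*_{j}) = 1 when summed over its two responses, so the 2^{*p*} pattern probabilities add up to 1; the last lines check this.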

A plot of the probabilities *π*_{j}(*η*) as a function of different values of the factor *η* is known as the *item response curve* of an item. Its shape depends on the parameters *ν*_{j} and *λ*_{j}. The intercept parameter *ν*_{j} is also known (in common IRT terminology) as the *difficulty parameter*. For any fixed value of *λ*_{j}*η*, higher values of *ν*_{j} give higher values of the item response probability *π*_{j}(*η*). In particular, when *η* = 0 (its average value, if we have fixed *κ* = 0), the probability depends only on the difficulty parameter, as *π*_{j}(0) = exp(*ν*_{j})/[1 + exp(*ν*_{j})].
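The role of *ν*_{j} can be checked numerically. The sketch below, with hypothetical parameter values, shows that *π*_{j}(0) is unchanged when only *λ*_{j} varies, and that for a fixed *λ*_{j} a larger *ν*_{j} raises the probability. It also computes where the curve crosses 0.5, which happens where *ν*_{j} + *λ*_{j}*η* = 0, that is at *η* = -*ν*_{j}/*λ*_{j} (for *λ*_{j} ≠ 0). The helper `half_probability_point` is a name introduced here for illustration.

```python
import math


def item_response_prob(eta, nu, lam):
    """P(y_j = 1 | eta) = exp(nu + lam*eta) / [1 + exp(nu + lam*eta)]."""
    return 1.0 / (1.0 + math.exp(-(nu + lam * eta)))


def half_probability_point(nu, lam):
    """Factor value where pi_j(eta) = 0.5, i.e. where nu + lam*eta = 0 (needs lam != 0)."""
    return -nu / lam


# At eta = 0 the loading drops out, so pi_j(0) depends on nu_j only.
for lam in [0.5, 1.0, 3.0]:  # hypothetical loadings, nu_j fixed at 1.0
    print(f"lam = {lam}: pi(0) = {item_response_prob(0.0, 1.0, lam):.4f}")

# With lam_j fixed at 1.0, a larger nu_j raises the probability and moves
# the 0.5 crossing point to lower values of eta.
for nu in [-1.0, 0.5, 2.0]:  # hypothetical intercepts
    print(f"nu = {nu:+.1f}: pi(0.5) = {item_response_prob(0.5, nu, 1.0):.4f}, "
          f"crosses 0.5 at eta = {half_probability_point(nu, 1.0):+.1f}")
```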

The loading parameter *λ*_{j} describes the direction and strength of the association between the factor *η* and the item *y*_{j}. When *λ*_{j} > 0, higher values of the factor are associated with higher probabilities that *y*_{j} = 1, and when *λ*_{j} < 0, higher values of the factor are associated with lower probabilities that *y*_{j} = 1. The association is stronger the larger the absolute value of *λ*_{j} (i.e. the further *λ*_{j} is from 0). When the association is strong, even small differences in *η* translate into relatively large differences in the probabilities *π*_{j}(*η*), so the item “discriminates well” between individuals with different levels of *η*. For this reason, *λ*_{j} is also known as the *discrimination parameter* of the binary logistic measurement model.
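One way to quantify this is the slope of the item response curve. Differentiating the logistic model gives d*π*_{j}(*η*)/d*η* = *λ*_{j} *π*_{j}(*η*)[1 - *π*_{j}(*η*)], whose sign is that of *λ*_{j} and whose magnitude is largest, |*λ*_{j}|/4, where *π*_{j}(*η*) = 0.5. The sketch below, with hypothetical loadings and *ν*_{j} = 0, compares this slope and the change in probability over a short interval of *η* for items with different loadings; the function names are illustrative.

```python
import numpy as np


def item_response_prob(eta, nu, lam):
    """P(y_j = 1 | eta) for the binary logistic measurement model."""
    return 1.0 / (1.0 + np.exp(-(nu + lam * eta)))


def curve_slope(eta, nu, lam):
    """Derivative d pi_j(eta) / d eta = lam * pi_j(eta) * (1 - pi_j(eta))."""
    pi = item_response_prob(eta, nu, lam)
    return lam * pi * (1.0 - pi)


# Hypothetical items sharing nu_j = 0 but with different loadings.
for lam in [0.5, 1.0, 2.0, -2.0]:
    # Change in probability between eta = -0.25 and eta = 0.25
    diff = item_response_prob(0.25, 0.0, lam) - item_response_prob(-0.25, 0.0, lam)
    print(f"lambda = {lam:+.1f}: slope at eta = 0 is {curve_slope(0.0, 0.0, lam):+.3f}, "
          f"change over [-0.25, 0.25] is {diff:+.3f}")
```

The same half-unit difference in *η* moves the probability more for larger |*λ*_{j}|, and in the opposite direction when *λ*_{j} is negative.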

Figure 5.1 displays the probability *π*_{j}(*η*) for different values of the factor and for different values of the difficulty parameter *ν*_{j} and the discrimination parameter *λ*_{j}.

*Figure 5.1: The probability of a binary item (item response curve of an item) for different values of ν and λ.*
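Curves like those in Figure 5.1 can be tabulated with a few lines of code. The parameter settings below are hypothetical and are not the values used in the figure. A plotting library could draw the same arrays as curves; the sketch prints them as a table to stay self-contained.

```python
import numpy as np


def item_response_prob(eta, nu, lam):
    """P(y_j = 1 | eta) for the binary logistic measurement model."""
    return 1.0 / (1.0 + np.exp(-(nu + lam * eta)))


eta = np.linspace(-3.0, 3.0, 7)  # factor values -3, -2, ..., 3
# Hypothetical (nu, lambda) settings: baseline, higher intercept,
# stronger discrimination, and a negative loading.
settings = [(0.0, 1.0), (1.0, 1.0), (0.0, 2.0), (0.0, -1.0)]
curves = {s: item_response_prob(eta, *s) for s in settings}

print(f"{'eta':<18}" + " ".join(f"{e:6.1f}" for e in eta))
for (nu, lam), p in curves.items():
    print(f"nu={nu:+.1f} lam={lam:+.1f}: " + " ".join(f"{v:6.3f}" for v in p))
```

Reading across the rows, a larger *ν* shifts the whole curve upward, a larger loading makes it rise more steeply around its midpoint, and a negative loading makes it decrease in *η*.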