# Chapter 2: Factor Analysis

### The mathematical formula of a factor analysis model

Let $\eta$ denote a single latent factor, and let $y_1, \ldots, y_p$ be $p$ indicators of $\eta$. The factor $\eta$ is assumed to be normally distributed with mean $\kappa$ and variance $\phi$. The *measurement model* for item $y_j$ as a measure of $\eta$ is

$$y_j = \nu_j + \lambda_j \eta + \varepsilon_j \quad \text{for each } j = 1, \ldots, p.$$

This is a simple linear regression model in which item $y_j$ is the dependent variable, the factor $\eta$ is the explanatory variable, and $\varepsilon_j$ is the residual or *measurement error*. The $\varepsilon_j$ are assumed to be normally distributed with mean 0 and variances $\theta_j$, and to be uncorrelated with $\eta$. The parameters of the measurement model for item $y_j$ are the intercept $\nu_j$, the regression coefficient $\lambda_j$ (which in factor analysis is called the *loading*), and the variance $\theta_j$ of the measurement error. For instance, for the factor "Obligation to obey" we use the three indicators D18-D20, so $p = 3$ and the measurement model consists of the three equations

$$
\begin{aligned}
y_1 &= \nu_1 + \lambda_1 \eta + \varepsilon_1 \\
y_2 &= \nu_2 + \lambda_2 \eta + \varepsilon_2 \\
y_3 &= \nu_3 + \lambda_3 \eta + \varepsilon_3
\end{aligned}
$$

which, if we substitute the variable labels used in this example, stands for

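$$
\begin{aligned}
(\text{item D18}) &= \nu_1 + \lambda_1 (\text{Obligation to obey}) + (\text{measurement error})_1 \\
(\text{item D19}) &= \nu_2 + \lambda_2 (\text{Obligation to obey}) + (\text{measurement error})_2 \\
(\text{item D20}) &= \nu_3 + \lambda_3 (\text{Obligation to obey}) + (\text{measurement error})_3
\end{aligned}
$$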
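To make the generative reading of these equations concrete, here is a minimal NumPy sketch that draws data from a one-factor model with three items. The function name `simulate_one_factor` and all parameter values are hypothetical illustrations (they are not estimates for D18-D20). Because the factor is simulated, it is "observed" here, so regressing an item on it recovers that item's intercept and loading, which is exactly the simple-regression interpretation described above. In real data $\eta$ is latent and the parameters must be estimated from the items alone.

```python
import numpy as np

def simulate_one_factor(nu, lam, theta, kappa, phi, n, seed=0):
    """Draw n observations of p items from y_j = nu_j + lam_j * eta + eps_j (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    nu = np.asarray(nu, dtype=float)
    lam = np.asarray(lam, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Latent factor eta ~ N(kappa, phi); numpy's normal() takes a standard deviation
    eta = rng.normal(kappa, np.sqrt(phi), size=n)
    # Measurement errors: independent N(0, theta_j), drawn independently of eta
    eps = rng.normal(0.0, np.sqrt(theta), size=(n, nu.size))
    # Column j holds item j: intercept + loading * factor + error
    y = nu + np.outer(eta, lam) + eps
    return y, eta

# Hypothetical parameter values for three items
nu = [3.0, 2.5, 3.2]
lam = [0.8, 1.0, 0.6]
theta = [0.4, 0.3, 0.5]
y, eta = simulate_one_factor(nu, lam, theta, kappa=0.0, phi=1.0, n=200_000)

print(y.mean(axis=0))  # approximately nu_j + lam_j * kappa = [3.0, 2.5, 3.2]
print(y.var(axis=0))   # approximately lam_j**2 * phi + theta_j = [1.04, 1.3, 0.86]

# Since eta is known in a simulation, regressing item 1 on it recovers (lam_1, nu_1)
slope, intercept = np.polyfit(eta, y[:, 0], 1)
print(slope, intercept)  # approximately 0.8 and 3.0
```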

It is usually assumed that the measurement errors $\varepsilon_j$ are all uncorrelated with each other, so that there are no "error correlations" (residual correlations) between the observed indicators of the factor once we control for their common dependence on $\eta$. However, this assumption is sometimes relaxed by allowing non-zero covariances $\operatorname{cov}(\varepsilon_j, \varepsilon_k) = \theta_{jk}$ between the error terms of one or more specific pairs of items.
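One way to see what an error correlation does is through the covariance matrix of the items implied by the model. Under the assumptions above, $\operatorname{cov}(y_j, y_k) = \lambda_j \lambda_k \phi + \theta_{jk}$, where $\theta_{jj} = \theta_j$ and $\theta_{jk} = 0$ for $j \neq k$ unless that error covariance is freed. The sketch below assumes hypothetical parameter values and a hypothetical helper name, `implied_cov_one_factor`; freeing one error covariance changes only the covariance of that single pair of items.

```python
import numpy as np

def implied_cov_one_factor(lam, phi, theta, error_cov=None):
    """Model-implied item covariance matrix for a one-factor model (hypothetical helper).

    cov(y_j, y_k) = lam_j * lam_k * phi + theta_jk, with theta_jj = theta_j and
    theta_jk = 0 for j != k unless the pair is listed in error_cov.
    """
    lam = np.asarray(lam, dtype=float)
    Theta = np.diag(np.asarray(theta, dtype=float))
    # Optional error covariances, keyed by zero-based item pairs, e.g. (0, 1) for (y1, y2)
    for (j, k), value in (error_cov or {}).items():
        Theta[j, k] = Theta[k, j] = value
    return phi * np.outer(lam, lam) + Theta

lam, phi, theta = [0.8, 1.0, 0.6], 1.0, [0.4, 0.3, 0.5]
Sigma0 = implied_cov_one_factor(lam, phi, theta)                        # no error correlations
Sigma1 = implied_cov_one_factor(lam, phi, theta, error_cov={(0, 1): 0.1})  # theta_12 = 0.1

# cov(y1, y2) is 0.8 without and 0.9 with the error covariance; cov(y1, y3) is unchanged
print(Sigma0[0, 1], Sigma1[0, 1], Sigma1[0, 2])
```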

The model thus describes a situation where each item measures the factor, but not perfectly: the value of an item is determined by the factor and a measurement error. Because $\eta$ and $\varepsilon_j$ are uncorrelated, the variance of an item decomposes as $\operatorname{var}(y_j) = \lambda_j^2 \phi + \theta_j$. The larger $\lambda_j^2 \phi$ is relative to $\theta_j$, the larger the percentage of the variance of $y_j$ that is explained by the factor, and thus the more reliable $y_j$ is as an indicator of $\eta$.
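The share of an item's variance explained by the factor is $\lambda_j^2 \phi / (\lambda_j^2 \phi + \theta_j)$. A short sketch of this arithmetic, using a hypothetical function name and hypothetical loadings and error variances with $\phi = 1$:

```python
def explained_share(lam_j, phi, theta_j):
    """Share of var(y_j) explained by the factor: lam_j**2 * phi / (lam_j**2 * phi + theta_j)."""
    common = lam_j ** 2 * phi  # variance of y_j due to the factor
    return common / (common + theta_j)

# Hypothetical (loading, error variance) pairs for three items, with phi = 1
for lam_j, theta_j in [(0.8, 0.4), (1.0, 0.3), (0.6, 0.5)]:
    print(round(explained_share(lam_j, 1.0, theta_j), 3))
# prints 0.615, 0.769 and 0.419, one per line
```

Because only the product $\lambda_j^2 \phi$ enters, doubling $\lambda_j$ while dividing $\phi$ by 4 leaves the share unchanged; the loading on its own is therefore not a measure of reliability unless the scale of the factor is fixed.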

The model may have more than one factor. For example, suppose that six items $y_1, \ldots, y_6$ are regarded as measures of two latent factors $\eta_1$ and $\eta_2$. The measurement model may then be extended, for example, as

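$$
\begin{aligned}
y_1 &= \nu_1 + \lambda_{11}\eta_1 + \lambda_{12}\eta_2 + \varepsilon_1 \\
y_2 &= \nu_2 + \lambda_{21}\eta_1 + \lambda_{22}\eta_2 + \varepsilon_2 \\
y_3 &= \nu_3 + \lambda_{31}\eta_1 + \lambda_{32}\eta_2 + \varepsilon_3 \\
y_4 &= \nu_4 + \lambda_{41}\eta_1 + \lambda_{42}\eta_2 + \varepsilon_4 \\
y_5 &= \nu_5 + \lambda_{51}\eta_1 + \lambda_{52}\eta_2 + \varepsilon_5 \\
y_6 &= \nu_6 + \lambda_{61}\eta_1 + \lambda_{62}\eta_2 + \varepsilon_6
\end{aligned}
$$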
where $\eta_1$ and $\eta_2$ are assumed to be jointly normally distributed, with means $\kappa_1$ and $\kappa_2$, variances $\phi_1$ and $\phi_2$, and covariance $\phi_{12}$. In this model, all items are measures of both factors. Often we consider more restrictive (and thus simpler) models, in particular ones where each item is taken to measure only one factor. This is achieved by setting some of the loadings $\lambda_{jk}$ to 0. For example, suppose that $y_1$, $y_2$, $y_3$ measure factor $\eta_1$ and $y_4$, $y_5$, $y_6$ measure $\eta_2$. The measurement model is then

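$$
\begin{aligned}
y_1 &= \nu_1 + \lambda_{11}\eta_1 + \varepsilon_1 \\
y_2 &= \nu_2 + \lambda_{21}\eta_1 + \varepsilon_2 \\
y_3 &= \nu_3 + \lambda_{31}\eta_1 + \varepsilon_3 \\
y_4 &= \nu_4 + \lambda_{42}\eta_2 + \varepsilon_4 \\
y_5 &= \nu_5 + \lambda_{52}\eta_2 + \varepsilon_5 \\
y_6 &= \nu_6 + \lambda_{62}\eta_2 + \varepsilon_6
\end{aligned}
$$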
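In matrix form, both two-factor models can be written as $y = \nu + \Lambda \eta + \varepsilon$, where $\Lambda$ is the $6 \times 2$ matrix of loadings $\lambda_{jk}$. With uncorrelated errors, the implied means are $\nu + \Lambda\kappa$ and the implied covariance matrix is $\Lambda \Phi \Lambda^\top + \Theta$, where $\Phi$ holds $\phi_1$, $\phi_2$, $\phi_{12}$ and $\Theta$ is diagonal with the $\theta_j$. The sketch below assumes hypothetical values and a hypothetical helper name, `implied_moments`; the zeros in $\Lambda$ encode the restricted model in which each item measures only one factor. In that model, two items on different factors are correlated only through $\phi_{12}$.

```python
import numpy as np

def implied_moments(nu, Lam, kappa, Phi, Theta):
    """Implied mean vector and covariance of y = nu + Lam @ eta + eps (hypothetical helper)."""
    mu = np.asarray(nu, dtype=float) + Lam @ np.asarray(kappa, dtype=float)
    Sigma = Lam @ Phi @ Lam.T + Theta
    return mu, Sigma

# Hypothetical loadings; zeros mean y1-y3 measure only eta1 and y4-y6 measure only eta2
Lam = np.array([[0.8, 0.0],
                [1.0, 0.0],
                [0.6, 0.0],
                [0.0, 0.9],
                [0.0, 0.7],
                [0.0, 1.0]])
Phi = np.array([[1.0, 0.3],    # phi_1,  phi_12
                [0.3, 1.0]])   # phi_12, phi_2
Theta = np.diag([0.4, 0.3, 0.5, 0.35, 0.45, 0.25])  # no error correlations
nu = np.zeros(6)
kappa = np.zeros(2)

mu, Sigma = implied_moments(nu, Lam, kappa, Phi, Theta)
# Same factor:       cov(y1, y2) = lam_11 * lam_21 * phi_1  = 0.8
# Different factors: cov(y1, y4) = lam_11 * lam_42 * phi_12 = 0.216
print(Sigma[0, 1], Sigma[0, 3])
```

Filling every entry of `Lam` with a non-zero value gives the implied moments of the first, unrestricted two-factor model instead.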

Factor analysis models where all items measure all factors and there are no error correlations are often referred to as *Exploratory Factor Analysis* (EFA) models, and models with other sets of assumptions (such as further constraints of zero loadings, other parameter constraints, or non-zero error correlations) are known as *Confirmatory Factor Analysis* (CFA) models.