Chapter 2: Factor Analysis

The mathematical formulation of a factor analysis model

Let η denote a single latent factor, and let y1, ..., yp be p indicators of η. The factor η is assumed to be normally distributed with mean κ and variance φ. The measurement model for item yj as a measure of η is

yj = νj + λjη + εj for each j = 1, ..., p.

This is a simple linear regression model where item yj is the dependent variable, factor η is the explanatory variable, and εj is the residual or measurement error. It is assumed that the εj are all normally distributed with mean 0 and variances θj, and that they are uncorrelated with η. The parameters of this measurement model of an item given one factor are the intercept νj, the regression coefficient λj (called the loading in factor analysis), and the variance θj of the measurement error. For instance, for the factor "Obligation to obey" we use the three indicators D18-D20, so p = 3 and the measurement model consists of the three models

y1 = ν1 + λ1η + ε1
y2 = ν2 + λ2η + ε2
y3 = ν3 + λ3η + ε3

which, if we substitute labels for the variables in this example, stands for

(item D18) = ν1 + λ1(Obligation to obey) + (measurement error)1
(item D19) = ν2 + λ2(Obligation to obey) + (measurement error)2
(item D20) = ν3 + λ3(Obligation to obey) + (measurement error)3
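
To make the three-indicator model concrete, the following sketch simulates data from it and compares sample moments with the values the model implies. The parameter values are invented for illustration and are not estimates from any survey data; the function and variable names are likewise hypothetical. It uses only the Python standard library.

```python
import random

def simulate_one_factor(n, nu, lam, theta, kappa=0.0, phi=1.0, seed=0):
    """Draw n observations from y_j = nu_j + lam_j * eta + eps_j.

    eta ~ N(kappa, phi), eps_j ~ N(0, theta_j), all mutually independent.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # random.gauss takes a standard deviation, so take square roots
        eta = rng.gauss(kappa, phi ** 0.5)
        rows.append([
            nu_j + lam_j * eta + rng.gauss(0.0, th_j ** 0.5)
            for nu_j, lam_j, th_j in zip(nu, lam, theta)
        ])
    return rows

def sample_cov(x, y):
    """Unbiased sample covariance of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Illustrative (assumed) parameter values for items D18, D19, D20
nu = [3.0, 2.5, 2.8]      # intercepts
lam = [1.0, 0.8, 1.2]     # loadings
theta = [0.5, 0.6, 0.4]   # measurement error variances

data = simulate_one_factor(50_000, nu, lam, theta, kappa=0.0, phi=1.0)
d18 = [row[0] for row in data]
d19 = [row[1] for row in data]

mean_d18 = sum(d18) / len(d18)      # model-implied: nu1 + lam1 * kappa
var_d18 = sample_cov(d18, d18)      # model-implied: lam1**2 * phi + theta1
cov_d18_d19 = sample_cov(d18, d19)  # model-implied: lam1 * lam2 * phi
```

With a large sample, the three computed quantities should lie close to their model-implied values noted in the comments. The items covary only through their shared dependence on η.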

It is usually assumed that the measurement errors εj are all uncorrelated with each other, so that there are no "error correlations" (residual correlations) between the observed indicators of the factor after we control for their common dependence on η. However, this assumption is sometimes relaxed by allowing non-zero covariances cov(εj, εk) = θjk between the error terms of one or more specific pairs of items.

The model thus describes a situation where each item measures the factor, but not perfectly, so that the value of an item is determined by the factor and a measurement error. The larger λj²φ is relative to θj, the larger is the proportion of the variance of yj that is explained by the factor, and thus the more reliable yj is as an indicator of η.
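
Because εj is uncorrelated with η, the variance of yj splits into λj²φ + θj, so the proportion explained by the factor is λj²φ / (λj²φ + θj). A minimal sketch of that calculation follows; the function name and the numbers are assumptions chosen for illustration.

```python
def item_reliability(lam, phi, theta):
    """Share of var(y_j) explained by the factor: lam^2*phi / (lam^2*phi + theta)."""
    explained = lam ** 2 * phi
    return explained / (explained + theta)

# Illustrative values: loading 0.8, factor variance 1, error variance 0.36
r = item_reliability(lam=0.8, phi=1.0, theta=0.36)

# With the same loading and factor variance, a noisier item is less reliable
r_noisy = item_reliability(lam=0.8, phi=1.0, theta=1.0)
```

Here the explained variance is 0.64 and the total variance is 1, so r is 0.64 up to floating-point rounding, while r_noisy is smaller.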

The model may have more than one factor. For example, suppose that six items y1, ..., y6 are regarded as measures of two latent factors η1 and η2. The measurement model may then be extended, for example as

y1 = ν1 + λ11η1 + λ12η2 + ε1
y2 = ν2 + λ21η1 + λ22η2 + ε2
y3 = ν3 + λ31η1 + λ32η2 + ε3
y4 = ν4 + λ41η1 + λ42η2 + ε4
y5 = ν5 + λ51η1 + λ52η2 + ε5
y6 = ν6 + λ61η1 + λ62η2 + ε6

where η1 and η2 are assumed to be jointly normally distributed, with means κ1 and κ2, variances φ1 and φ2, and covariance φ12. In this model, all items are measures of both factors. Often we consider more restrictive (and thus simpler) models, in particular ones where each item is taken to measure only one factor. This is achieved by setting some of the loadings λjk to 0. For example, suppose that y1, y2, y3 measure factor η1 and y4, y5, y6 measure η2. The measurement model is then

y1 = ν1 + λ11η1 + ε1
y2 = ν2 + λ21η1 + ε2
y3 = ν3 + λ31η1 + ε3
y4 = ν4 + λ42η2 + ε4
y5 = ν5 + λ52η2 + ε5
y6 = ν6 + λ62η2 + ε6
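
One way to see what the zero loadings imply is to compute the model-implied covariance matrix of the items, which for uncorrelated errors is ΛΦΛ' + Θ, where Λ holds the loadings, Φ the factor variances and covariance, and Θ is diagonal with the error variances. The sketch below does this in plain Python; all numeric values and names are assumed for illustration.

```python
def implied_covariance(Lambda, Phi, theta):
    """Model-implied covariance matrix Lambda * Phi * Lambda' + diag(theta)."""
    p, q = len(Lambda), len(Phi)
    Sigma = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            s = sum(Lambda[j][a] * Phi[a][b] * Lambda[k][b]
                    for a in range(q) for b in range(q))
            if j == k:
                s += theta[j]  # error variance enters on the diagonal only
            Sigma[j][k] = s
    return Sigma

# Rows are items y1..y6; columns are factors eta1, eta2.
# y1-y3 load only on eta1 and y4-y6 only on eta2 (the other loadings are 0).
Lambda = [
    [1.0, 0.0],
    [0.8, 0.0],
    [1.2, 0.0],
    [0.0, 1.0],
    [0.0, 0.9],
    [0.0, 1.1],
]
Phi = [[1.0, 0.3],   # var(eta1) = phi1, cov(eta1, eta2) = phi12
       [0.3, 2.0]]   # var(eta2) = phi2
theta = [0.5] * 6    # error variances

Sigma = implied_covariance(Lambda, Phi, theta)

cov_y1_y2 = Sigma[0][1]  # same factor: lam11 * phi1 * lam21
cov_y1_y4 = Sigma[0][3]  # different factors: lam11 * phi12 * lam42
```

Items that measure different factors are still correlated in this model, but only through the factor covariance φ12. If φ12 were 0, the covariance between y1 and y4 would vanish.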

Factor analysis models where all items measure all factors and there are no error correlations are often referred to as Exploratory Factor Analysis (EFA) models, and models with other sets of assumptions (such as further constraints of zero loadings, other parameter constraints, or non-zero error correlations) are known as Confirmatory Factor Analysis (CFA) models.
