Chapter 2: Factor Analysis

Factor scores

After a factor analysis model has been fitted, we can use the estimated model to calculate predicted values for the factors for any individuals, based on their observed values of the indicators of the factors. These predictions are known as "factor scores". They are weighted sums of the values of the observed items, with the weights determined by the parameters of the fitted model. Roughly, indicators which are more reliable measures of a factor (in essence, those with larger loadings) will receive higher weights in the calculation of a factor score for that factor.
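
For a one-factor model the weights can be written out explicitly. The following is a sketch for the regression method of scoring, assuming a single factor with mean 0 and variance fixed at 1, and measurement errors that are uncorrelated with each other. The symbols are notation introduced here for illustration rather than taken from the module: x_j is an indicator, \nu_j its intercept, \lambda_j its loading and \theta_j its error variance.

```latex
% Assumed one-factor measurement model for indicators j = 1, ..., p:
%   x_j = \nu_j + \lambda_j \eta + \epsilon_j,
%   E(\eta) = 0, \quad \mathrm{var}(\eta) = 1, \quad \mathrm{var}(\epsilon_j) = \theta_j
% Regression-method factor score as a weighted sum of the centred indicators:
\hat{\eta} = \sum_{j=1}^{p} w_j \,(x_j - \nu_j),
\qquad
w_j = \frac{\lambda_j / \theta_j}{1 + \sum_{k=1}^{p} \lambda_k^2 / \theta_k}
```

Under these assumptions each weight is proportional to \lambda_j / \theta_j, so an indicator receives more weight when its loading is large relative to its error variance. This is the sense in which more reliable indicators count for more in the score.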

A calculated factor score may then be used as an observed single measure of the corresponding latent construct in subsequent analyses, for example as an explanatory or response variable in regression models for associations between the construct and other variables. Whether it is desirable to do this depends on our view of the meaning and role of the latent variables in the model specification in specific applications. If we believe that the factor analysis model is a more or less real representation of how the observed indicators measure an unobservable but real latent factor, then substituting a factor score for the factor is undesirable. The reason for this is that the factor score and the true value of the factor for an individual will not be identical, which in turn will cause measurement error bias in many types of analyses if the factor score is used directly in the role of the factor. Instead, we will then prefer analyses which do not calculate factor scores but which estimate measurement models and models for the latent constructs together in one go. Such structural equation models with latent variables are discussed in Chapter 4 of this module.
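
The effect of this discrepancy can be illustrated with the textbook case of classical measurement error. This is a simplified sketch and not a description of any particular type of factor score: it assumes that a score S equals the true factor \eta plus an unrelated error e, and that a response Y depends linearly on \eta with slope \beta. All of these symbols are introduced here for illustration.

```latex
% Assumed setup: Y = \alpha + \beta \eta + u, \qquad S = \eta + e,
% with e, u and \eta mutually uncorrelated.
% Slope obtained when Y is regressed on the score S instead of on \eta:
\beta_{S} = \frac{\mathrm{cov}(Y, S)}{\mathrm{var}(S)}
          = \beta \, \frac{\mathrm{var}(\eta)}{\mathrm{var}(\eta) + \mathrm{var}(e)}
```

Because the ratio is below 1 whenever var(e) > 0, the slope obtained with S is closer to zero than \beta in this setting. The form of the bias for a given scoring method and analysis will generally differ from this simple case, but it arises from the same gap between the score and the factor.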

However, we may also take a weaker view of the role of the latent factors. This is the case if we use the factor analysis model simply as a pragmatic device for deriving a rule for calculating a summary measure of a construct from multiple imperfect indicators of it. The main or sole purpose of the factor analysis is then to calculate that summary measure, i.e. the factor score, so that we can use it in subsequent analyses. This approach also has the practical advantage that it is often much simpler to conduct the analysis in separate steps (deriving the factor scores first, and then using them as observed variables in other analyses) than it is to combine the steps in one analysis.

Factor scores are not used in the main examples of this module. For completeness, we include here an example of how such scores can be calculated and saved in Stata and in R. This example uses a one-factor model for the three indicators of the construct "obligation to obey the police", and then calculates a factor score for that construct. Here the model is fitted and the scores calculated only for respondents in the United Kingdom.

An example of Stata commands for calculating and saving a factor score:


// Example of creating and saving factor scores:
sem (Obey -> bplcdc doplcsy dpcstrb) if cntry=="GB", ///
var(Obey@1) method(mlmv)
predict obeyScore if cntry=="GB", latent(Obey)
// Factor score for latent variable Obey will be called obeyScore.

An example of R commands for calculating and saving a factor score:


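# Example of creating and saving factor scores:
library(lavaan)
ModelSyntax <- 'Obey =~ bplcdc + doplcsy + dpcstrb'
# std.lv = TRUE (assumed here) fixes the variance of Obey at 1, as var(Obey@1) does in the Stata commands.
FittedModel <- sem(model = ModelSyntax, data = ESS5Police[ESS5Police$cntry=="GB",],
                   std.lv = TRUE, meanstructure = TRUE, missing = "ml")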
ESS5Police$obeyScore <- NA
fscores <- lavPredict(FittedModel,type="lv",method="regression")
ESS5Police[ESS5Police$cntry=="GB",][inspect(FittedModel,"case.idx"),"obeyScore"] <- fscores[,"Obey"]
# Factor score for latent variable Obey will be called obeyScore.
# This is calculated only for cases with no missing data on the indicators.
