# Chapter 2: Factor Analysis

### Example 1 on Factor analysis: Models in one country

Consider the data in our example for respondents in the UK, for questions D12–D17. It is proposed that questions D12–D14 measure only the factor "effectiveness of the police" and questions D15–D17 only the factor "procedural fairness of the police". Fit a 2-factor CFA model with this measurement structure and a correlation between the factors, and also fit 1-factor and 2-factor EFA models for these items. Do the models fit the data well? In particular, does the CFA model fit the data well enough to be adequate for subsequent use with these items? Interpret the scales of the two factors and their estimated correlation in the CFA model.

```stata
// Example of fitting factor analysis models, UK data only
// 1-Factor model:
sem (Factor -> plcpvcr plccbrg plcarcr plcrspc plcfrdc plcexdc) ///
    if cntry=="GB", var(Factor@1) method(mlmv)
estat gof, stats(all) // Goodness-of-fit statistics
// 2-factor CFA model:
sem (Effective -> plcpvcr plccbrg plcarcr) ///
    (ProcFair -> plcrspc plcfrdc plcexdc) if cntry=="GB", ///
    var(Effective@1 ProcFair@1) method(mlmv)
estat gof, stats(all)
estimates store cfa2
matrix b=e(b) // Estimates from this model, for use as starting values for EFA
// 2-factor EFA model:
sem (Effective -> plcpvcr plccbrg plcarcr (plcrspc,init(0)) (plcfrdc,init(0))) ///
    (ProcFair -> (plccbrg,init(0)) (plcarcr,init(0)) plcrspc plcfrdc plcexdc) ///
    if cntry=="GB", ///
    var(Effective@1 ProcFair@1) method(mlmv) from(b)
estat gof, stats(all)
lrtest cfa2 . // Likelihood ratio test between this EFA model and the CFA model
```

```r
# Example of fitting factor analysis models, UK data only
library(lavaan)
# 1-Factor model:
ModelSyntax <- '
Factor =~ plcpvcr + plccbrg + plcarcr
+ plcrspc + plcfrdc + plcexdc
'
FittedModel <- sem(model = ModelSyntax,
                   data = ESS5Police[ESS5Police$cntry == "GB", ],
                   std.lv = TRUE, meanstructure = TRUE, missing = "ml")
summary(FittedModel, fit.measures = TRUE)
# 2-factor CFA model:
ModelSyntax <- '
Effective =~ plcpvcr + plccbrg + plcarcr
ProcFair =~ plcrspc + plcfrdc + plcexdc
'
FittedModel.cfa2 <- sem(model = ModelSyntax,
                        data = ESS5Police[ESS5Police$cntry == "GB", ],
                        std.lv = TRUE, meanstructure = TRUE, missing = "ml")
summary(FittedModel.cfa2, fit.measures = TRUE)
# 2-factor EFA model:
ModelSyntax <- '
Effective =~ plcpvcr+plccbrg+plcarcr+plcrspc+plcfrdc+0*plcexdc
ProcFair =~ 0*plcpvcr+plccbrg+plcarcr+plcrspc+plcfrdc+plcexdc
'
FittedModel.efa2 <- sem(model = ModelSyntax,
                        data = ESS5Police[ESS5Police$cntry == "GB", ],
                        std.lv = TRUE, meanstructure = TRUE, missing = "ml")
summary(FittedModel.efa2, fit.measures = TRUE)
# Likelihood ratio test between this EFA model and the CFA model:
anova(FittedModel.cfa2, FittedModel.efa2)
```
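The fit statistics discussed here can also be pulled out of the fitted objects directly instead of being read off the `summary()` output. The following is a minimal sketch, assuming the models have been fitted with lavaan as above. `fit_table` and its `extractor` argument are hypothetical names introduced for illustration; `fitMeasures()` is lavaan's own extractor function.

```r
# Hypothetical helper (not part of lavaan): collect selected fit statistics
# from a named list of fitted models into one matrix.
# `extractor` defaults to lavaan::fitMeasures; it is an argument only so
# that the helper can be tried out without lavaan or the ESS data.
fit_table <- function(models,
                      measures = c("pvalue", "aic", "bic", "rmsea", "cfi"),
                      extractor = lavaan::fitMeasures) {
  stopifnot(is.list(models), !is.null(names(models)))
  # Each call returns a named numeric vector of the requested statistics
  out <- sapply(models, function(m) unclass(extractor(m, measures)))
  t(out)  # one row per model, one column per statistic
}

# Usage, assuming the fitted objects from the code above exist:
# round(fit_table(list("2-factor CFA" = FittedModel.cfa2,
#                      "2-factor EFA" = FittedModel.efa2)), 3)
```

Note that the code above stores the 1-factor fit under the name `FittedModel`, which is the name to use if that model is added to the list.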

Some model assessment statistics are shown in Table 2.1. From them, we observe the following:

• The overall goodness of fit test indicates that all three models fit poorly (with p<0.001 in each case). This is not unusual and is not in itself conclusive, because with moderate to large sample sizes the test is sensitive to even small amounts of lack of fit.
• The 2-factor EFA model differs from the 2-factor CFA model in that the CFA model fixes at 0 several factor loadings which are freely estimated in the EFA model. These are the "cross-loadings" between a factor and the items which are not expected to be indicators of that factor, such as the loading of Effectiveness in the measurement model for item D15 (plcrspc). The likelihood ratio test between these two models is a test of the null hypothesis that all of these cross-loadings are 0 in the population. Here this hypothesis is rejected (p<0.001), indicating that the EFA model fits better than the CFA model. However, this may again be partly due to the sensitivity of the test in moderate to large samples.
• The RMSEA and CFI fit indices suggest that the 1-factor model fits badly, the 2-factor EFA model fits very well, and the 2-factor CFA model fits moderately to well.
• Both the AIC and BIC statistics have their smallest values for the 2-factor EFA model, suggesting that this model achieves the best balance between goodness of fit and parsimony (number of parameters).

Taken together, these results give some support for the conclusion that the 2-factor CFA model with a simple measurement model fits the data on the six survey items adequately in the sample of UK respondents, although there is also evidence that it fits less well than the 2-factor EFA model.
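To make the two fit indices used above more concrete, the sketch below computes RMSEA and CFI from chi-square statistics using their commonly quoted formulas. The function names and all input values are hypothetical and chosen to give round numbers; they are not the ESS estimates. Packages differ on whether the RMSEA denominator uses N or N - 1, so that choice is left as an argument.

```r
# Point estimate of RMSEA from a model chi-square statistic.
# n_adj = 0 uses N in the denominator, n_adj = 1 uses N - 1.
rmsea <- function(chisq, df, n, n_adj = 0) {
  sqrt(max(chisq - df, 0) / (df * (n - n_adj)))
}

# CFI compares the excess chi-square of the fitted model with that of the
# baseline (independence) model, in which all items are uncorrelated.
cfi <- function(chisq, df, chisq_base, df_base) {
  excess_model <- max(chisq - df, 0)
  excess_base  <- max(chisq_base - df_base, excess_model)
  if (excess_base == 0) return(1)
  1 - excess_model / excess_base
}

# Hypothetical inputs (NOT taken from the ESS data). With six items, a
# 2-factor CFA with simple structure has 8 degrees of freedom and the
# baseline model has 15.
rmsea(chisq = 48, df = 8, n = 2000)                       # 0.05
cfi(chisq = 48, df = 8, chisq_base = 2015, df_base = 15)  # 0.98
```

Both indices are 0 and 1 respectively whenever the model chi-square does not exceed its degrees of freedom, which is why a very well-fitting model can show a CFI of exactly 1.000.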

Table 2.1: Model assessment statistics

| Model | LR test vs. saturated model: p-value | AIC | BIC | RMSEA | CFI |
|---|---|---|---|---|---|
| 1-factor model | <0.001 | 39490 | 39595 | 0.188 | 0.808 |
| 2-factor EFA model | <0.001 | 38729 | 38863 | 0.007 | 1.000 |
| 2-factor CFA model | <0.001 | 38781 | 38891 | 0.054 | 0.986 |

LR test, 2-factor EFA vs. 2-factor CFA model: p-value <0.001
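The likelihood ratio test between the CFA and EFA models can be approximately reconstructed from the AIC values in Table 2.1. This is a rough check under two assumptions: that the reported AIC follows the standard definition AIC = -2 logL + 2k, and that rounding of the tabulated values makes the result approximate. The variable names are illustrative only.

```r
# AIC values as reported (rounded) in Table 2.1
aic_cfa <- 38781
aic_efa <- 38729

# Free loadings: 6 in the CFA, 10 in the EFA (4 cross-loadings added).
# All other parameters are common to both models.
extra_params <- 10 - 6

# With AIC = -2 logL + 2k, the LR statistic is
#   -2 (logL_cfa - logL_efa) = (AIC_cfa - AIC_efa) + 2 * (k_efa - k_cfa)
lr_stat <- (aic_cfa - aic_efa) + 2 * extra_params   # approximately 60

# Under the null hypothesis the statistic is chi-squared distributed with
# degrees of freedom equal to the number of extra parameters.
p_value <- pchisq(lr_stat, df = extra_params, lower.tail = FALSE)

lr_stat
p_value < 0.001   # TRUE, in line with the table
```

The exact statistic and p-value are the ones reported by `lrtest` in Stata or `anova()` in lavaan; this calculation only shows how the quantities in the table relate to each other.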

A path diagram for this CFA model, with the estimated parameter values included, is shown in Figure 2.2 (this graph was obtained from Stata, which by default also shows the measurement errors as circled latent variables). All of the factor loadings have positive values. Recalling the coding of the items, this implies that high values of the factors correspond to positive evaluations of the effectiveness and procedural fairness of the police. The estimated correlation of the factors is +0.58, so in the UK individuals who have a positive view of the effectiveness of the police also tend to have a positive view of their procedural fairness.

Figure 2.2: Path diagram and parameter estimates for a 2-factor Confirmatory Factor Analysis model for indicators of Effectiveness and Procedural fairness of the police (UK respondents).
