# Chapter 4: Structural Equation Models

### Introduction

In factor analysis, the focus of interest is on how observed indicators act as measures of latent factors, on average levels and variability of the factors, and on assigning values to the factors for individuals. In structural equation models (SEMs), the focus is extended and shifted toward answering questions about relationships (associations) between the constructs of interest, both latent and observed, rather than their measurement. This is illustrated by Figure 4.1, which shows a path diagram for the SEM which is considered in the examples of this chapter. This model has two core elements:

First, the *structural model* describes the relationships between the constructs of interest. Here those constructs are the five latent variables in the model, and the structural model for them is given by the substantive theoretical model introduced in Chapter 1 (see Figure 1.1). In the diagram, one-headed arrows indicate conditional distributions (regression relationships) where a response variable depends on an explanatory variable, while two-headed arrows indicate correlations (or residual correlations) between variables which are considered to be on an equal footing. Here the two factors *Effectiveness* and *Procedural Fairness* are on an equal footing and are used only as explanatory variables (such variables are also known as *exogenous variables*), while the factors *Obligation to Obey*, *Moral Alignment* and *Co-operation* are response variables which depend on the exogenous variables and/or on each other (i.e. they are *endogenous variables*). Note that here Obligation to Obey and Moral Alignment are "intervening" or "mediating" variables: they are response variables in some relationships (here given the exogenous variables) and explanatory variables in others (here for Co-operation). This ordering of the variables, and the assignment of different roles to them in the structural model, is derived from substantive theory about the constructs.
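As a small illustration that these roles follow from the directions of the arrows alone, the sketch below classifies each factor from the list of one-headed arrows in the structural model: both exogenous factors predict all three endogenous factors, and Obligation to Obey and Moral Alignment also predict Co-operation. The short factor labels are hypothetical shorthand, and the two-headed arrows are left out because they do not affect the roles.

```python
# One-headed arrows (explanatory, response) in the structural model of Figure 4.1.
# Labels are short hypothetical names for the five factors.
PATHS = [
    ("Effectiveness", "Obligation"), ("Effectiveness", "Moral"),
    ("Effectiveness", "Cooperation"),
    ("ProcFair", "Obligation"), ("ProcFair", "Moral"), ("ProcFair", "Cooperation"),
    ("Obligation", "Cooperation"), ("Moral", "Cooperation"),
]


def classify_roles(paths):
    """Map each variable to its role, using only the directions of the arrows."""
    responses = {resp for _, resp in paths}      # variables with an incoming arrow
    explanatory = {expl for expl, _ in paths}    # variables with an outgoing arrow
    roles = {}
    for v in sorted(responses | explanatory):
        if v not in responses:
            roles[v] = "exogenous"
        elif v in explanatory:
            roles[v] = "endogenous (mediating)"
        else:
            roles[v] = "endogenous"
    return roles


for name, role in classify_roles(PATHS).items():
    print(f"{name:<14}{role}")
```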

Second, the *measurement model* describes how any latent constructs are measured by observed indicators. Here we use the fifteen survey indicators introduced in Chapter 1, and assume that each latent factor is measured by three of the indicators as shown in Figure 4.1. The measurement model for each factor is a factor analysis model for its three indicators.
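A minimal sketch of this measurement structure as a pattern of free (1) and fixed-at-zero (0) factor loadings. It assumes, purely for illustration, that the fifteen indicators are ordered in blocks of three, one block per factor; the ordering and the factor labels are hypothetical, and Chapter 1 gives the actual indicators.

```python
import numpy as np

FACTORS = ["Effectiveness", "ProcFair", "Obligation", "Moral", "Cooperation"]
INDICATORS_PER_FACTOR = 3


def loading_pattern(n_factors, per_factor):
    """0/1 matrix: entry (i, j) is 1 if indicator i is allowed to load on factor j."""
    pattern = np.zeros((n_factors * per_factor, n_factors), dtype=int)
    for j in range(n_factors):
        # Hypothetical ordering: indicators 3j, 3j+1, 3j+2 measure factor j.
        pattern[j * per_factor:(j + 1) * per_factor, j] = 1
    return pattern


P = loading_pattern(len(FACTORS), INDICATORS_PER_FACTOR)
print(P.shape)          # (15, 5)
print(P.sum(axis=0))    # number of indicators per factor
print(P.sum(axis=1))    # number of factors per indicator
```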

More generally, some or even all of the explanatory and/or response variables in the structural model of a SEM may be observed rather than latent variables. For example, here we might have included the respondents' age or ethnic group as additional explanatory variables for the latent factors. For such variables a separate measurement model is redundant and is omitted, because the constructs are assumed to be directly observable.

### Research questions

The main focus in analyzing structural equation models is usually on answering questions about the structural model, such as the following:

- What are the signs and strengths of the relationships between the variables in the model?
- Are any of these relationships absent (i.e. can they be set to 0)?
- If substantive theory hypothesizes that some relationships are absent, is this justified?

We may also want to know how answers to these questions vary between groups such as countries. This can be assessed using multigroup structural equation modelling, which is discussed at the end of this chapter.

It is also important to remember that these relationships, as estimated from a SEM fitted to observed data, are estimated statistical associations, not causal effects. The substantive theory which informs the specification of the form of the structural model will often include hypotheses about causal relationships between the constructs, which in part determine the ordering of the variables and their relationships in the structural model. However, the act of specifying such a model, drawing a path diagram for it, and estimating its parameters does not in itself imply that the results can then be interpreted as estimated causal effects. For such a claim to be justified, the data and the research design that produced them will always need to satisfy additional assumptions which cannot be checked from the analysis of the data alone. In this respect the situation for a SEM remains the same as it is for any inference on causal effects from any kind of statistical analysis on observed data.
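The gap between association and causation can be seen in a small simulation. In the hypothetical data-generating process below (observed variables only, for simplicity), *x* has no causal effect on *y*, but both depend on a common cause *z*. A regression of *y* on *x* alone still gives a clearly non-zero coefficient, and only when *z* is also controlled for does the coefficient of *x* fall to about 0. In real data such a confounder may be unmeasured, which is why causal claims need assumptions that cannot be checked from the data alone.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 100_000

# Hypothetical process: z is a common cause of x and y; x has NO effect on y.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 0.8 * z + rng.normal(size=n)


def ols_slopes(y, *predictors):
    """Least-squares slopes of y on the predictors (the intercept is dropped)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


print(ols_slopes(y, x))     # clearly non-zero: an association, not an effect
print(ols_slopes(y, x, z))  # coefficient of x is close to 0 once z is included
```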

### Specification of a SEM

In conventional structural equation modelling – and in standard SEM software – all the variables in the structural model are treated as continuous and normally distributed, and all models for response variables are specified as linear regression models. (The only exception to these assumptions is that any observed variables which are used only as explanatory variables, such as age and gender of a respondent, can be of any type, just like in standard regression modelling.) In principle this is not essential, as the structure and ideas of SEMs can be generalized, for example to cases where some of the variables in the model are treated as categorical. Doing so would involve replacing assumptions of normal distributions and linear models with other assumptions where necessary. Some examples of doing so are discussed in Chapter 5 of this module. In practice, however, the use of such models is constrained by what can be conveniently implemented with current computer software. This is more limited for more general models than it is for the linear SEMs for continuous variables which are the topic of this chapter.

In a conventional linear SEM, all the measurement models are factor analysis models, so they are specified in the way that was described in Chapter 2. Distributions of exogenous latent variables are also specified as in factor analysis, as multivariate normal distributions. So the only new elements which remain to be described here are the models for the endogenous variables, which are all specified as linear regression models.

To illustrate this model specification in a specific situation, consider the example shown in Figure 4.1. Let *η*_{1} and *η*_{2} denote the factors *Effectiveness* and *Procedural Fairness* respectively, and *η*_{3}, *η*_{4}, and *η*_{5} *Obligation to Obey, Moral Alignment* and *Co-operation* respectively (often different Greek letters are used for exogenous and endogenous latent variables, but we do not do that here). The joint distribution of *η*_{1} and *η*_{2} is specified as a bivariate normal distribution, as in factor analysis. The models for the other latent factors are specified as:

$$\eta_3 = \alpha_3 + \beta_{31}\eta_1 + \beta_{32}\eta_2 + \zeta_3,$$

$$\eta_4 = \alpha_4 + \beta_{41}\eta_1 + \beta_{42}\eta_2 + \zeta_4, \quad \text{and}$$

$$\eta_5 = \alpha_5 + \beta_{51}\eta_1 + \beta_{52}\eta_2 + \beta_{53}\eta_3 + \beta_{54}\eta_4 + \zeta_5.$$

Here the *ζ*'s are normally distributed random residuals, each with mean 0 and variances var(*ζ*_{3}) = *ψ*_{3}, var(*ζ*_{4}) = *ψ*_{4} and var(*ζ*_{5}) = *ψ*_{5}. Each *ζ* is assumed to be uncorrelated with the explanatory variables in the model where it appears, and all the *ζ*'s are uncorrelated with all the measurement errors (*ε*'s) in the measurement models for the *η*'s. The residuals for *η*_{3} and *η*_{4} are here assumed to be correlated, with covariance cov(*ζ*_{3}, *ζ*_{4}) = *ψ*_{34} (this corresponds to the two-headed arrow between Obligation to Obey and Moral Alignment in Figure 4.1), but both are assumed to be uncorrelated with *ζ*_{5}.
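To make the specification concrete, the following sketch simulates factor values from these equations using made-up parameter values (not estimates from the ESS data) and then regresses *η*_{5} on *η*_{1}, ..., *η*_{4}. Because the *η*'s are simulated they are treated here as if observed; in a real SEM they are latent, and the *β*'s are estimated jointly with the measurement models. The point is only that, given the assumption that *ζ*_{5} is uncorrelated with the explanatory factors, the *β*'s are ordinary partial regression coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical parameter values, chosen only for illustration.
beta = {"31": 0.4, "32": 0.3, "41": 0.2, "42": 0.5,
        "51": -0.1, "52": 0.3, "53": 0.2, "54": 0.1}
alpha3 = alpha4 = alpha5 = 0.0              # intercepts fixed at 0
psi3, psi4, psi5, psi34 = 1.0, 1.0, 1.0, 0.3

# Exogenous factors: bivariate normal, means 0, variances 1, correlation 0.5.
eta12 = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n)
eta1, eta2 = eta12[:, 0], eta12[:, 1]

# Residuals: zeta3 and zeta4 correlated (psi34), zeta5 independent of both.
zeta34 = rng.multivariate_normal([0, 0], [[psi3, psi34], [psi34, psi4]], size=n)
zeta5 = rng.normal(scale=np.sqrt(psi5), size=n)

eta3 = alpha3 + beta["31"] * eta1 + beta["32"] * eta2 + zeta34[:, 0]
eta4 = alpha4 + beta["41"] * eta1 + beta["42"] * eta2 + zeta34[:, 1]
eta5 = (alpha5 + beta["51"] * eta1 + beta["52"] * eta2
        + beta["53"] * eta3 + beta["54"] * eta4 + zeta5)


def ols(y, *xs):
    """Least-squares coefficients (intercept first) of y on xs."""
    X = np.column_stack([np.ones_like(y), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b5 = ols(eta5, eta1, eta2, eta3, eta4)
print(np.round(b5[1:], 2))   # close to beta_51, beta_52, beta_53, beta_54
```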

These models for the endogenous *η*'s are linear regression models. Their parameters are the intercepts (*α*'s), residual variances and covariances (*ψ*'s), and regression coefficients (*β*'s). The regression coefficients are usually the parameters of main interest. They are interpreted in the same way as in standard linear regression modelling, as partial associations between explanatory variables and response variables. For example, here *β*_{51} can be interpreted as the expected change in Co-operation (in its units of measurement) that is associated with a one-unit increase in *Effectiveness*, controlling for the other three explanatory factors (Procedural Fairness, Obligation to Obey and Moral Alignment). Note also that each *β* describes the strength of one regression path (one-headed arrow) in the path diagram for the model, and a value of 0 for a *β* is indicated by omitting that path from the diagram.

### Identification of SEMs

Conditions for a (single-group) structural equation model to be identified are an extension of the corresponding conditions for factor analysis models. First, to identify the latent scales, it is sufficient to assume that the means and variances of all exogenous latent variables are set to 0 and 1 respectively, and all intercept terms (*α*'s) and residual variances (*ψ*'s) of structural regression models are also set at 0 and 1 respectively. Once this is assumed, remaining conditions for identification depend on the structure of the model. However, the following two-step condition is sufficient and usually easy to check:

1. Re-express the model as a confirmatory factor analysis (CFA) model by replacing all paths in the structural model (whether covariances or regressions) with covariances (two-headed arrows). Then use the identification rules for CFA (see Chapter 2) to check whether this model is identified. If it is, the measurement model of the SEM is identified.
2. Then consider the structural model on its own, and replace each latent variable in it with a single observed variable. If this model is identified and condition 1 holds, the whole SEM is identified.
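For condition 1, one widely used sufficient rule for CFA models is the three-indicator rule described in [Bol89]: with the scales of the factors fixed, a CFA model is identified if every factor has at least three indicators, every indicator loads on exactly one factor, and the measurement errors are uncorrelated, while the factor covariances may be left unrestricted. The sketch below checks this rule from 0/1 patterns of free parameters. It is not assumed to be one of the rules listed in Chapter 2, and a `False` result only means that this particular rule does not apply, not that the model is unidentified.

```python
import numpy as np


def three_indicator_rule(pattern, error_cov_pattern):
    """Check the three-indicator rule for a CFA model.

    pattern: (p, m) 0/1 array, 1 where a factor loading is free.
    error_cov_pattern: (p, p) 0/1 array, 1 where an error (co)variance is free.
    Returns True if the sufficient condition holds (False is inconclusive).
    """
    pattern = np.asarray(pattern)
    theta = np.asarray(error_cov_pattern)
    one_factor_each = (pattern.sum(axis=1) == 1).all()
    three_per_factor = (pattern.sum(axis=0) >= 3).all()
    errors_uncorrelated = not (theta - np.diag(np.diag(theta))).any()
    return bool(one_factor_each and three_per_factor and errors_uncorrelated)


# CFA version of Figure 4.1: 5 factors x 3 indicators, uncorrelated errors.
P = np.kron(np.eye(5, dtype=int), np.ones((3, 1), dtype=int))
print(three_indicator_rule(P, np.eye(15, dtype=int)))   # True

theta = np.eye(15, dtype=int)
theta[2, 3] = theta[3, 2] = 1   # hypothetical error correlation across factors
print(three_indicator_rule(P, theta))                   # False (rule does not apply)
```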

A sufficient condition for condition 2 to hold is that the structural model is *recursive*. This means, roughly, that the model contains no feedback loops. Such loops might take the form of regressions in both directions between two variables, a regression and a residual correlation between the same two variables, or longer chains of variables which allow us to start from a variable and return to it by following the arrows in the path diagram. Non-recursive structural models are conceptually and practically complicated even when they are identified, so they should be avoided unless substantive theory gives very strong reasons to consider them.
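A rough check of this definition can be automated. The sketch below searches the directed graph of one-headed arrows for loops, and flags any residual correlation between two variables that are also joined by a chain of one-headed arrows. It follows the informal description above rather than a formal rule from [Bol89], so it is a screening aid rather than a full identification check; the factor labels are hypothetical shorthand.

```python
# One-headed arrows (explanatory, response) of the structural model in Figure 4.1,
# using short hypothetical labels for the factors.
PATHS = [
    ("Effectiveness", "Obligation"), ("Effectiveness", "Moral"),
    ("Effectiveness", "Cooperation"),
    ("ProcFair", "Obligation"), ("ProcFair", "Moral"), ("ProcFair", "Cooperation"),
    ("Obligation", "Cooperation"), ("Moral", "Cooperation"),
]


def find_feedback(regressions, residual_covs):
    """Return a list of non-recursive features (an empty list means none found)."""
    children = {}
    for src, dst in regressions:
        children.setdefault(src, set()).add(dst)

    def reachable(start, target):
        # Depth-first search along one-headed arrows from start.
        stack, seen = [start], set()
        while stack:
            for nxt in children.get(stack.pop(), ()):
                if nxt == target:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    problems = []
    for v in sorted({v for edge in regressions for v in edge}):
        if reachable(v, v):
            problems.append(f"feedback loop through {v}")
    for a, b in residual_covs:
        if reachable(a, b) or reachable(b, a):
            problems.append(f"arrow path and residual correlation between {a} and {b}")
    return problems


# The model of Figure 4.1, with the residual correlation between Obligation and Moral.
print(find_feedback(PATHS, [("Obligation", "Moral")]))          # []
# Adding a hypothetical reverse path Cooperation -> Obligation creates a loop.
print(find_feedback(PATHS + [("Cooperation", "Obligation")], []))
```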

[Bol89] gives a more detailed account of identification conditions for SEMs.

### Model selection and model assessment for SEMs

An identified structural equation model can be estimated using the same principles as for factor analysis models: the SEM also implies a model for the joint distribution of the observed variables, and the parameter estimates are chosen so as to give the best match between this model-implied distribution and the observed distribution of the sample data. Most often this is done using maximum likelihood (ML) estimation. Examples of the computer commands for fitting SEMs are given in Examples 1 and 2 of this chapter.
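To show what "model-implied distribution" means for a linear SEM, write the structural model in matrix form as $\eta = \alpha + B\eta + \zeta$ and the measurement models as $y = \nu + \Lambda\eta + \varepsilon$. The implied covariance matrix of the indicators is then $\Sigma = \Lambda (I - B)^{-1} \Psi (I - B)^{-T} \Lambda^\top + \Theta$, where here $\Psi$ holds the covariance matrix of the exogenous factors together with the residual (co)variances of the endogenous ones, and $\Theta$ is the covariance matrix of the measurement errors. Ignoring the mean structure, ML estimation minimizes the standard discrepancy $F_{ML} = \log|\Sigma| + \mathrm{tr}(S\Sigma^{-1}) - \log|S| - p$ between $\Sigma$ and the sample covariance matrix $S$ of the $p$ indicators. The matrix notation ($B$, $\Lambda$, $\Theta$, $\nu$) is introduced only for this sketch, the parameter values are hypothetical, and the code evaluates $F_{ML}$ rather than minimizing it.

```python
import numpy as np


def implied_cov(Lambda, B, Psi, Theta):
    """Model-implied covariance matrix of the observed indicators of a linear SEM."""
    m = B.shape[0]
    A = np.linalg.inv(np.eye(m) - B)      # (I - B)^{-1}
    cov_eta = A @ Psi @ A.T               # covariance matrix of the latent factors
    return Lambda @ cov_eta @ Lambda.T + Theta


def f_ml(S, Sigma):
    """ML discrepancy log|Sigma| + tr(S Sigma^-1) - log|S| - p (0 when S == Sigma)."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p


# Hypothetical parameter values; factors ordered eta1, ..., eta5.
B = np.zeros((5, 5))
B[2, [0, 1]] = [0.4, 0.3]             # eta3 on eta1, eta2
B[3, [0, 1]] = [0.2, 0.5]             # eta4 on eta1, eta2
B[4, :4] = [-0.1, 0.3, 0.2, 0.1]      # eta5 on eta1, ..., eta4
Psi = np.eye(5)                       # variances fixed at 1 to set the scales
Psi[0, 1] = Psi[1, 0] = 0.5           # covariance of the exogenous factors
Psi[2, 3] = Psi[3, 2] = 0.3           # psi_34
Lambda = np.kron(np.eye(5), np.full((3, 1), 0.8))   # 15 x 5, three indicators each
Theta = np.diag(np.full(15, 0.36))    # uncorrelated measurement errors

Sigma = implied_cov(Lambda, B, Psi, Theta)
print(Sigma.shape)                            # (15, 15)
print(abs(f_ml(Sigma, Sigma)) < 1e-10)        # True: perfect fit gives 0
```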

For model selection for SEMs, we recommend the following two-step approach:

1. Identify a sufficiently well-fitting and interpretable measurement model separately for each set of observed indicators which you expect on theoretical grounds to be measures of one or more common factors. In other words, indicators which are definitely regarded as measures of different constructs, and which you would never consider combining in one summary measure of anything, are examined separately at this stage. This step is carried out using methods of model assessment for factor analysis models, as discussed in Chapter 2.
2. Using the forms of the measurement models identified in Step 1 (i.e. fixing their patterns of zero and non-zero factor loadings to specify which indicators measure which factors), fit full structural equation models to estimate the parameters of both the measurement and structural models. Use likelihood ratio tests or z-tests of coefficients to compare different specifications of the structural model, in particular to test whether some paths (regression coefficients) in this model may be set to 0.
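SEM software reports both kinds of test directly; the sketch below only shows the arithmetic behind them. The likelihood ratio statistic is twice the difference in maximized log-likelihoods of two nested models, compared with a chi-squared distribution whose degrees of freedom equal the number of constrained parameters, and the z-test divides an estimate by its standard error. The chi-squared tail probability is computed with the standard library only, using the recurrence for the regularized upper incomplete gamma function; all numbers in the usage lines are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start at a = 1/2 (df = 1) or a = 1 (df = 2) and step up with the
    # recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    # where Q is the regularized upper incomplete gamma function and a = df / 2.
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(y)), 0.5
    else:
        q, a = math.exp(-y), 1.0
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def lr_test(loglik_restricted, loglik_full, df):
    """LR statistic and p-value for a restricted model nested in a fuller one."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf(stat, df)


def z_test(estimate, std_error):
    """z statistic and two-sided p-value for H0: coefficient = 0."""
    z = estimate / std_error
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical numbers, e.g. testing whether one structural path can be set to 0.
stat, p = lr_test(loglik_restricted=-10234.5, loglik_full=-10231.2, df=1)
print(f"LR = {stat:.2f}, p = {p:.4f}")
z, p_z = z_test(estimate=0.25, std_error=0.10)
print(f"z = {z:.2f}, p = {p_z:.4f}")
```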

This approach avoids the common but rather unhelpful practice of using overall goodness-of-fit statistics of the kinds discussed in Chapter 2 (RMSEA, CFI, overall goodness-of-fit test statistics and so on) to assess the fit of the whole SEM at once. The problem with this practice is that if the model already includes all structural paths and adequate measurement models for all sets of indicators which belong together, and model assessment statistics still appear unsatisfactory, the only way to improve the "fit" of the model would be to add factor loadings or error correlations linking factors and/or indicators which refer to theoretically distinct constructs (for example, a loading of an indicator of co-operation on the factor for effectiveness of the police in our example). This is not something that we would or should do in practice, so model assessment which could lead to it is not very constructive.

### Multigroup structural equation models

Like factor analysis models, SEMs can be generalized to a multigroup version to examine how parameters of the model may vary between groups such as countries. For the measurement models and the distribution of exogenous latent factors this is done as in multigroup factor analysis, in the ways that were explained in Chapter 3. So the only new element that needs to be explained here is the multigroup extension of structural regression models for endogenous factors. To illustrate this step, consider the model for factor η_{5} (willingness to co-operate with the police) in our example. The multigroup version of this model is

$$\eta_5 = \alpha_5^{(g)} + \beta_{51}^{(g)}\eta_1 + \beta_{52}^{(g)}\eta_2 + \beta_{53}^{(g)}\eta_3 + \beta_{54}^{(g)}\eta_4 + \zeta_5$$

for a respondent in group *g* = 1, ..., *G* where *ζ*_{5} is normally distributed with mean 0 and variance *ψ*_{5}^{(g)}. In other words, all the parameters of this model may vary between the groups. The intercept and residual variance again need to be fixed in one group (for example, by fixing *α*_{5}^{(1)} = 0 and *ψ*_{5}^{(1)} = 1) but can be freely estimated in the other groups.

Likelihood ratio tests can be used to compare models where parameters do and do not vary between the groups, for example to test the model above against one where ψ_{5}^{(g)} = ψ_{5} in all groups *g* (i.e. where the residual variance of *η*_{5} is the same in every group).

Typically the most interesting parameters are the regression coefficients *β*. Cross-group variation in them indicates that associations between a response variable (e.g. *η*_{5} above) and its explanatory variables are of different strengths in different groups. In the language of regression models, such variation thus indicates an *interaction* between the group and the explanatory variables in a model. An illustration of estimating and testing such interactions in a cross-national analysis is given in Example 3 of this chapter.
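The link with interactions can be checked in the simplest observed-variable case. In the sketch below (hypothetical simulated data, with an observed *x* standing in for a latent explanatory factor), fitting a separate regression in each group gives exactly the same slopes as one pooled regression that includes group dummies and group-by-*x* interaction terms, so testing whether the slopes are equal across groups is a test of the group-by-*x* interaction. Unlike the multigroup SEM, the pooled least-squares fit assumes a common residual variance, but this does not change the point estimates of the slopes.

```python
import numpy as np

rng = np.random.default_rng(7)
slopes = {0: 0.2, 1: 0.5, 2: 0.5}        # hypothetical group-specific coefficients
n_per_group = 500

g = np.repeat(list(slopes), n_per_group)  # group label for each respondent
x = rng.normal(size=g.size)
y = np.array([slopes[k] for k in g]) * x + rng.normal(size=g.size)

# (a) Separate regressions, one per group.
separate = []
for k in slopes:
    Xk = np.column_stack([np.ones(n_per_group), x[g == k]])
    coef, *_ = np.linalg.lstsq(Xk, y[g == k], rcond=None)
    separate.append(coef[1])

# (b) One pooled regression with group dummies and group-by-x interactions.
D = np.column_stack([(g == k).astype(float) for k in slopes])
X = np.column_stack([D, D * x[:, None]])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pooled = coef[len(slopes):]               # group-specific slopes of x

print(np.allclose(separate, pooled))      # True
```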

### Example 1 on Structural equation modelling: A model for one country

Fit the structural equation model shown in Figure 4.1, for ESS data on respondents from the United Kingdom only. Interpret the signs of the regression coefficients in the structural model. Are all of these coefficients significantly different from 0?

Figure 4.2 shows a path diagram of the structural model, with estimated values of the regression coefficients associated with each of the paths. All of the coefficients are significantly different from 0 (at the 5% level of significance), except for the association between Moral alignment and Co-operation, which is not significant in this model that also includes the other three factors as explanatory variables. With one exception, the estimated regression coefficients are positive, indicating that more positive assessments of the police and greater willingness to co-operate with them tend to go together. The exception is the coefficient of Effectiveness in the model for Co-operation. It is negative, indicating that individuals who feel that the police are effective in their work tend to be less willing to co-operate with the police, at least after we control for the individuals' assessments of Procedural fairness, Obligation to obey and Moral alignment.

*Figure 4.2: Estimated regression coefficients in the structural model in Example 1, fitted to data on UK respondents.*

### Example 2 on Structural equation modelling: The same model fitted separately for different countries

Fit the model considered in Example 1 separately for data from each of the countries in the ESS, and compare the results.

As one illustration of the results from this analysis, Table 4.1 shows the estimated regression coefficients and their levels of significance for the structural model for Co-operation with the police. Because these models were estimated separately for each country rather than in one multigroup model, the scales of the latent variables are not fixed to be comparable across countries. The exact values of the regression coefficients therefore cannot be compared between the countries. However, we can compare the qualitative patterns of the results, for example the signs and levels of significance of the same coefficient in different countries.

There is a fair amount of variation in the levels of significance of the coefficients: each of the explanatory factors is a significant predictor of willingness to co-operate in many countries, but none of them is significant in all countries. The factor which is significant in the largest number of countries is the person's trust in the procedural fairness of the police. The directions of the associations are largely consistent, in that a coefficient which is significant usually has the same sign in all the countries where it is significant. The clearest exception is trust in the effectiveness of the police, which has a significantly negative coefficient in some countries and a significantly positive one in others. In addition, Obligation to obey has a significantly negative coefficient in Slovenia, whereas it is significantly positive in all the other countries where it is significant.

*Table 4.1: Estimated coefficients for the structural model for the factor on Willingness to co-operate with the police, estimated as part of the structural equation model shown in Figure 4.1, separately for each country in the ESS.*

| Country | Effectiveness | Procedural fairness | Obligation to obey | Moral alignment |
|---|---|---|---|---|
| Belgium (BE) | -0.005 | 0.089 | 0.007 | 0.047 |
| Bulgaria (BG) | -0.028 | 0.148** | 0.024 | 0.075** |
| Switzerland (CH) | -0.078 | 0.197** | 0.004 | 0.028 |
| Cyprus (CY) | -0.124* | 0.249*** | -0.050 | -0.063 |
| Czech Republic (CZ) | 0.012 | 0.123** | 0.023 | 0.086** |
| Germany (DE) | 0.030 | 0.130*** | -0.001 | 0.053* |
| Denmark (DK) | -0.126** | 0.123* | 0.123*** | 0.063 |
| Estonia (EE) | -0.091 | 0.243*** | 0.054 | 0.102** |
| Spain (ES) | -0.040 | 0.141* | 0.046 | -0.021 |
| Finland (FI) | -0.125** | 0.137** | 0.144*** | 0.098** |
| France (FR) | -0.163** | 0.174** | 0.071* | 0.051 |
| United Kingdom (GB) | -0.089* | 0.222*** | 0.106*** | -0.022 |
| Greece (GR) | 0.091* | 0.074 | -0.018 | 0.012 |
| Croatia (HR) | 0.066 | 0.138* | -0.019 | -0.034 |
| Hungary (HU) | -0.049 | 0.073 | 0.113*** | 0.090* |
| Ireland (IE) | 0.010 | 0.126** | 0.013 | 0.124*** |
| Israel (IL) | -0.045 | 0.100* | 0.137*** | 0.008 |
| Lithuania (LT) | 0.107* | -0.007 | 0.016 | 0.039 |
| Netherlands (NL) | -0.049 | 0.135** | 0.104*** | -0.005 |
| Norway (NO) | -0.115** | 0.116 | 0.050 | 0.130** |
| Poland (PL) | -0.094* | 0.094 | 0.027 | 0.047 |
| Portugal (PT) | -0.029 | 0.023 | 0.005 | -0.012 |
| Russia (RU) | 0.063 | 0.094* | 0.016 | 0.022 |
| Sweden (SE) | -0.088* | 0.166* | 0.077* | 0.047 |
| Slovenia (SI) | 0.137** | 0.027 | -0.100** | -0.025 |
| Slovakia (SK) | -0.004 | 0.248*** | 0.023 | -0.054 |
| Ukraine (UA) | -0.052 | 0.106* | -0.023 | 0.096** |

*Note:* \*\*\* *p* < 0.001; \*\* *p* < 0.01; \* *p* < 0.05.
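As a check on the qualitative comparisons in the text, the sketch below transcribes Table 4.1 and counts, for each explanatory factor, the countries where its coefficient is significant at the 5% level (marked with at least one asterisk), together with the signs of those significant coefficients. The values are stored as strings exactly as printed in the table.

```python
# Table 4.1: country -> (Effectiveness, Procedural fairness,
# Obligation to obey, Moral alignment); "*" marks significance at p < 0.05.
TABLE = {
    "BE": ("-0.005", "0.089", "0.007", "0.047"),
    "BG": ("-0.028", "0.148**", "0.024", "0.075**"),
    "CH": ("-0.078", "0.197**", "0.004", "0.028"),
    "CY": ("-0.124*", "0.249***", "-0.050", "-0.063"),
    "CZ": ("0.012", "0.123**", "0.023", "0.086**"),
    "DE": ("0.030", "0.130***", "-0.001", "0.053*"),
    "DK": ("-0.126**", "0.123*", "0.123***", "0.063"),
    "EE": ("-0.091", "0.243***", "0.054", "0.102**"),
    "ES": ("-0.040", "0.141*", "0.046", "-0.021"),
    "FI": ("-0.125**", "0.137**", "0.144***", "0.098**"),
    "FR": ("-0.163**", "0.174**", "0.071*", "0.051"),
    "GB": ("-0.089*", "0.222***", "0.106***", "-0.022"),
    "GR": ("0.091*", "0.074", "-0.018", "0.012"),
    "HR": ("0.066", "0.138*", "-0.019", "-0.034"),
    "HU": ("-0.049", "0.073", "0.113***", "0.090*"),
    "IE": ("0.010", "0.126**", "0.013", "0.124***"),
    "IL": ("-0.045", "0.100*", "0.137***", "0.008"),
    "LT": ("0.107*", "-0.007", "0.016", "0.039"),
    "NL": ("-0.049", "0.135**", "0.104***", "-0.005"),
    "NO": ("-0.115**", "0.116", "0.050", "0.130**"),
    "PL": ("-0.094*", "0.094", "0.027", "0.047"),
    "PT": ("-0.029", "0.023", "0.005", "-0.012"),
    "RU": ("0.063", "0.094*", "0.016", "0.022"),
    "SE": ("-0.088*", "0.166*", "0.077*", "0.047"),
    "SI": ("0.137**", "0.027", "-0.100**", "-0.025"),
    "SK": ("-0.004", "0.248***", "0.023", "-0.054"),
    "UA": ("-0.052", "0.106*", "-0.023", "0.096**"),
}
PREDICTORS = ("Effectiveness", "Procedural fairness", "Obligation to obey", "Moral alignment")


def summarize(table):
    """Count significant coefficients per predictor, split by sign."""
    summary = {}
    for j, name in enumerate(PREDICTORS):
        significant = [row[j] for row in table.values() if row[j].endswith("*")]
        n_positive = sum(not cell.startswith("-") for cell in significant)
        summary[name] = {"significant": len(significant),
                         "positive": n_positive,
                         "negative": len(significant) - n_positive}
    return summary


for name, counts in summarize(TABLE).items():
    print(f"{name:<20} {counts}")
```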

### Example 3 on Structural equation modelling: A multigroup model

Fit the structural equation model considered in Examples 1 and 2, but now as a multigroup model fitted to the data from Denmark, Norway and Sweden (we limit the analysis to these countries to keep the example simple). Using likelihood ratio tests, examine whether the regression coefficients in the structural model vary significantly between these countries.

The command and output files for Stata and R show examples of how these multigroup models can be fitted and likelihood ratio tests can be used to test cross-country constraints of equality for the regression coefficients. In all of these models we specify full cross-national invariance of measurement for all of the indicators of the factors. In the structural models, we allow free cross-national variation in the parameters of the distributions of the exogenous factors (Effectiveness and Procedural fairness) and in all the intercept terms and the residual variances and residual covariances of the models for the endogenous factors.

Consider for example the comparison between the most flexible and the most restrictive structural models here, that is, the model where all of the regression coefficients vary between the countries and the model where none of them do. The likelihood ratio test between these models has a p-value of p = 0.56, which is not significant at any conventional level of significance. This means that we do not reject the null hypothesis that each of the coefficients is equal across the countries, i.e. that the associations between the latent factors in this model are of equal strength in Denmark, Norway and Sweden. The common estimated coefficients from this model are shown in Figure 4.3. They are all statistically significant and all positive in sign, except that the coefficient of Effectiveness in the model for Co-operation is negative, as it was for many countries in Example 2, including Denmark, Norway and Sweden themselves.
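Assuming that the two models differ only in the eight structural regression coefficients (estimated freely in each of the three countries versus once in common), the likelihood ratio test has 8 × (3 − 1) = 16 degrees of freedom. The sketch below computes this and the corresponding 5% critical value; the reported p = 0.56 means that the observed statistic lies below it. The chi-squared helper is a standard-library reimplementation, not code taken from the Stata or R files.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), a = df / 2.
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(y)), 0.5
    else:
        q, a = math.exp(-y), 1.0
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_critical(alpha, df, lo=0.0, hi=1000.0, tol=1e-10):
    """Value c with P(X > c) = alpha, found by bisection (chi2_sf decreases in x)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n_coefficients = 8   # beta_31, beta_32, beta_41, beta_42, beta_51, beta_52, beta_53, beta_54
n_groups = 3         # Denmark, Norway, Sweden
df = n_coefficients * (n_groups - 1)
crit = chi2_critical(0.05, df)
print(f"df = {df}, 5% critical value = {crit:.2f}")
```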

*Figure 4.3: Estimated regression coefficients in the structural model in Example 3, fitted to data on respondents from Denmark, Norway and Sweden, and with these coefficients constrained to be equal across the countries.*

- [Bol89] Bollen, K. A. (1989). *Structural Equations with Latent Variables*. Wiley.