# Chapter 4: Structural Equation Models

### Specification of a SEM

In conventional structural equation modelling – and in standard SEM software – all the variables in the structural model are treated as continuous and normally distributed, and all models for response variables are specified as linear regression models. (The only exception to these assumptions is that any observed variables which are used only as explanatory variables, such as age and gender of a respondent, can be of any type, just like in standard regression modelling.) In principle these assumptions are not essential, as the structure and ideas of SEMs can be generalized, for example to cases where some of the variables in the model are treated as categorical. Doing so would involve replacing assumptions of normal distributions and linear models with other assumptions where necessary. Some examples of doing so are discussed in Chapter 5 of this module. In practice, however, the use of such models is constrained by what can be conveniently implemented with current computer software. Software support is more limited for these more general models than it is for the linear SEMs for continuous variables which are the topic of this chapter.

In a conventional linear SEM, all the measurement models are factor analysis models, so they are specified in the way that was described in Chapter 2. Distributions of exogenous latent variables are also specified as in factor analysis, as multivariate normal distributions. So the only new elements which remain to be described here are the regression models for the endogenous variables. These are all specified as linear regression models.

To illustrate this model specification in a specific situation, consider the example shown in Figure 4.1. Let *η*_{1} and *η*_{2} denote the factors *Effectiveness* and *Procedural Fairness* respectively, and *η*_{3}, *η*_{4}, and *η*_{5} *Obligation to Obey, Moral Alignment* and *Co-operation* respectively (often different Greek letters are used for exogenous and endogenous latent variables, but we do not do that here). The joint distribution of *η*_{1} and *η*_{2} is specified as a bivariate normal distribution, as in factor analysis. The models for the other latent factors are specified as:

*η*_{3} = *α*_{3} + *β*_{31}*η*_{1} + *β*_{32}*η*_{2} + *ζ*_{3},

*η*_{4} = *α*_{4} + *β*_{41}*η*_{1} + *β*_{42}*η*_{2} + *ζ*_{4}, and

*η*_{5} = *α*_{5} + *β*_{51}*η*_{1} + *β*_{52}*η*_{2} + *β*_{53}*η*_{3} + *β*_{54}*η*_{4} + *ζ*_{5}.

Here the *ζ*'s are normally distributed random residuals, each with mean 0, and with variances var(*ζ*_{3}) = *ψ*_{3}, var(*ζ*_{4}) = *ψ*_{4} and var(*ζ*_{5}) = *ψ*_{5}. Each *ζ* is assumed to be uncorrelated with the explanatory variables in the model where it appears, and all the *ζ*'s are uncorrelated with all the measurement errors (*ε*'s) in the measurement models for the *η*'s. The residuals for *η*_{3} and *η*_{4} are here assumed to be correlated, with covariance cov(*ζ*_{3},*ζ*_{4}) = *ψ*_{34} (this corresponds to the two-headed arrow between Obligation to Obey and Moral Alignment in Figure 4.1), but both are assumed to be uncorrelated with *ζ*_{5}.
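The three regression equations and the residual assumptions can also be collected into a single matrix statement, which some readers may find easier to compare with the path diagram. The display below is a sketch of that compact form. The matrix names **Γ**, **B** and **Ψ** are labels introduced here for convenience and are not notation used elsewhere in this chapter, and *ψ*_{3}, *ψ*_{4} and *ψ*_{5} denote the residual variances of *ζ*_{3}, *ζ*_{4} and *ζ*_{5}.

```latex
\[
\begin{pmatrix}\eta_3\\ \eta_4\\ \eta_5\end{pmatrix}
=
\begin{pmatrix}\alpha_3\\ \alpha_4\\ \alpha_5\end{pmatrix}
+
\underbrace{\begin{pmatrix}
\beta_{31} & \beta_{32}\\
\beta_{41} & \beta_{42}\\
\beta_{51} & \beta_{52}
\end{pmatrix}}_{\boldsymbol{\Gamma}}
\begin{pmatrix}\eta_1\\ \eta_2\end{pmatrix}
+
\underbrace{\begin{pmatrix}
0 & 0 & 0\\
0 & 0 & 0\\
\beta_{53} & \beta_{54} & 0
\end{pmatrix}}_{\mathbf{B}}
\begin{pmatrix}\eta_3\\ \eta_4\\ \eta_5\end{pmatrix}
+
\begin{pmatrix}\zeta_3\\ \zeta_4\\ \zeta_5\end{pmatrix},
\qquad
\operatorname{cov}(\boldsymbol{\zeta}) = \boldsymbol{\Psi} =
\begin{pmatrix}
\psi_3 & \psi_{34} & 0\\
\psi_{34} & \psi_4 & 0\\
0 & 0 & \psi_5
\end{pmatrix}.
\]
```

The zeros in **B** correspond to regression paths among the endogenous factors that are absent from the model, and the zeros in **Ψ** express the assumption that *ζ*_{5} is uncorrelated with *ζ*_{3} and *ζ*_{4}.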

These models for the endogenous *η*'s are linear regression models. Their parameters are the intercepts (*α*'s), residual variances and covariances (*ψ*'s), and regression coefficients (*β*'s). The regression coefficients are usually the parameters of main interest. They are interpreted in the same way as in standard linear regression modelling, as partial associations between explanatory variables and response variables. For example, here *β*_{51} can be interpreted as the expected change in *Co-operation* (in its units of measurement) that is associated with a one-unit increase in *Effectiveness*, controlling for the other three explanatory factors in that model (*Procedural Fairness*, *Obligation to Obey* and *Moral Alignment*). Note also that each *β* describes the strength of one regression path (one-headed arrow) in the path diagram for the model, and a value of 0 for a *β* would be indicated by omitting that path from the diagram.
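To make the interpretation of the *β*'s as partial associations more concrete, the following is a minimal simulation sketch of the structural model only, without any measurement models. All parameter values and variable names in it are hypothetical and chosen purely for illustration; they are not estimates from the example in Figure 4.1. The sketch generates the five factors from the equations above and then regresses *η*_{5} on the other four. This is possible only because the factors are simulated and therefore known; in real data they are latent and the *β*'s have to be estimated with SEM software.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # number of simulated respondents

# All numerical values below are hypothetical, for illustration only.
phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])        # covariance matrix of (eta1, eta2)
psi = np.array([[0.5, 0.2, 0.0],
                [0.2, 0.6, 0.0],
                [0.0, 0.0, 0.4]])   # covariance matrix of (zeta3, zeta4, zeta5)
a3, a4, a5 = 0.0, 0.0, 0.0          # intercepts (alpha's)
b31, b32 = 0.3, 0.5                 # paths into eta3
b41, b42 = 0.2, 0.6                 # paths into eta4
b51, b52, b53, b54 = 0.1, 0.2, 0.4, 0.3  # paths into eta5

# Exogenous factors: bivariate normal with mean 0.
eta1, eta2 = rng.multivariate_normal(np.zeros(2), phi, size=n).T
# Residuals: zeta3 and zeta4 are correlated, zeta5 is uncorrelated with both.
zeta3, zeta4, zeta5 = rng.multivariate_normal(np.zeros(3), psi, size=n).T

# Endogenous factors, following the three regression equations.
eta3 = a3 + b31 * eta1 + b32 * eta2 + zeta3
eta4 = a4 + b41 * eta1 + b42 * eta2 + zeta4
eta5 = a5 + b51 * eta1 + b52 * eta2 + b53 * eta3 + b54 * eta4 + zeta5

# Least-squares regression of eta5 on an intercept and eta1..eta4.
X = np.column_stack([np.ones(n), eta1, eta2, eta3, eta4])
coef, *_ = np.linalg.lstsq(X, eta5, rcond=None)
print(np.round(coef, 2))  # estimates of (a5, b51, b52, b53, b54)
```

With a large simulated sample the fitted coefficients should be close to the values used to generate the data. In particular, the coefficient of `eta1` approximates `b51`, the association between `eta5` and `eta1` with the other three factors held fixed. Setting any one of the `b` values to 0 corresponds to removing that arrow from the path diagram.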