# Correction for measurement errors in causal models with complex concepts

Social scientists often prefer to work with complex concepts measured by several variables rather than with simple concepts that can be measured by a single direct question. For such complex concepts, a composite score or index is calculated from the questions that serve as indicators of the concept, with the aim of obtaining a reliable and valid measure of a latent, theoretical construct.

In this chapter, we show that the quality of these composite scores is not perfect, but that their quality can also be derived and used in the analysis, in the same way as shown in the previous chapters.

Composite scores can be calculated in many different ways, but the most common procedures are weighted or unweighted sums or means. We apply this to the variables used in the previous chapters, where three evaluative variables were used to measure the evaluation of democracy in a country: Free, Critic and Equal. We define a new concept, 'Democracy level', based on these three observed variables. To obtain a score for this new variable, we could add up the scores of the three variables (with weights, if these are known) or take the mean of their scores. For our analysis, we use the simple sum score to build the new variable:

$$\text{Democracy level} = \text{Free} + \text{Critic} + \text{Equal} \qquad (7.1)$$

In this way, we can simplify the causal model from the last chapter by replacing the three variables measuring the evaluation of democracy in a country with a single variable, the composite score (CS) Democracy level (Demlevel). The new model is presented in Figure 7.1.

Figure 7.1: The simplified causal model with the democracy level composite score substituted for the three variables for the evaluation of democracy

To continue, we compute the scores of the new variable for all respondents in the UK. The procedure is illustrated below for SPSS and Stata; the same results can be obtained with any other statistical package.

1. In the following link, you will find the dataset 'CME data – ESSround 6'.
Open this dataset in SPSS[1]:

GET FILE='C:\...\CME data_ESSround6_F1.sav'.
2. In order to create a composite score from the three variables Free, Critic and Equal, we have to select the option ‘Compute variable’ from the SPSS heading ‘Transform’. In this screen, we have to give a name to the new variable in the box ‘Target variable’. We will call it ‘demlevel’. Furthermore, by clicking the option ‘Type and label’, we can label the variable as ‘Democracy level’ and mark the numeric type. Next, click Continue and add the numeric expression that will create this new variable. In this case, we will add the following notation: fairelcc + oppcrgvc + cttresac. Click OK and the variable will be created.
COMPUTE demlevel=fairelcc + oppcrgvc + cttresac.
VARIABLE LABELS demlevel 'Democracy level'.
EXECUTE.
3. Save this dataset under the name ‘CME data_ESSround6_withDemlevel’, because we are going to use it in the next steps.
SAVE OUTFILE='C:\...\CME data_ESSround6_withDemlevel.sav'
/COMPRESSED.

1. In the following link, you will find the dataset 'CME data – ESSround 6'.
Open this dataset in Stata[2]:

use "C:\...\CME data_ESSround6_F1.dta", clear
2. Tabulating the variables of interest before computing the composite score shows that Satdem [stfdem], Free [fairelcc], Critic [oppcrgvc], Equal [cttresac], LRplace [lrscale] and Inc [hinctnta] contain Refusal, Don't know and No answer codes (77, 88 and 99) that should be set to missing. This can be done with the command 'mvdecode', which accepts several codes at once.
mvdecode stfdem fairelcc oppcrgvc cttresac lrscale hinctnta, mv(77 88 99)
3. To create a composite score from the three variables Free, Critic and Equal, we use the Stata command 'generate' (abbreviated 'gen'), which accepts any numeric expression. Here, the new variable 'demlevel' is the sum of Free [fairelcc], Critic [oppcrgvc] and Equal [cttresac]; if any of the three is missing, demlevel is missing as well. To keep the dataset consistent with the SPSS version, we label demlevel 'Democracy level' with the command 'label variable'.
gen demlevel = fairelcc + oppcrgvc + cttresac
label variable demlevel "Democracy level"
4. Save this dataset under the name ‘CME data_ESSround6_withDemlevel’, because we are going to use it in the next steps. To do so, use the command ‘save’.
save "C:\...\CME data_ESSround6_withDemlevel.dta", replace
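Steps 2 and 3 above (decoding the missing-value codes, then summing) can be sketched in plain Python. This is a minimal illustration, not part of the original workflow: it assumes that 77, 88 and 99 are the only missing-value codes, and the respondent answers are invented.

```python
# Minimal sketch of the Stata steps above (mvdecode + gen), in plain Python.
# Assumption: 77, 88 and 99 are the only missing-value codes (as decoded above).
MISSING_CODES = {77, 88, 99}


def decode(value):
    """Return None for a missing-value code, otherwise the value unchanged."""
    return None if value in MISSING_CODES else value


def demlevel(free, critic, equal):
    """Unweighted sum score of equation 7.1.

    Like Stata's `gen`, the result is missing (None) if any indicator is missing.
    """
    parts = [decode(v) for v in (free, critic, equal)]
    if any(p is None for p in parts):
        return None
    return sum(parts)


# Invented answers (Free, Critic, Equal) on 0-10 scales, for illustration only
respondents = [(8, 7, 5), (10, 88, 6), (6, 6, 77)]
scores = [demlevel(*r) for r in respondents]
print(scores)  # [20, None, None]
```

Propagating the missing value, rather than summing the remaining answers, keeps every composite score on the same 0 to 30 range.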

Having created the composite score [demlevel] for the concept 'Democracy level', we can now use any statistical program to compute the correlation matrix for the four variables of the model specified in Figure 7.1. The results are presented in Table 7.1. The steps below show how to reproduce the table in SPSS or Stata.

1. Use the dataset you created above: 'CME data_ESSround6_withDemlevel'. You can also download it from this link. Open this dataset in SPSS[3]:
GET FILE='C:\...\CME data_ESSround6_withDemlevel.sav'.
2. First, select the cases under study in our analysis, i.e. the whole British population. From the Data menu in SPSS, choose 'Select Cases…'. To limit the analysis to Great Britain, choose 'If condition is satisfied', select the variable 'Country' and enter the condition cntry = 'GB'.
COMPUTE filter_$=(cntry="GB").
VARIABLE LABELS filter_$ 'cntry="GB" (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
3. The ESS recommends taking the design weights into account to correct for specific characteristics of the sampling design that may otherwise bias the results. In SPSS, under the Data menu, select 'Weight cases' and weight the cases by the variable 'Design weight [dweight]'.
WEIGHT BY dweight.
4. To obtain the correlation matrix, choose ‘Correlate’ from Analyze and then click ‘Bivariate…’. From the list, select the variables in the following order: Satdem [stfdem], Demlevel [demlevel], LRplace [lrscale] and Inc [hinctnta]. Once the variables are selected, from Options, choose the option ‘Exclude cases listwise’ to obtain the results for the same cases in the sample.
CORRELATIONS
/VARIABLES=stfdem demlevel lrscale hinctnta
/PRINT=TWOTAIL NOSIG
/MISSING=LISTWISE.
5. This procedure should lead to the result in the following table:

1. Use the dataset you created above: 'CME data_ESSround6_withDemlevel'. You can also download it from this link. Open this dataset in Stata[4]:
use "C:\...\CME data_ESSround6_withDemlevel.dta", clear
2. Select the cases under study. They concern the whole British population. Therefore, in Stata we can use the command ‘keep if’ to indicate that we will keep all observations that, for the variable ‘Country (cntry)’, have the value ‘GB’.
keep if cntry=="GB"
3. To obtain the correlation matrix in Stata, we use the command 'pwcorr' with the four variables under analysis in the following order: Satdem [stfdem], Demlevel [demlevel], LRplace [lrscale] and Inc [hinctnta]. The design weights are applied as analytic weights ('aweight'), and the option 'listwise' ensures that all correlations are computed for the same cases.
pwcorr stfdem demlevel lrscale hinctnta ///
    [aweight=dweight], listwise

This procedure should lead to the result in the following table:

Table 7.1: The correlation matrix for the variables in Figure 7.1, ESS Round 6[5], Great Britain, weighted by design weights (n=1468)[6].
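To see what the weighted, listwise correlations in Table 7.1 involve, the computation can be sketched in plain Python. This is an illustration under our own assumptions (analytic weights treated as in a weighted Pearson correlation; the data in the example are invented), not the SPSS or Stata implementation. Stata rescales aweights to sum to the number of observations, which leaves the correlations unchanged.

```python
import math


def weighted_corr_matrix(rows, weights):
    """Weighted Pearson correlation matrix with listwise deletion.

    rows:    list of tuples, one value per variable, None for a missing value
    weights: one (design) weight per row
    """
    # Listwise deletion: drop every row with at least one missing value
    kept = [(row, w) for row, w in zip(rows, weights) if None not in row]
    if not kept:
        raise ValueError("no complete cases")
    total_w = sum(w for _, w in kept)
    k = len(kept[0][0])
    means = [sum(w * row[j] for row, w in kept) / total_w for j in range(k)]

    def cov(a, b):
        # Weighted covariance; its scaling cancels out in the correlation
        return sum(w * (row[a] - means[a]) * (row[b] - means[b])
                   for row, w in kept) / total_w

    sds = [math.sqrt(cov(j, j)) for j in range(k)]
    return [[cov(a, b) / (sds[a] * sds[b]) for b in range(k)] for a in range(k)]


# Invented example: three variables, one incomplete case, unequal weights
rows = [(5, 20, 3), (7, 25, 6), (3, 12, None), (8, 24, 4), (2, 10, 2)]
weights = [1.2, 0.8, 1.0, 1.5, 0.5]
for line in weighted_corr_matrix(rows, weights):
    print([round(x, 3) for x in line])
```

Because the weight scaling cancels, a case with weight 2 contributes to the correlations exactly as two identical cases with weight 1 would.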

Looking at the correlation matrix for the composite score model, we see that common method effects now only concern the correlation between Satdem and LRplace. Apart from this single cmv correction, the correction of the matrix reduces to replacing the diagonal with quality estimates, so the next section focuses on deriving the quality of the composite score.

• [1] The following illustration and results are based on the SPSS 19 software version: IBM Corp. Released 2010. IBM SPSS Statistics for Windows, Version 19.0. Armonk, NY: IBM Corp.
• [2] The following illustration and results are based on the Stata 12 software version: StataCorp. 2011. Stata Statistical Software: Release 12. College Station, TX: StataCorp LP.
• [3] The following illustration and results are based on the SPSS 19 software version: IBM Corp. Released 2010. IBM SPSS Statistics for Windows, Version 19.0. Armonk, NY: IBM Corp.
• [4] The following illustration and results are based on the Stata 12 software version: StataCorp. 2011. Stata Statistical Software: Release 12. College Station, TX: StataCorp LP.
• [5] ESS Round 6: European Social Survey Round 6 Data (2012). Data file edition 2.0. Norwegian Social Science Data Services, Norway – Data Archive and distributor of ESS data.
• [6] SPSS adjusts the sample size on the basis of the design weights; the adjusted sample size is 1424. However, for our illustration, we stick to the original sample of 1468, which is the actual number of people who answered the questions.

# Derivation of the quality of complex concepts

In Chapter 3, we showed how the quality of the variables Satdem, LRplace and Inc was predicted using the program SQP 2.0. Here, we show that we can also determine the quality of the composite score (CS) for 'Democracy level' from the quality of its three indicator variables: Free, Critic and Equal.

The quality of a variable can be defined as the ratio of the systematic variance of the variable to its total variance, or equivalently as 1 – (error variance / total variance). For single questions, the quality can be predicted by SQP. However, SQP cannot predict the quality of composite scores. For a composite score, the same definition is applied as follows:

$$\text{Quality of CS} = 1 - \frac{\operatorname{var}(e_{cs})}{\operatorname{var}(CS)} \qquad (7.2)$$

The total variance of the CS can be obtained directly from the computed composite score while the error variance is equal to:

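$$\operatorname{var}(e_{cs}) = \sum_k w_k^2 \operatorname{var}(e_k) + 2\sum_{k<k'} w_k w_{k'} \operatorname{cov}(e_k, e_{k'}) \qquad (7.3)$$

in which $w_k$ is the weight of indicator $k$ (1 for an unweighted sum) and the second sum runs over each pair of indicators once.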

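Here, $\operatorname{var}(e_i)$ is the error variance in $y_i$, which can be estimated as:

$$\operatorname{var}(e_i) = (1 - q_i^2)\operatorname{var}(y_i) \qquad (7.4)$$

while $\operatorname{cov}(e_i, e_j)$, the error covariance of the variables $y_i$ and $y_j$, can be estimated as:

$$\operatorname{cov}(e_i, e_j) = \mathrm{cmv}_{ij}\, s_i s_j = (r_i m_i m_j r_j)(s_i s_j) \qquad (7.5)$$

where $q_i^2$ is the quality of $y_i$, $s_i$ its standard deviation, and $r_i$ and $m_i$ its reliability and method coefficients.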

These equations show that we can obtain an estimate of the quality of a composite score from the estimation of the quality of the single questions and the cmv.
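Equations 7.2 to 7.5 can be combined into a small helper. This is a sketch in Python; the function names are hypothetical (not from SQP or any package). `qualities` holds the quality estimates $q_k^2$, and `cmv[i][j]` is the common method variance $r_i m_i m_j r_j$ of the standardized indicators, 0 for indicators that do not share a method.

```python
def composite_error_variance(weights, qualities, variances, cmv):
    """Error variance of a weighted composite score (equations 7.3 to 7.5).

    weights:   w_k of each indicator
    qualities: q_k^2, the quality of each indicator
    variances: observed variance var(y_k) of each indicator
    cmv:       cmv[i][j] = r_i m_i m_j r_j (standardized), 0 if no shared method
    """
    k = len(weights)
    sds = [v ** 0.5 for v in variances]
    # Equation 7.4: error variance of each indicator
    err_var = [(1 - q) * v for q, v in zip(qualities, variances)]
    total = sum(w * w * e for w, e in zip(weights, err_var))
    # Equations 7.3 and 7.5: each pair k < k' counted once, then doubled
    for i in range(k):
        for j in range(i + 1, k):
            cov_ij = cmv[i][j] * sds[i] * sds[j]
            total += 2 * weights[i] * weights[j] * cov_ij
    return total


def composite_quality(error_variance, cs_variance):
    """Equation 7.2: quality of the composite score."""
    return 1 - error_variance / cs_variance


# Toy example (invented numbers): two indicators, weights 1, shared method
var_e = composite_error_variance(
    weights=[1, 1],
    qualities=[0.5, 0.5],
    variances=[2.0, 2.0],
    cmv=[[0.0, 0.1], [0.1, 0.0]],
)
print(round(var_e, 3), round(composite_quality(var_e, 8.0), 3))  # 2.4 0.7
```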

Furthermore, we can also calculate the common method variance between the composite score and another variable, using the following equation:

$$\mathrm{cmv}(y_i, y_{cs}) = r_i m_i \sum_k \frac{w_k}{\sigma_{cs}}\, m_k r_k \qquad (7.6)$$

In this illustration, the composite score Demlevel does not share a method with any other variable, so we focus on correcting the correlations using the quality of the composite score. Equation 7.6 is needed, however, whenever a composite score does share a method with another variable, as in Exercise 7.1.
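Equation 7.6 translates directly into code. The sketch below implements the formula exactly as written above (the function and argument names are hypothetical); the usage example plugs in the coefficients from Exercise 7.1, where the composite score Threats shares a method with B40.

```python
def cmv_with_composite(r_i, m_i, weights, r_k, m_k, sd_cs):
    """Equation 7.6 as stated in the text: common method variance between a
    standardized variable y_i and a composite score sharing its method.

    r_i, m_i:          reliability and method coefficients of y_i
    weights, r_k, m_k: w_k, r_k and m_k of the composite's indicators
    sd_cs:             standard deviation of the composite score
    """
    return r_i * m_i * sum(w / sd_cs * m * r for w, r, m in zip(weights, r_k, m_k))


# Values used in Exercise 7.1 (B40 versus the composite score Threats)
cmv = cmv_with_composite(
    r_i=0.853, m_i=0.349,
    weights=[1, 1], r_k=[0.896, 0.881], m_k=[0.354, 0.418],
    sd_cs=3.368,
)
print(round(cmv, 3))  # 0.061
```

Subtracting this value from the observed correlation of 0.624 gives the corrected correlation of about 0.563 used in the exercise.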

The quality estimates of the questions needed to compute the quality of the composite score were given in Chapter 3. In addition, we need the standard deviations and variances of the observed variables. Table 7.2 provides the means, standard deviations (s) and variances (s²) of the variables; the variances are the squares of the standard deviations. To reproduce the descriptive statistics in Table 7.2 with SPSS or Stata, you can follow the steps below, using a dataset specially prepared for this module. The same results can be obtained with any other statistical package.

1. Use the dataset you created on the first page of Chapter 7: 'CME data_ESSround6_withDemlevel'. You can also download it from this link. Open this dataset in SPSS[1]:
GET FILE='C:\...\CME data_ESSround6_withDemlevel.sav'.
2. First, select the cases under study in our analysis, i.e. the whole British population. From the Data menu in SPSS, choose 'Select Cases…'. To limit the analysis to Great Britain, choose 'If condition is satisfied', select the variable 'Country' and enter the condition cntry = 'GB'.
COMPUTE filter_$=(cntry="GB").
VARIABLE LABELS filter_$ 'cntry="GB" (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
3. Under Data, you have to weight the cases using design weights. Select ‘Weight cases’ and weight the cases by the variable ‘Design weight [dweight]’.
WEIGHT BY dweight.
4. To obtain the descriptive statistics, choose 'Correlate' from Analyze, and then click 'Bivariate…'. From the list, select the variables in the following order: Satdem [stfdem], Free [fairelcc], Critic [oppcrgvc], Equal [cttresac], LRplace [lrscale], Inc [hinctnta] and Demlevel [demlevel]. Once the variables are selected, under Options choose 'Means and standard deviations' to obtain the statistics of these variables, and choose 'Exclude cases listwise' so that all results are based on the same cases.[2]
CORRELATIONS
/VARIABLES=stfdem fairelcc oppcrgvc cttresac lrscale hinctnta demlevel
/PRINT=TWOTAIL NOSIG
/STATISTICS DESCRIPTIVES
/MISSING=LISTWISE.

1. Use the dataset you created on the first page of Chapter 7: 'CME data_ESSround6_withDemlevel'. You can also download it from this link. Open this dataset in Stata[3]:
use "C:\...\CME data_ESSround6_withDemlevel.dta", clear
2. Select the cases under study. They concern the whole British population. Therefore, in Stata we can use the command ‘keep if’ to indicate that we will keep all observations that, for the variable ‘Country (cntry)’, have the value ‘GB’.
keep if cntry=="GB"
3. To obtain the descriptive statistics in Stata, we use the command 'corr' (correlate). It may be useful to compare these results with those in Table 4.2. Specify the seven variables under analysis in the following order: Satdem [stfdem], Free [fairelcc], Critic [oppcrgvc], Equal [cttresac], LRplace [lrscale], Inc [hinctnta] and Demlevel [demlevel]. The design weights are applied as analytic weights ('aweight'), and the option 'means' adds the descriptive statistics for these variables. Note that 'corr' only uses cases with no missing values on any of the listed variables.
corr stfdem fairelcc oppcrgvc cttresac lrscale hinctnta demlevel ///
    [aweight=dweight], means

The results obtained are summarized in the following table:

Table 7.2: The ESS Round 6[4] descriptive statistics (weighted) of the variables for Great Britain (n=1468)[5].

The first step is to compute the quality of the composite score, which will replace the variance of Demlevel on the diagonal of the correlation matrix. The quality predictions for all the other variables were already obtained in Chapter 3, so here we only illustrate how to derive the quality of Demlevel; all other values in the correlation matrix remain the same. To compute the quality of the composite score (equation 7.2), we first need the error variance of the composite score (equation 7.3), which is built from the error variances, $\operatorname{var}(e_i)$, and the error covariances, $\operatorname{cov}(e_i, e_j)$, of the three indicators of Democracy level (Demlevel). For this, we only need the quality estimates of the three indicators from Table 3.3 in Chapter 3 and their variances from Table 7.2 above.

Table 7.3 presents the information required to compute the error variances, $\operatorname{var}(e_i)$, of the variables that measure democracy level (see equation 7.4).

Table 7.3: The error variances for the three observed variables for democracy level

As stated above, we also need to calculate the error covariances, $\operatorname{cov}(e_i, e_j)$, of these three variables, as in equation 7.5. This is done in Table 7.4. Because the cmv is expressed for standardized variables, the covariance is obtained by multiplying the cmv by the standard deviations of the two variables involved.

Table 7.4: The computation of the covariances of the errors of the indicators for democracy level

Now that we have all components of equation 7.3, we can compute the variance of the errors of the unweighted composite score for democracy level, which is:

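$$\begin{aligned}
\operatorname{var}(e_{dem}) &= \sum_k w_k^2 \operatorname{var}(e_k) + 2\sum_{k<k'} w_k w_{k'} \operatorname{cov}(e_k, e_{k'})\\
&= (1.261 + 1.499 + 3.025) + 2 \times (0.516 + 0.723 + 0.846)\\
&= 9.955
\end{aligned}$$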

Finally, the quality of the composite score of democracy level is computed using equation 7.2:

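$$\begin{aligned}
Q_{dem} &= 1 - \frac{\operatorname{var}(e_{dem})}{\operatorname{var}(\text{Demlevel})}\\
&= 1 - \frac{9.955}{24.480}\\
&= 0.5933
\end{aligned}$$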
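The two hand calculations above can be checked with a few lines of Python. The inputs are the error variances, the pairwise error covariances and the variance of Demlevel used in the calculation (all weights $w_k$ equal 1); the code only reproduces that arithmetic.

```python
# Values used in the calculation above (weights w_k = 1 for the plain sum)
error_variances = [1.261, 1.499, 3.025]    # var(e_k) of the three indicators
error_covariances = [0.516, 0.723, 0.846]  # cov(e_k, e_k') for the three pairs
var_demlevel = 24.480                      # observed variance of Demlevel

var_e = sum(error_variances) + 2 * sum(error_covariances)  # equation 7.3
quality = 1 - var_e / var_demlevel                          # equation 7.2
print(round(var_e, 3), round(quality, 4))  # 9.955 0.5933
```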

The quality of the composite score Demlevel, built from Free, Critic and Equal, is therefore 0.59. Because quality is the ratio of systematic to total variance, this means that 59% of the observed variance in the composite score comes from the concept of interest, Democracy level, and 41% is error variance.

Before estimating the causal model with correction for measurement errors, we first need to correct the correlation matrix.

We start with the correction of the correlation matrix presented in Table 7.1. As in Chapter 4, we first correct for common method variance. Here, this is only expected for the correlation between Satdem and LRplace. From Table 4.4, we know that this correlation should be reduced by 0.076, the cmv of Satdem and LRplace, which gives a corrected correlation of 0.188 - 0.076 = 0.112. Next, in Table 7.5, we replace the variances of the standardized variables (1.00) on the diagonal with their quality estimates. The quality estimates for Satdem, LRplace and Inc were obtained with SQP in Table 3.3, and the quality of Demlevel was derived above. The result is the covariance matrix used in the analysis with correction for measurement errors.

Table 7.5: The correlation matrix for the variables of Figure 7.1 of ESS Round 6 for Great Britain corrected for measurement errors.
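Constructing the matrix in Table 7.5 from Table 7.1 amounts to two mechanical steps: subtract the cmv from the affected correlation, then put the qualities on the diagonal. A minimal Python sketch, using the correlations entered in Syntax 7.1 and the diagonal values entered in Syntax 7.2 (the helper name `correct_matrix` is hypothetical):

```python
# Order of variables: Satdem, Demlevel, LRplace, Inc
corr = [
    [1.000, 0.429, 0.188, 0.163],
    [0.429, 1.000, 0.086, 0.190],
    [0.188, 0.086, 1.000, 0.009],
    [0.163, 0.190, 0.009, 1.000],
]
qualities = [0.710, 0.590, 0.682, 0.624]  # diagonal entered in Syntax 7.2
cmv = {(0, 2): 0.076}  # only Satdem and LRplace share a method (Table 4.4)


def correct_matrix(corr, qualities, cmv):
    """Subtract cmv from affected correlations and put qualities on the diagonal."""
    out = [row[:] for row in corr]  # copy so the input is left unchanged
    for (i, j), value in cmv.items():
        out[i][j] -= value
        out[j][i] -= value
    for i, q in enumerate(qualities):
        out[i][i] = q
    return out


for row in correct_matrix(corr, qualities, cmv):
    print([round(x, 3) for x in row])
```

The lower triangle of the printed matrix matches the values entered in Syntax 7.2.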

• [1] The following illustration and results are based on the SPSS 19 software version: IBM Corp. Released 2010. IBM SPSS Statistics for Windows, Version 19.0. Armonk, NY: IBM Corp.
• [2] We used the Correlation feature to obtain the results presented in the table, as it allows cases to be excluded listwise, an option that is not available in Descriptives in SPSS.
• [3] The following illustration and results are based on the Stata 12 software version: StataCorp. 2011. Stata Statistical Software: Release 12. College Station, TX: StataCorp LP.
• [4] ESS Round 6: European Social Survey Round 6 Data (2012). Data file edition 2.0. Norwegian Social Science Data Services, Norway – Data Archive and distributor of ESS data.
• [5] SPSS adjusts the sample size on the basis of the design weights; the adjusted sample size is 1424. However, for our illustration, we stick to the original sample of 1468, which is the actual number of people who answered the questions.

# Estimation of the causal model with complex concepts

Now that we have estimates of the quality of all the variables in the model, including the composite score for democracy level, the estimation of the causal model with correction for measurement errors is exactly the same as for the model with variables based on single questions. Thus, we are ready to estimate the parameters of the causal model with the composite score correcting for measurement errors.

Below, we illustrate how to run the causal model for the composite score specified in Figure 7.1 using both LISREL[1] and Stata[2]. Both programs produce very similar results, so the analysis can be followed in either one.

In Figure 7.2, all effects are indicated with LISREL symbols. The beta (be) represents the effect of the composite score (Demlevel) on satisfaction with democracy (Satdem). The gammas (ga) represent the effects of the control variables (LRplace and Inc) on the explained variables in the model (Satdem and Demlevel). For example, ga(1,1) indicates the effect of left-right placement on satisfaction with democracy, while ga(2,2) indicates the effect of the other control variable, income, on the composite score, Democracy level. The effect of Inc on Satdem, ga(1,2), is drawn with a dashed line because it was omitted from the model: it was not significant in the analysis with correction for measurement errors (Chapters 5 and 6).

Figure 7.2: The causal model for the evaluation of democracy by a composite score in LISREL notation

The variances in the disturbances of the explained variables are denoted as ps(1,1) and ps(2,2). For the details of the procedure, we refer to the LISREL manual [Jör96] and introductions to the program LISREL [Sar84]. First, the LISREL input for this analysis without corrections is presented in Syntax 7.1. Next, we present the same input corrected for measurement errors (see Syntax 7.2).

Syntax 7.1: The LISREL syntax for the estimation of the parameters of the causal model including a composite score without correction for measurement errors
Complex analysis without correction for measurement errors !Title
data ni=4 no=1468 ma=km !ni=number of variables no=number of observations ma=matrix
km !km=correlation matrix
1.00
.429 1.00
.188 .086 1.00
.163 .190 .009 1.00
labels
satdem demlevel lrplace inc !Labels of the variables
model ny=2 nx=2 be=fu,fi ga=fu,fi ps=sy,fi !Causal model ny=dependent variables nx=control variables
free be(1,2) !free=coefficients to be estimated
free ga(2,1) ga(2,2) ga(1,1)
free ps(1,1) ps(2,2)
pd !To obtain a path diagram
out nd=3 !out= output nd=number of decimals

Syntax 7.2: The LISREL syntax for the estimation of the parameters of the causal model including a composite score with correction for measurement errors
Complex analysis with correction for measurement errors
data ni=4 no=1468 ma=km
km
.710 !The covariance matrix corrected for measurement errors
.429 .590
.112 .086 .682
.163 .190 .009 .624
labels
satdem demlevel lrplace inc
model ny=2 nx=2 be=fu,fi ga=fu,fi ps=sy,fi
free be(1,2)
free ga(2,1) ga(2,2) ga(1,1)
free ps(1,1) ps(2,2)
pd
out nd=3

The most important point is that the coefficients that have to be estimated are presented in the lines starting with ‘free’. Comparing these two inputs, we see that only the matrix with the data to be analysed has been changed. Focusing on the input for the model with correction for measurement errors, the effects will be estimated on the basis of the covariance matrix in Table 7.5 (i.e. the matrix with the correlations corrected for cmv and with the qualities on the diagonal). Because we ask in the data line that the matrix to be analysed (ma) should be the correlation matrix (km), Table 7.5 is transformed by the program into the correlation matrix corrected for measurement errors, Table 7.6.

Table 7.6: Correlations corrected for measurement errors using LISREL
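The rescaling that turns the matrix of Table 7.5 into correlations is $r_{ij} = c_{ij} / \sqrt{c_{ii} c_{jj}}$. This is a plain-Python illustration of that transformation (not LISREL's own code), applied to the matrix entered in Syntax 7.2:

```python
import math

# Matrix entered in Syntax 7.2 (Satdem, Demlevel, LRplace, Inc):
# correlations corrected for cmv, qualities on the diagonal
corrected = [
    [0.710, 0.429, 0.112, 0.163],
    [0.429, 0.590, 0.086, 0.190],
    [0.112, 0.086, 0.682, 0.009],
    [0.163, 0.190, 0.009, 0.624],
]


def cov_to_corr(cov):
    """Rescale a covariance matrix: r_ij = c_ij / sqrt(c_ii * c_jj)."""
    n = len(cov)
    sds = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sds[i] * sds[j]) for j in range(n)] for i in range(n)]


for row in cov_to_corr(corrected):
    print([round(x, 3) for x in row])
```

For example, the corrected correlation between Satdem and Demlevel becomes 0.429 / √(0.710 × 0.590) ≈ 0.663.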

The nice feature of this approach, correcting the correlations for measurement errors before estimating the effects, is that the input for the analysis is exactly the same with and without correction for measurement errors, except for the matrix of correlations that is used in the analysis. This point is illustrated in the input for the analyses with and without correction for measurement errors presented in Syntaxes 7.1 and 7.2.

In the estimation of causal models, it is important to test whether the model fits the data, i.e. whether the model is misspecified. Without going into detail (see [Sar09]), we can say that the model fits the data corrected for measurement errors very well, so there is no reason to change the model.

However, when the matrix without correction for measurement errors is analysed, the program indicates that the model does not fit well. This suggests that the effect ga(1,2) of the control variable Income on satisfaction with democracy has to be introduced in the model. With this effect added, the model also fits the data well, and we obtain the results presented in Table 7.7.

Table 7.7: The LISREL results of the estimation of the causal model including the complex concept presented in Figure 7.1 with and without corrections

In the Stata versions of Syntaxes 7.1 and 7.2, all effects are specified using the Stata notation [Aco13]. Comparing the two inputs, we see that only the matrix with the data to be analysed has changed. In the input for the model with correction for measurement errors, the effects are estimated on the basis of the covariance matrix in Table 7.5 (i.e. the matrix with the correlations corrected for cmv and with the qualities on the diagonal).

Syntax 7.1: The Stata syntax for the estimation of the parameters of the causal model including a complex concept without correction for measurement errors
*Complex analysis without correction for measurement errors
clear all
ssd init satdem demlevel lrplace inc /*variables*/
ssd set observations 1468 /*observations*/

*Correlation matrix
#delimit ;
ssd set correlations
1.00\
.429 1.00\
.188 .086 1.00\
.163 .190 .009 1.00;
#delimit cr
save ssdmatrix.dat, replace

*Causal model with complex concept
clear
use ssdmatrix.dat
ssd list
sem (satdem <- demlevel lrplace) ///
(demlevel<- lrplace inc), ///
standardized
estat eqgof /*Equation-level goodness of fit*/

Syntax 7.2: The Stata syntax for the estimation of the parameters of the causal model including a complex concept with correction for measurement errors
*Complex analysis with correction for measurement errors
clear all
ssd init satdem demlevel lrplace inc
ssd set observations 1468

*Covariance matrix
#delimit ;
ssd set covariances /*Covariance matrix corrected for measurement errors (qualities on the diagonal)*/
.710\
.429 .590\
.112 .086 .682\
.163 .190 .009 .624;
#delimit cr
save ssdmatrix.dat, replace

*Causal model with complex concept
clear
use ssdmatrix.dat
ssd list
sem (satdem <- demlevel lrplace) ///
(demlevel<- lrplace inc), ///
standardized
estat eqgof /*Equation-level goodness of fit*/


However, when the matrix without correction for measurement errors is analysed, the program indicates that the model does not fit well. This suggests that the effect of the control variable income (Inc) on satisfaction with democracy has to be introduced in the model. With this effect added, the model also fits the data well, and we get the results presented in Table 7.8.

Table 7.8: The Stata results of the estimation of the causal model including a complex concept presented in Figure 7.1 with and without corrections

If we compare the results with and without correction for measurement errors, we see first of all that the model is different. After correction for measurement errors, the effect of the control variable Inc on satisfaction with democracy is not significantly different from zero, while without correction this effect is needed to achieve a good fit. In the latter case, we would conclude that income has a direct effect on satisfaction with democracy, while in the former we have to conclude that there is no direct effect, only an indirect effect via Democracy level.

Figure 7.3: The parameter estimates of the causal model with a complex concept of evaluation of democracy without correction for measurement errors

Figure 7.4: The parameter estimates of the causal model with a complex concept of evaluation of democracy with correction for measurement errors

Furthermore, comparing Figures 7.3 and 7.4, we see that, after correction for measurement errors, nearly all other effects are much larger than without correction. All significant effects are marked in the figures with an asterisk (*). In this model, the three indicators of democracy level have been replaced by a single variable, the composite score. Its effect increases much more than in the earlier model with three separate indicators, because it is no longer reduced by the correlations among those indicators: the composite score is now the only variable representing the evaluation of democracy. Furthermore, without corrections the explained variance in Satdem is 21.4%, while after corrections it increases to 44.4%[3]. This example again illustrates how different the results can be when measurement errors are corrected.
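As a rough cross-check that does not depend on LISREL or Stata, the standardized effects of the corrected model can be approximated by equation-by-equation regressions on the corrected matrix. This sketch relies on an assumption we state explicitly: for a recursive model with uncorrelated disturbances, such as Figure 7.1 with the Inc to Satdem path omitted, these regressions give the same point estimates as maximum likelihood. It produces no standard errors, significance tests or fit statistics, and the helper names are hypothetical.

```python
import math

# Matrix entered in Syntax 7.2 (Satdem, Demlevel, LRplace, Inc)
corrected = [
    [0.710, 0.429, 0.112, 0.163],
    [0.429, 0.590, 0.086, 0.190],
    [0.112, 0.086, 0.682, 0.009],
    [0.163, 0.190, 0.009, 0.624],
]
SATDEM, DEMLEVEL, LRPLACE, INC = range(4)


def cov_to_corr(cov):
    n = len(cov)
    sds = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sds[i] * sds[j]) for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                f = m[row][col] / m[col][col]
                for c in range(col, n + 1):
                    m[row][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def standardized_regression(r, y, xs):
    """Standardized coefficients and R^2 of y on xs, from a correlation matrix."""
    rxx = [[r[i][j] for j in xs] for i in xs]
    rxy = [r[i][y] for i in xs]
    beta = solve(rxx, rxy)
    return beta, sum(b * c for b, c in zip(beta, rxy))


r = cov_to_corr(corrected)
# Equation for Satdem (Inc -> Satdem omitted, as in the corrected model)
beta_s, r2_s = standardized_regression(r, SATDEM, [DEMLEVEL, LRPLACE])
# Equation for Demlevel
beta_d, r2_d = standardized_regression(r, DEMLEVEL, [LRPLACE, INC])
print("Satdem:", [round(b, 3) for b in beta_s], "R2 =", round(r2_s, 3))
print("Demlevel:", [round(b, 3) for b in beta_d], "R2 =", round(r2_d, 3))
```

With the Syntax 7.2 matrix, the R² for Satdem comes out at about 0.444, in line with the 44.4% reported above after correction.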

## Exercise 7.1

Compute the corrected correlation matrix for the variables introduced in exercise 3.1 using the composite score of the variables Economy and Culture as represented in the figure below. The composite score ‘Country threats’ is created as a simple sum:

Country threats (CS) = Economic threat + Cultural threat

The model to be estimated is:

Causal model for attitudes towards immigration with a composite score

Below, the correlation matrix is provided without corrections, together with the quality predictions obtained in exercise 3.1 using SQP and the descriptive statistics used in exercise 6.1 (adding the composite score descriptives).

Use all this information to correct the correlation matrix for measurement errors.

To correct the correlations involving a composite score, we first need to derive its quality. This cannot be done by SQP; instead, we use the following formulas:

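$$\text{Quality of CS} = 1 - \frac{\operatorname{var}(e_{cs})}{\operatorname{var}(CS)}$$

where

$$\operatorname{var}(e_{cs}) = \sum_k w_k^2 \operatorname{var}(e_k) + 2\sum_{k<k'} w_k w_{k'} \operatorname{cov}(e_k, e_{k'})$$

$$\operatorname{var}(e_i) = (1 - q_i^2)\operatorname{var}(y_i)$$

$$\operatorname{cov}(e_i, e_j) = \mathrm{cmv}_{ij}\, s_i s_j = (r_i m_i m_j r_j)(s_i s_j)$$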

Using the quality predictions obtained in exercise 3.1, we can derive the quality of the composite score (Threats). To do so, we first compute the error variance of the composite score, using the quality estimates of the variables Economy and Culture and their variances from the table above. Computing $\operatorname{var}(e_i)$ and $\operatorname{cov}(e_i, e_j)$, we get:

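$$\begin{aligned}
\operatorname{var}(e_{B38}) &= (1 - q_{B38}^2)\operatorname{var}(B38) = (1 - 0.702) \times 3.640 = 1.085\\
\operatorname{var}(e_{B39}) &= (1 - q_{B39}^2)\operatorname{var}(B39) = (1 - 0.641) \times 3.771 = 1.354\\
\operatorname{cov}(e_{B38}, e_{B39}) &= \mathrm{cmv}_{B38,B39}\, s_{B38} s_{B39} = (r_{B38} m_{B38} m_{B39} r_{B39})(s_{B38} s_{B39})\\
&= (0.896 \times 0.354 \times 0.418 \times 0.881)(1.908 \times 1.942) = 0.433
\end{aligned}$$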

Now we have all the components required to compute the variance in the errors of the composite score for the variable country threats, which is:

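$$\begin{aligned}
\operatorname{var}(e_{threats}) &= \sum_k w_k^2 \operatorname{var}(e_k) + 2\sum_{k<k'} w_k w_{k'} \operatorname{cov}(e_k, e_{k'})\\
&= (1.085 + 1.354) + 2 \times 0.433 = 3.305
\end{aligned}$$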

Finally, the quality of the composite score of country threats can be computed as follows:

$$Q_{threats} = 1 - \frac{\operatorname{var}(e_{threats})}{\operatorname{var}(\text{Threats})} = 1 - \frac{3.305}{11.343} = 0.709$$
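The exercise arithmetic can be reproduced in Python from the inputs quoted above for B38 and B39, the two threat indicators. Carried through without intermediate rounding, the error variance of Threats comes out at about 3.304 instead of 3.305 (the hand calculation adds rounded terms); the quality still rounds to 0.709.

```python
# Inputs quoted in Exercise 7.1 for B38 and B39 (the two threat indicators)
q2 = {"B38": 0.702, "B39": 0.641}        # quality q^2
variance = {"B38": 3.640, "B39": 3.771}  # observed variances
sd = {"B38": 1.908, "B39": 1.942}        # standard deviations
rel = {"B38": 0.896, "B39": 0.881}       # reliability coefficients r
meth = {"B38": 0.354, "B39": 0.418}      # method coefficients m
var_threats = 11.343                     # variance of the composite score

err_b38 = (1 - q2["B38"]) * variance["B38"]  # equation 7.4
err_b39 = (1 - q2["B39"]) * variance["B39"]
cmv = rel["B38"] * meth["B38"] * meth["B39"] * rel["B39"]
cov_err = cmv * sd["B38"] * sd["B39"]        # equation 7.5
var_e = err_b38 + err_b39 + 2 * cov_err      # equation 7.3, w_k = 1
quality = 1 - var_e / var_threats            # equation 7.2

print(round(err_b38, 3), round(err_b39, 3), round(cov_err, 3),
      round(var_e, 3), round(quality, 3))
# 1.085 1.354 0.433 3.304 0.709
```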

In this case, the common method variance between the composite score (Threats) and the variable Better (B40) also has to be taken into account. Earlier in this chapter, we presented the formula for computing the cmv for this pair of variables (equation 7.6):

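$$\begin{aligned}
\mathrm{cmv}_{B40,threats} &= r_{B40} m_{B40}\left[\frac{1}{\sigma_{threats}} m_{B38} r_{B38} + \frac{1}{\sigma_{threats}} r_{B39} m_{B39}\right]\\
&= 0.853 \times 0.349 \left[\frac{1}{3.368}(0.354 \times 0.896) + \frac{1}{3.368}(0.881 \times 0.418)\right] = 0.061
\end{aligned}$$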

As before, the common method variance has to be subtracted from the correlation between these two variables affected by the same method. So:

$$\operatorname{corr}(B40, Threats) = 0.624 - 0.061 = 0.563$$
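The common method variance correction can be checked the same way. The sketch below implements the formula exactly as written above for an unweighted sum score. The function name `cmv_with_composite` is hypothetical, and a weighted composite would require adapting the formula.

```python
def cmv_with_composite(r_y, m_y, sd_cs, components):
    """Common method variance between a single variable y and a sum score CS,
    following the formula in the text: r_y * m_y * sum_k (1 / sd_cs) * m_k * r_k.

    components -- list of (r_k, m_k) pairs for the variables in the sum score
    """
    return r_y * m_y * sum(m_k * r_k / sd_cs for r_k, m_k in components)


# Better (B40) with Threats = Economy (B38) + Culture (B39)
cmv = cmv_with_composite(r_y=0.853, m_y=0.349, sd_cs=3.368,
                         components=[(0.896, 0.354), (0.881, 0.418)])
corrected = 0.624 - cmv   # observed correlation minus the common method variance
print(round(cmv, 3), round(corrected, 3))   # 0.061 0.563
```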

To conclude, we replace the variances on the diagonal for the variables Allow and Better with the qualities obtained from SQP, replace the variance of Threats with the quality of the composite score computed above, and correct the correlation between Better and Threats as indicated above. The result is the correlation matrix corrected for measurement errors.
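The assembly of the corrected matrix can be sketched with NumPy. The qualities 0.763 (Allow) and 0.639 (Better) are taken from the diagonal of the corrected matrix used in exercise 7.2. The positive-definiteness check at the end is our own addition rather than a step from the text. It is included because SEM software generally expects a positive definite input matrix.

```python
import numpy as np

# Order: impcntr (Allow), imwbcnt (Better), threats (Threats)
observed = np.array([
    [1.000, -0.351, -0.424],
    [-0.351, 1.000, 0.624],
    [-0.424, 0.624, 1.000],
])

corrected = observed.copy()
# Qualities on the diagonal: Allow and Better from SQP, Threats computed above
np.fill_diagonal(corrected, [0.763, 0.639, 0.709])
# Subtract the common method variance from the Better-Threats correlation
cmv_better_threats = 0.061
corrected[1, 2] = corrected[2, 1] = observed[1, 2] - cmv_better_threats

print(np.round(corrected, 3))
# Sanity check (our addition): all eigenvalues should be positive
print(np.all(np.linalg.eigvalsh(corrected) > 0))
```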


## Exercise 7.2

Using the results of exercise 7.1, estimate the same causal model with the complex concept. The correlation matrices obtained earlier, with and without corrections, are reproduced below. Use this information to estimate, in either LISREL or Stata, the causal model explaining opinions about immigration of people from outside Europe to the Netherlands, with and without correction for measurement errors.

1. LISREL syntax for the estimation of the causal model without corrections for measurement errors:

```
Causal model without correction for measurement errors
data ni=3 no=1801 ma=km
km
1.000
-0.351 1.000
-0.424 0.624 1.000
labels
impcntr imwbcnt threats
model ny=2 nx=1 be=fu,fi ga=fu,fi ps=sy,fi
free be(1,2)
free ga(2,1) ga(1,1)
free ps(1,1) ps(2,2)
pd
out nd=3
```
2. LISREL syntax for the estimation of the causal model with corrections for measurement errors:

```
Causal model with correction for measurement errors
data ni=3 no=1801 ma=km
km
0.763
-0.351 0.639
-0.424 0.563 0.709
labels
impcntr imwbcnt threats
model ny=2 nx=1 be=fu,fi ga=fu,fi ps=sy,fi
free be(1,2)
free ga(2,1) ga(1,1)
free ps(1,1) ps(2,2)
pd
out nd=3
```

1. Stata syntax for the estimation of the causal model without corrections for measurement errors:

```stata
*Causal model without correction for measurement errors
clear all
ssd init impcntr imwbcnt threat
ssd set observations 1801
*Correlation matrix
#delimit ;
ssd set correlation
1.000\
-0.351 1.000\
-0.424 0.624 1.000;
#delimit cr
save ssdmatrix.dat, replace
*Causal model
clear
use ssdmatrix.dat
ssd list
sem (impcntr <- imwbcnt) ///
(impcntr imwbcnt <- threat), ///
standardized
estat eqgof
```
2. Stata syntax for the estimation of the causal model with corrections for measurement errors:

```stata
*Causal model with correction for measurement errors
clear all
ssd init impcntr imwbcnt threat
ssd set observations 1801
*Corrected matrix: qualities on the diagonal, so it is entered as a covariance matrix
#delimit ;
ssd set covariance
0.763\
-0.351 0.639\
-0.424 0.563 0.709;
#delimit cr
save ssdmatrix.dat, replace
*Causal model
clear
use ssdmatrix.dat
ssd list
sem (impcntr <- imwbcnt) ///
(impcntr imwbcnt <- threat), ///
standardized
estat eqgof
```
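This recursive model is just-identified: it has six parameters (three effects, two disturbance variances and the variance of threats) for six variances and covariances. Its maximum likelihood point estimates therefore coincide with regression estimates computed directly from the input matrix. The NumPy sketch below uses this fact to reproduce the standardized effects and R² for both matrices. It does not compute standard errors or significance tests, which still require LISREL or Stata. The function name `path_estimates` is hypothetical.

```python
import numpy as np


def path_estimates(S):
    """Standardized effects and R^2 for the recursive model
         imwbcnt = g21 * threats + zeta2
         impcntr = b12 * imwbcnt + g11 * threats + zeta1
    S is a 3x3 covariance or correlation matrix ordered (impcntr, imwbcnt, threats).
    """
    S = np.asarray(S, dtype=float)
    sd = np.sqrt(np.diag(S))
    # imwbcnt regressed on threats
    g21 = S[1, 2] / S[2, 2]
    # impcntr regressed on imwbcnt and threats: solve the normal equations
    b12, g11 = np.linalg.solve(S[1:, 1:], S[1:, 0])
    return {
        "b12": b12 * sd[1] / sd[0],   # imwbcnt -> impcntr
        "g11": g11 * sd[2] / sd[0],   # threats -> impcntr
        "g21": g21 * sd[2] / sd[1],   # threats -> imwbcnt
        "R2_impcntr": (b12 * S[0, 1] + g11 * S[0, 2]) / S[0, 0],
        "R2_imwbcnt": g21 * S[1, 2] / S[1, 1],
    }


observed = [[1.000, -0.351, -0.424],
            [-0.351, 1.000, 0.624],
            [-0.424, 0.624, 1.000]]
corrected = [[0.763, -0.351, -0.424],
             [-0.351, 0.639, 0.563],
             [-0.424, 0.563, 0.709]]

before, after = path_estimates(observed), path_estimates(corrected)
print(f"{'':12s}{'before':>8s}{'after':>8s}")
for key in before:
    print(f"{key:12s}{before[key]:8.3f}{after[key]:8.3f}")
```

With these two matrices, the standardized effect of imwbcnt on impcntr drops from about -0.14 to about -0.07. The effect of threats on impcntr grows from about -0.34 to -0.52, and its effect on imwbcnt grows from 0.62 to 0.84. R² rises from about 0.19 to 0.33 for impcntr and from 0.39 to 0.70 for imwbcnt.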

The upper figure presents the results for the model with the composite score Threats before correction, and the lower figure presents the results after correction. The differences in the effects are considerable: one of the effects is no longer significant, while the other two become much larger. From these exercises, it appears that the differences are larger in this small model. As before, the explained variance (R²) increases after correcting for measurement errors.

This exercise has shown once again how much the results can change when they are corrected for measurement errors. It has also shown that this correction can be applied not only in models with simple concepts, but also in models with complex concepts.

• [1] The following illustration and results are based on the LISREL 8.7 software version: Jöreskog, K.G. & Sörbom, D. (2004). LISREL 8.7 for Windows [Computer software]. Skokie, IL: Scientific Software International, Inc.
• [2] The following illustration and results are based on the Stata 12 software version: StataCorp. 2011. Stata Statistical Software: Release 12. College Station, TX: StataCorp LP.
• [3] The explained variance can be obtained in LISREL from the section Squared Multiple Correlations for Structural Equations (R²). Similarly, in Stata the command estat eqgof reports the value of R².
• [Aco13] Acock, A. C. (2013). Discovering Structural Equation Modeling Using Stata, Revised Edition. Stata press.
• [Jör96] Jöreskog, K. G. and Sörbom, D. (1996). LISREL 8 User’s Reference Guide. Scientific Software International.
• [Sar09] Saris, W. E., Satorra, A. and Van der Veld, W. M. (2009). Testing Structural Equation Models or Detection of Misspecifications? Structural Equation Modeling: A Multidisciplinary Journal, 16 (4), 561-582.
• [Sar84] Saris, W. E. and Stronkhorst, L. H. (1984). Causal modelling in nonexperimental research: an introduction to the LISREL approach. Sociometric Research Foundation.