# Correction for measurement errors in causal models with complex concepts

Social scientists often prefer to work with complex concepts measured by several variables rather than simple concepts that can be measured by a single direct question. For these complex concepts, a composite score or index is calculated from the questions that are regarded as indicators of the concept, with the aim of forming a reliable and valid measure of a latent, theoretical construct.

In this chapter, we will show that the quality of these composite scores is not perfect, but that we can also derive their quality and use this information in the analysis, in the same way as shown in the previous chapters.

Composite scores can be calculated in many different ways, but the most common procedures are weighted or unweighted sums or means. We are going to apply this to the variables used in the previous chapters in which we used three evaluative variables to measure satisfaction with democracy in a country. These variables are: Free, Critic and Equal. We define a new concept ‘Democracy level’ that is based on these three observed variables. In order to obtain a score for this new variable, we could add up the scores of these three variables, with some weights if they are known, or use the mean of the scores of these three variables. For our analysis, we will use the simple sum score to build the new variable. So,

Democracy level = Free + Critic + Equal (equation 7.1)
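As a minimal sketch of the options mentioned above (unweighted sum, weighted sum, or mean), the following Python function computes a composite score for one respondent. Python is not used in this chapter; the function name `composite_score`, its parameters and the example answers are hypothetical, and the choice to return a missing score whenever one item is missing is an assumption made for the illustration.

```python
from typing import Optional, Sequence


def composite_score(values: Sequence[Optional[float]],
                    weights: Optional[Sequence[float]] = None,
                    use_mean: bool = False) -> Optional[float]:
    """Return a (weighted) sum or mean of item scores.

    Assumption for this sketch: if any item is missing (None),
    the composite score is missing as well.
    """
    if any(v is None for v in values):
        return None
    if weights is None:
        # Unweighted case: every item counts equally.
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    total = sum(w * v for w, v in zip(weights, values))
    # The mean divides the weighted sum by the sum of the weights.
    return total / sum(weights) if use_mean else total


# Hypothetical answers of one respondent to Free, Critic and Equal.
free, critic, equal = 8, 6, 7
demlevel = composite_score([free, critic, equal])            # simple sum, as in equation 7.1
demlevel_mean = composite_score([free, critic, equal], use_mean=True)
print(demlevel, demlevel_mean)
```

The sum and the mean differ only by a constant factor when there are no missing items, so they lead to the same correlations with other variables.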

In this way, we can simplify the causal model from the last chapter by substituting just one variable for the three variables measuring evaluation of democracy in a country, the composite score (CS): Democracy Level (Demlevel). The new model is presented in Figure 7.1.

To continue, we will compute the scores of the new variable for all respondents in the UK. This procedure is illustrated in the links below using SPSS and Stata. It should be noted that the same results could have been obtained using any other statistical package.

Compute composite scores with SPSS

- In the following link, you will find the dataset ‘CME data – ESSround 6’.
Open this dataset in SPSS:[1]
GET FILE='C:\...\CME data_ESSround6_F1.sav'.
- In order to create a composite score from the three variables Free, Critic and Equal, we have to select the option ‘Compute variable’ from the SPSS heading ‘Transform’. In this screen, we have to give a name to the new variable in the box ‘Target variable’. We will call it ‘demlevel’. Furthermore, by clicking the option ‘Type and label’, we can label the variable as ‘Democracy level’ and mark the numeric type. Next, click Continue and add the numeric expression that will create this new variable. In this case, we will add the following notation: fairelcc + oppcrgvc + cttresac. Click OK and the variable will be created.
COMPUTE demlevel=fairelcc + oppcrgvc + cttresac.
VARIABLE LABELS demlevel 'Democracy level'.
EXECUTE.
- Save this dataset under the name ‘CME data_ESSround6_withDemlevel’, because we are going to use it in the next steps.
SAVE OUTFILE='C:\...\CME data_ESSround6_withDemlevel.sav'/COMPRESSED.

Compute composite scores with Stata

- In the following link, you will find the dataset ‘CME data – ESSround 6’.
Open this dataset in Stata:[2]
use "C:\...\CME data_ESSround6_F1.dta", clear
- Tabulating our variables of interest before computing the composite score shows that the variables Satdem [stfdem], Free [fairelcc], Critic [oppcrgvc], Equal [cttresac], LRplace [lrscale] and Inc [hinctnta] have Refusal and Don’t know values that should be set to system missing. This can be done using the command ‘mvdecode’.
mvdecode stfdem fairelcc oppcrgvc cttresac lrscale hinctnta, mv(99)
mvdecode stfdem fairelcc oppcrgvc cttresac lrscale hinctnta, mv(88)
mvdecode stfdem fairelcc oppcrgvc cttresac lrscale hinctnta, mv(77)
- In order to create a composite score for the three variables Free, Critic and Equal, we have to use the Stata command ‘generate’ or ‘gen’. With this command, you can add any numeric expression to formulate the new variable. In this case, we will specify that the new variable ‘demlevel’ is the result of the sum of the variables Free [fairelcc], Critic [oppcrgvc] and Equal [cttresac]. In order to ensure that we have a homogeneous dataset, we can label the variable [demlevel] ‘Democracy level’ using the command ‘label variable’.
gen demlevel = fairelcc + oppcrgvc + cttresac
label variable demlevel "Democracy level"
- Save this dataset under the name ‘CME data_ESSround6_withDemlevel’, because we are going to use it in the next steps. To do so, use the command ‘save’.
save "C:\...\CME data_ESSround6_withDemlevel.dta", replace
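The two data preparation steps described above (setting the codes 77, 88 and 99 to missing, then summing the three items) can also be sketched in plain Python. This is only an illustration under stated assumptions: the rows below are invented, the helper name `add_demlevel` is hypothetical, and the sketch assumes, as with the `+` operator in the Stata `generate` command, that the sum is missing whenever one of the items is missing.

```python
# Codes that the text says should be treated as missing.
MISSING_CODES = {77, 88, 99}
# Variables listed in the mvdecode step.
ANALYSIS_VARS = ("stfdem", "fairelcc", "oppcrgvc", "cttresac", "lrscale", "hinctnta")
# Items that make up the composite score.
ITEMS = ("fairelcc", "oppcrgvc", "cttresac")


def add_demlevel(row: dict) -> dict:
    """Return a copy of row with missing codes set to None and demlevel added."""
    out = dict(row)
    for name in ANALYSIS_VARS:
        if out.get(name) in MISSING_CODES:
            out[name] = None
    items = [out.get(name) for name in ITEMS]
    # The sum is missing as soon as one item is missing.
    out["demlevel"] = None if any(v is None for v in items) else sum(items)
    return out


# Invented example rows, not taken from the ESS data.
rows = [
    {"stfdem": 6, "fairelcc": 8, "oppcrgvc": 6, "cttresac": 7, "lrscale": 5, "hinctnta": 4},
    {"stfdem": 4, "fairelcc": 88, "oppcrgvc": 5, "cttresac": 3, "lrscale": 77, "hinctnta": 2},
]
processed = [add_demlevel(r) for r in rows]
for r in processed:
    print(r["fairelcc"], r["oppcrgvc"], r["cttresac"], r["demlevel"])
```

Recoding before summing matters: without it, a code such as 88 would be added to the score as if it were a substantive answer.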

Having created the composite score [demlevel] for the concept ‘Democracy level’, we can now use any statistical program to compute the correlation matrix for the four variables of the model specified in Figure 7.1. The correlations are presented in Table 7.1. If you are interested in reproducing the table using SPSS or Stata, we suggest taking a look at the following link.

- Use the dataset you created above: ‘CME data_ESSround6_withDemlevel’. You can also download it from this link. Open this dataset in SPSS:[3]
GET FILE='C:\...\CME data_ESSround6_withDemlevel.sav'.
- First, select the cases under study in our analysis. They concern the whole British population. Therefore, in SPSS from the Data heading, select ‘Select Cases…’. To limit the analysis to Great Britain, choose ‘If condition is satisfied’, select the variable ‘Country’ and insert the following notation: cntry = ‘GB’.
COMPUTE filter_$=(cntry="GB").
VARIABLE LABELS filter_$ 'cntry="GB" (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
- The ESS suggests that the design weights also have to be taken into account in order to correct for specific characteristics of the sampling that may bias the results. In SPSS, under the Data heading, you have to weight the cases using design weights. Select ‘Weight cases’ and weight the cases by the variable ‘Design weight [dweight]’.
WEIGHT BY dweight.
- To obtain the correlation matrix, choose ‘Correlate’ from Analyze and then click ‘Bivariate…’. From the list, select the variables in the following order: Satdem [stfdem], Demlevel [demlevel], LRplace [lrscale] and Inc [hinctnta]. Once the variables are selected, from Options, choose the option ‘Exclude cases listwise’ to obtain the results for the same cases in the sample.
CORRELATIONS
  /VARIABLES=stfdem demlevel lrscale hinctnta
  /PRINT=TWOTAIL NOSIG
  /MISSING=LISTWISE.

This procedure should lead to the result in the following table:

- Use the dataset you created above: ‘CME data_ESSround6_withDemlevel’. You can also download it from this link. Open this dataset in Stata:[4]
use "C:\...\CME data_ESSround6_withDemlevel.dta", clear
- Select the cases under study. They concern the whole British population. Therefore, in Stata we can use the command ‘keep if’ to indicate that we will keep all observations that, for the variable ‘Country (cntry)’, have the value ‘GB’.
keep if cntry=="GB"
- To obtain the correlation matrix in Stata, we use the command ‘pwcorr’. With this command, select the four variables under analysis in the following order: Satdem [stfdem], Demlevel [demlevel], LRplace [lrscale] and Inc [hinctnta]. The design weights are applied using ‘aweight’. We also add the option ‘listwise’ so that all correlations are based on the same cases in the sample.
pwcorr stfdem demlevel lrscale hinctnta [aweight=dweight], listwise
This procedure should lead to the result in the following table:

Looking at the correlation matrix obtained for the composite score model, we can now see that the common method effects are reduced to the variables Satdem and LRplace. In the next section, we will therefore focus on the correction of the matrix diagonal using the quality estimates. Thus, in this case, the correction of the correlation is reduced to a one-step procedure.
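The correlation step described above (select the British cases, apply the design weights, delete cases listwise) can be sketched in plain Python as follows. This is an illustration only: the rows are invented, the function names are hypothetical, and the sketch assumes that the weighted Pearson formula below matches the point estimates produced by the packages, which has not been checked against Table 7.1.

```python
import math


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)


def listwise_complete(rows, names):
    """Keep only the rows with no missing value (None) on any of the named variables."""
    return [r for r in rows if all(r.get(n) is not None for n in names)]


def correlation_matrix(rows, names, weight="dweight"):
    """Weighted correlations for all pairs, computed on the same complete cases."""
    complete = listwise_complete(rows, list(names) + [weight])
    w = [r[weight] for r in complete]
    cols = {n: [r[n] for r in complete] for n in names}
    return {(a, b): weighted_corr(cols[a], cols[b], w) for a in names for b in names}


VARIABLES = ("stfdem", "demlevel", "lrscale", "hinctnta")

# Invented example rows, not taken from the ESS data.
rows = [
    {"cntry": "GB", "stfdem": 6, "demlevel": 21, "lrscale": 5, "hinctnta": 4, "dweight": 1.0},
    {"cntry": "GB", "stfdem": 4, "demlevel": 15, "lrscale": 3, "hinctnta": 2, "dweight": 0.8},
    {"cntry": "GB", "stfdem": 8, "demlevel": 26, "lrscale": 7, "hinctnta": 9, "dweight": 1.2},
    {"cntry": "GB", "stfdem": 5, "demlevel": 18, "lrscale": 6, "hinctnta": 5, "dweight": 1.0},
    {"cntry": "GB", "stfdem": 7, "demlevel": 20, "lrscale": None, "hinctnta": 6, "dweight": 0.9},
    {"cntry": "FR", "stfdem": 3, "demlevel": 12, "lrscale": 4, "hinctnta": 3, "dweight": 1.1},
]

gb_rows = [r for r in rows if r["cntry"] == "GB"]   # case selection
matrix = correlation_matrix(gb_rows, VARIABLES)
for a in VARIABLES:
    print(a, [round(matrix[(a, b)], 3) for b in VARIABLES])
```

Listwise deletion is applied before any correlation is computed, so every cell of the matrix is based on the same respondents.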

#### Footnotes

- [1] The following illustration and results are based on the SPSS 19 software version: IBM Corp. Released 2010. IBM SPSS Statistics for Windows, Version 19.0. Armonk, NY: IBM Corp.
- [2] The following illustration and results are based on the Stata 12 software version: StataCorp. 2011. **Stata Statistical Software: Release 12**. College Station, TX: StataCorp LP.
- [3] The following illustration and results are based on the SPSS 19 software version: IBM Corp. Released 2010. IBM SPSS Statistics for Windows, Version 19.0. Armonk, NY: IBM Corp.
- [4] The following illustration and results are based on the Stata 12 software version: StataCorp. 2011. **Stata Statistical Software: Release 12**. College Station, TX: StataCorp LP.
- [5] ESS Round 6: European Social Survey Round 6 Data (2012). Data file edition 2.0. Norwegian Social Science Data Services, Norway – Data Archive and distributor of ESS data.
- [6] SPSS adjusts the sample size on the basis of the design weights. The adjusted sample size is 1424. However, for our illustration, we will stick to the original sample size of 1468, which is the actual number of people who answered the questions.