
# Correction of the correlations for measurement errors

In order to illustrate how to correct the correlations for measurement errors, we extend the model in Figure 2.2 to two variables of interest (f), for example ‘satisfaction with the government’ (f_{1}) and ‘satisfaction with the economy’ (f_{2}). The measurement model for two variables of interest is presented in Figure 4.1.

In this model it is assumed that:

- f_{i} is the trait/factor i of interest measured by a direct question.
- y_{ij} is the observed variable (for trait i measured by method j).
- t_{ij} is the ‘true score’ of the response variable y_{ij}.
- M_{j} is the method factor that represents a specific reaction of respondents to a method and therefore generates a systematic error.
- e_{ij} is the random measurement error term for y_{ij}.

Furthermore, from Chapter 2 we already know that:

The r_{ij} coefficients represent the standardized effects of the true scores on the observed scores. This effect is smaller if the random errors are larger. This coefficient is called the **reliability coefficient**. **Reliability** is defined as the strength of the relationship between the observed response (y_{ij}) and the true score (t_{ij}), which is r_{ij}^{2}.

The v_{ij} coefficients represent the standardized effects of the variables of interest on the true scores for the observed variables that are really measured. Therefore, this coefficient is called the **validity coefficient**. **Validity** is defined as the strength of the relationship between the variable of interest (f_{i}) and the true score (t_{ij}), which is v_{ij}^{2}.

The m_{ij} coefficients represent the standardized effects of the method factor on the true scores, called the method effect. An increase in the method effect results in a decrease in validity and vice versa. It can be shown that, for this model, m_{ij}^{2} = 1 – v_{ij}^{2}, and the method effect is therefore equal to the invalidity due to the method used. The **systematic method effect** is the strength of the relationship between the method factor (M_{j}) and the true score (t_{ij}), which is m_{ij}^{2}. The contribution of the method to the correlations, called **common method variance** or **cmv**, is equal to r_{1j}m_{1j}m_{2j}r_{2j}.

The **total quality of a measure** is defined as the strength of the relationship between the observed variable and the variable of interest, that is (r_{ij}v_{ij})^{2}.
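As a small numerical sketch of these definitions (in Python, with made-up coefficient values rather than estimates from this chapter), the relations m_{ij}^{2} = 1 – v_{ij}^{2}, the total quality and the cmv can be computed directly:

```python
import math

# Hypothetical standardized coefficients for two measures y_1j and y_2j
r1, r2 = 0.9, 0.85      # reliability coefficients
v1, v2 = 0.95, 0.9      # validity coefficients

# Method effects follow from m_ij^2 = 1 - v_ij^2
m1 = math.sqrt(1 - v1**2)
m2 = math.sqrt(1 - v2**2)

# Total quality of each measure: (r_ij * v_ij)^2
q1_sq = (r1 * v1)**2
q2_sq = (r2 * v2)**2

# Common method variance contributed to the correlation: r_1j m_1j m_2j r_2j
cmv = r1 * m1 * m2 * r2

print(round(q1_sq, 3), round(q2_sq, 3), round(cmv, 3))
```

With these illustrative values the qualities are about 0.731 and 0.585, and the cmv is about 0.104.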

# The consequences for the correlations

The reason for employing these definitions and their criteria becomes evident when examining the effect of the characteristics of the measurement model on the correlations between observed variables.

It can be shown from Figure 4.1 that the correlation between the observed variables r(y_{1j},y_{2j}) is equal to the joint effect of the variables that we want to measure (f_{1} and f_{2}), plus the spurious correlation due to the method factor [Sar84] as demonstrated in equation 4.1.

r(y_{1j},y_{2j}) = r_{1j}v_{1j} r(f_{1},f_{2}) v_{2j}r_{2j} + r_{1j}m_{1j}m_{2j}r_{2j}    (equation 4.1)

or

r(y_{1j},y_{2j}) = q_{1j} r(f_{1},f_{2}) q_{2j} + cmv_{12}    (equation 4.2)

where q_{ij} = r_{ij}v_{ij} and cmv_{12} = r_{1j}m_{1j}m_{2j}r_{2j}

Note that the equations show that, in general, the observed correlation is only equal to the correlation between the variables of interest (the correlation corrected for measurement error), i.e. r(y_{1j},y_{2j}) = r(f_{1},f_{2}), when reliability and validity are equal to 1 and, consequently, the method effects are zero. However, a situation in which there are no random errors (r_{ij} = 1) is very unlikely.

Note also that r_{ij} and v_{ij}, which are always smaller than 1, will decrease the correlation (see the first term in equation 4.1: r_{1j}v_{1j} r(f_{1},f_{2})v_{2j}r_{2j}) while the method effects, if they are not zero, can generate an increase in the correlation (see the second term in equation 4.1: r_{1j}m_{1j}m_{2j}r_{2j}). This result suggests that it is possible that the low correlations for Methods 1 and 3 in Table 1.3 are due to the lower reliability of Methods 1 and 3 compared to Method 2. However, it is also possible that the correlations of Method 2 are higher because of common method variance for this method.

Before we leave this subject, we would like to mention that from equation 4.2 it immediately follows that:

r(f_{1},f_{2}) = (r(y_{1j},y_{2j}) – cmv_{12})/q_{1j}q_{2j}    (equation 4.3)

This means that, if we know the quality of the measures and the common method variance (cmv), we can correct the observed correlation for measurement error and obtain the correlation corrected for measurement error (r(f_{1},f_{2})). This is the key to the correction for measurement errors because, once the correlations are corrected, they can be used for the estimation of the coefficients of regression and causal models, as will be illustrated in the next three chapters. So, the problem will be solved if we know the qualities and the cmv. In the previous chapter, we have seen that this information is provided by SQP. Thus, the procedure for correcting for measurement errors has, in principle, been specified. In the next chapters, we will show how this can be done in an easy and efficient way.
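Equations 4.2 and 4.3 can be illustrated with a short round-trip computation (a Python sketch with hypothetical values for the qualities and the cmv): building an observed correlation from a known r(f_{1},f_{2}) and then correcting it recovers the original value.

```python
# Hypothetical values: true correlation, quality coefficients and cmv
rho_f = 0.6            # r(f1, f2), the correlation corrected for measurement error
q1, q2 = 0.84, 0.80    # quality coefficients q_ij = r_ij * v_ij
cmv = 0.10             # common method variance (same method for both questions)

# Equation 4.2: the observed correlation is attenuated by the qualities
# and inflated by the common method variance
r_obs = q1 * rho_f * q2 + cmv

# Equation 4.3: correcting the observed correlation recovers r(f1, f2)
rho_back = (r_obs - cmv) / (q1 * q2)

print(round(r_obs, 4), round(rho_back, 4))
```

Here the observed correlation (0.5032) is lower than 0.6 despite the cmv, because the attenuation by the qualities dominates; the correction recovers 0.6 exactly.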

# Correction for measurement errors in the correlation matrix

As we have shown above, there is a simple way to correct for measurement errors in the correlation coefficients. This approach is based on the equation presented before:

r(f_{i},f_{j}) = (r(y_{i},y_{j}) – cmv_{ij})/q_{i}q_{j}    (equation 4.3)

where q_{i} = r_{i}v_{i} and cmv_{ij} = r_{i}m_{i}m_{j}r_{j}

So, correction for measurement error in the observed correlation is very simple if we know the quality of the observed variables. This result holds for single questions as well as for composite scores. The information we need to make these corrections consists of the observed correlations between the variables and the quality, reliability and validity coefficients. In the end, if the correlations are corrected for measurement errors, the relationships between the variables can be estimated as if there were no measurement errors. This will be illustrated in the following chapters. In this chapter, we will focus on correcting the correlation matrix.

The first step in correcting the correlation matrix would be to obtain the necessary estimates of quality, reliability and validity. This has already been illustrated in the previous chapter for the variables of interest in our example. The results obtained are reproduced in Table 4.1:

The next step is to obtain the correlations between the observed variables (r(y_{i},y_{j})). The weighted results are presented in Table 4.2. If you are interested in reproducing the correlation results presented in this table using SPSS or Stata, you can follow the steps described in the following links, using a dataset especially prepared for this module. It should be noted that the same results could have been obtained using any other statistical package.

- In the following link, you will find the dataset ‘CME data – ESSround 6’.
Open this dataset in SPSS:^{1}
GET FILE='C:\...\CME data_ESSround6_F1.sav'.
- First select the cases under study in our analysis. They concern the whole British population. Therefore, from Data in SPSS, select ‘Select Cases…’. To limit the analysis to Great Britain, choose ‘If condition is satisfied’, select the variable ‘Country’ and insert the following notation: cntry = ‘GB’.
COMPUTE filter_$=(cntry="GB").
VARIABLE LABELS filter_$ 'cntry="GB" (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
- The ESS suggests that it is also necessary to take into account the design weights in order to correct for specific characteristics of the sampling that may bias the results. In SPSS, under the Data heading, you have to weight the cases using design weights. Select ‘Weight cases’ and weight the cases by the variable ‘Design weight [dweight]’.
WEIGHT BY dweight.
- To obtain the correlation matrix, choose ‘Correlate’ from Analyze and then click ‘Bivariate…’. From the list, select the variables in the following order: Satdem [stfdem], Free [fairelcc], Critic [oppcrgvc], Equal [cttresac], LRplace [lrscale] and Inc [hinctnta]. Once the variables are selected, in Options choose the option ‘Exclude cases listwise’ to obtain all correlations for the same cases in the sample.
CORRELATIONS
/VARIABLES=stfdem fairelcc oppcrgvc cttresac lrscale hinctnta
/PRINT=TWOTAIL NOSIG
/MISSING=LISTWISE.

- In the following link, you will find the dataset ‘CME data – ESSround 6’
Open this dataset in Stata:^{2}
use "C:\...\CME data_ESSround6_F1.dta", clear
- Carrying out some tabulations of our variables of interest before asking for the correlation, we see that the variables Satdem [stfdem], Free [fairelcc], Critic [oppcrgvc], Equal [cttresac], LRplace [lrscale] and Inc [hinctnta] have Refusal and Don’t know values, which should be assigned to system missing. This can be done using the command ‘mvdecode’.
mvdecode stfdem fairelcc oppcrgvc cttresac lrscale hinctnta, mv(99)
mvdecode stfdem fairelcc oppcrgvc cttresac lrscale hinctnta, mv(88)
mvdecode stfdem fairelcc oppcrgvc cttresac lrscale hinctnta, mv(77)
- Select the cases under study. They concern the whole British population. Therefore, in Stata we can use the command ‘keep if’ and indicate that we will keep all observations that, for the variable ‘Country (cntry)’, have the value ‘GB’.
keep if cntry=="GB"
- To obtain the correlation matrix in Stata, we have used the command ‘pwcorr’. Using this command, select the six variables under analysis in the following order: Satdem [stfdem], Free [fairelcc], Critic [oppcrgvc], Equal [cttresac], LRplace [lrscale] and Inc [hinctnta]. Here, the design weights have been applied using the weight option ‘aweight’. Furthermore, we also added the option ‘listwise’ to provide the correlations for the same cases in the sample.
pwcorr stfdem fairelcc oppcrgvc cttresac lrscale hinctnta [aweight=dweight], listwise

If you have followed the procedures in one of the syntaxes above, you should end up with the results in Table 4.2:

Table 4.2: Correlations between the observed variables^{3} for Great Britain corrected by design weights (n=1468)^{4}

In Table 4.2, the correlations between the observed variables are presented using design weights. Weighting does not change the correlations between the variables very much. The change mostly occurs only in the third decimal. However, this may be different for frequencies and means. Thus, because weighting with the design weight is better than not weighting, we will continue the analysis with the latter matrix. For more details, we refer to the ESS EduNet topic ‘Weighting the ESS’ [Gan10].

At this point, we have reliability, validity and quality values (see Table 4.1) as well as the correlations between the observed variables (see Table 4.2). So we have all the information we need to correct the correlation matrix for measurement errors. This will be done using equation 4.3 above.

The correction has to be done in two steps. The first step is to subtract the common method variance (cmv) from the given correlations to correct for systematic effects of the method used. The second step consists of dividing all correlations by the product of the quality coefficients to correct for the random errors in all variables.
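The two steps can be sketched as follows (a Python illustration with a hypothetical three-variable correlation matrix, not the ESS data used in this chapter; here variables 0 and 1 are assumed to share a method):

```python
# Hypothetical observed correlation matrix for three variables; variables
# 0 and 1 are assumed to share the same method, variable 2 uses another one.
R_obs = [[1.00, 0.50, 0.40],
         [0.50, 1.00, 0.35],
         [0.40, 0.35, 1.00]]

q = [0.85, 0.80, 0.90]   # quality coefficients q_i = r_i * v_i
cmv_01 = 0.08            # cmv for the same-method pair (0, 1)

n = len(q)
C = [row[:] for row in R_obs]

# Step 1: subtract the cmv from the same-method correlations only
C[0][1] -= cmv_01
C[1][0] -= cmv_01

# Put the qualities (q_i^2) on the diagonal, so that C becomes the
# covariance matrix used to obtain the corrected correlations
for i in range(n):
    C[i][i] = q[i] ** 2

# Step 2: transform the covariance matrix into a correlation matrix,
# dividing each element by the product q_i * q_j
R_corr = [[C[i][j] / (q[i] * q[j]) for j in range(n)] for i in range(n)]

for row in R_corr:
    print([round(x, 3) for x in row])
```

With these illustrative numbers, the same-method pair is corrected to (0.50 − 0.08)/(0.85 × 0.80) ≈ 0.618, while the other correlations are only divided by the quality products.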

We should mention that the cmv should only be computed for variables measured by the same method. This holds true in this analysis for the group E17, E20 and E25 and for the pair B19 and B23, but not for the other combinations.^{5} Questions E17, E20 and E25 use the same method because they are in the same battery with the same scale, while questions B19 and B23 both use an 11-point bipolar scale. In Table 4.3, we reproduce the correlation matrix of the observed variables, highlighting the correlations that we need to correct for common method variance.

The correlations highlighted in this matrix are the correlations that may be too high due to method effects. So, for these correlations, we have to subtract the cmv. The cmv for any pair of variables (i) and (j) is r_{i}m_{i}m_{j}r_{j}. It can be calculated using the reliability and method-effect coefficients obtained from the quality predictions for the questions (see Table 4.1).

The calculations for the correction for common method variance in the correlations highlighted above are presented in Table 4.4.

This illustrates that a simple calculation is needed to correct the correlations for common method variance. Table 4.5 illustrates the result after putting these new correlations in the proper place.

To finish the correction, the correlations should be divided by the product of the quality coefficients (see equation 4.3). However, this calculation does not have to be done by hand, because we can put the quality coefficients on the diagonal and ask the program to transform this covariance matrix into a correlation matrix. In that case, the program will automatically compute: ((r_{ij}-cmv_{ij})/ q_{i}q_{j}). Table 4.6 highlights the variances that have to be modified.

In the same way, putting the quality estimates of the variables on the diagonal in the matrix, we have formulated the covariance matrix that will be used to obtain the correlations corrected for measurement errors (see Table 4.7).

In the analysis, the program is first asked to transform this covariance matrix into a correlation matrix. The program does this by dividing each covariance by the product of the standard deviations of the related variables, which in this case is the same as dividing the covariance by the product of the quality coefficients. For example, the corrected correlation for the variables Satdem and Free will be obtained as follows: 0.395/√(0.710*0.643) = 0.585. It is rather risky to do all these calculations by hand. In the next chapter, we will therefore show how the correction for lack of quality can be done using standard software.
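As a quick check of this arithmetic (a Python sketch, not the standard software referred to above), using the covariance and the quality estimates quoted in the text:

```python
import math

# Values taken from the text: corrected covariance between Satdem and Free
# and the quality estimates (q^2) of the two questions
cov = 0.395
q2_satdem, q2_free = 0.710, 0.643

# Dividing by the product of the 'standard deviations' (the quality
# coefficients) yields the corrected correlation
r_corrected = cov / math.sqrt(q2_satdem * q2_free)
print(round(r_corrected, 3))
```

This reproduces the corrected correlation of 0.585 given in the text.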

The result of this computation for all variables can be seen in the following matrix (Table 4.8), which contains the correlations between the same variables corrected for method effect and random errors.

If we compare Table 4.2 of this chapter with the above correlation matrix (Table 4.8), it is clear that, after correction for measurement errors, the correlations are quite different from before correction. So we can also expect the results of the analyses to be different. Note that the first step in the correction procedure has to be done by hand, while the program used for the analysis does the second step automatically.

In sum, the user has to produce Table 4.7, correcting for common method variance where necessary and reducing the variance of the observed variable, which is 1 for standardized variables, to the systematic part (q^{2}) or, to put it differently, removing the error variance. Starting from a corrected covariance matrix like the one in Table 4.7, the analysis can begin, because commonly used programs for analysis, such as Stata and LISREL, will automatically transform this corrected covariance matrix into a correlation matrix corrected for measurement error, as presented in Table 4.8. After these corrections for measurement errors in the correlation matrix, the analysis is the same whether we analyse the original correlations or the corrected correlations. We will see, however, that the results will be very different.

It is important to mention that this process sometimes goes wrong because, after correction for measurement errors, the correlations can be greater than 1.^{6} Such correlations are impossible and should be seen as incorrect estimates of the true correlations. From the formula for correction for measurement errors [(r(y_{11}y_{21}) – cmv)/q_{11}q_{21}], it follows that, if the numerator is larger than the denominator, the estimated correlation will be larger than 1. This can be caused by a too high estimate of the observed correlation, a too low estimate of the cmv or a too low estimate of the quality of the questions. If this occurs, one can check the following:

- There is an error in the SQP coding. Especially in the case of non-authorized coding, we suggest first checking the coding.
- The original correlations can be too high because the method effects are underestimated or because of outliers in the data. In that case, we suggest checking the data, removing the outliers and using the new correlation matrix for the further analysis.
- The SQP predictions are based on the general trends of the relationships between the different characteristics of the questions in all countries and the quality estimates of these questions. If a country deviates considerably from these general trends, this can lead to problems. In order to cope with this problem, one can first try to see whether the uncertainty in the predictions of the quality estimates is taken into account. You can see whether the problem would be solved if the maximum value of the specified inter-quartile interval had been used in the calculations instead of the point estimates of the quality indicators. If this approach does not lead to a solution of the problem, there is no other solution than to rely on MTMM experiments to obtain the quality estimates for the questions involved in these problems. For a large number of questions, the quality estimates of such experiments are also available in SQP but not for all possible questions, of course. If no information about an MTMM experiment is available, new experiments have to be done.
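A simple guard for this problem can be built into the correction step. The following Python sketch (the function name and all numbers are hypothetical) applies equation 4.3 and flags corrected correlations outside [-1, 1]:

```python
def corrected_correlation(r_obs, cmv, q1, q2):
    """Correct an observed correlation for measurement error (equation 4.3).

    Raises ValueError when the corrected value falls outside [-1, 1],
    which signals a problem with the inputs (see the checklist above).
    """
    r_corr = (r_obs - cmv) / (q1 * q2)
    if abs(r_corr) > 1:
        raise ValueError(
            f"corrected correlation {r_corr:.3f} is outside [-1, 1]; "
            "check the SQP coding, the observed correlation and the quality estimates")
    return r_corr

# A plausible case and a problematic one (hypothetical numbers)
print(round(corrected_correlation(0.50, 0.08, 0.85, 0.80), 3))
try:
    corrected_correlation(0.80, 0.00, 0.70, 0.75)   # 0.80/0.525 exceeds 1
except ValueError as err:
    print("flagged:", err)
```

The second call illustrates the failure mode described above: an observed correlation of 0.80 combined with quality coefficients of 0.70 and 0.75 produces an impossible corrected correlation of about 1.52.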

### Exercise 4.1:

To see the effect of reliability on the observed correlation, we evaluate the following cases in the model in Figure 4.1. If the correlation between the variables corrected for measurement error r(f_{1},f_{2}) = 0.9, the validity = 1, and the method effect = 0, what is the correlation between the observed variables in the following cases?

| Reliability coefficient y_{11} | Reliability coefficient y_{12} | Observed correlation r_{y11,y12} = r_{11}v_{11}r(f_{1},f_{2})v_{12}r_{12} + r_{11}m_{11}m_{12}r_{12} |
|---|---|---|
| 1.0 | 1.0 | r_{y11,y12} = |
| 0.9 | 0.9 | r_{y11,y12} = |
| 0.8 | 0.8 | r_{y11,y12} = |
| 0.7 | 0.7 | r_{y11,y12} = |
| 0.6 | 0.6 | r_{y11,y12} = |

| Reliability coefficient y_{11} | Reliability coefficient y_{12} | Observed correlation r_{y11,y12} = r_{11}v_{11}r(f_{1},f_{2})v_{12}r_{12} + r_{11}m_{11}m_{12}r_{12} |
|---|---|---|
| 1.0 | 1.0 | r_{y11,y12} = 1*1*0.9*1*1 + 1*0*0*1 = 0.9 + 0 = 0.9 |
| 0.9 | 0.9 | r_{y11,y12} = 0.9*1*0.9*1*0.9 + 0.9*0*0*0.9 = 0.73 + 0 = 0.73 |
| 0.8 | 0.8 | r_{y11,y12} = 0.8*1*0.9*1*0.8 + 0.8*0*0*0.8 = 0.58 + 0 = 0.58 |
| 0.7 | 0.7 | r_{y11,y12} = 0.7*1*0.9*1*0.7 + 0.7*0*0*0.7 = 0.44 + 0 = 0.44 |
| 0.6 | 0.6 | r_{y11,y12} = 0.6*1*0.9*1*0.6 + 0.6*0*0*0.6 = 0.32 + 0 = 0.32 |

This exercise illustrates that, keeping all other coefficients fixed, the correlation between the observed variables decreases if the reliability coefficient decreases. Recalling the definition of reliability, this makes sense because the smaller the reliability, the larger the random errors and, thus, the lower the correlation. Note that, with a reliability coefficient of 0.6, which is not unusual in survey research, the correlation is reduced to about a third of its real value.
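The answer table above can be reproduced with a few lines of Python (a sketch of the computation, using the values given in the exercise):

```python
# Reproduce exercise 4.1: r(f1,f2) = 0.9, validity v = 1, method effect m = 0,
# with equal reliability coefficients r for both measures
rho_f, v, m = 0.9, 1.0, 0.0

observed = []
for r in [1.0, 0.9, 0.8, 0.7, 0.6]:
    r_obs = r * v * rho_f * v * r + r * m * m * r
    observed.append(round(r_obs, 2))
    print(f"reliability coefficient {r}: observed correlation = {r_obs:.2f}")
```

This yields 0.9, 0.73, 0.58, 0.44 and 0.32, matching the table.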

### Exercise 4.2:

Imagine that the correlation between the variables corrected for measurement error = 0.4 and the reliability coefficient = 0.99 for both measures. What is the correlation between the observed variables in the following cases?

| Validity coefficient y_{11} | Validity coefficient y_{12} | Method effect y_{11} | Method effect y_{12} | Observed correlation r_{y11,y12} = r_{11}v_{11}r(f_{1},f_{2})v_{12}r_{12} + r_{11}m_{11}m_{12}r_{12} |
|---|---|---|---|---|
| 1.0 | 1.0 | 0.0 | 0.0 | r_{y11,y12} = |
| 0.9 | 0.9 | 0.43 | 0.43 | r_{y11,y12} = |
| 0.8 | 0.8 | 0.6 | 0.6 | r_{y11,y12} = |
| 0.7 | 0.7 | 0.71 | 0.71 | r_{y11,y12} = |

| Validity coefficient y_{11} | Validity coefficient y_{12} | Method effect y_{11} | Method effect y_{12} | Observed correlation r_{y11,y12} = r_{11}v_{11}r(f_{1},f_{2})v_{12}r_{12} + r_{11}m_{11}m_{12}r_{12} |
|---|---|---|---|---|
| 1.0 | 1.0 | 0.0 | 0.0 | r_{y11,y12} = 0.99*1*0.4*1*0.99 + 0.99*0*0*0.99 = 0.392 + 0 = 0.392 |
| 0.9 | 0.9 | 0.43 | 0.43 | r_{y11,y12} = 0.99*0.9*0.4*0.9*0.99 + 0.99*0.43*0.43*0.99 = 0.318 + 0.181 = 0.499 |
| 0.8 | 0.8 | 0.6 | 0.6 | r_{y11,y12} = 0.99*0.8*0.4*0.8*0.99 + 0.99*0.6*0.6*0.99 = 0.251 + 0.353 = 0.604 |
| 0.7 | 0.7 | 0.71 | 0.71 | r_{y11,y12} = 0.99*0.7*0.4*0.7*0.99 + 0.99*0.71*0.71*0.99 = 0.192 + 0.494 = 0.686 |

From the results, we can see that, if the common method variance (i.e. the second term) is larger than the decrease in the first term, the result is that the observed correlation becomes larger. These two exercises show that the observed correlation can be too low because of measurement errors or too high because of common method variance.
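The answer table can be reproduced in the same way (a Python sketch using the rounded method-effect values from the table):

```python
# Reproduce exercise 4.2: r(f1,f2) = 0.4 and reliability coefficient 0.99
# for both measures, with the rounded method-effect values from the table
rho_f, r = 0.4, 0.99

totals = []
for v, m in [(1.0, 0.0), (0.9, 0.43), (0.8, 0.6), (0.7, 0.71)]:
    first = r * v * rho_f * v * r     # attenuated 'true' part of the correlation
    second = r * m * m * r            # common method variance
    totals.append(round(first + second, 3))
    print(f"v = {v}, m = {m}: {first:.3f} + {second:.3f} = {first + second:.3f}")
```

This yields 0.392, 0.499, 0.604 and 0.686: as validity falls and the method effect rises, the observed correlation climbs above the true value of 0.4.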

### Exercise 4.3:

In exercise 3.1, we have introduced and obtained the quality predictions for the four variables that are intended to explain the opinion about immigration by people from outside Europe to the Netherlands. The results obtained from SQP about the quality predictions of these questions are reproduced below. We have added to this table the reliability (r), validity (v) and quality (q) coefficients, which are the square roots of the coefficients given earlier.

For the purpose of this exercise, the correlation matrix for these variables is also provided in the next table.

Given the observed correlations in the table above, what are the correlations between these variables corrected for measurement errors?

First, compute the cmv for the variables measured with a common method. This holds true for the variables Economy, Culture and Better, which use 11-point, item-specific scales. The correlations between these variables have to be reduced by the cmv.

Computing the cmv gives:

cmv_{B40,B38} = r_{B40}m_{B40}m_{B38}r_{B38} = 0.853 * 0.349 * 0.354 * 0.896 = 0.094

cmv_{B40,B39} = r_{B40}m_{B40}m_{B39}r_{B39} = 0.853 * 0.349 * 0.418 * 0.881 = 0.110

cmv_{B38,B39} = r_{B38}m_{B38}m_{B39}r_{B39} = 0.896 * 0.354 * 0.418 * 0.881 = 0.117

Next, the correlations corrected for common method variance are reduced as follows:

r_{B40,B38} = 0.534 – 0.094 = **0.440**

r_{B40,B39} = 0.557 – 0.110 = **0.447**

r_{B38,B39} = 0.530 – 0.117 = **0.413**

By putting the qualities (q^{2}) from the table above on the diagonal of the matrix, the correlation matrix corrected for measurement errors will have been computed. The result is presented below.
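The cmv computations and corrected correlations in this exercise can be verified with a short Python sketch, using the reliability (r) and method-effect (m) coefficients quoted above:

```python
# Check the cmv computations in exercise 4.3 for questions B40, B38 and B39
coefs = {"B40": (0.853, 0.349),   # (reliability r, method effect m)
         "B38": (0.896, 0.354),
         "B39": (0.881, 0.418)}
r_obs = {("B40", "B38"): 0.534,
         ("B40", "B39"): 0.557,
         ("B38", "B39"): 0.530}

corrected = {}
for (a, b), r_ab in r_obs.items():
    (ra, ma), (rb, mb) = coefs[a], coefs[b]
    cmv = ra * ma * mb * rb           # cmv_ab = r_a m_a m_b r_b
    corrected[(a, b)] = round(r_ab - cmv, 3)
    print(f"cmv_{a},{b} = {cmv:.3f}; corrected correlation = {r_ab - cmv:.3f}")
```

The computed cmv values (0.094, 0.110, 0.117) and corrected correlations (0.440, 0.447, 0.413) match the figures above.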

- [1] The following illustration and results are based on the SPSS 19 software version: IBM Corp. Released 2010. IBM SPSS Statistics for Windows, Version 19.0. Armonk, NY: IBM Corp.
- [2] The following illustration and results are based on the Stata 12 software version: StataCorp. 2011. Stata Statistical Software: Release 12. College Station, TX: StataCorp LP.
- [3] ESS Round 6: European Social Survey Round 6 Data (2012). Data file edition 2.0. Norwegian Social Science Data Services, Norway – Data Archive and distributor of ESS data.
- [4] SPSS adjusts the sample size on the basis of the design weights. Their adjusted sample size is 1424. However, for our illustration, we will stick to the original sample of 1468, which is the actual number of people who answered the questions.
- [5] The formulations of the questions used to measure these variables were presented in the introduction to this module.
- [6] An example of this is the analysis in exercise 3.1 when using the British data. It is for this reason that we use the Dutch data in the exercises.

- [Gan10] Ganninger, M. (2010). Weighting in the ESS. *European Social Survey Education Net (ESS EduNet)*. Available: http://essedunet.nsd.uib.no/cms/topics/weight/
- [Sar84] Saris, W. E. and Stronkhorst, L. H. (1984). *Causal Modelling in Nonexperimental Research: An Introduction to the LISREL Approach*. Sociometric Research Foundation.