Correction for measurement errors in the correlation matrix

As we have shown above, there is a simple way to correct for measurement errors in the correlation coefficients. This approach is based on the equation presented before:

r(fifj) = (r(yiyj) - cmvij)/(qiqj)     equation 4.3

where qi = rivi and cmvij = rimimjrj

Correcting an observed correlation for measurement error is thus very simple if we know the quality of the observed variables. This result holds for single questions as well as for composite scores. The information needed for these corrections consists of the observed correlations between the variables and the quality, reliability and validity coefficients. Once the correlations have been corrected for measurement errors, the relationships between the variables can be estimated as if there were no measurement errors. This will be illustrated in the following chapters; in this chapter, we focus on correcting the correlation matrix.
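As a sketch in code (Python here, for illustration only; the coefficient values in the example are invented, not taken from the tables), equation 4.3 amounts to:

```python
def corrected_correlation(r_obs, q_i, q_j, cmv_ij=0.0):
    """Equation 4.3: r(fi,fj) = (r(yi,yj) - cmv_ij) / (q_i * q_j).

    r_obs      observed correlation r(yi,yj)
    q_i, q_j   quality coefficients (q = r * v, reliability times validity)
    cmv_ij     common method variance r_i*m_i*m_j*r_j; 0 if the two
               questions do not share a method
    """
    return (r_obs - cmv_ij) / (q_i * q_j)

# Invented example: two questions of quality coefficient 0.8, no shared method
print(round(corrected_correlation(0.40, 0.8, 0.8), 3))  # 0.625
```

Note that the corrected correlation is larger than the observed one: the division by qiqj < 1 compensates for the attenuation caused by random measurement errors.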

The first step in correcting the correlation matrix is to obtain the necessary estimates of quality, reliability and validity. This has already been illustrated in the previous chapter for the variables of interest in our example. The results obtained are reproduced in Table 4.1:

Table 4.1: The SQP quality predictions of the questions under study in ESS Round 6 for Great Britain

The second step is to get the correlation matrix of the observed variables (r(yiyj)). The weighted results are presented in Table 4.2. If you are interested in reproducing the correlation results presented in this table using SPSS or Stata, you can follow the steps described in the following links, using a dataset especially prepared for this module. It should be noted that the same results could have been obtained using any other statistical package.

Correlation matrix with SPSS

  1. In the following link, you will find the dataset ‘CME data – ESSround 6’.
    Download data in SPSS format
    Open this dataset in SPSS:

    GET FILE='C:\...\CME data_ESSround6_F1.sav'.
  2. First, select the cases under study: our analysis concerns the British population. Therefore, from the Data menu in SPSS, select ‘Select Cases…’. To limit the analysis to Great Britain, choose ‘If condition is satisfied’, select the variable ‘Country’ and insert the following condition: cntry = ‘GB’.

    COMPUTE filter_$=(cntry="GB").
    VARIABLE LABELS filter_$ 'cntry="GB" (FILTER)'.
    VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
    FORMATS filter_$ (f1.0).
    FILTER BY filter_$.
    EXECUTE.
  3. The ESS suggests that it is also necessary to take into account the design weights in order to correct for specific characteristics of the sampling that may bias the results. In SPSS, under the Data heading, you have to weight the cases using design weights. Select ‘Weight cases’ and weight the cases by the variable ‘Design weight [dweight]’.

    WEIGHT BY dweight.
  4. To obtain the correlation matrix, choose ‘Correlate’ from the Analyze menu and then click ‘Bivariate…’. From the list, select the variables in the following order: Satdem [stfdem], Free [fairelcc], Critic [oppcrgvc], Equal [cttresac], LRplace [lrscale] and Inc [hinctnta]. Once the variables are selected, under Options choose ‘Exclude cases listwise’ so that all correlations are based on the same cases in the sample.

    CORRELATIONS
    /VARIABLES=stfdem fairelcc oppcrgvc cttresac lrscale hinctnta
    /PRINT=TWOTAIL NOSIG
    /MISSING=LISTWISE.

Correlation matrix with Stata

  1. In the following link, you will find the dataset ‘CME data – ESSround 6’.
    Download data in Stata format
    Open this dataset in Stata:

    use "C:\...\CME data_ESSround6_F1.dta", clear
  2. Carrying out some tabulations of our variables of interest before asking for the correlations, we see that the variables Satdem [stfdem], Free [fairelcc], Critic [oppcrgvc], Equal [cttresac], LRplace [lrscale] and Inc [hinctnta] contain Refusal and Don’t know values (coded 77, 88 and 99), which should be recoded to missing. This can be done using the command ‘mvdecode’.

    mvdecode stfdem fairelcc oppcrgvc cttresac lrscale hinctnta, mv(99)
    mvdecode stfdem fairelcc oppcrgvc cttresac lrscale hinctnta, mv(88)
    mvdecode stfdem fairelcc oppcrgvc cttresac lrscale hinctnta, mv(77)
  3. Select the cases under study: our analysis concerns the British population. Therefore, in Stata we can use the command ‘keep if’ and keep all observations that have the value ‘GB’ for the variable ‘Country (cntry)’.

    keep if cntry=="GB"
  4. To obtain the correlation matrix in Stata, use the command ‘pwcorr’. Select the six variables under analysis in the following order: Satdem [stfdem], Free [fairelcc], Critic [oppcrgvc], Equal [cttresac], LRplace [lrscale] and Inc [hinctnta]. The design weights are applied using ‘aweight’. We also add the option ‘listwise’ so that the correlations are based on the same cases in the sample.

    pwcorr stfdem fairelcc oppcrgvc cttresac lrscale hinctnta [aweight=dweight], listwise
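For readers working outside SPSS or Stata, the same steps can be sketched in Python with pandas and NumPy. The snippet below mirrors the workflow (filter to Great Britain, recode the missing-value codes, listwise deletion, design-weighted correlation); the tiny dataset is invented for illustration and is not the ESS file, and `np.cov` with `aweights` plays the role of Stata's analytic weights.

```python
import numpy as np
import pandas as pd

def weighted_corr(df, cols, weight):
    """Listwise-deleted, design-weighted correlation matrix.

    Mirrors the steps above: cases with missing values are dropped
    listwise, and the design weight enters as an analytic weight
    (the role of [aweight=dweight] in the pwcorr command)."""
    sub = df[cols + [weight]].dropna()                      # listwise deletion
    cov = np.cov(sub[cols].to_numpy().T,
                 aweights=sub[weight].to_numpy())           # weighted covariance
    sd = np.sqrt(np.diag(cov))
    return pd.DataFrame(cov / np.outer(sd, sd), index=cols, columns=cols)

# Invented mini-dataset, not the ESS file; only two of the six variables
df = pd.DataFrame({
    "cntry":   ["GB"] * 6 + ["NL"] * 2,
    "stfdem":  [5, 6, 4, 7, 88, 3, 5, 6],     # 88 = Don't know
    "lrscale": [4, 5, 3, 6, 5, 77, 4, 5],     # 77 = Refusal
    "dweight": [1.0, 1.1, 0.9, 1.0, 1.0, 1.2, 1.0, 1.0],
})
df = df[df["cntry"] == "GB"].copy()                         # keep if cntry=="GB"
df[["stfdem", "lrscale"]] = df[["stfdem", "lrscale"]].replace([77, 88, 99], np.nan)
print(weighted_corr(df, ["stfdem", "lrscale"], "dweight"))
```

In this toy data the two recoded variables are perfectly linearly related, so the weighted correlation is 1.0; with the real ESS file and all six variables, this is the computation summarised in Table 4.2.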

If you have followed the procedures in one of the syntaxes above, you should end up with the results in Table 4.2:

Table 4.2: The correlations between the variables from the satisfaction with democracy model of ESS Round 6 for Great Britain, corrected by design weights (n=1468).

In Table 4.2, the correlations between the observed variables are presented using design weights. Weighting does not change the correlations between the variables very much; the change mostly occurs in the third decimal. However, this may be different for frequencies and means. Because weighting with the design weight is better than not weighting, we will continue the analysis with the weighted matrix. For more details, we refer to the ESS EduNet topic ‘Weighting the ESS’ [Gan10].

At this point, we have reliability, validity and quality values (see Table 4.1) as well as the correlations between the observed variables (see Table 4.2). So we have all the information we need to correct the correlation matrix for measurement errors. This will be done using equation 4.3 above.

The correction has to be done in two steps. The first step is to subtract the common method variance (cmv) from the given correlations to correct for systematic effects of the method used. The second step consists of dividing all correlations by the product of the quality coefficients to correct for the random errors in all variables.

We should mention that the cmv should only be computed for variables measured by the same method. In this analysis, this holds for the group E17, E20 and E25 and for the pair B19 and B23, but not for the other combinations. Questions E17, E20 and E25 share the same method because they appear in the same battery with the same scale, while questions B19 and B23 both use an 11-point bipolar scale. In Table 4.3, we reproduce the correlation matrix of the observed variables, highlighting the correlations that need to be corrected for common method variance.

Table 4.3: The correlation matrix between the observed variables of ESS Round 6 for Great Britain indicating the correlations to be corrected for common method variance

The correlations highlighted in this matrix are the correlations that may be too high due to method effects. So, for these correlations, we have to subtract the cmv. The cmv for any pair of variables i and j is rimimjrj. It can be calculated using the reliability and method-effect coefficients from the quality predictions for the questions (see Table 4.1).

The calculations for the correction for common method variance in the correlations highlighted above are presented in Table 4.4.

Table 4.4: Correction for common method variance (cmv)

This illustrates that a simple calculation is needed to correct the correlations for common method variance. Table 4.5 illustrates the result after putting these new correlations in the proper place.

Table 4.5: The correlations corrected for common method variance (Step 1: by hand)

To finish the correction, the correlations should be divided by the product of the quality coefficients (see equation 4.3). However, this calculation does not have to be done by hand, because we can put the qualities (q2) on the diagonal and ask the program to transform this covariance matrix into a correlation matrix. In that case, the program will automatically compute (rij - cmvij)/(qiqj). Table 4.6 highlights the variances that have to be modified.

Table 4.6: The correlation matrix between the observed variables of ESS Round 6 for Great Britain corrected for common method variance and indicating the variances to be corrected for measurement

In the same way, putting the quality estimates of the variables on the diagonal in the matrix, we have formulated the covariance matrix that will be used to obtain the correlations corrected for measurement errors (see Table 4.7).

Table 4.7: The correlations and variances corrected for measurement errors (Step 2: by hand)

In the analysis, the program is first asked to transform this covariance matrix into a correlation matrix. The program does this by dividing each covariance by the product of the standard deviations of the related variables, which in this case is the same as dividing the covariance by the product of the quality coefficients. For example, the corrected correlation for the variables Satdem and Free is obtained as follows: 0.395/√(0.710*0.643) = 0.585. It is rather risky to do all these calculations by hand. In the next chapter, we will therefore show how the correction for lack of quality can be done using standard software.
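The transformation the program performs can be sketched in a few lines of Python (illustration only): place the qualities (q2) on the diagonal, the cmv-corrected correlations off the diagonal, and rescale the matrix to a correlation matrix. The values 0.395, 0.710 and 0.643 are the Satdem and Free figures from the hand calculation above.

```python
import numpy as np

def cov_to_corr(cov):
    """Rescale a covariance matrix to a correlation matrix:
    corr_ij = cov_ij / (sd_i * sd_j). With the qualities q^2 on the
    diagonal, the standard deviations are the quality coefficients q,
    so this divides each cmv-corrected correlation by qi*qj."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Satdem and Free: cmv-corrected correlation 0.395, qualities 0.710 and 0.643
m = np.array([[0.710, 0.395],
              [0.395, 0.643]])
print(round(cov_to_corr(m)[0, 1], 3))  # 0.585, matching the hand calculation
```

The same rescaling applied to the full Table 4.7 matrix yields the corrected correlation matrix of Table 4.8.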

The result of this computation for all variables can be seen in the following matrix (Table 4.8), which contains the correlations between the same variables corrected for method effect and random errors.

Table 4.8: Correlation matrix corrected for measurement errors (by the program)

If we compare Table 4.2 of this chapter with the above correlation matrix (Table 4.8), it is clear that, after correction for measurement errors, the correlations are quite different from before correction. So we can also expect the results of the analyses to be different. Note that the first step in the correction procedure has to be done by hand, while the program used for the analysis does the second step automatically.

In sum, the user has to produce Table 4.7, correcting for common method variance where necessary and reducing the variance of each observed variable, which is 1 for standardized variables, to its systematic part (q2) or, to put it differently, removing the error variance. Using a corrected matrix like the one in Table 4.7, the analysis can start, because commonly used programs for analysis, such as Stata and LISREL, will automatically transform this corrected covariance matrix into a correlation matrix corrected for measurement error, as presented in Table 4.8. After these corrections for measurement errors in the correlation matrix, the analysis is the same whether we analyse the original correlations or the corrected correlations. We will see, however, that the results are very different.

It is important to mention that this process sometimes goes wrong because, after correction for measurement errors, the correlations can be greater than 1. Such correlations are impossible and should be seen as incorrect estimates of the true correlations. From the formula for correction for measurement errors, (r(y11y21) - cmv)/(q11q21), it follows that, if the numerator is larger than the denominator, the estimated correlation will be larger than 1. This can happen because the observed correlation is estimated too high, the cmv too low or the quality of the questions too low. If this occurs, one can check the following:

  1. There may be an error in the SQP coding. Especially in the case of non-authorized coding, we suggest checking the coding first.
  2. The original correlations can be too high because the method effects are underestimated or because of outliers in the data. In that case, we suggest checking the data, removing the outliers and using the new correlation matrix for the further analysis.
  3. The SQP predictions are based on the general trends in the relationships between the characteristics of questions and their quality estimates across all countries. If a country deviates considerably from these general trends, this can lead to problems. To cope with this, one can first check whether taking the uncertainty in the quality predictions into account solves the problem: that is, whether the problem disappears if the maximum value of the specified interquartile interval is used in the calculations instead of the point estimates of the quality indicators. If this approach does not solve the problem, there is no other solution than to rely on MTMM experiments to obtain the quality estimates for the questions involved. For a large number of questions, the quality estimates from such experiments are also available in SQP, but of course not for all possible questions. If no information from an MTMM experiment is available, new experiments have to be done.
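A quick machine check for this failure mode can be sketched in Python (a hypothetical helper, not part of SQP): it simply flags corrected correlations that fall outside the admissible range [-1, 1].

```python
import numpy as np

def out_of_range(corr, tol=1e-9):
    """Return index pairs (i, j), i < j, whose corrected correlation
    falls outside [-1, 1] and therefore signals a correction problem."""
    bad = []
    n = corr.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) > 1 + tol:
                bad.append((i, j))
    return bad

# Illustrative matrix: one corrected correlation has gone above 1
c = np.array([[1.00, 1.04, 0.30],
              [1.04, 1.00, 0.55],
              [0.30, 0.55, 1.00]])
print(out_of_range(c))  # [(0, 1)]
```

Any pair flagged this way should be inspected along the three points listed above before the analysis continues.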

Exercise 4.1:

To see the effect of reliability on the observed correlation, we evaluate the following cases in the model in Figure 4.1. If the correlation between the variables corrected for measurement error r(f1,f2) = 0.9, the validity = 1, and the method effect = 0, what is the correlation between the observed variables in the following cases?

Compute the observed correlations

Reliability      Reliability      Observed correlation
coefficient y11  coefficient y12  ry11,y12 = r11v11r(f1,f2)v12r12 + r11m11m12r12
1.0              1.0              ry11,y12 =
0.9              0.9              ry11,y12 =
0.8              0.8              ry11,y12 =
0.7              0.7              ry11,y12 =
0.6              0.6              ry11,y12 =

Solution

Reliability      Reliability      Observed correlation
coefficient y11  coefficient y12  ry11,y12 = r11v11r(f1,f2)v12r12 + r11m11m12r12
1.0              1.0              ry11,y12 = 1*1*0.9*1*1 + 1*0*0*1 = 0.9 + 0 = 0.9
0.9              0.9              ry11,y12 = 0.9*1*0.9*1*0.9 + 0.9*0*0*0.9 = 0.73 + 0 = 0.73
0.8              0.8              ry11,y12 = 0.8*1*0.9*1*0.8 + 0.8*0*0*0.8 = 0.58 + 0 = 0.58
0.7              0.7              ry11,y12 = 0.7*1*0.9*1*0.7 + 0.7*0*0*0.7 = 0.44 + 0 = 0.44
0.6              0.6              ry11,y12 = 0.6*1*0.9*1*0.6 + 0.6*0*0*0.6 = 0.32 + 0 = 0.32

This exercise illustrates that, keeping all other coefficients fixed, the correlation between the observed variables decreases if the reliability coefficient decreases. Recalling the definition of reliability, this makes sense because the smaller the reliability, the larger the random errors and, thus, the lower the correlation. Note that, with a reasonably high reliability coefficient of 0.6, the correlation has been reduced to nearly a third of its real value.
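The solution column above can be reproduced with a short Python sketch of the model formula (validity = 1 and method effect = 0, as in the exercise):

```python
def observed_corr(r1, r2, rho=0.9, v1=1.0, v2=1.0, m1=0.0, m2=0.0):
    """ry11,y12 = r11*v11*r(f1,f2)*v12*r12 + r11*m11*m12*r12 (Figure 4.1 model)."""
    return r1 * v1 * rho * v2 * r2 + r1 * m1 * m2 * r2

# Reliability coefficients from the exercise; validity = 1, method effect = 0
for r in (1.0, 0.9, 0.8, 0.7, 0.6):
    print(r, round(observed_corr(r, r), 2))
```

This reproduces the 0.9, 0.73, 0.58, 0.44 and 0.32 column of the solution; setting m1 and m2 to non-zero values covers exercise 4.2 below as well.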

Exercise 4.2:

Imagine that the correlation between the variables corrected for measurement error = 0.4 and the reliability coefficient = 0.99 for both measures. What is the correlation between the observed variables in the following cases?

Compute the correlations

Validity         Validity         Method      Method      Observed correlation
coefficient y11  coefficient y12  effect y11  effect y12  ry11,y12 = r11v11r(f1,f2)v12r12 + r11m11m12r12
1.0              1.0              0.0         0.0         ry11,y12 =
0.9              0.9              0.43        0.43        ry11,y12 =
0.8              0.8              0.6         0.6         ry11,y12 =
0.7              0.7              0.71        0.71        ry11,y12 =

Solution

Validity         Validity         Method      Method      Observed correlation
coefficient y11  coefficient y12  effect y11  effect y12  ry11,y12 = r11v11r(f1,f2)v12r12 + r11m11m12r12
1.0              1.0              0.0         0.0         ry11,y12 = 0.99*1*0.4*1*0.99 + 0.99*0*0*0.99 = 0.392 + 0 = 0.392
0.9              0.9              0.43        0.43        ry11,y12 = 0.99*0.9*0.4*0.9*0.99 + 0.99*0.43*0.43*0.99 = 0.318 + 0.181 = 0.499
0.8              0.8              0.6         0.6         ry11,y12 = 0.99*0.8*0.4*0.8*0.99 + 0.99*0.6*0.6*0.99 = 0.251 + 0.353 = 0.604
0.7              0.7              0.71        0.71        ry11,y12 = 0.99*0.7*0.4*0.7*0.99 + 0.99*0.71*0.71*0.99 = 0.192 + 0.494 = 0.686

From the results, we can see that, if the common method variance (the second term) is larger than the decrease in the first term caused by the validity coefficients, the observed correlation becomes larger than the correlation corrected for measurement error. These two exercises show that the observed correlation can be too low because of random measurement errors or too high because of common method variance.

Exercise 4.3:

In exercise 3.1, we introduced and obtained the quality predictions for the four variables that are intended to explain the opinion about immigration to the Netherlands by people from outside Europe. The results obtained from SQP for the quality predictions of these questions are reproduced below. We have added to this table the reliability (r), validity (v) and quality (q) coefficients, which are the square roots of the coefficients given earlier.

The SQP quality predictions

For the purpose of this exercise, the correlation matrix for these variables is also provided in the next table.

Correlation matrix

Given the observed correlations in the table above, what are the correlations between these variables corrected for measurement errors?

Solution

First, compute the cmv for the variables using a common method. This holds true for the variables Economy, Culture and Better, which use 11-point, item-specific scales. The correlation between these variables has to be reduced by the cmv.

Computing the cmv gives:

cmvB40,B38 = rB40mB40mB38rB38 = 0.853 * 0.349 * 0.354 * 0.896 = 0.094
cmvB40,B39 = rB40mB40mB39rB39 = 0.853 * 0.349 * 0.418 * 0.881 = 0.110
cmvB38,B39 = rB38mB38mB39rB39 = 0.896 * 0.354 * 0.418 * 0.881 = 0.117

Next, the observed correlations are reduced by the cmv as follows:

correctedcorrB40,B38 = 0.534 – 0.094 = 0.440
correctedcorrB40,B39 = 0.557 – 0.110 = 0.447
correctedcorrB38,B39 = 0.530 – 0.117 = 0.413

By putting the qualities (q2) from the table above on the diagonal of the matrix, the correlation matrix corrected for measurement errors can be computed. The result is presented below.
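As a check on the arithmetic, the cmv subtractions in this solution can be reproduced in a few lines of Python (reliability and method-effect coefficients as given in the exercise tables):

```python
def cmv(r_i, m_i, m_j, r_j):
    """Common method variance for a pair of questions: ri * mi * mj * rj."""
    return r_i * m_i * m_j * r_j

# Reliability (r) and method effect (m) coefficients from the exercise
coef = {"B40": (0.853, 0.349), "B38": (0.896, 0.354), "B39": (0.881, 0.418)}
obs = {("B40", "B38"): 0.534, ("B40", "B39"): 0.557, ("B38", "B39"): 0.530}

for (a, b), r_obs in obs.items():
    r_a, m_a = coef[a]
    r_b, m_b = coef[b]
    print(a, b, round(r_obs - cmv(r_a, m_a, m_b, r_b), 3))
```

This reproduces the corrected values 0.440, 0.447 and 0.413 computed by hand above (Python prints 0.44 for 0.440).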

