# Derivation of the quality of complex concepts

In Chapter 3, we showed how the quality of the variables Satdem, LRplace and Inc was predicted using the program SQP 2.0. Here, we will show that we can also determine the quality of the composite score (CS) for the variable ‘Democracy level’ on the basis of the quality of the three indicator variables: Free, Critic and Equal.

The quality of a variable can be defined as the ratio between the systematic variance of the variable and its total variance or, equivalently, as 1 – (error variance / total variance). For single questions, the quality can be predicted by SQP. However, SQP is not able to predict the quality of composite scores. For a composite score, the standard definition can be adjusted as follows:

Quality of CS = 1 – (var(e_{cs})/ var(CS)) | equation 7.2 |

The total variance of the CS can be obtained directly from the computed composite score while the error variance is equal to:

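var(e_{cs}) = Σw_{k}^{2} var(e_{k}) + 2Σw_{k}w_{k'} cov(e_{k}e_{k'}), where the first sum runs over all indicators k and the second over all pairs of distinct indicators k ≠ k', each pair counted once | equation 7.3 |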

where var(e_{i}) is the error variance in y_{i}, which can be estimated as:

var(e_{i})= (1-q_{i}^{2})var(y_{i}) | equation 7.4 |

while cov(e_{i}e_{j}) can be estimated as:

cov(e_{i}e_{j}) = cmv_{ij} • s_{i}s_{j} = (r_{i}m_{i}m_{j}r_{j})(s_{i}s_{j}) for the variables y_{i} and y_{j} | equation 7.5 |

These equations show that we can obtain an estimate of the quality of a composite score from the estimation of the quality of the single questions and the cmv.
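Equations 7.2 to 7.5 can be chained in a few lines of code. The following is a minimal Python sketch, not part of the original module: the function names are hypothetical, `q2`, `r` and `m` stand for the q_{i}^{2}, r_{i} and m_{i} values of equations 7.4 and 7.5, and the input numbers are invented purely to exercise the formulas (they are not the ESS estimates).

```python
from itertools import combinations
from math import sqrt


def error_variance(q2, var_y):
    """Equation 7.4: var(e_i) = (1 - q_i^2) * var(y_i)."""
    return (1.0 - q2) * var_y


def error_covariance(r_i, m_i, r_j, m_j, s_i, s_j):
    """Equation 7.5: cov(e_i e_j) = (r_i m_i m_j r_j) * (s_i s_j)."""
    return (r_i * m_i * m_j * r_j) * (s_i * s_j)


def composite_quality(weights, q2, r, m, var_y, var_cs):
    """Equations 7.3 and 7.2: quality of a weighted composite score."""
    sd = [sqrt(v) for v in var_y]
    # First term of equation 7.3: weighted error variances.
    var_e = sum(w * w * error_variance(q, v)
                for w, q, v in zip(weights, q2, var_y))
    # Second term: every pair of distinct indicators once, multiplied by 2.
    for i, j in combinations(range(len(weights)), 2):
        var_e += 2 * weights[i] * weights[j] * error_covariance(
            r[i], m[i], r[j], m[j], sd[i], sd[j])
    # Equation 7.2.
    return 1.0 - var_e / var_cs


# Invented illustration values (NOT the estimates from Chapter 3 or Table 7.2).
example_quality = composite_quality(
    weights=[1, 1, 1],
    q2=[0.5, 0.5, 0.5],
    r=[0.9, 0.9, 0.9],
    m=[0.3, 0.3, 0.3],
    var_y=[4.0, 4.0, 4.0],
    var_cs=20.0,
)
print(round(example_quality, 4))
```

To apply the sketch to Demlevel, the invented inputs would be replaced by the quality estimates from Chapter 3 and the variances from Table 7.2.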

Furthermore, we can also calculate the common method variance between the composite score and another variable, using the following equation:

cmv(y_{i},y_{cs}) = r_{i} m_{i}Σ[(w_{k}/σ_{cs})m_{k} r_{k}] | equation 7.6 |

For the example used in this illustration, the composite score Demlevel does not share a method with any other variable. Here, we will therefore focus on the correction of the correlation based on the computation of the quality of the composite score. However, it should be noted that this equation should be used in cases where the composite score shares a common method with another variable.
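For cases where the composite score does share a method with another variable, equation 7.6 could be evaluated as in the short Python sketch below. The function name is hypothetical, σ_{cs} is taken to be the standard deviation of the composite score, and the numbers are invented for illustration only.

```python
def cmv_with_composite(r_i, m_i, weights, r, m, sd_cs):
    """Equation 7.6: cmv(y_i, y_cs) = r_i m_i * sum_k[(w_k / sd_cs) m_k r_k]."""
    return r_i * m_i * sum(
        (w_k / sd_cs) * m_k * r_k
        for w_k, m_k, r_k in zip(weights, m, r)
    )


# Invented illustration values (not taken from the module's tables).
example_cmv = cmv_with_composite(
    r_i=0.9, m_i=0.3,
    weights=[1, 1, 1],
    r=[0.9, 0.9, 0.9],
    m=[0.3, 0.3, 0.3],
    sd_cs=3.0,
)
print(round(example_cmv, 4))
```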

The quality of the questions needed to compute the quality of the composite score was given in Chapter 3. In addition, we need the standard deviations and variances of the observed variables. Table 7.2 provides the means, standard deviations (s) and variances (s^{2}) of the different variables; the variances are the squares of the standard deviations. If you are interested in reproducing the descriptive statistics presented in Table 7.2 using SPSS or Stata, you can follow the steps described below, using a dataset especially prepared for this module. The same results could be obtained using any other statistical package.

Descriptive statistics with SPSS

- Use the dataset you created above in the first page of Chapter 7: ‘CME data_ESSround6_withDemlevel’. You can also download it from this link. Open this dataset in SPSS:1
GET FILE='C:\...\CME data_ESSround6_withDemlevel.sav'.
- First, select the cases under study in our analysis. They concern the whole British population. Therefore, from Data in SPSS, select ‘Select Cases…’. To limit the analysis to Great Britain, choose ‘If condition is satisfied’, select the variable ‘Country’ and insert the following condition: cntry = "GB".
COMPUTE filter_$=(cntry="GB").
VARIABLE LABELS filter_$ 'cntry="GB" (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
- Under Data, you have to weight the cases using design weights. Select ‘Weight cases’ and weight the cases by the variable ‘Design weight [dweight]’.
WEIGHT BY dweight.
- To obtain the descriptive statistics, choose ‘Correlate’ from Analyze, and then click ‘Bivariate…’. From the list, select the variables in the following order: Satdem [stfdem], Free [fairelcc], Critic [oppcrgvc], Equal [cttresac], LRplace [lrscale], Inc [hinctnta] and Demlevel [demlevel]. Once the variables are selected, in Options choose ‘Means and standard deviations’ to obtain the statistics of those variables. Also choose the option ‘Exclude cases listwise’ so that the results for all variables are based on the same cases in the sample.2
CORRELATIONS
  /VARIABLES=stfdem fairelcc oppcrgvc cttresac lrscale hinctnta demlevel
  /PRINT=TWOTAIL NOSIG
  /STATISTICS DESCRIPTIVES
  /MISSING=LISTWISE.

Descriptive statistics with Stata

- Use the dataset you created above in the first page of Chapter 7: ‘CME data_ESSround6_withDemlevel’. You can also download it from this link. Open this dataset in Stata:3
use "C:\...\CME data_ESSround6_withDemlevel.dta", clear
- Select the cases under study. They concern the whole British population. Therefore, in Stata we can use the command ‘keep if’ to indicate that we will keep all observations that, for the variable ‘Country (cntry)’, have the value ‘GB’.
keep if cntry=="GB"
- To obtain the descriptive statistics in Stata, we use the command ‘corr’. It would be useful to compare these results with the ones obtained in Table 4.2. With this command, select the seven variables under analysis in the following order: Satdem [stfdem], Free [fairelcc], Critic [oppcrgvc], Equal [cttresac], LRplace [lrscale], Inc [hinctnta] and Demlevel [demlevel]. Here, the design weights are applied using the weight type ‘aweight’. Furthermore, in order to obtain the descriptive statistics for these variables, we add the option ‘means’.
corr stfdem fairelcc oppcrgvc cttresac lrscale hinctnta demlevel [aweight=dweight], means
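As noted above, other packages can produce the same kind of output. The sketch below shows, in plain Python, the logic of the steps just described: select the country, exclude incomplete cases listwise, and compute weighted means, standard deviations and variances. It is only an illustration: the function name and the tiny dataset are invented, and the variance uses the sum of the weights as denominator, which is an assumption; SPSS and Stata apply their own weighting conventions, so their results may differ slightly.

```python
from math import sqrt


def weighted_descriptives(rows, variables, weight="dweight", country="GB"):
    """Weighted mean, sd and variance per variable, with listwise deletion."""
    # Keep one country and only rows that are complete on every variable.
    kept = [row for row in rows
            if row.get("cntry") == country
            and all(row.get(v) is not None for v in variables)]
    total_w = sum(row[weight] for row in kept)
    stats = {}
    for v in variables:
        mean = sum(row[weight] * row[v] for row in kept) / total_w
        # Assumption: weighted variance with the sum of weights as denominator.
        var = sum(row[weight] * (row[v] - mean) ** 2 for row in kept) / total_w
        stats[v] = {"mean": mean, "sd": sqrt(var), "variance": var}
    return stats


# Invented toy data, only to show the mechanics (not ESS data).
toy_rows = [
    {"cntry": "GB", "dweight": 1.0, "stfdem": 4, "lrscale": 5},
    {"cntry": "GB", "dweight": 3.0, "stfdem": 8, "lrscale": 5},
    {"cntry": "GB", "dweight": 1.0, "stfdem": None, "lrscale": 2},  # dropped listwise
    {"cntry": "DE", "dweight": 1.0, "stfdem": 6, "lrscale": 7},     # other country
]
toy_stats = weighted_descriptives(toy_rows, ["stfdem", "lrscale"])
print(toy_stats["stfdem"])
```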

The results obtained are summarized in the following table:

Table 7.2.^{4} Descriptive statistics (weighted) of the variables for Great Britain (n=1468).^{5}

The first step is to compute the quality of the composite score, which will be used in the correlation matrix to correct the variance of the variable Demlevel. The quality predictions for all the other variables were already obtained in Chapter 3, so here we only illustrate how to derive the quality of Demlevel; all the other values in the correlation matrix remain the same. To compute the quality of the composite score as presented in equation 7.2, we first need the error variance of the composite score (see equation 7.3), which is based on the error variances, var(e_{i}), and covariances, cov(e_{i}e_{j}), of the indicators of Democracy level (Demlevel). The only information needed is that provided in Chapter 3, Table 3.3, where we obtained the quality estimates of the three indicators of democracy level, and the variances presented in the table above.

Table 7.3 presents the information required to compute the error variances, var(e_{i}), for the variables that measure the democracy level (see equation 7.4).

As stated above, we also need to calculate the covariances, cov(e_{i}e_{j}), for these three variables as seen in equation 7.5. This is done in Table 7.4. Because the cmv is the value for the standardized variables, the covariance is obtained by multiplying the cmv by the standard deviations of the variables involved.

Now that we have all components of equation 7.3, we can compute the variance of the errors of the unweighted composite score for democracy level, which is:

var(e_{dem}) = Σw_{k}^{2} var(e_{k}) + 2Σw_{k}w_{k'} cov(e_{k}e_{k'}) = **9.955**

Finally, the quality of the composite score of democracy level is computed using equation 7.2:

Quality of CS_{dem} = 1 – (var(e_{dem}) / var(CS_{dem})) = **0.5933**

The quality of the composite score [Demlevel], built from the variables Free, Critic and Equal, as a measure of the variable of interest, Democracy level, is 0.59. This means that 59% of the observed variance in the composite score comes from the variable of interest and that 41% of the variance is error.
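The two reported figures can be checked against equation 7.2 with a little arithmetic. The Python sketch below uses only the error variance (9.955) and the quality (0.5933) given in the text; the total variance of Demlevel it backs out is an implied value, subject to rounding, since the table with the observed variance is not reproduced here.

```python
var_e_dem = 9.955     # error variance of the composite score (from the text)
quality_dem = 0.5933  # quality of the composite score (from the text)

# Share of the observed variance that is error: 1 - quality.
error_share = 1.0 - quality_dem

# Rearranging equation 7.2: var(CS) = var(e_cs) / (1 - quality).
implied_var_cs = var_e_dem / error_share

print(round(error_share, 4), round(implied_var_cs, 2))
```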

Before continuing to the next step, which is correction for measurement errors in a composite score model, we first need to correct the correlations.

We start with the correction of the correlation matrix presented in Table 7.1 above. As in Chapter 4, we first correct for the common method variance. In this case, that is only expected for the correlation between Satdem and LRplace. From Table 4.4, we know that this correlation should be reduced by 0.076, which is the cmv of the variables Satdem and LRplace. This results in a reduced correlation of 0.112 after correction for the common method variance of these variables. Now, in Table 7.5 we only have to substitute the quality estimates for the variances of all standardized variables (1.00). The quality estimates for the variables Satdem, LRplace and Inc were obtained in Table 3.3 using SQP. Following this, the covariance matrix is obtained for the analysis with correction for measurement errors.
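The two correction steps described above (subtract the cmv from the affected correlation, then put the quality estimates on the diagonal) can be sketched in Python as follows. The function name is hypothetical and the variable Inc is left out for brevity. Only three numbers come from the text: the cmv of 0.076, the Demlevel quality of 0.5933, and the observed Satdem–LRplace correlation of 0.188, which is implied by 0.112 + 0.076. All other values are invented placeholders, not the estimates of Tables 3.3 and 7.1.

```python
def corrected_covariance_matrix(corr, cmv, quality):
    """Subtract cmv off-diagonal and put the quality estimates on the diagonal."""
    names = list(quality)
    matrix = {}
    for a in names:
        for b in names:
            if a == b:
                # Diagonal: the variance 1.00 is replaced by the quality.
                matrix[(a, b)] = quality[a]
            else:
                key = (a, b) if (a, b) in corr else (b, a)
                # Pairs without a shared method have a cmv of 0.
                shared = cmv.get((a, b), cmv.get((b, a), 0.0))
                matrix[(a, b)] = corr[key] - shared
    return matrix


corr = {
    ("Satdem", "LRplace"): 0.188,    # implied by the text: 0.112 + 0.076
    ("Satdem", "Demlevel"): 0.30,    # invented placeholder
    ("LRplace", "Demlevel"): 0.05,   # invented placeholder
}
cmv = {("Satdem", "LRplace"): 0.076}  # from the text (Table 4.4)
quality = {
    "Satdem": 0.70,      # invented placeholder
    "LRplace": 0.80,     # invented placeholder
    "Demlevel": 0.5933,  # computed above with equation 7.2
}

corrected = corrected_covariance_matrix(corr, cmv, quality)
print(round(corrected[("Satdem", "LRplace")], 3))
```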

#### Footnotes

- [1] The following illustration and results are based on the SPSS 19 software version: IBM Corp. Released 2010. IBM SPSS Statistics for Windows, Version 19.0. Armonk, NY: IBM Corp.
- [2] We used the Correlation feature to obtain the results presented in the table, as it allows cases to be excluded listwise, an option that is not available in Descriptives in SPSS.
- [3] The following illustration and results are based on the Stata 12 software version: StataCorp. 2011. **Stata Statistical Software: Release 12**. College Station, TX: StataCorp LP.
- [4] ESS Round 6: European Social Survey Round 6 Data (2012). Data file edition 2.0. Norwegian Social Science Data Services, Norway – Data Archive and distributor of ESS data.
- [5] SPSS adjusts the sample size on the basis of the design weights. Their adjusted sample size is 1424. However, for our illustration, we will stick to the original sample of 1468, which is the actual number of people that answered the questions.