Chapter 8: Summated scales in regression analysis
As noted in chapter 1, linear regression analysis presupposes that the variables are metric. But a large proportion of the variables in the ESS data sets are ordinal, and, in addition, most of them are measures of attitudes. Individual measures of attitudes tend to be inaccurate because they only extract particular aspects of the general attitudes we wish to measure, or because people’s answers to single-attitude questions are plagued by random inaccuracies. But both these problems can be alleviated somewhat by combining the values of several indicator variables into scales. This is done by taking their aggregate or average value.
Such summated scales can cover a wider range of manifestations of the relevant attitudes than single-attitude measures do, and positive random measurement errors can be offset by negative ones and vice versa. Furthermore, they have more values than their individual components and their values often have a distribution that is better adapted to linear regression analysis than single-attitude measurements are. If, for instance, we add two variables with two identical values each, we get a new variable with three values. The more indicator variables we combine, the more values the summated scale will have and the more symmetrically dispersed and similar to that of a metric variable its value distribution tends to be.
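The counting argument above can be checked outside SPSS with a small Python sketch. It is an illustration only: the function name and the value codes (0/1 and 1 to 5) are assumptions, and the frequency count treats every combination of answers as equally likely, which real survey answers are not.

```python
from collections import Counter
from itertools import product


def sum_distribution(item_values, n_items):
    """Count how many answer combinations produce each possible sum.

    item_values: the value codes shared by all indicators (assumed identical).
    n_items: the number of indicators that are added together.
    """
    return Counter(sum(combo) for combo in product(item_values, repeat=n_items))


# Two indicators with the same two values (coded 0 and 1) give three scale values.
two_binary = sum_distribution([0, 1], 2)

# Four indicators coded 1 to 5 give 17 distinct sums, from 4 to 20.
four_five_point = sum_distribution([1, 2, 3, 4, 5], 4)

print(sorted(two_binary.items()))
print(len(four_five_point), min(four_five_point), max(four_five_point))
```

In the two-indicator case the middle value can be reached in two ways and each extreme in only one, which hints at why sums of many indicators tend towards a symmetric, single-peaked distribution.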
Indicator variables with identical value codes provide equal contributions to the combined variable. Thus, if we have no particular reason to assign more weight to some indicators than to others, we would prefer indicators with identical value ranges. (Or we might rescale them in order to make their value ranges equal.)
The indicator variables to be used as components in a summated scale should be selected with care. SPSS provides tests that can be used to check candidate variables’ appropriateness. Factor analysis is used to select candidates for a scale by singling out variables that can be conceived as indicators of a common underlying attitude or phenomenon. Reliability analysis is used to check whether the associations between selected candidates are strong enough to make their sum a sufficiently accurate measure of the underlying phenomenon.
Limitations of space prevent us from going into more detail about these methods other than to state that the reliability analysis option can be found under ‘Scale’ in the Analyze menu, and that, as a minimal test procedure, we could put the candidate indicators on the items list, click ‘OK’ and accept all candidates if the resulting Cronbach’s alpha value exceeds 0.7, or try another candidate variable set if it does not. Make sure that the indicators are positively correlated (associated) with each other before you run the analysis. Limit the number of indicators in your candidate sets (3 to 6 ought to be enough). The cases we present here have alpha values greater than 0.7.
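For readers who want to see what the alpha value summarises, the following Python sketch computes Cronbach’s alpha from the standard formula, alpha = k/(k-1) × (1 - sum of item variances / variance of the summed scale). It is not the SPSS procedure itself: the function name and the answers of the six respondents are made up for illustration, and the sketch assumes that every respondent has answered every item.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of indicator columns.

    items: one list of answers per indicator, all covering the same
    respondents in the same order, with no missing values (an assumption).
    """
    k = len(items)
    # Summed scale value for each respondent.
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Made-up answers from six respondents to three positively correlated items.
item_a = [1, 2, 3, 4, 5, 3]
item_b = [2, 2, 3, 5, 5, 3]
item_c = [1, 3, 3, 4, 4, 2]

alpha = cronbach_alpha([item_a, item_b, item_c])
print(round(alpha, 3))
```

The more strongly the items move together, the larger the variance of their sum becomes relative to the sum of their separate variances, and the closer alpha gets to 1.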
Summated scales: Example
You should read the text even if you choose to copy, paste and run the syntax.
Syntax for example
* The following command causes the cases to be weighted by the design weight variable 'dweight'.
WEIGHT BY dweight.
* The following commands cause SPSS to select for analysis those cases that belong to the British sample (value GB on country variable) and have lower values than 1975 on the birth year variable (& stands for AND, < stands for 'less than').
* In this process, the commands create a filter variable (filter_$) with value 1 for the selected cases and value 0 for the non-selected cases.
* Change the last part of line 2 (which starts after the first equals sign) if you wish to select other cases (if you do this, you should also change the variable label, which can be found within double quotation marks on line 3).
* These commands compute summated scales.
* These commands run multiple regression with summated scales and other ordinal variables.
We have computed two scales. One is intended to find out whether people feel that they are being squeezed between family responsibilities and their jobs. We call it the ‘Job-family time squeeze scale’. The other scale is supposed to elicit people’s general sense of wellbeing. We use SPSS’s ‘Compute Variable’ feature to create them. Check that ‘don’t know’ answers are coded as ‘User-missing’ before you make any computations. (There are other options, but they will not be discussed here.) As shown in Figure 17, the time-squeeze scale is computed by adding up the values of four indicator variables and by dividing this sum by the number of indicators. Hence, the scale value is the mean of the indicator variables. The indicator variables are the respondents’ accounts of how often they encounter the following problems.
- Too tired after work to enjoy things you like to do at home, how often
- Job prevents you from devoting time to partner/family, how often
- Partner/family fed up with pressure of your job, how often
- Difficult to concentrate on work because of family responsibilities
Their values range between 1 (never) and 5 (always). Hence, the scale values also range between 1 and 5.
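The mean-of-indicators computation described above can be sketched in Python as follows. The function name and the four example respondents are hypothetical. The sketch assumes that a respondent with any missing answer gets no scale value, which is how an arithmetic expression behaves in SPSS when one of its terms is missing.

```python
def mean_scale(answers):
    """Mean of the indicator values, or None if any answer is missing (None)."""
    if any(value is None for value in answers):
        return None
    return sum(answers) / len(answers)


# Hypothetical answers to the four time-squeeze items, coded 1 (never) to 5 (always).
respondents = [
    [1, 1, 1, 1],     # never experiences any of the problems
    [5, 5, 5, 5],     # always experiences all of them
    [2, 3, 4, 3],     # mixed answers
    [2, None, 4, 3],  # one answer is missing
]

scores = [mean_scale(answers) for answers in respondents]
print(scores)  # [1.0, 5.0, 3.0, None]
```

Because the sum is divided by the number of indicators, the scale keeps the 1 to 5 range of its components, which makes its values easier to read than a raw sum.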
The second scale is the mean of the values of the respondents’ reports about how often they have experienced the following during the last two weeks.
- Have felt cheerful and in good spirits the last 2 weeks
- Have felt calm and relaxed the last 2 weeks
- Have felt active and vigorous the last 2 weeks
- Have woken up feeling fresh and rested the last 2 weeks
- Daily life been filled with things that interest me the last 2 weeks
Note that, in this case, the order of the values has been reversed. ‘All of the time’ has value 1, while ‘At no time’ has value 6. Thus, the scale values become higher the less cheerful etc. people feel. Therefore, we have named the scale the ‘General bad mood scale’. Observe that we have created the scales for instructional purposes only. We do not make any claims as to their general applicability.
Regression with summated scales: Example
Tables 14 and 15 present the results of a multiple regression analysis made with British data where the ‘General bad mood scale’ is the dependent variable, while the ‘Job-family time squeeze scale’ is one of the independent variables. We have also taken the liberty of including four single-indicator, ordinal variables. This is common practice but, as pointed out above, you should use summated scales if you have the right kind of indicator variables available. As demonstrated in chapter 6, you can also avoid the whole ordinal variable problem by recoding them into sets of dummy variables. In the case discussed here, however, where we use four ordinal independent variables, you may perhaps regard the total number of variables needed for this solution as too high for comfort. In any case, the four ordinal variables, used with their original coding scheme intact, are:
- Subjective general health
- Feeling about household's present income
- Many things to do at home, often run out of time before I get everything done
- Look after others in household, children/ill/disabled/elderly
We have also included the respondent’s gender recoded as a dummy variable. The original gender variable applies other value codes (women = 2 and men = 1). There is nothing to prevent you from using this original variable instead of the dummy coded version, but it would change the value and interpretation of the constant term. (The estimate or interpretation of the gender variable’s coefficient would not be affected.)
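The point about the constant term can be illustrated with a minimal Python sketch of a regression with gender as the only independent variable. The mood values are made up, the helper function is hypothetical, and the sketch assumes that the dummy is coded women = 1 and men = 0 (the text does not state which category is coded 1). The regression in the main text has more independent variables, but the same logic applies.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of a least-squares line with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data: original coding is men = 1, women = 2.
gender_original = [1, 1, 1, 2, 2, 2]
# Assumed dummy coding: men = 0, women = 1.
gender_dummy = [g - 1 for g in gender_original]
mood = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0]

intercept_dummy, slope_dummy = ols_simple(gender_dummy, mood)
intercept_original, slope_original = ols_simple(gender_original, mood)

print(intercept_dummy, slope_dummy)        # 2.5 1.0
print(intercept_original, slope_original)  # 1.5 1.0
```

With the dummy coding, the constant is the predicted value for men (the group coded 0). With the original coding, it is the predicted value for a non-existent gender value of 0, which is why the constant changes while the gender coefficient stays the same.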
Caution should be shown with respect to drawing causal inferences, but the results shown in table 14 at least seem to confirm that several factors affect people’s mood, including time squeeze problems, health problems and perceived financial problems.
Note also that it is essential that you know how the variables are coded, i.e. that you know what numerical values have been assigned to the different answers given to the survey questions. For instance, you need to know that the values of the variable ‘Feeling about household's present income’ range from 1 to 4, and that 1 stands for ‘Living comfortably on present income’, while 4 stands for ‘Very difficult on present income’. Knowing this enables you to see that the positive sign of this variable’s coefficient indicates that the average mood is worse among those who experience financial problems than among others. Make your own interpretations of the other coefficients’ signs. (Find out about how you can inspect a variable’s value codes here.)
Table 14. SPSS output: Regression with summated scales, coefficients
Table 15. Regression with summated scales, goodness of fit statistics
- Make a summated scale that measures degree of religiosity by adding together these variables:
  - How religious are you?
  - How often do you attend religious services apart from special occasions?
  - How often do you pray apart from at religious services?
The values of the first of these variables increase with the respondents’ subjectively perceived religiosity, whereas the values for the latter two decrease with the frequency of their religious activities. Therefore, either the first one or the last two must be recoded to make the value of all three either increase or decrease with intensity of religiosity. In this case, we get the most sensible ordering of scale values if we change the value ordering of the last two. Recoding can be done in many different ways. In this case, the easiest way is to use the ‘Automatic Recode’ feature in the ‘Transform’ menu. Open the dialogue box. Find the variables on the list. Give names to the two recoded variables, tick the ‘Highest value’ option, and click ‘OK’. (See figure below.)
Compute the scale by taking the average of the two recoded variables and the ‘How religious are you’ variable. Select the British sample and use the scale as the dependent variable in a regression analysis, with gender, two-digit year of birth and years of full-time education as independent variables. You can also compute a squared years of education variable and add this as another term on the independent variables list to see whether education length has a non-linear effect on religiosity.
Figure 18. Automatic Recode in SPSS
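The effect of recoding from the highest value can be sketched in Python as follows. This is an illustration of the idea, not of SPSS itself: the function name is hypothetical, the example codes (1 for the most frequent activity, 7 for ‘never’) are assumed, and missing answers are represented by None and left untouched. Like an automatic recode, the sketch assigns consecutive integers to the values that actually occur in the data, starting with 1 for the highest original value.

```python
def recode_descending(values):
    """Recode so that the highest original value becomes 1, the next 2, and so on.

    Returns the recoded list and the mapping from old to new codes.
    Missing answers (None) stay missing.
    """
    distinct = sorted({v for v in values if v is not None}, reverse=True)
    mapping = {old: new for new, old in enumerate(distinct, start=1)}
    return [mapping.get(v) for v in values], mapping


# Hypothetical attendance answers: 1 = most frequent, 7 = never.
attendance = [1, 7, 4, 7, 2, None]
recoded, mapping = recode_descending(attendance)

print(mapping)  # {7: 1, 4: 2, 2: 3, 1: 4}
print(recoded)  # [4, 1, 2, 1, 3, None]
```

Note that only four distinct codes occur in this small example, so the recoded variable runs from 1 to 4. In a full sample where every answer category is used, the recoded variable keeps the original number of values, with their order reversed.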
- Verify that the ‘explained’ proportion of the dependent variable’s variance (i.e. the R²) becomes (at least somewhat) smaller if you replace the dependent summated scale with any of the individual indicators that have been used to construct it. (You can also try the same with the variables in the example in the main text. In addition, you will find that the coefficients of summated scales used as independent variables tend to be higher than those of such scales’ component indicator variables.) These results support the idea that the summated scale is less distorted by random measurement errors than any of its components.
- Use the syntax below to compute Schwartz value scales. Read about these scales and their applications on the ESS Edunet pages on Human values. Develop your own regression models in which you use these scales as dependent or independent variables.
* This syntax makes SPSS compute 10 Schwartz value scales.
* In addition, you could consider multiplying the resulting scales by -1 to make their values increase rather than decrease with the respondents’ adherence to the values they measure.
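The exercise on replacing a dependent summated scale with one of its indicators can also be explored with simulated data. The Python sketch below is an illustration under stated assumptions, not an analysis of ESS data: all numbers (sample size, effect size, error variances) are invented, the indicators are continuous rather than ordinal, and R² is computed as the squared correlation, which equals the R² of a regression with a single independent variable.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R² of a one-predictor regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


random.seed(1)  # fixed seed so that the run is repeatable
n = 5000

# A predictor and an underlying attitude that partly depends on it.
predictor = [random.gauss(0, 1) for _ in range(n)]
attitude = [0.6 * p + random.gauss(0, 0.8) for p in predictor]

# Four indicators: the attitude plus independent random measurement error.
indicators = [[a + random.gauss(0, 1) for a in attitude] for _ in range(4)]

# Summated scale: the mean of the four indicators for each respondent.
scale = [sum(row) / 4 for row in zip(*indicators)]

r2_scale = r_squared(predictor, scale)
r2_items = [r_squared(predictor, indicator) for indicator in indicators]

print(round(r2_scale, 3), [round(r, 3) for r in r2_items])
```

Averaging four indicators leaves the attitude component unchanged but cuts the variance of the random error to a quarter, so a larger share of the scale’s variance can be accounted for by the predictor.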