One of the biggest problems with trying to bring together or compare different types of information is that they are measured in different units and on different scales. This is just as true of survey data as of any other information. For example, consider the following two questions:
- How much of the time spent with your immediate family is enjoyable?
- To what extent do you feel that people treat you with respect?
In both cases, the response scale runs from 0 to 6. However, for the first question ‘0’ means ‘none of the time’ and ‘6’ means ‘all of the time’, whilst for the second question ‘0’ means ‘not at all’ and ‘6’ means ‘a great deal’. The two scales are therefore not comparable: a ‘4’ on question 1 is not necessarily equivalent to a ‘4’ on question 2. If a country’s mean is slightly higher on question 1 than on question 2, for example, this does not necessarily indicate that the country has higher levels of family well-being than of respect. Similarly, if we wanted to combine these questions with others that measure some aspect of social well-being, there would be no way of knowing whether a ‘4’ on question 1 and a ‘3’ on question 2 is better or worse than a ‘3’ on question 1 and a ‘4’ on question 2.
These problems exist for all types of data. For some well-known indicators, it is a bit easier to judge levels from the numbers alone. For example, returning to the HDI, if a country had an average life expectancy of 45 years and a literacy rate of 95%, most people would agree that health was its primary concern. In the real world, however, comparison is difficult even for these indicators. Consider Gambia, where life expectancy is 54 years and the literacy rate is 38%. Which indicator represents the more pressing concern? An expert on development might recognise at once that it is Gambia’s literacy rate that is particularly low, but most of us would find this hard to spot.
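This is why standardisation is useful: it gives us a way of comparing apples with oranges. Scores for each question are transformed so that they are all expressed in the same terms, namely the distance from the mean for that question, measured in standard deviations. Questions where higher figures indicated lower well-being are reversed so that higher numbers always indicate higher well-being. Standardisation follows a well-known formula. For a given individual:

z = (x − x̄) / s

where x is the individual’s response to a question, x̄ is the mean response to that question, and s is the standard deviation of responses to that question.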
The unit for standardised scores (also called z-scores) is the standard deviation. So, for example, a z-score of 2.0 on a certain question would indicate that an individual’s response was 2 standard deviations above the mean response for that question. A z-score of -0.5 would indicate that their response was half a standard deviation below the mean for that question, and a z-score of 0.0 that their response was exactly the mean. This allows direct comparisons to be made between responses to different questions. If an individual’s z-score for question 1 is higher than their z-score for question 2, then we can say that their relative ‘family enjoyment’ is higher than their relative ‘feelings of respect’.
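To make the standardisation formula concrete, here is a minimal Python sketch. It assumes responses to each question are held as a plain list of numbers and that the sample standard deviation is used (the text does not say which variant was applied). The function name `standardise`, its `reverse` flag and all response values are hypothetical, invented for illustration. Reversal is done by flipping the sign of the z-score, which gives the same result as reversing the raw scale first (e.g. replacing x with 6 − x on a 0–6 scale).

```python
from statistics import mean, stdev

def standardise(responses, reverse=False):
    """Convert raw responses to one question into z-scores.

    z = (x - mean) / standard deviation, using the sample standard
    deviation (an assumption; statistics.pstdev would give the
    population version). If reverse is True, the sign is flipped so
    that higher z-scores always mean higher well-being.
    """
    m = mean(responses)
    sd = stdev(responses)
    z = [(x - m) / sd for x in responses]
    return [-v for v in z] if reverse else z

# Hypothetical 0-6 responses from five respondents.
family_enjoyment = [4, 5, 6, 3, 2]  # question 1
respect = [4, 2, 3, 3, 3]           # question 2

z_family = standardise(family_enjoyment)
z_respect = standardise(respect)

# Respondent 0 circled '4' on both questions.
print(round(z_family[0], 2), round(z_respect[0], 2))  # prints: 0.0 1.41
```

Respondent 0 gave the same raw answer to both questions, yet the z-scores show that a ‘4’ is exactly average for family enjoyment but about 1.4 standard deviations above average for respect, so their relative ‘feelings of respect’ are higher.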
It is vital to note the use here of the word relative. Standardising the scores provides no way of knowing absolute levels. If almost everyone says that they find none of their time with their family enjoyable (i.e. a ‘0’ on the 0–6 scale), someone who circles ‘1’ (still very low, of course) will come out with a positive z-score. By the same token, standardising means that we cannot compare scores for different questions across the dataset as a whole. The means for Europe on all questions (using z-scores) will be 0, so we are not able to say that Europe as a whole is doing better on one aspect of well-being than on another. We can only make comparisons within Europe: between countries, individuals or demographic groups and, if data are collected in future years, over time. If identical data were collected for other countries in the world, we would be able to draw conclusions about Europe as a whole, but again, these would be relative to the rest of the world. Without absolute targets for what high well-being looks like (in terms of survey data), and without absolute reference points that allow comparison between different aspects of well-being, nothing more is possible.

This problem is not unique to well-being data. Without a reference point, there is no way of knowing that Angola’s GDP of £3440 per capita in 2007 is likely to be associated with poor living standards. Indeed, in 1950, such a level of GDP per capita would actually have been quite high, around that seen in Italy at the time. We cannot conclude from this that living conditions in Angola now are similar to those in Italy in 1950. The only way we can make sense of £3440 per capita is by comparison with other figures.
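The point that z-scores are relative, not absolute, can be checked numerically. This sketch again assumes the sample standard deviation and uses an invented set of responses: nine respondents answer ‘0’ and one answers ‘1’. The lone ‘1’ gets a positive z-score, and the z-scores for the whole dataset average out to 0 whatever the raw answers were.

```python
from statistics import mean, stdev

def standardise(responses):
    # z = (x - mean) / sample standard deviation (assumed variant)
    m = mean(responses)
    sd = stdev(responses)
    return [(x - m) / sd for x in responses]

# Hypothetical question: nine of ten respondents answer '0'
# ("none of the time") and one answers '1'.
responses = [0] * 9 + [1]
z = standardise(responses)

print(round(z[-1], 2))     # the '1' respondent: well above the mean
print(round(z[0], 2))      # each '0' respondent: slightly below the mean
print(round(mean(z), 10))  # mean z-score of the whole dataset: 0, up to rounding
```

A ‘1’ is still a very low answer in absolute terms; the positive z-score only says that it is higher than everyone else’s.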
A few further technical details. Before calculating the z-scores included in the data here, we excluded any respondents who had missing data on any of the questions included in the calculations (so-called listwise deletion).[1] We also excluded Russia because, although there are no more respondents for Russia than for any other country, its large population means that it is weighted very heavily (a quarter of the total weighted count for Europe). The answers of a single Russian respondent are weighted around ten times more heavily than those of a single Belgian one, meaning that patterns emerging amongst the 2000 or so Russian respondents would dominate our conclusions.
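The two exclusions described here can be sketched as a single filtering step. This is a minimal illustration, assuming each respondent is stored as a dictionary with `None` marking a missing answer; the field names, the hypothetical helper `prepare_sample` and the data are all invented, and the sketch ignores the Hungary exception.

```python
# Hypothetical respondents; None marks a missing answer.
respondents = [
    {"id": 1, "country": "Belgium", "family_enjoyment": 4, "respect": 3},
    {"id": 2, "country": "Belgium", "family_enjoyment": None, "respect": 5},
    {"id": 3, "country": "Russia", "family_enjoyment": 6, "respect": 2},
    {"id": 4, "country": "Belgium", "family_enjoyment": 2, "respect": 4},
    {"id": 5, "country": "Belgium", "family_enjoyment": 5, "respect": None},
]

questions = ["family_enjoyment", "respect"]

def prepare_sample(rows, questions, excluded_countries=("Russia",)):
    """Drop excluded countries, then apply listwise deletion:
    keep only respondents who answered every question used."""
    return [
        r for r in rows
        if r["country"] not in excluded_countries
        and all(r[q] is not None for q in questions)
    ]

sample = prepare_sample(respondents, questions)
print([r["id"] for r in sample])  # prints: [1, 4]
```

Dropping a respondent from every question, rather than only from the question they skipped, means that all z-scores are calculated on exactly the same set of people.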
In exercise 4 you will have a chance to try out calculating standardised scores. Whilst there is a way to get SPSS to do this automatically, we encourage you to use the formula shown above.
[1] The exception to this rule was the question on having lots of energy, which was not asked in Hungary. So as not to exclude the entire country from the analysis, we simply ascribed the mean z-score (i.e. ‘0’) to all respondents in Hungary for this question.
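The Hungary exception amounts to computing the z-scores for the energy question from the countries where it was asked, then giving every Hungarian respondent a z-score of 0. A minimal sketch, assuming the sample standard deviation; the field names and values are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical respondents; Hungary was not asked the energy question.
respondents = [
    {"country": "Belgium", "energy": 5},
    {"country": "Belgium", "energy": 3},
    {"country": "Germany", "energy": 4},
    {"country": "Germany", "energy": 2},
    {"country": "Hungary", "energy": None},
    {"country": "Hungary", "energy": None},
]

# Mean and sample SD come only from countries where the question was asked.
answered = [r["energy"] for r in respondents if r["country"] != "Hungary"]
m, sd = mean(answered), stdev(answered)

for r in respondents:
    if r["country"] == "Hungary":
        r["z_energy"] = 0.0  # ascribe the mean z-score to every Hungarian respondent
    else:
        r["z_energy"] = (r["energy"] - m) / sd

print([round(r["z_energy"], 2) for r in respondents])
# prints: [1.16, -0.39, 0.39, -1.16, 0.0, 0.0]
```

Because every Hungarian respondent sits exactly at the mean, this question neither raises nor lowers Hungary’s results relative to other countries.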