# Chapter 1: Why Weighting?

Most European countries have participated at least once in the ESS. Some have participated only once, some twice, and some have participated continuously from Round I onwards. The basis for the sampling differs between countries and can even differ over time within the same country. Despite these differences in sample design, data analysts want to be able, for example, to compare the proportion of self-employed workers in Norway with that in Portugal. For such a comparison to reveal genuine differences in self-employment between Norway and Portugal, it must not be influenced by the way in which the data were collected in the two countries. If, for example, rural areas were deliberately oversampled in Portugal (i.e. the proportion of people from rural areas is higher in the sample than in the population), ignoring this fact would lead to a wrong interpretation of the results based on the sample data. In probability sample surveys like the ESS, the essential characteristics of the sample design are captured by so-called inclusion probabilities and design weights, which are the focus of this Teaching Module.

To illustrate how large the differences between weighted and unweighted proportions can be, Table 1.1 displays the absolute and relative frequencies of people living in the Lisbon area based on unweighted and weighted data. These frequencies can be computed in Nesstar by first selecting ‘ESS4-2008, ed.3.0’ as the data set for analysis, then sub-setting the ESS4 data to Portugal, and then performing a univariate tabulation of the item ‘Region’, once with the design weight switched off and once with it switched on. For the sake of simplicity, we construct two categories: one for respondents living in the capital of Portugal and its surroundings (the Lisbon area) and one into which we collapse all other categories. Based on unweighted data, we would conclude that 40.9% of the people live in the Lisbon area. Taking weights into account, the proportion drops to 36.8%, which is still higher than the 2001 Census count of 32.3% [Wik10] but considerably closer to it.

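Table 1.1: People living in the Lisbon area, unweighted and weighted.

| | Frequency (design weight off) | % (design weight off) | Frequency (design weight on) | % (design weight on) |
|---|---|---|---|---|
| Total | 2367 | 100 | 2367 | 100 |
| Lisbon area | 968 | 40.9 | 870.9 | 36.8 |
| Other | 1399 | 59.1 | 1496.1 | 63.2 |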

Source: ESS 4, Portugal.
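
The weighted frequencies in Table 1.1 are sums of design weights rather than counts of respondents, which is why they need not be whole numbers. The following Python sketch shows this kind of tabulation on hypothetical toy data. The function name `weighted_table`, the ten toy respondents and their weights are illustrative assumptions, not ESS data and not Nesstar's implementation. The sketch rescales the weights so that they sum to the number of respondents, as the identical totals in Table 1.1 suggest.

```python
from collections import defaultdict


def weighted_table(categories, weights):
    """Return {category: (weighted frequency, percent)}.

    Weights are rescaled to sum to the number of respondents (an assumption
    made here so that weighted and unweighted totals agree).
    """
    n = len(categories)
    total_weight = sum(weights)
    scaled = [w * n / total_weight for w in weights]

    freq = defaultdict(float)
    for category, w in zip(categories, scaled):
        freq[category] += w  # a weighted frequency is a sum of weights

    return {category: (f, 100.0 * f / n) for category, f in freq.items()}


# Hypothetical toy sample: 4 respondents from the Lisbon area, 6 from elsewhere.
regions = ["Lisbon area"] * 4 + ["Other"] * 6
# Hypothetical design weights: Lisbon respondents were easier to select,
# so each of them counts for less.
design_weights = [0.5] * 4 + [1.0] * 6

unweighted = weighted_table(regions, [1.0] * len(regions))
weighted = weighted_table(regions, design_weights)

for label, table in (("off", unweighted), ("on", weighted)):
    for category, (f, pct) in table.items():
        print(f"design weight {label}: {category}: {f:.1f} ({pct:.1f} %)")
```

In this toy sample the share of the Lisbon area falls once the weights are applied, which is the same direction of change as in Table 1.1.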

Another example is the proportion of unemployed people. Table 1.2 shows the relative frequencies of unemployed people in Portugal who are actively looking for a job, based on unweighted and weighted ESS data and on data from Statistics Portugal. The ESS estimate deviates more from the Statistics Portugal figure when the ESS data are not weighted (4.82% versus 5.45%) than when the design weight is applied (5.14% versus 5.45%).

Table 1.2: Unemployed people actively looking for a job, ESS 4 and Statistics Portugal.

| | ESS 4, design weight off | ESS 4, design weight on | Statistics Portugal |
|---|---|---|---|
| Unemployed, looking for a job | 4.82 % | 5.14 % | 5.45 % |
| Other | 95.18 % | 94.86 % | 94.55 % |

Source: Statistics Portugal and ESS 4, Portugal.

A one-sample Z-test comparing each ESS proportion with the Statistics Portugal figure gives a Z-value of 0.68 for the weighted and 1.43 for the unweighted data. Neither value exceeds the conventional critical value of 1.96 (5% level, two-sided), but with unweighted data we are much closer to rejecting the null hypothesis that the data are a random sample from the inference population than with weighted data.
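
The Z-values can be checked with a few lines of Python. The text does not state the sample size or the exact form of the test statistic, so this is only a sketch under two assumptions: the sample size is taken to be the 2367 Portuguese respondents shown in Table 1.1, and the standard error is computed from the sample proportion. The function name `one_sample_z` is illustrative.

```python
from math import sqrt


def one_sample_z(p_sample, p_reference, n):
    """Absolute Z-value for a sample proportion against a reference proportion.

    Assumption: the standard error is based on the sample proportion.
    """
    standard_error = sqrt(p_sample * (1.0 - p_sample) / n)
    return abs(p_sample - p_reference) / standard_error


n = 2367               # assumed: total number of respondents in Table 1.1
p_reference = 0.0545   # Statistics Portugal (Table 1.2)

z_unweighted = one_sample_z(0.0482, p_reference, n)  # design weight off
z_weighted = one_sample_z(0.0514, p_reference, n)    # design weight on

print(f"unweighted: Z = {z_unweighted:.2f}")
print(f"weighted:   Z = {z_weighted:.2f}")
```

Under these assumptions the sketch gives values that agree with the reported ones to two decimals. A standard error based on the reference proportion instead would give somewhat smaller values.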

The above examples illustrate that weighting data is essential to make valid inferences based on sample data. If a sample design deviates from simple random sampling (or, more generally, from equal probability sampling), it is necessary to take design weights into consideration to account for the effects of the sample design.
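
The effect of unequal inclusion probabilities can be made concrete with a small Python sketch that follows the rural oversampling example from the beginning of this chapter. All numbers are hypothetical, and the sketch assumes that the design weight of a respondent is the inverse of his or her inclusion probability (up to rescaling). The names `design_weight` and `strata` are illustrative.

```python
def design_weight(inclusion_probability):
    """Design weight as the inverse of the inclusion probability (assumed)."""
    return 1.0 / inclusion_probability


# Hypothetical population of 1000 people: 300 rural (90 self-employed)
# and 700 urban (70 self-employed), i.e. 16 % self-employed overall.
# Rural areas are oversampled: every second rural person is selected,
# but only every fifth urban person.
strata = [
    # (name, inclusion probability, respondents, self-employed respondents)
    ("rural", 0.5, 150, 45),
    ("urban", 0.2, 140, 14),
]

# Ignoring the design: every respondent counts once.
unweighted = (
    sum(self_employed for _, _, _, self_employed in strata)
    / sum(respondents for _, _, respondents, _ in strata)
)

# Using design weights: every respondent counts 1 / inclusion probability times.
weighted = (
    sum(design_weight(p) * self_employed for _, p, _, self_employed in strata)
    / sum(design_weight(p) * respondents for _, p, respondents, _ in strata)
)

print(f"unweighted proportion: {unweighted:.3f}")
print(f"weighted proportion:   {weighted:.3f}")
```

Because the oversampled rural stratum has the higher share of self-employed people in this toy population, the unweighted proportion overstates self-employment, while the weighted proportion recovers the population value of 16%.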

### Exercise 1

Using ESS 4 data in Nesstar, calculate the absolute and relative frequencies of the item ‘Highest level of education’ in Portugal, for both weighted and unweighted data, and describe the results.

Table 1.3: Highest level of education in Portugal, unweighted and weighted.

| Highest level of education | Frequency (design weight off) | % (design weight off) | Frequency (design weight on) | % (design weight on) |
|---|---|---|---|---|
| Total | 2367 | 100 | 2367 | 100 |
| Not completed primary education | 251 | 10.6 | 211.5 | 8.9 |
| Primary or first stage of basic | 1076 | 45.5 | 1098.8 | 46.4 |
| Lower secondary or second stage of basic | 405 | 17.1 | 452.5 | 19.1 |
| Upper secondary | 346 | 14.6 | 352 | 14.9 |
| Post-secondary, non-tertiary | 15 | 0.6 | 13.6 | 0.6 |
| First stage of tertiary | 266 | 11.2 | 234.3 | 9.9 |
| Second stage of tertiary | 8 | 0.3 | 4.6 | 0.2 |

Source: ESS 4, Portugal.

Open the table with the unweighted results in Nesstar.

Open the table with the weighted results in Nesstar.

From Table 1.3, we can see that the absolute and relative frequencies differ between the unweighted and the weighted data. The design weight reduces the number of respondents in the first and in the last three education categories, and increases the number of respondents in the three categories in the middle.

#### References

- [Wik10] Wikipedia (2010). *Portugal*. Wikipedia. http://en.wikipedia.org/wiki/Portugal.