# Weighting the ESS data

Weighting is central to the analysis of sample data. It lets you assign different weights to different cases in the analysis file, and it is usually used to correct for skewness in a sample that is meant to represent a particular population.

Suppose, for example, that you have measured the height of 50 men and 10 women in a country with an equal gender distribution. Because men are generally taller than women, the mean height of this sample is higher than the mean you would have found by measuring the whole population, so the sample must be weighted before it can be used for inferential purposes. To make the sample more representative of the "true" population, the effect of the male majority in the sample must be reduced. Of the 60 respondents in this example, 83.3 percent are men and 16.7 percent are women. According to population data, these percentages should have been 50-50. You can create a weight variable that makes the female respondents count more and the male respondents count less by dividing the population shares by the sample shares:

• Male weight: 50/83.3 = 0.6
• Female weight: 50/16.7 = 3
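
The arithmetic above can be sketched in a few lines of Python. The sample counts and population shares come from the example; the two group mean heights are invented purely for illustration and are not real measurements.

```python
# Sample composition from the example: 50 men and 10 women.
sample_counts = {"male": 50, "female": 10}
# Population shares from the example: an equal gender distribution.
population_share = {"male": 0.5, "female": 0.5}

n = sum(sample_counts.values())

# Weight = population share divided by sample share.
weights = {
    group: population_share[group] / (sample_counts[group] / n)
    for group in sample_counts
}

# Hypothetical group mean heights in cm (assumption, for illustration only).
mean_height = {"male": 180.0, "female": 166.0}

unweighted_mean = sum(sample_counts[g] * mean_height[g] for g in sample_counts) / n

# Each respondent contributes their weight instead of 1.
weighted_total = sum(weights[g] * sample_counts[g] * mean_height[g] for g in sample_counts)
weighted_n = sum(weights[g] * sample_counts[g] for g in sample_counts)
weighted_mean = weighted_total / weighted_n

print(f"male weight {weights['male']:.2f}, female weight {weights['female']:.2f}")
print(f"unweighted mean {unweighted_mean:.2f}, weighted mean {weighted_mean:.2f}")
```

With these made-up heights the weighted mean sits halfway between the two group means, as it should for a 50-50 population, while the unweighted mean is pulled towards the male mean.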

There are two weights in the ESS data that may, and often should, be switched on. The first is the design weight (dweight). Several of the sample designs chosen by countries participating in the ESS could not give all individuals in the population aged 15+ precisely the same chance of selection. As a result, the unweighted samples in some countries over- or under-represent people at certain types of addresses or in certain types of households, such as larger households. The design weight corrects for these slightly different probabilities of selection, thereby making the sample more representative of a "true" sample of individuals aged 15+ in each country.
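
To make the idea of unequal selection probabilities concrete, here is a minimal sketch under a simplifying assumption that is not taken from the ESS documentation: every household has the same chance of being sampled and exactly one person is interviewed per household. A person's chance of selection is then proportional to 1 divided by the household size, and the weight is the inverse of that. The household sizes are invented, and rescaling the weights to average 1 is a common convention assumed here.

```python
# Hypothetical respondents: number of eligible persons in each respondent's
# household (invented values, for illustration only).
household_sizes = [1, 1, 2, 2, 4]

# Selection probability is proportional to 1 / size, so the raw weight
# (inverse probability) is proportional to the household size itself.
raw_weights = [float(size) for size in household_sizes]

# Rescale so the weights average 1; the weighted N then equals the sample size.
mean_raw = sum(raw_weights) / len(raw_weights)
design_weights = [w / mean_raw for w in raw_weights]

print(design_weights)
```

Respondents from larger households receive weights above 1 because they were less likely to be selected, and respondents living alone receive weights below 1.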

The second weight is the population size weight (pweight). It is used when examining data for two or more countries combined. This weight corrects for the fact that most countries taking part in the ESS had very similar sample sizes, no matter how large or small their population. The mathematics of probability shows that a sample of, for example, 1,000 respondents is almost exactly as useful for examining opinions in a country with 10 million inhabitants as in a country with a population of only 1 million.
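
The claim about sample size can be checked with a short calculation. The sketch below assumes simple random sampling and uses the textbook margin of error for a proportion with the finite population correction; the function name and the choice of a 50 percent proportion are illustrative assumptions, and real ESS designs are more complex than this.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from a
    simple random sample of size n, with the finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

small_country = margin_of_error(1000, 1_000_000)
large_country = margin_of_error(1000, 10_000_000)

print(f"1 million inhabitants:  +/- {100 * small_country:.2f} percentage points")
print(f"10 million inhabitants: +/- {100 * large_country:.2f} percentage points")
```

Both margins come out at roughly 3.1 percentage points, because the precision is driven by the sample size and hardly at all by the population size.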

Without weighting, any figures that combine data for two or more countries would over-represent smaller countries at the expense of larger ones. So the population size weight makes an adjustment to ensure that each country is represented in proportion to its population size.
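
A small sketch shows how such an adjustment works. It assumes a weight proportional to population divided by sample size; the two countries and their figures are invented, and the scaling constant is arbitrary here, so the numbers will not match the pweight values in the ESS files.

```python
# Two hypothetical countries with equal sample sizes but different populations.
countries = {
    "A": {"population": 10_000_000, "sample_size": 1000},
    "B": {"population": 1_000_000, "sample_size": 1000},
}

# Assumed form of the weight: persons in the population per respondent.
pop_weight = {
    name: c["population"] / c["sample_size"] for name, c in countries.items()
}

total_sample = sum(c["sample_size"] for c in countries.values())
unweighted_share = {
    name: c["sample_size"] / total_sample for name, c in countries.items()
}

weighted_n = {
    name: pop_weight[name] * c["sample_size"] for name, c in countries.items()
}
total_weighted = sum(weighted_n.values())
weighted_share = {name: weighted_n[name] / total_weighted for name in countries}

print(unweighted_share)
print(weighted_share)
```

Unweighted, each country makes up half of the combined sample. After weighting, country A accounts for ten elevenths of it, which matches its share of the combined population.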

When computing any tables or percentages, you should always use weighted data. However, different types of tables require different combinations of weights. The general rules of thumb are:

• You should always use the design weight.
• When you compare data from two or more countries with reference to the average (or combined total) of those countries, and when you combine countries into a group, such as "EU member states", apply both the design weight and the population size weight.
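
These rules of thumb can be expressed as a small helper. The sketch assumes that "applying both weights" means multiplying them, which is the usual way to combine weights; the function name, the `cntry` key and the respondent values are hypothetical.

```python
def analysis_weight(dweight, pweight, combining_countries):
    """Return the weight for one respondent.

    The design weight is always used; the population size weight is
    multiplied in only when countries are combined or averaged.
    """
    if combining_countries:
        return dweight * pweight
    return dweight

# Hypothetical respondents (invented values, for illustration only).
respondents = [
    {"cntry": "A", "dweight": 1.2, "pweight": 2.0},
    {"cntry": "B", "dweight": 0.8, "pweight": 0.5},
]

single_country = [analysis_weight(r["dweight"], r["pweight"], False) for r in respondents]
pooled = [analysis_weight(r["dweight"], r["pweight"], True) for r in respondents]

print(single_country)
print(pooled)
```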

The following example illustrates some of the effects of weighting. If we want to find the general level of political interest in the population covered by all the countries participating in the ESS, we could create a frequency table using the variable "How interested in politics". Table 7 gives the weighted and unweighted valid frequencies for this variable. The unweighted frequencies tell us the level of political interest in this sample only. If we want to say something about the population in these countries, we must use the weighted frequencies. The discrepancy may not seem very large, but it will have an important impact in many analyses. Note that the N is smaller in the weighted column and is not directly interpretable. This is because respondents from smaller countries count as less than one, which makes the samples more proportional to the population sizes.
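
The difference between unweighted and weighted valid percentages can be sketched as follows. The response codes follow the coding shown in Table 7 (1 to 4 are valid answers; 7, 8 and 9 are missing), but the six responses and their weights are invented and the function name is hypothetical.

```python
from collections import defaultdict

def valid_percentages(responses, weights, missing_codes=(7, 8, 9)):
    """Return the weighted percentage of each valid response code."""
    totals = defaultdict(float)
    for code, weight in zip(responses, weights):
        if code not in missing_codes:
            totals[code] += weight  # each case counts as its weight, not as 1
    valid_total = sum(totals.values())
    return {code: 100 * total / valid_total for code, total in sorted(totals.items())}

# Hypothetical data: six respondents, one of whom answered "Don't know" (8).
responses = [1, 2, 2, 3, 4, 8]
weights = [0.5, 1.0, 1.5, 2.0, 1.0, 1.0]

unweighted = valid_percentages(responses, [1.0] * len(responses))
weighted = valid_percentages(responses, weights)

print(unweighted)
print(weighted)
```

Setting every weight to 1 reproduces the unweighted table, so the same function covers both columns. The weighted N is simply the sum of the weights, which is why it need not be a whole number.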

### Table 7: 'How interested in politics': Unweighted and weighted frequencies

Unweighted frequencies

| Response | Code | Frequency | % of all | % of valid |
|---|---|---|---|---|
| Total | | 42,359 | 100.0 | 100.0 |
| Very interested | 1 | 4,719 | 11.1 | 11.2 |
| Quite interested | 2 | 15,879 | 37.5 | 37.6 |
| Hardly interested | 3 | 14,278 | 33.7 | 33.8 |
| Not at all interested | 4 | 7,335 | 17.3 | 17.4 |
| Refusal | 7 | 38 | 0.1 | - |
| Don't know | 8 | 89 | 0.2 | - |
| No answer | 9 | 21 | 0.0 | - |

Weighted frequencies

| Response | Code | Frequency | % of all | % of valid |
|---|---|---|---|---|
| Total | | 37,497.4 | 100.0 | 100.0 |
| Very interested | 1 | 4,141.7 | 11.0 | 11.1 |
| Quite interested | 2 | 12,979.9 | 34.6 | 34.7 |
| Hardly interested | 3 | 13,121.9 | 35.0 | 35.1 |
| Not at all interested | 4 | 7,137.8 | 19.0 | 19.1 |
| Refusal | 7 | 17.0 | 0.0 | - |
| Don't know | 8 | 92.5 | 0.2 | - |