# Chapter 4: Use of Weights to combine Data across Countries and Rounds

Most users of ESS data will want results based on samples that represent the country or group of countries they are interested in as closely as possible. This means that the most accurate estimates will only be obtained after first weighting the data.

The two weights present in each round of the ESS are the Design weight (DWEIGHT) and the Population size weight (PWEIGHT):

• DWEIGHT
Several of the sample designs used by countries participating in the ESS were not able to give all individuals in the population aged 15+ precisely the same chance of selection. Thus the unweighted samples in some countries over- or under-represent people in certain types of address or household, such as those in larger households. The design weight corrects for these slightly different probabilities of selection.
• PWEIGHT
This weight corrects for the fact that most countries taking part in the ESS have very similar sample sizes, no matter how large their population. Without weighting, any figures combining two or more countries’ data would be incorrect, over-representing smaller countries at the expense of larger ones. The Population size weight makes an adjustment to ensure that each country is represented in proportion to its population size.

These two weights should almost always be used when analysing ESS data. Furthermore, when combining data from several rounds, further measures need to be taken. This chapter will describe in detail when and how the weights should be used.


# Single-round use of weights

Whether and which weights are to be used when analysing means, proportions and cross-tabulations of a single round is discussed in [Ess06b]. Table 4.1 summarises the main recommendations made therein.

Table 4.1. Examples of different types of analyses¹

| Type of analysis | Example: Voter turnout² | Design weight | Population weight |
|---|---|---|---|
| a) To examine data from a single country - whether a single variable or a cross-tabulation | Voter turnout in Germany | X | |
| | Voter turnout in Germany by age and gender | X | |
| b) To compare results for two or more countries separately - without using totals or averages | Compare voter turnout in France, Germany and the UK. | X | |
| c) To combine countries - whether on a single variable or via a cross-tabulation | i) Voter turnout in Scandinavia. | X | X |
| | ii) Voter turnout in the EU. | X | X |
| | iii) Voter turnout across all countries participating in the ESS. | X | X |
| | iv) Compare voter turnout between EU member states and accession countries. | X | X |
| | v) Voter turnout by age groups across all ESS participating countries. | X | X |

¹ = Table based on [Ess06b]. An 'X' in the table indicates that the weight should be used.
² = % of respondents voting in the last election.

Population weights can be viewed as stratification weights, taking countries as strata. Formal definitions of these weights and their distribution in round II are provided in [Ess06a].
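The recommendations in Table 4.1 can be illustrated with a minimal sketch. It assumes that, when both weights are required, they are applied as the product of the design weight and the population size weight for each respondent. The respondent records and the helper `weighted_mean` are hypothetical and serve only to show the mechanics.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of values, given one weight per value."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical respondent records: (country, voted, dweight, pweight).
# The numbers are invented for illustration, not taken from ESS data.
records = [
    ("DE", 1, 0.8, 2.4),
    ("DE", 0, 1.2, 2.4),
    ("PT", 1, 1.0, 0.5),
    ("PT", 1, 1.0, 0.5),
]

# Case a) single country: design weight only.
de = [r for r in records if r[0] == "DE"]
turnout_de = weighted_mean([r[1] for r in de], [r[2] for r in de])

# Case c) countries combined: design weight times population weight
# (assumption: the two weights are combined by multiplication).
turnout_all = weighted_mean(
    [r[1] for r in records],
    [r[2] * r[3] for r in records],
)

print(f"Turnout, DE only (design weight): {turnout_de:.3f}")
print(f"Turnout, DE and PT combined (both weights): {turnout_all:.3f}")
```

In this toy data the unweighted combined turnout would be 0.75, whereas the weighted figure is pulled towards the larger country, which is the effect the population size weight is meant to have.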


# Weights for multi-round analyses

When data from multiple rounds (and multiple countries) are combined in an analysis, varying design and population weights have to be accounted for. Overall design and perhaps also population weights have to be calculated to allow for valid inference from the combined samples to the population.

The structure of the ESS cumulative data set is illustrated in Table 4.2.

Table 4.2. ESS cumulative data set structure

| ESS Round | Country | Design weight |
|---|---|---|
| Round I | C1 | DW 1, DW 2, DW 3 |
| | C2 | DW 1, DW 2, DW 3 |
| | C3 | DW 1, DW 2, DW 3 |
| | C4 | DW 1, DW 2, DW 3 |
| Round II | C1 | DW 1, DW 2, DW 3 |
| | C2 | DW 1, DW 2, DW 3 |
| | C3 | DW 1, DW 2, DW 3 |
| | C4 | DW 1, DW 2, DW 3 |
| Round III | C1 | DW 1, DW 2, DW 3 |
| | C2 | DW 1, DW 2, DW 3 |
| | C3 | DW 1, DW 2, DW 3 |
| | C4 | DW 1, DW 2, DW 3 |

In line with the above example, in the ESS cumulative data set, multiple respondents (not shown) are nested within multiple weighting classes [1] (DW 1 to DW 3) within countries (C1 to C4) within ESS rounds (Round I to III). To simplify the presentation here, the number of weighting classes is assumed not to vary between countries and rounds. This is not the case in reality, however.

The problem, then, is how to combine estimators from different rounds and perhaps also different countries. In the following, two cases will be discussed: combining estimators from a) different rounds and the same country, and b) different rounds and different countries.

First of all, some definitions will be useful. Let the design weight of the $w$th weighting class in the $c$th country in round $r$ be denoted by $d_{wcr}$ and let the population weight of country $c$ in round $r$ be referred to as $p_{rc}$.

Let $a$ be the vector of length $z$ whose elements $r$ are the rounds under consideration. Of course, $a \subseteq l$, where $l$ is the vector of length $m$ with elements $i = 1, \ldots, m$ indicating the ESS rounds for which data are available. If we consider rounds one and two, $a$ would simply be $(1, 2)$, which, as we can see, is of length $z = 2$. Since no more than the first two rounds are available so far, $l = a$ in this case.

Similarly, let $s$ be the number of countries for which a combined estimator is to be calculated. The vector $k_r$ of length $q_r$ with elements $j = 1, \ldots, q_r$ indicates the participating countries in the $r$th round. [2] The vector $k$ of length $s$ then contains the countries under consideration. To make this easier to illustrate, let us assume that every element (country) of $k$ must be an element of every $k_r$, which means that all countries, $k_c$, under consideration must have participated in all rounds under consideration. To illustrate, assume the countries under consideration to be Germany (DE), the United Kingdom (GB), and Portugal (PT). Then $k = (\mathrm{DE}, \mathrm{GB}, \mathrm{PT})$.

Let an unbiased estimator, $\theta$, that takes into account either population weights, design weights or both be denoted by $\theta^{(p)}$, $\theta^{(d)}$, and $\theta^{(p,d)}$, respectively. Furthermore, let their unbiased combined equivalents be denoted by $\bar{\theta}^{(p)}$, $\bar{\theta}^{(d)}$, and $\bar{\theta}^{(p,d)}$. [3]
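The assumption that every country under consideration took part in every round under consideration can be checked mechanically before any estimators are combined. The following is a minimal sketch; the participation lists and the helper `all_participated` are hypothetical and do not reproduce the actual ESS participation record.

```python
# Rounds under consideration (the vector a) and its length z.
a = (1, 2)
z = len(a)

# Hypothetical participation lists k_r, one set of country labels per round.
# These sets are illustrative only.
k_r = {
    1: {"DE", "GB", "PT", "CH"},
    2: {"DE", "GB", "PT", "EE"},
}

# Countries under consideration (the vector k) and its length s.
k = ("DE", "GB", "PT")
s = len(k)


def all_participated(countries, rounds, participation):
    """True if every country appears in the participation set of every round."""
    return all(set(countries) <= participation[r] for r in rounds)


ok = all_participated(k, a, k_r)
print(f"z = {z}, s = {s}, all countries in all rounds: {ok}")
```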

• [1] Most commonly, but not always, the weighting classes correspond to the household size classes.
• [2] Note that, instead of numbers, labels that correspond to international abbreviations are used to identify countries.
• [3] Note that combined is irrespective of the combination involved. Each of the combinations ‘multi-round single-country’, ‘single-round multi-country’ and ‘multi-round multi-country’ is possible.

# Combining estimators from multiple rounds and the same country

It is often of interest to combine data from two ESS waves conducted in the same country in order to estimate a mean or proportion in that country. This may be the case if, for example, many respondents refuse to answer a certain item. Combining two or more waves can thus help to increase the sample size. The implicit assumption, however, is that the samples are taken from the same population. This should always be kept in mind.

Our combined multi-round single-country estimator is a weighted average of the respective single-round unbiased estimators. Weights are chosen in relation to the effective sample sizes, namely

$$w_{rc} = \frac{n_{rc}}{\mathit{deff}_{rc}}$$

where $n_{rc}$ is the net sample size and $\mathit{deff}_{rc}$ the design effect (see Chapter 5 for more details about design effects) of the variable under study [1] in round $r$ and country $c$, respectively.

The combined multi-round single-country estimator can then be expressed as

$$\bar{\theta}^{(d)}_{a;c} = \frac{\sum_{r \in a} w_{rc}\, \theta^{(d)}_{rc}}{\sum_{r \in a} w_{rc}}$$

The construction of the combined estimator as a weighted average of the single-round estimators takes two basic principles into account: firstly, an estimator from a wave with many respondents is trusted more than one from a wave with fewer respondents. Secondly, for a given sample size, more trust is placed in the estimator that has a lower design effect.

In the following example, the combined estimator of the STFLIFE variable is constructed for Germany, the United Kingdom and Portugal, respectively. In this context, the estimators for the first and second round of ESS are considered.

### Example 3

Consider a case involving combining the estimators of the variable ‘satisfaction with life’ (STFLIFE) from the first and second rounds of the ESS in Germany, the United Kingdom and Portugal.

First of all, the weighted estimators have to be calculated. These are shown in the following table.

Table 4.3. Satisfaction with life (STFLIFE), weighted mean

| Country | Round I | Round II |
|---|---|---|
| DE | 6.96 | 7.17 |
| GB | 7.07 | 7.40 |
| PT | 5.91 | 6.15 |

When combining the estimators for Germany, the weighted average of the round I and round II estimators has to be calculated. We thus need the design effect and the net sample size for the STFLIFE variable in Germany in both rounds. Together with those of the United Kingdom and Portugal, they are shown in the following table:

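Table 4.4. Design effect and net sample size for Satisfaction with life (STFLIFE)

| Country | Design effect, Round I | Design effect, Round II | Net sample size, Round I | Net sample size, Round II |
|---|---|---|---|---|
| DE | 6.10 | 4.95 | 2916 | 2870 |
| GB | 2.32 | 2.37 | 2045 | 1897 |
| PT | 3.72 | 3.39 | 1498 | 2052 |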

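The combined estimator for overall satisfaction with life in Germany is

$$\bar{\theta}^{(d)}_{1,2;\mathrm{DE}} = \frac{\frac{2916}{6.10} \times 6.96 + \frac{2870}{4.95} \times 7.17}{\frac{2916}{6.10} + \frac{2870}{4.95}} \approx 7.08$$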

The combined estimators for rounds one and two in the United Kingdom and Portugal, calculated in the same way, are 7.22 and 6.06, respectively.

We can see that, in Germany, slightly more weight is given to the estimator from the second round - mainly due to the smaller design effect. In the UK, on the other hand, the estimator in the first wave is trusted slightly more, mainly because of the larger sample size (the design effects of both waves are almost equal). In Portugal, however, much more weight is given to the estimator in the second round, as indicated by both the net sample size and the design effect.
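The calculation in this example can be sketched in a few lines. The function `combine_rounds` and the dictionary layout are hypothetical names chosen for illustration; the input figures are the rounded values from Tables 4.3 and 4.4.

```python
def combine_rounds(estimates, net_sizes, design_effects):
    """Weighted average of single-round estimates.

    Each round is weighted by its effective sample size n / deff.
    """
    weights = [n / deff for n, deff in zip(net_sizes, design_effects)]
    weighted_sum = sum(w * e for w, e in zip(weights, estimates))
    return weighted_sum / sum(weights)


# Rounded inputs from Tables 4.3 and 4.4, as (Round I, Round II).
stflife = {
    "DE": {"mean": (6.96, 7.17), "n": (2916, 2870), "deff": (6.10, 4.95)},
    "GB": {"mean": (7.07, 7.40), "n": (2045, 1897), "deff": (2.32, 2.37)},
    "PT": {"mean": (5.91, 6.15), "n": (1498, 2052), "deff": (3.72, 3.39)},
}

combined = {
    country: combine_rounds(v["mean"], v["n"], v["deff"])
    for country, v in stflife.items()
}

for country, value in combined.items():
    print(f"{country}: {value:.2f}")
```

Because the tables report rounded means and design effects, results computed this way can differ in the last digit from the figures quoted in the text.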

• [1] Where design effects are not available for a certain study variable, an overall design effect may serve this purpose.

# Combining estimators from multiple rounds and multiple countries

As a rule, when combining estimators from multiple countries - in the single-round as well as in the multi-round case - population weights have to be used (see Table 4.1).

However, combining estimators from multiple rounds and multiple countries raises the question of how to account for possibly varying population weights. [1] Assuming, again, a constant population from which the samples in the various rounds are taken, a combined population weight is simply the average of the single-round weights. This can be formalised in the following way. The multi-round population weight of a specific country $c$ is defined as

$$p_{a;c} = \frac{1}{z} \sum_{r \in a} p_{rc}$$

It is convenient, however, to rescale the single-round population weights by dividing each of them by the sum of the weights of the countries under consideration. The rescaled population weight for country $c$ in round $r$ is thus

$$\tilde{p}_{rc} = \frac{p_{rc}}{\sum_{j \in k} p_{rj}}$$

and the corresponding average population weight of rounds $a$ in country $c$ is defined as

$$\tilde{p}_{a;c} = \frac{1}{z} \sum_{r \in a} \tilde{p}_{rc}$$

To further combine multi-round estimators for multiple countries, average population weights have to be calculated for all countries under consideration. These must be used as explained in [Ess06b].

The combined multi-round multi-country estimator can be expressed as

$$\bar{\theta}^{(p,d)}_{a;k} = \sum_{c \in k} \tilde{p}_{a;c}\, \bar{\theta}^{(d)}_{a;c}$$

### Example

Now assume we were interested in the question of how satisfied people are on average when living in countries with a relatively left-wing government. To answer this question, we combine the overall satisfaction scores (STFLIFE) of Germany, the United Kingdom, and Portugal, which can be seen as fairly typical cases of European left-wing governments. However, such a grouping of countries requires us to take design weights as well as population weights into account.

We have already calculated the single-country multi-round estimates in the previous section. What we need for the construction of the combined estimator are the single-country multi-round population weights of the three countries.

First of all, the population weights of each country in each round have to be rescaled. The original and the rescaled weights are shown in the following table.

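Table 4.5. Original and rescaled population weight

| Country | Original, Round I | Original, Round II | Rescaled, Round I | Rescaled, Round II |
|---|---|---|---|---|
| DE | 2.39 | 2.45 | 0.4540 | 0.4498 |
| GB | 2.33 | 2.57 | 0.4416 | 0.4713 |
| PT | 0.55 | 0.43 | 0.1044 | 0.0789 |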

The averages of the rescaled population weights for Germany, the United Kingdom, and Portugal from rounds one and two are 0.4519, 0.4564, and 0.0917, respectively.

The weighted sum of the multi-round single-country overall satisfaction scores (STFLIFE) is then

$$\bar{\theta}^{(p,d)}_{1,2;\mathrm{DE},\mathrm{GB},\mathrm{PT}} = (7.08 \times 0.4519) + (7.22 \times 0.4564) + (6.06 \times 0.0917) = 7.05$$
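The three steps of this example (rescaling, averaging, and combining) can be sketched as follows. The helper `rescale` and the variable names are hypothetical; the inputs are the rounded original population weights from Table 4.5 and the multi-round single-country estimates quoted in the text.

```python
# Original population weights per country, as (Round I, Round II).
original = {
    "DE": (2.39, 2.45),
    "GB": (2.33, 2.57),
    "PT": (0.55, 0.43),
}

# Multi-round single-country STFLIFE estimates quoted in the text.
multi_round = {"DE": 7.08, "GB": 7.22, "PT": 6.06}


def rescale(weights_by_country, round_index):
    """Divide each country's weight in one round by the sum over countries."""
    total = sum(w[round_index] for w in weights_by_country.values())
    return {c: w[round_index] / total for c, w in weights_by_country.items()}


n_rounds = 2
rescaled = [rescale(original, r) for r in range(n_rounds)]

# Average the rescaled weights over the rounds under consideration.
average = {
    c: sum(rescaled[r][c] for r in range(n_rounds)) / n_rounds
    for c in original
}

# Weighted sum of the multi-round single-country estimates.
combined = sum(average[c] * multi_round[c] for c in original)

print({c: round(w, 4) for c, w in average.items()})
print(f"Combined estimate: {combined:.2f}")
```

Since the rescaled weights sum to one within each round, their averages also sum to one, so the weighted sum is itself a weighted average. Weights recomputed from the rounded originals can differ slightly in the later decimals from those printed in Table 4.5.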

• [1] Population weights may vary between rounds because countries drop out or because another one is added.
• [Ess06a] ESS (2006a). Design weights. European Social Survey, 2.0 edition. http://ess.nsd.uib.no/streamer/?module=main&year=2005&country=null&download=%5CSurvey+documentation%5C2005%5C07%23ESS2+-+Design+weights%2C+ed.+2.0%5CLanguages%5CEnglish%5CESS2Design_Weights_2.pdf
• [Ess06b] ESS (2006b). Weighting European Social Survey Data. European Social Survey. http://ess.nsd.uib.no/streamer/?module=main&year=2007&country=null&download=%5CSurvey+documentation%5C2007%5C07%23ESS3+-+Weighting+ESS+Data%5CLanguages%5CEnglish%5CWeightingESS.pdf