
# Chapter 2: From the Population to the Sample: Sample Designs and Inclusion Probabilities

This Teaching Module is as practically oriented as possible. However, as a data analyst you will need some basic knowledge about the processes that generated the data that you are using. Among other things, this enables you to better understand and decide when to use a certain weight, rather than simply ‘doing as they say’. We will therefore now introduce some notation and definitions that we will use in the rest of this Teaching Module. This will help us to save time and words when describing the concepts and methods we are dealing with.

# Population

When we talk about sample designs, it is important to keep in mind that any sample is drawn from a (possibly much) larger set of elements called **population** or **universe**. The upper case letter U denotes a universe of size N. In the sampling literature, upper case letters usually denote quantities of the universe and lower case letters quantities of the sample. A universe U has exactly N elements (persons, businesses, countries etc.) - not more or less or approximately N, but exactly N. A universe U of size N = 10 people thus contains exactly 10 people.

To be able to identify the elements of U, we enumerate them U_{1}, U_{2},..., U_{i},..., U_{N}, i = 1,..., N. For convenience of notation, we generally write U_{i} and only specify which i we mean. You will soon become familiar with this notation.

Every person in our little universe of size N=10 thus has a unique number i=1,..., 10. The first person is referred to as U_{1}, the second one as U_{2}, the third as U_{3} and so on until the tenth person, whom we refer to as U_{10}. Please note that this enumeration does not imply any order. We could just as well refer to U_{3} as U_{1} or U_{7} as U_{5}. However, once the enumeration is fixed, we have to stick to it, of course.

A **study variable** denoted Y is associated with each element of U. Obviously, Y has the same length as U. In our example, Y has the length 10. The value of the i-th element in the study variable is denoted by Y_{i}. The Y-value of the first person in our universe is denoted by Y_{1}, the Y-value of the fourth person by Y_{4}, and so on. If the study variable in our universe were **age**, Y could look like this

Y = (Y_{1}, Y_{2},..., Y_{10})’ = ( 37, 37, 63, 31, 39, 45, 59, 22, 53, 18)’.

This means that the first and second persons are both 37 years old, the third person is 63 years old and the tenth person is 18 years old. Of course, we usually do not know the values of a study variable for all elements of the population. That is precisely why we select a sample. We pretend to know them here purely for illustrative purposes.

You may have asked yourself what the apostrophe after the closing bracket could mean. Usually, study variables are defined as so-called **column vectors**. A column vector is a mathematical object comparable to a column in a table in a data set. Consequently, we would have to write Y as

$$Y = \begin{pmatrix} 37 \\ 37 \\ 63 \\ 31 \\ 39 \\ 45 \\ 59 \\ 22 \\ 53 \\ 18 \end{pmatrix}$$

which is a bit cumbersome. Thus, to save space, we write Y as in the first equation (as a row vector) and use the apostrophe to indicate that Y is actually a column vector and must be treated as such.

In the ESS, the size of the population from which we draw samples differs from country to country. The basic definition of the elements that are part of the population is the same in all countries, however. The ESS includes ‘all persons aged 15 and over (no upper age limit) resident within private households in each country, regardless of their nationality, citizenship or language’ [Ess09]. Thus, any sample that is drawn from a population that meets this definition will, naturally, only contain elements with the aforementioned characteristics. For example, adolescents younger than 15 do not fall within the above definition and are thus excluded from the population, and hence from the sample. That is why we refer to the population we target for our sample as the **target population**. Should the sampling process, for whatever reason, systematically exclude elements of the target population, they will not have a chance of being included in the sample. We are therefore unable to make any judgments about these elements. The population about which we can reasonably draw inferences is thereby reduced. We call this reduced population the **inference population**.

In order to actually draw a sample from a population, we need an accessible list of all the elements in the population, a so-called **sampling frame**. Today, sampling frames are usually digital lists that can be processed by computers. Ideally, the sampling frame contains exactly the elements of the target population. However, due to practical problems, the sampling frame can contain elements that are not part of the target population (e.g. people younger than 15 years). If this is the case, we speak of **over-coverage**. If the sampling frame does not contain all elements of the target population (e.g. only people older than 18 years), we speak of **under-coverage**. Over-coverage and under-coverage can occur at the same time. Figure 2.1 illustrates how over-coverage and under-coverage relate to the inference and target populations.

Figure 2.1. Interdependence of coverage and inference and target population
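The relationship between frame, target population and coverage can be sketched with plain set operations. The person IDs, ages and the composition of the frame below are hypothetical and serve only to illustrate the definitions.

```python
# Hypothetical person IDs with ages; none of these values come from the ESS.
ages = {1: 14, 2: 16, 3: 17, 4: 25, 5: 40, 6: 67}

# Target population: everyone aged 15 and over.
target = {pid for pid, age in ages.items() if age >= 15}

# Hypothetical frame: an electoral list (18+) plus one wrongly listed 14-year-old.
frame = {pid for pid, age in ages.items() if age >= 18} | {1}

over_coverage = frame - target         # on the frame, not in the target population
under_coverage = target - frame        # in the target population, missing from the frame
inference_population = frame & target  # elements we can actually draw inferences about

print(sorted(over_coverage), sorted(under_coverage), sorted(inference_population))
```

In this toy case, both kinds of coverage error occur at once, as described in the text.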

We aim for a high overlap of the target and inference populations. In real-world sampling practice, the magnitude of the overlap depends on the quality of the sampling frame. In the ESS, sampling frames are often electoral lists. However, these lists only cover people in a country who are eligible to vote: in most European countries, citizens aged 18 and older. If only electoral lists were used as a sampling frame, we would have strong under-coverage of people between 15 and 18 years of age. Therefore, we must either resort to another sampling frame or supplement the electoral lists with a sampling frame that includes the missing elements.

# Sample

Now that we have a basic understanding of what a population is, let us turn to the concept of a **sample**. At the general level, a sample is a subset of size n from a population of size N selected according to predefined rules. In the simplest case, the n elements of a sample are selected randomly, every element having the same chance of being selected into the sample. Obviously, there are many possible ways in which we can select n elements from the population into the sample, depending on both the size of the population N and the sample size n. If, for example, we wanted to sample n=4 persons from our universe of N=10 persons and we did not return each element to the population after it had been selected, there would be 210 possible samples. The first of these samples includes elements 1, 2, 3 and 4, the second possible sample elements 1, 2, 3 and 5, and so on until the 210th possible sample, which includes elements 7, 8, 9 and 10. The set of all possible samples of size n from N is denoted by S. In our example, S contains all 210 possible samples. A specific sample is denoted by the lower case letter s. The sample containing persons 1, 4, 7 and 8 is thus denoted by s={1, 4, 7, 8}.
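The set S of all possible samples can be enumerated directly for a universe this small. The following minimal sketch uses Python's `itertools.combinations`; the variable names are chosen for illustration only.

```python
from itertools import combinations

N, n = 10, 4
# All subsets of size n from the element labels 1..N, in lexicographic order.
S = list(combinations(range(1, N + 1), n))

print(len(S))   # number of possible samples
print(S[0])     # first possible sample
print(S[-1])    # last possible sample
```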

The study variable Y is then surveyed for each of the n elements of the sample s. To distinguish between the (unknown) values of the study variable in the population and the (known) values in the sample, we denote the latter using the lower case letter y. The enumeration of the y-values in the sample follows the same logic as before. We denote the y-value of the ith element y_{i}, with the difference that i now ranges from 1 to n. Apart from that, the notation is very similar to the one we already know:

y = (y_{1}, ... , y_{i}, ... , y_{n})’

If we had drawn a sample containing elements 1, 4, 7 and 8 of the population, the result would be

y = (y_{1}, y_{4}, y_{7}, y_{8})’ = (37, 31, 59, 22)’
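As a small illustration, the sample values y can be picked out of the population values Y by their labels. The list-based representation is only one possible choice.

```python
# Population values of the study variable (ages) from the text, labelled 1..10.
Y = [37, 37, 63, 31, 39, 45, 59, 22, 53, 18]
s = [1, 4, 7, 8]  # labels of the sampled elements

# Labels start at 1, Python lists at 0, hence the shift by one.
y = [Y[i - 1] for i in s]
print(y)
```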

### Exercise 2

Suppose we draw a sample of size n = 4 from our population and get elements U_{1}, U_{3}, U_{8} and U_{10} in the sample. Write y as a column and as a row vector.

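Figure 2.2. y written as a column vector:

$$y = \begin{pmatrix} y_{1} \\ y_{3} \\ y_{8} \\ y_{10} \end{pmatrix} = \begin{pmatrix} 37 \\ 63 \\ 22 \\ 18 \end{pmatrix}$$

y written as a row vector:

y = (y_{1}, y_{3}, y_{8}, y_{10})’ = (37, 63, 22, 18)’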

The elements of s are called **ultimate sampling units** if y can be surveyed directly for them. A **sample survey** is a combination of the realisation of a sample design and the measurement of the values of at least one (but probably many more) study variable(s) in elements of the sample.

Finally, we need to define the term **sample design**. A sample design is a mechanism that assigns each possible sample s a probability of realisation, P(s). The function P(·) is also called the **sample selection scheme** or sampling scheme. Sample designs that explicitly define such a function are called **probability sample designs**. On the other hand, sample designs that do not explicitly define P(·) are called **non-probability sample designs**. They are not dealt with here. In the next section, we will see that simple random sampling without replacement (srswor) and with replacement (srswr) assign equal P(s) to all possible samples. Apart from srswor and srswr, there are many sample designs that assign different probabilities of realisation to different samples. Some of these sample designs will also be introduced in the following sections.

# Simple Random Sampling

A very simple and intuitive sample design that we have already used in the previous examples is defined by the following rule: ‘Draw n of N elements randomly and do not return each element to the population after it has been drawn’. This sample design is called **simple random sampling without replacement**, abbreviated srswor or simply srs. Using srswor, the probability of realising every possible sample of size n from a population of size N is

$$P(s) = \frac{1}{\binom{N}{n}} = \frac{n!\,(N-n)!}{N!} \qquad (1)$$

The exclamation mark means ‘factorial’: n! is the product of the integers 1, 2, 3, ..., n. Hence

$$1! = 1, \quad 2! = 1 \cdot 2 = 2, \quad 3! = 1 \cdot 2 \cdot 3 = 6$$

and so on.

In practical applications, the number of possible samples of size n is enormous. Even with population and sample sizes as moderate as those in our previous examples, N=10 and n=4, the number of possible samples is already 210. If the population contains N=500 elements and the sample size is n=10, there are approximately 2.46 * 10^{20} possible samples. That is roughly 246 followed by 18 zeros. As impressive as these figures might be, the important thing for us to know is that all samples have the same chance of being realised; the specific value of P(s) is not important yet.

A less practical but theoretically appealing variant of srswor is to return each element after it has been selected into the sample, thus giving it a chance of being re-selected in the next draw, i.e. to use **replacement**. This sample design is called **simple random sampling with replacement** (srswr). Under srswr, too, each sample of size n has the same probability of realisation, which is expressed as

$$P(s) = \frac{1}{N^{n}} \qquad (2)$$

The number of possible samples using srswr is even larger than when using srswor. In our example with N=500 and n=10, the number of possible samples is approximately 9.77*10^{26}. Again, the magnitude of P(s) does not matter, only the fact that the functional form of the sample design in (2) does not ‘favour’ a specific sample but treats all possible samples in the same way by assigning each of them the probability of 1/N^{n} of being realised.
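Drawing one sample under each of the two designs can be sketched with Python's standard `random` module. The seed is arbitrary and only makes the run reproducible; the labels 1 to 10 stand for the ten persons of our small universe.

```python
import random

rng = random.Random(2024)          # arbitrary seed, only for reproducibility
population = list(range(1, 11))    # element labels 1..10
n = 4

srswor = rng.sample(population, n)      # without replacement: no label repeats
srswr = rng.choices(population, k=n)    # with replacement: labels may repeat

print(sorted(srswor), sorted(srswr))
```

Under srswor a label can appear at most once, whereas under srswr the same person may be drawn several times.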

### Example 1

From our universe, we want to draw n=4 people using both srswor and srswr. We know that in the first case there are 210 possible samples, but if we allow each element to be returned after it has been selected, there are N^{n} = 10^{4} = 10,000 possible samples. Analogously, if the population size is N=500 and the sample size is n=10, the number of possible srswr samples is 9.77*10^{26}, a very, very large number. Again, what is important is not the magnitude of P(s), but the fact that all samples are also equally likely under srswr.
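The sample counts quoted above can be checked with a few lines of Python. This is a minimal sketch using `math.comb` for the binomial coefficient under srswor and a plain power for srswr.

```python
from math import comb

for N, n in [(10, 4), (500, 10)]:
    n_srswor = comb(N, n)   # unordered samples without replacement
    n_srswr = N ** n        # ordered draws with replacement
    print(N, n, f"{n_srswor:.3e}", f"{n_srswr:.3e}")
```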

If we were only interested in the sample data, we could calculate the mean age of the persons in the sample for all possible srswor and srswr samples as $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_{i}$. Of course, the resulting value will depend on the composition of the sample. If, for example, only the four oldest persons were selected (i.e. elements 3, 6, 7 and 9), the sample mean would be very high (55 years). If the sample consists of the four youngest people (i.e. elements 1, 4, 8 and 10), the sample mean would be very low (27 years). Note that these figures are much higher and lower, respectively, than the mean age in the population of 40.4 years. The important thing is that the sample that includes the oldest persons, the sample that includes the youngest persons, as well as the remaining 208 possible samples, are equally likely. Hence, the mean age calculated from any one of these samples is just as likely to be observed as the mean calculated from any other. Please note, however, that different samples can yield the same mean, so the number of possible values of the sample mean need not be the same as the number of possible samples. In our example with n=4 and sampling using srswor, there are 80 distinct possible values of $\bar{y}$, some occurring only once, some twice and others even eight times. Similarly, for srswr, there are not 10,000 (the number of possible srswr samples of size n=4 from N=10) but only 140 possible sample means.

The following figures show the distributions of the 80 and 140 possible sample means based on 210 srswor and 10,000 srswr possible samples of size n=4 from N=10.

Example 1. Sample mean (srswor)

Example 1. Sample mean (srswr)
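The distributions of the sample mean can be reproduced by brute-force enumeration. The sketch below counts sample totals rather than means, which is an implementation choice: totals are integers, and each total corresponds to exactly one mean. The number of distinct means it prints can be compared with the figures quoted in the text.

```python
from itertools import combinations, product
from collections import Counter

Y = [37, 37, 63, 31, 39, 45, 59, 22, 53, 18]
n = 4

# Counting integer totals avoids floating-point issues when looking for
# distinct values; the mean is simply total / n.
totals_wor = Counter(sum(s) for s in combinations(Y, n))   # all srswor samples
totals_wr = Counter(sum(s) for s in product(Y, repeat=n))  # all ordered srswr samples

for name, totals in [("srswor", totals_wor), ("srswr", totals_wr)]:
    print(name,
          "samples:", sum(totals.values()),
          "distinct means:", len(totals),
          "min mean:", min(totals) / n,
          "max mean:", max(totals) / n)
```

The `Counter` objects also hold how many samples lead to each mean, which is the information plotted in the two figures.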

### Exercise 3

In Round III of the ESS, Norway applied a simple random sample design without replacement. Suppose that the size of the Norwegian ESS target population is N=3,733,370 and the gross sample size is n=2,750. Calculate P(s) assuming both srswor and srswr.

The numbers are very large. The value of P(s) is not important; the important thing to note is that all s are treated equally.

Probability of realisation using srswor, from equation (1):

$$P(s) = \frac{n!\,(N-n)!}{N!} = \frac{2750!\,(3733370-2750)!}{3733370!}$$

Probability of realisation using srswr, from equation (2):

$$P(s) = \frac{1}{N^{n}} = \frac{1}{3733370^{2750}}$$
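Both probabilities are far too small to be represented as ordinary floating-point numbers, so one way to get a feeling for their size is to compute their base-10 logarithms. The following sketch does this with the log-gamma function, using the identity ln(k!) = lnΓ(k+1); the approach is an illustration and not part of the original exercise.

```python
from math import lgamma, log, log10

N, n = 3_733_370, 2_750

# ln C(N, n) = lnΓ(N+1) - lnΓ(n+1) - lnΓ(N-n+1)
ln_comb = lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1)

log10_p_srswor = -ln_comb / log(10)   # equation (1): P(s) = 1 / C(N, n)
log10_p_srswr = -n * log10(N)         # equation (2): P(s) = 1 / N^n

print(f"srswor: P(s) is about 10^{log10_p_srswor:.1f}")
print(f"srswr:  P(s) is about 10^{log10_p_srswr:.1f}")
```

The exponents confirm that the individual probabilities are vanishingly small in both cases; what matters is that they are the same for every possible sample.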

# Inclusion Probabilities and Design Weights

Probability samples assign known probabilities not only to every possible sample, but also to each element of the universe. These are the so-called **inclusion probabilities**. According to Fuller1, the inclusion probability of element i is defined as

π_{i} = ‘*the sum of the sample probabilities for all samples that contain element i*’.

The inverse of π_{i} is called **design weight** and is defined as

w_{i} = 1/π_{i}

We have already used the design weight in the examples in the preceding section. We can now see very easily that, if all elements are drawn into the sample with equal probabilities, π_{i} = c and, instead of w_{i} = 1/c, we are free to define w_{i}= 1. This means that, if a sample design assigns equal inclusion probabilities to all elements in the population, we can simply ignore the design weights, since constant weights are equivalent to multiplication by 1. This makes it clear why it is important to know the details of the sample design when it comes to data analysis.

We can now also deduce directly from the above equation that elements that are assigned a low inclusion probability receive a high design weight and vice versa. This means that, if elements are sampled with unequal inclusion probabilities, an element that was very unlikely to be included in the sample (i.e. has a small value of π_{i}) is weighted up, which makes it ‘more important’ than an element that had a very high chance of being selected (i.e. has a large value of π_{i}), which is weighted down and hence made ‘less important’.

Inclusion probabilities, and hence design weights, depend on the sample design, the sample size and the size of the population. Let us again assume that we draw a sample of size n=4 from our population of size N=10 using srswor. If unit i is in the sample, the remaining n-1 = 3 elements must be chosen from the remaining N-1=9 elements in the population. There are

$$\binom{N-1}{n-1} = \binom{9}{3} = 84$$

possible samples of size 3 from the remaining 9 elements. Together with element *i*, each of them forms a sample of size 4 that contains element *i*. Since each of the 210 possible samples has the same probability, the probability of selecting a sample that includes unit *i* is

$$\pi_{i}^{(srswor)} = \frac{\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{n}{N}$$

and equal for all elements of the population.

In our example π_{i}^{(srswor)} = 84/210 = 4/10 = 0.4.
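The definition of the inclusion probability as a sum of sample probabilities can be checked by enumeration for our small universe. The sketch below uses exact fractions so that the result is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

N, n = 10, 4
S = list(combinations(range(1, N + 1), n))
p_s = Fraction(1, len(S))   # srswor: every sample has the same probability

# pi_i = sum of P(s) over all samples s that contain element i
pi = {i: sum(p_s for s in S if i in s) for i in range(1, N + 1)}
w = {i: 1 / pi[i] for i in pi}   # design weights w_i = 1 / pi_i

print(pi[1], w[1])
```

Every element obtains the same inclusion probability and therefore the same design weight, which is what makes srswor an equal probability design.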

Using srswr, the inclusion probability of each element *i* is also constant and, for obvious reasons, can be expressed as:

π_{i}^{(srswr)} = n/N

Sample designs that produce constant inclusion probabilities are called **equal probability of selection method** (epsem), **self-weighting**2, or **equal probability sample designs**.

Apart from equal probability sample designs, there are **unequal probability sample designs** that assign non-constant inclusion probabilities to the elements. Many (but not all) designs in the class of so-called **complex sample designs**, for example variants of cluster sampling or multi-stage sampling (see the Section ‘Cluster Sampling and Multi-Stage Sampling’), produce non-constant inclusion probabilities.

# Stratified Sampling

In stratified sampling (str), the population of interest can be divided into H non-overlapping sub-populations or **strata** of size N_{h} (h = 1, ... , H) according to a stratification variable Z. The stratification variable is either discrete or has to be recoded into a discrete variable with as many unique values as the desired number of strata. The values of Z are denoted by Z_{h}. The total sample size n is then allocated to the strata, so that

$$n = \sum_{h=1}^{H} n_{h}.$$

Samples of size n_{h} are drawn within each of the H strata. To summarise, there are four choices to make when planning a stratified sample:

- What variable to use for stratification?
- How are the stratum boundaries defined?
- By what method should the total sample size be allocated to the strata?
- What sample design is used to draw the samples within the strata?

The answer to the first question depends to a large extent on the availability of population information. Very often there is no such additional information available, and the overall population figure N and the stratum population figures N_{h} are therefore used for Z and Z_{h} and, hence, n_{h} = n * (N_{h}/N).

Answering the second question is not always easy. Researchers often plan to stratify the sample geographically. In the Norwegian example, it would make sense to stratify the sample according to the 19 regions of Norway. Hence, the stratum boundaries are clearly defined by the geographical location of the people living in Norway. But if a non-discrete variable, for example age, is to be used, the researcher would have to make a decision on how many discrete values the recoded variable should have and where to draw the stratum boundaries [1].

Concerning the third question, there are basically two methods for allocating n to the H strata and thus determining n_{h}. First, n can be allocated to the strata proportionally (strp) to Z_{h} so that

n_{h} = n * Z_{h}/Z.

If we know the stratum population figures N_{h} for h = 1, ... , H, we would use them for stratification and set Z_{h} = N_{h}. Now we are able to allocate n to the H strata proportionally to their size:

n_{h} = n * N_{h}/N.

One advantage of proportional allocation is that the inclusion probabilities are constant. Generally, the inclusion probability of the jth element in stratum h can be expressed as

π_{hj}^{(str)} = n_{h}/N_{h}.

Since, in proportional allocation, the stratum sample sizes n_{h} are by definition proportional to N_{h}, the above ratio is constant and hence

$$\pi_{hj}^{(strp)} = \frac{n_{h}}{N_{h}} = \frac{n}{N}.$$

There can, however, be reasons to decide not to use proportional allocation of the total sample size, for example if the researcher wants to make sure that a minimum sample size is drawn within each stratum. We refer to any stratified sample design where n_{h} is not decided by proportional allocation as **disproportional stratified sampling** (strd).
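The difference between proportional and disproportional allocation can be illustrated with a small numerical sketch. The three strata and their sizes are hypothetical, and equal stratum sample sizes are used as just one example of a disproportional allocation.

```python
from fractions import Fraction

N_h = {"A": 500, "B": 300, "C": 200}   # hypothetical stratum sizes
n = 100
N = sum(N_h.values())

# Proportional allocation: n_h = n * N_h / N
n_prop = {h: Fraction(n * N_h[h], N) for h in N_h}
# One disproportional alternative: the same sample size in every stratum
n_equal = {h: Fraction(n, len(N_h)) for h in N_h}

for label, alloc in [("proportional", n_prop), ("equal", n_equal)]:
    pi = {h: alloc[h] / N_h[h] for h in N_h}   # pi_hj = n_h / N_h
    print(label, {h: str(p) for h, p in pi.items()})
```

Under proportional allocation all three strata share the inclusion probability n/N, while equal stratum sample sizes give elements in small strata a higher inclusion probability and hence a lower design weight.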

Question number four also depends on the availability of information within strata. If, for example, no stratum-wise lists of people are available but only a list of households, one would be forced to opt for a multi-stage sample design (see the Section on multi-stage sample design) instead of a simple random sample within strata.

One final question remains: why use stratification in the first place? One important aspect is that estimators based on stratified samples can have a lower variance than those based on srs. However, the magnitude of the reduction or increase in variance depends on the degree of homogeneity of elements within the strata and on heterogeneity between strata. Thus, a well-informed choice of stratification characteristics is essential to achieve the gains in efficiency that stratification generally offers [2].

### Exercise 4

Suppose you want to draw a stratified sample of size n=2,750 from the Norwegian population. You know the population figures in the 19 regions of Norway. They are shown in the following table:

Region | Population |
---|---|
Total | 4 794 619 |
Akershus | 523272 |
Aust-Agder | 106842 |
Buskerud | 253006 |
Finnmark | 72560 |
Hedmark | 189586 |
Hordaland | 469681 |
Møre og Romsdal | 247933 |
Nordland | 235124 |
Nord-Trøndelag | 130192 |
Oppland | 183851 |
Oslo | 586860 |
Østfold | 267039 |
Rogaland | 420574 |
Sogn og Fjordane | 106389 |
Sør-Trøndelag | 284773 |
Telemark | 167102 |
Troms | 155061 |
Vest-Agder | 166976 |
Vestfold | 227798 |

- Allocate the sample size to the regions using proportional allocation.
- Allocate the sample size to the regions using the same sample size in each stratum.
- What practical problems do you have and how do you solve them?

- Find the percentage of the total population living in each of the strata (people living in stratum/4794619). Use this percentage to proportionally allocate the 2,750 persons in the total sample to the strata (percentage of stratum*2750). In the Oslo stratum, for example, the stratum sample size according to this allocation scheme is 586860/4794619*2750 = 336.6.
- Divide the total number of respondents in the sample by the number of regions (2750/19). The sample size in every stratum will be 2750/19=144.7.
- In practical applications, you will almost always encounter the problem of non-integer stratum sample sizes. Allocating a total sample size proportionally to strata hardly ever results in whole numbers. This is a problem since rounding stratum sample sizes in accordance with a fixed rule can result in the sum of stratum sample sizes no longer equalling the total sample size. One solution to this problem is to use Cox controlled rounding [Cox89].

Region | Population | % of total population | Proportional n | Equal sized n | Difference in n |
---|---|---|---|---|---|
Total | 4 794 619 | 100 | 2750 | 2750 | 0 |
Akershus | 523272 | 10.9 | 300.1 | 144.7 | -155.4 |
Aust-Agder | 106842 | 2.2 | 61.3 | 144.7 | 83.5 |
Buskerud | 253006 | 5.3 | 145.1 | 144.7 | -0.4 |
Finnmark | 72560 | 1.5 | 41.6 | 144.7 | 103.1 |
Hedmark | 189586 | 4 | 108.7 | 144.7 | 36 |
Hordaland | 469681 | 9.8 | 269.4 | 144.7 | -124.7 |
Møre og Romsdal | 247933 | 5.2 | 142.2 | 144.7 | 2.5 |
Nordland | 235124 | 4.9 | 134.9 | 144.7 | 9.9 |
Nord-Trøndelag | 130192 | 2.7 | 74.7 | 144.7 | 70.1 |
Oppland | 183851 | 3.8 | 105.4 | 144.7 | 39.3 |
Oslo | 586860 | 12.2 | 336.6 | 144.7 | -191.9 |
Østfold | 267039 | 5.6 | 153.2 | 144.7 | -8.4 |
Rogaland | 420574 | 8.8 | 241.2 | 144.7 | -96.5 |
Sogn og Fjordane | 106389 | 2.2 | 61 | 144.7 | 83.7 |
Sør-Trøndelag | 284773 | 5.9 | 163.3 | 144.7 | -18.6 |
Telemark | 167102 | 3.5 | 95.8 | 144.7 | 48.9 |
Troms | 155061 | 3.2 | 88.9 | 144.7 | 55.8 |
Vest-Agder | 166976 | 3.5 | 95.8 | 144.7 | 49 |
Vestfold | 227798 | 4.8 | 130.7 | 144.7 | 14.1 |

Only one decimal is shown.
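The allocation in the table can be recomputed with a short script. Note that the rounding step below uses simple largest-remainder rounding, which is a swapped-in technique for illustration and not the Cox controlled rounding cited above; it merely shows one way of obtaining whole numbers that still add up to n.

```python
from math import floor

population = {
    "Akershus": 523272, "Aust-Agder": 106842, "Buskerud": 253006,
    "Finnmark": 72560, "Hedmark": 189586, "Hordaland": 469681,
    "Møre og Romsdal": 247933, "Nordland": 235124, "Nord-Trøndelag": 130192,
    "Oppland": 183851, "Oslo": 586860, "Østfold": 267039,
    "Rogaland": 420574, "Sogn og Fjordane": 106389, "Sør-Trøndelag": 284773,
    "Telemark": 167102, "Troms": 155061, "Vest-Agder": 166976,
    "Vestfold": 227798,
}
n = 2750
N = sum(population.values())

proportional = {r: n * N_r / N for r, N_r in population.items()}  # n_h = n * N_h / N
equal = n / len(population)                                       # same n_h everywhere

# Largest-remainder rounding (NOT Cox controlled rounding): round everything
# down, then hand the left-over units to the largest fractional parts.
rounded = {r: floor(x) for r, x in proportional.items()}
left_over = n - sum(rounded.values())
by_remainder = sorted(proportional,
                      key=lambda r: proportional[r] - rounded[r],
                      reverse=True)
for r in by_remainder[:left_over]:
    rounded[r] += 1

print(N, round(proportional["Oslo"], 1), round(equal, 1), sum(rounded.values()))
```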

- [1] Defining stratum boundaries is not necessarily an arbitrary decision. Methods exist for optimally stratifying a sample.
- [2] For a more detailed overview of stratification techniques, see Särndal et al. (1992, chapter 3.7), Cochran (1977), Lehtonen and Pahkinen (2004, pp. 61) or Münnich (2003).

# Sampling with Probability Proportional to Size

When information on a size measure G exists for every element in the population and this size measure stores valuable information about the ‘importance’ of element i to be included in the sample, we can use this information in the sample design. Sample designs that make explicit use of such size measures are called **probability proportional to size** (pps) sample designs. The inclusion probability of element i of a pps sample of size n is

$$\pi_{i}^{(pps)} = n \cdot \frac{G_{i}}{\sum_{j=1}^{N} G_{j}}.$$

Sample designs with pps are often used in business surveys when it is important to include the largest firms in an industry in the sample since they contribute a large amount to the industry’s production of goods or services. However, pps can also be combined with cluster sample designs or general multi-stage sample designs, which are introduced in the next Section.
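A minimal numerical sketch of pps inclusion probabilities follows. It assumes the common textbook form in which the inclusion probability is n times an element's share of the total size measure; the six size measures are hypothetical (for instance, employee counts of six firms).

```python
# Hypothetical size measures G_i, e.g. number of employees of six firms.
G = [600, 300, 150, 150, 100, 100]
n = 2

total = sum(G)
pi = [n * g / total for g in G]   # assumed form: pi_i = n * G_i / sum of all G

print([round(p, 3) for p in pi], round(sum(pi), 3))
```

Larger firms receive proportionally larger inclusion probabilities, and the probabilities add up to the sample size n.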

# Cluster Sampling and Multi-Stage Sampling

A multi-stage or a cluster sample is drawn either because no population-wide sampling frame of ultimate sampling units exists or because those managing the fieldwork want a geographical distribution of the interviewers that minimises travel between and within geographical clusters. Clustering can, however, lead to a severe loss of precision in estimators, as we will see later.

A **cluster sample design** (clu) is any sample design in which ultimate sampling units are not selected directly but are taken from a sample of superordinate non-overlapping clusters. A **cluster**, or primary sampling unit (PSU), denotes a subset of population elements that belong together due to some specific well-defined attribute (e.g. a person's address).

Each ultimate sampling unit belongs to exactly one PSU and each PSU contains one or more ultimate sampling units. A clustered population consists of M PSUs, which are of size N_{i}, i=1, ... , M. We shall assume that a complete frame of PSUs exists from which a sample of m PSUs is drawn. The set of possible samples of m of the M clusters is denoted by S and a specific sample of m PSUs is denoted by s. The cluster sample design is defined as P(s). The inclusion probability of each of the M clusters is denoted by π_{i}. The value of π_{i} depends on the characteristics of the sample design. After s has been obtained, y is surveyed for each of the

$$n = \sum_{i \in s} N_{i}$$

ultimate sample elements (ignoring contact and non-response issues for the time being). This sampling scheme is referred to as **cluster sampling**, single or one-stage sampling. It is a special case of a wider class of so-called multi-stage sample designs.

A **multi-stage sample design** (mul) is any sample design in which ultimate sample elements are selected through successive sampling in two or more superordinate stages. In two-stage sampling (mul2), for example, ultimate sampling units are nested directly within superordinate clusters. Under mul2, m of M clusters are selected at the first stage. The set of possible samples of m primary sampling units is denoted by S^{(1)}. A specific sample of m primary sampling units is denoted by s^{(1)}; inclusion probabilities for each of the M PSUs are denoted by π_{i}, i=1, ... ,M. At the second stage, n_{i} **secondary sampling units (SSU)** are selected from the N_{i} units of each selected PSU. Thus,

$$n_{i} \le N_{i}.$$

The selected elements of the ith cluster are denoted by 1, ... ,j, ... ,n_{i}. The set of possible samples of n_{i} from N_{i} SSUs in the ith PSU is denoted by S_{i}^{(2)} and a specific sample by s_{i}^{(2)}.

The sum of all S_{i}^{(2)} is S and the sum of all s_{i}^{(2)} is s.

The inclusion probability of the jth element given that the ith PSU has been selected is denoted by π_{j|i}. The magnitude of π_{j|i} depends on the sample design that is used to select the elements within the PSU. We need a convenient notation system for the ultimate sample elements selected into the sample. Say there are

$$n = \sum_{i \in s^{(1)}} n_{i}$$

elements selected in total. We will then refer to the jth element of the ith PSU as the kth element, k=1, ... , n. Table 2.3 illustrates this notational scheme.

Table 2.3. Notation scheme

Following this notation, the overall inclusion probability of the jth element in the ith PSU is denoted by π_{ij} or simply by π_{k} and is the product of the inclusion probabilities in the two stages, which is expressed as

π_{ij} = π_{k} = π_{i}* π_{j|i}

and the design weight is expressed as

$$w_{ij} = w_{k} = \frac{1}{\pi_{k}} = \frac{1}{\pi_{i}\,\pi_{j|i}}.$$

An analogous notation is used for multi-stage sampling with three or more stages.

Figure 2.3 shows examples of cluster sampling with M=16, N_{i}=16 and m=5 and two-stage sampling with m=5 and n_{i}=3.

Figure 2.3. Cluster and two-stage sampling scheme
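The two schemes in the figure can be mimicked with a small simulation. The sketch below assumes that PSUs are selected by srswor at the first stage and that elements are selected by srswor within PSUs at the second stage; the figure itself does not specify this, and the seed is arbitrary.

```python
import random

rng = random.Random(1)   # arbitrary seed, only for reproducibility
M, N_i, m, n_i = 16, 16, 5, 3

# Population as {PSU label: list of (PSU, element) pairs}
psus = {i: [(i, j) for j in range(1, N_i + 1)] for i in range(1, M + 1)}

# First stage (assumed srswor): m of the M PSUs
selected = rng.sample(sorted(psus), m)

# Cluster sampling: every element of each selected PSU is surveyed
cluster_sample = [u for i in selected for u in psus[i]]

# Two-stage sampling: only n_i elements are subsampled within each selected PSU
two_stage_sample = [u for i in selected for u in rng.sample(psus[i], n_i)]

print(len(cluster_sample), len(two_stage_sample))
```

The cluster sample contains all elements of the five selected PSUs, whereas the two-stage sample contains only three elements from each of them.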

It is a commonly held belief that one of the most striking advantages of cluster sampling for social surveys is that it guarantees reduced travel costs. Interviewers can be sent into the field within closely defined geographical boundaries. A primary sampling unit is often defined as a municipality or a city district, making travel from address to address relatively inexpensive. We will later see that this cost advantage may hold if estimates based on data from a geographically clustered sample design are computed naively, but that it can become negligible once the effects of the sample design are incorporated in the estimation process.

A further explanation for the widespread use of cluster sample designs is the unavailability of alternative sampling frames for ultimate sample elements (e.g. population registers). In fact, many European countries either lack such a list or do not allow researchers to draw a sample from it. This is also reflected in the sample designs used by ESS countries, of which only about half are not multi-stage designs [Ess05b] [Häd07]. In the ESS, the guidelines for selection of a sample design follow the recommendation of Kish [1]: ‘Sample designs may be chosen flexibly and there is no need for similarity of sample designs. Flexibility of choice is particularly advisable for multinational comparisons, because the foundations for sampling differ between countries. All this flexibility assumes probability selection methods: known probabilities of selection for all population elements’.

### Example 2

Multi-stage sample designs are often combined with pps sampling as described above. A very commonly used sample design in the ESS is the following: in the first stage, m of M clusters are drawn with probability proportional to their size. Then, at the second stage, a fixed number of c persons is selected within each selected cluster using srswor.

This particular sample design has the very desirable property that the overall inclusion probabilities are constant. This can be seen fairly easily: the inclusion probability for the ith PSU sampled by pps is π_{i} = N_{i} / N. Once it has been sampled, the inclusion probability of the jth element in the ith PSU is π_{j|i} = c / N_{i}. As explained above, denote the jth element of the ith PSU by k. The overall inclusion probability of the jth element in the ith cluster is simply the product of π_{i} and π_{j|i}, which is

π_{ij} = π_{k} = N_{i}/N * c/N_{i} = c/N

and hence constant for all elements.

### Exercise 5

Assume a population consists of M=10 PSUs of the following sizes:

i | N_{i} | π_{i} | π_{j\|i} | π_{ij} |
---|---|---|---|---|
1 | 20 | | | |
2 | 40 | | | |
3 | 20 | | | |
4 | 10 | | | |
5 | 10 | | | |
6 | 15 | | | |
7 | 25 | | | |
8 | 20 | | | |
9 | 15 | | | |
10 | 25 | | | |

We sample by pps m=5 of the 10 PSUs. In each PSU, we select c=5 elements randomly.

- Calculate the first stage inclusion probabilities π_{i}.
- Calculate the second stage inclusion probabilities π_{j|i}.
- Calculate the overall inclusion probabilities π_{ij}.

- Due to the pps selection in the first stage, inclusion probabilities are defined as N_{i}/N. Thus the inclusion probability of the first PSU is 20/200=0.1.
- Randomly selecting five secondary sampling units within each selected PSU means that the second stage inclusion probabilities are 5/N_{i}. For example, the inclusion probabilities of the elements of the first PSU are all 5/20=0.25.
- The overall inclusion probability of an element is simply the product of its inclusion probabilities in all stages. In our example, the overall inclusion probability of all elements belonging to the first PSU is N_{i}/N * 5/N_{i} = 5/N = 5/200 = 0.025.

i | N_{i} | π_{i} | π_{j\|i} | π_{ij} |
---|---|---|---|---|
 | | = N_{i}/N | = c/N_{i} | = c/N |
1 | 20 | 0.100 | 0.250 | 0.025 |
2 | 40 | 0.200 | 0.125 | 0.025 |
3 | 20 | 0.100 | 0.250 | 0.025 |
4 | 10 | 0.050 | 0.500 | 0.025 |
5 | 10 | 0.050 | 0.500 | 0.025 |
6 | 15 | 0.075 | 0.333 | 0.025 |
7 | 25 | 0.125 | 0.200 | 0.025 |
8 | 20 | 0.100 | 0.250 | 0.025 |
9 | 15 | 0.075 | 0.333 | 0.025 |
10 | 25 | 0.125 | 0.200 | 0.025 |
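The solution table can be recomputed in a few lines. The sketch follows the formulas exactly as they are stated in the text, i.e. a first-stage probability of N_{i}/N and a second-stage probability of c/N_{i}, and uses exact fractions to show that the product is the same for every PSU.

```python
from fractions import Fraction

N_i = [20, 40, 20, 10, 10, 15, 25, 20, 15, 25]   # PSU sizes from the exercise
c = 5                                            # elements selected per PSU
N = sum(N_i)

overall = []
for i, size in enumerate(N_i, start=1):
    pi_i = Fraction(size, N)           # first stage, as stated in the text: N_i / N
    pi_j_given_i = Fraction(c, size)   # second stage: c / N_i
    pi_ij = pi_i * pi_j_given_i        # overall: product of both stages
    overall.append(pi_ij)
    print(i, size, float(pi_i), round(float(pi_j_given_i), 3), float(pi_ij))
```

If the first-stage probability were instead multiplied by the number of selected PSUs m, every overall probability would change by the same factor, so the design would remain an equal probability design.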

- [1] Kish 1994:173.

- [Cox89] Cox, L. W. and George, J. A. (1989). Controlled Rounding For Tables With Subtotals. In *Annals of Operations Research*, 20:141-157.
- [Ess05b] ESS (2005b). Sampling for the European Social Survey round III: Principles and requirements. Specification, European Social Survey.
- [Ess09] ESS (2009). *European Social Survey, Round 5: Specification for participating countries*. London: Centre for Comparative Surveys. http://www.europeansocialsurvey.org/index.php?option=com_docman&task=doc_download&gid=602&itemid=80
- [Häd07] Häder, S., Laaksonen, S., and Lynn, P. (2007). *ESS Round 2 2004/2005 Technical Report*, chapter THE SAMPLE. ESS