# Stratified Sampling

In stratified sampling (str), the population of interest is divided into H non-overlapping sub-populations, or strata, of size Nh (h = 1, ..., H) according to a stratification variable Z. The stratification variable is either discrete or has to be recoded into a discrete variable with as many unique values as the desired number of strata. The values of Z are denoted by Zh. The total sample size n is then allocated to the strata so that $\sum_{h=1}^{H} n_h = n$, and samples of size nh are drawn within each of the H strata. To summarise, there are four choices to make when planning a stratified sample:

1. What variable to use for stratification?
2. How are the stratum boundaries defined?
3. By what method should the total sample size be allocated to the strata?
4. What sample design is used to draw the samples within the strata?

The answer to the first question depends to a large extent on the availability of population information. Very often no such additional information is available, so the overall population figure N and the stratum population figures Nh are used for Z and Zh, which gives nh = n * (Nh/N).

Answering the second question is not always easy. Researchers often plan to stratify the sample geographically. In our example, it would make sense to stratify the sample according to the 19 regions of Norway. Hence, the stratum boundaries are clearly defined by the geographical location of the people living in Norway. But if a non-discrete variable, for example age, is to be used, the researcher would have to decide how many discrete values the recoded variable should have and where to draw the stratum boundaries [1].

Concerning the third question, there are basically two methods for allocating n to the H strata and thus determining nh. First, n can be allocated to the strata proportionally (strp) to Zh so that

$$n_h = n \cdot \frac{Z_h}{Z}.$$

If we know the stratum population figures Nh for h = 1, ... , H, we would use them for stratification and set Zh = Nh. Now we are able to allocate n to the H strata proportionally to their size:

$$n_h = n \cdot \frac{N_h}{N}.$$
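
The proportional allocation rule can be sketched in a few lines of Python. The function name `proportional_allocation` and the three strata with their sizes are hypothetical and serve only as an illustration; the result is left unrounded.

```python
def proportional_allocation(n, stratum_sizes):
    """Return unrounded stratum sample sizes n_h = n * N_h / N."""
    total = sum(stratum_sizes.values())  # N, the overall population size
    return {h: n * size / total for h, size in stratum_sizes.items()}


# Hypothetical population with three strata (illustrative figures only)
sizes = {"A": 5000, "B": 3000, "C": 2000}
allocation = proportional_allocation(100, sizes)
print(allocation)
```

The stratum sample sizes add up to the total sample size n, as required.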

One advantage of proportional allocation is that the inclusion probabilities are constant. Generally, the inclusion probability of the jth element in stratum h can be expressed as

$$\pi_{hj}^{(\mathrm{str})} = \frac{n_h}{N_h}.$$

Since, in proportional allocation, the stratum sample sizes nh are by definition proportional to Nh, the above ratio is constant and hence

$$\pi_{hj}^{(\mathrm{strp})} = \frac{n_h}{N_h} = \frac{n}{N}.$$

There can, however, be reasons to decide not to use proportional allocation of the total sample size, for example if the researcher wants to make sure that a minimum sample size is drawn within each stratum. We refer to any stratified sample design where nh is not decided by proportional allocation as disproportional stratified sampling (strd).
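
The difference between the two allocation schemes shows up directly in the inclusion probabilities nh/Nh. The following minimal sketch compares proportional allocation with an equal allocation, one possible disproportional design. The strata, their sizes and the helper `inclusion_probabilities` are hypothetical.

```python
def inclusion_probabilities(sample_sizes, stratum_sizes):
    """Inclusion probability n_h / N_h for every stratum h."""
    return {h: sample_sizes[h] / stratum_sizes[h] for h in stratum_sizes}


# Hypothetical strata (illustrative figures only)
stratum_sizes = {"A": 5000, "B": 3000, "C": 2000}
n = 90
N = sum(stratum_sizes.values())

# Proportional allocation: n_h = n * N_h / N
proportional = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
# Equal allocation: the same n_h in every stratum (disproportional)
equal = {h: n / len(stratum_sizes) for h in stratum_sizes}

pi_prop = inclusion_probabilities(proportional, stratum_sizes)
pi_equal = inclusion_probabilities(equal, stratum_sizes)
print(pi_prop)   # the same value n/N in every stratum
print(pi_equal)  # larger in the smaller strata
```

Under equal allocation, elements in small strata are more likely to be selected than elements in large strata.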

Question number four also depends on the availability of information within strata. If, for example, no stratum-wise lists of people are available but only a list of households, one would be forced to opt for a multi-stage sample design (see the Section on multi-stage sample design) instead of a simple random sample within strata.

One final question remains: why use stratification in the first place? One important aspect is that stratified samples can have a lower variance than srs. However, the magnitude of the reduction or increase in variance depends on the degree of homogeneity of elements within the strata and on heterogeneity between strata. Thus, a well-informed choice of stratification characteristics is essential to achieve the gains in efficiency that stratification generally offers [2].
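
The effect of homogeneity within strata can be illustrated with a small numerical sketch. It uses the standard textbook formulas for the variance of the sample mean under srs without replacement and under stratified sampling with srs inside each stratum. The tiny population, both stratifications and the function `var_stratified_mean` are hypothetical and chosen only to make the contrast visible.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def var_stratified_mean(strata, n):
    """Variance of the stratified mean, proportional allocation, srs within strata."""
    N = sum(len(values) for values in strata.values())
    total = 0.0
    for values in strata.values():
        N_h = len(values)
        n_h = n * N_h / N   # proportional allocation
        W_h = N_h / N       # stratum weight
        total += W_h**2 * (1 - n_h / N_h) * variance(values) / n_h
    return total


# Hypothetical population of 12 elements with two clearly different levels
population = [10, 11, 12, 13, 14, 15, 50, 51, 52, 53, 54, 55]
N, n = len(population), 4

# Variance of the sample mean under srs without replacement
var_srs = (1 - n / N) * variance(population) / n

# Homogeneous strata: each stratum contains one level only
homogeneous = {"low": [10, 11, 12, 13, 14, 15], "high": [50, 51, 52, 53, 54, 55]}
# Heterogeneous strata: both levels are mixed within each stratum
mixed = {"s1": [10, 11, 12, 53, 54, 55], "s2": [13, 14, 15, 50, 51, 52]}

var_homogeneous = var_stratified_mean(homogeneous, n)
var_mixed = var_stratified_mean(mixed, n)
print(var_homogeneous, var_srs, var_mixed)
```

In this constructed example the homogeneous stratification has a far smaller variance than srs, while the mixed stratification has a slightly larger one.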

### Exercise 4

Suppose you want to draw a stratified sample of size n=2,750 from the Norwegian population. You know the population figures in the 19 regions of Norway. They are shown in the following table:

Table 2.1. Population figures for the Norwegian counties

| Region | Population |
|---|---:|
| Total | 4794619 |
| Akershus | 523272 |
| Aust-Agder | 106842 |
| Buskerud | 253006 |
| Finnmark | 72560 |
| Hedmark | 189586 |
| Hordaland | 469681 |
| Møre og Romsdal | 247933 |
| Nordland | 235124 |
| Nord-Trøndelag | 130192 |
| Oppland | 183851 |
| Oslo | 586860 |
| Østfold | 267039 |
| Rogaland | 420574 |
| Sogn og Fjordane | 106389 |
| Sør-Trøndelag | 284773 |
| Telemark | 167102 |
| Troms | 155061 |
| Vest-Agder | 166976 |
| Vestfold | 227798 |

1. Allocate the sample size to the regions using proportional allocation.
2. Allocate the sample size to the regions using the same sample size in each stratum.
3. What practical problems do you have and how do you solve them?

Solution:

1. Find the share of the total population living in each stratum (people living in stratum/4794619). Use this share to allocate the 2,750 persons in the total sample proportionally to the strata (share of stratum*2750). In the Oslo stratum, for example, the stratum sample size according to this allocation scheme is 586860/4794619*2750 = 336.6.
2. Divide the total number of respondents in the sample by the number of regions (2750/19). The sample size in every stratum will be 2750/19=144.7.
3. In practical applications, you will almost always run into the problem of non-integer stratum sample sizes. Allocating a total sample size proportionally to strata hardly ever results in whole numbers. Rounding each stratum sample size by a fixed rule can leave the sum of the rounded stratum sample sizes different from the total sample size. One solution to this problem is to use Cox controlled rounding [Cox89].

Table 2.2. Population figures and stratification

| Region | Population | % of total population | Proportional n | Equal sized n | Difference in n (equal minus proportional) |
|---|---:|---:|---:|---:|---:|
| Total | 4794619 | 100 | 2750 | 2750 | 0 |
| Akershus | 523272 | 10.9 | 300.1 | 144.7 | -155.4 |
| Aust-Agder | 106842 | 2.2 | 61.3 | 144.7 | 83.5 |
| Buskerud | 253006 | 5.3 | 145.1 | 144.7 | -0.4 |
| Finnmark | 72560 | 1.5 | 41.6 | 144.7 | 103.1 |
| Hedmark | 189586 | 4.0 | 108.7 | 144.7 | 36.0 |
| Hordaland | 469681 | 9.8 | 269.4 | 144.7 | -124.7 |
| Møre og Romsdal | 247933 | 5.2 | 142.2 | 144.7 | 2.5 |
| Nordland | 235124 | 4.9 | 134.9 | 144.7 | 9.9 |
| Nord-Trøndelag | 130192 | 2.7 | 74.7 | 144.7 | 70.1 |
| Oppland | 183851 | 3.8 | 105.4 | 144.7 | 39.3 |
| Oslo | 586860 | 12.2 | 336.6 | 144.7 | -191.9 |
| Østfold | 267039 | 5.6 | 153.2 | 144.7 | -8.4 |
| Rogaland | 420574 | 8.8 | 241.2 | 144.7 | -96.5 |
| Sogn og Fjordane | 106389 | 2.2 | 61.0 | 144.7 | 83.7 |
| Sør-Trøndelag | 284773 | 5.9 | 163.3 | 144.7 | -18.6 |
| Telemark | 167102 | 3.5 | 95.8 | 144.7 | 48.9 |
| Troms | 155061 | 3.2 | 88.9 | 144.7 | 55.8 |
| Vest-Agder | 166976 | 3.5 | 95.8 | 144.7 | 49.0 |
| Vestfold | 227798 | 4.8 | 130.7 | 144.7 | 14.1 |

Only one decimal is shown.
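
The rounding problem described in item 3 can be made concrete with the population figures from the exercise. The sketch below does not implement Cox controlled rounding. As a simpler stand-in, and purely as an assumption for illustration, it uses the largest-remainder method: every stratum sample size is rounded down, and the remaining units are given to the strata with the largest fractional parts, so that the total is preserved.

```python
import math

# Population figures from the exercise
population = {
    "Akershus": 523272, "Aust-Agder": 106842, "Buskerud": 253006,
    "Finnmark": 72560, "Hedmark": 189586, "Hordaland": 469681,
    "Møre og Romsdal": 247933, "Nordland": 235124, "Nord-Trøndelag": 130192,
    "Oppland": 183851, "Oslo": 586860, "Østfold": 267039,
    "Rogaland": 420574, "Sogn og Fjordane": 106389, "Sør-Trøndelag": 284773,
    "Telemark": 167102, "Troms": 155061, "Vest-Agder": 166976,
    "Vestfold": 227798,
}
n = 2750
N = sum(population.values())

# Unrounded proportional allocation n_h = n * N_h / N
exact = {region: n * N_h / N for region, N_h in population.items()}

# Step 1: round every stratum sample size down
rounded = {region: math.floor(x) for region, x in exact.items()}

# Step 2: hand out the units still missing, largest fractional part first
shortfall = n - sum(rounded.values())
by_remainder = sorted(exact, key=lambda r: exact[r] - rounded[r], reverse=True)
for region in by_remainder[:shortfall]:
    rounded[region] += 1

print(sum(rounded.values()))  # equals n by construction
```

Each rounded stratum sample size differs from the unrounded one by less than one unit, and the rounded sizes still sum to the total sample size.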


#### Footnotes

• [1] Defining stratum boundaries is not necessarily an arbitrary decision. Methods exist for optimally stratifying a sample.
• [2] For a more detailed overview of stratification techniques, see Särndal et al. (1992, chapter 3.7), Cochran (1977), Lehtonen and Pahkinen (2004, pp. 61) or Münnich (2003).

#### References

• [Cox89] Cox, L. W. and George, J. A. (1989). Controlled Rounding For Tables With Subtotals. In Annals of Operations Research, 20:141-157.