# Estimation of Design Effects

Let us assume a population of elements grouped into M PSUs of size N_{i}, i = 1, ..., M, and let N = Σ_{i=1}^{M} N_{i}. Furthermore, let B = N/M be the average cluster size. For the time being, let us assume that the PSUs are of equal size, so that N_{i} = B. Finally, let y_{ij} denote the value of the variable of interest for the jth respondent in the ith cluster, as before. Consequently,
y_{i} = Σ_{j=1}^{B} y_{ij} denotes the sum of the study variable in the ith cluster. A simple random sample of m clusters is drawn at the first stage, and then all B elements of each selected PSU are selected. The homogeneity of y introduced by geographical clustering leads to the design effect, deff, which is defined by Kish [1] as

(20) |

Here, following Lohr [2], it is expressed more generally as

deff = Var_{c}(\hat{θ}) / Var_{srs}(\hat{θ}) | (21) |

where \hat{θ} denotes the estimator of interest, Var_{c} is the variance of the estimator under the actual complex design (here: the one-stage cluster design) and Var_{srs} is the variance of the same estimator under a (hypothetical) simple random sample of the same size [3]. Put less formally, the design effect is the factor by which the naive srs variance formula understates (deff > 1) or overstates (deff < 1) the variance of an estimator under the complex design.
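As a minimal sketch of this variance ratio, the following Python function computes the design effect of the sample mean exactly for a fully enumerated toy population with equally sized clusters. It assumes the standard without-replacement variances of the sample mean, (1 - m/M) S_b^2 / m for the mean of m cluster means and (1 - n/N) S^2 / n for an srs of n = mB elements. The function name and the toy data are hypothetical and serve only as an illustration.

```python
from statistics import mean, variance


def exact_deff_one_stage(clusters, m):
    """Exact design effect of the sample mean under one-stage cluster sampling.

    clusters: list of equally sized lists with every population value y_ij.
    m: number of clusters drawn by simple random sampling without replacement.
    """
    M = len(clusters)       # number of PSUs in the population
    B = len(clusters[0])    # common cluster size
    if any(len(c) != B for c in clusters):
        raise ValueError("all clusters must have the same size")
    if not 1 <= m < M:
        raise ValueError("m must satisfy 1 <= m < M")

    N = M * B               # population size
    n = m * B               # number of sampled elements
    cluster_means = [mean(c) for c in clusters]
    elements = [y for c in clusters for y in c]

    # Var_c: variance of the mean of m cluster means (divisor M - 1 in S_b^2)
    var_c = (1 - m / M) * variance(cluster_means) / m
    # Var_srs: variance of the mean of an srs of n elements (divisor N - 1 in S^2)
    var_srs = (1 - n / N) * variance(elements) / n
    return var_c / var_srs


# Hypothetical toy population: 4 clusters of size 2, identical values within clusters
population = [[1, 1], [2, 2], [3, 3], [4, 4]]
deff = exact_deff_one_stage(population, m=2)
```

Because the values within each toy cluster are identical, the clustered design is less precise than an srs of the same size and the ratio exceeds 1. The function requires some variation in the population, otherwise the srs variance in the denominator is zero.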

The ratio

n_{eff} = n/deff | (22) |

is referred to as the **effective sample size**: the number of ultimate sample elements that an srs would require to yield the same precision for a given estimator as the complex sample design. Kish [4] showed that (20) can be expressed as

deff_{one-stage} = 1 + (B - 1) ρ | (23) |

if all B elements in a selected cluster are selected (one-stage or cluster sampling) and

deff_{two-stage} = 1 + (b - 1) ρ | (24) |

if b elements in a selected cluster are sub-sampled randomly (two-stage sampling). The factor ρ is a measure of homogeneity and will be discussed below.
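The following short Python sketch evaluates expressions (23) and (24) together with the effective sample size (22). The function names and the numerical inputs (a cluster take of 10, ρ = 0.05, n = 1450) are hypothetical values chosen purely for illustration.

```python
def deff_clustering(cluster_take, rho):
    """Design effect 1 + (b - 1) * rho.

    cluster_take: B for one-stage sampling (all elements taken), or
                  b for two-stage sampling (b elements sub-sampled).
    rho: measure of homogeneity within clusters.
    """
    return 1.0 + (cluster_take - 1.0) * rho


def effective_sample_size(n, deff):
    """Effective sample size n_eff = n / deff."""
    if deff <= 0:
        raise ValueError("deff must be positive")
    return n / deff


# Hypothetical two-stage design: b = 10 elements per cluster, rho = 0.05
deff = deff_clustering(10, 0.05)
n_eff = effective_sample_size(1450, deff)
```

Even a modest ρ inflates the variance noticeably once the cluster take is large, which is why the effective sample size can fall well below the nominal n.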

If cluster sizes vary, for example due to non-response, [Gab99] showed that the design effect can be formulated as the product of two factors in the following way:

deff = deff_{p} * deff_{c} | (25) |

In this expression, deff_{p} is the **design effect due to unequal inclusion probabilities** and deff_{c} is the **design effect due to clustering**. In most cases, both components have to be estimated from sample data, so that the estimated design effect can be expressed as

\hat{deff} = \hat{deff}_{p} * \hat{deff}_{c} | (26) |

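where the first term refers to the estimated design effect due to unequal inclusion probabilities. Writing w_{ij} for the design weight of the jth sampled element in the ith sampled cluster and n for the total number of sampled elements, this factor can be expressed as

\hat{deff}_{p} = n Σ_{i} Σ_{j} w_{ij}^{2} / (Σ_{i} Σ_{j} w_{ij})^{2} | (27) |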

The second term is the estimated design effect due to clustering, which is defined as

\hat{deff}_{c} = 1 + (b* - 1) \hat{ρ} | (28) |

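The factor b* is the weighted average cluster size, which, in terms of the design weights w_{ij} of the sampled elements (jth element in the ith cluster), can be expressed as

b* = Σ_{i} (Σ_{j} w_{ij})^{2} / Σ_{i} Σ_{j} w_{ij}^{2} | (29) |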

Obviously, \hat{deff}_{p} does not depend on the distribution of the study variable. Thus, the design effect due to unequal inclusion probabilities is fixed for a given sample, regardless of the item under study. The design effect due to clustering, however, varies from item to item, because ρ (and its estimator) depends on the distribution of the study variable. The next subsection provides a brief overview of the definition and the estimation of ρ.
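The two-factor decomposition can be sketched in Python as follows. The sketch assumes element-level forms of the two factors: n Σ w^2 / (Σ w)^2 for the design effect due to unequal inclusion probabilities, and a weighted average cluster size b* equal to the sum of squared cluster weight totals divided by the sum of squared weights. The estimate of ρ is taken as given, and all function names and input values are hypothetical.

```python
from collections import defaultdict


def deff_unequal_weights(weights):
    """Estimated design effect due to unequal inclusion probabilities."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / total ** 2


def weighted_avg_cluster_size(weights, cluster_ids):
    """Weighted average cluster size b*."""
    cluster_totals = defaultdict(float)
    for w, c in zip(weights, cluster_ids):
        cluster_totals[c] += w          # sum of weights within each cluster
    numerator = sum(t * t for t in cluster_totals.values())
    denominator = sum(w * w for w in weights)
    return numerator / denominator


def estimated_deff(weights, cluster_ids, rho_hat):
    """Product of the weighting factor and the clustering factor."""
    deff_p = deff_unequal_weights(weights)
    b_star = weighted_avg_cluster_size(weights, cluster_ids)
    deff_c = 1.0 + (b_star - 1.0) * rho_hat
    return deff_p * deff_c


# Hypothetical sample: two clusters of two elements with unequal weights
weights = [1.0, 1.0, 2.0, 2.0]
cluster_ids = ["a", "a", "b", "b"]
deff_hat = estimated_deff(weights, cluster_ids, rho_hat=0.1)
```

Only the clustering factor takes the item-specific estimate of ρ as input. The weighting factor is computed from the weights alone, which mirrors the observation that it is fixed for a given sample.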

#### Footnotes

- [1] Kish 1965, 162.
- [2] Lohr 1999, 239.
- [3] The ratios of variances in the sample means in the above examples correspond to the design effect as defined by the above formula.
- [4] Kish 1965, 162.

#### References

- [Gab99] Gabler, S., Häder, S., and Lahiri, P. (1999). A model based justification of Kish’s formula for design effects for weighting and clustering. *Survey Methodology*, 25(1): 105-106.