# Example 5

One way to ensure that the precision of an estimator is independent of the sample design is to plan samples with an equal **effective sample size**. The effective sample size is a concept that incorporates the design effect. However, there is no unique design effect for a given sample. Design effects will vary in magnitude depending on the characteristics of the item under study. The following example illustrates the connection between a) a study variable, b) the definition of clusters and c) the effects of different sample designs [Kis89].

Let us assume a clustered population in which values of the study variable are distributed as shown in the following table. The column and row means are shown along with the variable values.

| | | | | | | Row means |
|---|---|---|---|---|---|---|
| Column means | 3 | 8 | 13 | 18 | 23 | |
| | 1 | 6 | 11 | 16 | 21 | 11 |
| | 2 | 7 | 12 | 17 | 22 | 12 |
| | 3 | 8 | 13 | 18 | 23 | 13 |
| | 4 | 9 | 14 | 19 | 24 | 14 |
| | 5 | 10 | 15 | 20 | 25 | 15 |

From this population, n = 10 elements are to be drawn by a) simple random sampling with replacement (srswr) and b) cluster sampling, where clusters are defined either by the columns (clu-col) or by the rows (clu-row) of the matrix. Under srswr, n elements are chosen at random. Under cluster sampling, all elements in two randomly selected columns or rows are selected. In either case the sample mean $\bar{y}$ is to be calculated. The means and the variances of the sample mean under srswr, $\mathrm{Var}_{(\text{srswr})}(\bar{y})$, under column-wise cluster sampling, $\mathrm{Var}_{(\text{clu-col})}(\bar{y})$, and under row-wise cluster sampling, $\mathrm{Var}_{(\text{clu-row})}(\bar{y})$, are shown in the following table.

| | srswr | clu-col | clu-row |
|---|---|---|---|
| Mean | 13.00 | 13.00 | 13.00 |
| Variance | 5.20 | 18.75 | 0.75 |
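The figures in the table can be checked by brute-force enumeration. The following is a minimal sketch, not the original author's computation. It assumes that the two clusters are drawn without replacement with equal probability (an assumption that reproduces the tabulated variances) and that the srswr variance is the population variance divided by n, here 52/10. All variable and function names are illustrative.

```python
from itertools import combinations
from statistics import mean, pvariance

# Population from the table: the value in row r, column c (0-based) is 5*c + r + 1.
population = [[5 * col + row + 1 for col in range(5)] for row in range(5)]

values = [v for row in population for v in row]   # all 25 elements
rows = population                                 # row-wise clusters
cols = [list(c) for c in zip(*population)]        # column-wise clusters


def cluster_design(clusters, m=2):
    """Expectation and variance of the sample mean over all equally likely
    selections of m whole clusters (assumed: without replacement)."""
    sample_means = [
        mean(v for cluster in pick for v in cluster)
        for pick in combinations(clusters, m)
    ]
    return mean(sample_means), pvariance(sample_means)


n = 10
srswr_mean = mean(values)            # expectation of the sample mean
srswr_var = pvariance(values) / n    # sigma^2 / n under sampling with replacement

col_mean, col_var = cluster_design(cols)
row_mean, row_var = cluster_design(rows)

print(f"srswr:   mean = {srswr_mean:.2f}, variance = {srswr_var:.2f}")
print(f"clu-col: mean = {col_mean:.2f}, variance = {col_var:.2f}")
print(f"clu-row: mean = {row_mean:.2f}, variance = {row_var:.2f}")
```

Because each cluster design has only ten possible samples, full enumeration is enough here. No simulation is needed.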

The population mean, 13, is estimated without bias under all three sample designs, but the variances of the estimates differ dramatically. The variance of the sample mean under srswr (5.2) will serve as a reference.

Under column-wise cluster sampling, the variance of the sample mean is 18.75, which is $\mathrm{Var}_{(\text{clu-col})}(\bar{y}) / \mathrm{Var}_{(\text{srswr})}(\bar{y}) = 18.75 / 5.2 = 3.61$ times the variance under srswr.

Unless the two selected columns are symmetric about the third column (i.e. the first and fifth, or the second and fourth), the sample mean differs from the population mean by at least 2.5 and by as much as 7.5.

If rows are sampled, the variance of the sample mean is very low: $\mathrm{Var}_{(\text{clu-row})}(\bar{y}) / \mathrm{Var}_{(\text{srswr})}(\bar{y}) = 0.75 / 5.2 = 0.14$.

This is due to the very low heterogeneity of the row means. Even in one of the worst cases, in which the two upper rows are selected, the sample mean is 11.5. This is much closer to the population mean than in the corresponding worst cases of column-wise selection (the first two or the last two columns), where the sample mean is 5.5 and 20.5, respectively.
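The best and worst cases can be listed explicitly. The sketch below uses the column and row means from the population table and computes the sample mean for every pair of clusters. Since all clusters hold five elements, the sample mean is simply the average of the two cluster means. The helper `pair_sample_means` is a hypothetical name introduced for illustration.

```python
from itertools import combinations

# Cluster means taken from the population table.
col_means = [3, 8, 13, 18, 23]
row_means = [11, 12, 13, 14, 15]


def pair_sample_means(cluster_means):
    """Sample mean for every pair of equally sized clusters,
    keyed by the 1-based indices of the two clusters."""
    return {
        (i + 1, j + 1): (cluster_means[i] + cluster_means[j]) / 2
        for i, j in combinations(range(len(cluster_means)), 2)
    }


col_samples = pair_sample_means(col_means)
row_samples = pair_sample_means(row_means)

# Range of possible sample means under each cluster definition.
print(min(col_samples.values()), max(col_samples.values()))  # 5.5 20.5
print(min(row_samples.values()), max(row_samples.values()))  # 11.5 14.5
```

Under row-wise clustering, every possible sample mean lies within 1.5 of the population mean of 13. Under column-wise clustering, the deviation can reach 7.5.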

This example illustrates that cluster sampling can yield better or worse results (in terms of precision) than srswr. The magnitude of the loss or gain in precision depends on how the distribution of the study variable relates to the structure and definition of the clusters in a sample design. In most real-world sample surveys, however, the two are related in such a way that precision is lost.
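The loss or gain in precision can also be expressed as an effective sample size. The sketch below assumes the common definition n_eff = n / deff, where deff is the ratio of the design variance to the srswr variance. The text above mentions the concept but gives no formula, so this definition and the function names are assumptions made for illustration.

```python
def design_effect(var_design, var_reference):
    """Ratio of the variance under a design to the reference (srswr) variance."""
    return var_design / var_reference


def effective_sample_size(n, deff):
    """Assumed definition: nominal sample size divided by the design effect."""
    return n / deff


n = 10
var_srswr = 5.20
design_variances = {"clu-col": 18.75, "clu-row": 0.75}

results = {}
for design, var in design_variances.items():
    deff = design_effect(var, var_srswr)
    results[design] = (deff, effective_sample_size(n, deff))
    print(f"{design}: deff = {deff:.2f}, n_eff = {results[design][1]:.1f}")
# clu-col: deff = 3.61, n_eff = 2.8
# clu-row: deff = 0.14, n_eff = 69.3
```

Read this way, the ten elements drawn as two columns carry about as much information on the mean as three elements drawn by srswr, while the ten elements drawn as two rows are worth roughly 69.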

#### References

- [Kis89] Kish, L. (1989). Deffs: Why, when and how? A review. In *Proceedings of the Survey Research Methods Section, American Statistical Association*.