# Sample

Now that we have a basic understanding of what a population is, let us turn to the concept of a **sample**. In general terms, a sample is a subset of n elements selected from a population of N elements according to predefined rules. In the simplest case, the n elements are selected at random, with every element having the same chance of being included in the sample. The number of different samples that can be selected depends on both the population size N and the sample size n. If, for example, we sample n=4 persons from our universe of N=10 persons without replacement, that is, without returning an element to the population once it has been surveyed, and we ignore the order of selection, there are 10!/(4!·6!) = 210 possible samples. The first of these samples includes elements 1, 2, 3 and 4, the second includes elements 1, 2, 3 and 5, and so on until the 210th possible sample, which includes elements 7, 8, 9 and 10.

The set of all possible samples of size n from N is denoted by S. In our example, S contains all 210 possible samples. A specific sample is denoted by the lower case letter s. The sample containing persons 1, 4, 7 and 8 is thus denoted by s={1, 4, 7, 8}.
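The count of 210 can be checked by listing S explicitly. The following is a minimal sketch in Python, assuming the population elements are labelled 1 to N and that a sample is an unordered set of distinct labels; the variable names are illustrative and not part of the notation used in this text.

```python
from itertools import combinations
from math import comb

N = 10  # population size
n = 4   # sample size

# Label the population elements 1..N and list every unordered sample
# of n distinct elements (sampling without replacement).
population = range(1, N + 1)
S = list(combinations(population, n))

print(len(S), comb(N, n))  # both give the number of possible samples
print(S[0])                # first sample in lexicographic order
print(S[1])                # second sample
print(S[-1])               # last sample
```

`combinations` returns the samples in lexicographic order, which matches the ordering described above: the list starts with (1, 2, 3, 4) and (1, 2, 3, 5) and ends with (7, 8, 9, 10).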

The study variable Y is then surveyed for each of the n elements of the sample s. To distinguish between the (unknown) values of the study variable in the population and the (known) values in the sample, we denote the latter using the lower case letter y. The enumeration of the y-values in the sample follows the same logic as before: we denote the y-value of the ith element by y_{i}, with the difference that i now ranges from 1 to n. Apart from that, the notation is very similar to the one we already know:

y = (y_{1}, ... , y_{i}, ... , y_{n})’

If we had drawn a sample containing elements 1, 4, 7 and 8 of the population, the result would be

y = (y_{1}, y_{4}, y_{7}, y_{8})’ = (37, 31, 59, 22)’
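Picking the sample values out of the population can be sketched as a simple lookup. The snippet below is an illustration only: the dictionary `Y_known` and the helper `sample_values` are hypothetical names, and the dictionary holds only those population values that are quoted in this section, with the remaining elements left out rather than guessed.

```python
# Population values of the study variable that are quoted in this section,
# keyed by element label. The other elements are deliberately omitted.
Y_known = {1: 37, 3: 63, 4: 31, 7: 59, 8: 22, 10: 18}


def sample_values(s, Y):
    """Return the y-values of the elements in sample s, ordered by label."""
    return [Y[i] for i in sorted(s)]


s = {1, 4, 7, 8}
y = sample_values(s, Y_known)
print(y)  # [37, 31, 59, 22]
```

Sorting the labels keeps the order of the y-values aligned with the order in which the elements are listed in s.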

### Exercise 2

Suppose we draw a sample of size n = 4 from our population and get elements U_{1}, U_{3}, U_{8} and U_{10} in the sample. Write y as a column and as a row vector.

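Figure 2.2. y written as a column vector:

$$
y = \begin{pmatrix} y_{1} \\ y_{3} \\ y_{8} \\ y_{10} \end{pmatrix} = \begin{pmatrix} 37 \\ 63 \\ 22 \\ 18 \end{pmatrix}
$$

y written as a transposed row vector:

y = (y_{1}, y_{3}, y_{8}, y_{10})’ = (37, 63, 22, 18)’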

The elements of s are called **ultimate sampling units** if y can be surveyed directly for them. A **sample survey** is the combination of the realisation of a sample design and the measurement of at least one, and usually several, study variables for the elements of the sample.

Finally, we need to define the term **sample design**. A sample design is a mechanism that assigns a specific sample s a non-zero probability of realisation, P(s). The function P(·) is also called the **sample selection scheme** or sampling scheme. Sample designs that explicitly define such a function are called **probability sample designs**. Sample designs that do not explicitly define P(·) are called **non-probability sample designs**; they are not dealt with here. In the next section, we will see that simple random sampling without replacement (srswor) and simple random sampling with replacement (srswr) assign equal P(s) to all possible samples. Apart from srswor and srswr, there are many sample designs that assign different probabilities of realisation to different samples. Some of these sample designs will also be introduced in the following sections.
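To make the idea of a function P(·) concrete, the sketch below builds an equal-probability design for the running example with N=10 and n=4. It rests on an assumption that is only developed in the next section: samples are treated as unordered sets drawn without replacement, so that each of the 210 samples receives P(s) = 1/210. The dictionary `P` and the seeded generator are illustrative choices, not a prescribed implementation.

```python
from fractions import Fraction
from itertools import combinations
import random

N, n = 10, 4
S = list(combinations(range(1, N + 1), n))  # all possible samples

# Equal-probability design (assumed here): every sample s in S
# gets the same probability of realisation, P(s) = 1 / |S|.
P = {s: Fraction(1, len(S)) for s in S}

# A valid design distributes a total probability of 1 over S.
total = sum(P.values())
print(total)

# Realise the design once: draw a single sample s according to P.
rng = random.Random(2024)  # fixed seed only to make the draw repeatable
weights = [float(p) for p in P.values()]
s = rng.choices(S, weights=weights, k=1)[0]
print(s, P[s])
```

A design with unequal probabilities would use the same structure; only the values stored in `P` would differ from sample to sample, still summing to 1.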