Univariate descriptive

For metric variables with many distinct values, consider whether a univariate descriptive analysis is more appropriate than a frequency table. Univariate descriptive analysis describes how the cases are distributed over the values of a particular variable. The technique produces several descriptive measures that summarise the central tendency and dispersion of a metric variable. In other words, instead of listing every value, it summarises the distribution of the variable.


To perform a univariate descriptive analysis:

Table 2 shows the descriptive results for the variable "Age". Please note that the data are weighted according to the two weights.

Table 2: Example of univariate descriptive analysis: Age, in number of years in 2002

Measures              Values
Median                44.44
Mean                  45.41
Min                   4.00
Max                   109.00
S.d.                  18.08
Sum                   1,694,190.00
N                     37,307.90
CI 95% min            45.23
CI 95% max            45.59
CI 99% min            45.17
CI 99% max            45.65
Box whisker low       4.00
First quartile        31.14
Third quartile        58.85
Box whisker high      86.00

Weight is on


Let us start the interpretation of Table 2 with N. N is the number of observations with valid values. Because of the population size weight, N is not interpretable in this table. This weight variable causes each respondent from the smaller countries to count as less than 1, so the number of respondents reported, 37 307, is less than the real number (which is 42 101).
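To see why a weighted N can differ from the number of respondents, here is a minimal Python sketch. The five weights are hypothetical and are not taken from the survey data. The sketch only illustrates that a weighted N is the sum of the weights rather than a count of cases.

```python
# Hypothetical weights for five respondents (illustration only, not survey data).
# Respondents with a weight below 1 count as less than one case.
weights = [0.25, 0.25, 1.5, 1.5, 0.5]

unweighted_n = len(weights)  # the real number of respondents
weighted_n = sum(weights)    # the N a weighted table would report

print(unweighted_n)  # 5
print(weighted_n)    # 4.0
```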

The maximum value is 109, which means that the oldest person in the sample was 109 years old in 2002. The minimum value is 4, which means that the youngest person in the sample was recorded as being 4 years old in 2002. This is obviously a data entry error, which illustrates how easy it is to make mistakes when preparing large datasets. The sum is less important in this analysis: it is simply the ages of all respondents added together, which gives a total of 1,694,190.

The median is a measure of the central tendency for ordinal or metric variables. The median is the value that divides a sorted distribution into two equal parts, i.e. 50 % of the cases will be above it, and 50 % below.
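The definition above can be sketched for weighted data as follows. The function name `weighted_median` and the ages and weights are hypothetical. The rule used here (the first value at which the cumulative weight reaches half the total) is only one of several conventions. Statistical packages often interpolate between values instead, so this sketch is not expected to reproduce the 44.44 in Table 2.

```python
def weighted_median(values, weights):
    """Return the first sorted value whose cumulative weight reaches half the total.

    This is one simple convention; other tools may interpolate between values.
    """
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values and weights must not be empty")


# Hypothetical ages (illustration only).
ages = [23, 35, 44, 58, 71]

equal = weighted_median(ages, [1, 1, 1, 1, 1])   # every case counts once
skewed = weighted_median(ages, [1, 1, 1, 1, 5])  # the oldest case counts five times

print(equal)   # 44
print(skewed)  # 71
```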

The arithmetic mean shows the central tendency for metric variables. It is the sum of all values divided by the number of cases. Since the variables must have metric characteristics to make it possible to calculate the mean, this measure should not be used for nominal and ordinal variables.
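As a quick check of this definition, the mean in Table 2 can be recovered from the Sum and N rows of the same table. The helper `weighted_mean` and its small input lists are hypothetical and are included only to show how weights enter the calculation.

```python
# Values copied from Table 2.
total_age = 1_694_190.00  # "Sum"
weighted_n = 37_307.90    # "N" (sum of weights)

mean_age = total_age / weighted_n
print(round(mean_age, 2))  # 45.41, matching the "Mean" row


def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: the third case counts twice.
example = weighted_mean([20, 40, 60], [1, 1, 2])
print(example)  # 45.0
```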

As you can see from Table 2, there is a small discrepancy between the two measures of the central tendency, the mean being slightly greater in this case than the median. This indicates that there may be some older people in the sample, pulling the mean higher. For metric variables you could use both measures, but you should be aware that the measures might be telling different stories. The mean is more vulnerable to extreme values, so for instance if Bill Gates lived in a small municipality, the mean income in this municipality would be extremely large. The median would be a better measure of the income most commonly earned.
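The effect of one extreme value can be illustrated with a small Python sketch. The incomes are invented for illustration. The point is only that a single very large value moves the mean far more than the median.

```python
from statistics import mean, median

# Hypothetical annual incomes in a small municipality (illustration only).
incomes = [30_000, 35_000, 40_000, 45_000, 50_000]

# The same municipality after one extremely wealthy resident moves in.
with_outlier = incomes + [1_000_000_000]

print(mean(incomes), median(incomes))            # 40000 40000
print(mean(with_outlier), median(with_outlier))  # 166700000 42500.0
```

The median moves from 40,000 to 42,500, while the mean jumps to more than 166 million, a figure that describes no one in the municipality.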

Standard deviation is a measure of statistical dispersion based on the distances between the individual data values and their arithmetic mean. A large standard deviation indicates that the data points lie far from the mean, and a small standard deviation indicates that they are clustered closely around it. Standard deviation can only be used for metric variables. In Table 2 we can see that the standard deviation for the age variable is quite large, 18.08, indicating that the respondents are widely spread around the mean.
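A minimal sketch of a weighted standard deviation is shown below. The function name `weighted_sd` and the data are hypothetical. The sketch divides by the sum of the weights (the population form), which is an assumption. The software that produced Table 2 may use a different denominator.

```python
import math


def weighted_sd(values, weights):
    """Square root of the weighted average squared distance from the weighted mean.

    Assumption: the denominator is the sum of the weights (population form).
    """
    total_weight = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total_weight
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total_weight
    return math.sqrt(variance)


# Hypothetical data with equal weights (illustration only).
spread = weighted_sd([2, 4, 4, 4, 5, 5, 7, 9], [1] * 8)
clustered = weighted_sd([5, 5, 5, 5, 5, 5, 5, 5], [1] * 8)

print(spread)     # 2.0
print(clustered)  # 0.0
```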

For an explanation of the rest of the output, please look up Confidence interval and Box-and-whisker plot in the keyword section.
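As a rough illustration of where the confidence interval rows come from, the sketch below applies the usual normal approximation (mean plus or minus z times the standard error) to the mean, standard deviation and N reported in Table 2. Treating the weighted N as the sample size and using z values of 1.96 and 2.576 are assumptions made here. The source does not state how the table's intervals were computed, although these assumptions do reproduce the reported limits.

```python
import math

# Values copied from Table 2.
mean_age = 45.41
sd_age = 18.08
weighted_n = 37_307.90

# Standard error of the mean, treating the weighted N as the sample size (assumption).
standard_error = sd_age / math.sqrt(weighted_n)

# Normal-approximation critical values (assumption).
z_values = {"95%": 1.96, "99%": 2.576}

intervals = {
    level: (round(mean_age - z * standard_error, 2),
            round(mean_age + z * standard_error, 2))
    for level, z in z_values.items()
}

print(intervals["95%"])  # (45.23, 45.59)
print(intervals["99%"])  # (45.17, 45.65)
```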
