# Example: Why are men’s incomes higher than women’s incomes?

One proximate cause might be men’s longer working hours. The ESS data contain the dichotomous variable ‘gender’ and the metric variable ‘total hours normally worked per week in main job, overtime included’. Assume that the latter measures all paid work and that Poland is our country of interest. Figure 1 presents the dispersion of Polish men’s working hours (left column) and Polish women’s working hours (right column). Each circle represents one or several persons. The vertical spread of the circles indicates that working hours vary strongly within each group. How can we translate this into one single measure of gender differences? The most common procedure is to compute the difference between men’s and women’s mean values. In Figure 1, the upper horizontal line marks the men’s mean, whereas the lower line marks the women’s mean. We call these values conditional means because their computation is conditional on the individuals’ values on the gender variable. As the positions of the two lines show, working women’s mean value is smaller than working men’s mean value.

Figure 1. Example 1: Regression line based on simple linear regression analysis with gender as the independent variable and total number of hours worked per week in main job as the dependent variable. Polish ESS round 2 data.
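To make the idea of conditional means concrete, the following minimal sketch computes one mean per gender and the difference between them. The six records are made up for illustration and are not ESS data; the function name `conditional_means` is likewise our own choice.

```python
from statistics import mean

# Made-up illustrative records (gender, weekly working hours); NOT ESS data.
records = [
    ("male", 40), ("male", 48), ("male", 44),
    ("female", 40), ("female", 30), ("female", 38),
]


def conditional_means(rows):
    """Return the mean of hours for each value of the gender variable."""
    groups = {}
    for gender, hours in rows:
        groups.setdefault(gender, []).append(hours)
    return {gender: mean(values) for gender, values in groups.items()}


means = conditional_means(records)

# The single summary measure discussed in the text: the difference in means.
gap = means["male"] - means["female"]
print(means, gap)
```

Each mean is computed only from the persons who share a given value on the gender variable, which is what makes it conditional.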

Rather than drawing two horizontal lines, however, we get an even more striking illustration of the difference between men and women by drawing a line between the mean point on the men’s column and the mean point on the women’s column. Such a line has been added in Figure 1. The interesting thing is that we get exactly the same line if we apply the ordinary least squares (OLS) method of linear regression analysis. Indeed, linear regression can be described as a method for establishing a linear relation between a set of units’ values on one or more independent variables and their mean values on a dependent variable. In this example, gender is the independent variable and hours worked is the dependent variable.
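The equivalence between the OLS line and the line through the two conditional means can be checked numerically. The sketch below uses made-up values (not ESS data) and assumes gender is coded 0 for men and 1 for women; the coding and the helper name `ols_simple` are our own assumptions. It applies the standard closed-form OLS formulas for a single independent variable.

```python
from statistics import mean

# Made-up values, NOT ESS data. Assumed coding: 0 = male, 1 = female.
gender = [0, 0, 0, 1, 1, 1]
hours = [40, 48, 44, 40, 30, 38]


def ols_simple(x, y):
    """Closed-form OLS for one independent variable: (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = ols_simple(gender, hours)

# Conditional means computed directly, for comparison.
mean_men = mean([h for g, h in zip(gender, hours) if g == 0])
mean_women = mean([h for g, h in zip(gender, hours) if g == 1])

# With a 0/1 independent variable, the intercept should equal the mean of
# the group coded 0 and the slope the difference between the two means.
print(intercept, slope, mean_men, mean_women)
```

Under this coding, the fitted line passes through the men’s mean at 0 and the women’s mean at 1, so it is the same line as the one drawn between the two mean points.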

Regression analysis does not add much to our understanding, however, if there is only one dichotomous independent variable. In such cases, a simple comparison of the two means will suffice. But the need for regression analysis as a simplifying device increases if there is more than one independent variable or if the independent variable has more than two values. In the latter case, the line that connects the conditional means may be bumpy rather than straight and simple, and it can then no longer be conceived as a convenient representation of the relation between the variables. This, and the use of a regression line to make things simpler, is illustrated by our second example.
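A small sketch with made-up numbers shows what a bumpy line of conditional means looks like next to a straight regression line. The independent variable here is purely hypothetical, takes the values 1 to 4, and does not come from the ESS data; the OLS formulas are the standard closed-form ones for a single independent variable.

```python
from statistics import mean

# Hypothetical independent variable with four values and made-up outcomes.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [10, 12, 20, 18, 15, 17, 30, 28]

# Conditional means: one mean of y for each distinct value of x.
cond_means = {
    value: mean([yi for xi, yi in zip(x, y) if xi == value])
    for value in sorted(set(x))
}

# Closed-form OLS estimates for a single independent variable.
x_bar, y_bar = mean(x), mean(y)
slope = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
intercept = y_bar - slope * x_bar

# Values on the straight regression line at each value of x.
fitted = {value: intercept + slope * value for value in cond_means}

for value in cond_means:
    print(value, cond_means[value], round(fitted[value], 2))
```

In this toy data set the conditional means rise, fall and rise again, whereas the regression line summarises the same relation with just two numbers, an intercept and a slope.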