Example: Different birth cohorts’ length of education
This example can be motivated by the fact that, during the last century, social development led to an increase in educational opportunities for most people. Assume that we want to study how this affected different birth cohorts’ length of education in, say, Norway. Has the number of years spent in educational institutions increased steadily from one cohort to the next; and if so, how steep has the increase been?
In Figure 2, each small circle represents the Norwegian survey respondents who share the same combination of year of birth and length of education. Only people born before 1975 are included, because many younger people had not finished their studies at the time of the survey. Notice that almost all possible lengths of education are represented in every cohort. But there is also a tendency for the proportion of people with long educations to become greater as the cohorts get younger. Thus, the conditional mean education lengths are higher in younger than in older cohorts. If we draw a line through all these conditional means, however, we do not get a straight line but a zigzag line, as can be seen in Figure 2.
Figure 2. Example 2: Regression line based on simple linear regression analysis with year of birth as the independent variable and length of education measured in years as the dependent variable. Norwegian ESS round 2 data.
Thus, the mean education length has not risen at a constant rate from one cohort to the next. However, the long-term tendency to rise still seems pretty uniform over time. Hence, we might obtain a less fuzzy, and for many purposes fully adequate, picture by just letting a straight line ascend through the zigs and zags of the zigzag line. Such a line is exactly what we get when we apply the ordinary least squares method of regression analysis (OLS) to these data. The resulting regression line is shown in Figure 2, which also illustrates that this line always passes through the point at which the overall mean of the dependent variable meets the overall mean of the independent variable. The line’s relative closeness to the observed conditional means is achieved partly because of the defining principle of the OLS method, which says that the regression line should be drawn so that the sum of its squared vertical distances from the various individuals’ positions in the diagram is as small as possible.
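The two properties just mentioned can be checked numerically. The following is a minimal sketch in Python with NumPy, using made-up data rather than the ESS data; the variable names and the helper `sum_of_squares` are assumptions introduced for illustration only. It computes the conditional means per cohort, fits the OLS line from the usual closed-form expressions, and lets us confirm that the line passes through the point of overall means and that changing the slope increases the sum of squared vertical distances.

```python
import numpy as np

# Made-up data for illustration only (NOT the ESS data):
# year of birth and length of education in years for 500 hypothetical people.
rng = np.random.default_rng(0)
birth_year = rng.integers(1920, 1975, size=500)  # 1920..1974
education_years = 8 + 0.08 * (birth_year - 1920) + rng.normal(0, 3, size=500)

# Conditional means: mean education length within each birth cohort.
# Connecting these points gives the zigzag line.
cohorts = np.unique(birth_year)
conditional_means = np.array(
    [education_years[birth_year == c].mean() for c in cohorts]
)

# OLS slope and intercept from the closed-form expressions.
x_mean = birth_year.mean()
y_mean = education_years.mean()
slope = np.sum((birth_year - x_mean) * (education_years - y_mean)) / np.sum(
    (birth_year - x_mean) ** 2
)
intercept = y_mean - slope * x_mean


def sum_of_squares(a, b):
    """Sum of squared vertical distances from the line y = a + b * x."""
    return np.sum((education_years - (a + b * birth_year)) ** 2)


# The fitted line evaluated at the overall mean of x equals the overall mean of y.
value_at_x_mean = intercept + slope * x_mean

# A line with a slightly different slope fits worse in the least-squares sense.
ols_ss = sum_of_squares(intercept, slope)
other_ss = sum_of_squares(intercept, slope + 0.001)
```

In this sketch, `value_at_x_mean` agrees with `y_mean` up to rounding error, and `other_ss` is larger than `ols_ss`, which is what the least-squares principle leads us to expect.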
Since the regression line captures the association between year of birth and mean education length quite well, it would seem to be a good idea to choose a point on this line if we were to predict the education length for a person about whom we know nothing but his or her year of birth. Thus, we often treat the regression line as an expression of the association between observed values of the independent variable and predicted values of the dependent variable.
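As a small illustration of prediction from the regression line, the sketch below fits a line to five made-up observations (not the ESS data) and reads off the predicted education length for a given year of birth. The function name `predict_education` and the data values are hypothetical.

```python
import numpy as np

# Made-up toy data for illustration only (not the ESS data).
birth_year = np.array([1930, 1940, 1950, 1960, 1970], dtype=float)
education_years = np.array([8, 9, 11, 11, 13], dtype=float)

# OLS slope, computed around the overall means.
x_mean = birth_year.mean()
y_mean = education_years.mean()
slope = np.sum((birth_year - x_mean) * (education_years - y_mean)) / np.sum(
    (birth_year - x_mean) ** 2
)


def predict_education(year):
    """Predicted education length: the point on the regression line at `year`."""
    return y_mean + slope * (year - x_mean)


print(predict_education(1955))
```

With these toy numbers the slope is about 0.12 years of education per birth year, so the prediction for someone born in 1955 is about 11 years.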
But note, also, that we have no guarantee that mean education length will continue to rise at the same long-term rate for cohorts born after 1974 as it did for those studied here. If it does not, a straight regression line should not be used to express associations between variables in studies that include people born before as well as people born after 1974. A possible solution to such problems is discussed in chapter 3.
On the following page, we will describe how SPSS can be used to create figures like the one presented in Figure 2.