Statistical testing of the linearity assumption

However, we still cannot be sure whether this association is linear or curved. The non-zero regression coefficient of the squared birth year variable, reported in the Model 2 part of the table, indicates that the sample regression line is slightly curved, but is this tendency strong enough to warrant the belief that the population regression line is curved too? A look at the Sig. value of the squared birth year term tells us that it is smaller than the chosen significance level. Hence, there is a statistically significant association between birth year and education length that is not accounted for by the purely linear model (Model 1), and we conclude that the association has a curved element. In other words, we can (at least temporarily) reject the hypothesis that the association is linear and instead assume that it is curved. The general rule is this: if a regression model represents an independent variable by both a squared and a non-squared term, and the squared term’s regression coefficient has a Sig. value lower than the chosen significance level, then we reject the null hypothesis that the population association between the independent and the dependent variable is linear and accept the alternative hypothesis that it is curved. If the Sig. value is not lower than the chosen significance level, we do not discard the null hypothesis that the association is linear.
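The logic of this test can be sketched outside SPSS. The following Python snippet is an illustrative sketch only: the variable names and the simulated data are invented for demonstration (the real SPSS data set is not reproduced here). It fits a quadratic regression by ordinary least squares and computes the Sig. value of the squared term, mirroring the decision rule above.

```python
# Illustrative sketch (not SPSS): fit a quadratic regression model and test
# the squared term. All data below are simulated; names are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
birth_year = rng.uniform(1920, 1980, n)       # hypothetical predictor
x = birth_year - birth_year.mean()            # centre to reduce collinearity
# Simulate a genuinely curved population association plus noise
educ = 12 + 0.05 * x - 0.002 * x**2 + rng.normal(0, 2, n)

def ols(cols, y):
    """Return OLS coefficients, their standard errors, and residual df."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    df = len(y) - X.shape[1]
    s2 = resid @ resid / df                   # residual variance estimate
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b, se, df

# Model 2: birth year plus its square
b, se, df = ols([x, x**2], educ)
t_sq = b[2] / se[2]                           # t value for the squared term
p_sq = 2 * stats.t.sf(abs(t_sq), df)          # two-sided Sig. value
print(f"squared-term coefficient {b[2]:.4f}, Sig. = {p_sq:.4g}")
```

Because the simulated association really is curved, the Sig. value of the squared term comes out below any conventional significance level, so the sketch rejects the linearity hypothesis, just as the rule above prescribes.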

Table 5. SPSS output: Blockwise quadratic regression coefficients

All we need in order to draw these conclusions is the Sig. value. But the coefficients table also reports two other statistics related to this test. The ‘Std. Error’ is the estimate of the standard error that has been defined above. The other statistic, the ‘t’, is computed by dividing the sample regression coefficient by the estimated standard error. SPSS uses the t value to find the significance probability (the Sig. value), and for some purposes it may be helpful for you to familiarise yourself with this statistic too. Consult an introductory statistics book to learn more.
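The relationship between the ‘t’ and ‘Sig.’ columns can be written out in a few lines. In this sketch the coefficient, standard error, and degrees of freedom are invented numbers, not values from the SPSS tables; the point is only the arithmetic SPSS performs.

```python
# How the 't' and 'Sig.' columns are related: t is the coefficient divided
# by its standard error, and Sig. is the two-sided tail probability of that
# t value. The numbers below are hypothetical.
from scipy import stats

b = -0.002    # hypothetical sample regression coefficient (squared term)
se = 0.0004   # hypothetical estimated standard error
df = 497      # residual degrees of freedom (n minus number of parameters)

t = b / se                         # the 't' column
sig = 2 * stats.t.sf(abs(t), df)   # the 'Sig.' column (two-sided p value)
print(t, sig)
```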

The results presented in Table 5 indicate that adding a quadratic term to the regression equation improves the model. Another way to check this is to test whether the model’s ability to explain the variance of the dependent variable has improved in a statistically significant way after the quadratic term has been added. This is done by having SPSS compute a statistic, known as the F statistic, which, in this version, can be interpreted as a measure of the relative improvement in the explained variance that has taken place after the model was extended with new additive terms. The statistic has a known sampling distribution. Therefore, we can work out how probable it is that we would get the F value we actually obtained if a null hypothesis of zero improvement of the model were correct. We can make SPSS find this probability for us. (The numerical F value itself has no intuitive meaning and can be ignored for the rest of this course.) The null hypothesis of zero model improvement is tested in exactly the same way as null hypotheses about zero regression coefficients: we choose a significance level and check whether the relevant Sig. value is lower than this level. If it is lower, we discard the null hypothesis and (at least until further evidence has been gathered) accept that the extended model gives the better description of how the variables are associated in the population.
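The F test for model improvement can be computed directly from the two models’ R squared values. In the sketch below the sample size and R squared values are invented for illustration; the formula itself is the standard R-squared-change F test that SPSS applies when a block of new terms is added.

```python
# Hedged sketch of the R-squared-change F test. The R squared values and
# sample size are hypothetical; only the formula is the point.
from scipy import stats

n = 500        # sample size (assumed)
r2_1 = 0.110   # R squared of Model 1, linear only (assumed)
r2_2 = 0.145   # R squared of Model 2, with the squared term (assumed)
m = 1          # number of terms added in the new block
k2 = 2         # number of predictors in Model 2
df2 = n - k2 - 1

f_change = ((r2_2 - r2_1) / m) / ((1 - r2_2) / df2)
sig_f_change = stats.f.sf(f_change, m, df2)   # the 'Sig. F Change' value
print(f"F Change = {f_change:.2f}, Sig. F Change = {sig_f_change:.4g}")
```

With these invented numbers the Sig. F Change value falls far below conventional significance levels, so the null hypothesis of zero model improvement would be discarded.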

To make SPSS calculate the relevant significance probability for the F test, we must click the ‘Statistics’ button in the linear regression dialogue box. Another dialogue box then opens, in which we tick the ‘R squared change’ option. ‘Model fit’ and ‘Estimates’ have been pre-selected; do not remove these ticks (Figure 12).

Figure 12. Obtaining R squared change statistics

The resulting Model Summary table is displayed as Table 6. The relevant test statistic is the ‘Sig. F Change’ in the Model 2 row. This statistic tells us that, provided that our chosen significance level is 0.1% or higher, we can discard the null hypothesis that the extension of the model by a squared birth year term has not improved the model. In other words, the addition of the quadratic term has brought about a statistically significant improvement of the model.

Table 6. SPSS output: Multiple blockwise regression goodness of fit statistics


Interpret the test statistics in the output you got when you did the exercise in chapter 3.
