
# Chapter 4: Statistical testing of hypotheses about regression coefficients and R²

Do these analyses tell us anything about the entire populations of those countries from which the European Social Survey samples were obtained? The answer is yes, provided that the surveys were properly conducted on random samples, which we shall assume to be the case. But, of course, the leap from conclusions about samples to conclusions about whole populations includes the possibility of error.

Fortunately, statisticians have proved that regression analyses on random sample data of the sizes used in the ESS program produce regression coefficients that are distributed in well-known ways. So, if we (as a thought experiment) were to collect all possible equally sized samples from a population and conduct a regression analysis on each sample, we would get a large set of regression coefficients whose values have an approximately normal (bell-shaped) distribution, with many roughly equal coefficients in the middle and few coefficients with extreme values in the tails. We also know that the mean of this distribution (which is called the sampling distribution of the regression coefficient) is identical to the population coefficient, and we know the distances from this population coefficient, measured in standard errors, within which a certain proportion of the sample coefficients will fall (say 90%, 95%, 99% or whatever proportion we are interested in).

A standard error is the standard deviation of a sampling distribution. In other words, the standard error is a measure of how widely dispersed the sampling distribution is: the higher the standard error, the further from the population coefficient the sample coefficients tend to lie. Hence, a sample regression coefficient is likely to be a more accurate estimate of the population regression coefficient if the standard error is small than if it is large, and the standard error becomes smaller the larger the sample is. Other things being equal, results obtained from large samples are therefore more reliable than results obtained from small samples.
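This thought experiment can be imitated on a computer. The following Python sketch uses an entirely made-up population (the slope, noise level and sample sizes are illustrative assumptions, not ESS values): it draws many equally sized samples, estimates the regression slope in each, and shows that the estimates centre on the population coefficient while their spread (the standard error) shrinks as the sample size grows.

```python
# A small simulation of the sampling distribution of a regression slope.
# The population values below are made up for illustration only.
import numpy as np

rng = np.random.default_rng(42)
POP_SLOPE = 0.5   # assumed "true" population regression coefficient

def sample_slopes(n, n_samples=2000):
    """Draw repeated samples of size n and return the estimated slopes."""
    slopes = np.empty(n_samples)
    for i in range(n_samples):
        x = rng.normal(0, 1, n)
        y = POP_SLOPE * x + rng.normal(0, 2, n)   # noise sd = 2
        # OLS slope estimate: cov(x, y) / var(x)
        slopes[i] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return slopes

small, large = sample_slopes(100), sample_slopes(400)
print(small.mean(), large.mean())   # both close to the population slope 0.5
print(small.std(), large.std())     # the larger samples show the smaller spread
```

Quadrupling the sample size roughly halves the spread of the estimates, which is the intuition behind "larger samples give more reliable results".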

# Standard error and significance level

In order to know how accurate our single-sample-based regression coefficient is as an estimate of the population coefficient, we need to know the size of the standard error. Although we cannot find its exact value, we can fortunately obtain a fairly accurate estimate of it from our sample data. This estimate, which is reported in the SPSS regression analysis coefficients table, makes it possible to tell how likely it is that the difference between the population regression coefficient and our sample regression coefficient is larger or smaller than a certain, freely chosen value. This makes it possible to test so-called null hypotheses about the value of the population regression coefficient.

Such testing is easy with SPSS if we accept the presumption that the relevant null hypothesis to test is that the population regression coefficient is zero, i.e. that there is no linear association between the independent and the dependent variable. Our test criterion will be that the null hypothesis shall be refuted if there is less than a certain likelihood (e.g. a 5% likelihood) that a population with a coefficient value of 0 would give rise to a sample with a regression coefficient whose absolute value is equal to or larger than the one we actually found in our sample. We call this chosen likelihood level our 'significance level'. The criterion thus says that we should refute the null hypothesis if the chance of observing the estimated regression coefficient, were the null hypothesis really true, is less than our chosen significance level. Note that we can never conclude with certainty whether or not the null hypothesis is true: if we choose a 5% significance level, there is a 5% chance that we refute a correct null hypothesis. Refuting a correct null hypothesis is called a 'type 1 error'. If we think that a 5% chance of making such an error is too high, we should choose a smaller significance level, say 1%. The most common significance levels are 10%, 5% and 1%. Levels higher than 10% are very rare; levels lower than 1% do occur. Note, however, that choosing a low significance level, and hence a low risk of committing a type 1 error, comes at the cost of a higher risk of committing a 'type 2 error', which is the error of failing to refute an incorrect null hypothesis. This risk decreases with the size of the sample, so, with large samples, one may prefer small significance levels.
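What the significance level means in practice can be checked by simulation. The sketch below (again with artificial data, not the ESS samples) repeatedly draws samples from a population in which the null hypothesis is true, i.e. the slope really is zero, and counts how often a test at the 5% level wrongly refutes it. The proportion of such type 1 errors comes out close to 5%, as the definition promises.

```python
# Empirical type 1 error rate: simulate a population where the null
# hypothesis (slope = 0) is TRUE and count how often a 5%-level test
# nevertheless refutes it. All numbers here are artificial.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(7)
ALPHA = 0.05
n_samples, n = 2000, 200
rejections = 0
for _ in range(n_samples):
    x = rng.normal(0, 1, n)
    y = rng.normal(0, 1, n)        # y is unrelated to x: the true slope is 0
    if linregress(x, y).pvalue < ALPHA:
        rejections += 1

type1_rate = rejections / n_samples
print(type1_rate)                  # close to 0.05
```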

# Statistical testing of significance

Below, you will find the syntax for this analysis. The analysis is explained in the text.

Syntax for the example (the annotations below are SPSS comment lines that document each step of the syntax):

```
* The following command causes the cases to be weighted by the design weight variable 'dweight'.
* The following syntax is identical to the syntax that was presented in chapter 3, except that we have added a command which instructs SPSS to perform an F-test and deleted the commands which instruct SPSS to create a scatterplot.
* The following commands cause SPSS to select for analysis those cases that belong to the Swedish sample and have lower values than 1975 on the birth year variable.
* In this process the commands create a filter variable (filter_$) with value 1 for the selected cases and value 0 for the non-selected cases.
* The values of the variable created by the following commands are the squared values of the two-digit 'birthyear' variable.
* The following commands instruct SPSS to run a blockwise regression analysis with the variable 'birthyear' as the independent variable in the first model and to add the variable 'sqbirthyear' as a second independent variable in the second model.
* Note that the command CHANGE at the end of the third line instructs SPSS to test whether the second model explains more of the dependent variable's variance than the first model does.
```

Now, how do we actually make such tests using SPSS? The coefficients table reports a statistic called 'Sig.'. (The abbreviation Sig. may be taken to stand for 'significance probability', which, in some other statistical applications, is called the p-value.) This statistic indicates the probability that we would find a sample regression coefficient at least as large in absolute value as the one we actually found in our sample if the null hypothesis is true, i.e. if the value of the population regression coefficient really is 0. (A probability of 5% is reported as 0.05, etc.) If this statistic is smaller than our chosen significance level, we refute the null hypothesis, i.e. we refute the hypothesis that there is no linear association between the relevant independent variable and the dependent variable. If the Sig. statistic has a value (converted into a percentage) that is higher than or equal to our chosen significance level, we retain the null hypothesis (though we may still refute it later if additional evidence indicates that it is false after all).
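The decision rule can be made concrete with a small calculation. The sketch below assumes a made-up coefficient, standard error and degrees of freedom (not values from the ESS output) and reproduces the kind of Sig. value SPSS reports: the t statistic is the coefficient divided by its standard error, and the two-sided significance probability comes from the t distribution.

```python
# Reproducing a 'Sig.' value from a coefficient and its standard error.
# The coefficient, standard error and degrees of freedom are hypothetical.
from scipy.stats import t

b, se = 0.05, 0.01       # hypothetical sample coefficient and std. error
df = 1800                # residual degrees of freedom (n minus parameters)
t_value = b / se         # about 5
sig = 2 * t.sf(abs(t_value), df)   # two-sided significance probability

print(t_value)
print(sig < 0.01)        # True: refute the null hypothesis at the 1% level
```

A Sig. value this small would appear as 0.000 in the SPSS coefficients table.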

Let us see how this works. The regression coefficients reported for the Swedish sample in the previous chapter can be conceived as estimates of the association between birth year and education length in the Swedish population. But are they large enough to convince us that there really is an association between birth year and education length in this population? And, if there is an association in the population, can we be sufficiently sure that it is a curved rather than a purely linear association?

To perform the tests, we must first choose a significance level. Let the chosen level be 1% (which is written 0.01 if measured on the probability scale, which ranges from 0 to 1). Next, we consult the coefficients table. (See Table 5, which is a replica of Table 3.) Normally, we are not very interested in testing hypotheses about the constant, so we go directly to the independent variable 'year of birth' in the Model 1 part of the table. The Sig. value is reported to be 0.000. This indicates that it is smaller than 0.0005 (the value is rounded to three decimals, but it is not exactly 0), which, in turn, means that it is smaller than our chosen significance level of 0.01. Thus, we can regard the null hypothesis as refuted and start believing that there really is an association. A common way to state this is to say that the association between the dependent and the independent variables is statistically significant.

Table 5. SPSS output: Blockwise quadratic regression coefficients

# Statistical testing of the linearity assumption

However, we still cannot be sure whether this association is linear or curved. The non-zero regression coefficient of the squared birth year variable reported in the Model 2 part of the table indicates that the regression line is slightly curved, but is this tendency strong enough to warrant the belief that the population regression line is curved too? A look at the Sig. value of the squared term of the birth year variable tells us that it is smaller than the chosen significance level. Hence, there is a statistically significant component of the association between birth year and education length that is not accounted for by the purely linear model (Model 1), and we conclude that it represents a curved element of the association. We can, in other words (at least temporarily), refute the hypothesis that the association is linear and instead assume that it is curved. The general rule is this: if we have a regression model where an independent variable is represented by both a squared and a non-squared term, and the squared term's regression coefficient has a Sig. value that is lower than the chosen significance level, then we accept the hypothesis that the population association between the independent and the dependent variable is curved and refute the null hypothesis that the association is linear. If the Sig. value is not lower than the chosen significance level, we do not discard the null hypothesis that the association is linear.


All we need to know to draw these conclusions is the Sig. value. But the coefficients table also reports two other statistics that are related to this problem. The 'Std. Error' is the estimate of the standard error that was defined above. The other statistic, 't', is computed by dividing the sample regression coefficient by the estimate of the standard error. The t value is what SPSS uses to find the significance probability (the Sig. value), and for some purposes it may be helpful for you, too, to familiarise yourself with this statistic. Consult an introductory statistics book to learn more.

The results presented in Table 5 indicate that adding a quadratic term to the regression equation improves the model. Another way to check this is to test whether there has been a statistically significant improvement in the model’s ability to explain the variance of the dependent variable after the quadratic term has been added to the model. This is done by having SPSS compute a statistic, known as the F statistic, which, in this version, can be interpreted as a measure of the relative improvement in the explained variance that has taken place after the model was extended with new additive terms. The statistic has a known sampling distribution. Therefore, we can know how probable it is that we would get the F value we actually obtained if a null hypothesis of zero improvement of the model were correct. We can make SPSS find this probability for us. (The numerical F value itself has no intuitive meaning and can be ignored for the rest of this course.) The null hypothesis of zero model improvement is tested in exactly the same way as we test null hypotheses about zero regression coefficients. We choose a significance level and check whether the relevant Sig. value is lower than this level. If it is lower, we discard the null hypothesis and (at least until further evidence has been gathered) accept that the extended model gives the best description of how the variables are associated in the population.
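The R² change test can also be written out by hand. The sketch below uses made-up R² values and a made-up sample size (not the figures in Table 6) to show how this version of the F statistic and its significance probability are computed: the numerator is the gain in explained variance per added predictor, and the denominator is the unexplained variance per remaining degree of freedom.

```python
# F-test for the improvement in explained variance (R squared change).
# All numbers are hypothetical, chosen only to illustrate the formula.
from scipy.stats import f

r2_model1 = 0.10      # R squared of the purely linear model
r2_model2 = 0.12      # R squared after adding the squared term
n = 1800              # sample size
m = 1                 # number of added predictors (the squared term)
k = 2                 # number of predictors in the extended model

df1, df2 = m, n - k - 1
F = ((r2_model2 - r2_model1) / df1) / ((1 - r2_model2) / df2)
sig_f_change = f.sf(F, df1, df2)   # one-sided tail probability of F

print(F)                      # roughly 40.8
print(sig_f_change < 0.001)   # True: the improvement is significant
```

A Sig. F Change value this small corresponds to the 0.000 that SPSS prints in the Model Summary table.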

In order to make SPSS calculate the relevant significance probability for the F-test, we must click the ‘Statistics’ button in the linear regression dialogue box. Then another dialogue box opens, and we tick the ‘R squared change’ option. ‘Model fit’ and ‘Estimates’ have been pre-selected. Do not remove these ticks. (Figure 12.)

Figure 12. Obtaining R squared change statistics

The resulting Model Summary table is displayed as Table 6. The relevant test statistic is the ‘Sig. F Change’ in the Model 2 row. This statistic tells us that, provided that our chosen significance level is 0.1% or higher, we can discard the null hypothesis that the extension of the model by a squared birth year term has not improved the model. In other words, the addition of the quadratic term has brought about a statistically significant improvement of the model.

Table 6. SPSS output: Multiple blockwise regression goodness of fit statistics

## Exercise

Interpret the test statistics in the output you got when you did the exercise in chapter 3.