# Statistical testing of significance

Below, you will find the syntax for this analysis. The analysis is explained in the text.

Syntax for example

*The following command causes the cases to be weighted by the design weight variable 'dweight'.

*The following syntax is identical to the syntax that was presented in chapter 3, except that we have added a command which instructs SPSS to perform an F-test and deleted the commands which instruct SPSS to create a scatterplot.

*The following commands cause SPSS to select for analysis those cases that belong to the Swedish sample and have lower values than 1975 on the birth year variable.

*In this process the commands create a filter variable (filter_$) with value 1 for the selected cases and value 0 for the non-selected cases.

*The values of the variable created by the following commands are the squared values of the two-digit 'birthyear' variable.

*The following commands instruct SPSS to run a blockwise regression analysis with the variable 'birthyear' as the independent variable in the first model and to add the variable 'sqbirthyear' as a second independent variable in the second model.

*Note that the keyword CHANGE at the end of the third line instructs SPSS to test whether the second model explains more of the dependent variable’s variance than the first model does.
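The test that CHANGE requests compares the explained variance (R squared) of the two models. As a rough illustration of the arithmetic behind such a test, the following Python sketch computes the standard F statistic for an increase in R squared. It is not SPSS syntax, the function and parameter names are our own, and the R squared values and sample size are hypothetical, not results from the Swedish sample.

```python
def f_change(r2_reduced, r2_full, n_cases, k_full, q_added):
    """F statistic for the increase in R squared between two nested models.

    r2_reduced: R squared of the smaller model (here: 'birthyear' only)
    r2_full:    R squared of the larger model (here: 'birthyear' and 'sqbirthyear')
    n_cases:    number of cases used in the analysis
    k_full:     number of independent variables in the larger model (constant excluded)
    q_added:    number of independent variables added in the second block
    """
    numerator = (r2_full - r2_reduced) / q_added
    denominator = (1.0 - r2_full) / (n_cases - k_full - 1)
    return numerator / denominator


# Hypothetical values chosen only to show the calculation.
print(round(f_change(0.10, 0.12, 1000, 2, 1), 2))
```

The larger the increase in R squared relative to the unexplained variance that remains, the larger the F statistic and the smaller the corresponding Sig. value.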

Now, how do we actually make such tests using SPSS? The coefficients table reports a statistic called ‘Sig.’. (The abbreviation Sig. may be taken to stand for ‘significance probability’, which, in some other statistical applications, is called the p-value.) This statistic indicates the probability that we would find a sample regression coefficient at least as far from 0 as the one we have actually found in our sample if the null hypothesis is true, i.e. if it is true that the value of the population regression coefficient is 0. (A probability of 5% will be reported as 0.05 etc.) If this statistic is smaller than our chosen significance level, we refute the null hypothesis, i.e. we refute the hypothesis that there is no linear association between the relevant independent variable and the dependent variable. If the Sig. statistic has a value that is higher than or equal to our chosen significance level (with both measured on the same scale), we retain the null hypothesis (but we may still refute it later if additional evidence indicates that it is false after all).
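The decision rule can be sketched in a few lines of Python. This is only an illustration: SPSS derives Sig. from the t distribution, whereas the sketch uses the normal distribution, which gives approximately the same result in large samples. The function names and the coefficient and standard error in the example are hypothetical.

```python
import math


def two_sided_p_value(coefficient, standard_error):
    """Approximate probability of a coefficient at least this far from 0
    if the population coefficient is 0 (large-sample normal approximation)."""
    z = coefficient / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def decide(p_value, significance_level):
    """Refute the null hypothesis only if the p-value is below the chosen level."""
    if p_value < significance_level:
        return "refute H0"
    return "retain H0"


# Hypothetical coefficient and standard error, not taken from the SPSS output.
p = two_sided_p_value(0.08, 0.02)
print(decide(p, 0.01))
```

Note that the comparison uses a strict "smaller than", so a p-value exactly equal to the significance level leads us to retain the null hypothesis, as described above.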

Let us see how this works. The regression coefficients reported for the Swedish sample in the previous chapter can be regarded as estimates of the association between birth year and education length in the Swedish population. But are they large enough to convince us that there really is an association between birth year and education length in this population? And, if there is an association in the population, can we be sufficiently sure that it is a curved association rather than a purely linear one?

To perform the tests, we must first choose a significance level. Let the chosen level be 1% (which is written 0.01 if measured on the probability scale, which ranges from 0 to 1). Next, we consult the coefficients table. (See Table 5, which is a replica of Table 3.) Normally, we are not very interested in testing hypotheses about the constant, so we go directly to the independent variable ‘year of birth’ in the Model 1 part of the table. The Sig. value is reported to be 0.000. This indicates that it is less than 0.001 (but not exactly 0), which, in turn, means that it is less than our chosen significance level of 0.01. Thus, we can regard the null hypothesis as refuted and start believing that there really is an association. A common way to state this is to say that the association between the dependent and the independent variables is statistically significant.
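Why a very small Sig. value shows up as 0.000 can be illustrated with a short Python sketch. It assumes that the table displays the value rounded to three decimals. The function name and the example p-values are hypothetical.

```python
def displayed_sig(p_value, decimals=3):
    """Format a p-value the way a table with a fixed number of decimals shows it."""
    return f"{p_value:.{decimals}f}"


# Hypothetical p-values: small but not exactly 0.
for p in (0.0000123, 0.0004, 0.0006):
    shown = displayed_sig(p)
    below_level = p < 0.01  # comparison with the chosen 1% significance level
    print(shown, below_level)
```

A displayed value of 0.000 therefore tells us only that the true value rounds to zero at three decimals, which is enough to conclude that it lies below a significance level of 0.01.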

Table 5. SPSS output: Blockwise quadratic regression coefficients