Performing ordinary linear regression analyses using SPSS

Follow the preparatory steps outlined in the first chapter, i.e. open the data set, turn on the design weight and select the Norwegian sample of persons born earlier than 1975. Then, run the regression analysis as follows:

You can also copy, paste and run this syntax:

*Syntax for the example in chapter 2, the Norwegian sample.
*The following command causes the cases to be weighted by the design weight variable 'dweight'.

WEIGHT BY dweight.

*The following commands cause SPSS to select for analysis those cases that belong to the Norwegian sample (value NO on country variable) and have lower values than 1975 on the birth year variable (& stands for AND, < stands for 'less than').
*In this process, the commands create a filter variable (filter_$) with value 1 for the selected cases and value 0 for the non-selected cases.
*Change the last part of line 2 (which starts after the first equals sign) if you wish to select other cases. If you do this, you should also change the variable label, which is in double quotation marks on line 3.

USE ALL.
COMPUTE filter_$=(cntry = 'NO' & yrbrn < 1975).
VARIABLE LABEL filter_$ "cntry = 'NO' & yrbrn < 1975 (FILTER)".
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

*The following commands cause a linear regression analysis to be performed on the selected data with dependent variable 'eduyrs' and independent variable 'yrbrn'.
*Change variable names in the last two lines if you wish to run the analysis with other dependent and independent variables.

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT eduyrs
/METHOD=ENTER yrbrn.

 

Figure 7. Running a simple (bivariate) linear regression analysis
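For readers who want to see what kind of computation lies behind the a and b coefficients, here is a minimal Python sketch of the textbook weighted least-squares formulas for one independent variable. It is an illustration only, not a description of SPSS's internal routines. The function name and the small data set (birth years, education lengths and weights) are hypothetical and are not taken from the ESS data.

```python
def weighted_simple_regression(x, y, w):
    """Return (a, b) for the line y-hat = a + b*x, with each case weighted by w."""
    total_weight = sum(w)
    # Weighted means of x and y.
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_weight
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_weight
    # Weighted sums of cross-products and squares around the means.
    s_xy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    b = s_xy / s_xx            # slope
    a = mean_y - b * mean_x    # constant
    return a, b


# Hypothetical toy data: birth year, years of education, design weight.
birth_year = [1930, 1945, 1954, 1962, 1970]
education = [7, 10, 11, 13, 15]
weight = [0.8, 1.1, 1.0, 0.9, 1.2]

a, b = weighted_simple_regression(birth_year, education, weight)
print(a, b)
```

With all weights equal, the formulas reduce to ordinary unweighted least squares.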

If you execute these commands correctly, the output contains the ‘Coefficients’ table shown here as Table 1. The computed values of a and b are shown in the B column. The item in the first row is the a-coefficient, which SPSS terms the ‘Constant’. The item in the second row is the birth year variable’s b-coefficient, which indicates the steepness of the regression line or, if you prefer, how much the predicted value of the dependent variable (length of education) increases when the value of the independent variable (birth year) increases by one unit (one year). The coefficient’s value is 0.097 (or, more exactly, 0.096672408), which means that each new cohort’s predicted length of education is 0.097 years longer than that of the cohort born one year before it.

The a-coefficient or ‘constant’ is the predicted value of the dependent variable for cases whose independent variable value is 0. But be careful when interpreting this coefficient. Here, its value is negative (-175.553), and surely no one has a negative length of education. The reason for this strange result is that the value refers to hypothetical persons born in year 0. There are no survivors from year 0 in our sample, and the regression results only apply to persons whose x-variable values lie within the span used in the computations (values between 1910 and 1974). You should avoid extrapolating beyond these limits, and extrapolations that extend far beyond them make little sense. Hence, the constant term has no substantive interpretation in this example, but we still need it to compute predicted y values.
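The interpretation of the two coefficients can be checked with a few lines of arithmetic outside SPSS. The Python sketch below is a minimal illustration. It uses the slope quoted above and the more exact constant that appears in the worked prediction later in this chapter. The function names and the range check are illustrative assumptions, not part of the SPSS output.

```python
# Coefficients reported in the text.
a = -175.5535449315   # constant (a-coefficient)
b = 0.096672408       # slope for birth year (b-coefficient)

# Span of birth years used in the computations, as stated in the text.
X_MIN, X_MAX = 1910, 1974


def predict(birth_year):
    """Predicted length of education from the line y-hat = a + b*x."""
    return a + b * birth_year


def within_observed_range(birth_year):
    """True if the birth year lies inside the span the regression is based on."""
    return X_MIN <= birth_year <= X_MAX


# Moving one year forward raises the prediction by b, ten years by 10*b.
one_year_gain = predict(1955) - predict(1954)
ten_year_gain = predict(1964) - predict(1954)
print(round(one_year_gain, 3))   # 0.097
print(round(ten_year_gain, 3))   # 0.967

# At birth year 0 the prediction equals the constant, far outside the data.
print(predict(0) == a)           # True
print(within_observed_range(0))  # False
```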

Table 1. SPSS output: Simple linear regression coefficients

The computed coefficient values may be seen as interesting in themselves. But they can also be used to compute predicted dependent variable values for particular persons or groups of persons. Such computations are done by inserting these persons’ independent variable values and the computed coefficient values into the right-hand side of the function ŷi = a + b∙xi. For example, person 4 in Figure 6 was born in 1954. If we insert this value into the function together with the coefficient values, we get:

ŷ4 = (-175.5535449315) + 0.096672408 ∙ 1954 = 13.34.

(The coefficients are given here with more decimals than usual so that the prediction is exact.) In other words, we can predict this person’s length of education to be 13.34 years. But this person’s actual length of education is 11 years, so the residual (actual minus predicted value) is 11 - 13.34 = -2.34 years, which accords with what Figure 6 shows us.
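The same prediction and residual can be reproduced outside SPSS. The following Python sketch is a minimal illustration that uses the coefficient values and person 4's data as given in the text. The function names are illustrative assumptions.

```python
# Coefficient values as given in the text.
A = -175.5535449315   # constant
B = 0.096672408       # slope for birth year


def predicted_education(birth_year):
    """Predicted length of education: y-hat = a + b*x."""
    return A + B * birth_year


def residual(actual_education, birth_year):
    """Residual = actual value minus predicted value."""
    return actual_education - predicted_education(birth_year)


# Person 4: born in 1954, actual length of education 11 years.
pred_4 = predicted_education(1954)
resid_4 = residual(11, 1954)

print(f"{pred_4:.2f}")    # 13.34
print(f"{resid_4:.2f}")   # -2.34
```

A negative residual means that the person's actual education is shorter than the regression line predicts.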

The coefficients presented in Table 1 pertain to those who are members of the ESS sample. The table also contains information about how accurately these coefficients estimate the association between birth year and education length in the entire population of Norwegians born before 1975. Exactly what these columns (Std. Error, t and Sig.) tell us will, however, be explained in chapter 4.

The Beta column contains the regression coefficients one gets when the analysis is performed on standardised variables. You don’t have to know anything about them to perform ordinary regression analysis. In fact, in most cases you should avoid using them. Consult a regression analysis textbook if you want to know more about them.
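For the curious, the Python sketch below illustrates what "performed on standardised variables" means in the case of one independent variable. It uses a small hypothetical data set (not the ESS data, and unweighted) and illustrative function names. It shows that the slope computed on standardised variables equals the ordinary slope multiplied by the ratio of the standard deviations of x and y.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (one independent variable)."""
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx


def standardise(values):
    """Rescale values to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical toy data: birth year and years of education.
birth_year = [1930, 1942, 1954, 1961, 1970]
education = [8, 10, 11, 14, 15]

b = ols_slope(birth_year, education)

# Two routes to the standardised coefficient.
beta_from_b = b * stdev(birth_year) / stdev(education)
beta_from_z = ols_slope(standardise(birth_year), standardise(education))

print(beta_from_b, beta_from_z)
```

The two printed values agree up to rounding error.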
