Interaction terms: Example

In the following example, we keep things simple and present a model based on data from Poland, Great Britain and Norway, with Norway as the reference country. The focus is still on the association between year of birth and length of education, but the model now also includes the country dummies and the country * birth year interaction terms. The regression function can be expressed as follows:

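$$y_i = a + b_1 x_{\mathrm{GB},i} + b_2 x_{\mathrm{PL},i} + b_3 x_{\mathrm{Birthyear},i} + b_4 x_{\mathrm{GB} \times \mathrm{Birthyear},i} + b_5 x_{\mathrm{PL} \times \mathrm{Birthyear},i} + e_i$$

where:

$x_{\mathrm{Birthyear},i}$ is person $i$’s birth year
$x_{\mathrm{GB},i}$ is a dummy variable with values Great Britain = 1, other countries = 0
$x_{\mathrm{PL},i}$ is a dummy variable with values Poland = 1, other countries = 0
$x_{\mathrm{GB} \times \mathrm{Birthyear},i}$ is the product of $x_{\mathrm{GB},i}$ and $x_{\mathrm{Birthyear},i}$ (an interaction term)
$x_{\mathrm{PL} \times \mathrm{Birthyear},i}$ is the product of $x_{\mathrm{PL},i}$ and $x_{\mathrm{Birthyear},i}$ (an interaction term)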

For members of the British sample, the value of the first interaction term is identical to their birth year value, while, for Norwegians and Poles, its value is 0.

Similarly, for members of the Polish sample, the value of the second interaction term is identical to their birth year value, while it is 0 for others.

Those who belong to the Norwegian sample are assigned the value 0 on both dummy variables as well as on both interaction terms. Hence, for Norwegians the regression function reduces to $y_i = a + b_3 x_{\mathrm{Birthyear},i} + e_i$, and the estimate of the coefficient $b_3$ is therefore an estimate of the association between birth year and education length among Norwegians. (In other words, it is an estimate of the slope of the regression line for the association between birth year and education length in the Norwegian population.)
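To make this coding concrete, the following Python sketch builds the two dummy variables and the two interaction terms for three made-up respondents. The helper name `design_row` and the example rows are illustrative assumptions; they are not part of the survey data or of the SPSS syntax used in this chapter.

```python
def design_row(country, birthyear):
    """Return (x_GB, x_PL, x_Birthyear, x_GB*Birthyear, x_PL*Birthyear)."""
    gb = 1 if country == "GB" else 0   # dummy: Great Britain = 1, others = 0
    pl = 1 if country == "PL" else 0   # dummy: Poland = 1, others = 0
    # Each interaction term is the product of a dummy and the birth year.
    return (gb, pl, birthyear, gb * birthyear, pl * birthyear)


# Invented respondents: (country code, two-digit birth year).
people = [("GB", 52), ("PL", 67), ("NO", 40)]

for country, birthyear in people:
    print(country, design_row(country, birthyear))
```

For the Norwegian row, both dummies and both interaction terms are 0, so only the birth year itself enters the regression function.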

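For the British, the regression function reduces to:

$$y_i = a + b_1 x_{\mathrm{GB},i} + b_3 x_{\mathrm{Birthyear},i} + b_4 x_{\mathrm{GB} \times \mathrm{Birthyear},i} + e_i$$

And since, for the British, $x_{\mathrm{Birthyear},i} = x_{\mathrm{GB} \times \mathrm{Birthyear},i}$, we can express their function as follows:

$$y_i = a + b_1 x_{\mathrm{GB},i} + (b_3 + b_4) x_{\mathrm{Birthyear},i} + e_i$$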

Thus, the slope of the British regression line can be estimated by taking the sum of $b_3$ and $b_4$, while the coefficient $b_4$ is an estimate of the difference between the British and the Norwegian regression line slopes. Similarly, $b_5$ is an estimate of the difference between the Polish and the Norwegian slopes. Finally, note that the coefficient $a$ is an estimate of the mean education length of Norwegians who were born in the year 1900, that is, those with the value 0 on the two-digit birth year variable. This is a dubious estimate, since there are no 104-year-olds in the sample. Likewise, $b_1$ could be seen as an estimate of the difference in mean education length between 104-year-old Britons and 104-year-old Norwegians. (What could $b_2$ be seen as an estimate of?)
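The way the coefficients combine into one regression line per country can be sketched in a few lines of Python. The function name `country_line` and the coefficient values below are invented for illustration only; they are not estimates from the data used in this chapter.

```python
def country_line(coefs, country):
    """Return (intercept, slope) of the regression line for one country.

    coefs is the tuple (a, b1, b2, b3, b4, b5) from the model above,
    with Norway ('NO') as the reference country.
    """
    a, b1, b2, b3, b4, b5 = coefs
    if country == "NO":
        return (a, b3)             # dummies and interactions are all 0
    if country == "GB":
        return (a + b1, b3 + b4)   # British dummy and interaction switched on
    if country == "PL":
        return (a + b2, b3 + b5)   # Polish dummy and interaction switched on
    raise ValueError("country must be 'NO', 'GB' or 'PL'")


# Invented coefficients (a, b1, b2, b3, b4, b5), for illustration only.
coefs = (8.0, 1.5, -0.5, 0.125, -0.0625, 0.03125)

for country in ("NO", "GB", "PL"):
    intercept, slope = country_line(coefs, country)
    print(country, "intercept:", intercept, "slope:", slope)
```

The difference between the British and the Norwegian slope returned by this function is exactly the fourth coefficient, which mirrors the interpretation of the interaction coefficient given above.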

To perform a regression analysis based on this model, we must first compute the interaction term variables by multiplying each country dummy variable by the birth year variable. As in previous chapters, we use the ‘Compute’ feature in the ‘Transform’ menu to create the new variables. Start, for example, with the Great Britain * Year of birth interaction. Give the product of these two variables a name and a label. (In the example presented here, we gave it the label ‘Lives in Great Britain x two-digit year of birth’.) Next, instruct SPSS to compute this product. In the present example, we did this by typing ‘Greatbritain * birthyear’ in the ‘Numeric Expression’ field and clicking ‘OK’. (The asterisk * is the multiplication sign, ‘Greatbritain’ is the name of Great Britain’s country dummy variable, and ‘birthyear’ is the name of the two-digit birth year variable.) Follow the same steps to create the Poland * Year of birth interaction term.

Finally, run the regression analysis using the procedures demonstrated in the previous chapters. Here, we have put the birth year variable and the two country dummy variables in the first ‘Independent(s)’ field, and the two interaction terms in the field that appears when we click ‘Next’. This lets us use the F Change statistic to test whether the model that includes the interaction terms fits the data better than the model that does not include them. Remember to tick ‘R squared change’ in the ‘Statistics’ dialogue box.
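As a rough illustration of what the F Change statistic measures, the Python sketch below computes it from the R squared values of the two models, using the standard formula for comparing nested regression models. The function name `f_change` and all input numbers are made up for illustration; they are not results from the analysis described here.

```python
def f_change(r2_reduced, r2_full, n, k_full, q):
    """F statistic for the increase in R squared when q predictors are added.

    r2_reduced : R squared of the model without the added predictors
    r2_full    : R squared of the model with the added predictors
    n          : number of cases used in the regression
    k_full     : number of predictors in the larger model (5 in this chapter)
    q          : number of predictors added in the second block (2 here)
    """
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n - k_full - 1)
    return numerator / denominator


# Invented values, for illustration only.
f = f_change(r2_reduced=0.10, r2_full=0.12, n=1006, k_full=5, q=2)
print(round(f, 2))
```

The resulting value would be compared with an F distribution with q and n - k_full - 1 degrees of freedom, which is the comparison SPSS summarises in the significance value reported for F Change.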

Syntax that performs these procedures

* The following command causes the cases to be weighted by the design weight variable 'dweight'.

WEIGHT BY dweight.

*The following commands cause SPSS to select for analysis those cases that belong to the British, Polish or Norwegian sample (values GB, PL and NO on the country variable) and have lower values than 1975 on the birth year variable (& stands for AND, while | stands for OR, and < stands for 'less than').
*In this process, the commands create a filter variable (filter_$) with value 1 for the selected cases and value 0 for the non-selected cases.
*Change the last part of line 2 (which starts after the first equals sign) if you wish to select other cases (if you do this, you should also change the variable label, which can be found within double quotation marks on line 3).

USE ALL.
COMPUTE filter_$=(cntry = 'GB' | cntry = 'PL' | cntry = 'NO') & yrbrn < 1975.
VARIABLE LABEL filter_$ "cntry = 'GB' or 'PL' or 'NO' & yrbrn < 1975 (FILTER)".
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

*Use this command to create a dummy variable that assigns value 1 to members of the Polish sample and 0 to the other selected cases.

COMPUTE Poland = ANY (cntry,'PL').
VARIABLE LABELS Poland 'Lives in Poland'.
EXECUTE.

*Computes a dummy variable that assigns the value 1 to members of the British sample and 0 to the rest.

COMPUTE Greatbritain = ANY (cntry,'GB').
VARIABLE LABELS Greatbritain 'Lives in Great Britain'.
EXECUTE.

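*Compute interaction terms. This assumes that the two-digit birth year variable 'birthyear' already exists in the data file, since it is not created by this syntax.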

COMPUTE PolandXbirthyear = Poland * birthyear.
VARIABLE LABELS PolandXbirthyear 'Lives in Poland x two-digit year of birth'.
EXECUTE.
COMPUTE GreatbritainXbirthyear = Greatbritain * birthyear.
VARIABLE LABELS GreatbritainXbirthyear 'Lives in Great Britain x two-digit year of birth'.
EXECUTE.

*Command to run regression with interaction terms.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT eduyrs
  /METHOD=ENTER Poland Greatbritain birthyear
  /METHOD=ENTER PolandXbirthyear GreatbritainXbirthyear.