Interaction terms: Example
In the following example, we keep things simple and use data from only three countries: Poland, Great Britain and Norway, with Norway as the reference country. The focus is still on the association between year of birth and length of education, but the model now also includes the country dummies and the country * birth year interaction terms. The regression function can be expressed as follows:
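$$y_i = a + b_1 x_{\mathrm{GB},i} + b_2 x_{\mathrm{PL},i} + b_3 x_{\mathrm{Birthyear},i} + b_4 x_{\mathrm{GB} \times \mathrm{Birthyear},i} + b_5 x_{\mathrm{PL} \times \mathrm{Birthyear},i} + e_i$$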
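where y is the length of education of respondent i, a is the constant, the b's are the regression coefficients, the x's with the subscripts GB and PL are the country dummy variables (1 for members of the respective sample, 0 for others), the x with the subscript Birthyear is the two-digit year of birth, and e is the residual. The last two x's are the interaction terms, that is, the products of a country dummy and the birth year variable: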
For members of the British sample, the value of the first interaction term is identical to their birth year value, while, for Norwegians and Poles, its value is 0.
Similarly, for members of the Polish sample, the value of the second interaction term is identical to their birth year value, while it is 0 for others.
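The coding of the dummies and interaction terms described above can be sketched in a few lines of Python. This is only an illustration: the respondents are made up, and the names `add_interaction_terms`, `gb_x_birthyear` and `pl_x_birthyear` are hypothetical and do not come from the data file.

```python
# Made-up respondents: (country code, two-digit year of birth).
respondents = [("GB", 45), ("PL", 60), ("NO", 52), ("GB", 71)]

def add_interaction_terms(country, birthyear):
    """Return the country dummies and the dummy * birth year products."""
    gb = 1 if country == "GB" else 0  # British dummy
    pl = 1 if country == "PL" else 0  # Polish dummy
    return {
        "country": country,
        "birthyear": birthyear,
        "gb": gb,
        "pl": pl,
        # Equals the birth year for the British, 0 for everyone else.
        "gb_x_birthyear": gb * birthyear,
        # Equals the birth year for the Poles, 0 for everyone else.
        "pl_x_birthyear": pl * birthyear,
    }

rows = [add_interaction_terms(c, b) for c, b in respondents]
for row in rows:
    print(row)
```

Norway needs no dummy or interaction term of its own, because it is the reference country.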
Those who belong to the Norwegian sample are assigned the value 0 on both dummy variables as well as on both interaction terms. Hence, for Norwegians the regression function reduces to $y_i = a + b_3 x_{\mathrm{Birthyear},i} + e_i$, and the estimate of the coefficient $b_3$ is therefore an estimate of the association between birth year and education length among Norwegians. (In other words, it is an estimate of the slope of the regression line for the association between birth year and education length in the Norwegian population.)
For the British, the regression function reduces to:
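$$y_i = a + b_1 x_{\mathrm{GB},i} + b_3 x_{\mathrm{Birthyear},i} + b_4 x_{\mathrm{GB} \times \mathrm{Birthyear},i} + e_i$$
And since, for the British, $x_{\mathrm{Birthyear},i} = x_{\mathrm{GB} \times \mathrm{Birthyear},i}$, we can express their function as follows:
$$y_i = a + b_1 x_{\mathrm{GB},i} + (b_3 + b_4) x_{\mathrm{Birthyear},i} + e_i$$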
Thus, the slope of the British regression line can be estimated by taking the sum of $b_3$ and $b_4$, while the coefficient $b_4$ is an estimate of the difference between the British and the Norwegian regression line slopes. Similarly, the slope of the Polish regression line is estimated by the sum of $b_3$ and $b_5$, and $b_5$ is an estimate of the difference between the Polish and the Norwegian slopes. Finally, note that because birth year is measured with two digits, the value 0 corresponds to the year 1900. The constant $a$ is therefore an estimate of the mean education length of Norwegians who were born in 1900 (and a dubious one at that, since there are no 104-year-olds in the sample), while $b_1$ could be seen as an estimate of the mean education difference between 104-year-old Britons and 104-year-old Norwegians. (What could $b_2$ be seen as an estimate of?)
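The reduction of the regression function to one line per country can be checked with a small Python sketch. The coefficient values below are invented for illustration and are not estimates from the data. The function names `country_lines` and `predict_full` are likewise hypothetical.

```python
def country_lines(a, b1, b2, b3, b4, b5):
    """Return the (intercept, slope) of each country's regression line."""
    return {
        "NO": (a, b3),            # both dummies and both interaction terms are 0
        "GB": (a + b1, b3 + b4),  # British dummy is 1, interaction = birth year
        "PL": (a + b2, b3 + b5),  # Polish dummy is 1, interaction = birth year
    }

def predict_full(coefs, gb, pl, birthyear):
    """Evaluate the full regression function, leaving out the residual."""
    a, b1, b2, b3, b4, b5 = coefs
    return (a + b1 * gb + b2 * pl + b3 * birthyear
            + b4 * gb * birthyear + b5 * pl * birthyear)

# Invented coefficients (a, b1, b2, b3, b4, b5), for illustration only.
coefs = (9.0, 1.5, -0.5, 0.06, -0.02, 0.01)
lines = country_lines(*coefs)

for country, (intercept, slope) in lines.items():
    print(country, "intercept:", intercept, "slope:", round(slope, 4))

# The full function and the reduced British line give the same prediction.
gb_intercept, gb_slope = lines["GB"]
print(predict_full(coefs, gb=1, pl=0, birthyear=50),
      gb_intercept + gb_slope * 50)
```

The intercepts refer to birth year 0 (the year 1900), which is why they should not be given much substantive weight.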
To perform a regression analysis based on this model, we must first compute the interaction term variables by multiplying the country dummy variables by the birth year variable. Just as in previous chapters, we use the ‘Compute’ feature in the ‘Transform’ menu to create the new variables. Start, for example, with the Great Britain * Year of birth interaction. Give the product of these two variables a name and a label. (In the example presented here, we gave it the label ‘Lives in Great Britain x two-digit year of birth’.) Next, instruct SPSS to compute this product. In the present example we did this by typing ‘Greatbritain * birthyear’ in the ‘Numeric Expression’ field, and clicking ‘OK’. (The asterisk * is the multiplication sign, ‘Greatbritain’ is the name of Great Britain’s country dummy variable, and ‘birthyear’ is the two-digit birth year variable’s name.) Follow the same steps to create the Poland * Year of birth interaction term.
Finally, use the same procedures that have been demonstrated in the previous chapters to run the regression analysis. Here, we have put the birth year variable and the two country dummy variables in the first ‘Independent(s)’ field, and the two interaction terms in the field that appears when we click ‘Next’. This lets us use the F Change statistic to test whether the model that includes the interaction terms fits the data better than the model that does not include them. Remember to tick ‘R squared change’ in the ‘Statistics’ dialogue box.
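To make the logic of the F Change test concrete, the following Python sketch fits the model without and with the interaction terms and compares their residual sums of squares. It rests on several assumptions: the data are synthetic (generated with invented intercepts and slopes), the fit is unweighted ordinary least squares rather than the design-weighted analysis used here, and the helper names `ols` and `rss` are hypothetical. It is not a reproduction of the SPSS output.

```python
import random

def ols(X, y):
    """Solve the normal equations (X'X)b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def rss(X, y, b):
    """Residual sum of squares for coefficients b."""
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))

# Synthetic data: invented (intercept, slope) per country, 100 cases each.
rng = random.Random(1)
true_lines = {"NO": (9.0, 0.05), "GB": (10.0, 0.03), "PL": (8.0, 0.08)}
X_reduced, X_full, y = [], [], []
for country, (intercept, slope) in true_lines.items():
    gb = 1.0 if country == "GB" else 0.0
    pl = 1.0 if country == "PL" else 0.0
    for _ in range(100):
        birthyear = float(rng.randint(0, 74))  # two-digit year of birth
        y.append(intercept + slope * birthyear + rng.gauss(0, 0.5))
        X_reduced.append([1.0, gb, pl, birthyear])
        X_full.append([1.0, gb, pl, birthyear, gb * birthyear, pl * birthyear])

b_reduced = ols(X_reduced, y)
b_full = ols(X_full, y)
rss_reduced = rss(X_reduced, y, b_reduced)
rss_full = rss(X_full, y, b_full)

q = 2                            # number of interaction terms added
df_resid = len(y) - len(b_full)  # n minus parameters in the full model
f_change = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)

print("b4 (GB slope minus NO slope):", round(b_full[4], 3))
print("b5 (PL slope minus NO slope):", round(b_full[5], 3))
print("F change:", round(f_change, 1), "with df", q, "and", df_resid)
```

With 2 and 294 degrees of freedom, the 5 per cent critical value of F is about 3.0, so an F Change well above that indicates that the slopes differ between the countries.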
Syntax that performs these procedures
* The following command causes the cases to be weighted by the design weight variable 'dweight'.
*The following commands cause SPSS to select for analysis those cases that belong to the British, Polish or Norwegian sample (values GB, PL and NO on the country variable) and have lower values than 1975 on the birth year variable (& stands for AND, while | stands for OR, and < stands for 'less than').
*In this process, the commands create a filter variable (filter_$) with value 1 for the selected cases and value 0 for the non-selected cases.
*Change the last part of line 2 (which starts after the first equals sign) if you wish to select other cases (if you do this, you should also change the variable label, which can be found within double quotation marks on line 3).
*Use this command to create a dummy variable that assigns value 1 to members of the Polish sample and 0 to the other selected cases.
*Computes a dummy variable that assigns the value 1 to members of the British sample and 0 to the rest.
*Compute interaction terms.
*Command to run regression with interaction terms.