Interpreting results of regression with interaction terms: Example

Table 12 shows that adding interaction terms, and thus letting the model take account of differences between the countries in how birth year affects education length, increases the R² value somewhat, and that this improvement in the model’s fit is statistically significant. Correspondingly, the model 2 part of Table 13 shows that both the Polish and the British associations between birth year and education length differ significantly from the Norwegian one at the 5% level. The estimated slope of the Polish regression line is a notch steeper than the Norwegian one (0.097 + 0.020 = 0.117, as against 0.097), while the British line is less steep (0.097 – 0.037 = 0.060), as the negative sign of the British interaction term’s coefficient indicates. However, only the British slope differs significantly from the Norwegian slope at the 1% level. Table 13 also shows that mean Polish education length starts at a lower level than the Norwegian one in the older cohorts: the Poland dummy’s coefficient is negative and significantly different from 0, which means that the Polish regression line crosses the dependent variable axis at a lower education length value than the Norwegian line does.
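The country-specific slopes follow from the model 2 coefficients by adding each country’s interaction coefficient to the slope of the reference country, Norway. A minimal Python sketch of that arithmetic, using only the slope estimates quoted above; the function name and parameter names are hypothetical labels, not SPSS output names:

```python
# Model 2 (Norway is the reference category, so both dummies are 0 for Norway):
# eduyrs = b0 + b1*birthyear + b2*Poland + b3*Greatbritain
#        + b4*(birthyear*Poland) + b5*(birthyear*Greatbritain)

def country_slopes(b_birth, b_pl_interaction, b_gb_interaction):
    """Return the birth-year slope implied by model 2 for each country."""
    return {
        "NO": b_birth,                     # reference country: slope is b1
        "PL": b_birth + b_pl_interaction,  # b1 + b4
        "GB": b_birth + b_gb_interaction,  # b1 + b5
    }

# Slope estimates quoted in the text (Table 13, model 2)
slopes = country_slopes(0.097, 0.020, -0.037)
for country, slope in slopes.items():
    print(country, round(slope, 3))
```

The intercepts work the same way: the Polish line’s intercept is the constant plus the Poland dummy’s coefficient (b0 + b2), so a negative dummy coefficient places the Polish line below the Norwegian one at the axis.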

Table 12. SPSS output: Regression with interaction terms, goodness-of-fit statistics

Table 13. SPSS output: Regression with interaction terms, coefficients

Note that the model 1 estimate of the birth year coefficient (0.087) is an unweighted estimate of the three countries’ common slope: it does not take account of differences in the countries’ population sizes. To obtain an unbiased estimate of the mean coefficient, that is, one that reflects these population size differences, the cases must be weighted by the combined design and population size weight. Ordinary case weighting combined with regression analysis may produce a better slope estimate (around 0.078), but the accompanying statistical tests cannot be trusted. If correct statistical tests matter, you can use the ‘Complex Samples’ procedure below. (Weighted least squares regression, with the combined design/population size weight as the weighting variable, would also be better than OLS regression on weighted cases.)
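How weighting shifts a pooled slope can be illustrated with the closed-form weighted least squares slope for a single predictor. A minimal sketch with invented toy data (two groups with slopes 1.0 and 0.5, the first given three times the weight of the second); the function name is hypothetical, and the sketch covers point estimates only, not the design-based standard errors that the Complex Samples procedure computes:

```python
def wls_slope(x, y, w):
    """Weighted least squares slope for one predictor:
    sum w*(x - xw)*(y - yw) / sum w*(x - xw)**2, with weighted means xw, yw."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

# Toy data (invented for illustration, not ESS data):
# group 1 has slope 1.0, group 2 has slope 0.5, both with the same x values.
x = [0, 1, 2, 0, 1, 2]
y = [0.0, 1.0, 2.0, 0.0, 0.5, 1.0]

unweighted = wls_slope(x, y, [1, 1, 1, 1, 1, 1])
weighted = wls_slope(x, y, [3, 3, 3, 1, 1, 1])  # group 1 weighted 3x
print(unweighted, weighted)
```

With equal weights the pooled slope is the simple average, 0.75; tripling the first group’s weight pulls it to 0.875, towards that group’s slope. Because the toy groups have identical x values, adding group dummies would give the same slopes here.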

Complex samples procedure

*Computes weight variable to be used in analyses aimed at estimating mean values for groups of countries.

COMPUTE combiweight = dweight * pweight.
VARIABLE LABELS combiweight 'Design weight * Population size weight'.

*The following commands select for analysis the cases that belong to the British, Polish or Norwegian sample (values GB, PL and NO on the cntry variable) and have values lower than 1975 on the birth year variable (yrbrn).
*In the process they create a filter variable (filter_$) with value 1 for the selected cases and value 0 for the non-selected cases.
*To select other cases, change the expression after the first equals sign in the COMPUTE command. If you do this, also change the variable label, which is the text within double quotation marks in the VARIABLE LABEL command.

COMPUTE filter_$=(cntry = 'GB' | cntry = 'PL' | cntry = 'NO') & yrbrn < 1975.
VARIABLE LABEL filter_$ "cntry = 'GB' or 'PL' or 'NO' & yrbrn < 1975 (FILTER)".
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.

*Use this command to create a dummy variable that assigns value 1 to members of the Polish sample and 0 to the other selected cases.

COMPUTE Poland = ANY(cntry,'PL').
VARIABLE LABELS Poland 'Lives in Poland'.

*Creates a dummy variable that assigns the value 1 to members of the British sample and 0 to the rest.

COMPUTE Greatbritain = ANY(cntry,'GB').
VARIABLE LABELS Greatbritain 'Lives in Great Britain'.
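For readers working outside SPSS, the data preparation above can be sketched in Python, together with the birth year × country interaction terms that model 2 adds. This is a minimal sketch under stated assumptions: each case is a dict keyed by the ESS variable names used above (cntry, yrbrn, dweight, pweight) with no missing values; yrbrn serves as the birth-year predictor (the regression below uses a variable called birthyear, which is not defined in this section); the function name prepare and the interaction variable names are hypothetical.

```python
def prepare(records):
    """Python analogue of the SPSS steps: combined weight, case selection,
    country dummies, plus birth year x country interaction terms (model 2).
    Assumes every record has non-missing cntry, yrbrn, dweight and pweight."""
    selected = []
    for r in records:
        # Same condition as the filter_$ expression; unlike FILTER BY,
        # non-selected cases are dropped here rather than kept but ignored.
        if r["cntry"] not in ("GB", "PL", "NO") or r["yrbrn"] >= 1975:
            continue
        row = dict(r)  # copy so the input records are not modified
        row["combiweight"] = r["dweight"] * r["pweight"]
        row["Poland"] = 1 if r["cntry"] == "PL" else 0
        row["Greatbritain"] = 1 if r["cntry"] == "GB" else 0
        # Interaction terms: birth year multiplied by each country dummy
        row["yrbrn_x_Poland"] = r["yrbrn"] * row["Poland"]
        row["yrbrn_x_Greatbritain"] = r["yrbrn"] * row["Greatbritain"]
        selected.append(row)
    return selected

# Toy records (invented for illustration, not ESS data)
records = [
    {"cntry": "PL", "yrbrn": 1950, "dweight": 1.5, "pweight": 2.0},
    {"cntry": "DE", "yrbrn": 1960, "dweight": 1.0, "pweight": 3.0},
    {"cntry": "GB", "yrbrn": 1980, "dweight": 1.0, "pweight": 1.0},
    {"cntry": "NO", "yrbrn": 1940, "dweight": 1.0, "pweight": 0.5},
]
for row in prepare(records):
    print(row["cntry"], row["combiweight"], row["Poland"], row["yrbrn_x_Poland"])
```

Only the Polish and Norwegian toy cases pass the filter; for the Norwegian case both dummies, and hence both interaction terms, are 0.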

*Prepares a plan file for Complex Samples analysis of the ESS data, with combiweight as the analysis weight.
*The PLAN subcommand tells SPSS to store the plan file in the root directory of drive C; change the path if you want the file stored somewhere else.
*If you are using the dataset downloaded from ESS EduNet, you can ignore the warning 'This procedure does not check the consistency of the working data file with the plan file. We recommend looking at the output table or the plan file to check consistency before performing selection or analysis'.

CSPLAN ANALYSIS
  /PLAN FILE='C:\ESSplan.csaplan'
  /PLANVARS ANALYSISWEIGHT=combiweight
  /DESIGN
  /ESTIMATOR TYPE=WR.

*Runs the Complex Samples General Linear Model (common birth year slope, no interaction terms).
*Note: If the plan file is not stored in the root directory of drive C, insert the correct path in the PLAN subcommand.

CSGLM eduyrs WITH birthyear Poland Greatbritain
  /PLAN FILE='C:\ESSplan.csaplan'
  /MODEL birthyear Poland Greatbritain
  /STATISTICS PARAMETER SE TTEST.


Regression lines based on the weighted model 1 estimates are presented in Figure 15, whereas Figure 16 shows the more nuanced picture given by the estimates from model 2, which includes interaction terms. Ultimately, it is up to you to decide whether the extra detail in Figure 16 is worth the added effort and interpretive complexity of replacing model 1 with model 2.

Figure 15. Estimated regression lines for three countries with common (weighted mean) slope

Figure 16. Estimated regression lines for three countries with separate slope estimates

Note also that interaction terms are not limited to cases in which the associations between dependent and independent variables vary between countries. In principle, they can be used whenever one variable’s association with the dependent variable varies with the value of another variable. The association between education and subsequent occupational career may, for instance, depend on people’s gender and social or ethnic background. In such cases, you can create interaction terms by multiplying the education variable by gender or ethnicity variables, and so on.
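The same logic applies outside the country setting. A minimal sketch, assuming a model y = b0 + b_edu·edu + b_female·female + b_edu_x_female·(edu·female) with a 0/1 gender dummy; the function names and coefficient values are hypothetical, chosen only to show the arithmetic:

```python
def interaction_term(edu, female):
    """Interaction term: years of education multiplied by a 0/1 gender dummy."""
    return edu * female

def education_slope(b_edu, b_edu_x_female, female):
    """Association between education and the outcome for a given gender in
    y = b0 + b_edu*edu + b_female*female + b_edu_x_female*(edu*female)."""
    return b_edu + b_edu_x_female * female

# Hypothetical coefficients, not estimates from any data
b_edu, b_edu_x_female = 0.5, -0.25
print(interaction_term(12, 1), interaction_term(12, 0))  # 12 for women, 0 for men
print(education_slope(b_edu, b_edu_x_female, female=0))  # men: b_edu
print(education_slope(b_edu, b_edu_x_female, female=1))  # women: b_edu + interaction
```

As with the country dummies, the interaction coefficient is the difference between the two groups’ slopes, and its test tells you whether that difference is statistically significant.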