Adding interaction terms to OLS regression models

Do men and women profit equally from an added year of education? This question can be answered by adding the education by gender interaction term to the model.

SPSS solution

The interaction term is simply the product of the two variables, female and edyears. In SPSS, we can create a new variable called edfem as follows:

Compute edfem = edyears*female.

Let us add this term to the model and re-estimate:

Table 1.12. Regression analysis with interaction term - SPSS output

First, we see that the coefficient of the interaction term is statistically significant. This means that the interaction effect should not be ignored. How should it be interpreted? The coefficient of the interaction term is the difference in the effect of education between women and men. The coefficient of edyears is no longer a general (main) effect, but the effect of education for men, i.e. when female=0. In other words, the marginal effect of one additional year of education is estimated to be 4.842 for men and 4.842 - 0.677 = 4.165 for women.
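The arithmetic behind these two marginal effects can be sketched in a few lines of Python. This is only an illustration: the two coefficients are the values quoted in the text (the negative sign of the interaction coefficient is inferred from the subtraction), and the function and variable names are hypothetical.

```python
# Coefficients as quoted in the text; the sign of the interaction
# coefficient is inferred from the subtraction 4.842 - 0.677.
B_EDYEARS = 4.842   # effect of one year of education when female = 0
B_EDFEM = -0.677    # interaction: difference in that effect for women


def marginal_effect_of_education(female):
    """Return the estimated effect of one extra year of education.

    female is the dummy code used in the model: 0 for men, 1 for women.
    """
    return B_EDYEARS + B_EDFEM * female


effect_men = marginal_effect_of_education(0)
effect_women = marginal_effect_of_education(1)
print(f"men:   {effect_men:.3f}")
print(f"women: {effect_women:.3f}")
```

Because female only takes the values 0 and 1, the interaction coefficient is simply switched off for men and switched on for women.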

To show this more clearly, it is best to work from the equation, replacing the x-symbols by variable names and edfem by its components.

$$Y_i = b_0 + b_1\,\mathit{Edyears}_i + b_2\,\mathit{Age}_i + b_3\,\mathit{Agesqr}_i + b_4\,\mathit{Female}_i + b_5\,\mathit{Edyears}_i \cdot \mathit{Female}_i + e_i$$

Now, the effects of education and female cannot be interpreted separately. Edyears appears in two terms, with the coefficients b1 and b5. Let us create two new equations, one for men and one for women, by substituting each gender's code for Female (0 for men, 1 for women).

Men:

$$Y_i = b_0 + b_1\,\mathit{Edyears}_i + b_2\,\mathit{Age}_i + b_3\,\mathit{Agesqr}_i + b_4 \cdot 0 + b_5\,\mathit{Edyears}_i \cdot 0 + e_i$$

Women:

$$Y_i = b_0 + b_1\,\mathit{Edyears}_i + b_2\,\mathit{Age}_i + b_3\,\mathit{Agesqr}_i + b_4 \cdot 1 + b_5\,\mathit{Edyears}_i \cdot 1 + e_i$$

The effective regression coefficient for years of education is now (b1 + b5*Female), so the education term in the model is (b1 + b5*Female)*Edyears. For men this coefficient reduces to b1, and for women the coefficient of years of education is b1 + b5.
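Carrying out the substitution and collecting terms makes the two group-specific equations explicit. The following is a worked simplification of the men and women equations, using the same coefficients b0 to b5:

```latex
\begin{align*}
\text{Men } (\mathit{Female}=0):\quad
  Y_i &= b_0 + b_1\,\mathit{Edyears}_i + b_2\,\mathit{Age}_i
         + b_3\,\mathit{Agesqr}_i + e_i \\
\text{Women } (\mathit{Female}=1):\quad
  Y_i &= (b_0 + b_4) + (b_1 + b_5)\,\mathit{Edyears}_i + b_2\,\mathit{Age}_i
         + b_3\,\mathit{Agesqr}_i + e_i
\end{align*}
```

Written this way, b4 shifts the intercept for women (it is the gender difference at zero years of education), while b5 shifts the slope of education.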

Stata solution

The interaction term is simply the product of the two variables, female and edyears. In Stata, we can create a new variable called edfem as follows:

generate edfem = edyears*female

Let us add this term to the model and re-estimate:

. regress wage edyears age agesqr female edfem

Table 1.13. Regression analysis with interaction term - Stata output

First, we see that the coefficient of the interaction term is statistically significant at the 0.05 level. This means that the interaction effect should not be ignored. How should it be interpreted? The coefficient of the interaction term is the difference in the effect of education between women and men. The coefficient of edyears is no longer a general (main) effect, but the effect of education for men, i.e. when female=0. In other words, the marginal effect of one additional year of education is estimated to be 4.842 for men and 4.842 - 0.677 = 4.165 for women.

To show this more clearly, it is best to work from the equation, replacing the x-symbols by variable names and edfem by its components:

$$Y_i = b_0 + b_1\,\mathit{Edyears}_i + b_2\,\mathit{Age}_i + b_3\,\mathit{Agesqr}_i + b_4\,\mathit{Female}_i + b_5\,\mathit{Edyears}_i \cdot \mathit{Female}_i + e_i$$

Now, the effects of education and female cannot be interpreted separately. Edyears appears in two terms, with the coefficients b1 and b5. Let us create two new equations, one for men and one for women, by substituting each gender's code for Female (0 for men, 1 for women).

Men:

$$Y_i = b_0 + b_1\,\mathit{Edyears}_i + b_2\,\mathit{Age}_i + b_3\,\mathit{Agesqr}_i + b_4 \cdot 0 + b_5\,\mathit{Edyears}_i \cdot 0 + e_i$$

Women:

$$Y_i = b_0 + b_1\,\mathit{Edyears}_i + b_2\,\mathit{Age}_i + b_3\,\mathit{Agesqr}_i + b_4 \cdot 1 + b_5\,\mathit{Edyears}_i \cdot 1 + e_i$$

The effective regression coefficient, or effect, of years of education is now (b1 + b5*Female), so the education term in the model is (b1 + b5*Female)*Edyears. For men this coefficient reduces to b1, and for women the coefficient of years of education is b1 + b5.
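To see that a hand-built product term really yields one education slope per gender, the idea can be checked on synthetic data. The sketch below is not the data set used in this chapter: it uses noise-free, made-up observations, leaves out the age terms for brevity, borrows 4.842 and -0.677 from the text as the "true" slopes, and invents the intercept (10.0) and the female coefficient (5.0). The helper functions are hypothetical and implement ordinary least squares through the normal equations using only the Python standard library.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(rows, y):
    """Return OLS coefficients by solving the normal equations (X'X)b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical "true" model: wage = b0 + b1*edyears + b4*female + b5*edfem.
TRUE_B0, TRUE_B1, TRUE_B4, TRUE_B5 = 10.0, 4.842, 5.0, -0.677

rows, wages = [], []
for female in (0, 1):
    for edyears in range(8, 21):
        edfem = edyears * female  # same product as: generate edfem = edyears*female
        rows.append([1.0, float(edyears), float(female), float(edfem)])
        wages.append(TRUE_B0 + TRUE_B1 * edyears + TRUE_B4 * female + TRUE_B5 * edfem)

b0, b1, b4, b5 = ols(rows, wages)
slope_men = b1          # effect of education when female = 0
slope_women = b1 + b5   # effect of education when female = 1
print(f"slope for men:   {slope_men:.3f}")
print(f"slope for women: {slope_women:.3f}")
```

Since the synthetic wages contain no error term, the fitted coefficients reproduce the values that generated them, and the slope for women is the edyears coefficient plus the interaction coefficient.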

Final comments

As an additional exercise, you can redo the example using the natural logarithm of wage as the dependent variable. Note that the interpretation of the regression coefficients changes. The regression coefficient of education will now show the approximate proportional change in wages for one additional year of education.
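The proportional reading of a coefficient in a log-wage model is an approximation that works well for small coefficients; the exact proportional change for a one-unit increase is exp(b) - 1. A minimal sketch, assuming a purely hypothetical education coefficient of 0.08 (this value is not an estimate from the chapter's data):

```python
import math


def exact_percent_change(b_log):
    """Exact percent change in wage per one-unit increase in a regressor
    when the dependent variable is ln(wage)."""
    return (math.exp(b_log) - 1.0) * 100.0


b_edyears_log = 0.08  # hypothetical coefficient, for illustration only

approx_percent = b_edyears_log * 100.0           # quick reading of the coefficient
exact_percent = exact_percent_change(b_edyears_log)
print(f"approximate change: {approx_percent:.2f}%")
print(f"exact change:       {exact_percent:.2f}%")
```

The two numbers stay close for small coefficients and drift apart as the coefficient grows, so the exact formula is the safer choice for large effects.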
