Chapter 5: Nominal independent variables
How can we assess the association between a metric dependent variable and a nominal independent variable, i.e. an independent variable with more than two qualitatively different values? One such variable is country of residence. In its original coding, the country variable is unfit for use in regression analyses: regression analysis requires numerical values, but this variable has strings of letters as value codes. We could recode the strings into numbers, but which numbers should we choose? There is no meaningful, generally applicable way to rank countries of residence numerically. Unfortunately, such a ranking is exactly what we would need if we wanted to represent more than two countries by one single variable in a regression analysis.
Rather than using one single variable, the solution is to recode the country variable into a set of dichotomous variables. In chapter 1, we saw that we can use gender, a dichotomous variable with no natural ranking of its two values, as an independent variable. We could use the same method to compare two countries: we deselect respondents from all other countries, assign the value 1 to those who live in one of the two countries and 0 to those who live in the other, and, finally, use this dichotomous, numerical variable as an independent variable. If we are comparing three or more countries and want to see how living in one rather than another of them is associated with the average value of our dependent variable, we can extend this technique by creating a set of dichotomous variables with 1 and 0 values.
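The two-country recoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the country codes, the scores, and the function name `two_country_dummy` are all hypothetical and do not come from the data set used in this book.

```python
# Hypothetical respondents: (country code, score on the dependent variable).
respondents = [
    ("NO", 7.0),
    ("SE", 6.0),
    ("DK", 8.0),
    ("SE", 5.0),
    ("NO", 9.0),
]


def two_country_dummy(rows, coded_one, coded_zero):
    """Keep only respondents from two countries and code them 1 or 0."""
    recoded = []
    for country, score in rows:
        if country == coded_one:
            recoded.append((1, score))
        elif country == coded_zero:
            recoded.append((0, score))
        # Respondents from any other country are deselected (dropped).
    return recoded


# Compare Norway (coded 1) with Sweden (coded 0); Denmark is deselected.
norway_vs_sweden = two_country_dummy(respondents, coded_one="NO", coded_zero="SE")
print(norway_vs_sweden)
```

The resulting list contains only the Norwegian and Swedish respondents, each paired with a numerical 1 or 0 that could serve as the independent variable.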
Dichotomous variables with 1 and 0 values are called ‘dummy variables’. Hence, the technique has been dubbed the dummy variable method. This is how we proceed: we choose one country as our ‘reference category’ and make one dummy variable for each of the other countries, so that k countries are represented by k − 1 dummy variables. A particular country’s dummy is coded as follows: persons are assigned the value 1 if they live in that country and the value 0 if they do not. Thus, those who belong to our reference country sample are assigned the value 0 on all dummy variables.
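As a minimal sketch of this coding scheme, the Python snippet below builds one dummy per non-reference country. The country codes, the `dummy_` naming, and the helper `make_dummies` are assumptions made for illustration; statistical packages offer their own tools for the same recoding.

```python
def make_dummies(countries, reference):
    """Return, for each person, one 0/1 dummy per non-reference country."""
    # Every country except the reference category gets its own dummy.
    others = sorted(set(countries) - {reference})
    return [
        {f"dummy_{c}": int(person == c) for c in others}
        for person in countries
    ]


# Hypothetical sample with three countries; Norway is the reference category.
countries = ["NO", "SE", "DK", "SE", "NO"]
dummies = make_dummies(countries, reference="NO")

for country, row in zip(countries, dummies):
    print(country, row)
```

With three countries, the sketch produces two dummies. Respondents from the reference country score 0 on both, and everyone else scores 1 on exactly one of them.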