Example, part one: Create dummy variables

If you choose to use the syntax, you should still read through the following text.

Syntax for example

* The following command causes the cases to be weighted by the design weight variable 'dweight'.

WEIGHT BY dweight.

* The following commands cause SPSS to select for analysis those cases that belong to the British, Polish or Norwegian sample (values GB, PL and NO on the country variable) and have lower values than 1975 on the birth year variable.
* In this process the commands create a filter variable (filter_$) with value 1 for the selected cases and value 0 for the non-selected cases.
* Change the last part of the COMPUTE command (the part after the first equals sign) if you wish to select other cases (if you do this, you should also change the variable label, which can be found within double quotation marks in the VARIABLE LABEL command).

COMPUTE filter_$=(cntry = 'GB' | cntry = 'PL' | cntry = 'NO') & yrbrn < 1975.
VARIABLE LABEL filter_$ "cntry = 'GB' or 'PL' or 'NO' & yrbrn < 1975 (FILTER)".
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.

* Use this command to create a dummy variable that assigns value 1 to members of the Polish sample and 0 to the other selected cases.

COMPUTE Poland = ANY(cntry,'PL').
VARIABLE LABELS Poland 'Lives in Poland'.

* Creates a dummy variable that assigns the value 1 to members of the British sample and 0 to the rest.

COMPUTE Greatbritain = ANY(cntry,'GB').
VARIABLE LABELS Greatbritain 'Lives in Great Britain'.

* Runs regression with the two dummy variables as independent variables and length of education as dependent variable.

/METHOD=ENTER Poland Greatbritain.
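
The line above seems to be only the final subcommand of a REGRESSION command. A minimal sketch of a complete command might look as follows. It assumes that length of education is measured by a variable named eduyrs; this name does not appear in the text, so replace it with the name used in your data file.

```spss
* Sketch only: 'eduyrs' is an assumed name for the length of education variable.
REGRESSION
  /DEPENDENT eduyrs
  /METHOD=ENTER Poland Greatbritain.
```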


Let us say that we wish to estimate the association between country and length of education and that we wish to use data from Great Britain, Poland and Norway. First, we must choose a reference country. This choice has some implications for what kind of information the analysis will produce. (It does not affect the assessment of whether or not there is an association between the variables but, of course, it affects what comparisons are made between individual countries.) You may, therefore, prefer to choose a reference country that is particularly suited as a basis with which to compare the other countries, but you may also prefer to choose a reference country that is represented by a relatively large sample, because this improves the accuracy of the regression coefficients as estimates of population coefficients.

There are no small country samples in the ESS data set, so you do not have to worry about sample sizes when comparing countries using ESS data, but other variables’ values may be distributed differently. Check how frequently the different values of a variable occur by going to ‘Descriptive statistics’ and ‘Frequencies’ on the ‘Analyze’ menu. Find the variables of interest on the variable list on the left, select them, put them in the ‘Variables’ field on the right and click ‘OK’. Frequency tables will appear in the output window. It is always advisable to check the value distributions of the different variables before you perform transformations on them or use them in regression analyses. Use histograms to check the distributions of metric variables.
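
If you prefer syntax to the menus, the checks described above could be sketched roughly as follows. The example uses the country and birth year variables from the selection commands; substitute the variables you are interested in.

```spss
* Frequency table for the nominal country variable.
FREQUENCIES VARIABLES=cntry.

* Histogram, without a frequency table, for the metric birth year variable.
FREQUENCIES VARIABLES=yrbrn
  /FORMAT=NOTABLE
  /HISTOGRAM.
```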

If some of the persons have missing values or ‘don’t know’ answers etc. on the nominal variable, you may also want to create a separate dummy variable for this category of persons, with value 1 for those who have missing values and value 0 for all the others. If you do not do this, you run the risk of including people with missing values in your reference category, which is probably not where you would want them to appear, or, alternatively, you will fail to utilise all your data, which is normally not a good idea if the number of missing values is large. (You may have to take special precautions to compute a dummy for persons with missing values.) Here, however, we present an example where there are no missing values.
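
One possible way of computing such a dummy is sketched below. The variable name nomvar and the new name nomvar_miss are hypothetical and serve only as illustration. The sketch relies on the MISSING function, which returns 1 when the argument is system-missing or user-missing and 0 otherwise.

```spss
* Sketch only: 'nomvar' is a hypothetical nominal variable with missing values.
* 'nomvar_miss' is a hypothetical name for the new dummy variable.
COMPUTE nomvar_miss = MISSING(nomvar).
VARIABLE LABELS nomvar_miss 'Missing value on nomvar'.
EXECUTE.
```

Note that dummies for the substantive categories may themselves come out as missing, rather than 0, for these persons if they are computed from a plain comparison, so it is worth checking their frequency tables afterwards.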

Say that for some reason we choose to compare Britons and Poles with Norwegians. We therefore choose the Norwegian sample as our reference category and create one dummy variable for each of the other two countries: one on which Britons are assigned value 1 while people from the other two countries are assigned 0, and one on which Poles are assigned 1 while Britons and Norwegians are assigned 0. These variables can be computed in several ways. Here, we propose the following procedure: Use the ‘Compute Variable’ module from the ‘Transform’ menu. After opening the dialogue box, give the new variable a name (e.g. Greatbritain; variable names cannot contain spaces) and a label (e.g. ‘Lives in Great Britain’). Then, on the right-hand side of the equals sign, type ANY(cntry,'GB'), or select the ANY(?,?) function from the ‘Functions and special variables’ list by clicking it before clicking the arrow. Then replace the first question mark with the country variable’s name and the second question mark with the string value for Britons (GB) in single quotes. (See how to get information about value codes here.) Finally, click ‘OK’.

Figure 13. Computing a dummy variable

Repeat the procedure to create a variable for members of the Polish sample. Check Figure 13 to see which changes you have to make to the commands in the dialogue box to create this second dummy variable. If you want to compare additional countries with the reference country, you must include them in the active data set and create dummy variables for each one of them. You can also compare groups of countries, but in this case you must use the population size weight in order to take account of their different population sizes. A link to information about the uses of this weight can be found here. See also our discussion of problems associated with pooling of country samples in chapter 7.
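
As a rough sketch, weighting by population size when comparing groups of countries might look like this. It assumes that the population size weight is stored in a variable named pweight and that it should be multiplied by the design weight; both points are assumptions, so check the ESS weighting information before relying on them. The name comb_weight is hypothetical.

```spss
* Sketch only: 'pweight' is assumed to be the name of the population size weight.
* 'comb_weight' is a hypothetical name for the combined weight.
COMPUTE comb_weight = dweight * pweight.
WEIGHT BY comb_weight.
```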

The coding of our two new dummy variables is displayed in Table 7.

Table 7. Nominal variable’s categories’ codes on dummy variable sets
                     Dummy for Poland    Dummy for Great Britain
Value Poles          1                   0
Value Britons        0                   1
Value Norwegians     0                   0
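
To check that the new variables are coded as intended, you could cross-tabulate the country variable with each dummy. The following is a minimal sketch that assumes the filter and the two dummy variables from the syntax example are in place.

```spss
* Cross-tabulate country by each dummy to check the coding among the selected cases.
CROSSTABS
  /TABLES=cntry BY Poland Greatbritain.
```

Each country should appear in only one column of each table: Poles under value 1 on Poland, Britons under value 1 on Greatbritain, and Norwegians under value 0 on both.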
