Chapter 7: Regression based on samples from several countries

What if we wish to use data from several countries simultaneously in a multiple regression analysis? This may create problems because the ESS survey sample is stratified, with country as the stratifying variable, while the SPSS ordinary linear regression module presupposes a non-stratified, simple random sample. SPSS offers two extensions of linear regression analysis that may alleviate this problem: a module for complex survey analysis and a mixed models module that handles multilevel analysis. It is worth examining what these modules offer if you plan to do regression analysis on data from many countries.

If you only use individual-level variables and data from a few countries, ordinary linear regression analysis may be an admissible option, but then you may have to take special precautions. You could, for example, weight the cases by the product of the ESS design weight and the ESS population size weight. (Use the ‘Compute’ feature in the ‘Transform’ menu to compute the product.) However, weighting the cases without making adequate adjustments to the standard error estimates may invalidate the statistical tests. Alternatively, to get more accurate statistical tests, you could skip weighting and enter country dummy variables as independent variables in your model.
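The two precautions mentioned above (a combined weight, or country dummies instead of weights) can be sketched outside SPSS as follows. This is a minimal Python illustration, not the SPSS procedure itself. It assumes the data file names the country, design weight and population size weight variables cntry, dweight and pweight; the name comb_weight, the helper functions and all data values are hypothetical and chosen only for the example.

```python
# Illustrative records only; the weights below are invented, not real ESS values.
respondents = [
    {"cntry": "DE", "dweight": 1.2, "pweight": 2.0},
    {"cntry": "NO", "dweight": 1.0, "pweight": 0.25},
    {"cntry": "SE", "dweight": 0.8, "pweight": 0.5},
]


def add_combined_weight(rows, design="dweight", population="pweight",
                        out="comb_weight"):
    """Return copies of the rows with the product of the two weights added."""
    return [{**row, out: row[design] * row[population]} for row in rows]


def add_country_dummies(rows, country="cntry", reference=None):
    """Return copies of the rows with one 0/1 dummy per non-reference country."""
    countries = sorted({row[country] for row in rows})
    if reference is None:
        reference = countries[0]  # assumption: first country alphabetically
    dummy_countries = [c for c in countries if c != reference]
    return [
        {**row,
         **{f"{country}_{c}": int(row[country] == c) for c in dummy_countries}}
        for row in rows
    ]


# Option 1: weight the cases by the product of the two weights.
weighted = add_combined_weight(respondents)

# Option 2: skip weighting and add country dummies (DE is the reference here).
with_dummies = add_country_dummies(respondents, reference="DE")

for row in weighted:
    print(row["cntry"], row["comb_weight"])
for row in with_dummies:
    print(row["cntry"], row["cntry_NO"], row["cntry_SE"])
```

One country is left out as the reference category, so a model with an intercept is not overparameterised. Which country serves as the reference is a modelling choice; the sketch simply takes it as an argument.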

However, the dummy variable solution presupposes that the associations between the dependent and independent variables are constant across countries, which is frequently not the case. Therefore, the best ordinary linear regression solution might be to drop weighting and use regression models that allow the regression slope coefficients to vary between countries. This can be achieved either by running separate regression analyses for each country (which brings us back to the solutions discussed in the previous chapters) or by supplementing our models with so-called interaction terms (which are computed by multiplying each country dummy by every other independent variable, or at least by every other variable whose association with the dependent variable varies substantially between countries). The total number of terms and coefficients in such a model may become excessively high if we include many independent variables and countries. We therefore recommend that, if the number of countries and interaction terms grows large, and in particular if you want to assess the weighted mean association between variables across a group of countries rather than country-specific associations, you drop ordinary linear regression and use the ‘General linear model’ program under SPSS’s ‘Complex samples’ module instead. On the next page, we will demonstrate how you can perform regression analyses with interaction terms.
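To make the construction of interaction terms concrete, and to show how quickly their number grows, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the SPSS procedure: the variable names (cntry_NO, cntry_SE, agea, eduyrs), the helper functions and the data values are all hypothetical, and the third country is treated as the reference category with no dummy of its own.

```python
# Illustrative rows: two country dummies plus two individual-level predictors.
rows = [
    {"cntry_NO": 1, "cntry_SE": 0, "agea": 40, "eduyrs": 12},
    {"cntry_NO": 0, "cntry_SE": 1, "agea": 55, "eduyrs": 16},
    {"cntry_NO": 0, "cntry_SE": 0, "agea": 30, "eduyrs": 10},  # reference country
]


def add_interactions(data, dummies, predictors):
    """Return copies of the rows with dummy * predictor products added."""
    result = []
    for row in data:
        new_row = dict(row)
        for d in dummies:
            for x in predictors:
                # The product equals the predictor inside country d, 0 elsewhere.
                new_row[f"{d}_x_{x}"] = row[d] * row[x]
        result.append(new_row)
    return result


def count_terms(n_countries, n_predictors):
    """Number of coefficients besides the intercept in a full interaction model."""
    n_dummies = n_countries - 1  # one country is the reference category
    return n_dummies + n_predictors + n_dummies * n_predictors


with_interactions = add_interactions(
    rows, dummies=["cntry_NO", "cntry_SE"], predictors=["agea", "eduyrs"]
)
for row in with_interactions:
    print(row)

# Three countries and two predictors versus twenty countries and ten predictors.
print(count_terms(3, 2), count_terms(20, 10))
```

In such a model, the coefficient of a plain predictor is its slope in the reference country, and each interaction coefficient is the difference between a given country's slope and that reference slope. The count function shows why a full set of interactions quickly becomes unwieldy as countries and predictors are added.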