Analysing cross-sectional survey data using linear regression methods: A 'hands-on' introduction using ESS data
By Associate Professor Odd Gåsdal
To follow the instructions and solve the exercises in this topic, you need a copy of SPSS installed on your computer, and you should download and use the dataset 'Regression'.
It is recommended that you start with the first chapter and then work through the chapters in order. In this way, you will develop your skills gradually.
The main purpose of linear regression analysis is to assess associations between dependent and independent variables. In this chapter, you will learn the basic idea behind this technique. You will also learn how to create a graphic presentation of the association between two variables.
Visual inspection of regression lines may be convenient, but the steepness and direction of a line are usually reported as numbers rather than shown in figures. These numbers are called regression coefficients, and this chapter teaches you how to compute them.
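The course carries out these computations in SPSS. As a minimal sketch of what happens behind the scenes, the Python function below applies the ordinary least-squares formulas for one independent variable: the slope is the covariation of x and y divided by the variation of x, and the intercept makes the line pass through the point of means. The function name `simple_regression` and the toy data are illustrative assumptions, not part of the course material or the ESS file.

```python
def simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Covariation of x and y, and variation of x, around their means
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical toy data, not taken from the ESS dataset
intercept, slope = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(intercept, 2), round(slope, 2))  # 2.2 0.6
```

A slope of 0.6 means that the predicted value of y increases by 0.6 units for each one-unit increase in x; a negative slope would indicate a downward-sloping line.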
In the preceding chapters, we assumed that a straight line (a linear function) can describe the association between variables. However, linear models do not always fit the data well. This chapter shows how the fit can be improved by using a curved regression line instead of a straight one.
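One common way to obtain a curved regression line, assumed here for illustration (the chapter may use another transformation), is to add the square of the independent variable as an extra term, so that the model becomes y = b0 + b1·x + b2·x². The sketch below fits this model in plain Python by solving the least-squares normal equations with Gaussian elimination. The function names are hypothetical helpers, and the data are made up.

```python
def solve_linear_system(a, b):
    """Solve the square system a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix; copying keeps the caller's lists unchanged
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Choose the row with the largest pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular (or nearly singular) system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef


def quadratic_regression(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns [b0, b1, b2]."""
    rows = [[1.0, xi, xi ** 2] for xi in x]  # design matrix with a squared term
    k = 3
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data lying exactly on y = 1 + 0.5x - 2x^2
x = [-2, -1, 0, 1, 2, 3]
y = [1 + 0.5 * xi - 2 * xi ** 2 for xi in x]
print([round(c, 6) for c in quadratic_regression(x, y)])  # [1.0, 0.5, -2.0]
```

The model is still linear in its coefficients, which is why the ordinary least-squares machinery applies unchanged; only the extra x² column bends the fitted line.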
Is it possible, by analysing survey data, to learn anything about those who did not participate? The answer is yes, provided that the survey was properly conducted on the basis of a random sample. To establish how accurate a single sample-based regression coefficient is as an estimate of the population coefficient, we need to know the size of its standard error. A standard regression analysis computes an estimate of the standard error, and this chapter shows how we can use this estimate to test the so-called null hypothesis about the value of the population regression coefficient.
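For a regression with a single independent variable, the estimated standard error of the slope is the square root of the residual variance (residual sum of squares divided by n − 2) divided by the variation of x. Dividing the slope by its standard error gives the t statistic for the null hypothesis that the population slope is zero. The Python sketch below, with a hypothetical function name and made-up data, reproduces this calculation; SPSS reports the same quantities in its coefficients table.

```python
import math


def slope_t_test(x, y):
    """Return (slope, standard_error, t) for a simple linear regression of y on x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients)
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(ssr / (n - 2) / sxx)
    if standard_error == 0:
        raise ValueError("perfect fit: the standard error is zero")
    # t statistic for the null hypothesis that the population slope is 0
    t = slope / standard_error
    return slope, standard_error, t


# Hypothetical toy data
slope, se, t = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(slope, 2), round(se, 3), round(t, 2))  # 0.6 0.283 2.12
```

The t value is compared with a t distribution with n − 2 degrees of freedom. In this tiny example there are only 3 degrees of freedom, so t = 2.12 falls short of the two-sided 5% critical value of about 3.18, and the null hypothesis is not rejected; with the large samples typical of the ESS, the critical value is close to 1.96.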
If we wish to include ‘country’ as an independent variable in our regression analysis, we face the problem that country is not a metric variable but a nominal one: there is no meaningful, generally applicable way to rank countries of residence numerically. The solution is to recode the country variable into a set of dichotomous (dummy) variables. This chapter explains how this can be done.
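The recoding creates one 0/1 variable for every country except one, the reference category; including a dummy for every category would make the dummies perfectly collinear with the intercept. The sketch below shows the idea in Python. The function `dummy_code` and the two-letter country codes are hypothetical; in SPSS the same result is usually obtained with RECODE or COMPUTE statements.

```python
def dummy_code(values, reference):
    """Recode a nominal variable into 0/1 dummy variables.

    One dummy is created for every category except `reference`, which
    becomes the baseline that the dummy coefficients are compared with.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


# Hypothetical country codes for six respondents
country = ["PL", "NO", "DE", "PL", "DE", "NO"]
dummies = dummy_code(country, reference="PL")
print(dummies)
# {'DE': [0, 0, 1, 0, 1, 0], 'NO': [0, 1, 0, 0, 0, 1]}
```

In a regression with these two dummies, each dummy coefficient estimates the difference between that country and the reference country (here PL), holding the other variables in the model constant.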
Dependent variables are almost always associated with more than one other variable, and some of these variables will normally be associated with each other. We cannot distinguish the part of the association they have in common from the part that is unique to each of them unless we include them all in the same regression analysis. This is what multiple regression, that is, regression analysis with more than one independent variable, allows us to do. This chapter provides an introduction to multiple regression using an example based on Polish data.
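With two independent variables, the least-squares coefficients have a closed form based on centred sums of squares and cross-products. The Python sketch below uses it to illustrate the point about common and unique associations. The function name and the data are hypothetical (they are not the Polish ESS data). Here y is constructed as exactly 1 + 2·x1 + 3·x2, and x1 and x2 are correlated.

```python
def two_predictor_regression(x1, x2, y):
    """OLS estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centred sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear")
    # Solve the 2x2 normal equations by Cramer's rule
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical, correlated independent variables
x1 = [1, 2, 3, 4, 5]
x2 = [1, 1, 2, 3, 3]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print(two_predictor_regression(x1, x2, y))  # (1.0, 2.0, 3.0)
```

Regressing y on x1 alone with the same data gives a slope of 3.8 rather than 2, because x1 then also picks up the part of the association that it shares with x2. Including both variables separates the unique contribution of each.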
One of the problems we have to address if we wish to use data from several countries simultaneously in a multiple regression analysis is that the associations between the dependent and independent variables may not be constant across countries. We can deal with this problem by running a separate regression analysis for each country, or by adding so-called interaction terms to our models. This chapter demonstrates how to perform regression analyses using interaction terms.
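An interaction term is the product of two independent variables, here a country dummy d and a metric variable x. In the model y = b0 + b1·x + b2·d + b3·(d·x), the slope of x is b1 in the reference country and b1 + b3 in the other country, so b3 measures how much the association differs between them. With a single 0/1 dummy, the least-squares fit of this model is identical to fitting a separate line in each country, and the Python sketch below exploits that equivalence to recover the four coefficients. The function names and data are hypothetical.

```python
def ols_line(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def interaction_model(x, y, d):
    """Coefficients (b0, b1, b2, b3) of y = b0 + b1*x + b2*d + b3*(d*x), with d coded 0/1.

    The fully interacted model fits exactly the same two lines as separate
    regressions in each group, so its coefficients follow from the group fits.
    """
    ref_x = [a for a, g in zip(x, d) if g == 0]
    ref_y = [b for b, g in zip(y, d) if g == 0]
    oth_x = [a for a, g in zip(x, d) if g == 1]
    oth_y = [b for b, g in zip(y, d) if g == 1]
    a0, s0 = ols_line(ref_x, ref_y)  # reference country
    a1, s1 = ols_line(oth_x, oth_y)  # the other country
    # b2: difference in intercepts; b3: difference in slopes (the interaction effect)
    return a0, s0, a1 - a0, s1 - s0


# Hypothetical data: y = 1 + 2x where d = 0, and y = 3 + 0.5x where d = 1
x = [0, 1, 2, 3, 0, 1, 2, 3]
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 3, 5, 7, 3, 3.5, 4, 4.5]
print(interaction_model(x, y, d))  # (1.0, 2.0, 2.0, -1.5)
```

The negative b3 of −1.5 says that the slope of x is 1.5 units smaller in the second country (0.5) than in the reference country (2). With more countries or additional control variables, the equivalence with separate regressions no longer holds in general, and the interaction model has to be estimated directly, for example with SPSS's REGRESSION procedure.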
Linear regression analysis presupposes that the variables are metric. In surveys, however, a large proportion of the variables are ordinal, and many of them are measures of attitudes. Individual attitude measures tend to be inaccurate, either because each of them captures only particular aspects of the general attitude we wish to measure, or because people’s answers to single attitude questions are plagued by random errors. Both problems can be alleviated somewhat by combining the values of several indicator variables into scales. This chapter demonstrates how such scales can be built.
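A simple way to combine indicators, assumed here for illustration, is to average each respondent's answers, and a common check of whether the items hang together is Cronbach's alpha: alpha = k/(k − 1) · (1 − sum of item variances / variance of the summed score), where k is the number of items. The chapter may use other procedures; the function names and data below are hypothetical. The items must be coded in the same direction, so negatively worded questions need to be reversed first.

```python
from statistics import fmean, variance


def build_scale(items):
    """Average several indicator variables into one scale score per respondent.

    `items` is a list of variables, each a list of answers from the same respondents.
    """
    return [fmean(answers) for answers in zip(*items)]


def cronbach_alpha(items):
    """Cronbach's alpha for the summed scale of the given items."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(answers) for answers in zip(*items)]
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical answers from three respondents to two attitude questions
items = [[1, 2, 3], [1, 3, 2]]
print(build_scale(items))
print(round(cronbach_alpha(items), 3))  # 0.667
```

Alpha approaches 1 when the items vary together and falls towards 0 when they are unrelated, which is one way to judge whether combining them into a single scale is justified.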