The estimation of the quality of survey questions

There are many different procedures for estimating the quality of questions and of measures for complex concepts. The best known is perhaps the test-retest design [Lor68] for estimating the reliability of questions. An adaptation of this approach is the quasi-simplex model [Hei69], [Wil70], used by [Alw91] and [Alw07]. The Multitrait-Multimethod (MTMM) design was suggested in order to take into account the effects of the method used [Cam59]. It was further developed for survey questions by Andrews [And84] and others. For concepts with multiple indicators, different procedures have been developed based on latent variable models such as factor analysis [Law71], [Har76] and latent class analysis [Hag88], [Ver03], [Bie11]. Furthermore, scaling methods have been developed, such as the Thurstone scale and the Likert scale [Tor58], the Guttman scale and the Mokken scale [Mok71], the unfolding scale [Sch97], the Rasch scale [Ras60] and Item Response Theory [Ham91]. For the advantages and disadvantages of these different procedures, we refer to this literature.

All these procedures require at least two questions to estimate the quality of each concept. That means that the number of questions has to be at least twice the number of concepts one wishes to take into account in the analysis. As a result, these procedures lead to rather costly and time-consuming research involving rather complex procedures. Moreover, the estimates they provide apply only to the specific formulation of the questions, as used in a specific questionnaire and context. Thus, generalization is not easily possible.

This means that a lot of research has to be done before the final data collection in order to correct for measurement errors in all variables in the study. This is so much work that it is seldom done. So, the question is whether there is a less time-consuming and less expensive procedure for estimating the quality of survey questions and of composite scores for concepts with multiple indicators.

From the very start of the European Social Survey (ESS), Saris has emphasized that the measures will contain errors and that, without correction for these errors, the results will be questionable and incomparable across countries. Therefore, since the beginning of 2002, each survey in the ESS has contained four to six MTMM experiments to evaluate the quality of the questions.1 These MTMM experiments were carried out in most countries and all rounds. An example of such an experiment was presented in the first chapter.

In the normal MTMM experiment suggested by [Cam59], the respondent has to provide responses to three different questions (i.e. traits), each measured using three different methods [And84]. Because respondents have to answer the same question approximately three times, memory effects can be expected. In order to cope with these memory effects, [Sar04] suggested that the sample be randomly split into different subgroups, with the same question asked only twice in each group. This design was named the Split-Ballot Multitrait-Multimethod (SB-MTMM) design. [Sar04] also showed that this design enables estimation of the reliability, the validity (the complement of the method effect) and the quality of each question.2 In recent years, all experiments in the ESS rounds have been analysed using the SB-MTMM procedure. Consequently, after the first three ESS rounds, more than 250 SB-MTMM experiments have been conducted in more than 20 countries (languages), including approximately 2,700 questions. Thus, thanks to the results obtained after the first three rounds of the ESS, together with the results of previous and simultaneous MTMM analyses done by other research agencies, the reliability, validity and quality of 3,726 questions are now known. However, this information is not enough because, over the same period, more than 62,000 questions were asked in the ESS about values, norms, policy preferences, feelings etc., the quality of which was not analysed. Thus, a different approach was required.
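
The random split at the heart of the split-ballot idea can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not the procedure of [Sar04] or of the ESS: the function name `assign_split_ballot`, the method labels `M1` to `M3` and the choice of three groups, each receiving a different pair of methods, are hypothetical. The point is simply that every respondent sees each question in only two forms rather than three.

```python
import random
from itertools import combinations


def assign_split_ballot(respondent_ids, methods=("M1", "M2", "M3"), seed=0):
    """Randomly split respondents into groups; each group gets two methods.

    Hypothetical helper for illustration only: the group layout (one group
    per pair of methods) is an assumption, not the documented ESS design.
    """
    # All pairs of methods, e.g. (M1, M2), (M1, M3), (M2, M3)
    pairs = list(combinations(methods, 2))

    # Shuffle a copy of the respondent list with a reproducible seed
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)

    # Deal respondents round-robin so group sizes differ by at most one
    return {rid: pairs[i % len(pairs)] for i, rid in enumerate(shuffled)}


# Example: nine respondents, three groups of three
assignment = assign_split_ballot(range(9), seed=1)
for rid in sorted(assignment):
    print(rid, assignment[rid])
```

Because no single group answers all three forms, the full set of correlations between methods is never observed within one group; the estimation step that pools information across groups is not shown here.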
