From estimation to prediction

The new idea was to code the characteristics of the questions involved in the experiments and, with this knowledge, to develop an algorithm to predict the quality of the questions [Sar07]. If that prediction proved successful, the same algorithm could be used to predict the quality of any other question, including questions not involved in the experiments. Work on this new approach has been subsidized by the European Commission for infrastructure research and has led to the development of the program Survey Quality Predictor (SQP). It contains the quality information relating to the questions involved in the MTMM experiments, but can also be used to predict the quality of questions from other studies. For the complete report on the development of SQP 2.0, we refer to [Sar14] and [Sar11]. Here, we mention only the basic steps of this process.

The first step was to use the characteristics and the context of the questions involved in MTMM experiments as predictors of their quality. To that end, a program was developed to code the characteristics of these questions. For details of these characteristics, we refer to [Sar11]. The questions were coded by people who spoke the languages involved in the experiments and who also understood English. This was a laborious task, but the results were quite rewarding, as we will show below.

The next step was to choose a procedure to study the relationship between the question characteristics and the quality estimates for these questions. For this purpose, we did not choose the regression model used in the past [Sar07] but the ‘random forest’ approach [Bre01], because it had been suggested as the most efficient prediction procedure for this kind of problem.
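
To make the procedure concrete, the following is a minimal sketch of a random forest for regression in plain Python: each tree is grown on a bootstrap sample of the coded questions, each split considers only a random subset of the characteristics, and the forest prediction is the average over the trees. It is not the SQP implementation. The three characteristics (number of answer categories, question length in words, presence of a "don't know" option), the synthetic quality values and all function names are assumptions made purely for illustration.

```python
import random


def make_synthetic_questions(n, seed):
    """Invented data: [n_categories, n_words, has_dont_know] -> quality score."""
    rng = random.Random(seed)
    rows, quality = [], []
    for _ in range(n):
        n_categories = rng.randint(2, 11)
        n_words = rng.randint(5, 40)
        has_dont_know = rng.randint(0, 1)
        rows.append([n_categories, n_words, has_dont_know])
        # Arbitrary made-up relationship plus noise; NOT an empirical finding.
        quality.append(0.5 + 0.03 * n_categories - 0.004 * n_words
                       + rng.gauss(0, 0.02))
    return rows, quality


def _mean(values):
    return sum(values) / len(values)


def build_tree(rows, targets, rng, n_try, depth=0, max_depth=4, min_size=5):
    """Grow one regression tree; each split looks at n_try random features."""
    if depth >= max_depth or len(rows) < 2 * min_size:
        return _mean(targets)  # leaf: mean quality of the questions in it
    best = None  # (sum of squared errors, feature, threshold, left, right)
    for f in rng.sample(range(len(rows[0])), n_try):
        for threshold in sorted(set(r[f] for r in rows))[:-1]:
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if min(len(left), len(right)) < min_size:
                continue
            sse = 0.0
            for idx in (left, right):
                m = _mean([targets[i] for i in idx])
                sse += sum((targets[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left, right)
    if best is None:
        return _mean(targets)
    _, f, threshold, left, right = best
    children = [
        build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                   rng, n_try, depth + 1, max_depth, min_size)
        for idx in (left, right)
    ]
    return (f, threshold, children[0], children[1])


def predict_tree(node, row):
    """Follow the splits until a leaf (a plain number) is reached."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def fit_forest(rows, targets, n_trees=30, n_try=2, seed=0):
    """Fit n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], rng, n_try))
    return forest


def predict_forest(forest, row):
    """Forest prediction: the average of the individual tree predictions."""
    return _mean([predict_tree(tree, row) for tree in forest])


rows, quality = make_synthetic_questions(150, seed=1)
forest = fit_forest(rows, quality)
# Hypothetical new question: 5 categories, 12 words, with a "don't know" option.
print(round(predict_forest(forest, [5, 12, 1]), 2))
```

Once fitted, such a forest can score a question that was never part of the training data, which is the property that makes the approach usable for questions outside the MTMM experiments.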

It turned out that this procedure provided rather good predictions of the reliability and validity of the questions [Obe11]: the explained variance (R²) was 0.65 for reliability and 0.84 for validity.
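
As a reminder of what the explained variance expresses, the sketch below computes it under the standard definition, 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name and the four example values are invented for illustration and are not SQP results.

```python
def r_squared(observed, predicted):
    """Explained variance: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented quality estimates and predictions for four questions.
observed = [0.60, 0.70, 0.80, 0.90]
predicted = [0.65, 0.70, 0.75, 0.90]
print(round(r_squared(observed, predicted), 2))  # about 0.9
```

A value of 0.84 therefore means that the predictions leave 16% of the variance in the quality estimates unexplained.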
