Transformation equation



(equation 1):


Here are the steps to get there:

  1. A standard linear equation takes this format:

  2. Putting in the two points we know, we have:

  3. With two unknowns, these equations can be solved:

  4. Putting m and c back into the original equation we have:

  5. When the minimum and maximum values for the competence indicator are included, equation 1 emerges.
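The equations for these steps are not reproduced here, so what follows is a reconstruction rather than the original working. It assumes the transformed score t runs from 0 to 10 (implied by the mid-point of 5 discussed next), with the minimum z-score mapping to 0 and the maximum to 10. The symbols t and z, and the subscripted min_i and max_i for indicator i, are labels chosen for this sketch.

```latex
\begin{align*}
t &= m z + c \\
0 &= m \,\mathrm{min}_i + c, \qquad 10 = m \,\mathrm{max}_i + c \\
m &= \frac{10}{\mathrm{max}_i - \mathrm{min}_i}, \qquad
c = -\frac{10 \,\mathrm{min}_i}{\mathrm{max}_i - \mathrm{min}_i} \\
t &= \frac{10 \,(z - \mathrm{min}_i)}{\mathrm{max}_i - \mathrm{min}_i}
\end{align*}
```

Substituting a particular indicator's minimum and maximum into the last line would give that indicator's version of the equation.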

As you can see, if you put in a z-score of 0, the t-score will not be 5 unless the maximum and minimum values are of the same magnitude (e.g. 1.2 and -1.2), which they never will be for the standardised scores from survey questions. Instead, the minimum is almost always further from 0 (the mean) than the maximum is. This is because the mean tends to lie above the mid-point of the range of untransformed response codes, a problem related to negative skew. For example, in response to the question “how much of the time during the past week could you not get going”, most people responded ‘never’ or ‘some of the time’, the bottom two responses. Very few people chose the other two responses (‘most of the time’ or ‘all of the time’).
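To make the mid-point problem concrete, here is a minimal sketch of a purely linear transformation onto a 0 to 10 scale. The function name and the minimum and maximum values are made up for illustration and are not taken from the survey data.

```python
def linear_transform(z, z_min, z_max):
    """Map z linearly so that z_min goes to 0 and z_max goes to 10."""
    return 10 * (z - z_min) / (z_max - z_min)


# Symmetric range: a z-score of 0 lands on the mid-point of the scale.
symmetric = linear_transform(0.0, -1.2, 1.2)

# Asymmetric range (minimum further from 0 than the maximum):
# a z-score of 0 now lands above the mid-point.
skewed = linear_transform(0.0, -2.4, 1.2)

print(symmetric, skewed)
```

With the symmetric range the mean maps to 5. With the asymmetric range it maps to roughly 6.7, so the mean no longer sits halfway along the transformed scale.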

As a result, if we want the mean untransformed score to always be halfway along our transformed distribution, then we have to stretch parts of that distribution. The smoothest way to do this is with a linear stretch function, such that the end of the distribution closer to the mean is stretched whilst the end further from the mean is compressed. Including a linear stretch makes the transformation curvilinear. The function takes the following form:

If you are feeling brave, why not try to work out what the constants m, d and c are for a given indicator i with a minimum z-score of min_i and a maximum z-score of max_i?

Any luck?

The trick is to start by putting in the three known points and trying to solve the equations that drop out.
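As a sketch of that trick, the code below assumes the stretch function is a quadratic in z, t = m*z**2 + d*z + c. The exact form is not shown here, so the quadratic is an assumption, as are the 0 to 10 scale, the function names and the example values. Under those assumptions the three known points are (min_i, 0), (0, 5) and (max_i, 10).

```python
def solve_constants(z_min, z_max):
    """Fit t = m*z**2 + d*z + c through (z_min, 0), (0, 5), (z_max, 10).

    Assumes z_min < 0 < z_max. The quadratic form is an assumption.
    """
    # The point (0, 5) fixes c directly.
    c = 5.0
    # The other two points give a pair of simultaneous equations:
    #   m*z_min**2 + d*z_min = -5
    #   m*z_max**2 + d*z_max = 5
    denom = z_min * z_max * (z_max - z_min)
    m = 5.0 * (z_max + z_min) / denom
    d = -5.0 * (z_max**2 + z_min**2) / denom
    return m, d, c


def stretch_transform(z, z_min, z_max):
    """Apply the fitted quadratic to a z-score."""
    m, d, c = solve_constants(z_min, z_max)
    return m * z**2 + d * z + c


# Illustrative values only, not taken from the survey data.
for z in (-2.4, 0.0, 1.2):
    print(z, stretch_transform(z, -2.4, 1.2))
```

A quadratic through three points is not guaranteed to keep increasing across the whole range when the minimum and maximum are very unequal in magnitude, so the fitted curve is worth checking for each indicator.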

Well, it comes out like this:

The whole thing simplifies to this:

And looks something like this:

Transformation should always be done at the last possible moment before presenting data, as the curvilinear relationship can distort patterns. For example, to get the mean score for a particular country on a particular indicator, the average of the z-scores across all individuals should be calculated first and only then transformed, rather than transforming the z-scores for each individual and then taking the average.
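The following sketch shows why the order matters. It uses a hypothetical quadratic transformation with illustrative constants (a curve through (-2.4, 0), (0, 5) and (1.2, 10)) and made-up z-scores; none of these values come from the survey.

```python
from statistics import mean

# Hypothetical constants for t = M*z**2 + D*z + C; illustrative only.
M, D, C = 125 / 216, 125 / 36, 5.0


def transform(z):
    """Curvilinear (quadratic) transformation of a z-score."""
    return M * z**2 + D * z + C


z_scores = [-2.0, -0.5, 0.5, 1.0]  # made-up individual z-scores

# Recommended order: average the z-scores first, then transform once.
mean_then_transform = transform(mean(z_scores))

# Order to avoid: transform each individual, then average.
transform_then_mean = mean([transform(z) for z in z_scores])

print(mean_then_transform, transform_then_mean)
```

Because the transformation is curved, the two results differ. With this upward-curving example, averaging the transformed scores gives a higher value than transforming the average.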

Even taking these precautions, transformed scores must be treated with caution. The curvilinear transformation results in scores at one end of the distribution being stretched more than those at the other end. However, one should not be overly concerned about this distortion, because that concern presupposes that the original scales used in the questions are linear. Such faith would be ill-founded. For example, it is not necessarily the case that the difference between ‘all or almost all of the time’ (a response scored as ‘4’ for some questions) and ‘most of the time’ (scored as ‘3’) is the same as the difference between ‘most of the time’ (‘3’) and ‘some of the time’ (‘2’).
