# Centring of variables

It is well known that the multiple (OLS) regression model is invariant under linear transformations of the variables. If we transform the variables, the estimates change in a predictable way, and we can recalculate to recover the coefficients based on the untransformed variables (Hox 2010: 61-70). If we divide age by 10, for example, the regression coefficient and its standard error become 10 times larger. This simple correspondence only holds for models without polynomials, such as age and age squared, and without interaction terms (Aiken & West 1991). In models with such terms, the invariance is limited to the most complex term (the highest-order polynomial or the interaction effect) and does not extend to the main effects.
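The rescaling rule can be illustrated with a minimal sketch in Python (the text itself works with SPSS and Stata; the simulated data and variable names here are invented for the example). Dividing age by 10 leaves the intercept unchanged and multiplies the slope by exactly 10:

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(20, 70, size=200)
y = 5.0 + 0.3 * age + rng.normal(0, 1, size=200)

# Fit y = b0 + b1 * age by ordinary least squares.
X = np.column_stack([np.ones_like(age), age])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Refit with age divided by 10: the slope becomes 10 times larger,
# while the intercept is unaffected.
X10 = np.column_stack([np.ones_like(age), age / 10])
b10 = np.linalg.lstsq(X10, y, rcond=None)[0]

print(b[1], b10[1])
```

The same recalculation works in reverse: a coefficient estimated on the rescaled variable can be divided by 10 to recover the original one.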

Why would we want to transform the variables? The two most popular reasons are to obtain common metrics and to ensure that the regression constant is interpretable. In psychology, standardization of the variables is a common solution to both problems; centring is sufficient for the second purpose. Centring means measuring one or more variables in deviations from the mean (X^{c}_{i} = X_{i} - X̄). In regression models, the constant (intercept) is the predicted value of the dependent variable when all explanatory variables are set to zero. If a regression model contains age, the regression constant will be the predicted value for newborns (age = 0). To avoid this, we can measure age in deviations from the overall mean, or from some other value such as the minimum age in the data.
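The effect of grand mean centring on the intercept can be checked directly; this is a small Python sketch with simulated data (the source does not contain this example). After centring, the intercept equals the prediction for a person of average age, which is the same number as b0 + b1 · mean(age) in the uncentred model:

```python
import numpy as np

rng = np.random.default_rng(1)
age = rng.uniform(20, 70, size=500)
y = 5.0 + 0.3 * age + rng.normal(0, 1, size=500)

# Grand mean centring: deviations from the overall mean.
age_c = age - age.mean()

# Uncentred model: intercept is the (extrapolated) prediction at age 0.
b0_raw, b1_raw = np.linalg.lstsq(
    np.column_stack([np.ones_like(age), age]), y, rcond=None)[0]

# Centred model: intercept is the prediction at the average age.
b0_c, b1_c = np.linalg.lstsq(
    np.column_stack([np.ones_like(age), age_c]), y, rcond=None)[0]

print(b0_c)  # prediction for the average-aged person
```

Note that the slope is identical in both models; only the meaning of the constant changes.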

In linear multilevel models, this invariance only holds for variance component models. In situations with random coefficients (slopes), Hox (2010: 60-61) illustrates that linear transformations of the explanatory variable in question will affect the slope residuals, and hence the variance components. However, as his empirical example shows (p. 62-63), the -2 log likelihood statistic remains the same for models with the original scoring, with standardized values, and with centred variables. In other words, the models are equivalent. The variance components change, but so do their standard errors, so that the ratio of each variance to its standard error remains roughly the same.

In multilevel models, where variation in the intercepts is of interest, it is especially desirable that the intercepts refer to variable values actually represented in the data. If we grand mean centre all explanatory variables as explained above, the regression constant will be the predicted mean for a person with average values on all of them. The variances of the intercept and the slope can then be interpreted as the expected variances for the average person. Centring variables whose zero value lies outside the observed range is also advantageous when interpreting cross-level interactions.

An alternative to grand mean centring is to centre on the group mean. Assume that we have data from rounds 1 or 2 of the European Social Survey, where the countries are the level 2 units or groups. Assume further that household income is one of the explanatory variables. As income was measured in intervals, we can replace the interval number by the category mean. Whether we use the raw numbers, their natural logs or grand mean centring, we conflate two sources of income differences: differences within each country and differences between the countries. In such situations, group mean centring makes more sense than grand mean centring. For a more thorough discussion of centring, see Hox (2010: 68-69).

Special-purpose software for multilevel modelling, such as HLM and MLwiN, has options for automatic centring. In SPSS and Stata, grand mean centring has to be done manually by creating centred versions of variables, using *compute* in SPSS and *generate* in Stata. Group mean centring can be performed in one step in SPSS using the *aggregate* command, while in Stata the operation requires two steps: first, *collapse* creates a separate file with the aggregated variables, and then *merge* adds the aggregated file to the level 1 (individual) file.
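For readers working outside these packages, the same group mean centring can be done in one step in Python with pandas; this is an illustrative sketch with invented country codes and income values, not an example from the source:

```python
import pandas as pd

# Toy level-1 data: individuals nested in countries.
df = pd.DataFrame({
    "country": ["NO", "NO", "SE", "SE", "SE"],
    "income":  [30.0, 50.0, 20.0, 40.0, 60.0],
})

# Subtract each country's mean income in one step (the analogue of
# SPSS aggregate, or Stata's two-step collapse + merge).
df["income_gc"] = df["income"] - df.groupby("country")["income"].transform("mean")

print(df["income_gc"].tolist())  # [-10.0, 10.0, -20.0, 0.0, 20.0]
```

The `transform("mean")` call broadcasts each group mean back to the individual rows, so no separate aggregated file or merge is needed.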