# Statistical tests in multilevel analysis

The tests of interest are significance tests and confidence intervals for the regression coefficients and variance components, and likelihood ratio tests for the comparison of nested models. Some programs also compute fit indices that can be used to compare non-nested models.

### Testing regression coefficients

In OLS regression, statistical significance tests for the regression coefficients are based on the following null hypothesis:

$$H_{0}: \beta = 0 \quad \text{and} \quad H_{1}: \beta \neq 0$$

The test statistic is the t-ratio formed by dividing the estimated regression coefficient by its standard error:

$$t = \frac{b}{SE_{b}}$$

The sampling distribution of the t-statistic is Student's t-distribution with n - k degrees of freedom, where n is the sample size and k is the number of estimated parameters. Like the normal distribution, the t-distribution is symmetrical, but it is flatter, with heavier tails, when the degrees of freedom are low. As the sample size increases, the t-distribution approaches the normal distribution.

In multilevel analysis, the standard normal sampling distribution is assumed under the null hypothesis. This follows from the assumption of normally distributed residuals. Accordingly, the Stata procedures for estimating multilevel models, xtreg and xtmixed, report Z-statistics rather than t-statistics. The null hypothesis is identical to that of the t-test, and the Z-ratio is formed in the same way as the t-ratio:

$$Z = \frac{b}{SE_{b}}$$

In SPSS, the Z-statistic is replaced by the Wald statistic, which is Z squared and follows the chi-square distribution with one degree of freedom. Users of MLwiN have to compute the test statistic manually or request it from the menus. In HLM, a t-test with J - k - 1 degrees of freedom is used, where J is the number of groups (level 2 units) and k is the number of explanatory variables. This test is more conservative than the Z-test and the Wald test in small samples.
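The relationship between the Z-ratio and the Wald statistic can be illustrated with a minimal Python sketch. It uses only the standard library; the function name `z_test` and the coefficient and standard error values are hypothetical and chosen purely for illustration. It shows that the two-sided Z-test and the Wald test with one degree of freedom give the same probability value.

```python
import math

def z_test(estimate, std_error):
    """Return the Z-ratio, its two-sided p-value, the Wald statistic and its p-value."""
    z = estimate / std_error
    # Two-sided tail area of the standard normal: P(|Z| >= |z|)
    p_z = math.erfc(abs(z) / math.sqrt(2))
    # The Wald statistic is the squared Z-ratio
    wald = z ** 2
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_wald = math.erfc(math.sqrt(wald / 2))
    return z, p_z, wald, p_wald

# Hypothetical estimate and standard error (not taken from any real output)
z, p_z, wald, p_wald = z_test(0.50, 0.20)
print(f"Z = {z:.2f}, two-sided p = {p_z:.4f}")
print(f"Wald = {wald:.2f}, p = {p_wald:.4f}")
```

The sketch does not cover the t-test used in HLM, which needs the t-distribution with J - k - 1 degrees of freedom and therefore gives a somewhat larger probability value when the number of groups is small.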

### Testing variance components

Testing variance components is less straightforward, although the special-purpose software programs as well as SPSS and Stata produce estimates and standard errors. The Z-test is not satisfactory because the sampling distributions of the variance components are skewed rather than normal. Especially with few groups and small variance components, the sampling distribution is clearly non-normal, so the Z (and the Wald) statistic cannot be relied upon. Still, a variance estimate at least twice the size of its standard error may be taken as a first indication of statistical significance. There is agreement in the literature, however, that the best way of testing the statistical significance of variance components is the likelihood ratio test. This test can also be used to compare nested models, and it can be seen as a parallel to the F-test in OLS regression analysis.

### Comparing nested models

Nested models can be viewed as simplifications of a more general model achieved by removing one or more of its random or fixed parameters. From the likelihood functions, all software programs produce a statistic that shows how well the model fits the data. This statistic, sometimes called the ‘deviance’, is defined as -2*ln(Likelihood), or just -2LL. We do not know the sampling distribution of this statistic, but the difference between the -2LLs (or the ratio of the likelihoods) for two models can be assumed to be chi-square distributed, with the degrees of freedom determined by the difference in the number of parameters in the two models. In general, we assume that the lower the deviance, the better the model. In the definition of the test statistic, model K is the most general one, and model K-H the smaller one, nested within the former with H fewer parameters:

$$\chi^{2}_{H} = (-2LL_{K-H}) - (-2LL_{K})$$

Stata reports Log Likelihood rather than -2LL, and for Stata output the test statistic can be defined as follows:

$$\chi^{2}_{H} = -2(LL_{K-H} - LL_{K}) \quad \text{or} \quad \chi^{2}_{H} = 2(LL_{K} - LL_{K-H})$$

Note that, in the second version, the log likelihood for model K-H is subtracted from the log likelihood of model K. Rabe-Hesketh and Skrondal (2012, 88-89) argue that this test statistic is conservative when testing variance components, since they have a lower boundary of zero. This means that the test can be seen as one-sided, and the reported probability value can be divided by two. Note also that the -2LL statistic depends on the sample size. Valid tests therefore have to be based on identical sample sizes for the two models being compared.
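The likelihood ratio test described above can be sketched in Python from two reported log likelihoods. This is a minimal illustration using only the standard library: the helper names `chi2_sf` and `lr_test`, the `boundary` flag, and the log likelihood values are all hypothetical. The chi-square tail probability is computed with the closed-form expressions that hold for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite sum of odd powers of chi
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2 * density * total

def lr_test(ll_general, ll_nested, df, boundary=False):
    """Likelihood ratio test: 2 * (LL_K - LL_{K-H}) with df = H."""
    stat = 2 * (ll_general - ll_nested)
    p = chi2_sf(stat, df)
    if boundary:
        # Variance component tested at its lower bound of zero: halve the p-value
        p /= 2
    return stat, p

# Hypothetical log likelihoods for model K and the nested model K-H (H = 1)
stat, p = lr_test(-1250.3, -1253.1, df=1)
_, p_halved = lr_test(-1250.3, -1253.1, df=1, boundary=True)
print(f"chi-square = {stat:.2f}, p = {p:.4f}, halved p = {p_halved:.4f}")
```

Both log likelihoods must come from models fitted to the same observations. The halved probability value applies only when the parameter removed is a variance component.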