# Estimation of multilevel models

This non-technical description of the estimation procedures for multilevel models is largely based on Hox (2010, Chapter 3). Multilevel models are normally estimated by Maximum Likelihood (ML), Restricted Maximum Likelihood (RML) or Iterative Generalized Least Squares (IGLS) algorithms. The main idea behind ML estimation is to find the estimates of the model parameters that are most likely to have produced the observed data, i.e. the covariances and variances among the variables in the model. In large samples, ML estimates are reasonably robust against mild violations of assumptions, such as non-normal errors.

In the full information ML method, both the regression coefficients and the variance components are included in the likelihood function that is maximized. In the RML method, only the variance components are included in the likelihood function, and the regression coefficients are estimated in a second step. RML appears to produce less biased estimates of the variance components, especially in small samples, although the difference between the two methods is usually negligible. The ML method remains in use because it has other advantages over RML: it is computationally simpler and, since the regression coefficients are included in the likelihood function, likelihood ratio tests can be used to compare nested models that differ in the fixed part, i.e. in the number of regression coefficients.
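The bias difference between ML and RML variance estimates can be seen in a deliberately simple, non-multilevel setting. The sketch below (an illustrative analogy, not the actual multilevel algorithm) contrasts the ML variance estimate of an i.i.d. normal sample, which divides by n, with the RML-style estimate, which divides by n - 1 to account for the degree of freedom spent on estimating the mean:

```python
import random

# Illustrative sketch: ML vs. RML-style variance estimates for an i.i.d.
# normal sample. ML divides the sum of squares by n and is biased downward;
# the RML-style estimate divides by n - 1, correcting for the estimated
# mean. This mirrors, in miniature, why RML tends to give less biased
# variance components in small samples.

def ml_and_rml_variance(sample):
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    var_ml = ss / n          # full ML: biased downward by a factor (n - 1)/n
    var_rml = ss / (n - 1)   # RML-style: unbiased
    return var_ml, var_rml

random.seed(1)
# Average both estimators over many small samples (n = 5) from N(0, 1),
# whose true variance is 1.
n, reps = 5, 20000
ml_avg = rml_avg = 0.0
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    v_ml, v_rml = ml_and_rml_variance(sample)
    ml_avg += v_ml / reps
    rml_avg += v_rml / reps

print(ml_avg, rml_avg)  # ML averages near (n - 1)/n = 0.8, RML near 1.0
```

With n = 5 the ML estimate undershoots the true variance by about 20 percent on average, while the RML-style estimate is centered on the true value; the gap shrinks as n grows, matching the observation above that the difference is usually negligible.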

ML estimation requires an iterative procedure. The starting values for the regression coefficients are typically taken from OLS regression estimates, with the variance components set to zero. Each iteration attempts to improve on the current estimates, after which the likelihood function is re-evaluated and the next iteration begins. This continues until the process converges, i.e. until the changes in the estimates fall below a small threshold. Sometimes, however, a model fails to converge. The most common cause is the inclusion of variance components that are close to zero, and the usual remedy is to simplify the random part of the model.

Maximum likelihood estimation is a complex technical subject, so an illustration of the principle of ML estimation in a very simple situation can be helpful. More thorough explanations are found in Rabe-Hesketh and Skrondal (2012, Sections 2.10-2.11) and in Raudenbush and Bryk (2002, Chapter 3).
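One such very simple situation is estimating the probability of heads from a sequence of coin tosses. The sketch below (an illustrative example with assumed data: 7 heads in 10 tosses) evaluates the log-likelihood over a grid of candidate probabilities and picks the value under which the observed data are most probable, which is exactly the ML principle stated above:

```python
import math

# Illustrative example of the ML principle: which probability of heads makes
# the observed data (7 heads in 10 tosses) most likely? We scan a grid of
# candidate values and keep the one with the highest log-likelihood.

heads, tosses = 7, 10

def log_likelihood(p):
    # Binomial log-likelihood; the constant binomial coefficient is omitted
    # because it does not affect which p maximizes the function.
    return heads * math.log(p) + (tosses - heads) * math.log(1 - p)

grid = [i / 1000 for i in range(1, 1000)]   # candidate values 0.001 .. 0.999
p_hat = max(grid, key=log_likelihood)

print(p_hat)  # the maximum sits at 0.7, the sample proportion of heads
```

The grid search stands in for the iterative maximization used in real software: both search the parameter space for the value that maximizes the likelihood of the observed data.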