
# Chapter 4: Wage determination in firms

This chapter will conclude our analysis of wage determination in firms. We will follow the recommended strategy in four steps, as outlined above. The questions can be answered using SPSS Mixed or Stata xtmixed.

### Describing the data

Let us start with the wage determination example for which we have already estimated OLS regression models. The first lines of the file (SPSS version) are shown below:

Figure 4.1.

The first variable is the firm identifier and the second the employee identification code. Note that the file is sorted by the firm identifier, so that the employees are nested within the firms. There are five employees in the first firm, three in the second, and four in the third. The rest of the variables should be familiar, except for the firm level variables: *private*, *size* and *profit*. The first distinguishes private sector firms (private=1) from public sector organizations (private=0). The second is firm size, measured by the number of employees, and the third is profit per employee, measured in the same years as the other variables.

Before we start, describe the variables by using *Describe* in SPSS or *Summarize* in Stata.

We have information for all employees for most variables, with three exceptions. Wage is only reported for 3,759 employees, EGP classes for 4,042, and profit for only 1,143 firms. The reason for the latter is that profit is not available for non-profit organizations such as the public administration.

# Step 1: Estimate the null model

Using SPSS Mixed

#### SPSS menus

We find the Mixed procedure under Analyze in the menu:

Figure 4.4.

This opens the Linear Mixed Models dialogue box, where Subjects (level 2 unit identifier) have to be defined as shown here:

Figure 4.5.

Next, press *Continue* and define the dependent variable as shown. In the null model, we do not need any factors (categorical explanatory variables) or covariates (continuous explanatory variables).

Figure 4.6.

Next, press Random and define Combinations as *firmno*.

Figure 4.7.

Press *Continue* and then *Estimation* to change from default to ML estimation. In addition, the default settings for the estimation procedures can be changed here:

Figure 4.8.

Press *Continue* and then *Statistics* to define the output to be printed. The minimum option is to select Parameter estimates:

Figure 4.9.

#### SPSS syntax

Pressing *Paste* produces the syntax below. The lines with criteria are only necessary if the default values for the estimation procedure need to be changed.

Table 4.1. Estimates of Fixed Effects^{a}

Table 4.2. Estimates of Covariance Parameters^{a}

In Stata, the following one-line command will suffice:

#### Stata output

Table 4.3

Note that the Stata output also includes a likelihood ratio (LR) test, in which the current model is compared to the linear regression model. The p-value indicates that the random intercepts represent a significant improvement over the OLS model. Also note that ‘variance’ is added to the command so that the random terms are reported as variances rather than as standard deviations.

### Question (both Stata and SPSS)

Calculate and interpret the intraclass correlation (ICC). Does the result indicate that a two-level model is required?

Answer

ICC = 0.288

Interpretation:

About 29 per cent of the variance in wages stems from differences among firms. This clearly indicates that a multilevel model is required.
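The ICC is the between-firm variance divided by the total variance. As a minimal sketch of that arithmetic, the Python snippet below uses the two null-model variance estimates (the same values that serve as the baseline in the pseudo R-square answers in Step 2). The function and variable names are illustrative and not part of the SPSS or Stata output.

```python
# Variance components of the null model (values taken from this chapter)
var_firm = 260.893       # level 2: between-firm (random intercept) variance
var_employee = 643.936   # level 1: within-firm (residual) variance


def intraclass_correlation(var_between, var_within):
    """Share of the total variance that lies between the level 2 units."""
    return var_between / (var_between + var_within)


icc = intraclass_correlation(var_firm, var_employee)
print(f"ICC = {icc:.3f}")  # prints "ICC = 0.288"
```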

# Step 2: Develop the level 1 model

Using SPSS Mixed

Add individual level explanatory variables, edyears, age, agesqr and female, to the null model. Compare the results with the OLS model.

#### SPSS menus

In the main menu, select Analyze, Mixed Models, Linear. In the first dialogue box, enter firmno under Subjects and press *Continue*. In the following box, define wage as the dependent variable and edyears, age, agesqr, and female as covariates.

Press Fixed and define the ‘fixed effects’, i.e. the fixed regression coefficients.

The rest is similar to the dialogue boxes for the null model.

#### SPSS syntax

Pressing Paste produces the following syntax:

#### SPSS output

Table 4.4. Estimates of Fixed Effects^{a}

Table 4.5. Estimates of Covariance Parameters^{a}

### Comparison with OLS model

The fixed regression coefficients differ from the OLS estimates, but not substantially so. Returns on education are slightly lower (4.49 vs. 4.56) and the female disadvantage is smaller (-15.37 vs. -17.39). The fixed coefficients are interpreted in the same way as for OLS regression. The standard errors of the coefficients are surprisingly similar. This is probably due to the large number of firms (880).

### Questions:

- Calculate the pseudo R squares for the two levels.
- How much of the between-firm variation in wages can be explained by compositional factors?

Answers:

- Explained variance at level 1: (643.936 - 476.334)/643.936 = 0.260. Explained variance at level 2: (260.893 - 116.291)/260.893 = 0.554.
- About 55 per cent.
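The two pseudo R squares follow the same pattern: the proportional reduction in a variance component relative to the null model. A small Python sketch of the calculation, using the variance estimates quoted in the answer above (the function name is illustrative only):

```python
def pseudo_r2(var_null, var_model):
    """Proportional reduction in a variance component relative to the null model."""
    return (var_null - var_model) / var_null


# Level 1: residual variance in the null model vs. the level 1 model
r2_level1 = pseudo_r2(643.936, 476.334)
# Level 2: random intercept variance in the null model vs. the level 1 model
r2_level2 = pseudo_r2(260.893, 116.291)

print(f"Level 1: {r2_level1:.3f}")  # prints "Level 1: 0.260"
print(f"Level 2: {r2_level2:.3f}")  # prints "Level 2: 0.554"
```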

# Step 3: Develop the random model

Should one or more of the regression coefficients be set free to vary among the firms? In other words, do one or more of the regression coefficients show significant variation among the firms? This step should be guided by theory, especially in situations with many x-variables. What about our model? Due to market imperfections, differences can be expected between firms in terms of the marginal wage premium for an extra year of education.

### Question:

Use SPSS or Stata to check whether the effects of years of education on wages show statistically significant variation among the firms. Estimate the best model.

#### Tip:

Use the Wald (Z) test as an initial evaluation of the variance components. Next, use likelihood ratio (LR) tests to compare models with and without random coefficients.

$$\chi^2_H = \left(-2LL_{K-H}\right) - \left(-2LL_{K}\right)$$

Answer SPSS Mixed

SPSS does not perform the LR test automatically. We have to estimate each model and compute the chi-square statistic as the difference between the -2LL values reported in the *information criteria* tables. The first output below is for the smaller model; its table of covariance parameters confirms that it contains no random coefficient.

Table 4.7. Information Criteria^{a}

Table 4.8. Estimates of Covariance Parameters^{a}

The output for the model with the coefficient of education defined as random:

Table 4.9. Information Criteria^{a}

Table 4.10. Estimates of Covariance Parameters^{a}

With the exception of the covariance term, the estimates divided by their standard errors are all greater than 2. The LR test statistic: 34,407.44 - 34,344.278 = 63.16. The test statistic is chi-square distributed, with the difference in the number of parameters as degrees of freedom. In this example, the random coefficient model has two more random terms. The outcome is statistically significant at any conventional level and the coefficient of education seems to vary among the firms.

Can we drop the covariance term UN(2,1)?

The LR test statistic: 34,346.014 - 34,344.278 = 1.74. This is chi-square distributed with one degree of freedom. The outcome is not statistically significant and the covariance can be dropped from the model.
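The two LR tests above can be checked by hand. The sketch below, in plain Python, recomputes both test statistics from the -2LL values quoted in the text and adds p-values from the chi-square distribution, using the closed-form tail probabilities for one and two degrees of freedom. The helper names are illustrative. Because a variance cannot be negative, the ordinary chi-square reference is usually regarded as conservative when testing variance components, so the first p-value is best read as an upper bound.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed forms for df 1 and 2)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def lr_test(m2ll_small, m2ll_large, df):
    """LR statistic and p-value from the -2LL of the smaller and the larger model."""
    stat = m2ll_small - m2ll_large
    return stat, chi2_sf(stat, df)


# Random intercept only vs. random slope for edyears with covariance (2 extra terms)
stat_slope, p_slope = lr_test(34407.44, 34344.278, df=2)
# Random slope without covariance vs. with covariance UN(2,1) (1 extra term)
stat_cov, p_cov = lr_test(34346.014, 34344.278, df=1)

print(f"Random slope: chi2 = {stat_slope:.2f}, p = {p_slope:.2g}")
print(f"Covariance:   chi2 = {stat_cov:.2f}, p = {p_cov:.2f}")
```

The first p-value is far below any conventional threshold, while the second is well above 0.05, in line with the conclusions drawn in the text.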

The main output from the best model so far:

Table 4.11. Information Criteria^{a}

Table 4.12. Estimates of Fixed Effects^{a}

Table 4.13. Estimates of Covariance Parameters^{a}

Answer Stata xtmixed

LR tests can be performed manually using this formula:

$$\chi^2_H = 2\left(LL_{K} - LL_{K-H}\right)$$

In Stata, however, estimates from models can be stored and the results used to perform an LR test for edyears as follows:

### Output

The test statistic is chi-square distributed, with the difference in the number of parameters as degrees of freedom. In this example, the random coefficient model has two more random terms. The outcome is statistically significant at any conventional level and the coefficient of education seems to vary among the firms.

Do we need the covariance between the level 2 residuals? This should also be decided using an LR test. The outcome below shows that the term can be deleted:

The first model is based on the default settings for the covariance structure and, in the second model, the covariance structure is specified as unstructured.

The best model so far:

Table 4.14.

# Step 4: Add level 2 explanatory variables

Do large firms pay higher wages than small firms? Do they also pay higher returns on education than small firms? Answer these two questions by estimating models with firm size.

#### Tip

It is always useful to try various versions of some variables. For firm size, the natural log transformation and recoding size into small and large firms are both worth trying. The latter is recommended as a starting point since it makes the interpretation of the cross-level interaction easier. The -2LLs for the models with the size alternatives could indicate the empirically best version.

#### Question

Why might *lnsize* (the natural log of size) work better in the analysis than *size*?

SPSS Mixed

First, we have to create the alternative versions of size and the cross-level interaction terms.

SPSS Frequencies can be used to find the quartiles:

Table 4.15. Statistics

Let us use 70 as the cut-off point between small and large firms:

The cross-level interactions:
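As a rough illustration of what these derived variables contain, the Python sketch below builds *large*, *lnsize* and the two cross-level interaction terms for a couple of employee records. It is not the SPSS syntax used in this chapter: the two records are invented, and treating a firm with exactly 70 employees as large is an assumption.

```python
import math

CUTOFF = 70  # cut-off between small and large firms chosen in the text


def add_size_variables(row):
    """Return a copy of an employee record with the derived firm size variables."""
    out = dict(row)
    # Assumption: a firm with exactly 70 employees counts as large
    out["large"] = 1 if row["size"] >= CUTOFF else 0
    out["lnsize"] = math.log(row["size"])
    # Cross-level interactions: level 1 variable times level 2 variable
    out["edlarge"] = row["edyears"] * out["large"]
    out["edlnsize"] = row["edyears"] * out["lnsize"]
    return out


# Two made-up employee records, for illustration only
rows = [
    {"firmno": 1, "size": 25, "edyears": 12},
    {"firmno": 2, "size": 400, "edyears": 16},
]
derived = [add_size_variables(r) for r in rows]

for r in derived:
    print(r["firmno"], r["large"], round(r["lnsize"], 2),
          r["edlarge"], round(r["edlnsize"], 2))
```

Because *edlarge* is zero for all employees in small firms, its coefficient can be read directly as the difference in the return on education between large and small firms.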

Firstly, trying out the three versions of the main effect of firm size indicates that *lnsize* is the best version, followed by *large*. Next, estimate models with the cross-level interactions.

With *edlarge*:

Table 4.16. Estimates of Fixed Effects^{a}

Table 4.17. Estimates of Covariance Parameters^{a}

With *edlnsize*:

Table 4.18. Estimates of Fixed Effects^{a}

Table 4.19. Estimates of Covariance Parameters^{a}

The two main conclusions are as follows:

- Firm size has a direct or main effect on the wage level: large firms pay better.
- Firm size does not seem to affect the return on education, although that return does seem to vary among firms.

Interpretation is easiest for the model with *large*. On average, large firms pay about NOK 5 more per hour. The marginal effect of education is almost identical in small (4.428) and in large (4.429) firms. The estimated variance of the slope residuals can be used to calculate an interval for the effects of education among firms. Its square root, the standard deviation of the slope residuals, is 2.13, which gives an approximate 95 per cent interval of 4.43 ± 4.18, i.e. from 0.25 to 8.61.

Stata xtmixed

First, we have to create the alternative versions of size and the cross-level interaction terms.

The cross-level interactions:

First, trying out the three versions of the main effect of firm size indicates that lnsize is the best version, followed by large. Next, estimate models with the cross-level interactions.

With *edlarge*:

Table 4.20.

With *edlnsize*:

Table 4.21.

The two main conclusions are as follows:

- Firm size has a direct or main effect on the wage level: large firms pay better.
- Firm size does not seem to affect the return on education, although that return does seem to vary among firms.

Interpretation is easiest for the model with *large*. On average, large firms pay about NOK 5 more per hour. The marginal effect of education is almost identical in small (4.428) and in large (4.429) firms. The estimated variance of the slope residuals can be used to calculate an interval for the effects of education among the firms. Its square root, the standard deviation of the slope residuals, is 2.13, which gives an approximate 95 per cent interval of 4.43 ± 4.18, i.e. from 0.25 to 8.61.
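The interval for the firm-specific effects of education can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it takes the average effect (4.43) and the value 2.13 reported above for the slope residuals, and assumes the usual normal multiplier of 1.96. The result agrees with the figures in the text up to rounding.

```python
mean_slope = 4.43  # average effect of one year of education (NOK per hour)
sd_slope = 2.13    # value reported in the text for the slope residuals
z = 1.96           # assumption: normal multiplier for roughly 95 per cent coverage

half_width = z * sd_slope
lower = mean_slope - half_width
upper = mean_slope + half_width

print(f"Half-width: {half_width:.2f}")
print(f"Interval: {lower:.2f} to {upper:.2f}")
```

Since the lower bound stays above zero, the estimates suggest that the return on education is positive in nearly all firms, even though its size differs considerably.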

### Final note

The file also contains a private versus public sector indicator. I leave it as an open exercise to try adding this firm level variable to the model. Another possibility is to redo the analysis with the natural logarithm of wage as the dependent variable.