# Chapter 4: Wage determination in firms

This chapter will conclude our analysis of wage determination in firms. We will follow the recommended strategy in four steps, as outlined above. The questions can be answered using SPSS Mixed or Stata Xtmixed.

### Describing the data

Let us start with the wage determination example for which we have already estimated OLS regression models. The first lines of the file (SPSS version) are shown below:

Figure 4.1.

The first variable is the firm identifier and the second the employee identification code. Note that the file is sorted by the firm identifier, so that the employees are nested within the firms. There are five employees in the first firm, three in the second, and four in the third. The rest of the variables should be familiar, except for the firm level variables: private, size and profit. A distinction is drawn between private sector firms (private=1) and public sector firms (organizations) (private=0). The second variable is firm size measured by number of employees, and the final firm level variable is a measure of profit per employee measured in the same years as the other variables are measured.

Before we start, describe the variables by using Descriptives in SPSS or summarize in Stata.

SPSS solution:

Figure 4.2.

Stata solution:

Figure 4.3.
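A minimal sketch of Stata commands that would produce this kind of overview. It uses only variable names mentioned in this chapter; `firmtag` is a hypothetical helper variable introduced here to count firms rather than employees.

```stata
/* Descriptive statistics; the Obs column shows the number of non-missing values */
summarize wage edyears age agesqr female private size profit

/* Flag one row per firm, then count all firms and the firms with profit reported */
egen byte firmtag = tag(firmno)
count if firmtag
count if firmtag & !missing(profit)
```

Because the file has one row per employee, `summarize` counts employees; the tagged counts are needed whenever a statement is meant to refer to firms.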

We have information for all employees for most variables, with three exceptions. Wage is only reported for 3,759 employees, EGP classes for 4,042, and profit for only 1,143 firms. The reason for the latter is that profit is not available for non-profit organizations such as the public administration.


# Step 1: Estimate the null model

Using SPSS Mixed

We find the Mixed procedure under Analyze in the menu:

Figure 4.4.

This opens the Linear Mixed Models dialogue box, where Subjects (level 2 unit identifier) have to be defined as shown here:

Figure 4.5.

Next, press Continue and define the dependent variable as shown. In the null model, we do not need any factors (categorical explanatory variables) or covariates (continuous explanatory variables).

Figure 4.6.

Next, press Random and define Combinations as firmno.

Figure 4.7.

Press Continue and then Estimation to change from default to ML estimation. In addition, the default settings for the estimation procedures can be changed here:

Figure 4.8.

Press Continue and then Statistics to define the output to be printed. The minimum option is to select Parameter estimates:

Figure 4.9.

#### SPSS syntax

Pressing Paste produces the syntax below. The lines with criteria are only necessary if the default values for the estimation procedure need to be changed.

MIXED wage
/CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
/FIXED=| SSTYPE(3)
/METHOD=ML
/PRINT=SOLUTION
/RANDOM=INTERCEPT | SUBJECT(firmno) COVTYPE(VC)
/EMMEANS=TABLES(OVERALL).

Table 4.1. Estimates of Fixed Effects

Table 4.2. Estimates of Covariance Parameters

Stata xtmixed

The following one-line command will suffice:

. xtmixed wage || firmno: , ml variance

#### Stata output

Table 4.3

Note that the Stata output also includes a Likelihood ratio (LR) test, where the current model is compared to the linear regression model. The probability value indicates that the random intercepts represent a significant improvement compared with the OLS model. Also note that ‘variance’ is added to the command to produce an estimate of the random terms as variances rather than as standard deviations.

### Question (both Stata and SPSS)

Calculate and interpret the intraclass correlation (ICC). Does the result indicate that a two-level model is required?

ICC = 0.288

Interpretation:
About 29 per cent of the variance in wages stems from differences among firms. This clearly indicates that a multilevel model is required.
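The calculation behind this figure can be written out as follows. It assumes the null-model variance components that are also used in the pseudo R-square calculation in Step 2 (260.893 between firms and 643.936 within firms); the symbols are introduced here for illustration only.

```latex
\[
\mathrm{ICC} = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e}}
             = \frac{260.893}{260.893 + 643.936}
             = \frac{260.893}{904.829} \approx 0.288
\]
```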


# Step 2: Develop the level 1 model

Using SPSS Mixed

Add individual level explanatory variables, edyears, age, agesqr and female, to the null model. Compare the results with the OLS model.

In the main menu, select Analyze, Mixed Models, Linear. In the first dialogue box, enter firmno under Subjects and press Continue. In the following box, define wage as the dependent variable and edyears, age, agesqr, and female as covariates.

Press Fixed and define the ‘fixed effects’, i.e. the fixed regression coefficients.

The rest is similar to the dialogue boxes for the null model.

#### SPSS syntax

Pressing Paste produces the following syntax:

MIXED wage WITH edyears age agesqr female
/CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
/FIXED=edyears age agesqr female | SSTYPE(3)
/METHOD=ML
/PRINT=SOLUTION
/RANDOM=INTERCEPT | SUBJECT(firmno) COVTYPE(VC).

#### SPSS output

Table 4.4. Estimates of Fixed Effects

Table 4.5. Estimates of Covariance Parameters

Using Stata xtmixed

. xtmixed wage edyears age agesqr female || firmno: , variance

Table 4.6.

### Comparison with OLS model

The fixed regression coefficients differ from the OLS estimates, but not substantially so. Returns on education are slightly lower (4.49 vs. 4.56) and the female disadvantage is smaller (-15.37 vs. -17.39). The fixed coefficients are interpreted in the same way as for OLS regression. The standard errors of the coefficients are surprisingly similar. This is probably due to the high number of firms (880).

### Questions:

1. Calculate the pseudo R squares for the two levels.
2. How much of the between-firm variation in wages can be explained by compositional factors?

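1. Explained variance at level 1: (643.936 - 476.334)/643.936 = 0.260.
2. Explained variance at level 2: (260.893 - 116.291)/260.893 = 0.554. Compositional factors, i.e. differences among firms in the education, age and gender of their employees, thus account for about 55 per cent of the between-firm variation in wages.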
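In general terms, each pseudo R square compares a variance component in the null model with the same component after the level 1 variables have been added. The subscripts below are notation assumed for this sketch, not symbols used elsewhere in the chapter.

```latex
\[
R^2_{1} = \frac{\sigma^2_{e,\mathrm{null}} - \sigma^2_{e,\mathrm{model}}}{\sigma^2_{e,\mathrm{null}}},
\qquad
R^2_{2} = \frac{\sigma^2_{u0,\mathrm{null}} - \sigma^2_{u0,\mathrm{model}}}{\sigma^2_{u0,\mathrm{null}}}
\]
```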

# Step 3: Develop the random model

Should one or more of the regression coefficients be set free to vary among the firms? In other words, do one or more of the regression coefficients show significant variation among the firms? This step should be guided by theory, especially in situations with many x-variables. What about our model? Due to market imperfections, differences can be expected between firms in terms of the marginal wage premium for an extra year of education.

### Question:

Use SPSS or Stata to check whether the effect of years of education on wages shows statistically significant variation among the firms. Estimate the best model.

#### Tip:

Use the Wald (Z) test as an initial evaluation of the variance components. Next, use likelihood ratio (LR) tests to compare models with and without random coefficients.

$\chi^2_H = (-2LL_{K-H}) - (-2LL_K)$

SPSS does not perform the LR test automatically. We have to estimate each model and compute the chi-square as the difference between the -2LL values reported in the Information Criteria tables. The first set of output below belongs to the smaller model: its table of covariance parameters contains no random coefficient.

Table 4.7. Information Criteria

Table 4.8. Estimates of Covariance Parameters

The output for the models with the coefficient of education defined as random:

Table 4.9. Information Criteria

Table 4.10. Estimates of Covariance Parameters

With the exception of the covariance term, the estimates divided by their standard errors are all greater than 2. The LR test statistic: 34,407.44 - 34,344.278 = 63.16. The test statistic is chi-square distributed, with the difference in the number of parameters as degrees of freedom. In this example, the random coefficient model has two more random terms. The outcome is statistically significant at any conventional level and the coefficient of education seems to vary among the firms.

Can we drop the covariance term UN(2,1)?

The LR test statistic: 34,346.014 - 34,344.278 = 1.74. This is chi-square distributed with one degree of freedom. The outcome is not statistically significant and the covariance can be dropped from the model.

The main output from the best model so far:

Table 4.11. Information Criteria

Table 4.12. Estimates of Fixed Effects

Table 4.13. Estimates of Covariance Parameters

LR tests can be performed manually using this formula:

$\chi^2_H = 2(LL_K - LL_{K-H})$

In Stata, however, estimates from models can be stored and the results used to perform an LR test for edyears as follows:

/*Fitting random intercepts and storing results*/
quietly xtmixed wage edyears age agesqr female || firmno: , mle nolog
estimates store ri
/*Fitting random coefficients and storing results*/
quietly xtmixed wage edyears age agesqr female || firmno: edyears, mle nolog cov(un)
estimates store rc
/*Running the likelihood-ratio test to compare the two models*/
lrtest ri rc

### Output

Likelihood-ratio test: LR chi2(2) = 63.16
(Assumption: ri nested in rc): Prob > chi2 = 0.0000

The test statistic is chi-square distributed, with the difference in the number of parameters as degrees of freedom. In this example, the random coefficient model has two more random terms. The outcome is statistically significant at any conventional level and the coefficient of education seems to vary among the firms.
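The same test can also be reproduced by hand from the log likelihoods that xtmixed leaves behind in `e(ll)`, following the manual formula given earlier in this step. This is a sketch rather than a required step; the scalar names `ll_ri`, `ll_rc` and `lr` are hypothetical.

```stata
/* Random intercept model: keep its log likelihood */
quietly xtmixed wage edyears age agesqr female || firmno: , mle nolog
scalar ll_ri = e(ll)

/* Random coefficient model: keep its log likelihood */
quietly xtmixed wage edyears age agesqr female || firmno: edyears, mle nolog cov(un)
scalar ll_rc = e(ll)

/* LR statistic and its upper-tail chi-square probability with 2 degrees of freedom */
scalar lr = 2*(ll_rc - ll_ri)
display "LR chi2(2) = " lr "   Prob > chi2 = " chi2tail(2, lr)
```

The result should agree with the `lrtest` output above, since both rest on the same two log likelihoods.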

Do we need the covariance between the level 2 residuals? This should also be decided using an LR test. The outcome below shows that the term can be deleted:

Likelihood-ratio test: LR chi2(1) = 1.74
(Assumption: ri nested in rc): Prob > chi2 = 0.1877

The first model is based on the default settings for the covariance structure and, in the second model, the covariance structure is specified as unstructured.
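A sketch of how this second comparison might be run. The stored names `rc_ind` and `rc_un` are hypothetical; the first model relies on the xtmixed default of independent random effects, so the two models differ only by the covariance term.

```stata
/* Random coefficient for edyears, default (independent) covariance structure */
quietly xtmixed wage edyears age agesqr female || firmno: edyears, mle nolog
estimates store rc_ind

/* Same model with an unstructured covariance matrix, adding the covariance term */
quietly xtmixed wage edyears age agesqr female || firmno: edyears, mle nolog cov(un)
estimates store rc_un

/* One extra parameter, hence one degree of freedom */
lrtest rc_ind rc_un
```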

The best model so far:

Table 4.14.


# Step 4: Add level 2 explanatory variables

Do large firms pay higher wages than small firms? Do they also pay higher returns on education than small firms? Answer these two questions by estimating models with firm size.

#### Tip

It is always useful to try various versions of some variables. For firm size, the natural log transformation and recoding size into small and large firms are both worth trying. The latter is recommended as a starting point since it makes the interpretation of the cross-level interaction easier. The -2LLs for the models with the size alternatives could indicate the empirically best version.

#### Question

Why might lnsize (the natural log of size) work better in the analysis than size?

Size is very right-skewed and the natural log shrinks the right tail of the distribution.

SPSS Mixed

First, we have to create the alternative versions of size and the cross-level interaction terms.

Compute lnsize = ln(size).

SPSS Frequencies can be used to find the quartiles:

FREQUENCIES VARIABLES=size
/FORMAT=NOTABLE
/NTILES=4
/ORDER=ANALYSIS.

Table 4.15. Statistics

Let us use 70 as the cut-off point between small and large firms:

Recode size (low thru 70=0)(71 thru high=1) into large.

The cross-level interactions:

Compute edsize = edyears*size.
Compute edlnsize = edyears*lnsize.
Compute edlarge = edyears*large.

Firstly, trying out the three versions of the main effect of firm size indicates that lnsize is the best version, followed by large. Next, estimate models with the cross-level interactions.

With edlarge:

Table 4.16. Estimates of Fixed Effects

Table 4.17. Estimates of Covariance Parameters

With edlnsize:

Table 4.18. Estimates of Fixed Effects

Table 4.19. Estimates of Covariance Parameters

The two main conclusions are as follows:

Firm size has a direct or main effect on the wage level, as large firms pay better. Firm size does not seem to affect the return on education, however, although the latter seems to vary among firms. Interpretation is easiest for the model with large. On average, large firms pay about NOK 5 more per hour. The marginal effect of education is almost identical in small (4.428) and in large (4.429) firms. The estimate of the variance in the slope residuals can be used to calculate an interval for the effect of education among firms. The standard deviation of the slope residuals (the square root of the estimated variance) is 2.13, which gives an approximate 95 per cent interval of 4.43 ± 4.18, i.e. from 0.25 to 8.61.
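The interval can be written out as below, treating 2.13 as the standard deviation of the slope residuals and assuming that these are approximately normally distributed. The symbols are notation assumed for this sketch.

```latex
\[
\hat{\beta}_{\mathrm{edyears}} \pm 1.96\,\hat{\sigma}_{u1}
  = 4.43 \pm 1.96 \times 2.13
  \approx 4.43 \pm 4.17
  \;\Rightarrow\; [0.26,\ 8.60]
\]
```

With the rounded inputs the half-width comes out at 4.17 rather than the 4.18 quoted in the text, presumably because the text uses the unrounded standard deviation. The substantive reading is unchanged: the return on education is positive in almost all firms, but its size differs considerably.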

Stata xtmixed

First, we have to create the alternative versions of size and the cross-level interaction terms.

generate lnsize = ln(size)
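generate large = .
replace large = 0 if size < 71
/*Stata treats missing values as larger than any number, so exclude them explicitly*/
replace large = 1 if size > 70 & !missing(size)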

The cross-level interactions:

generate edsize = edyears*size
generate edlnsize = edyears*lnsize
generate edlarge = edyears*large

First, trying out the three versions of the main effect of firm size indicates that lnsize is the best version, followed by large. Next, estimate models with the cross-level interactions.
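One way to carry out this comparison is to loop over the three versions and print the -2LL of each model. This is a sketch that assumes lnsize and large have been generated as above and that the random part from Step 3 (random coefficient for edyears, no covariance term) is retained.

```stata
/* Fit the model once per version of firm size and report -2 log likelihood */
foreach v in size lnsize large {
    quietly xtmixed wage edyears age agesqr female `v' || firmno: edyears, mle
    display "`v': -2LL = " -2*e(ll)
}
```

The three models have the same number of parameters, so the version with the lowest -2LL fits best empirically, provided all three are estimated on the same observations.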

With edlarge:

. xtmixed wage edyears age agesqr female large edlarge || firmno: edyears, mle variance

Table 4.20.

With edlnsize:

Table 4.21.

The two main conclusions are as follows:

Firm size has a direct or main effect on the wage level, as large firms pay better. Firm size does not seem to affect the return on education, however, although the latter seems to vary among firms. Interpretation is easiest for the model with large. On average, large firms pay about NOK 5 more per hour. The marginal effect of education is almost identical in small (4.428) and in large (4.429) firms. The estimate of the variance in the slope residuals can be used to calculate an interval for the effect of education among the firms. The standard deviation of the slope residuals (the square root of the estimated variance) is 2.13, which gives an approximate 95 per cent interval of 4.43 ± 4.18, i.e. from 0.25 to 8.61.
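The marginal effect of education in large firms is the sum of the coefficients of edyears and edlarge. As a sketch, `lincom` can be run directly after the model with edlarge to obtain this sum together with its standard error and confidence interval.

```stata
/* Model with the dummy for large firms and its cross-level interaction */
xtmixed wage edyears age agesqr female large edlarge || firmno: edyears, mle variance

/* Effect of one more year of education in large firms (large = 1) */
lincom edyears + edlarge
```

The effect in small firms (large = 0) is simply the coefficient of edyears in the fixed part of the output.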

### Final note

The file also contains a private versus public sector indicator. I leave it as an open exercise to try adding this firm level variable to the model. Another possibility is to redo the analysis with the natural logarithm of wage as the dependent variable.
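A possible starting point for both exercises, given as a sketch only. It assumes that lnsize has already been generated and that it is kept as the measure of firm size; `lnwage` is a hypothetical variable name.

```stata
/* Adding the sector indicator to the model with lnsize */
xtmixed wage edyears age agesqr female lnsize private || firmno: edyears, mle variance

/* The same model with log wage as the dependent variable */
generate lnwage = ln(wage)
xtmixed lnwage edyears age agesqr female lnsize private || firmno: edyears, mle variance
```

In the log wage model, the coefficients multiplied by 100 are approximately percentage differences in wages rather than differences in NOK per hour.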
