# Multilevel Models

#### Doug Hemken

#### February 2015

The MPlus language has options that allow you to work with multilevel data in long form, in the style of mixed modeling software, in contrast to the wide (or multivariate) form typically used in SEM approaches to growth modeling and repeated measures. The long form makes it easier to work with unordered, unbalanced clusters of observations, because it allows the user to leave many constraints of the model unstated (they become assumptions) that would have to be specified explicitly for data in wide form. This also allows for much simpler and more compact output.

MPlus allows the user to work at up to three levels in long form. More than that requires working in a mix of long and wide form.

(The user also has the option of working purely in wide form. The MPlus language has commands for reshaping data in either direction. In the SSCC we tend to use general purpose statistical software such as Stata, R, SAS, or SPSS for data manipulation, and just use MPlus for its modeling strengths.)

The chief conceptual insight behind these models is that random effects are unobserved, latent variables. So random effects can be thought of as similar to residual variances and the latent factors in a measurement model. This is very clear when the data are analyzed in wide form, but perhaps less so when the data are in long form.
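To make the latent-variable view concrete, here is a minimal sketch of a random intercept written in wide form. It assumes a hypothetical balanced data set with exactly four observations per cluster stored as `y1`-`y4` in a hypothetical file `wide.dat`; the factor name `u` is also just an illustration. The equality constraints that long form leaves implicit have to be written out here.

```
TITLE: Random intercept as a latent factor (wide form sketch)
DATA: FILE = wide.dat;   ! hypothetical file, one row per cluster
VARIABLE: NAMES = y1-y4; ! hypothetical names, four observations per cluster
MODEL:
u BY y1-y4@1;  ! the random intercept is a factor with all loadings fixed at 1
[y1-y4] (1);   ! intercepts held equal: a single grand mean
y1-y4 (2);     ! residual variances held equal: a single within variance
u;             ! variance of the random intercept
```

The factor variance plays the role of the between-cluster variance, and the common residual variance plays the role of the within-cluster variance.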

## Three/Five Options

For most multilevel analyses using data in long form, there will be three to five things you need to specify to MPlus:

- VARIABLE: CLUSTER = *varlist*;

Name the variable or variables that identify the clusters within which the random effect is observed.

- VARIABLE: [WITHIN = *varlist*;] [BETWEEN = *varlist*;]

(Optional) If there are independent variables that appear at just one level of the analysis, they must be declared. Variables not listed will have both a fixed and a random component.

- ANALYSIS: TYPE= {TWOLEVEL | THREELEVEL} [RANDOM];

Specify the number of levels, and add **random** if there are random slopes.

- ANALYSIS: ESTIMATOR = ML;

(Optional) The default is robust ML, so if you want ML with the usual standard errors, you must ask for it.

- MODEL: %WITHIN% [*termspec*]; %BETWEEN% [*termspec*];

Either the *within* or the *between* part of the model must be specified; often both will be.
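As an illustration of how these options fit together when there are three levels, the sketch below sets up a random-intercept-only model. The file name and the variable names `level2` and `level3` are hypothetical, and the layout follows my reading of the three-level examples in the MPlus documentation rather than a run on real data.

```
TITLE: Three-level model, random intercepts only (sketch)
DATA: FILE = threelevel.dat;      ! hypothetical file name
VARIABLE: NAMES = y level2 level3; ! hypothetical variable names
CLUSTER = level3 level2;          ! highest level is listed first
ANALYSIS: TYPE = THREELEVEL;
MODEL:
%WITHIN%
y;
%BETWEEN level2%
y;
%BETWEEN level3%
y;
```

With three levels, each *between* part of the model is labeled with the cluster variable it belongs to.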

### The Random Intercept-Only Model

Sometimes called ANOVA-style random effects, or variance components analysis. We are simply decomposing the distribution of y into a grand mean, variation between the cluster means of y, and residual variation within clusters. The basic specification is this:

```
TITLE: Two-level model, random intercept only
(based on MPlus example 9.1)
DATA: FILE = ex9.1a.dat;
! data are in long form
VARIABLE: NAMES = y x w xm cluster;
usevariables y cluster;
CLUSTER = cluster;
ANALYSIS: TYPE = TWOLEVEL;
estimator = ml;
MODEL:
%WITHIN%
;
! either %within% or %between% must be specified
! even if it includes no variables !
! %BETWEEN%
! ;
```

Note that you get a warning about y not being connected to any other variables; you can ignore this.

The model above might be more clearly specified by:

```
MODEL:
%WITHIN%
y;
%BETWEEN%
y;
```
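In equation form, the decomposition described above can be sketched as follows. The notation is my own, not MPlus output: $i$ indexes observations, $j$ indexes clusters, $\mu$ is the grand mean, $u_j$ is the random intercept, and $e_{ij}$ is the within-cluster residual.

```latex
\begin{aligned}
y_{ij} &= \mu + u_j + e_{ij}, \qquad
u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2) \\
\operatorname{Var}(y_{ij}) &= \sigma_u^2 + \sigma_e^2 \\
\text{ICC} &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
\end{aligned}
```

Under this reading, the variance of y in the *between* part of the model corresponds to $\sigma_u^2$, the variance of y in the *within* part corresponds to $\sigma_e^2$, and the between-level mean of y corresponds to $\mu$. The intraclass correlation (ICC) is the share of the total variance that lies between clusters.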

### The Random Intercept Regression Model

Using the same data set from example 9.1a in the MPlus documentation, we have a random intercept model with an additional regression variable.

```
TITLE: Two-level model
Regression with random intercept
DATA: FILE = ex9.1a.dat;
VARIABLE: NAMES = y x w xm clus;
usevariables y x clus;
within = x;
! x varies only within clus;
CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL;
estimator = ml;
MODEL:
%WITHIN%
y on x;
%BETWEEN%
y; ! If you skip between, you get a warning;
```

Note that the regression effect of x is fixed. You could also have a fixed effect at the *between* level: just declare it and specify it in the appropriate part of the model.
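A sketch of what that might look like, assuming that `w` in the same data file is a cluster-level variable (as it is treated later on this page). The cluster variable is left off the `usevariables` list here on the assumption that variables with a special role do not need to be listed there.

```
TITLE: Two-level model
Random intercept with within and between fixed effects (sketch)
DATA: FILE = ex9.1a.dat;
VARIABLE: NAMES = y x w xm clus;
usevariables = y x w;
within = x;
between = w;
! w varies only between clusters (assumed);
CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL;
estimator = ml;
MODEL:
%WITHIN%
y on x;
%BETWEEN%
y on w; ! fixed effect of w on the cluster means of y;
```

Because `w` is declared as a *between* variable, it can only appear in the *between* part of the model.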

### The Random Slope Regression Model

Continuing with the same example, we can ask what the model looks like if we include a random slope for x, in addition to the average slope. (It turns out this is another bad example, because there isn’t much random variation.)

Now we need to add a **random** option to the analysis type, and declare a latent variable to represent the random variation of the slopes.

```
TITLE: Two-level model
Regression with random slope
DATA: FILE = ex9.1a.dat;
VARIABLE: NAMES = y x w xm clus;
usevariables y x clus;
within = x;
CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
ESTIMATOR = ML;
! note the addition of the RANDOM type
MODEL:
%WITHIN%
slope | y on x;
! slope is a latent variable that captures
! the random variation of the x effects;
%BETWEEN%
y;
```

(*Slope* could be any valid variable name. The pipe character is the key, here.)

Informally, comparing the output of this model to the previous one, we see that the parameter estimates are nearly the same and, crucially, that the variance of the random slope is not significantly different from zero in the Wald test.

```
MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

Within Level

 Residual Variances
    Y                  1.020      0.050     20.214      0.000

Between Level

 Means
    Y                  2.024      0.140     14.500      0.000
    SLOPE              0.720      0.035     20.546      0.000

 Variances
    Y                  2.010      0.290      6.930      0.000
    SLOPE              0.005      0.020      0.249      0.804
```

The mean of *slope* is the average x effect, while the variance of *slope* is the variance of the random effect.
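Written as equations, the random slope model can be sketched as below. The symbols are my own notation and do not appear in the MPlus output: $\beta_{0j}$ and $\beta_{1j}$ are the intercept and slope for cluster $j$, and $u_{0j}$ and $u_{1j}$ are their random parts.

```latex
\begin{aligned}
y_{ij} &= \beta_{0j} + \beta_{1j} x_{ij} + e_{ij}, & e_{ij} &\sim N(0, \sigma_e^2) \\
\beta_{0j} &= \gamma_{00} + u_{0j}, & u_{0j} &\sim N(0, \tau_{00}) \\
\beta_{1j} &= \gamma_{10} + u_{1j}, & u_{1j} &\sim N(0, \tau_{11})
\end{aligned}
```

Reading the output against this sketch, the between-level mean of Y (2.024) plays the role of $\gamma_{00}$, the mean of SLOPE (0.720) is $\gamma_{10}$, the between-level variances of Y (2.010) and SLOPE (0.005) are $\tau_{00}$ and $\tau_{11}$, and the within-level residual variance (1.020) is $\sigma_e^2$. The output reports no covariance between the random intercept and the random slope, so none is shown here.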

### Between Effect on Random Slopes

For our final simple model, we consider the relation between a level two variable and those random slopes.

```
TITLE: Two-level model
Between effect on random slopes
(based on MPlus example 9.2)
DATA: FILE = ex9.2a.dat;
VARIABLE: NAMES = y x w xm clus;
usevariables y x w clus;
within = x;
between = w;
CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL random;
estimator = ml;
MODEL:
%WITHIN%
slope | y on x;
%BETWEEN%
y slope on w; ! w predicts both the random intercept and the random slope;
y with slope; ! covariance between the intercept and slope residuals;
```
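In equation form, this last model can be sketched as follows, again in my own notation rather than anything MPlus prints: $w_j$ is the cluster-level variable, and $\gamma_{01}$ and $\gamma_{11}$ are its effects on the random intercept and the random slope.

```latex
\begin{aligned}
y_{ij} &= \beta_{0j} + \beta_{1j} x_{ij} + e_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01} w_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} w_j + u_{1j} \\
\operatorname{Cov}(u_{0j}, u_{1j}) &= \tau_{01}
\end{aligned}
```

The statement `y slope on w;` corresponds to $\gamma_{01}$ and $\gamma_{11}$, and `y with slope;` corresponds to $\tau_{01}$.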