# Confirmatory Factor Analysis

#### Doug Hemken

#### January 2015

Confirmatory factor analysis (CFA) is the measurement model within a structural equation model. We have already seen one example of a very simple (and poorly fitting) CFA.

Assume we have already done an EFA, decided we have two factors, and that variables y1 through y3 load on factor 1 while variables y4 through y6 load on factor 2. (If you have looked at the EFA material, you might do this as an exercise. I’d suggest making two MPlus runs.)

After these preliminaries, here is a confirmatory factor analysis, straight from the MPlus documentation:

```
title: CFA with two factors
data: file = ex5.1.dat;
variable: names = y1-y6;
model: f1 BY y1-y3;
f2 BY y4-y6;
```

As you’ve seen, you can rescale this model in a variety of ways: by constraining the latent variables to have unit variance (i.e. fixing each factor variance at 1, which standardizes the factors), or by setting the scale with different observed variables (such as the ones with the smallest residual variances).
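To see why rescaling is only a matter of convenience, here is a minimal Python sketch (standard library only) that builds the model-implied covariance matrix, \(\Sigma = \Lambda \Phi \Lambda' + \Theta\), under two scalings. All parameter values are hypothetical, chosen for illustration; they are not estimates from ex5.1.dat, and the function name `implied_cov` is our own.

```python
import math

def implied_cov(loadings, phi, theta):
    """Model-implied covariance: Sigma = Lambda * Phi * Lambda' + Theta.

    loadings: one row per observed variable, one column per factor
    phi:      factor covariance matrix (list of rows)
    theta:    residual variances (the diagonal of Theta)
    """
    p, m = len(loadings), len(phi)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            s = sum(loadings[i][a] * phi[a][b] * loadings[j][b]
                    for a in range(m) for b in range(m))
            sigma[i][j] = s + (theta[i] if i == j else 0.0)
    return sigma

# Hypothetical values (NOT estimates from ex5.1.dat).
# Marker-variable scaling: the first loading of each factor is fixed at 1.
lam_marker = [[1.0, 0.0], [0.8, 0.0], [1.2, 0.0],
              [0.0, 1.0], [0.0, 0.9], [0.0, 1.1]]
phi_marker = [[4.0, 1.0], [1.0, 2.25]]
theta = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]

# Unit-variance scaling: absorb each factor's standard deviation
# into its loadings, so the factor covariance becomes a correlation.
sd = [math.sqrt(phi_marker[k][k]) for k in range(2)]
lam_unit = [[row[k] * sd[k] for k in range(2)] for row in lam_marker]
phi_unit = [[phi_marker[a][b] / (sd[a] * sd[b]) for b in range(2)]
            for a in range(2)]

sigma_marker = implied_cov(lam_marker, phi_marker, theta)
sigma_unit = implied_cov(lam_unit, phi_unit, theta)

# Largest elementwise difference between the two implied matrices.
max_diff = max(abs(sigma_marker[i][j] - sigma_unit[i][j])
               for i in range(6) for j in range(6))
print(max_diff)
```

The two scalings imply the same covariance matrix up to rounding error, so they fit the data equally well; only the interpretation of the loadings and factor variances changes.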

From here, the analyst might head in a number of directions.

## Modification Indices

One question that can come up is whether any observed variables cross-load on more than one factor. If you did an exploratory analysis, you might have noticed indications there. But if you are starting from a theoretical measurement model (perhaps based on the literature), you might start with the CFA and then check modification indices. You get these by asking for additional output.

```
title: CFA and modification indices
data: file = ex5.1.dat;
variable: names = y1-y6;
model: f1 BY y1-y3;
f2 BY y4-y6;
output: modindices;
```

You may find this example a little frustrating, because the model fits fairly well as is! By default only modification indices greater than 10 are printed, so in this example you get nothing. Sorry about that! You can set the bar lower, if you like, and actually get some output:

```
title: CFA and modification indices
data: file = ex5.1.dat;
variable: names = y1-y6;
model: f1 BY y1-y3;
f2 BY y4-y6;
output: modindices(1);
```
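To get a feel for what these thresholds mean: a modification index is usually read as the approximate drop in the model chi-square from freeing one parameter, so it can be referred to a chi-square distribution with 1 degree of freedom. The sketch below (plain Python, with a helper name of our own choosing) converts a few thresholds into upper-tail probabilities.

```python
import math

def mi_p_value(mi):
    """Upper-tail probability of a 1-df chi-square variate at the value mi.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(mi / 2.0))

# The default bar (10), the conventional 5% critical value (3.84),
# and the lowered bar used in the example above (1).
for threshold in (10.0, 3.84, 1.0):
    print(f"MI = {threshold:5.2f}  ->  p = {mi_p_value(threshold):.4f}")
```

The default bar of 10 is quite strict, while a bar of 1 lets through many modifications that would not be significant on their own.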

Looking at the output, you should see that the largest expected improvement in model fit comes from allowing the residuals of y3 and y6 to be correlated. Assuming this is a significant modification, and that we have some theoretical justification for it, you allow correlations among residuals in the same manner as specifying correlations among independent variables, using the **with** operator.

```
title: CFA and modification indices
data: file = ex5.1.dat;
variable: names = y1-y6;
model: f1 BY y1-y3;
f2 BY y4-y6;
y3 with y6;
output: modindices(1);
```

We can look at the parameter estimates and tests and see that this correlation isn’t really significant, or we can do a likelihood ratio test if we are still unsure. Note, too, that the other modification indices were correlated with this one, so once it is freed their values drop below our bar. We are probably better off with our original, simpler model.

## Other measurement models (Equality constraints)

The analyst might also wonder how this measurement model compares to other, simpler measurement models. In particular we might want to compare our *congeneric model* to a *tau-equivalent model* or even to a *parallel model*.

The tau-equivalent model assumes that the observed measures of a factor all have the same loading on that factor. Specifying this requires attention to several details.

```
title: CFA tau-equivalent
data: file = ex5.1.dat;
variable: names = y1-y6;
model: f1 BY y1* y2 y3 (a);
f2 BY y4* y5 y6 (b);
f1 @1 f2 @1;
```

We constrain parameters to be equal by marking them in the **model:** command with a common label. Here the statement `f1 BY y1* y2 y3 (a);` specifies three parameters. We free the first one, otherwise it would be fixed at 1 by default. And at the end of the statement we include the label `(a)`, which is understood to mean the same label for all three parameters. An alternative way to write the same thing would be (fragment)

```
f1 by y1* (a);
f1 by y2 (a);
f1 by y3 (a);
```

So for the tau-equivalent model we constrain the measures of factor one to have equal loadings, the measures of factor two to have equal loadings (to each other), and the variances of our factors, the latent variables, to both be just 1 (one).
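One consequence of these constraints is that every pair of indicators of the same factor has the same implied covariance. A small Python sketch makes this concrete; the loadings `a` and `b` mirror the labels in the model, but their values and the factor correlation `phi12` are hypothetical, not estimates from ex5.1.dat.

```python
from itertools import combinations

# Hypothetical values, for illustration only (not estimates from ex5.1.dat).
a, b = 1.5, 1.2   # common loadings on f1 and f2, as labelled in the model
phi12 = 0.3       # correlation of f1 and f2 (both variances are fixed at 1)

factor_of = {"y1": "f1", "y2": "f1", "y3": "f1",
             "y4": "f2", "y5": "f2", "y6": "f2"}
loading = {"f1": a, "f2": b}

def implied_covariance(u, v):
    """Covariance of two distinct indicators under the tau-equivalent model."""
    fu, fv = factor_of[u], factor_of[v]
    factor_cov = 1.0 if fu == fv else phi12
    return loading[fu] * factor_cov * loading[fv]

f1_items = ["y1", "y2", "y3"]
f2_items = ["y4", "y5", "y6"]

# Collect the distinct covariance values within and across factors.
within_f1 = {implied_covariance(u, v) for u, v in combinations(f1_items, 2)}
within_f2 = {implied_covariance(u, v) for u, v in combinations(f2_items, 2)}
across = {implied_covariance(u, v) for u in f1_items for v in f2_items}

print(within_f1, within_f2, across)
```

Each set contains a single value: the square of the common loading within a factor, and the product of the two loadings and the factor correlation across factors. The congeneric model places no such restriction on the covariances.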

A likelihood ratio test comparing these two models, the congeneric and the tau-equivalent, has to be computed by other means. Looking at the log-likelihood of each model (H0), we see that \(2 \times (4908.663 - 4906.609) = 4.108\) on 4 degrees of freedom (the difference in the number of free parameters). Referred to a chi-square distribution, that statistic has an upper-tail probability of 0.3915869. If we carry out this calculation in R, it looks like

`1-pchisq(2*(4908.663-4906.609), df=4)`

`## [1] 0.3915869`

In other words, the tau-equivalent model works about as well as the congeneric model.
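If you do not have R at hand, the same upper-tail probability can be sketched in Python with only the standard library. The function name `chi2_sf` is our own; it uses the closed form for 1 or 2 degrees of freedom and the standard recurrence for the upper incomplete gamma function to step up to the requested degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with integer df.

    Starts from the closed form for df = 1 (odd df) or df = 2 (even df) and
    applies Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    with z = x / 2, until a reaches df / 2.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # df = 1
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Log-likelihood values quoted in the text, entered as in the R call.
lr_statistic = 2 * (4908.663 - 4906.609)
p_value = chi2_sf(lr_statistic, df=4)
print(round(lr_statistic, 3), round(p_value, 7))
```

The result agrees with the R output above to the digits shown.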

If we further investigate the parallel model, we add constraints to specify that the residual variances in each measurement scale are equal.

```
title: CFA parallel
data: file = ex5.1.dat;
variable: names = y1-y6;
model: f1 BY y1* y2 y3 (a);
f2 BY y4* y5 y6 (b);
f1 @1 f2 @1;
y1 y2 y3 (c);
y4 y5 y6 (d);
```

Note here how naming a variable, like `x1;`, specifies a variance if the variable is exogenous, but specifies a residual variance if the variable is endogenous. This can be confusing if you are looking at MPlus code but not actually familiar with the model.
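The degrees of freedom for comparing the three measurement models come from counting free parameters. The tally below is a sketch under one assumption of ours: that the six intercepts are estimated in every model, so they cancel in each difference.

```python
# Free parameters by type for each model of the six indicators.
# Assumption: the six intercepts are estimated in all three models.
models = {
    "congeneric": {
        "loadings": 4,            # first loading of each factor fixed at 1
        "factor variances": 2,
        "factor covariance": 1,
        "residual variances": 6,
        "intercepts": 6,
    },
    "tau-equivalent": {
        "loadings": 2,            # one common loading per factor: (a), (b)
        "factor variances": 0,    # both fixed at 1
        "factor covariance": 1,
        "residual variances": 6,
        "intercepts": 6,
    },
    "parallel": {
        "loadings": 2,
        "factor variances": 0,
        "factor covariance": 1,
        "residual variances": 2,  # one common value per factor: (c), (d)
        "intercepts": 6,
    },
}

n_free = {name: sum(parts.values()) for name, parts in models.items()}

def df_difference(restricted, general):
    """Degrees of freedom for an LR test of a restricted model against a more general one."""
    return n_free[general] - n_free[restricted]

print(n_free)
print(df_difference("tau-equivalent", "congeneric"))
print(df_difference("parallel", "tau-equivalent"))
print(df_difference("parallel", "congeneric"))
```

Under this count, the tau-equivalent model saves 4 parameters relative to the congeneric model, the parallel model saves a further 4, and the parallel model differs from the congeneric model by 8.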

Here, the conclusions you come to with respect to model comparison depend on which model you use as a basis of comparison. A likelihood ratio test comparing the parallel to the tau-equivalent model, on 4 degrees of freedom, has a p-value of 0.0329627 (try it!). However, an LR test directly comparing the parallel model to the congeneric model, on 8 degrees of freedom, yields a p-value of 0.0674937. Apparently the equal-loading assumption dominates here.