---
title: "Structural Equation Modeling with Stata"
author: "Doug Hemken"
date: "October 2015"
---
```{r setup, echo=FALSE, message=FALSE}
source("../StataMDsetup.r")
```

# Introduction
[Stata notes](../stata.html)

This workshop series assumes you are already familiar with
structural equation modeling, and are mainly interested in
learning how to use Stata to estimate these models.  We will start
with simple models, and build up to more complicated, nuanced ones
from there.

There are two core Stata commands for structural equation modeling:
`sem` for models built on multivariate normal assumptions, and
`gsem` for models with generalized linear components.

In the usual Stata command style, both `sem` and `gsem` will be
used as *estimation* commands, and each will allow a host of
*post-estimation* commands to further examine the models.

We will take our first example from the Mplus documentation.
```{r infile, collectcode=TRUE}
infile x1-x3 using "..\..\MPlus\Basics\Sample stats\ex3.1.dat"
* The file is found at
*    "http://www.ssc.wisc.edu/~hemken/MPlus/Basics/Sample stats/ex3.1.dat"
```

A quick visualization of our data shows us three variables with
differing degrees of correlation:
```{r graphmatrix, results="hide", echo=1}
graph matrix _all, half maxes(yscale(range(-5 5)) ylabel(-4(4)4))
graph export "Covmodel_files/scattermatrix.png", replace
```
![Scatterplot matrix](Covmodel_files/scattermatrix.png)

We can start by using the usual Stata commands to look at some 
descriptive statistics for our data.  One of the advantages of
using Stata for SEM is that we have all of the usual data
manipulation and statistical commands at our fingertips!
```{r correlate, collectcode=TRUE}
correlate , means covariance
```

These are sample covariances, with $N-1$ used in the denominator.
Stata's `sem` command reports maximum likelihood covariances, with
$N$ used in the denominator.  We can convert from one to the other
with Stata's matrix commands, rescaling by $(N-1)/N$:
```{r convert}
* rescale the sample covariances saved by -correlate- to ML covariances
matrix CV = r(C)*(r(N)-1)/r(N)
matrix list CV
```
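
In matrix terms, the conversion above is a single rescaling.  As a sketch,
write $S$ for the sample covariance matrix and $\hat{\Sigma}_{ML}$ for its
maximum likelihood counterpart (this notation is introduced here for
illustration and does not appear in the Stata output):

$$
S = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})(x_i-\bar{x})^{\top},
\qquad
\hat{\Sigma}_{ML} = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})(x_i-\bar{x})^{\top}
 = \frac{N-1}{N}\,S
$$

When $N$ is large the factor $(N-1)/N$ is close to 1, so the two matrices
differ only slightly.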

# Covariance Model

The covariances (plus the means) form a saturated model for our data; that is,
they perfectly reproduce the sample covariance matrix (plus the means vector).
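
One way to see why this model is saturated is to count parameters.  With $p$
observed variables (the symbol $p$ is introduced here for illustration), there
are $p$ means and $p(p+1)/2$ distinct variances and covariances, and the
covariance model estimates every one of them.  For our three variables:

$$
\underbrace{p}_{\text{means}} + \underbrace{\frac{p(p+1)}{2}}_{\text{variances and covariances}}
 = 3 + \frac{3 \cdot 4}{2} = 9
$$

The model has as many free parameters as there are sample moments, which
leaves zero degrees of freedom.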

(Check the estimates below against the previous result.)

```{r covmodel}
sem (x1-x3 -> )
```

## Model Specification

*Paths* are specified in `sem` using parentheses and an "arrow",
which may point either
to the left or to the right: `(a -> b)` and `(b <- a)` specify the same path.

Multiple variables (we may use Stata's typical *varlist* shortcuts)
may be collected on either side of the path arrow, or
paths may be specified separately.  Our covariance model could be written
in any of these ways:
```
sem (x1-x3 -> )
sem (<- x1-x3 )
sem (x1->)(<-x2)(x3->)
```

(You ought to be able to come up with a few more variants.)

## Sample Covariances, instead
Alternatively, we can have `sem` report sample covariances:
```{r altcov}
sem (x1-x3 -> ), nm1 // sample covariances instead of ML
```

## Correlations
If we are interested in the standardized solution to this model,
it is just the correlation matrix.
```{r std}
correlate
pwcorr, sig
sem (x1-x3 -> ), standardized
```
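
To see why the standardized solution reproduces the correlations, note that
the $(N-1)/N$ scaling factor cancels when covariances are standardized.  As a
sketch, write $s_{jk}$ for the sample covariance of variables $j$ and $k$
(notation assumed here for illustration):

$$
r_{jk}
 = \frac{\frac{N-1}{N}\,s_{jk}}
        {\sqrt{\frac{N-1}{N}\,s_{jj}}\;\sqrt{\frac{N-1}{N}\,s_{kk}}}
 = \frac{s_{jk}}{\sqrt{s_{jj}\,s_{kk}}}
$$

The correlations are therefore the same whether we start from the maximum
likelihood covariances or from the sample covariances.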

Next: [Elementary path models](ElementaryPaths.html)
```{r cleanup, engine="R", echo=FALSE}
    unlink("profile.do")
```