Elementary Path Models in Stata

Doug Hemken

October 2015

Stata notes

We follow Kline (2011) in specifying models with two and three observed variables and direct effects among the variables.

With two observed variables we have two means and three variance/covariances. With three observed variables we have three means and six variance/covariances. Keeping track of how many "observations" (pieces of information) we start with is helpful in understanding when a model is saturated.
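The counts above can be checked with a little arithmetic. As a minimal sketch, assuming p observed variables supply p means and p(p+1)/2 variances and covariances, Stata's display command can serve as a calculator:

```stata
* pieces of information with p observed variables:
*   p means + p*(p+1)/2 variances and covariances
* two variables: 2 + 3 = 5
display 2 + 2*(2+1)/2
* three variables: 3 + 6 = 9
display 3 + 3*(3+1)/2
```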

Bivariate Regression ("Single Cause")

We continue with our example data.

In our model output we will see one intercept ("x2 <- _cons"), one variance (the error variance, "var(e.x2)"), and one regression path ("x2 <- x1"). Not shown in our output are the mean and variance of the exogenous variable, x1. Together these five parameters use all five pieces of information (two means and three variance/covariances), so the model has zero degrees of freedom and is saturated: it perfectly reproduces the observed covariance matrix and the observed means.

infile x1-x3 using "Z:\PUBLIC_WEB\MPlus\Basics\Sample stats\ex3.1.dat"
(500 observations read)

sem (x1 -> x2)


Endogenous variables

Observed:  x2

Exogenous variables

Observed:  x1

Fitting target model:

Iteration 0:   log likelihood =  -1515.201  
Iteration 1:   log likelihood =  -1515.201  

Structural equation model                       Number of obs     =        500
Estimation method  = ml
Log likelihood     =  -1515.201

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  x2 <-      |
          x1 |   .4479402   .0225188    19.89   0.000     .4038042    .4920762
       _cons |  -.2158931   .0366072    -5.90   0.000    -.2876418   -.1441444
-------------+----------------------------------------------------------------
    var(e.x2)|   .6104389   .0386075                      .5392715    .6909982
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .
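
Because this saturated model is a bivariate regression, the path and intercept can be cross-checked against ordinary least squares. The following is a sketch rather than part of the original example, and it assumes the data are still in memory. The coefficient estimates should agree with the sem output, while the standard errors can differ slightly because sem uses maximum likelihood (dividing by N) where regress applies a degrees-of-freedom correction.

```stata
* OLS fit of the same single-cause model
regress x2 x1
* the ML error variance is the residual sum of squares divided by N;
* this should be close to var(e.x2) in the sem output
display e(rss)/e(N)
```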

A similar example, but note that here the parameter estimates are not significant: the path from x2 to x3 has z = 0.62 (p = 0.538), so x2 does not predict x3. (In terms of overall model fit, this is still an example of a saturated model.)

sem (x2 -> x3)
Endogenous variables

Observed:  x3

Exogenous variables

Observed:  x2

Fitting target model:

Iteration 0:   log likelihood = -1430.0543  
Iteration 1:   log likelihood = -1430.0543  

Structural equation model                       Number of obs     =        500
Estimation method  = ml
Log likelihood     = -1430.0543

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  x3 <-      |
          x2 |   .0257818    .041816     0.62   0.538     -.056176    .1077395
       _cons |  -.0421945   .0437277    -0.96   0.335    -.1278991    .0435102
-------------+----------------------------------------------------------------
    var(e.x3)|    .956053   .0604661                      .8445926    1.082223
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .

Multiple Regression ("Correlated Causes")

Here x1 and x3 are correlated exogenous variables.

sem (x1 x3 -> x2)
Endogenous variables

Observed:  x2

Exogenous variables

Observed:  x1 x3

Fitting target model:

Iteration 0:   log likelihood =  -2124.388  
Iteration 1:   log likelihood =  -2124.388  

Structural equation model                       Number of obs     =        500
Estimation method  = ml
Log likelihood     =  -2124.388

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  x2 <-      |
          x1 |   .5382112   .0230425    23.36   0.000     .4930488    .5833736
          x3 |  -.3352289   .0365523    -9.17   0.000    -.4068701   -.2635878
       _cons |  -.2737943   .0344525    -7.95   0.000    -.3413199   -.2062688
-------------+----------------------------------------------------------------
    var(e.x2)|   .5225365   .0330481                      .4616172    .5914954
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .

We note that this is also a saturated model. Although the output reports only three parameters related to the variance/covariances (two regression paths and one error variance), three more are assumed among the exogenous variables, x1 and x3: two variances and a covariance. They are not reported because they are assumed to be equal to the observed values; that is, they are part of the model but they are not estimated.

If we want variances, covariances, and means of exogenous variables to be reported, we must ask for them.

sem (x1 x3 -> x2), variance(x1 x3) covariance(x1*x3) means(x1 x3)
Endogenous variables

Observed:  x2

Exogenous variables

Observed:  x1 x3

Fitting target model:

Iteration 0:   log likelihood =  -2124.388  
Iteration 1:   log likelihood =  -2124.388  

Structural equation model                       Number of obs     =        500
Estimation method  = ml
Log likelihood     =  -2124.388

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  x2 <-      |
          x1 |   .5382112   .0230425    23.36   0.000     .4930488    .5833736
          x3 |  -.3352289   .0365523    -9.17   0.000    -.4068701   -.2635878
       _cons |  -.2737943   .0344525    -7.95   0.000    -.3413199   -.2062688
-------------+----------------------------------------------------------------
     mean(x1)|   .4848463   .0693915     6.99   0.000     .3488414    .6208512
     mean(x3)|  -.0421612   .0437443    -0.96   0.335    -.1278984    .0435759
-------------+----------------------------------------------------------------
    var(e.x2)|   .5225365   .0330481                      .4616172    .5914954
      var(x1)|   2.407592   .1522695                      2.126906    2.725321
      var(x3)|   .9567799   .0605121                      .8452347    1.083046
-------------+----------------------------------------------------------------
   cov(x1,x3)|   .6483204   .0738086     8.78   0.000     .5036581    .7929826
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .
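
The reported means, variances, and covariance of x1 and x3 can be compared with the observed sample statistics. This is a sketch under one stated assumption: that sem's maximum likelihood estimates divide by N, while summarize and correlate divide by N - 1, so the sample values need rescaling before they match the table.

```stata
* observed means and variances of the exogenous variables
summarize x1 x3
* observed covariance matrix (divisor N - 1)
correlate x1 x3, covariance
* rescale the sample covariance to the ML divisor N;
* this should be close to cov(x1,x3) in the sem output
display r(cov_12)*(r(N) - 1)/r(N)
```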

As a shortcut, it is possible to ask for any one of the variance/covariances and any one of the means, and all will be reported.

Notice that the keywords variance and covariance are singular, never plural, and are often abbreviated as var and cov. Another potentially confusing shortcut is that var and cov may be used interchangeably; that is, a model could specify var(x1*x2) or cov(x1). The only abbreviation for means is mean.
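
The shortcut and the abbreviations described above might be used as follows. This is a sketch that follows the description in the text; the output is not shown here and has not been reproduced.

```stata
* one variance and one mean requested; per the shortcut above,
* all exogenous variances, covariances, and means should be reported
sem (x1 x3 -> x2), var(x1) mean(x1)
* abbreviated option names in place of variance(), covariance(), means()
sem (x1 x3 -> x2), var(x1 x3) cov(x1*x3) mean(x1 x3)
```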

Multiple Outcomes (Multiple Response)

This model is not saturated: we have three variances and two regression paths, which use only five of the six observed variance/covariances.

sem (x1 -> x2 x3)
Endogenous variables

Observed:  x2 x3

Exogenous variables

Observed:  x1

Fitting target model:

Iteration 0:   log likelihood = -2163.2588  
Iteration 1:   log likelihood = -2163.2588  

Structural equation model                       Number of obs     =        500
Estimation method  = ml
Log likelihood     = -2163.2588

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  x2 <-      |
          x1 |   .4479402   .0225188    19.89   0.000     .4038042    .4920762
       _cons |  -.2158931   .0366072    -5.90   0.000    -.2876418   -.1441444
  -----------+----------------------------------------------------------------
  x3 <-      |
          x1 |   .2692816   .0254907    10.56   0.000     .2193207    .3192425
       _cons |  -.1727214   .0414384    -4.17   0.000    -.2539393   -.0915036
-------------+----------------------------------------------------------------
    var(e.x2)|   .6104389   .0386075                      .5392715    .6909982
    var(e.x3)|   .7821991   .0494706                      .6910072    .8854255
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(1)   =     77.74, Prob > chi2 = 0.0000
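
The single degree of freedom in this test can be accounted for by counting, and the source of the misfit can be explored with sem's postestimation commands. The following is a sketch, assuming the data are still in memory; the parameter count in the comments is our own tally, not Stata output.

```stata
* nine pieces of information (3 means + 6 variances/covariances)
* minus eight free parameters (2 paths, 2 intercepts, 2 error
* variances, and the mean and variance of x1) leaves 1 df
display 9 - 8
* p-value for the reported chi2(1) = 77.74
display chi2tail(1, 77.74)
* refit quietly, then request fit statistics and modification indices
quietly sem (x1 -> x2 x3)
estat gof, stats(all)
estat mindices
```

The modification indices suggest which omitted parameter would most improve the fit.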

A saturated version of this model would have correlated error terms.

sem (x1 -> x2 x3), cov(e.x2*e.x3)
Endogenous variables

Observed:  x2 x3

Exogenous variables

Observed:  x1

Fitting target model:

Iteration 0:   log likelihood =  -2124.388  
Iteration 1:   log likelihood =  -2124.388  

Structural equation model                       Number of obs     =        500
Estimation method  = ml
Log likelihood     =  -2124.388

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  x2 <-      |
          x1 |   .4479402   .0225188    19.89   0.000     .4038042    .4920762
       _cons |  -.2158931   .0366072    -5.90   0.000    -.2876418   -.1441444
  -----------+----------------------------------------------------------------
  x3 <-      |
          x1 |   .2692816   .0254907    10.56   0.000     .2193207    .3192425
       _cons |  -.1727214   .0414384    -4.17   0.000    -.2539393   -.0915036
-------------+----------------------------------------------------------------
    var(e.x2)|   .6104389   .0386075                      .5392715    .6909982
    var(e.x3)|   .7821991   .0494706                      .6910072    .8854255
-------------+----------------------------------------------------------------
    cov(e.x2,|
        e.x3)|  -.2622158   .0330527    -7.93   0.000     -.326998   -.1974336
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .
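
The chi2(1) = 77.74 reported for the model without correlated errors is a likelihood-ratio comparison against this saturated model. As a sketch, the same test can be requested explicitly; the names m_restricted and m_saturated are arbitrary labels chosen for illustration.

```stata
* fit and store the model without correlated errors
quietly sem (x1 -> x2 x3)
estimates store m_restricted
* fit and store the saturated model
quietly sem (x1 -> x2 x3), cov(e.x2*e.x3)
estimates store m_saturated
* likelihood-ratio test of the restricted model against the saturated one
lrtest m_restricted m_saturated
* the same statistic from the two reported log likelihoods (about 77.74)
display 2*(-2124.388 - (-2163.2588))
```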

Indirect Effects

This model, with both direct effects and an indirect effect, is saturated.

sem (x1 -> x2 x3)(x2 -> x3)
Endogenous variables

Observed:  x2 x3

Exogenous variables

Observed:  x1

Fitting target model:

Iteration 0:   log likelihood =  -2124.388  
Iteration 1:   log likelihood =  -2124.388  

Structural equation model                       Number of obs     =        500
Estimation method  = ml
Log likelihood     =  -2124.388

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  x2 <-      |
          x1 |   .4479402   .0225188    19.89   0.000     .4038042    .4920762
       _cons |  -.2158931   .0366072    -5.90   0.000    -.2876418   -.1441444
  -----------+----------------------------------------------------------------
  x3 <-      |
          x2 |  -.4295529   .0468371    -9.17   0.000    -.5213519   -.3377539
          x1 |   .4616956   .0315655    14.63   0.000     .3998284    .5235628
       _cons |  -.2654589   .0396501    -6.70   0.000    -.3431716   -.1877463
-------------+----------------------------------------------------------------
    var(e.x2)|   .6104389   .0386075                      .5392715    .6909982
    var(e.x3)|   .6695635   .0423469                      .5915032    .7579255
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .
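
The output reports only the direct paths. The indirect effect of x1 on x3 through x2 is the product of the two paths along that route, and the total effect adds the direct path. A sketch of how these might be obtained, assuming the data are still in memory:

```stata
quietly sem (x1 -> x2 x3)(x2 -> x3)
* direct, indirect, and total effects
estat teffects
* indirect effect of x1 on x3 through x2, with a delta-method
* standard error
nlcom _b[x2:x1]*_b[x3:x2]
* by hand, from the reported coefficients: indirect, then total
display .4479402*(-.4295529)
display .4616956 + .4479402*(-.4295529)
```

The hand-computed total effect, about .2693, equals the x3 <- x1 coefficient in the multiple outcomes model, where x2 was not a predictor of x3.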
