Once you've run a regression, the next challenge is to figure out what the results mean. The margins command, new in Stata 11, is a powerful tool for understanding a model.
The examples in this article will use the auto data set included with Stata. Load it with:
sysuse auto
One of the basic questions after running a regression is "If X changes, how much does that change Y?" or in calculus terms, "What is the derivative of Y with respect to X?"
In the case of simple linear regression, the answer is usually the coefficient on X. Run the following regression:
reg price i.foreign c.weight##c.weight displacement
This regresses the price of the car on foreign, weight and weight squared, and displacement. If you're not familiar with the new factor and interaction notation also introduced in Stata 11, see the Factor Variables section in Stata for Researchers: Usage and Syntax.
In the model run above, the coefficient on displacement is about 3.64, meaning that if you increase displacement by one the expected price increases by $3.64. However, weight is not so simple: if you increase weight by one that increases both weight and weight squared, and the total effect depends on what weight was to begin with.
This is where the margins command becomes useful. With the dydx() option, margins calculates the derivative of the mean expected outcome with respect to the variable you specify.
margins, dydx(displacement)
gives you 3.64, the original coefficient on displacement. The standard error, P-value and 95% confidence interval are also very similar to the original regression results, though they're calculated differently and thus not quite identical. But consider:
margins, dydx(weight)
This gives a result of 2.73, which is nothing like the coefficient on either weight or weight squared.
What margins does here is take the numerical derivative of the mean expected price with respect to weight. In doing so, margins looks at the actual data. Thus it considers the effect of changing the Honda Civic's weight from 1,760 pounds as well as changing the Lincoln Continental's from 4,840 (the weight squared term is more important with the latter than the former). It then averages them along with all the other cars to get its result of 2.73, meaning that each additional pound of weight adds $2.73 to the mean expected price.
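To make the averaging concrete, the following is a minimal sketch that computes the derivative for each car by hand and then takes the mean. It uses the analytic derivative of the quadratic rather than the numerical derivative margins takes, and the variable name dprice_dweight is made up for this illustration.

```stata
sysuse auto, clear
reg price i.foreign c.weight##c.weight displacement

* Per-car derivative of predicted price with respect to weight:
* coefficient on weight plus 2 * coefficient on weight squared * weight
* (dprice_dweight is a hypothetical name chosen for this sketch)
gen double dprice_dweight = _b[weight] + 2*_b[c.weight#c.weight]*weight

* The mean of the per-car derivatives should agree with the
* point estimate reported by margins, dydx(weight)
summarize dprice_dweight
margins, dydx(weight)
```

This reproduces only the point estimate; margins also supplies the standard error and confidence interval.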
Another approach is to set all the variables to their means, then find the derivative of expected price with respect to weight at that point. You can do that by adding the atmeans option:
margins, dydx(weight) atmeans
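In this case the result is the same: the derivative of the predicted price with respect to weight is linear in weight and does not depend on the other variables, so averaging it over the cars gives the same value as evaluating it at the mean weight. But consider a slightly more complicated model, where weight and weight squared are both interacted with foreign: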
reg price i.foreign##c.weight##c.weight displacement
With this model:
margins, dydx(weight)
gives 3.08, while:
margins, dydx(weight) atmeans
gives 3.30. The mean effect is not necessarily the same as the effect at the mean.
margins is even more useful for models with binary outcomes, where interpretation is always difficult. Before you can run one, you need a binary dependent variable. Create it with the following:
gen bigEngine=(displacement>150)
displacement is a measure of engine size, and we'll call anything over 150 cubic inches "big."
Now consider the model:
logit bigEngine i.foreign weight
The coefficients tell us that being foreign makes a car less likely to have a big engine, while being heavy makes it more likely. But by how much? margins gives us one way to answer that question:
margins, dydx(weight)
This tells us that the derivative of the mean expected probability of having a big engine with respect to weight is .0002664. This suggests that if you had about 3,754 types of cars (1/.0002664) and added a pound of weight to all of them, you'd expect one to switch from having a "small" engine to a "big" engine, which is not a very big effect. However, note that the atmeans option changes things somewhat:
margins, dydx(weight) atmeans
gives .0008623, or one expected change per 1160 cars.
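The average effect without atmeans can also be reproduced by hand, which shows where the number comes from. The sketch below relies on the standard result that in a logit model the derivative of the predicted probability with respect to a continuous variable is P*(1-P) times its coefficient; the variable names phat and dp_dweight are made up for this illustration.

```stata
sysuse auto, clear
gen bigEngine=(displacement>150)
logit bigEngine i.foreign weight

* Predicted probability of a big engine for each car
predict double phat, pr

* Per-car derivative of that probability with respect to weight
* (phat and dp_dweight are hypothetical names chosen for this sketch)
gen double dp_dweight = phat*(1-phat)*_b[weight]

* The mean should agree with the point estimate from margins, dydx(weight)
summarize dp_dweight
```

Because P*(1-P) is largest when P is near one half and close to zero when P is near 0 or 1, the derivative varies a great deal from car to car, which is why the average effect and the effect at the means can differ so much in this model.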
Next try:
margins, dydx(foreign)
Stata knows that foreign is an indicator variable because you specified it as i.foreign in the model, so instead of looking at small changes, margins considers the effect of changing foreign from 0 to 1. If all the cars in the sample were domestic (which they are not) and then became foreign, the mean expected probability of having a big engine would fall by 0.186. This is almost one change for every five cars, a much bigger effect.
Multinomial logit models can be even harder to interpret because the coefficients only compare two states. Copy and paste the following command to load a data set that was carefully constructed to illustrate the pitfalls of interpreting multinomial logit results:
use http://www.ssc.wisc.edu/sscc/pubs/files/margins_mlogit.dta
It contains two variables, an integer y that takes on the values 1, 2 and 3; and a continuous variable x. They are negatively correlated (cor y x).
Now run the following model:
mlogit y x
The coefficient of x for outcome 2 is negative, so it's tempting to say that as x increases the probability of y being 2 decreases. But in fact that's not the case, as the margins command will show you:
margins, dydx(x) predict(outcome(2))
The predict() option allows you to choose the response margins is examining. predict(outcome(2)) specifies that you're interested in the expected probability of outcome 2. And in fact the probability of outcome 2 increases with x, the derivative being 0.016.
How can that be? Recall that the coefficients given by mlogit only compare the probability of a given outcome with the base outcome. Thus the x coefficient of -5.34 for outcome 2 tells you that as x increases, observations are likely to move from outcome 2 to outcome 1. Meanwhile the x coefficient of -21.292 for outcome 3 tells you that as x increases observations are likely to move from outcome 3 to outcome 1. What it doesn't tell you is that as x increases observations also move from outcome 3 to outcome 2, and in fact that effect dominates the movement from 2 to 1.
You can see it if you change the base category of the regression:
mlogit y x, base(2)
Now the coefficients tell you about the probability of each outcome compared to outcome 2, and the fact that the negative x coefficient for outcome 3 is much larger (in absolute terms) than the positive x coefficient for outcome 1 indicates that increasing x increases the probability of outcome 2.
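One way to see the whole picture at once is to ask margins for the derivative of each outcome's probability in turn. The sketch below uses only the commands already shown, repeated for all three outcomes. Because the three probabilities always sum to one, the three derivatives should sum to zero (up to rounding), so a gain for outcome 2 must be matched by losses elsewhere.

```stata
use http://www.ssc.wisc.edu/sscc/pubs/files/margins_mlogit.dta, clear
mlogit y x

* Derivative of each outcome's mean expected probability with respect to x
margins, dydx(x) predict(outcome(1))
margins, dydx(x) predict(outcome(2))
margins, dydx(x) predict(outcome(3))
```

These results do not depend on which outcome is used as the base category, since changing the base changes the coefficients but not the predicted probabilities.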
margins can also predict the level of the outcome variable under various scenarios. Sometimes these "counter-factuals" can be interesting results in and of themselves: "What would the mean income be if all the blacks in my sample were white?" "What would the mean test score have been if the school's demographics hadn't changed?"
Load the automobile data set again and re-run our first regression:
sysuse auto
reg price i.foreign c.weight##c.weight displacement
To examine the impact of foreign on the mean expected price, type:
margins foreign
This sets foreign to zero for all cars, leaving the other variables unchanged, finds the predicted price for each car, and then averages them. It then sets foreign to one for all cars and repeats the process. If you wanted to set all the other variables to their means instead, you'd add the atmeans option just like before:
margins foreign, atmeans
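The procedure margins foreign follows (without atmeans) can be sketched by hand with preserve, replace and predict. This is only an illustration of the point estimates, not a replacement for margins, which also gives standard errors. The variable names price_if_domestic and price_if_foreign are made up for this example.

```stata
sysuse auto, clear
reg price i.foreign c.weight##c.weight displacement

* Keep a copy of the real data so the changes below can be undone
preserve

* Pretend every car is domestic and predict its price
replace foreign = 0
predict double price_if_domestic, xb

* Pretend every car is foreign and predict again
replace foreign = 1
predict double price_if_foreign, xb

* The two means should agree with the margins reported by margins foreign
summarize price_if_domestic price_if_foreign

* Bring back the original data
restore
```

Note that restore also discards the two prediction variables, so anything you want to keep must be recorded before that line.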
The foreign variable can only take on two values (Stata knows this because you marked it as i.foreign) so the margins command calculated its results for both of them. Obviously you can't look at all possible values for continuous variables, so for continuous variables you have to specify the values you're interested in with the at() option. For example, to see what the mean expected price would be if all the cars weighed 3,000 pounds, type:
margins, at(weight=3000)
If you wanted to compare different values of weight, replace the 3000 with a list of numbers in parentheses:
margins, at(weight=(2000 3000 4000))
You can include multiple variables in the at() option, allowing you to set up any scenario you're interested in. For example, you can find what the mean expected price would be if all the cars were foreign and weighed 3,000 pounds with:
margins, at(weight=3000 foreign=1)
If you're interested in a statistic that margins can't calculate (say, the effect on a particular car) there is an alternative technique for examining counter-factual scenarios. It involves actually changing the data, making sure you can get the real data back, and then using the predict command. For more information see Making Predictions with Counter-Factual Data in Stata.
Last Revised: 2/17/2010
