In this project, I investigate how mechanism-based explanations can be accomplished using regression modeling.
Update 2017: The poster won the 2017 Best Poster Award of the Faculty of Social and Behavioral Sciences at Utrecht University.
Update 2018: Presented at the PAA 2018 in Denver, CO.
Benjamin Rosche Posts
Unfortunately, the emphasis in sociology tends to be on critiques of theory rather than on synthesizing and devising a way forward; on detecting the weakness of existing theory rather than on identifying their most useful strengths, on which we can build and develop.
- Hakim, C. (2000). Work-Lifestyle Choices in the 21st Century: Preference Theory. Oxford University Press.
Based on this post by Michael Norman Mitchell, I want to remember the difference between the Marginal Effect at Mean and the Average Marginal Effect.
I fit an ordered logistic regression (proportional odds model) to estimate the effect of sex on the likelihood of voting:
ologit vote c.age i.female c.chfreq c.edu_years, allbase coeflegend
Marginal Effect at Mean
To obtain the probability of vote = “certainly not” at the means of the other covariates, I run these margins commands:
margins i.female, predict(outcome(1)) atmeans // .0511123; .0596168
margins, dydx(i.female) predict(outcome(1)) atmeans // direct calculation of the MEM
Alternatively by hand:
lincom _b[vote:age] * 37.62104 + _b[vote:1.female] * 1 + _b[vote:chfreq] * 7.235061 + _b[vote:edu_years] * 12.02012
di 1 / (1 + exp(3.333369-_b[cut1:_cons])) // .051112
lincom _b[vote:age] * 37.62104 + _b[vote:1.female] * 0 + _b[vote:chfreq] * 7.235061 + _b[vote:edu_years] * 12.02012
di 1 / (1 + exp(3.496284-_b[cut1:_cons])) // .0596168
The marginal effect of being female on the probability of vote = “certainly not”, evaluated at the means of age, chfreq and edu_years, is then -.0085045.
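The hand calculation uses the cumulative-logit form of the proportional odds model. As a sketch in notation that is mine rather than Stata's (\kappa_1 stands for the first cut point _b[cut1:_cons], x'\beta for the linear prediction returned by lincom, and \bar{x} for the means of age, chfreq and edu_years), the probability of the lowest category and the MEM can be written as:

```latex
\[
\Pr(\text{vote} = 1 \mid x) = \frac{1}{1 + \exp(x'\beta - \kappa_1)}
\]
\[
\text{MEM} = \Pr(\text{vote} = 1 \mid \text{female} = 1, \bar{x})
           - \Pr(\text{vote} = 1 \mid \text{female} = 0, \bar{x})
\]
```

Because the linear prediction enters with a positive sign inside the exponential, a larger x'\beta implies a smaller probability of the lowest category.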
Average Marginal Effect
Instead of setting all covariates to their respective means, the average marginal effect leaves all covariates except female as they were observed. The probability of vote = “certainly not” is then calculated twice: first assuming every observation in the dataset is female, and second assuming every observation is male. The average marginal effect is the difference between those two average predicted probabilities. The margins command offers the most convenient way to do this:
margins i.female, predict(outcome(1)) // .0583623; .0677809
margins, dydx(i.female) predict(outcome(1)) // direct calculation of the AME
To do this by hand, follow these steps:
Leaving all other covariates as they were observed, I change the data so that everybody is female and predict the probability of vote = “certainly not”. Then I do the opposite and assume every individual is male. The AME is the difference between the means of those predicted probabilities:
clonevar female_backup = female // As I have to recode female, I create a copy of it
replace female = 1
predict y1, pr outcome(1)
replace female = 0
predict y0, pr outcome(1)
replace female = female_backup // I change female back to its original values
sum y1 y0
di .0583623 - .0677809 // -.0094186
The average marginal effect of being female on the probability of vote = “certainly not” is then -.0094186.
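Written as a formula, the by-hand procedure amounts to the following. This is a sketch in my own notation, where N is the number of observations in the estimation sample and x_i are the observed values of age, chfreq and edu_years for observation i:

```latex
\[
\text{AME} = \frac{1}{N} \sum_{i=1}^{N}
  \Big[ \Pr(\text{vote} = 1 \mid \text{female} = 1, x_i)
      - \Pr(\text{vote} = 1 \mid \text{female} = 0, x_i) \Big]
\]
```

The two terms inside the sum correspond to the predictions y1 and y0, and the average over i is what sum reports as their means.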
In this example, MEM (-.0085045) and AME (-.0094186) are very close. That need not hold when the covariate means are not representative of the sample, for instance when the covariates are strongly skewed.
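To compare the two quantities on data anyone can load, here is a minimal sketch using the auto dataset that ships with Stata. It is not the voting data used above, and the model (a logit of foreign on weight and mpg) is only an assumption chosen for illustration; how far the two results differ is something to check by running it.

```stata
* Illustration only: Stata's shipped auto data, not the voting data above
sysuse auto, clear
logit foreign c.weight c.mpg

* MEM: effect of mpg with weight and mpg fixed at their sample means
margins, dydx(mpg) atmeans

* AME: effect of mpg averaged over the observed covariate values
margins, dydx(mpg)
```

In a nonlinear model the MEM evaluates the effect at a single covariate profile, while the AME averages it over all observed profiles, so the two can drift apart when few observations lie near the means.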
It is surprising how often discussions among sociology students end up being about the appropriateness of Rational-Choice Theory (RCT). Although I agree that a behavioral theory on the micro-level is necessary to develop a proper explanation, I sometimes get the impression that many sociology students are actually more interested in (social) psychological insights. In the article by Batenburg, Raub and Snijders (2003, p.142f.) I found a comment that summarizes this perfectly:
Granovetter’s criticism of the shortcomings of the neoclassical model of perfect markets of “atomized” actors and transactions has often been enthusiastically endorsed and taken to imply that one had better abandon rational choice models in favor of more “realistic”, socially inspired models of man. It has been widely overlooked that he sharply opposes “psychological revisionism” which he characterizes as “an attempt to reform economic theory by abandoning an absolute assumption of rational decision making” (1985, p.505). Rather, he suggests to maintain the rationality assumption: “[W]hile the assumption of rational action must always be problematic, it is a good working hypothesis that should not easily be abandoned. What looks to the analyst like nonrational behavior may be quite sensible when situational constraints, especially those of embeddedness are fully appreciated” (1985, p.506). He argues that investments in tracing the effects of embeddedness are more promising for sociologists than investments in the modification of the rationality assumption: “My claim is that however naive that psychology [of rational choice] may be, this is not where the main difficulty lies – it is rather the neglect of social structure” (1985, p.506).
Hence, Granovetter advocates a rich set of assumptions about how social structure influences actors. In addition, I think the micro-to-macro link is often neglected too, owing to methodological difficulties. Analytical sociology, a relatively new stream within sociological rational choice theory and explanatory sociology, focuses on handling precisely this link more appropriately. Kalter and Kroneberg (2014) give a good overview of this development.
- Batenburg, R. S., Raub, W., & Snijders, C. (2003). Contacts and Contracts: Dyadic Embeddedness and the Contractual Behavior of Firms. In V. Buskens, W. Raub, & C. Snijders (Eds.), Research in the Sociology of Organizations: Vol. 20. The Governance of Relations in Markets and Organizations (pp. 135–188). Bingley: Emerald.
- Granovetter, M. (1985). Economic Action and Social Structure: The Problem of Embeddedness. American Journal of Sociology, 91(3), 481–510.
- Kalter, F., & Kroneberg, C. (2014). Between Mechanism Talk And Mechanism Cult: New Emphases in Explanatory Sociology And Empirical Research. KZfSS Kölner Zeitschrift für Soziologie und Sozialpsychologie, 66, 91–115.
It seems to me that the margins command really gives me a better understanding of the (linear) regression method itself. So far I have managed to predict expected values of the dependent variable conditional on the independent ones, as well as the average marginal effects of the independent variables.
After a linear regression, margins, dydx(*) reports the average marginal effects. If the model contains no interaction, it simply reproduces the b-coefficients (as they are the dy/dx). But as soon as I include an interaction in the model, it no longer reports the b-coefficient of the interaction; instead it computes the average effect of each variable.
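The behaviour with and without an interaction can be checked with a short sketch. It uses Stata's shipped auto data and the variables price, mpg and weight, which are stand-ins chosen for illustration and not the data behind this post:

```stata
* Illustration only: Stata's shipped auto data
sysuse auto, clear

* No interaction: dydx(*) reproduces the regression coefficients
regress price c.mpg c.weight
margins, dydx(*)

* With an interaction: dydx(mpg) is the effect of mpg averaged over weight
regress price c.mpg##c.weight
margins, dydx(*)

* By hand for mpg: main effect plus interaction times the mean of weight
summarize weight if e(sample), meanonly
display _b[mpg] + _b[c.mpg#c.weight] * r(mean)
```

In the interacted linear model the effect of mpg for each observation is the main coefficient plus the interaction coefficient times that observation's weight, so averaging it gives the expression in the last line.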
margins, at(age=(20 40 60) liberalism=(0(1)10) gender=0)
Without dydx(*), the margins command computes the expected values of the dependent variable. Conveniently, I can condition the expected values on the independent variables as I wish. Here I specified that I want predictions of Y for:
- age=20, gender=0: for each value of liberalism between 0 and 10 in steps of 1
- age=40, gender=0: for each value of liberalism between 0 and 10 in steps of 1
- age=60, gender=0: for each value of liberalism between 0 and 10 in steps of 1
In the end, Stata predicts 3 * 11 = 33 values.
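The way at() builds its grid can be tried on any dataset. A minimal sketch on Stata's shipped auto data, where mpg, weight and foreign are hypothetical stand-ins for age, liberalism and gender:

```stata
* Illustration only: Stata's shipped auto data
sysuse auto, clear
regress price c.mpg##c.weight i.foreign

* 3 values of mpg times 5 values of weight, with foreign held at 0:
* margins reports one prediction per combination, 3 * 5 = 15 in total
margins, at(mpg=(15 25 35) weight=(2000(500)4000) foreign=0)
```

The numlist 2000(500)4000 expands to 2000, 2500, 3000, 3500 and 4000, in the same way that 0(1)10 expands to the eleven integers from 0 to 10.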
margins, at(age=(20 40 60) gender=0) atmeans
If I do not want to specify a whole range of values, I can add atmeans. Stata then basically takes the mean of liberalism and I only get 3 predicted values.
margins, at(age=(20 40 60)) over(gender) atmeans
If I have a categorical variable, I can use over() to condition the expected values (or dy/dx) on it, e.g. gender. I noted that Stata then also conditions the means of the other variables in the OLS model on gender.
So what to do with all these values? First of all, I can interpret them. But even more importantly, I can produce nice graphs from them.
marginsplot, xdimension(at(liberalism)) recast(line) recastci(rarea) plot1opts(lcolor(black) lpattern(solid)) ci1opts(fcolor(gs12) lcolor(gs12) lpattern(solid))
xdimension(at(liberalism)) is the most important option here. The variable for which I computed the most values should be inserted here. For instance, it would make no sense to put age on the x-axis, as there are only 3 values of age for each level of liberalism, but 11 values of liberalism for each age group. As I defined an interaction within the OLS model, I can thereby draw a nice conditional effects plot.
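A comparable plot can be sketched end to end on Stata's shipped auto data. The variables mpg and weight are hypothetical stand-ins for age and liberalism, and the styling options are copied from the marginsplot call above:

```stata
* Illustration only: Stata's shipped auto data
sysuse auto, clear
regress price c.mpg##c.weight

* Predictions over a fine grid of weight at three values of mpg
margins, at(mpg=(15 25 35) weight=(2000(250)4500))

* Put the finely gridded variable on the x-axis; one line per value of mpg
marginsplot, xdimension(at(weight)) recast(line) recastci(rarea)
```

Because the model contains the mpg by weight interaction, the three lines have different slopes, which is what makes this a conditional effects plot.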