---
title: "Reconstructing a Graph - ISIS Enabled Attacks"
author: "Doug Hemken"
date: "February 2017"
---

```{r setup, echo=FALSE, message=FALSE}
source("../StataMDsetup.r")
opts_chunk$set(results="hide")
```

# Introduction

One good way to familiarize yourself with graphing concepts, and to learn the specifics of a particular statistical package, is to try reproducing a graph you've seen. For example, consider this graph on ISIS involvement in a number of terrorist attacks, found in an article in the [New York Times](http://www.nytimes.com/interactive/2017/02/04/world/isis-remote-control-enabled-attack.html).

![Original](NYTISIS/nytimesisisattacks.PNG)

# Data, Graphical Object, Level of Measurement

To get started in Stata, we need to think about what graphical objects we are using, what data define their positions or aesthetic characteristics, and what level of measurement is used for our coordinates.

In this plot, each graphical object is a circle, centered on a point. Each point is positioned by time and by type of attack (here, the role ISIS played in it). While time is a continuous variable, type of attack is categorical (perhaps ordered). Additionally, each marker is sized in proportion to the number of people killed, and the color of each marker is determined by whether or not the attack killed anyone. We also have labels for some locations.

Positioning circular markers at points narrows our focus to three Stata commands: `graph dot`, `graph twoway dot`, or `graph twoway scatter`. The presence of a categorical coordinate, the type of attack, suggests using `graph dot`. However, keep the `twoway` commands in mind, as they generally give us more options for customizing our graph.

## Data

First we read in the data. We are given four variables: date, location, number of people killed, and the role of ISIS in the attack. We'll go ahead and convert the human-readable dates (a string variable) into a numeric format suitable for graphing.

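
As a minimal sketch of the date conversion just described: Stata's `date()` function turns a string into a count of days since 1 January 1960, and a `%td` display format renders that count as a readable date again. The date string below is a made-up illustration, not necessarily a row of the data file, and the block is shown for reading rather than as part of the analysis.

```stata
* Hypothetical date string, chosen only for illustration
display date("2/4/2017", "MDY")
* The same value shown with a daily date display format
display %td date("2/4/2017", "MDY")
```

The actual data step applies the same function to the whole `date` variable.
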
```{r readdata, collectcode=TRUE}
import delimited "NYTISIS/ISISattacks.csv"
* Convert the string dates to Stata daily dates, then format for display
generate attack = date(date, "MDY")
format %tdMon_dd,_CCYY attack
```

```{r listdata, results="markup"}
list in 1/5, noobs
```

## Graph Dot

A first try at graphing using `graph dot` is promising: we appear to have the right graphing object and coordinates.

```{r graphdot}
graph dot attack, over(location) asyvars over(role) ///
  legend(off) exclude0
graph export "NYTISIS/graphdot.png", replace
```

![graph dot](NYTISIS/graphdot.png)

However, there are a number of problems here that will be difficult to overcome. One is that we have no option to resize the markers by another variable. A second problem is that marker colors are defined by location, not by whether anyone was killed. A third problem is that each point is a mean attack date within a location, and because some locations occur more than once, we cannot switch to an "asis" date. Related to this, the time coordinate is not scaled in a way that makes sense to humans!

These limitations push us to use `scatter` instead.

## Twoway Scatter

We will need to do some additional data setup in order to use `twoway scatter`. Our categorical variable, role, will need to be encoded numerically. We will want deaths versus injuries as separate variables, so that we can use multiple y variables for color coding. And we want non-missing values for deaths in order to use that variable to set marker sizes. We'll also set up some variable labels and value labels, which become textual guides of various sorts in our graph.

```{r moredata, collectcode=TRUE}
* Encode the string role as a labeled numeric variable
encode role, generate(ISIS) label(rolelbl)
* Split into ISIS0 (deaths missing) and ISIS1 (deaths recorded)
separate ISIS, by(dead < .)
label variable ISIS0 "injuries only"
label variable ISIS1 "sized by number of deaths"
label values ISIS? rolelbl
* Give injury-only attacks a weight of 1 so they still get a marker size
replace dead = 1 if dead == .
```

With this setup, the basic `scatter` command is fairly succinct.

```{r basicscatter}
scatter ISIS? attack [w=dead]
graph export "NYTISIS/basicscatter.png", replace
```

![Basic scatter](NYTISIS/basicscatter.png)

Visually, this is only a little better than `graph dot`, but it clears up all of the problems identified above, and gives us a path to move forward.

# Refinement Through Graph Options

## Yscale

Better y coordinates and a better guide make it easier to see that we are on the right track. (Note we cannot simply suppress the y-ticks if we have y-labels.) We give ourselves more room for the markers with `yscale(range())`, reverse the direction of the coordinates to match the original graph, and add labels.

```{r yscale}
scatter ISIS? attack [w=dead], ///
  yscale(range(0.5 3.5) reverse noline) /*ytick(none)*/ ///
  ylabel(1(1)3, valuelabel angle(horizontal))
graph export "NYTISIS/yscale.png", replace
```

![yscale](NYTISIS/yscale.png)

## Xscale

A few similar options give us a better x axis and guide, and a better aspect ratio. We can suppress the variable name with `xtitle("")`, move the coordinates to the top with `xscale(alt)`, and extend the graphing area with `tscale(range())` (which is like `xscale`, but for date-time data values). Finally, we set the aspect ratio with `ysize()` and `xsize()`.

```{r xscale}
scatter ISIS? attack [w=dead], ///
  yscale(range(0.5 3.5) reverse noline) ///
  ylabel(1(1)3, valuelabel angle(horizontal)) ///
  xtitle("") xscale(alt noline) tscale(range(1Sep2014 1Feb2017)) ///
  ysize(2) xsize(6)
graph export "NYTISIS/xscale.png", replace
```

![xscale](NYTISIS/xscale.png)

Our graph is beginning to shape up!

## Marker Labels

Adding marker labels, however, poses something of a challenge. It turns out that in Stata, you cannot use both marker labels and marker weights in the same graph specification. If we try, we see that labels are ignored for y variables that have weights other than 1.

```{r mlabel}
scatter ISIS? attack [w=dead], ///
  yscale(range(0.5 3.5) reverse noline) ///
  ylabel(1(1)3, valuelabel angle(horizontal)) ///
  xtitle("") xscale(alt noline) tscale(range(1Sep2014 1Feb2017)) ///
  ysize(2) xsize(6) ///
  mlabel(location) mlabposition(6)
graph export "NYTISIS/mlabel.png", replace
```

![Simple mlabel](NYTISIS/mlabel.png)

Here we lack the labels we want, and we have the ones we do not want.

What we can do is overlay our plot with another plot that has just labels in it. This will require a little more data setup to get the labels positioned appropriately. Note the Beirut data point is not like the others!

```{r labeldata, collectcode=TRUE}
* Offset the label positions from the markers; Beirut gets no label
generate isis2 = ISIS1 + 0.4 if location ~= "Beirut"
* And select just certain labels to use
generate location2 = location if dead >= 14
```

The second `scatter` does the work we need.

```{r labeloverlay}
twoway (scatter ISIS1 ISIS0 attack [w=dead]) ///
  (scatter isis2 attack, ///
    msymbol(none) mlabel(location2) mlabposition(0)) ///
  , yscale(range(0.5 3.5) reverse noline) ///
  ylabel(1(1)3, valuelabel angle(horizontal)) ///
  xtitle("") xscale(alt noline) tscale(range(1Sep2014 1Feb2017)) ///
  ysize(2) xsize(6)
graph export "NYTISIS/labeloverlay.png", replace
```

![Label layer](NYTISIS/labeloverlay.png)

Here it is important that `msymbol()` and `mlabel()` are options to just the second scatter plot. They cannot be used with the first scatter plot, nor as global plot options.

## Color, Legend, Title, Notes

Now we are in the home stretch, and we can use more options to clean up the color scheme and the legend, and to add a title and notes. Notice that the colors are assigned in each graph layer, while the legend and text are addressed as global options.

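
The split between layer options and global options can be sketched on a smaller example. The block below is an illustration only: it assumes Stata's bundled `auto` data set rather than the attack data, and its variable names, colors, and legend text are not part of the original analysis. Options inside each pair of parentheses apply to that layer alone, while options after the final comma apply to the whole graph.

```stata
* Sketch on Stata's bundled auto data, not the ISIS data
sysuse auto, clear
twoway (scatter mpg weight if foreign == 0, mfcolor(red*.5)) ///
       (scatter mpg weight if foreign == 1, mfcolor(gray))   ///
       , legend(order(1 "Domestic" 2 "Foreign"))             ///
         title("Layer options versus global options")
```
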
```{r final}
twoway (scatter ISIS1 ISIS0 attack [w=dead], ///
    mfcolor(red*.5 gray) mlcolor(gs50 gs50) mlwidth(thin thin)) ///
  (scatter isis2 attack, ///
    msymbol(none) mlabel(location2) mlabposition(0) mlabcolor(black)) ///
  , yscale(range(0.5 3.5) reverse noline) ///
  ylabel(1(1)3, valuelabel angle(horizontal)) ///
  xtitle("") xscale(alt noline) tscale(range(1Sep2014 1Feb2017)) ///
  ysize(2) xsize(6) ///
  legend(order(1 2) position(12) region(style(none))) ///
  title("ISIS attacks, outside of its self-proclaimed caliphate") ///
  note("Recreated from:" ///
    "www.nytimes.com/interactive/2017/02/04/world/" ///
    "isis-remote-control-enabled-attack.html")
graph export "NYTISIS/isisfinal.png", replace
```

![Final graph](NYTISIS/isisfinal.png)

If you wanted to continue to better match the original, you might:

* relabel the time line.
* come up with a slightly tighter position for the location labels, which vary in the NYTimes original.

A final quality that we cannot mimic in Stata, as of this writing, is the "transparency", the way the overlaid markers add to the intensity of color. In Stata we simply have "the last ink applied, wins".

```{r cleanup, engine="R", echo=FALSE}
unlink("profile.do")
```
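
The transparency limitation describes the Stata release available when this was written (February 2017). Stata 15 and later accept an opacity suffix on color names, so overlapping markers can build up intensity. The following is a minimal sketch under that assumption, again using the bundled `auto` data and illustrative variable choices rather than the attack data; it will not run on earlier releases.

```stata
* Assumes Stata 15 or later, where a color may carry a %opacity suffix
sysuse auto, clear
* Markers sized by price and filled at 30% opacity, so overlaps look darker
scatter mpg weight [w=price], mfcolor(red%30) mlcolor(gs8)
```

The same suffix could in principle be added to the `mfcolor()` list of the final graph above, for example `mfcolor(red%50 gray%50)`.
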