---
title: "Reconstructing a Graph - ISIS Enabled Attacks"
author: "Doug Hemken"
date: "February 2017"
---
```{r setup, echo=FALSE, message=FALSE}
source("../StataMDsetup.r")
opts_chunk$set(results="hide")
```

# Introduction
One good way to familiarize yourself with graphing concepts and
to learn the specifics of some particular statistical software is to try
reproducing a graph you've seen.

For example, consider this graph on ISIS involvement in a number of
terrorist attacks, found in an article in
the [New York Times](http://www.nytimes.com/interactive/2017/02/04/world/isis-remote-control-enabled-attack.html).

![Original](NYTISIS/nytimesisisattacks.PNG)

# Data, Graphical Object, Level of Measurement
To get started in Stata, we need to think about what graphical
objects we are using, what data defines their positions or
aesthetic characteristics, and what level of measurement is used
for our coordinates.

In this plot, each graphical object is a circle, centered on
a point.  Each point is positioned by time and type of terrorist
attack.  While time is a continuous variable, type of attack
is categorical (perhaps ordered).  Additionally, each point
marker is sized in proportion to the number of people killed,
and the color of each marker is determined by whether or not the
attack killed anyone.  We also have labels for
some locations.

Positioning circular markers at points narrows our focus
to three Stata commands:  `graph dot`, `graph twoway dot`, or
`graph twoway scatter`.

The presence of a categorical coordinate, the type of attack,
suggests using
`graph dot`.  However, keep the `twoway` commands in mind, as
they generally give us more options for
customizing our graph.

## Data
First we read in the data.  We are given four variables:
date, location, number of people killed, and the
role of ISIS in the attack.  We'll go ahead
and convert human-readable dates (a string variable)
into a numeric format suitable for graphing.
```{r readdata, collectcode=TRUE}
import delimited "NYTISIS/ISISattacks.csv"
generate attack = date(date, "MDY")
format %tdMon_dd,_CCYY attack
```
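
To see what the date conversion does, here is a small sketch using a single made-up date string. The value `"11/13/2015"` is only an assumed example of the month-day-year form and is not taken from the data file.

```{r datecheck, results="markup"}
* Hypothetical date string, for illustration only
* The raw result is a count of days since 01jan1960
display date("11/13/2015", "MDY")
* The same value rendered with the display format used above
display string(date("11/13/2015", "MDY"), "%tdMon_dd,_CCYY")
```

The numeric value is what `scatter` uses as a coordinate, while the format only controls how it is shown.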

```{r listdata, results="markup"}
list in 1/5, noobs
```

## Graph Dot
A first try at graphing using `graph dot` is promising:
we appear to have the right graphing object and
coordinates.
```{r graphdot}
graph dot attack, over(location) asyvars over(role) ///
    legend(off) exclude0
graph export "NYTISIS/graphdot.png", replace
```

![graph dot](NYTISIS/graphdot.png)

However, there are a number of problems here that will be
difficult to overcome.  One is that we have no option to
resize the markers by another variable.  A second problem
is that marker colors are defined by location, not by whether
the attack killed anyone.  A third problem is that each point is
a mean attack date within a location, and because some
locations occur more than once, we cannot switch to an
"asis" date.  Related to this, the time coordinate is
not scaled in a way that makes sense to humans!
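
The repeated locations can be checked directly. This is a minimal sketch, assuming the data have been read in as above and that `location` is the variable name produced by the import:

```{r duplocation, results="markup"}
* Summarize how many observations share a location with another observation
duplicates report location
```

Any row of the report with more than one copy corresponds to a location that `graph dot` would collapse to a single mean date.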

These limitations push us to use `scatter` instead.

## Twoway Scatter
We will need to do some additional data setup in order to use
`twoway scatter`.  Our categorical variable, role, will need
to be encoded numerically.  We will want deaths versus injuries
as separate variables, so that we can use multiple y variables
for color coding.  And we want non-missing values for deaths
in order to use that variable to set marker sizes.

We'll also set up some variable labels and value labels, which
become textual guides of various sorts in our graph.

```{r moredata, collectcode=TRUE}
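* Encode the string variable role as a labeled numeric variable
encode role, generate(ISIS) label(rolelbl)
* Split into ISIS0 (deaths missing) and ISIS1 (deaths recorded)
separate ISIS, by(dead < .)
label variable ISIS0 "injuries only"
label variable ISIS1 "sized by number of deaths"
label values ISIS? rolelbl
* Give injury-only attacks a nonmissing weight so they are still drawn
replace dead = 1 if dead == .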
```
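
Before graphing, it may help to confirm what the setup produced. The following is a sketch, not part of the original workflow, and it assumes `role` is never missing in the data:

```{r checksep, results="markup"}
* Show the numeric codes that encode assigned to each role
label list rolelbl
* Count attacks that are in both or neither of ISIS0 and ISIS1
count if missing(ISIS0) == missing(ISIS1)
```

A count of zero would indicate that every attack landed in exactly one of the two y variables.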

With this setup, the basic `scatter` command is fairly succinct.
```{r basicscatter}
scatter ISIS? attack [w=dead]
graph export "NYTISIS/basicscatter.png", replace
```

![Basic scatter](NYTISIS/basicscatter.png)

Visually, this is only a little better than `graph dot`, but it clears up
all of the problems identified above, and gives us a path to move forward.

# Refinement Through Graph Options

## Yscale
A better y scale and axis labels make it easier to see we are on the right track.
(Note that we cannot simply suppress the y-ticks with `ytick(none)` while we have y-labels,
so that option is commented out below.)

We give ourselves more room for the markers with `yscale(range())`, 
reverse the direction of the coordinates to match the original graph,
and add labels.
```{r yscale}
scatter ISIS? attack [w=dead], ///
	yscale(range(0.5 3.5) reverse noline) /*ytick(none)*/ ///
	ylabel(1(1)3, valuelabel angle(horizontal))	
graph export "NYTISIS/yscale.png", replace
```
![yscale](NYTISIS/yscale.png)

## Xscale
A few similar options give us a better x axis and guide, and a better aspect ratio.
We suppress the variable name with `xtitle("")`, move the axis
to the top with `xscale(alt)`, and extend the graphing area
with `tscale(range())` (which
is like `xscale`, but for date-time data values).  Finally, we set
the aspect ratio with `ysize()` and `xsize()`.
```{r xscale}
scatter ISIS? attack [w=dead], ///
	yscale(range(0.5 3.5) reverse noline) ///
	ylabel(1(1)3, valuelabel angle(horizontal)) ///
	xtitle("") xscale(alt noline) tscale(range(1Sep2014 1Feb2017)) ///
	ysize(2) xsize(6)
graph export "NYTISIS/xscale.png", replace
```
![xscale](NYTISIS/xscale.png)

Our graph is beginning to shape up!
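
The date literals inside `tscale(range())` stand for Stata's numeric daily dates. As a small illustration, the `td()` pseudofunction shows the numbers behind the two endpoints used above:

```{r tdcheck, results="markup"}
* Daily dates are stored as days since 01jan1960
display td(1sep2014)
display td(1feb2017)
```

Passing these two numbers to `xscale(range())` would describe the same range; `tscale()` just lets us write them as dates.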

## Marker Labels
Adding marker labels, however, poses something of a challenge.  It
turns out that in Stata, you cannot use both marker labels and
marker weights in the same plot.  If we try, we see
that labels are ignored for y variables that have weights other than 1.

```{r mlabel}
scatter ISIS? attack [w=dead], ///
	yscale(range(0.5 3.5) reverse noline) ///
	ylabel(1(1)3, valuelabel angle(horizontal)) ///
	xtitle("") xscale(alt noline) tscale(range(1Sep2014 1Feb2017)) ///
	ysize(2) xsize(6) ///
	mlabel(location) mlabposition(6)
graph export "NYTISIS/mlabel.png", replace
```
![Simple mlabel](NYTISIS/mlabel.png)

Here we lack the labels we want, and have the ones we do not.

What we can do is overlay our plot with another plot that has
just labels in it.  This will require a little more data setup
to get the labels positioned appropriately.  Note the Beirut
data point is not like the others: the code below leaves it out of the label layer.

```{r labeldata, collectcode=TRUE}
generate isis2 = ISIS1 + 0.4 if location ~= "Beirut"
* And select just certain labels to use
generate location2 = location if dead >= 14
```
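
To see which attacks will actually receive a text label, one option is to list the rows where the new label variable is filled in. This is only a quick check under the variable names created above:

```{r checklabels, results="markup"}
* Show only the attacks that have label text
list attack location dead isis2 if !missing(location2), noobs
```

Rows with label text but a missing `isis2` will not be drawn in the label layer, since `scatter` drops observations with a missing y value.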

The second `scatter` does the work we need.
```{r labeloverlay}
twoway (scatter ISIS1 ISIS0 attack [w=dead]) ///
	(scatter isis2 attack, ///
		msymbol(none) mlabel(location2) mlabposition(0)) ///
	, yscale(range(0.5 3.5) reverse noline) ///
	ylabel(1(1)3, valuelabel angle(horizontal)) ///
	xtitle("") xscale(alt noline) tscale(range(1Sep2014 1Feb2017)) ///
	ysize(2) xsize(6) 
graph export "NYTISIS/labeloverlay.png", replace
```
![Label layer](NYTISIS/labeloverlay.png)

Here it is important that `msymbol()` and `mlabel()` are options to
just the second scatter plot.  They cannot be used with the first
scatter plot, nor as global plot options.

## Color, Legend, Title, Notes
Now we are in the home stretch, and we can use more options to 
clean up color schemes, the legend, and add a title and notes.
Notice that the colors are assigned in each graph layer, while
the legend and text are addressed as global options.
```{r final}
twoway (scatter ISIS1 ISIS0 attack [w=dead], ///
		mfcolor(red*.5 gray) mlcolor(gs8 gs8) mlwidth(thin thin)) ///
	(scatter isis2 attack, ///
	 msymbol(none) mlabel(location2) mlabposition(0) mlabcolor(black)) ///
	, yscale(range(0.5 3.5) reverse noline) ///
	ylabel(1(1)3, valuelabel angle(horizontal)) ///
	xtitle("") xscale(alt noline) tscale(range(1Sep2014 1Feb2017)) ///
	ysize(2) xsize(6) ///
	legend(order(1 2) position(12) region(style(none))) ///
	title("ISIS attacks, outside of its self-proclaimed caliphate") ///
	note("Recreated from:" ///
        "www.nytimes.com/interactive/2017/02/04/world/" ///
        "isis-remote-control-enabled-attack.html")
graph export "NYTISIS/isisfinal.png", replace
```
![Final graph](NYTISIS/isisfinal.png)

If you wanted to continue to better match the original, you might:

* relabel the time axis.
* position the location labels more tightly; their placement
varies in the NYTimes original.

A final quality that we do not mimic here is the "transparency", the way
the overlaid markers add to the intensity of color.  In Stata 14 and earlier we simply
have "the last ink applied, wins"; Stata 15 and later add color opacity settings.
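
For readers on Stata 15 or later, color names accept a `%#` opacity suffix. The sketch below is not part of the original walkthrough: the 50 percent opacity is an arbitrary assumption, and the chunk is marked `eval=FALSE` so that it is not run on older versions.

```{r opacity, eval=FALSE}
* Requires Stata 15 or later: color%# sets opacity (0 = invisible, 100 = opaque)
scatter ISIS1 ISIS0 attack [w=dead], ///
	mfcolor(red%50 gray%50)
```

With partially transparent fills, overlapping markers should darken where they overlap, which is closer to the effect in the original graphic.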

```{r cleanup, engine="R", echo=FALSE}
unlink("profile.do")
```