---
title: "Graphical Objects in Stata"
author: "Doug Hemken"
date: "January 2017"
output:
  html_document:
    includes:
      after_body: ../../Rmd/bottomRmdKB.html
      before_body: ../../Rmd/topRmdKB.html
      in_header: ../../Rmd/headRmdKB.html
    self_contained: no
    theme: null
    highlight: null
    css: ../../Rmd/Rmd.css
    toc: yes
---
```{r setup, echo=FALSE, message=FALSE}
source("../StataMDsetup.r")
opts_chunk$set(results="hide")
```
# Introduction

The fundamental graphical objects we work with in Stata are things like
points, lines, line segments, curves, and areas. We also have objects such as
bars, box-and-whisker symbols, and pies, which are fundamental in the sense
that each is specified by a single keyword (`graph bar`, `graph box`,
`graph pie`).
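
As a quick sketch of those keyword-specified objects, the following uses the
`auto` data with `foreign` as the grouping variable. The variables and the
graph names (`kw_bar` and so on) are our own choices for illustration.

```stata
* Keyword-specified graphical objects (sketch; variables chosen for illustration)
sysuse auto, clear
graph bar (median) price, over(foreign) name(kw_bar, replace)  // bars
graph box price, over(foreign) name(kw_box, replace)           // box-and-whisker symbols
graph pie, over(foreign) name(kw_pie, replace)                 // pie slices (counts)
graph combine kw_bar kw_box kw_pie, title("Keyword objects")
```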

Some graphical objects have a simple relation to the data: they use the data
as is, untransformed. A `scatter` plot is a familiar example. Other objects
involve some summary of the data, such as counts, percents, means, regression
lines, or confidence intervals. A `bar` plot is often a summary of the data
and does not depict the individual data values.
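
A minimal sketch of the contrast, again with the `auto` data: `scatter` draws
one marker per observation, while `lfitci` (Stata's fitted line with a
confidence interval) draws only a summary computed from the data. The graph
names are arbitrary.

```stata
* Raw data versus a summary of the data (sketch)
sysuse auto, clear
* Raw: one marker per observation, untransformed
scatter mpg weight, title("raw: scatter") name(raw, replace)
* Summary: a fitted regression line and its confidence band, with the raw points overlaid
twoway (lfitci mpg weight) (scatter mpg weight), ///
    title("summary: lfitci || scatter") name(summ, replace)
graph combine raw summ
```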

Graphical objects may be defined in terms of one to four variables. A `bar`
chart of percents within a categorical variable is specified by that single
variable. A `scatter` plot requires two variables, range plots require three
(the two ends of the range plus a position), and paired-coordinate plots
require four (the coordinates of two points).
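
The sketch below shows one object of each kind. The helper variables
`mpg_lo`, `mpg_hi`, `mpg_mean`, and `weight_mean` are hypothetical names
created here only so there is something to plot, and the one-variable
`graph bar (percent)` form assumes a reasonably recent version of Stata.

```stata
* Graphical objects defined by one to four variables (sketch)
sysuse auto, clear
* One variable: percent of observations in each category of rep78
graph bar (percent), over(rep78) name(v1, replace)
* Two variables: a (y, x) point per observation
scatter mpg weight, name(v2, replace)
* Three variables: a low and a high y at each x (range of mpg within rep78)
egen mpg_lo = min(mpg), by(rep78)
egen mpg_hi = max(mpg), by(rep78)
twoway rcap mpg_lo mpg_hi rep78, name(v3, replace)
* Four variables: a segment from each car's (mpg, weight) to the overall means
egen mpg_mean = mean(mpg)
egen weight_mean = mean(weight)
twoway pcspike mpg weight mpg_mean weight_mean, name(v4, replace)
graph combine v1 v2 v3 v4
```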

Finally, some graphical objects are defined in relation to categorical
variables, while others require two continuous (numeric) variables. Somewhat
confusingly for a beginner, objects that look the same are not necessarily
defined at the same level of measurement: for instance, Stata has categorical
`graph bar` plots and also numeric `twoway bar` plots. The distinction is not
only conceptual. It also determines which graphical elements can be layered
together, since only `twoway` plot types can be overlaid in a single `twoway`
graph.
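
Here is a hedged sketch of the difference. `graph bar` does the summarizing
itself within each `over()` category, whereas `twoway bar` simply draws the
$(x,y)$ values it is given, so we summarize first with `collapse` (inside
`preserve`/`restore`, so the original data come back afterwards). Only the
`twoway` version can then be overlaid with another `twoway` object such as
`scatter`.

```stata
* Categorical bars versus twoway bars (sketch)
sysuse auto, clear
* Categorical: graph bar computes the mean of mpg within each over() group
graph bar (mean) mpg, over(foreign) name(cat_bar, replace)
* Numeric: twoway bar plots values already in the data, so summarize first
preserve
collapse (mean) mpg, by(foreign)
twoway (bar mpg foreign, barwidth(0.5)) (scatter mpg foreign), ///
    xlabel(0 1, valuelabel) name(tw_bar, replace)
restore
* graph bar cannot take part in a twoway (...) (...) overlay like the one above
graph combine cat_bar tw_bar
```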

Let's illustrate these ideas with the familiar `auto` data that ships with
Stata.
```{r data-setup, collectcode=TRUE}
sysuse auto, clear
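* Create a categorical variable: the manufacturer, i.e. the first word of make
generate maker = substr(make, 1, strpos(make, " ")-1)
* makes with no space are a single word, so use them whole
replace maker = make if strpos(make, " ")==0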
label variable maker "Manufacturer"
```

## Continuous by Continuous

Perhaps the easiest place to begin is with plotting points. In the
two-dimensional space of a printed page or a computer screen, with Cartesian
coordinates, it is intuitive that we need two variables to define our points,
and that each observation gives us a conceptually distinct point (whether the
points are also visually distinct depends on the actual data values).
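
To see the difference between conceptually and visually distinct points, here
is a sketch using two variables that take only a few values, so many
observations land on exactly the same spot. The `jitter()` option of `scatter`
nudges markers by a small random amount; the value 5 is an arbitrary choice.

```stata
* Distinct observations, but not visibly distinct points (sketch)
sysuse auto, clear
* rep78 and foreign take few values, so many markers print on top of each other
scatter rep78 foreign, title("overplotted") name(op1, replace)
* jitter() adds a little random noise to marker positions to reveal the overlap
scatter rep78 foreign, jitter(5) title("jitter(5)") name(op2, replace)
graph combine op1 op2
```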

We can use these same points as the vertices of a line ("line" in the
graphical sense rather than the mathematical one, i.e. a connected series of
line segments). For this to make visual sense the data usually need to be
sorted, typically on $x$, either beforehand with the `sort` command or with the
`sort` option of the plot; otherwise Stata simply connects observation $n$ to
observation $n+1$.
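
A sketch of the effect: the `auto` data are stored sorted by `foreign`, not by
`weight`, so an unsorted `line` plot zigzags back and forth. Either the `sort`
option or sorting the data beforehand connects the points from left to right.

```stata
* Why sort order matters for line plots (sketch)
sysuse auto, clear
* Data in stored order: observation n is joined to n+1, wherever they lie
line mpg weight, title("unsorted") name(ln1, replace)
* sort option: order by the x variable for this graph only
line mpg weight, sort title("sort option") name(ln2, replace)
* Essentially the same result by sorting the data first
sort weight
line mpg weight, title("sorted data") name(ln3, replace)
graph combine ln1 ln2 ln3
```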

We can overlay `scatter` and `line` plots, but Stata also lets us treat the
combination as a single graphical object, called `connected`.
```{r}
* Objects anchored by single (x,y) pairs
* Points
scatter mpg weight, title("scatter") name(g1)
* Line segments
*sort weight mpg // usually used with ordered data
line mpg weight, sort title("line") name(g2)
* Line segments AND points
twoway connected mpg weight, sort title("connected") name(g3)
scatter mpg weight, sort connect(l) // internally, the same as "connected"
// overlay, different color ink
twoway (scatter mpg weight) (line mpg weight, sort), title("scatter || line") name(g4)
graph combine g1 g2 g3 g4, title("Anchored by (x,y) points")
graph export "GraphingObjects_files/points.png", replace
```
![Defined by x and y](GraphingObjects_files/points.png)

## Pseudo-range plots

Range plots use graphical objects that are defined by two points: minimum and
maximum, lower and upper, left and right. When the two points are vertically
aligned and the lower point always lies on the x axis (i.e. $y=0$), a single
$(x,y)$ point is enough to locate each range element. We can refer to these as
*pseudo-range* plots.
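
One way to see that a pseudo-range plot is a special case of a range plot is
to supply the lower end explicitly. In this sketch `zero` is a hypothetical
helper variable we create for the purpose. `rbar` and `rspike` take two $y$
variables and an $x$ variable, so `rbar zero mpg weight` should draw
essentially the same bars as `bar mpg weight` (default bar widths may differ).
Because `zero` is part of the plotted data, the range versions may also extend
the y axis down to 0.

```stata
* A pseudo-range plot is a range plot whose lower end is fixed at y = 0 (sketch)
sysuse auto, clear
generate zero = 0   // hypothetical helper: the x axis as an explicit lower point
* Pseudo-range: one (x, y) point per observation
twoway bar mpg weight, title("bar") name(pr1, replace)
twoway spike mpg weight, title("spike") name(pr2, replace)
* Range: two y values (lower, upper) per x
twoway rbar zero mpg weight, title("rbar zero mpg") name(pr3, replace)
twoway rspike zero mpg weight, title("rspike zero mpg") name(pr4, replace)
graph combine pr1 pr2 pr3 pr4
```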

Be aware that you will encounter many plots drawn in this visual style where
the "range" is not really the point. Also, Stata will not automatically include
$y=0$ in the graph; zero appears on the axis only when it falls within the
range of the recorded outcomes.
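
If you do want the axis to reach zero, you can request it explicitly. The
sketch below uses `yscale(range(0))` to extend the axis and `ylabel()` to label
it, while `base()` moves the value the bars are drawn from (the default is 0).
The label values and `base(20)` are arbitrary choices for illustration.

```stata
* Showing y = 0 explicitly, or moving the base of the bars (sketch)
sysuse auto, clear
* yscale(range(0)) extends the y axis to include 0; ylabel() puts labels on it
twoway bar mpg weight, yscale(range(0)) ylabel(0(10)40) ///
    title("axis forced to 0") name(z1, replace)
* base(20): bars extend from y = 20, downward for mpg below 20, upward above
twoway bar mpg weight, base(20) title("base(20)") name(z2, replace)
graph combine z1 z2
```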
```{r}
* Objects anchored by arbitrary (x,y) points, and the x axis
* Bars
* perhaps most useful as a programmer's tool, for use with data
* already in summary form, and/or in combination with other
* "twoway" geometrical objects.
twoway bar mpg weight, title("bar") name(g1) // first glance
// relation to scatter
twoway (bar mpg weight) (scatter mpg weight), title("bar || scatter") name(g2)
gsort -mpg weight // descending mpg: shorter bars are drawn last, so overlapping bars are easier to see
twoway bar mpg weight, title("sorted to see overlay") name(g3)
graph combine g1 g2 g3, title("Bars connecting x axis to points")
graph export "GraphingObjects_files/bars.png", replace
* similar to bars
twoway spike mpg weight, title("spike") name(g4)
// "spike" is to "bar" as "line" is to "scatter"
twoway dropline mpg weight, title("dropline") name(g5) // like "connected"
twoway (spike mpg weight) (scatter mpg weight), title("spike || scatter") name(g6)
// overlay, drawn in two colors
graph combine g1 g4 g5 g6, title("Lines connecting x axis to points")
graph export "GraphingObjects_files/spikes.png", replace
```
![Pseudo-range bar plots](GraphingObjects_files/bars.png)
![Pseudo-range plots](GraphingObjects_files/spikes.png)
```{r, engine='R', echo=FALSE, message=FALSE}
unlink("profile.do")
```