---
title: "Stata and the Grammar of Graphics"
author: "Doug Hemken"
date: "January 2017"
output:
  html_document:
    includes:
      after_body: ../../Rmd/bottomRmdKB.html
      before_body: ../../Rmd/topRmdKB.html
      in_header: ../../Rmd/headRmdKB.html
    self_contained: no
    theme: null
    highlight: null
    css: ../../Rmd/Rmd.css
    toc: yes
---
```{r setup, echo=FALSE, message=FALSE}
require(knitr)
if (file.exists("C:/Program Files (x86)/Stata14/StataMP-64.exe")) {
  statapath <- "C:/Program Files (x86)/Stata14/StataMP-64.exe"
} else if (file.exists("C:/Program Files (x86)/Stata14/StataSE-64.exe")) {
  statapath <- "C:/Program Files (x86)/Stata14/StataSE-64.exe"
}
opts_chunk$set(engine="stata", engine.path=statapath, comment="", results="hide")
knit_hooks$set(collectcode = function(before, options, envir) {
  if (!before) {
    profile <- file("profile.do", open="at")
    writeLines(options$code, profile)
    close(profile)
  }
})
source_hook <- knit_hooks$get("source")
knit_hooks$set(source = function(x, options) {
  y <- strsplit(x, "\n")[[1]]
  # Find and remove graph export in Stata source
  graphexport <- grep("^graph export.*", y)
  if (length(graphexport)>0) {y <- y[-(graphexport)]}
  # Now treat the result as regular source code
  source_hook(y, options)
})
hook_output <- knit_hooks$get("output")
knit_hooks$set(output = function(x, options) {
  y <- strsplit(x, "\n")[[1]]
  # Remove command echo in Stata log
  commandlines <- grep("^\\.", y)
  if (length(commandlines)>0) {y <- y[-(grep("^\\.", y))]}
  # Some commands have a leading space?
  if (length(grep("^[[:space:]*]\\.", y))>0) {
    y <- y[-(grep("^[[:space:]*]\\.", y))]
  }
  # Ensure a trailing blank line
  if (length(y)>0 && y[length(y)] != "") { y <- c(y, "") }
  # Remove blank lines at the top of the Stata log
  # (only when there are leading blank lines; 1:0 would drop the first line)
  firsttext <- grep("[[:alnum:]]", y)
  if (length(firsttext)>0 && firsttext[1]>1) {y <- y[-(1:(firsttext[1]-1))]}
  # Now treat the result as regular output
  hook_output(y, options)
})
```

# Introduction

Stata, like other general-purpose statistical software, includes commands
for creating graphics based on data.

A conceptual framework that attempts to describe all data-based graphs is
*The Grammar of Graphics* (Second Edition, 2005) by Leland Wilkinson
(with several contributors). This gives us a roadmap for navigating
statistical graphics in general, and Stata graphics in particular.

In specifying any graph, we must describe:

* the data and variables used to define various features of the graph
* the graphical or geometric objects that will be used to represent
  the data
* the coordinates and guides that will be used to position graphical
  objects on the page or screen, and
* the annotation that gives the graph meaning, and the host of aesthetic
  qualities that each element of the graph will have.

These four conceptual areas are independent of each other, and in Wilkinson's
formulation are further refined into even more independent dimensions.

Once we have specified the data and the graphical objects, everything
else will have default values.
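
As a rough sketch of how these four areas can show up in a single Stata
command, the chunk below spells out one choice from each. It assumes the
`auto` data set that ships with Stata, and the particular options are
picked only for illustration; each one overrides something that would
otherwise take a default value.

```{r}
sysuse auto, clear
twoway scatter mpg weight, ///       data (two variables) and graphical object (points)
    xscale(log) ///                  coordinates: a log-scaled x axis
    xlabel(2000 3000 4000 5000) ///  guides: where the x axis is labeled
    title("Mileage by weight") ///   annotation
    msymbol(Oh) mcolor(navy)         // aesthetics: hollow navy circles
```

Dropping everything after the comma leaves `twoway scatter mpg weight`,
which still draws a complete graph from defaults alone.
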

# Data and Variables

The most common and fundamental graph commands use a variable or variables
from Stata's data set. However, there are other possibilities. For example,
there are a number of postestimation graphs that rely on the background
information that Stata stores after any estimation command, which can
include data in the form of scalars, matrices, or macros. Still
other graph commands take only scalar values as input ("scalar" in the
mathematical sense).

As in statistical estimation, you will often have to get your data into
shape before you can use it in a graph command, but you will also find
that some graph commands do some data manipulation for you.
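
A minimal sketch of these different input sources, assuming the `auto`
data set that ships with Stata. The commands are chosen as illustrations
of each case, not as a complete list.

```{r}
sysuse auto, clear

* Variables in the data set; the command does the summarizing
histogram mpg                        // bins mpg before drawing the bars
graph bar (mean) mpg, over(foreign)  // computes a mean of mpg for each group

* Stored estimation results, used by a postestimation graph
regress mpg weight
rvfplot, yline(0)                    // residuals versus fitted values

* Immediate numeric values, with no variables at all
twoway scatteri 1 1 2 4 3 9          // pairs given as y then x
```

Neither `histogram` nor `graph bar` needs the data collapsed beforehand,
while `rvfplot` only works because `regress` has just stored its results.
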

# Graphical Objects

Conceptually, the basic elements for graphing in a two-dimensional space
are points, lines, and bounded areas. In practice, most software (Stata
included) lets us treat such things as bars and box-and-whisker symbols
as distinct graphical objects. Stata also has a variety of different
line segment objects.

In Stata, the various graphing commands are named for the
graphical object they produce: `scatter` produces points, `line`
produces lines, `area` produces bounded areas, `bar` produces bars,
etc.

The minimum specification for most `graph` commands is the name of a
graphing object and the names of one or more variables in the data
set. Based on these two specifications, everything else necessary
to render a graph has a default value.

```{r}
sysuse auto // load a data set
scatter mpg weight // specify graphing object and variables
graph export "scatter.png", replace
```

![Scatterplot of mpg against weight](scatter.png)
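
To see that the graphical object can change while the data stay fixed,
the sketch below draws the same two variables four ways. It assumes the
`uslifeexp` data set that ships with Stata (variables `le` and `year`),
used here only because a yearly series suits lines and areas better than
the `auto` data.

```{r}
sysuse uslifeexp, clear  // U.S. life expectancy by year
scatter le year          // points
line le year             // a connected line
twoway area le year      // the area between the line and a base value
twoway bar le year       // one bar per year
```

In each command only the name of the object changes; the variables and
all of the defaults stay the same.
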

# Coordinates and Guides

With the exception of pie charts, Stata largely draws graphs using some
version of Cartesian coordinates.

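
A brief sketch of what "some version" can mean in practice, again
assuming the `auto` data set. The options shown are illustrative picks
from Stata's axis options, not a full inventory.

```{r}
sysuse auto, clear
scatter mpg weight, xscale(log)       // Cartesian, with a logarithmic x axis
scatter mpg weight, yscale(reverse)   // Cartesian, with the y axis reversed
graph hbar (mean) mpg, over(foreign)  // bars drawn horizontally
graph pie, over(rep78)                // the non-Cartesian exception
```
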
# Annotation and Aesthetics