---
title: "Stata and the Grammar of Graphics"
author: "Doug Hemken"
date: "January 2017"
output:
  html_document:
    includes:
      after_body: ../../Rmd/bottomRmdKB.html
      before_body: ../../Rmd/topRmdKB.html
      in_header: ../../Rmd/headRmdKB.html
    self_contained: no
    theme: null
    highlight: null
    css: ../../Rmd/Rmd.css
    toc: yes
---

```{r setup, echo=FALSE, message=FALSE}
require(knitr)

# Locate the Stata executable (MP first, then SE)
if (file.exists("C:/Program Files (x86)/Stata14/StataMP-64.exe")) {
  statapath <- "C:/Program Files (x86)/Stata14/StataMP-64.exe"
} else if (file.exists("C:/Program Files (x86)/Stata14/StataSE-64.exe")) {
  statapath <- "C:/Program Files (x86)/Stata14/StataSE-64.exe"
} else {
  stop("Stata executable not found; set statapath for this machine")
}

opts_chunk$set(engine="stata", engine.path=statapath, comment="", results="hide")

# Append the code of chunks that use the collectcode option to profile.do
knit_hooks$set(collectcode = function(before, options, envir) {
  if (!before) {
    profile <- file("profile.do", open="at")
    writeLines(options$code, profile)
    close(profile)
  }
})

source_hook <- knit_hooks$get("source")
knit_hooks$set(source = function(x, options) {
  y <- strsplit(x, "\n")[[1]]
  # Find and remove graph export in Stata source
  graphexport <- grep("^graph export.*", y)
  if (length(graphexport)>0) {y <- y[-(graphexport)]}
  # Now treat the result as regular source code
  source_hook(y, options)
})

hook_output <- knit_hooks$get("output")
knit_hooks$set(output = function(x, options) {
  y <- strsplit(x, "\n")[[1]]
  # Remove command echo in Stata log
  commandlines <- grep("^\\.", y)
  if (length(commandlines)>0) {y <- y[-commandlines]}
  # Some commands have a leading space?
  if (length(grep("^[[:space:]]\\.", y))>0) {
    y <- y[-(grep("^[[:space:]]\\.", y))]
  }
  # Ensure a trailing blank line
  if (length(y)>0 && y[length(y)] != "") {
    y <- c(y, "")
  }
  # Remove blank lines at the top of the Stata log
  # (only when there is text and it does not already start on the first line)
  firsttext <- grep("[[:alnum:]]", y)
  if (length(firsttext)>0 && firsttext[1]>1) {
    y <- y[-seq_len(firsttext[1]-1)]
  }
  # Now treat the result as regular output
  hook_output(y, options)
})
```

# Introduction

Stata, like other general-purpose statistical software, includes commands for creating graphics based on data. A conceptual framework that attempts to describe all data-based graphs is *The Grammar of Graphics* (Second Edition, 2005) by Leland Wilkinson (with several contributors). This gives us a roadmap for navigating statistical graphics in general, and Stata graphics in particular.

In specifying any graph, we must describe:

* the data and variables used to define various features of the graph
* the graphical or geometric objects that will be used to represent the data
* the coordinates and guides that will be used to position graphical objects on the page or screen, and
* the annotation that gives the graph meaning, and a host of aesthetic qualities each element of the graph will have.

These four conceptual areas are independent of each other, and in Wilkinson's formulation are further refined into even more independent dimensions. Once we have specified the data and the graphical objects, everything else will have default values.

# Data and Variables

The most common and fundamental graph commands use a variable or variables from Stata's data set. However, there are other possibilities. For example, there are a number of postestimation graphs that rely on the background information that Stata stores after any estimation command, which can include data in the form of scalars, matrices, or macros. Still other graph commands take only scalar values as input ("scalar" in the mathematical sense).
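The three kinds of input described above can be sketched with a short example. The specific commands are chosen here for illustration and are not named in the text: `rvfplot` stands in for a postestimation graph, and `twoway scatteri` for a graph command that takes only numbers typed on the command line.

```stata
* Input 1: variables from the data set in memory
sysuse auto, clear
scatter mpg weight

* Input 2: results stored by an estimation command
regress mpg weight
ereturn list                // show the scalars, macros, and matrices Stata has stored
rvfplot                     // residual-versus-fitted plot for the regression just run

* Input 3: numeric values given directly as y x pairs; no variables are used
twoway scatteri 20 3000 30 2000
```

Each graph command replaces the previous graph in the Graph window, so run the commands one at a time to see each result.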

As in statistical estimation, you will often have to get your data into shape before you can use it in a graph command, but you will also find that some graph commands do some data manipulation for you.

# Graphical Objects

Conceptually, the basic elements for graphing in a two-dimensional space are points, lines, and bounded areas. In practice, most software (Stata included) lets us treat such things as bars and box-and-whisker symbols as distinct graphical objects. Stata also has a variety of different line segment objects.

In Stata, the various graphing commands are specified according to the graphical object they produce: `scatter` produces points, `line` produces lines, `area` produces bounded areas, `bar` produces bars, etc.

The minimum specification for most `graph` commands is the name of a graphing object and the names of one or more variables in the data set. Based on these two specifications, everything else necessary to render a graph has a default value.

```{r}
sysuse auto // load a data set
scatter mpg weight // specify graphing object and variables
graph export "scatter.png", replace
```

![Scatterplot of mpg against weight](scatter.png)

# Coordinates and Guides

With the exception of pie charts, Stata largely draws graphs using some version of Cartesian coordinates.

# Annotation and Aesthetics
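
As a minimal sketch of the point that everything beyond the data and the graphical object has a default value, the following redraws the earlier scatterplot with a few of those defaults overridden. The title text, color, and marker symbol are illustrative choices, not ones the text prescribes.

```stata
sysuse auto, clear
scatter mpg weight,                                      ///
    title("Mileage and weight")                          /// annotation: graph title
    xtitle("Weight (lbs.)") ytitle("Miles per gallon")   /// annotation: axis titles
    mcolor(navy) msymbol(Oh)                             // aesthetics: marker color and shape
```

Dropping any one of these options returns that element to its default, which is why `scatter mpg weight` alone is enough to draw a complete graph. The `///` line continuations work in a do-file, not in commands typed interactively.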