---
title: "Graphical Objects in Stata"
author: "Doug Hemken"
date: "January 2017"
output:
  html_document:
    includes:
      after_body: ../../Rmd/bottomRmdKB.html
      before_body: ../../Rmd/topRmdKB.html
      in_header: ../../Rmd/headRmdKB.html
    self_contained: no
    theme: null
    highlight: null
    css: ../../Rmd/Rmd.css
    toc: yes
---

```{r setup, echo=FALSE, message=FALSE}
source("../StataMDsetup.r")
opts_chunk$set(results="hide")
```

# Introduction

The fundamental graphical objects with which we can work in Stata are things like points, lines, line segments, curves, and areas. We additionally have objects like bars, box-and-whisker symbols, and pies that are fundamental in the sense that they are specified by a simple key word.

Some graphical objects have a simple relation to the data - they use the data as is, untransformed. A `scatter` plot would be a familiar example. Other objects often involve some summary of the data, such as calculating counts, percents, means, regression lines, or confidence intervals. A `bar` plot is often a summary of the data, and does not depict the individual data values.

Graphical objects may be defined in terms of one to four variables. A `bar` chart of percents within a categorical variable is specified by that single variable. A `scatter` plot requires two variables to specify, while range plots require three and paired coordinate plots require four variables.

Finally, some graphical objects are defined in relation to categorical variables, while other objects require two continuous/numeric variables. Somewhat confusingly for a beginner, objects that appear the same visually may not necessarily be defined at the same levels of measurement - for instance we have categorical `bar` plots but also `twoway bar` plots. The distinction is not only conceptual but has practical implications for what graphical elements may or may not be layered together.

Let's illustrate these with some familiar data.

```{r data-setup, collectcode=TRUE}
sysuse auto, clear
* Create a categorical variable
generate maker = substr(make, 1, strpos(make, " ")-1)
replace maker = make if strpos(make, " ")==0
label variable maker "Manufacturer"
```

## Continuous by Continuous

Perhaps the easiest place to begin is by considering how to plot points. In the two dimensional space of a printed page or a computer screen, with coordinates given in Cartesian style, it is intuitive that we need two variables to define our points, and that each observation gives us a conceptually distinct point (whether or not they are visually distinct depends upon the actual data values).

We can use these same points to define the vertices along a line ("line" in the graphical sense, not the mathematical sense, i.e. a continuously connected series of line segments). For this to make visual sense the data often need to be sorted (usually along x); otherwise Stata simply connects observation $n$ to observation $n+1$.

We can overlay `scatter` and `line` plots, but Stata also allows us to treat the combination as a fundamental graphical object, called `connected`.

```{r}
* Objects anchored by single (x,y) pairs
* Points
scatter mpg weight, title("scatter") name(g1)
* Line segments
*sort weight mpg // usually used with ordered data
line mpg weight, sort title("line") name(g2)
* Line segments AND points
twoway connected mpg weight, sort title("connected") name(g3)
scatter mpg weight, sort connect(l) // internally, the same as "connected"
// overlay, different color ink
twoway (scatter mpg weight) (line mpg weight, sort), title("scatter || line") name(g4)
graph combine g1 g2 g3 g4, title("Anchored by (x,y) points")
graph export "GraphingObjects_files/points.png", replace
```

![Defined by x and y](GraphingObjects_files/points.png)

## Pseudo-range plots

Range plots use graph objects that are defined by two points - minimum and maximum, lower and upper, left and right.
In the case where these two points happen to be vertically aligned, and the minimum is always the x-axis (i.e. $y=0$), only one $(x,y)$ point is needed to locate the range element: we can refer to these as *pseudo-range* plots.

Be aware that you will encounter many plots drawn in this visual style where the "range" is not particularly the point. Note also that Stata will not automatically include $y=0$ in the graph - only when this is within the range of recorded outcomes.

```{r}
* Objects anchored by arbitrary (x,y) points, and the x axis
* Bars
*   perhaps most useful as a programmer's tool, for use with data
*   already in summary form, and/or in combination with other
*   "twoway" geometrical objects.
twoway bar mpg weight, title("bar") name(g1) // first glance
// relation to scatter
twoway (bar mpg weight) (scatter mpg weight), title("bar || scatter") name(g2)
gsort - mpg weight // to better see overlay/collision of bars
twoway bar mpg weight, title("sorted to see overlay") name(g3)
graph combine g1 g2 g3, title("Bars connecting x axis to points")
graph export "GraphingObjects_files/bars.png", replace

* similar to bars
twoway spike mpg weight, title("spike") name(g4) // "spike" is to "bar" as "line" is to "scatter"
twoway dropline mpg weight, title("dropline") name(g5) // like "connected"
twoway (spike mpg weight) (scatter mpg weight), title("spike || scatter") name(g6) // two color
graph combine g1 g4 g5 g6, title("Lines connecting x axis to points")
graph export "GraphingObjects_files/spikes.png", replace
```

![Pseudo-range bar plots](GraphingObjects_files/bars.png)

![Pseudo-range plots](GraphingObjects_files/spikes.png)

```{r, engine='R', echo=FALSE, message=FALSE}
unlink("profile.do")
```
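
For contrast with the pseudo-range plots above, the following is a minimal sketch of objects that are anchored by two points: a range plot, which takes three variables, and a paired-coordinate plot, which takes four. It is not part of the original examples. It assumes the `sp500` and `nlswide1` example datasets that ship with Stata, with the variable names used below, and the graph names `r1` to `r3` are arbitrary choices for illustration.

```stata
* Sketch only: assumes the sp500 and nlswide1 example datasets
* installed with Stata; graph names r1-r3 are arbitrary.

* Range plot: two y values and one x (three variables)
sysuse sp500, clear
twoway rspike high low date in 1/57, title("rspike") name(r1, replace)

* The same ranges drawn as bars rather than spikes
twoway rbar high low date in 1/57, title("rbar") name(r2, replace)

* Paired-coordinate plot: y1 x1 y2 x2 (four variables)
sysuse nlswide1, clear
twoway pcspike wage68 ttl_exp68 wage88 ttl_exp88, title("pcspike") name(r3, replace)

graph combine r1 r2 r3, title("Anchored by two points")
```

Unlike `twoway bar` and `twoway spike`, none of these objects is tied to $y=0$: both ends of each element come from the data.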