---
title: "Data Sources for Stata Graphs"
author: "Doug Hemken"
date: "January 2017"
output:
  html_document:
    includes:
      before_body: ../../Rmd/topKBprod.html
      after_body: ../../Rmd/bottomKBprod.html
      in_header: ../../Rmd/headKBprod.html
    css: ../../Rmd/Rmd.css
    self_contained: no
    theme: null
    highlight: null
    toc: yes
---

```{r setup, echo=FALSE, message=FALSE}
source("../StataMDsetup.r")
opts_chunk$set(results="hide")
```

# Data Sources

In Stata, data comes in a variety of forms: the working data set, but also matrices, scalars, and the lists of data objects called "stored results" that commands leave behind (`return` and `ereturn` results).

## Data set

Most of the fundamental graphing commands in Stata require data from the working data set. As with statistical analysis in Stata, all of the data required for a graphing problem will generally have to be in a single data set. Depending on the graphing task at hand, you may have to calculate new variables, merge data, reshape data, or calculate summary statistics. In short, any data manipulation task may be part of your setup for graphing.

Keep in mind that the observations in a data set can represent a variety of things. A particularly important distinction for graphing is that some data sets contain observations of individual units, while others contain summary statistics for groups. For graphing, a summary data set is often the most convenient data source.

For example, the following two `graph` commands both use variables from a Stata data set. The `scatter` plot uses individual units of observation, while the `dot` plot of percentages within groups uses collapsed data and requires some manipulation to set up.
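Before turning to those graphs, it may help to look at the two kinds of data directly. The following is a minimal sketch, assuming the `auto` data that ship with Stata; the summary variable name `ncars` is a hypothetical choice for illustration. `collapse` replaces the individual cars with one observation per value of `rep78`, and `preserve`/`restore` bring the original observations back afterwards.

```stata
* Individual units versus a summary data set (a sketch using the auto data)
sysuse auto, clear
describe, short        // one observation per car

preserve
* collapse replaces the cars with one observation per repair record,
* keeping the mean of foreign (a proportion) and the number of cars
collapse (mean) foreign (count) ncars = mpg, by(rep78)
list rep78 ncars foreign

restore                // back to the individual cars
```

The summary data set is small enough to read at a glance, which is often exactly what a graph of group statistics needs.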
```{r, collectcode=TRUE}
sysuse auto
```

```{r}
graph twoway scatter mpg weight
graph export "GraphDataSources/scatter.png", replace

preserve
collapse foreign, by(rep78)
replace foreign = foreign*100
label variable foreign "Foreign (%)"
graph dot (asis) foreign, over(rep78)
graph export "GraphDataSources/dot.png", replace
restore
```

From individual observations:

![Individual data](GraphDataSources/scatter.png)

From grouped/summary data:

![Summary data](GraphDataSources/dot.png)

## Mathematical Scalars and Macro Variables

Not every graphing command requires a data set! In particular, it is possible to draw graphs of mathematically specified functions using the `twoway function` command.

```{r}
preserve
clear // no data!
graph twoway function y = sqrt(x), range(0 5) ///
    title("{&function}(x) = {&sqrt}x")
graph export "GraphDataSources/sqrtx.png", replace
restore
```

Given a function specified in terms of the placeholder variables $y$ and $x$, and a `range` for $x$, Stata draws something akin to a line graph, but without any data. The range is given as two numbers, the minimum and the maximum of $x$ (if `range()` is omitted, Stata uses a default range of 0 to 1).

![Function graph](GraphDataSources/sqrtx.png)

At times we may want to draw a graph based on values estimated from our data. An important programming detail is that many graph options, such as `range()`, expect numbers typed directly into the command rather than expressions, so a stored result like `r(min)` cannot be written there. Stored results are also temporary: `r()` results are replaced by the next command that returns `r()` results. The solution is to copy such values into local macros. Stata substitutes the contents of a macro into the command line before running it, so the command sees plain numbers.

Suppose we wanted to draw a regression line using `twoway function` (as we will see, we might also use `twoway lfit`). We need four numbers: the regression slope, the intercept, and a graphing minimum and maximum for $x$.

```{r}
quietly summarize weight
local min = r(min)   // copy stored results into local macros
local max = r(max)
quietly regress mpg weight
local intercept = _b[_cons]
local slope = _b[weight]
* We could clear the data at this point!
twoway function y = `intercept' + `slope'*x, range(`min' `max') ///
    title("Regression line") ytitle("Mileage (mpg)") ///
    xtitle("Weight (lbs.)")
graph export "GraphDataSources/regression.png", replace
```

![Using macros](GraphDataSources/regression.png)

In a similar vein, Stata has several "immediate" graphing commands that take numbers as arguments instead of variables: `twoway scatteri`, `twoway pci`, and `twoway pcarrowi`. While these can be used with no data set in memory, they are most often used to add graphical elements to another graph. In the example below, each group of four numbers given to `pci` is one line segment, written as $y_1$ $x_1$ $y_2$ $x_2$, so the three segments form a triangle.

```{r}
twoway pci 1 1 2 2 ///
           2 2 1 3 ///
           1 3 1 1
graph export "GraphDataSources/triangle.png", replace
```

![Immediate commands](GraphDataSources/triangle.png)

## Stored Results (`ereturn`)

Finally, there are a number of graphing commands that require you to estimate a model first; they draw on the estimation results that the previous command left behind. For example, after a regression you might want to visually examine the distribution of the residuals versus the predicted (fitted) values.

```{r}
quietly regress mpg c.weight##c.weight
rvfplot
graph export "GraphDataSources/rvf.png", replace
```

![Post-estimation graph](GraphDataSources/rvf.png)

```{r, engine='R', echo=FALSE, message=FALSE}
unlink("profile.do")
```
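To make the dependence on stored results concrete, here is a minimal sketch that draws roughly the same picture as `rvfplot` by hand. It is an illustration, not a description of how `rvfplot` is implemented. The variable names `yhat` and `resid` are hypothetical choices; `predict` computes both from the estimates that `regress` leaves in `e()`, and `ereturn list` displays those stored results.

```stata
* A rough manual analogue of rvfplot (a sketch using the auto data)
sysuse auto, clear
quietly regress mpg c.weight##c.weight
ereturn list                 // show the stored estimation results

predict yhat, xb             // fitted values computed from the stored e(b)
predict resid, residuals     // observed mpg minus the fitted values

* Residuals versus fitted values, with a reference line at zero
twoway scatter resid yhat, yline(0) ///
    ytitle("Residuals") xtitle("Fitted values")
```

Building the plot this way takes a few more lines, but it leaves the fitted values and residuals in the data set, where they can be used in other graphs.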