---
title: "Add a point to a scatterplot"
author: "Doug Hemken"
date: "October 2015"
output:
  html_document:
    includes:
      after_body: ../Rmd/bottomRmdKB.html
      before_body: ../Rmd/topRmdKB.html
      in_header: ../Rmd/headRmdKB.html
    self_contained: no
    theme: null
    toc: yes
---

[Stata notes](../stata.html)

There are a couple of approaches one could take to add a single point to a scatterplot. One is to overlay the scatterplot with the plot produced by `scatteri`, an immediate scatterplot.

In this example, we will plot the point whose coordinates are the overall means of the $x$ and $y$ variables. A linear regression of $y$ on $x$ (with an intercept) always passes through this point. A regression with higher-order terms seldom passes through this point!

```{r, echo=FALSE, message=FALSE}
require(knitr)
if (file.exists("C:/Program Files (x86)/Stata14/StataMP-64.exe")) {
  statapath <- "C:/Program Files (x86)/Stata14/StataMP-64.exe"
} else if (file.exists("C:/Program Files (x86)/Stata14/StataSE-64.exe")) {
  statapath <- "C:/Program Files (x86)/Stata14/StataSE-64.exe"
}
opts_chunk$set(engine="stata", engine.path=statapath, results="hide", comment="")

source_hook <- knit_hooks$get("source")
knit_hooks$set(source = function(x, options) {
  y <- strsplit(x, "\n")[[1]]
  # Find and remove graph export in Stata source
  graphexport <- grep("^graph export.*", y)
  if (length(graphexport)>0) {y <- y[-(graphexport)]}
  # Now treat the result as regular source code
  source_hook(y, options)
})

writeLines(c("sysuse auto"), "profile.do")
```

## Identifying the mean

First we find the mean values of $y$ and $x$ and save them as local macros.

```{r}
sysuse auto
summarize price, meanonly
local X = r(mean)
summarize mpg, meanonly
local Y = r(mean)
```

## Overlay scatter and scatteri

Then we overlay the scatterplot and the immediate scatterplot of the single point.

```{r, echo=5}
summarize price, meanonly
local X = r(mean)
summarize mpg, meanonly
local Y = r(mean)
twoway (scatter mpg price) (scatteri `Y' `X', msymbol(D))
graph export "addpoint_files/gm1.png", replace
```

The `msymbol(D)` gives us a large, diamond-shaped point marker.

![scatter and scatteri](Addpoint_files/gm1.png)

## Label the point, add a regression line

```{r, echo=5:6}
summarize price, meanonly
local X = r(mean)
summarize mpg, meanonly
local Y = r(mean)
twoway (scatter mpg price)(lfit mpg price) ///
  (scatteri `Y' `X' (6) "Grand Mean", msymbol(D))
graph export "addpoint_files/gm2.png", replace
```

The `(6)` is the clock position of the point label: 6 o'clock places the label directly below the marker.

![With label and regression](Addpoint_files/gm2.png)

## Use a quadratic fit, add better annotation

```{r, echo=5:8}
summarize price, meanonly
local X = r(mean)
summarize mpg, meanonly
local Y = r(mean)
twoway (scatter mpg price)(qfit mpg price) ///
  (scatteri `Y' `X' (6) "Grand Mean", msymbol(D)), ///
  xtitle("Price ($)") ytitle("Mileage (mpg)") ///
  legend(order(1 "Observed" 2 "Predicted" 3 "Grand Mean"))
graph export "addpoint_files/gm3.png", replace
```

Here we can see that the fitted quadratic curve misses the grand mean.

![Quadratic fit and annotation](Addpoint_files/gm3.png)

```{r, engine='R', echo=FALSE, message=FALSE}
unlink("profile.do")
```
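
To see why the linear fit passes through the grand mean while the quadratic fit generally does not, here is a short sketch of the algebra. It assumes ordinary least squares with an intercept; the symbols $b_0$, $b_1$, $b_2$ (estimated coefficients), $\overline{x^2}$ (the mean of the squared $x$ values), and $s_x^2$ (the variance of $x$ computed with divisor $n$) are notation introduced here for illustration.

With an intercept in the model the residuals sum to zero, so averaging the fitted values over the data reproduces $\bar{y}$. For the linear fit this gives

$$
\bar{y} = b_0 + b_1 \bar{x} = \hat{y}(\bar{x}),
$$

so the line passes through $(\bar{x}, \bar{y})$. For the quadratic fit the same averaging gives

$$
\bar{y} = b_0 + b_1 \bar{x} + b_2 \overline{x^2},
$$

while the curve evaluated at $\bar{x}$ is $\hat{y}(\bar{x}) = b_0 + b_1 \bar{x} + b_2 \bar{x}^2$. Subtracting,

$$
\hat{y}(\bar{x}) - \bar{y} = b_2 \left( \bar{x}^2 - \overline{x^2} \right) = -\, b_2 \, s_x^2 .
$$

Under these assumptions the curve hits the grand mean only when $b_2 = 0$ (or when $x$ does not vary), which is why the quadratic fit typically misses it.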