Stata notes

There are a couple of ways to add a single point to a scatterplot. One is to overlay the scatterplot with a plot produced by scatteri, the immediate form of scatter, which takes its coordinates directly on the command line.

In this example, we will plot the overall mean of both the \(x\) and the \(y\) variables. A least-squares linear regression of \(y\) on \(x\) that includes an intercept always passes through this point. A regression with higher-order terms seldom does!
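
To see why, here is a short derivation, assuming ordinary least squares with an intercept. The notation \(\hat\beta_j\), \(\bar{x}\), \(\bar{y}\) and \(\overline{x^2}\) (the mean of the squared \(x\) values) is introduced here for illustration. For the linear fit, the intercept estimate is \(\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}\), so the fitted value at \(x = \bar{x}\) is

\[ \hat\beta_0 + \hat\beta_1 \bar{x} = \bar{y}. \]

For a quadratic fit the residuals still sum to zero, which gives \(\bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x} + \hat\beta_2 \overline{x^2}\). The curve evaluated at \(\bar{x}\) uses \(\bar{x}^2\) rather than \(\overline{x^2}\), so the gap between the grand mean and the curve is

\[ \bar{y} - \left(\hat\beta_0 + \hat\beta_1 \bar{x} + \hat\beta_2 \bar{x}^2\right) = \hat\beta_2 \left(\overline{x^2} - \bar{x}^2\right) = \hat\beta_2 \cdot \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2, \]

which is nonzero whenever \(\hat\beta_2 \ne 0\) and \(x\) varies.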

Identifying the mean

First we compute the means of \(x\) (price) and \(y\) (mpg) and store them in local macros.

  sysuse auto, clear
  * meanonly suppresses the output and skips the variance calculation
  summarize price, meanonly
  local X = r(mean)
  summarize mpg, meanonly
  local Y = r(mean)
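
As a quick sanity check, the stored values can be displayed before plotting. This is a minimal sketch; the display lines are an addition for illustration and not part of the original example. Note that a local macro defined inside a do-file exists only while that do-file runs, so if you execute lines from the Do-file Editor, run these in the same selection as the graph command.

  sysuse auto, clear
  summarize price, meanonly
  local X = r(mean)
  summarize mpg, meanonly
  local Y = r(mean)
  * show what the macros hold before handing them to scatteri
  display "mean price = `X'"
  display "mean mpg   = `Y'"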

Overlay scatter and scatteri

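Then we overlay the scatterplot and the immediate scatterplot of the single point. Note that scatteri takes the \(y\) coordinate first and then \(x\), matching the order of the variables in scatter.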

  twoway (scatter mpg price) (scatteri `Y' `X', msymbol(D))

The msymbol(D) option gives us a large, diamond-shaped marker.

Figure: scatter and scatteri

Label the point, add a regression line

  twoway (scatter mpg price)(lfit mpg price) ///
    (scatteri `Y' `X' (6) "Grand Mean", msymbol(D))

The "(6)" is a clock position for the point label: it places the text at 6 o'clock, directly below the marker.

Figure: With label and regression
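
That a linear fit passes through the grand mean can also be checked numerically. The sketch below assumes that lfit draws the same line as regress mpg price, which is how Stata documents it; the scalar name yhat_lin is made up for this example.

  sysuse auto, clear
  quietly regress mpg price
  * fitted value of the line at the mean of price
  summarize price, meanonly
  scalar yhat_lin = _b[_cons] + _b[price]*r(mean)
  * compare it with the mean of mpg
  summarize mpg, meanonly
  display "fitted at mean price = " scalar(yhat_lin)
  display "mean mpg             = " r(mean)

The two displayed values should agree up to rounding error.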

Use a quadratic fit, add better annotation

  twoway (scatter mpg price)(qfit mpg price) ///
    (scatteri `Y' `X' (6) "Grand Mean", msymbol(D)), ///
    xtitle("Price ($)") ytitle("Mileage (mpg)") ///
    legend(order(1 "Observed" 2 "Predicted" 3 "Grand Mean"))

Here we can see that the fitted quadratic curve misses the grand mean.

Figure: Quadratic fit and annotation
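
The size of the miss can be computed as well. This is a minimal sketch, assuming that qfit draws the curve from a regression of mpg on price and price squared, as Stata documents it; the variable price2 and the scalar yhat_quad are hypothetical names introduced for this example.

  sysuse auto, clear
  * store the square as double to avoid float rounding of large values
  generate double price2 = price^2
  quietly regress mpg price price2
  * fitted value of the curve at the mean of price
  summarize price, meanonly
  scalar yhat_quad = _b[_cons] + _b[price]*r(mean) + _b[price2]*r(mean)^2
  * compare it with the mean of mpg
  summarize mpg, meanonly
  display "fitted at mean price = " scalar(yhat_quad)
  display "mean mpg             = " r(mean)

Unlike the linear case, the two displayed values differ, which is the gap visible in the graph.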