The Size of a Typical Deviation: s

In the basic linear regression model, the deviation of the response from the regression line, \(y_i-\left( \beta_0+\beta_1 x_i\right)\), is not an observable quantity because the parameters \(\beta_0\) and \(\beta_1\) are unknown. However, by using the estimators \(b_0\) and \(b_1\), we can approximate this deviation with \begin{equation*} e_i=y_i-\widehat{y}_i=y_i-\left( b_0+b_1 x_i\right), \end{equation*} known as the residual.

Residuals will be critical to developing strategies for improving model specification in Section 2.6. We now show how to use the residuals to estimate \(\sigma^2\). From a first course in statistics, we know that if one could observe the deviations \(\varepsilon_i\), then a desirable estimate of \(\sigma^2\) would be \((n-1)^{-1}\sum_{i=1}^{n}\left( \varepsilon_i-\overline{\varepsilon}\right)^2\). Because \(\{\varepsilon_i\}\) are not observed, we use the following.

Definition. An estimator of \(\sigma^2\), the mean square error (MSE), is defined as \begin{equation} s^2=\frac{1}{n-2}\sum_{i=1}^{n}e_i^2. \label{BLRs2} \end{equation} The positive square root, \(s=\sqrt{s^2}\), is called the residual standard deviation.
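
As a concrete illustration, the following sketch computes the residuals, the mean square error, and the residual standard deviation directly from the definition. The data values are hypothetical, chosen only to make the example self-contained; any paired sample would do.

```python
import numpy as np

# Hypothetical data (any paired sample would do).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(y)

# Least squares estimates b1 (slope) and b0 (intercept).
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Residuals e_i = y_i - (b0 + b1 * x_i).
e = y - (b0 + b1 * x)

# Mean square error and residual standard deviation,
# with n - 2 error degrees of freedom in the divisor.
s2 = np.sum(e ** 2) / (n - 2)
s = np.sqrt(s2)
print(f"MSE = {s2:.4f}, s = {s:.4f}")
```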

Comparing the definitions of \(s^2\) and \((n-1)^{-1}\sum_{i=1}^{n}\left( \varepsilon_i-\overline{\varepsilon}\right)^2\), you will see two important differences. First, in defining \(s^2\) we have not subtracted the average residual from each residual before squaring. This is because the average residual is zero, a special property of least squares estimation. This result can be shown using algebra and is guaranteed for all data sets.
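
Specifically, minimizing the sum of squared residuals over the intercept yields the normal equation \begin{equation*} 0=\frac{\partial}{\partial b_0}\sum_{i=1}^{n}\left( y_i-b_0-b_1 x_i\right)^2 =-2\sum_{i=1}^{n}\left( y_i-b_0-b_1 x_i\right) =-2\sum_{i=1}^{n}e_i, \end{equation*} so that \(\overline{e}=n^{-1}\sum_{i=1}^{n}e_i=0\) whenever the fitted line includes an intercept.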

Second, in defining \(s^2\) we have divided by \(n-2\) instead of \(n-1\). Intuitively, dividing by either \(n\) or \(n-1\) tends to underestimate \(\sigma^2\). The reason is that, when fitting lines to data, we need at least two observations to determine a line. Indeed, we must have at least three observations for there to be any variability about a line. How much “freedom” is there for variability about a line? We will say that the error degrees of freedom is the number of observations available, \(n\), minus the number of observations needed to determine a line, 2 (with symbols, \(df=n-2\)). However, as we saw in the least squares estimation subsection, we do not need to identify two actual observations to determine a line. The idea is that if an analyst knows the line and \(n-2\) of the observations, then the remaining two observations can be determined, without variability. When dividing by \(n-2\), it can be shown that \(s^2\) is an unbiased estimator of \(\sigma^2\).
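
To illustrate the unbiasedness claim, here is a minimal simulation sketch; the true model, its parameters, and the sample size are hypothetical choices made only for this demonstration. Averaged over many simulated samples, \(Error~SS/(n-2)\) centers on the true \(\sigma^2\), while dividing by \(n\) systematically underestimates it.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical true model: y = 2 + 3x + eps, with sigma^2 = 4.
beta0, beta1, sigma2 = 2.0, 3.0, 4.0
n, n_sims = 10, 50_000
x = np.linspace(0.0, 1.0, n)

est_n2 = np.empty(n_sims)  # SSE / (n - 2)
est_n = np.empty(n_sims)   # SSE / n
for i in range(n_sims):
    y = beta0 + beta1 * x + rng.normal(0.0, np.sqrt(sigma2), n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    sse = np.sum((y - (b0 + b1 * x)) ** 2)
    est_n2[i] = sse / (n - 2)
    est_n[i] = sse / n

print(f"true sigma^2:      {sigma2}")
print(f"average SSE/(n-2): {est_n2.mean():.3f}")  # close to 4
print(f"average SSE/n:     {est_n.mean():.3f}")   # close to 4*(n-2)/n = 3.2
```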

We can also express \(s^2\) in terms of the sum of squares quantities. That is,
\begin{equation*} s^2=\frac{1}{n-2}\sum_{i=1}^{n}\left( y_i-\widehat{y}_i\right)^2= \frac{Error~SS}{n-2}=MSE. \end{equation*}
This leads us to the analysis of variance, or ANOVA, table:

\begin{matrix}
\begin{array}{c}
\text{ANOVA Table}
\end{array}\\
\begin{array}{llcl}
\hline
\text{Source} & \text{Sum of Squares} & df & \text{Mean Square} \\ \hline
\text{Regression} & Regression~SS & 1 & Regression~MS \\
\text{Error} & Error~SS & n-2 & MSE \\
\text{Total} & Total~SS & n-1 & \\ \hline
\end{array}
\end{matrix}

The ANOVA table is merely a bookkeeping device used to keep track of the sources of variability; it routinely appears in statistical software packages as part of the regression output. The mean square column figures are defined to be the sum of squares \((SS)\) figures divided by their respective degrees of freedom \((df)\). In particular, the mean square for errors \((MSE)\) equals \(s^2\), and the regression sum of squares equals the regression mean square. This latter property is specific to the case of regression with one explanatory variable; it is not true when we consider more than one explanatory variable.
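
As a sketch of this bookkeeping, the following code computes each sum of squares directly and assembles the table, reusing the hypothetical data from the earlier example. The check at the end confirms the decomposition \(Total~SS=Regression~SS+Error~SS\) that the table tracks.

```python
import numpy as np

# Same hypothetical data as before.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(y)

# Fitted least squares line.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

reg_ss = np.sum((yhat - y.mean()) ** 2)  # Regression SS, df = 1
err_ss = np.sum((y - yhat) ** 2)         # Error SS,      df = n - 2
tot_ss = np.sum((y - y.mean()) ** 2)     # Total SS,      df = n - 1

print(f"{'Source':<12}{'SS':>10}{'df':>4}{'MS':>10}")
print(f"{'Regression':<12}{reg_ss:>10.4f}{1:>4}{reg_ss / 1:>10.4f}")
print(f"{'Error':<12}{err_ss:>10.4f}{n - 2:>4}{err_ss / (n - 2):>10.4f}")
print(f"{'Total':<12}{tot_ss:>10.4f}{n - 1:>4}")
# Bookkeeping check: Total SS = Regression SS + Error SS.
assert np.isclose(tot_ss, reg_ss + err_ss)
```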

The error degrees of freedom in the ANOVA table is \(n-2\). The total degrees of freedom is \(n-1\), reflecting the fact that the total sum of squares is centered about the mean (at least two observations are required for positive variability about the mean). The single degree of freedom associated with the regression portion means that the slope, plus one observation, is enough information to determine the line. The degrees of freedom thus balance: \(n-1=1+(n-2)\).

The analysis of variance table for the lottery data is:

\begin{matrix}
\begin{array}{c}
\text{ANOVA Table}
\end{array}\\
\begin{array}{lrrr}
\hline
\text{Source} & \text{Sum of Squares} & df & \text{Mean Square} \\ \hline
\text{Regression} & 2,527,165,015 & 1 & 2,527,165,015 \\
\text{Error} & 690,116,755 & 48 & 14,377,432 \\
\text{Total} & 3,217,281,770 & 49 & \\ \hline
\end{array}
\end{matrix}

From this table, you can check that \(R^2=78.5\%\) and \(s=3,792\).
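
To verify, recall that the coefficient of determination is the regression sum of squares as a proportion of the total sum of squares, and that \(s\) is the square root of the \(MSE\): \begin{equation*} R^2=\frac{Regression~SS}{Total~SS}=\frac{2,527,165,015}{3,217,281,770}\approx 0.785 \qquad \text{and} \qquad s=\sqrt{MSE}=\sqrt{14,377,432}\approx 3,792. \end{equation*}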

