2.8 Illustrative Regression Computer Output

Computers and statistical software packages that perform specialized calculations play a vital role in modern statistical analyses. Inexpensive computing has allowed data analysts to focus on relationships of interest; specifying a model merely because it is computationally simple is far less important now than it was before inexpensive computing became widely available. An important theme of this text is to focus on relationships of interest and to rely on widely available statistical software to estimate the models that we specify.

With any computer package, the most difficult parts of operating the package are generally (i) inputting the data, (ii) using the commands, and (iii) interpreting the output. You will find that most modern statistical software packages accept spreadsheet or text-based files, making data input relatively easy. Personal computer statistical software packages have menu-driven command languages with easily accessible online help facilities. Once you decide what to do, finding the right commands is relatively easy.
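
For example, in R a comma-separated spreadsheet export or a delimited text file can be read with a single command. The sketch below is illustrative only; the file names and the data frame name agency are hypothetical.

    # A minimal sketch; the file names and the data frame name "agency" are hypothetical.
    agency <- read.csv("agency.csv")                     # comma-separated spreadsheet export
    # agency <- read.table("agency.txt", header = TRUE)  # alternative: delimited text file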

This section provides guidance in interpreting the output of statistical packages; most packages generate similar output. Below, output from three standard statistical software packages, EXCEL, SAS, and R, is given. The annotation symbol “[.]” marks a statistical quantity that is described in the legend. Thus, this section provides a link between the notation used in the text and the output of some standard statistical software packages.

EXCEL Output

Regression Statistics 
Multiple R              0.886283[F]
R Square                0.785497[k]
Adjusted R Square       0.781028[l]
Standard Error          3791.758[j]
Observations                  50[a]

ANOVA          df         SS              MS             F       Significance F 
Regression   1[m]   2527165015 [p]  2527165015 [s]  175.773[u]   1.15757E-17[v] 
Residual    48[n]   690116754.8[q]  14377432.39[t] 
Total       49[o]   3217281770 [r] 

         Coefficients    Standard Error      t Stat         P-value 
Intercept    469.7036[b] 702.9061896[d]   0.668230846[f] 0.507187[h] 
X Variable 1 0.647095[c] 0.048808085[e]   13.25794257[g] 1.16E-17[i] 

SAS Output

                           The SAS System
                          The REG Procedure 
                      Dependent Variable: SALES 

                         Analysis of Variance                                
                                   Sum of           Mean 
Source                   DF        Squares         Square     F Value      Pr > F 
Model                  1[m]   2527165015[p]   2527165015[s]   175.77[u]    <.0001[v] 
Error                 48[n]    690116755[q]     14377432[t] 
Corrected Total       49[o]   3217281770[r] 

         Root MSE           3791.75848[j]    R-Square     0.7855[k]
         Dependent Mean     6494.82900[H]    Adj R-Sq     0.7810[l]
         Coeff Var            58.38119[I] 

                         Parameter Estimates
                                Parameter       Standard 
Variable     Label        DF     Estimate        Error      t  Value    Pr > |t|
Intercept    Intercept     1   469.70360[b]  702.90619[d]    0.67[f]    0.5072[h] 
POP          POP           1     0.64709[c]    0.04881[e]   13.26[g]    <.0001[i] 

R Output

Analysis of Variance Table 

Response: SALES
           Df     Sum Sq      Mean Sq        F value         Pr(>F) 
POP        1[m] 2527165015[p] 2527165015[s] 175.77304[u] <2.22e-16[v]*** 
Residuals 48[n]  690116755[q]   14377432[t] 
--- 
Call: lm(formula = SALES ~ POP) 

Residuals:
    Min     1Q Median     3Q    Max
  -6047  -1461   -670    486  18229 

Coefficients:
             Estimate     Std. Error t value     Pr(>|t|) 
(Intercept) 469.7036[b] 702.9062[d]  0.67[f]    0.51     [h] 
POP           0.6471[c]   0.0488[e] 13.26[g]   <2e-16 ***[i] 
--- 
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 3790[j] on 48[n] degrees of freedom
Multiple R-Squared: 0.785[k],      Adjusted R-squared: 0.781[l] 
F-statistic:  176[u] on 1[m] and 48[n] DF,  p-value: <2e-16[v]
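
The R output shown above can be produced with a handful of commands. The sketch below is illustrative, assuming the data have been read into a data frame agency (a hypothetical name, as in the earlier reading example) containing the columns SALES and POP.

    # A minimal sketch, assuming a data frame "agency" with columns SALES and POP.
    fit <- lm(SALES ~ POP, data = agency)   # least squares fit of SALES on POP

    anova(fit)      # analysis of variance table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
    summary(fit)    # coefficient estimates, standard errors, t-ratios, p-values,
                    # residual standard error, R-squared, and F-statistic

The EXCEL and SAS outputs report the same quantities; only the labels and the layout differ.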

Legend: Annotation, Definition, and Symbol

[a] Number of observations, \(n\).
[b] The estimated intercept, \(b_0\).
[c] The estimated slope, \(b_1\).
[d] The standard error of the intercept, \(se(b_0)\).
[e] The standard error of the slope, \(se(b_1)\).
[f] The \(t\)-ratio associated with the intercept, \(t(b_0) = b_0/se(b_0)\).
[g] The \(t\)-ratio associated with the slope, \(t(b_1) = b_1/se(b_1)\).
[h] The \(p\)-value associated with the intercept; here, \(p\text{-value} = \Pr(|t_{n-2}| > |t(b_0)|)\), where \(t(b_0)\) is the realized value (0.67 here) and \(t_{n-2}\) has a \(t\)-distribution with \(df = n-2\).
[i] The \(p\)-value associated with the slope; here, \(p\text{-value} = \Pr(|t_{n-2}| > |t(b_1)|)\), where \(t(b_1)\) is the realized value (13.26 here) and \(t_{n-2}\) has a \(t\)-distribution with \(df = n-2\).
[j] The residual standard deviation, \(s\).
[k] The coefficient of determination, \(R^2\).
[l] The coefficient of determination adjusted for degrees of freedom, \(R_a^2\). (This term will be defined in Chapter 3.)
[m] Degrees of freedom for the regression component; this is 1 for regression with one explanatory variable.
[n] Degrees of freedom for the error component, \(n-2\), for regression with one explanatory variable.
[o] Total degrees of freedom, \(n-1\).
[p] The regression sum of squares, \(Regression~SS\).
[q] The error sum of squares, \(Error~SS\).
[r] The total sum of squares, \(Total~SS\).
[s] The regression mean square, \(Regression~MS = Regression~SS/1\), for one explanatory variable.
[t] The error mean square, \(s^2 = Error~MS = Error~SS/(n-2)\), for regression with one explanatory variable.
[u] The \(F\)-ratio, \(F\text{-ratio} = (Regression~MS)/(Error~MS)\). (This term will be defined in Chapter 3.)
[v] The \(p\)-value associated with the \(F\)-ratio. (This term will be defined in Chapter 3.)
[w] The observation number, \(i\).
[x] The value of the explanatory variable for the \(i\)th observation, \(x_i\).
[y] The response for the \(i\)th observation, \(y_i\).
[z] The fitted value for the \(i\)th observation, \(\widehat{y}_i\).
[A] The standard error of the fit, \(se(\widehat{y}_i)\).
[B] The residual for the \(i\)th observation, \(e_i\).
[C] The standardized residual for the \(i\)th observation, \(e_i/se(e_i)\). The standard error \(se(e_i)\) will be defined in Section 5.3.1.
[F] The multiple correlation coefficient is the square root of the coefficient of determination, \(R = \sqrt{R^2}\). This will be defined in Chapter 3.
[G] The standardized coefficient is \(b_1 s_x/s_y\). For regression with one explanatory variable, this is equivalent to \(r\), the correlation coefficient.
[H] The average response, \(\overline{y}\).
[I] The coefficient of variation, the residual standard deviation relative to the average response, \(s/\overline{y}\). SAS prints out \(100\,s/\overline{y}\) (here, \(100 \times 3791.758/6494.829 = 58.38\)).
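
To connect the legend with the output, several of the annotated quantities can be recomputed directly from a fitted model. The following sketch assumes the fitted object fit from the R commands above and mirrors legend items [a], [c], [e], [g], [i], [k], and [t].

    # A minimal sketch, assuming the fitted object "fit" from the earlier R commands.
    coefs <- coef(summary(fit))                     # estimates, standard errors, t-ratios, p-values
    b1    <- coefs["POP", "Estimate"]               # slope b_1, item [c]
    se_b1 <- coefs["POP", "Std. Error"]             # se(b_1), item [e]
    n     <- nobs(fit)                              # number of observations, item [a]

    t_b1  <- b1 / se_b1                                          # t-ratio t(b_1), item [g]
    p_b1  <- 2 * pt(abs(t_b1), df = n - 2, lower.tail = FALSE)   # p-value, item [i]

    tab <- anova(fit)
    s2  <- tab["Residuals", "Mean Sq"]                    # error mean square s^2, item [t]
    R2  <- tab["POP", "Sum Sq"] / sum(tab[["Sum Sq"]])    # R^2 = Regression SS / Total SS, item [k]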
