Computers and statistical software packages that perform specialized calculations play a vital role in modern-day statistical analyses. Inexpensive computing capabilities have allowed data analysts to focus on relationships of interest. Specifying models that are attractive merely for their computational simplicity is much less important now than it was before inexpensive computing became widely available. An important theme of this text is to focus on relationships of interest and to rely on widely available statistical software to estimate the models that we specify.
With any computer package, the most difficult parts of operating it are generally (i) entering the input, (ii) using the commands and (iii) interpreting the output. Most modern statistical software packages accept spreadsheet or text-based files, which makes data input relatively easy. Statistical software packages for personal computers have menu-driven command languages with easily accessible online help facilities. Once you decide what to do, finding the right commands is relatively easy.
This section provides guidance in interpreting the output of statistical packages. Most statistical packages generate similar output. Below, output is given from three standard statistical software packages: EXCEL, SAS and R. Each fits the same regression of SALES on POP using 50 observations (EXCEL labels the explanatory variable "X Variable 1"). The annotation symbol “[.]” marks a statistical quantity that is described in the legend. Thus, this section provides a link between the notation used in the text and output from some of the standard statistical software packages.
EXCEL Output
Regression Statistics
Multiple R          0.886283 [F]
R Square            0.785497 [k]
Adjusted R Square   0.781028 [l]
Standard Error      3791.758 [j]
Observations        50 [a]
ANOVA
              df       SS                 MS                  F              Significance F
Regression    1 [m]    2527165015 [p]     2527165015 [s]      175.773 [u]    1.15757E-17 [v]
Residual      48 [n]   690116754.8 [q]    14377432.39 [t]
Total         49 [o]   3217281770 [r]
              Coefficients    Standard Error     t Stat             P-value
Intercept     469.7036 [b]    702.9061896 [d]    0.668230846 [f]    0.507187 [h]
X Variable 1  0.647095 [c]    0.048808085 [e]    13.25794257 [g]    1.16E-17 [i]
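As a check on the legend, the summary statistics at the top of the EXCEL output can be recomputed from the ANOVA sums of squares. The Python sketch below is illustrative only: the variable names are our own, and the adjusted \(R^2\) uses the usual formula \(1 - s^2/s_y^2\), with \(s_y^2 = Total~SS/(n-1)\), ahead of its formal definition in Chapter 3.

```python
import math

# Values copied from the EXCEL ANOVA table (annotation symbols in comments)
n = 50                         # [a] number of observations
regression_ss = 2527165015     # [p] Regression SS
error_ss = 690116754.8         # [q] Error SS
total_ss = 3217281770          # [r] Total SS

r_square = regression_ss / total_ss        # [k] R^2 = Regression SS / Total SS
multiple_r = math.sqrt(r_square)           # [F] R = sqrt(R^2)
error_ms = error_ss / (n - 2)              # [t] s^2 = Error MS
s = math.sqrt(error_ms)                    # [j] "Standard Error" in EXCEL
s_y_squared = total_ss / (n - 1)           # sample variance of the response
adj_r_square = 1 - error_ms / s_y_squared  # [l] R_a^2 = 1 - s^2 / s_y^2

print(f"Multiple R        {multiple_r:.6f}")
print(f"R Square          {r_square:.6f}")
print(f"Adjusted R Square {adj_r_square:.6f}")
print(f"Standard Error    {s:.3f}")
```

Each printed value should agree with the corresponding EXCEL entry to the digits shown.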
SAS Output
The SAS System
The REG Procedure
Dependent Variable: SALES
Analysis of Variance
                              Sum of             Mean
Source            DF          Squares            Square             F Value       Pr > F
Model             1 [m]       2527165015 [p]     2527165015 [s]     175.77 [u]    <.0001 [v]
Error             48 [n]      690116755 [q]      14377432 [t]
Corrected Total   49 [o]      3217281770 [r]
Root MSE          3791.75848 [j]     R-Square    0.7855 [k]
Dependent Mean    6494.82900 [H]     Adj R-Sq    0.7810 [l]
Coeff Var         58.38119 [I]
Parameter Estimates
                                Parameter        Standard
Variable    Label       DF      Estimate         Error            t Value      Pr > |t|
Intercept   Intercept   1       469.70360 [b]    702.90619 [d]    0.67 [f]     0.5072 [h]
POP         POP         1       0.64709 [c]      0.04881 [e]      13.26 [g]    <.0001 [i]
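The SAS entries Coeff Var and t Value follow directly from other printed quantities. This Python sketch assumes, as the printed numbers confirm, that SAS computes Coeff Var as 100 times Root MSE divided by Dependent Mean; the variable names are illustrative.

```python
# Values copied from the SAS output (annotation symbols in comments)
root_mse = 3791.75848          # [j] residual standard deviation s
dependent_mean = 6494.82900    # [H] average response
estimates = {
    "Intercept": (469.70360, 702.90619),  # [b] estimate, [d] standard error
    "POP": (0.64709, 0.04881),            # [c] estimate, [e] standard error
}

# [I] SAS's Coeff Var: residual standard deviation as a percentage of the mean response
coeff_var = 100 * root_mse / dependent_mean
print(f"Coeff Var {coeff_var:.5f}")

# [f], [g] t-ratio = estimate / standard error
t_values = {name: b / se for name, (b, se) in estimates.items()}
for name, t in t_values.items():
    print(f"{name:10s} t Value {t:.2f}")
```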
R Output
Analysis of Variance Table
Response: SALES
            Df        Sum Sq            Mean Sq           F value          Pr(>F)
POP         1 [m]     2527165015 [p]    2527165015 [s]    175.77304 [u]    <2.22e-16 [v] ***
Residuals   48 [n]    690116755 [q]     14377432 [t]
---
Call:
lm(formula = SALES ~ POP)
Residuals:
    Min      1Q   Median      3Q     Max
  -6047   -1461     -670     486   18229
Coefficients:
              Estimate        Std. Error      t value       Pr(>|t|)
(Intercept)   469.7036 [b]    702.9062 [d]    0.67 [f]      0.51 [h]
POP           0.6471 [c]      0.0488 [e]      13.26 [g]     <2e-16 *** [i]
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3790 [j] on 48 [n] degrees of freedom
Multiple R-Squared: 0.785 [k], Adjusted R-squared: 0.781 [l]
F-statistic: 176 [u] on 1 [m] and 48 [n] DF, p-value: <2e-16 [v]
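Two relationships can be checked against the R output: the F-ratio is the regression mean square divided by the error mean square, and, with one explanatory variable, the F-ratio equals the square of the slope's t-ratio. The sketch below takes the unrounded t-ratio 13.25794257 from the EXCEL output, because R prints it rounded to 13.26; the variable names are illustrative.

```python
# Mean squares copied from the R anova() table
regression_ms = 2527165015   # [s] Regression MS
error_ms = 14377432          # [t] Error MS
# Unrounded slope t-ratio [g], taken from the EXCEL output
t_slope = 13.25794257

f_ratio = regression_ms / error_ms   # [u] F-ratio = Regression MS / Error MS
print(f"F-ratio   {f_ratio:.5f}")
# With one explanatory variable, F-ratio = t(b_1)^2
print(f"t(b_1)^2  {t_slope ** 2:.5f}")
```

The two printed values differ only in the last digits, because the mean squares shown by R are themselves rounded.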
Legend: Annotation, Definition and Symbol
[a] Number of observations, \(n\).
[b] The estimated intercept, \(b_0\).
[c] The estimated slope, \(b_1\).
[d] The standard error of the intercept, \(se(b_0)\).
[e] The standard error of the slope, \(se(b_1)\).
[f] The \(t\)-ratio associated with the intercept, \(t(b_0) = b_0/se(b_0)\).
[g] The \(t\)-ratio associated with the slope, \(t(b_1) = b_1/se(b_1)\).
[h] The \(p\)-value associated with the intercept; here, \(p\text{-value} = \Pr(|t_{n-2}| > |t(b_0)|)\), where \(t(b_0)\) is the realized value (0.67 here) and \(t_{n-2}\) has a \(t\)-distribution with \(df = n-2\).
[i] The \(p\)-value associated with the slope; here, \(p\text{-value} = \Pr(|t_{n-2}| > |t(b_1)|)\), where \(t(b_1)\) is the realized value (13.26 here) and \(t_{n-2}\) has a \(t\)-distribution with \(df = n-2\).
[j] The residual standard deviation, \(s\).
[k] The coefficient of determination, \(R^2\).
[l] The coefficient of determination adjusted for degrees of freedom, \(R_{a}^2\). (This term will be defined in Chapter 3.)
[m] Degrees of freedom for the regression component. This is 1 for one explanatory variable.
[n] Degrees of freedom for the error component, \(n-2\), for regression with one explanatory variable.
[o] Total degrees of freedom, \(n-1\).
[p] The regression sum of squares, \(Regression~SS\).
[q] The error sum of squares, \(Error~SS\).
[r] The total sum of squares, \(Total~SS\).
[s] The regression mean square, \(Regression~MS = Regression~SS/1\), for one explanatory variable.
[t] The error mean square, \(s^2 = Error~MS = Error~SS/(n-2)\), for one explanatory variable.
[u] The \(F\)-ratio, \(F\text{-ratio} = (Regression~MS)/(Error~MS)\). (This term will be defined in Chapter 3.)
[v] The \(p\)-value associated with the \(F\)-ratio. (This term will be defined in Chapter 3.)
[w] The observation number, \(i\).
[x] The value of the explanatory variable for the \(i\)th observation, \(x_i\).
[y] The response for the \(i\)th observation, \(y_i\).
[z] The fitted value for the \(i\)th observation, \(\widehat{y}_i\).
[A] The standard error of the fit, \(se(\widehat{y}_i)\).
[B] The residual for the \(i\)th observation, \(e_i\).
[C] The standardized residual for the \(i\)th observation, \(e_i/se(e_i)\). The standard error \(se(e_i)\) will be defined in Section 5.3.1.
[F] The multiple correlation coefficient is the square root of the coefficient of determination, \(R = \sqrt{R^2}\). This will be defined in Chapter 3.
[G] The standardized coefficient is \(b_1 s_x/s_y\). For regression with one explanatory variable, this is equivalent to \(r\), the correlation coefficient.
[H] The average response, \(\overline{y}\).
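[I] The coefficient of variation of the response is \(s_y/\overline{y}\). The Coeff Var that SAS prints is instead based on the residual standard deviation, \(100 s/\overline{y}\); here, \(100 \times 3791.758/6494.829 \approx 58.38\).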
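The \(p\)-values [h] and [i] can be reproduced without a statistics library. This Python sketch relies on the standard identity \(\Pr(|t_{df}| > |t|) = I_{df/(df+t^2)}(df/2, 1/2)\), where \(I\) is the regularized incomplete beta function, evaluated here with the modified Lentz continued fraction popularized by Numerical Recipes. The function names are hypothetical and the code is a sketch, not what any of the three packages actually runs internally.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # log of x^a (1-x)^b / B(a, b)
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    # Symmetry I_x(a, b) = 1 - I_{1-x}(b, a), where the fraction converges faster
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def two_sided_t_pvalue(t_ratio, df):
    """Pr(|t_df| > |t_ratio|) for a t-distribution with df degrees of freedom."""
    x = df / (df + t_ratio ** 2)
    return reg_incomplete_beta(df / 2.0, 0.5, x)

n = 50
p_intercept = two_sided_t_pvalue(0.668230846, n - 2)  # t(b_0) [f] -> p-value [h]
p_slope = two_sided_t_pvalue(13.25794257, n - 2)      # t(b_1) [g] -> p-value [i]
print(f"intercept p-value {p_intercept:.6f}")
print(f"slope p-value     {p_slope:.3e}")
```

The results should match EXCEL's 0.507187 and 1.16E-17. With one explanatory variable the \(F\)-ratio equals \(t(b_1)^2\), so the \(p\)-value [v] for the \(F\)-ratio equals the slope's \(p\)-value [i], which is why EXCEL's Significance F (1.15757E-17) agrees with the slope's P-value. In practice, a library routine such as SciPy's scipy.stats.t.sf (doubled for a two-sided test) would replace these hand-written functions.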