Is the Explanatory Variable Important? The t-Test

We respond to the question of whether the explanatory variable is important by investigating whether or not \(\beta_1 = 0\). The logic is that if \(\beta_1 = 0\), then the basic linear regression model no longer includes an explanatory variable \(x\). Thus, we translate our question about the importance of the explanatory variable into a narrower question that can be answered within the hypothesis testing framework: is the null hypothesis \(H_0: \beta_1 = 0\) valid? We respond to this question by examining the test statistic:

\begin{equation*}
t\text{-ratio} = \frac{\text{estimator} - \text{hypothesized value of parameter}}{\text{standard error of the estimator}}.
\end{equation*}

For the case of \(H_0: \beta_1 = 0\), we examine the t-ratio \(t(b_1) = b_1/se(b_1)\) because the hypothesized value of \(\beta_1\) is 0. This is the appropriate standardization because, under the null hypothesis and the model assumptions described in Section 2.4, the sampling distribution of \(t(b_1)\) can be shown to be the t-distribution with \(df = n-2\) degrees of freedom. Thus, to test the null hypothesis \(H_0\) against the alternative \(H_a: \beta_1 \neq 0\), we reject \(H_0\) in favor of \(H_a\) if \(|t(b_1)|\) exceeds a t-value. Here, this t-value is a percentile from the t-distribution using \(df = n-2\) degrees of freedom. We denote the significance level as \(\alpha\) and this t-value as \(t_{n-2,1-\alpha/2}\).
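To make this standardization concrete, here is a short Python sketch, offered as an illustration rather than as part of the text's analysis; the function name slope_t_test and the use of numpy/scipy are our own choices:

\begin{verbatim}
import numpy as np
from scipy import stats

def slope_t_test(x, y, alpha=0.05):
    """Two-sided t-test of H0: beta_1 = 0 in basic linear regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx  # slope estimate
    b0 = y.mean() - b1 * x.mean()                       # intercept estimate
    resid = y - (b0 + b1 * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))           # residual std deviation
    se_b1 = s / np.sqrt(sxx)                            # standard error of b1
    t_ratio = b1 / se_b1
    t_value = stats.t.ppf(1 - alpha / 2, df=n - 2)      # t_{n-2, 1-alpha/2}
    return t_ratio, t_value, abs(t_ratio) > t_value     # reject H0 if True
\end{verbatim}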

Example: Lottery Sales – Continued. For the lottery sales example, the residual standard deviation is \(s = 3,792\). From Table 2.1, we have \(s_x = 11,098\). Thus, the standard error of the slope is \(se(b_1) = 3792/(11098\sqrt{50-1}) = 0.0488\). From Section 2.1, the slope estimate is \(b_1 = 0.647\). Thus, the t-statistic is \(t(b_1) = 0.647/0.0488 = 13.3\). We interpret this by saying that the slope is 13.3 standard errors above zero. For the significance level, we use the customary value \(\alpha = 5\%\). The 97.5th percentile from a t-distribution with \(df = 50-2 = 48\) degrees of freedom is \(t_{48,0.975} = 2.011\). Because \(|13.3| > 2.011\), we reject the null hypothesis that the slope \(\beta_1 = 0\) in favor of the alternative that \(\beta_1 \neq 0\).
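As a quick check, the arithmetic of this example can be reproduced from the rounded summary statistics quoted above (a sketch; small differences from unrounded values are to be expected):

\begin{verbatim}
from scipy import stats

s, s_x, n, b1 = 3792.0, 11098.0, 50, 0.647
se_b1 = s / (s_x * (n - 1) ** 0.5)      # 0.0488
t_ratio = b1 / se_b1                    # about 13.3
t_value = stats.t.ppf(0.975, df=n - 2)  # 2.011
print(abs(t_ratio) > t_value)           # True: reject H0
\end{verbatim}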

Making decisions by comparing a t-ratio to a t-value is called a t-test. Testing \(H_0: \beta_1 = 0\) versus \(H_a: \beta_1 \neq 0\) is just one of many hypothesis tests that can be performed, although it is the most common. Table 2.3 outlines alternative decision-making procedures. These procedures are for testing \(H_0: \beta_1 = d\), where \(d\) is a user-prescribed value that may be equal to zero or any other known value. For example, in our Section 2.7 example, we will use \(d = 1\) to test financial theories about the stock market.

\begin{matrix}
\begin{array}{c} \text{Table 2.3 Decision-Making Procedures for Testing } H_0:\beta_1 = d
\end{array}\\ \scriptsize
\begin{array}{cc} \hline
\text{Alternative Hypothesis } (H_{a}) & \text{Procedure: Reject } H_0 \text{ in favor of } H_{a} \text{ if} \\ \hline
\beta_1 > d & t\text{-ratio} > t_{n-2,1-\alpha}. \\
\beta_1 < d & t\text{-ratio} < -t_{n-2,1-\alpha}. \\
\beta_1 \neq d & |t\text{-ratio}| > t_{n-2,1-\alpha/2}. \\ \hline
\end{array}\\ \small
\begin{array}{l}
\text{Notes: The significance level is } \alpha. \text{ Here, } t_{n-2,1-\alpha} \text{ is the } (1-\alpha)\text{th percentile} \\
\text{from the } t\text{-distribution using } df = n-2 \text{ degrees of freedom.} \\
\text{The test statistic is } t\text{-ratio} = (b_1 - d)/se(b_1). \\ \hline
\end{array}
\end{matrix}
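The decision rules in Table 2.3 translate directly into code. The following sketch assumes the slope estimate, its standard error, and the sample size are already in hand; the helper name reject_h0 is hypothetical:

\begin{verbatim}
from scipy import stats

def reject_h0(b1, se_b1, n, d=0.0, alternative="two-sided", alpha=0.05):
    """Decision rules of Table 2.3 for testing H0: beta_1 = d."""
    t_ratio = (b1 - d) / se_b1
    df = n - 2
    if alternative == "greater":                           # Ha: beta_1 > d
        return t_ratio > stats.t.ppf(1 - alpha, df)
    if alternative == "less":                              # Ha: beta_1 < d
        return t_ratio < -stats.t.ppf(1 - alpha, df)
    return abs(t_ratio) > stats.t.ppf(1 - alpha / 2, df)   # Ha: beta_1 != d
\end{verbatim}

For instance, reject_h0(b1, se_b1, n, d=1.0) would carry out the two-sided test with \(d = 1\) of the kind mentioned above.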

Alternatively, one can compute probability values (p-values) and compare these to given significance levels. The p-value is a useful summary statistic for the data analyst to report because it allows the reader of the report to gauge the strength of the deviation from the null hypothesis. Table 2.4 summarizes the procedure for calculating p-values.

\begin{matrix}
\begin{array}{c} \text{Table 2.4 Probability Values for Testing } H_0:\beta_1 = d
\end{array}\\ \scriptsize
\begin{array}{cccc} \hline
\text{Alternative} & & & \\
\text{Hypothesis } (H_a) & \beta_1 > d & \beta_1 < d & \beta_1 \neq d \\ \hline
\text{p-value} & \Pr(t_{n-2} > t\text{-ratio}) & \Pr(t_{n-2} < t\text{-ratio}) & \Pr(|t_{n-2}| > |t\text{-ratio}|) \\ \hline
\end{array}\\ \scriptsize
\begin{array}{l}
\text{Notes: Here, } t_{n-2} \text{ is a } t\text{-distributed random variable with } df = n-2 \text{ degrees of freedom.} \\
\text{The test statistic is } t\text{-ratio} = (b_1 - d)/se(b_1). \\ \hline
\end{array}
\end{matrix}
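Similarly, the p-values of Table 2.4 follow from the cumulative distribution and survival functions of the t-distribution; again, the helper name p_value is our own:

\begin{verbatim}
from scipy import stats

def p_value(b1, se_b1, n, d=0.0, alternative="two-sided"):
    """p-values of Table 2.4 for testing H0: beta_1 = d."""
    t_ratio = (b1 - d) / se_b1
    df = n - 2
    if alternative == "greater":              # Pr(t_{n-2} > t-ratio)
        return stats.t.sf(t_ratio, df)
    if alternative == "less":                 # Pr(t_{n-2} < t-ratio)
        return stats.t.cdf(t_ratio, df)
    return 2 * stats.t.sf(abs(t_ratio), df)   # Pr(|t_{n-2}| > |t-ratio|)
\end{verbatim}

For the lottery example, p_value(0.647, 0.0488, 50) is effectively zero, so the two-sided test rejects \(H_0\) at any customary significance level.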

Another interesting way of addressing the question of the importance of an explanatory variable is through the correlation coefficient. Recall that the correlation coefficient is a measure of the linear relationship between x and y; let us denote this statistic by \(r(y,x)\). This quantity is unaffected by scale changes in either variable: for example, if we multiply the x variable by a positive number \(b_1\), then the correlation coefficient remains unchanged (a negative multiplier flips only its sign). Further, correlations are unchanged by additive shifts; thus, if we add a number, say \(b_0\), to each x variable, the correlation coefficient remains unchanged. A scale change followed by an additive shift of the x variable produces the fitted value \(\widehat{y} = b_0 + b_1 x\). Thus, in this notation, we have \(r(y,x) = r(y, \widehat{y})\) whenever \(b_1 > 0\); when \(b_1 < 0\), the two correlations differ only in sign. We may therefore interpret the correlation between the responses and the explanatory variable as equal, up to sign, to the correlation between the responses and the fitted values. This leads to the following interesting algebraic fact: \(R^2 = r^2\). That is, the coefficient of determination equals the square of the correlation coefficient, a fact that is much easier to interpret if one thinks of \(r\) as the correlation between observed and fitted values.
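This identity is easy to verify numerically; the simulated data in the following sketch are purely illustrative:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 0.5 * x + rng.normal(size=50)

# Fitted values from the least squares line
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

r_xy = np.corrcoef(y, x)[0, 1]        # r(y, x)
r_yhat = np.corrcoef(y, y_hat)[0, 1]  # r(y, y_hat)
R2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(np.isclose(r_xy, r_yhat))   # True (b1 > 0 here)
print(np.isclose(R2, r_xy ** 2))  # True: R^2 = r^2
\end{verbatim}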
