Confidence Intervals

Investigators often use the formal hypothesis testing mechanism to answer the question “Does the explanatory variable have a real influence on the response?” A natural follow-up question is “To what extent does \(x\) affect \(y\)?” To a certain degree, one could respond using the size of the t-ratio or the p-value. However, in many instances a confidence interval for the slope is more useful.

To introduce confidence intervals for the slope, recall that \(b_1\) is our point estimator of the true, unknown slope \(\beta_1\). Section 2.4 argued that this estimator has standard error \(se(b_1)\) and that \(\left( b_1-\beta_1\right) /se(b_1)\) follows a t-distribution with \(n-2\) degrees of freedom. Probability statements can be inverted to yield confidence intervals. Using this logic, we have the following confidence interval for the slope \(\beta_1\).

Definition. A \(100(1-\alpha)\)% confidence interval for the slope \(\beta_1\) is \begin{equation}\label{E2:ConfIntb1} b_1 \pm t_{n-2,1-\alpha /2} ~se(b_1). \end{equation}

As with hypothesis testing, \(t_{n-2,1-\alpha /2}\) is the \((1-\alpha /2)\)th percentile from the t-distribution with \(df=n-2\) degrees of freedom. Because of the two-sided nature of confidence intervals, the percentile is 1 - (1 - confidence level)/2. In this text, for simplicity we generally use a 95% confidence interval, so the percentile is 1 - (1 - 0.95)/2 = 0.975. The width of the confidence interval indicates how reliable the estimate is: the narrower the interval, the more precisely the slope has been estimated.
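The inversion of the probability statement can be written out explicitly. The following is a sketch that uses the shorthand \(t = t_{n-2,1-\alpha /2}\), introduced here only for brevity, and relies on the t-distribution result from Section 2.4:
\begin{equation*}
\begin{aligned}
1-\alpha &= \Pr \left( -t \leq \frac{b_1-\beta_1}{se(b_1)} \leq t \right) \\
&= \Pr \left( -t~se(b_1) \leq b_1-\beta_1 \leq t~se(b_1) \right) \\
&= \Pr \left( b_1 - t~se(b_1) \leq \beta_1 \leq b_1 + t~se(b_1) \right) .
\end{aligned}
\end{equation*}
The first line uses the symmetry of the t-distribution about zero, so that a central region with probability \(1-\alpha\) leaves \(\alpha /2\) in each tail. The last line shows that the random endpoints \(b_1 \pm t~se(b_1)\) capture the unknown slope \(\beta_1\) with probability \(1-\alpha\), which is the interval given in the definition.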

In Section 2.1, we established that the least squares slope estimate for the lottery sales example is \(b_1=0.647\). The interpretation is that if a zip code’s population differs by 1,000, then we expect mean lottery sales to differ by $647. How reliable is this estimate? It turns out that \(se(b_1)=0.0488\) and thus an approximate 95% confidence interval for the slope is \begin{equation*} 0.647 \pm (2.011)(0.0488), \end{equation*} or (0.549, 0.745). Similarly, if population differs by 1,000, a 95% confidence interval for the expected change in sales is (549, 745) dollars. Here, we use the \(t\)-value \(t_{48,0.975}=2.011\) because there are 48 (\(= n-2\)) degrees of freedom and, for a 95% confidence interval, we need the 97.5th percentile.
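The lottery calculation can be checked numerically. The following is a minimal sketch in Python, not the text's own software: the function names (`t_pdf`, `t_cdf`, `t_percentile`, `slope_confidence_interval`) are hypothetical, the sample size n = 50 is inferred from the 48 degrees of freedom quoted above, and the t percentile is obtained by numerically integrating the t density (Simpson's rule plus bisection) as a stand-in for a statistical library routine.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF from Simpson's rule on [0, |x|], using symmetry about zero.

    steps must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_percentile(p, df):
    """Value with cumulative probability p, found by bisection on the CDF."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def slope_confidence_interval(b1, se_b1, n, confidence=0.95):
    """Return (lower, upper, t_value) for the interval b1 +/- t * se(b1)."""
    percentile = 1 - (1 - confidence) / 2   # 0.975 for a 95% interval
    t_value = t_percentile(percentile, n - 2)
    margin = t_value * se_b1
    return b1 - margin, b1 + margin, t_value


# Lottery example: b1 = 0.647, se(b1) = 0.0488, n = 50 (assumed from df = 48).
lower, upper, t_value = slope_confidence_interval(0.647, 0.0488, 50)
print(round(t_value, 3), round(lower, 3), round(upper, 3))
```

Under these assumptions the sketch should reproduce, to three decimal places, the t-value 2.011 and the interval (0.549, 0.745) reported above. In practice one would typically take the percentile from a statistical package, for example `scipy.stats.t.ppf(0.975, 48)` if SciPy is available, rather than integrating the density by hand.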
