1.1 What is Regression Analysis?

In this section, you learn how to:
  • Summarize regression in a nutshell
  • Explain Galton’s height example as a regression application


Statistics is about data. As a discipline, it is about the collection, summarization and analysis of data to make statements about the real world. When analysts collect data, they are really collecting information that is quantified, that is, transformed to a numerical scale. There are easy, well-understood rules for reducing the data, using either numerical or graphical summary measures. These summary measures can then be linked to a theoretical representation, or model, of the data. With a model that is calibrated by data, statements about the world can be made.

Statistical methods have had a major impact on several fields of study.

  • In the area of data collection, the careful design of sample surveys is crucial to market research groups and to the auditing procedures of accounting firms.
  • Experimental design is a second subdiscipline devoted to data collection. The focus of experimental design is on constructing methods of data collection that will extract information in the most efficient way possible. This is especially important in fields such as agriculture and engineering where each observation is expensive, possibly costing millions of dollars.
  • Other applied statistical methods focus on managing and predicting data. Process control deals with monitoring a process over time and deciding when intervention is most fruitful. Process control helps manage the quality of goods produced by manufacturers.
  • Forecasting is about extrapolating a process into the future, whether it be sales of a product or movements of an interest rate.

Regression analysis is a statistical method used to analyze data. As we will see, the distinguishing feature of this method is the ability to make statements about variables after having controlled for values of known explanatory variables. Important as other methods are, it is regression analysis that has been most influential. To illustrate, an index of business journals, ABI/INFORM, lists over twenty-four thousand articles using regression techniques over the thirty-year period 1978-2007. And these are only the applications that were considered innovative enough to be published in scholarly reviews!

Regression analysis of data is so pervasive in modern business that it is easy to overlook the fact that the methodology is barely over 120 years old. Scholars attribute the birth of regression to the 1885 presidential address of Sir Francis Galton to the anthropological section of the British Association for the Advancement of Science. In that address, described in Stigler (1986), Galton provided a description of regression and linked it to normal curve theory. His discovery arose from his studies of properties of natural selection and inheritance.

To illustrate a data set that can be analyzed using regression methods, Table 1.1 displays some data included in Galton’s 1885 paper. This table displays the heights of 928 adult children, classified by an index of their parents’ height. Here, all female heights were multiplied by 1.08, and the index was created by taking the average of the father’s height and the rescaled mother’s height. Galton was aware that the parents’ and the adult child’s height could each be adequately approximated by a normal curve. In developing regression analysis, he provided a single model for the joint distribution of heights.
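The construction of the parents' height index described above can be sketched in a few lines of Python. This is only an illustration: the function name `parents_index` and the two input heights are hypothetical, and only the 1.08 scaling factor and the averaging step come from the text.

```python
# Scaling factor applied to female heights, as stated in the text.
FEMALE_SCALE = 1.08


def parents_index(father_height, mother_height):
    """Average of the father's height and the rescaled mother's height (inches).

    The function name is hypothetical; the formula follows the description
    of the index used to classify the heights in Table 1.1.
    """
    rescaled_mother = FEMALE_SCALE * mother_height
    return (father_height + rescaled_mother) / 2.0


# Hypothetical couple (not taken from Galton's data): father 70 in, mother 64 in.
example_index = parents_index(70.0, 64.0)
print(round(example_index, 2))  # 69.56
```

Rescaling before averaging puts both parents on a comparable scale, so the index is a single explanatory measurement per family.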

\begin{matrix}
\begin{array}{c}
\text{Table 1.1. Galton’s 1885 Regression Data}
\end{array}\scriptsize \\
\tiny
\begin{array}{c|ccccccccccc|r}
\hline
\text{Height of} & \text{Parents’} & \text{Height} \\
\text{adult child} \ \text{in inches} & < 64.0 & 64.5 & 65.5 & 66.5 & 67.5 & 68.5 & 69.5 & 70.5 & 71.5 & 72.5 & > 73.0 & \text{Total} \\
\hline
> 73.7 & – & – & – & – & – & – & 5 & 3 & 2 & 4 & – & 14 \\
73.2 & – & – & – & – & – & 3 & 4 & 3 & 2 & 2 & 3 & 17 \\
72.2 & – & – & 1 & – & 4 & 4 & 11 & 4 & 9 & 7 & 1 & 41 \\
71.2 & – & – & 2 & – & 11 & 18 & 20 & 7 & 4 & 2 & – & 64 \\
70.2 & – & – & 5 & 4 & 19 & 21 & 25 & 14 & 10 & 1 & – & 99 \\
69.2 & 1 & 2 & 7 & 13 & 38 & 48 & 33 & 18 & 5 & 2 & – & 167 \\
68.2 & 1 & – & 7 & 14 & 28 & 34 & 20 & 12 & 3 & 1 & – & 120 \\
67.2 & 2 & 5 & 11 & 17 & 38 & 31 & 27 & 3 & 4 & – & – & 138 \\
66.2 & 2 & 5 & 11 & 17 & 36 & 25 & 17 & 1 & 3 & – & – & 117 \\
65.2 & 1 & 1 & 7 & 2 & 15 & 16 & 4 & 1 & 1 & – & – & 48 \\
64.2 & 4 & 4 & 5 & 5 & 14 & 11 & 16 & – & – & – & – & 59 \\
63.2 & 2 & 4 & 9 & 3 & 5 & 7 & 1 & 1 & – & – & – & 32 \\
62.2 & – & 1 & – & 3 & 3 & – & – & – & – & – & – & 7 \\
< 61.2 & 1 & 1 & 1 & – & – & 1 & – & 1 & – & – & – & 5 \\
\hline
\text{Total} & 14 & 23 & 66 & 78 & 211 & 219 & 183 & 68 & 43 & 19 & 4 & 928 \\
\hline
\end{array}\scriptsize
\\ \begin{array}{c}
\text{Source: Stigler (1986).}
\end{array}
\end{matrix}

Table 1.1 shows that much of the information concerning the height of an adult child can be attributed to, or “explained” by, the parents’ height. Thus, we use the term explanatory variable for measurements that provide information about a variable of interest. Regression analysis is a method to quantify the relationship between a variable of interest and explanatory variables. The methodology used to study the data in Table 1.1 can also be used to study actuarial and other risk management problems, which is the thesis of this book.
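To make "quantify the relationship" concrete, the following Python sketch fits a least squares line to the grouped counts of Table 1.1, treating the adult child's height as the variable of interest and the parents' height as the explanatory variable. It is a rough illustration rather than Galton's own procedure. The names `COUNTS`, `PARENT_HEIGHTS`, `CHILD_HEIGHTS`, and `fit_line` are hypothetical, and the heights assigned to the open-ended categories (63.5 and 73.5 for parents, 74.2 and 61.2 for adult children) are assumptions, since the table gives only bounds for those cells.

```python
# Counts from Table 1.1. Rows are adult-child height categories (top to bottom),
# columns are parents' height categories (left to right); "–" cells are zeros.
COUNTS = [
    [0, 0, 0, 0, 0, 0, 5, 3, 2, 4, 0],        # > 73.7
    [0, 0, 0, 0, 0, 3, 4, 3, 2, 2, 3],        # 73.2
    [0, 0, 1, 0, 4, 4, 11, 4, 9, 7, 1],       # 72.2
    [0, 0, 2, 0, 11, 18, 20, 7, 4, 2, 0],     # 71.2
    [0, 0, 5, 4, 19, 21, 25, 14, 10, 1, 0],   # 70.2
    [1, 2, 7, 13, 38, 48, 33, 18, 5, 2, 0],   # 69.2
    [1, 0, 7, 14, 28, 34, 20, 12, 3, 1, 0],   # 68.2
    [2, 5, 11, 17, 38, 31, 27, 3, 4, 0, 0],   # 67.2
    [2, 5, 11, 17, 36, 25, 17, 1, 3, 0, 0],   # 66.2
    [1, 1, 7, 2, 15, 16, 4, 1, 1, 0, 0],      # 65.2
    [4, 4, 5, 5, 14, 11, 16, 0, 0, 0, 0],     # 64.2
    [2, 4, 9, 3, 5, 7, 1, 1, 0, 0, 0],        # 63.2
    [0, 1, 0, 3, 3, 0, 0, 0, 0, 0, 0],        # 62.2
    [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0],        # < 61.2
]

# Representative heights in inches for each category.
# ASSUMPTION: open-ended categories get the next value on the one-inch grid.
PARENT_HEIGHTS = [63.5, 64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5, 73.5]
CHILD_HEIGHTS = [74.2, 73.2, 72.2, 71.2, 70.2, 69.2, 68.2,
                 67.2, 66.2, 65.2, 64.2, 63.2, 62.2, 61.2]


def fit_line(counts, x_values, y_values):
    """Least squares fit of y on x for grouped data, weighting cells by count."""
    n = sum(sum(row) for row in counts)
    x_mean = sum(c * x for row in counts for c, x in zip(row, x_values)) / n
    y_mean = sum(sum(row) * y for row, y in zip(counts, y_values)) / n
    s_xx = sum(c * (x - x_mean) ** 2
               for row in counts for c, x in zip(row, x_values))
    s_xy = sum(c * (x - x_mean) * (y - y_mean)
               for row, y in zip(counts, y_values)
               for c, x in zip(row, x_values))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_line(COUNTS, PARENT_HEIGHTS, CHILD_HEIGHTS)
print(f"fitted child height = {intercept:.2f} + {slope:.3f} * parents' height")
```

Under these assumptions the fitted slope is positive and less than one (roughly 0.65), which summarizes in a single number how strongly the parents' height is associated with the adult child's height in this table. Different choices for the open-ended categories would change the estimate slightly.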
