11.1 Binary Dependent Variables

In this section, you learn how to:
  • Describe the Bernoulli distribution
  • Describe the linear probability model and its limitations

We have already introduced binary variables as a special type of discrete variable that can be used to indicate whether or not a subject has a characteristic of interest, such as gender for a person or ownership of a captive insurance company for a firm. Binary variables also describe whether or not an event of interest has occurred, such as an accident. A model with a binary dependent variable allows one to predict whether an event has occurred or a subject has a characteristic of interest.

Example: MEPS Expenditures. Section 11.4 will describe an extensive database from the Medical Expenditure Panel Survey (MEPS) on hospitalization utilization and expenditures. For these data, we will consider
\begin{eqnarray*}\small
y_i = \left\{
\begin{array}{ll}
1 & i\text{th person was hospitalized during the period} \\
0 & \text{otherwise}
\end{array}
\right. .
\end{eqnarray*}
There are n=2,000 persons in this sample, distributed as:
$$
\begin{matrix}
{\text{Table 11.1 Hospitalization by Gender}} \\ \small
\begin{array}{ll|ll}\hline
& & \text{Male} & \text{Female} \\ \hline
\text{Not hospitalized} & y=0 & 902 ~(95.3\%) & ~~~~941 ~(89.3\%) \\
\text{Hospitalized} & y=1 & ~~44 ~(~4.7\%) & ~~~~113 ~(10.7\%) \\
\text{Total} & & 946 & 1,054 \\ \hline
\end{array}
\end{matrix}
$$
Table 11.1 suggests that gender has an important influence on whether someone becomes hospitalized.
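
The percentages in Table 11.1 can be checked directly from the cell counts. The following is a minimal Python sketch that assumes only the four counts reported in the table; the dictionary layout and the helper name `hospitalization_rates` are illustrative choices, not part of the MEPS data.

```python
# Cell counts taken from Table 11.1, by gender.
counts = {
    "Male": {"not_hospitalized": 902, "hospitalized": 44},
    "Female": {"not_hospitalized": 941, "hospitalized": 113},
}


def hospitalization_rates(table):
    """Return the share of hospitalized persons (y = 1) within each group."""
    rates = {}
    for group, cells in table.items():
        total = cells["not_hospitalized"] + cells["hospitalized"]
        rates[group] = cells["hospitalized"] / total
    return rates


rates = hospitalization_rates(counts)

# Total sample size across both groups.
n = sum(sum(cells.values()) for cells in counts.values())

for group, rate in rates.items():
    print(f"{group}: {100 * rate:.1f}% hospitalized")
print(f"Sample size: {n}")
```

The within-group shares are the quantities compared in the table: the proportion with y=1 among males and among females.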


As with the linear regression techniques introduced in prior chapters, we are interested in using characteristics of a person, such as age, sex, education, income, and prior health status, to help explain the dependent variable y. Unlike in prior chapters, the dependent variable is now discrete and not even approximately normally distributed. In limited circumstances, linear regression can still be used with binary dependent variables; this application is known as a linear probability model.
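
To make the idea concrete, a linear probability model with a single binary explanatory variable can be fit by ordinary least squares using only the counts in Table 11.1. The sketch below is an illustration under stated assumptions, not the analysis carried out in the text: the individual-level records are rebuilt from the table counts, and the indicator `female` and the helper `ols_simple` are hypothetical names introduced here.

```python
# Rebuild individual-level (female, y) records from the Table 11.1 counts.
# female = 1 for females and 0 for males (a hypothetical coding for illustration).
data = (
    [(0, 0)] * 902 + [(0, 1)] * 44      # males: not hospitalized, hospitalized
    + [(1, 0)] * 941 + [(1, 1)] * 113   # females: not hospitalized, hospitalized
)


def ols_simple(pairs):
    """Least squares intercept and slope for y regressed on a single x."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = ols_simple(data)

# Fitted values are read as estimated probabilities that y = 1.
p_male = b0
p_female = b0 + b1
print(f"Estimated P(hospitalized | male)   = {p_male:.3f}")
print(f"Estimated P(hospitalized | female) = {p_female:.3f}")
```

With one binary explanatory variable, the fitted values reproduce the sample proportions in Table 11.1, so they stay between 0 and 1. With continuous explanatory variables, fitted values from a linear probability model can fall outside that range.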
