Stata's bootstrap command makes
it easy to bootstrap just about any statistic you can calculate.
The results of almost all Stata commands can be bootstrapped
immediately, and it's relatively straightforward to put any other
results you've calculated in a form that can be bootstrapped. This
publication will show you how.
If you're just looking to bootstrap the results of a Stata command,
all you'll need is a basic familiarity with Stata. However, if
you need to calculate something else and then bootstrap it, you'll
need to write a Stata program (one defined with the program command) to do so. If you're
not familiar with writing Stata programs (which are not the same
as do files), you'll want to take a look at Programming
in Stata,
in particular the section on programs.
Bootstrapping Results from Stata Commands
If there is a single Stata command that calculates the result
you need, you can simply tell Stata to bootstrap the result of
that command. As an example, load the automobile data that comes
with Stata and consider trying to find the mean of the mpg variable.
The summarize (sum)
command will do exactly what you want:
sysuse auto
sum mpg
But how will the bootstrap command find the number it needs in
all that output? The answer is that you will tell it where to
look in the return vector.
The Return Vector
In addition to the output you see on the screen or in your log,
most Stata commands quietly put their results in a return vector.
You can refer to this vector in subsequent commands or, in the
case of bootstrap, tell it
what part of the return vector you care about.
To see the current contents of the return vector, type
return list
The sum command is a basic command (as opposed
to an estimation command), so its return vector is called r(). Looking over the list, you'll see that r(mean) is
the number you want. You're now ready to actually carry out the
bootstrap.
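Before bootstrapping, it can help to confirm that the result you picked holds the number you expect. A minimal check, using only the auto data and the summarize command from above, is to display the stored results directly:

```stata
sysuse auto, clear
summarize mpg
* r(mean) holds the mean that summarize just calculated
display r(mean)
* r(N) is the number of observations summarize used
display r(N)
```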
The bootstrap Command Syntax
The basic syntax for a bootstrap command is simple:
bootstrap var=r(result): command
Here var is
simply what you want to call the quantity you're bootstrapping.
You're welcome to choose any name you like as long as it
meets the usual rules for a Stata variable name. In our case
meanMPG would be appropriate.
r(result) tells
the bootstrap command to look in the r() vector
for the particular result you're interested in. We're interested
in r(mean).
Finally command should
be replaced by the actual command that calculates the result
you want. In our case it's sum mpg.
Putting this all together, the command to bootstrap the mean of
the variable mpg is simply:
bootstrap meanMPG=r(mean): sum mpg
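When you run that, you'll get a warning that because sum is not an
estimation command, bootstrap can't tell which observations were used,
so it assumes all of them were and won't exclude any observations with
missing values from the resampling. Estimation commands (more on them
shortly) don't have this limitation. It won't be a problem in this case,
since mpg has no missing values, and the results you want will follow.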
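Because each replication draws a random sample with replacement, the bootstrap standard error will change slightly every time you run the command. The sketch below uses bootstrap's seed() option to make the results reproducible (the seed value 12345 is arbitrary): running the same command twice with the same seed should give identical standard errors. You'll still see the e(sample) warning both times.

```stata
sysuse auto, clear
* seed() fixes the random numbers used to draw the resamples
bootstrap meanMPG=r(mean), seed(12345): summarize mpg
* bootstrap leaves its results in e(), so _se[] gives the bootstrap standard error
scalar se1 = _se[meanMPG]
* same seed, same resamples, same standard error
bootstrap meanMPG=r(mean), seed(12345): summarize mpg
display se1 - _se[meanMPG]
```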
What if you wanted to bootstrap two different quantities? No problem,
just list them both:
bootstrap meanMPG=r(mean) maxMPG=r(max): sum mpg
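The quantity you bootstrap doesn't have to be a single stored result. As a sketch, an expression in parentheses built from several r() results can be bootstrapped too; here it is the range of mpg (the name rangeMPG is just an illustration):

```stata
sysuse auto, clear
* an expression in parentheses can combine several stored results
bootstrap rangeMPG=(r(max)-r(min)), seed(12345): summarize mpg
```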
Bootstrapping Estimation Commands
Estimation commands are slightly different in that they store their
results in the e() vector rather than the r() vector, and that vector
is listed by typing ereturn list rather
than return list. To see this, type the following:
reg mpg weight foreign
ereturn list
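Individual numbers in e() can be displayed the same way as r() results, while e(b) has to be listed as a matrix. A quick look at the regression above:

```stata
sysuse auto, clear
regress mpg weight foreign
* single numbers stored by regress
display e(N)
display e(F)
* e(b) is a row vector holding the coefficients
matrix list e(b)
```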
One warning: bootstrap is an estimation command, so after running
it the e() vector will contain the results of the bootstrap,
not the results of the command you were bootstrapping.
Suppose you wanted to bootstrap the F-statistic for some odd reason.
All you'd have to do is type:
bootstrap f=e(F): reg mpg weight foreign
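If you'll still need the regression results after bootstrapping, one option is to save them with estimates store before running bootstrap and bring them back afterward with estimates restore. A sketch, where the stored name ols is arbitrary:

```stata
sysuse auto, clear
regress mpg weight foreign
* keep a copy of the regression results under the name ols
estimates store ols
bootstrap f=e(F), seed(12345): regress mpg weight foreign
* e() now holds the bootstrap results; restore the regression results
estimates restore ols
display e(F)
```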
A more common example would be to bootstrap the coefficients.
They're available in e(b), but that's
a matrix, so getting at them individually takes a little more work.
Fortunately this is so common that it's set up as a convenient
special case: if bootstrap is given nothing to bootstrap,
it will look for an e(b) matrix and bootstrap that. Thus
all you need to type is:
bootstrap: reg mpg weight foreign
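If you only want one coefficient, you can also name it directly: _b[varname] refers to a single element of e(b). A sketch, with bWeight as an illustrative name:

```stata
sysuse auto, clear
* _b[weight] is the coefficient on weight from the regression
bootstrap bWeight=_b[weight], seed(12345): regress mpg weight foreign
```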
Bootstrap Options
The bootstrap command has a fair
number of options available. The nowarn option
will get rid of that annoying message about e(sample) that
you got after our first example. The reps option
allows you to choose how many bootstrap replications are performed; the
default is 50. For a full list of options, type help
bootstrap.
However, all these options apply to the bootstrap command
and not to the command you're bootstrapping. Thus they go after
a comma as always, but before the colon that ends the bootstrap
part of the command. You could then have another comma at the
end of the command to be bootstrapped, followed by options that
apply to it. For example:
bootstrap perc90=r(p90), nowarn reps(25): sum mpg, detail
This bootstraps the 90th percentile of mpg,
which is only available if sum is
given the detail option. It also suppresses
the warning message and only does 25 replications. Note where all
those options are located in the command.
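To see what bootstrap is doing behind the scenes, you can keep the result from every replication with the saving() option, which writes one observation per replication to a Stata dataset. In this sketch the file name bsmeans is arbitrary, and the replace suboption lets you rerun it; note that use with clear replaces the auto data in memory. The standard deviation of the saved replicates should match, up to rounding, the standard error bootstrap reports.

```stata
sysuse auto, clear
* saving() writes each replication's result to bsmeans.dta
bootstrap meanMPG=r(mean), reps(100) seed(12345) nodots nowarn saving(bsmeans, replace): summarize mpg
* load the replicates (this replaces the data in memory)
use bsmeans, clear
* the standard deviation of the replicates is the bootstrap standard error
summarize meanMPG
```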
Bootstrapping Results You've Calculated
If all you need to do is bootstrap the results of existing Stata
commands you may want to stop here, especially since things are about
to get a bit more complicated.
If there's no single Stata command that will calculate a result
you want to bootstrap, you'll just have to write your own. As
you hopefully know from reading Programming
in Stata, Stata allows you to write programs that act like regular
Stata commands. You can even make them return results so that
they'll work with bootstrap.
Suppose you wanted to bootstrap the statistic "Mean weight of those
cars in the top quartile for mpg." Calculating the statistic
isn't hard to do:
xtile quartile=mpg, nq(4)
sum weight if quartile==4
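But since it requires two commands, it can't be bootstrapped as
is. We'll need to write a program that carries out those two
steps and returns the result in r(). If you ran the two commands
above, type drop quartile first; otherwise the xtile command in the
program will fail because a variable named quartile already exists.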
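* remove any earlier definition so this code can be rerun
capture program drop topQuartileMean
program define topQuartileMean, rclass
    * split the cars into four groups by mpg
    xtile quartile=mpg, nq(4)
    * mean weight of the cars in the top group
    sum weight if quartile==4
    return scalar tqm=r(mean)
    drop quartile
end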
Most of this should be familiar, but there are a few additional
elements that need to be explained.
Adding the rclass option to the
program definition tells Stata that this program will be putting
things in the r() vector. The
return command is what actually does
so, and scalar means this particular
result is a single number as opposed to a matrix like e(b).
We're calling our returned value tqm (as
in top quartile mean) so it will be available after the program
runs as r(tqm).
The number we're putting in it is the r(mean) result
from the previous sum command--not
a result of our topQuartileMean program,
which doesn't have results yet.
Also note that we
need to drop the quartile variable
at the end so we can create a new one in the next bootstrap replication.
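An alternative to creating and dropping quartile by hand is a temporary variable. In the sketch below (topQuartileMean2 is a hypothetical name, chosen so it doesn't clash with the program above), tempvar picks a name that can't collide with any existing variable, so the program still works if a variable named quartile is already in memory, and Stata drops the temporary variable automatically when the program ends. The quietly prefix just suppresses output.

```stata
sysuse auto, clear
capture program drop topQuartileMean2
program define topQuartileMean2, rclass
    * tempvar stores a unique variable name in the local macro quartile
    tempvar quartile
    quietly xtile `quartile' = mpg, nq(4)
    quietly summarize weight if `quartile'==4
    return scalar tqm = r(mean)
    * no drop needed: Stata removes the temporary variable when the program ends
end
* run it once and check what it returns before bootstrapping it
topQuartileMean2
return list
```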
Now that the program topQuartileMean is defined, you can use it
with bootstrap just like any other Stata command:
bootstrap tqm=r(tqm): topQuartileMean
You'll then get your results.
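The same idea can be extended to other variables. The sketch below is a hypothetical generalization (the program name topQuartileMeanOf and its argument names are illustrations, not part of Stata): args reads the variable that defines the quartiles and the variable to average from the command line, and a temporary variable avoids any clash with existing variables. The arguments go after the program name inside the bootstrap command, just like the detail option for sum in the options example.

```stata
sysuse auto, clear
capture program drop topQuartileMeanOf
program define topQuartileMeanOf, rclass
    * first argument defines the quartiles, second is the variable to average
    args sortvar meanvar
    tempvar quartile
    quietly xtile `quartile' = `sortvar', nq(4)
    quietly summarize `meanvar' if `quartile'==4
    return scalar tqm = r(mean)
end
* the same statistic as above: mean weight of the top mpg quartile
bootstrap tqm=r(tqm), reps(100) seed(12345) nowarn: topQuartileMeanOf mpg weight
```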