The SSCC is pleased to announce that FALCON, a 64-bit Linux server
with Stata/MP installed, is now available. FALCON is not intended
for general use, and for most purposes will perform no better than
HAL. But Stata jobs will run faster (some of them much faster), and
FALCON can run some jobs that simply cannot be run on other servers.
64-Bit Linux
Most Pentium processors (and their AMD competitors) work with 32
bits of information at a time. This limits the total amount of memory
the processor can keep track of to four gigabytes (2^32 bytes), and in
most cases the maximum that can be assigned to a given task is two
gigabytes. As memory has become cheaper and computers with two gigabytes
of RAM or more have become common, this limitation has become more
and more problematic.
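The arithmetic behind the four gigabyte limit can be checked with a short
Python sketch. It assumes one address per byte and counts a gigabyte as
2^30 bytes; the function name addressable_gib is made up for this
illustration and is not part of any SSCC tool.

```python
# Illustrative arithmetic only: assumes one address per byte and
# 1 gigabyte = 2**30 bytes. The function name is hypothetical.
BYTES_PER_GIB = 2 ** 30


def addressable_gib(address_bits):
    """Return how many gigabytes can be addressed with the given number of bits."""
    return 2 ** address_bits / BYTES_PER_GIB


total_32 = addressable_gib(32)   # total a 32-bit processor can keep track of
per_task_32 = total_32 / 2       # the usual per-task ceiling described above
total_64 = addressable_gib(64)   # theoretical ceiling for 64-bit addresses

print("32-bit total (GB):   ", total_32)
print("32-bit per task (GB):", per_task_32)
print("64-bit total (GB):   ", total_64)
```

The 64-bit figure is a theoretical ceiling on addresses, not a statement
about how much RAM any real server holds.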
The solution is processors that work with 64 bits of information
at a time, along with operating systems and software written specifically
for these processors. FALCON represents the first such server at
the SSCC. FALCON currently has four gigabytes of physical RAM, and
a single job can use all of this and more (though claiming more
will cause the server to use swap space, and Stata jobs that use
swap space run very slowly). If we find that users need even
more memory than this, we can add more.
Note that most jobs which do not need more than two gigabytes of
RAM will not benefit from using a 64-bit processor. Continue to
run them on KITE or HAL--unless you're interested in Stata/MP.
Stata/MP
FALCON is also the first SSCC server to run Stata/MP, though it
will not be the last. Stata/MP is a special version of Stata written
to take advantage of machines with multiple processors.
All of the SSCC's servers have two physical processors. In addition,
a technology called hyperthreading allows each processor to focus
on two jobs at once, which is why the servers appear to have four.
Having additional processors allows the servers to work efficiently
on many jobs at the same time. However, a given job can only run
on one processor at a time. Thus having additional processors does
not help any given job run faster.
Additional processors could help if the job were broken up into pieces that can
be run at the same time on different processors. This is known as
parallel processing, and given that chip makers are finding it easier
to provide multiple processors than to continue making their current
processors faster, it's a very hot topic in computing. But it's
not always possible: in many tasks, later steps cannot even begin
without the results of earlier steps. Even when parallelization
is possible, it requires rewriting the program. (If you
are writing your own programs and want to take advantage of parallel
processing, keep in mind that the SSCC has a Beowulf
cluster for parallel processing jobs.)
In creating Stata/MP, Stata Corporation identified as many places as
it could where tasks could be parallelized, and rewrote Stata itself
accordingly. Stata's syntax is unchanged, so the do files you write
will run without modification and will take advantage of parallel
processing automatically.
How much they will benefit from parallelization depends on what
you're doing. Linear regression, for example, can be heavily parallelized
and will run nearly twice as fast. Most time-series methods, on
the other hand, cannot be heavily parallelized and don't benefit
nearly as much. Stata Corporation claims an average performance
increase of 40%. (For full details, including a report of the performance
gains for every Stata command, see the Stata/MP
web site.)
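One common way to reason about these differences is Amdahl's law, which
relates the speedup of a job to the fraction of its work that can be
parallelized. The Python sketch below is a rough illustration, not
Stata Corporation's own method. It assumes two processors and reads the
40% figure as a speedup of 1.4, neither of which is spelled out above.
The function names are made up for this example.

```python
# Amdahl's law sketch. Assumptions (not from Stata Corporation): the job
# runs on two processors, and "40% faster" means a speedup factor of 1.4.
def amdahl_speedup(parallel_fraction, processors):
    """Speedup when `parallel_fraction` of the work is spread over `processors`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def parallel_fraction_for(speedup, processors):
    """Invert Amdahl's law: what fraction must be parallel to reach `speedup`?"""
    if processors < 2:
        raise ValueError("need at least 2 processors to invert the formula")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / processors)


# A fully parallel command doubles in speed on two processors.
print("fully parallel, 2 processors:", amdahl_speedup(1.0, 2))

# Under the assumptions above, a 1.4x speedup on two processors implies
# that a bit more than half of the work is being run in parallel.
print("implied parallel fraction:", round(parallel_fraction_for(1.4, 2), 3))
```

A command that cannot be parallelized at all has a parallel fraction of
zero, and the formula then gives a speedup of exactly 1, meaning no gain.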
Our plan is to make Stata/MP available on FALCON and through Condor.
We have ordered upgraded servers for our Condor flock, and they
should be ready for use with Stata/MP installed by fall. But at
this time FALCON is the only server with Stata/MP available. Since
Stata/MP is designed to take advantage of all the processors on
a server, it would defeat the purpose to have multiple Stata/MP
jobs running at the same time. Thus FALCON is restricted to just
one Stata user at a time on a first-come, first-served basis--if someone is
already running a Stata job on FALCON, Stata will not allow anyone
else to run Stata there. Until Stata/MP is available through Condor
as well, we ask that users adhere to the following:
Guidelines for Use
- Only use FALCON to run Stata/MP if having your Stata job finish
more quickly would be a significant help to your work
- Only use FALCON for batch jobs, not interactive sessions
- Do not run Stata jobs that will take more than three days to
complete (jobs still running after three days will be terminated
automatically)
This will give everyone a fair chance to use Stata/MP. Once the
new Condor servers are available and there are other places where
people can use Stata/MP, these restrictions will be lifted.
Keep in mind that you may see opportunities for parallelization
that Stata/MP cannot. For example, if you are running bootstrap
replications using a foreach or while loop, Stata/MP will try to
parallelize each command but will still run the replications sequentially.
On the other hand, you could modify your program so that six different
Condor jobs each do one-sixth of your replications, and the whole
process would finish in roughly one-sixth the time. Stata/MP might only
reduce the time by a third.
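As a rough sketch of how replications might be divided among several jobs,
the Python below splits a total as evenly as possible and gives each job
its own random seed so that no two jobs repeat the same draws. The
function names, the base seed, and the seed-plus-job-number scheme are
all assumptions made for illustration. This is not a Condor submit file
or Stata code.

```python
# Hypothetical planning helper: divides bootstrap replications among jobs.
# Names, the base seed, and the seeding scheme are illustrative assumptions.
def split_replications(total, jobs):
    """Divide `total` replications as evenly as possible among `jobs` jobs."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    base, extra = divmod(total, jobs)
    # The first `extra` jobs each take one additional replication.
    return [base + (1 if i < extra else 0) for i in range(jobs)]


def job_plan(total, jobs, base_seed=12345):
    """Pair each job's replication count with a distinct random seed."""
    counts = split_replications(total, jobs)
    return [
        {"job": i, "replications": count, "seed": base_seed + i}
        for i, count in enumerate(counts)
    ]


for entry in job_plan(1000, 6):
    print(entry)
```

Each job would then run only its own share of the replications with its
own seed, and the results would be combined once all the jobs finish.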