The SSCC's Condor cluster makes a tremendous amount of computing
power available to SSCC users. Condor can be used to run Stata,
SAS, Matlab, R, C/C++, and Fortran jobs.
Complete documentation
on Condor is available from the UW Computer Science Department.
This publication will give you specific information about our cluster,
easy ways we've created to use Condor (including the ability to submit Stata jobs to Condor via the web), and an introduction to some
of the basic Condor functions.
The Hardware
The cluster is currently made up of 30 virtual machines running Linux. For details about their specifications, please see Computing Resources at the SSCC. The Condor machines have
access to the entire Linux file system just like the other Linux
servers. Most Linux programs do not have to be changed at all to run on
Condor, though programs written using the Windows versions of Stata, SAS, Matlab, or R may require some modifications.
The Condor Software
Condor is an open source
project at the University of Wisconsin's Computer Science Department.
It is designed to take advantage of today's cheap, fast, but highly
distributed computing power.
Condor groups computers together into a "flock" and creates
a central queue of jobs. It then sends these jobs out to idle computers
when they become available. This eliminates the need for users to
try to identify which computers are the least busy. Once a job is assigned to a computer, it gets exclusive use of that computer until the job is done.
The price of Condor is that it takes up to 30 seconds to put a
job in the queue, assign it to a machine, and actually start
running it (though the machines can be busy doing other jobs
during this time). Thus if you are running small jobs and want
immediate results, it is better to continue running them on a
normal server. As a rule of thumb, if your job will take more
than a few minutes on one of our Linux servers, it's a good candidate
for Condor.
Easy Ways to Submit Jobs to Condor
In order to submit jobs to Condor you must be logged in to Kite
(type ssh kite at the Linux prompt
on any other server). Kite is technically part of the flock but
it is used only to submit jobs, never to run them.
Stata
To submit a Stata job to Condor, type:
condor_stata -b do dofile
where dofile should be replaced by the name of the Stata do file you want to run. (Note that this is almost identical to running a do file in batch mode.) Stata jobs submitted to Condor will use Stata/MP, the multiprocessor version of Stata. However, if the Condor flock is busy, they may only get to use one processor.
Note that you can also submit Stata jobs to Condor via the web, completely avoiding the need to use Linux.
SAS
To submit a SAS job to Condor, type:
condor_sas program &
where program should be replaced by the name of the SAS
program you want to run.
The Condor flock has enough licenses to run 12 SAS jobs at a time. Other jobs will not use those machines that have SAS licenses if others are available (except for the newer and faster machines, which are the first choice of all Condor jobs).
Matlab
To submit a Matlab job to Condor, type:
condor_matlab program.m program.log &
where program should be replaced by the name of the Matlab
program you want to run. The command submitted to the server
is actually:
/software/matlab/bin/matlab -nojvm -nodisplay < program.m > program.log
Other Jobs
Use condor_do to run any other simple Linux job. The syntax is simply:
condor_do "command" &
where command is any command you could type at the Linux prompt, including arguments.
For example, to run an R program called program.R, type:
condor_do "R < program.R > program.out --no-save --vanilla" &
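If you have several R programs to run, a short shell function can build one condor_do command per program. The following is only a sketch: the function name submit_r_jobs and the DRY_RUN variable are made up for this example, and it assumes your file names end in .R and contain no spaces. By default it just prints the commands so you can check them; set DRY_RUN=0 to actually submit.

```shell
#!/bin/sh
# Hypothetical helper: build one condor_do command per R program.
# Assumes file names end in .R and contain no spaces.
submit_r_jobs() {
    for prog in "$@"; do
        out="${prog%.R}.out"    # program.R -> program.out
        cmd="R < $prog > $out --no-save --vanilla"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (the default): show the command without running it
            echo "condor_do \"$cmd\" &"
        else
            # Submit the job in the background, as in the example above
            condor_do "$cmd" &
        fi
    done
}
```

For example, submit_r_jobs model1.R model2.R prints two condor_do commands, one per program.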
Monitoring the Status of Condor Jobs
There are two commands that can tell you the status of the Condor
flock or your job.
condor_status tells you the state
of all the Condor machines, including whether they are available
for new jobs.
condor_q tells you the status of
all the jobs currently running or waiting to be run, including yours.
Note that our server
status page also tells you how many Condor jobs are running
and how many CPUs are available.
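If you want a script to wait until your jobs have finished, one option is to check condor_q periodically. This sketch assumes your version of Condor supports the -format option of condor_q, which prints only the attributes you ask for (here ClusterId) with no header, so empty output means you have no jobs in the queue. The function name wait_for_jobs and the 60 second default are just choices made for this example.

```shell
#!/bin/sh
# Hypothetical helper: block until a user has no jobs left in the queue.
# Usage: wait_for_jobs [login name] [seconds between checks]
wait_for_jobs() {
    user="${1:-$USER}"
    interval="${2:-60}"
    # condor_q -format prints one line per matching job and nothing else,
    # so the loop ends as soon as the output is empty.
    while [ -n "$(condor_q -format '%d\n' ClusterId "$user")" ]; do
        sleep "$interval"
    done
    echo "No jobs left in the queue for $user"
}
```

You might call wait_for_jobs with your own login name after submitting a batch of jobs, then have the script go on to combine the results.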
Managing Condor Jobs
If you change your mind, condor_rm
can remove jobs from the Condor queue.
condor_rm ID
will remove the job with the specified ID. Use condor_q
to find the ID of your job.
condor_rm <your login name>
will remove all jobs belonging to you. For example, type:
condor_rm 151
to remove job 151, or
condor_rm rdimond
to remove all jobs belonging to rdimond. You cannot remove other people's
jobs, for obvious reasons. Note that jobs are marked for removal
immediately, but it may be a few minutes before they are actually
removed.
Using condor_submit to Submit Multiple Jobs
condor_submit allows you to submit
many jobs at the same time, and gives you more control over
how they are run. The price is that you have to create a submit
file that tells Condor how to run your jobs. Most Condor users
at the SSCC will never need to use anything but the scripts we've written for you and can safely skip
this section.
For full documentation of the many available options, please see
the Condor Documentation. The following is an example submit file for running
a program you've written, including comments to explain what it's
doing:
# Specify that the program was not linked to the Condor libraries
universe=vanilla
# The program to run
executable=a.out
# Where to put the output
output=outputfile
# Use fastest available machine, but not one with SAS if possible
rank = ((Activity=="Idle")*100) + (KFlops/1000000) - (HAS_SAS==TRUE)
# Save your current environment, in particular the current directory
getenv = true
# Don't send me email when it's done
Notification = never
# Actually queue the job
queue
Since the SSCC's Condor flock does not checkpoint jobs there's
no real need to link your programs to the Condor libraries. But
if you do, change the first line to universe=standard.
The rank line specifies that your job will prefer to go to 1) an idle machine, 2) a newer and faster machine, and 3) a machine that doesn't have SAS, saving those machines for SAS jobs. However, if the only available machines have SAS, your program will use them, and if the newer and faster machines are available, it will use them even though they all have SAS.
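To queue several runs of the same program from one submit file, you can put a number after queue and use Condor's $(Process) macro, which takes the values 0, 1, 2 and so on for each job. The sketch below writes such a file from the shell; the file name multi.submit, the count of 3, and the idea that a.out reads its run number from its first argument are assumptions made for illustration.

```shell
#!/bin/sh
# Write an example submit file that queues three runs of a.out.
# The quoted 'EOF' keeps the shell from expanding $(Process),
# so Condor sees it exactly as written.
cat > multi.submit <<'EOF'
universe = vanilla
executable = a.out
# Pass the job number (0, 1, 2) to the program as its argument
arguments = $(Process)
# Give each job its own output file: outputfile.0, outputfile.1, ...
output = outputfile.$(Process)
getenv = true
Notification = never
# Queue three jobs instead of one
queue 3
EOF
echo "Submit with: condor_submit multi.submit"
```

Running condor_submit multi.submit on Kite would then place all three jobs in the queue at once, and condor_q should list them with the same job number followed by .0, .1 and .2.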