An Introduction to Condor

The SSCC's Condor cluster makes a tremendous amount of computing power available to SSCC users. Condor can be used to run Stata, SAS, Matlab, R, C/C++, and Fortran jobs.

Complete documentation on Condor is available from the UW Computer Science Department. This article will give you specific information about our cluster, easy ways we've created to use Condor (including the ability to submit Stata jobs to Condor via the web), and an introduction to some of the basic Condor functions.

The Hardware

The cluster is currently made up of 30 virtual machines running Linux. For details about their specifications, please see Computing Resources at the SSCC. The Condor machines have access to the entire Linux file system, just like the other Linux servers. Most Linux programs do not have to be changed at all to run on Condor, though programs written for Windows Stata, SAS, Matlab, or R may require some modifications.

The Condor Software

Condor is an open source project at the University of Wisconsin's Computer Science Department. It is designed to take advantage of today's cheap, fast, but highly distributed computing power.

Condor groups computers together into a "flock" and creates a central queue of jobs. It then sends these jobs out to idle computers when they become available. This eliminates the need for users to try to identify which computers are the least busy. Once a job is assigned to a computer, it gets exclusive use of that computer until the job is done.
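To make the queueing idea concrete, here is a toy model in Python of a central queue handing each job to whichever machine becomes free first. It is a simplified sketch under our own assumptions, not how Condor's matchmaking actually works: it ignores the startup delay described below, job priorities, and the rank preferences covered later. The job names, runtimes, and machine names are hypothetical.

```python
from collections import deque

def dispatch(jobs, machines):
    """Toy central queue. jobs is a list of (name, runtime in seconds).
    Each machine runs one job at a time (exclusive use); the next queued
    job goes to the machine that becomes idle soonest.
    Returns {job name: (machine, start time, end time)}."""
    queue = deque(jobs)
    free_at = {m: 0 for m in machines}  # when each machine next becomes idle
    schedule = {}
    while queue:
        name, runtime = queue.popleft()
        # min() keeps the first machine in list order when times tie
        machine = min(machines, key=lambda m: free_at[m])
        start = free_at[machine]
        free_at[machine] = start + runtime
        schedule[name] = (machine, start, start + runtime)
    return schedule

print(dispatch([("a", 60), ("b", 30), ("c", 10)], ["m1", "m2"]))
```

Here job c waits in the queue until m2 finishes job b, rather than you having to pick a machine for it yourself.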

The price of Condor is that it takes up to 30 seconds to put a job in the queue, assign it to a machine, and actually start running it (though the machines can be busy doing other jobs during this time). Thus if you are running small jobs and want immediate results, it is better to continue running them on a normal server. As a rule of thumb, if your job will take more than a few minutes on one of our Linux servers, it's a good candidate for Condor.
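The rule of thumb follows from simple arithmetic. This sketch computes the share of total turnaround time spent on the worst-case 30-second startup, assuming the job starts as soon as that delay is over and no other jobs are waiting; the function name is hypothetical.

```python
STARTUP_OVERHEAD_S = 30  # worst-case queueing/startup delay quoted above

def overhead_fraction(runtime_s, overhead_s=STARTUP_OVERHEAD_S):
    """Fraction of total turnaround time spent waiting for the job to start."""
    return overhead_s / (overhead_s + runtime_s)

for label, seconds in [("10 seconds", 10), ("5 minutes", 300), ("1 hour", 3600)]:
    print(f"{label:>10}: {overhead_fraction(seconds):.1%} spent starting up")
```

Under these assumptions a 10-second job spends 75% of its turnaround waiting to start, a 5-minute job about 9%, and a one-hour job under 1%.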

Easy Ways to Submit Jobs to Condor

In order to submit jobs to Condor you must be logged in to Kite (type ssh kite at the Linux prompt on any other server). Kite is technically part of the flock but it is used only to submit jobs, never to run them.

Stata

To submit a Stata job to Condor, type:

condor_stata -b do dofile

where dofile should be replaced by the name of the Stata do file you want to run. (Note that this is almost identical to running a do file in batch mode.) Stata jobs submitted to Condor use Stata/MP, the multiprocessor version of Stata. However, if the Condor flock is busy, they may get to use only one processor.
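If you have several do files to run, a short Python script on Kite can submit them one by one. This is a sketch, assuming condor_stata is on your PATH on Kite and accepts exactly the syntax shown above; the function name and the do file names are hypothetical. Since this article doesn't say whether condor_stata returns right away, the sketch launches each submission with subprocess.Popen, which, like ending a command with &, does not wait for it to finish. By default it only prints the commands (a dry run).

```python
import subprocess

def submit_stata_jobs(dofiles, dry_run=True):
    """Build one condor_stata command per do file; launch them unless dry_run."""
    commands = [["condor_stata", "-b", "do", dofile] for dofile in dofiles]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Popen returns immediately, much like appending & at the Linux prompt
            subprocess.Popen(cmd)
    return commands

if __name__ == "__main__":
    # Hypothetical do files; pass dry_run=False to actually submit them
    submit_stata_jobs(["clean", "analyze"])
```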

Note that you can also submit Stata jobs to Condor via the web, completely avoiding the need to use Linux.

SAS

To submit a SAS job to Condor, type:

condor_sas program &

where program should be replaced by the name of the SAS program you want to run.

The Condor flock has enough licenses to run 12 SAS jobs at a time. Other jobs avoid the machines that have SAS licenses whenever other machines are available. The exception is the newer, faster machines, which are the first choice of all Condor jobs.

Matlab

To submit a Matlab job to Condor, type:

condor_matlab program.m program.log &

where program should be replaced by the name of the Matlab program you want to run. (The command submitted to the server is actually /software/matlab/bin/matlab -nojvm -nodisplay < program.m > program.log)

Other Jobs

Use condor_do to run any other simple Linux job. The syntax is simply:

condor_do "command" &

where command is any command you could type at the Linux prompt, including arguments. For example, to run an R program called program.R type:

condor_do "R < program.R > program.out --no-save --vanilla" &
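The quotes matter: they make the whole command, including its redirections, a single argument to condor_do, so that the < and > are not interpreted by your shell on Kite. If you build these lines in a script, Python's standard shlex.quote can do the quoting for you. This is a sketch; condor_do_line is a hypothetical helper name. It produces single quotes rather than the double quotes shown above, which also stops your shell from expanding $ variables before condor_do sees them.

```python
import shlex

def condor_do_line(command):
    """Return a line to type on Kite that passes command to condor_do
    as one argument and runs the submission in the background."""
    return "condor_do " + shlex.quote(command) + " &"

line = condor_do_line("R < program.R > program.out --no-save --vanilla")
print(line)
# shlex.split shows how a shell would break the line into arguments
print(shlex.split(line))
```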

Monitoring the Status of Condor Jobs

There are two commands that can tell you the status of the Condor flock or your job.

condor_status tells you the state of all the Condor machines, including whether they are available for new jobs.

condor_q tells you the status of all the jobs currently running or waiting to be run, including yours.

Note that our server status page also tells you how many Condor jobs are running and how many CPUs are available.

Managing Condor Jobs

If you change your mind, condor_rm can remove jobs from the Condor queue.

condor_rm ID

will remove the job with the specified ID. Use condor_q to find the ID of your job.

condor_rm <your login name>

will remove all jobs belonging to you. For example, type:

condor_rm 151

to remove job 151, or

condor_rm rdimond

to remove all jobs belonging to rdimond. You cannot remove other people's jobs, for obvious reasons. Note that jobs are marked for removal immediately, but it may be a few minutes before they are actually removed.
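Because condor_rm accepts either a job ID or a login name, it is worth checking each target before passing it on if you script removals. This sketch builds one condor_rm command per target and rejects anything that looks like neither. The helper name and the login-name pattern are our assumptions; the job ID pattern also accepts the cluster.process form (such as 151.0) that condor_q displays.

```python
import re

# A cluster ID (151) or cluster.process ID (151.0), as shown by condor_q
JOB_ID = re.compile(r"\d+(\.\d+)?")
# Assumed pattern for a Linux login name such as rdimond
LOGIN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def removal_commands(targets):
    """Return one condor_rm command per job ID or login name in targets."""
    commands = []
    for target in targets:
        if not (JOB_ID.fullmatch(target) or LOGIN.fullmatch(target)):
            raise ValueError(f"not a job ID or login name: {target!r}")
        commands.append(["condor_rm", target])
    return commands

print(removal_commands(["151", "151.0", "rdimond"]))
```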

Using condor_submit to Submit Multiple Jobs

condor_submit allows you to submit many jobs at the same time and gives you more control over how they are run. The price is that you have to create a submit file that tells Condor how to run your jobs. Most Condor users at the SSCC will never need anything but the scripts we've written for you and can safely skip this section.

For full documentation of the many available options, please see the Condor Documentation. The following is an example submit file for running a program you've written, including comments to explain what it's doing:

# Specify that the program was not linked to the Condor libraries
universe=vanilla
# The program to run
executable=a.out
# Where to put the output
output=outputfile
# Prefer idle, fast machines, and avoid machines with SAS if possible
rank = ((Activity=="Idle")*100) + (KFlops/1000000) - (HAS_SAS==TRUE)
# Save your current environment, in particular the current directory
getenv = true
# Don't send me email when it's done
Notification = never
# Actually queue the job
queue
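The example above queues a single job. To run the same program many times, for instance over different inputs, you can give the queue command a count and use Condor's built-in $(Process) macro, which numbers the queued jobs starting from 0. The Python sketch below writes such a submit file. It assumes your program (a.out, as above) reads a job index from its first command-line argument; the function name and file names are hypothetical, and the rank line is left out for brevity.

```python
from pathlib import Path

def write_submit_file(path, executable, n_jobs):
    """Write a submit file that queues n_jobs copies of executable.

    Condor expands $(Process) to 0, 1, ..., n_jobs - 1, giving each copy
    its own argument and its own output file."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "arguments = $(Process)",
        "output = outputfile.$(Process)",
        "getenv = true",
        "Notification = never",
        f"queue {n_jobs}",
    ]
    Path(path).write_text("\n".join(lines) + "\n")
```

For example, calling write_submit_file("jobs.submit", "a.out", 10) and then typing condor_submit jobs.submit on Kite would queue ten jobs, writing their output to outputfile.0 through outputfile.9.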

Since the SSCC's Condor flock does not checkpoint jobs, there's no real need to link your programs to the Condor libraries. But if you do, change the universe line to universe=standard.

The rank line specifies that your job will prefer to go to 1) an idle machine, 2) a newer and faster machine, and 3) a machine that doesn't have SAS, saving those machines for SAS jobs. However, if the only available machines have SAS, your program will use them, and if the newer and faster machines are available, it will use them even though they all have SAS.
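To see how these three preferences interact, you can evaluate the same arithmetic in Python for a few machines. The machine names, Activity values, and KFlops figures below are made up for illustration, and the sketch treats HAS_SAS as a plain true/false value rather than a machine attribute. The idle bonus of 100 outweighs everything else, and because the newer machines' KFlops/1000000 term exceeds the older ones' by more than 1 (in these numbers), a new machine with SAS still beats an old machine without it.

```python
def rank(activity, kflops, has_sas):
    """Python mirror of the submit file's rank expression (higher is preferred)."""
    return (activity == "Idle") * 100 + kflops / 1_000_000 - int(has_sas)

# Hypothetical machines: (name, Activity, KFlops, HAS_SAS)
machines = [
    ("old_no_sas", "Idle", 800_000, False),
    ("old_sas", "Idle", 800_000, True),
    ("new_sas", "Idle", 3_000_000, True),
    ("busy_new", "Busy", 3_000_000, False),
]

# List machines from most to least preferred
for name, activity, kflops, has_sas in sorted(
        machines, key=lambda m: rank(*m[1:]), reverse=True):
    print(f"{name:12} rank = {rank(activity, kflops, has_sas):.1f}")
```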

Last Revised: 10/28/2008