An Introduction to Condor

The SSCC's Condor cluster makes a tremendous amount of computing power available to SSCC users. Condor can be used to run Stata, SAS, Matlab, R, C/C++, and Fortran jobs.

Complete documentation on Condor is available from the UW Computer Science Department. This article will give you specific information about our cluster, easy ways we've created to use Condor (including the ability to submit Stata jobs to Condor via the web), and an introduction to some of the basic Condor functions.

The Hardware

The cluster is currently made up of 48 CPUs running Linux. This includes two CPUs on each of the Linstats. For details about the flock's specifications, please see Computing Resources at the SSCC. The Condor machines have access to Linux home and project directories just like Linstat. Most Linux programs do not have to be changed at all to run on Condor, though programs written for the Windows versions of Stata, SAS, Matlab, or R will probably require modifications.

The Condor Software

Condor is an open source project at the University of Wisconsin's Computer Science Department. Condor groups computers into "flocks," and when you submit a job to Condor it finds an available computer in the flock to run your job. Thus you don't need to identify which computers are busy and which are not.

In a standard Condor flock, high priority jobs can preempt low priority jobs, with the low priority jobs being "checkpointed" (i.e. their progress is saved so they can resume later). Users who are running lots of jobs have their priority temporarily lowered, ensuring others have a chance to run jobs as well.

Unfortunately, checkpointing does not work with the statistical software used at the SSCC, so we've turned off the entire preemption mechanism. Thus the SSCC's Condor flock is not a scheduling system that decides when jobs should run and makes sure everyone can run jobs, but a matchmaking system that matches jobs with available computers. Because preemption is turned off, we must ask users not to submit more than 15 jobs at a time so that everyone gets a chance to run jobs.

Easy Ways to Submit Jobs to Condor

You can submit Condor jobs from any Linstat server.

Stata

To submit a Stata job to Condor, type:

condor_stata dofile

where dofile should be replaced by the name of the Stata do file you want to run. (You can also use the same syntax as running a batch job on the server you're using: condor_stata -b do dofile. The result will be the same.) Stata jobs submitted to Condor will use Stata/MP, the multiprocessor version of Stata. However, if the Condor flock is busy they may only get to use one processor.
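
For example, assuming a do file named analysis.do (a hypothetical name) in your current directory, either of these commands would run it under Condor:

condor_stata analysis.do

condor_stata -b do analysis.do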

Note that you can also submit Stata jobs to Condor via the web, completely avoiding the need to log into Linstat.

SAS

To submit a SAS job to Condor, type:

condor_sas program

where program should be replaced by the name of the SAS program you want to run. There are also some special-purpose scripts for submitting SAS jobs to particular servers. For details see Running Large SAS Jobs on Linstat.

Matlab

To submit a Matlab job to Condor, type:

condor_matlab program.m program.log &

where program should be replaced by the name of the Matlab program you want to run. (The command submitted to the server is actually /software/matlab/bin/matlab -nojvm -nodisplay < program.m > program.log)

R

To submit an R job to Condor, type:

condor_R program.R program.log &

where program should be replaced by the name of the R program you want to run. (The command submitted to the server is actually R < program.R > program.log --no-save)

Other Jobs

Use condor_do to run any other simple Linux job. The syntax is simply:

condor_do "command" &

where command is any command you could type at the Linux prompt, including arguments. For example, if you wanted to run an R program called program.R with different arguments than condor_R uses you could type:

condor_do "R < program.R > program.log --vanilla" &

Monitoring the Status of Condor Jobs

Condor will send an email to your SSCC email address when your job is complete. There are also two commands that can tell you the status of the Condor flock or your job.

condor_status tells you the state of all the Condor machines, including whether they are available for new jobs.

condor_q tells you the status of all the jobs currently running or waiting to be run, including yours.
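
condor_q can also be restricted to a single user by giving it a username. For example,

condor_q rdimond

would list only the jobs belonging to rdimond (the example username used below).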

Managing Condor Jobs

If you change your mind, condor_rm can remove jobs from the Condor queue. At this time you must be logged into the same Linstat server you used to submit the job in order to remove it.

condor_rm ID

will remove the job with the specified ID. Use condor_q to find the ID of your job.

condor_rm username

will remove all jobs belonging to you. For example, type:

condor_rm 151

to remove job 151, or

condor_rm rdimond

to remove all jobs belonging to rdimond. You cannot remove other people's jobs, for obvious reasons. Note that jobs are marked for removal immediately, but it may be a few minutes before they are actually removed.

Using condor_submit to Submit Multiple Jobs

condor_submit allows you to submit many jobs at the same time, and gives you more control over how they are run. The price is that you have to create a submit file that tells Condor how to run your job. Most Condor users at the SSCC will never need to use anything but the scripts we've written for you and can safely skip this section.

For full documentation of the many available options, please see the Condor Documentation. The following is an example submit file for running a program you've written, including comments to explain what it's doing:

# Specify that the program was not linked to the Condor libraries
universe=vanilla
# The program to run
executable=a.out
# Where to put the output
output=outputfile
# Use fastest available machine, but not one with SAS if possible
rank = ((Activity == "Idle")*100) + (KFlops/1000000) - (HAS_SAS == TRUE)
# Save your current environment, in particular the current directory
getenv = true
# Don't send me email when it's done
Notification = never
# Actually queue the job
queue
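
To submit the job described by this file, save it under any name you like (here we assume the hypothetical name myjob.submit) and type:

condor_submit myjob.submit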

Since the SSCC's Condor flock does not checkpoint jobs there's no real need to link your programs to the Condor libraries. But if you do, change the universe line to universe=standard.

The rank line specifies that your job will prefer to go to 1) an idle machine, 2) a newer and faster machine, and 3) a machine that doesn't have SAS, saving those machines for SAS jobs. However, if the only available machines have SAS your program will use them, and if the newer and faster machines are available it will use them even though they all have SAS.
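
The example above queues only one job. As a rough sketch of how a single submit file can queue several jobs at once (again assuming a hypothetical program a.out, this time one that takes a run number as its only argument), you can use Condor's $(Process) macro, which takes the values 0, 1, 2, and so on for the jobs queued by a single queue statement:

# Run in the vanilla universe, as above
universe=vanilla
executable=a.out
getenv = true
Notification = never
# $(Process) numbers each queued job starting from 0
arguments = $(Process)
# Give each job its own output file
output = outputfile.$(Process)
# Queue five jobs
queue 5

Remember the request above not to submit more than 15 jobs at a time.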

Last Revised: 7/18/2011