The SSCC's Condor cluster makes a tremendous amount of computing power available to SSCC users. Condor can be used to run Stata, SAS, Matlab, R, C/C++, and Fortran jobs.
Complete documentation on Condor is available from the UW Computer Science Department. This article will give you specific information about our cluster, easy ways we've created to use Condor (including the ability to submit Stata jobs to Condor via the web), and an introduction to some of the basic Condor functions.
The cluster is currently made up of 48 CPUs running Linux. This includes two CPUs on each of the Linstats. For details about the flock's specifications, please see Computing Resources at the SSCC. The Condor machines have access to Linux home and project directories just like Linstat. Most Linux programs do not have to be changed at all to run on Condor, though programs written to use Windows Stata, SAS, Matlab, or R will probably require modifications.
Condor is an open source project at the University of Wisconsin's Computer Science Department. It is designed to take advantage of today's cheap, fast, but highly distributed computing power.
Condor groups computers together into a "flock" and creates a central queue of jobs. It then sends these jobs out to idle computers when they become available. This eliminates the need for users to try to identify which computers are the least busy.
The price of Condor is that it takes up to 30 seconds to put a job in the queue, assign it to a machine, and actually start running it. Thus if you are running small jobs and want immediate results, it is better to continue running them on a normal server. As a rule of thumb, if your job will take more than a few minutes on one of our Linux servers, it's a good candidate for Condor.
You can submit Condor jobs from any Linstat server.
To submit a Stata job to Condor, type:
condor_stata dofile
where dofile should be replaced by the name of the Stata do file you want to run. (You can also use the same syntax as running a batch job on the server you're using: condor_stata -b do dofile. The result will be the same.) Stata jobs submitted to Condor will use Stata/MP, the multiprocessor version of Stata. However, if the Condor flock is busy they may only get to use one processor.
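For example, to run a hypothetical do file named analysis.do, you would type:
condor_stata analysis.do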
Note that you can also submit Stata jobs to Condor via the web, completely avoiding the need to log into Linstat.
To submit a SAS job to Condor, type:
condor_sas program
where program should be replaced by the name of the SAS program you want to run. There are also some special-purpose scripts for submitting SAS jobs to particular servers. For details see Running Large SAS Jobs on Linstat.
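For example, to run a hypothetical SAS program named analysis.sas, you would type:
condor_sas analysis.sas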
To submit a Matlab job to Condor, type:
condor_matlab program.m program.log &
where program should be replaced by the name of the Matlab program you want to run. (The command submitted to the server is actually /software/matlab/bin/matlab -nojvm -nodisplay < program.m > program.log)
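For example, to run a hypothetical Matlab program named simulate.m and save its output in simulate.log, you would type:
condor_matlab simulate.m simulate.log &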
To submit an R job to Condor, type:
condor_R program.R program.log &
where program should be replaced by the name of the R program you want to run. (The command submitted to the server is actually R < program.R > program.log --no-save)
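For example, to run a hypothetical R program named analysis.R and save its output in analysis.log, you would type:
condor_R analysis.R analysis.log &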
Use condor_do to run any other simple Linux job. The syntax is simply:
condor_do "command" &
where command is any command you could type at the Linux prompt, including arguments. For example, if you wanted to run an R program called program.R with different arguments than condor_R uses you could type:
condor_do "R < program.R > program.log --vanilla" &
Condor will send an email to your SSCC email address when your job is complete. There are also two commands that can tell you the status of the Condor flock or your job.
condor_status tells you the state of all the Condor machines, including whether they are available for new jobs.
condor_q tells you the status of all the jobs currently running or waiting to be run, including yours.
If you change your mind, condor_rm can remove jobs from the Condor queue. At this time you must be logged into the same Linstat server you used to submit the job in order to remove it.
condor_rm ID
will remove the job with the specified ID. Use condor_q to find the ID of your job.
condor_rm username
will remove all of your jobs. For example, type:
condor_rm 151
to remove job 151, or
condor_rm rdimond
to remove all jobs belonging to rdimond. You cannot remove other people's jobs, for obvious reasons. Note that jobs are marked for removal immediately, but it may be a few minutes before they are actually removed.
condor_submit allows you to submit many jobs at the same time and gives you more control over how they are run. The price is that you have to create a submit file that tells Condor how to run your job. Most Condor users at the SSCC will never need to use anything but the scripts we've written for you and can safely skip this section.
For full documentation of the many available options, please see the Condor Documentation. The following is an example submit file for running a program you've written, including comments to explain what it's doing:
# Specify that the program was not linked to the Condor libraries
universe=vanilla
# The program to run
executable=a.out
# Where to put the output
output=outputfile
# Use fastest available machine, but not one with SAS if possible
rank = ((Activity=="Idle")*100) + (KFlops/1000000) - (HAS_SAS==TRUE)
# Save your current environment, in particular the current directory
getenv = true
# Don't send me email when it's done
Notification = never
# Actually queue the job
queue
Since the SSCC's Condor flock does not checkpoint jobs, there's no real need to link your programs to the Condor libraries. But if you do, change the first line to universe=standard.
The rank line specifies that your job will prefer 1) an idle machine, 2) a newer and faster machine, and 3) a machine that doesn't have SAS, saving those machines for SAS jobs. However, if the only available machines have SAS your program will use them, and it will still prefer the newer and faster machines even though they all have SAS.
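To submit the job, save these lines in a submit file (here called myjob.submit, a hypothetical name) and type:
condor_submit myjob.submit
Condor will then queue the job and you can check on it with condor_q as described above.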
Last Revised: 7/18/2011
