A Beowulf cluster is a group of computers networked together to provide the ability to run tasks in parallel. If your task can be broken up into pieces such that multiple computers can be working on different parts of the task at the same time, a Beowulf cluster will act almost like a supercomputer.
The SSCC has two Beowulf clusters, PALAH and FLASH. PALAH is mostly intended for learning and testing parallel processing code, though it's certainly capable of producing significant results. It has 18 Athlon 2800+ computing cores with 18GB of RAM. FLASH is the production cluster. As of this writing it has 146 Xeon computing cores (3.0-3.4 GHz) with about 150GB of RAM.
If after reading this article you'd like to use the SSCC's Beowulf clusters, please contact the clusters' administrator, Ryan Horrisberger (rhorrisb@ssc.wisc.edu). He will set up an account for you on PALAH and schedule a brief orientation. He'll also create a directory for you and/or your project. If after learning to use PALAH you'd like to move up to FLASH, speak with Ryan again.
This article will focus on using PALAH, but once you have some experience using PALAH you won't have any difficulties switching to FLASH.
SSCC's Beowulf clusters are set up exclusively to run parallelized code written in Fortran or C/C++ using either MPI or PVM. You should not run standard (unparallelized) Fortran or C/C++ jobs on these clusters. Instead, run such jobs on Condor, SSCC's other compute cluster. Condor is also a good choice if you want to use PVM (Parallel Virtual Machine) to run parallel code and the Beowulf clusters are busy. Condor can also run Stata and R jobs.
Submit your jobs to the Beowulf clusters if you want to use MPI (Message Passing Interface), which is easier to learn but less powerful than PVM, or if you want to use PVM and get the fastest possible results.
For performance reasons, the Beowulf clusters do not have access to the shared SSCC file system; they use local disks exclusively. You will therefore need to use SFTP to transfer any data or other files your job requires. You can start an SFTP program on your computer and connect to palah.ssc.wisc.edu, or you can log in to PALAH, run SFTP there, and connect to ftp.ssc.wisc.edu (for Linux files) or ntftp.ssc.wisc.edu (for Windows files). See Using SFTP for instructions. Keep in mind that files on PALAH are not backed up, so plan to copy important files back to the regular SSCC network frequently.
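If you transfer the same set of files often, the upload can be scripted. The following is a minimal sketch, not an SSCC-provided tool: it assumes an OpenSSH sftp client (which accepts a batch file through -b) and key-based login, since batch mode cannot prompt for a password. The function name make_sftp_batch, the remote directory, and the file names are hypothetical, and paths containing spaces are not handled.

```shell
# Print an sftp batch file that uploads each given file to a remote directory.
# First argument: remote directory. Remaining arguments: local files to upload.
make_sftp_batch() {
    remote_dir=$1
    shift
    printf 'cd %s\n' "$remote_dir"
    for f in "$@"; do
        printf 'put %s\n' "$f"
    done
}

# Usage on your own computer (directory and file names are placeholders):
#   make_sftp_batch /home/username/project data.txt model.f90 > upload.batch
#   sftp -b upload.batch username@palah.ssc.wisc.edu
```

Reversing the direction (get instead of put) gives a simple way to copy results back off PALAH, which matters because files there are not backed up.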
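The general sequence for running an MPI job is: check that the cluster is idle, compile your program, start the MPI daemons, verify that they are running, run the program with mpiexec, and shut the daemons down when the job finishes. Each step is described in detail below.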
The steps use the example Fortran program /share/apps/mpich/examples/pi.f90, which computes pi in parallel, to illustrate submitting a job. They assume that you are logged on to PALAH and are in your home directory. We highly recommend that you run this example the first time you log on to PALAH in order to verify that everything is running correctly. Note that the cluster does not understand relative paths, so use absolute paths throughout your programs.
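To give a feel for how a computation like this is split into pieces, here is a serial sketch that imitates the division of labor. It is not the contents of pi.f90. It assumes the common textbook approach of integrating 4/(1+x^2) from 0 to 1 with the midpoint rule, with each process taking every nprocs-th interval. The function name simulate_pi is hypothetical, and the "processes" here are simply loop iterations run one after another in awk.

```shell
# Approximate pi with n intervals, split among nprocs simulated processes.
# Each simulated process sums its own share of the intervals; the partial
# sums are then added together, as a real MPI program would do with a
# reduction step.
simulate_pi() {
    awk -v n="$1" -v nprocs="$2" 'BEGIN {
        h = 1.0 / n
        total = 0
        for (rank = 0; rank < nprocs; rank++) {
            partial = 0
            # Process "rank" handles intervals rank+1, rank+1+nprocs, ...
            for (i = rank + 1; i <= n; i += nprocs) {
                x = h * (i - 0.5)
                partial += 4.0 / (1.0 + x * x)
            }
            total += h * partial
        }
        printf "%.6f\n", total
    }'
}

# Usage: simulate_pi 10000 18
```

Because every interval is handled by exactly one simulated process, the result does not depend on how many processes share the work. On a real cluster the shares are computed at the same time, which is where the speedup comes from.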
1. Check the status of the cluster online by opening a browser and going to http://palah.ssc.wisc.edu/ganglia/. If the graph labeled Palah CPU last hour is not nearly all Idle CPU (i.e. gray) then you should wait until the load goes down to submit a new job. Only one person should have jobs running at a time. Otherwise, system performance will be poor for everyone.
2. Compile the program using the Intel Fortran compiler:
> mpif90 /share/apps/mpich/examples/pi.f90 -O3 -ip -ipo -unroll -static -o pi.bin
where
-O3 enables aggressive optimization,
-ip enables single-file interprocedural (IP) optimizations (within files),
-ipo enables multi-file IP optimizations (between files),
-unroll sets the maximum number of times to unroll loops,
-static prevents linking with shared libraries (will generate warnings),
-o specifies where to write the output.
For more information regarding the command line options type: man mpif90.
To compile C programs use mpicc instead; for C++ use mpicxx.
You'll get some warnings about the portability of statically linked applications, but they can be ignored since you'll not be running the programs on any other machines.
3. Start the MPI daemons. A daemon must be running on each node you wish to use in your job:
> mpdboot -n 9
where -n # is the number of nodes you want to use. 9 is the maximum.
4. Type mpdtrace to verify that the daemons started successfully:
> mpdtrace
The output will be a list of available nodes, such as the following:
palah
compute-0-2
compute-0-3
compute-0-1
compute-0-0
compute-0-5
compute-0-4
compute-0-6
compute-0-7
The output should include the same number of nodes that you specified in step 3.
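The comparison in step 4 can be automated. This is a minimal sketch, assuming mpdtrace prints one node name per line as in the listing above; the function name check_node_count is hypothetical.

```shell
# Read an mpdtrace-style node list on standard input and succeed only if
# it contains exactly the expected number of non-empty lines.
check_node_count() {
    expected=$1
    # grep -c . counts non-empty lines; "|| true" keeps a count of zero
    # from being treated as a command failure.
    actual=$(grep -c . || true)
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual nodes running"
    else
        echo "Mismatch: expected $expected nodes, found $actual" >&2
        return 1
    fi
}

# Usage on PALAH, after mpdboot -n 9:
#   mpdtrace | check_node_count 9
```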
5. Run the program using mpiexec:
> mpiexec -n 18 ~/pi.bin
where -n is the number of processes you want to use. Usually this number should be two times the number of nodes you specified in step 3. PALAH's 18 cores are spread across its 9 nodes, so this gives one process per core.
Note that ~/pi.bin is the executable you created in step 2, which was written to your home directory. Remember that all paths must be absolute (e.g. ~/pi.bin, not ./pi.bin).
Your output will be similar to:
Process 2 of 18 is alive
Process 0 of 18 is alive
Process 6 of 18 is alive
Process 5 of 18 is alive
. . .
pi is approximately: 3.1415926535898362 Error is: 0.0000000000000431
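The two rules in step 5 (two processes per node, and absolute paths only) can be captured in a pair of small helper functions. This is a sketch under the assumption that two processes per node is right for your job, as it usually is on PALAH; the function names abs_path and mpiexec_command are hypothetical. The second function only prints the command so you can inspect it before running it.

```shell
# Turn a relative path into an absolute one by prefixing the current
# directory; paths that already start with / pass through unchanged.
abs_path() {
    case $1 in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$PWD" "$1" ;;
    esac
}

# Print (rather than run) the mpiexec command for a given number of nodes
# and a program, using two processes per node.
mpiexec_command() {
    nodes=$1
    program=$(abs_path "$2")
    printf 'mpiexec -n %s %s\n' "$(( nodes * 2 ))" "$program"
}

# Usage: mpiexec_command 9 pi.bin
```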
6. After your job finishes, shut down the MPI daemons to free up resources for others:
> mpdallexit
> killprocs.sh ssh
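Since forgetting step 6 ties up the cluster for everyone else, it can help to wrap steps 3, 5, and 6 in one function that always shuts the daemons down, even when the job itself fails. The following is a minimal sketch that uses only the commands shown in this article; the function name run_mpi_job and its argument order are hypothetical, and it does not replace checking the cluster load in step 1 or compiling in step 2.

```shell
# Run one MPI job from start to finish.
# Arguments: number of nodes, number of processes, absolute path to program.
run_mpi_job() {
    nodes=$1
    procs=$2
    program=$3

    # Step 3: start one MPI daemon per node; give up if that fails.
    mpdboot -n "$nodes" || return 1

    # Step 5: run the job, remembering its exit status instead of
    # stopping here if it fails.
    status=0
    mpiexec -n "$procs" "$program" || status=$?

    # Step 6: always shut the daemons down to free the nodes for others.
    mpdallexit
    killprocs.sh ssh

    return "$status"
}

# Usage on PALAH:
#   run_mpi_job 9 18 ~/pi.bin
```

The function returns the exit status of mpiexec, so a calling script can still tell whether the job succeeded.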
SSCC staff cannot write or debug your programs for you, but if you need assistance submitting jobs, please contact the cluster's administrator, Ryan Horrisberger (rhorrisb@ssc.wisc.edu).
Last Revised: 7/30/2008
