Any time you give Linux something to do, you've created a job. Of course, many Linux commands execute almost instantly (cd, ls, etc.), but others may run for hours, days, or even longer. In these cases, how a job is run will impact both what you can do and how the system performs for all other users. The SSCC's Linux servers are a shared resource, and it is up to each member to share nicely.
Normally when you type a command, it is processed and you see the results (if any) before the cursor returns and you can type a new command. These jobs are said to be running in the foreground, and that may be exactly what you want if your job will run very quickly or you cannot proceed until you have your results. But you can tell Linux not to wait. When you put a job in the background, the cursor returns immediately and you can keep giving commands and doing other work while your job is running. When it finishes, a message will appear on your screen.
To run a job in the background, simply add an ampersand (&) at the end of the command line. For example if I type
> stata -b do myprogram
Stata will start and run myprogram.do in the foreground. Thus my session will be unavailable until the job is done. On the other hand,
> stata -b do myprogram &
will start Stata in the background. The cursor returns immediately, and I can read email, edit other programs, etc. while I wait. When it is done I will see
[1] Done stata -b do myprogram
Note that a job which creates a separate window (emacs, for example) will be completely functional in the background. What makes it a background process is that your shell (the main session window) is ready for more commands. On the other hand, if a program without a window is running in the background and needs input from you (for example if SAS runs out of resources), it will halt until you put it in the foreground and give it the input it needs.
A job running in the background will keep running even if you log out, so it is quite possible to start a long job before you leave in the evening, log out, and get the results the next morning.
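The same background pattern can be sketched inside a script. This is a minimal illustration in sh/bash syntax rather than the tcsh session shown above; long_job is a hypothetical stand-in for a real command such as a Stata batch run, and the temporary file plays the role of its log.

```shell
# long_job is a hypothetical stand-in for a long-running command
# (for example a Stata batch job); it just sleeps and writes one line.
long_job() {
    sleep 2
    echo "results ready"
}

out=$(mktemp)            # temporary file that plays the role of the job's log

long_job > "$out" &      # the trailing & starts the job in the background
job_pid=$!               # $! holds the PID of the most recent background job

echo "shell is free again; job $job_pid is still running"

wait "$job_pid"          # block until the background job finishes
cat "$out"               # the job's output is now complete
```

The wait at the end is only there so the sketch can show the finished output; in an interactive session you would simply keep working until the "Done" message appears.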
If you have a job running in the foreground and you want to do something else, simply press CTRL-z (note that if the current job has opened a window of some sort, you must return to your shell window before pressing CTRL-z). The current job will be suspended and you will get your cursor back. If you want the job to run while you are doing other things, type bg to put it in the background. You can also type fg to move it back to the foreground, either from being suspended or from the background.
It can be very easy to lose track of jobs you have running in the background, but there are several commands that can tell you about them.
jobs will list all the jobs you started this session that are not yet complete. For example:
kite, ~> jobs
[1] - Running emacs
[2] + Suspended emacs
The number in brackets is the job number, and you can use that number preceded by a percent sign (%) to refer to the job. Naming a job will move it to the foreground, so in this case %2 is similar to fg (except you don't have to keep track of which job is considered the "current" job). Adding an ampersand moves it to the background, so %2 & is similar to bg.
You can list jobs started in a previous session using the ps command (think processes). The syntax is ps -u username. For example:
kite, ~> ps -u rdimond
PID TTY TIME CMD
29413 pts/30 00:00:00 tcsh
1601 pts/30 00:00:00 emacs
1602 pts/30 00:00:00 emacs
1605 pts/30 00:00:00 ps
Note how the bracketed job numbers have been replaced by the PID (Process IDentification), and the list is more complete, including your shell (in this case the tcsh shell) and the ps command itself. Note that PIDs cannot be used to move things between the foreground and the background. On the other hand, ps is the only way to check on jobs from previous sessions.
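If the full listing is long, it can help to pick out a single job by its PID. The following is a rough sketch in sh/bash syntax, assuming a ps that accepts the -u and -o options (as the procps version found on most Linux systems does); the sleep command is only a stand-in for a real background job.

```shell
sleep 30 &               # stand-in for a real background job
bg_pid=$!                # PID of that job

# List every process owned by the current user, with the same columns
# as the listing above (PID, terminal, CPU time, command name).
listing=$(ps -u "$(id -un)" -o pid,tty,time,comm)
echo "$listing"

# Show only the line whose first column matches the job's PID.
echo "$listing" | awk -v pid="$bg_pid" '$1 == pid'

# Clean up the stand-in job.
kill "$bg_pid"
wait "$bg_pid" 2>/dev/null || true
```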
Sometimes you will change your mind about a job, and occasionally things even go wrong. In these cases, the kill command can be invaluable. Simply type kill and then the job number or PID. For example:
> kill %2
or
> kill 1602
This doesn't actually stop the job; it merely requests that it shut down, giving the program an opportunity to clean up temporary files and such. Unfortunately, neither SAS nor SPSS cleans up after itself when killed, so if you kill one of these jobs, please go to the /tmp directory and manually delete all files and directories belonging to you. On the other hand, adding -9 to the kill command will kill a program immediately, with or without its consent. Thus:
> kill -9 1602
will kill process 1602.
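One common way to combine the two forms is to ask politely first and use -9 only if the job is still there. The following is a minimal sketch in sh/bash syntax, with sleep standing in for the job to be stopped; the one-second grace period is an arbitrary choice for illustration.

```shell
sleep 300 &              # stand-in for a job you have changed your mind about
pid=$!

kill "$pid"              # polite request: sends the default TERM signal
sleep 1                  # give the program a moment to clean up and exit

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid"       # still there: force it to stop immediately
fi

# Collect the exit status; 128 + signal number means "ended by that signal".
status=0
wait "$pid" 2>/dev/null || status=$?
echo "job $pid ended with status $status"
```

For a program that exits on the TERM signal, as sleep does, the reported status is 143 (128 + 15), which shows the polite request was enough.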
Linux will allow you to put as many jobs as you want in the background, and it will try to work on them all at once. This means it is quite possible for a single user to run so many jobs that everyone else is "crowded out." If necessary SSCC staff will intervene to stop this (see the preceding section). On the other hand, Condor handles multiple jobs very efficiently and has plenty of available capacity. So if you are planning on doing any resource intensive computing, you really should check out Condor.
The general rule is that you should only have one major job running at a time per interactive Linux server. Text editors, email, etc. are not a problem, but Stata, SAS, SPSS, and most user-written programs are resource intensive and will affect others. Keep in mind that Linux will split the available CPU time among all the running jobs. So if you run three jobs simultaneously, they will each take three times as long to run, saving you no time but making much less CPU time available for others (the one exception to this would be if the server has an idle CPU, but you shouldn't count on this).
Condor is designed to process large numbers of jobs. For full details please see An Introduction to Condor, but the essence of Condor is that we have a pool of Linux servers which only run jobs submitted to them through the Condor program. Unlike standard Linux jobs, Condor jobs never interfere with each other, since each job gets exclusive use of a CPU. Thus if you submit your jobs to Condor, they will not slow down the server for anyone else (or be slowed down by anyone else).
The price is that it takes about 30 seconds for Condor to process a job and assign it to a machine. Thus if you are running a 20 second job and will be waiting for the results, it would be counterproductive to use Condor. But if you have many jobs to run, or a single big job, Condor is a great tool. It's not a panacea since it can only be used for Stata, R, and most user-written C/C++ and FORTRAN code, but that covers a very large fraction of the computing done at the SSCC.
You can run Stata jobs on Condor by logging in to KITE and using the command condor_stata instead of the usual stata to submit batch jobs. The command syntax is identical, for example:
> condor_stata -b do stataprog
If you want to run programs other than Stata using Condor, or want to submit many jobs at once, please see An Introduction to Condor.
Consider the following two scripts. Both run three SAS jobs. The one on the left will tie up the server it is run on; the one on the right will not, and it will execute in about the same amount of time:
| Bad Script | Good Script |
|---|---|
| sas prog1 & | sas prog1 |
The bad script places all three jobs in the background, so they all run at the same time and compete for resources. The good script runs them in the foreground, so they will run one at a time. However you do not need to wait for them: simply run the script itself in the background and your shell will be available for other work.
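The "one at a time, but in the background" idea can be sketched as follows. This is an illustration in sh/bash syntax, not the original scripts: job is a hypothetical stand-in for a resource-intensive program, and the names prog1, prog2, and prog3 are assumed for the example.

```shell
# job is a hypothetical stand-in for one resource-intensive program
# (such as "sas prog1"); it logs when it starts and ends.
job() {
    echo "start $1"
    sleep 1
    echo "end $1"
}

# The "good script": three jobs in the foreground of the script,
# so each one starts only after the previous one has finished.
good_script() {
    job prog1
    job prog2
    job prog3
}

log=$(mktemp)

# Run the whole script in the background so the shell stays available.
good_script > "$log" 2>&1 &
script_pid=$!

echo "free to do other work while script $script_pid runs"
wait "$script_pid"
cat "$log"
```

The log shows each job ending before the next one starts, so only one job competes for the CPU at any moment even though the shell was never blocked.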
Of course if these weren't SAS jobs you could use Condor and the three programs would be run on three different CPUs and thus execute in one third the time.
The at command allows you to run a job at a time you specify. For example, you could run a big, resource intensive job at 1:00 AM when no one is likely to be on. There are several ways to use at.
If you want to just type in the job you want to run later, type
> at time
Linux will give you a prompt (at>), and you can then enter the command(s). When you are done, press CTRL-D. The time parameter will understand just about any reasonable format, including at 1:00, at 1:00am, at 1am, at 13:00 (1:00pm), at noon, at midnight, or at teatime (4:00pm). Note that if you do not specify am or pm, it is assumed you are using 24-hour time.
You can also put the commands you want executed in a file and use the -f switch:
> at time -f file
To list the jobs currently waiting to run, type
> atq
To remove a job, type
> atrm job
where job is the ID obtained by listing your jobs with atq.
Note that if you submit your jobs to Condor, they will not affect other users and will get plenty of resources no matter when you run them.
Linux jobs can have different priorities, which affects how much CPU time they will use if many jobs are competing for CPU time. The nice command is an easy way to lower the priority of your job if it's not important that it run quickly (which is being nice to your fellow users). Simply add nice to the beginning of whatever command you are running:
nice sas prog1
will run the SAS program prog1 at low priority.
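To see what nice actually does, you can ask it to report the priority it assigns. The following is a small sketch in sh/bash syntax, assuming the external nice utility (as in GNU coreutils), which prints the current niceness when run with no arguments. A shell with its own nice builtin, which tcsh has, may apply a different default increment.

```shell
# With no arguments, the external nice utility prints the current niceness
# (0 is the usual default; larger numbers mean lower priority).
base=$(nice)

# "nice COMMAND" runs COMMAND with its niceness raised, i.e. at lower priority.
# Here the command is nice itself, so it reports the value it was given.
lowered=$(nice nice)

echo "shell niceness: $base, niceness under nice: $lowered"
```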
Of course if you're not concerned about your job running quickly, it's even nicer to submit it to Condor. If you're detecting a theme here, it's because the SSCC has enough computing power available through Condor to make these methods unnecessary for most people. It's just a matter of using Condor whenever it is appropriate. Other tools, like at and nice, are described here for the benefit of those running jobs that can't use Condor, such as SAS.
Last Revised: 3/15/2007
