Any time you give Linux something to do, you've created a job. Of course many
Linux commands execute almost instantly (cd, ls,
etc.), but others may run for hours, days, or even longer. In these cases, how
a job is run will impact both what you can do and how the system performs for
all other users. The SSCC's Linux servers are a shared resource, and it is up
to each member to share nicely.
Foreground and Background Jobs
Normally when you type a command, it is processed and you see the results (if
any) before the cursor returns and you can type a new command. These jobs are
said to be running in the foreground, and that may be exactly what you want
if your job will run very quickly or you cannot proceed until you have your
results. But you can tell Linux not to wait. When you put a job in the background,
the cursor returns immediately and you can keep giving commands and doing other
work while your job is running. When it finishes, a message will appear
on your screen.
To run a job in the background, simply add an ampersand (&) at the end
of the command line. For example, if I type
> stata -b do myprogram
Stata will start and run myprogram.do in the
foreground. Thus my session will be unavailable until the job is done. On the
other hand,
> stata -b do myprogram &
will start Stata in the background. The cursor returns immediately, and I can
read email, edit other programs, etc. while I wait. When it is done I will see
[1] Done stata -b do myprogram
Note that a job which creates a separate window (emacs, for example) will be
completely functional in the background. What makes it a background process
is that your shell (the main session window) is ready for more commands. On
the other hand if a program without a window is running in the background and
needs input from you (for example if SAS runs out of resources), it will halt
until you put it in the foreground and give it the input it needs.
A job running in the background will keep running even if you log out, so it
is quite possible to start a long job before you leave in the evening, log out,
and get the results the next morning.
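The pattern above can be sketched in a script. This is only an illustration, assuming a POSIX-style shell such as bash (the tcsh prompt shown elsewhere on this page handles `wait` differently); `sleep 1` is a stand-in for a real long-running command such as `stata -b do myprogram`, and the variable names are hypothetical.

```shell
# sleep stands in for a long-running job such as "stata -b do myprogram"
sleep 1 &
job_pid=$!                 # $! holds the PID of the most recent background job
echo "started background job with PID $job_pid"

# The shell is free for other work here.
echo "doing other work while the job runs"

wait "$job_pid"            # block until the background job finishes
job_status=$?              # exit status of the background job
echo "job finished with exit status $job_status"
```

In an interactive session you would normally skip the `wait` and simply keep working until the shell reports that the job is done.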
Switching Between Foreground and Background
If you have a job running in the foreground and you want to do something else,
simply press CTRL-z (note that if the current
job has opened a window of some sort, you must return to your shell window before
pressing CTRL-z). The current job will be suspended
and you will get your cursor back. If you want the job to run while you are
doing other things, type bg to put it in the
background. You can also type fg to move it back
to the foreground, either from being suspended or from the background.
Managing Background Jobs
It can be very easy to lose track of jobs you have running in the background,
but there are several commands that can tell you about them.
jobs will list all the jobs you started this
session that are not yet complete. For example:
kite, ~> jobs
[1]  - Running     emacs
[2]  + Suspended   emacs
The number in brackets is the job number, and you can use that number preceded
by a percent sign (%) to refer to the job. Naming a job will move it to the
foreground, so in this case %2 is similar to
fg (except you don't have to keep track of which
job is considered the "current" job). Adding an ampersand moves it
to the background, so %2 & is similar to
bg.
You can list jobs started in a previous session using the ps
command (think processes). The syntax is ps -u username.
For example:
kite, ~> ps -u rdimond
  PID TTY          TIME CMD
29413 pts/30   00:00:00 tcsh
 1601 pts/30   00:00:00 emacs
 1602 pts/30   00:00:00 emacs
 1605 pts/30   00:00:00 ps
Note how the bracketed numbers have been replaced by the PID (Process IDentification)
and the list is more complete, including your shell (in this case the tcsh
shell), and the ps command itself. Note that
PIDs cannot be used to move things from foreground to background. On the other
hand, ps is the only way to check on jobs from previous sessions.
Killing a Job
Sometimes you will change your mind about a job, and occasionally things even
go wrong. In these cases, the kill command can
be invaluable. Simply type kill and then the
job number or PID. For example:
> kill %2
or
> kill 1602
This doesn't actually stop the job; it merely requests that it shut down, giving
the program an opportunity to clean up temporary files and such. Unfortunately,
neither SAS nor SPSS will do so, so if you kill one of these jobs, please go
to the /tmp directory and manually delete all
files and directories belonging to you. On the other hand, adding the -9
signal to the kill command will kill a program
immediately with or without its consent. Thus:
> kill -9 1602
will kill process 1602.
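One common way to combine the two forms is to ask politely first and use -9 only if the process is still there. The following is a rough sketch, assuming a POSIX-style shell such as bash; `sleep 300` is a stand-in for a real job, and the one-second grace period is an arbitrary choice.

```shell
sleep 300 &                # stand-in for a job you have changed your mind about
pid=$!

kill "$pid"                # polite request: the program may clean up, then exit
sleep 1                    # give it a moment to shut down

# Signal 0 sends nothing; it only checks whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid"         # last resort: cannot be caught or ignored
fi

wait "$pid" 2>/dev/null    # collect the job's exit status
status=$?
echo "exit status after kill: $status"
```

An exit status above 128 indicates that the process was ended by a signal rather than finishing on its own.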
Running Jobs Nicely
Linux will allow you to put as many jobs as you want in the background, and
it will try to work on them all at once. This means it is quite possible for
a single user to run so many jobs that everyone else is "crowded out."
If necessary SSCC staff will intervene to stop this (see the
preceding section). On the other hand, Condor handles multiple jobs very
efficiently and has plenty of available capacity. So if you are planning
on doing any resource intensive computing, you really should check out Condor.
The general rule is that you should
only have one major job running at a time per interactive
Linux server. Text editors, email, etc. are not a problem,
but Stata, SAS, SPSS, and most user-written programs are resource intensive
and will affect others. Keep in mind that Linux will split the available CPU
time among all the running jobs. So if you run three jobs simultaneously, they
will each take three times as long to run, saving you no time but making much
less CPU time available for others (the one exception to this would be if the
server has an idle CPU, but you shouldn't count on this).
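The arithmetic behind this can be worked through with a small sketch. It assumes an idealized server with a single CPU split evenly among your jobs, and the 60-minute figure is hypothetical.

```shell
jobs_count=3
minutes_each=60      # CPU time one job needs with the CPU to itself (hypothetical)

# Run together on one CPU: each job gets a third of the CPU,
# so every job finishes only at the very end.
together_finish=$((jobs_count * minutes_each))

# Run one after another: job i finishes after i * minutes_each minutes.
i=1
sequential_finishes=""
while [ "$i" -le "$jobs_count" ]; do
    sequential_finishes="$sequential_finishes $((i * minutes_each))"
    i=$((i + 1))
done

echo "together: every job finishes at minute $together_finish"
echo "one at a time: jobs finish at minutes$sequential_finishes"
```

Under these assumptions the last job finishes at the same time either way, and running the jobs one at a time delivers the first results sooner.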
Condor
Condor is designed to process large numbers of jobs. For full details please
see An Introduction to Condor,
but the essence of Condor is that we have a pool of Linux servers which only
run jobs submitted to them through the Condor program. Unlike standard Linux
jobs, Condor jobs never interfere with each other, since each job gets exclusive
use of a CPU. Thus if you submit your jobs to Condor, they will not slow down
the server for anyone else (or be slowed down by anyone else).
The price is that it takes about 30 seconds for Condor to process a job and
assign it to a machine. Thus if you are running a 20 second job and will be
waiting for the results, it would be counterproductive to use Condor. But if
you have many jobs to run, or a single big job, Condor is a great tool. It's
not a panacea since it can only be used for Stata, R, and most user-written
C/C++ and FORTRAN code, but that covers a very large fraction of the computing
done at the SSCC.
You can run Stata jobs on Condor by logging in to KITE and using the command
condor_stata instead of the usual stata
to submit batch jobs. The command syntax is identical, for example:
> condor_stata -b do stataprog
If you want to run programs other than Stata using Condor, or want to submit
many jobs at once, please see An
Introduction to Condor.
Scripts
Consider the following two scripts. Both run three SAS jobs. The one on the
left will tie up the server it is run on; the one on the right will not, and
it will execute in about the same amount of time:
| Bad Script    | Good Script |
| sas prog1 &   | sas prog1   |
| sas prog2 &   | sas prog2   |
| sas prog3 &   | sas prog3   |
The bad script places all three jobs in the background, so they all run at
the same time and compete for resources. The good script runs them in the foreground,
so they will run one at a time. However you do not need to wait for them: simply
run the script itself in the background and your shell will be available for
other work.
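The same idea can be tried out without SAS. In this sketch, which assumes a POSIX-style shell such as bash, `run_job` is a hypothetical stand-in for `sas progN`, and the braces play the role of the good script being run in the background.

```shell
# Stand-in for "sas progN"; replace the body with the real command.
run_job() {
    echo "start $1"
    sleep 1
    echo "end $1"
}

log=$(mktemp)              # temporary file to collect the jobs' output

# Inside the braces each job runs in the foreground, one at a time.
# The trailing & puts the whole group in the background.
{
    run_job prog1
    run_job prog2
    run_job prog3
} > "$log" 2>&1 &

echo "shell is free while the jobs run"
wait                       # only needed here so the log is complete before reading it
cat "$log"
```

The log shows each job ending before the next one starts, which is what keeps the three jobs from competing for the CPU.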
Of course if these weren't SAS jobs you could use Condor and the three programs
would be run on three different CPUs and thus execute in one third the time.
Running a Job Later
The at command allows you to run a job at a
time you specify. For example, you could run a big, resource intensive job at
1:00 AM when no one is likely to be on. There are several ways to use at.
If you want to just type in the job you want to run later, type
> at time
Linux will give you a prompt (at>), and you
can then enter the command(s). When you are done, press CTRL-D.
The time parameter will understand just about any reasonable format, including
at 1:00, at 1:00am,
at 1am, at 13:00
(1:00pm), at noon, at
midnight, or at teatime (4:00pm). Note
that if you do not specify am or pm,
it is assumed you are using 24-hour time.
You can also put the commands you want executed in a file and use the -f
switch:
> at -f file time
To list the jobs currently waiting to run, type
> atq
To remove a job, type
> atrm job
where job is
the ID obtained by listing your jobs with atq.
Note that if you submit your jobs to Condor, they will not affect other users
and will get plenty of resources no matter when you run them.
Nice
Linux jobs can have different priorities, which affects how much CPU time they
will use if many jobs are competing for CPU time. The nice
command is an easy way to lower the priority of your job if it's not important
that it run quickly (which is being nice to your
fellow users). Simply add nice to the beginning
of whatever command you are running:
> nice sas prog1
will run the SAS program prog1 at low priority.
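To see what nice actually changes, you can compare niceness values (a higher niceness means a lower priority). This sketch assumes bash and the standalone GNU `nice` program, which prints the current niceness when run with no arguments; tcsh has its own built-in `nice` with slightly different syntax, so results there may differ.

```shell
base=$(nice)               # niceness of the current shell (GNU nice with no arguments)
lowered=$(nice -n 10 nice) # niceness seen by a command started through "nice -n 10"

echo "normal niceness:  $base"
echo "with nice -n 10:  $lowered"
```

The second value is normally 10 higher than the first, up to the system maximum of 19.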
Of course if you're not concerned about your job running quickly, it's even
nicer to submit it to Condor. If you're detecting a theme here, it's because
the SSCC has enough computing power available through Condor to make these methods
unnecessary for most people. It's just a matter of using Condor whenever it
is appropriate. Other tools, like at and nice,
are described here for the benefit of those running jobs that can't use Condor,
such as SAS.