New SSCC users arrive with more and more computing experience, but most of
it comes from PCs running Windows. Statistical work for research, however,
often requires the power of Linux. Fortunately, a program called Samba makes
the Linux file system available to Windows, so you can do much of the work
of preparing and reviewing Linux jobs from Windows. This publication will teach you
how to run Stata, SAS or SPSS programs on the SSCC's Linux servers while
using Windows as much as possible. Similar techniques also work for other
programs, like R, Matlab, C/C++, FORTRAN, etc.
Write Your Program
The first step is to write your program (or at least a first draft of it)
using a text editor. We suggest TextPad because it follows all the Windows
conventions you're used to so you'll be able to use it right away, but it
also includes many features that are useful to programmers such as syntax
highlighting. However, you're welcome to use any text editor you prefer.
Note that it must be a text editor and not a word processor like Word: word
processors save formatting information along with the text, and Linux programs
will be confused by that formatting.
If you can put your data, your program, and any output all in the same directory,
your program won't need to specify where these other files are found. You
can simply give the file name, and Linux will look for the file in the
directory you run the program from. If you need to reference other
directories, see the discussion of how Linux works with directories under
Change to the Proper Directory below.
Save Your Program and Data on the Linux File System
The programs you'll write and data you'll use should probably be stored in
either your Linux home directory or a Linux project directory. If you are
logged into Winstat through PRIMO or using a PC in the Sewell Social Sciences
Building, your Linux home directory and the Linux project directories are
already mapped as Windows network drives.
If you are logged into Winstat through SOE or if you are connecting from
elsewhere using VPN, you'll need to map
the directory or directories you need yourself, but this is not difficult.
The simplest thing is to put both your program and your data in your home
directory (through its mapped drive) because when you log into
Linux you'll start in your home directory automatically. But you may need
to use a project directory if you are collaborating with other people or
if your data are too big for your home directory. If your work is particularly
complex you may also need to use subdirectories within your home directory
to keep it organized. Using a location other than your home directory will
add a step later.
Note that Linux doesn't like spaces in file or directory names (Windows will
let you put them in, but to use them in Linux you'd have to put them in quotes)
and it is case-sensitive.
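As a minimal sketch of both points, the commands below can be tried at the Linux prompt. They work in a scratch directory made with mktemp, and the file and directory names (results.txt, Results.txt, "my project") are made up for illustration only.

```shell
# Work in a scratch directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Case sensitivity: on Linux these are two different files.
echo "lower" > results.txt
echo "upper" > Results.txt
file_count=$(ls | wc -l | tr -d ' ')
echo "files created: $file_count"

# A name containing a space must be quoted every time it is used.
mkdir "my project"
cd "my project" || exit 1
quoted_dir=$(basename "$PWD")
echo "now in: $quoted_dir"
```

Without the quotes, mkdir would treat my and project as two separate names, which is why names without spaces are easier to work with.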
Log Into Linux
Next, log into Linux. We suggest using X-Win32 as your
client program: it's already installed on Winstat and can be freely downloaded
and installed elsewhere by UW faculty, staff and students. Full instructions
for X-Win32 are available separately, but the short version is to start
the program, click once on the icon it creates in the lower right of your screen,
and choose a server to log into.
Stata users can use any server, though you should only use Falcon if your
job requires more memory than Hal or Kite can provide. SAS users must use
Hal or Kite, and SPSS users must use Falcon (see the instructions
for setting up new sessions in X-Win32). If you want to submit jobs to Condor you must
log into Kite.
Change to the Proper Directory
If you saved your program directly in your home directory,
you can skip this step entirely. But if you saved it elsewhere, you'll need
to use the cd (change directory) command to go to the directory where you
saved your program. The general syntax is just
cd directory
where directory is
the directory you want to change to. This requires knowing a bit about how
Linux works with directories, which will also be useful if your programs
need to refer to files in other directories. A few important differences
from Windows:
- Directories are separated using the forward slash (/) rather than the backslash
(\).
- There are no drives in Linux. All directories are part of a single tree
structure.
- The "root" of the tree is denoted by an initial forward slash.
- Directories that don't start with a forward slash are assumed to be under
your current directory.
- Linux does not like spaces in file or directory names (you have to put the
whole name in quotes if it includes a space).
- Linux is case-sensitive, unlike Windows. To Linux, file and File are two
different files.
When you first log into Linux you start in your home directory, so if you
had a folder called research in your home directory, you'd change to it by typing
cd research
Project directories are found under /project in Linux.
Since this starts with a forward slash, it means "go up to the root of the
directory tree, then down to project." If you
were working on the hivaids project and thus
saved your program and data in its project directory, you'd
need to type
cd /project/hivaids
to get there.
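The difference between the two kinds of path can be sketched with a stand-in directory tree. The commands below build one in a scratch location (the sandbox variable and the home, research, project and hivaids folders inside it are only stand-ins that mimic the examples above, not the real SSCC directories) and then use one relative and one absolute cd.

```shell
# Build a small stand-in tree in a scratch location.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/home/research" "$sandbox/project/hivaids"

# Start in the stand-in home directory, as a fresh login would.
cd "$sandbox/home" || exit 1

# Relative path: no leading slash, so it is looked up under the current directory.
cd research || exit 1
relative_result=$PWD

# Absolute path: leading slash, so it is resolved from the root
# no matter which directory you are in at the moment.
cd "$sandbox/project/hivaids" || exit 1
absolute_result=$PWD

echo "relative cd landed in: $relative_result"
echo "absolute cd landed in: $absolute_result"
```

Typing pwd at any time prints the directory you are currently in, which is a quick way to confirm that a cd command did what you expected.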
Run Your Program
You're now ready to actually run your program. The details will depend on
which statistical package you're using:
Stata
If you want to work in interactive Stata just like in Windows, type
xstata
However, it's usually somewhat more efficient to work in batch mode. If your
program were called dofile.do, you'd run it by
typing
stata -b do dofile
You could also submit your job to Condor, which is an
excellent choice for jobs that will take more than a few minutes. To do so
simply type:
condor_stata -b do dofile
Remember you must be logged into Kite to submit jobs to Condor.
You can also submit Stata jobs to Condor via the web: see Submit
a Stata Job to Condor.
SAS
If your SAS program were called sasprog.sas, you'd run it by typing
sas sasprog
You can invoke an interactive SAS session by typing just sas,
but it's rather clumsy and few people find it useful.
SPSS
If your SPSS program were called syntax.sps and you wanted to save the output
in output.log, you'd type
spssb -f syntax.sps -out output.log
The SSCC does not have interactive SPSS for Linux.
View Your Output
You're now ready to take a look at your output and see how well your program
worked.
Stata
Your program should contain a log command that specifies a log file. Simply
open this log file using your text editor.
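For a long log it can help to search for errors before reading the whole file. The sketch below assumes the usual Stata behavior of reporting a failed command with a return code line such as r(111); and defines a helper called stata_log_errors, which is a made-up name for illustration and not a Stata or SSCC command.

```shell
# stata_log_errors is a hypothetical helper, not a Stata or SSCC command.
stata_log_errors() {
    log_file=$1
    if [ ! -f "$log_file" ]; then
        echo "no log file found: $log_file"
        return 2
    fi
    # Look for return code lines such as "r(111);" and print them with line numbers.
    if grep -n '^r([0-9][0-9]*);' "$log_file"; then
        echo "Stata stopped with an error; see the lines above in $log_file"
        return 1
    fi
    echo "no return code lines found in $log_file"
    return 0
}
```

After defining the function you would call it with the name of your log file, for example stata_log_errors dofile.log. A clean result is only a first check, so you should still read the log itself.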
SAS
If your program is called sasprog.sas, SAS
will create a log file called sasprog.log when
it runs. If the program creates any output it will also create sasprog.lst.
Open both files in your text editor. Be sure to look at the log file before
trusting the output file. One common scenario is that an error in the program
causes it to crash before creating any output, but the output file from a
previous run still exists and it won't be obvious that it's not from the
current run.
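Both checks can be roughly automated. The sketch below assumes that SAS error messages in the log begin with the word ERROR and that you saved your program shortly before running it, so a current .lst file should be newer than the .sas file. The helper name check_sas_run is made up for illustration and is not a SAS or SSCC command.

```shell
# check_sas_run is a hypothetical helper, not a SAS or SSCC command.
# Usage: check_sas_run sasprog   (the program name without .sas)
check_sas_run() {
    base=$1
    if [ ! -f "$base.log" ]; then
        echo "no log file found: $base.log"
        return 2
    fi
    # Print any lines that start with ERROR, with their line numbers.
    if grep -n '^ERROR' "$base.log"; then
        echo "errors found in $base.log; do not trust $base.lst"
        return 1
    fi
    # An output file older than the program was probably left by an earlier run.
    if [ -f "$base.lst" ] && [ -f "$base.sas" ] && [ "$base.lst" -ot "$base.sas" ]; then
        echo "$base.lst is older than $base.sas; it may be from a previous run"
        return 1
    fi
    echo "no ERROR lines in $base.log"
    return 0
}
```

This does not replace reading the log: warnings and notes can also point to problems, and the function only looks for lines that start with ERROR.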
SPSS
Using your text editor, open the file you specified with the -out option
when you ran the program.
Change Your Program and Run It Again
Most likely your program won't work the first time you run it, so you'll need
to make changes and run it again. Remember to save the program after making
the changes: Linux reads the program from the disk, not from your text
editor! In Linux the up arrow will retrieve your previous command, which
is a handy shortcut for running a program repeatedly. If you're using TextPad,
it will notice when output files are changed and prompt you to reload them.
Then you can see if your changes actually fixed the problems, and make further
changes as needed.