Stata is an interactive statistical graphics software package that contains
a wide variety of statistical applications including some that are difficult
(at best) to compute in other statistical software packages. These include censored
normal and tobit regression, multinomial logistic regression, and ordered logit
and probit estimation.
The current version of Stata available on SSCC's Linux computers is Stata 9. A Windows
version of Stata is also available on the lab PCs in Social Science 3218
and 4218 and on the Windows Terminal Servers (Winstat1-3); refer to Using
Windows Terminal Servers.
Invoking Stata
You have three ways of running Stata on Linux: an interactive windowed Graphical
User Interface (GUI), an interactive non-windowed interface which Stata calls
console mode, and a noninteractive batch mode. The following examples show
how to invoke Stata in each of these modes.
Interactive Windowed GUI Mode
Stata can only be accessed in this mode from an X-display such as a Windows-based
Terminal or a PC running software like X-Win32. Those accessing Stata interactively
from a Telnet-type window (like SecureCRT) should skip ahead to the next section
on the Console mode.
Typing xstata at the Linux prompt from an X-display
brings up Stata in an interactive windowed GUI mode. In this mode, you can use
menus to do some tasks but the command line is still accessible. For example,
to exit Stata, you can either type exit at the
command line or choose Exit from the File menu.
This handout only documents commands.
Interactive Console Mode
Typing stata at the Linux prompt brings up Stata
in interactive console mode. The program prompts you with a period (.).
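At this point you can begin entering Stata commands. Type exit, clear
to terminate your Stata session and return control to the operating
system. The clear option lets Stata exit even if the data in memory have changed since they were last saved.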
Noninteractive Mode
To invoke Stata in noninteractive mode, use the stata
command with the -b (batch) option:
stata -b do filename
For example:
> stata -b do statarun
will execute the commands in the file statarun.do
and write the output to statarun.log. If your
command file has an extension other than .do,
you need to specify the full file name on the command line.
Below is an example of a Stata program that might be contained in statarun.do.
The program reads in data and computes a logistic regression:
infile vote count race polview pold1-pold6 using ~/diss/elect.dat
logistic vote race polview [fweight=count]
To run a noninteractive Stata program in the background, simply add an &
at the end of the Linux command line. The advantage of running noninteractive
programs in the background is that you do not have to wait until your Stata
program finishes execution before you get the Linux prompt. In other words,
your shell is available for other work.
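The background approach can be wrapped in a small shell function. This is a minimal sketch for an sh-compatible shell, assuming stata is on your PATH and the do-file sits in the current directory; the function name run_batch_bg is hypothetical. The special parameter $! holds the process ID of the job just sent to the background.

```shell
#!/bin/sh
# Hypothetical helper: start a Stata batch job in the background.
# Assumes `stata` is on PATH and "$1.do" is in the current directory.
run_batch_bg() {
  dofile=$1
  stata -b do "$dofile" &    # trailing & returns the prompt immediately
  pid=$!                     # process ID of the background job
  echo "started Stata batch job $pid; output will be in $dofile.log"
}
```

For example, run_batch_bg statarun starts statarun.do and returns at once. Typing wait later in the same shell blocks until the job has finished, and the printed process ID is what you would pass to kill if you need to stop the job.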
Keyboard Shortcuts
Stata provides several keyboard shortcuts for entering commands quickly and
fixing mistakes when working in interactive mode. For example, CTRL-R
retrieves the previously typed command line. Refer to the Getting
Started with Stata for Linux manual for a complete list of keyboard shortcuts.
Keeping a Log of your Stata Session
When working in interactive mode, you may want to record a copy of your session
to a file. Start a log by typing:
log using filename, text
The log is closed automatically when you exit Stata. Then you can use the Linux
more or lpr command
to list or print your log file.
Note: Be sure to add the text option
to the log command. Otherwise, the log will be
formatted in Stata Markup and Control Language (SMCL), which contains codes to control
the format of the text. If viewed in any program other than the Stata viewer,
these codes will be included in the text, making it difficult to pick out the
content of the file.
If you want to create a log file that contains only the command lines that
you enter in a Stata session, type:
cmdlog using filename
Using Stata's On-Line Help System
Stata provides an extensive on-line "Help" system which can be accessed
with the help, search, and findit commands.
To get help on a particular command, type: help commandname
To get a complete list of help topics, type: help contents
To obtain all references to a topic, both in the on-line help and on the Internet,
type: findit topic
For example, to obtain help on Stata's regress
command, type:
. help regress
If you want Stata to tell you all the sources of information that have to
do with regression analysis in general, type:
. findit regress
Stata Command Syntax
With few exceptions, the basic language syntax for Stata is:
[by varlist:] command [varlist] [=expression] [if expression] [in range] [weights] [, options]
where square brackets denote optional qualifiers. command
denotes a Stata command, varlist denotes a list
of variable names. For example, typing the command
. summarize
results in summary statistics for all the variables in your data set. Typing
. summarize vote count
results in summary statistics for just the variables vote and count. To get
more detailed summary statistics for the two variables specified, type:
. summarize vote count, detail
You can also prefix most Stata commands with by varlist:, which
instructs Stata to process the command separately for each group of observations
defined by the variables listed; the data must first be sorted by those variables
(for example, with the sort command). You can also restrict the scope of a
command to certain subsets of observations with the if
or in qualifiers for most Stata commands.
In do-files, a statement can span more than one line if you end each line but the
last with ///. Command, variable, and option
names may be abbreviated to the shortest string of characters that uniquely
identifies them. For other shortcut methods, refer to Chapter 13 of the User's
Guide.
Reading Data
This section describes how to read in ASCII data stored separately on your
disk using the infile command. If your data is
stored in a file that was created by another software package, you can use software
like STAT/TRANSFER or DBMS/COPY to convert the file to a Stata system file.
Stata system files are described later in this handout.
Free-Formatted Input
The simplest method of reading data is by listing variable names without column
locations. This method is referred to as free-formatted input. You can use free-formatted
input when the variables are recorded in the same order for each case, but not
necessarily in the same locations. Values may be separated by blanks and/or
commas. Numeric missing values must be indicated by single periods (.). The
command for reading a file using free-formatted input is:
infile varlist using filename
where varlist gives
the names of the variables and filename
is the name of the file that contains the raw data. For example,
infile age wgt1-wgt6 using ~/rawdata/weights.dat
If the raw data to be read into Stata using free-formatted input contain character
or string variables, you must precede the variable's name with the keyword str
followed immediately by the length of the string. For example, if the above
data set contained an additional variable called lastname
which contained the last name of the person and the longest name was 20 characters,
the following command would read in the data:
infile age wgt1-wgt6 str20 lastname using ~/rawdata/weights.dat
Note that if values for a string variable contain blanks or other special
characters, the string must be enclosed in single or double quotes in the data
file. Otherwise you will need to specify how many characters to read as discussed
below.
Fixed-Formatted Input
Most often, data are stored in the file according to some uniform structure, with
each variable in the same columns for every case; reading such data is referred to
as fixed-formatted input. The command for reading a file using fixed-formatted input is:
infile using filename
where filename is
the name of the file that contains what Stata calls a dictionary. A dictionary
describes the contents of the file and will allow reading files in fixed or
free format. The data may be in the same file as the dictionary or in another
file. The following example instructs Stata to read the dictionary contained
in the file ~/rawdata/cps.dct:
infile using ~/rawdata/cps.dct
The general syntax for the contents of the dictionary file is:
dictionary [using datafile] {
varlist
}
where datafile is
the name of the file containing the data. If using datafile
is not specified, the data are assumed to start at the record following the
close brace ( } ). The varlist contains both
the variables and any information needed to read them.
_lines(n) specifies that each observation has
n lines. Stata doesn't care where this statement
appears, but good style suggests it should be before the variables.
Other directives go with a variable. _line(x) _column(y)
tells Stata to read the variable from line x
and column y of the current observation. If you
do not specify this, Stata simply proceeds from wherever it finished reading
the last variable. You can also specify a storage type such as byte,
int, float, or str# before the variable name.
Finally you can specify how many characters to read into the variable using
%NumberType where Number
is the number of characters to read and Type
is the type of variable (you'll almost always use f
for numbers or s for strings). The format must follow
the variable name; all the other directives come before it.
In the following example, the data and dictionary are contained in the same
file:
dictionary {
_lines(1)
_line(1) _column(1) int age %2f
_line(1) _column(4) float weight %5f
}
12 100.5
13 110.0
15 130.5
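To see which characters the dictionary above assigns to each variable, you can preview the same column ranges at the Linux prompt with cut. This is a sketch using a swapped-in tool, not something Stata runs: _column(1) with %2f covers characters 1-2, and _column(4) with %5f covers characters 4 through 8. The file name example.raw is hypothetical and holds only the three data lines.

```shell
#!/bin/sh
# Sketch: preview the fields the dictionary above would read, using cut
# instead of Stata. example.raw is a hypothetical file of just the data lines.
tmp=$(mktemp -d)
printf '12 100.5\n13 110.0\n15 130.5\n' > "$tmp/example.raw"

# _column(1) int age %2f: 2 characters starting at column 1 (columns 1-2)
ages=$(cut -c1-2 "$tmp/example.raw")
# _column(4) float weight %5f: 5 characters starting at column 4 (columns 4-8)
weights=$(cut -c4-8 "$tmp/example.raw")

printf 'age:\n%s\nweight:\n%s\n' "$ages" "$weights"
```

If a column shown by cut contains part of a neighboring field, the _column or % width in the dictionary needs adjusting.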
All of the formatting information is optional. In many cases, Stata's default
action (read variables from left to right and top to bottom, with spaces or
commas separating them) works just fine. The dictionary
dictionary {
age
weight
}
12 100.5
13 110.0
15 130.5
will give the same results as the previous one except that age
will be read as a float, which takes a bit more
memory.
In the following example, the data and dictionary are contained in different
files:
dictionary using data.dat {
str20 name %20s
weight %5f
}
This is just a simple overview of reading formatted data. Refer to the Stata
Reference Manual for a complete discussion of reading formatted data.
Record Length and Dictionaries
Sometimes Stata has trouble reading data that has been written out from other
software. For instance, SPSS writes out data without inserting carriage returns
at the end of each line. Stata is then unable to accurately determine record
length. To solve this problem, you can use the _lrecl(n)
directive within the dictionary. For example,
dictionary using data.dat {
_lrecl(90)
str20 name %20s
weight %5f
}
In this case, the _lrecl(90) directive specifies
that each record is 90 characters long.
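Before writing the dictionary, you can sanity-check the record length at the Linux prompt: the file size should be an exact multiple of it, and fold can break the stream into one record per line for inspection. This is a sketch assuming 90-byte records laid out like the dictionary above (name in 20 characters, weight in 5); the file records.dat and its contents are made up for illustration.

```shell
#!/bin/sh
# Sketch: check a file written without line terminators against a 90-byte
# record length, as _lrecl(90) assumes. records.dat is hypothetical.
reclen=90
tmp=$(mktemp -d)

# Build two 90-byte records laid out as name (20 chars) + weight (5 chars)
# + 65 chars of padding, with no newline between or after them.
rec1=$(printf '%-20s%5s%65s' 'Smith' '150.2' '')
rec2=$(printf '%-20s%5s%65s' 'Jones' '162.8' '')
printf '%s%s' "$rec1" "$rec2" > "$tmp/records.dat"

size=$(wc -c < "$tmp/records.dat" | tr -d ' ')
if [ $((size % reclen)) -ne 0 ]; then
  echo "warning: $size bytes is not a multiple of $reclen" >&2
fi
nrec=$((size / reclen))
echo "$nrec records of $reclen bytes each"

# fold -b -w inserts a line break every $reclen bytes so records can be viewed;
# cut then shows just the name and weight fields.
fold -b -w "$reclen" "$tmp/records.dat" | cut -c1-25
```

A size that is not a multiple of the expected length usually means the record length, or the assumption that there are no terminators, is wrong.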
Reading Compressed Data
You can read compressed ASCII data into Stata by writing the data to a named
pipe and then using the named pipe as the filename you specify on the infile
command. For example, to read the compressed file afqt48.dat.Z
into an interactive Stata session, follow the steps below:
- From Stata, send the mknod command to the operating system (by preceding it
with !) to create a named pipe:
!mknod mypipe.pip p
- Send the zcat command to the operating system to write the data to the named
pipe in the background:
!zcat afqt48.dat.Z > mypipe.pip &
- Refer to the named pipe in your infile command:
infile wt sed fed ge22 black using mypipe.pip
- Send a command to the operating system to remove the pipe:
!rm mypipe.pip
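The pipe mechanism can be tried outside Stata first. In this sketch, cat plays the role of Stata's infile, gzip -dc stands in for zcat (GNU zcat is equivalent to gzip -dc and also decompresses .Z files), and mkfifo creates the same kind of pipe as mknod ... p. The working directory, sample.dat, and its contents are hypothetical.

```shell
#!/bin/sh
# Sketch of the named-pipe technique with `cat` standing in for Stata's infile.
# sample.dat and the working directory are hypothetical; gzip is assumed.
set -e
workdir=$(mktemp -d)
cd "$workdir"

printf '1 12 12 0 1\n2 14 16 1 0\n' > sample.dat   # made-up raw data
gzip sample.dat                  # replaces it with sample.dat.gz

mkfifo mypipe.pip                # POSIX equivalent of: mknod mypipe.pip p
gzip -dc sample.dat.gz > mypipe.pip &   # like zcat; blocks until a reader opens
cat mypipe.pip > received.dat    # the reader; in Stata: infile ... using mypipe.pip
wait                             # make sure the background writer has exited
rm mypipe.pip                    # remove the pipe, as in the last step above
```

Once this works, the only change for Stata is replacing the cat line with the infile command from the steps above.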
This method will not work in Stata programs run in noninteractive mode. Instead,
you must put the two Linux commands in a script. To do this, create a file with
lines similar to the ones below:
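#!/bin/sh
# Usage: myprog compressed-file
fname=$1
rm -f mypipe.pip              # remove any pipe left from an earlier run
mknod mypipe.pip p            # create the named pipe
zcat "$fname" > mypipe.pip &  # decompress into the pipe in the background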
To make the file a script (an executable file), use the chmod
command. For instance, if the above file is called myprog,
type the following at the Linux prompt:
chmod +x myprog
Your Stata .do file then needs to include the
following:
!myprog afqt48.dat.Z >& /dev/null < /dev/null
infile wt sed fed ge22 black using mypipe.pip
!rm mypipe.pip
Stata shells out to myprog, passing it the name
of the file which it wishes to uncompress. The ampersand at the end of the zcat
command in myprog directs Linux to put that process
in the background, allowing myprog to return
immediately to Stata. While the zcat command
uncompresses afqt48.dat.Z and sends the uncompressed
data to the pipe (mypipe.pip) in the background,
Stata's infile command reads the uncompressed
data from that pipe.
This method will allow you to read compressed ASCII data, but you cannot use
a compressed Stata data set (.dta file) without uncompressing it first.
Saving and Using Stata System Files
If you plan to use a data set repeatedly with Stata, it will be more convenient
to save the data as a Stata system file. System files are self-documenting and
reflect transformations made to the data. The save command stores the data currently
in memory. For example,
. save census
creates a Stata system file with the name census.dta.
Following is an example program that reads in raw data and creates a Stata system
file:
infile vote count race polview pold1-pold6 using ~/diss/elect.dat
save elect
The use command loads a Stata system file previously
saved. For example,
. use elect
Managing Memory
Stata stores data in memory (RAM). As a result, Stata runs quickly, but the
amount of data you can analyze is limited by the amount of memory on your computer.
It is also a memory hog. Because of this, it is very important to adhere to
the following restrictions with Stata:
- Use caution when requesting memory from Stata.
- Run large batch jobs using Condor. Refer to An Introduction
to Condor for details, but it's just a matter of replacing the stata
command with condor_stata on KITE.
- Never run more than one job at a time. If you submit a job and then realize
you've made a mistake, kill the job before submitting another. Refer to Running
Jobs in Linux for a discussion on how to kill jobs.
- If you need to run Stata interactively, HAL is ideal for large jobs, but
use NORMAN or KITE for smaller ones.
Requesting Additional Memory
By default, Stata allocates ten megabytes of memory to its data areas when
it is invoked. Use the -m command line switch to request additional memory.
For example, to request 200 megabytes of memory, type:
> stata -m200
Alternatively, you can request additional memory once you are in interactive
Stata with the set memory command. For example,
to request 100 megabytes of memory:
. set memory 100m
Determining how much additional memory you may need is often a trial-and-error
exercise. Start by specifying 25 MB and double that amount until your job
runs, but specify no more than 350 MB. If your job still won't run, contact Consultant
for assistance.
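The trial-and-error rule above amounts to a short schedule of sizes. This sketch prints it, assuming you stop at the first size that works and treat 350 MB as a final attempt rather than doubling past it; the function name memory_schedule is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the memory sizes (MB) to try, starting at 25 MB,
# doubling each time, and finishing with the 350 MB ceiling.
memory_schedule() {
  size=25
  while [ "$size" -lt 350 ]; do
    echo "$size"
    size=$((size * 2))
  done
  echo 350
}
```

Each size can then be tried in turn, for example with stata -m50 at the Linux prompt, or with set memory 50m before any data are loaded.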
The only restriction with the set memory command is that whenever you use it,
there cannot be any data in memory already. If you have a data set in memory,
you need to save it, clear memory, reset the total, and then use it again:
. save dogs, replace
file dogs.dta saved
. clear
. set memory 10m
(10240k)
. use dogs
Following are a few hints to help conserve memory:
- When inputting data for the first time, specify appropriate storage types
(refer to the input and infile
commands in the Stata Reference Manual for details about storage types).
- When working with existing data, have Stata store the data as compactly
as possible by using Stata's compress command.
- Store short, repeated strings as numeric variables with value labels (refer
to the label and encode
commands in the Stata Reference Manual).
- Using the discard command will clear all
automatically loaded programs from memory. This includes information Stata
has stored about a previously fit model.
Managing /tmp Space with Stata
By default Stata uses /tmp for writing scratch
files. When /tmp is full, you may not be able to execute commands
in Stata. If this happens, you can redirect these scratch files to
another location by exiting Stata and setting the TMPDIR environment variable
with the following Linux command (C shell syntax):
> setenv TMPDIR directory
where directory is some alternative directory where you can write. Refer to
SSCC Pub. #30: How to Avoid Running Out of Disk Space for a complete discussion
on disks.
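The setenv command above is C shell (csh/tcsh) syntax. If your login shell is sh or bash, the equivalent is an assignment followed by export. This is a minimal sketch; the directory $HOME/stata_tmp is a hypothetical choice, and any directory you can write to with enough free space will do.

```shell
#!/bin/sh
# Sketch: sh/bash equivalent of `setenv TMPDIR directory` (setenv is csh/tcsh).
# The directory is a hypothetical example; choose one you can write to.
scratch="$HOME/stata_tmp"
mkdir -p "$scratch"

if [ -w "$scratch" ]; then
  TMPDIR=$scratch
  export TMPDIR   # programs started from this shell, such as stata, inherit it
  echo "TMPDIR is now $TMPDIR"
else
  echo "cannot write to $scratch; pick another directory" >&2
fi
```

Because a script runs in a child process, either type these lines at the prompt or source the file (for example . ./set_tmpdir.sh, a hypothetical file name) so that TMPDIR is set in the shell from which you then start Stata.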
Getting GRAPH Output
You can display high resolution graphs only when using the interactive windowed
GUI mode of Stata (invoked with the xstata command). If you are using the interactive
console mode of Stata (invoked with the stata command), you can either use the
plot command to get two-way scatter plots or
have the graph command write the graph directly
to a file that you then print.
Displaying GRAPH Output in the Interactive Windowed GUI Mode
High resolution graphics may be generated in GUI Stata with the graph
command. For example:
. graph twoway scatter y x
generates a scatter plot of y against x in a new window on your screen.
You can then type graph print to print a copy of the
graph on the printer in the fourth floor computer lab (Soc. Sci. 4218). Alternatively,
you can print the graph from the menus. If you want to print the graph on a different printer, you will need to
save the graph first and then print it from Linux with the lpr
command. This is described below.
Saving GRAPH Output
You can save graph output from any of Stata's
three different interfaces. This is useful for two reasons:
- If you are using the interactive console mode, Stata cannot display the
graphs you request with the graph command.
However, it can save the graphs to a file that can then be printed.
- Stata can convert saved graphs to PostScript and encapsulated PostScript
files that can then be inserted in documents.
To save a copy of a graph, use the saving option.
For example:
. graph twoway scatter y x, saving(mypic)
This creates a file called mypic.gph containing
the graph.
Note that in interactive console mode, nothing is displayed except Stata's
prompt indicating it is ready for your next command.
Saved graphs can be printed from Linux using the lpr
command. For example:
> lpr -Puser7single mypic.gph
Accessing Programs (ado-files) in the Stata Technical Bulletin
The Stata Technical Bulletin (STB) was a printed
journal which was published roughly every other month. It contained articles
written by Stata Corporation as well as Stata users. It has been replaced by
the Stata Journal, which has less emphasis on technical
issues and more on techniques. We have a set of these publications in the CDE
Print Library in 4457 Social Science.
Some STB articles include software enhancements to Stata called ado-files which
can be installed into Stata. These ado-files are available for download from
Stata's web site. Software additions included
in the STBs come in two flavors: official updates and user-written additions.
Official Updates
SSCC staff incorporate official updates into Stata as they become available.
You can use the update query command to verify
that Stata is using the latest available updates:
. update query
The update query command makes a connection
to Stata's web site to check for all available official updates. If you discover
that we have not installed all the official updates available, send email to
Consultant
and we will update the software.
Note that we download official updates to Stata about every two months. To
get a list of Stata updates, type the following command:
. help whatsnew
User-written Additions
User-written additions to Stata published in the STBs are available from Stata's
web site for download just as are the official updates. Unlike the official
updates though, you need to request them individually (one at a time). For this
reason and also because user-written additions do not undergo quality assurance
testing like official updates do, SSCC staff do not update Stata with user-written
additions. If you want a user-written addition, you need to download the addition
yourself to your own directory. Instructions are provided in SSCC Publication,
Finding and Installing User-Written Stata Programs.
Stata Documentation
The following Stata manuals are available for short-term loan from the CDE
Print Library in 4457 Social Science:
- Stata Reference Manual: Version 9 (four volumes)
- Stata User's Guide: Version 9
- Getting Started with Stata 9 for Linux
- Stata Graphics Manual Version 9
Stata's web site also has a lot of information
including a support section which has a searchable database for finding answers
to common questions.
To subscribe to Stata's listserv, visit Stat/Transfer's
list subscription service. This web site provides a subscription service
to all the major statistical software listservs, including Stata's. The
Stata listserv provides a depth of information and support that is essentially
impossible for staff at any one institution (like ours) to duplicate.