Using Stata on Linux

Stata is an interactive statistical graphics software package that contains a wide variety of statistical applications including some that are difficult (at best) to compute in other statistical software packages. These include censored normal and tobit regression, multinomial logistic regression, and ordered logit and probit estimation.

The current version of Stata available on SSCC's Linux computers is 9. A Windows version of Stata is also available on the lab PCs in Social Science 3218 and 4218 and on the Windows Terminal Servers (Winstat1-3; refer to Using Windows Terminal Servers).

Invoking Stata

You have three ways of running Stata on Linux: an interactive windowed Graphical User Interface (GUI), an interactive non-windowed interface which Stata calls console mode, and a noninteractive batch mode. The following examples show how to invoke Stata in each of these modes.

Interactive Windowed GUI Mode

Stata can only be accessed in this mode from an X-display such as a Windows-based Terminal or a PC running software like X-Win32. Those accessing Stata interactively from a Telnet-type window (like SecureCRT) should skip ahead to the next section on the Console mode.

Typing xstata at the Linux prompt from an X-display brings up Stata in an interactive windowed GUI mode. In this mode, you can use menus to do some tasks but the command line is still accessible. For example, to exit Stata, you can either type exit at the command line or choose Exit from the File menu. This handout only documents commands.

Interactive Console Mode

Typing stata at the Linux prompt brings up Stata in interactive console mode. The program prompts you with a period (.). At this point you can begin entering Stata commands. Type: exit, clear to terminate your Stata session and return control to the operating system.

Noninteractive Mode

To invoke Stata in noninteractive mode, use the stata command with the -b (batch) option:

stata -b do filename

For example:

> stata -b do statarun

will execute the commands in the file statarun.do and write the output to statarun.log. If your command file has an extension other than .do, you need to specify the full file name on the command line.

Below is an example of a Stata program that might be contained in statarun.do. The program reads in data and computes a logistic regression:

infile vote count race polview pold1-pold6 using ~/diss/elect.dat
logistic vote race polview [fweight=count]

To run a noninteractive Stata program in the background, simply add an & at the end of the Linux command line. The advantage of running noninteractive programs in the background is that you do not have to wait until your Stata program finishes execution before you get the Linux prompt. In other words, your shell is available for other work.
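As a minimal sketch of the batch workflow described above, the shell script below writes the two example commands to statarun.do and starts Stata in the background. The check for the stata command is an addition for illustration, so the script does nothing harmful on a machine where Stata is not installed.

```shell
#!/bin/sh
# Write the example commands from this section to statarun.do.
# The quoted 'EOF' keeps the shell from expanding ~, so Stata sees the path as written.
cat > statarun.do <<'EOF'
infile vote count race polview pold1-pold6 using ~/diss/elect.dat
logistic vote race polview [fweight=count]
EOF

# Start the batch job in the background if Stata is available
# (assumption: the stata command is on your PATH).
if command -v stata >/dev/null 2>&1; then
    stata -b do statarun &
    echo "Stata started as process $!; output will be in statarun.log"
else
    echo "stata command not found; statarun.do was written but not run"
fi
```

Because the job runs in the background, the prompt returns immediately; look at statarun.log once the job has finished.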

Keyboard Shortcuts

Stata provides several keyboard shortcuts for entering commands quickly and fixing mistakes when working in interactive mode. For example, CTRL-R retrieves the previously typed command line. Refer to the Getting Started with Stata for Linux manual for a complete list of keyboard shortcuts.

Keeping a Log of your Stata Session

When working in interactive mode, you may want to record a copy of your session to a file. Start a log by typing:

log using filename, text

The log is closed automatically when you exit Stata. Then you can use the Linux more or lpr command to list or print your log file.

Note: Be sure to add the text option to the log command. Otherwise, the log will be formatted in Stata Markup and Control Language (SMCL), which contains codes that control the format of the text. If the log is viewed in any program other than the Stata viewer, these codes appear in the text, making it difficult to pick out the contents of the file.

If you want to create a log file that contains only the command lines that you enter in a Stata session, type:

cmdlog using filename

Using Stata's On-Line Help System

Stata provides an extensive on-line "Help" system which can be accessed from two Stata commands: help and search.

To get help on a particular command, type: help commandname

To get a complete list of help topics, type: help contents

To obtain all references to a topic, both in the on-line help and on the Internet, type: findit topic

For example, to obtain help on Stata's regress command, type:

. help regress

If you want Stata to tell you all the sources of information that have to do with Regression Analysis in general, type:

. findit regress

Stata Command Syntax

With few exceptions, the basic language syntax for Stata is:

[by varlist:] command [varlist] [=expression] [if expression] [in range] [weights] [, options]

where square brackets denote optional qualifiers, command denotes a Stata command, and varlist denotes a list of variable names. For example, typing the command

. summarize

results in summary statistics for all the variables in your data set. Typing

. summarize vote count

results in summary statistics for just the variables vote and count. To get more detailed summary statistics for the two variables specified, type:

. summarize vote count, detail

You can also prefix most Stata commands with by varlist:. This instructs Stata to process the command separately for groups of observations defined by the variable list specified. You can also restrict the scope of a command to certain subsets of observations with the qualifiers if or in for most Stata commands.

You can write statements over more than one line. Command, variable and option names may be abbreviated to the shortest string of characters that uniquely identifies them. For other shortcut methods, refer to Chapter 13 of the User's Guide.
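The qualifiers described in this section can be tried from a do-file. The sketch below writes one from the shell. It assumes a Stata system file named elect containing the variables vote, count, and race, and it assumes that race is numeric with 1 as one of its values; adjust both to your own data. The /// continuation shown at the end is one way to split a command across lines in a do-file.

```shell
#!/bin/sh
# Write a small do-file illustrating by, if, in, and line continuation.
cat > syntax.do <<'EOF'
use elect
* by requires the data to be sorted on the grouping variable
sort race
by race: summarize vote count
* if restricts the command to observations meeting a condition
summarize vote if race == 1
* in restricts the command to a range of observation numbers
summarize vote in 1/10
* /// continues a command on the next line of a do-file
summarize vote count, ///
    detail
EOF
echo "Run with: stata -b do syntax"
```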

Reading Data

This section describes how to read in ASCII data stored separately on your disk using the infile command. If your data is stored in a file that was created by another software package, you can use software like STAT/TRANSFER or DBMS/COPY to convert the file to a Stata system file. Stata system files are described later in this handout.

Free-Formatted Input

The simplest method of reading data is by listing variable names without column locations. This method is referred to as free-formatted input. You can use free-formatted input when the variables are recorded in the same order for each case, but not necessarily in the same locations. Values may be separated by blanks and/or commas. Numeric missing values must be indicated by single periods (.). The command for reading a file using free-formatted input is:

infile varlist using filename

where varlist lists the names of the variables and filename is the name of the file that contains the raw data. For example,

infile age wgt1-wgt6 using ~/rawdata/weights.dat

If the raw data to be read into Stata using free-formatted input contain character or string variables, you must precede the variable's name with the keyword str followed immediately by the length of the string. For example, if the above data set contained an additional variable called lastname which contained the last name of the person and the longest name was 20 characters, the following command would read in the data:

infile age wgt1-wgt6 str20 lastname using ~/rawdata/weights.dat

Note that if values for a string variable contain blanks or other special characters, the string must be enclosed in single or double quotes in the data file. Otherwise you will need to specify how many characters to read as discussed below.

Fixed-Formatted Input

Data are most commonly read into Stata from files laid out according to a uniform structure, with each variable in the same columns on every record. This is referred to as fixed-formatted input. The command for reading a file using fixed-formatted input is:

infile using filename

where filename is the name of the file that contains what Stata calls a dictionary. A dictionary describes the contents of the file and will allow reading files in fixed or free format. The data may be in the same file as the dictionary or in another file. The following example instructs Stata to read the dictionary contained in the file ~/rawdata/cps.dct:

infile using ~/rawdata/cps.dct

The general syntax for the contents of the dictionary file is:

dictionary [using datafile] { varlist }

where datafile is the name of the file containing the data. If using datafile is not specified, the data are assumed to start at the record following the close brace ( } ). The varlist contains both the variables and any information needed to read them.

_lines(n) specifies that each observation has n lines. Stata doesn't care where this statement appears, but good style suggests it should be before the variables.

Other options go with a variable. _line(x) _column(y) tells Stata to read the variable from line x and column y of the current observation. If you do not specify this, Stata will simply proceed from wherever it finished reading the last variable. You can also specify a storage type such as int, byte, or str followed by a length (for example, str20). Finally, you can specify how many characters to read into the variable using %NumberType, where Number is the number of characters to read and Type is the type of variable (you'll almost always use f for numbers or s for strings). This must follow the variable name; all the other options come before it.

In the following example, the data and dictionary are contained in the same file:

dictionary
{
_lines(1)
_line(1) _column(1) int age %2f
_line(1) _column(4) float weight %5f
}
12 100.5
13 110.0
15 130.5

All of the formatting information is optional. In many cases, Stata's default action (read variables from left to right and top to bottom, with spaces or commas separating them) works just fine. The dictionary

dictionary
{
age
weight
}
12 100.5
13 110.0
15 130.5

will give the same results as the previous one except that age will be read as a float, which takes a bit more memory.

In the following example, the data and dictionary are contained in different files:

dictionary using data.dat {
str20 name %20s
weight %5f
}

This is just a simple overview of reading formatted data. Refer to the Stata Reference Manual for a complete discussion of reading formatted data.
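To experiment with a dictionary that refers to a separate data file, the following sketch creates both files from the shell. The names and weights are made-up values for illustration, and the column positions (name in columns 1-20, weight in columns 21-25) are assumptions of this example. The second name contains a blank, which is why a fixed width is specified for the string.

```shell
#!/bin/sh
# Fixed-width data: name padded to 20 characters, weight in the next 5.
printf '%-20s%5s\n' "Smith"    "100.5" >  data.dat
printf '%-20s%5s\n' "De Groot" "110.0" >> data.dat

# Dictionary describing where each variable sits on the record.
cat > data.dct <<'EOF'
dictionary using data.dat {
    _column(1)  str20 name   %20s
    _column(21) float weight %5f
}
EOF

echo "In Stata, read the data with: infile using data.dct"
```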

Record Length and Dictionaries

Sometimes Stata has trouble reading data that has been written out from other software. For instance, SPSS writes out data without inserting carriage returns at the end of each line. Stata is then unable to accurately determine record length. To solve this problem, you can use the _lrecl(n) command within the dictionary. For example,

dictionary using data.dat {
_lrecl(90)
str20 name %20s
weight %5f
}

In this case, the _lrecl(90) command specifies that the length of each record is 90.

Reading Compressed Data

You can read compressed ASCII data into Stata by writing the data to a named pipe and then using the named pipe as the filename you specify on the infile command. For example, to read the compressed file afqt48.dat.Z into an interactive Stata session, follow the steps below:

  1. From Stata, send the mknod command to the operating system (by preceding the Linux command with !) to create a named pipe:

    !mknod mypipe.pip p

  2. Send the zcat command to the operating system to write the data to the named pipe in the background:

    !zcat afqt48.dat.Z > mypipe.pip &

  3. Refer to the named pipe in your infile command:

    infile wt sed fed ge22 black using mypipe.pip

  4. Send a command to the operating system to remove the pipe:

    !rm mypipe.pip

This method will not work in Stata programs run in noninteractive mode. Instead, you must put the two Linux commands in a script. To do this, create a file with lines similar to the ones below:

#!/bin/sh
fname=$1
rm -f mypipe.pip
mknod mypipe.pip p
zcat $fname > mypipe.pip &

To make the file a script (an executable file), use the chmod command. For instance, if the above file is called myprog, type the following at the Linux prompt:

chmod +x myprog

Your Stata .do file then needs to include the following:

!myprog afqt48.dat.Z >& /dev/null < /dev/null
infile wt sed fed ge22 black using mypipe.pip
!rm mypipe.pip

Stata shells out to myprog, passing it the name of the file which it wishes to uncompress. The ampersand at the end of the zcat command in myprog directs Linux to put that process in the background, allowing myprog to return immediately to Stata. While the zcat command uncompresses afqt48.dat.Z and sends the uncompressed data to the pipe (mypipe.pip) in the background, Stata's infile command reads the uncompressed data from that pipe.

This method will allow you to read compressed ASCII data, but you cannot use a Stata data set without uncompressing it.
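The named-pipe mechanism described in this section can be tried from the shell without Stata. In the sketch below, cat stands in for Stata's infile command as the reader of the pipe, mkfifo is used as the equivalent of mknod with the p argument, and gzip -dc is used in place of zcat because the demonstration file is compressed with gzip rather than compress. The data values are made up for illustration.

```shell
#!/bin/sh
# Build a tiny compressed data file to stand in for afqt48.dat.Z.
printf '1 12 10 0 1\n2 16 12 1 0\n' > demo.dat
gzip -f demo.dat                  # produces demo.dat.gz

# Create the named pipe (same effect as: mknod mypipe.pip p).
rm -f mypipe.pip
mkfifo mypipe.pip

# Writer: uncompress into the pipe in the background.
gzip -dc demo.dat.gz > mypipe.pip &

# Reader: cat stands in for infile, which would read mypipe.pip the same way.
cat mypipe.pip > uncompressed.dat

# Let the background writer finish, then remove the pipe.
wait
rm mypipe.pip
```

The writer blocks until a reader opens the pipe, so the uncompressed data never has to be stored on disk in full; in the real workflow Stata takes the place of cat.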

Saving and Using Stata System Files

If you plan to use a data set repeatedly with Stata, it will be more convenient to save the data as a Stata system file. System files are self-documenting and reflect transformations made to the data. The save command stores the data currently read into Stata. For example,

. save census

creates a Stata system file with the name census.dta. Following is an example program that reads in raw data and creates a Stata system file:

infile vote count race polview pold1-pold6 using ~/diss/elect.dat
save elect

The use command loads a Stata system file previously saved. For example,

. use elect

Managing Memory

Stata stores data in memory (RAM). As a result, Stata runs quickly, but the amount of data you can analyze is limited by the amount of memory on your computer, and a large Stata job can consume much of the memory on a shared server. Because of this, it is very important to adhere to the following guidelines with Stata:

  • Use caution when requesting memory from Stata.
  • Run large batch jobs using Condor. Refer to An Introduction to Condor for details, but it's just a matter of replacing the stata command with condor_stata on KITE.
  • Never run more than one job at a time. If you submit a job and then realize you've made a mistake, kill the job before submitting another. Refer to Managing Jobs on Linstat for a discussion on how to kill jobs.
  • If you need to run Stata interactively, HAL is ideal for large jobs, but use NORMAN or KITE for smaller ones.

Requesting Additional Memory

By default, Stata allocates ten megabytes of memory to its data areas when it is invoked. Use the -m command line switch to request additional memory. For example, to request 200 megabytes of memory, type:

> stata -m200

Alternatively, you can request additional memory once you are in interactive Stata with the set memory command. For example, to request 100 megabytes of memory:

. set memory 100m

Determining how much additional memory you may need is often a trial-and-error exercise. Start by specifying 25 MB and double that amount until your job will run. Specify no more than 350 MB. If your job still won't run, contact the Help Desk for assistance.

The only restriction with the set memory command is that whenever you use it, there cannot be any data in memory already. If you have a data set in memory, you need to save it, clear memory, reset the total, and then load the data set again:

. save dogs, replace
file dogs.dta saved

. clear

. set memory 10m
(10240k)

. use dogs

The following hints help conserve memory:

  • When inputting data for the first time, specify appropriate storage types (refer to the input and infile commands in the Stata Reference Manual for details about storage types).
  • When working with existing data, have Stata store the data as compactly as possible by using Stata's compress command.
  • Store short, repeated strings as numeric variables with value labels (refer to the label and encode commands in the Stata Reference Manual).
  • Using the discard command will clear all automatically loaded programs from memory. This includes information Stata has stored about a previously fit model.
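Several of these hints can be combined in one do-file. The sketch below writes such a file from the shell. It reuses the dogs data set from the example above and assumes, purely for illustration, that it contains a string variable named breed; the names breedcode and dogs_compact are likewise hypothetical.

```shell
#!/bin/sh
# Write a do-file that shrinks a data set's memory footprint.
cat > compact.do <<'EOF'
* memory must be set before any data are loaded
set memory 25m
use dogs
* store each variable in the smallest type that holds its values
compress
* replace a repeated string with a labeled numeric variable
encode breed, generate(breedcode)
drop breed
save dogs_compact, replace
EOF
echo "Run with: stata -b do compact"
```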

Managing /tmp Space with Stata

By default Stata uses /tmp for writing scratch files. When this directory is full, you may not be able to execute commands in Stata. If this happens, it is possible to redirect these scratch files to another location by exiting Stata and setting the TMPDIR environment variable with the following Linux command:

> setenv TMPDIR directory

where directory is some alternative directory where you can write. Refer to SSCC Pub. #30: How to Avoid Running Out of Disk Space for a complete discussion on /tmp disks.
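The setenv command above is for csh-style shells. If your login shell is sh or bash, a sketch of the equivalent follows; the directory name stata_tmp is a hypothetical choice, and any directory you can write to with enough free space will do.

```shell
#!/bin/sh
# Hypothetical scratch location under the current directory.
scratch="$PWD/stata_tmp"
mkdir -p "$scratch"

# Bourne-style equivalent of: setenv TMPDIR directory
TMPDIR="$scratch"
export TMPDIR

# Show how much space is free where scratch files will now be written.
df -k "$TMPDIR"
```

Type these commands at the prompt (or source the file) before starting Stata. Running them as a separate script would not change the environment of the shell from which you start Stata.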

Getting GRAPH Output

You can display high resolution graphs only when using the interactive windowed GUI mode of Stata (invoked with the xstata command). If you are using the interactive console mode of Stata (invoked with the stata command), you can either use the plot command to get two-way scatter plots or have the graph command write the graph directly to a file that you then print.

Displaying GRAPH Output in the Interactive Windowed GUI Mode

High resolution graphics may be generated in GUI Stata with the graph command. For example:

. graph twoway scatter y x

generates a scatter plot of y against x in a new window on your screen.

You can then type print to print a copy of the graph on the printer in the fourth floor computer lab (Soc. Sci. 4218). Alternatively, you can choose Print from the File menu. If you want to print the graph on a different printer, you will need to save the graph first and then print it from Linux with the lpr command. This is described below.

Saving GRAPH Output

You can save graph output from any of Stata's three different interfaces. This is useful for two reasons:

  1. If you are using the interactive console mode, Stata cannot display the graphs you request with the graph command, but it can save the graphs to a file that can then be printed.
  2. Stata can convert saved graphs to Postscript and encapsulated Postscript files that can then be inserted in documents.

To save a copy of a graph, use the saving option. For example:

. graph twoway scatter y x, saving(mypic)

This creates a file called mypic.gph containing the graph.

Note that in interactive console mode, nothing is displayed except Stata's prompt indicating it is ready for your next command.

Saved graphs can be printed from Linux using the lpr command. For example:

> lpr -Puser7single mypic.gph
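For the conversion to encapsulated Postscript mentioned above, one possible approach is Stata's graph export command. The sketch below writes a do-file that saves a graph and exports it. It assumes a data set named mydata containing variables y and x (both hypothetical), and it assumes that graph export is available in your Stata installation; check help graph export before relying on it.

```shell
#!/bin/sh
# Write a do-file that saves a graph and converts it to EPS.
cat > makegraph.do <<'EOF'
use mydata
* save the graph in Stata's own format as mypic.gph
graph twoway scatter y x, saving(mypic, replace)
* write the current graph as encapsulated Postscript
graph export mypic.eps, replace
EOF
echo "Run with: stata -b do makegraph"
```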

Accessing Programs (ado-files) in the Stata Technical Bulletin

The Stata Technical Bulletin (STB) was a printed journal published roughly every other month. It contained articles written by Stata Corporation as well as by Stata users. It has been replaced by the Stata Journal, which has less emphasis on technical issues and more on techniques. We have a set of these publications in the CDE Print Library in 4457 Social Science.

Some STB articles include software enhancements to Stata called ado-files which can be installed into Stata. These ado-files are available for download from Stata's web site. Software additions included in the STBs come in two flavors: official updates and user-written additions.

Official Updates

SSCC staff incorporate official updates into Stata as they become available. You can use the update query command to verify that Stata is using the latest available updates:

. update query
(contacting http://www.stata.com)

The update query command makes a connection to Stata's web site to check for all available official updates. If you discover that we have not installed all the official updates available, send email to the Help Desk and we will update the software.

Note that we download official updates to Stata about every two months. To get a list of Stata updates, type the following command:

. help whatsnew

User-written Additions

User-written additions to Stata published in the STBs are available from Stata's web site for download just as are the official updates. Unlike the official updates though, you need to request them individually (one at a time). For this reason and also because user-written additions do not undergo quality assurance testing like official updates do, SSCC staff do not update Stata with user-written additions. If you want a user-written addition, you need to download the addition yourself to your own directory. Instructions are provided in Finding and Installing User-Written Stata Programs.

Stata Documentation

The following Stata manuals are available for short term loan from the CDE Print Library in 4457 Social Science:

  • Stata Reference Manual: Version 9 (four volumes)
  • Stata User's Guide: Version 9
  • Getting Started with Stata 9 for Linux
  • Stata Graphics Manual Version 9

Stata's web site also has a lot of information including a support section which has a searchable database for finding answers to common questions.

To subscribe to Stata's list serv, visit Stat/Transfer's list subscription service. This web site provides a subscription service to all the major statistical software list servers including Stata. The Stata list serv provides a depth of information and support that is essentially impossible for staff at any one institution (like ours) to duplicate.

Last Revised: 3/15/2007