Stata for Researchers: Introduction

Stata is the most popular program for statistical analysis at the SSCC, as it is extremely powerful and relatively easy to learn. Its straightforward but flexible syntax makes it a good choice for data management, and it implements a very large number of statistical techniques. Stata also has an extensive user community that has made a great deal of code available, including many additional estimators. We've been quite pleased with Stata at the SSCC, and we think you'll find it extremely useful.

The goal of Stata for Researchers (as opposed to Stata for Students) is to give you a solid foundation that you can build on to become an expert Stata user. If your goal is to learn just enough Stata to get you through a particular course, you should probably read Stata for Students instead.

There are two different approaches one can take to Stata. One is to use it as an interactive tool: you start Stata, load your data, and start typing or clicking on commands. This is an excellent way to learn Stata because you get immediate feedback; thus it's how you'll spend most of your time as you work through the Stata for Researchers series. It is also a good way to explore your data, figure out what you want to do, and check that your programs worked properly. However, interactive work cannot be easily or reliably replicated, and it is hard to modify if you change your mind. It's also very difficult to recover from mistakes: there's no "undo" command in Stata.

The other approach is to treat Stata as a programming language. In this approach you write your programs, called do files, and then run them when they're complete. A do file contains exactly the same Stata commands you'd type in interactive Stata, but since they're written in a permanent file they can be debugged or modified and then rerun at will. They also serve as an exact record of how you obtained your results--a lab notebook for the social scientist. Any work you intend to publish or present should be done using do files. Thus this series will for the most part ignore Stata's graphical user interface and prepare you to write do files for research.
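As a rough illustration of what a do file looks like, here is a minimal sketch. The file names example.do and example.log are hypothetical, and the auto data set is one of the examples that ships with Stata rather than one of the files used in this series. Later articles cover each of these commands properly.

```stata
* example.do: a minimal do file sketch (file names are hypothetical)
capture log close                     // close any log left open by an earlier run
log using example.log, replace text   // keep a plain-text record of the results
clear all                             // start from an empty session
sysuse auto                           // load an example data set that ships with Stata
summarize price mpg                   // descriptive statistics
regress price mpg weight              // a simple linear regression
log close
```

Because every step is in the file, rerunning it after a change reproduces the whole analysis from the raw data, which is what makes the do file work as a record of how the results were obtained.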

About This Series

Stata for Researchers contains the following sections:

  1. Introduction
  2. Usage and Syntax
  3. Working With Data
  4. Statistics
  5. Working with Groups
  6. Hierarchical Data
  7. Combining Data Sets
  8. Graphics
  9. Do Files and Project Management
  10. Learning More

You'll be ready to do useful work after finishing the section on Statistics, but please be sure to read Do Files and Project Management (out of order if necessary) before starting a research project using Stata.

Some of the articles in this series use example files. If you are on the SSCC network these files can be found in X:\SSCC Tutorials\StataResearch. Alternatively, you can download a zip file containing the example files. Copy these files to a convenient location like U:\StataResearch and make that location your current working directory whenever you're doing the examples in this series (we'll show you how shortly).
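For reference, a minimal sketch of setting the working directory from within Stata, assuming you copied the example files to U:\StataResearch as suggested above (substitute your own location if you put them elsewhere):

```stata
* Make the folder holding the example files the current working directory
cd "U:\StataResearch"
* Confirm where Stata is now looking for files
pwd
* List the files in that folder
dir
```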

Each topic includes exercises, and solutions to them are forthcoming. While many of the exercises are short questions to test your understanding of the material, others require more work and are designed to give you experience working with Stata. If you are currently involved in a research project it may be a better use of your time to get your Stata experience by working on your project. If you get stuck on an exercise it's probably best to move on regardless.

Running Stata at the SSCC

The SSCC makes Stata available on Winstat, in our computer labs, on our Linux servers and through Condor. For details about the capabilities of the SSCC's servers see Computing Resources at the SSCC. Most SSCC members run Stata/SE on Winstat, but some jobs require different resources.

Stata/SE vs. Stata/MP

Stata/MP uses multiple processors to speed up its calculations. How much that helps depends on the particular commands you're using, but it's usually substantial. On the other hand, Stata/MP is more expensive and the SSCC has a relatively small number of Stata/MP licenses. Thus we ask that you use Stata/MP only when you have a long do file to run, and use Stata/SE for everyday work. You don't have to make any changes to your do files to run them using Stata/MP.

Windows vs. Linux

Stata looks and acts the same whether it's running on Windows or Linux (or on a Mac). However, due to the way Windows manages memory, Windows Stata can only load data sets that are smaller than about 900 megabytes, no matter how much memory your computer has. 900 megabytes is plenty for most jobs, but if you need more you'll need to switch to Linux. Our 32-bit Linux servers (Kite, Hal) can load data sets up to 2,800 megabytes in size. If you need more than that, switch to our 64-bit Linux server, Falcon, which can load data sets that require up to 27 gigabytes of memory.

To run Stata on a Linux server, log in (probably using X-Win32) and then type xstata.

Condor

The SSCC Condor flock is ideal for running very long jobs (jobs that will take days, weeks or even longer) or for running multiple jobs at the same time.

For more information about running jobs see Do Files and Project Management.

Next: Usage and Syntax

Last Revised: 10/8/2009