Stata for Researchers: Introduction

Stata is the most popular program for statistical analysis at the SSCC, as it is extremely powerful and relatively easy to learn. Its straightforward but flexible syntax makes it a good choice for data management, and it implements a very large number of statistical techniques. Stata also has an extensive user community that has made a great deal of code available, including many additional estimators. We've been quite pleased with Stata at the SSCC, and we think you'll find it extremely useful.

The goal of Stata for Researchers (as opposed to Stata for Students) is to give you a solid foundation that you can build on to become an expert Stata user. If your goal is to learn just enough Stata to get you through a particular course, you should probably read Stata for Students instead.

There are two different approaches one can take to Stata. One is to use it as an interactive tool: you start Stata, load your data, and start typing or clicking on commands. This is an excellent way to learn Stata because you get immediate feedback; thus it's how you'll spend most of your time as you work through the Stata for Researchers series. It is also a good way to explore your data, figure out what you want to do, and check that your programs worked properly. However, interactive work cannot be easily or reliably replicated, or modified if you change your mind. It's also very difficult to recover from mistakes--there's no "undo" command in Stata.

The other approach is to treat Stata as a programming language. In this approach you write your programs, called do files, and run them when they're complete. A do file contains the same commands you'd type in interactive Stata, but since they're written in a permanent file they can be debugged or modified and then rerun at will. They also serve as an exact record of how you obtained your results--a lab notebook for the social scientist. Any work you intend to publish or present should be done using do files. Thus this series will for the most part ignore Stata's graphical user interface and prepare you to write do files for research.
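
To make the idea concrete, here is a minimal sketch of what a do file might look like. The file name example.do and the log name example.log are hypothetical, and the sketch assumes the auto example data set that ships with Stata; it is an illustration, not the structure this series will require.

```stata
* example.do: a minimal do file sketch (file and log names are hypothetical)
capture log close              // close any log left open by an earlier run
log using example.log, replace // record commands and output in a text log

clear all                      // start from an empty Stata session
sysuse auto                    // load the auto example data set shipped with Stata

summarize price mpg            // descriptive statistics for two variables
regress price mpg weight       // a linear regression

log close                      // finish the log
```

If this file is saved in the current working directory, typing do example in Stata's command window should run it from top to bottom. Because every step lives in the file, rerunning it after a change reproduces the results, with the log serving as the record.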

About This Series

Stata for Researchers contains the following sections:

  1. Introduction
  2. Usage and Syntax
  3. Working With Data
  4. Statistics
  5. Working with Groups
  6. Hierarchical Data
  7. Combining Data Sets
  8. Graphics
  9. Do Files and Project Management
  10. Learning More

You'll be ready to do useful work after finishing the section on Statistics, but please be sure to read Do Files and Project Management (out of order if necessary) before starting a research project using Stata.

Some of the articles in this series use example files. If you are on the SSCC network these files can be found in X:\SSCC Tutorials\StataResearch. Alternatively, you can download a zip file containing all the example files. Copy these files to a convenient location like U:\StataResearch and make that location your current working directory whenever you're doing the examples in this series (we'll show you how shortly).
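
As a preview, setting the working directory might look like the sketch below. It assumes you copied the example files to U:\StataResearch as suggested above; substitute whatever location you actually used.

```stata
* Make the folder holding the example files the current working directory
* (assumes the files were copied to U:\StataResearch)
cd "U:\StataResearch"

* Confirm the change and list the files Stata can now see
pwd
dir
```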

Each topic includes exercises, and solutions are given for most of them. While many of the exercises are short questions to test your understanding of the material, others require more work and are designed to give you experience working with Stata. If you are currently involved in a research project it may be a better use of your time to get your Stata experience by working on your project. If you get stuck on an exercise it's probably best to move on. On the other hand, you can learn from reading the solutions even if you don't do all the exercises.

Running Stata at the SSCC

The SSCC makes Stata available on Winstat, in our computer labs, on Linstat, and through Condor. For details about the capabilities of the SSCC's servers see Computing Resources at the SSCC. Most SSCC members run Stata/MP on Winstat, but some jobs require different resources.

Stata/MP vs. Stata/MP4

The SSCC runs Stata/MP, the multi-processor version of Stata. Most of our licenses are for the two-processor version of Stata/MP, but we have a small number of the more expensive four-processor licenses. Use regular Stata/MP for day-to-day work, especially writing do files, but feel free to use Stata/MP4 any time you need to run a do file that will take more than a minute or two. You don't have to make any changes to your do files to run them using Stata/MP4.

Windows vs. Linux

Stata looks and acts the same whether it's running on Windows or Linux (or on a Mac). However, Linstat (the SSCC's Linux computing cluster) has much more memory than Winstat (the SSCC's Windows Terminal Server Farm), and is better suited for long jobs. Running Stata jobs on Linstat is probably easier than you think: see Using Linstat to learn how.

Condor

The SSCC Condor flock is ideal for running very long jobs (jobs that will take days, weeks or even longer) or for running multiple jobs at the same time.

For more information about running jobs see Do Files and Project Management.

Next: Usage and Syntax

Last Revised: 8/19/2011