Data Wrangling in Stata: Learning More

This is part nine of Data Wrangling in Stata.

You've now learned a great deal about how to wrangle data in Stata. However, you'll almost certainly need to learn more at some point in your Stata career. Thus we'll conclude by discussing resources for doing so.

Help

Your first resource is the Stata help files, which are far better than most. Most of the time, you'll find what you need more quickly in the help files than by googling.

Help for Commands

To see the help for a particular command, type help followed by the name of the command in the Command window. For example, type:

help mlogit

This will show you an abbreviated version of the documentation for the mlogit (multinomial logit) command. For the full documentation, click View complete PDF manual entry at the top. This includes:

  • The Title and Description of the command.
  • A Quick Start section that shows you how the command is used, which is great if you just need a refresher on the syntax.
  • A full Syntax diagram for the command and a list of available options. It also tells you what kinds of weights are allowed.
  • A detailed description of all the Options.
  • Remarks and Examples that can give you a pretty good start on both the Stata and the statistics involved in using the command.
  • Methods and formulas if you need to know exactly what it's doing.
  • References you should read if you plan on using a model in your research that you've never formally studied.
  • An Also see section—if it turns out that a command isn't quite what you need, the chances are good that the command you actually need is listed there.

Note that every command that runs a statistical model has a separate entry for postestimation tasks, like prediction or calculating margins. You can see it with:

help mlogit postestimation
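
As a rough sketch of what those postestimation tasks look like in practice, the following fits a model and then runs two postestimation commands. The choice of the auto data set, the variables rep78 and mpg, and the new variable name p3 are assumptions made purely for illustration, not a recommended model:

```stata
* Load the example data set that ships with Stata
sysuse auto, clear

* Fit a multinomial logit of repair record on mileage
mlogit rep78 mpg

* Postestimation: predicted probability that rep78 equals 3
predict p3, outcome(3)

* Postestimation: average marginal effect of mpg on that probability
margins, dydx(mpg) predict(outcome(3))
```

The options that predict and margins accept differ from model to model, which is why each estimation command has its own postestimation entry.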

Typing help functions will give you a list of the functions you can use in mathematical expressions, while help egen will give you a list of egen functions.

findit

Often you'll know what you want to do but not the name of the command that will do it. Then findit is your best bet—think of it as Google for Stata. For example, suppose you want to do something with Heckman selection models. If you type

findit heckman

you'll get a tremendous amount of information. First, Stata will search the help files and point out that there is a heckman command, along with related commands like suest and treatreg. Then it will search the Frequently Asked Questions files on Stata's web site and the large Stata web site at UCLA (the UCLA web site contains a great deal of useful information, but unfortunately it's no longer being updated). Finally, it will search through the user-written programs that have appeared in the Stata Journal, the old Stata Technical Bulletin, or the Boston College Statistical Software Components (SSC) archive. You can find out what these programs do by reading their help files (.sthlp, or .hlp for older programs), and if you decide they'll be useful to you, you can download and install them by clicking on the click here to install link. See Finding and Installing User-Written Stata Programs for more information.
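
The same search-then-install workflow can also be run from the Command window instead of by clicking links. This is a minimal sketch that assumes an internet connection; the estout package is used only as an example of a program in the SSC archive and is not something this series depends on:

```stata
* Search local help, FAQs, and user-written programs for a keyword
* (search with the all option covers the same sources as findit)
search heckman, all

* Read the description of a package in the SSC archive before installing
ssc describe estout

* Download and install the package
ssc install estout
```

Once a package is installed, its help file is available through help like any other command, for example help estout.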

Effective Googling

Of course Google will be a useful tool as well. Usually you can find what you need by searching for Stata and then the command or topic of interest. If you are getting a particularly obscure error message, googling for that exact error message (put it in quotes) can often find discussions of the exact problem you're facing.

SSCC Resources

The SSCC's Knowledge Base has a large section on Stata, including our training curriculum and discussions of specific topics. Once you feel confident using Stata's basic syntax, we strongly suggest reading Stata Programming Essentials. It will teach you things like how to do the same thing to ten different variables without having to write it out ten times.
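
To give a flavor of what that looks like, here is a minimal sketch of a loop that applies the same steps to several variables. The auto data set, the three variables chosen, and the _c suffix are assumptions for illustration only; Stata Programming Essentials covers loops properly:

```stata
* Load the example data set that ships with Stata
sysuse auto, clear

* Center three variables at their means without repeating the code
foreach var of varlist price mpg weight {
    quietly summarize `var', meanonly
    generate `var'_c = `var' - r(mean)
}
```

Adding a fourth variable to the varlist is all it takes to extend the loop, which is the main advantage over writing each generate command by hand.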

The SSCC offers classes on Stata each semester. Look for further development of our Data Science Tools for Research curriculum as well as topical Stata Workshops.

Finally, the SSCC's statistical consultants are available to assist SSCC members. We cannot write your Stata programs for you. But we will be more than happy to help with planning your project, figuring out the commands that will make your program work, and of course finding and fixing bugs, along with consulting on statistical methodology.

Practice

The most important resource for learning Stata is practice. If you don't use the skills and knowledge you've gained from reading this series within the next few weeks (at most) you'll lose them rapidly. If you don't have a current research project that will require you to use Stata, make one up.

One particular pitfall to watch out for is "I'll just do it in Excel." It may be true that you can carry out a particular task in Excel faster than you can first learn how to do it in Stata and then actually carry it out. But if you do it in Stata anyway, the next time it comes up you'll be able to do it much more quickly in Stata than in Excel, and more reproducibly, and with less likelihood of error. You'll also build up your general Stata expertise, so that soon you'll be able to do things faster in Stata even if you've never done them before. Now that you've spent the time to learn Stata, plan on never using Excel for research again.

Exercise: Load the auto data set that comes with Stata (sysuse auto). The make variable contains first the company that built the car and then the name of the car model. In Introduction to Stata, we created a company variable with gen company=word(make,1). Now create a model variable. But since the model name is sometimes more than one word, you can't just use gen model=word(make,2). Instead, look up subinstr() in the help files, and use it to create model by replacing company and the space that follows it with nothing.
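
As a hint on syntax rather than a solution, this sketch shows how word() and subinstr() behave on a literal string. The string itself is made up for the demonstration; applying the same idea to make and company is left to you:

```stata
* word(s, n) returns the nth word of s
display word("Data Wrangling in Stata", 1)

* subinstr(s, old, new, n) replaces the first n occurrences of old with new
display subinstr("Data Wrangling in Stata", "Data ", "", 1)
```

The first command displays Data and the second displays Wrangling in Stata. The last argument of subinstr() can also be a period (.) to replace every occurrence.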

This brings us to the end of Data Wrangling in Stata. We hope it has been useful to you.

Last Revised: 8/24/2019