Data Wrangling in Stata

Author

Russell Dimond

Published

January 12, 2023

1 Introduction

Most data sets need to be transformed in some way before they can be analyzed, a process that’s come to be known as “data wrangling.” Data Wrangling in Stata will introduce you to the key concepts, tools, and skills of data wrangling, implementing them in Stata. You’ll learn a lot about Stata from this book, but the primary focus is on the tasks you’ll need to carry out.

If you’re new to Stata, we recommend working through our Introduction to Stata before proceeding. We’ll start by very briefly reviewing some basic Stata concepts that should be familiar to you, but if they’re not, Introduction to Stata will do a much better job of teaching them to you.

To get the most out of Data Wrangling in Stata you need to be an active participant. Open Stata, and type in and run the example code yourself. This will help you retain more, and ensure you get all the details right—Stata is always happy to tell you when you’re wrong. Do the exercises (some of them are straightforward applications of what you just learned; others will require more creativity). Data wrangling is not something you read and understand—it’s a skill you must practice.

The example files for this class are available on the SSCC’s web server. The example code will download them directly from there. For real work, you’ll usually download your data and work with it locally, so if you plan to work through the entire book we suggest you download the example files and work with them locally too. They can be obtained within Stata by running:

net get dws, from(https://ssc.wisc.edu/sscc/stata/)

If this fails on your computer, try:

net get dws, from(http://ssc.wisc.edu/sscc/stata/)

This will put the example files in your current working directory. If you are comfortable doing so, create a folder for the example files, make that Stata’s working directory, and run the command above. If not, we’ll talk about how to do all this in the Reading in Data chapter and you can get the example files then.
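As a minimal sketch of the folder setup described above, the sequence might look like the following. The folder name dws_examples is only an illustrative assumption; use any name and location you like.

```stata
* Create a folder for the example files (hypothetical name).
* capture suppresses the error if the folder already exists.
capture mkdir dws_examples

* Make that folder Stata's working directory and confirm it
cd dws_examples
pwd

* Download the example files into the working directory
net get dws, from(https://ssc.wisc.edu/sscc/stata/)

* List the downloaded data sets to check that they arrived
dir *.dta
```

If the net get command fails, substitute the http version of the address given earlier.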

If you do download the files, you can replace commands like

use https://sscc.wisc.edu/sscc/stata/dws/acs.dta

with just:

use acs

This web book was written in JupyterLab using pystata-kernel and rendered by Quarto. The notebooks themselves are included in the example files. You’ll notice a few things that are only there to make the notebooks look better, such as set linesize and the ab(30) option of list, which keeps list from abbreviating variable names. When you work interactively in Stata, you’ll usually want to use browse where the book uses list.
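To illustrate the difference, here is a short sketch of the notebook-style commands next to their interactive equivalent. The line width of 120 and the choice of the first five observations are arbitrary values picked for the example, not settings taken from the book.

```stata
* Load one of the example data sets directly from the web server
use https://sscc.wisc.edu/sscc/stata/dws/acs.dta, clear

* Notebook-style output: widen the output lines (120 is an arbitrary choice)
set linesize 120

* List the first five observations, showing up to 30 characters of each variable name
list in 1/5, ab(30)

* Interactive alternative: view the same observations in the Data Browser
browse in 1/5
```

Note that browse opens a window in Stata's graphical interface, so it only helps when you run Stata interactively.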