Stata for Students is designed for undergraduate students taking methodology classes in the social sciences at UW-Madison, but it will be useful to students taking similar classes elsewhere or anyone looking for a basic introduction to Stata. Graduate students and other researchers, and those who hope to someday be graduate students or researchers, should start with Introduction to Stata and then read Data Wrangling in Stata.
Stata for Students is divided into short articles that each cover a single subject. You should read all the articles in the Stata Basics section before you do anything else. We also recommend reading the articles in the Understanding Stata section, as they will help everything else make sense and make you a more efficient Stata user. After that you can read just the articles that correspond to the material covered in your class.
You will learn more if you actually carry out the steps described in these articles. All of the articles include examples you can do yourself. They use a subsample from the 2014 General Social Survey, which you'll download by doing the example in Managing Stata Files. (The General Social Survey (GSS) is a project of the independent research organization NORC at the University of Chicago, with principal funding from the National Science Foundation.) If you have a homework assignment to work on, you may prefer to just read the articles and then immediately apply what you've learned to your assignment. In that case you can ignore the specific instructions for the examples.
- Comments and Other Tools for Making Do Files Readable
- Creating Variables and Labels
- Using Graphs
- Reading Data from a Spreadsheet or CSV File
- Downloading Data from Qualtrics and Importing it into Stata
Statistical Commands by Class
Statistical Commands by Topic
- describe: Information about a data set and what it contains
For a variable that describes categories (like sex or race) rather than quantities (like income), frequencies tell you how many observations are in each category. These are examples of univariate statistics, or statistics that describe a single variable.
Categorical variables are also sometimes called factor variables. Indicator variables (also called binary or dummy variables) are just categorical variables with two categories. Frequency tables for a single variable are sometimes called one-way tables.
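As a minimal sketch, a one-way frequency table can be produced with tabulate. This example assumes Stata's bundled auto data set (loaded with sysuse) rather than the GSS subsample used in the articles, and treats its indicator variable foreign as the categorical variable.

```stata
* Assumption: uses Stata's bundled auto data set, not the GSS subsample
sysuse auto, clear

* One-way table: number and percent of observations in each category of foreign
tabulate foreign
```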
For a variable that describes quantities (like income), the mean tells you what the expected value of the variable is, and the standard deviation tells you how much it varies. However, the median and percentiles often give you a better sense of how the variable is distributed, especially for variables whose distributions are not symmetric (like income, which often has a few very high values). These are also univariate statistics.
Quantitative variables are often called continuous variables. Means are often called averages, and variance is just the standard deviation squared. The median is also the 50th percentile.
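The following sketch shows one way to get these statistics with summarize. It assumes Stata's bundled auto data set rather than the GSS subsample, with price standing in for a quantitative variable that has a few very high values.

```stata
* Assumption: uses Stata's bundled auto data set, not the GSS subsample
sysuse auto, clear

* Mean and standard deviation
summarize price

* The detail option adds the median (listed as 50%), other percentiles, and the variance
summarize price, detail

* The variance is the standard deviation squared, so these two lines match
display r(Var)
display r(sd)^2
```

In this data set the median of price is below the mean, which is what you would expect when a few high values pull the mean upward.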
For two categorical variables, frequencies tell you how many observations fall in each combination of the two categorical variables (like black women or Hispanic men) and can give you a sense of the relationship between the two variables. These are examples of bivariate statistics, or statistics that describe the joint distribution of the two variables.
Tables of frequencies for two variables are often called two-way tables, contingency tables, or crosstabs.
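Giving tabulate two variables produces a two-way table. The sketch below again assumes Stata's bundled auto data set instead of the GSS subsample, using rep78 (repair record) and foreign as the two categorical variables.

```stata
* Assumption: uses Stata's bundled auto data set, not the GSS subsample
sysuse auto, clear

* Two-way table: rows are categories of rep78, columns are categories of foreign
tabulate rep78 foreign

* The row option adds the percentage of each row that falls in each column
tabulate rep78 foreign, row
```

Observations with a missing value for either variable are left out of the table unless you add the missing option.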
For a quantitative variable and a categorical variable, the mean value of the quantitative variable for those observations that fall in each category of the categorical variable can give you a sense of how the two variables are related. Often the question of interest is whether the distribution of the quantitative variable is different for different categories. These are also examples of bivariate statistics.
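One way to sketch this comparison is the summarize() option of tabulate. The example assumes Stata's bundled auto data set rather than the GSS subsample, with mpg as the quantitative variable and foreign as the categorical variable.

```stata
* Assumption: uses Stata's bundled auto data set, not the GSS subsample
sysuse auto, clear

* Mean, standard deviation, and frequency of mpg within each category of foreign
tabulate foreign, summarize(mpg)
```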
For three or more categorical variables, frequencies will tell you how many observations fall in each combination of the variables and give you a sense of their relationships just like they did with two categorical variables. These are examples of multivariate statistics.
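A simple way to sketch a three-variable table is to repeat a two-way table for each category of a third variable with the bysort prefix. The example assumes Stata's bundled auto data set rather than the GSS subsample, and highmpg is a hypothetical indicator variable created only for this illustration.

```stata
* Assumption: uses Stata's bundled auto data set, not the GSS subsample
sysuse auto, clear

* Hypothetical third categorical variable: 1 if mpg is above 20, 0 otherwise
generate highmpg = mpg > 20 if !missing(mpg)
label variable highmpg "MPG above 20"

* One two-way table of rep78 by foreign for each category of highmpg
bysort highmpg: tabulate rep78 foreign
```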
For a quantitative variable and two or more categorical variables, the mean value of the quantitative variable for those observations in each combination of the categorical variables can give you a sense of how the variables are related just like they did with a quantitative variable and one categorical variable. These are examples of multivariate statistics.
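The summarize() option also works with a two-way table, which gives a sketch of this case. As before, the example assumes Stata's bundled auto data set rather than the GSS subsample.

```stata
* Assumption: uses Stata's bundled auto data set, not the GSS subsample
sysuse auto, clear

* Mean and standard deviation of mpg for each combination of rep78 and foreign;
* nofreq suppresses the cell frequencies to keep the table compact
tabulate rep78 foreign, summarize(mpg) nofreq
```

Some combinations contain very few cars, so the means in those cells should be read with caution.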
- correlate: Correlations between variables
Estimates and Hypothesis Tests
- mean or ci mean: Estimate the population mean and its confidence interval for a variable
- ttest: Test hypotheses about means
- prtest: Test hypotheses about proportions
- histogram: Graphical representation of a variable's distribution
- graph bar: Bar graph representing summary statistics
- scatter: Scatterplot of two variables
Last Revised: 1/3/2017