Working With Data

Examples to accompany Stata for Researchers

generate / replace

Let’s start with

help generate

and our basic form is

generate newvar = expression

where an expression can take many forms: any mix of variable names, constants, operators, and functions.

Using the auto data set, calculate an inflation-adjusted price for each car. The example below multiplies by 3.94 to express the 1978 prices in 2017 dollars. See BLS Inflation Calculator.

. sysuse auto
(1978 Automobile Data)

. generate price2017 = 3.94*price

. * check, this is a linear transformation
. scatter price2017 price
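
Expressions are not limited to rescaling by a constant. As a minimal sketch of the mix of variable names, constants, operators and functions mentioned above, the following continues the same session. The names lnp and gp100m are made up for this illustration.

. * a function of a variable: the natural log of price
. generate lnp = ln(price)

. * constants, operators and a variable: gallons used per 100 miles
. generate gp100m = 100/mpg

. * check, neither new variable should have missing values
. assert !missing(lnp, gp100m)

assert prints nothing when the condition holds for every observation and stops with an error otherwise, which makes it a convenient check after generate.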

Conditional Values

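Suppose we want the current price in euros, but only for foreign cars. The if qualifier restricts the calculation to the observations where the condition is true; the new variable is set to missing everywhere else.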

. generate europrice = .81*price2017 if foreign==1
(52 missing values generated)

. * check means
. summarize *price*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906
   price2017 |         74    24291.11    11621.01   12966.54   62669.64
   europrice |         22    20376.07     8367.58   11961.37   41456.29
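
The Obs column shows that europrice is defined for only 22 cars. The other 52 are the domestic cars, which were left missing. A short check of that claim, as a sketch continuing the session above:

. * europrice should be missing for every domestic car
. assert missing(europrice) if foreign == 0

. * and present for every foreign car
. assert !missing(europrice) if foreign == 1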

Replacing Values

Suppose you wanted to convert weight from pounds to kilograms. You might try

. generate weight = weight/2.2
variable weight already defined
r(110);

But that gives you an error, because generate only creates new variables. In general, if you want to write over existing data (or files), you need to say replace explicitly. In this case, replace is a command name (in other cases, it is an option keyword, as in save filename, replace).

. replace weight = weight/2.2
variable weight was int now float
(74 real changes made)

. * check, correlation with price is the same
. * oops! can't check because we overwrote our data!
. corr weight price2017
(obs=74)

             |   weight pri~2017
-------------+------------------
      weight |   1.0000
   price2017 |   0.5386   1.0000
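
One way to avoid the problem noted in the comments above is to leave weight alone and generate a new variable instead. The sketch below assumes a fresh copy of the auto data. The name weight_kg is chosen for this illustration, and preserve and restore are used only so that the data from the session above stay intact.

. * set the current data aside and load a fresh copy
. preserve

. sysuse auto, clear
(1978 Automobile Data)

. * keep the original pounds and add kilograms alongside
. generate weight_kg = weight/2.2

. * check, converting back recovers the original up to rounding
. assert abs(2.2*weight_kg - weight) < 0.01

. * bring back the session data
. restore

With both variables present, the before and after values can be compared directly, which is not possible once replace has overwritten the original.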


Missing Values

Suppose you want to reverse the repair scale so that 5, instead of 1, is the worst repair record. You could do

. generate repairs = 6 - rep78
(5 missing values generated)

. * check, crosstab
. tabulate rep78 repairs, missing

    Repair |
    Record |                              repairs
      1978 |         1          2          3          4          5          . |     Total
-----------+------------------------------------------------------------------+----------
         1 |         0          0          0          0          2          0 |         2 
         2 |         0          0          0          8          0          0 |         8 
         3 |         0          0         30          0          0          0 |        30 
         4 |         0         18          0          0          0          0 |        18 
         5 |        11          0          0          0          0          0 |        11 
         . |         0          0          0          0          0          5 |         5 
-----------+------------------------------------------------------------------+----------
     Total |        11         18         30          8          2          5 |        74 


Notice that a missing input value becomes a missing output value: the 5 cars with no value for rep78 also have no value for repairs.
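
Comparisons behave differently from arithmetic: Stata treats a missing value as larger than any number, so a condition such as rep78 >= 4 is true when rep78 is missing. A hedged sketch of both points, continuing the session above. The name goodrep is made up for this illustration.

. * check, repairs is missing exactly where rep78 is missing
. assert missing(repairs) == missing(rep78)

. * missing counts as larger than any number, so exclude it explicitly
. generate byte goodrep = rep78 >= 4 if !missing(rep78)
(5 missing values generated)

. * check, the cars with no repair record stay missing
. assert missing(goodrep) == missing(rep78)

Without the if qualifier, the 5 cars with a missing rep78 would be coded as 1 rather than missing.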

recode

egen

Details

Mathematical Expressions

Logical Expressions

Probability and Random Numbers

Dates and Times

String Expressions

String Conversions (real())

Conversions

encode / decode

destring / tostring