Stata has many tools for working with dates. This article will introduce you to some of the most useful and easy to use features.
A Stata date is simply a number, but with the %td format applied Stata will interpret that number as "number of days since January 1, 1960." You can then use that number in a variety of ways. Stata has similar tools that measure time in terms of milliseconds, months, quarters, years and more. This article will focus on days, but if you know how to work with days you can quickly learn the others.
Often the first task is to convert the data you've been given into official Stata dates.
If you've been given a date in string form, such as "November 3, 2010", "11/3/2010" or "2010-11-03 08:35:12" it can be converted using the date function. The date function takes two arguments, the string to be converted, and a series of letters called a "mask" that tells Stata how the string is structured. In a date mask, Y means year, M means month, D means day and # means an element should be skipped.
Thus the mask MDY means "month, day, year" and can be used to convert both "November 3, 2010" and "11/3/2010". A date like "2010-11-03 08:35:12" requires the mask YMD### so that the last three numbers are skipped. If you are interested in tracking the time of day you need to switch to the clock function and the %tc format so time is measured in milliseconds rather than days, but they are very similar.
To see this in action, type (or copy and paste) the following into Stata:
This is an example data set containing the above dates as dateString1, dateString2 and dateString3. To convert them to Stata dates do the following:
Note that the mask goes in quotes.
Another common scenario gives you dates as three separate numeric variables, one for the year, one for the month and one for the day. The year, month and day variables in the example data set contain the same date as the others but in this format. To convert such dates to Stata dates, use the mdy function. It takes three numeric arguments: the month, day and year to be converted.
While the four date variables you've created are perfectly functional dates as far as Stata is concerned, they're difficult for humans to interpret. However, the %td format tells Stata to print them out as human readable dates:
format date1 %td
format date2 %td
format date3 %td
format date4 %td
This turns the 18569 now stored in all four variables into 03nov2010 (18,569 days since January 1, 1960) in all output. Try a list to see the result. If you remember your varlist syntax, you can do them all at once with:
format date? %td
You can have Stata output dates in different formats as well. For instructions type help dates and then click on the link Formatting date and time values.
Often your goal in creating a Stata date will be to create a time variable that can be included in a statistical command. If so, you can probably use it with no further modification. However, there are some common data preparation tasks involving dates.
If you need to refer to a particular date in your code, then in principle you could refer to it by number. However, it's usually more convenient to use the same functions used to import date variables. For example, the following are all equivalent ways of referring to November 3, 2010:
date("November 3, 2010","MDY")
The td pseudofunction was designed for tasks like this and is somewhat more convenient to use. It takes a single argument (which cannot be a variable name) and converts it to a date on the assumption that the argument is a string containing a date in the format day, month, year. This matches the output of the %td format, e.g. 3nov2010. Thus the following is also equivalent:
However, the following is not:
This will be interpreted as March 11, 2010, not November 3, 2010.
Sometimes you need to pull out the components of a date. You can do so with the year, month and day functions:
Since dates are just numbers, before and after are equivalent to less than and greater than. Thus:
gen after2010=(date1>date("January 1 2010","MDY"))
Durations in days can be found using simple subtraction. The example data set contains the dates beginning and ending, and you can find out the duration of the interval between them with:
Durations in months are more difficult because months vary in length. One common approach is to ignore days entirely and calculate the duration solely from the year and month components of the dates involved:
Just keep in mind that this approach says January 31 and February 1 are one month apart, while January 1 and January 31 are zero months apart.
If you need to add (or subtract) a period measured in days to a date, it is straightforward to do so. Just remember to format all new date variables as dates with %td:
format %td tenDaysLater yesterday
If the period is measured in weeks, just multiply by 7. Months are again problematic since different months have different lengths. Years have the same problem if you need to be precise enough to care about leap years.
You can avoid this by building a new date based on the components of the old one, modified as required. The only trick is that you must handle year changes properly. For example, the following works properly:
format %td oneMonthLater
oneMonthLater is now December 3, 2010. But the following does not:
format %td twoMonthsLaterBad
This tries to set the month component of the new date to 13, which is invalid. It needs to be January of the next year instead. The following code will do allow you to add or subtract any number of months (just change the final number in the first line and the name of the new variable):
format %td twoMonthsLater
drop newMonth newYear
If you need to do such things frequently you might want to turn this bit of code into a program, or even an ado file.
To read the full documentation on Stata dates, type help dates and then click on the dates and times link at the top (the PDF documentation is much easier to read in this case). There you'll learn to:
Last Revised: 11/9/2010