Working with Dates in Stata

Stata has many tools for working with dates. This article will introduce you to some of the most useful and easy to use features.

A Stata date is simply a number, but with the %td format applied Stata will interpret that number as "number of days since January 1, 1960." You can then use that number in a variety of ways. Stata has similar tools that measure time in terms of milliseconds, months, quarters, years and more. This article will focus on days, but if you know how to work with days you can quickly learn the others.

Often the first task is to convert the data you've been given into official Stata dates.

Converting Strings to Dates

If you've been given a date in string form, such as "November 3, 2010", "11/3/2010" or "2010-11-03 08:35:12" it can be converted using the date function. The date function takes two arguments, the string to be converted, and a series of letters called a "mask" that tells Stata how the string is structured. In a date mask, Y means year, M means month, D means day and # means an element should be skipped.

Thus the mask MDY means "month, day, year" and can be used to convert both "November 3, 2010" and "11/3/2010". A date like "2010-11-03 08:35:12" requires the mask YMD### so that the last three numbers are skipped. If you are interested in tracking the time of day you need to switch to the clock function and the %tc format so time is measured in milliseconds rather than days, but they are very similar.

To see this in action, type (or copy and paste) the following into Stata:

use https://www.ssc.wisc.edu/sscc/pubs/files/dates.dta

This is an example data set containing the above dates as dateString1, dateString2 and dateString3. To convert them to Stata dates do the following:

gen date1=date(dateString1,"MDY")
gen date2=date(dateString2,"MDY")
gen date3=date(dateString3,"YMD###")

Note that the mask goes in quotes.

Converting Numbers to Dates

Another common scenario gives you dates as three separate numeric variables, one for the year, one for the month and one for the day. The year, month and day variables in the example data set contain the same date as the others but in this format. To convert such dates to Stata dates, use the mdy function. It takes three numeric arguments: the month, day and year to be converted.

gen date4=mdy(month,day,year)

Formatting Date Variables

While the four date variables you've created are perfectly functional dates as far as Stata is concerned, they're difficult for humans to interpret. However, the %td format tells Stata to print them out as human readable dates:

format date1 %td
format date2 %td
format date3 %td
format date4 %td

This turns the 18569 now stored in all four variables into 03nov2010 (18,569 days since January 1, 1960) in all output. Try a list to see the result. If you remember your varlist syntax, you can do them all at once with:

format date? %td

You can have Stata output dates in different formats as well. For instructions type help dates and then click on the link Formatting date and time values.

Using Dates

Often your goal in creating a Stata date will be to create a time variable that can be included in a statistical command. If so, you can probably use it with no further modification. However, there are some common data preparation tasks involving dates.

Date Constants

If you need to refer to a particular date in your code, then in principle you could refer to it by number. However, it's usually more convenient to use the same functions used to import date variables. For example, the following are all equivalent ways of referring to November 3, 2010:

18569
date("November 3, 2010","MDY")
mdy(11,3,2010)

The td pseudofunction was designed for tasks like this and is somewhat more convenient to use. It takes a single argument (which cannot be a variable name) and converts it to a date on the assumption that the argument is a string containing a date in the format day, month, year. This matches the output of the %td format, e.g. 3nov2010. Thus the following is also equivalent:

td(3nov2010)

However, the following is not:

td(11/3/2010)

This will be interpreted as March 11, 2010, not November 3, 2010.

Extracting Date Components

Sometimes you need to pull out the components of a date. You can do so with the year, month and day functions:

gen year1=year(date1)
gen month1=month(date1)
gen day1=day(date1)

Before and After

Since dates are just numbers, before and after are equivalent to less than and greater than. Thus:

gen before2010=(date1<td(1,1,2010))
gen after2010=(date1>date("January 1 2010","MDY"))

Durations and Intervals

Durations in days can be found using simple subtraction. The example data set contains the dates beginning and ending, and you can find out the duration of the interval between them with:

gen duration=ending-beginning

Durations in months are more difficult because months vary in length. One common approach is to ignore days entirely and calculate the duration solely from the year and month components of the dates involved:

gen durationInMonths=(year(ending)-year(beginning))*12+month(ending)-month(beginning)

Just keep in mind that this approach says January 31 and February 1 are one month apart, while January 1 and January 31 are zero months apart.

Date Arithmetic

If you need to add (or subtract) a period measured in days to a date, it is straightforward to do so. Just remember to format all new date variables as dates with %td:

gen tenDaysLater=date1+10
gen yesterday=date1-1
format %td tenDaysLater yesterday

If the period is measured in weeks, just multiply by 7.

Months and Years

Months and years are problematic because they don't always represent the same amount of time. A month can be anything from 28 to 31 days, and a calendar year is usually 365 days but is 366 days in leap year.

Suppose today's date were April 20th, 2017. If I asked you what the date will be in one month, you'd probably respond May 20th, 2017. If I asked you what the date will be in one year, you'd probably respond April 20th, 2018.

Now consider two particularly problematic dates: January 31, 2016 and February 29, 2016. One month after January 31st cannot be February 31st, because no such day exists. Similarly, one year after February 29th, 2016 cannot be February 29, 2017, because February 29th only exists in leap years. So what should be done?

If the dates in your data are mostly just months, consider storing them as dates in monthly format, where the underlying number is the number of months since January, 1960 rather than the number of days. Then all these issues go away. You can convert dates to monthly dates with the mofd() function and then give them the %tm format to make them readable:

clear
use https://www.ssc.wisc.edu/sscc/pubs/files/moredates.dta

gen monthlyDate=mofd(date)
gen oneMonthLater1=monthlyDate+1
gen oneYearLater1=monthlyDate+12
format monthlyDate oneMonthLater1 oneYearLater1 %tm

Alternatively, you can define a "standard month" of 30 or 31 days or a "standard year" of 365 days (no, 365.25 days won't work unless you're storing time as well as date). This has the advantage of making all time intervals uniform. It's also easy to program:

gen oneMonthLater2=date+30
gen oneYearLater2=date+365
format oneMonthLater2 oneYearLater2 %td

However, note that 30 days after January 31, 2016 is March 1, 2016, and 365 days after January 31, 2016 is January 30, 2017. That might not be ideal in some contexts.

Another practical definition of "one month later" would be "the same day in the next month if that exists, otherwise the last day of the next month." Then one month after January 31st would be February 28th or 29th depending on the year. You can implement this, without programming in how many days each month has, using the following algorithm:

  1. Convert the date to a monthly date and add one month to it
  2. Create a new date based on the month and year from the date created in step 1, and the day of the month from the original date
  3. If the resulting date is invalid (e.g. February 31st) subtract days until you get a valid date

The following code implements this algorithm:

gen oneMonthLaterTemp=dofm(mofd(date)+1)
gen oneMonthLater3=mdy(month(oneMonthLaterTemp),day(date),year(oneMonthLaterTemp))
egen numInvalid=total(oneMonthLater3==.) // calculate number of dates that are invalid
local i 1 // number of days to subtract from invalid dates
while (numInvalid>0) {
replace oneMonthLater3=mdy(month(oneMonthLaterTemp),day(date)-`i',year(oneMonthLaterTemp)) if oneMonthLater3==.
local i=`i'+1 // increase number of days to subtract
drop numInvalid
egen numInvalid=total(oneMonthLater3==.) // see if we still have invalid dates
}
drop oneMonthLaterTemp numInvalid
format oneMonthLater3 %td

The mofd() function converts a regular date to a monthly date, and the dofm() function converts a monthly date to a regular date.

Very similar code could be used for adding years without needing to program in which years are leap years.

Learning More

To read the full documentation on Stata dates, type help dates and then click on the dates and times link at the top (the PDF documentation is much easier to read in this case). There you'll learn to:

  • Work with times
  • Use intervals other than days, such as months, quarters or years
  • Create your own date format for output (e.g. November 3rd, 2010 rather than 3nov2010)
  • Track leap seconds, in case you need to be extremely precise--you'll also find an explanation of why such things exist

Complete Do File

The following is a do file containing all the code from this article (except for a few code fragments that were discussed but can't be run by themselves):

clear all
set more off
use https://www.ssc.wisc.edu/sscc/pubs/files/dates.dta

gen date1=date(dateString1,"MDY")
gen date2=date(dateString2,"MDY")
gen date3=date(dateString3,"YMD###")
gen date4=mdy(month,day,year)
format date? %td

gen year1=year(date1)
gen month1=month(date1)
gen day1=day(date1)

gen before2010=(date1< td(1,1,2010))
gen after2010=(date1>date("January 1 2010","MDY"))

gen duration=ending-beginning
gen durationInMonths=(year(ending)-year(beginning))*12+month(ending)-month(beginning)

gen tenDaysLater=date1+10
gen yesterday=date1-1
format tenDaysLater yesterday %td

clear
use https://www.ssc.wisc.edu/sscc/pubs/files/moredates.dta

gen monthlyDate=mofd(date)
gen oneMonthLater1=monthlyDate+1
gen oneYearLater1=monthlyDate+12
format monthlyDate oneMonthLater1 oneYearLater1 %tm

gen oneMonthLater2=date+30
gen oneYearLater2=date+365
format oneMonthLater2 oneYearLater2 %td

gen oneMonthLaterTemp=dofm(mofd(date)+1)
gen oneMonthLater3=mdy(month(oneMonthLaterTemp),day(date),year(oneMonthLaterTemp))
egen numInvalid=total(oneMonthLater3==.) // calculate number of dates that are invalid
local i 1 // number of days to subtract from invalid dates
while (numInvalid>0) {
replace oneMonthLater3=mdy(month(oneMonthLaterTemp),day(date)-`i',year(oneMonthLaterTemp)) if oneMonthLater3==.
local i=`i'+1 // increase number of days to subtract
drop numInvalid
egen numInvalid=total(oneMonthLater3==.) // see if we still have invalid dates
}
drop oneMonthLaterTemp numInvalid
format oneMonthLater3 %td

Last Revised: 4/20/2017