8 Dates

There are several kinds of computation we typically want to do with dates:

  • order and compare dates
  • extract categories of time (year, month, day of the week)
  • calculate elapsed time (differences between dates)
  • increment or decrement dates (a month later, a week earlier)

8.1 Representing Dates

Dates (and times) can be awkward to work with. To begin with, we usually reference points on the calendar (a specific date) with a set of category labels - “year”-“month”-“day”. To compute with these, it is useful to translate them to a number line - each date is a point on one continuous time line. By thinking of calendar dates as points on a line, say \(a\) and \(b\), it becomes clear how they are ordered and how to measure the distance between two points: \(\lvert b-a \rvert\).

However, a second difficulty with dates is that our time units - the category labels “year”, “month”, and “day” - all vary in length. Some years have 365 days while others have 366. The length of a month varies from 28 to 31 days. And some days have 23 hours while others have 24 or 25 hours (when clocks switch from standard time to daylight saving time and back). If two dates are 30 days apart, has more than a “month” passed, exactly a “month”, or not quite a “month”?
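To see the uneven units concretely, here is a small sketch. It assumes the Date class and the seq() function, both of which are introduced later in this chapter; the variable names are only for illustration.

```r
# First day of each month in 2021, plus the first day of 2022
month_starts <- seq(as.Date("2021-01-01"), by = "month", length.out = 13)

# Days in each month of 2021: gaps between successive month starts
days_in_month <- as.numeric(diff(month_starts))
days_in_month

# Days in 2020 (a leap year) and in 2021 (a common year)
days_2020 <- as.numeric(as.Date("2021-01-01") - as.Date("2020-01-01"))
days_2021 <- as.numeric(as.Date("2022-01-01") - as.Date("2021-01-01"))
c(days_2020, days_2021)
```

The month lengths range from 28 to 31 days, and the two years differ by one day.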

8.1.1 The Time Line

In R there are several different ways to solve the dilemmas posed by our measures of dates and times, with different assumptions and constraints. The simplest of these is the Date class (there are also two date-time classes, POSIXct and POSIXlt). The Date class translates calendar dates to a time line of integers, where 0 is “1970-01-01”, 1 is “1970-01-02”, -1 is “1969-12-31”, etc. The fundamental unit is one day.

In the following example, we take a date given as a character string and convert it to numeric form. Numeric values with class Date print in a human-readable format. If we coerce a numeric date to a plain numeric class, we can see the underlying number.

x <- "1970-01-01"
y <- as.Date(x)
print(y)
[1] "1970-01-01"
class(y)
[1] "Date"
as.numeric(y)
[1] 0

Today’s date is

Sys.Date()
[1] "2021-06-02"
as.numeric(Sys.Date())
[1] 18780

In other words, today (when this document was last updated) is 18780 days after 1970-01-01.
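Because dates sit on a number line, ordering them and measuring the distance between them works as it does for any numbers. The following is a minimal sketch with two arbitrary dates, a and b, chosen only for illustration.

```r
a <- as.Date("2020-11-03")
b <- as.Date("2021-06-02")

# Dates before the origin are negative, dates after it are positive
as.numeric(as.Date(c("1969-12-31", "1970-01-02")))

# Ordering: comparison operators and sort() work on Dates
a < b
sorted <- sort(c(b, a))
sorted

# Distance between the two points, |b - a|, in days
distance <- abs(a - b)
distance
```

The distance is the same whichever date is subtracted from the other, because abs() drops the sign.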

8.1.2 Date Formats

When converting labeled dates to numeric dates, an initial problem is the huge variety of ways in which we record dates as character strings. You might encounter “2020-11-03” (international standard), “11/03/2020” (a typical American representation), or even “November 3, 2020” (another typical American representation), all of which label the same point on the calendar.

The international standard is the R default, so it needs no special handling. Typical American date representations require you to specify a format to make the conversion to a Date.

In this context, a format is a character string that specifies the template for reading dates.

as.Date("2020-11-03") # default format, %Y-%m-%d
[1] "2020-11-03"
as.Date("11/03/2020", format="%m/%d/%Y")
[1] "2020-11-03"
as.Date("November 3, 2020", format="%B %e, %Y")
[1] "2020-11-03"

Notice that the separators - dashes (-), slashes (/), spaces, and commas - are included when specifying the format. In the last example, %B is a complete month name (in the language of your current locale), followed by a space, then %e, the day of the month (written with a leading space when it has a single digit), then a comma and a space, and finally %Y, a four-digit year.

See help(as.Date) and help(strptime) for extensive details.
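A few more formats can be sketched the same way. The input strings below are made up for illustration. When a string does not match the format, or names an impossible date, as.Date() returns NA rather than raising an error, so it is worth checking for missing values after a conversion.

```r
# Day first, with dots as separators
d1 <- as.Date("03.11.2020", format = "%d.%m.%Y")

# Two-digit year: %y rather than %Y
d2 <- as.Date("11/03/20", format = "%m/%d/%y")

# The format does not match the string: the result is NA
d3 <- as.Date("11/03/2020", format = "%Y-%m-%d")

# The format matches, but there is no month 13: also NA
d4 <- as.Date("13/03/2020", format = "%m/%d/%Y")

c(d1, d2, d3, d4)
```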

8.2 Extracting Date Categories

The same formats are used when we want to extract category labels - months or years - from a Date. We use the strftime() function to convert from a numeric Date to a category label.

In this example we extract the year part of several dates.

dates  <- c("04/10/1964", "06/18/1965", "09/21/1966")
ndates <- as.Date(dates, format="%m/%d/%Y")
strftime(ndates, format="%Y")
[1] "1964" "1965" "1966"

Notice that these are returned as character values!

There are several ways we might label months: with a full name, with an abbreviated name, or with a numeral. Each of these has its own format code.

strftime(ndates, format="%b")
[1] "Apr" "Jun" "Sep"
strftime(ndates, format="%m")
[1] "04" "06" "09"

Again, the result is a vector of character values.
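If you need to compute with the extracted categories, one option is to coerce the character values to integers. This is a small sketch reusing the dates from above; the names yr and mon are just illustrative. It also shows that format() gives the same labels as strftime() for Date values.

```r
dates  <- c("04/10/1964", "06/18/1965", "09/21/1966")
ndates <- as.Date(dates, format = "%m/%d/%Y")

# Character labels coerced to integers
yr  <- as.integer(strftime(ndates, format = "%Y"))
mon <- as.integer(strftime(ndates, format = "%m"))
yr
mon

# Arithmetic now works, e.g. the following calendar year
yr + 1L

# format() produces the same character labels as strftime()
identical(format(ndates, format = "%Y"), strftime(ndates, format = "%Y"))
```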

8.3 Elapsed Time

Storing dates as numeric values makes it easy to compute elapsed times: you just subtract one date from another. The difference is the number of days that have passed.

How many days have passed since January 1, 2000?

daysgoneby <- Sys.Date() - as.Date("2000-01-01") 
daysgoneby
Time difference of 7823 days

The result is numeric data, but with a new class, difftime. The units in which a difftime can be printed stop at weeks, because larger units (months and years) vary in length. If you prefer elapsed time in years, you may choose to use a numeric approximation.

class(daysgoneby)
[1] "difftime"
yearsold <- as.numeric(daysgoneby/365.25)
yearsold
[1] 21.41821

Notice that with this approximation, no two dates are ever exactly one year apart!
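The following sketch uses fixed dates (so the results do not depend on when it is run) to show how a difftime can be reported in weeks, and why dividing by 365.25 never gives exactly one year. The variable names are illustrative.

```r
start <- as.Date("2000-01-01")
end   <- as.Date("2021-06-02")

elapsed <- end - start
units(elapsed)                         # subtraction of Dates gives days

# The same elapsed time expressed in weeks, two ways
as.numeric(elapsed, units = "weeks")
difftime(end, start, units = "weeks")

# One calendar year is either 366 or 365 days, never 365.25
leap_year   <- as.Date("2021-01-01") - as.Date("2020-01-01")
common_year <- as.Date("2022-01-01") - as.Date("2021-01-01")
leap_approx   <- as.numeric(leap_year) / 365.25    # slightly more than 1
common_approx <- as.numeric(common_year) / 365.25  # slightly less than 1
c(leap_approx, common_approx)
```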

8.4 Incrementing Dates

Another limitation of the Date class is that incrementing or decrementing by units other than days is awkward - again the ambiguity of months and years is an obstacle.

Suppose we wanted to increment some dates by one month. We could try

dates <- as.Date(c("2004-02-10", "2005-06-18", "2007-07-21"))
dates + 30
[1] "2004-03-11" "2005-07-18" "2007-08-20"

The first and third values here are probably not what we had in mind!

We usually think of retaining the same date, but incrementing the month category. This can be accomplished using base R functions, but is much more easily handled by using the lubridate package.

library(lubridate)

Attaching package: 'lubridate'
The following objects are masked from 'package:base':

    date, intersect, setdiff, union
dates + months(1)
[1] "2004-03-10" "2005-07-18" "2007-08-21"
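For comparison, here is one possible base R approach, offered as a sketch rather than the recommended method. It converts the dates to the POSIXlt class, which stores the month as a separate component (mon, counted from 0), adds one to that component, and converts back to Date.

```r
dates <- as.Date(c("2004-02-10", "2005-06-18", "2007-07-21"))

# POSIXlt stores year, month, and day of month as separate components
lt <- as.POSIXlt(dates)

# Add one to the month component, leaving the day of the month alone
lt$mon <- lt$mon + 1

# Converting back to Date normalizes the components
next_month <- as.Date(lt)
next_month
```

For these dates the result matches the lubridate version. Dates at the end of a month need extra care with this approach, as with any month arithmetic.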

8.5 Dates Exercises

  1. Reading dates of another format:

    Other software uses other conventions for labeling date values. SAS and Stata both print dates as “10apr2004” by default.

    Convert the following SAS/Stata dates to R Dates:

    10apr2004
    18jun2005
    21sep2006
    12jan2007
  2. Average and standard deviation of dates:

    Using the dates from problem 1, calculate an average date. What class is the returned value?

    Calculate the standard deviation. What class is this? Why should the mean and standard deviation return values of different classes?

  3. Dates from date components:

    Occasionally you will work with data where the month, day, and year components of dates are stored as separate variables. To convert these to Dates, first paste them together.

    df <- data.frame(day = c(10, 18, 22),
                     month = c(4, 6, 9),
                     year  = c(2004, 2005, 2006))
  4. Selecting data based on a date cutoff:

    Given the following data frame, which has a single variable V1 containing dates, create an indicator showing which observations occur on or after 31 July. How many of these observations are there?

    set.seed(112)
    x <- as.Date(sample(as.Date("2019-01-01"):as.Date("2019-12-31"), 
                        10), 
                 origin="1970-01-01")
    
    df <- read.table(text=paste(x, collapse="\n"))
  5. Creating dates from integers

    In the previous example the sample() function draws 10 random integers. The result is converted to dates by as.Date(). For this type of coercion, R requires you to specify a date to serve as the origin for the incoming numbers. Most often this will be the same as the origin for Date values, “1970-01-01”.

    Convert the integers 0:5 to R dates, assuming the usual R origin.

    Other software uses other origins for its timelines: date values in SAS and Stata use 01jan1960 as their origin.

    Now assume the integers 0:5 are Stata date values. Convert these to R dates - what values do they take?

  6. Extract day of the week

    Going back to the data frame in exercise 3, create a new vector with the day of the week (Sunday through Saturday) of each observation.

  7. Date sequences

    For incrementing a date by one step, functions from lubridate are the easiest to work with.

    For creating a longer sequence of dates, the base R seq() function is also useful.

    Lubridate’s “x + months()” form works well to increment dates early in months, and is easy to remember. Try

    as.Date("2020-01-05") + months(1:5)

    but observe the result when you try

    as.Date("2020-01-31") + months(1:5)

    Now look at how the %m+% operator handles these situations.

    as.Date("2020-01-05") %m+% months(1:5)
    as.Date("2020-01-31") %m+% months(1:5)

    Base R also has a seq() method for creating sequences of dates. Notice two differences. First, you get back the initial date as part of your result. Second, dates at the end of months can roll over into the next month.

    seq(as.Date("2020-01-05"), by="months", length.out=6)
    seq(as.Date("2020-01-31"), by="months", length.out=6)

    seq() is also awkward when working with vectors of initial dates!

  8. Date differences in years

    Create a sequence of anniversaries, from 1993-05-22 to the present. Hint: this is probably easiest with the seq() function! Use the diff function to calculate the length of time between successive anniversaries.

    If we convert difftimes to units of years by dividing by 365.25, which differences are exactly one year?