6 Logical

In R, logical values are stored as a distinct data type. Logical values are very efficient to store, and are used both in statistical modeling and in data management. In modeling, logical vectors are often called indicator or dummy variables. In data management, logical values also serve as conditional indicators.

6.1 Logical Values

There are three logical values:

  • TRUE
  • FALSE
  • NA (“missing”, or “unknown value”)

The names T and F are ordinary variables that R predefines with the values TRUE and FALSE. Unlike the reserved words TRUE and FALSE, T and F can be redefined. Do not do this accidentally! By default, the names T and F print the values TRUE and FALSE.

T
[1] TRUE
F
[1] FALSE

See help(logical) and help(NA).
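
As a minimal sketch of why redefining T is risky, the following reassigns T and then removes the reassigned copy so that the default becomes visible again. The value 0 and the name t_masked are only illustrations.

```r
T <- 0            # legal: T is an ordinary variable name, not a reserved word
isTRUE(T)         # FALSE: T no longer holds TRUE
t_masked <- T     # keep a copy of the masking value for inspection
rm(T)             # remove our copy; the default T is visible again
isTRUE(T)         # TRUE
```

Writing TRUE and FALSE in full avoids this problem entirely.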

6.2 Logical Operators

R has the typical binary comparison operators (see help(Comparison)). These take data of arbitrary type as inputs (x and y) and return logical values.

  • Greater than, x > y, or equal to, x >= y
  • Less than, x < y, or equal to, x <= y
  • Equal to, x == y
  • Not equal, x != y

R also has the typical boolean operators for creating compound logical expressions. These take logical values as inputs (x and y) and return logical values.

  • And, x & y
  • Or, x | y
  • Not, !x

6.2.1 Making Comparisons

As with the mathematical operators, logical operators work pairwise with the elements of two vectors, returning a vector of comparisons. Where one vector is shorter than the other, recycling occurs.

In this first example, we set up an arbitrary numeric vector, \(a\), and ask if each element of \(a\) is a 3.

a <- c(1.1, 3, 5.3, 2)  # a numeric vector

f <- (a == 3)           # a vector of comparisons
f
[1] FALSE  TRUE FALSE FALSE

We can make comparisons with character values, too, but be aware that the result depends on the collating sequence of your locale, that is, on what language R thinks you are working in!

A <- c("a", "b", "e")
A > "d"
[1] FALSE FALSE  TRUE
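
A small sketch of the locale dependence, assuming your system lets R switch the collation setting: in the "C" locale, strings are compared by character code, so uppercase letters sort before lowercase ones. The names old and c_result are only illustrations.

```r
old <- Sys.getlocale("LC_COLLATE")           # remember the current collation rules
invisible(Sys.setlocale("LC_COLLATE", "C"))  # "C" compares by character code
c_result <- "B" < "a"                        # TRUE here: "B" has a lower code than "a"
invisible(Sys.setlocale("LC_COLLATE", old))  # restore the original setting
c_result
"B" < "a"   # in your own locale this may well be FALSE
```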

Comparison of two vectors is done element by element. (The term “vectorized” is sometimes used to mean this sort of operation.) In this example, each element of \(a\) is compared to the corresponding integer from 1 to 4.

a > 1:4
[1]  TRUE  TRUE  TRUE FALSE

If the two vectors being compared are of different lengths, recycling occurs just as we saw with numeric operators.

b <- c(1.1, 2)
a != b                  # silent recycling
[1] FALSE  TRUE  TRUE FALSE
a > 2:4                 # noisy recycling
Warning in a > 2:4: longer object length is not a multiple of shorter object length
[1] FALSE FALSE  TRUE FALSE

A somewhat different kind of comparison is the value match. Here we ask if values in the left-hand vector are elements of the set represented by the right-hand vector. Despite the use of two vectors, these are no longer pairwise comparisons and there is no recycling.

2 %in% a
[1] TRUE
1:4 %in% a    # elementwise on the left-hand side
[1] FALSE  TRUE  TRUE FALSE

When the right-hand side is a single value, %in% will often return the same result as ==:

z <- c(0, 1, 2)

z == 1
[1] FALSE  TRUE FALSE
z %in% 1
[1] FALSE  TRUE FALSE

However, if missing data are involved, the two behave differently. Where == returns NA, %in% returns FALSE. When subsetting a vector by a logical condition, be careful which one you use, since they will return different elements. See Missing Values below.

z[3] <- NA

z == 1
[1] FALSE  TRUE    NA
z %in% 1
[1] FALSE  TRUE FALSE
z[z == 1]
[1]  1 NA
z[z %in% 1]
[1] 1

6.2.2 Boolean Algebra

We also have the usual operators (“and”, “or”, “not”) for combining logical inputs to produce a logical outcome.

# &, "and" - satisfy both conditions
(a == 2) & (a < 5) 
[1] FALSE FALSE FALSE  TRUE
# |, "or" - satisfy at least one condition
(a == 2) | (a < 5) 
[1]  TRUE  TRUE FALSE  TRUE
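
The “not” operator can be sketched in the same way. This example redefines the vector a from above so that it runs on its own.

```r
a <- c(1.1, 3, 5.3, 2)   # the same numeric vector used above

# !, "not" - reverse each logical value
!(a == 2)
# [1]  TRUE  TRUE  TRUE FALSE

# "not" combined with "and": not equal to 2, and less than 5
!(a == 2) & (a < 5)
# [1]  TRUE  TRUE FALSE FALSE
```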

6.2.3 Missing Values

The logical status of missing values is treated somewhat differently in R than in some other statistical software (Stata, SAS, SPSS). Where in some languages the result of a comparison is either true or false, in R a comparison may produce a “missing” or “unknown” result.

b <- c(1:4, NA)

b > 3   # in Stata the final value is "true"
[1] FALSE FALSE FALSE  TRUE    NA
b == 3  # in Stata and SAS the final value is "false"
[1] FALSE FALSE  TRUE FALSE    NA
b < 3   # in SAS the final value is "true"
[1]  TRUE  TRUE FALSE FALSE    NA

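Likewise, Boolean operations on missing values usually produce missing results. The exception is where the known value already settles the outcome: NA & FALSE is FALSE, and NA | TRUE is TRUE.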
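
A short sketch of how a single missing value behaves under each Boolean operator:

```r
NA & TRUE    # NA: the result depends on the unknown value
NA & FALSE   # FALSE: "and" with FALSE is FALSE whatever the unknown value is
NA | TRUE    # TRUE: "or" with TRUE is TRUE whatever the unknown value is
NA | FALSE   # NA: the result depends on the unknown value
!NA          # NA
```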

When checking for missing values a common mistake is to use a comparison. However, in R we use a testing function.

b == NA   # not useful, but doesn't produce an error!
[1] NA NA NA NA NA
is.na(b)  # the proper way to check
[1] FALSE FALSE FALSE FALSE  TRUE
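
Building on is.na, here is a brief sketch of two common follow-up tasks: counting the missing values and keeping only the observed ones. It redefines b from above so that it runs on its own.

```r
b <- c(1:4, NA)   # the same vector used above

sum(is.na(b))     # number of missing values
# [1] 1
b[!is.na(b)]      # keep only the observed values
# [1] 1 2 3 4
```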

6.3 Functions with Logical Vectors

A generic function is a function which uses different methods (implements different algorithms) depending on the class and type of the input data. (Recall the discussion in the chapter on Data Class.) A very few generic functions have specific methods for logical vectors, while most functions will coerce logical vectors to either a numeric vector or a factor.

summary(f)       # produces counts, but also notes mode
   Mode   FALSE    TRUE 
logical       3       1 
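
For contrast, here is a sketch of two other base R functions applied to the same vector. Neither table nor which is discussed elsewhere in this chapter; they are included only as illustrations. table counts values much as it would for a factor, while which returns the positions of the TRUE elements.

```r
f <- c(FALSE, TRUE, FALSE, FALSE)   # the comparison vector from above

table(f)    # counts of each value, as for a factor
which(f)    # positions of the TRUE elements
# [1] 2
```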

6.3.1 Coercion

If you have worked with other statistical software, you won’t be surprised that very often logical values are automatically coerced to the integers 0 and 1.

mean(f)          # coerced to numeric, a proportion
[1] 0.25
f + 1            # coercion in binary operators, too
[1] 1 2 1 1

You may also be aware that where numeric values are coerced into logical values, 0 is FALSE and anything else is TRUE (unless it is missing). (Recall Exercise 3 from Data Types.)

as.logical(-1:2)
[1]  TRUE FALSE  TRUE  TRUE
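
A few more coercions, sketched with explicit conversion functions so the intermediate values are visible. Note that missing values stay missing in both directions.

```r
f <- c(FALSE, TRUE, FALSE, FALSE)   # the comparison vector from above

sum(f)                           # counts the TRUE values
# [1] 1
as.integer(c(TRUE, FALSE, NA))   # logical to integer
# [1]  1  0 NA
as.logical(c(0, 0.5, NA))        # any nonzero number is TRUE
# [1] FALSE  TRUE    NA
```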

6.4 Testing Equality

There is one logical comparison that is particularly problematic when made by a computer: equality. Checking the equality of logical values, character values, and integer values is straightforward, but numeric values with decimal precision (stored as “doubles”) are often imprecise. Think of the decimal representation of \(1/3\), 0.3333…, which must be truncated at some point (say, at 0.3333333) because the digits cannot continue forever. (A computer works with binary representations, but the problem is conceptually the same.)

Even simple mathematical operations can introduce numerical deviations.

a <- 0.5 - 0.2  # 0.3
b <- 0.4 - 0.1  # 0.3

a == b          # Probably not what you expected!
[1] FALSE
a - b           # a small difference, but not exactly zero
[1] -5.551115e-17
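
One way to see the deviation directly is to print more digits than the default of 7. This is only a sketch; the exact digits shown may vary with platform.

```r
a <- 0.5 - 0.2
b <- 0.4 - 0.1

print(c(a, b), digits = 17)   # both values print as 0.3 at the default precision
abs(a - b)                    # the size of the deviation, ignoring its sign
```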

We have two general approaches for handling this imprecision in comparisons of numeric vectors. In the special case where we want to know whether all elements of two vectors are equivalent, we have a summary function, all.equal. In the more general case, we test that the differences between two vectors are less than a numerical tolerance.

# An example with vectors
x <- seq(0, 0.5, by = 0.1)
y <- seq(0.1, 0.6, by=0.1)-0.1

x == y          # Not what you hoped for?  (the third element ....)
[1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
x - y
[1]  0.000000e+00  0.000000e+00 -2.775558e-17  0.000000e+00  0.000000e+00  0.000000e+00
all.equal(x,y)  # checking all are equal
[1] TRUE

The relative precision of doubles on your computer (the “machine epsilon”, documented as the smallest positive number x such that 1 + x != 1) is given by

.Machine$double.eps
[1] 2.220446e-16
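
A quick sketch of what this value means in practice, assuming standard double-precision arithmetic: adding it to 1 changes the result, while adding half of it does not.

```r
eps <- .Machine$double.eps

1 + eps == 1       # FALSE: adding eps to 1 changes the value
1 + eps / 2 == 1   # TRUE: half of eps is lost to rounding
```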

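We commonly take our maximum imprecision to be the square root of that value. If we check that the absolute differences are smaller than this tolerance, we get the result we expected earlier with ==. (We take the absolute value because a large negative difference would otherwise pass the test.)

tol <- sqrt(.Machine$double.eps)

abs(x - y) < tol
[1] TRUE TRUE TRUE TRUE TRUE TRUE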
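
If you make this kind of comparison often, it can be wrapped in a small function. The name near_equal is hypothetical, not part of base R. The sketch compares the absolute difference to the tolerance so that deviations in either direction are treated alike.

```r
# Hypothetical helper: TRUE where x and y differ by less than tol
near_equal <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) < tol
}

x <- seq(0, 0.5, by = 0.1)
y <- seq(0.1, 0.6, by = 0.1) - 0.1

near_equal(x, y)     # all TRUE: the tiny deviation is ignored
near_equal(0.5, 1)   # FALSE: a real difference is caught
near_equal(1, 0.5)   # FALSE: in either direction
```

Like ==, this returns one result per element, so recycling and missing values behave as described earlier.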

6.5 Logical Vector Exercises

  1. Indicators

    A typical use of a comparison is to create an indicator variable. Given the mean gas mileage of cars in the mtcars data, 20.090625, create a variable that indicates which cars have above average gas mileage.

    The mileage variable is mtcars$mpg.

  2. Conditions

    A logical vector may be used as a condition to select observations from another vector (as discussed in Numeric Vectors).

    Use the indicator from exercise 1 to select high mileage cars, and calculate their mean displacement, found in mtcars$disp.

    Use the same indicator and a Boolean operator to calculate the mean displacement of low mileage cars.

  3. Coercion

    Automatic coercion of logical to numeric values and of numeric to logical values will usually be very intuitive. One place this fails spectacularly is in indexing (extracting, subsetting).

    Consider this example, which at first blush might look like it should produce the same results in two different ways.

    v <- 1:4
    v[c(T,F,T,F)]
    v[c(1,0,1,0)]

    Why do these return two different vectors?

  4. Testing equality

    What happens when values really are not equal? Consider

    x <- seq(0, 0.5, by = 0.1)
    y <- c(seq(0.1, 0.5, by=0.1)-0.1, 1)
    
    x == y

    We see two FALSE values - do they mean the same thing? How can we get an unambiguous result?

    Suppose you want one value to summarize the equality of these two vectors. Try

    all.equal(x,y)

    The result is not a logical value! See help(isTRUE) and come up with a solution that is strictly TRUE or FALSE.