2 Data Types

Data values in R come in several different types. We can begin by considering three fundamental types of data (later we’ll add more):

  • numeric values (5, 3.14)
  • character values ("abc", "Wisconsin")
  • logical values (TRUE, FALSE)

The distinction is fundamental because many operators (+, &) and functions work only with specific types of data. When you are creating or debugging an R script, getting the data type right will be a common theme.

As a very simple example, we can add numbers, but not character values.

5 + 3.14
[1] 8.14
"abc" + "Wisconsin"
Error in "abc" + "Wisconsin": non-numeric argument to binary operator

Similarly, we can use the “and” operator (&) with logical values, but not character values.

TRUE & FALSE
[1] FALSE
"abc" & "Wisconsin"
Error in "abc" & "Wisconsin": operations are possible only for numeric, logical or complex types
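Functions are type-specific in the same way as operators. As a small sketch (paste() and sqrt() are base R functions not otherwise used in this chapter, and tryCatch() is used here only to capture the error message), sqrt() accepts a number but not a character value:

```r
# paste() joins character values into one string
joined <- paste("abc", "Wisconsin")
joined
# [1] "abc Wisconsin"

# sqrt() expects a numeric value
root <- sqrt(16)
root
# [1] 4

# A character argument produces an error; tryCatch() captures the
# message so that the rest of the script keeps running
msg <- tryCatch(sqrt("abc"), error = function(e) conditionMessage(e))
msg
# [1] "non-numeric argument to mathematical function"
```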

2.1 Dynamic Typing

In R, the type of a data object is not fixed: it can change at any point, which is what we mean by dynamic typing. The process of changing the data type is called coercion. Coercion may occur in many different contexts.
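Two such contexts can be sketched briefly. Coercion can be requested explicitly with one of the as.*() functions, or it can happen implicitly, for example when c() combines values of different types into a single vector. The variable names below are chosen only for illustration.

```r
# Explicit coercion: ask for a specific type
as.character(3.14)
# [1] "3.14"

# Implicit coercion: c() always returns a vector of a single type
num_lgl <- c(5, TRUE)       # the logical value becomes numeric
num_lgl
# [1] 5 1
num_chr <- c(5, "abc")      # the numeric value becomes character
num_chr
# [1] "5"   "abc"
lgl_chr <- c(TRUE, "abc")   # the logical value becomes character
lgl_chr
# [1] "TRUE" "abc"
```

In each implicit case the result takes the more general type: logical gives way to numeric, and both give way to character.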

2.1.1 Replacing Values in a Vector

Suppose we have a numeric vector x, and we replace the first element of x with a character value. Then all the values in x are coerced to the character type.

x <- sample(1:5, 5)
x
[1] 5 4 1 3 2
x[1] <- "abc" # replace the first value
x
[1] "abc" "4"   "1"   "3"   "2"  
x[4] + x[5]  # now add the last two elements of x
Error in x[4] + x[5]: non-numeric argument to binary operator

Notice that there is no message of any kind that the type of x has changed. Data coercion is a routine part of R processing. This is convenient when it works as intended, but a silent change of type can be difficult to track down when something breaks later.

You can tell that x has become a character vector both by the quotes around the printed values, and by the error message when we try to add two elements.

We also have a variety of functions that test or report on the type of a data object. See help(is.numeric).

is.numeric(x)
[1] FALSE
mode(x)
[1] "character"
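Other base R functions report on type at different levels of detail. The sketch below rebuilds the coerced vector under the illustrative name x_chr, compares class(), typeof() and mode(), and shows one possible way (not the only one) to make an unexpected type fail early with stopifnot().

```r
x_chr <- c("abc", "4", "1", "3", "2")  # same values as x after the replacement
is.character(x_chr)
# [1] TRUE
class(x_chr)
# [1] "character"

# For numbers, typeof() distinguishes storage types that mode() lumps together
mode(1:5)
# [1] "numeric"
typeof(1:5)
# [1] "integer"
typeof(3.14)
# [1] "double"

# One way to catch an unexpected type early: stopifnot() raises an error
# as soon as the check fails. tryCatch() captures the message here so
# that the example keeps running.
check <- tryCatch(stopifnot(is.numeric(x_chr)),
                  error = function(e) conditionMessage(e))
check
# [1] "is.numeric(x_chr) is not TRUE"
```

Placing such a check directly before a calculation makes the script stop at the point where the type is wrong, rather than at some later step.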

2.2 Exercises

  1. We have seen a numeric-to-character coercion. What happens when we try to go the other way, character-to-numeric? Try out

    • an integer coercion, e.g. as.numeric("8"). The quotes make the initial value a character type, which you can check with is.character("8").
    • a decimal coercion, from value "2.7".
    • a negative number.
    • a number with extra white space around it, e.g. " 2.7 ".
    • a number written with a comma, e.g. "5,432".
    • a non-numeric character, such as "B".

    Notice that some examples give you both a warning and an answer!

  2. Logical-to-numeric coercion: try coercing these values.

    • TRUE (no quotes here!)
    • FALSE
    • NA

    How and why is the result different with "TRUE"?

  3. Numeric-to-logical coercions (as.logical)

    • 1
    • 2
    • 2.14
    • -2.14
    • 0

    What conclusion do you draw?

  4. Character-to-logical coercions. Are you bored yet? This is like practicing scales on a piano! Try these:

    • "TRUE" (quotes!)
    • "F". If not quoted, is this a logical value?
    • "true". If not quoted, is this a logical value?
    • "FAlse" (mixed case!)
    • "NA"
    • "1"
    • "green"

  5. Coercion sequences.

    • Coerce the numeric value 3.14 to character, and then to logical, and then to numeric. What value do you end up with?
    • What values can be recovered through this sequence of coercions? How many such values are there? Does the order of coercion matter?