# 3 Writing Functions - Basics

Eventually you will find that you want to write your own functions, and R is
designed to make this very easy. For many of us this first comes up when we
have several steps we want to do in sequence (an algorithm) using one of the
`apply` functions. These steps might be a sequence of R expressions, or they might even
be a nested sequence of functions that we want to refer to by a simple name.

As an example, consider a function to count the missing values (`NA`s) in a vector.
We might apply this to the columns of a data frame, to the rows, or within groups
specified by the values of some other vector.

```
# setup, a matrix with about 20% missing data
set.seed(20141117)
dm <- matrix(sample(0:9, 100, replace=TRUE), ncol=10)
dm[dm<2] <- NA
dm
```

```
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    4    2    9    5    6   NA    9    5    5     2
 [2,]    2    6    2    6    3    7    7    9    9     4
 [3,]   NA    4    7    4    4    7    9   NA    5     8
 [4,]   NA   NA    8    2    9    8    9   NA    5     7
 [5,]    6    3   NA    7    3   NA    7    4    7     3
 [6,]   NA    5    3    3    9   NA    5   NA    6    NA
 [7,]    3    8    3   NA    6   NA    7   NA   NA     3
 [8,]    7    7   NA    4    2    9    8   NA    8     9
 [9,]    8   NA    6    5    3    6    8   NA   NA    NA
[10,]   NA    3    9    5    3    2    9    5    5     6
```

We can count the total number of missing values in our matrix:

`sum(is.na(dm))`

`[1] 23`

But we have a problem using this code when we try to count NAs within each column:

`apply(dm, 2, sum(is.na))`

`Error in sum(is.na): invalid 'type' (builtin) of argument`

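The problem is that `sum(is.na)` is evaluated immediately, so R tries to sum the
function `is.na` itself instead of handing `apply` something it can call on each
column. What we need is a function to use with `apply`.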

## 3.1 Defining a New Function

Defining a function is pretty simple, really. Typically a function has a name, an argument list (parameters, or "formals"), a body (the expressions that act on the arguments), and a return value.

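```
name <- function(arg1, arg2, ...) {
  # body: one or more expressions that act on the arguments
  tmp <- arg1 * 2       # e.g. an intermediate step
  value <- tmp + arg2   # compute the value to hand back
  return(value)         # the return value
}
```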

(A good place to look for basic documentation on writing functions is in
*An Introduction to R*, "Chapter 10: Writing your own functions". See also
`help("function")`.)

In our example, `sum(is.na())` will be our expression, or *body*. As our *argument*
we will use `v` to stand for an arbitrary data object,
and we want to *return* a scalar count that we'll call `rv`.
We'll give this function the name `nmiss`.

```
nmiss <- function( v) {
rv <- sum(is.na(v))
return(rv)
}
# Then use the function with the matrix
nmiss(dm)
```

`[1] 23`

```
# and use the function with apply()
apply(dm,1,nmiss) # missing per row
```

` [1] 1 0 2 3 2 4 4 2 4 1`

We can store these results in another object in the usual way:

```
dm.missing <- nmiss(dm)
dm.missing
```

`[1] 23`
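The same `nmiss` function can be handed to other members of the apply family, covering the other uses mentioned at the start: the columns of a data frame, and groups defined by another vector. A minimal sketch, assuming a small made-up data frame `df` (the columns `g`, `x` and `y` are hypothetical), using base R's `sapply()` and `tapply()`:

```
# nmiss as defined above: count the NAs in a vector
nmiss <- function(v) sum(is.na(v))

# A small made-up data frame (hypothetical data, for illustration only)
df <- data.frame(g = c("a", "a", "b", "b", "b"),
                 x = c(1, NA, 3, NA, NA),
                 y = c(NA, 2, 3, 4, 5))

# Missing values in each column of the data frame
sapply(df, nmiss)          # g 0, x 3, y 1

# Missing values of x within each group defined by g
tapply(df$x, df$g, nmiss)  # a 1, b 2
```

Because `nmiss` takes a single vector and returns a single count, it fits any of these tools without modification.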

First, notice that a new object, `nmiss`, has been added to our
workspace, the global environment.

Notice also that the objects `v` and `rv` (two arbitrary names
for data objects, i.e. two placeholders in our definition)
do *not* appear in our workspace.
Think of them as *local* to the `nmiss` function: they exist only in
the temporary *environment* that R creates each time `nmiss` is called.
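One way to see this local environment is to ask a function what it can see from the inside. The sketch below uses a hypothetical variant, `nmiss_local`, that calls base R's `ls()` inside the body, and then `exists()` to check that `rv` did not leak into the global environment:

```
# A hypothetical variant of nmiss that lists the objects in its local environment
nmiss_local <- function(v) {
  rv <- sum(is.na(v))
  print(ls())   # objects in this call's environment: "rv" "v"
  return(rv)
}

nmiss_local(c(1, NA, 3))   # prints "rv" "v", then returns 1

# rv existed only during the call; it is not in the global environment
# (this is FALSE unless you have created an rv of your own)
exists("rv", envir = globalenv(), inherits = FALSE)
```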

As objects in our workspace, functions have class, and they can be printed, which shows us the details of how they were defined. You can do this with any function, not just those you define yourself!

`class(nmiss) # functions have class`

`[1] "function"`

`nmiss # as an "object" it can be printed`

```
function( v) {
rv <- sum(is.na(v))
return(rv)
}
<bytecode: 0x00000000148701c0>
```
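The parts of a function described earlier (argument list, body, and the environment it was defined in) can also be pulled out one at a time with the base R functions `formals()`, `body()` and `environment()`. A short sketch, redefining `nmiss` so the block stands alone:

```
nmiss <- function(v) {
  rv <- sum(is.na(v))
  return(rv)
}

formals(nmiss)       # the argument list ("formals"): a single argument, v
body(nmiss)          # the body: the braced expressions
environment(nmiss)   # <environment: R_GlobalEnv> when defined at the console
```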

### 3.1.1 More About Returns

A function returns either the value given to `return()` or, if `return()`
is never reached, the value of the last expression evaluated.

For example, this is a common way of specifying `rv` as the returned object:

```
nmiss <- function( v) {
rv <- sum(is.na(v))
rv
#return(rv)
}
nmiss(dm)
```

`[1] 23`

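This does *not* work as you might hope: calling it at the console prints
nothing (not even an error!):

```
nmiss <- function( v) {
rv <- sum(is.na(v))
}
nmiss(dm)
```

The last expression evaluated was an assignment, `<-`, a function used for its
side effect. Its value is returned *invisibly*, so nothing is printed, although
`x <- nmiss(dm)` or `print(nmiss(dm))` would still give 23.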
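To check what a function ending in an assignment actually hands back, base R's `withVisible()` reports both the value and whether it would be printed. A small sketch with a hypothetical `nmiss_assign` and a made-up vector `x`:

```
# Hypothetical version of nmiss whose last expression is an assignment
nmiss_assign <- function(v) {
  rv <- sum(is.na(v))
}

x <- c(1, NA, NA)
nmiss_assign(x)                        # prints nothing at the console
res <- nmiss_assign(x)                 # ...but the value is still returned
res                                    # [1] 2
withVisible(nmiss_assign(x))$visible   # FALSE: the value is returned invisibly
```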

But the following example *does* return what we want. We don't need to
name the object we want to return; the value of the last expression
evaluated is returned automatically.

```
nmiss <- function( v) {
sum(is.na(v))
}
nmiss(dm)
```

`[1] 23`

### 3.1.2 One-liners and Anonymous Functions

Notice the last example could be written on one line:

```
nmiss <- function( v) { sum(is.na(v))}
nmiss(dm)
```

`[1] 23`

And because the body is a single expression, we don't need the braces at all (braces group several expressions into one compound expression):

```
nmiss <- function( v) sum(is.na(v))
nmiss(dm)
```

`[1] 23`

It is not uncommon to see simple functions both defined and used within the same expression:

`apply(dm, 1, function( v) sum(is.na(v))) # number of NAs per row`

` [1] 1 0 2 3 2 4 4 2 4 1`

A function used this way, defined on the spot and never given a name, is called an "anonymous function".

Here is another anonymous function (as an exercise, trace the order in which the various objects and functions are evaluated):

`(function( v) sum(is.na(v)))(dm)`

`[1] 23`
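Anonymous functions also solve the problem we started with, counting `NA`s within each column. A sketch on a small made-up matrix `m` (hypothetical data); note that the `\(v)` shorthand for `function(v)` assumes R 4.1.0 or later, and that base R's `colSums()` gives the same counts without any user-defined function:

```
# A small made-up matrix with some missing values
m <- matrix(c(1, NA, 3,
              NA, NA, 6), nrow = 2, byrow = TRUE)

apply(m, 2, function(v) sum(is.na(v)))  # NAs per column: 1 2 0
apply(m, 2, \(v) sum(is.na(v)))         # same, with the R >= 4.1 shorthand
colSums(is.na(m))                       # vectorized alternative: 1 2 0
```

The `colSums()` version works on the whole logical matrix at once, which is usually faster than calling a function once per column.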

### 3.1.3 Style

Arguably, skipping the `return` makes one-liners and anonymous functions easier to
read and debug. But as you move on to more complicated functions, especially those that may
conditionally return different sorts of values, you will find them easier to read and debug
if you **do** include explicit `return` expressions.
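As an illustration of where explicit `return()` calls help, here is a hypothetical `prop_missing()` (a sketch, not a function from any package) that returns early for empty input and otherwise returns the proportion of `NA`s:

```
# Hypothetical helper: proportion of missing values in a vector
prop_missing <- function(v) {
  if (length(v) == 0) {
    return(NA_real_)        # nothing to summarise: return early
  }
  return(mean(is.na(v)))    # proportion of NAs
}

prop_missing(c(1, NA, 3, NA))  # [1] 0.5
prop_missing(numeric(0))       # [1] NA
```

Each exit point is marked, so a reader can see at a glance every value the function can produce.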

## 3.2 Exercises

- In the Motor Trend car tests data, `mtcars`, vehicle weight, `wt`, is reported in thousands of pounds. Write a function that converts thousands of pounds to kilograms. *Bonus*: show (by calculation) that this leaves the correlation between `wt` and `mpg` unaffected.

- The same data set reports fuel consumption in miles per gallon. Write a function that converts this to kilometers per liter. *Bonus*: show that this conversion still leaves the correlation unaffected.

- R does not have a standard-error-of-the-mean function. Write one, then produce a table of means and standard errors for the variables in the `mtcars` data.

- When working with time-series data it is often useful to identify gaps in the series. Write a function that identifies gaps by indicating observations preceded by a gap. For example:

```
 1990  1991  1992  1993  1994  1997  1998  1999  2000
   NA FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
```

*Bonus*: write a function that fills out the gap.

- In survey data, it is common for data to be coded with 8 = don't know and 9 =
refused to answer. We need to convert these to `NA`s for most statistical work.
Write a function that takes a vector or matrix as input, and returns the recoded vector/matrix. For example:

```
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    4    2    9    5    6   NA    9    5    5     2
[2,]    2    6    2    6    3    7    7    9    9     4
[3,]   NA    4    7    4    4    7    9   NA    5     8
[4,]   NA   NA    8    2    9    8    9   NA    5     7
[5,]    6    3   NA    7    3   NA    7    4    7     3
[6,]   NA    5    3    3    9   NA    5   NA    6    NA
```

```
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    4    2   NA    5    6   NA   NA    5    5     2
[2,]    2    6    2    6    3    7    7   NA   NA     4
[3,]   NA    4    7    4    4    7   NA   NA    5    NA
[4,]   NA   NA   NA    2   NA   NA   NA   NA    5     7
[5,]    6    3   NA    7    3   NA    7    4    7     3
[6,]   NA    5    3    3   NA   NA    5   NA    6    NA
```

*Note*: This is a simplification of a couple of existing R functions. Functions such
as the one you are asked to produce are often termed "convenience" functions or
"wrappers". Depending on how you solve this problem, the function you simplify may
itself be a convenience wrapper; take a look at the code inside the function you use!

- Average compounded growth. Given two vectors, one of starting values (say, starting salary) and another of ending values (current salary), we often want to characterize the growth rate that led from one to the other. If we additionally know how many growth periods there were between each pair of values, we can calculate an average growth rate that takes compounding into account.

Write a function that returns the average growth rate as a fraction. In other words, if \(y\) is the ending value, \(x\) the starting value, \(t\) the number of time periods, and \(r\) the growth multiplier, return \(r - 1\). The fundamental relation is \[y = x\,r^{t}\]

- Write a function returning the "degree of consensus", defined as \[1 - \frac{\text{variance of the responses}}{\text{maximum possible variance}}\] where the respondents have given answers on some bounded scale (e.g. a Likert scale from 1 to 5). Total consensus = 1; maximum disagreement = 0.