my_list <- list(
  a = c(1, 2, 3),
  b = matrix(1:6, nrow = 2),
  c = data.frame(x = 1:3, y = c("a", "b", "c")),
  d = c(4, NA, 6),
  e = list("foo", "bar", "baz")
)
Introduction
In this post I will walk through the R functions apply(), lapply(), sapply(), tapply(), and vapply(), with examples.
These functions all apply a function to a set of data in R, but they differ in the input they accept and the shape of the output they return. None of them treats missing values specially; that is left to the function you apply, and extra arguments such as na.rm = TRUE are passed through to it. By using the right function for your particular problem, you can make your code more concise and easier to read.
Let’s start with the basics.
The Basics
Before we dive into the details of each function, let’s define some terms:
- A vector is a one-dimensional sequence of values that all share one type, such as numbers or strings.
- A matrix is a two-dimensional array of values that all share one type, like a table of numbers.
- A data frame is a two-dimensional object whose columns can each hold a different type of data, like a spreadsheet.
- A list is a collection of objects, which can be of different types, like a shopping bag full of different items.
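As a quick illustration of these four structures, here is a minimal sketch. The object names (v, m, df, l) are made up for this example and are not used elsewhere in the post.

```r
# Hypothetical example objects, one per data structure
v  <- c(1.5, 2, 3)                       # vector: all elements are numeric
m  <- matrix(1:6, nrow = 2)              # matrix: 2 rows, 3 columns
df <- data.frame(id = 1:2, label = c("a", "b"), stringsAsFactors = FALSE)
l  <- list(v, m, df)                     # list: elements of different types

is.atomic(v)   # TRUE: every element has the same type
dim(m)         # 2 3
ncol(df)       # 2
length(l)      # 3
```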
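The five functions we’ll discuss do not all take the same input: lapply(), sapply(), and vapply() take a list or vector, apply() takes a matrix or array, and tapply() takes a vector plus one or more grouping factors. The examples below use the list object my_list defined at the top of this post.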
This list contains five elements:
- A vector of numbers (a)
- A matrix of numbers (b)
- A data frame with two columns (c)
- A vector of numbers with a missing value (d)
- A list of character strings (e)
Now that we have our data, let’s look at each of the functions in turn.
The Functions
apply()
The apply() function applies a function to the rows or columns of a matrix or array. It is most commonly used with matrices, but can also be used with higher-dimensional arrays. The function takes three arguments:
- The matrix or array to apply the function to
- The margin (1 for rows, 2 for columns, or a vector of dimensions)
- The function to apply
Let’s apply the mean() function to the columns of our matrix in my_list$b:
apply(my_list$b, 2, mean)
[1] 1.5 3.5 5.5
This returns a vector containing the mean of each column of the matrix.
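To see what the margin argument changes, here is a small sketch that runs the same function over rows instead of columns. The matrix is rebuilt under the made-up name b so the snippet runs on its own; the comparison with colMeans() is only a sanity check, not something the examples above depend on.

```r
# Same matrix as my_list$b, rebuilt so this snippet is self-contained
b <- matrix(1:6, nrow = 2)

row_means <- apply(b, 1, mean)  # margin 1 = rows: 3 4
col_means <- apply(b, 2, mean)  # margin 2 = columns: 1.5 3.5 5.5

# For plain means, base R's colMeans() gives the same numbers
all.equal(col_means, colMeans(b))  # TRUE
```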
lapply()
The lapply() function applies a function to each element of a list and returns a list of the results. It takes two arguments:
- The list to apply the function to
- The function to apply
Let’s apply the class() function to each element of our list:
lapply(my_list, class)
$a
[1] "numeric"
$b
[1] "matrix" "array"
$c
[1] "data.frame"
$d
[1] "numeric"
$e
[1] "list"
This will return a list of the classes of each element.
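lapply() also forwards any extra arguments to the function being applied, which is how missing values are usually dealt with. The sketch below uses a made-up two-element list named vectors (mirroring a and d from my_list) to show this, along with an anonymous function.

```r
# Hypothetical list mirroring the numeric vectors a and d from my_list
vectors <- list(a = c(1, 2, 3), d = c(4, NA, 6))

# Without na.rm, the mean of d is NA because of the missing value
lapply(vectors, mean)

# Extra arguments after the function are passed on to mean()
means <- lapply(vectors, mean, na.rm = TRUE)   # a = 2, d = 5

# An anonymous function works too
doubled <- lapply(vectors, function(x) x * 2)
```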
sapply()
The sapply() function is similar to lapply(), but it simplifies the output to a vector or matrix if possible. It takes the same two arguments as lapply():
- The list to apply the function to
- The function to apply
Let’s apply the length() function to each element of our list using sapply():
sapply(my_list, length)
a b c d e
3 6 2 3 3
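This returns a named vector of lengths, one per element. Note that the length of a matrix is its number of cells (6) and the length of a data frame is its number of columns (2).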
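The "if possible" part is worth seeing in action. In this sketch (the list names are made up for illustration), sapply() returns a matrix when every result has the same length greater than one, and falls back to a list when the lengths differ.

```r
# Hypothetical list mirroring the numeric vectors a and d from my_list
vectors <- list(a = c(1, 2, 3), d = c(4, NA, 6))

# range() returns two values per element, so the result is a 2 x 2 matrix
# with one column per list element
ranges <- sapply(vectors, range, na.rm = TRUE)
is.matrix(ranges)  # TRUE

# Results of different lengths cannot be simplified, so a list comes back
uneven <- sapply(list(p = 1:2, q = 1:3), rev)
is.list(uneven)    # TRUE
```

This changing return type is the main reason to prefer vapply() in code that must behave predictably.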
tapply()
The tapply() function applies a function to subsets of a vector or data frame, grouped by one or more factors. It takes three arguments:
- The vector or data frame to apply the function to
- The factor(s) to group the data by
- The function to apply
Let’s apply the mean() function to the elements of our vector my_list$d, grouped by whether they are missing or not:
tapply(my_list$d, !is.na(my_list$d), mean)
FALSE TRUE
NA 5
This returns one mean per group. The FALSE group holds only the missing value, so its mean is NA; the TRUE group holds the non-missing values 4 and 6, so its mean is 5.
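Grouping by missingness is a slightly unusual use of tapply(); the more typical case groups values by a categorical variable. Here is a minimal sketch with made-up data (scores, group) that is not part of my_list.

```r
# Hypothetical data: six scores split into two groups
scores <- c(10, 20, 30, 40, 50, 60)
group  <- factor(c("x", "y", "x", "y", "x", "y"))

group_means <- tapply(scores, group, mean)   # x = 30, y = 40

# Extra arguments are passed on to the function, as with lapply()
clean <- tapply(c(4, NA, 6, 8), c("g1", "g1", "g2", "g2"), mean, na.rm = TRUE)
# g1 = 4, g2 = 7
```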
vapply()
The vapply() function is similar to sapply(), but requires the user to specify the type and length of each result, which makes it safer and often a little faster. It takes three arguments:
- The list to apply the function to
- The function to apply
- A template value (FUN.VALUE), such as integer(1) or character(2), that sets both the type and the length of each result
Let’s apply the length() function to each element of our list, specifying that the output type is an integer and the length is 1:
vapply(my_list, length, integer(1))
a b c d e
3 6 2 3 3
This returns a named integer vector of lengths, one per element, the same result as sapply() but with the type and length checked.
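The benefit of the template shows up when a result does not match it. In this sketch (the list ints and the error message are made up for illustration), mean() returns a double, so asking for integer(1) makes vapply() stop with an error instead of silently returning something unexpected.

```r
# Hypothetical list of integer vectors
ints <- list(a = 1:3, b = 4:6)

# mean() returns a double, so numeric(1) is the matching template
ok <- vapply(ints, mean, numeric(1))   # a = 2, b = 5

# With integer(1) the types do not match and vapply() raises an error,
# which is caught here so the script keeps running
result <- tryCatch(
  vapply(ints, mean, integer(1)),
  error = function(e) "vapply stopped: wrong type"
)
result  # "vapply stopped: wrong type"
```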
Conclusion
In this blog post, we have covered the basics of the apply(), lapply(), sapply(), tapply(), and vapply() functions in R. These functions all apply a function to a set of data, but they differ in the input they accept and the shape of the output they return. Missing values are handled by the function you apply rather than by the apply family itself. By using the right function for your particular problem, you can make your code more concise and easier to read.
Voila!