apply

For-loops are very flexible and a good way to get to know the principle of repetition, but there is another way to repeat tasks that is often more concise and reduces the amount of code: the apply family.

Each apply function is designed for a particular input data type or returns a particular output data type.

A quick review of data types:

vector.letter <- LETTERS[1:10]
vector.number <- 1:10
matrixM <- matrix(1:10, 10, byrow = TRUE)
dataframe <- as.data.frame(matrixM)
dataframe$Name <- vector.letter
list.all <- list(vector.number, matrixM, dataframe)
names(list.all) <- c("vector", "matrix", "dataframe")
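To check which type each of these objects has, base R offers class(), dim() and str(). The following is a minimal sketch that rebuilds the objects from the review so it runs on its own; the class of a matrix is reported as "matrix" "array" only in R 4.0 and later, which is assumed here.

```r
# Rebuild the objects from the review so this block is self-contained
vector.letter <- LETTERS[1:10]
vector.number <- 1:10
matrixM <- matrix(1:10, 10, byrow = TRUE)
dataframe <- as.data.frame(matrixM)
dataframe$Name <- vector.letter
list.all <- list(vector = vector.number, matrix = matrixM, dataframe = dataframe)

# class() reports the type of each object
class(vector.letter)  # "character"
class(vector.number)  # "integer"
class(matrixM)        # "matrix" "array" (assuming R >= 4.0)
class(dataframe)      # "data.frame"
class(list.all)       # "list"

# dim() shows that matrixM has 10 rows and 1 column
dim(matrixM)

# str() gives a compact overview of a nested object such as the list
str(list.all)
```

Knowing the type matters because it decides which apply function fits: lists and vectors go to lapply or sapply, matrices and data frames to apply.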

lapply is commonly used with vectors and lists, and its output is always a list. It applies the function passed as its second argument (here an anonymous function whose body is written in curly brackets) to each element of the list:

# Sample forest data
forest_data <- list(
  deciduous = data.frame(tree_type = "deciduous", tree_height = c(15, 20, 18, 22, 19)),
  coniferous = data.frame(tree_type = "coniferous", tree_height = c(25, 28, 24, 30, 27)),
  mixed = data.frame(tree_type = "mixed", tree_height = c(18, 22, 20, 24, 21))
)

# Name the list elements (they are already named above; names() can also set or change them)
names(forest_data) <- c("deciduous", "coniferous", "mixed")

# Calculate the mean tree height for each forest type using lapply
mean_heights <- lapply(forest_data, function(x) {
  mean(x$tree_height)
})

print(mean_heights)
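To see how lapply reduces the amount of code, it can help to write the same calculation as a for-loop. The sketch below uses a trimmed copy of the forest data (only the tree_height column) and the hypothetical names loop_means and lapply_means, which are not part of the original example.

```r
# Trimmed copy of the sample forest data
forest_data <- list(
  deciduous = data.frame(tree_height = c(15, 20, 18, 22, 19)),
  coniferous = data.frame(tree_height = c(25, 28, 24, 30, 27)),
  mixed = data.frame(tree_height = c(18, 22, 20, 24, 21))
)

# For-loop version: create an empty list first, then fill it element by element
loop_means <- vector("list", length(forest_data))
names(loop_means) <- names(forest_data)
for (forest_type in names(forest_data)) {
  loop_means[[forest_type]] <- mean(forest_data[[forest_type]]$tree_height)
}

# lapply version: no pre-allocation and no explicit indexing needed
lapply_means <- lapply(forest_data, function(x) mean(x$tree_height))

# Both approaches give the same named list
identical(loop_means, lapply_means)
```

The loop needs extra lines to set up and index the result, while lapply takes care of both and keeps the names of the input list.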

For more examples, have a look at the next page.

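sapply works like lapply but tries to simplify the output into a vector (or a matrix). This is not always possible: when the results differ in length, it returns a list just like lapply. In the following example the simplification works and the result is a vector: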

sapply(1:10, function(x) x + 1)
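What sapply returns depends on the length of the individual results. The following sketch shows the three typical cases; the variable names are made up for illustration.

```r
# Every result has length 1, so sapply() returns a plain numeric vector
same_length <- sapply(1:3, function(x) x + 1)
is.numeric(same_length)  # TRUE
is.list(same_length)     # FALSE

# Results of different lengths cannot be simplified, so sapply() returns a list
different_length <- sapply(1:3, function(x) seq_len(x))
is.list(different_length)  # TRUE

# Results that all have the same length greater than 1 become a matrix,
# with one column per input element
equal_length <- sapply(1:3, function(x) c(x, x^2))
dim(equal_length)  # 2 3
```

If the following code relies on a particular output type, it is worth checking the result with is.list() or class() before continuing.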

apply can be used for arrays, data frames and matrices. It is important to set the MARGIN argument, which tells apply which dimension to work along: with MARGIN = 1, R applies the function to each row, and with MARGIN = 2, to each column of your data frame/array.

# Create a sample matrix
mat <- matrix(1:12, nrow = 3, byrow = TRUE)

# Print the matrix
print(mat)
# Output:
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    5    6    7    8
# [3,]    9   10   11   12

# Calculate row sums using apply
row_sums <- apply(mat, 1, sum)

# Calculate column sums using apply
col_sums <- apply(mat, 2, sum)

# Print the row sums
cat("Row Sums: ", row_sums, "\n")

# Print the column sums
cat("Column Sums: ", col_sums, "\n")
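A quick way to check which direction a MARGIN value refers to is to compare the length of the result with the dimensions of the matrix. The sketch below repeats the matrix from above and adds a custom function; row_ranges is a hypothetical name used only for illustration.

```r
mat <- matrix(1:12, nrow = 3, byrow = TRUE)

# MARGIN = 1 works row by row, so there is one value per row
row_sums <- apply(mat, 1, sum)  # 10 26 42

# MARGIN = 2 works column by column, so there is one value per column
col_sums <- apply(mat, 2, sum)  # 15 18 21 24

# The length of the result matches the dimension named by MARGIN
length(row_sums) == nrow(mat)  # TRUE
length(col_sums) == ncol(mat)  # TRUE

# apply() also accepts your own function, here the spread (max - min) of each row
row_ranges <- apply(mat, 1, function(r) max(r) - min(r))  # 3 3 3
```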


tapply is used for group-based calculations:

# Sample forest data
forest_data <- data.frame(
  tree_type = c("deciduous", "coniferous", "deciduous", "coniferous", "deciduous"),
  tree_height = c(15, 25, 18, 28, 22)
)

# Calculate the mean tree height for each tree type using tapply
mean_heights <- tapply(forest_data$tree_height, forest_data$tree_type, mean)

# Print the result
print(mean_heights)
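tapply can be thought of as splitting the values into groups and then applying the function to each group. The sketch below makes these two steps explicit with split() and sapply() and compares the result with tapply; grouped and by_hand are hypothetical names.

```r
forest_data <- data.frame(
  tree_type = c("deciduous", "coniferous", "deciduous", "coniferous", "deciduous"),
  tree_height = c(15, 25, 18, 28, 22)
)

# One step with tapply(): values, grouping variable, function
grouped <- tapply(forest_data$tree_height, forest_data$tree_type, mean)

# The same in two steps: split the heights by tree type, then average each group
by_hand <- sapply(split(forest_data$tree_height, forest_data$tree_type), mean)

# Groups are ordered alphabetically: "coniferous" "deciduous"
names(grouped)

# Both approaches give the same means (26.5 for coniferous, about 18.33 for deciduous)
all.equal(as.vector(grouped), as.vector(by_hand))
```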

