Sorting

“It is a capital mistake to theorize before one has data.” — Sherlock Holmes

Sorting vectors or lists

Vectors can be sorted using the sort function. To sort a list, you first have to extract the actual elements (e.g. with [[ ]]), because sort requires an atomic vector.

x <- c(7,5,8,2,10)
sort(x)

[1]  2  5  7  8 10
l <- list(x)
sort(l[[1]])

[1]  2  5  7  8 10
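Here is a short sketch of two related cases. The first is sorting in descending order with the decreasing argument of sort. The second is a list holding several vectors. The list l and its element names a and b are hypothetical illustration data. Calling sort on the list itself fails, and lapply can be used to sort every element.

```r
# A numeric vector, as in the example above
x <- c(7, 5, 8, 2, 10)

# sort() returns a sorted copy; decreasing = TRUE reverses the direction
x_desc <- sort(x, decreasing = TRUE)
print(x_desc)   # 10 8 7 5 2

# Hypothetical list holding two numeric vectors
l <- list(a = c(3, 1, 2), b = c(9, 7, 8))

# sort() on the list itself raises an error, since a list is not atomic
res <- tryCatch(sort(l), error = function(e) e)
print(inherits(res, "error"))   # TRUE

# lapply() applies sort() to every element and returns a list of sorted vectors
l_sorted <- lapply(l, sort)
print(l_sorted)
```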

Sorting data frames

The logic of sorting data frames differs from the sort function shown above. Instead of directly returning the sorted values, we first compute the ordering permutation, i.e. a vector of indices giving the positions of the elements in ascending or descending order, and then use it to select the rows. This is done by the order function, which can also be applied to vectors (and to list elements extracted with [[ ]]).
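To make the idea of a permutation concrete, the following sketch applies order to the vector x from the first example. It shows that order returns positions rather than values. Indexing x with that permutation gives the same result as sort.

```r
x <- c(7, 5, 8, 2, 10)

# order() returns positions: x[4] is the smallest element, x[2] the next, ...
idx <- order(x)
print(idx)                        # 4 2 1 3 5

# Indexing with the permutation reproduces sort()
print(identical(x[idx], sort(x)))  # TRUE

# decreasing = TRUE gives the permutation for descending order
idx_desc <- order(x, decreasing = TRUE)
print(idx_desc)                   # 5 3 1 2 4
```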

y <- c("Z", "D", "R", "A", "O")
z <- c(10, 40, 20, 30, 50)

df <- data.frame(X = x, Y = y, Z = z, stringsAsFactors = TRUE)  # keep Y a factor (the default became FALSE in R 4.0.0)
df

##    X Y  Z
## 1  7 Z 10
## 2  5 D 40
## 3  8 R 20
## 4  2 A 30
## 5 10 O 50
df[order(df$X),]  # order by column X

##    X Y  Z
## 4  2 A 30
## 2  5 D 40
## 1  7 Z 10
## 3  8 R 20
## 5 10 O 50
df[order(df$Y),]  # order by column Y

##    X Y  Z
## 4  2 A 30
## 2  5 D 40
## 5 10 O 50
## 3  8 R 20
## 1  7 Z 10
df[order(df$Y, df$Z),]  # order by column Y, ties broken by column Z

##    X Y  Z
## 4  2 A 30
## 2  5 D 40
## 5 10 O 50
## 3  8 R 20
## 1  7 Z 10
# sort() only works on a single column (i.e. one vector)
sort(df[,2])

[1] A D O R Z
Levels: A D O R Z
# for the vector and list example above, the following would apply
x[order(x)]

[1]  2  5  7  8 10
l[[1]][order(l[[1]])]

[1]  2  5  7  8 10
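In df the column Y has no repeated values, so the second key Z in df[order(df$Y, df$Z),] never comes into play. The sketch below uses a small hypothetical data frame, scores, in which the first key does contain ties. It sorts by team in ascending order and, within each team, by points in descending order. Negating a numeric key is one way to reverse its direction. Passing a vector to decreasing is another, and it also works for character keys, but base R accepts a vector-valued decreasing only with method = "radix".

```r
# Hypothetical data frame with ties in the first sort key
scores <- data.frame(
  team   = c("B", "A", "B", "A"),
  points = c(3, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Ascending by team; within a team, descending by points (negated numeric key)
idx_neg <- order(scores$team, -scores$points)
print(scores[idx_neg, ])

# Same ordering with a per-key direction; requires method = "radix"
idx_radix <- order(scores$team, scores$points,
                   decreasing = c(FALSE, TRUE), method = "radix")
print(identical(idx_neg, idx_radix))   # TRUE
```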

Please note that the examples above are not the only way to sort, and you may find other solutions to the same problem; this is quite typical for high-level programming languages. To illustrate, here is the with function, which evaluates an expression inside a data structure so that the columns of a data frame can be referenced by name (X instead of df$X). The resulting order vector is placed at the row position inside the square brackets, just as before.

# sort the data frame by column X, then by column Z
df[with(df, order(X, Z)), ]

##    X Y  Z
## 4  2 A 30
## 2  5 D 40
## 1  7 Z 10
## 3  8 R 20
## 5 10 O 50
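As a quick check of the claim that with only changes how the columns are referenced, the following sketch rebuilds df and compares the df$ version with the with version. It also uses with together with a negated numeric column for descending order.

```r
df <- data.frame(X = c(7, 5, 8, 2, 10),
                 Y = c("Z", "D", "R", "A", "O"),
                 Z = c(10, 40, 20, 30, 50),
                 stringsAsFactors = TRUE)

# Both expressions build the same row permutation; with() just drops "df$"
by_dollar <- df[order(df$X, df$Z), ]
by_with   <- df[with(df, order(X, Z)), ]
print(identical(by_dollar, by_with))   # TRUE

# Descending order of a numeric column inside with()
by_x_desc <- df[with(df, order(-X)), ]
print(by_x_desc)
```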

Sorting factors

A quick note on sorting factors. Factors are categorical variables that can only take values from a predefined (and limited) set. A factor consists of two parts: the actual values and the set of possible values, called levels. This means that two aspects of a factor can be ordered separately: the values, which we see when printing the content of a data frame, and the levels, which are not shown in that printout but may affect printing, plotting, and some statistical operations.
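The two parts of a factor can be inspected directly. In the sketch below, the values of column Y are turned into a factor. The example prints its levels and the integer codes that R stores for each value. These codes match the 5 2 4 1 3 shown in the str output further down.

```r
# The values of column Y as a factor
y <- factor(c("Z", "D", "R", "A", "O"))

# The set of allowed values; factor() sorts them by default
print(levels(y))       # "A" "D" "O" "R" "Z"

# Each value is stored as an integer code pointing into levels(y)
print(as.integer(y))   # 5 2 4 1 3
```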

Let's have a look at the data frame df again:

df
##    X Y  Z
## 1  7 Z 10
## 2  5 D 40
## 3  8 R 20
## 4  2 A 30
## 5 10 O 50

str(df)
## 'data.frame':	5 obs. of  3 variables:
##  $ X: num  7 5 8 2 10
##  $ Y: Factor w/ 5 levels "A","D","O","R",..: 5 2 4 1 3
##  $ Z: num  10 40 20 30 50

levels(df$Y)
[1] "A" "D" "O" "R" "Z"
df$Y  # unlike levels(df$Y), this prints both the values and the levels

[1] Z D R A O
Levels: A D O R Z

As you can see, the values in column Y are not sorted. Its structure, however, shows that the column is a factor, and the levels function reveals that its levels are stored in increasing (alphabetical) order, which is what factor() does by default.
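The alphabetical level order is only a default. The sketch below uses a hypothetical sizes vector in which alphabetical order is not the meaningful one. It sets the levels explicitly in the call to factor. Sorting the factor then follows the level order, not the alphabet.

```r
# Hypothetical data where alphabetical order is not the natural order
sizes <- c("medium", "small", "large", "small")

# Default: levels are the sorted unique values
f_default <- factor(sizes)
print(levels(f_default))   # "large" "medium" "small"

# Explicit levels define a custom order
f_custom <- factor(sizes, levels = c("small", "medium", "large"))

# sort() on a factor follows the level order
print(as.character(sort(f_default)))  # "large" "medium" "small" "small"
print(as.character(sort(f_custom)))   # "small" "small" "medium" "large"
```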

Let’s sort column Y in a decreasing order and have a look at the factor levels afterwards.

df <- df[order(df$Y, decreasing = TRUE),]
levels(df$Y)

## [1] "A" "D" "O" "R" "Z"
df$Y

[1] Z R O D A
Levels: A D O R Z

The order of the values in column Y has changed, but the order of its levels has not: reordering the rows of a data frame never touches the levels. To change the order of the factor levels, we have to recreate the factor with explicitly sorted levels.

# recreate the factor with its levels sorted in decreasing order
df$Y <- factor(df$Y, levels(df$Y)[order(levels(df$Y), decreasing = TRUE)])
levels(df$Y)

[1] "Z" "R" "O" "D" "A"
df$Y

[1] Z R O D A
Levels: Z R O D A
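The level order matters for sorting because order on a factor compares its integer codes, which follow the level order, not the printed labels. The sketch below starts again from the original df. It reverses the levels with rev, which is equivalent to the call above here because the default levels are already sorted. It then shows that an ascending order puts "Z" first.

```r
df <- data.frame(X = c(7, 5, 8, 2, 10),
                 Y = c("Z", "D", "R", "A", "O"),
                 Z = c(10, 40, 20, 30, 50),
                 stringsAsFactors = TRUE)

# Reverse the level order (the default levels are sorted ascending)
df$Y <- factor(df$Y, levels = rev(levels(df$Y)))
print(levels(df$Y))   # "Z" "R" "O" "D" "A"

# Ascending order() on a factor follows the level order, so "Z" comes first
sorted_df <- df[order(df$Y), ]
print(as.character(sorted_df$Y))   # "Z" "R" "O" "D" "A"
```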

For more information, have a look at e.g. the sorting page at Quick R.
