R – Merging Two Data Frames Without Messing with the Rows’ Order

Standard

The merge() function is very useful to join two data sets together especially using a common variable (column).

The function offers several arguments to be used which makes it flexible in many ways. But one important and lacking feature is preserving the order of the rows.

There are two popular ways to account for this and result in a merged data set with the same row order as one of the original data sets.

The first is an ad-hoc solution.

This solution depends on an extra id column that is used to re-order the merged data set.
Consider two data frames df_1 and df_2 with a common column “label”. The process goes as follows:

# create a new variable (column) & assign each element an "id"
# from 1 to the number of rows of df_1
df_1$id <- 1:nrow(df_1) 

# merge the two data frames using the label column without sorting
merged <- merge(x = df_1, y = df_2, by = "label", sort = FALSE)

# order the merged data set using "id" & assign it to "ordered"
ordered <- merged[order(merged$id), ]

The resultant data set is now the two data sets merged with the same row order of the original data set df_1.
Of course, you can add the “id” column to either of the data frames, depending on the situation.

The second option is using another R function, the join() function.

This function solves the order problem that merge() doesn’t but is not as feature-rich as merge is. It is though as simple to use. Of course do not forget to load the Plyr library:

library(plyr)
merged_and_ordered <- join(df_1, df_2)

Reference: R – Merging two data frames while keeping the original row order, StackOverflow

Advertisement