How to Subset a Data Frame in R by Multiple Conditions
Author
Steven P. Sanderson II, MPH
Published
March 7, 2024
Introduction
In data analysis with R, subsetting data frames based on multiple conditions is a common task. It allows us to extract specific subsets of data that meet certain criteria. In this blog post, we will explore how to subset a data frame using three different methods: base R’s subset() function, dplyr’s filter() function, and the data.table package.
Examples
Using Base R’s subset() Function
Base R provides a handy function called subset() that allows us to subset data frames based on one or more conditions.
# Load the mtcars dataset
data(mtcars)

# Subset data frame using subset() function
subset_mtcars <- subset(mtcars, mpg > 20 & cyl == 4)

# View the resulting subset
print(subset_mtcars)
In the above code, we first load the mtcars dataset. Then, we use the subset() function to create a subset of the data frame where the miles per gallon (mpg) is greater than 20 and the number of cylinders (cyl) is equal to 4. Finally, we print the resulting subset.
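To see how subset() relates to plain bracket indexing, and how the condition changes when we want OR instead of AND, here is a minimal sketch. The variable names (bracket_mtcars, or_mtcars, slim_mtcars) and the thresholds in the OR example are illustrative choices, not part of the example above. One difference worth knowing: subset() drops rows where the condition evaluates to NA, while bracket indexing returns them as NA rows. mtcars has no missing values, so both give the same rows here.

```r
# Sketch only: variable names and thresholds below are illustrative
data(mtcars)

# Same AND filter as subset(), written with bracket indexing
bracket_mtcars <- mtcars[mtcars$mpg > 20 & mtcars$cyl == 4, ]

# OR condition: keep cars that are either very efficient or very powerful
or_mtcars <- subset(mtcars, mpg > 30 | hp > 250)

# Filter rows and keep only a few columns in a single call
slim_mtcars <- subset(mtcars, mpg > 20 & cyl == 4, select = c(mpg, cyl, hp))

print(nrow(bracket_mtcars))
print(or_mtcars)
print(head(slim_mtcars))
```

The select argument is handy when the filtered result only needs a few columns, since it avoids a second subsetting step.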
Using dplyr’s filter() Function
dplyr is a powerful package for data manipulation, and it provides the filter() function for subsetting data frames based on conditions.
# Load the dplyr package
library(dplyr)

# Subset data frame using filter() function
filter_mtcars <- mtcars %>%
  filter(mpg > 20, cyl == 4)

# View the resulting subset
print(filter_mtcars)
In this code snippet, we load the dplyr package and use the %>% operator, also known as the pipe operator, to pipe the mtcars dataset into the filter() function. We specify the conditions within filter(), separated by commas, which dplyr combines with a logical AND, so only rows that satisfy every condition are kept. Finally, we print the resulting subset.
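As a quick check on how filter() combines conditions, the sketch below compares comma-separated conditions with an explicit &, and shows one way to match several values of a column using %in%. It assumes dplyr is installed; the variable names (and_comma, and_amp, four_or_six) are made up for illustration.

```r
# Sketch only: variable names below are illustrative
library(dplyr)

# Comma-separated conditions are combined with AND ...
and_comma <- mtcars %>%
  filter(mpg > 20, cyl == 4)

# ... so this explicit & version selects the same rows
and_amp <- mtcars %>%
  filter(mpg > 20 & cyl == 4)

print(isTRUE(all.equal(and_comma, and_amp)))

# Match several values of one column with %in%, plus a second condition
four_or_six <- mtcars %>%
  filter(cyl %in% c(4, 6), mpg > 20)

print(four_or_six)
```

For OR logic across different columns, the conditions need to be joined with | inside a single argument, because separate arguments are always combined with AND.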
Using the data.table Package
The data.table package is known for its speed and efficiency in handling large datasets. We can use data.table’s syntax to subset data frames as well.
# Load the data.table package
library(data.table)

# Convert mtcars to data.table
dt_mtcars <- as.data.table(mtcars)

# Subset data frame using data.table syntax
dt_subset_mtcars <- dt_mtcars[mpg > 20 & cyl == 4]

# Convert back to data frame (optional)
subset_mtcars_dt <- as.data.frame(dt_subset_mtcars)

# View the resulting subset
print(subset_mtcars_dt)
In this code block, we first load the data.table package and convert the mtcars data frame into a data.table using the as.data.table() function. Then, we subset the data using data.table’s syntax, specifying the conditions within square brackets. Optionally, we can convert the resulting subset back to a data frame using the as.data.frame() function before printing it.
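One thing to watch for with mtcars in particular: by default, as.data.table() does not keep row names, so the car models disappear from the result. A minimal sketch of one way to keep them is shown below, using the keep.rownames argument to store them in a column. The column name "model" and the variable names are illustrative assumptions, and the last lines simply compare the selected cars against base R's subset().

```r
# Sketch only: the "model" column name and variable names are illustrative
library(data.table)

# Keep the car names by storing the row names in a column called "model"
dt_named <- as.data.table(mtcars, keep.rownames = "model")

# Same two conditions as before, using data.table syntax
dt_four <- dt_named[mpg > 20 & cyl == 4]

print(dt_four[, .(model, mpg, cyl)])

# Compare against base R: the same cars should be selected, in the same order
base_four <- subset(mtcars, mpg > 20 & cyl == 4)
print(identical(dt_four$model, rownames(base_four)))
```

Storing the names in a regular column also means they can be used in conditions themselves, just like any other variable.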
Conclusion
In this blog post, we learned three different methods for subsetting data frames in R by multiple conditions. Whether you prefer base R’s subset() function, dplyr’s filter() function, or data.table’s syntax, there are multiple ways to achieve the same result. I encourage you to try out these methods on your own datasets and explore the flexibility and efficiency they offer in data manipulation tasks. Happy coding!