grep() vs. grepl() in R

code
rtip
strings
grep
grepl
Author

Steven P. Sanderson II, MPH

Published

August 20, 2024

Introduction

Hey there, useR’s! Today, we’re going to talk about two super useful functions in R: grep() and grepl(). These functions might sound similar, but they have some key differences that are important to understand. Let’s break it down in a way that’s easy to grasp, even if you’re new to R programming.

What are grep() and grepl()?

Both grep() and grepl() are functions in R that help us search for patterns in text. Think of them as detectives looking for clues in a big pile of words!

  • grep(): This function is like a pointer. It tells you where it found the pattern you’re looking for.
  • grepl(): This one is more like a yes/no checker. It tells you if the pattern exists or not.

Let’s look at them one by one:

grep() - The Pointer

grep() searches for a pattern and tells you the position where it found matches. It’s like asking, “Where did you find my toy?” and getting the answer, “In the toy box and under the bed.”

Here’s a simple example:

fruits <- c("apple", "banana", "cherry", "date")
grep("a", fruits)

This will give you: 1 2 4

This means it found the letter “a” in the 1st, 2nd, and 4th positions of our fruits list.

grepl() - The Yes/No Checker

grepl() looks for the same patterns, but instead of telling you where it found them, it just says “Yes” (TRUE) or “No” (FALSE) for each item. It’s like asking, “Does this fruit have the letter ‘a’ in it?” and getting a yes or no for each fruit.

Let’s use the same example:

fruits <- c("apple", "banana", "cherry", "date")
grepl("a", fruits)

This will give you: TRUE TRUE FALSE TRUE

This means “apple”, “banana”, and “date” have the letter “a”, but “cherry” doesn’t.

When to Use Which?

Use grep() when you need to know the positions of matches. It’s great for:

  • Finding specific items in a list
  • Extracting matching elements

Use grepl() when you just need to know if a match exists. It’s perfect for:

  • Filtering data
  • Creating logical conditions

A Practical Example

Let’s say we have a list of email addresses, and we want to find all the ones from educational institutions (usually ending with .edu).

emails <- c("[email protected]", "[email protected]", "[email protected]", "[email protected]")

# Using grep()
edu_positions <- grep(".edu", emails)
print(edu_positions)
# Output: [2] 2 3

# Using grepl()
is_edu <- grepl(".edu", emails)
print(is_edu)
# Output: [1] FALSE  TRUE  TRUE FALSE

# Using grepl() to filter
edu_emails <- emails[grepl(".edu", emails)]
print(edu_emails)
# Output: [1] "[email protected]" "[email protected]"

In this example, grep() told us the positions of .edu emails, while grepl() gave us a TRUE/FALSE for each email. We then used grepl() to actually filter out the .edu emails.

Remember, both functions are super helpful, but they give you different types of information. grep() points to where the matches are, and grepl() tells you if there’s a match or not. Choose the one that fits your needs best!


Happy coding, and have fun exploring these awesome R functions!