Steven P. Sanderson II, MPH
July 2, 2024
Welcome back, R Programmers! Today, we’ll explore a common task: extracting a substring after a specific character in R. Whether you’re cleaning data or transforming strings, this skill is quite handy. We’ll look at three approaches: using base R, stringr, and stringi. Let’s dive in!
Base R provides several functions to manipulate strings. Here, we’ll use sub and strsplit to extract a substring after a specific character.
sub
The sub function replaces the first match of a pattern in a string. Here’s how to extract the part after a specific character, say a hyphen (-).
# Example string
string <- "data-science"
# Extract substring after the hyphen
result <- sub(".*-", "", string)
print(result) # Output: "science"
[1] "science"
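One thing worth checking with the call above is what happens when the string holds more than one hyphen, or none at all. The sketch below uses hypothetical inputs ("data-science-blog" and "datascience") that are not part of the original example, and the second pattern, "^[^-]*-", is an alternative added here for illustration.

```r
# Hypothetical input with more than one hyphen
string <- "data-science-blog"

# Greedy .* runs as far as it can, so the match ends at the LAST hyphen
after_last <- sub(".*-", "", string)

# [^-]* cannot cross a hyphen, so the match ends at the FIRST hyphen
after_first <- sub("^[^-]*-", "", string)

print(after_last)   # "blog"
print(after_first)  # "science-blog"

# With no hyphen there is nothing to replace, so the input comes back unchanged
no_hyphen <- sub(".*-", "", "datascience")
print(no_hyphen)    # "datascience"
```

If an unchanged string would be mistaken for a real result in your data, it may be worth testing for the hyphen first, for example with grepl("-", string, fixed = TRUE).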
Explanation:
.*- is a regular expression where .* matches any character zero or more times, and - matches the hyphen. Because .* is greedy, the match extends to the last hyphen in the string.
"" is the replacement, effectively removing everything up to and including the hyphen.
strsplit
The strsplit function splits a string into substrings based on a delimiter.
# Example string
string <- "hello-world"
# Split the string at the hyphen
parts <- strsplit(string, "-")[[1]]
# Extract the part after the hyphen
result <- parts[2]
print(result) # Output: "world"
[1] "world"
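The same idea extends to a whole vector of strings. The sketch below is a minimal illustration with a hypothetical vector; it adds fixed = TRUE, so the delimiter is treated as a literal string rather than a regular expression (this matters for delimiters such as . or |), and uses vapply to pull the second piece out of each element.

```r
# Hypothetical vector of strings, one of them without a hyphen
strings <- c("hello-world", "full-stack", "nohyphen")

# fixed = TRUE treats "-" as a literal string, not a regular expression
parts_list <- strsplit(strings, "-", fixed = TRUE)

# Take the second piece of each element; with no hyphen there is no
# second piece, so indexing returns NA
after <- vapply(parts_list, function(p) p[2], character(1))
print(after)  # "world" "stack" NA
```

Note that parts[2] returns only the second piece, so a string such as "a-b-c" would yield "b" rather than "b-c".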
Explanation:
strsplit(string, "-") splits the string into parts at the hyphen, returning a list with one element per input string.
[[1]] extracts the first element of the list.
[2] extracts the second part of the split string.
stringr
The stringr package, part of the tidyverse, provides consistent and easy-to-use string functions.
str_extract
The str_extract function extracts the first match of a pattern from a string.
library(stringr)
# Example string
string <- "apple-pie"
# Extract substring after the hyphen
result <- str_extract(string, "(?<=-).*")
print(result) # Output: "pie"
[1] "pie"
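It can help to see how the lookbehind pattern above behaves on less tidy input. This is a small sketch with a hypothetical vector that includes a string with two hyphens and one with none; it assumes stringr is installed.

```r
library(stringr)

# Hypothetical inputs: one hyphen, two hyphens, no hyphen
strings <- c("apple-pie", "a-b-c", "nohyphen")

# str_extract returns the first match, which starts right after the
# FIRST hyphen; strings without a match give NA
result <- str_extract(strings, "(?<=-).*")
print(result)  # "pie" "b-c" NA
```

This differs from sub(".*-", "", x) in base R, which keeps the text after the last hyphen and returns strings without a hyphen unchanged.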
Explanation:
(?<=-) is a lookbehind assertion, ensuring the match occurs after a hyphen.
.* matches any character zero or more times.
str_split
Similar to strsplit in base R, str_split splits a string based on a pattern.
# Example string
string <- "open-source"
# Split the string at the hyphen
parts <- str_split(string, "-")[[1]]
# Extract the part after the hyphen
result <- parts[2]
print(result) # Output: "source"
[1] "source"
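When the text after the hyphen may itself contain hyphens, one option is the n argument of str_split, which caps the number of pieces. The sketch below uses a hypothetical three-part string and assumes stringr is installed.

```r
library(stringr)

# Hypothetical input with two hyphens
string <- "open-source-software"

# n = 2 splits only at the first hyphen; the second piece keeps the rest
parts <- str_split(string, "-", n = 2)[[1]]
result <- parts[2]
print(result)  # "source-software"
```

Without n = 2, parts[2] would be just "source".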
Explanation:
str_split(string, "-") splits the string into parts at the hyphen, returning a list with one element per input string.
[[1]] extracts the first element of the list.
[2] extracts the second part of the split string.
stringi
The stringi package is another powerful tool for string manipulation, providing high-performance functions.
stri_extract
The stri_extract function extracts substrings based on patterns; by default it returns the first match.
library(stringi)
# Example string
string <- "front-end"
# Extract substring after the hyphen
result <- stri_extract(string, regex = "(?<=-).*")
print(result) # Output: "end"
[1] "end"
Explanation:
regex = "(?<=-).*" uses a regular expression where (?<=-) is a lookbehind assertion ensuring the match occurs after a hyphen, and .* matches any character zero or more times.
stri_split
Similar to strsplit and str_split, stri_split splits a string based on a pattern.
# Example string
string <- "full-stack"
# Split the string at the hyphen
parts <- stri_split(string, regex = "-")[[1]]
# Extract the part after the hyphen
result <- parts[2]
print(result) # Output: "stack"
[1] "stack"
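For a vector of strings, a variation on the split above is to ask stringi for a matrix instead of a list. The sketch below is an illustration only: it swaps in stri_split_fixed (a literal-delimiter variant, not the call used in the example above) with n = 2 and simplify = TRUE, applied to a hypothetical vector, and assumes stringi is installed.

```r
library(stringi)

# Hypothetical vector; the second string has two hyphens
strings <- c("full-stack", "front-end-dev")

# n = 2 splits only at the first hyphen; simplify = TRUE returns a
# character matrix with one row per input string
parts <- stri_split_fixed(strings, "-", n = 2, simplify = TRUE)

# Column 2 holds everything after the first hyphen
result <- parts[, 2]
print(result)  # "stack" "end-dev"
```

Working with a matrix avoids indexing into each list element separately, which can be convenient when the result goes straight into a data frame column.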
Explanation:
stri_split(string, regex = "-") splits the string into parts at the hyphen, returning a list with one element per input string.
[[1]] extracts the first element of the list.
[2] extracts the second part of the split string.
There you have it: three different ways to extract a substring after a specific character in R. Each method has its own benefits and can be handy depending on your specific needs. Give these examples a try and see which one works best for your data!
Happy coding!