How to Extract Substring Starting from the End of a String in R

Author: Steven P. Sanderson II, MPH

Published: July 16, 2024

Introduction

Hey useRs! Today, we’re going to discuss a neat trick: extracting substrings starting from the end of a string. We’ll cover how to do this using base R, stringr, and stringi. By the end of this post, you’ll have several tools in your R toolbox for string manipulation. Let’s get started!

Extracting Substring from the End of a String

Using Base R

First up, let’s use base R functions to extract substrings from the end of a string. The substr function is your friend here.

Here’s a simple example:

# Define a string
my_string <- "Hello, world!"

# Extract the last 6 characters
substring_from_end <- substr(my_string, nchar(my_string) - 5, nchar(my_string))

# Print the result
print(substring_from_end)
[1] "world!"

Explanation:

  • nchar(my_string) returns the total number of characters in my_string.
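  • nchar(my_string) - 5 calculates the starting position of the substring. The string has 13 characters, so the last 6 begin at position 13 - 5 = 8; in general, the last n characters start at nchar(my_string) - n + 1.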
  • substr(my_string, start, stop) extracts the substring from the start position to the stop position.
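The same arithmetic works for any length, so it can be wrapped in a small helper. This is a minimal sketch: the name last_n is hypothetical (it is not a base R function), and it relies on substr() treating a start position below 1 as 1.

```r
# Hypothetical helper (not part of base R): return the last n characters of x
last_n <- function(x, n) {
  len <- nchar(x)
  # substr() treats a start below 1 as 1, so asking for more
  # characters than the string has returns the whole string
  substr(x, len - n + 1, len)
}

last_n("Hello, world!", 6)
# [1] "world!"

# substr() is vectorized, so the helper also works on a character vector
last_n(c("apple", "banana", "fig"), 3)
# [1] "ple" "ana" "fig"

last_n("hi", 5)
# [1] "hi"
```

Wrapping the calculation this way keeps the off-by-one detail (the + 1) in a single place instead of repeating it at every call.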

Using stringr

The stringr package makes string manipulation more straightforward and readable. We’ll use the str_sub function for this task.

First, install and load the stringr package if you haven’t already:

#install.packages("stringr")
library(stringr)

Now, let’s extract a substring from the end:

# Define a string
my_string <- "Hello, world!"

# Extract the last 6 characters using stringr
substring_from_end <- str_sub(my_string, -6, -1)

# Print the result
print(substring_from_end)
[1] "world!"

Explanation:

  • str_sub(my_string, start, end) extracts the substring from the start to the end position.
  • Negative indices in str_sub count from the end of the string. So -6 refers to the sixth character from the end, and -1 refers to the last character.
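Since end defaults to -1 in str_sub, it can be left out when you want everything through the last character, and the call works on a whole character vector at once. A short sketch, assuming stringr is installed; the fruits vector is made up for illustration:

```r
library(stringr)

# Illustrative data, not from the original example
fruits <- c("apple", "banana", "cherry")

# end defaults to -1 (the last character), so only start is needed
last_three <- str_sub(fruits, -3)
print(last_three)
# [1] "ple" "ana" "rry"
```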

Using stringi

The stringi package is another powerful tool for string manipulation. We’ll use the stri_sub function here.

First, install and load the stringi package:

#install.packages("stringi")
library(stringi)

Let’s extract our substring:

# Define a string
my_string <- "Hello, world!"

# Extract the last 6 characters using stringi
substring_from_end <- stri_sub(my_string, from = -6, to = -1)

# Print the result
print(substring_from_end)
[1] "world!"

Explanation:

  • stri_sub(my_string, from, to) works similarly to str_sub, using from and to parameters to define the start and end positions.
  • Negative values count from the end of the string.
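stri_sub also accepts a length argument in place of to, which can read more naturally when you think in terms of "n characters starting here". A brief sketch, assuming stringi is installed; the variable names tail_default and tail_length are only illustrative:

```r
library(stringi)

my_string <- "Hello, world!"

# to defaults to -1, so from alone extracts through the end of the string
tail_default <- stri_sub(my_string, from = -6)

# Alternatively, give a start position and a length instead of an end position
tail_length <- stri_sub(my_string, from = -6, length = 6)

print(tail_default)
# [1] "world!"
print(tail_length)
# [1] "world!"
```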

Try It Yourself!

Now it’s your turn! Try these methods on your own strings. Here are a few ideas to get you started:

  • Extract the last 3 characters of your name.
  • Extract the domain from an email address.
  • Experiment with different lengths and positions.
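If you get stuck on the email idea, note that the domain length varies, so a fixed count from the end will not work; you first need the position of the "@". Here is one possible base R sketch, assuming the address contains exactly one "@" (the address itself is made up):

```r
# Hypothetical address, used only for illustration
email <- "jane.doe@example.com"

# Position of the "@" sign (assumes exactly one "@" in the address)
at_pos <- regexpr("@", email, fixed = TRUE)

# Everything after the "@" through the end of the string
domain <- substr(email, at_pos + 1, nchar(email))
print(domain)
# [1] "example.com"
```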

Conclusion

We’ve explored how to extract substrings from the end of a string using base R, stringr, and stringi. Each method has its own charm, so choose the one that fits your coding style best. String manipulation is a crucial skill in data cleaning and text analysis, so keep practicing and experimenting.


Happy coding! 🚀