The substring() function in R

rtip
Author

Steven P. Sanderson II, MPH

Published

August 14, 2023

Introduction

The substring() function in R is used to extract a substring from a character vector. The syntax of the function is:

substring(x, first, last = 1000000L)

where:

  • x is the character vector from which to extract the substring
  • first is the starting position of the substring
  • last is the ending position of the substring

The first and last arguments are integer positions counted from 1, and both ends are inclusive. Values that are not integers are coerced to integer. If last is omitted, it defaults to 1000000L, which in practice means "through the end of the string". Note that the argument names are first and last; start and stop are the names used by the closely related substr() function.
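As a quick illustration of how the position arguments (named first and last in base R) behave, here is a minimal sketch using only base R. The variable names are made up for the example. It shows that last can be omitted and that substring() is vectorised over its arguments.

```r
# Illustrative values; the variable names are assumptions for this example
s <- "Hello, world!"

# Omitting `last` extracts through the end of the string
tail_part <- substring(s, 8)
tail_part
# [1] "world!"

# `first` and `last` are recycled, so one call can return several pieces
pieces <- substring(s, c(1, 8), c(5, 12))
pieces
# [1] "Hello" "world"

# A character vector is processed element by element
words <- c("apple", "banana", "cherry")
prefixes <- substring(words, 1, 3)
prefixes
# [1] "app" "ban" "che"
```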

Examples

Example 1

For example, the following code will extract the substring from the string “Hello, world!” that starts at the 8th character and ends at the 12th character:

substring("Hello, world!", 8, 12)
[1] "world"

As we can see, this returns the string “world”.

Example 2

The substring() function can also be used to extract the first N characters of a string, the last N characters of a string, or to replace a substring in a string.

To extract the first N characters of a string, you can use the following syntax:

substring(x, 1, N)

For example, the following code will extract the first 5 characters of the string “Hello, world!”:

substring("Hello, world!", 1, 5)
[1] "Hello"

As shown, this returns the string “Hello”.

Example 3

To extract the last N characters of a string, you can use the following syntax:

substring(x, nchar(x) - N + 1, nchar(x))

where nchar(x) returns the number of characters in the string x.

For example, the following code will extract the last 6 characters of the string “Hello, world!”:

s <- "Hello, world!"

substring(s, nchar(s) - 6 + 1, nchar(s))
[1] "world!"

This will return the string “world!”.

Example 4

To replace a substring in a string, you can use the following syntax:

substring(x, first, last) <- value

where first and last are the positions to overwrite and value is the string that supplies the replacement characters.

For example, the following code overwrites the characters of the string “Hello, world!” from position 8 onward with characters taken from the string “universe”:

s <- "Hello, world!"
substring(s, first = 8) <- "universe"
s
[1] "Hello, univer"

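This changes the string to “Hello, univer”. Notice that the replacement does not change the length of the original string: only as many characters of the value are used as fit in the span being overwritten, so “universe” is cut off at “univer” and the trailing “!” is overwritten as well.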
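If the goal is to end up with “Hello, universe!”, the string has to be allowed to grow. Two base R ways to do that are sketched below. Neither is part of the original example, and the variable names are illustrative.

```r
s <- "Hello, world!"

# Option 1: sub() replaces the first match and lets the result change length.
# fixed = TRUE treats "world" as a literal string rather than a regex.
replaced <- sub("world", "universe", s, fixed = TRUE)
replaced
# [1] "Hello, universe!"

# Option 2: rebuild the string from the pieces around positions 8 to 12
spliced <- paste0(substring(s, 1, 7), "universe", substring(s, 13))
spliced
# [1] "Hello, universe!"
```

The first option is usually easier to read when the text to replace is known. The second is closer in spirit to substring() because it works purely by position.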

In addition to the substring() function, there are also a few other functions that can be used to extract substrings from strings in R. These functions are:

  • str_sub() from the stringr package
  • sql_left(), sql_right() and sql_mid() from the healthyR package

The str_sub() function from the stringr package is more flexible than substring(). For example, it accepts negative positions, which count backward from the end of the string, so the last N characters can be extracted without calling nchar().
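Here is a brief sketch of str_sub(), assuming the stringr package is installed (the code skips itself otherwise). It shows positive positions, which behave like substring(), and negative positions, which count from the end of the string.

```r
# Assumes stringr is installed; the block is skipped quietly if it is not
if (requireNamespace("stringr", quietly = TRUE)) {
  s <- "Hello, world!"

  # Positive positions work like substring()
  print(stringr::str_sub(s, 1, 5))
  # [1] "Hello"

  # Negative positions count from the end: -1 is the last character
  print(stringr::str_sub(s, -6, -1))
  # [1] "world!"
}
```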

The sql_left(), sql_right() and sql_mid() functions from the {healthyR} package are designed to be similar to the corresponding functions in SQL. They are easy to use and they can be a good choice for users who are familiar with SQL.
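To give a feel for what SQL-style LEFT, RIGHT and MID do, here is a rough base R sketch built on substring(). The helper names left_chars(), right_chars() and mid_chars() are hypothetical and written only for this illustration. They are not the healthyR functions, whose exact arguments should be checked in that package's documentation.

```r
# Hypothetical helpers written for this example; they are not the healthyR
# implementations, only base R sketches of SQL-style LEFT, RIGHT and MID.
left_chars <- function(x, n) substring(x, 1, n)
right_chars <- function(x, n) substring(x, nchar(x) - n + 1, nchar(x))
mid_chars <- function(x, start, n) substring(x, start, start + n - 1)

s <- "Hello, world!"

left_chars(s, 5)
# [1] "Hello"

right_chars(s, 6)
# [1] "world!"

mid_chars(s, 8, 5)
# [1] "world"
```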

I encourage readers to try things on their own with the substring() function and the other functions mentioned in this blog post. There are many different ways to use these functions to extract substrings from strings in R. Experimenting with different functions and different arguments is a great way to learn how to use them effectively.

Here is a link to a blog post that shows some examples of how to use the sql_left(), sql_right() and sql_mid() functions: https://www.spsanderson.com/steveondata/posts/rtip-2023-03-01/index.html