How to Split a Character String and Get the First Element in R

Author

Steven P. Sanderson II, MPH

Published

June 5, 2024

Introduction

Hello, R community!

Today, we’re looking at a common task in data manipulation: splitting character strings and extracting the first element. We’ll start with base R, then do the same with the stringi and stringr packages.

Let’s get started!


Examples

Using strsplit() in Base R

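Base R provides the strsplit() function for splitting strings. It returns a list with one character vector per input string, which is why the examples below use sapply() to pull out the first element. Here’s a quick look at the syntax: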

strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
  • x: Character vector to be split.
  • split: The delimiter (separator) to split on. It is treated as a regular expression unless fixed = TRUE.
  • fixed: If TRUE, split is matched as a literal string, not a regular expression.
  • perl: If TRUE, Perl-compatible regular expressions are used.
  • useBytes: If TRUE, the operation is performed byte-wise rather than character-wise.
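
Because split is a regular expression by default, a delimiter that happens to be a regex metacharacter can give surprising results. The short sketch below uses a made-up file name to show what fixed = TRUE changes when splitting on a dot:

```r
# Hypothetical input: a file name we want to split on literal dots
file_name <- "report.final.csv"

# Default behavior: "." is a regex that matches any character,
# so every character counts as a delimiter and the first piece is empty
regex_first <- strsplit(file_name, ".")[[1]][1]

# fixed = TRUE matches "." as a literal dot
fixed_first <- strsplit(file_name, ".", fixed = TRUE)[[1]][1]

print(regex_first)  # ""
print(fixed_first)  # "report"
```

Escaping the delimiter in the pattern (for example "[.]") is another option if you need to stay in regex mode.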

Example 1: Splitting a single string

string <- "apple,orange,banana"
split_result <- strsplit(string, ",")
first_element <- sapply(split_result, `[`, 1)
print(first_element)
[1] "apple"

Example 2: Splitting a vector of strings

strings <- c("apple,orange,banana", "cat,dog,mouse")
split_results <- strsplit(strings, ",")
first_elements <- sapply(split_results, `[`, 1)
print(first_elements)
[1] "apple" "cat"  
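
sapply() chooses its return type based on the data, so it is worth checking what happens with less tidy input. As a minimal sketch, assuming you always want a character vector back, vapply() can stand in for sapply(). The inputs below, including the empty string and the NA, are made up for illustration:

```r
# Hypothetical inputs: no delimiter, an empty string, and a missing value
strings <- c("apple,orange,banana", "cat", "", NA)
split_results <- strsplit(strings, ",")

# vapply() requires each result to be exactly one character value
first_elements <- vapply(split_results, `[`, character(1), 1)
print(first_elements)
# "apple" "cat" NA NA
```

A string without the delimiter comes back unchanged, while the empty string and NA both yield NA because there is no first piece to return.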

Using stringi Package

The stringi package offers stri_split_fixed(), which splits strings on a fixed (literal) pattern rather than a regular expression. Here are its most commonly used arguments:

stri_split_fixed(str, pattern, n = -1, simplify = FALSE)
  • str: Character vector to be split.
  • pattern: The literal delimiter to split on.
  • n: Maximum number of pieces to return; the default of -1 means no limit.
  • simplify: If TRUE, returns a character matrix instead of a list.
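
The n and simplify arguments can be combined to skip the sapply() step altogether. Here is a minimal sketch, assuming stringi is installed: capping the split at two pieces and asking for a matrix puts the first element of each string in the first column.

```r
library(stringi)

strings <- c("apple,orange,banana", "cat,dog,mouse")

# n = 2 stops splitting after the first delimiter;
# simplify = TRUE returns a character matrix with one row per string
pieces <- stri_split_fixed(strings, ",", n = 2, simplify = TRUE)

# Column 1 holds the first element of each string
first_elements <- pieces[, 1]
print(first_elements)
# "apple" "cat"
```

The second column holds whatever is left of each string, unsplit, which keeps the work small when only the first piece matters.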

Example 1: Splitting a single string

library(stringi)
string <- "apple,orange,banana"
split_result <- stri_split_fixed(string, ",")
first_element <- sapply(split_result, `[`, 1)
print(first_element)
[1] "apple"

Example 2: Splitting a vector of strings

strings <- c("apple,orange,banana", "cat,dog,mouse")
split_results <- stri_split_fixed(strings, ",")
first_elements <- sapply(split_results, `[`, 1)
print(first_elements)
[1] "apple" "cat"  

Using stringr Package

The stringr package provides str_split(), which returns a list, and str_split_fixed(), which returns a character matrix with a fixed number of columns. Here’s the syntax for str_split():

str_split(string, pattern, n = Inf, simplify = FALSE)
  • string: Character vector to be split.
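  • pattern: The delimiter to split on; interpreted as a regular expression by default (wrap it in fixed() to match it literally).
  • n: Maximum number of pieces to return; the default of Inf returns all pieces.
  • simplify: If TRUE, returns a character matrix instead of a list.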
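
str_split_fixed() is mentioned above but not demonstrated. As a rough sketch, assuming stringr is installed, it always returns a character matrix with n columns, so the first element of each string is simply the first column:

```r
library(stringr)

strings <- c("apple,orange,banana", "cat,dog,mouse")

# Split each string into exactly 2 columns; the second column
# keeps the unsplit remainder
pieces <- str_split_fixed(strings, ",", n = 2)

first_elements <- pieces[, 1]
print(first_elements)
# "apple" "cat"
```

If you are on stringr 1.5.0 or later, str_split_i() is also worth a look, since it extracts the piece at a given position directly.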

Example 1: Splitting a single string

library(stringr)
string <- "apple,orange,banana"
split_result <- str_split(string, ",")
first_element <- sapply(split_result, `[`, 1)
print(first_element)
[1] "apple"

Example 2: Splitting a vector of strings

strings <- c("apple,orange,banana", "cat,dog,mouse")
split_results <- str_split(strings, ",")
first_elements <- sapply(split_results, `[`, 1)
print(first_elements)
[1] "apple" "cat"  

Your Turn!

Now it’s your turn to practice! Try splitting different strings and extracting the first element using base R, stringi, and stringr. Experiment with various delimiters and see how each function handles them.
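
As a starting point for that experiment, here is a small base R sketch. The helper name first_piece() and the sample inputs are made up for illustration, and fixed = TRUE is an assumption on my part so that every delimiter is matched literally:

```r
# Hypothetical helper: return the first piece of each string,
# matching the delimiter literally rather than as a regex
first_piece <- function(x, delim) {
  vapply(strsplit(x, delim, fixed = TRUE), `[`, character(1), 1)
}

print(first_piece(c("red|green|blue", "one|two"), "|"))  # "red" "one"
print(first_piece("2024-06-05", "-"))                    # "2024"
print(first_piece("key = value", " = "))                 # "key"
```

Try removing fixed = TRUE and rerunning with the "|" delimiter to see how the regex interpretation changes the result.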


I look forward to hearing about your experiences with string manipulation in R!

Until next time, happy coding! 🚀