Reading Multiple Files with {purrr}

code
rtip
purrr
Author

Steven P. Sanderson II, MPH

Published

November 10, 2022

Introduction

There may be times when you have multiple structured files in the same folder, maybe they are .csv files. For this short tip, we will say that they are.

I will show the short script and then discuss it.

# Library Load ----
library(dplyr)
library(purrr)

# Set file path ----
folder    <- "FileFolder"
path      <- "C:/Some/Root/Path/"
full_path <- paste0(path,folder,"/")

# File List ----
file_list <- dir(full_path
                 , pattern = "\\.csv$"
                 , full.names = T)

# Read Files ----
files <- file_list %>%
  map(read.csv) %>%
  map(as_tibble)

# Clean File Names ----
file_names <- file_list %>%
  str_remove(full_path) %>%
  str_replace(
    pattern = "_OldStuff.csv", 
    replacement = "_NewStuff.csv"
  )

names(files) <- file_names

We load in {dplyr} for the pipe and the as_tibble function. After this we set out to create the file path. I have chosen to do this in two separate pieces as I have had experience with needing to go through different folders in the same root directory. While this could further be scripted I leave it as is.

folder is the folder that has the files of interest, in this case the .csv files. We then get the root path to that folder but not including it, this is defined as path in the above. After we have both folder and path we can create the full_path by using paste0

Now after this we use the base R function of dir to list out all of the files that fit the specific format of .csv with a regex pattern. I always want the name of the file as it allows me to go back to the file later and lets me name the files in the upcoming list later on.

Since these are .csv files I use purrr::map and then read.csv to read in all of the .csv files in the list that was created, we then used map again and this time used as_tibble to make sure that each file is a tibble and not something else like data.frame

Since I provided the argument of T to dir, full.names I can then get a character vector of the names of the files which then is applied to the file list.

Voila!