Introduction

Creating tables is a fundamental skill in R programming: a table condenses raw data into counts or summary statistics you can read at a glance. This guide walks through three ways to build tables, using Base R (table() and xtabs()), dplyr, and data.table. Whether you're working with small datasets or large-scale analyses, knowing all three lets you pick the right tool for the job.

Base R Table Creation

Using the table() Function

# Basic frequency table
colors <- c("red", "blue", "red", "green", "blue", "red")
color_table <- table(colors)
print(color_table)

colors
 blue green   red 
    2     1     3
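Counts are often easier to compare as proportions. The sketch below reuses the colors vector from the example and relies only on base R's prop.table() and sort(); the variable name color_props is introduced here purely for illustration.

```r
# Same vector as in the example above
colors <- c("red", "blue", "red", "green", "blue", "red")
color_table <- table(colors)

# Convert counts to proportions of the total
color_props <- prop.table(color_table)
print(round(color_props, 2))

# Sort categories from most to least frequent
print(sort(color_table, decreasing = TRUE))
```

Because prop.table() returns a table object, the result can be rounded, sorted, or printed like any other frequency table.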

Cross Tabulation with xtabs()

# Create sample data
df <- data.frame(
  gender = c("M", "F", "M", "F", "M", "F"),
  department = c("HR", "IT", "HR", "HR", "IT", "IT")
)

# Create cross-tabulation
cross_tab <- xtabs(~ gender + department, data = df)
print(cross_tab)

      department
gender HR IT
     F  1  2
     M  2  1
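For comparison, the same counts can be built with table() by passing the two columns as vectors, and base R's addmargins() can append row and column totals. This is a minimal sketch on the same sample data; the names tab and tab_totals are illustrative only.

```r
# Same sample data as above
df <- data.frame(
  gender = c("M", "F", "M", "F", "M", "F"),
  department = c("HR", "IT", "HR", "HR", "IT", "IT")
)

# table() builds the same counts from two vectors;
# the argument names become the dimension labels
tab <- table(gender = df$gender, department = df$department)

# addmargins() appends row and column totals (labelled "Sum")
tab_totals <- addmargins(tab)
print(tab_totals)
```

The formula interface of xtabs() tends to be more convenient when the variables already live in a data frame, while table() is handy for loose vectors.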

Table Creation with dplyr

Basic Summarization

library(dplyr)

# One row per cylinder count: group size, mean mpg, and mean horsepower
mtcars %>%
  group_by(cyl) %>%
  summarise(
    count = n(),
    avg_mpg = mean(mpg),
    avg_hp = mean(hp)
  )

# A tibble: 3 × 4
    cyl count avg_mpg avg_hp
  <dbl> <int>   <dbl>  <dbl>
1     4    11    26.7   82.6
2     6     7    19.7  122. 
3     8    14    15.1  209.

Advanced Grouping

# Multiple group variables
mtcars %>%
  group_by(cyl, am) %>%
  summarise(
    count = n(),
    avg_mpg = round(mean(mpg), 1),
    .groups = "drop"
  )

# A tibble: 6 × 4
    cyl    am count avg_mpg
  <dbl> <dbl> <int>   <dbl>
1     4     0     3    22.9
2     4     1     8    28.1
3     6     0     4    19.1
4     6     1     3    20.6
5     8     0    12    15.1
6     8     1     2    15.4

data.table Approach

Basic data.table Usage

library(data.table)

# Convert to data.table
DT <- as.data.table(mtcars)

# Create summary table
DT[, .(
  count = .N,
  avg_mpg = mean(mpg)
), by = cyl]

     cyl count  avg_mpg
   <num> <int>    <num>
1:     6     7 19.74286
2:     4    11 26.66364
3:     8    14 15.10000
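Note that the output above lists the groups in order of first appearance (6, 4, 8), whereas the dplyr result was sorted. If a sorted summary is wanted, one option is data.table's keyby argument, sketched below; sorted_summary is just an illustrative name.

```r
library(data.table)

DT <- as.data.table(mtcars)

# keyby groups like by, then sorts the result by the grouping column
sorted_summary <- DT[, .(
  count = .N,
  avg_mpg = mean(mpg)
), keyby = cyl]
print(sorted_summary)
```

Using keyby also sets cyl as the key of the result, which can help if the summary is joined or subset on that column later.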

Advanced data.table Features

# Multiple calculations with by
DT[, .(
  count = .N,
  avg_mpg = mean(mpg),
  max_hp = max(hp),
  min_hp = min(hp)
), by = .(cyl, am)]

     cyl    am count  avg_mpg max_hp min_hp
   <num> <num> <int>    <num>  <num>  <num>
1:     6     1     3 20.56667    175    110
2:     4     1     8 28.07500    113     52
3:     6     0     4 19.12500    123    105
4:     8     0    12 15.05000    245    150
5:     4     0     3 22.90000     97     62
6:     8     1     2 15.40000    335    264

Your Turn!

Practice Exercise: Create a summary table of the iris dataset showing the average and standard deviation of Sepal.Length for each Species.

Solution

# Using dplyr
library(dplyr)

iris %>%
  group_by(Species) %>%
  summarise(
    avg_length = mean(Sepal.Length),
    sd_length = sd(Sepal.Length)
  )

# A tibble: 3 × 3
  Species    avg_length sd_length
  <fct>           <dbl>     <dbl>
1 setosa           5.01     0.352
2 versicolor       5.94     0.516
3 virginica        6.59     0.636

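# Using data.table
library(data.table)

# as.data.table() makes a copy, so the built-in iris data frame is left unchanged
iris_dt <- as.data.table(iris)
iris_dt[, .(
  avg_length = mean(Sepal.Length),
  sd_length = sd(Sepal.Length)
), by = Species]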

      Species avg_length sd_length
       <fctr>      <num>     <num>
1:     setosa      5.006 0.3524897
2: versicolor      5.936 0.5161711
3:  virginica      6.588 0.6358796
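For completeness, the exercise can also be approached in base R. The sketch below uses aggregate() with a formula; the name iris_summary and the do.call(data.frame, ...) flattening step are one possible way to do it, not the only one.

```r
# aggregate() with a formula: summarise Sepal.Length within each Species
iris_summary <- aggregate(
  Sepal.Length ~ Species,
  data = iris,
  FUN = function(x) c(avg_length = mean(x), sd_length = sd(x))
)

# The two statistics come back as a single matrix column;
# do.call(data.frame, ...) flattens it into ordinary columns
iris_summary <- do.call(data.frame, iris_summary)
print(iris_summary)
```

After flattening, the statistic columns should carry a Sepal.Length. prefix in their names, which can be renamed with names() if shorter labels are preferred.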

Quick Takeaways

Base R provides simple, straightforward table creation
dplyr offers intuitive syntax for data manipulation
data.table excels in performance with large datasets
Choose the method based on your specific needs
Combine approaches when necessary for optimal results

FAQs

Which method is fastest for large datasets? data.table is optimized for performance and is generally fastest with large datasets.
Can I combine dplyr and data.table? Yes, you can use both in the same script, choosing the best tool for each task.
How do I export tables to other formats? Use base R's write.csv() for CSV files, the writexl package for Excel, or knitr::kable() for formatted output in reports.
What's the difference between table() and xtabs()? table() is simpler and works directly with vectors, while xtabs() uses formula notation and takes its variables from a data frame.
How do I handle missing values in tables? Pass na.rm = TRUE to functions such as mean() or sd() inside summarise(), or specify useNA = "always" in table() to count NA as its own category.
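To illustrate the missing-value question, here is a small sketch on made-up data. The scores data frame and its column names are hypothetical and exist only for this example.

```r
library(dplyr)

# Hypothetical data with one missing value in each column
scores <- data.frame(
  group = c("a", "a", "b", NA, "b"),
  value = c(10, NA, 30, 40, 50)
)

# table() drops NA by default; useNA = "always" adds an NA count
group_counts <- table(scores$group, useNA = "always")
print(group_counts)

# na.rm is an argument of mean(), passed inside summarise()
group_means <- scores %>%
  filter(!is.na(group)) %>%
  group_by(group) %>%
  summarise(avg_value = mean(value, na.rm = TRUE))
print(group_means)
```

Whether to drop rows with a missing group, as done here with filter(), or to keep them as their own category depends on the analysis.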

Conclusion

Mastering table creation in R involves understanding the strengths of each approach. Base R offers simplicity, dplyr provides readability, and data.table delivers performance. Practice with different methods to determine which best suits your needs.

Engage!

Try implementing these examples with your own datasets. Share your experiences and questions in the comments below, and don’t forget to experiment with combining different approaches for optimal results.

Happy Coding! 🚀

You can connect with me at any one of the below:

Telegram Channel here: https://t.me/steveondata

LinkedIn Network here: https://www.linkedin.com/in/spsanderson/

Mastodon Social here: https://mstdn.social/@stevensanderson

RStats Network here: https://rstats.me/@spsanderson

GitHub Network here: https://github.com/spsanderson

Bluesky Network here: https://bsky.app/profile/spsanderson.com

My Book: Extending Excel with Python and R here: https://packt.link/oTyZJ