# Basic frequency table
colors <- c("red", "blue", "red", "green", "blue", "red")
color_table <- table(colors)
print(color_table)
colors
 blue green   red
    2     1     3
Steven P. Sanderson II, MPH
February 24, 2025
Creating tables is a fundamental skill in R programming that allows you to summarize and analyze data effectively. This comprehensive guide will walk you through various methods of table creation using Base R, dplyr, and data.table. Whether you’re working with small datasets or handling large-scale data analysis, understanding these approaches will enhance your R programming toolkit.
# Multiple calculations with by
# (DT is assumed to be mtcars converted to a data.table, which matches the output below)
library(data.table)
DT <- as.data.table(mtcars)
DT[, .(
  count = .N,
  avg_mpg = mean(mpg),
  max_hp = max(hp),
  min_hp = min(hp)
), by = .(cyl, am)]
     cyl    am count  avg_mpg max_hp min_hp
   <num> <num> <int>    <num>  <num>  <num>
1:     6     1     3 20.56667    175    110
2:     4     1     8 28.07500    113     52
3:     6     0     4 19.12500    123    105
4:     8     0    12 15.05000    245    150
5:     4     0     3 22.90000     97     62
6:     8     1     2 15.40000    335    264
Practice Exercise: Create a summary table of the iris dataset showing the average and standard deviation of Sepal.Length for each Species.
# Using dplyr
library(dplyr)
iris %>%
  group_by(Species) %>%
  summarise(
    avg_length = mean(Sepal.Length),
    sd_length = sd(Sepal.Length)
  )
# A tibble: 3 × 3
  Species    avg_length sd_length
  <fct>           <dbl>     <dbl>
1 setosa           5.01     0.352
2 versicolor       5.94     0.516
3 virginica        6.59     0.636
# Using data.table
library(data.table)
df <- iris
setDT(df)[, .(
  avg_length = mean(Sepal.Length),
  sd_length = sd(Sepal.Length)
), by = Species]
      Species avg_length sd_length
       <fctr>      <num>     <num>
1:     setosa      5.006 0.3524897
2: versicolor      5.936 0.5161711
3:  virginica      6.588 0.6358796
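For comparison, the same exercise can also be solved in base R. The following is a minimal sketch using aggregate(); the object name iris_summary is an assumption for this example, and the column names are chosen to match the dplyr and data.table solutions.

```r
# Base R: mean and standard deviation of Sepal.Length per Species
iris_summary <- aggregate(
  Sepal.Length ~ Species,
  data = iris,
  FUN = function(x) c(avg_length = mean(x), sd_length = sd(x))
)

# aggregate() stores the two statistics in a single matrix column;
# do.call(data.frame, ...) flattens it into ordinary columns
iris_summary <- do.call(data.frame, iris_summary)
names(iris_summary) <- c("Species", "avg_length", "sd_length")
print(iris_summary)
```

The flattening step is only needed because the summary function returns two values at once; with a single statistic, aggregate() returns a regular column directly.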
Which method is fastest for large datasets? data.table is optimized for performance and is generally fastest with large datasets.
Can I combine dplyr and data.table? Yes, you can use both in the same script, choosing the best tool for each task.
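One way this might look in practice is sketched below: data.table does the grouped aggregation and dplyr handles the final sorting and rounding. The object names (DT, by_cyl, result) and the choice of mtcars are assumptions for illustration, and converting to a plain data frame in between is just one cautious option.

```r
library(data.table)
library(dplyr)

# Aggregate with data.table
DT <- as.data.table(mtcars)
by_cyl <- DT[, .(count = .N, avg_mpg = mean(mpg)), by = cyl]

# Continue with dplyr verbs on a plain data frame
result <- by_cyl %>%
  as.data.frame() %>%
  arrange(desc(avg_mpg)) %>%
  mutate(avg_mpg = round(avg_mpg, 1))
print(result)
```

Loading both packages prints a few masking messages (for example for first() and last()); these are expected and do not affect this example.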
How do I export tables to other formats? Use the writexl package for Excel, the base R function write.csv() for CSV, or the knitr package for formatted output.
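As a small illustration of the CSV route, this sketch writes a frequency table to a temporary file with write.csv() and reads it back. The object names and the temporary file path are assumptions for the example; replace the path with your own.

```r
# Build a small frequency table and convert it to a data frame
colors <- c("red", "blue", "red", "green", "blue", "red")
color_df <- as.data.frame(table(colors), stringsAsFactors = FALSE)

# Write to a temporary CSV file (swap in your own path)
out_file <- tempfile(fileext = ".csv")
write.csv(color_df, out_file, row.names = FALSE)

# Read it back to confirm the round trip
color_back <- read.csv(out_file)
print(color_back)
```

Converting the table to a data frame first gives one row per category with a Freq column, which is also the shape that writexl::write_xlsx() and knitr::kable() accept.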
What’s the difference between table() and xtabs()? table() takes vectors or factors directly, while xtabs() uses a formula interface with a data argument, which is convenient for data frames and also lets you sum a count or weight column.
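A short side-by-side sketch may make the difference concrete. It uses the built-in mtcars dataset; the object names tab and xt are assumptions for this example.

```r
# table() takes the vectors directly
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

# xtabs() takes a formula plus a data frame
xt <- xtabs(~ cyl + am, data = mtcars)

print(tab)
print(xt)

# Both hold the same counts
same_counts <- all(tab == xt)
print(same_counts)
```

For a plain two-way count the two calls give the same contingency table, so the choice is mostly about whether a formula or explicit vectors read better in your code.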
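How do I handle missing values in tables? Pass na.rm = TRUE to the summary functions you call inside summarise(), such as mean() or sd(), or specify useNA = "always" in table() to count NA values as their own category.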
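The sketch below shows both cases in base R with made-up example vectors (x and vals are assumptions for illustration). Inside summarise(), the na.rm argument would go to mean() in the same way.

```r
x <- c("red", "blue", NA, "red", NA)

# By default table() drops NA values
default_tab <- table(x)
print(default_tab)

# useNA = "always" adds an NA category, even when its count is zero
na_tab <- table(x, useNA = "always")
print(na_tab)

# For numeric summaries, na.rm = TRUE is an argument of the summary function
vals <- c(10, NA, 30)
avg_with_na <- mean(vals)                # NA, because one value is missing
avg_no_na <- mean(vals, na.rm = TRUE)    # missing value is ignored
print(avg_no_na)
```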
Mastering table creation in R involves understanding the strengths of each approach. Base R offers simplicity, dplyr provides readability, and data.table delivers performance. Practice with different methods to determine which best suits your needs.
Try implementing these examples with your own datasets. Share your experiences and questions in the comments below, and don’t forget to experiment with combining different approaches for optimal results.
Happy Coding! 🚀