Introduction
Bootstrap resampling is a technique used in statistics and data analysis to estimate the uncertainty of a statistic by repeatedly sampling, with replacement, from the original data. In R, we can implement a bootstrap function with the lapply and sample functions. In this blog post, we will explore how to write a bootstrap function in R and provide an example using the “mpg” column from the popular “mtcars” dataset.
Bootstrap Function Implementation
To create a bootstrap function in R, we can follow these steps:
Step 1: Load the required dataset
Let’s begin by loading the “mtcars” dataset, which ships with base R in the datasets package:
data(mtcars)
Step 2: Define the bootstrap function
We’ll define a function called bootstrap() that takes two arguments: data (the input data vector) and n (the number of bootstrap iterations).
bootstrap <- function(data, n) {
  resampled_data <- lapply(1:n, function(i) {
    resample <- sample(data, replace = TRUE)
    # Perform desired operations on the resampled data here, e.g., compute a
    # statistic, and return the result. This version returns the resample itself.
    resample
  })
  return(resampled_data)
}

bootstrapped_samples <- bootstrap(mtcars$mpg, 5)
bootstrapped_samples
[[1]]
[1] 21.0 18.1 33.9 21.4 17.3 19.2 19.2 15.8 16.4 30.4 18.1 14.3 32.4 10.4 15.0
[16] 16.4 30.4 17.8 21.4 19.2 17.3 22.8 14.3 22.8 30.4 18.7 13.3 13.3 15.2 10.4
[31] 15.0 13.3
[[2]]
[1] 18.7 32.4 21.0 10.4 15.0 14.7 24.4 10.4 32.4 10.4 21.0 19.7 21.4 10.4 30.4
[16] 17.3 10.4 22.8 15.2 15.2 21.4 15.8 21.4 33.9 24.4 15.2 18.1 19.2 21.0 24.4
[31] 15.5 21.0
[[3]]
[1] 15.5 30.4 21.0 22.8 27.3 18.1 21.0 13.3 15.2 17.3 15.8 21.0 18.1 14.3 17.8
[16] 15.8 21.0 18.1 19.2 24.4 19.2 22.8 18.7 14.3 26.0 21.4 22.8 32.4 14.7 15.2
[31] 15.2 14.3
[[4]]
[1] 13.3 21.0 13.3 15.0 19.2 18.1 18.1 19.2 22.8 18.7 26.0 21.4 14.7 14.3 17.8
[16] 22.8 19.7 21.4 30.4 30.4 18.7 17.3 16.4 21.5 18.1 21.0 17.8 21.4 14.3 19.7
[31] 32.4 18.7
[[5]]
[1] 15.0 21.4 21.5 26.0 17.3 30.4 18.1 17.8 17.3 30.4 24.4 32.4 21.0 17.8 33.9
[16] 32.4 19.2 22.8 19.7 16.4 17.8 22.8 14.3 33.9 21.5 10.4 21.4 26.0 33.9 14.7
[31] 21.5 18.1
In the above code, we use lapply to generate a list of n resampled datasets. Inside the function passed to lapply, we use the sample function to randomly sample from the original data with replacement (replace = TRUE). Because sample draws as many values as its input contains unless a size argument is given, each resampled dataset has the same length as the original dataset.
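To see what a single resample looks like, here is a minimal sketch. It resamples row positions rather than values so that repeated observations are easy to count; the seed and the variable names x, idx, and resample are assumptions for illustration, not part of the function above.

```r
set.seed(123)  # arbitrary seed (an assumption), used only so the draw is repeatable
x <- mtcars$mpg

# Resample row positions with replacement, then pull the matching values
idx <- sample(seq_along(x), replace = TRUE)
resample <- x[idx]

length(resample) == length(x)  # the resample has as many values as the original
length(unique(idx))            # number of distinct observations that were drawn
sum(duplicated(idx))           # number of draws that repeat an earlier observation
```

Sampling positions also sidesteps a quirk of sample(): when its first argument is a single number of at least 1, it samples from 1 up to that number rather than returning the value itself.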
Step 3: Perform desired operations on resampled data
Within the function passed to lapply, you can perform any desired operations on the resampled data. This could involve calculating statistics, fitting models, or conducting hypothesis tests. Customize the code within that function to suit your specific needs.
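One way to make that customization easier is to pass the statistic in as an argument instead of editing the function body each time. The sketch below is a hypothetical variant: the name bootstrap_stat() and its stat parameter are assumptions for illustration and do not appear in the original function.

```r
# Hypothetical variant of bootstrap(): the statistic is supplied by the caller
bootstrap_stat <- function(data, n, stat = mean) {
  lapply(seq_len(n), function(i) {
    resample <- sample(data, replace = TRUE)  # one bootstrap resample
    stat(resample)                            # apply the chosen statistic to it
  })
}

set.seed(42)  # arbitrary seed, only for repeatability
boot_medians <- bootstrap_stat(mtcars$mpg, n = 200, stat = median)
length(boot_medians)  # one median per bootstrap iteration
```

Any function that accepts a numeric vector, such as sd or an anonymous function, could be passed as stat in the same way.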
Example: Bootstrapping the “mpg” column in mtcars
Let’s illustrate the usage of our bootstrap function by resampling the “mpg” column from the “mtcars” dataset. We will calculate the mean of each resampled dataset.
# Step 1: Load the dataset
data(mtcars)
# Step 2: Define the bootstrap function
bootstrap <- function(data, n) {
  resampled_data <- lapply(1:n, function(i) {
    resample <- sample(data, replace = TRUE)
    mean(resample) # Calculate the mean of each resampled dataset
  })
  return(resampled_data)
}
# Step 3: Perform the bootstrap resampling
bootstrapped_means <- bootstrap(mtcars$mpg, n = 1000)
# Display the first few resampled means
head(bootstrapped_means)
[[1]]
[1] 20.21562
[[2]]
[1] 20.09375
[[3]]
[1] 19.59375
[[4]]
[1] 20.13437
[[5]]
[1] 21.17813
[[6]]
[1] 21.5375
In the above example, we resample the “mpg” column of the “mtcars” dataset 1000 times. The bootstrap() function calculates the mean of each resampled dataset and returns a list of resampled means. The head() function is then used to display the first few resampled means.
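Since the goal of bootstrapping is to estimate uncertainty, a natural next step is to turn the resampled means into a standard error and an interval. The following is a minimal sketch, assuming a percentile interval is an acceptable choice here; the seed and the names boot_means, boot_se, and boot_ci are illustrative assumptions.

```r
# Same idea as the bootstrap() function in this post, written compactly
bootstrap <- function(data, n) {
  lapply(seq_len(n), function(i) mean(sample(data, replace = TRUE)))
}

set.seed(2023)  # arbitrary seed, only for repeatability
boot_means <- unlist(bootstrap(mtcars$mpg, n = 1000))  # flatten the list to a numeric vector

boot_se <- sd(boot_means)                                 # bootstrap standard error of the mean
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))  # 95% percentile interval

boot_se
boot_ci
```

Flattening with unlist() is what makes functions such as sd() and quantile() applicable, because they expect a numeric vector rather than a list.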
Of course, we do not have to compute a statistic inside the bootstrap function. We can instead return the bootstrap samples themselves and compute statistics on them afterwards. The following example pools the five resamples stored in bootstrapped_samples above and summarizes the pooled values.
quantile(unlist(bootstrapped_samples),
probs = c(0.025, 0.25, 0.5, 0.75, 0.975))
2.5% 25% 50% 75% 97.5%
10.400 15.725 19.200 22.800 33.900
mean(unlist(bootstrapped_samples))
[1] 20.06625
sd(unlist(bootstrapped_samples))
[1] 5.827239
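Pooling with unlist() summarizes all resampled values together, which mostly reflects the spread of the original data. To study how a statistic varies from one resample to the next, it can be computed per sample instead. A minimal sketch, which regenerates five resamples so it runs on its own (the seed and the names sample_means and sample_sds are assumptions):

```r
set.seed(1)  # arbitrary seed, only for repeatability

# Five bootstrap resamples of mpg, stored as a list like bootstrapped_samples
bootstrapped_samples <- lapply(1:5, function(i) sample(mtcars$mpg, replace = TRUE))

# One statistic per resample rather than one over the pooled values
sample_means <- sapply(bootstrapped_samples, mean)
sample_sds <- sapply(bootstrapped_samples, sd)

sample_means
sample_sds
```

With only five resamples these vectors are too short to say much; in practice the same sapply() calls would be applied to a much larger list.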
Conclusion
In this blog post, we have learned how to write a bootstrap function in R using the lapply and sample functions. By employing these functions, we can easily generate resampled datasets to estimate the uncertainty of statistics or perform other desired operations. The example using the “mpg” column of the “mtcars” dataset demonstrated the usage of the bootstrap function to calculate resampled means. Feel free to customize the function to suit your specific needs and explore the power of bootstrap resampling in R.