library(forecast)
data(AirPassengers)
plot(AirPassengers, main = "International Airline Passengers 1949-1960")
Introduction
Time series analysis is a powerful tool for understanding and predicting patterns in data that vary over time. In this tutorial, we will use the AirPassengers dataset to create a bootstrapped timeseries model in R.
The AirPassengers dataset The AirPassengers dataset contains data on the number of passengers traveling on international flights per month from 1949 to 1960. To begin, we will load the dataset into R and plot it to get an idea of the data’s structure and any underlying patterns.
From the plot, we can see that there is a clear upward trend in the data, as well as some seasonality.
Creating a bootstrapped timeseries model
Now that we have an idea of the structure of the data, we can create a bootstrapped timeseries model using the auto.arima()
function from the {forecast}
package. The auto.arima()
function uses an automated algorithm to determine the best model for a given timeseries.
set.seed(123)
<- length(AirPassengers)
n <- 1000
n_boot
# create bootstrap sample indices
<- replicate(n_boot, sample(1:n, replace = TRUE))
boot_indices
# create list to store models
<- list()
models
# create bootstrapped models
for(i in 1:n_boot) {
<- AirPassengers[boot_indices[, i]]
boot_data <- auto.arima(boot_data)
models[[i]]
}
1]] models[[
Series: boot_data
ARIMA(0,0,0) with non-zero mean
Coefficients:
mean
275.5347
s.e. 9.5443
sigma^2 = 13209: log likelihood = -887.01
AIC=1778.02 AICc=1778.1 BIC=1783.96
In the code above, we first set a seed to ensure reproducibility of our results. We then specify the length of the timeseries and the number of bootstrap iterations we want to run. We create a list to store the models and a set of bootstrap sample indices.
We then loop through each bootstrap iteration, creating a new dataset from the original timeseries by sampling with replacement using the boot_indices. We use the auto.arima()
function to create a timeseries model for each bootstrap sample and store it in our models list.
Summarizing and plotting residuals
Now that we have created our bootstrapped timeseries models, we can summarize and plot the residuals of each model to get an idea of how well our models fit the data.
# create list to store residuals
<- list()
residuals
# create residuals for each model
for(i in 1:n_boot) {
<- AirPassengers[boot_indices[, i]]
boot_data <- residuals(models[[i]])
residuals[[i]]
}
# summarize residuals
<- sapply(residuals, mean)
residual_means <- sapply(residuals, sd)
residual_sd
# plot residuals
par(mfrow = c(2, 1))
hist(
residual_means, main = "Bootstrapped Model Residuals",
xlab = "Mean Residuals"
)hist(
residual_sd, col = "red",
main = "",
xlab = "SD Residuals"
)
par(mfrow = c(1,1))
In the code above, we create a list to store the residuals for each model, loop through each model to create residuals using the residuals() function, and summarize the residuals by taking the mean and standard deviation of each set of residuals.
We then plot the mean residuals and standard deviations for each model using the plot() function and add a legend to indicate the meaning of the two lines.
Voila!