library(healthyR.ts)
library(dplyr)
library(ggplot2)
library(tidyr)
library(plotly)
library(timetk)
library(modeltime)
Introduction
Today I am going to go over some quick yet practical examples of ways that you can use the healthyR.ts
package. This package is designed to help you analyze time series data in a more efficient and effective manner.
Let’s just jump right into it!
Load the libraries
Load the data
We are going to use the timeseries data called BJSales.lead
that comes with Base R. We will do this to showcase a couple of things like turning a ts object into a tibble and plotting the data.
# Load the data, which has no time series information other than it is
# a time series object and 150 points in length, so we will go ahead and
# create a date column for it and name it date_col.
<- BJsales.lead |>
df ts_to_tbl() |>
mutate(date_col = seq.Date(from = as.Date("1991-01-01"),
by = "month",
length.out = 150)) |>
select(date_col, everything())
# Print the first few rows of the data
head(df)
# A tibble: 6 × 2
date_col value
<date> <dbl>
1 1991-01-01 10.0
2 1991-02-01 10.1
3 1991-03-01 10.3
4 1991-04-01 9.75
5 1991-05-01 10.3
6 1991-06-01 10.1
So far, we have loaded the data and created a date column for it. Now, let’s plot the data. We are going to use the ts_vva_plot
function to do this.
# Plot the data
<- ts_vva_plot(df, date_col, value)
plt_data
head(plt_data[["data"]][["augmented_data_tbl"]])
# A tibble: 6 × 3
date_col name value
<date> <fct> <dbl>
1 1991-01-01 Value 10.0
2 1991-01-01 Velocity NA
3 1991-01-01 Acceleration NA
4 1991-02-01 Value 10.1
5 1991-02-01 Velocity 0.0600
6 1991-02-01 Acceleration NA
"plots"]][["interactive_plot"]] plt_data[[
Now we have created the augmented data that gets the first order difference of the time series velocity and then the second order difference which gets us the acceleration. The function then creates a ggplot2
plot and a plotly
plot of the data. Let’s move on to see the growth rate of this data.
# Plot the growth rate of the data
<- ts_growth_rate_augment(
df_growth_augment_tbl
df,
value
)
head(df_growth_augment_tbl)
# A tibble: 6 × 3
date_col value growth_rate_value
<date> <dbl> <dbl>
1 1991-01-01 10.0 NA
2 1991-02-01 10.1 0.599
3 1991-03-01 10.3 2.48
4 1991-04-01 9.75 -5.52
5 1991-05-01 10.3 5.95
6 1991-06-01 10.1 -1.94
Let’s now view the data:
<- df_growth_augment_tbl |>
plt pivot_longer(cols = -date_col) |>
ggplot(aes(x = date_col, y = value, color = name)) +
facet_wrap(~ name, ncol = 1, scales = "free") +
geom_line() +
theme_minimal() +
labs(
x = "Date",
y = "Value",
title = "Growth Rate of Time Series Data",
color = "Variable"
)
print(plt)
ggplotly(plt)
Stationary?
Is the data stationary? Meaning does the joint probability of the distribution change when shifted in time? Let’s find out.
ts_adf_test(df[["value"]])
$test_stat
[1] -1.723664
$p_value
[1] 0.6915227
The p-value from this test is 0.692. This means that we can accept the null hypothesis that the data is non-stationary. We can, however, make the data stationary by using a built in function in this package.
<- auto_stationarize(df[["value"]]) auto_stationary_df
The time series is not stationary. Attempting to make it stationary...
<- auto_stationary_df[["stationary_ts"]]
stationary_vec <- auto_stationary_df[["ndiffs"]]
ndiffs <- auto_stationary_df[["trans_type"]]
trans_type <- auto_stationary_df[["adf_stats"]][["test_stat"]]
test_stat <- auto_stationary_df[["adf_stats"]][["p_value"]] p_value
The data is now stationary after 1 differencing. The transformation type used was diff. The test statistic was -4.839 and the p-value was 0.01.
Let’s now add the stationary data to the df_growth_augment_tbl and plot it. First in order to do this we are going to have to pad the data since it is shorter than the original data. We will simply add an NA to the vector then attach.
<- c(rep(NA, ndiffs), stationary_vec)
stationary_vec <- df_growth_augment_tbl |>
df_growth_augment_tbl mutate(stationary = stationary_vec)
|>
df_growth_augment_tbl pivot_longer(cols = -date_col) |>
ggplot(aes(x = date_col, y = value, color = name)) +
facet_wrap(~ name, ncol = 1, scales = "free") +
geom_line() +
theme_minimal() +
labs(
x = "Date",
y = "Value",
title = "Growth Rate/Value and Stationary Data of Time Series",
color = "Variable"
)
It’s close to the growth rate as it is the first order difference of the data.
Now, lets see if there is any lags that are present in the data.
<- ts_lag_correlation(df_growth_augment_tbl,
output .date_col = date_col,
.value_col = value,
.lags = c(1,2,3,4,6,12,24))
"plots"]][["plotly_lag_plot"]] output[[
"plots"]][["plotly_heatmap"]] output[[
We can tell from the data, and from the automatic stationarization that the data is highly correlated at lag 1. So lags 2, 3, …, etc are not necessary. We can see the linear correlation fall apart the further the lags
Let’s go ahead and model it and see what happens.
Model the data
<- time_series_split(
splits
df_growth_augment_tbl,
date_col, assess= 12,
skip = 3,
cumulative = TRUE
)
<- ts_auto_arima(
ts_aa
df_growth_augment_tbl,.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 10,
.cv_slice_limit = 12,
.tune = FALSE
)
frequency = 12 observations per 1 year
"recipe_info"]] ts_aa[[
$recipe_call
recipe(.data = df_growth_augment_tbl, .date_col = date_col, .value_col = value,
.formula = value ~ ., .rsamp_obj = splits, .tune = FALSE,
.grid_size = 10, .num_cores = 2, .cv_slice_limit = 12)
$recipe_syntax
[1] "ts_arima_recipe <-"
[2] "\n recipe(.data = df_growth_augment_tbl, .date_col = date_col, .value_col = value, \n .formula = value ~ ., .rsamp_obj = splits, .tune = FALSE, .grid_size = 10, \n .num_cores = 2, .cv_slice_limit = 12)"
$rec_obj
── Recipe ──────────────────────────────────────────────────────────────────────
── Inputs
Number of variables by role
outcome: 1
predictor: 3
"model_info"]] ts_aa[[
$model_spec
ARIMA Regression Model Specification (regression)
Computational engine: auto_arima
$wflw
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: arima_reg()
── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps
── Model ───────────────────────────────────────────────────────────────────────
ARIMA Regression Model Specification (regression)
Computational engine: auto_arima
$fitted_wflw
══ Workflow [trained] ══════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: arima_reg()
── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps
── Model ───────────────────────────────────────────────────────────────────────
Series: outcome
Regression with ARIMA(0,1,2) errors
Coefficients:
ma1 ma2 drift growth_rate_value stationary
0.4626 -0.4013 0.0233 -0.0021 0.5168
s.e. 0.0794 0.0849 0.0131 0.0098 0.0831
sigma^2 = 0.02124: log likelihood = 70.34
AIC=-128.69 AICc=-128.04 BIC=-111.21
$was_tuned
[1] "not_tuned"
"model_calibration"]][["plot"]] ts_aa[[
"model_calibration"]] ts_aa[[
$plot
$calibration_tbl
# Modeltime Table
# A tibble: 1 × 5
.model_id .model .model_desc .type .calibration_data
<int> <list> <chr> <chr> <list>
1 1 <workflow> REGRESSION WITH ARIMA(0,1,2) ERR… Test <tibble [12 × 4]>
$model_accuracy
# A tibble: 1 × 9
.model_id .model_desc .type mae mape mase smape rmse rsq
<int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 REGRESSION WITH ARIMA(0,1… Test 0.179 1.32 0.832 1.33 0.229 0.307
Conclusion
This in short is a very simple practical example of how to use the healthyR.ts
package. This post was not meant to be a comprehensive guide to time series analysis, but rather a simple example of how to use the package. The package is still in development and will be updated with more features and functions in the future.
Happy Coding!