Practical Examples with healthyR.ts

code
rtip
healthyrts
Author

Steven P. Sanderson II, MPH

Published

June 20, 2024

Introduction

Today I am going to go over some quick yet practical examples of ways that you can use the healthyR.ts package. This package is designed to help you analyze time series data in a more efficient and effective manner.

Let’s just jump right into it!

Load the libraries

library(healthyR.ts)
library(dplyr)
library(ggplot2)
library(tidyr)
library(plotly)
library(timetk)
library(modeltime)

Load the data

We are going to use the timeseries data called BJSales.lead that comes with Base R. We will do this to showcase a couple of things like turning a ts object into a tibble and plotting the data.

# Load the data, which has no time series information other than it is
# a time series object and 150 points in length, so we will go ahead and
# create a date column for it and name it date_col.
df <- BJsales.lead |>
  ts_to_tbl() |>
  mutate(date_col = seq.Date(from = as.Date("1991-01-01"), 
                              by = "month", 
                              length.out = 150)) |>
  select(date_col, everything())

# Print the first few rows of the data
head(df)
# A tibble: 6 × 2
  date_col   value
  <date>     <dbl>
1 1991-01-01 10.0 
2 1991-02-01 10.1 
3 1991-03-01 10.3 
4 1991-04-01  9.75
5 1991-05-01 10.3 
6 1991-06-01 10.1 

So far, we have loaded the data and created a date column for it. Now, let’s plot the data. We are going to use the ts_vva_plot function to do this.

# Plot the data
plt_data <- ts_vva_plot(df, date_col, value)

head(plt_data[["data"]][["augmented_data_tbl"]])
# A tibble: 6 × 3
  date_col   name           value
  <date>     <fct>          <dbl>
1 1991-01-01 Value        10.0   
2 1991-01-01 Velocity     NA     
3 1991-01-01 Acceleration NA     
4 1991-02-01 Value        10.1   
5 1991-02-01 Velocity      0.0600
6 1991-02-01 Acceleration NA     
plt_data[["plots"]][["interactive_plot"]]

Now we have created the augmented data that gets the first order difference of the time series velocity and then the second order difference which gets us the acceleration. The function then creates a ggplot2 plot and a plotly plot of the data. Let’s move on to see the growth rate of this data.

# Plot the growth rate of the data
df_growth_augment_tbl <- ts_growth_rate_augment(
  df,
  value
)

head(df_growth_augment_tbl)
# A tibble: 6 × 3
  date_col   value growth_rate_value
  <date>     <dbl>             <dbl>
1 1991-01-01 10.0             NA    
2 1991-02-01 10.1              0.599
3 1991-03-01 10.3              2.48 
4 1991-04-01  9.75            -5.52 
5 1991-05-01 10.3              5.95 
6 1991-06-01 10.1             -1.94 

Let’s now view the data:

plt <- df_growth_augment_tbl |>
  pivot_longer(cols = -date_col) |>
  ggplot(aes(x = date_col, y = value, color = name)) +
  facet_wrap(~ name, ncol = 1, scales = "free") +
  geom_line() +
  theme_minimal() +
  labs(
    x = "Date",
    y = "Value",
    title = "Growth Rate of Time Series Data",
    color = "Variable"
  )

print(plt)

ggplotly(plt)

Stationary?

Is the data stationary? Meaning does the joint probability of the distribution change when shifted in time? Let’s find out.

ts_adf_test(df[["value"]])
$test_stat
[1] -1.723664

$p_value
[1] 0.6915227

The p-value from this test is 0.692. This means that we can accept the null hypothesis that the data is non-stationary. We can, however, make the data stationary by using a built in function in this package.

auto_stationary_df <- auto_stationarize(df[["value"]])
The time series is not stationary. Attempting to make it stationary...
stationary_vec <- auto_stationary_df[["stationary_ts"]]
ndiffs <- auto_stationary_df[["ndiffs"]]
trans_type <- auto_stationary_df[["trans_type"]]
test_stat <- auto_stationary_df[["adf_stats"]][["test_stat"]]
p_value <- auto_stationary_df[["adf_stats"]][["p_value"]]

The data is now stationary after 1 differencing. The transformation type used was diff. The test statistic was -4.839 and the p-value was 0.01.

Let’s now add the stationary data to the df_growth_augment_tbl and plot it. First in order to do this we are going to have to pad the data since it is shorter than the original data. We will simply add an NA to the vector then attach.

stationary_vec <- c(rep(NA, ndiffs), stationary_vec)
df_growth_augment_tbl <- df_growth_augment_tbl |>
  mutate(stationary = stationary_vec)

df_growth_augment_tbl |>
  pivot_longer(cols = -date_col) |>
  ggplot(aes(x = date_col, y = value, color = name)) +
  facet_wrap(~ name, ncol = 1, scales = "free") +
  geom_line() +
  theme_minimal() +
  labs(
    x = "Date",
    y = "Value",
    title = "Growth Rate/Value and Stationary Data of Time Series",
    color = "Variable"
  )

It’s close to the growth rate as it is the first order difference of the data.

Now, lets see if there is any lags that are present in the data.

output <- ts_lag_correlation(df_growth_augment_tbl,
                .date_col = date_col,
                .value_col = value,
                .lags = c(1,2,3,4,6,12,24))

output[["plots"]][["plotly_lag_plot"]]
output[["plots"]][["plotly_heatmap"]]

We can tell from the data, and from the automatic stationarization that the data is highly correlated at lag 1. So lags 2, 3, …, etc are not necessary. We can see the linear correlation fall apart the further the lags

Let’s go ahead and model it and see what happens.

Model the data

splits <- time_series_split(
  df_growth_augment_tbl, 
  date_col, 
  assess= 12, 
  skip = 3, 
  cumulative = TRUE
  )

ts_aa <- ts_auto_arima(
  df_growth_augment_tbl,
  .num_cores = 2,
  .date_col = date_col,
  .value_col = value,
  .rsamp_obj = splits,
  .formula = value ~ .,
  .grid_size = 10,
  .cv_slice_limit = 12,
  .tune = FALSE
)
frequency = 12 observations per 1 year
ts_aa[["recipe_info"]]
$recipe_call
recipe(.data = df_growth_augment_tbl, .date_col = date_col, .value_col = value, 
    .formula = value ~ ., .rsamp_obj = splits, .tune = FALSE, 
    .grid_size = 10, .num_cores = 2, .cv_slice_limit = 12)

$recipe_syntax
[1] "ts_arima_recipe <-"                                                                                                                                                                                              
[2] "\n  recipe(.data = df_growth_augment_tbl, .date_col = date_col, .value_col = value, \n    .formula = value ~ ., .rsamp_obj = splits, .tune = FALSE, .grid_size = 10, \n    .num_cores = 2, .cv_slice_limit = 12)"

$rec_obj
── Recipe ──────────────────────────────────────────────────────────────────────
── Inputs 
Number of variables by role
outcome:   1
predictor: 3
ts_aa[["model_info"]]
$model_spec
ARIMA Regression Model Specification (regression)

Computational engine: auto_arima 


$wflw
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: arima_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
ARIMA Regression Model Specification (regression)

Computational engine: auto_arima 


$fitted_wflw
══ Workflow [trained] ══════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: arima_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Series: outcome 
Regression with ARIMA(0,1,2) errors 

Coefficients:
         ma1      ma2   drift  growth_rate_value  stationary
      0.4626  -0.4013  0.0233            -0.0021      0.5168
s.e.  0.0794   0.0849  0.0131             0.0098      0.0831

sigma^2 = 0.02124:  log likelihood = 70.34
AIC=-128.69   AICc=-128.04   BIC=-111.21

$was_tuned
[1] "not_tuned"
ts_aa[["model_calibration"]][["plot"]]
ts_aa[["model_calibration"]]
$plot

$calibration_tbl
# Modeltime Table
# A tibble: 1 × 5
  .model_id .model     .model_desc                       .type .calibration_data
      <int> <list>     <chr>                             <chr> <list>           
1         1 <workflow> REGRESSION WITH ARIMA(0,1,2) ERR… Test  <tibble [12 × 4]>

$model_accuracy
# A tibble: 1 × 9
  .model_id .model_desc                .type   mae  mape  mase smape  rmse   rsq
      <int> <chr>                      <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1         1 REGRESSION WITH ARIMA(0,1… Test  0.179  1.32 0.832  1.33 0.229 0.307

Conclusion

This in short is a very simple practical example of how to use the healthyR.ts package. This post was not meant to be a comprehensive guide to time series analysis, but rather a simple example of how to use the package. The package is still in development and will be updated with more features and functions in the future.

Happy Coding!