Testing stationarity with the ts_adf_test() function in R

rtip
healthyrts
timeseries
Author

Steven P. Sanderson II, MPH

Published

October 17, 2023

Introduction

Hey there, R enthusiasts! Today, we’re going to dive into time series analysis using the ts_adf_test() function from the healthyR.ts R package. If you’re into data, statistics, and R coding, this is a handy tool to have in your arsenal.

What’s the Deal with Augmented Dickey-Fuller?

Before we delve into the ts_adf_test() function, let’s understand the concept behind it. The Augmented Dickey-Fuller (ADF) test is a crucial tool in time series analysis. It’s like the Sherlock Holmes of time series data, helping us detect whether a series is stationary or not. Formally, the null hypothesis is that the series has a unit root (it is non-stationary), and the alternative is that it is stationary. Stationarity matters because many time series models assume that the mean, variance, and autocorrelation structure of the data do not change over time.

So, why “Augmented”? Well, it’s an extension of the original Dickey-Fuller test that adds lagged differences of the series to the test regression, so that autocorrelation beyond the first lag does not distort the result.
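
To make the “augmented” part concrete, here is a minimal base R sketch of the regression behind the ADF statistic. It assumes the common specification with a constant and a linear trend (the one used by tseries::adf.test). The name adf_stat_sketch is a hypothetical helper written for illustration and is not the internals of ts_adf_test().

```r
# Hypothetical helper: ADF test statistic from a regression with a constant,
# a linear trend, and k lagged differences (assumed specification).
adf_stat_sketch <- function(x, k) {
  x  <- as.numeric(x)
  dy <- diff(x)                      # first differences of the series
  n  <- length(dy)
  z  <- embed(dy, k + 1)             # column 1: current diff; columns 2..k+1: lagged diffs
  dy_t   <- z[, 1]
  y_lag1 <- x[(k + 1):n]             # lagged level y_{t-1}, aligned with dy_t
  trend  <- (k + 1):n
  if (k > 0) {
    lagged_diffs <- z[, -1, drop = FALSE]   # the "augmentation" terms
    fit <- lm(dy_t ~ y_lag1 + trend + lagged_diffs)
  } else {
    fit <- lm(dy_t ~ y_lag1 + trend)        # plain Dickey-Fuller regression
  }
  coefs <- summary(fit)$coefficients
  # t-ratio of the lagged level: more negative means stronger evidence against a unit root
  unname(coefs["y_lag1", "Estimate"] / coefs["y_lag1", "Std. Error"])
}

adf_stat_sketch(AirPassengers, k = 5)
```

With k = 0 the lagged differences drop out and the regression reduces to the original Dickey-Fuller test. Note that this t-ratio has to be compared against Dickey-Fuller critical values, not the usual t-distribution.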

The ts_adf_test() Function

Now, let’s get to the star of the show, the ts_adf_test() function. This function is part of the healthyR.ts package, and its primary job is to perform the ADF test on a given time series. In R, a time series can be represented as a numeric vector. Here’s the basic syntax:

ts_adf_test(.x, .k = NULL)
  • .x is your time series data, the numeric vector you want to analyze.
  • .k is an optional parameter that allows you to specify the lag order. If you leave it at its default (.k = NULL), the function calculates a lag order for you from the number of observations.
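
As a rough illustration of that default, the sketch below assumes the rule used by tseries::adf.test, trunc((n - 1)^(1/3)), where n is the number of observations. Whether ts_adf_test() uses exactly this rule should be checked against the package documentation, and default_lag_sketch is a hypothetical name.

```r
# Assumed default lag order: cube root of (n - 1), truncated to an integer.
default_lag_sketch <- function(x) {
  trunc((length(x) - 1)^(1 / 3))
}

default_lag_sketch(AirPassengers)  # 144 monthly observations
default_lag_sketch(BJsales)        # 150 observations
```

Under this assumed rule both series get a lag order of 5, since the cube roots of 143 and 149 both fall between 5 and 6.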

Show Me the Stats!

So, what does ts_adf_test() return? It gives you a list object containing two vital pieces of information:

  1. Test Statistic: This is the heart of the ADF test. It measures how strongly the data contradict the null hypothesis of a unit root. A more negative value indicates stronger evidence for stationarity.

  2. P-Value: This is another critical number. It represents the probability of observing a test statistic at least as extreme as the one you obtained if the series really had a unit root (that is, if it were non-stationary). In simpler terms, a low p-value (commonly below 0.05) is evidence that your data is stationary, while a high p-value means the test could not rule out non-stationarity.
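
The decision rule above can be sketched as a small helper. This is only an illustration: interpret_adf is a hypothetical function, the 0.05 threshold is a conventional choice rather than something the package prescribes, and the two input lists contain made-up values shaped like the output of ts_adf_test().

```r
# Hypothetical helper: turn an ADF result list into a plain-language decision.
interpret_adf <- function(result, alpha = 0.05) {
  if (result$p_value < alpha) {
    "reject the unit-root null: evidence of stationarity"
  } else {
    "fail to reject the unit-root null: no evidence of stationarity"
  }
}

# Made-up lists with the same element names as the ts_adf_test() output
strong_evidence <- list(test_stat = -4.2, p_value = 0.01)
weak_evidence   <- list(test_stat = -1.3, p_value = 0.60)

interpret_adf(strong_evidence)
interpret_adf(weak_evidence)
```

Failing to reject the null is not proof of non-stationarity; it only means the test did not find enough evidence against a unit root.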

Let’s Get Practical

Enough theory! Let’s see some action with a couple of examples. Say we have the AirPassengers and BJsales datasets, and we want to check their stationarity:

library(healthyR.ts)

# ADF test for AirPassengers
result_air <- ts_adf_test(AirPassengers)
cat("AirPassengers ADF Test Result:\n")
#> AirPassengers ADF Test Result:
print(result_air)
#> $test_stat
#> [1] -7.318571
#>
#> $p_value
#> [1] 0.01

# ADF test for BJsales
result_bj <- ts_adf_test(BJsales)
cat("\nBJsales ADF Test Result:\n")
#>
#> BJsales ADF Test Result:
print(result_bj)
#> $test_stat
#> [1] -2.110919
#>
#> $p_value
#> [1] 0.5301832

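In the AirPassengers example, we get a test statistic of -7.318571 and a p-value of 0.01, so we reject the unit-root null hypothesis at the usual 5% level. Keep in mind that AirPassengers has a visible upward trend and seasonal pattern, so this is best read as evidence against a unit root rather than proof that the raw series has a constant mean and variance.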

However, for BJsales, we get a test statistic of -2.110919 and a p-value of 0.5301832. With a p-value this high we cannot reject the null hypothesis, so the test gives no evidence that BJsales is stationary.

Now let’s see what happens when we set the lag order to 1 instead of letting the function choose it.

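# ADF test with the lag order (.k) set to 1
ts_adf_test(AirPassengers, 1)
#> $test_stat
#> [1] -7.652287
#>
#> $p_value
#> [1] 0.01

ts_adf_test(BJsales, 1)
#> $test_stat
#> [1] -1.316414
#>
#> $p_value
#> [1] 0.8611925

With a lag order of 1 the conclusions stay the same: AirPassengers still has a p-value of 0.01, while the p-value for BJsales rises to 0.8611925, so the unit-root null still cannot be rejected for that series.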

Conclusion

The ts_adf_test() function in the healthyR.ts package is a valuable tool for any data scientist or R coder working with time series data. It helps you check whether your data is stationary, a crucial step in building reliable time series models.

So, the next time you’re faced with a time series dataset, remember to call on your trusty companion, ts_adf_test(), to solve the mystery of stationarity. Happy coding, R enthusiasts!