Testing stationarity with the ts_adf_test() function in R
rtip
healthyrts
timeseries
Author
Steven P. Sanderson II, MPH
Published
October 17, 2023
Introduction
Hey there, R enthusiasts! Today, we’re going to dive into the fascinating world of time series analysis using the ts_adf_test() function from the healthyR.ts R library. If you’re into data, statistics, and R coding, this is a must-know tool for your arsenal.
What’s the Deal with Augmented Dickey-Fuller?
Before we delve into the ts_adf_test() function, let’s understand the concept behind it. The Augmented Dickey-Fuller (ADF) test is a crucial tool in time series analysis. It’s like the Sherlock Holmes of time series data, helping us detect whether a series is stationary or not. Stationarity is a fundamental assumption in time series modeling because many models work best when applied to stationary data.
So, why “Augmented”? Well, it’s an extension of the original Dickey-Fuller test that accounts for more complex relationships within the time series data.
The ts_adf_test() Function
Now, let’s get to the star of the show, the ts_adf_test() function. This function is part of the healthyR.ts library, and its primary job is to perform the ADF test on a given time series. In R, a time series can be represented as a numeric vector. Here’s the basic syntax:
ts_adf_test(.x, .k =NULL)
.x is your time series data, the numeric vector you want to analyze.
.k is an optional parameter that allows you to specify the lag order. If you leave it empty (like .k = NULL), don’t worry; the function will calculate it for you based on the number of observations using a clever formula.
Show Me the Stats!
So, what does ts_adf_test() return? It gives you a list object containing two vital pieces of information:
Test Statistic: This is the heart of the ADF test. It tells us how strongly our data deviates from being stationary. A more negative value indicates stronger evidence for stationarity.
P-Value: This is another critical number. It represents the probability that you’d observe a test statistic as extreme as the one you obtained if the data were not stationary. In simpler terms, a low p-value suggests that your data is likely stationary, while a high p-value implies non-stationarity.
Let’s Get Practical
Enough theory! Let’s see some action with a couple of examples. Say we have the AirPassengers and BJsales datasets, and we want to check their stationarity:
library(healthyR.ts)# ADF test for AirPassengersresult_air <-ts_adf_test(AirPassengers)cat("AirPassengers ADF Test Result:\n")
AirPassengers ADF Test Result:
print(result_air)
$test_stat
[1] -7.318571
$p_value
[1] 0.01
# ADF test for BJsalesresult_bj <-ts_adf_test(BJsales)cat("\nBJsales ADF Test Result:\n")
BJsales ADF Test Result:
print(result_bj)
$test_stat
[1] -2.110919
$p_value
[1] 0.5301832
In the AirPassengers example, we get a test statistic of -7.318571 and a p-value of 0.01. This suggests strong evidence for stationarity in this dataset.
However, for BJsales, we get a test statistic of -2.110919 and a p-value of 0.5301832. The higher p-value here indicates that the data is less likely to be stationary.
Now let’s see what happens when we change the lags of the series by one period.
ts_adf_test(AirPassengers, 1)
$test_stat
[1] -7.652287
$p_value
[1] 0.01
ts_adf_test(BJsales, 1)
$test_stat
[1] -1.316414
$p_value
[1] 0.8611925
Conclusion
The ts_adf_test() function in the healthyR.ts library is a valuable tool for any data scientist or R coder working with time series data. It helps you determine whether your data is stationary, a crucial step in building reliable time series models.
So, the next time you’re faced with a time series dataset, remember to call on your trusty companion, ts_adf_test(), to solve the mystery of stationarity. Happy coding, R enthusiasts!