install.packages("dplyr")
install.packages("xml2")
install.packages("rvest")
install.packages("tibble")
install.packages("purrr")
install.packages("lubridate")
install.packages("timetk")
Introduction
If you live in New York and rely on heating oil to keep your home warm during the colder months, you know how important it is to keep track of heating oil prices. Fortunately, with a bit of R code, you can easily access the latest heating oil prices in New York.
The code uses the {dplyr} package to clean and manipulate the data, as well as the {timetk} package to plot the time series. Here’s a breakdown of what the code does:
- First, it sets the URL for the data source. Every function is called with its package prefix (for example xml2::read_html), so no library() calls are needed.
- Next, it reads the HTML from the URL using the read_html() function from the {xml2} package.
- It then uses the html_node() function from the {rvest} package to extract the HTML node that contains the data table.
- The node is converted to a table with html_table() from {rvest} and as_tibble() from {tibble}, and the columns are named with set_names() from {purrr}. The table is then cleaned and ordered with the {dplyr} functions select(), mutate(), and arrange(); inside mutate(), ymd() from {lubridate} parses the period column as a date.
- Finally, the resulting time series data is plotted using plot_time_series() from the {timetk} package.
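To see what the cleaning steps do without touching the website, here is a minimal sketch that runs them on a small hand-made table. The rows are made-up placeholders, not real EIA prices, and the X1 to X5 column names are only an assumption about how an unlabelled table might look before renaming.

```r
# Hypothetical stand-in for the scraped table: made-up rows, NOT real EIA data
raw_tbl <- tibble::tibble(
  X1 = c("demo series", "demo series", "demo series"),
  X2 = c("2023-01-16", "2023-01-02", "2023-01-09"),
  X3 = c("W", "W", "W"),
  X4 = c(3, 1, 2),
  X5 = c("demo units", "demo units", "demo units")
)

clean_tbl <- raw_tbl |>
  # Give the columns meaningful names, in their original order
  purrr::set_names("series_name", "period", "frequency", "value", "units") |>
  # Reorder the columns so the date comes first
  dplyr::select(period, frequency, value, units, series_name) |>
  # Parse the period strings into Date objects
  dplyr::mutate(period = lubridate::ymd(period)) |>
  # Sort from oldest to newest
  dplyr::arrange(period)

print(clean_tbl)
```

After these steps the period column holds Date values and the rows are sorted from oldest to newest, which is the shape plot_time_series() expects.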
To run this code, you will need to have these packages installed on your machine. You can install them using the install.packages() function in R, as in the install.packages() calls at the top of this post.
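If some of the packages may already be installed, a small helper can skip them. This is only a sketch: find_missing() and install_missing() are hypothetical helper names introduced here, not part of any package, and the install call is left commented out so nothing is installed by accident.

```r
# Packages used in this post
pkgs <- c("dplyr", "xml2", "rvest", "tibble", "purrr", "lubridate", "timetk")

# Hypothetical helper: which of the requested packages are not installed yet?
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only the packages that are missing
install_missing <- function(pkgs) {
  to_install <- find_missing(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Uncomment to install whatever is missing:
# install_missing(pkgs)
```

Calling find_missing(pkgs) on its own shows what would be installed without changing anything.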
Once you have installed the packages, you can copy and paste the code into your R console or RStudio and run it to get the latest heating oil prices in New York.
In conclusion, this code provides a simple and efficient way to access and visualize heating oil prices in New York using R. By keeping track of these prices, you can make informed decisions about when to buy heating oil and how much to purchase, ultimately saving you money on your heating bills.
Example
Now let’s run it!
# URL of the data source
url <- "https://www.eia.gov/opendata/qb.php?sdid=PET.W_EPD2F_PRS_SNY_DPG.W"

# Read the page and pull out the node that holds the data table
page <- xml2::read_html(url)
node <- rvest::html_node(
  x = page,
  xpath = "/html/body/div[1]/section/div/div/div[2]/div[1]/table"
)

# Convert the node to a tibble, then name, reorder, parse, and sort
ny_tbl <- node |>
  rvest::html_table() |>
  tibble::as_tibble() |>
  purrr::set_names('series_name','period','frequency','value','units') |>
  dplyr::select(period, frequency, value, units, series_name) |>
  dplyr::mutate(period = lubridate::ymd(period)) |>
  dplyr::arrange(period)

# Plot the weekly price series
ny_tbl |>
  timetk::plot_time_series(.date_var = period, .value = value)
Voila!
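One thing to keep in mind is that the XPath is tied to the page layout, so it may stop matching if the site changes. As a hedged sketch of one way to guard against that, the helper below stops with a clear message when no node is found. extract_table() is a hypothetical name, the HTML is a tiny made-up page used only so the example runs offline, and the check assumes html_node() returns an object of class xml_missing when nothing matches.

```r
# Tiny made-up page so the sketch runs without a network connection
html <- paste0(
  "<html><body><table>",
  "<tr><th>period</th><th>value</th></tr>",
  "<tr><td>2023-01-02</td><td>1</td></tr>",
  "</table></body></html>"
)
page <- xml2::read_html(html)

# Hypothetical helper: extract a table, or fail loudly if the XPath finds nothing
extract_table <- function(page, xpath) {
  node <- rvest::html_node(page, xpath = xpath)
  if (inherits(node, "xml_missing")) {
    stop("No table found at the given XPath; the page layout may have changed.")
  }
  rvest::html_table(node)
}

demo_tbl <- extract_table(page, "//table")
print(demo_tbl)

# A non-matching XPath raises an error instead of failing later in the pipeline
bad <- try(extract_table(page, "//div[@id='nope']"), silent = TRUE)
```

In the real script, the same check could sit between the html_node() call and the html_table() step.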