library(tidymodels)
library(tidyAML)
library(tidyverse)
tidymodels_prefer()
# Create a model specification table
mod_spec_tbl <- fast_regression_parsnip_spec_tbl(
  .parsnip_eng = c("lm", "glm"),
  .parsnip_fns = "linear_reg"
)
# Create a recipe
rec_obj <- recipe(mpg ~ ., data = mtcars)
# Create splits
splits_obj <- create_splits(mtcars, "initial_split")
# Generate the model table
mod_tbl <- mod_spec_tbl |>
  mutate(wflw = full_internal_make_wflw(mod_spec_tbl, rec_obj))
# Generate the fitted model table
mod_fitted_tbl <- mod_tbl |>
  mutate(fitted_wflw = internal_make_fitted_wflw(mod_tbl, splits_obj))
# Make predictions with the enhanced function
preds_list <- internal_make_wflw_predictions(mod_fitted_tbl, splits_obj)
Introduction
Hey R enthusiasts! Steve here, and today I’m excited to share some fantastic updates about a key function in the tidyAML package: internal_make_wflw_predictions(). The latest version addresses issue #190, ensuring that all crucial data is now included in the predictions. Let’s dive into the details!
What’s New?
In response to user feedback, we’ve enhanced the internal_make_wflw_predictions() function to provide a comprehensive set of predictions. Now, when you call this function, the output includes:
The Actual Data: This is the real-world data that your model aims to predict. Having access to this information helps you assess how well your model is performing on unseen instances.
Training Predictions: Predictions made on the training dataset. These show how well the model fits the data it was trained on, and comparing them with the testing predictions helps you spot overfitting.
Testing Predictions: Predictions made on the testing dataset. This is crucial for evaluating the model’s performance on data it hasn’t seen during the training phase.
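To make the shape of that output concrete, here is a minimal base-R sketch that assembles the same three pieces by hand for a single lm fit on mtcars. It is not the tidyAML implementation: the 24/8 split, the seed, and the helper name build_preds_tbl() are assumptions made purely for illustration.

```r
# Hypothetical helper: NOT the tidyAML internals, only an illustration of the
# long format (.data_category / .data_type / .value) described above.
build_preds_tbl <- function(data, train_idx, formula = mpg ~ .) {
  train <- data[train_idx, , drop = FALSE]
  test  <- data[-train_idx, , drop = FALSE]
  fit   <- lm(formula, data = train)
  outcome <- all.vars(formula)[1]  # name of the response column

  rbind(
    # Observed outcome values for every row (training rows first, then testing)
    data.frame(.data_category = "actual", .data_type = "actual",
               .value = c(train[[outcome]], test[[outcome]])),
    # Predictions on the rows the model was fit to
    data.frame(.data_category = "predicted", .data_type = "training",
               .value = unname(predict(fit, newdata = train))),
    # Predictions on the held-out rows
    data.frame(.data_category = "predicted", .data_type = "testing",
               .value = unname(predict(fit, newdata = test)))
  )
}

set.seed(123)                                  # assumed seed, for repeatability
train_idx <- sample(nrow(mtcars), size = 24)   # assumed 24/8 split
preds_sketch <- build_preds_tbl(mtcars, train_idx)

# One block of rows per data type
table(preds_sketch$.data_type)
```

Stacking everything into one long table, rather than keeping three separate objects, means a single filter on .data_type is enough to pull out whichever slice you need.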
How to Use It
To take advantage of these new features, here’s how you can use the updated internal_make_wflw_predictions() function:
internal_make_wflw_predictions(.model_tbl, .splits_obj)
Arguments:
.model_tbl: The model table generated from a function like fast_regression_parsnip_spec_tbl(). Ensure that it has a class of “tidyaml_mod_spec_tbl”. This is typically used after running the internal_make_fitted_wflw() function and saving the resulting tibble.
.splits_obj: The splits object obtained from the auto_ml function. It is internal to the auto_ml function.
Example Usage
Let’s walk through an example using some popular R packages; the code is the block at the top of this post.
This example demonstrates how to integrate the updated function into your workflow. Typically, though, one would not call this function directly, but rather use fast_regression() or fast_classification(), which call it internally. Let’s now take a look at the output of everything.
rec_obj
── Recipe ──────────────────────────────────────────────────────────────────────
── Inputs
Number of variables by role
outcome: 1
predictor: 10
splits_obj
$splits
<Training/Testing/Total>
<24/8/32>
$split_type
[1] "initial_split"
mod_spec_tbl
# A tibble: 2 × 5
.model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec
<int> <chr> <chr> <chr> <list>
1 1 lm regression linear_reg <spec[+]>
2 2 glm regression linear_reg <spec[+]>
mod_tbl
# A tibble: 2 × 6
.model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec wflw
<int> <chr> <chr> <chr> <list> <list>
1 1 lm regression linear_reg <spec[+]> <workflow>
2 2 glm regression linear_reg <spec[+]> <workflow>
mod_fitted_tbl
# A tibble: 2 × 7
.model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec wflw
<int> <chr> <chr> <chr> <list> <list>
1 1 lm regression linear_reg <spec[+]> <workflow>
2 2 glm regression linear_reg <spec[+]> <workflow>
# ℹ 1 more variable: fitted_wflw <list>
preds_list
[[1]]
# A tibble: 64 × 3
.data_category .data_type .value
<chr> <chr> <dbl>
1 actual actual 15.2
2 actual actual 19.7
3 actual actual 17.8
4 actual actual 15
5 actual actual 10.4
6 actual actual 15.8
7 actual actual 17.3
8 actual actual 30.4
9 actual actual 15.2
10 actual actual 19.2
# ℹ 54 more rows
[[2]]
# A tibble: 64 × 3
.data_category .data_type .value
<chr> <chr> <dbl>
1 actual actual 15.2
2 actual actual 19.7
3 actual actual 17.8
4 actual actual 15
5 actual actual 10.4
6 actual actual 15.8
7 actual actual 17.3
8 actual actual 30.4
9 actual actual 15.2
10 actual actual 19.2
# ℹ 54 more rows
You will notice the names of the preds_list output:
names(preds_list[[1]])
[1] ".data_category" ".data_type" ".value"
So we have .data_category, .data_type, and .value. Let’s take a look at the unique values of .data_category and .data_type, starting with .data_category:
unique(preds_list[[1]]$.data_category)
[1] "actual" "predicted"
So we have our actual data and the predicted data. The predicted data, though, contains both the training and the testing predictions. Let’s take a look at the unique values of .data_type:
unique(preds_list[[1]]$.data_type)
[1] "actual" "training" "testing"
This will allow you to visualize the data how you please, something we will go over tomorrow!
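Before plotting anything, base R is enough to slice one element of the list by these two columns. The sketch below uses a small hand-built stand-in for preds_list[[1]] (an lm fit of mpg ~ wt on the first 24 rows of mtcars, which is an assumption made only so the snippet runs without tidyAML); with real output you would start from preds_list[[1]] instead.

```r
# Stand-in for preds_list[[1]]; the split and model here are illustrative
# assumptions, not what tidyAML produced above.
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]
fit   <- lm(mpg ~ wt, data = train)

preds <- data.frame(
  .data_category = rep(c("actual", "predicted"), each = 32),
  .data_type     = rep(c("actual", "training", "testing"), times = c(32, 24, 8)),
  .value         = c(mtcars$mpg,
                     unname(predict(fit, newdata = train)),
                     unname(predict(fit, newdata = test)))
)

# Row counts for each category/type combination
counts <- table(preds$.data_category, preds$.data_type)
print(counts)

# Pull out only the predictions made on the held-out rows
test_preds <- preds$.value[preds$.data_type == "testing"]
length(test_preds)
```

The same filter with "training" or "actual" returns the other two slices.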
Why It Matters
By including actual data along with training and testing predictions, the internal_make_wflw_predictions() function empowers you to perform a more thorough evaluation of your models. This is a significant step towards ensuring the reliability and generalization capability of your machine learning models.
So, R enthusiasts, update your tidyAML package, explore the enhanced features, and let us know how these improvements elevate your modeling experience. Happy coding!