Creating and Predicting Fast Regression Parsnip Models with {tidyAML}

code
rtip
tidyaml
Author

Steven P. Sanderson II, MPH

Published

February 9, 2023

Introduction

I am almost ready for a first release of my R package {tidyAML}. Its purpose is to quickly generate models using the parsnip package while staying inside the tidymodels framework, so users can create models in tidyAML and then pluck them out and move them over to tidymodels should they prefer. I built it this way because I believe software should be interchangeable and work well with other libraries. Today I am going to showcase the function fast_regression().

Function

Let’s take a look at the function.

fast_regression(
  .data,
  .rec_obj,
  .parsnip_fns = "all",
  .parsnip_eng = "all",
  .split_type = "initial_split",
  .split_args = NULL
)

Here are the arguments to the function:

  • .data - The data being passed to the function for the regression problem.
  • .rec_obj - The recipe object being passed.
  • .parsnip_fns - The default is ‘all’, which will create all possible regression model specifications supported. You can also pass specific parsnip function names, such as "linear_reg".
  • .parsnip_eng - The default is ‘all’, which will use all supported regression engines. You can also pass specific engines, such as c("lm", "glm").
  • .split_type - The default is ‘initial_split’; you can pass any type of split supported by rsample.
  • .split_args - The default is NULL, in which case the default parameters of the chosen rsample split type are used.
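To make the default split concrete, here is a minimal sketch of what rsample's initial_split() does on mtcars when called with its own defaults (prop = 3/4). This calls rsample directly and is only an illustration of the default split, not a view into how fast_regression() invokes it internally; the object names and the seed are my own.

```r
library(rsample)

set.seed(123)  # arbitrary seed so the split is reproducible

# Default initial_split(): 3/4 of the rows go to training, the rest to testing
split_obj <- initial_split(mtcars)

train_tbl <- training(split_obj)
test_tbl  <- testing(split_obj)

nrow(train_tbl)  # 24 of the 32 rows
nrow(test_tbl)   # the remaining 8 rows
```

The 24 training rows are consistent with the 24-row prediction tibbles shown in the example output in this post.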

Example

Let’s take a look at an example.

library(tidyAML)
library(dplyr)
library(recipes)
library(purrr)

rec_obj <- recipe(mpg ~ ., data = mtcars)
fast_reg_tbl <- fast_regression(
  .data = mtcars,
  .rec_obj = rec_obj,
  .parsnip_eng = c("lm","glm"),
  .parsnip_fns = "linear_reg"
)

glimpse(fast_reg_tbl)
Rows: 2
Columns: 8
$ .model_id       <int> 1, 2
$ .parsnip_engine <chr> "lm", "glm"
$ .parsnip_mode   <chr> "regression", "regression"
$ .parsnip_fns    <chr> "linear_reg", "linear_reg"
$ model_spec      <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
$ wflw            <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ fitted_wflw     <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ pred_wflw       <list> [<tbl_df[24 x 1]>], [<tbl_df[24 x 1]>]

Let’s take a look at the model spec.

fast_reg_tbl %>% slice(1) %>% pull(model_spec) %>% pluck(1)
Linear Regression Model Specification (regression)

Computational engine: lm 

Now the wflw column.

fast_reg_tbl %>% slice(1) %>% pull(wflw) %>% pluck(1)
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm 

Next, the fitted workflow.

fast_reg_tbl %>% slice(1) %>% pull(fitted_wflw) %>% pluck(1)
══ Workflow [trained] ══════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────

Call:
stats::lm(formula = ..y ~ ., data = data)

Coefficients:
(Intercept)          cyl         disp           hp         drat           wt  
 -15.077267     1.107474     0.001161    -0.001014     4.010199    -1.280324  
       qsec           vs           am         gear         carb  
   0.512318    -0.488014     2.430052     4.353568    -2.546043  
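Since the point of the package is interchangeability with tidymodels, here is a rough sketch of how a comparable fitted workflow could be built by hand with recipes, parsnip and workflows. This is my own illustration rather than what fast_regression() runs internally, and it fits on all of mtcars instead of a training split, so the coefficients will not match the output above.

```r
library(dplyr)
library(recipes)
library(parsnip)
library(workflows)

# Same recipe as in the example: mpg modeled on every other column
rec_obj <- recipe(mpg ~ ., data = mtcars)

# Linear regression specification with the lm engine
lm_spec <- linear_reg() %>%
  set_engine("lm")

# Bundle the recipe and the model spec into a workflow, then fit it
manual_wflw <- workflow() %>%
  add_recipe(rec_obj) %>%
  add_model(lm_spec)

manual_fitted_wflw <- fit(manual_wflw, data = mtcars)

# Predictions come back as a tibble with a single .pred column
manual_preds <- predict(manual_fitted_wflw, new_data = mtcars)
```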

And lastly the predicted workflow column.

fast_reg_tbl %>% slice(1) %>% pull(pred_wflw) %>% pluck(1)
# A tibble: 24 × 1
   .pred
   <dbl>
 1  24.7
 2  28.2
 3  18.9
 4  12.0
 5  14.8
 6  15.4
 7  14.7
 8  20.0
 9  11.2
10  19.1
# … with 14 more rows
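If you want the predictions from every model in one table rather than plucking them one at a time, something like the following should work. It is a sketch that assumes pred_wflw holds one tibble per model, as the glimpse() output suggests; the all_preds_tbl name is my own, and the call to fast_regression() is repeated only so the snippet runs on its own.

```r
library(tidyAML)
library(dplyr)
library(tidyr)
library(recipes)

# Rebuild the example table from this post
rec_obj <- recipe(mpg ~ ., data = mtcars)
fast_reg_tbl <- fast_regression(
  .data = mtcars,
  .rec_obj = rec_obj,
  .parsnip_eng = c("lm", "glm"),
  .parsnip_fns = "linear_reg"
)

# Keep the identifying columns and expand the list column of predictions,
# giving one row per prediction per model
all_preds_tbl <- fast_reg_tbl %>%
  select(.model_id, .parsnip_engine, pred_wflw) %>%
  unnest(cols = pred_wflw)

all_preds_tbl
```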

Voila!