library(tidyAML)Introduction
I am happy to announce that a new version of tidyAML is now available on CRAN. This version includes support for gee models. This is a big step forward for tidyAML as it now supports a wide variety of regression and classification models.
What is tidyAML?
The package is meant to be a way to quickly create many models of different algorithms, and this release is a small step forward in supporting that mission. The package is built upon the tidymodels packages.
What is gee?
gee stands for Generalized Estimating Equations. It is a way to model correlated data. For example, if you have a dataset where you have multiple observations per person, you may want to use gee to account for the correlation between observations within a person. This is a very common situation in longitudinal studies. Here is a link to the gee parsnip model.
Example
Let’s see how it works, we will examine how it is setup to faily gracefully and how it works when the multilevelmod package is installed and loaded into your environment.
Load Library
Now, let’s build a model that will fail, it’s important I think to see the failure message so you can understand what is happening. It’s likely because the library is not loaded, let’s face it, it has happened to all of us.
library(recipes)
library(dplyr)
rec_obj <- recipe(mpg ~ ., data = mtcars)
frt_tbl <- fast_regression(
mtcars,
rec_obj,
.parsnip_eng = c("lm","glm","gee"),
.parsnip_fns = "linear_reg"
)
glimpse(frt_tbl)Rows: 3
Columns: 8
$ .model_id <int> 1, 2, 3
$ .parsnip_engine <chr> "lm", "gee", "glm"
$ .parsnip_mode <chr> "regression", "regression", "regression"
$ .parsnip_fns <chr> "linear_reg", "linear_reg", "linear_reg"
$ model_spec <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
$ wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ fitted_wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ pred_wflw <list> [<tbl_df[8 x 1]>], <NULL>, [<tbl_df[8 x 1]>]
frt_tbl |> pull(pred_wflw) [[1]]
# A tibble: 8 × 1
.pred
<dbl>
1 22.9
2 18.0
3 21.1
4 32.9
5 18.5
6 10.4
7 9.59
8 24.7
[[2]]
NULL
[[3]]
# A tibble: 8 × 1
.pred
<dbl>
1 22.9
2 18.0
3 21.1
4 32.9
5 18.5
6 10.4
7 9.59
8 24.7
frt_tbl |> pull(fitted_wflw) |> purrr::map(broom::tidy)[[1]]
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -40.3 30.5 -1.32 0.209
2 cyl 0.907 1.09 0.830 0.421
3 disp 0.0105 0.0189 0.557 0.587
4 hp -0.00487 0.0248 -0.196 0.847
5 drat 1.73 2.04 0.846 0.413
6 wt -5.71 2.35 -2.43 0.0302
7 qsec 3.39 1.34 2.54 0.0248
8 vs -3.85 2.99 -1.29 0.220
9 am 2.16 2.29 0.942 0.364
10 gear 1.40 1.68 0.838 0.417
11 carb 0.200 0.835 0.239 0.815
[[2]]
# A tibble: 0 × 0
[[3]]
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -40.3 30.5 -1.32 0.209
2 cyl 0.907 1.09 0.830 0.421
3 disp 0.0105 0.0189 0.557 0.587
4 hp -0.00487 0.0248 -0.196 0.847
5 drat 1.73 2.04 0.846 0.413
6 wt -5.71 2.35 -2.43 0.0302
7 qsec 3.39 1.34 2.54 0.0248
8 vs -3.85 2.99 -1.29 0.220
9 am 2.16 2.29 0.942 0.364
10 gear 1.40 1.68 0.838 0.417
11 carb 0.200 0.835 0.239 0.815
We see that the gee model failed. This is because we did not load the multilevelmod package. Let’s load it and try again.
library(multilevelmod)
frt_tbl <- fast_regression(
mtcars,
rec_obj,
.parsnip_eng = c("lm","glm","gee"),
.parsnip_fns = "linear_reg"
)
glimpse(frt_tbl)Rows: 3
Columns: 8
$ .model_id <int> 1, 2, 3
$ .parsnip_engine <chr> "lm", "gee", "glm"
$ .parsnip_mode <chr> "regression", "regression", "regression"
$ .parsnip_fns <chr> "linear_reg", "linear_reg", "linear_reg"
$ model_spec <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
$ wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ fitted_wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ pred_wflw <list> [<tbl_df[8 x 1]>], [<tbl_df[8 x 1]>], [<tbl_df[8 x 1]>…
frt_tbl |> pull(pred_wflw) [[1]]
# A tibble: 8 × 1
.pred
<dbl>
1 20.6
2 15.6
3 15.8
4 26.1
5 27.8
6 16.6
7 25.4
8 21.6
[[2]]
# A tibble: 8 × 1
.pred
<dbl>
1 21.3
2 15.2
3 15.3
4 25.7
5 27.3
6 16.5
7 25.7
8 20.5
[[3]]
# A tibble: 8 × 1
.pred
<dbl>
1 20.6
2 15.6
3 15.8
4 26.1
5 27.8
6 16.6
7 25.4
8 21.6
frt_tbl |> pull(fitted_wflw) |> purrr::map(broom::tidy)[[1]]
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -3.70 18.9 -0.195 0.848
2 cyl 0.668 1.03 0.647 0.529
3 disp 0.00831 0.0153 0.544 0.596
4 hp -0.0124 0.0173 -0.717 0.486
5 drat 2.87 1.44 2.00 0.0674
6 wt -2.87 1.40 -2.05 0.0609
7 qsec 0.777 0.618 1.26 0.231
8 vs 0.169 1.64 0.103 0.920
9 am 1.90 1.79 1.07 0.306
10 gear 1.31 1.44 0.907 0.381
11 carb -0.601 0.730 -0.823 0.425
[[2]]
# A tibble: 10 × 6
term estimate std.error statistic p.value ``
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 6.54 10.2 0.643 4.01 1.63
2 disp 0.0119 0.0140 0.849 0.0134 0.887
3 hp -0.0149 0.0165 -0.901 0.0104 -1.43
4 drat 2.36 1.18 2.01 0.859 2.75
5 wt -3.01 1.35 -2.23 1.35 -2.23
6 qsec 0.577 0.524 1.10 0.193 2.99
7 vs 0.000922 1.58 0.000582 1.07 0.000860
8 am 1.35 1.54 0.880 0.886 1.53
9 gear 1.00 1.33 0.752 0.527 1.90
10 carb -0.355 0.611 -0.582 0.378 -0.940
[[3]]
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -3.70 18.9 -0.195 0.848
2 cyl 0.668 1.03 0.647 0.529
3 disp 0.00831 0.0153 0.544 0.596
4 hp -0.0124 0.0173 -0.717 0.486
5 drat 2.87 1.44 2.00 0.0674
6 wt -2.87 1.40 -2.05 0.0609
7 qsec 0.777 0.618 1.26 0.231
8 vs 0.169 1.64 0.103 0.920
9 am 1.90 1.79 1.07 0.306
10 gear 1.31 1.44 0.907 0.381
11 carb -0.601 0.730 -0.823 0.425
The recipe object
Let’s take a look at the model spec, the workflow and the fitted workflow to see what has happened behind the scenes. First let’s look at the formula in the recipe object.
formula(prep(rec_obj))mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
<environment: 0x0000012c521f0c98>
So we see what the formula is, but we also see that there is no id_var listed which means it would fail for the gee specification, but, our model worked! That is because behind the scenes in the fast sense at this point in development the first variable is taken to be the id variable. Let’s see what the workflow looks like.
gee_wflw <- frt_tbl |> slice(2) |> pull(wflw)
gee_wflw[[1]]$fit$actions$model$formulampg ~ id_var(cyl) + disp + hp + drat + wt + qsec + vs + am +
gear + carb
<environment: 0x0000012c5206d0f8>
As we can see the first predictor variable has been chosen as the id variable. This will change in the future, where you will be able to choose the id_var.
Conclusion
Thank you for reading and I hope you enjoy the new version of tidyAML. Please let me know if you have any questions or comments. We will continue our development of tidyAML and hope to have more updates soon.