library(tidyAML)
Introduction
I am happy to announce that a new version of tidyAML
is now available on CRAN. This version includes support for gee
models. This is a big step forward for tidyAML
as it now supports a wide variety of regression and classification models.
What is tidyAML
?
The package is meant to be a way to quickly create many models of different algorithms, and this release is a small step forward in supporting that mission. The package is built upon the tidymodels
packages.
What is gee
?
gee
stands for Generalized Estimating Equations. It is a way to model correlated data. For example, if you have a dataset where you have multiple observations per person, you may want to use gee
to account for the correlation between observations within a person. This is a very common situation in longitudinal studies. Here is a link to the gee
parsnip model.
Example
Let’s see how it works, we will examine how it is setup to faily gracefully and how it works when the multilevelmod
package is installed and loaded into your environment.
Load Library
Now, let’s build a model that will fail, it’s important I think to see the failure message so you can understand what is happening. It’s likely because the library is not loaded, let’s face it, it has happened to all of us.
library(recipes)
library(dplyr)
<- recipe(mpg ~ ., data = mtcars)
rec_obj <- fast_regression(
frt_tbl
mtcars,
rec_obj, .parsnip_eng = c("lm","glm","gee"),
.parsnip_fns = "linear_reg"
)
glimpse(frt_tbl)
Rows: 3
Columns: 8
$ .model_id <int> 1, 2, 3
$ .parsnip_engine <chr> "lm", "gee", "glm"
$ .parsnip_mode <chr> "regression", "regression", "regression"
$ .parsnip_fns <chr> "linear_reg", "linear_reg", "linear_reg"
$ model_spec <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
$ wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ fitted_wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ pred_wflw <list> [<tbl_df[8 x 1]>], <NULL>, [<tbl_df[8 x 1]>]
|> pull(pred_wflw) frt_tbl
[[1]]
# A tibble: 8 × 1
.pred
<dbl>
1 22.9
2 18.0
3 21.1
4 32.9
5 18.5
6 10.4
7 9.59
8 24.7
[[2]]
NULL
[[3]]
# A tibble: 8 × 1
.pred
<dbl>
1 22.9
2 18.0
3 21.1
4 32.9
5 18.5
6 10.4
7 9.59
8 24.7
|> pull(fitted_wflw) |> purrr::map(broom::tidy) frt_tbl
[[1]]
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -40.3 30.5 -1.32 0.209
2 cyl 0.907 1.09 0.830 0.421
3 disp 0.0105 0.0189 0.557 0.587
4 hp -0.00487 0.0248 -0.196 0.847
5 drat 1.73 2.04 0.846 0.413
6 wt -5.71 2.35 -2.43 0.0302
7 qsec 3.39 1.34 2.54 0.0248
8 vs -3.85 2.99 -1.29 0.220
9 am 2.16 2.29 0.942 0.364
10 gear 1.40 1.68 0.838 0.417
11 carb 0.200 0.835 0.239 0.815
[[2]]
# A tibble: 0 × 0
[[3]]
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -40.3 30.5 -1.32 0.209
2 cyl 0.907 1.09 0.830 0.421
3 disp 0.0105 0.0189 0.557 0.587
4 hp -0.00487 0.0248 -0.196 0.847
5 drat 1.73 2.04 0.846 0.413
6 wt -5.71 2.35 -2.43 0.0302
7 qsec 3.39 1.34 2.54 0.0248
8 vs -3.85 2.99 -1.29 0.220
9 am 2.16 2.29 0.942 0.364
10 gear 1.40 1.68 0.838 0.417
11 carb 0.200 0.835 0.239 0.815
We see that the gee
model failed. This is because we did not load the multilevelmod
package. Let’s load it and try again.
library(multilevelmod)
<- fast_regression(
frt_tbl
mtcars,
rec_obj, .parsnip_eng = c("lm","glm","gee"),
.parsnip_fns = "linear_reg"
)
glimpse(frt_tbl)
Rows: 3
Columns: 8
$ .model_id <int> 1, 2, 3
$ .parsnip_engine <chr> "lm", "gee", "glm"
$ .parsnip_mode <chr> "regression", "regression", "regression"
$ .parsnip_fns <chr> "linear_reg", "linear_reg", "linear_reg"
$ model_spec <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
$ wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ fitted_wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ pred_wflw <list> [<tbl_df[8 x 1]>], [<tbl_df[8 x 1]>], [<tbl_df[8 x 1]>…
|> pull(pred_wflw) frt_tbl
[[1]]
# A tibble: 8 × 1
.pred
<dbl>
1 20.6
2 15.6
3 15.8
4 26.1
5 27.8
6 16.6
7 25.4
8 21.6
[[2]]
# A tibble: 8 × 1
.pred
<dbl>
1 21.3
2 15.2
3 15.3
4 25.7
5 27.3
6 16.5
7 25.7
8 20.5
[[3]]
# A tibble: 8 × 1
.pred
<dbl>
1 20.6
2 15.6
3 15.8
4 26.1
5 27.8
6 16.6
7 25.4
8 21.6
|> pull(fitted_wflw) |> purrr::map(broom::tidy) frt_tbl
[[1]]
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -3.70 18.9 -0.195 0.848
2 cyl 0.668 1.03 0.647 0.529
3 disp 0.00831 0.0153 0.544 0.596
4 hp -0.0124 0.0173 -0.717 0.486
5 drat 2.87 1.44 2.00 0.0674
6 wt -2.87 1.40 -2.05 0.0609
7 qsec 0.777 0.618 1.26 0.231
8 vs 0.169 1.64 0.103 0.920
9 am 1.90 1.79 1.07 0.306
10 gear 1.31 1.44 0.907 0.381
11 carb -0.601 0.730 -0.823 0.425
[[2]]
# A tibble: 10 × 6
term estimate std.error statistic p.value ``
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 6.54 10.2 0.643 4.01 1.63
2 disp 0.0119 0.0140 0.849 0.0134 0.887
3 hp -0.0149 0.0165 -0.901 0.0104 -1.43
4 drat 2.36 1.18 2.01 0.859 2.75
5 wt -3.01 1.35 -2.23 1.35 -2.23
6 qsec 0.577 0.524 1.10 0.193 2.99
7 vs 0.000922 1.58 0.000582 1.07 0.000860
8 am 1.35 1.54 0.880 0.886 1.53
9 gear 1.00 1.33 0.752 0.527 1.90
10 carb -0.355 0.611 -0.582 0.378 -0.940
[[3]]
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -3.70 18.9 -0.195 0.848
2 cyl 0.668 1.03 0.647 0.529
3 disp 0.00831 0.0153 0.544 0.596
4 hp -0.0124 0.0173 -0.717 0.486
5 drat 2.87 1.44 2.00 0.0674
6 wt -2.87 1.40 -2.05 0.0609
7 qsec 0.777 0.618 1.26 0.231
8 vs 0.169 1.64 0.103 0.920
9 am 1.90 1.79 1.07 0.306
10 gear 1.31 1.44 0.907 0.381
11 carb -0.601 0.730 -0.823 0.425
The recipe object
Let’s take a look at the model spec, the workflow and the fitted workflow to see what has happened behind the scenes. First let’s look at the formula in the recipe object.
formula(prep(rec_obj))
mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
<environment: 0x0000012c521f0c98>
So we see what the formula is, but we also see that there is no id_var
listed which means it would fail for the gee
specification, but, our model worked! That is because behind the scenes in the fast sense at this point in development the first variable is taken to be the id variable. Let’s see what the workflow looks like.
<- frt_tbl |> slice(2) |> pull(wflw)
gee_wflw
1]]$fit$actions$model$formula gee_wflw[[
mpg ~ id_var(cyl) + disp + hp + drat + wt + qsec + vs + am +
gear + carb
<environment: 0x0000012c5206d0f8>
As we can see the first predictor variable has been chosen as the id variable. This will change in the future, where you will be able to choose the id_var
.
Conclusion
Thank you for reading and I hope you enjoy the new version of tidyAML
. Please let me know if you have any questions or comments. We will continue our development of tidyAML
and hope to have more updates soon.