Updates to {healthyverse} packages

code
weeklytip
healthyverse
Author

Steven P. Sanderson II, MPH

Published

November 18, 2022

Introduction

I have made several updates to the {healthyverse} packages, which resulted in new CRAN releases of {healthyR.ai}, {healthyR.ts}, and {TidyDensity}.

Some New Functions

Let’s take a look at some new functions.

TidyDensity

For TidyDensity, a new distribution has been added: tidy_bernoulli(). It comes with the standard companion functions that follow the util_distname_param_estimate() and util_distname_stats_tbl() naming pattern, here util_bernoulli_param_estimate() and util_bernoulli_stats_tbl(). Let’s take a look at the function calls.

tidy_bernoulli(.n = 50, .prob = 0.1, .num_sims = 1)

util_bernoulli_param_estimate(.x, .auto_gen_empirical = TRUE)

util_bernoulli_stats_tbl(.data)

Let’s see them in use.

library(TidyDensity)
library(dplyr)

tb <- tidy_bernoulli()

tb
# A tibble: 50 × 7
   sim_number     x     y      dx     dy     p     q
   <fct>      <int> <int>   <dbl>  <dbl> <dbl> <dbl>
 1 1              1     0 -0.338  0.0366   0.9     0
 2 1              2     0 -0.304  0.0866   0.9     0
 3 1              3     0 -0.270  0.187    0.9     0
 4 1              4     0 -0.236  0.369    0.9     0
 5 1              5     0 -0.201  0.663    0.9     0
 6 1              6     0 -0.167  1.09     0.9     0
 7 1              7     0 -0.133  1.63     0.9     0
 8 1              8     1 -0.0988 2.22     1       1
 9 1              9     0 -0.0646 2.76     0.9     0
10 1             10     0 -0.0304 3.14     0.9     0
# … with 40 more rows
util_bernoulli_param_estimate(tb$y)
$combined_data_tbl
# A tibble: 100 × 8
   sim_number     x     y      dx     dy     p     q dist_type
   <fct>      <int> <dbl>   <dbl>  <dbl> <dbl> <dbl> <fct>    
 1 1              1     0 -0.338  0.0366  0.92     0 Empirical
 2 1              2     0 -0.304  0.0866  0.92     0 Empirical
 3 1              3     0 -0.270  0.187   0.92     0 Empirical
 4 1              4     0 -0.236  0.369   0.92     0 Empirical
 5 1              5     0 -0.201  0.663   0.92     0 Empirical
 6 1              6     0 -0.167  1.09    0.92     0 Empirical
 7 1              7     0 -0.133  1.63    0.92     0 Empirical
 8 1              8     1 -0.0988 2.22    1        0 Empirical
 9 1              9     0 -0.0646 2.76    0.92     0 Empirical
10 1             10     0 -0.0304 3.14    0.92     0 Empirical
# … with 90 more rows

$parameter_tbl
# A tibble: 1 × 8
  dist_type samp_size   min   max  mean variance sum_x  prob
  <chr>         <int> <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>
1 Bernoulli        50     0     1  0.08   0.0736     4  0.08
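
The parameter table above can be checked by hand. The following is a minimal base R sketch, not the package's internal code: it assumes the estimate of prob is the sample proportion and that the reported variance is the plug-in value prob * (1 - prob), which is consistent with the printed 0.08 and 0.0736. The vector x is rebuilt from the printed sum_x = 4 and samp_size = 50, so the order of its values is an assumption.

```r
# Rebuild a 0/1 sample with 4 successes in 50 trials
# (matches sum_x = 4 and samp_size = 50 in the printed table;
# the ordering of the values is arbitrary here)
x <- c(rep(1, 4), rep(0, 46))

# Sample proportion: the usual estimate of the Bernoulli success probability
p_hat <- mean(x)

# Plug-in variance of a Bernoulli variable with probability p_hat
var_hat <- p_hat * (1 - p_hat)

p_hat   # 0.08
var_hat # 0.0736
```

Note that var(x) would give a slightly larger number because it divides by n - 1 rather than n.
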
util_bernoulli_stats_tbl(tb) %>%
  glimpse()
Rows: 1
Columns: 18
$ tidy_function      <chr> "tidy_bernoulli"
$ function_call      <chr> "Bernoulli c(0.1)"
$ distribution       <chr> "Bernoulli"
$ distribution_type  <chr> "discrete"
$ points             <dbl> 50
$ simulations        <dbl> 1
$ mean               <dbl> 0.1
$ mode               <chr> "0"
$ coeff_var          <dbl> 0.09
$ skewness           <dbl> 2.666667
$ kurtosis           <dbl> 5.111111
$ mad                <dbl> 0.5
$ entropy            <dbl> 0.325083
$ fisher_information <dbl> 11.11111
$ computed_std_skew  <dbl> 3.096281
$ computed_std_kurt  <dbl> 10.58696
$ ci_lo              <dbl> 0
$ ci_hi              <dbl> 1
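
Several rows of the stats table follow directly from the closed-form properties of a Bernoulli distribution. The sketch below uses only base R; bernoulli_stats() is a hypothetical helper written for this illustration and is not part of TidyDensity. With p = 0.1 it reproduces the mean, skewness, kurtosis, entropy, and fisher_information values printed above, where kurtosis is taken to be excess kurtosis.

```r
# Hypothetical helper (not a TidyDensity function): closed-form
# quantities of a Bernoulli distribution with success probability p
bernoulli_stats <- function(p) {
  stopifnot(is.numeric(p), length(p) == 1, p > 0, p < 1)
  q <- 1 - p
  list(
    mean               = p,
    variance           = p * q,
    skewness           = (q - p) / sqrt(p * q),
    excess_kurtosis    = (1 - 6 * p * q) / (p * q),
    entropy            = -(p * log(p) + q * log(q)), # natural log (nats)
    fisher_information = 1 / (p * q)
  )
}

s <- bernoulli_stats(0.1)
round(unlist(s), 6)
```
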

healthyR.ai

This was a minor patch release that exported some previously internal-only functions and fixed an error in the custom recipe steps. One of the newly exported functions is hai_data_impute().

Let’s take a look.

hai_data_impute(
  .recipe_object = NULL,
  ...,
  .seed_value = 123,
  .type_of_imputation = "mean",
  .number_of_trees = 25,
  .neighbors = 5,
  .mean_trim = 0,
  .roll_statistic,
  .roll_window = 5
)

Let’s take a look at an example of its use.

library(dplyr)
library(recipes)
library(healthyR.ai)

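# 100 monthly dates starting in January 2013
date_seq <- seq.Date(from = as.Date("2013-01-01"), length.out = 100, by = "month")
# Nine random values followed by an NA, repeated ten times, so every tenth
# value is missing; rnorm() is unseeded, so the values differ from run to run
val_seq <- rep(c(rnorm(9), NA), times = 10)
df_tbl <- tibble(
  date_col = date_seq,
  value    = val_seq
)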

df_tbl
# A tibble: 100 × 2
   date_col     value
   <date>       <dbl>
 1 2013-01-01  0.325 
 2 2013-02-01 -0.308 
 3 2013-03-01  0.562 
 4 2013-04-01  0.0805
 5 2013-05-01 -0.528 
 6 2013-06-01 -2.49  
 7 2013-07-01 -0.964 
 8 2013-08-01 -0.337 
 9 2013-09-01 -1.08  
10 2013-10-01 NA     
# … with 90 more rows
rec_obj <- recipe(value ~ ., df_tbl)

hai_data_impute(
  .recipe_object = rec_obj,
  value,
  .type_of_imputation = "roll",
  .roll_statistic = median
)$impute_rec_obj %>%
  get_juiced_data()
# A tibble: 100 × 2
   date_col     value
   <date>       <dbl>
 1 2013-01-01  0.325 
 2 2013-02-01 -0.308 
 3 2013-03-01  0.562 
 4 2013-04-01  0.0805
 5 2013-05-01 -0.528 
 6 2013-06-01 -2.49  
 7 2013-07-01 -0.964 
 8 2013-08-01 -0.337 
 9 2013-09-01 -1.08  
10 2013-10-01 -0.322 
# … with 90 more rows
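
To see where the imputed -0.322 in row 10 comes from, here is a minimal base R sketch of rolling imputation. It is not the code that hai_data_impute() or recipes runs; impute_roll_sketch() is a hypothetical helper, and it assumes a centered window of 5 whose non-missing values are summarised with the chosen statistic, with the window shifted inward at the ends of the series. The input uses the rounded values printed above, so the result only agrees with the post up to rounding.

```r
# Hypothetical helper (not part of healthyR.ai or recipes): replace each NA
# with a statistic of the non-missing values in a centered window around it
impute_roll_sketch <- function(x, window = 5, statistic = median) {
  stopifnot(window %% 2 == 1, window >= 3)
  half <- (window - 1) / 2
  n <- length(x)
  out <- x
  for (i in which(is.na(x))) {
    # Shift the window inward near the ends so it stays inside the series
    lo <- max(1, min(i - half, n - window + 1))
    hi <- min(n, lo + window - 1)
    vals <- x[lo:hi]
    out[i] <- statistic(vals[!is.na(vals)])
  }
  out
}

# The nine values printed in the post (rounded as shown), then an NA,
# repeated ten times to mirror how val_seq was built
nine <- c(0.325, -0.308, 0.562, 0.0805, -0.528, -2.49, -0.964, -0.337, -1.08)
x <- rep(c(nine, NA), times = 10)

imputed <- impute_roll_sketch(x, window = 5, statistic = median)
imputed[10] # median of -0.337, -1.08, 0.325, -0.308, i.e. -0.3225
```

Row 10 sits between rows 8 and 9 on one side and rows 11 and 12 on the other. Because the nine values repeat, rows 11 and 12 equal rows 1 and 2, which is why the window median lands near -0.322.
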

healthyR.ts

This was a minor patch release that fixes ts_lag_correlation(), which did not work correctly when the value column was not explicitly named value.
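
For readers new to the idea, the base R sketch below shows what a lag correlation is: the correlation between a series and a copy of itself shifted by k steps. It assumes nothing about how ts_lag_correlation() is implemented or what it returns; lag_correlation_sketch() is a hypothetical helper, and the simulated series is made up for illustration.

```r
# Hypothetical helper (not a healthyR.ts function): correlation between a
# numeric series and its own lagged copies, one value per requested lag
lag_correlation_sketch <- function(x, lags = 1:3) {
  n <- length(x)
  stopifnot(is.numeric(x), all(lags >= 1), max(lags) < n - 1)
  vapply(
    lags,
    function(k) cor(x[(k + 1):n], x[seq_len(n - k)]),
    numeric(1)
  )
}

# Illustrative data: a random walk, which is strongly correlated with its lags
set.seed(42)
x <- cumsum(rnorm(100))
lag_cors <- lag_correlation_sketch(x, lags = 1:3)
lag_cors
```

The helper takes a plain numeric vector, so the name of the column it came from never matters.
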

Thank you!