Skip to contents

This is a wrapper around the h2o::h2o.kmeans() function that will return a list object with a lot of useful and easy to use tidy style information.

Usage

hai_kmeans_automl(
  .data,
  .split_ratio = 0.8,
  .seed = 1234,
  .centers = 10,
  .standardize = TRUE,
  .print_model_summary = TRUE,
  .predictors,
  .categorical_encoding = "auto",
  .initialization_mode = "Furthest",
  .max_iterations = 100
)

Arguments

.data

The data that is to be passed for clustering.

.split_ratio

The ratio for training and testing splits.

.seed

The default is 1234, but can be set to any integer.

.centers

The default is 1. Specify the number of clusters (groups of data) in a data set.

.standardize

The default is set to TRUE. When TRUE all numeric columns will be set to zero mean and unit variance.

.print_model_summary

This is a boolean and controls if the model summary is printed to the console. The default is TRUE.

.predictors

This must be in the form of c("column_1", "column_2", ... "column_n")

.categorical_encoding

Can be one of the following:

  • "auto"

  • "enum"

  • "one_hot_explicit"

  • "binary"

  • "eigen"

  • "label_encoder"

  • "sort_by_response"

  • "enum_limited"

.initialization_mode

This can be one of the following:

  • "Random"

  • "Furthest (default)

  • "PlusPlus"

.max_iterations

The default is 100. This specifies the number of training iterations

Value

A list object

Author

Steven P. Sanderson II, MPH

Examples

if (FALSE) { # \dontrun{
h2o.init()
output <- hai_kmeans_automl(
  .data = iris,
  .predictors = c("Sepal.Width", "Sepal.Length", "Petal.Width", "Petal.Length"),
  .standardize = FALSE
)
h2o.shutdown()
} # }