Introduction
K-Means is a clustering algorithm that can be used to find potential clusters in your data.
The algorithm requires you to try several values of K in order to assess which one is optimal. The R package {healthyR.ai} provides a utility to do this.
Function
Let’s take a look at the full function call.
hai_kmeans_mapped_tbl(.data, .centers = 15)
kmeans_mapped_tbl(.data, .centers = 15)
You will notice that there are two; they are synonyms of each other, as this functionality is moving out of the {healthyR} package.
Parameters
The parameters take the following arguments:
.data - The data to cluster. It should be the output of hai_kmeans_user_item_tbl() or its synonym, or at least be in the user-item matrix format.
.centers - The maximum number of centers you want to map to the k-means function. The default is 15.
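To make the role of `.centers` concrete, here is a minimal base-R sketch of the general idea: fit `stats::kmeans()` once for each value of K from 1 up to a maximum and collect the fit statistics in one table. It is an illustration only, not the implementation inside {healthyR.ai}; the helper name `map_kmeans()` and the use of the built-in `USArrests` data are assumptions made for this example.

```r
# Hypothetical helper (not part of {healthyR.ai}):
# fit k-means for k = 1..max_centers and collect summary statistics.
map_kmeans <- function(x, max_centers = 15) {
  fits <- lapply(seq_len(max_centers), function(k) stats::kmeans(x, centers = k))
  data.frame(
    centers      = seq_len(max_centers),
    totss        = vapply(fits, function(f) f$totss, numeric(1)),
    tot.withinss = vapply(fits, function(f) f$tot.withinss, numeric(1)),
    betweenss    = vapply(fits, function(f) f$betweenss, numeric(1)),
    iter         = vapply(fits, function(f) as.numeric(f$iter), numeric(1))
  )
}

set.seed(123)          # k-means starts from randomly chosen centers
x <- scale(USArrests)  # built-in data set, standardized column by column
res <- map_kmeans(x, max_centers = 5)
res
```

Each row describes one fitted model, so the statistics can be compared across values of K side by side.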
Example
Let’s run an example.
library(healthyR.data)
library(healthyR.ai)
library(dplyr)

# Keep inpatient records and drop the payer groupings we do not want
data_tbl <- healthyR_data %>%
  filter(ip_op_flag == "I") %>%
  filter(payer_grouping != "Medicare B") %>%
  filter(payer_grouping != "?") %>%
  select(service_line, payer_grouping) %>%
  mutate(record = 1) %>%
  as_tibble()

# Build the user item matrix: one row per service line, one column per payer grouping
ui_tbl <- hai_kmeans_user_item_tbl(
  .data = data_tbl,
  .row_input = service_line,
  .col_input = payer_grouping,
  .record_input = record
)

# Map k-means over the default 1 to 15 centers
kmeans_mapped_tbl <- hai_kmeans_mapped_tbl(ui_tbl)
Let’s take a look at our data, user item matrix and our kmeans mapped tibble.
data_tbl
# A tibble: 116,823 × 3
   service_line  payer_grouping record
   <chr>         <chr>           <dbl>
 1 Medical       Blue Cross          1
 2 Schizophrenia Medicare A          1
 3 Syncope       Medicare A          1
 4 Pneumonia     Medicare A          1
 5 Chest Pain    Blue Cross          1
 6 Chest Pain    Blue Cross          1
 7 Surgical      Commercial          1
 8 Medical       Medicare A          1
 9 Alcohol Abuse Medicare A          1
10 Syncope       Medicare A          1
# … with 116,813 more rows
ui_tbl
# A tibble: 23 × 12
service_line Blue …¹ Comme…² Compe…³ Excha…⁴ HMO Medic…⁵ Medic…⁶ Medic…⁷
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Alcohol Abuse 0.0941 0.0321 5.25e-4 0.0116 0.0788 0.158 0.367 0.173
2 Bariatric Sur… 0.317 0.0583 0 0.0518 0.168 0.00324 0.343 0.0485
3 Carotid Endar… 0.0845 0.0282 0 0 0.0141 0 0.0282 0.648
4 Cellulitis 0.110 0.0339 1.18e-2 0.00847 0.0805 0.0869 0.192 0.355
5 Chest Pain 0.144 0.0391 2.90e-3 0.00543 0.112 0.0522 0.159 0.324
6 CHF 0.0295 0.00958 5.18e-4 0.00414 0.0205 0.0197 0.0596 0.657
7 COPD 0.0493 0.0228 2.28e-4 0.00548 0.0342 0.0461 0.172 0.520
8 CVA 0.0647 0.0246 1.07e-3 0.0107 0.0524 0.0289 0.0764 0.555
9 GI Hemorrhage 0.0542 0.0175 1.25e-3 0.00834 0.0480 0.0350 0.0855 0.588
10 Joint Replace… 0.139 0.0179 3.36e-2 0.00673 0.0516 0 0.0874 0.5
# … with 13 more rows, 3 more variables: `Medicare HMO` <dbl>,
# `No Fault` <dbl>, `Self Pay` <dbl>, and abbreviated variable names
# ¹`Blue Cross`, ²Commercial, ³Compensation, ⁴`Exchange Plans`, ⁵Medicaid,
# ⁶`Medicaid HMO`, ⁷`Medicare A`
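The values printed for ui_tbl look like row-wise shares: for each service line, the fraction of its records that falls under each payer grouping. Assuming that reading is right, the base-R sketch below builds the same kind of matrix from a tiny toy table. The toy records are made up purely for illustration, and this is not how {healthyR.ai} builds the table internally.

```r
# Toy records in the same shape as data_tbl (values invented for illustration)
toy_tbl <- data.frame(
  service_line   = c("Chest Pain", "Chest Pain", "Chest Pain", "Syncope", "Syncope"),
  payer_grouping = c("Blue Cross", "Blue Cross", "Medicare A", "Medicare A", "Medicare A"),
  record         = 1
)

# Sum the records for every (service line, payer grouping) pair
counts <- xtabs(record ~ service_line + payer_grouping, data = toy_tbl)

# Convert each row to proportions so that every service line sums to 1
ui_mat <- prop.table(counts, margin = 1)
round(ui_mat, 3)
```

Working with proportions rather than raw counts means k-means compares service lines by payer mix rather than by volume.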
kmeans_mapped_tbl
# A tibble: 15 × 3
centers k_means glance
<int> <list> <list>
1 1 <kmeans> <tibble [1 × 4]>
2 2 <kmeans> <tibble [1 × 4]>
3 3 <kmeans> <tibble [1 × 4]>
4 4 <kmeans> <tibble [1 × 4]>
5 5 <kmeans> <tibble [1 × 4]>
6 6 <kmeans> <tibble [1 × 4]>
7 7 <kmeans> <tibble [1 × 4]>
8 8 <kmeans> <tibble [1 × 4]>
9 9 <kmeans> <tibble [1 × 4]>
10 10 <kmeans> <tibble [1 × 4]>
11 11 <kmeans> <tibble [1 × 4]>
12 12 <kmeans> <tibble [1 × 4]>
13 13 <kmeans> <tibble [1 × 4]>
14 14 <kmeans> <tibble [1 × 4]>
15 15 <kmeans> <tibble [1 × 4]>
kmeans_mapped_tbl %>%
  tidyr::unnest(glance)
# A tibble: 15 × 6
centers k_means totss tot.withinss betweenss iter
<int> <list> <dbl> <dbl> <dbl> <int>
1 1 <kmeans> 1.41 1.41 1.33e-15 1
2 2 <kmeans> 1.41 0.592 8.17e- 1 1
3 3 <kmeans> 1.41 0.372 1.04e+ 0 2
4 4 <kmeans> 1.41 0.276 1.13e+ 0 2
5 5 <kmeans> 1.41 0.202 1.21e+ 0 2
6 6 <kmeans> 1.41 0.159 1.25e+ 0 3
7 7 <kmeans> 1.41 0.124 1.28e+ 0 3
8 8 <kmeans> 1.41 0.0884 1.32e+ 0 3
9 9 <kmeans> 1.41 0.0745 1.33e+ 0 3
10 10 <kmeans> 1.41 0.0576 1.35e+ 0 2
11 11 <kmeans> 1.41 0.0460 1.36e+ 0 3
12 12 <kmeans> 1.41 0.0363 1.37e+ 0 2
13 13 <kmeans> 1.41 0.0272 1.38e+ 0 2
14 14 <kmeans> 1.41 0.0202 1.39e+ 0 3
15 15 <kmeans> 1.41 0.0161 1.39e+ 0 3
Voila!
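One common way to read the unnested table is the elbow heuristic: look for the value of K after which adding another center reduces tot.withinss only slightly. The sketch below applies that idea to the tot.withinss values printed above, using base R only. The column names `drop` and `explained` are chosen for this example, and the heuristic is one option for assessing K, not something prescribed by the package.

```r
# tot.withinss values copied from the unnested output above (centers 1 to 15)
tot_withinss <- c(1.41, 0.592, 0.372, 0.276, 0.202, 0.159, 0.124, 0.0884,
                  0.0745, 0.0576, 0.0460, 0.0363, 0.0272, 0.0202, 0.0161)

elbow_tbl <- data.frame(
  centers   = 2:15,
  # reduction in tot.withinss gained by adding one more center
  drop      = -diff(tot_withinss),
  # share of the total sum of squares (the value at one center) explained at each K
  explained = 1 - tot_withinss[-1] / tot_withinss[1]
)
head(elbow_tbl)
```

In this output the largest single reduction comes from moving from one center to two, and the gains generally shrink from there, so the choice of K comes down to where the extra centers stop being worth the added complexity.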