Introducing get_provider_meta_data() in healthyR.data

Categories: code, rtip, healthyrdata

Author: Steven P. Sanderson II, MPH

Published: May 29, 2024

Introduction

Hello, R enthusiasts!

Today, I’m excited to introduce a new function in the healthyR.data package: get_provider_meta_data(). This function is excellent for anyone working with healthcare datasets, making it easy to fetch and filter metadata from the Centers for Medicare & Medicaid Services (CMS) repository.

Overview

The get_provider_meta_data() function simplifies the process of retrieving and managing metadata for healthcare datasets. By allowing users to filter data based on various criteria, it streamlines data management and enhances analytical capabilities.

Syntax and Arguments

The function syntax is straightforward and highly customizable:

get_provider_meta_data(
  .identifier = NULL,
  .title = NULL,
  .description = NULL,
  .keyword = NULL,
  .issued = NULL,
  .modified = NULL,
  .released = NULL,
  .theme = NULL,
  .media_type = NULL
)

Here’s a breakdown of the arguments:

  • .identifier: A dataset identifier to filter the data.
  • .title: A title to filter the data.
  • .description: A description to filter the data.
  • .keyword: A keyword to filter the data.
  • .issued: The date the dataset was issued, used to filter the data.
  • .modified: The date the dataset was last modified, used to filter the data.
  • .released: The date the dataset was released, used to filter the data.
  • .theme: A theme to filter the data.
  • .media_type: A media type to filter the data.

What It Returns

The function returns a tidy tibble containing metadata about the datasets. This tibble includes the following columns:

  • identifier
  • title
  • description
  • keyword
  • issued
  • modified
  • released
  • landing_page
  • theme
  • access_level
  • archive_exclude
  • contact_fn
  • contact_email
  • publisher_name
  • download_url
  • media_type

Details

When you call get_provider_meta_data(), it fetches JSON data from the CMS metadata URL. The function then processes this data by:

  1. Selecting relevant columns.
  2. Unnesting nested lists.
  3. Cleaning column names.
  4. Processing dates and media types for enhanced usability.
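To make these four steps concrete, here is a minimal sketch that runs them on a tiny hand-written JSON sample instead of the live CMS feed. Treat it as an illustration under assumptions, not the package's actual source: the sample records, the raw field names (contactPoint, distribution, downloadURL, mediaType, bureauCode), and the choice of jsonlite, tidyr::unpack(), and dplyr::rename() are stand-ins picked for the example.

```r
library(jsonlite)
library(dplyr)
library(tidyr)

# Toy JSON standing in for the CMS metadata feed; every value is made up
# and the field names are assumptions for illustration.
sample_json <- '[
  {
    "identifier": "demo-0001",
    "title": "Demo Dataset A",
    "issued": "2022-07-11",
    "keyword": ["Alpha"],
    "bureauCode": ["000:00"],
    "contactPoint": {"fn": "Demo Contact", "hasEmail": "mailto:demo@example.com"},
    "distribution": [
      {"downloadURL": "https://example.com/a.csv", "mediaType": "text/csv"}
    ]
  },
  {
    "identifier": "demo-0002",
    "title": "Demo Dataset B",
    "issued": "2023-01-15",
    "keyword": ["Beta", "Gamma"],
    "bureauCode": ["000:00"],
    "contactPoint": {"fn": "Demo Contact", "hasEmail": "mailto:demo@example.com"},
    "distribution": [
      {"downloadURL": "https://example.com/b.csv", "mediaType": "text/csv"}
    ]
  }
]'

# Parse the JSON: nested objects become data-frame columns, and
# arrays of objects become list-columns of data frames.
raw_df <- fromJSON(sample_json)

meta_tbl <- as_tibble(raw_df) |>
  # 1. Select relevant columns (bureauCode is dropped here)
  select(identifier, title, issued, keyword, contactPoint, distribution) |>
  # 2. Unnest the nested structures
  unpack(contactPoint, names_sep = "_") |>
  unnest(distribution, names_sep = "_") |>
  # 3. Clean column names
  rename(
    contact_fn    = contactPoint_fn,
    contact_email = contactPoint_hasEmail,
    download_url  = distribution_downloadURL,
    media_type    = distribution_mediaType
  ) |>
  # 4. Convert the date string to a Date
  mutate(issued = as.Date(issued))

glimpse(meta_tbl)
```

Note that keyword stays a list-column in this sketch, which mirrors the <list> type shown for keyword and theme in the real output.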

Practical Example

Let’s walk through an example to see how get_provider_meta_data() works in action.

Suppose we want to retrieve metadata for a dataset based on a specific dataset identifier. Here’s how we can do it:

library(healthyR.data)
library(dplyr)
library(tidyr)

# Retrieve metadata for a dataset with identifier "3614-1eef"
get_provider_meta_data(.identifier = "3614-1eef") |>
  glimpse()
Rows: 1
Columns: 16
$ identifier      <chr> "3614-1eef"
$ title           <chr> "Addiction Medicine Office Visit Costs"
$ description     <chr> "Returns addiction medicine office visit costs per zip…
$ keyword         <list> "Addiction Medicine"
$ issued          <date> 2022-07-11
$ modified        <date> 2022-07-11
$ released        <date> 2023-09-28
$ landing_page    <chr> "https://data.medicare.gov/provider-data/dataset/3614-…
$ theme           <list> "Physician office visit costs"
$ access_level    <chr> "public"
$ archive_exclude <lgl> NA
$ contact_fn      <chr> "PPL Dataset"
$ contact_email   <chr> "[email protected]"
$ publisher_name  <chr> "Centers for Medicare & Medicaid Services (CMS)"
$ download_url    <chr> "https://data.cms.gov/provider-data/sites/default/file…
$ media_type      <chr> "text/csv"

In this example, we are filtering the metadata based on the dataset identifier “3614-1eef”. The glimpse() function allows us to view the structure of the resulting tibble.
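Once you have the metadata, a natural next step is to pull out the download_url so you can read the data itself. The helper below is a hypothetical sketch: get_csv_urls() is not part of healthyR.data, and it assumes only that the tibble has the identifier, media_type, and download_url columns shown in the output above.

```r
library(dplyr)

# Hypothetical helper (not part of healthyR.data): return the CSV download
# URL(s) for one dataset identifier from a metadata tibble.
get_csv_urls <- function(meta_tbl, id) {
  meta_tbl |>
    filter(.data$identifier == .env$id, .data$media_type == "text/csv") |>
    pull(download_url)
}

# With network access, it could be used roughly like this:
# meta_tbl <- healthyR.data::get_provider_meta_data(.identifier = "3614-1eef")
# csv_url  <- get_csv_urls(meta_tbl, "3614-1eef")
# visits   <- read.csv(csv_url[[1]])
```

The helper returns a character vector, so it is worth checking its length before reading: it will be empty if the identifier has no CSV distribution.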

Now, what if we want only the datasets that match a certain keyword? Here’s how we can do that:

provider_data_tbl <- get_provider_meta_data(.keyword = "medic")

# Let's see the titles of all datasets matched by the keyword "medic"
provider_data_tbl[["title"]]
 [1] "Addiction Medicine Office Visit Costs"                                     
 [2] "Emergency Medicine Office Visit Costs"                                     
 [3] "Geriatric Medicine Office Visit Costs"                                     
 [4] "Internal Medicine Office Visit Costs"                                      
 [5] "Medical Genetics and Genomics Office Visit Costs"                          
 [6] "Medical Oncology Office Visit Costs"                                       
 [7] "Medical Toxicology Office Visit Costs"                                     
 [8] "Nuclear Medicine Office Visit Costs"                                       
 [9] "Osteopathic Manipulative Medicine Office Visit Costs"                      
[10] "Pediatric Medicine Office Visit Costs"                                     
[11] "Physical Medicine and Rehabilitation Office Visit Costs"                   
[12] "Preventive Medicine Office Visit Costs"                                    
[13] "Sleep Medicine Office Visit Costs"                                         
[14] "Sports Medicine Office Visit Costs"                                        
[15] "Undersea and Hyperbaric Medicine Office Visit Costs"                       
[16] "Medical Equipment Suppliers"                                               
[17] "Home Health Care - Patient Survey (HHCAHPS) 2022Q4 to 2023Q3"              
[18] "Home Health Care - Patient Survey (HHCAHPS) National Data 2022Q4 to 2023Q3"
[19] "Home Health Care - Patient Survey (HHCAHPS) State Data 2022Q4 to 2023Q3"   
[20] "Home Health Care - Patient Survey (HHCAHPS) Measure Dates 2022Q4 to 2023Q3"
[21] "Medicare Spending Per Beneficiary - Hospital Additional Decimal Places"    
[22] "Hospital Value-Based Purchasing (HVBP) - Efficiency Scores"                
[23] "Medicare Hospital Spending by Claim"                                       
[24] "Medicare Spending Per Beneficiary - Hospital"                              
[25] "Medicare Spending Per Beneficiary - National"                              
[26] "Medicare Spending Per Beneficiary - State"                                 
# Now let's group them by theme
provider_data_tbl |>
  count(theme, sort = TRUE) |>
  unnest(cols = c(theme))
# A tibble: 4 × 2
  theme                            n
  <chr>                        <int>
1 Physician office visit costs    15
2 Hospitals                        6
3 Home health services             4
4 Supplier directory               1

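In this example, the metadata is filtered on the keyword “medic”. We then pull the titles of the matching datasets and count them by theme to see how the filtered data is distributed across themes. Note that the filter is applied to each dataset’s keywords rather than its title, which is why titles such as “Home Health Care - Patient Survey (HHCAHPS) 2022Q4 to 2023Q3” appear in the results. Note also that the match is partial rather than on a full word, which can be useful for broad searches.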
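To see what a partial keyword match means in practice, here is a small sketch on toy data. It is not the package's implementation: match_keyword() is a hypothetical helper, the titles and keywords are made up, and the case-insensitive grepl() call is an assumption based on the lowercase pattern "medic" matching capitalized keywords such as "Addiction Medicine" in the output above.

```r
library(dplyr)

# Toy metadata with a list-column of keywords; all values are made up.
toy_meta <- tibble(
  title   = c("Toy Dataset A", "Toy Dataset B", "Toy Dataset C"),
  keyword = list("Sleep Medicine", c("Medicare", "Spending"), "Dialysis")
)

# Hypothetical helper: keep rows where any keyword contains the pattern,
# ignoring case (a partial match, not a whole-word match).
match_keyword <- function(tbl, pattern) {
  keep <- vapply(
    tbl$keyword,
    function(kw) any(grepl(pattern, kw, ignore.case = TRUE)),
    logical(1)
  )
  tbl[keep, ]
}

match_keyword(toy_meta, "medic")[["title"]]
# [1] "Toy Dataset A" "Toy Dataset B"
```

Both "Sleep Medicine" and "Medicare" contain the fragment "medic", so the first two rows are kept even though neither title mentions it.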

Benefits of Using get_provider_meta_data()

This function is particularly useful for:

  • Data Scientists and Analysts: Quickly finding relevant datasets without manually searching through large repositories.
  • Healthcare Researchers: Accessing comprehensive metadata to support research and analysis.
  • Developers: Integrating CMS metadata retrieval into applications or workflows with minimal effort.

Conclusion

The get_provider_meta_data() function is a robust tool for anyone working with healthcare data. It not only saves time but also provides a cleaner, more efficient way to manage and analyze dataset metadata.

Give it a try and see how it can enhance your data workflows. Happy coding!

Feel free to share your experiences and any creative ways you’re using this function in the comments below. Until next time, keep exploring and innovating with R!


Steve