An Overview of the New Parameter Estimate Functions in the TidyDensity Package

code
rtip
tidydensity
Author

Steven P. Sanderson II, MPH

Published

June 3, 2024

Introduction

Hello, R enthusiasts! I’m excited to share some fantastic updates to the TidyDensity package. These updates introduce a suite of parameter estimate functions designed to make your data analysis more efficient and insightful. Whether you’re dealing with common distributions or more specialized ones, these functions have got you covered.

Why Parameter Estimation?

Parameter estimation is crucial when working with statistical distributions. It allows you to infer the parameters of a distribution from your data, providing insights into its underlying structure. This is particularly useful when you want to model real-world phenomena accurately.

New Parameter Estimate Functions

Here’s a quick rundown of the newly introduced functions in TidyDensity:

  1. util_zero_truncated_negative_binomial_param_estimate()
  2. util_zero_truncated_poisson_param_estimate()
  3. util_f_param_estimate()
  4. util_zero_truncated_geometric_param_estimate()
  5. util_t_param_estimate()
  6. util_pareto1_param_estimate()
  7. util_paralogistic_param_estimate()
  8. util_inverse_weibull_param_estimate()
  9. util_inverse_pareto_param_estimate()
  10. util_inverse_burr_param_estimate()
  11. util_generalized_pareto_param_estimate()
  12. util_generalized_beta_param_estimate()
  13. util_zero_truncated_binomial_param_estimate()

Each function is tailored to a specific distribution, providing a streamlined way to estimate its parameters.

Example: Estimating Parameters of a t Distribution

Let’s dive into an example using the util_t_param_estimate() function. Suppose you have data that you believe follows a t distribution. Here’s how you can estimate its parameters:

library(dplyr)
library(ggplot2)
library(TidyDensity)

set.seed(123)
x <- rt(100, df = 10, ncp = 0.5)
output <- util_t_param_estimate(x)

# Display the estimated parameters
print(output$parameter_tbl)
# A tibble: 2 × 7
  dist_type      samp_size  mean variance method df_est ncp_est
  <chr>              <int> <dbl>    <dbl> <chr>   <dbl>   <dbl>
1 T Distribution       100 0.612    0.949 MME     0.959   0.612
2 T Distribution       100 0.612    0.949 MLE     8.32    0.571

In this example, we generated some data from a t distribution with degrees of freedom (df) of 10 and a non-centrality parameter (ncp) of 0.5. Using the util_t_param_estimate() function, we estimated these parameters from the data.

The parameter_tbl in the output contains the estimated values, while combined_data_tbl can be used for visualization.

Visualizing the Results

Here’s what the output might look like:

# Visualize the combined data
output$combined_data_tbl |>
  tidy_combined_autoplot(.interactive = TRUE)

In the above plot, we visualize the output of the util_t_param_estimate() function from the TidyDensity package. The visualization shows how well the estimated t distribution fits our sample data. The x-axis represents the data values, while the y-axis shows the density. The different colors represent the data and the estimated density functions.

How to Use the New Functions

Each of the new parameter estimate functions follows a similar approach. Here’s a step-by-step guide to get you started:

  1. Load your data: Ensure your data is properly formatted and loaded into R.
  2. Select the appropriate function: Choose the function that matches the distribution you believe your data follows.
  3. Estimate the parameters: Use the selected function to estimate the parameters.
  4. Analyze and visualize: Review the estimated parameters and use the visualization functions to see how well the estimated distribution fits your data.

Your Turn!

I highly encourage you to try these new functions on your own datasets. Whether you’re working with common distributions or tackling more specialized ones, these tools can help you gain deeper insights into your data.

Feel free to experiment and see how these functions perform with different types of data. The more you explore, the better you’ll understand the strengths and applications of each distribution.

Conclusion

The new parameter estimate functions in TidyDensity open up exciting possibilities for data analysis. By simplifying the process of parameter estimation, they allow you to focus more on interpreting results and making informed decisions.

Give these functions a try and see how they can enhance your analysis workflow. Happy coding!