Demystifying Odds Ratios in Logistic Regression: Your R Recipe for Loan Defaults

Categories: rtip, regression
Author: Steven P. Sanderson II, MPH
Published: December 15, 2023

Introduction

Ever wondered why some individuals default on loans while others don’t? Logistic regression can shed light on this, and calculating odds ratios in R is the secret sauce. So, strap on your data aprons, folks, and let’s cook up some insights!

What are Odds Ratios?

Imagine a loan officer flipping a weighted coin to decide whether to approve your loan. An odds ratio tells you how much a single factor (like your student status) tilts the odds of “heads” (approval) for one group compared with another.

In logistic regression, odds ratios compare the odds of an event (loan default, in our case) between two groups defined by a specific variable, or across a one-unit increase in a numeric variable, with the other predictors held fixed. They’re like multipliers: greater than 1 means the odds of default go up, while less than 1 means they go down.
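To make the idea of odds concrete, here is a minimal sketch in base R. The two default probabilities are made-up numbers chosen purely for illustration, and the helper name `odds()` is hypothetical, not something from a package.

```r
# Hypothetical default probabilities for two groups (illustrative numbers only)
p_group_a <- 0.10
p_group_b <- 0.05

# Odds = probability of the event divided by the probability of no event
# (odds() is a helper defined here, not a package function)
odds <- function(p) p / (1 - p)

odds_a <- odds(p_group_a)  # 0.10 / 0.90
odds_b <- odds(p_group_b)  # 0.05 / 0.95

# Odds ratio: how many times larger the odds are in group A than in group B
odds_ratio <- odds_a / odds_b
print(odds_ratio)
```

With these numbers the probability doubles from group B to group A, but the odds ratio comes out a little above 2. Odds ratios and probability ratios are related, not identical.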

The R Recipe (with ISLR Flavor)

  1. Gather your ingredients: Load the ISLR package and the Default dataset. These data tell us whether individuals defaulted, their student status, average credit card balance, and income.
  2. Whip up the model: Use the glm() function with family='binomial' to fit a logistic regression model that predicts loan defaults based on student status, balance, and income. Think of it as the base for your delicious insights.
  3. Extract the spices: Use the summary() function to inspect the estimated coefficients for each variable, or coef() to pull them out as a vector. These are the secret ingredients that give your model flavor.
  4. Unleash the magic of exponentiation: Apply the exp() function to transform the coefficients to the odds ratio scale. Remember, logistic regression estimates its coefficients on the log-odds scale, so exponentiating undoes the log.
  5. Savor the results: Analyze the odds ratios. Are they greater than 1? Those factors increase default odds. Less than 1? They decrease them. A value near 1 suggests little to no effect per one-unit change in that variable, so keep the variable's units in mind.
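Why does exp() of a coefficient give an odds ratio? The sketch below checks it numerically on simulated data (not the Default data). The seed, sample size, and true coefficients are arbitrary assumptions picked for the demonstration.

```r
set.seed(42)  # arbitrary seed so the simulation is repeatable

# Simulated data: one numeric predictor and a binary outcome
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))
sim <- data.frame(x = x, y = y)

# Fit the logistic regression; coefficients are on the log-odds scale
fit <- glm(y ~ x, family = 'binomial', data = sim)

# Predicted log-odds at x = 0 and x = 1 (type = "link" is the log-odds scale)
log_odds <- predict(fit, newdata = data.frame(x = c(0, 1)), type = "link")

# Ratio of the odds at x = 1 to the odds at x = 0 ...
ratio_of_odds <- unname(exp(log_odds[2]) / exp(log_odds[1]))

# ... matches the exponentiated slope
or_from_coef <- unname(exp(coef(fit)["x"]))

print(c(ratio_of_odds = ratio_of_odds, or_from_coef = or_from_coef))
```

The two numbers agree because a one-unit step in x adds the slope to the log-odds, and adding on the log scale is multiplying on the odds scale.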

Example Time

# Load the ISLR package, which ships with the Default data set
library(ISLR)

# Peek at the first six rows
head(Default)
#>   default student   balance    income
#> 1      No      No  729.5265 44361.625
#> 2      No     Yes  817.1804 12106.135
#> 3      No      No 1073.5492 31767.139
#> 4      No      No  529.2506 35704.494
#> 5      No      No  785.6559 38463.496
#> 6      No     Yes  919.5885  7491.559

# Fit a logistic regression of default on student status, balance, and income
model <- glm(default ~ student + balance + income, family = 'binomial', data = Default)

# Disable scientific notation in printed output
options(scipen = 999)

# Extract the coefficients (log-odds scale) and exponentiate them into odds ratios
odds_ratios <- exp(coef(model))

# Print the odds ratios
cat("Odds ratios:\n")
#> Odds ratios:
print(odds_ratios)
#>   (Intercept)    studentYes       balance        income 
#> 0.00001903854 0.52373166965 1.00575299051 1.00000303345 

# Odds ratios with 95% profile-likelihood confidence intervals
cat("Odds ratios with confidence intervals:\n")
#> Odds ratios with confidence intervals:
exp(cbind(Odds_Ratio = coef(model), confint(model)))
#> Waiting for profiling to be done...
#>                Odds_Ratio          2.5 %       97.5 %
#> (Intercept) 0.00001903854 0.000007074481 0.0000487808
#> studentYes  0.52373166965 0.329882707270 0.8334223982
#> balance     1.00575299051 1.005308940686 1.0062238757
#> income      1.00000303345 0.999986952969 1.0000191246

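Interpretation time! Holding balance and income fixed, being a student multiplies the odds of default by about 0.52 (a coefficient of about -0.647 on the log-odds scale), and the 95% interval of 0.33 to 0.83 stays below 1. Each additional dollar of balance multiplies the odds by about 1.0058, which adds up quickly over hundreds of dollars. Income's odds ratio is essentially 1, and its interval (0.99999 to 1.00002) straddles 1, so income leaves the odds basically flat.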
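Per-dollar odds ratios look tiny, so it can help to restate them as percent changes or over a bigger step. Here is a small sketch that reuses the odds ratios printed above. The step sizes of $100 and $10,000 are arbitrary choices for illustration.

```r
# Odds ratios copied from the printed output above
odds_ratios <- c(studentYes = 0.52373166965,
                 balance    = 1.00575299051,
                 income     = 1.00000303345)

# Percent change in the odds of default per one-unit change in each predictor
pct_change <- (odds_ratios - 1) * 100
print(round(pct_change, 4))

# Odds ratio for a $100 increase in balance: raise the per-dollar ratio
# to the 100th power (equivalent to exp(100 * coefficient))
or_balance_100 <- unname(odds_ratios["balance"]^100)
print(or_balance_100)

# Odds ratio for a $10,000 increase in income (illustrative step size)
or_income_10k <- unname(odds_ratios["income"]^10000)
print(or_income_10k)
```

Odds ratios multiply across steps, which is why the per-dollar value is raised to a power instead of being multiplied by the step size.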

Go Forth and Experiment!

This is just the tip of the iceberg! Play around with different models, variables, and visualizations using RStudio. Remember, the more you experiment, the better you’ll understand the magic of odds ratios and logistic regression. Now, go forth and analyze!

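Bonus Tip: The confint() function used above returns confidence intervals on the log-odds scale, so wrap it in exp() to get intervals for your odds ratios. An interval that excludes 1 points to a clearer effect. This adds another layer of spice to your statistical analysis!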
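If you are curious what an interval calculation looks like under the hood, here is a rough sketch of the simpler Wald interval on simulated data. It is not the profile-likelihood method that confint() uses for glm fits; the data, seed, and coefficients are assumptions made up for the demo.

```r
set.seed(1)  # arbitrary seed so the simulation is repeatable

# Simulated data (not the Default data)
n <- 400
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.7 * x))
fit <- glm(y ~ x, family = 'binomial')

# Estimates and standard errors on the log-odds scale
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))
z   <- qnorm(0.975)  # about 1.96 for a 95% interval

# Build the interval on the log-odds scale first, then exponentiate
wald_ci <- exp(cbind(Odds_Ratio = est,
                     lower = est - z * se,
                     upper = est + z * se))
print(wald_ci)

# confint.default() computes the same Wald limits on the log-odds scale
builtin_ci <- exp(confint.default(fit))
print(builtin_ci)
```

The order matters: compute the limits on the log-odds scale and exponentiate at the end. That is why intervals for odds ratios are not symmetric around the estimate.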

So, there you have it! Odds ratios in R, made easy with the ISLR package and a dash of culinary magic. Remember, the key ingredients are understanding, practice, and a sprinkle of creativity. Bon appétit, data chefs!