MUSA 5080 Notes #10
Week 10: Logistic Regression for Binary Outcomes
Date: 11/10/2025
Overview
This week we learned about logistic regression for predicting binary outcomes (yes/no, 0/1). Key topics include the logistic function, interpreting odds ratios, confusion matrices, ROC curves, threshold selection, and equity considerations.
Key Learning Objectives
- Understand when to use logistic regression vs. linear regression
- Fit and interpret logistic regression models
- Master confusion matrices and classification metrics
- Understand ROC curves and AUC
- Choose appropriate thresholds
- Recognize equity considerations
Introduction to Logistic Regression
What Makes Binary Outcomes Different?
Problems with linear regression for binary outcomes:
- Predictions can be > 1 or < 0, which makes no sense for a probability
- Assumes a constant effect of each predictor across the whole range of the outcome (not realistic)
- Violates regression assumptions (residuals are neither normally distributed nor constant in variance)
The Logistic Function
Solution: Predict probability that Y = 1
\[p(X) = \frac{1}{1+e^{-(\beta_0 + \beta_1X_1 + ... + \beta_kX_k)}}\]
Key properties:
- Always outputs values between 0 and 1
- S-shaped (sigmoid) curve
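A minimal sketch, with invented coefficient values (beta_0, beta_1, and the predictor x are all made up for illustration), showing how the logistic function squeezes any linear predictor into the 0–1 range:

# Invented coefficients, for illustration only
beta_0 <- -2
beta_1 <- 0.5

# The linear predictor can take any real value
x <- seq(-10, 10, by = 0.1)
linear_predictor <- beta_0 + beta_1 * x

# The logistic function maps it into (0, 1)
p <- 1 / (1 + exp(-linear_predictor))

range(p)                 # always strictly between 0 and 1
plot(x, p, type = "l")   # the S-shaped (sigmoid) curve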
The Logit Transformation
We work with log-odds:
\[\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1X_1 + ...\]
Interpretation:
- Coefficients are log-odds
- Exponentiate to get odds ratios: \(e^{\beta}\)
- OR > 1: increases the odds
- OR < 1: decreases the odds
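A quick worked example (the coefficient values here are invented) of moving from a log-odds coefficient to an odds ratio:

# Invented coefficients on the log-odds scale, for illustration only
exp(0.14)    # about 1.15: each one-unit increase multiplies the odds by ~1.15
exp(-0.22)   # about 0.80: a negative coefficient shrinks the odds by ~20%
exp(0)       # exactly 1: no effect on the odds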
Building Logistic Models
Fitting the Model
# Use glm() with family = "binomial"
model <- glm(
  is_spam ~ exclamation_marks + contains_free + length,
  data = spam_data,
  family = "binomial"
)
# Convert to odds ratios
odds_ratios <- exp(coef(model))

Interpretation:
- If OR = 1.15: 15% increase in odds per one-unit increase in the predictor
- If OR = 0.80: 20% decrease in odds per unit
- If OR = 2.00: doubling of the odds per unit
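If uncertainty around the odds ratios is also wanted, one common approach (a sketch, not part of the lecture code) is to exponentiate the confidence-interval endpoints as well; confint() on a glm uses profile likelihood, so it can take a moment on larger models:

# Odds ratios with 95% confidence intervals (profile-likelihood based)
or_table <- exp(cbind(OR = coef(model), confint(model)))
or_table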
Making Predictions
# Predict probabilities
predicted_prob <- predict(model, newdata = new_email, type = "response")

But: if the predicted probability is 0.723, is this spam or not? We need to choose a threshold.
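One way to turn probabilities into class labels once a threshold is picked; the 0.5 used here is only a placeholder, since choosing the threshold is the real decision discussed below:

# Convert probabilities to labels at a chosen threshold (0.5 is a placeholder)
threshold <- 0.5
predicted_class <- ifelse(predicted_prob >= threshold, "spam", "not spam")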
Evaluating Binary Predictions
The Confusion Matrix
Four outcomes:
- True Positive (TP): correct positive prediction
- False Positive (FP): wrong positive prediction (Type I error)
- True Negative (TN): correct negative prediction
- False Negative (FN): wrong negative prediction (Type II error)
Performance Metrics
Sensitivity (Recall): \[\text{Sensitivity} = \frac{TP}{TP + FN}\] “Of all actual positives, how many did we catch?”
Specificity: \[\text{Specificity} = \frac{TN}{TN + FP}\] “Of all actual negatives, how many did we correctly identify?”
Precision: \[\text{Precision} = \frac{TP}{TP + FP}\] “Of our positive predictions, how many were correct?”
Accuracy: \[\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}\]
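A minimal sketch of computing the confusion matrix and these metrics by hand, assuming spam_data has a 0/1 column is_spam and that we evaluate on the same data used for fitting (no rows dropped for missing values):

# Predicted probabilities and classes at a chosen threshold
probs <- predict(model, type = "response")
preds <- ifelse(probs >= 0.5, 1, 0)

# Confusion matrix: rows = predicted, columns = actual
conf_mat <- table(predicted = preds, actual = spam_data$is_spam)

TP <- conf_mat["1", "1"]
FP <- conf_mat["1", "0"]
TN <- conf_mat["0", "0"]
FN <- conf_mat["0", "1"]

sensitivity <- TP / (TP + FN)    # of all actual positives, share we caught
specificity <- TN / (TN + FP)    # of all actual negatives, share correctly identified
precision   <- TP / (TP + FP)    # of our positive predictions, share correct
accuracy    <- (TP + TN) / sum(conf_mat)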
The Threshold Decision
Why Threshold Choice Matters
Threshold = 0.3 (low bar):
- Higher sensitivity (catch more positives)
- Lower specificity (more false alarms)

Threshold = 0.7 (high bar):
- Lower sensitivity (miss some positives)
- Higher specificity (fewer false alarms)
Trade-off: Can’t maximize both simultaneously!
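A hedged sketch of that trade-off, reusing probs and spam_data$is_spam from the previous sketch: compute sensitivity and specificity at a low and a high threshold and compare.

# Helper: sensitivity and specificity at a given threshold
metrics_at <- function(threshold, probs, actual) {
  preds <- as.integer(probs >= threshold)
  c(threshold   = threshold,
    sensitivity = sum(preds == 1 & actual == 1) / sum(actual == 1),
    specificity = sum(preds == 0 & actual == 0) / sum(actual == 0))
}

# Lower threshold flags more positives: sensitivity up, specificity down
rbind(
  metrics_at(0.3, probs, spam_data$is_spam),
  metrics_at(0.7, probs, spam_data$is_spam)
)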
Two Policy Scenarios
Scenario A: Rare, deadly disease
- Goal: don’t miss any cases (high sensitivity)
- Acceptable: some false positives (implies a low threshold)

Scenario B: Limited intervention slots
- Goal: use resources efficiently (high precision)
- Decision depends on: the cost of an intervention vs. the cost of a missed case
ROC Curves
What ROC Shows
ROC = Receiver Operating Characteristic
- X-axis: False Positive Rate (1 - Specificity)
- Y-axis: True Positive Rate (Sensitivity)
- Diagonal line: Random guessing
- Top-left corner: Perfect prediction
Creating ROC Curve
library(pROC)
roc_obj <- roc(actual_outcome, predicted_probability)
ggroc(roc_obj) + geom_abline(slope = 1, intercept = 1, linetype = "dashed")
# Calculate AUC
auc_value <- auc(roc_obj)

Interpreting AUC
- AUC = 1.0: Perfect classifier
- AUC = 0.9-1.0: Excellent
- AUC = 0.8-0.9: Good
- AUC = 0.7-0.8: Acceptable
- AUC = 0.5: Random guessing
Limitations:
- Doesn’t tell us which threshold to use
- Doesn’t account for class imbalance
- Doesn’t show equity implications
Equity Considerations
Disparate Impact
A model can be “accurate” overall but perform differently across groups
Example from recidivism model:
| Group | Sensitivity | Specificity | False Positive Rate |
|---|---|---|---|
| Overall | 0.72 | 0.68 | 0.32 |
| Group A | 0.78 | 0.74 | 0.26 |
| Group B | 0.64 | 0.58 | 0.42 |
Group B experiences: Lower sensitivity, lower specificity, higher false positive rate
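A sketch of how a group-wise check might look with dplyr, assuming a hypothetical data frame results with columns actual (0/1), prob, and group (all names invented for illustration):

library(dplyr)

results %>%
  mutate(pred = as.integer(prob >= 0.5)) %>%
  group_by(group) %>%
  summarize(
    sensitivity         = sum(pred == 1 & actual == 1) / sum(actual == 1),
    specificity         = sum(pred == 0 & actual == 0) / sum(actual == 0),
    false_positive_rate = sum(pred == 1 & actual == 0) / sum(actual == 0)
  )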
Real-World Case: COMPAS
ProPublica investigation (2016):
- Similar overall accuracy for Black and White defendants
- BUT: false positive rates differed dramatically
- Black defendants: 45% false positive rate
- White defendants: 23% false positive rate
- Black defendants were twice as likely to be incorrectly labeled “high risk”
Key insight: Overall accuracy masks disparate impact
How to Choose a Threshold
Framework
- Understand consequences: What happens with FP vs. FN?
- Consider stakeholders: Who is affected by each error?
- Choose metric priority: Sensitivity? Specificity? Precision? Equity?
- Test multiple thresholds: Evaluate across thresholds and check group-wise performance (see the sketch after this list)
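For the “test multiple thresholds” step, one option (reusing the metrics_at() helper sketched earlier) is to sweep a grid of thresholds; pROC’s coords() can also return sensitivity and specificity at every threshold:

# Sweep a grid of thresholds with the metrics_at() helper from the earlier sketch
thresholds <- seq(0.1, 0.9, by = 0.1)
t(sapply(thresholds, metrics_at, probs = probs, actual = spam_data$is_spam))

# Alternatively, pROC lists threshold/sensitivity/specificity combinations:
# coords(roc_obj, "all")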
Practical Recommendations
- Report multiple metrics (not just accuracy)
- Show the ROC curve
- Test multiple thresholds
- Evaluate by sub-group
- Document assumptions
- Consider context
- Be transparent about limitations
Key Takeaways
Logistic Regression Skills
- When to use: Binary outcomes (yes/no, 0/1)
- Model fitting: glm() with family = "binomial"
- Interpretation: Odds ratios (exponentiate coefficients)
- Predictions: Probabilities between 0 and 1
- Threshold selection: Critical decision with real consequences
Evaluation Metrics
- Sensitivity: Of all positives, how many did we catch?
- Specificity: Of all negatives, how many did we correctly identify?
- Precision: Of our positive predictions, how many were correct?
- ROC Curve: Visualizes all threshold trade-offs
- AUC: Overall discrimination ability (but has limitations)
Critical Considerations
- Threshold choice matters: No single “right” threshold
- Costs are asymmetric: FP and FN have different consequences
- Equity matters: Models can perform differently across groups
- Overall accuracy can mask bias: Always check group-wise performance
- Transparency is essential: Document choices and assumptions
Common Pitfalls
- Using linear regression for binary outcomes
- Defaulting to 0.5 threshold
- Only reporting accuracy
- Ignoring class imbalance
- Not checking for disparate impact
- Not considering real-world costs