MUSA 5080 Notes #10

Week 10: Logistic Regression for Binary Outcomes

Author: Fan Yang
Published: November 10, 2025

Overview

This week we learned about logistic regression for predicting binary outcomes (yes/no, 0/1). Key topics include the logistic function, interpreting odds ratios, confusion matrices, ROC curves, threshold selection, and equity considerations.

Key Learning Objectives

  • Understand when to use logistic regression vs. linear regression
  • Fit and interpret logistic regression models
  • Master confusion matrices and classification metrics
  • Understand ROC curves and AUC
  • Choose appropriate thresholds
  • Recognize equity considerations

Introduction to Logistic Regression

What Makes Binary Outcomes Different?

Problems with linear regression for binary outcomes:

  • Predictions can be greater than 1 or less than 0, which makes no sense for a probability
  • Assumes a constant effect of each predictor across the whole range of the outcome (not realistic for probabilities)
  • Violates standard linear regression assumptions (e.g., normally distributed, constant-variance errors)
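
As a quick illustration (with simulated data, not anything from the course), a linear probability model can return fitted values outside [0, 1], while the logistic model cannot:

# Simulated binary outcome driven by one predictor
set.seed(5080)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-1 + 2 * x))
dat <- data.frame(x = x, y = y)

# Linear regression treats y as continuous
lm_fit  <- lm(y ~ x, data = dat)
# Logistic regression models the probability directly
glm_fit <- glm(y ~ x, data = dat, family = "binomial")

range(predict(lm_fit))                      # can fall outside [0, 1]
range(predict(glm_fit, type = "response"))  # always stays inside (0, 1)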

The Logistic Function

Solution: Predict probability that Y = 1

\[p(X) = \frac{1}{1+e^{-(\beta_0 + \beta_1X_1 + ... + \beta_kX_k)}}\]

Key properties:

  • Always outputs values between 0 and 1
  • S-shaped (sigmoid) curve
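
A small sketch of the function itself; plogis() is base R's logistic (sigmoid) function, and the coefficient values here are made up:

# p(X) = 1 / (1 + exp(-(b0 + b1 * x))); plogis() computes exactly this
b0 <- -2
b1 <- 0.8
x  <- seq(-10, 10, by = 0.5)
p  <- plogis(b0 + b1 * x)

range(p)                                   # strictly between 0 and 1
plot(x, p, type = "l", main = "S-shaped (sigmoid) curve")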

The Logit Transformation

We work with log-odds:

\[\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1X_1 + ...\]

Interpretation:

  • Coefficients are log-odds
  • Exponentiate to get odds ratios: \(e^{\beta}\)
  • OR > 1: increases the odds
  • OR < 1: decreases the odds

Building Logistic Models

Fitting the Model

# Use glm() with family = "binomial"
model <- glm(
  is_spam ~ exclamation_marks + contains_free + length,
  data = spam_data,
  family = "binomial"
)

# Convert to odds ratios
odds_ratios <- exp(coef(model))

Interpretation:

  • OR = 1.15: 15% increase in the odds per one-unit increase in the predictor
  • OR = 0.80: 20% decrease in the odds per unit
  • OR = 2.00: doubling of the odds per unit
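
A common companion step (assuming the spam model object from above) is to report odds ratios together with confidence intervals; confint() on a glm profiles the likelihood, so it may print a "Waiting for profiling to be done..." message:

# Odds ratios with 95% confidence intervals
exp(cbind(OR = coef(model), confint(model)))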

Making Predictions

# Predict probabilities
predicted_prob <- predict(model, newdata = new_email, type = "response")

But: if the predicted probability is 0.723, is the email spam or not? We need to choose a threshold.
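
A minimal way to turn probabilities into class labels, assuming a vector predicted_prob and a (for now arbitrary) threshold of 0.5:

# Classify as spam if the predicted probability exceeds the threshold
threshold <- 0.5   # placeholder; threshold choice is discussed below
predicted_class <- ifelse(predicted_prob >= threshold, 1, 0)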

Evaluating Binary Predictions

The Confusion Matrix

Four possible outcomes:

  • True Positive (TP): correctly predicted positive
  • False Positive (FP): predicted positive, actually negative (Type I error)
  • True Negative (TN): correctly predicted negative
  • False Negative (FN): predicted negative, actually positive (Type II error)
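
In R, a confusion matrix is just a cross-tabulation of actual vs. predicted classes. A sketch, assuming 0/1 vectors actual and predicted_class of the same length:

# Rows = actual class, columns = predicted class
conf_mat <- table(Actual = actual, Predicted = predicted_class)
conf_mat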

Performance Metrics

Sensitivity (Recall): \[\text{Sensitivity} = \frac{TP}{TP + FN}\] “Of all actual positives, how many did we catch?”

Specificity: \[\text{Specificity} = \frac{TN}{TN + FP}\] “Of all actual negatives, how many did we correctly identify?”

Precision: \[\text{Precision} = \frac{TP}{TP + FP}\] “Of our positive predictions, how many were correct?”

Accuracy: \[\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}\]
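
A hand-rolled version of these metrics, assuming the 2x2 conf_mat from above with actual classes in rows and predicted classes in columns (packages such as caret or yardstick provide the same calculations):

# Pull the four cells out of the confusion matrix
TN <- conf_mat[1, 1]; FP <- conf_mat[1, 2]
FN <- conf_mat[2, 1]; TP <- conf_mat[2, 2]

sensitivity <- TP / (TP + FN)                   # share of actual positives caught
specificity <- TN / (TN + FP)                   # share of actual negatives correctly identified
precision   <- TP / (TP + FP)                   # share of positive predictions that were correct
accuracy    <- (TP + TN) / (TP + TN + FP + FN)  # overall share correct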

The Threshold Decision

Why Threshold Choice Matters

Threshold = 0.3 (low bar):

  • Higher sensitivity (catch more positives)
  • Lower specificity (more false alarms)

Threshold = 0.7 (high bar):

  • Lower sensitivity (miss some positives)
  • Higher specificity (fewer false alarms)

Trade-off: Can’t maximize both simultaneously!
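
One way to see the trade-off directly is to sweep a range of thresholds and recompute sensitivity and specificity at each one (a sketch, assuming the actual and predicted_prob vectors from above):

thresholds <- seq(0.1, 0.9, by = 0.1)

tradeoff <- sapply(thresholds, function(t) {
  pred <- as.integer(predicted_prob >= t)
  c(threshold   = t,
    sensitivity = sum(pred == 1 & actual == 1) / sum(actual == 1),
    specificity = sum(pred == 0 & actual == 0) / sum(actual == 0))
})

t(tradeoff)   # sensitivity falls and specificity rises as the threshold increases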

Two Policy Scenarios

Scenario A: Rare, deadly disease

  • Goal: don’t miss any cases (high sensitivity)
  • Acceptable: some false positives (low threshold)

Scenario B: Limited intervention slots

  • Goal: use resources efficiently (high precision)
  • Decision depends on the cost of an intervention vs. the cost of a missed case

ROC Curves

What ROC Shows

ROC = Receiver Operating Characteristic

  • X-axis: False Positive Rate (1 - Specificity)
  • Y-axis: True Positive Rate (Sensitivity)
  • Diagonal line: Random guessing
  • Top-left corner: Perfect prediction

Creating an ROC Curve

library(pROC)
library(ggplot2)  # needed for geom_abline() below

# Build the ROC object from actual outcomes (0/1) and predicted probabilities
roc_obj <- roc(actual_outcome, predicted_probability)

# ggroc() plots sensitivity against specificity (x-axis reversed);
# the dashed diagonal marks random guessing
ggroc(roc_obj) + geom_abline(slope = 1, intercept = 1, linetype = "dashed")

# Calculate AUC
auc_value <- auc(roc_obj)

Interpreting AUC

  • AUC = 1.0: Perfect classifier
  • AUC = 0.9-1.0: Excellent
  • AUC = 0.8-0.9: Good
  • AUC = 0.7-0.8: Acceptable
  • AUC = 0.5: Random guessing

Limitations:

  • Doesn’t tell us which threshold to use
  • Doesn’t account for class imbalance
  • Doesn’t show equity implications

Equity Considerations

Disparate Impact

A model can be “accurate” overall but perform differently across groups

Example from recidivism model:

Group     Sensitivity   Specificity   False Positive Rate
Overall   0.72          0.68          0.32
Group A   0.78          0.74          0.26
Group B   0.64          0.58          0.42
Group B experiences lower sensitivity, lower specificity, and a higher false positive rate.
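
Checking for this kind of disparate impact is mostly a group_by() away. A sketch, assuming a data frame results with columns actual (0/1), predicted_prob, and group, plus a chosen threshold:

library(dplyr)

results %>%
  mutate(predicted_class = as.integer(predicted_prob >= threshold)) %>%
  group_by(group) %>%
  summarize(
    sensitivity         = sum(predicted_class == 1 & actual == 1) / sum(actual == 1),
    specificity         = sum(predicted_class == 0 & actual == 0) / sum(actual == 0),
    false_positive_rate = 1 - specificity
  )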

Real-World Case: COMPAS

ProPublica investigation (2016):

  • Similar overall accuracy for Black and White defendants
  • BUT: false positive rates differed dramatically (45% for Black defendants vs. 23% for White defendants)
  • Black defendants were twice as likely to be incorrectly labeled “high risk”

Key insight: Overall accuracy masks disparate impact

How to Choose a Threshold

Framework

  1. Understand consequences: What happens with FP vs. FN?
  2. Consider stakeholders: Who is affected by each error?
  3. Choose metric priority: Sensitivity? Specificity? Precision? Equity?
  4. Test multiple thresholds: Evaluate across thresholds, check group-wise performance

Practical Recommendations

  1. Report multiple metrics (not just accuracy)
  2. Show the ROC curve
  3. Test multiple thresholds
  4. Evaluate by sub-group
  5. Document assumptions
  6. Consider context
  7. Be transparent about limitations

Key Takeaways

Logistic Regression Skills

  1. When to use: Binary outcomes (yes/no, 0/1)
  2. Model fitting: glm() with family = "binomial"
  3. Interpretation: Odds ratios (exponentiate coefficients)
  4. Predictions: Probabilities between 0 and 1
  5. Threshold selection: Critical decision with real consequences

Evaluation Metrics

  1. Sensitivity: Of all positives, how many did we catch?
  2. Specificity: Of all negatives, how many did we correctly identify?
  3. Precision: Of our positive predictions, how many were correct?
  4. ROC Curve: Visualizes all threshold trade-offs
  5. AUC: Overall discrimination ability (but has limitations)

Critical Considerations

  1. Threshold choice matters: No single “right” threshold
  2. Costs are asymmetric: FP and FN have different consequences
  3. Equity matters: Models can perform differently across groups
  4. Overall accuracy can mask bias: Always check group-wise performance
  5. Transparency is essential: Document choices and assumptions

Common Pitfalls

  • Using linear regression for binary outcomes
  • Defaulting to 0.5 threshold
  • Only reporting accuracy
  • Ignoring class imbalance
  • Not checking for disparate impact
  • Not considering real-world costs