MUSA 5080 Notes #10
Week 10: Logistic Regression for Binary Outcomes
Date: 11/10/2025
Overview
This week we learned about logistic regression for predicting binary outcomes (yes/no, 0/1). Key topics include the logistic function, interpreting odds ratios, confusion matrices, ROC curves, threshold selection, and equity considerations.
Key Learning Objectives
- Understand when to use logistic regression vs. linear regression
- Fit and interpret logistic regression models
- Master confusion matrices and classification metrics
- Understand ROC curves and AUC
- Choose appropriate thresholds
- Recognize equity considerations
Introduction to Logistic Regression
What Makes Binary Outcomes Different?
Problems with linear regression for binary outcomes:
- Predictions can be > 1 or < 0, which makes no sense for a probability
- Assumes a constant effect of each predictor across the whole range of the outcome (not realistic)
- Violates regression assumptions (residuals are neither normally distributed nor constant in variance)
The Logistic Function
Solution: Predict probability that Y = 1
\[p(X) = \frac{1}{1+e^{-(\beta_0 + \beta_1X_1 + ... + \beta_kX_k)}}\]
Key properties:
- Always outputs values between 0 and 1
- S-shaped (sigmoid) curve
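A minimal sketch, with invented coefficient values (beta_0, beta_1, and the predictor x are all made up for illustration), showing how the logistic function squeezes any linear predictor into the 0–1 range:

# Invented coefficients, for illustration only
beta_0 <- -2
beta_1 <- 0.5

# The linear predictor can take any real value
x <- seq(-10, 10, by = 0.1)
linear_predictor <- beta_0 + beta_1 * x

# The logistic function maps it into (0, 1)
p <- 1 / (1 + exp(-linear_predictor))

range(p)                 # always strictly between 0 and 1
plot(x, p, type = "l")   # the S-shaped (sigmoid) curve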
The Logit Transformation
We work with log-odds:
\[\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1X_1 + ...\]
Interpretation:
- Coefficients are log-odds
- Exponentiate to get odds ratios: \(e^{\beta}\)
- OR > 1: increases the odds
- OR < 1: decreases the odds
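A quick worked example (the coefficient values here are invented) of moving from a log-odds coefficient to an odds ratio:

# Invented coefficients on the log-odds scale, for illustration only
exp(0.14)    # about 1.15: each one-unit increase multiplies the odds by ~1.15
exp(-0.22)   # about 0.80: a negative coefficient shrinks the odds by ~20%
exp(0)       # exactly 1: no effect on the odds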
Building Logistic Models
Fitting the Model
# Use glm() with family = "binomial"
model <- glm(
  is_spam ~ exclamation_marks + contains_free + length,
  data = spam_data,
  family = "binomial"
)
# Convert to odds ratios
odds_ratios <- exp(coef(model))

Interpretation:
- If OR = 1.15: 15% increase in odds per one-unit increase in the predictor
- If OR = 0.80: 20% decrease in odds per unit
- If OR = 2.00: doubling of the odds per unit
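If uncertainty around the odds ratios is also wanted, one common approach (a sketch, not part of the lecture code) is to exponentiate the confidence-interval endpoints as well; confint() on a glm uses profile likelihood, so it can take a moment on larger models:

# Odds ratios with 95% confidence intervals (profile-likelihood based)
or_table <- exp(cbind(OR = coef(model), confint(model)))
or_table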
Making Predictions
# Predict probabilities
predicted_prob <- predict(model, newdata = new_email, type = "response")

But: if the predicted probability is 0.723, is this spam or not? We need to choose a threshold.
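One way to turn probabilities into class labels once a threshold is picked; the 0.5 used here is only a placeholder, since choosing the threshold is the real decision discussed below:

# Convert probabilities to labels at a chosen threshold (0.5 is a placeholder)
threshold <- 0.5
predicted_class <- ifelse(predicted_prob >= threshold, "spam", "not spam")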
Evaluating Binary Predictions
The Confusion Matrix
Four outcomes:
- True Positive (TP): correct positive prediction
- False Positive (FP): wrong positive prediction (Type I error)
- True Negative (TN): correct negative prediction
- False Negative (FN): wrong negative prediction (Type II error)
Performance Metrics
Sensitivity (Recall): \[\text{Sensitivity} = \frac{TP}{TP + FN}\] “Of all actual positives, how many did we catch?”
Specificity: \[\text{Specificity} = \frac{TN}{TN + FP}\] “Of all actual negatives, how many did we correctly identify?”
Precision: \[\text{Precision} = \frac{TP}{TP + FP}\] “Of our positive predictions, how many were correct?”
Accuracy: \[\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}\]
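A minimal sketch of computing the confusion matrix and these metrics by hand, assuming spam_data has a 0/1 column is_spam and that we evaluate on the same data used for fitting (no rows dropped for missing values):

# Predicted probabilities and classes at a chosen threshold
probs <- predict(model, type = "response")
preds <- ifelse(probs >= 0.5, 1, 0)

# Confusion matrix: rows = predicted, columns = actual
conf_mat <- table(predicted = preds, actual = spam_data$is_spam)

TP <- conf_mat["1", "1"]
FP <- conf_mat["1", "0"]
TN <- conf_mat["0", "0"]
FN <- conf_mat["0", "1"]

sensitivity <- TP / (TP + FN)    # of all actual positives, share we caught
specificity <- TN / (TN + FP)    # of all actual negatives, share correctly identified
precision   <- TP / (TP + FP)    # of our positive predictions, share correct
accuracy    <- (TP + TN) / sum(conf_mat)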
The Threshold Decision
Why Threshold Choice Matters
Threshold = 0.3 (low bar):
- Higher sensitivity (catch more positives)
- Lower specificity (more false alarms)

Threshold = 0.7 (high bar):
- Lower sensitivity (miss some positives)
- Higher specificity (fewer false alarms)
Trade-off: Can’t maximize both simultaneously!
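A hedged sketch of that trade-off, reusing probs and spam_data$is_spam from the previous sketch: compute sensitivity and specificity at a low and a high threshold and compare.

# Helper: sensitivity and specificity at a given threshold
metrics_at <- function(threshold, probs, actual) {
  preds <- as.integer(probs >= threshold)
  c(threshold   = threshold,
    sensitivity = sum(preds == 1 & actual == 1) / sum(actual == 1),
    specificity = sum(preds == 0 & actual == 0) / sum(actual == 0))
}

# Lower threshold flags more positives: sensitivity up, specificity down
rbind(
  metrics_at(0.3, probs, spam_data$is_spam),
  metrics_at(0.7, probs, spam_data$is_spam)
)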
Two Policy Scenarios
Scenario A: Rare, deadly disease
- Goal: don’t miss any cases (high sensitivity)
- Acceptable: some false positives (implies a low threshold)

Scenario B: Limited intervention slots
- Goal: use resources efficiently (high precision)
- Decision depends on: the cost of an intervention vs. the cost of a missed case
ROC Curves
What ROC Shows
ROC = Receiver Operating Characteristic
- X-axis: False Positive Rate (1 - Specificity)
- Y-axis: True Positive Rate (Sensitivity)
- Diagonal line: Random guessing
- Top-left corner: Perfect prediction
Creating ROC Curve
library(pROC)
roc_obj <- roc(actual_outcome, predicted_probability)
ggroc(roc_obj) + geom_abline(slope = 1, intercept = 1, linetype = "dashed")
# Calculate AUC
auc_value <- auc(roc_obj)

Interpreting AUC
- AUC = 1.0: Perfect classifier
- AUC = 0.9-1.0: Excellent
- AUC = 0.8-0.9: Good
- AUC = 0.7-0.8: Acceptable
- AUC = 0.5: Random guessing
Limitations:
- Doesn’t tell us which threshold to use
- Doesn’t account for class imbalance
- Doesn’t show equity implications
Equity Considerations
Disparate Impact
A model can be “accurate” overall but perform differently across groups
Example from recidivism model:
| Group | Sensitivity | Specificity | False Positive Rate |
|---|---|---|---|
| Overall | 0.72 | 0.68 | 0.32 |
| Group A | 0.78 | 0.74 | 0.26 |
| Group B | 0.64 | 0.58 | 0.42 |
Group B experiences: Lower sensitivity, lower specificity, higher false positive rate
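A sketch of how a group-wise check might look with dplyr, assuming a hypothetical data frame results with columns actual (0/1), prob, and group (all names invented for illustration):

library(dplyr)

results %>%
  mutate(pred = as.integer(prob >= 0.5)) %>%
  group_by(group) %>%
  summarize(
    sensitivity         = sum(pred == 1 & actual == 1) / sum(actual == 1),
    specificity         = sum(pred == 0 & actual == 0) / sum(actual == 0),
    false_positive_rate = sum(pred == 1 & actual == 0) / sum(actual == 0)
  )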
Real-World Case: COMPAS
ProPublica investigation (2016):
- Similar overall accuracy for Black and White defendants
- BUT: false positive rates differed dramatically
- Black defendants: 45% false positive rate
- White defendants: 23% false positive rate
- Black defendants were twice as likely to be incorrectly labeled “high risk”
Key insight: Overall accuracy masks disparate impact
How to Choose a Threshold
Framework
- Understand consequences: What happens with FP vs. FN?
- Consider stakeholders: Who is affected by each error?
- Choose metric priority: Sensitivity? Specificity? Precision? Equity?
- Test multiple thresholds: Evaluate across thresholds and check group-wise performance (see the sketch after this list)
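For the “test multiple thresholds” step, one option (reusing the metrics_at() helper sketched earlier) is to sweep a grid of thresholds; pROC’s coords() can also return sensitivity and specificity at every threshold:

# Sweep a grid of thresholds with the metrics_at() helper from the earlier sketch
thresholds <- seq(0.1, 0.9, by = 0.1)
t(sapply(thresholds, metrics_at, probs = probs, actual = spam_data$is_spam))

# Alternatively, pROC lists threshold/sensitivity/specificity combinations:
# coords(roc_obj, "all")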
Practical Recommendations
- Report multiple metrics (not just accuracy)
- Show the ROC curve
- Test multiple thresholds
- Evaluate by sub-group
- Document assumptions
- Consider context
- Be transparent about limitations
Key Takeaways
Logistic Regression Skills
- When to use: Binary outcomes (yes/no, 0/1)
- Model fitting: glm() with family = "binomial"
- Interpretation: Odds ratios (exponentiate coefficients)
- Predictions: Probabilities between 0 and 1
- Threshold selection: Critical decision with real consequences
Evaluation Metrics
- Sensitivity: Of all positives, how many did we catch?
- Specificity: Of all negatives, how many did we correctly identify?
- Precision: Of our positive predictions, how many were correct?
- ROC Curve: Visualizes all threshold trade-offs
- AUC: Overall discrimination ability (but has limitations)
Critical Considerations
- Threshold choice matters: No single “right” threshold
- Costs are asymmetric: FP and FN have different consequences
- Equity matters: Models can perform differently across groups
- Overall accuracy can mask bias: Always check group-wise performance
- Transparency is essential: Document choices and assumptions
Common Pitfalls
- Using linear regression for binary outcomes
- Defaulting to 0.5 threshold
- Only reporting accuracy
- Ignoring class imbalance
- Not checking for disparate impact
- Not considering real-world costs