Georgia DOC Policy Advisory Challenge
In-Class Group Activity - Week 10
Your Role
You are policy analysts hired by the Georgia Department of Corrections. They are considering deploying a recidivism prediction model to inform parole decisions. Your team must analyze the model and make a GO/NO-GO recommendation to the Commissioner.
Instructions
Phase 1: Individual Exploration
Run the provided R script (week10_exercise.R) and note:
- What’s the model’s AUC?
- 0.732
- At threshold 0.50, what’s the sensitivity and specificity?
- Sensitivity: 0.8167219
- Specificity: 0.4923896
- Which racial group has the highest false positive rate?
- Black: the FPR is 0.562, compared with 0.425 for the White group.
- Which group has the highest false negative rate?
- White: the FNR is 0.227, compared with 0.154 for the Black group.
- What happens if we change the threshold to 0.30 or 0.70?
- Raising the threshold from 0.30 to 0.70 lowers the FPR sharply in both groups, but the rate for the Black group stays higher than the rate for the White group.
- The FNR moves in the opposite direction: it rises as the threshold increases. The FNR for the White group remains higher than the FNR for the Black group.
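The group-level questions above come down to building a separate confusion matrix for each group at a chosen threshold. The following is a minimal Python sketch of that calculation, not a reproduction of week10_exercise.R; the function name `group_error_rates`, the group labels "A" and "B", and the toy data are all made up for illustration.

```python
from collections import defaultdict


def group_error_rates(y_true, scores, groups, threshold=0.5):
    """Return {group: {"fpr", "fnr", "n"}} for one classification threshold.

    y_true: 1 if the person reoffended, 0 otherwise.
    scores: predicted probability of reoffending.
    groups: group label for each person.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for y, s, g in zip(y_true, scores, groups):
        flagged = s >= threshold  # predicted "high-risk"
        if y == 1:
            counts[g]["tp" if flagged else "fn"] += 1
        else:
            counts[g]["fp" if flagged else "tn"] += 1

    rates = {}
    for g, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["tp"] + c["fn"]  # people who did reoffend
        rates[g] = {
            "fpr": c["fp"] / negatives if negatives else float("nan"),
            "fnr": c["fn"] / positives if positives else float("nan"),
            "n": negatives + positives,
        }
    return rates


# Made-up toy data, NOT the Georgia DOC data.
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
scores = [0.90, 0.40, 0.60, 0.20, 0.70, 0.30, 0.80, 0.55]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_error_rates(y_true, scores, groups, threshold=0.5)
for g, r in sorted(rates.items()):
    print(g, r)
```

Calling the same function with `threshold=0.30` or `threshold=0.70` gives the comparison asked for in the last question.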
Phase 2: Group Analysis
As a table or half-table team, discuss your findings and complete the template below. Prepare to present your recommendation.
Phase 3: Presentation
Present your recommendation to the “Commissioner” (instructor).
Policy Recommendation Template
Complete this as a group and be ready to present
Consulting Team Information
Clever Team Name: _____________
Team Members:
- Christine
- Demi
- Jinyang
1. TECHNICAL ASSESSMENT
Model Performance Metrics
AUC (Area Under ROC Curve): 0.732
At threshold = 0.50:
- Sensitivity (True Positive Rate): 0.817
- Specificity (True Negative Rate): 0.492
- Precision (Positive Predictive Value): 0.706
- Overall Accuracy: 0.686
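All four threshold-dependent metrics above are ratios of the same four confusion-matrix cells. A minimal Python sketch of those definitions follows; the function name `metrics_from_counts` and the counts are hypothetical and are not the counts behind the figures reported here.

```python
def metrics_from_counts(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # share of reoffenders flagged
        "specificity": tn / (tn + fp),          # share of non-reoffenders cleared
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
        "precision": tp / (tp + fp),            # share of flags that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
m = metrics_from_counts(tp=80, fp=30, tn=70, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because the false positive rate is 1 minus specificity, the reported specificity of 0.492 corresponds to an overall false positive rate of 0.508.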
Technical Quality Rating
Select one:
- ☑ Acceptable (AUC 0.70-0.80)
Brief Technical Summary (2-3 sentences)
Is the model accurate enough for high-stakes decision-making?
The model demonstrates acceptable but limited discriminatory power (AUC = 0.732). While it has good sensitivity (81.7%), its low specificity (49.2%) means that roughly half of the people who would not reoffend are flagged as high-risk. Therefore, it is not accurate enough for high-stakes decision-making without careful consideration of the costs associated with false alarms.
2. EQUITY ANALYSIS
False Positive Rates by Race (at threshold 0.50)
| Racial Group | False Positive Rate | Sample Size |
|---|---|---|
| Group 1: Black | 0.562 | 3931 |
| Group 2: White | 0.425 | 2620 |
| Group 3: | | |
| Group 4: | | |
False Negative Rates by Race (at threshold 0.50)
| Racial Group | False Negative Rate | Sample Size |
|---|---|---|
| Group 1: Black | 0.154 | 3931 |
| Group 2: White | 0.227 | 2620 |
| Group 3: | | |
| Group 4: | | |
Disparity Analysis
Largest disparity identified:
Group Black has a false positive rate 13.7 percentage points higher than Group White (0.562 vs. 0.425)
OR
Group White has a false negative rate 7.3 percentage points higher than Group Black (0.227 vs. 0.154)
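A gap between two rates can be stated as an absolute difference (percentage points) or as a relative ratio, and the two can rank disparities differently. The sketch below, in Python, computes both from the rates in the tables above; the helper name `disparity` is made up for illustration.

```python
def disparity(rate_a, rate_b):
    """Compare two error rates as an absolute gap and as a ratio."""
    return {
        "gap_pct_points": (rate_a - rate_b) * 100,  # absolute difference
        "ratio": rate_a / rate_b,                   # relative difference
    }


# Rates taken from the FPR and FNR tables at threshold 0.50.
fpr_gap = disparity(0.562, 0.425)  # Black vs. White false positive rate
fnr_gap = disparity(0.227, 0.154)  # White vs. Black false negative rate

print(f"FPR gap: {fpr_gap['gap_pct_points']:.1f} points, ratio {fpr_gap['ratio']:.2f}")
print(f"FNR gap: {fnr_gap['gap_pct_points']:.1f} points, ratio {fnr_gap['ratio']:.2f}")
```

In absolute terms the FPR gap is the larger one (13.7 vs. 7.3 points). In relative terms the FNR gap is larger (about 47% higher vs. about 32% higher), so it helps to state which measure a "largest disparity" claim rests on.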
Equity Concerns Summary (3-4 sentences)
What are the implications of these disparities? Who is harmed?
The model exhibits a substantial racial disparity: the False Positive Rate for the Black population is 13.7 percentage points higher than for the White population (0.562 vs. 0.425). This means that Black individuals are disproportionately harmed by being incorrectly flagged as “high-risk” when they are not. Such a disparity could lead to unjust outcomes, including denied opportunities or increased scrutiny, thereby perpetuating and automating existing biases. The model in its current state raises serious equity concerns and is not suitable for deployment without algorithmic fairness interventions.
3. THRESHOLD RECOMMENDATION
If we deploy this model, we recommend:
Select one:
- ☑ Threshold = 0.70 (Conservative - minimize false accusations)
Rationale for Threshold Choice (3-4 sentences)
Why this threshold? What does it optimize for? What are the trade-offs?
- We recommend the conservative threshold of 0.70 to prioritize minimizing false positives, thereby reducing the risk of unjustly flagging individuals who would not reoffend. This choice is informed by the equity analysis, which showed that false positives disproportionately harm the Black population. The trade-off is a deliberate acceptance of more false negatives, meaning some actual recidivists may be missed, in order to prevent the more severe societal harm of systematic false accusations against a protected group. This approach limits wrongful high-risk labels overall, although the FPR for the Black group stays higher than for the White group at this threshold, so the threshold alone does not remove the disparity.
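The trade-off described in this rationale follows directly from how a threshold works: raising it can only remove flags, so false positives cannot increase and false negatives cannot decrease. A minimal Python sketch on made-up scores illustrates this; `overall_rates` is a hypothetical helper, and the data are not from the Georgia DOC model.

```python
def overall_rates(y_true, scores, threshold):
    """Return (false positive rate, false negative rate) at one threshold."""
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    negatives = sum(1 for y in y_true if y == 0)
    positives = len(y_true) - negatives
    return fp / negatives, fn / positives


# Made-up toy data: four non-reoffenders followed by four reoffenders.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.35, 0.55, 0.75, 0.25, 0.45, 0.65, 0.90]

sweep = {t: overall_rates(y_true, scores, t) for t in (0.30, 0.50, 0.70)}
for t, (fpr, fnr) in sweep.items():
    print(f"threshold {t:.2f}: FPR {fpr:.2f}, FNR {fnr:.2f}")
```

On this toy data the FPR falls from 0.75 to 0.25 while the FNR rises from 0.25 to 0.75 as the threshold moves from 0.30 to 0.70. The size of the shift on the real data depends on the model's score distribution.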
This threshold prioritizes:
Select one:
- ☑ High Specificity - Avoid false accusations (accept more false negatives)
4. DEPLOYMENT RECOMMENDATION
Our recommendation to Georgia DOC:
Select one:
- ☑ CONDITIONAL DEPLOY - Deploy only with specific safeguards in place
Key Reasons for Our Recommendation
Provide 3-5 bullet points supporting your decision:
What about the equity concerns?
How do you justify your recommendation given the disparate impact you identified?
5. SAFEGUARDS OR ALTERNATIVES
If DEPLOY - Required Safeguards
What protections must be in place before deployment?
OR
If DO NOT DEPLOY - Alternative Approaches
What should Georgia DOC do instead?
6. LIMITATIONS & UNCERTAINTIES
What we don’t know (but wish we did)
What additional information would strengthen your recommendation?
Weaknesses in our recommendation
What’s the strongest argument AGAINST your recommendation?
7. BOTTOM LINE
One-Sentence Recommendation
If the Commissioner only reads one thing, what should it be?