Georgia DOC Policy Advisory Challenge

In-Class Group Activity - Week 10


Your Role

You are policy analysts hired by the Georgia Department of Corrections. They are considering deploying a recidivism prediction model to inform parole decisions. Your team must analyze the model and make a GO/NO-GO recommendation to the Commissioner.


Instructions

Phase 1: Individual Exploration

Run the provided R script (week10_exercise.R) and note:

  1. What’s the model’s AUC?
  • 0.732
  2. At threshold 0.50, what’s the sensitivity and specificity?
  • Sensitivity: 0.8167219
  • Specificity: 0.4923896
  3. Which racial group has the highest false positive rate?
  • Black: the FPR is 0.562, compared with 0.425 for the White group.
  4. Which group has the highest false negative rate?
  • White: the FNR is 0.227, compared with 0.154 for the Black group.
  5. What happens if we change the threshold to 0.30 or 0.70?
  • Raising the threshold to 0.70 sharply lowers the FPR in both groups and raises the FNR; lowering it to 0.30 moves both rates in the opposite direction.
  • At each threshold, the FPR remains higher for the Black group and the FNR remains higher for the White group.
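
The threshold comparison above can be sketched in a few lines. This is an illustrative Python sketch, not the course's week10_exercise.R: the function name `error_rates_by_group` and the toy scores, labels, and groups "A" and "B" are all hypothetical, chosen only to show how FPR and FNR move as the threshold changes.

```python
from collections import defaultdict


def error_rates_by_group(scores, labels, groups, threshold):
    """Return {group: (fpr, fnr)}, flagging anyone with score >= threshold."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for score, label, group in zip(scores, labels, groups):
        flagged = score >= threshold
        if label == 1:  # person actually reoffended
            key = "tp" if flagged else "fn"
        else:           # person did not reoffend
            key = "fp" if flagged else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates


# Hypothetical toy data, NOT the Georgia DOC data set.
scores = [0.9, 0.6, 0.4, 0.2, 0.8, 0.55, 0.35, 0.1]
labels = [1, 0, 1, 0, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

for t in (0.30, 0.50, 0.70):
    print(t, error_rates_by_group(scores, labels, groups, t))
```

In this toy example, as in the exercise, a higher threshold flags fewer people, so the FPR can only stay the same or fall while the FNR can only stay the same or rise.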

Phase 2: Group Analysis

With your table (or half-table) team, discuss your findings and complete the template below. Prepare to present your recommendation.

Phase 3: Presentation

Present your recommendation to the “Commissioner” (instructor).


Policy Recommendation Template

Complete this as a group and be ready to present


Consulting Team Information

Clever Team Name: _____________

Team Members:

  • Christine
  • Demi
  • Jinyang




1. TECHNICAL ASSESSMENT

Model Performance Metrics

AUC (Area Under ROC Curve): 0.732

At threshold = 0.50:

  • Sensitivity (True Positive Rate): 0.817
  • Specificity (True Negative Rate): 0.492
  • Precision (Positive Predictive Value): 0.706
  • Overall Accuracy: 0.686
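
All four metrics above come from a single confusion matrix at the chosen threshold. The Python sketch below shows the arithmetic; the function name `classification_metrics` and the counts passed to it are hypothetical and are not taken from the exercise data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the worksheet's metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "precision": tp / (tp + fp),            # positive predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }


# Hypothetical counts for illustration only.
m = classification_metrics(tp=80, fp=50, tn=50, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because the false positive rate is 1 minus specificity, the specificity of 0.492 reported above corresponds to an overall false positive rate of 0.508.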

Technical Quality Rating

Select one:

  • ☑ Acceptable (AUC 0.70-0.80)

Brief Technical Summary (2-3 sentences)

Is the model accurate enough for high-stakes decision-making?

The model demonstrates acceptable but limited discriminatory power (AUC = 0.732). While it has good sensitivity (81.7%), its low specificity (49.2%) means that about half (50.8%) of the people who do not reoffend are flagged as high-risk. Therefore, it is not accurate enough for high-stakes decision-making unless the costs of these false alarms are carefully weighed.




2. EQUITY ANALYSIS

False Positive Rates by Race (at threshold 0.50)

Racial Group      False Positive Rate   Sample Size
Group 1: Black    0.562                 3931
Group 2: White    0.425                 2620
Group 3:
Group 4:

False Negative Rates by Race (at threshold 0.50)

Racial Group      False Negative Rate   Sample Size
Group 1: Black    0.154                 3931
Group 2: White    0.227                 2620
Group 3:
Group 4:

Disparity Analysis

Largest disparity identified:

Group Black has a false positive rate 13.7 percentage points higher than Group White (0.562 vs. 0.425)

OR

Group White has a false negative rate 7.3 percentage points higher than Group Black (0.227 vs. 0.154)
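
The gaps above are absolute differences between rates, so they are best read as percentage points. As a quick check, the Python sketch below computes both the absolute and the relative gap from the rates in the tables; the helper name `disparity` is hypothetical.

```python
def disparity(rate_a, rate_b):
    """Return (absolute gap in percentage points, relative gap in percent)."""
    absolute_pp = (rate_a - rate_b) * 100
    relative_pct = (rate_a - rate_b) / rate_b * 100
    return round(absolute_pp, 1), round(relative_pct, 1)


# Rates at threshold 0.50, taken from the tables above.
fpr_gap = disparity(0.562, 0.425)  # Black vs. White false positive rate
fnr_gap = disparity(0.227, 0.154)  # White vs. Black false negative rate

print(fpr_gap)  # (13.7, 32.2)
print(fnr_gap)  # (7.3, 47.4)
```

In relative terms, the Black group's false positive rate is about 32% higher than the White group's, and the White group's false negative rate is about 47% higher than the Black group's. Which framing to present is a judgment call for the team.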

Equity Concerns Summary (3-4 sentences)

What are the implications of these disparities? Who is harmed?

The model exhibits a substantial racial disparity: the false positive rate for the Black population is 13.7 percentage points higher than for the White population (0.562 vs. 0.425). This means that Black individuals are disproportionately harmed by being incorrectly flagged as “high-risk” when they are not. In a parole setting, such a disparity could lead to unjust outcomes, including denied parole or increased scrutiny, thereby perpetuating and automating existing biases. The model in its current state raises serious equity concerns and is not suitable for deployment without algorithmic fairness interventions.


3. THRESHOLD RECOMMENDATION

If we deploy this model, we recommend:

Select one:

  • ☑ Threshold = 0.70 (Conservative - minimize false accusations)

Rationale for Threshold Choice (3-4 sentences)

Why this threshold? What does it optimize for? What are the trade-offs?

  • We recommend the conservative threshold of 0.70 to minimize false positives, reducing the risk of wrongly flagging individuals who would not reoffend. This choice is informed by the equity analysis, which showed that false positives fall disproportionately on the Black population. The trade-off is a deliberate acceptance of more false negatives: some people who go on to reoffend will be missed, in exchange for fewer systematic false accusations against a protected group. This threshold lowers false positive rates in both groups but, per our Phase 1 results, does not close the gap between them, so it mitigates rather than removes the model’s potential to amplify existing biases.

This threshold prioritizes:

Select one:

  • ☑ High Specificity - Avoid false accusations (accept more false negatives)

4. DEPLOYMENT RECOMMENDATION

Our recommendation to Georgia DOC:

Select one:

  • ☑ CONDITIONAL DEPLOY - Deploy only with specific safeguards in place

Key Reasons for Our Recommendation

Provide 3-5 bullet points supporting your decision:






What about the equity concerns?

How do you justify your recommendation given the disparate impact you identified?





5. SAFEGUARDS OR ALTERNATIVES

If DEPLOY - Required Safeguards

What protections must be in place before deployment?





OR

If DO NOT DEPLOY - Alternative Approaches

What should Georgia DOC do instead?






6. LIMITATIONS & UNCERTAINTIES

What we don’t know (but wish we did)

What additional information would strengthen your recommendation?




Weaknesses in our recommendation

What’s the strongest argument AGAINST your recommendation?





7. BOTTOM LINE

One-Sentence Recommendation

If the Commissioner only reads one thing, what should it be?