Georgia DOC Policy Advisory Challenge

In-Class Group Activity - Week 10

Your Role

You are policy analysts hired by the Georgia Department of Corrections. They are considering deploying a recidivism prediction model to inform parole decisions. Your team must analyze the model and make a GO/NO-GO recommendation to the Commissioner.

Instructions

Phase 1: Individual Exploration

Run the provided R script (week10_exercise.R) and note:

What’s the model’s AUC?

0.732
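
As a reading aid, AUC can be interpreted as the probability that a randomly chosen person who reoffended gets a higher risk score than a randomly chosen person who did not. The sketch below illustrates that interpretation in Python rather than in the course's R script. The toy scores, the outcomes, and the function name `auc_by_pairs` are illustrative assumptions, not output from week10_exercise.R.

```python
def auc_by_pairs(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: predicted risk scores and outcomes (1 = reoffended).
toy_scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc_by_pairs(toy_scores, toy_labels)
print(round(toy_auc, 3))
```

On this toy data, 8 of the 9 positive-negative pairs are ranked correctly. An AUC of 0.732 would mean the model ranks such a pair correctly a little under three times out of four.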

At threshold 0.50, what are the sensitivity and specificity?

Sensitivity: 0.8167219
Specificity: 0.4923896

Which racial group has the highest false positive rate?

Black: the FPR is 0.562, compared with 0.425 for the White group.

Which group has the highest false negative rate?

White: the FNR is 0.227, compared with 0.154 for the Black group.

What happens if we change the threshold to 0.30 or 0.70?

As the threshold rises from 0.30 to 0.70, the FPR in both groups falls dramatically, but it remains higher for the Black group than for the White group.
The FNR shows the opposite pattern: it increases as the threshold rises. It remains higher for the White group than for the Black group.
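
The pattern described above follows from how a threshold works: raising it flags fewer people, so false positives can only fall and false negatives can only rise. The Python sketch below shows the mechanics on made-up data. The groups "A" and "B", their scores, and the helper `error_rates` are hypothetical, and treating a score at or above the threshold as "flagged" is an assumption about how the course script classifies.

```python
def error_rates(scores, labels, threshold):
    """Return (FPR, FNR) when scores >= threshold are flagged as high-risk."""
    fp = fn = tp = tn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr


# Hypothetical toy data, NOT the Georgia DOC data: (scores, outcomes) per group.
toy_groups = {
    "A": ([0.2, 0.35, 0.55, 0.65, 0.75, 0.9], [0, 0, 0, 1, 1, 1]),
    "B": ([0.1, 0.25, 0.45, 0.5, 0.6, 0.8], [0, 0, 1, 0, 1, 1]),
}

sweep = {}
for name, (scores, labels) in toy_groups.items():
    for t in (0.30, 0.50, 0.70):
        sweep[(name, t)] = error_rates(scores, labels, t)
        fpr, fnr = sweep[(name, t)]
        print(f"group {name}  threshold {t:.2f}  FPR {fpr:.2f}  FNR {fnr:.2f}")
```

Because the direction of change is the same for every group, moving the threshold shifts both groups' error rates together. Whether the gap between groups narrows depends on the score distributions, which is why the disparity has to be rechecked at each candidate threshold.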

Phase 2: Group Analysis

As a table or half table team, discuss your findings and complete the template below. Prepare to present your recommendation.

Phase 3: Presentation

Present your recommendation to the “Commissioner” (instructor).

Policy Recommendation Template

Complete this as a group and be ready to present

Consulting Team Information

Clever Team Name: _____________

Team Members:

Christine
Demi
Jinyang

1. TECHNICAL ASSESSMENT

Model Performance Metrics

AUC (Area Under ROC Curve): 0.732

At threshold = 0.50:

Sensitivity (True Positive Rate): 0.817
Specificity (True Negative Rate): 0.492
Precision (Positive Predictive Value): 0.706
Overall Accuracy: 0.686
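
The four metrics above are linked through the share of people in the data who actually reoffended (the base rate), so they can be cross-checked against each other. The Python sketch below is one such check, not part of the course script: it backs out the base rate from accuracy, sensitivity, and specificity, then recomputes precision from it. The variable names are illustrative, and the inputs are the rounded values reported here, so small discrepancies are expected.

```python
# Reported metrics at threshold 0.50 (rounded values from this worksheet).
sensitivity = 0.817
specificity = 0.492
accuracy = 0.686
reported_precision = 0.706

# accuracy = sensitivity * p + specificity * (1 - p), solved for the base rate p.
base_rate = (accuracy - specificity) / (sensitivity - specificity)

# Precision implied by Bayes' rule from sensitivity, specificity, and p.
true_pos = sensitivity * base_rate
false_pos = (1 - specificity) * (1 - base_rate)
implied_precision = true_pos / (true_pos + false_pos)

print(f"implied base rate: {base_rate:.3f}")
print(f"implied precision: {implied_precision:.3f} (reported {reported_precision})")
```

The implied base rate is roughly 0.60 and the implied precision roughly 0.70, close to the reported 0.706, so the reported metrics are mutually consistent up to rounding.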

Technical Quality Rating

Select one:

☐ Excellent (AUC > 0.90)
☐ Good (AUC 0.80-0.90)
☑ Acceptable (AUC 0.70-0.80)
☐ Poor (AUC < 0.70)

Brief Technical Summary (2-3 sentences)

Is the model accurate enough for high-stakes decision-making?

The model demonstrates acceptable but limited discriminatory power (AUC = 0.732). While it has good sensitivity (81.7%), its low specificity (49.2%) means that roughly half of the people who would not reoffend are still flagged as high-risk. Therefore, it is not accurate enough for high-stakes decision-making without careful consideration of the costs associated with these false positives.

2. EQUITY ANALYSIS

False Positive Rates by Race (at threshold 0.50)

Racial Group	False Positive Rate	Sample Size
Group 1: Black	0.562	3931
Group 2: White	0.425	2620
Group 3:
Group 4:

False Negative Rates by Race (at threshold 0.50)

Racial Group	False Negative Rate	Sample Size
Group 1: Black	0.154	3931
Group 2: White	0.227	2620
Group 3:
Group 4:

Disparity Analysis

Largest disparity identified:

Group Black has a 13.7 percentage point higher false positive rate than Group White (0.562 vs. 0.425)

Group White has a 7.3 percentage point higher false negative rate than Group Black (0.227 vs. 0.154)
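
The gaps reported above are differences between two rates, so they are measured in percentage points. The same disparity can also be expressed as a ratio, which reads quite differently. The Python sketch below computes both from the rates in the tables above; the helper name `disparity` is an illustrative choice, not something from the course script.

```python
# False positive and false negative rates at threshold 0.50, from the tables above.
fpr = {"Black": 0.562, "White": 0.425}
fnr = {"Black": 0.154, "White": 0.227}


def disparity(rates, higher, lower):
    """Return (gap in percentage points, ratio of the higher to the lower rate)."""
    gap_points = (rates[higher] - rates[lower]) * 100
    ratio = rates[higher] / rates[lower]
    return gap_points, ratio


fpr_gap, fpr_ratio = disparity(fpr, "Black", "White")
fnr_gap, fnr_ratio = disparity(fnr, "White", "Black")

print(f"FPR: {fpr_gap:.1f} points higher for Black, ratio {fpr_ratio:.2f}")
print(f"FNR: {fnr_gap:.1f} points higher for White, ratio {fnr_ratio:.2f}")
```

In relative terms, the false positive rate for the Black group is about 1.32 times that of the White group, and the false negative rate for the White group is about 1.47 times that of the Black group. Stating which measure is being used avoids ambiguity when presenting the disparity.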

Equity Concerns Summary (3-4 sentences)

What are the implications of these disparities? Who is harmed?

The model exhibits a substantial racial disparity: the false positive rate for the Black group is 13.7 percentage points higher than for the White group (56.2% vs. 42.5%). This means that Black individuals are disproportionately harmed by being incorrectly flagged as “high-risk” when they would not reoffend. In a parole setting, such a disparity could lead to unjust outcomes, including denied opportunities or increased scrutiny, thereby perpetuating and automating existing biases. The model in its current state raises serious equity concerns and is not suitable for deployment without algorithmic fairness interventions.

3. THRESHOLD RECOMMENDATION

If we deploy this model, we recommend: a threshold of 0.70

Rationale for Threshold Choice (3-4 sentences)

Why this threshold? What does it optimize for? What are the trade-offs?

We recommend the conservative threshold of 0.70 to prioritize minimizing false positives, thereby reducing the risk of wrongly flagging individuals who would not reoffend. This choice is informed by the equity analysis, which showed that false positives disproportionately harm the Black population. The trade-off is a deliberate acceptance of more false negatives, meaning some people who go on to reoffend will be missed, in order to limit the more severe societal harm of systematic false accusations against a protected group. This approach prioritizes fairness and reduces the model’s potential to amplify existing biases, although our Phase 1 results show that the false positive rate for the Black group remains higher than for the White group even at this threshold.

This threshold prioritizes:

Select one:

☐ High Sensitivity - Catch more people who will reoffend (accept more false positives)
☑ High Specificity - Avoid false accusations (accept more false negatives)
☐ Balance - Try to minimize both types of errors

4. DEPLOYMENT RECOMMENDATION

Our recommendation to Georgia DOC:

Select one:

☐ DEPLOY - Use this model to inform parole decisions
☐ DO NOT DEPLOY - Do not use this model
☑ CONDITIONAL DEPLOY - Deploy only with specific safeguards in place

Key Reasons for Our Recommendation

Provide 3-5 bullet points supporting your decision:

What about the equity concerns?

How do you justify your recommendation given the disparate impact you identified?
