Key Concepts Learned
- Part 1: The Seductive Promise of Predictive Policing
- Part 2: The Dirty Data Problem
- Definition
- Traditional definition (data mining): Missing data, Incorrect data, Non-standardized formats
- Extended definition: “Data derived from or influenced by corrupt, biased, and unlawful practices, including data that has been intentionally manipulated or ‘juked,’ as well as data that is distorted by individual and societal biases.”
- Forms of Dirty Data
- Fabricated/Manipulated Data
- Systematically Biased Data
- Missing/Incomplete Data
- Proxy Problems
- Part 3: Technical Fixes Can’t Solve Social Problems
- Part 4: Consequences and Harms
- Part 5: Can Reform Work?
- Consent Decrees
- Training on constitutional policing
- Early intervention systems for problem officers
- Revised use-of-force policies
- Community oversight
- Data collection improvements
- Part 6: A Framework for Critical Evaluation
- Questions to Ask About Any Predictive Policing System
- Data Provenance
- Variable Selection
- Validation
- Deployment
- Transparency & Accountability
- Alternatives
- Technical Foundations
- Modeling workflow (highly important)
- The Core Logic: “Broken Windows Theory”
- Local Spatial Autocorrelation
- Count Regression Fundamentals
- Problems when modeling counts: counts cannot be negative, counts are discrete (not continuous), count data are skewed (errors are not normal), and the variance often differs from the mean. Overdispersion (variance > mean) is common.
- The Poisson Distribution: appropriate for count data
- Key property: Mean = Variance = λ
- Poisson Regression Model
- Log link: ensures λi > 0 (expected counts can’t be negative)
- Linear relationship on log scale: log(λi) = β0 + β1X1 + β2X2 + …
- Interpreting Poisson Coefficients
- On log scale: β1 = change in log(expected count) per unit increase in X1
- On count scale (exponentiate): exp(β1) = multiplicative effect on expected count per unit increase in X1; exp(β1) − 1 is the proportional change (multiply by 100 for a percent change)
- Check for overdispersion: Dispersion = Residual Deviance / Residual Degrees of Freedom. If ≈ 1, Poisson is fine; if > 1, the data are overdispersed; if > 2–3, overdispersion is serious, so use Negative Binomial.
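
The Poisson notes above (log link, coefficient interpretation, dispersion check) can be sketched in base R. This is a minimal illustration on simulated data, not the course's actual workflow: the names `n`, `x1`, `y`, and `fit`, and the true coefficients 0.5 and 0.3, are all assumptions made up for the example.

```r
# Simulated counts; every name and number here is hypothetical.
set.seed(42)
n      <- 500
x1     <- rnorm(n)
lambda <- exp(0.5 + 0.3 * x1)   # log link: lambda is always > 0
y      <- rpois(n, lambda)      # Poisson counts: mean = variance = lambda

# Poisson regression; the log link is the default for family = poisson
fit <- glm(y ~ x1, family = poisson(link = "log"))

# Coefficient on the log scale and on the count scale
beta1      <- unname(coef(fit)["x1"])   # change in log(expected count) per unit of x1
rate_ratio <- exp(beta1)                # multiplicative effect on expected count
pct_change <- (rate_ratio - 1) * 100    # percent change per one-unit increase in x1

# Dispersion as defined in the notes: residual deviance / residual df
dispersion <- deviance(fit) / df.residual(fit)

# A common alternative uses Pearson residuals instead of the deviance
dispersion_pearson <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

print(round(c(beta1 = beta1, rate_ratio = rate_ratio, pct_change = pct_change,
              dispersion = dispersion, dispersion_pearson = dispersion_pearson), 3))
```

Because the data are simulated from a true Poisson process, both dispersion values should land near 1. On real crime counts they are often well above 1.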
Negative Binomial Regression
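
When the dispersion check points to serious overdispersion, one common option is `glm.nb()` from the MASS package. The sketch below assumes MASS is installed and uses simulated overdispersed counts; the names `y_od`, `pois_fit`, and `nb_fit` and the simulation settings are hypothetical.

```r
library(MASS)  # provides glm.nb(); assumed to be installed

# Simulated overdispersed counts; all names and numbers are hypothetical.
set.seed(123)
n    <- 500
x1   <- rnorm(n)
mu   <- exp(1 + 0.4 * x1)
y_od <- rnbinom(n, size = 1, mu = mu)   # variance = mu + mu^2 / size, so variance > mean

# A Poisson fit on these data should fail the dispersion check
pois_fit        <- glm(y_od ~ x1, family = poisson)
pois_dispersion <- deviance(pois_fit) / df.residual(pois_fit)

# Negative binomial fit: estimates an extra parameter (theta) for the excess variance
nb_fit <- glm.nb(y_od ~ x1)

print(c(pois_dispersion = pois_dispersion, theta = nb_fit$theta))
print(AIC(pois_fit, nb_fit))   # lower AIC indicates the better-fitting model
```

Coefficients from `nb_fit` are read the same way as Poisson coefficients, since both models use a log link.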
Coding Techniques
- [New R functions or approaches]
- [Quarto features learned]
Questions & Challenges
- What I didn’t fully understand
- The basic Git workflow: pull, commit, push
- Areas needing more practice
- Remembering the essential dplyr functions
Connections to Policy
- [How this week’s content applies to real policy work]
Reflection
- [What was most interesting]
- [How I’ll apply this knowledge]