Week 2 Notes - Algorithmic Decision Making & Census Data

Published

September 15, 2025

Key Concepts Learned

  • An algorithm is a set of rules or instructions for solving a problem or completing a task.
  • In government, algorithms are typically systems used to assist or replace human decision-makers:
    • They are based on predictions from models trained on historical data.
    • They take inputs (features or predictors) and produce an output (the outcome, or dependent variable).
  • Long history of government data collection:
    • Census data
    • Civil registration system
    • Administrative records
  • Budgetary constraints push governments to adopt algorithms, which promise:
    • Efficiency: faster case processing
    • Consistency: same rules applied to everyone
    • Objectivity: claimed to remove human bias
    • Cost savings
  • Issues with algorithms (subjective choices enter at every step):
    • Data cleaning decisions (the choices made may be flawed)
    • Data coding or classification (misclassification, e.g. of race)
    • Data collection (reliance on proxies for what we actually want to measure)
    • How results are interpreted
    • Which variables go into the model
    • Example: a healthcare algorithm used past healthcare costs as a proxy for health needs, which led it to underestimate the needs of Black patients.
  • Census data is used for:
    • Understanding community demographics
    • Allocating government resources
    • Tracking neighborhood change
    • Designing fair algorithms
  • The census was written into the Constitution by Madison.
  • The decennial census is conducted every 10 years and asks 9 basic demographic questions.
  • The American Community Survey (ACS) samples about 3% of households each year and asks more detailed questions on income, education, employment, and housing costs.
  • ACS data is pooled into 5-year estimates for more reliable data, available at several geographic levels:
    • County level: state and regional planning
    • Census tract level
    • Block group level: local analysis, but with large margins of error (MOEs)
  • To protect privacy, the Census Bureau adds mathematical noise to individual-level data (differential privacy) while preserving overall patterns:
    • The added noise might skew results slightly or introduce bias, especially for small areas or groups
  • TIGER stands for Topologically Integrated Geographic Encoding and Referencing, the Census Bureau's system for geographic boundary data.
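
One way to act on the MOE point above is to convert each MOE into a coefficient of variation (CV) before trusting an estimate. The sketch below is illustrative only: the three rows are made-up numbers, and the 12% / 40% cutoffs are an assumed rule of thumb rather than an official standard. It relies on ACS MOEs being published at the 90% confidence level, so the standard error is MOE / 1.645.

```r
# Hypothetical estimates and MOEs for three geography levels (not real data).
acs <- data.frame(
  level    = c("county", "tract", "block group"),
  estimate = c(65000, 58000, 52000),
  moe      = c(1300, 8700, 20800)
)

# ACS MOEs use a 90% confidence level, so SE = MOE / 1.645.
acs$se <- acs$moe / 1.645

# Coefficient of variation: standard error as a percentage of the estimate.
acs$cv_pct <- round(100 * acs$se / acs$estimate, 1)

# Illustrative reliability cutoffs (an assumption, not an official rule).
acs$reliability <- cut(
  acs$cv_pct,
  breaks = c(-Inf, 12, 40, Inf),
  labels = c("high", "moderate", "low")
)

print(acs[, c("level", "estimate", "moe", "cv_pct", "reliability")])
```

With numbers like these, the CV grows as the geography gets smaller, which is why block group results need more caution than county results.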

Coding Techniques

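  • GEOID: geographic identifier (a code that uniquely identifies each geography)
  • NAME: human-readable location
  • Typical output is long (one row per geography and variable), but you can force one column per variable with output = "wide"
  • str_remove(), str_extract(), str_replace(): stringr functions for cleaning text such as location names
  • kable(): knitr function for professionally formatted tables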
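
The techniques in this list can be sketched together on a small example. The data frame below is a made-up stand-in for long-format ACS output (the column names `variable` and `estimate`, the variable labels, and all values are assumptions for illustration), so it runs without a Census API key. Base R's `reshape()` is used here only to mimic the wide layout; it is not how the wide output is requested from the Census API.

```r
library(stringr)
library(knitr)

# Hypothetical long-format rows: one row per geography and variable.
long <- data.frame(
  GEOID    = c("42003", "42003", "42101", "42101"),
  NAME     = c("Allegheny County, Pennsylvania", "Allegheny County, Pennsylvania",
               "Philadelphia County, Pennsylvania", "Philadelphia County, Pennsylvania"),
  variable = c("med_income", "total_pop", "med_income", "total_pop"),
  estimate = c(70000, 1250000, 55000, 1600000)
)

# Long -> wide: one row per geography, one column per variable.
wide <- reshape(long, idvar = c("GEOID", "NAME"),
                timevar = "variable", direction = "wide")

# str_remove(): drop the repeated suffix from the readable name.
wide$county <- str_remove(wide$NAME, " County, Pennsylvania")

# str_extract(): pull the 2-digit state code from the front of the GEOID.
wide$state_fips <- str_extract(wide$GEOID, "^\\d{2}")

# str_replace(): abbreviate "County" for a compact label.
wide$label <- str_replace(wide$NAME, " County", " Co.")

# kable(): format the cleaned table for a report.
tbl <- kable(
  wide[, c("county", "state_fips", "estimate.med_income", "estimate.total_pop")],
  col.names   = c("County", "State FIPS", "Median income", "Population"),
  format.args = list(big.mark = ","),
  row.names   = FALSE
)
print(tbl)
```

Keeping the GEOID as a character column matters: converting it to a number would drop leading zeros for states with codes below 10.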

Questions & Challenges

  • Challenge: making my interpretations understandable for a policy audience.

Connections to Policy

  • Algorithmic decision making → understanding why your analysis matters for real policy decisions
  • Data subjectivity → why we emphasize transparent, reproducible methods in this class
  • Census data → the foundation for most urban planning and policy analysis
  • R skills → the tools to do this work professionally and ethically

Reflection

  • The data is clean and samples are sufficiently random and large to be representative.
  • Undocumented workers, as well as people who are nomadic or lack a permanent address, are likely to be excluded.