Week 2 Notes

Published

September 15, 2025

Key Concepts Learned + Notes

  • Algorithms
    • At a high level, a set of rules or instructions for solving a problem or completing a task.
  • Algorithmic Decision Making in the Government
    • Can be used to assist or replace human decision makers (potentially mitigating biases that may exist in a human-only process).
    • Can use historical data, which provides Inputs (“features”, “predictors”, “independent variables”), to determine Outputs (“labels”, “outcomes”, “dependent variables”).
    • Data has trade-offs! Data can be inaccurate, subject to biases of its own, etc.
  • Data collection is a long-running practice in gov’t (civic registration, census, admin records, operations research).
    • Now, there is an increase in both official data and accidental data (things that can be collected from sources like social media).
    • There is a shift from historical analysis to prediction.
  • IMPORTANT: Data Analytics is Subjective!
    • Data cleaning, coding/classification, collection, interpretation, and the choice of model variables all require human choices that embody human values and biases.
    • This applies especially to proxy variables, which can rely on historically biased data.
  • Census and ACS Data:
    • Foundational for understanding community demographics, allocating gov’t resources, etc.

    • The decennial Census is conducted every 10 years and sent to everyone: 9 basic questions, constitutional requirement, determines political representation.

    • The American Community Survey (ACS) is sent to ~3% of households annually, with more detailed questions (income, education, employment, housing costs).

      • ACS has 1-year estimates (areas with 65,000+ people) and 5-year estimates (all areas, down to census tracts and block groups). 5-year estimates are the most reliable because they are based on the largest sample.
    • Hierarchy of Census Data:

      Nation
      ├── Regions  
      ├── States
      │   ├── Counties
      │   │   ├── Census Tracts (1,500-8,000 people)
      │   │   │   ├── Block Groups (600-3,000 people)  
      │   │   │   │   └── Blocks (≈85 people, Decennial only)
    • Most policy analysis happens at the County, Census Tract, and Block Group levels (although Block Groups have large margins of error (MOEs)).

  • Census Data in R (see below)
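
The geographic hierarchy above is also encoded in Census GEOIDs, where each smaller geography's ID starts with the ID of the geography that contains it. Below is a minimal sketch in Python (not the course's R workflow), assuming the standard digit widths of 2 for state, 3 for county, 6 for tract, and 4 for block, with the block group being the first digit of the block code. The function names `parse_geoid` and `parent_geoids` and the sample GEOID are illustrative assumptions, not something from the lecture.

```python
# Total GEOID length -> geographic level (assumes standard Census GEOID widths).
LEVEL_BY_LENGTH = {2: "state", 5: "county", 11: "tract", 12: "block group", 15: "block"}


def parse_geoid(geoid: str) -> dict:
    """Split a GEOID string into the nested geographies it encodes."""
    if not geoid.isdigit() or len(geoid) not in LEVEL_BY_LENGTH:
        raise ValueError(f"Unrecognized GEOID: {geoid!r}")
    parts = {"level": LEVEL_BY_LENGTH[len(geoid)], "state": geoid[:2]}
    if len(geoid) >= 5:
        parts["county"] = geoid[2:5]
    if len(geoid) >= 11:
        parts["tract"] = geoid[5:11]
    if len(geoid) >= 12:
        parts["block_group"] = geoid[11]  # first digit of the 4-digit block code
    if len(geoid) == 15:
        parts["block"] = geoid[11:15]
    return parts


def parent_geoids(geoid: str) -> list:
    """Return the GEOIDs of every larger geography containing this one."""
    return [geoid[:n] for n in sorted(LEVEL_BY_LENGTH) if n < len(geoid)]


# Illustrative tract-level GEOID (state 42, county 101, tract 000100).
print(parse_geoid("42101000100"))
print(parent_geoids("42101000100"))  # ['42', '42101']
```

GEOIDs are kept as strings here on purpose: reading them as numbers drops leading zeros, which silently breaks joins between data tables and boundary files.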

Coding Techniques/Technical Notes

  • tidycensus
    • Rather than downloading CSV files from the Census website, we will use the tidycensus R package to access this data programmatically! This lets us get the latest data, with geographic boundaries attached automatically.
    • Tables contain estimates (recall, ACS is based on a sample) and MOEs.
    • Always report MOEs.
  • TIGER/Line Files
    • Shapefiles (geographic boundaries) for census tracts, counties, states.
  • Historical Data Sources
    • NHGIS (National Historical Geographic Information System), Longitudinal Tract Data Base (LTDB)
    • We need these because census boundaries change over time!
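
To make the "always report MOEs" rule under tidycensus concrete, here is a small Python sketch of the usual MOE arithmetic. It assumes the published ACS convention that MOEs are given at the 90% confidence level (z = 1.645) and uses the Census Bureau's root-sum-of-squares approximation for the MOE of a sum. The function names and the two tract values are hypothetical, chosen only for illustration.

```python
import math

Z_90 = 1.645  # assumption: ACS MOEs are published at the 90% confidence level


def standard_error(moe: float) -> float:
    """Convert a 90% margin of error to a standard error."""
    return moe / Z_90


def coefficient_of_variation(estimate: float, moe: float) -> float:
    """Standard error as a percentage of the estimate (higher = less reliable)."""
    return standard_error(moe) / estimate * 100


def moe_sum(*moes: float) -> float:
    """Approximate MOE of a sum of estimates: square root of the summed squared MOEs."""
    return math.sqrt(sum(m ** 2 for m in moes))


def format_estimate(estimate: float, moe: float) -> str:
    """Report an estimate together with its MOE."""
    return f"{estimate:,.0f} ± {moe:,.0f}"


# Hypothetical values for two tracts: 1,200 ± 150 and 800 ± 200.
total = 1200 + 800
total_moe = moe_sum(150, 200)  # 250.0
print(format_estimate(total, total_moe))
print(round(coefficient_of_variation(total, total_moe), 1), "% CV")
```

Note that the combined MOE (250) is smaller than simply adding the two MOEs (350), which is one reason aggregating small geographies can give more reliable estimates.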

Questions & Challenges

  • N/A

Connections to Policy

  • Noted above.

Reflection

  • N/A