Key Concepts Learned
- An algorithm is a set of rules or instructions for solving a problem or completing a task.
- In government, algorithms are typically systems used to assist or replace human decision-makers:
  - Based on predictions from models trained on historical data.
  - Using inputs (features or predictors) and outputs (outcomes or dependent variables).
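A minimal sketch of the idea above, assuming made-up historical records: a model is trained on past cases and then scores new ones. The column names (prior_visits, age, needed_help), the 0.5 cutoff, and every value are hypothetical and only for illustration.

```r
# Hypothetical historical records (invented values, not real data).
historical <- data.frame(
  prior_visits = c(0, 0, 1, 1, 2, 2, 3, 3, 4, 5),            # input (feature)
  age          = c(25, 60, 40, 40, 52, 30, 47, 47, 60, 38),  # input (feature)
  needed_help  = c(0, 0, 0, 1, 0, 1, 1, 0, 1, 1)             # output (outcome)
)

# Train a logistic regression on the past cases.
model <- glm(needed_help ~ prior_visits + age,
             data = historical, family = binomial)

# Score new cases: the prediction is what assists (or replaces) the human decision.
new_cases <- data.frame(prior_visits = c(0, 5), age = c(30, 65))
new_cases$predicted_risk <- predict(model, newdata = new_cases, type = "response")
new_cases$flagged <- new_cases$predicted_risk > 0.5  # assumed cutoff

print(new_cases)
```

Whatever patterns (and biases) exist in the historical outcomes are what the model learns to reproduce.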
- Long history of government data collection:
  - Census data
  - Civil registration system
  - Administrative records
- Budgetary constraints push governments toward algorithms, which promise:
  - Efficiency: faster case processing
  - Consistency: same rules applied to everyone
  - Objectivity: meant to remove human bias, though data and design choices can reintroduce it
  - Cost savings
- Issues with algorithms:
  - Data cleaning decisions (which might not be good)
  - Data coding or classification (misclassification, for example of race)
  - Data collection (reliance on proxies for the thing you actually want to measure)
  - How results are interpreted
  - Which variables you put in the model
  - Example: a healthcare algorithm was biased against Black patients because it relied on a proxy (such as healthcare costs) for healthcare need.
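The proxy problem above can be sketched with a tiny synthetic example. Everything here is invented for illustration: two groups with identical true need, where group B incurs lower costs at the same need, and a hypothetical rule that enrolls the four highest-ranked patients.

```r
# Synthetic illustration only: these numbers are invented, not real patient data.
patients <- data.frame(
  id    = 1:8,
  group = rep(c("A", "B"), each = 4),
  need  = c(2, 4, 6, 8, 2, 4, 6, 8),          # true need is identical across groups
  cost  = c(20, 40, 60, 80, 12, 24, 36, 48)   # group B spends less at the same need
)

# Hypothetical "algorithm": pick the n patients with the highest value of a variable.
top_n_ids <- function(df, variable, n = 4) {
  df$id[order(df[[variable]], decreasing = TRUE)][seq_len(n)]
}

by_cost <- patients[patients$id %in% top_n_ids(patients, "cost"), ]
by_need <- patients[patients$id %in% top_n_ids(patients, "need"), ]

table(by_cost$group)  # ranking on the proxy: 3 from group A, 1 from group B
table(by_need$group)  # ranking on true need: 2 from each group
```

The rule itself never sees the group variable, yet the choice of proxy alone shifts who gets selected.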
- Census data is used for:
  - Understanding community demographics
  - Allocating government resources
  - Tracking neighborhood change
  - Designing fair algorithms
- The census requirement was put into the Constitution by Madison.
- The census is decennial (every 10 years) and contains 9 basic demographic questions.
- The American Community Survey (ACS) samples about 3% of households annually and asks more detailed questions on income, education, employment, and housing costs.
- ACS data is aggregated into 5-year estimates (to have more reliable data):
  - County level: state and regional planning
  - Census tract level
  - Block group level: local analysis, but with large margins of error (MOEs)
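A rough way to see why large MOEs matter at small geographies, assuming hypothetical estimates and MOEs (these are not real ACS values). ACS MOEs are published at the 90% confidence level, so the standard error is MOE / 1.645, and the coefficient of variation (CV) expresses that error relative to the estimate.

```r
# Hypothetical estimates and MOEs for illustration (not real ACS values).
acs <- data.frame(
  geography = c("County", "Census tract", "Block group"),
  estimate  = c(60000, 55000, 50000),
  moe       = c(1500, 8000, 20000)
)

# ACS MOEs use a 90% confidence level, so SE = MOE / 1.645.
acs$se     <- acs$moe / 1.645
acs$cv_pct <- 100 * acs$se / acs$estimate  # relative error, in percent

# 90% confidence interval around each estimate.
acs$lower <- acs$estimate - acs$moe
acs$upper <- acs$estimate + acs$moe

print(acs)
```

With these made-up numbers the block group interval runs from 30,000 to 70,000, which is too wide to support fine distinctions between neighboring areas.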
- To protect privacy, the Census Bureau adds mathematical noise to individual data while preserving overall patterns:
  - The added noise, like large MOEs, might skew results slightly or introduce bias
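A toy sketch of the noise idea above. This is not the Census Bureau's actual procedure; it simply adds Laplace-distributed noise (built from two exponential draws) to hypothetical block counts. The counts and the noise scale are assumptions chosen for illustration.

```r
set.seed(42)  # make the random noise reproducible

# Hypothetical block-level counts (invented values).
true_counts <- c(block_1 = 12, block_2 = 450, block_3 = 3, block_4 = 1200)

# Laplace noise with the assumed scale: difference of two exponential draws.
noise_scale <- 2
n <- length(true_counts)
noise <- rexp(n, rate = 1 / noise_scale) - rexp(n, rate = 1 / noise_scale)

# Published counts are rounded and cannot be negative.
noisy_counts <- pmax(round(true_counts + noise), 0)

# Same absolute noise, very different relative impact by block size.
rel_error_pct <- 100 * abs(noisy_counts - true_counts) / true_counts

print(rbind(true_counts, noisy_counts, rel_error_pct))
print(c(true_total = sum(true_counts), noisy_total = sum(noisy_counts)))
```

Noise of a given size is a larger share of a small count than of a large one, so small areas and small groups are where skew is most likely to show up.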
- TIGER stands for Topologically Integrated Geographic Encoding and Referencing, the Census Bureau's system for geographic boundary data.
Coding Techniques
- GEOID: geographic identifier
- NAME: human-readable location
- Typical output is long, but you can force wide output with output = "wide"
- str_remove(), str_extract(), str_replace() for cleaning text columns
- kable() for professional formatting
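A small sketch tying these techniques together, assuming a toy data frame shaped like long tidycensus output. The estimate values and the variable names med_income and total_pop are invented, and pivot_wider() stands in for requesting output = "wide" so the example runs without a Census API key.

```r
library(stringr)
library(tidyr)
library(knitr)

# Toy data shaped like long output; estimate values are invented for illustration.
acs_long <- data.frame(
  GEOID    = c("42101", "42101", "42003", "42003"),
  NAME     = rep(c("Philadelphia County, Pennsylvania",
                   "Allegheny County, Pennsylvania"), each = 2),
  variable = rep(c("med_income", "total_pop"), times = 2),
  estimate = c(50000, 1500000, 60000, 1200000)
)

# Long -> wide: one row per geography, one column per variable.
acs_wide <- pivot_wider(acs_long, names_from = variable, values_from = estimate)

# Clean the human-readable NAME column.
acs_wide$county     <- str_remove(acs_wide$NAME, " County, Pennsylvania")
acs_wide$short_name <- str_replace(acs_wide$NAME, ", Pennsylvania", ", PA")

# The first two digits of a county GEOID are the state FIPS code.
acs_wide$state_fips <- str_extract(acs_wide$GEOID, "^\\d{2}")

# Format a table for a report.
tbl <- kable(
  acs_wide[, c("GEOID", "county", "med_income")],
  col.names   = c("GEOID", "County", "Median income"),
  format.args = list(big.mark = ",")
)
print(tbl)
```

Keeping GEOID as a character column preserves leading zeros and makes it a reliable key for joining to boundary files.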
Questions & Challenges
- Make my interpretations understandable for a policy audience.
Connections to Policy
- Algorithmic decision making → understanding why your analysis matters for real policy decisions
- Data subjectivity → why we emphasize transparent, reproducible methods in this class
- Census data → the foundation for most urban planning and policy analysis
- R skills → the tools to do this work professionally and ethically
Reflection
- Assumption: the data is clean, and samples are sufficiently random and large to be representative.
- Undocumented workers are likely to be excluded, as are people who are nomadic or lack a permanent address.