Week 2 notes
Key Concepts Learned
What is an algorithm?
A set of rules or instructions for solving a problem or completing a task
Inputs (“features”, “predictors”, “independent variables”, “x”)
Outputs (“labels”, “outcome”, “dependent variable”, “y”)
Data Science - Computer science/engineering focus on algorithms and methods
Data Analytics - Application of data science methods to other disciplines
Machine Learning - Algorithms for classification & prediction that learn from data
AI - Algorithms that adjust and improve across iterations (neural networks, etc.)
TIGER/Line Files
- Geographic boundaries (shapefiles)
- Census tracts, counties, states
- Now released as shapefiles (easier to use!)
Historical Data Sources:
- NHGIS (nhgis.org) - Historical census data
- Neighborhood Change Database
- Longitudinal Tract Database - Track changes over time
Coding Techniques
library(tidycensus)
library(tidyverse)
library(knitr)
case_when()
get_acs()
mutate()
select()
str_remove()
str_extract()
str_replace()
kable()
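A minimal sketch of how several of these functions might fit together, assuming the tidyverse and knitr packages are installed. To avoid a live Census API call, it uses a small hypothetical tibble shaped like get_acs() output (GEOID, NAME, variable, estimate, moe). The county names, GEOIDs, and dollar values are made up, and the income-group cutoffs are arbitrary. The commented get_acs() line shows roughly what a real query could look like; it needs a Census API key and the tidycensus package.

```r
library(tidyverse)  # dplyr (mutate, select, case_when) and stringr (str_*)
library(knitr)      # kable()

# HYPOTHETICAL rows shaped like tidy get_acs() output; values are invented.
# A real query might look like (requires a Census API key):
# acs <- tidycensus::get_acs(geography = "county", variables = "B19013_001",
#                            state = "PA", year = 2022)
acs <- tibble(
  GEOID    = c("99001", "99003", "99005"),
  NAME     = c("North County, Example State",
               "South County, Example State",
               "East County, Example State"),
  variable = "B19013_001",
  estimate = c(55000, 70000, 90000),
  moe      = c(1000, 1200, 1500)
)

clean <- acs %>%
  mutate(
    # Drop the state suffix: "North County, Example State" -> "North County"
    county = str_remove(NAME, ", Example State"),
    # Replace "County" with the abbreviation "Co.": "North County" -> "North Co."
    short_name = str_replace(county, "County", "Co."),
    # Pull the last three digits of GEOID (the county part of the code)
    county_fips = str_extract(GEOID, "\\d{3}$"),
    # Bucket estimates using arbitrary, illustrative cutoffs;
    # case_when() returns the value for the first condition that is TRUE
    income_group = case_when(
      estimate < 60000 ~ "Lower",
      estimate < 80000 ~ "Middle",
      TRUE ~ "Higher"
    )
  ) %>%
  # Keep only the columns needed for the report table
  select(county, short_name, county_fips, estimate, income_group)

# Format the result as a simple table
print(kable(clean, caption = "Hypothetical county income groups"))
```

With real data, the hypothetical tibble would be replaced by the get_acs() result, and the str_remove() pattern would need to match the state used in the query.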
Questions & Challenges
I need to finish Lab 0 so that I can practice the different coding functions.
Connections to Policy
- Proxy: What would you use to stand in for what you want?
- Blind spot: What data gap or historical bias could skew results?
- Harm + Guardrail: Who could be harmed, and what is one simple safeguard?
Criminal Justice - Recidivism risk scores for bail and sentencing decisions
Housing & Finance - Mortgage lending and tenant screening algorithms
Healthcare - Patient care prioritization and resource allocation
Long history of government data collection:
Civic registration systems
Census data
Administrative records
Operations research (post-WWII)
Census data is the foundation for:
Understanding community demographics
Allocating government resources
Tracking neighborhood change
Reflection
It is pretty nerve-racking to think about how the data I have or choose to use could affect the outcome of someone’s life. It makes me want to be as careful as possible to minimize bias. This has also made me very slow to answer data analysis questions in class.