Part 1: Why Visualization Matters
Opening Question
Think about Assignment 1:
You created tables showing income reliability patterns across counties. But what if you needed to present these findings to:
- The state legislature (2-minute briefing)
- Community advocacy groups
- Local news reporters
Discussion: How might visual presentation change the impact of your analysis?
Anscombe’s Quartet: The Famous Example
Four datasets with identical summary statistics:
- Same means (x̄ = 9, ȳ = 7.5)
- Same variances
- Same correlation (r = 0.816)
- Same regression line
But completely different patterns when visualized
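You can verify this yourself: seaborn ships a built-in copy of the quartet. A minimal sketch, assuming pandas, seaborn, and matplotlib are available:

```python
# Minimal sketch: identical statistics, four different shapes.
import seaborn as sns
import matplotlib.pyplot as plt

anscombe = sns.load_dataset("anscombe")  # columns: dataset, x, y

# Same means and variances across all four datasets...
print(anscombe.groupby("dataset").agg(
    x_mean=("x", "mean"), y_mean=("y", "mean"),
    x_var=("x", "var"), y_var=("y", "var"),
))

# ...and the same correlation (r = 0.816)
for name, group in anscombe.groupby("dataset"):
    print(name, round(group["x"].corr(group["y"]), 3))

# ...but completely different patterns once plotted
sns.lmplot(data=anscombe, x="x", y="y", col="dataset", col_wrap=2, ci=None)
plt.show()
```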
The Policy Implications
Why this matters for your work:
- Summary statistics can hide critical patterns
- Outliers may represent important communities
- Relationships aren’t always linear
- Visual inspection reveals data quality issues
Example: A county with “average” income might have extreme inequality that algorithms would miss without visualization.
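To make that example concrete, here is an illustrative sketch with simulated (not real) incomes: two hypothetical counties with nearly identical means but very different distributions:

```python
# Simulated illustration: same mean income, very different inequality.
import numpy as np

rng = np.random.default_rng(42)

# County A: a tight middle-income distribution
county_a = rng.normal(60_000, 8_000, 1_000)

# County B: a large low-income group plus a small high-income group
county_b = np.concatenate([
    rng.normal(25_000, 5_000, 800),
    rng.normal(200_000, 30_000, 200),
])

print(f"Mean income, County A: {county_a.mean():,.0f}")  # ~60,000
print(f"Mean income, County B: {county_b.mean():,.0f}")  # ~60,000 as well
print(f"90/10 spread, County A: {np.percentile(county_a, 90) - np.percentile(county_a, 10):,.0f}")
print(f"90/10 spread, County B: {np.percentile(county_b, 90) - np.percentile(county_b, 10):,.0f}")
```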
Connecting Week 2: Ethical Data Communication
From last week’s algorithmic bias discussion:
Research finding: Only 27% of planners warn users about unreliable ACS data
- Most planners don't report margins of error
- Many lack training on statistical uncertainty
- This violates the AICP Code of Ethics
Your responsibility:
- Create honest, transparent visualizations
- Always assess and communicate data quality
- Consider who might be harmed by uncertain data
Bad Visualizations Have Real Consequences
Common problems in government data presentation:
- Misleading scales or axes
- Cherry-picked time periods
- Hidden or ignored uncertainty
- Missing context about data reliability
Real impact: The Jurjevich et al. study found that 72% of Portland census tracts had unreliable child poverty estimates, yet planners rarely communicated this uncertainty.
Result: Poor policy decisions based on misunderstood data
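The first problem in the list above, a misleading axis, is easy to demonstrate. A minimal sketch with made-up numbers showing how a truncated y-axis makes a small change look dramatic:

```python
# Sketch: the same two numbers, honest vs. truncated y-axis.
import matplotlib.pyplot as plt

years, values = ["2020", "2021"], [50.2, 51.1]  # made-up values, ~2% change

fig, (honest, misleading) = plt.subplots(1, 2, figsize=(8, 3))

honest.bar(years, values)
honest.set_ylim(0, 60)                  # axis starts at zero: modest change
honest.set_title("Honest axis")

misleading.bar(years, values)
misleading.set_ylim(50, 51.2)           # truncated axis inflates the gap
misleading.set_title("Truncated axis")

plt.tight_layout()
plt.show()
```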
Part 3: Exploratory Data Analysis
The EDA Mindset
Exploratory Data Analysis is detective work:
- What does the data look like? (distributions, missing values)
- What patterns exist? (relationships, clusters, trends)
- What’s unusual? (outliers, anomalies, data quality issues)
- What questions does this raise? (hypotheses for further investigation)
- How reliable is this data?
Goal: Understand your data before making decisions or building models
EDA Workflow with Data Quality Focus
Enhanced process for policy analysis (the first two steps are sketched in code after this list):
- Load and inspect - dimensions, variable types, missing data
- Assess reliability - examine margins of error, calculate coefficients of variation
- Visualize distributions - histograms, boxplots for each variable
- Explore relationships - scatter plots, correlations
- Identify patterns - grouping, clustering, geographical patterns
- Question anomalies - investigate outliers and unusual patterns
- Document limitations - prepare honest communication about data quality
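A minimal sketch of steps 1-2, assuming a hypothetical tract-level file acs_tracts.csv with illustrative column names (county, total_population, median_income, median_income_moe):

```python
# Steps 1-2 of the workflow; file and column names are hypothetical.
import pandas as pd

df = pd.read_csv("acs_tracts.csv")

# Step 1: load and inspect
print(df.shape)          # dimensions
print(df.dtypes)         # variable types
print(df.isna().sum())   # missing values per column

# Step 2: assess reliability with the coefficient of variation.
# ACS MOEs are published at the 90% level, so SE = MOE / 1.645.
df["income_cv"] = 100 * (df["median_income_moe"] / 1.645) / df["median_income"]
print(df["income_cv"].describe())
```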
Understanding Distributions
Why distribution shape matters:
What to look for: Skewness, outliers, multiple peaks, gaps
Boxplots complement histograms: they show the median, quartiles, and outliers at a glance (both are sketched in code below).
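A quick sketch of both views for one variable, reusing the hypothetical acs_tracts.csv file from the workflow example:

```python
# Histogram and boxplot for one variable (hypothetical file as before).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("acs_tracts.csv")
income = df["median_income"].dropna()

fig, (hist_ax, box_ax) = plt.subplots(1, 2, figsize=(8, 3))

hist_ax.hist(income, bins=30)     # shape, skew, multiple peaks, gaps
hist_ax.set_title("Histogram")

box_ax.boxplot(income)            # median, quartiles, flagged outliers
box_ax.set_title("Boxplot")

plt.tight_layout()
plt.show()
```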
Critical: Data Quality Through Visualization
Research insight: Most planners don’t visualize or communicate uncertainty
Pattern: Smaller populations have higher uncertainty.
Ethical implication: These communities might be systematically undercounted.
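One way to visualize this pattern, again with the hypothetical tract file and the CV computed earlier:

```python
# Reliability vs. population size (hypothetical file and columns as before).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("acs_tracts.csv")
df["income_cv"] = 100 * (df["median_income_moe"] / 1.645) / df["median_income"]

plt.scatter(df["total_population"], df["income_cv"], alpha=0.5)
plt.xscale("log")                              # population spans orders of magnitude
plt.axhline(40, linestyle="--", color="red")   # CV above 40% = unreliable
plt.xlabel("Total population (log scale)")
plt.ylabel("Median income CV (%)")
plt.title("Smaller places, noisier estimates")
plt.show()
```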
Research-Based Recommendations for Planners
Jurjevich et al. (2018): 5 Essential Guidelines for Using ACS Data
- Report the corresponding MOEs of ACS estimates - always include margin of error values
- Include a footnote when not reporting MOEs - explicitly acknowledge the omission
- Provide context for (un)reliability - use the coefficient of variation (CV; sketched in code below):
  - CV < 12% = reliable (green coding)
  - CV 12-40% = somewhat reliable (yellow coding)
  - CV > 40% = unreliable (red coding)
- Reduce statistical uncertainty - collapse data detail, aggregate geographies, use multi-year estimates
- Always conduct statistical significance tests when comparing ACS estimates over time (also sketched below)
Key insight: These are not just technical best practices; they are ethical requirements under the AICP Code of Ethics
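A sketch of guidelines 3 and 5, assuming 90%-level ACS margins of error (the standard the Bureau publishes); the comparison function implements the standard two-estimate significance test from the Census Bureau's ACS handbook:

```python
# Sketch of guidelines 3 and 5; assumes 90%-level ACS margins of error.
import math

def cv_category(estimate: float, moe: float) -> str:
    """Classify reliability with the CV thresholds listed above."""
    se = moe / 1.645                  # 90%-level MOE -> standard error
    cv = 100 * se / estimate
    if cv < 12:
        return "reliable (green)"
    if cv <= 40:
        return "somewhat reliable (yellow)"
    return "unreliable (red)"

def significantly_different(est1, moe1, est2, moe2, z_crit=1.645):
    """Two-estimate ACS comparison at the 90% confidence level."""
    se1, se2 = moe1 / 1.645, moe2 / 1.645
    z = abs(est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z > z_crit

# Illustrative (made-up) numbers:
print(cv_category(52_000, 4_100))                             # reliable (green)
print(significantly_different(52_000, 4_100, 47_500, 3_800))  # False: no real change
```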
EDA for Policy Analysis
Key questions for census data (the first and last are sketched in code after this list):
- Geographic patterns: Are problems concentrated in certain areas?
- Population relationships: How does size affect data quality?
- Demographic patterns: Are certain communities systematically different?
- Temporal trends: How do patterns change over time?
- Data integrity: Where might survey bias affect results?
- Reliability assessment: Which estimates should we trust?
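A sketch addressing the geographic-pattern and reliability questions together, reusing the hypothetical tract file: which counties carry the largest share of unreliable estimates?

```python
# Share of unreliable income estimates by county (hypothetical columns).
import pandas as pd

df = pd.read_csv("acs_tracts.csv")
df["income_cv"] = 100 * (df["median_income_moe"] / 1.645) / df["median_income"]
df["unreliable"] = df["income_cv"] > 40   # Jurjevich et al. threshold

# Counties ranked by the share of tracts with unreliable estimates
print(
    df.groupby("county")["unreliable"]
      .mean()
      .sort_values(ascending=False)
      .head(10)
)
```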