Lingxuan Gao - MUSA 5080 Portfolio

Weekly Notes 03: Data Visualization & Exploratory Analysis

# Part 1: Why Visualization Matters

## Why visualization matters

- Summary statistics can hide critical patterns
- Outliers may represent important communities
- Relationships aren’t always linear
- Visual inspection reveals data quality issues
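A standard illustration of the first point is Anscombe's quartet, which is not part of these notes but ships with base R as the `anscombe` data frame. The sketch below compares the four sets numerically and then plots them. The variable name `summary_table` is chosen here only for illustration.

```r
# `anscombe` is a built-in data frame with columns x1..x4 and y1..y4.
summary_table <- data.frame(
  set    = 1:4,
  mean_y = sapply(1:4, function(i) mean(anscombe[[paste0("y", i)]])),
  cor_xy = sapply(1:4, function(i) {
    cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
  })
)

# The means and correlations are nearly identical across the four sets.
print(round(summary_table, 2))

# The scatter plots are not: one is roughly linear, one is curved,
# and two are dominated by a single unusual point.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Set", i))
}
par(op)
```

If the table were the only output, the four sets would look interchangeable. The plots are what reveal the differences.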

## Common problems in government data presentation

- Misleading scales or axes
- Cherry-picked time periods
- Hidden or ignored uncertainty
- Missing context about data reliability
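One way to avoid hiding uncertainty is to draw each estimate together with its margin of error. The following is a minimal sketch that assumes the `ggplot2` package. The `acs` data frame and its values are invented for illustration and are not real census figures.

```r
library(ggplot2)

# Hypothetical estimates and margins of error (illustrative values only).
acs <- data.frame(
  county   = c("A", "B", "C"),
  estimate = c(52000, 48000, 30000),
  moe      = c(1500, 12000, 25000)
)

p <- ggplot(acs, aes(x = county, y = estimate)) +
  geom_point() +
  # Error bars span estimate +/- margin of error.
  geom_errorbar(aes(ymin = estimate - moe, ymax = estimate + moe),
                width = 0.2) +
  # Include zero on the y-axis so differences are not exaggerated.
  expand_limits(y = 0) +
  labs(x = "County", y = "Median income (estimate)",
       caption = "Error bars show the margin of error (hypothetical data)")

print(p)
```

With the bars drawn, it is visible that county C's estimate is far less certain than county A's, which a plot of the points alone would not show.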

# Part 2: Grammar of Graphics

## Grammar of Graphics principles

Data → Aesthetics → Geometries → Visual
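The chain above maps onto code fairly directly. Here is a minimal sketch, assuming the `ggplot2` package is the implementation in use. The `counties` data frame and its values are hypothetical.

```r
library(ggplot2)

# Data: a hypothetical county-level table (values invented for illustration).
counties <- data.frame(
  population    = c(12000, 45000, 150000, 620000, 1500000),
  median_income = c(41000, 48000, 56000, 63000, 52000)
)

p <- ggplot(counties,                                  # data
            aes(x = population, y = median_income)) +  # aesthetics: variables -> x and y
  geom_point() +                                       # geometry: draw points
  scale_x_log10() +                                    # population spans several orders of magnitude
  labs(x = "Population (log scale)", y = "Median income")

print(p)                                               # visual: the rendered plot
```

Swapping `geom_point()` for another geometry changes how the plot is drawn without touching the data or the aesthetic mapping, which is the main appeal of the layered approach.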

# Part 3: Exploratory Data Analysis

## EDA Mindset

- What does the data look like? (distributions, missing values)
- What patterns exist? (relationships, clusters, trends)
- What’s unusual? (outliers, anomalies, data quality issues)
- What questions does this raise? (hypotheses for further investigation)
- How reliable is this data?

Understand your data before making decisions or building models.
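The first and third questions can be asked of a single variable with a few base R calls. The sketch below uses a hypothetical `income` vector with invented values, and flags unusual values with the conventional 1.5 × IQR boxplot rule. That rule is one common choice, not the only one.

```r
# Hypothetical tract-level median incomes, in thousands of dollars.
income <- c(48, 52, 50, 47, 55, 51, 49, 120)

# What does the data look like?
print(summary(income))        # min, quartiles, mean, max
print(sum(is.na(income)))     # count of missing values
hist(income, main = "Distribution of income", xlab = "Income ($1000s)")

# What's unusual? boxplot.stats() returns points beyond 1.5 * IQR of the hinges.
outliers <- boxplot.stats(income)$out
print(outliers)
```

A flagged value is a prompt to investigate, not a reason to delete it. It may be a data entry error, or it may be a real community that differs from its neighbors.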

## EDA Workflow with Data Quality Focus

1. Load and inspect: dimensions, variable types, missing data
2. Assess reliability: examine margins of error, calculate coefficients of variation
3. Visualize distributions: histograms, boxplots for each variable
4. Explore relationships: scatter plots, correlations
5. Identify patterns: grouping, clustering, geographical patterns
6. Question anomalies: investigate outliers and unusual patterns
7. Document limitations: prepare honest communication about data quality
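The first two steps can be sketched in base R as follows. The `acs` data frame is hypothetical. The divisor 1.645 reflects that published ACS margins of error are at the 90% confidence level. The reliability cutoffs (12 and 40) are an assumption made for illustration, since different agencies and courses use different thresholds.

```r
# Hypothetical ACS-style table (values invented; row D has missing data).
acs <- data.frame(
  county   = c("A", "B", "C", "D"),
  estimate = c(52000, 48000, 30000, NA),
  moe      = c(1500, 12000, 25000, NA)
)

# Step 1: load and inspect.
print(dim(acs))                    # rows and columns
str(acs)                           # variable types
n_missing <- colSums(is.na(acs))   # missing values per column
print(n_missing)

# Step 2: assess reliability.
acs$se <- acs$moe / 1.645               # standard error from a 90% MOE
acs$cv <- 100 * acs$se / acs$estimate   # coefficient of variation, in percent

# Assumed cutoffs for illustration: CV <= 12 high, 12-40 moderate, > 40 low.
acs$reliability <- cut(acs$cv,
                       breaks = c(-Inf, 12, 40, Inf),
                       labels = c("high", "moderate", "low"))

print(acs[, c("county", "estimate", "moe", "cv", "reliability")])
```

Rows with a missing estimate get an `NA` reliability label rather than being dropped silently, which keeps the gap visible for the documentation step at the end of the workflow.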

## EDA for Policy Analysis

### Key questions for census data

- Geographic patterns: Are problems concentrated in certain areas?
- Population relationships: How does size affect data quality?
- Demographic patterns: Are certain communities systematically different?
- Temporal trends: How do patterns change over time?
- Data integrity: Where might survey bias affect results?
- Reliability assessment: Which estimates should we trust?
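The population question can be explored by grouping areas into size classes and comparing their average relative margin of error. The following is a minimal base R sketch. The `tracts` data, the size breaks, and the column names are all hypothetical, and the values were constructed so that smaller areas have larger margins of error.

```r
# Hypothetical tract-level data (values invented for illustration).
tracts <- data.frame(
  population = c(800, 1500, 2500, 4000, 6500, 9000),
  estimate   = c(38000, 42000, 51000, 47000, 60000, 55000),
  moe        = c(9500, 8400, 6100, 4700, 3600, 2750)
)

# Margin of error as a percentage of the estimate.
tracts$moe_pct <- 100 * tracts$moe / tracts$estimate

# Assumed size classes; the breaks are arbitrary and only for illustration.
tracts$size_class <- cut(tracts$population,
                         breaks = c(0, 2000, 5000, Inf),
                         labels = c("small", "medium", "large"))

# Average relative MOE within each size class.
by_size <- aggregate(moe_pct ~ size_class, data = tracts, FUN = mean)
print(by_size)
```

With real ACS data the same grouping would show whether estimates for small communities are systematically less reliable, which bears on the last question in the list as well.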

# Part 4: Data Joins & Integration