Week 3 Notes - Data Visualization & Exploratory Analysis

Published

September 22, 2025

Key Concepts Learned

  • Exploratory data analysis (EDA) asks:
    • What the data looks like (distributions and missing values)
    • What patterns exist (clustering and relationships)
    • What’s unusual (outliers)
    • What questions this raises
    • How reliable the dataset is
  • Best practices for ACS estimates:
    • Report the corresponding margin of error (MOE) with each ACS estimate
    • If MOEs are not reported, include a footnote acknowledging them
    • Provide reliability context using the coefficient of variation (CV): CV < 12% is good, 12-40% is somewhat reliable, and CV > 40% is concerning
    • To reduce statistical uncertainty, collapse or aggregate data
    • Run statistical significance tests as you go
  • Are there geographic patterns or correlations?
  • Population relationships: how does population size affect data quality?
  • Are certain communities systematically different?
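
The CV thresholds and the advice to aggregate can be sketched in base R. This is a minimal illustration, not the course's code: the `acs` data frame and its numbers are made up, and it assumes the MOEs are at the 90% confidence level ACS publishes (so SE = MOE / 1.645) and that the MOE of a sum is approximated by the root sum of squares of the component MOEs.

```r
# Hypothetical ACS-style estimates with 90% margins of error (made-up numbers).
acs <- data.frame(
  tract    = c("A", "B", "C"),
  estimate = c(5000, 800, 120),
  moe      = c(400, 300, 110)
)

# Assumption: MOEs are at the 90% confidence level, so SE = MOE / 1.645.
acs$se <- acs$moe / 1.645

# Coefficient of variation, expressed as a percentage of the estimate.
acs$cv <- 100 * acs$se / acs$estimate

# Bin CVs with the thresholds from the notes: <12%, 12-40%, >40%.
acs$reliability <- cut(
  acs$cv,
  breaks = c(-Inf, 12, 40, Inf),
  labels = c("good", "somewhat reliable", "concerning"),
  right  = FALSE
)

# Aggregating tracts: estimates add, MOEs combine as a root sum of squares
# (an approximation that treats the component estimates as independent).
combined_estimate <- sum(acs$estimate)
combined_moe      <- sqrt(sum(acs$moe^2))
combined_cv       <- 100 * (combined_moe / 1.645) / combined_estimate

print(acs)
print(c(estimate = combined_estimate, moe = combined_moe, cv = combined_cv))
```

In this made-up example the smallest tract has a CV above 40% on its own, while the aggregated estimate falls under 12%, which is the motivation for collapsing categories or geographies.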

Coding Techniques

  • ggplot2 (grammar of graphics):
    • Data: the dataset being plotted
    • Aesthetics: variables mapped to visual properties (x, y, color, size)
    • Geometries: how the data is displayed (points, bars, lines)
    • Additional layers: scales, themes, facets, and annotations
  • Aesthetics:
    • x, y: data positions
    • color: point/line color
    • fill: area color
    • size: point/line size
    • shape: point shape
    • alpha: transparency
  • left_join() - keep all rows from the left dataset
  • right_join() - keep all rows from the right dataset
  • inner_join() - keep only rows with a match in both datasets
  • full_join() - keep all rows from both datasets
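
The four joins and the ggplot2 layers listed above can be seen together in a short sketch. It assumes the dplyr and ggplot2 packages are installed; the `income` and `population` tables, their column names, and all values are hypothetical and chosen only so that each join returns a different set of rows.

```r
library(dplyr)
library(ggplot2)

# Hypothetical tables sharing a `geoid` key (all values are made up).
income <- tibble(
  geoid         = c("001", "002", "003"),
  median_income = c(52000, 61000, 48000)
)
population <- tibble(
  geoid     = c("002", "003", "004"),
  total_pop = c(1500, 900, 2300)
)

# Same inputs, four different results.
left_rows  <- left_join(income, population, by = "geoid")   # 001, 002, 003
right_rows <- right_join(income, population, by = "geoid")  # 002, 003, 004
inner_rows <- inner_join(income, population, by = "geoid")  # 002, 003
full_rows  <- full_join(income, population, by = "geoid")   # 001 through 004

# Data + aesthetics + geometry + extra layers (labels and theme).
p <- ggplot(inner_rows, aes(x = total_pop, y = median_income, color = geoid)) +
  geom_point(size = 3, alpha = 0.8) +
  labs(
    x     = "Total population",
    y     = "Median income",
    color = "GEOID"
  ) +
  theme_minimal()

print(p)
```

Rows without a match get NA in the columns from the other table, so `left_rows` has an NA `total_pop` for geoid "001". Plotting the inner join avoids those missing values, at the cost of dropping unmatched areas.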

Questions & Challenges

  • Everything was clear.

Connections to Policy

  • Analyze the data for bias before running the analysis and making recommendations
  • This helps ensure that the recommendations do not discriminate against or disadvantage any group

Reflection

  • Learned good practices for coding and communication.