Week 3 Notes - Course Introduction

Published

September 22, 2025

Key Concepts Learned

### Importance of Visualization

Anscombe’s Quartet: four small datasets with nearly identical summary statistics but completely different patterns when visualized.

### Policy Implications

  • Summary statistics can hide critical patterns
  • Outliers may represent important communities
  • Relationships aren’t always linear
  • Visual inspection reveals data quality issues

### Ethical Data Communication

  • Create honest, transparent visualizations
  • Always assess and communicate data quality
  • Consider who might be harmed by uncertain data
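To check the Anscombe’s Quartet point for myself, here is a minimal sketch using the `anscombe` data frame that ships with base R. The helper names (`sets`, `summary_table`) are my own; the idea is just to compare the summary statistics of the four x/y pairs and then look at the plots.

```r
# anscombe is a built-in data frame with columns x1..x4 and y1..y4
sets <- 1:4

# Summary statistics for each of the four x/y pairs
summary_table <- data.frame(
  set    = sets,
  mean_x = sapply(sets, function(i) mean(anscombe[[paste0("x", i)]])),
  mean_y = sapply(sets, function(i) mean(anscombe[[paste0("y", i)]])),
  cor_xy = sapply(sets, function(i) {
    cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
  })
)
print(round(summary_table, 2))

# The plots tell a very different story for each pair
op <- par(mfrow = c(2, 2))
for (i in sets) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Set", i))
}
par(op)
```

The rows of the table look almost interchangeable, while the four panels show a roughly linear cloud, a curve, a line with one outlier, and a vertical stack with one extreme point.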

A large margin of error (MOE) usually comes from a small sample size or from large variation within the sample.
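A rough way to see both effects is the textbook margin of error for a sample mean, z * sd / sqrt(n). This is only a sketch: it assumes a simple random sample and a 90% confidence level (z = 1.645), and `moe_mean` is a helper name I made up, not a function from any package.

```r
# Margin of error for a sample mean under simple random sampling
# (assumption: 90% confidence level, so z = 1.645)
moe_mean <- function(sd, n, z = 1.645) {
  z * sd / sqrt(n)
}

baseline     <- moe_mean(sd = 10, n = 400)  # reference case
small_sample <- moe_mean(sd = 10, n = 25)   # same spread, far fewer observations
high_spread  <- moe_mean(sd = 40, n = 400)  # same sample size, more variation

print(c(baseline = baseline,
        small_sample = small_sample,
        high_spread = high_spread))
```

Both the smaller sample and the more variable sample give a wider margin of error than the baseline.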

Coding Techniques

### ggplot2 Philosophy

Grammar of graphics principle: Data -> Aesthetics -> Geometries -> Visual

g <- ggplot(data = your_data) + aes(x = variable1, y = variable2, color = variable3) + geom_something(color, size, …) + additional_layers()

  • Data: Your dataset (census data, survey responses, etc.)
  • Aesthetics: Which variables map to visual properties (x, y, color, size). Specific elements:
      • x, y: position
      • color: point/line color
      • fill: area fill color
      • size: point/line size
      • shape: point shape
      • alpha: transparency
  • Geometries: How to display the data (points, bars, lines)
  • Additional layers: Scales, themes, facets, annotations
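The layers above can be sketched as one concrete plot. This assumes the ggplot2 package is installed and uses the built-in `mtcars` data as a stand-in for a real dataset; the variable choices (`wt`, `mpg`, `cyl`) and the labels are only for illustration.

```r
library(ggplot2)

# Data: mtcars ships with R and stands in for "your_data"
g <- ggplot(data = mtcars) +
  # Aesthetics: map variables to position and color
  aes(x = wt, y = mpg, color = factor(cyl)) +
  # Geometry: draw points; size and alpha are fixed values, not mappings
  geom_point(size = 3, alpha = 0.7) +
  # Additional layers: labels and a theme
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders") +
  theme_minimal()

print(g)
```

Values inside `aes()` come from the data, while values set directly in the geom (like `size = 3`) apply to every point.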

### Exploratory Data Analysis (EDA) Mindset

1. Load and inspect: dimensions, variable types, missing data
2. Assess reliability: examine margins of error, calculate coefficients of variation
3. Visualize distributions: histograms, boxplots for each variable
4. Explore relationships: scatter plots, correlations
5. Identify patterns: grouping, clustering, geographical patterns
6. Question anomalies: investigate outliers and unusual patterns
7. Document limitations: prepare honest communication about data quality
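Steps 1 and 2 can be sketched in base R. Everything here is an assumption for illustration: the `acs` table is made-up data, the MOE is treated as a 90% margin of error (so standard error = MOE / 1.645, as with ACS estimates), and the reliability cutoffs of 12% and 40% are example thresholds rather than an official rule.

```r
# Made-up example data: one estimate and its margin of error per tract
acs <- data.frame(
  tract    = c("A", "B", "C", "D"),
  estimate = c(52000, 48000, 61000, NA),
  moe      = c(3000, 15000, 4500, NA),
  stringsAsFactors = FALSE
)

# Step 1: load and inspect
print(dim(acs))             # dimensions
str(acs)                    # variable types
print(colSums(is.na(acs)))  # missing values per column

# Step 2: assess reliability
# Assumption: moe is a 90% margin of error, so SE = MOE / 1.645
acs$se <- acs$moe / 1.645
acs$cv <- 100 * acs$se / acs$estimate  # coefficient of variation, in percent

# Example cutoffs only (not an official standard)
acs$reliability <- cut(acs$cv,
                       breaks = c(-Inf, 12, 40, Inf),
                       labels = c("high", "moderate", "low"))
print(acs)
```

Rows with a missing estimate stay `NA` all the way through, which is a reminder to document them rather than drop them silently.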

### Joins

  • Left join: keeps every row from the first (left) table; where the right table has no match, the joined columns show NA
  • Right join: the mirror image of a left join; keeps every row from the second (right) table
  • Full join: keeps all rows from both tables
  • Inner join: keeps only the rows that have a match in both tables

* The key columns don’t need to have the same name, but they do need to have the same data type.
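Here is a small sketch of the four joins, assuming the dplyr package is installed. The two tables are made-up; their key columns deliberately have different names (`geoid` and `GEOID`) but the same type (character), which is what the note above is about.

```r
library(dplyr)

# Made-up tables: the keys have different names but the same type (character)
tracts <- data.frame(geoid = c("001", "002", "003"),
                     pop   = c(100, 200, 300),
                     stringsAsFactors = FALSE)
income <- data.frame(GEOID      = c("002", "003", "004"),
                     med_income = c(50000, 60000, 70000),
                     stringsAsFactors = FALSE)

# by = c("left_name" = "right_name") matches differently named key columns
key <- c("geoid" = "GEOID")

left  <- left_join(tracts, income, by = key)   # all rows of tracts; NA where no match
right <- right_join(tracts, income, by = key)  # all rows of income
full  <- full_join(tracts, income, by = key)   # all rows from both tables
inner <- inner_join(tracts, income, by = key)  # only rows present in both

print(left)
print(c(left = nrow(left), right = nrow(right),
        full = nrow(full), inner = nrow(inner)))
```

If one key were numeric and the other character, dplyr would stop with an incompatible-types error, so converting first (for example with `as.character()`) is the usual fix.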

Questions & Challenges

  • Which file should I edit when I make changes?
  • What does the whole process of making changes look like?

Connections to Policy

  • Upload my work to my portfolio for visualization

Reflection

  • How different platforms can connect and work with each other
  • I want to practice more and dig deeper