Week 3 Notes - Course Introduction
Key Concepts Learned
### Importance of Visualization ###
Anscombe’s Quartet: four small datasets with nearly identical summary statistics but completely different patterns when visualized.

### Policy Implications ###
- Summary statistics can hide critical patterns
- Outliers may represent important communities
- Relationships aren’t always linear
- Visual inspection reveals data quality issues

### Ethical Data Communication ###
- Create honest, transparent visualizations
- Always assess and communicate data quality
- Consider who might be harmed by uncertain data
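To check the Anscombe’s Quartet point for myself, here is a minimal sketch using the `anscombe` data frame that ships with base R. The name `quartet_stats` is my own choice, not something from the lecture.

```r
# `anscombe` is a built-in data frame with columns x1..x4 and y1..y4,
# one x/y pair per dataset in the quartet.
quartet_stats <- data.frame(
  set    = 1:4,
  mean_x = sapply(1:4, function(i) mean(anscombe[[paste0("x", i)]])),
  mean_y = sapply(1:4, function(i) mean(anscombe[[paste0("y", i)]])),
  cor_xy = sapply(1:4, function(i) {
    cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
  })
)

# The summary statistics are almost the same for all four sets.
print(round(quartet_stats, 2))

# The scatter plots are not: draw the four sets side by side.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Set", i))
}
par(op)
```

The table alone would suggest the four sets are interchangeable; only the plots show the curve, the outlier, and the single high-leverage point.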
Large MOE (margin of error): caused by a small sample size or large variation within the sample
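A rough way to see both effects is the textbook margin of error for a sample mean, z * s / sqrt(n). This is only a sketch under a simple-random-sampling assumption; the function name `moe` and the numbers are made up, and real survey products may compute their MOEs differently.

```r
# Textbook margin of error for a sample mean (assumes simple random sampling).
# s = sample standard deviation, n = sample size, z = critical value (1.96 for 95%).
moe <- function(s, n, z = 1.96) {
  z * s / sqrt(n)
}

baseline     <- moe(s = 10, n = 400)  # large sample, low variation
small_sample <- moe(s = 10, n = 25)   # same variation, much smaller sample
high_spread  <- moe(s = 30, n = 400)  # same sample size, more variation

print(c(baseline = baseline, small_sample = small_sample, high_spread = high_spread))
```

Shrinking the sample or increasing the spread both widen the margin of error relative to the baseline.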
Coding Techniques
### ggplot2 Philosophy ###
Grammar of Graphics principles: Data -> Aesthetics -> Geometries -> Visual
{g <- ggplot(data = your_data) + aes(x = variable1, y = variable2, color = variable3) + geom_something(color, size, …) + additional_layers()}
Data: your dataset (census data, survey responses, etc.)
Aesthetics: which variables map to visual properties (x, y, color, size). Specific aesthetics:
- x, y: position
- color: point/line color
- fill: area fill color
- size: point/line size
- shape: point shape
- alpha: transparency
Geometries: how to display the data (points, bars, lines)
Additional layers: scales, themes, facets, annotations
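To make the Data -> Aesthetics -> Geometries -> layers pattern concrete, here is a small runnable sketch. It uses the built-in `mtcars` data purely as a stand-in for course data, and the variable choices and labels are my own assumptions.

```r
library(ggplot2)

# Data: mtcars is a built-in dataset, used here only as a placeholder.
p <- ggplot(data = mtcars) +
  # Aesthetics: map variables to position and color.
  aes(x = wt, y = mpg, color = factor(cyl)) +
  # Geometry: draw points; size and alpha are fixed values, not mappings.
  geom_point(size = 3, alpha = 0.7) +
  # Additional layers: labels and a theme.
  labs(
    title = "Fuel efficiency by weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders"
  ) +
  theme_minimal()

print(p)
```

Values set inside `aes()` vary with the data, while values set directly in the geom (here `size` and `alpha`) apply to every point.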
### Exploratory Data Analysis (EDA) Mindset ###
1. Load and inspect: dimensions, variable types, missing data
2. Assess reliability: examine margins of error, calculate coefficients of variation
3. Visualize distributions: histograms, boxplots for each variable
4. Explore relationships: scatter plots, correlations
5. Identify patterns: grouping, clustering, geographical patterns
6. Question anomalies: investigate outliers and unusual patterns
7. Document limitations: prepare honest communication about data quality
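The first few EDA steps can be sketched in base R as below. Everything here is hypothetical: the `counties` data frame and its numbers are made up, the MOEs are assumed to be at the 90% confidence level (so standard error = MOE / 1.645), and the 10% cutoff is an arbitrary illustration rather than an official rule.

```r
# Hypothetical estimates with margins of error (made-up numbers).
counties <- data.frame(
  name     = c("A", "B", "C", "D"),
  estimate = c(52000, 48000, 61000, NA),
  moe      = c(1500, 9000, 3000, 4000)
)

# Step 1. Load and inspect: dimensions, variable types, missing data.
dim(counties)
str(counties)
missing_counts <- colSums(is.na(counties))
print(missing_counts)

# Step 2. Assess reliability: coefficient of variation (CV) as a percentage.
# Assumption: MOE is reported at 90% confidence, so SE = MOE / 1.645.
counties$se     <- counties$moe / 1.645
counties$cv_pct <- 100 * counties$se / counties$estimate

# Step 3. Visualize distributions (rows with NA are dropped by hist()).
hist(counties$estimate, main = "Distribution of estimates", xlab = "Estimate")

# Step 6. Question anomalies: flag estimates above an illustrative 10% CV cutoff.
counties$unreliable <- counties$cv_pct > 10
print(counties)
```

County B gets flagged because its margin of error is large relative to its estimate, and county D stays `NA` because its estimate is missing, which is itself a limitation worth documenting (step 7).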
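### Joins ###
- left join: keeps every row from the first (left) table; where the right table has no match, the joined columns show NA
- right join: the mirror image of a left join; keeps every row from the second (right) table, with NA where the left table has no match
- full join: keeps all rows from both tables
- inner join: keeps only the rows that have a match in both tables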
*The key columns don’t need to have the same name, but they need to have the same data type
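Here is a small sketch of the four join types with dplyr, using two made-up tables whose key columns have different names (`geoid` and `fips`, both hypothetical) but the same type (character).

```r
library(dplyr)

# Two hypothetical tables; the key columns are both character vectors.
population <- data.frame(
  geoid = c("001", "002", "003"),
  pop   = c(1200, 3400, 560)
)
income <- data.frame(
  fips       = c("002", "003", "004"),
  med_income = c(51000, 47000, 62000)
)

# by = c("geoid" = "fips") matches population$geoid against income$fips.
left  <- left_join(population, income, by = c("geoid" = "fips"))
right <- right_join(population, income, by = c("geoid" = "fips"))
full  <- full_join(population, income, by = c("geoid" = "fips"))
inner <- inner_join(population, income, by = c("geoid" = "fips"))

print(left)   # all 3 population rows; med_income is NA for "001"
print(right)  # all 3 income rows; pop is NA for "004"
print(full)   # 4 rows: every key from either table
print(inner)  # 2 rows: only "002" and "003"
```

If one key were numeric and the other character, converting one side first (for example with `as.character()`) would be needed before joining.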
Questions & Challenges
- Which file should I edit when I make changes?
- The whole process of making changes
Connections to Policy
- Upload my work to my portfolio for visualization
Reflection
- How different platforms can connect and work with each other
- I want to practice more and dig deeper