Week 7 Notes - Model Diagnostics & Spatial Autocorrelation

Published

October 20, 2025

Key Concepts Learned

  • Part 1: Where We Are
    • Review
      • Weeks 1-3: Data foundations
        • Census data, tidycensus, spatial data basics
        • Visualization and exploratory analysis
      • Week 5: Linear regression fundamentals
        • Y = f(X) + ε framework
        • Train/test splits, cross-validation
        • Checking assumptions
      • Week 6: Expanding the toolkit
        • Categorical variables and interactions
        • Spatial features (buffers, kNN, distance)
        • Neighborhood fixed effects
    • The Regression Workflow
      • Building the model:
        • Visualize relationships
        • Engineer features
        • Fit the model
        • Evaluate performance (RMSE, R²)
        • Check assumptions
      • Spatial diagnostics:
        • Are errors random or clustered?
        • Do we predict better in some areas?
        • Is there remaining spatial structure?
      • If errors cluster spatially, it suggests:
        • Missing spatial variables
        • Misspecified relationships
        • Non-stationarity (relationships vary across space)
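To make the "evaluate performance" step concrete, here is a minimal sketch of how residuals, RMSE, and R² are computed from observed and predicted values. The numbers are made up for illustration, the function names are hypothetical, and it is written in plain Python rather than the R used in class.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error: the typical size of a prediction error."""
    errors = [o - p for o, p in zip(observed, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def r_squared(observed, predicted):
    """Share of the variance in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Made-up sale prices (in $1,000s) and model predictions, for illustration only
observed = [200, 250, 300, 350, 400]
predicted = [210, 240, 310, 340, 400]

# Residual = observed - predicted; these are the "errors" checked for clustering
residuals = [o - p for o, p in zip(observed, predicted)]

print(residuals)
print(round(rmse(observed, predicted), 2))
print(round(r_squared(observed, predicted), 3))
```

RMSE and R² summarize overall fit only. Two models with the same RMSE can have very different spatial patterns in their residuals, which is why the spatial diagnostics are a separate step.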
  • Part 2: Understanding Spatial Patterns in Errors
    • Visualizing Error Patterns: clustered (non-random) errors are a bad sign; one way to improve the model is fixed effects
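A map is the usual way to look at error patterns, but a quick non-visual check is to average the residuals by neighborhood. The sketch below assumes hypothetical neighborhood labels and made-up residuals; the helper name `mean_residual_by_group` is also an assumption, not something from the course.

```python
from collections import defaultdict

# Hypothetical (neighborhood, residual) pairs; residual = observed - predicted
residuals = [
    ("North", 12.0), ("North", 8.0), ("North", 10.0),
    ("South", -9.0), ("South", -11.0), ("South", -10.0),
    ("Center", 1.0), ("Center", -1.0), ("Center", 0.0),
]

def mean_residual_by_group(pairs):
    """Average the residuals within each group label."""
    grouped = defaultdict(list)
    for group, value in pairs:
        grouped[group].append(value)
    return {group: sum(vals) / len(vals) for group, vals in grouped.items()}

means = mean_residual_by_group(residuals)
for group, mean in sorted(means.items()):
    print(f"{group}: {mean:+.1f}")
```

In this toy data the model under-predicts in North (positive mean residual) and over-predicts in South, so the errors are not random across space. A neighborhood fixed effect gives each neighborhood its own intercept, which is one way to absorb this kind of systematic offset.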
  • Part 3: Moran’s I
    • Moran’s I measures spatial autocorrelation
    • Range: about -1 (perfect negative autocorrelation: dispersion) to +1 (perfect positive autocorrelation: clustering); 0 = random spatial pattern
    • w_ij = spatial weight between locations i and j (binary here: 1 if i and j are neighbors, 0 otherwise)
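For reference, the standard global Moran's I is I = (n / S0) * [Σ_i Σ_j w_ij (x_i - x̄)(x_j - x̄)] / [Σ_i (x_i - x̄)²], where S0 is the sum of all weights w_ij. The sketch below computes it by hand for a toy case. The row-of-locations layout, the helper `chain_weights`, and the values are assumptions chosen for illustration, and this is plain Python, not the R workflow from class.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]        # deviations from the mean
    s0 = sum(sum(row) for row in weights)   # S0: sum of all weights
    # Cross-products of deviations, counted only where w_ij is nonzero
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    var_sum = sum(d * d for d in dev)
    return (n / s0) * (cross / var_sum)

def chain_weights(n):
    """Binary weights for n locations in a row: w_ij = 1 if i and j are adjacent."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

w = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]     # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]   # neighbors are always dissimilar

print(round(morans_i(clustered, w), 3))
print(round(morans_i(alternating, w), 3))
```

The clustered values give a positive I (0.6 here) and the alternating values give -1, matching the interpretation of the range above. When applied to model residuals, a clearly positive I is the signal that errors cluster in space.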

Coding Techniques

  • [New R functions or approaches]
  • [Quarto features learned]

Questions & Challenges

  • What I didn’t fully understand
    • The basic Git workflow: pull, commit, push
  • Areas needing more practice
    • Remembering the essential dplyr functions

Connections to Policy

  • [How this week’s content applies to real policy work]

Reflection

  • [What was most interesting]
  • [How I’ll apply this knowledge]