Spatial Machine Learning & Advanced Regression (Week 6)
1. Baseline Structural Model
- Start with a simple linear regression using structural features only (e.g., LivingArea → SalePrice).
- Interpretation:
  - Coefficients show marginal effects (e.g., $ per sq ft).
  - Even if coefficients are significant, R² can be low.
- Limitation:
  - Large share of price variation remains unexplained without location and neighborhood context.
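A minimal sketch of this baseline, using numpy with hypothetical area/price data (the course itself works in R, but the least-squares algebra is identical):

```python
import numpy as np

# Hypothetical data: living area (sq ft) and sale price ($).
area = np.array([850, 1200, 1500, 1900, 2400, 3100], dtype=float)
price = np.array([110_000, 155_000, 190_000, 235_000, 290_000, 360_000], dtype=float)

# Design matrix with an intercept column: SalePrice = b0 + b1 * LivingArea.
X = np.column_stack([np.ones_like(area), area])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)

# R^2: share of price variation explained by living area alone.
fitted = X @ beta
r2 = 1 - np.sum((price - fitted) ** 2) / np.sum((price - price.mean()) ** 2)

print(f"price per extra sq ft: ${beta[1]:.2f}")
print(f"R^2: {r2:.3f}")
```

The slope `beta[1]` is the marginal effect in $ per sq ft; on real data, expect a much lower R² than this toy example, which is the limitation the notes flag.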
2. Categorical Variables and Fixed Effects
- Categorical variables (e.g., neighborhood) enter the model via dummy variables.
- R automatically:
  - Creates (n-1) dummies.
  - Chooses one category as reference.
- Interpretation:
  - Dummy coefficient = price premium/discount relative to reference, holding other variables constant.
- Neighborhood fixed effects:
  - Absorb unobserved neighborhood characteristics (schools, amenities, reputation).
  - Typically produce large gains in explanatory power and predictive accuracy.
  - Trade-off: less interpretability of why differences exist.
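A sketch of the (n-1) dummy construction by hand in numpy (hypothetical neighborhoods and premiums; this mirrors what R's `model.matrix` does for a factor):

```python
import numpy as np

# Hypothetical data: neighborhood labels and prices with clear level differences.
hood = np.array(["A", "A", "B", "B", "C", "C", "A", "C"])
area = np.array([1000, 1400, 1000, 1400, 1000, 1400, 1200, 1200], dtype=float)
price = 100.0 * area + np.where(hood == "B", 30_000, 0) + np.where(hood == "C", -20_000, 0)

# Build (n-1) dummies with "A" as the reference category.
levels = ["A", "B", "C"]
dummies = np.column_stack([(hood == lv).astype(float) for lv in levels[1:]])

X = np.column_stack([np.ones(len(price)), area, dummies])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)

# beta[2]: premium of B relative to A; beta[3]: discount of C relative to A,
# both holding living area constant.
print(dict(zip(["intercept", "area", "B_vs_A", "C_vs_A"], np.round(beta, 2))))
```

Note the fixed-effect coefficients are purely *relative* to the reference category A, which is why the choice of reference changes the numbers but not the fitted model.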
3. Interaction Terms
- Use interactions when the effect of one variable depends on another.
- Example: LivingArea × WealthyNeighborhood.
- Without interaction:
  - Same slope for all groups; only the intercept shifts.
- With interaction:
  - Both intercept and slope can differ by group.
  - Captures heterogeneous returns to size across market segments.
- Interpretation:
  - Interaction coefficient adjusts the slope for specific categories.
  - Check if the interaction improves fit (R², CV) and is substantively meaningful.
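A sketch of the interaction design matrix in numpy (hypothetical data; the product column is the analogue of `LivingArea * WealthyNeighborhood` in an R formula):

```python
import numpy as np

# Hypothetical data: price per sq ft is higher in the "wealthy" segment,
# so both intercept and slope differ by group.
area = np.array([900, 1300, 1800, 2200, 900, 1300, 1800, 2200], dtype=float)
wealthy = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
price = 20_000 + 100.0 * area + wealthy * (50_000 + 80.0 * area)

# Columns: intercept, area, group dummy, and the interaction area * wealthy.
X = np.column_stack([np.ones_like(area), area, wealthy, area * wealthy])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)

# Slope for non-wealthy areas: beta[1]; slope for wealthy areas: beta[1] + beta[3].
print(f"base slope: {beta[1]:.1f}, extra slope in wealthy areas: {beta[3]:.1f}")
```

Dropping the last column forces a single slope for both groups (only the intercept shifts), which is exactly the "without interaction" case above.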
4. Polynomial Terms (Non-Linear Effects)
- Use polynomial terms (e.g., Age and Age²) when relationships are not linear.
  - Typical pattern: U-shaped or inverted-U.
- Implementation:
  - Use I(Age^2) in the model formula so that ^ is treated as arithmetic squaring rather than formula syntax.
- Interpretation:
  - Coefficients are not directly intuitive on their own.
  - Marginal effect of Age = β₁ + 2β₂·Age.
- Evaluate:
  - Compare R² and an F-test between the linear and polynomial models.
  - Use residual plots to check improvement.
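The marginal-effect formula can be checked numerically; a numpy sketch with a hypothetical inverted-U age profile (adding an `age**2` column plays the role of `I(Age^2)`):

```python
import numpy as np

# Hypothetical inverted-U: prices fall with age, then recover for older homes.
age = np.linspace(0, 100, 21)
price = 300_000 - 3_000.0 * age + 25.0 * age**2   # true beta1 = -3000, beta2 = 25

# Quadratic design matrix: intercept, age, age squared.
X = np.column_stack([np.ones_like(age), age, age**2])
b0, b1, b2 = np.linalg.lstsq(X, price, rcond=None)[0]

# Marginal effect of age is not b1 alone: d(price)/d(age) = b1 + 2 * b2 * age.
for a in (10, 60, 90):
    print(f"marginal effect at age {a}: {b1 + 2 * b2 * a:.1f}")
```

With these made-up coefficients the effect of an extra year is negative for young homes, crosses zero at age 60, and turns positive afterward; that sign change is what a single linear Age term cannot capture.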
5. Spatial Features: Why Space Matters
- Tobler’s First Law: nearby observations are more related than distant ones.
- Housing prices depend on:
  - Local crime, amenities, accessibility, neighborhood environment.
- Three common spatial feature constructions:
  - Buffer counts:
    - Count events (e.g., crimes) within a fixed radius.
  - k-Nearest Neighbors (kNN):
    - Average distance to the k nearest events.
  - Distance to key points:
    - Distance to CBD, transit, parks, etc.
- These features convert spatial context into usable numeric predictors.
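All three constructions can be sketched from a pairwise distance matrix; a numpy example with hypothetical projected coordinates (in practice these come from an sf/GIS workflow):

```python
import numpy as np

# Hypothetical coordinates (projected, in meters): 3 houses, 6 crime events.
houses = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 200.0]])
crimes = np.array([[10.0, 0.0], [30.0, 40.0], [90.0, 10.0],
                   [120.0, 0.0], [0.0, 190.0], [300.0, 300.0]])

# Pairwise Euclidean distances: one row per house, one column per event.
d = np.linalg.norm(houses[:, None, :] - crimes[None, :, :], axis=2)

# Buffer count: number of events within a 50 m radius of each house.
buffer_50m = (d <= 50).sum(axis=1)

# kNN feature: average distance to the k = 2 nearest events.
k = 2
knn_avg = np.sort(d, axis=1)[:, :k].mean(axis=1)

# Distance to a key point, e.g. a hypothetical CBD at (150, 150).
cbd = np.array([150.0, 150.0])
dist_cbd = np.linalg.norm(houses - cbd, axis=1)

print(buffer_50m, np.round(knn_avg, 1), np.round(dist_cbd, 1))
```

Each of the three resulting vectors is an ordinary numeric column that can be appended to the regression design matrix, which is the sense in which spatial context becomes a "usable numeric predictor".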
6. Combining Structural, Spatial, and Fixed Effects
- Model layering:
  - Structural only → + spatial features → + neighborhood fixed effects.
- Typical pattern:
  - Each step improves predictive performance.
  - Spatial features capture continuous location effects.
  - Fixed effects absorb remaining unobserved neighborhood-level heterogeneity.
- Important:
  - Coefficients on spatial variables can change once fixed effects are included (less confounding).
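The layering pattern can be demonstrated on synthetic data; a numpy sketch with an assumed generating process in which the spatial feature is deliberately correlated with the neighborhood effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Assumed generating process: price depends on area, distance to a crime
# hotspot, and an unobserved neighborhood-level effect.
area = rng.uniform(800, 2500, n)
hood = rng.integers(0, 4, n)                        # 4 neighborhoods
hood_effect = np.array([0.0, 40_000, -25_000, 15_000])[hood]
dist_hotspot = rng.uniform(0, 5, n) + hood          # correlated with neighborhood
price = 100 * area + 8_000 * dist_hotspot + hood_effect + rng.normal(0, 10_000, n)

def r2(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    res = y - X @ beta
    return 1 - (res @ res) / ((y - y.mean()) @ (y - y.mean()))

ones = np.ones(n)
fe = np.column_stack([(hood == g).astype(float) for g in (1, 2, 3)])  # (n-1) dummies

r2_structural = r2(np.column_stack([ones, area]), price)
r2_spatial = r2(np.column_stack([ones, area, dist_hotspot]), price)
r2_full = r2(np.column_stack([ones, area, dist_hotspot, fe]), price)

print(f"structural {r2_structural:.3f} -> +spatial {r2_spatial:.3f} -> +FE {r2_full:.3f}")
```

Because `dist_hotspot` is correlated with the neighborhood effect here, the middle model's spatial coefficient partly absorbs the neighborhood premium; once the fixed effects are added it typically moves back toward the true value, which is the "less confounding" point above.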
7. Cross-Validation with Categorical Variables
- Use k-fold CV (e.g., 10-fold) to evaluate out-of-sample performance.
- Problem:
  - Sparse categories (few observations in some neighborhoods) can lead to:
    - “New level” errors in test folds.
    - Unstable estimates.
- Solutions:
  - Check counts per category before CV.
  - Group rare categories into an “Other/Small_Neighborhoods” class.
  - Alternatively, drop categories with extremely low counts (must be documented and justified).
- Use CV metrics (RMSE, MAE) to compare:
  - Structural vs spatial vs fixed-effect models.
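A sketch of the count-then-group step (hypothetical neighborhood labels; the `min_obs` threshold is an assumption that should be documented in a real analysis):

```python
from collections import Counter

# Hypothetical neighborhood labels with two sparse categories.
hoods = ["Downtown"] * 40 + ["Eastside"] * 35 + ["Harborview"] * 2 + ["Old Mill"] * 1

# Check counts per category before cross-validation.
counts = Counter(hoods)
min_obs = 5  # judgment call; document and justify whatever threshold you choose

# Group rare categories so a test fold can never contain an unseen level.
grouped = [h if counts[h] >= min_obs else "Other_Small_Neighborhoods" for h in hoods]

print(Counter(grouped))
```

After grouping, every level is guaranteed to appear often enough that a 10-fold split is very unlikely to isolate it entirely in a test fold, which is what triggers the "new level" error.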
8. Practical Modeling Workflow (Hedonic / Spatial Regression)
- Build a simple structural baseline.
- Add categorical variables and interpret relative effects.
- Introduce interactions where theory suggests heterogeneous effects.
- Add polynomial terms to capture non-linearities.
- Engineer spatial features (buffers, kNN, distances).
- Add neighborhood fixed effects to capture unobserved context.
- Use k-fold CV to select models based on predictive performance.
- Inspect residuals and diagnose specification issues at each step.
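The CV-based model selection step can be sketched end to end; a manual 10-fold implementation in numpy on synthetic data (a real analysis would run this in R with the engineered features above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120

# Synthetic data: price depends on area and a spatial feature (hotspot distance).
area = rng.uniform(800, 2500, n)
dist = rng.uniform(0, 5, n)
price = 100 * area + 8_000 * dist + rng.normal(0, 15_000, n)

def cv_rmse(X, y, k=10, seed=0):
    """Manual k-fold CV: average out-of-fold RMSE of an OLS fit."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errs.append(np.sqrt(np.mean((y[f] - X[f] @ beta) ** 2)))
    return float(np.mean(errs))

ones = np.ones(n)
rmse_structural = cv_rmse(np.column_stack([ones, area]), price)
rmse_spatial = cv_rmse(np.column_stack([ones, area, dist]), price)
print(f"CV RMSE: structural {rmse_structural:,.0f} vs + spatial {rmse_spatial:,.0f}")
```

Because both models are scored on the same held-out folds, the comparison is paired: the spatial model wins here by roughly the variance the omitted distance feature was leaving in the residuals.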