
MUSA 5080 Notes #12

Week 12: Mapping the DNA of Urban Neighborhoods

Author

Fan Yang

Published

November 24, 2025


Overview

This week we learned about sequential analysis of neighborhood change: studying how neighborhoods evolve as complete trajectories rather than as isolated snapshots. We used k-means clustering, sequence analysis (TraMineR), optimal matching, and hierarchical clustering to identify common neighborhood pathways.

Key Learning Objectives

  • Understand limitations of snapshot-based analysis
  • Learn k-means clustering for classifying neighborhoods
  • Create sequence objects from longitudinal data
  • Measure sequence dissimilarity using OMstrans
  • Cluster similar trajectories and map results

The Problem with Snapshots

Why Sequences Matter

Traditional approach: Compare neighborhoods at single time points (1970 vs. 2010)

Problem: Missing the full longitudinal sequence of change

Example: The same “gentrifying” endpoint in 2010 could be:

  • Neighborhood A: Stable → Declining → Struggling → Gentrifying
  • Neighborhood B: Struggling → Struggling → Struggling → Gentrifying

Policy implications: Different histories = different policy needs

The DNA Metaphor

Neighborhood sequences reveal:

  • Patterns of change across metro areas
  • Where different processes occur spatially
  • Which neighborhoods are “stuck” vs. volatile
  • Whether Chicago School theories still apply

Research Context

Urban Theories

Chicago School (1920s-1960s):

  • Predicted: Regular spatial patterns (concentric zones)

Los Angeles School (1990s-2000s):

  • Predicted: Chaotic, fragmented patterns

What the data shows:

  • Chicago: Rings persist (supports the Chicago School)
  • Los Angeles: Fragmented patterns (supports the LA School)

Methodological Workflow

5 Main Steps

  1. K-means clustering: Classify neighborhoods at each time point
  2. Create sequences: Convert to sequence objects
  3. Measure dissimilarity: Calculate distances using OMstrans
  4. Cluster sequences: Group similar trajectories
  5. Map results: Visualize spatial patterns

Step 1: K-means Clustering

What is K-means?

Unsupervised algorithm that:

  • Groups observations into k clusters based on similarity
  • Assigns each observation to the nearest centroid (mean)
  • Iteratively updates centroids until convergence

Input: Variables describing neighborhoods (all years combined)

  • Socioeconomic, housing, and demographic variables
  • Standardize (z-score) across the pooled years so clusters mean the same thing in every period
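A minimal sketch of this step, assuming a long-format data frame of tract-year observations. The variable names (`pct_white`, `med_income`, `pct_owner`) and the data itself are illustrative, not the course dataset:

```r
library(dplyr)

# Toy stand-in for tract-year observations (names are illustrative)
set.seed(42)
census_data <- tibble(
  tract      = rep(sprintf("T%03d", 1:50), each = 5),
  year       = rep(c(1980, 1990, 2000, 2010, 2020), times = 50),
  pct_white  = runif(250),
  med_income = rnorm(250, 50000, 15000),
  pct_owner  = runif(250)
)

# Standardize (z-score) across ALL years pooled, so a given cluster
# label means the same thing in 1980 as in 2020
census_scaled <- census_data %>%
  mutate(across(c(pct_white, med_income, pct_owner),
                ~ as.numeric(scale(.x))))

# One k-means run on the pooled tract-years; nstart = 25 reruns the
# algorithm from multiple starts to avoid a bad local optimum
km <- kmeans(select(census_scaled, pct_white, med_income, pct_owner),
             centers = 6, nstart = 25)
census_scaled$cluster <- km$cluster
```

Because every tract-year is clustered in one pooled run, the resulting cluster codes are temporally consistent and can be strung into sequences in Step 2.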

Choosing K

Mathematical optimum ≠ Substantive optimum

Example: NYC analysis found optimal k=3, but used k=6

Why?

  • k=3: mostly White, Black, and Hispanic clusters (too simple)
  • k=6: adds mixed-race categories, more nuance

Approach: Test k=2 through k=10, examine fit AND interpretability

Metrics:

  • WSS (within-cluster sum of squares)
  • Average silhouette width
  • Elbow method
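The elbow method can be sketched with base R alone. The data here is synthetic; in practice `X` would be the standardized neighborhood variables:

```r
# Elbow-method sketch on a toy standardized matrix (illustrative data)
set.seed(1)
X <- scale(matrix(rnorm(500 * 3), ncol = 3))

# Total within-cluster sum of squares for k = 2..10
wss <- sapply(2:10, function(k) {
  kmeans(X, centers = k, nstart = 25)$tot.withinss
})

# Plot WSS against k and look for the "elbow" where the curve
# flattens; then weigh that against interpretability
plot(2:10, wss, type = "b", xlab = "k",
     ylab = "Within-cluster sum of squares")
```

WSS always falls as k grows, so the point is not to minimize it but to find where additional clusters stop buying much, then check whether those clusters are substantively meaningful.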

Step 2: Create Sequences

From Clusters to Sequences

Before: Each tract has separate cluster assignments

Tract 123:  1980=1, 1990=1, 2000=3, 2010=3, 2020=3

After: Each tract has one sequence

Tract 123:  White → White → Hispanic → Hispanic → Hispanic

Creating Sequence Object

# Packages: dplyr/tidyr for reshaping, TraMineR for sequence analysis
library(dplyr)
library(tidyr)
library(TraMineR)

# Reshape to wide format: one row per tract, one column per year
census_wide <- census_data %>%
  select(tract, year, cluster) %>%
  pivot_wider(names_from = year, values_from = cluster)

# Define sequence object with TraMineR
# (labels must follow the alphabet order of the cluster codes)
seq_data <- seqdef(
  census_wide,
  var = c("1980", "1990", "2000", "2010", "2020"),
  labels = c("White", "Black", "Hispanic", "Asian", 
             "Black/Hisp", "White Mixed")
)

Key function: seqdef() from TraMineR package
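Once a sequence object exists, TraMineR's plotting functions make it easy to eyeball trajectories before any distance computation. A self-contained toy example (three made-up tracts over five periods; the state codes are illustrative):

```r
library(TraMineR)

# Toy 5-period sequences over three states (illustrative data)
toy <- data.frame(
  t1 = c("W", "W", "B"), t2 = c("W", "H", "B"),
  t3 = c("H", "H", "B"), t4 = c("H", "H", "B"),
  t5 = c("H", "H", "H")
)
toy_seq <- seqdef(toy, states = c("B", "H", "W"),
                  labels = c("Black", "Hispanic", "White"))

seqiplot(toy_seq)   # index plot: each row is one tract's trajectory
seqdplot(toy_seq)   # state distribution plot across time
```

Index plots show individual trajectories; distribution plots show the mix of states at each time point, which is exactly the snapshot view the sequence approach improves on.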

Step 3: Measure Sequence Dissimilarity

Why Sequence Similarity Matters

Question: Which sequences are most similar?

A: White → White → Hispanic → Hispanic → Hispanic
B: White → Hispanic → Hispanic → Hispanic → Hispanic  
C: Black → Black → Black → Black → Hispanic

Answer depends on: Timing, endpoints, or sequence order?

OMstrans: Sequence-Aware Matching

Traditional OM: Counts edits to transform sequences (doesn’t emphasize ordering)

OMstrans: Focuses on sequences of transitions

  • Joins each state with the previous one (AB, BC, CD)
  • Better captures the process of change
  • More sensitive to the order of transitions

Key difference: “White→Hispanic→Asian” ≠ “White→Asian→Hispanic”
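A toy comparison of classic OM against OMstrans on two sequences with the same states but reversed transition order. This is a sketch: `sm = "CONSTANT"` keeps substitution costs deliberately simple, and the state codes are illustrative:

```r
library(TraMineR)

# Two toy trajectories: same state composition, different order
# s1: W H H A A   vs   s2: W A A H H
toy <- data.frame(
  t1 = c("W", "W"), t2 = c("H", "A"), t3 = c("H", "A"),
  t4 = c("A", "H"), t5 = c("A", "H")
)
toy_seq <- seqdef(toy, states = c("A", "H", "W"))

# Classic OM vs transition-aware OMstrans
d_om  <- seqdist(toy_seq, method = "OM",
                 sm = "CONSTANT", indel = 1)
d_omt <- seqdist(toy_seq, method = "OMstrans",
                 sm = "CONSTANT", indel = 1, otto = 0.1)

d_om[1, 2]   # edit-based distance
d_omt[1, 2]  # transition-aware distance
```

Both methods report the pair as dissimilar, but OMstrans builds its costs from the transitions (W→H, H→A vs. W→A, A→H), so it weights *when and in what order* change happens rather than just which states appear.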

Computing OMstrans

# Calculate transition-based substitution costs
submat <- seqsubm(seq_data, method = "TRATE", transition = "both")

# Compute OMstrans dissimilarity
dist_omstrans <- seqdist(
  seq_data,
  method = "OMstrans",
  indel = 1,            # insertion/deletion cost
  sm = submat,          # substitution cost matrix
  otto = 0.1            # origin-transition tradeoff
)

Key parameters:

  • indel = 1: Cost for insert/delete operations
  • otto = 0.1: Origin-transition tradeoff; lower values emphasize the sequencing of transitions
  • norm: Optionally normalize distances for unequal sequence lengths (e.g., norm = "auto")

Output: Dissimilarity matrix (pairwise distances)

Step 4: Cluster Similar Sequences

Hierarchical Clustering

How it works:

  1. Start with each sequence as its own cluster
  2. Find the two most similar clusters
  3. Merge them
  4. Repeat until everything is one cluster
  5. Cut the dendrogram at the desired number of clusters

Ward’s method: Minimizes within-cluster variance
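In R this step is a few lines with base `hclust()`. The sketch below uses a toy distance matrix; with the Step 3 output you would pass `as.dist(dist_omstrans)` instead (that object name is from the earlier code, everything else here is illustrative):

```r
# Toy dissimilarity matrix standing in for the OMstrans output
set.seed(7)
X <- matrix(rnorm(40 * 2), ncol = 2)
D <- dist(X)

# Ward's method: each merge minimizes the increase in
# within-cluster variance
hc <- hclust(D, method = "ward.D2")

# Start with many clusters, inspect sequence plots, merge down
clusters <- cutree(hc, k = 8)
table(clusters)
```

Because the dendrogram is computed once, re-cutting at different k (`cutree(hc, k = ...)`) is cheap, which makes the "start with many clusters, then merge" workflow practical.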

Choosing Number of Clusters

Unlike k-means, no clear mathematical optimum

Approach:

  • Start with many clusters (20-30)
  • Examine sequence plots for each cluster
  • Merge clusters with similar patterns
  • Stop merging when you would be grouping opposite trajectories

Example: Don’t merge “increasing White” with “increasing Hispanic”

Step 5: Map Results

Mapping Trajectory Clusters

Final step: Map clusters to see spatial patterns

  • Each neighborhood colored by trajectory cluster
  • Reveals spatial clustering of similar trajectories
  • Tests Chicago School vs. LA School patterns

Key questions:

  • Are similar trajectories spatially clustered?
  • Do patterns follow concentric zones?
  • Or are they fragmented?
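A minimal mapping sketch with sf and ggplot2. Real tract geometries would come from a shapefile or tigris; here a toy 5×5 grid of squares stands in, and the cluster assignments are random placeholders:

```r
library(sf)
library(ggplot2)

# Toy "tracts": a 5x5 grid of square polygons (illustrative stand-in
# for real census tract geometries)
set.seed(3)
grid <- st_sf(
  tract    = sprintf("T%02d", 1:25),
  cluster  = factor(sample(1:4, 25, replace = TRUE)),
  geometry = st_make_grid(
    st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 5, ymax = 5))),
    n = c(5, 5)
  )
)

# Each polygon colored by its trajectory cluster
ggplot(grid) +
  geom_sf(aes(fill = cluster), color = "white") +
  scale_fill_brewer(palette = "Set2", name = "Trajectory") +
  theme_void()
```

With real data, the cluster column would come from joining the Step 4 assignments to the tract geometries by tract ID; rings on the resulting map would support the Chicago School reading, a patchwork the LA School one.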

Key Takeaways

Sequential Analysis Skills

  1. K-means: Classify neighborhoods at each time point
  2. Sequences: Use seqdef() to create sequence objects
  3. Dissimilarity: Use OMstrans to measure similarity
  4. Clustering: Hierarchical clustering for trajectories
  5. Mapping: Visualize spatial patterns

Key Concepts

  • Sequences vs. snapshots: Full trajectories reveal more
  • OMstrans: Captures order and timing of transitions
  • Temporal consistency: Cluster all years together
  • Interpretability: Balance fit with meaningful clusters

Common Pitfalls

  • Choosing k too small (loses nuance) or too large (over-fragmentation)
  • Ignoring interpretability (mathematical optimum ≠ best solution)
  • Not mapping results (miss spatial patterns)
  • Not considering theory (pure data-driven misses context)

Best Practices

  • Test multiple k values
  • Examine cluster characteristics
  • Visualize sequences before clustering
  • Map results to reveal spatial patterns
  • Connect to urban theory