
MUSA 5080 Notes #12

Week 12: Mapping the DNA of Urban Neighborhoods

Author

Fan Yang

Published

November 24, 2025


Overview

This week we learned about sequential analysis of neighborhood change: studying how neighborhoods evolve as complete trajectories rather than as isolated snapshots. We used k-means clustering, sequence analysis (TraMineR), optimal matching, and hierarchical clustering to identify common neighborhood pathways.

Key Learning Objectives

  • Understand limitations of snapshot-based analysis
  • Learn k-means clustering for classifying neighborhoods
  • Create sequence objects from longitudinal data
  • Measure sequence dissimilarity using OMstrans
  • Cluster similar trajectories and map results

The Problem with Snapshots

Why Sequences Matter

Traditional approach: Compare neighborhoods at single time points (1970 vs. 2010)

Problem: Missing the full longitudinal sequence of change

Example: The same “gentrifying” endpoint in 2010 could be:

  • Neighborhood A: Stable → Declining → Struggling → Gentrifying
  • Neighborhood B: Struggling → Struggling → Struggling → Gentrifying

Policy implications: Different histories = different policy needs

The DNA Metaphor

Neighborhood sequences reveal:

  • Patterns of change across metro areas
  • Where different processes occur spatially
  • Which neighborhoods are “stuck” vs. volatile
  • Whether Chicago School theories still apply

Research Context

Urban Theories

Chicago School (1920s-1960s):

  • Predicted: Regular spatial patterns (concentric zones)

Los Angeles School (1990s-2000s):

  • Predicted: Chaotic, fragmented patterns

What the data shows:

  • Chicago: Rings persist (supports the Chicago School)
  • Los Angeles: Fragmented patterns (supports the LA School)

Methodological Workflow

5 Main Steps

  1. K-means clustering: Classify neighborhoods at each time point
  2. Create sequences: Convert to sequence objects
  3. Measure dissimilarity: Calculate distances using OMstrans
  4. Cluster sequences: Group similar trajectories
  5. Map results: Visualize spatial patterns

Step 1: K-means Clustering

What is K-means?

Unsupervised algorithm that:

  • Groups observations into k clusters based on similarity
  • Assigns each observation to the nearest centroid (mean)
  • Iteratively updates centroids until convergence

Input: Variables describing neighborhoods (all years combined)

  • Socioeconomic, housing, and demographic variables
  • Standardize (z-score) across the pooled years so clusters mean the same thing in every period
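A minimal sketch of this step, assuming a long-format data frame of tract-year observations. The variable names (`pct_white`, `med_income`, `pct_owner`) and the data itself are illustrative, not the course dataset:

```r
library(dplyr)

# Toy stand-in for tract-year observations (names are illustrative)
set.seed(42)
census_data <- tibble(
  tract      = rep(sprintf("T%03d", 1:50), each = 5),
  year       = rep(c(1980, 1990, 2000, 2010, 2020), times = 50),
  pct_white  = runif(250),
  med_income = rnorm(250, 50000, 15000),
  pct_owner  = runif(250)
)

# Standardize (z-score) across ALL years pooled, so a given cluster
# label means the same thing in 1980 as in 2020
census_scaled <- census_data %>%
  mutate(across(c(pct_white, med_income, pct_owner),
                ~ as.numeric(scale(.x))))

# One k-means run on the pooled tract-years; nstart = 25 reruns the
# algorithm from multiple starts to avoid a bad local optimum
km <- kmeans(select(census_scaled, pct_white, med_income, pct_owner),
             centers = 6, nstart = 25)
census_scaled$cluster <- km$cluster
```

Because every tract-year is clustered in one pooled run, the resulting cluster codes are temporally consistent and can be strung into sequences in Step 2.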

Choosing K

Mathematical optimum ≠ Substantive optimum

Example: NYC analysis found optimal k=3, but used k=6

Why?

  • k=3: mostly White, Black, and Hispanic clusters (too simple)
  • k=6: adds mixed-race categories, more nuance

Approach: Test k=2 through k=10, examine fit AND interpretability

Metrics:

  • WSS (within-cluster sum of squares)
  • Average silhouette width
  • Elbow method
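The elbow method can be sketched with base R alone. The data here is synthetic; in practice `X` would be the standardized neighborhood variables:

```r
# Elbow-method sketch on a toy standardized matrix (illustrative data)
set.seed(1)
X <- scale(matrix(rnorm(500 * 3), ncol = 3))

# Total within-cluster sum of squares for k = 2..10
wss <- sapply(2:10, function(k) {
  kmeans(X, centers = k, nstart = 25)$tot.withinss
})

# Plot WSS against k and look for the "elbow" where the curve
# flattens; then weigh that against interpretability
plot(2:10, wss, type = "b", xlab = "k",
     ylab = "Within-cluster sum of squares")
```

WSS always falls as k grows, so the point is not to minimize it but to find where additional clusters stop buying much, then check whether those clusters are substantively meaningful.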

Step 2: Create Sequences

From Clusters to Sequences

Before: Each tract has separate cluster assignments

Tract 123:  1980=1, 1990=1, 2000=3, 2010=3, 2020=3

After: Each tract has one sequence

Tract 123:  White → White → Hispanic → Hispanic → Hispanic

Creating Sequence Object

# Packages: dplyr/tidyr for reshaping, TraMineR for sequence analysis
library(dplyr)
library(tidyr)
library(TraMineR)

# Reshape to wide format: one row per tract, one column per year
census_wide <- census_data %>%
  select(tract, year, cluster) %>%
  pivot_wider(names_from = year, values_from = cluster)

# Define sequence object with TraMineR
# (labels must follow the alphabet order of the cluster codes)
seq_data <- seqdef(
  census_wide,
  var = c("1980", "1990", "2000", "2010", "2020"),
  labels = c("White", "Black", "Hispanic", "Asian", 
             "Black/Hisp", "White Mixed")
)

Key function: seqdef() from TraMineR package
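Once a sequence object exists, TraMineR's plotting functions make it easy to eyeball trajectories before any distance computation. A self-contained toy example (three made-up tracts over five periods; the state codes are illustrative):

```r
library(TraMineR)

# Toy 5-period sequences over three states (illustrative data)
toy <- data.frame(
  t1 = c("W", "W", "B"), t2 = c("W", "H", "B"),
  t3 = c("H", "H", "B"), t4 = c("H", "H", "B"),
  t5 = c("H", "H", "H")
)
toy_seq <- seqdef(toy, states = c("B", "H", "W"),
                  labels = c("Black", "Hispanic", "White"))

seqiplot(toy_seq)   # index plot: each row is one tract's trajectory
seqdplot(toy_seq)   # state distribution plot across time
```

Index plots show individual trajectories; distribution plots show the mix of states at each time point, which is exactly the snapshot view the sequence approach improves on.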

Step 3: Measure Sequence Dissimilarity

Why Sequence Similarity Matters

Question: Which sequences are most similar?

A: White → White → Hispanic → Hispanic → Hispanic
B: White → Hispanic → Hispanic → Hispanic → Hispanic  
C: Black → Black → Black → Black → Hispanic

Answer depends on: Timing, endpoints, or sequence order?

OMstrans: Sequence-Aware Matching

Traditional OM: Counts edits to transform sequences (doesn’t emphasize ordering)

OMstrans: Focuses on sequences of transitions

  • Joins each state with the previous one (AB, BC, CD)
  • Better captures the process of change
  • More sensitive to the order of transitions

Key difference: “White→Hispanic→Asian” ≠ “White→Asian→Hispanic”
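A toy comparison of classic OM against OMstrans on two sequences with the same states but reversed transition order. This is a sketch: `sm = "CONSTANT"` keeps substitution costs deliberately simple, and the state codes are illustrative:

```r
library(TraMineR)

# Two toy trajectories: same state composition, different order
# s1: W H H A A   vs   s2: W A A H H
toy <- data.frame(
  t1 = c("W", "W"), t2 = c("H", "A"), t3 = c("H", "A"),
  t4 = c("A", "H"), t5 = c("A", "H")
)
toy_seq <- seqdef(toy, states = c("A", "H", "W"))

# Classic OM vs transition-aware OMstrans
d_om  <- seqdist(toy_seq, method = "OM",
                 sm = "CONSTANT", indel = 1)
d_omt <- seqdist(toy_seq, method = "OMstrans",
                 sm = "CONSTANT", indel = 1, otto = 0.1)

d_om[1, 2]   # edit-based distance
d_omt[1, 2]  # transition-aware distance
```

Both methods report the pair as dissimilar, but OMstrans builds its costs from the transitions (W→H, H→A vs. W→A, A→H), so it weights *when and in what order* change happens rather than just which states appear.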

Computing OMstrans

# Calculate transition-based substitution costs
submat <- seqsubm(seq_data, method = "TRATE", transition = "both")

# Compute OMstrans dissimilarity
dist_omstrans <- seqdist(
  seq_data,
  method = "OMstrans",
  indel = 1,            # insertion/deletion cost
  sm = submat,          # substitution cost matrix
  otto = 0.1            # origin-transition tradeoff
)

Key parameters:

  • indel = 1: Cost for insert/delete operations
  • otto = 0.1: Origin-transition tradeoff; lower values emphasize the sequencing of transitions
  • norm: Optionally normalize distances for unequal sequence lengths (e.g., norm = "auto")

Output: Dissimilarity matrix (pairwise distances)

Step 4: Cluster Similar Sequences

Hierarchical Clustering

How it works:

  1. Start with each sequence as its own cluster
  2. Find the two most similar clusters
  3. Merge them
  4. Repeat until everything is one cluster
  5. Cut the dendrogram at the desired number of clusters

Ward’s method: Minimizes within-cluster variance
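In R this step is a few lines with base `hclust()`. The sketch below uses a toy distance matrix; with the Step 3 output you would pass `as.dist(dist_omstrans)` instead (that object name is from the earlier code, everything else here is illustrative):

```r
# Toy dissimilarity matrix standing in for the OMstrans output
set.seed(7)
X <- matrix(rnorm(40 * 2), ncol = 2)
D <- dist(X)

# Ward's method: each merge minimizes the increase in
# within-cluster variance
hc <- hclust(D, method = "ward.D2")

# Start with many clusters, inspect sequence plots, merge down
clusters <- cutree(hc, k = 8)
table(clusters)
```

Because the dendrogram is computed once, re-cutting at different k (`cutree(hc, k = ...)`) is cheap, which makes the "start with many clusters, then merge" workflow practical.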

Choosing Number of Clusters

Unlike k-means, no clear mathematical optimum

Approach:

  • Start with many clusters (20-30)
  • Examine sequence plots for each cluster
  • Merge clusters with similar patterns
  • Stop merging when you would be grouping opposite trajectories

Example: Don’t merge “increasing White” with “increasing Hispanic”

Step 5: Map Results

Mapping Trajectory Clusters

Final step: Map clusters to see spatial patterns

  • Each neighborhood colored by trajectory cluster
  • Reveals spatial clustering of similar trajectories
  • Tests Chicago School vs. LA School patterns

Key questions:

  • Are similar trajectories spatially clustered?
  • Do patterns follow concentric zones?
  • Or are they fragmented?
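A minimal mapping sketch with sf and ggplot2. Real tract geometries would come from a shapefile or tigris; here a toy 5×5 grid of squares stands in, and the cluster assignments are random placeholders:

```r
library(sf)
library(ggplot2)

# Toy "tracts": a 5x5 grid of square polygons (illustrative stand-in
# for real census tract geometries)
set.seed(3)
grid <- st_sf(
  tract    = sprintf("T%02d", 1:25),
  cluster  = factor(sample(1:4, 25, replace = TRUE)),
  geometry = st_make_grid(
    st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 5, ymax = 5))),
    n = c(5, 5)
  )
)

# Each polygon colored by its trajectory cluster
ggplot(grid) +
  geom_sf(aes(fill = cluster), color = "white") +
  scale_fill_brewer(palette = "Set2", name = "Trajectory") +
  theme_void()
```

With real data, the cluster column would come from joining the Step 4 assignments to the tract geometries by tract ID; rings on the resulting map would support the Chicago School reading, a patchwork the LA School one.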

Key Takeaways

Sequential Analysis Skills

  1. K-means: Classify neighborhoods at each time point
  2. Sequences: Use seqdef() to create sequence objects
  3. Dissimilarity: Use OMstrans to measure similarity
  4. Clustering: Hierarchical clustering for trajectories
  5. Mapping: Visualize spatial patterns

Key Concepts

  • Sequences vs. snapshots: Full trajectories reveal more
  • OMstrans: Captures order and timing of transitions
  • Temporal consistency: Cluster all years together
  • Interpretability: Balance fit with meaningful clusters

Common Pitfalls

  • Choosing k too small (loses nuance) or too large (over-fragmentation)
  • Ignoring interpretability (mathematical optimum ≠ best solution)
  • Not mapping results (miss spatial patterns)
  • Not considering theory (pure data-driven misses context)

Best Practices

  • Test multiple k values
  • Examine cluster characteristics
  • Visualize sequences before clustering
  • Map results to reveal spatial patterns
  • Connect to urban theory