MUSA 5080 Notes #12
Week 12: Mapping the DNA of Urban Neighborhoods
Date: 11/24/2025
Overview
This week we learned about sequential analysis of neighborhood change, understanding how neighborhoods evolve as complete trajectories rather than snapshots. We used k-means clustering, sequence analysis (TraMineR), optimal matching, and hierarchical clustering to identify common neighborhood pathways.
Key Learning Objectives
- Understand limitations of snapshot-based analysis
- Learn k-means clustering for classifying neighborhoods
- Create sequence objects from longitudinal data
- Measure sequence dissimilarity using OMstrans
- Cluster similar trajectories and map results
The Problem with Snapshots
Why Sequences Matter
Traditional approach: Compare neighborhoods at single time points (1970 vs. 2010)
Problem: Missing the full longitudinal sequence of change
Example: The same “gentrifying” endpoint in 2010 could be reached by:
- Stable → Declining → Struggling → Gentrifying (Neighborhood A)
- Struggling → Struggling → Struggling → Gentrifying (Neighborhood B)
Policy implications: Different histories = different policy needs
The DNA Metaphor
Neighborhood sequences reveal:
- Patterns of change across metro areas
- Where different processes occur spatially
- Which neighborhoods are “stuck” vs. volatile
- Whether Chicago School theories still apply
Research Context
Urban Theories
Chicago School (1920s-1960s):
- Predicted: Regular spatial patterns (concentric zones)
Los Angeles School (1990s-2000s):
- Predicted: Chaotic, fragmented patterns
What the data shows:
- Chicago: Rings persist (supports the Chicago School)
- Los Angeles: Fragmented patterns (supports the LA School)
Methodological Workflow
5 Main Steps
1. K-means clustering: Classify neighborhoods at each time point
2. Create sequences: Convert cluster assignments to sequence objects
3. Measure dissimilarity: Calculate distances using OMstrans
4. Cluster sequences: Group similar trajectories
5. Map results: Visualize spatial patterns
Step 1: K-means Clustering
What is K-means?
Unsupervised algorithm that:
- Groups observations into k clusters based on similarity
- Assigns each observation to the nearest centroid (mean)
- Iteratively updates centroids until convergence
Input: Variables describing neighborhoods (all years combined)
- Socioeconomic, housing, and demographic variables
- Standardize (z-score) across all years for temporal consistency
Choosing K
Mathematical optimum ≠ Substantive optimum
Example: NYC analysis found optimal k=3, but used k=6
Why?
- k=3: mostly White, Black, Hispanic clusters (too simple)
- k=6: adds mixed-race categories, more nuance
Approach: Test k = 2 through k = 10 and examine both fit and interpretability (see the code sketch below).
Metrics:
- WSS (within-cluster sum of squares)
- Silhouette width
- Elbow method
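A minimal sketch of this step, assuming a long-format data frame census_data with one row per tract-year; the variable names below are placeholders for whatever socioeconomic, housing, and demographic measures you use.

```r
library(dplyr)

# Hypothetical neighborhood variables; substitute your own
vars <- c("pct_white", "pct_black", "pct_hispanic", "med_income")

# Standardize (z-score) across all years combined for temporal consistency
census_scaled <- census_data %>%
  mutate(across(all_of(vars), ~ as.numeric(scale(.x))))

# Compare fit for k = 2 through 10 with within-cluster sum of squares (elbow method)
set.seed(1234)
wss <- sapply(2:10, function(k) {
  kmeans(census_scaled[, vars], centers = k, nstart = 25)$tot.withinss
})
plot(2:10, wss, type = "b", xlab = "k", ylab = "Within-cluster SS")

# Fit the chosen k (k = 6 in the NYC example) and attach labels to every tract-year
km <- kmeans(census_scaled[, vars], centers = 6, nstart = 25)
census_data$cluster <- km$cluster
```

Because the clustering is run on all tract-years pooled together, cluster 3 means the same thing in 1980 as in 2020, which is what makes the sequences comparable across decades.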
Step 2: Create Sequences
From Clusters to Sequences
Before: Each tract has separate cluster assignments
Tract 123: 1980=1, 1990=1, 2000=3, 2010=3, 2020=3
After: Each tract has one sequence
Tract 123: White → White → Hispanic → Hispanic → Hispanic
Creating Sequence Object
```r
library(dplyr)
library(tidyr)
library(TraMineR)

# Reshape to wide format: one row per tract, one column per year
census_wide <- census_data %>%
  select(tract, year, cluster) %>%
  pivot_wider(names_from = year, values_from = cluster)

# Define the sequence object with TraMineR
seq_data <- seqdef(
  census_wide,
  var    = c("1980", "1990", "2000", "2010", "2020"),
  labels = c("White", "Black", "Hispanic", "Asian",
             "Black/Hisp", "White Mixed")
)
```
Key function: seqdef() from the TraMineR package
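It helps to look at the sequences before measuring distances. A short sketch using TraMineR's standard plotting functions, assuming the seq_data object defined above:

```r
# Index plot: one horizontal line per tract, colored by state over time
seqIplot(seq_data, sortv = "from.start", with.legend = "right")

# State distribution plot: share of tracts in each state at each year
seqdplot(seq_data, with.legend = "right")
```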
Step 3: Measure Sequence Dissimilarity
Why Sequence Similarity Matters
Question: Which sequences are most similar?
A: White → White → Hispanic → Hispanic → Hispanic
B: White → Hispanic → Hispanic → Hispanic → Hispanic
C: Black → Black → Black → Black → Hispanic
Answer depends on: Timing, endpoints, or sequence order?
OMstrans: Sequence-Aware Matching
Traditional OM: Counts the edits (insertions, deletions, substitutions) needed to transform one sequence into another; it does not emphasize the ordering of states.
OMstrans: Focuses on sequences of transitions
- Joins each state with the previous one (AB, BC, CD)
- Better captures the process of change
- More sensitive to the order of transitions
Key difference: “White→Hispanic→Asian” ≠ “White→Asian→Hispanic”
Computing OMstrans
```r
# Substitution costs derived from observed transition rates
submat <- seqsubm(seq_data, method = "TRATE", transition = "both")

# Compute the OMstrans dissimilarity matrix
# (the method name in TraMineR's seqdist() is "OMstran")
dist_omstrans <- seqdist(
  seq_data,
  method = "OMstran",
  indel  = 1,       # insertion/deletion cost
  sm     = submat,  # substitution cost matrix
  otto   = 0.1,     # origin-transition tradeoff
  norm   = TRUE     # normalize by sequence length
)
```
Key parameters:
- indel = 1: cost for insertion/deletion operations
- otto = 0.1: lower values emphasize the sequencing of transitions
- norm = TRUE: normalize distances by sequence length
Output: Dissimilarity matrix (pairwise distances)
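For a quick sanity check, the matrix can be inspected directly; a sketch assuming the dist_omstrans object from above:

```r
dim(dist_omstrans)                 # n tracts x n tracts
round(dist_omstrans[1:5, 1:5], 2)  # pairwise distances among the first five tracts
```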
Step 4: Cluster Similar Sequences
Hierarchical Clustering
How it works:
1. Start with each sequence as its own cluster
2. Find the two most similar clusters
3. Merge them
4. Repeat until one cluster remains
5. Cut the dendrogram at the desired number of clusters
Ward’s method: Minimizes within-cluster variance
Choosing Number of Clusters
Unlike k-means, no clear mathematical optimum
Approach (a code sketch follows below):
- Start with many clusters (20-30)
- Examine the sequence plots for each cluster
- Merge clusters with similar patterns
- Stop merging when you would be grouping opposite trajectories
Example: Don’t merge “increasing White” with “increasing Hispanic”
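A minimal sketch of this step, assuming the dist_omstrans matrix from Step 3; the cut at 8 clusters is illustrative, not prescriptive.

```r
library(TraMineR)

# Ward clustering on the OMstrans dissimilarity matrix
hc <- hclust(as.dist(dist_omstrans), method = "ward.D2")

# Cut the dendrogram; the number of trajectory clusters is a judgment call
traj_cluster <- cutree(hc, k = 8)

# State-distribution plots by cluster: inspect before merging or splitting groups
seqdplot(seq_data, group = traj_cluster, border = NA)
```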
Step 5: Map Results
Mapping Trajectory Clusters
Final step: Map clusters to see spatial patterns
- Each neighborhood colored by trajectory cluster
- Reveals spatial clustering of similar trajectories
- Tests Chicago School vs. LA School patterns
Key questions (explored in the mapping sketch below):
- Are similar trajectories spatially clustered?
- Do patterns follow concentric zones?
- Or are they fragmented?
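To put these questions on a map, here is a minimal sketch assuming an sf object tracts_sf of tract polygons whose tract ID matches census_wide, plus the traj_cluster assignments from Step 4 (object names are illustrative):

```r
library(dplyr)
library(sf)
library(ggplot2)

# Attach each tract's trajectory cluster to its geometry
map_data <- tracts_sf %>%
  left_join(
    data.frame(tract = census_wide$tract, traj_cluster = factor(traj_cluster)),
    by = "tract"
  )

# One color per trajectory cluster shows whether similar pathways cluster in space
ggplot(map_data) +
  geom_sf(aes(fill = traj_cluster), color = NA) +
  scale_fill_brewer(palette = "Set2", name = "Trajectory") +
  theme_void()
```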
Key Takeaways
Sequential Analysis Skills
- K-means: Classify neighborhoods at each time point
- Sequences: Use seqdef() to create sequence objects
- Dissimilarity: Use OMstrans to measure similarity
- Clustering: Hierarchical clustering for trajectories
- Mapping: Visualize spatial patterns
Key Concepts
- Sequences vs. snapshots: Full trajectories reveal more
- OMstrans: Captures order and timing of transitions
- Temporal consistency: Cluster all years together
- Interpretability: Balance fit with meaningful clusters
Common Pitfalls
- Choosing k too small (loses nuance) or too large (over-fragmentation)
- Ignoring interpretability (mathematical optimum ≠ best solution)
- Not mapping results (miss spatial patterns)
- Not considering theory (a purely data-driven analysis misses context)
Best Practices
- Test multiple k values
- Examine cluster characteristics
- Visualize sequences before clustering
- Map results to reveal spatial patterns
- Connect to urban theory