Week 4: MUSA 5080
2025-09-29
Spatial Data Fundamentals
Spatial Operations
Geometry Operations
Coordinate Systems
You’ve been working with:
Missing piece: WHERE are these patterns occurring?
This week: Add the spatial dimension to understand:
“Which communities have the lowest income?”
→ Are they clustered? Isolated? Near resources?
“Where should we locate a new health clinic?”
→ Optimize access for underserved populations
“How do school districts compare?”
→ Account for geographic boundaries and spillovers
“Is there environmental justice concern?”
→ Do pollution sources cluster near vulnerable communities?
Geographic bias in algorithms:
Examples:
Your role: Understand spatial patterns to design fairer systems
Real world → Simplified geometric representations
Three basic types:
Each feature has:
Why sf?
Key principle: Spatial data is just data.frame + geometry column
Common spatial data formats:
examples in corresponding code
Extract features based on spatial relationships
examples in code
Key functions: st_filter()
, st_intersects()
, st_touches()
, st_within()
The .predicate
tells st_filter()
what spatial relationship to look for:
r predicate-structure # Basic structure: st_filter(data_to_filter, reference_geometry, .predicate = relationship)
Different questions need different relationships:
st_touches
st_within
st_intersects
Default: If no .predicate
specified, uses st_intersects
For county neighbors, why st_touches
not st_intersects
?
example in code
st_intersects includes the reference feature itself!
Predicate | Definition | Policy Use Case |
---|---|---|
st_intersects() |
Any overlap at all | “Counties affected by flooding” |
st_touches() |
Share boundary, no interior overlap | “Neighboring counties” |
st_within() |
Completely inside | “Schools within district boundaries” |
st_contains() |
Completely contains | “Districts containing hospitals” |
st_overlaps() |
Partial overlap | “Overlapping service areas” |
st_disjoint() |
No spatial relationship | “Counties separate from urban areas” |
Most common: st_intersects()
(any overlap) and st_touches()
(neighbors)
Combine datasets based on spatial relationships
examples in codes
examples in codes
`{r areas} # Calculate county areas
examples in codes
Important: Units depend on coordinate reference system!
The dot (.) is a placeholder that represents the data being passed through the pipe (%>%).
pa_counties <- pa_counties %>% mutate( area_sqkm = as.numeric(st_area(.)) / 1000000 )
The . refers to pa_counties
- the data frame being passed through the pipe. So this is equivalent to:
pa_counties <- pa_counties %>% mutate( area_sqkm = as.numeric(st_area(pa_counties)) / 1000000 )
Create zones around features
`{r buffers} # 10km buffer around all hospitals hospital_buffers <- hospitals %>% st_buffer(dist = 10000) # 10,000 meters
hospital_buffers <- hospitals %>% mutate( buffer_size = case_when( type == “Major Medical Center” ~ 15000, type == “Community Hospital” ~ 10000, type == “Clinic” ~ 5000 ) ) %>% st_buffer(dist = .$buffer_size) `
Policy application: Service accessibility zones, environmental impact areas
Find overlapping areas
The Key Difference st_filter()
with predicates: Selects complete features (keeps or removes entire rows)
st_intersection()
and st_union()
: Modifies geometries (creates new shapes)
Use st_filter()
when:
Use st_intersection()
when:
Combine multiple features into one
Summarize data across spatial boundaries
The Earth is round, maps are flat
Problems:
Example: Measuring areas in latitude/longitude gives wrong answers
Geographic Coordinate Systems (GCS):
Projected Coordinate Systems (PCS):
WGS84 (EPSG:4326)
Web Mercator (EPSG:3857)
State Plane / UTM zones
Albers Equal Area
To simply check current CRS st_crs(pa_counties)
To set CRS (ONLY if missing) pa_counties <- st_set_crs(pa_counties, 4326)
Transform to different CRS
Pennsylvania South State Plane (good for PA analysis)
pa_counties_projected <- pa_counties %>% st_transform(crs = 3365)
Transform to Albers Equal Area (good for area calculations) pa_counties_albers <- pa_counties %>% st_transform(crs = 5070)
`
Transform when:
Typical spatial analysis steps:
Research question: Which communities have poor access to healthcare?
Step 1: Load data
Step 2: Transform to local projection
counties <- st_transform(counties, 3365) hospitals <- st_transform(hospitals, 3365)
Step 3: Calculate access
hospital_buffers <- hospitals %>% st_buffer(dist = 25000) # 25km access zone
county_access <- counties %>% mutate( has_hospital = st_intersects(., st_union(hospital_buffers), sparse = FALSE), distance_to_nearest = as.numeric(st_distance(st_centroid(.), hospitals)) )
Step 4: Join with demographic data and analyze