Assignment 2: Spatial Analysis and Visualization

Healthcare Access and Equity in Pennsylvania

Author

Your Name Here

Published

December 8, 2025


Part 1: Healthcare Access for Vulnerable Populations

Your Task:

# Load required packages
library(sf)
library(dplyr)
library(tigris)
options(tigris_use_cache = TRUE, tigris_class = "sf")
library(tidycensus)
library(knitr)
library(ggplot2)
library(scales)

# Load spatial data
# counties: shapefile in data/
pa_counties <- st_read("data/Pennsylvania_County_Boundaries.shp", quiet = TRUE)
# hospitals: GeoJSON in data/
hospitals <- st_read("data/hospitals.geojson", quiet = TRUE)
# census tracts: from tigris
census_tracts <- tracts(state = "PA", cb = TRUE, year = 2023)

# Make CRS consistent with counties
hospitals     <- st_transform(hospitals,     st_crs(pa_counties))
census_tracts <- st_transform(census_tracts, st_crs(pa_counties))

# Check that all data loaded correctly
list(
  rows_counties   = nrow(pa_counties),
  rows_hospitals  = nrow(hospitals),
  rows_tracts     = nrow(census_tracts),
  crs_counties    = st_crs(pa_counties)$input,
  crs_hospitals   = st_crs(hospitals)$input,
  crs_tracts      = st_crs(census_tracts)$input,
  geom_counties   = unique(st_geometry_type(pa_counties)),
  geom_hospitals  = unique(st_geometry_type(hospitals)),
  geom_tracts     = unique(st_geometry_type(census_tracts))
)
$rows_counties
[1] 67

$rows_hospitals
[1] 223

$rows_tracts
[1] 3445

$crs_counties
[1] "WGS 84 / Pseudo-Mercator"

$crs_hospitals
[1] "WGS 84 / Pseudo-Mercator"

$crs_tracts
[1] "WGS 84 / Pseudo-Mercator"

$geom_counties
[1] MULTIPOLYGON
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

$geom_hospitals
[1] POINT
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

$geom_tracts
[1] MULTIPOLYGON
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

There are 223 hospitals in Pennsylvania according to the provided dataset. There are 3,445 census tracts across the state. All three layers (counties, hospitals, tracts) are currently in WGS 84 / Pseudo-Mercator (EPSG:3857). —

Step 2: Get Demographic Data

# Get demographic data from ACS
# 1) Set ACS year
acs_year <- 2022   # most recent 5-year estimates

# 2) Variables:
# B01003_001 = total population
# B19013_001 = median household income
# B01001_020 + B01001_021 + B01001_044 + B01001_045 = age 65+ (male + female groups)
vars <- c(
  total_pop = "B01003_001",
  median_income = "B19013_001",
  male65_66   = "B01001_020",
  male67_69   = "B01001_021",
  male70_74   = "B01001_022",
  male75_79   = "B01001_023",
  male80_84   = "B01001_024",
  male85plus  = "B01001_025",
  female65_66 = "B01001_044",
  female67_69 = "B01001_045",
  female70_74 = "B01001_046",
  female75_79 = "B01001_047",
  female80_84 = "B01001_048",
  female85plus= "B01001_049"
)

# 3) Pull ACS data at tract level for PA
pa_demo <- get_acs(
  geography = "tract",
  state = "PA",
  variables = vars,
  year = acs_year,
  survey = "acs5",
  output = "wide"
)

# 4) Compute elderly population (sum male+female 65+)
pa_demo <- pa_demo %>%
  mutate(
    over65 = male65_66E + male67_69E + male70_74E + male75_79E + 
             male80_84E + male85plusE + 
             female65_66E + female67_69E + female70_74E +
             female75_79E + female80_84E + female85plusE
  ) %>%
  select(GEOID, total_popE, median_incomeE, over65)

# 5) Join to tract boundaries
census_tracts_demo <- census_tracts %>%
  left_join(pa_demo, by = "GEOID")

# How many tracts missing income?
missing_income <- sum(is.na(census_tracts_demo$median_incomeE))
# Median income across all tracts (ignoring missing)
overall_median_income <- median(census_tracts_demo$median_incomeE, na.rm = TRUE)
list(
  acs_year_used = acs_year,
  n_tracts = nrow(census_tracts_demo),
  missing_income_tracts = missing_income,
  median_income_all_tracts = overall_median_income
)
$acs_year_used
[1] 2022

$n_tracts
[1] 3445

$missing_income_tracts
[1] 62

$median_income_all_tracts
[1] 70188

ACS year used: 2022 (5-year estimates) Number of census tracts in dataset: 3,445 Tracts with missing income data: 62 Median household income across all PA tracts: $70,188 —

Step 3: Define Vulnerable Populations

# Filter for vulnerable tracts based on your criteria
# 1) Define thresholds
income_threshold <- 50000 # $50,000 income
elderly_pct_threshold <- 0.17 # 17% elderly

# 2) Compute elderly percentage
census_tracts_demo <- census_tracts_demo %>%
  mutate(
    pct_over65 = over65 / total_popE,
    vulnerable = if_else(median_incomeE < income_threshold & pct_over65 > elderly_pct_threshold, 1, 0)
  )

# 3) Summary counts
n_vulnerable <- sum(census_tracts_demo$vulnerable, na.rm = TRUE)
n_total <- nrow(census_tracts_demo)
pct_vulnerable <- round(100 * n_vulnerable / n_total, 1)

list(
  income_threshold = income_threshold,
  elderly_threshold = elderly_pct_threshold,
  vulnerable_tracts = n_vulnerable,
  pct_vulnerable = pct_vulnerable
)
$income_threshold
[1] 50000

$elderly_threshold
[1] 0.17

$vulnerable_tracts
[1] 253

$pct_vulnerable
[1] 7.3

I chose $50,000 as the income threshold. This is well below Pennsylvania’s overall tract-level median income of about $70,000. I chose 17% elderly population as the threshold, which is slightly above the national average (~16–17%). There are 524 census tracts in Pennsylvania that meet the vulnerability criteria. About 7.3% of Pennsylvania’s 3,445 census tracts fall into this vulnerable category.


Step 4: Calculate Distance to Hospitals

# Transform to appropriate projected CRS
census_tracts_proj <- st_transform(census_tracts_demo, 5070)
hospitals_proj <- st_transform(hospitals, 5070)

# Calculate distance from each tract centroid to nearest hospital
tract_centroids <- st_centroid(census_tracts_proj)
dist_matrix <- st_distance(tract_centroids, hospitals_proj)
tract_centroids$nearest_hosp_dist_m <- apply(dist_matrix, 1, min)
tract_centroids$nearest_hosp_dist_mi <- tract_centroids$nearest_hosp_dist_m / 1609.34

vulnerable_tracts_dist <- tract_centroids %>%
  filter(vulnerable == 1) %>%
  st_drop_geometry() %>%
  select(GEOID, median_incomeE, pct_over65, nearest_hosp_dist_mi)

summary_stats <- summary(vulnerable_tracts_dist$nearest_hosp_dist_mi)
summary_stats
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.02107  0.83908  1.80602  3.27466  3.55821 18.64304 

This projection uses meters as its unit, which avoids distortions from geographic CRS like WGS84 (degrees), so distances can be directly converted to miles.

The average distance is about 3.27 miles. The maximum distance from a vulnerable tract centroid to the nearest hospital is about 18.6 miles. By checking the distance distribution, only a small number of vulnerable tracts (likely fewer than 5) are more than 15 miles away. —

Step 5: Identify Underserved Areas

# Create underserved variable
tract_centroids <- tract_centroids %>%
  mutate(
    underserved = if_else(vulnerable == 1 & nearest_hosp_dist_mi > 15, 1, 0)
  )

n_underserved <- sum(tract_centroids$underserved, na.rm = TRUE)
n_vulnerable <- sum(tract_centroids$vulnerable, na.rm = TRUE)
pct_underserved <- round(100 * n_underserved / n_vulnerable, 1)

list(
  underserved_tracts = n_underserved,
  vulnerable_tracts_total = n_vulnerable,
  pct_vulnerable_underserved = pct_underserved
)
$underserved_tracts
[1] 7

$vulnerable_tracts_total
[1] 253

$pct_vulnerable_underserved
[1] 2.8

7 tracts meet the definition. 2.8% of vulnerable tracts are undeserved.This result is not very surprising. Most of Pennsylvania’s population—and thus its healthcare infrastructure—is concentrated in urban and suburban counties. Rural areas may have some underserved tracts, but they are relatively few in number.


Step 6: Aggregate to County Level

# Spatial join tracts to counties
pa_counties_proj <- st_transform(pa_counties, 5070)
tract_centroids_proj <- st_transform(tract_centroids, 5070)
tracts_with_county <- st_join(tract_centroids_proj, pa_counties_proj, join = st_within)

# Aggregate statistics by county
county_stats <- tracts_with_county %>%
  st_drop_geometry() %>%
  group_by(COUNTY_NAM) %>%
  summarize(
    vulnerable_tracts = sum(vulnerable, na.rm = TRUE),
    underserved_tracts = sum(underserved, na.rm = TRUE),
    pct_underserved = ifelse(vulnerable_tracts > 0,
                             100 * underserved_tracts / vulnerable_tracts, NA),
    avg_distance_vuln_mi = mean(nearest_hosp_dist_mi[vulnerable == 1], na.rm = TRUE),
    vulnerable_population = sum(over65[vulnerable == 1], na.rm = TRUE) # 65+ as proxy for vulnerable pop
  ) %>%
  arrange(desc(pct_underserved))

head(county_stats, 10)
# A tibble: 10 × 6
   COUNTY_NAM vulnerable_tracts underserved_tracts pct_underserved
   <chr>                  <dbl>              <dbl>           <dbl>
 1 CAMERON                    1                  1           100  
 2 FOREST                     2                  2           100  
 3 MONROE                     1                  1           100  
 4 CLEARFIELD                 3                  2            66.7
 5 JUNIATA                    2                  1            50  
 6 ADAMS                      1                  0             0  
 7 ALLEGHENY                 49                  0             0  
 8 ARMSTRONG                  1                  0             0  
 9 BEAVER                     8                  0             0  
10 BEDFORD                    2                  0             0  
# ℹ 2 more variables: avg_distance_vuln_mi <dbl>, vulnerable_population <dbl>

Cameron (100%) – 1 vulnerable tract, and it is underserved. Forest (100%) – 2 vulnerable tracts, both underserved. Monroe (100%) – 1 vulnerable tract, underserved. Clearfield (66.7%) – 3 vulnerable tracts, 2 underserved. Juniata (50%) – 2 vulnerable tracts, 1 underserved. From the current table, counties like Cameron, Forest, and Clearfield stand out, because although they don’t have many tracts, nearly all of their vulnerable tracts are underserved and the average distance to the nearest hospital is 15–19 miles. Yes. The underserved counties are concentrated in rural and sparsely populated areas (Northern and Central Pennsylvania: Cameron, Forest, Clearfield, Juniata). These counties have few hospitals and residents often face long travel times. —

Step 7: Create Summary Table

# Create and format priority counties table
priority_counties <- county_stats %>%
  mutate(
    `% Underserved` = round(pct_underserved, 1),
    `Avg Distance (mi)` = round(avg_distance_vuln_mi, 1),
    `Vulnerable Pop (65+)` = comma(vulnerable_population)
  ) %>%
  arrange(desc(`% Underserved`), desc(as.numeric(gsub(",", "", `Vulnerable Pop (65+)`)))) %>%
  slice(1:10) %>%
  select(
    County = COUNTY_NAM,
    `Vulnerable Tracts` = vulnerable_tracts,
    `Underserved Tracts` = underserved_tracts,
    `% Underserved`,
    `Avg Distance (mi)`,
    `Vulnerable Pop (65+)`
  )

kable(
  priority_counties,
  caption = "Top 10 Pennsylvania Counties with Highest Priority for Healthcare Investment (based on underserved vulnerable tracts and population needs)",
  align = "lccccc",
  format = "markdown"
)
Top 10 Pennsylvania Counties with Highest Priority for Healthcare Investment (based on underserved vulnerable tracts and population needs)
County Vulnerable Tracts Underserved Tracts % Underserved Avg Distance (mi) Vulnerable Pop (65+)
FOREST 2 2 100.0 18.3 1,593
CAMERON 1 1 100.0 18.6 428
MONROE 1 1 100.0 17.6 314
CLEARFIELD 3 2 66.7 15.5 2,083
JUNIATA 2 1 50.0 12.6 1,332
PHILADELPHIA 38 0 0.0 1.0 32,275
ALLEGHENY 49 0 0.0 2.2 27,071
WESTMORELAND 14 0 0.0 3.2 8,023
LUZERNE 10 0 0.0 3.2 7,619
FAYETTE 9 0 0.0 3.6 7,158

Part 2: Comprehensive Visualization

Map 1: County-Level Choropleth

Your Task:

# Create detailed tract-level map
library(dplyr)
library(ggplot2)
library(sf)
library(scales)

pa_counties_map <- pa_counties_proj %>%
  left_join(
    county_stats %>%
      mutate(pct_underserved = as.numeric(pct_underserved)),
    by = c("COUNTY_NAM")
  )

ggplot() +
  geom_sf(
    data = pa_counties_map,
    aes(fill = pct_underserved),
    color = "white", size = 0.25
  ) +
  geom_sf(
    data = hospitals_proj,
    shape = 21, fill = "black", color = "white", size = 1.4, alpha = 0.7
  ) +
  scale_fill_viridis_c(
    name = "% Underserved (of vulnerable tracts)",
    labels = function(x) paste0(x, "%"),
    na.value = "grey90",
    option = "-magma",
    limits = c(0, 100)
  ) +
  labs(
    title = "Healthcare Access Challenges in Pennsylvania",
    subtitle = "Percent of vulnerable counties that are >15 miles from the nearest hospital",
    caption = "Sources: ACS 2022 5-year (tidycensus), lecture hospital data, county boundaries"
  ) +
  theme_void() +
  theme(
    legend.position = "right",
    legend.title = element_text(size = 10, face = "bold"),
    legend.text  = element_text(size = 9),
    plot.title   = element_text(size = 14, face = "bold"),
    plot.subtitle= element_text(size = 11),
    plot.caption = element_text(size = 8, color = "grey30"),
    panel.background = element_rect(fill = "grey98", color = NA)
  )

Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels


Map 2: Detailed Vulnerability Map

tracts_final <- census_tracts_demo %>%
  left_join(
    tract_centroids %>%
      st_drop_geometry() %>%
      select(GEOID, nearest_hosp_dist_mi, vulnerable, underserved),
    by = "GEOID"
  )

ggplot() +
  geom_sf(
    data = tracts_final,
    aes(fill = factor(underserved, levels = c(0, 1), labels = c("Not Underserved", "Underserved"))),
    color = "white", size = 0.1
  ) +
  geom_sf(
    data = pa_counties_proj,
    fill = NA, color = "black", size = 0.3
  ) +
  geom_sf(
    data = hospitals_proj,
    shape = 21, fill = "black", color = "white", size = 1.2, alpha = 0.7
  ) +
  scale_fill_manual(
    name = "Vulnerability Status",
    values = c("Not Underserved" = "#4C78A8", "Underserved" = "#D62728"),
    na.value = "grey90"
  ) +
  labs(
    title = "Underserved Vulnerable Census Tracts in Pennsylvania",
    subtitle = "Vulnerable tracts are those with low income and high elderly populations",
    caption = "Sources: ACS 2022 (tidycensus), lecture hospital data, county boundaries"
  ) +
  theme_void() +
  theme(
    legend.position = "right",
    legend.title = element_text(size = 10, face = "bold"),
    legend.text  = element_text(size = 9),
    plot.title   = element_text(size = 14, face = "bold"),
    plot.subtitle= element_text(size = 11),
    plot.caption = element_text(size = 8, color = "grey30"),
    panel.background = element_rect(fill = "grey98", color = NA)
  )

Chart: Distribution Analysis

Create a visualization showing the distribution of distances to hospitals for vulnerable populations.

# Create distribution visualization
vuln <- tract_centroids %>%
  st_drop_geometry() %>%
  filter(vulnerable == 1)

ggplot(vuln, aes(x = nearest_hosp_dist_mi)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "#4C78A8", color = "white", alpha = 0.9) +
  geom_density(linewidth = 1) +
  labs(
    title = "Distance to Nearest Hospital (Vulnerable Tracts)",
    x = "Distance (miles)",
    y = "Density",
    caption = "Most vulnerable tracts are located within 5 miles of the nearest hospital, though a small number of rural tracts face much longer travel distances (up to ~18 miles)."
  ) +
  theme_minimal(base_size = 11)


Part 3: Own Data Analysis

# Load your additional dataset
# ---- Load Schools (fixed) ----
library(sf)
library(dplyr)

schools_path <- "data/schools.geojson"

schools <- st_read(schools_path, quiet = TRUE) |>
  st_make_valid() |>
  st_transform(2272) |>
  mutate(
    enrollment_num = suppressWarnings(as.numeric(gsub("[^0-9.]", "", enrollment))),
    type_specific  = as.character(type_specific),
    grade_level    = as.character(grade_level),
    grade_org      = as.character(grade_org)
  )

n_schools <- nrow(schools)
crs_label <- st_crs(schools)$input
bbox_ft   <- st_bbox(schools)

type_by_specific <- schools |>
  st_drop_geometry() |>
  count(type_specific, name = "n", sort = TRUE)

type_by_code <- schools |>
  st_drop_geometry() |>
  count(type, name = "n", sort = TRUE)

by_grade_level <- schools |>
  st_drop_geometry() |>
  count(grade_level, name = "n", sort = TRUE)

by_grade_org <- schools |>
  st_drop_geometry() |>
  count(grade_org, name = "n", sort = TRUE)

enroll_summary <- schools |>
  st_drop_geometry() |>
  summarize(
    records_with_enrollment = sum(!is.na(enrollment_num)),
    median_enrollment       = median(enrollment_num, na.rm = TRUE),
    mean_enrollment         = mean(enrollment_num, na.rm = TRUE)
  )

list(
  data_source = paste0("OpenDataPhilly · ", basename(schools_path)),
  crs = crs_label,
  rows = n_schools,
  bbox_ft = bbox_ft,
  type_by_specific_head = head(type_by_specific, 10),
  type_by_code = type_by_code,
  grade_level = by_grade_level,
  grade_org = by_grade_org,
  enrollment_summary = enroll_summary
)
$data_source
[1] "OpenDataPhilly · schools.geojson"

$crs
[1] "EPSG:2272"

$rows
[1] 495

$bbox_ft
     xmin      ymin      xmax      ymax 
2664437.3  219220.6 2745960.0  301940.7 

$type_by_specific_head
  type_specific   n
1      DISTRICT 235
2       CHARTER 102
3       PRIVATE  99
4   ARCHDIOCESE  40
5    CONTRACTED  19

$type_by_code
  type   n
1    1 254
2    3 139
3    2 102

$grade_level
              grade_level   n
1       ELEMENTARY/MIDDLE 202
2             HIGH SCHOOL 103
3       ELEMENTARY SCHOOL  85
4  ELEMENTARY/MIDDLE/HIGH  35
5           MIDDLE SCHOOL  31
6             MIDDLE/HIGH  21
7      PRE-K/KINDERGARTEN   9
8                UNGRADED   3
9        PRE-K/ELEMENTARY   2
10             ELEMENTARY   1
11 ELEMENTARY?MIDDLE/HIGH   1
12         SPECIAL CENTER   1
13                   <NA>   1

$grade_org
   grade_org   n
1        K-8 152
2       9-12 103
3     PREK-8  50
4        K-5  45
5       K-12  18
6        5-8  16
7        6-8  16
8        K-4  12
9    PREK-12  12
10      6-12  10
11    PREK-K  10
12       K-6   7
13      7-12   6
14    PREK-5   5
15      5-12   4
16      1-12   3
17       K-3   3
18       K-1   2
19       K-2   2
20    PREK-3   2
21    PREK-4   2
22  UNGRADED   2
23       1-8   1
24        12   1
25       2-5   1
26      3-12   1
27       3-5   1
28       3-6   1
29       3-8   1
30       4-8   1
31      7-10   1
32      8-12   1
33       K-7   1
34    PREK-6   1
35    PREK-7   1

$enrollment_summary
  records_with_enrollment median_enrollment mean_enrollment
1                     333               446        548.4715

Schools (School Facilities) – Philadelphia. I chose it to assess educational access equity. School locations allow tract-level coverage analysis and policy-relevant siting insights.

Source: OpenDataPhilly, “School Facilities” (GeoJSON). Date: Retrieved for this assignment (documented in code and file list). The layer is maintained by the City of Philadelphia Department of Planning and Development.

495 features in the loaded file.

The file opened in a geographic CRS (WGS 84). I transformed it to EPSG:2272 (NAD83 / Pennsylvania South ftUS) to support feet/miles buffers and area calculations appropriate for Philadelphia. —

Pose a research question

Which Philadelphia census tracts with high numbers of school-age children lack walk access (0.5-mile) to an elementary school, and therefore warrant priority investment?


  1. Conduct spatial analysis

Use at least TWO spatial operations to answer your research question.

Required operations (choose 2+): - Buffers - Spatial joins - Spatial filtering with predicates - Distance calculations - Intersections or unions - Point-in-polygon aggregation

Your Task:

acs_year   <- 2022
crs_feet   <- 2272
half_mile  <- 0.25 * 5280

stopifnot(inherits(schools, "sf"), st_crs(schools)$epsg == crs_feet)

elem_schools <- schools |>
  mutate(
    go = toupper(trimws(as.character(grade_org))),
    gl = toupper(trimws(as.character(grade_level))),
    serves_elem = grepl("PREK|K|K-|K–|K TO|K/|^K$|^PREK", go) |
                  grepl("^-?0?5|^-?0?6|-5|-6", go) |
                  grepl("ELEMENTARY", gl)
  ) |>
  filter(serves_elem) |>
  select(school_name, type_specific, grade_org, grade_level)

phl_tracts <- tracts(state = "PA", county = "Philadelphia", cb = TRUE, year = 2023) |>
  st_transform(crs_feet) |>
  select(GEOID)

elem_buf <- elem_schools |>
  st_buffer(half_mile) |>
  st_union() |>
  st_make_valid()

phl_tracts <- phl_tracts |>
  mutate(tract_area_ft2 = as.numeric(st_area(geometry)))

covered_parts <- suppressWarnings(st_intersection(phl_tracts, elem_buf))

cover_by_tract <- covered_parts |>
  mutate(part_area_ft2 = as.numeric(st_area(geometry))) |>
  st_drop_geometry() |>
  group_by(GEOID) |>
  summarize(covered_area_ft2 = sum(part_area_ft2), .groups = "drop")

phl_access_elem <- phl_tracts |>
  left_join(cover_by_tract, by = "GEOID") |>
  mutate(
    covered_area_ft2 = ifelse(is.na(covered_area_ft2), 0, covered_area_ft2),
    coverage_pct = pmin(100 * covered_area_ft2 / tract_area_ft2, 100)
  )

vars_children <- c(
  male_5_9="B01001_004", male_10_14="B01001_005",
  fem_5_9 ="B01001_028", fem_10_14 ="B01001_029"
)

phl_kids <- get_acs(
  geography = "tract", state = "PA", county = "Philadelphia",
  variables = vars_children, survey = "acs5", year = acs_year, output = "wide"
) |>
  transmute(
    GEOID = as.character(GEOID),
    child_5_14 = male_5_9E + male_10_14E +
                 fem_5_9E  + fem_10_14E 
  )

phl_access_elem <- phl_access_elem |>
  mutate(GEOID = as.character(GEOID)) |>
  left_join(phl_kids, by = "GEOID")

q_child <- quantile(phl_access_elem$child_5_14, 0.75, na.rm = TRUE)

phl_access_elem <- phl_access_elem |>
  mutate(
    edu_desert    = coverage_pct <= 10,     
    high_children = child_5_14 >= q_child,    
    priority      = edu_desert & high_children
  )

elem_summary <- phl_access_elem |>
  st_drop_geometry() |>
  summarize(
    tracts      = n(),
    deserts     = sum(edu_desert, na.rm = TRUE),
    pct_deserts = round(100 * deserts / tracts, 1),
    priority_n  = sum(priority, na.rm = TRUE),
    kids_desert = sum(child_5_14[edu_desert], na.rm = TRUE)
  )
print(elem_summary)
  tracts deserts pct_deserts priority_n kids_desert
1    408      47        11.5          2        8839
ggplot() +
  geom_sf(data = phl_access_elem, aes(fill = coverage_pct), color = "white", size = 0.15) +
  geom_sf(data = elem_schools, shape = 21, fill = "black", color = "white", size = 1.2, alpha = 0.85) +
  geom_sf(data = phl_access_elem |> filter(priority), fill = NA, color = "#D62728", linewidth = 0.9) +
  scale_fill_viridis_c(
    name = "Elem. buffer coverage",
    labels = label_percent(accuracy = 1, scale = 1),
    limits = c(0, 100), na.value = "grey90", option = "viridis"
  ) +
  labs(
    title = "Philadelphia Elementary School Walk Access (0.5 mile)",
    subtitle = "Priority tracts: high child counts with ≤10% elementary coverage",
    caption = "Sources: OpenDataPhilly School Facilities; ACS 2022 (tidycensus); TIGER/Line tracts"
  ) +
  theme_void() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"),
    plot.title    = element_text(face = "bold")
  )

phl_access_elem |>
  st_drop_geometry() |>
  filter(priority) |>
  select(GEOID, coverage_pct, child_5_14) |>
  arrange(desc(child_5_14))
        GEOID coverage_pct child_5_14
1 42101034802     4.242087       1077
2 42101009300     8.411688        694

Why this projection?

I run all operations in EPSG:2272 (NAD83 / Pennsylvania South ftUS). This StatePlane system minimizes distortion over Philadelphia and uses feet, which fits walk buffers and area calculations in miles/feet. The city publishes many layers in this CRS, so joins stay precise and distances do not inherit Web-Mercator scale errors.

Why a 0.25-mile threshold for younger students?

0.25 mile ≈ 1,320 ft ≈ a 5-minute walk at ~3 mph (3 mi/hr → 264 ft/min; 5 min → 1,320 ft). A five-minute target reflects comfort and supervision needs for younger children. Elementary pupils often walk more slowly, face more frequent stops, and require safer crossings. A 0.25-mile standard sets a conservative, child-appropriate access goal; 0.5 mile remains a reasonable secondary benchmark for older students and caregivers.

The map shows high elementary walk access across the urban core. Coverage drops at the edges of the city and in a few industrial or low-density tracts. With a 0.5-mile buffer, only a small share of tracts qualify as “deserts,” and most do not have large child populations. A stricter 0.25-mile buffer reveals more low-coverage tracts and brings a few high-child-count areas (42101034802 and 42101009300) into view. These priority tracts warrant near-term attention through safe-routes upgrades, school siting adjustments, or co-located youth services.

Finally - A few comments about your incorporation of feedback!

After the first assignment, I focused on making my analysis more structured and easier to read. I reorganized the Markdown file by grouping related visuals and adding short introductory sentences before each figure to explain its purpose.