Assignment 2: Spatial Analysis and Visualization

Healthcare Access and Equity in Pennsylvania

Author

Sihan Yu

Published

October 14, 2025

Assignment Overview

Learning Objectives: - Apply spatial operations to answer policy-relevant research questions - Integrate census demographic data with spatial analysis - Create publication-quality visualizations and maps - Work with spatial data from multiple sources - Communicate findings effectively for policy audiences

Part 1: Healthcare Access for Vulnerable Populations

Research Question

Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?

Step 1: Data Collection (5 points)

Load the required spatial data: - Pennsylvania county boundaries - Pennsylvania hospitals (from lecture data) - Pennsylvania census tracts

Your Task:

# Load required packages

library(sf)
library(tidyverse)
library(tigris)
library(tidycensus)
library(scales)
library(patchwork)
library(here)

# Set Census API key
census_api_key("e173851c633e89c20243632174db68f63bac8856")

# Load spatial data

pa_counties <- st_read(here("assignments/assignments2/data/Pennsylvania_County_Boundaries.shp"))

Reading layer `Pennsylvania_County_Boundaries' from data source 
  `D:\MUSA5080_Github\portfolio-setup-sihan-yu429\assignments\assignments2\data\Pennsylvania_County_Boundaries.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 67 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8963377 ymin: 4825316 xmax: -8314404 ymax: 5201413
Projected CRS: WGS 84 / Pseudo-Mercator

hospitals <- st_read(here("assignments/assignments2/data/hospitals.geojson"))

Reading layer `hospitals' from data source 
  `D:\MUSA5080_Github\portfolio-setup-sihan-yu429\assignments\assignments2\data\hospitals.geojson' 
  using driver `GeoJSON'
Simple feature collection with 223 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.49621 ymin: 39.75163 xmax: -74.86704 ymax: 42.13403
Geodetic CRS:  WGS 84

census_tracts <- tracts(state = "PA", cb = TRUE, progress_bar = FALSE)


# Standardize CRS
hospitals <- st_transform(hospitals, st_crs(pa_counties))
census_tracts <- st_transform(census_tracts, st_crs(pa_counties))

# check that all data loaded correctly

# Number of hospitals
n_hospitals <- nrow(hospitals)


# Number of census tracts
n_tracts <- nrow(census_tracts)

cat("Number of hospitals:", nrow(hospitals), "\n")

Number of hospitals: 223

cat("Number of census tracts:", nrow(census_tracts), "\n")

Number of census tracts: 3445

cat("\nCoordinate Reference Systems (CRS):\n")


Coordinate Reference Systems (CRS):

cat("Counties:", st_crs(pa_counties)$input, "\n")

Counties: WGS 84 / Pseudo-Mercator

cat("Hospitals:", st_crs(hospitals)$input, "\n")

Hospitals: WGS 84 / Pseudo-Mercator

cat("Census tracts:", st_crs(census_tracts)$input, "\n")

Census tracts: WGS 84 / Pseudo-Mercator

Check that all data loaded correctly

Questions to answer:
- How many hospitals are in your dataset?
There are 223 hospitals in the dataset.
- How many census tracts?
There are 3445 census tracts.
- What coordinate reference system is each dataset in?
They share the same coordinate reference system: WGS 84.

Step 2: Get Demographic Data

Use tidycensus to download tract-level demographic data for Pennsylvania.

Required variables: - Total population - Median household income - Population 65 years and over (you may need to sum multiple age categories)

Your Task:

# Get demographic data from ACS
#### Step 2: Get Demographic Data ----

pa_demo <- get_acs(
  geography = "tract",
  state = "PA",
  year = 2022,             
  survey = "acs5",
  variables = c(
    total_pop = "B01003_001",       # Total population
    median_income = "B19013_001",   # Median household income
    age_65_66 = "B01001_020",       # Male 65–66
    age_67_69 = "B01001_021",
    age_70_74 = "B01001_022",
    age_75_79 = "B01001_023",
    age_80_84 = "B01001_024",
    age_85_plus = "B01001_025",
    f_65_66 = "B01001_044",         # Female 65–66
    f_67_69 = "B01001_045",
    f_70_74 = "B01001_046",
    f_75_79 = "B01001_047",
    f_80_84 = "B01001_048",
    f_85_plus = "B01001_049"
  ),
  output = "wide"
)


pa_demo <- pa_demo %>%
  mutate(
    pop_65_over = age_65_66E + age_67_69E + age_70_74E + age_75_79E +
                  age_80_84E + age_85_plusE +
                  f_65_66E + f_67_69E + f_70_74E + f_75_79E +
                  f_80_84E + f_85_plusE
  ) %>%
  select(GEOID, total_pop = total_popE, median_income = median_incomeE, pop_65_over)



# Join to tract boundaries
census_tracts_demo <- census_tracts %>%
  left_join(pa_demo, by = "GEOID")


# 1. What year of ACS data are you using?
acs_year <- 2022
cat("ACS Year Used:", acs_year, "\n")

ACS Year Used: 2022

# 2. How many tracts have missing income data?
missing_income <- sum(is.na(census_tracts_demo$median_income))
cat("Number of tracts with missing income data:", missing_income, "\n")

Number of tracts with missing income data: 62

# 3. What is the median income across all PA census tracts?
median_income_pa <- median(census_tracts_demo$median_income, na.rm = TRUE)
cat("Median income across all PA census tracts:", dollar(median_income_pa), "\n")

Median income across all PA census tracts: $70,188

Questions to answer:
- What year of ACS data are you using?
I used data of 2022.
- How many tracts have missing income data?
There are 62 tracts missing.
- What is the median income across all PA census tracts?
The medium income across all PA census tracts is $70,188. —

Step 3: Define Vulnerable Populations

Identify census tracts with vulnerable populations based on TWO criteria: 1. Low median household income (choose an appropriate threshold) 2. Significant elderly population (choose an appropriate threshold)

Your Task:

# Filter for vulnerable tracts based on your criteria
income_threshold <- quantile(census_tracts_demo$median_income, 0.25, na.rm = TRUE)
elderly_threshold <- quantile(
  (census_tracts_demo$pop_65_over / census_tracts_demo$total_pop), 
  0.75, 
  na.rm = TRUE
)

cat("Low-income threshold:", scales::dollar(income_threshold), "\n")

Low-income threshold: $55,923.50

cat("High-elderly threshold (share of 65+):", scales::percent(elderly_threshold), "\n")

High-elderly threshold (share of 65+): 23%

census_tracts_demo <- census_tracts_demo %>%
  mutate(
    elderly_share = pop_65_over / total_pop
  )

vulnerable_tracts <- census_tracts_demo %>%
  filter(
    median_income <= income_threshold &
    elderly_share >= elderly_threshold
  )
cat("Number of vulnerable tracts identified:", nrow(vulnerable_tracts), "\n")

Number of vulnerable tracts identified: 165

total_tracts <- nrow(census_tracts_demo)
vulnerable_count <- nrow(vulnerable_tracts)
vulnerable_pct <- (vulnerable_count / total_tracts) * 100
cat("Percentage of PA census tracts considered vulnerable:", round(vulnerable_pct, 1), "%\n")

Percentage of PA census tracts considered vulnerable: 4.8 %

Questions to answer:
- What income threshold did you choose and why?
I used $55,923.50 has the income threshold. It is based on the percentiles of the general income data and this is the benchmarks of the bottom 25% of the income across Pennsylvania counties.
- What elderly population threshold did you choose and why?
I used the same method as for the income threshold. I used the top 25% elderly share(23%) as my threshold.
- How many tracts meet your vulnerability criteria?
There are 165 tracts meet my vulnerability criteria.
- What percentage of PA census tracts are considered vulnerable by your definition?
There are 4.8% of PA census tracts that are vulnerable. —

Step 4: Calculate Distance to Hospitals

For each vulnerable tract, calculate the distance to the nearest hospital.

Your Task:

# Transform to appropriate projected CRS
library(units)
vulnerable_tracts_proj <- st_transform(vulnerable_tracts, 5070)
hospitals_proj <- st_transform(hospitals, 5070)

tract_centroids <- st_centroid(vulnerable_tracts_proj)
# Calculate distance from each tract centroid to nearest hospital
dist_matrix <- st_distance(tract_centroids, hospitals_proj)
nearest_hospital_dist_m <- apply(dist_matrix, 1, min)

nearest_hospital_dist_miles <- as.numeric(nearest_hospital_dist_m)/1609.34

vulnerable_tracts_proj <- vulnerable_tracts_proj %>%
  mutate(distance_to_hospital_miles = nearest_hospital_dist_miles)

avg_distance <- mean(vulnerable_tracts_proj$distance_to_hospital_miles, na.rm = TRUE)
max_distance <- max(vulnerable_tracts_proj$distance_to_hospital_miles, na.rm = TRUE)
num_over_15 <- sum(vulnerable_tracts_proj$distance_to_hospital_miles > 15, na.rm = TRUE)

cat(
  "Average distance to nearest hospital (miles):", round(avg_distance, 2), "\n",
  "Maximum distance to nearest hospital (miles):", round(max_distance, 2), "\n",
  "Number of vulnerable tracts >15 miles from hospital:", num_over_15, "\n"
)

Average distance to nearest hospital (miles): 4.75 
 Maximum distance to nearest hospital (miles): 19.32 
 Number of vulnerable tracts >15 miles from hospital: 10

Requirements: - Use an appropriate projected coordinate system for Pennsylvania - Calculate distances in miles - Explain why you chose your projection

Questions to answer:
- What is the average distance to the nearest hospital for vulnerable tracts?
The average distance is 4.75 miles.
- What is the maximum distance?
The maximum distance is 19.16 miles.
- How many vulnerable tracts are more than 15 miles from the nearest hospital?
There are 9 tracts are more than 15 miles from the nearest hospital. —

Step 5: Identify Underserved Areas

Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.

Your Task:

# Create underserved variable

vulnerable_tracts_proj <- vulnerable_tracts_proj %>%
  mutate(underserved = distance_to_hospital_miles > 15)

num_underserved <- sum(vulnerable_tracts_proj$underserved, na.rm = TRUE)

# Percentage of vulnerable tracts that are underserved
perc_underserved <- num_underserved / nrow(vulnerable_tracts_proj) * 100

cat("Number of underserved tracts:", num_underserved, "\n")

Number of underserved tracts: 10

cat("Percentage of vulnerable tracts underserved:", round(perc_underserved, 2), "%\n")

Percentage of vulnerable tracts underserved: 6.06 %

Questions to answer:
- How many tracts are underserved?
There are 9 tracts are underserved.
- What percentage of vulnerable tracts are underserved?
There are 5.45% of the vulnerable tracts are underserved.
- Does this surprise you? Why or why not?
It doesn’t surprise me. It shows that the hospitals are relatively well_distributed in Pennsylvania that only a small amount of vulnerable tracts are more than 15 miles from the nearest hospital. The few underserved tract are likely in rural areas, however, most of the population concentrated in urban and suburban counties. —

Step 6: Aggregate to County Level

Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.

Your Task:

# Spatial join tracts to counties

pa_counties_proj <- st_transform(pa_counties, 5070)

tract_centroids <- st_centroid(vulnerable_tracts_proj)
tract_centroids <- cbind(tract_centroids, vulnerable_tracts_proj)
tracts_with_county <- st_join(tract_centroids, pa_counties_proj, join = st_within)

# Aggregate statistics by county
county_stats <- tracts_with_county %>%
  group_by(COUNTY_NAM) %>%          
  summarise(
    n_vulnerable_tracts = n(),
    n_underserved = sum(distance_to_hospital_miles > 15, na.rm = TRUE),
    pct_underserved = n_underserved / n_vulnerable_tracts * 100,
    avg_distance_miles = mean(distance_to_hospital_miles, na.rm = TRUE),
    total_vulnerable_pop = sum(total_pop, na.rm = TRUE)
  ) %>%
  arrange(desc(pct_underserved))

st_drop_geometry(county_stats)

# A tibble: 50 × 6
   COUNTY_NAM n_vulnerable_tracts n_underserved pct_underserved
 * <chr>                    <int>         <int>           <dbl>
 1 BRADFORD                     1             1           100  
 2 FOREST                       1             1           100  
 3 JUNIATA                      1             1           100  
 4 MONROE                       1             1           100  
 5 SULLIVAN                     1             1           100  
 6 CLEARFIELD                   4             2            50  
 7 DAUPHIN                      2             1            50  
 8 POTTER                       3             1            33.3
 9 CRAWFORD                     4             1            25  
10 ALLEGHENY                   22             0             0  
# ℹ 40 more rows
# ℹ 2 more variables: avg_distance_miles <dbl>, total_vulnerable_pop <dbl>

Required county-level statistics: - Number of vulnerable tracts - Number of underserved tracts
- Percentage of vulnerable tracts that are underserved - Average distance to nearest hospital for vulnerable tracts - Total vulnerable population

Questions to answer:
- Which 5 counties have the highest percentage of underserved vulnerable tracts?
The 5 counties are Bradford, Forest, Juniata, Monroe, Sullivan.They all have 1 tract that is vulnerable and undeserved.
- Which counties have the most vulnerable people living far from hospitals?
Allegheny County has the most vulnerable people ~51790 living far from hospitals.
- Are there any patterns in where underserved counties are located?
The underserved counties are located in northern and central rural Pennsyvania with relatively low population density. —

Step 7: Create Summary Table

Create a professional table showing the top 10 priority counties for healthcare investment.

Your Task:

# Create and format priority counties table
library(knitr)
library(dplyr)
library(scales)

priority_counties <- county_stats %>%
  arrange(desc(pct_underserved)) %>%
  mutate(
    pct_underserved = round(pct_underserved, 1),
    avg_distance_miles = round(avg_distance_miles, 1),
    total_vulnerable_pop = comma(total_vulnerable_pop)
  ) %>%
  select(
    County = COUNTY_NAM,
    `Vulnerable Tracts` = n_vulnerable_tracts,
    `Underserved Tracts` = n_underserved,
    `Underserved (%)` = pct_underserved,
    `Avg Distance to Hospital (miles)` = avg_distance_miles,
    `Total Vulnerable Population` = total_vulnerable_pop
  ) 

priority_counties_top10 <- priority_counties %>% slice(1:10)

kable(
  st_drop_geometry(priority_counties),
  caption = "Table: Top 10 Priority Counties for Healthcare Investment in Pennsylvania",
  align = "lccccr",
  format = "pipe"
)

Table: Top 10 Priority Counties for Healthcare Investment in Pennsylvania
County	Vulnerable Tracts	Underserved Tracts	Underserved (%)	Avg Distance to Hospital (miles)	Total Vulnerable Population
BRADFORD	1	1	100.0	16.7	5,466
FOREST	1	1	100.0	18.0	2,701
JUNIATA	1	1	100.0	16.0	1,782
MONROE	1	1	100.0	17.6	1,299
SULLIVAN	1	1	100.0	16.9	918
CLEARFIELD	4	2	50.0	11.3	13,056
DAUPHIN	2	1	50.0	10.0	8,410
POTTER	3	1	33.3	9.5	9,062
CRAWFORD	4	1	25.0	8.3	9,072
ALLEGHENY	22	0	0.0	2.8	51,790
ARMSTRONG	1	0	0.0	10.1	2,106
BEAVER	5	0	0.0	2.8	9,763
BEDFORD	2	0	0.0	8.0	7,578
BLAIR	3	0	0.0	3.1	7,602
CAMBRIA	6	0	0.0	4.8	13,551
CLARION	2	0	0.0	9.3	3,586
CLINTON	1	0	0.0	4.9	3,176
CUMBERLAND	1	0	0.0	2.0	2,438
DELAWARE	1	0	0.0	0.5	2,972
ELK	1	0	0.0	10.4	1,557
ERIE	4	0	0.0	3.5	12,216
FAYETTE	7	0	0.0	4.3	25,230
FRANKLIN	2	0	0.0	2.6	7,949
HUNTINGDON	1	0	0.0	14.2	2,558
INDIANA	2	0	0.0	10.2	5,703
JEFFERSON	2	0	0.0	5.6	4,566
LACKAWANNA	5	0	0.0	1.8	13,322
LANCASTER	1	0	0.0	6.5	4,387
LAWRENCE	2	0	0.0	4.4	4,200
LEBANON	1	0	0.0	6.3	3,107
LEHIGH	1	0	0.0	3.4	3,974
LUZERNE	7	0	0.0	3.3	21,062
LYCOMING	1	0	0.0	6.8	2,052
MCKEAN	3	0	0.0	6.0	7,832
MERCER	5	0	0.0	1.8	16,686
MONTGOMERY	1	0	0.0	1.3	2,069
NORTHAMPTON	3	0	0.0	2.6	10,073
NORTHUMBERLAND	4	0	0.0	11.2	9,087
PHILADELPHIA	16	0	0.0	1.0	61,873
SCHUYLKILL	3	0	0.0	2.2	11,194
SOMERSET	4	0	0.0	8.4	11,454
SUSQUEHANNA	1	0	0.0	5.8	1,612
TIOGA	1	0	0.0	0.2	3,459
UNION	2	0	0.0	2.6	9,762
VENANGO	1	0	0.0	8.1	4,785
WARREN	3	0	0.0	6.9	8,041
WASHINGTON	3	0	0.0	2.3	5,008
WAYNE	1	0	0.0	9.4	2,445
WESTMORELAND	11	0	0.0	2.9	25,093
YORK	4	0	0.0	1.8	11,199

Requirements: - Use knitr::kable() or similar for formatting - Include descriptive column names - Format numbers appropriately (commas for population, percentages, etc.) - Add an informative caption - Sort by priority (you decide the metric)

Part 2: Comprehensive Visualization

Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.

Map 1: County-Level Choropleth

Create a choropleth map showing healthcare access challenges at the county level.

Your Task:

# Create county-level access map
library(ggplot2)
library(scales)
library(dplyr)


pa_counties_map <- pa_counties_proj %>%
  left_join(
    priority_counties %>%
      st_drop_geometry() %>% 
      select(County, `Underserved (%)`),
    by = c("COUNTY_NAM" = "County")
  )

ggplot() +
  geom_sf(
    data = pa_counties_map,
    aes(fill = `Underserved (%)`),
    color = "white", size = 0.25
  ) +
  geom_sf(
    data = hospitals,
    shape = 21, fill = "white", color = "black", size = 2, alpha = 0.8
  ) +
  scale_fill_viridis_c(
    option = "Viridis",
    name = "% Underserved (of vulnerable tracts)",
    labels = function(x) paste0(x, "%"),
    na.value = "grey90",
    limits = c(0, 100)
  ) +
  labs(
    title = "Healthcare Access Challenges in Pennsylvania",
    subtitle = "Percent of vulnerable census tracts that are underserved (>15 miles from nearest hospital)",
    caption = "Sources: ACS 2022 5-year (tidycensus), lecture hospital data, county boundaries"
  ) +
  theme_void()

Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels

Map 2: Detailed Vulnerability Map

Create a map highlighting underserved vulnerable tracts.

Your Task:

# Create detailed tract-level map
vulnerable_tracts_proj <- vulnerable_tracts_proj %>%
  mutate(
    vulnerable= median_income <= income_threshold &
                elderly_share >= elderly_threshold
  ) %>%
  mutate(
    tract_status = case_when(
      vulnerable & underserved ~ "Underserved & Vulnerable",
      vulnerable & !underserved ~ "Served & Vulnerable",
      TRUE ~ "Other Tracts"
  )
)

  
ggplot()+
  geom_sf(data = pa_counties_proj, fill = "grey50", color = "white", size = 0.3) +
  geom_sf(
    data = vulnerable_tracts_proj,
    aes(fill = tract_status),
    color = NA,
    alpha = 0.8
  ) +
  geom_sf(
     data = hospitals, shape = 21, fill = "white", color = "black", size = 2, alpha = 0.8
  ) +
  scale_fill_manual(
    name = "Tract Status",
    values = c(
      "Underserved & Vulnerable" = "#d7191c",
      "Served & Vulnerable" = "#fdae61",
      "Other Tracts" = "#cccccc" 
    )
  ) +
  labs(
    title = "Underserved Vulnerable Tracts in Pennsylvania",
    subtitle = "Tracts identified as both socially vulnerable and >15 miles from nearest hospital",
    caption = "Sources: ACS 2022 5-year (tidycensus), hospital data, county boundaries"
  ) +
  theme_void()

Requirements: - Show underserved vulnerable tracts in a contrasting color - Include county boundaries for context - Show hospital locations - Use appropriate visual hierarchy (what should stand out?) - Include informative title and subtitle

Chart: Distribution Analysis

Create a visualization showing the distribution of distances to hospitals for vulnerable populations.

Your Task:

# Create distribution visualization
ggplot(vulnerable_tracts_proj, aes(x = distance_to_hospital_miles)) +
  geom_histogram(bins = 20, fill = "#4575b4", color = "white", alpha = 0.8) +
  geom_vline(xintercept = 15, color = "#d7191c", linetype = "dashed", size = 1) +
  labs(
    title = "Distribution of Distance to Nearest Hospital for Vulnerable Tracts",
    x = "Distance to Nearest Hospital (miles)",
    y = "Number of Vulnerable Tracts",
    caption = "Dashed line indicates 15-mile threshold for underserved classification"
  ) +
  theme_minimal(base_size = 12)

Suggested chart types: - Histogram or density plot of distances - Box plot comparing distances across regions - Bar chart of underserved tracts by county - Scatter plot of distance vs. vulnerable population size

Requirements: - Clear axes labels with units - Appropriate title - Professional formatting - Brief interpretation (1-2 sentences as a caption or in text)

Part 3: Bring Your Own Data Analysis

Environmental Justice

Option C: Green Space Equity - Data: Parks, Street Trees, Census tracts (race/income demographics) - Question: “Do low-income and minority neighborhoods have equitable access to green space?” - Operations: Buffer parks (10-minute walk = 0.5 mile), calculate tree canopy or park acreage per capita, compare by demographics - Policy relevance: Climate resilience, environmental justice, urban forestry investment — Your Task:

Find and load additional data
- Document your data source
- Check and standardize the CRS
- Provide basic summary statistics

# Load your additional dataset
library(sf)
parks <- st_read("PPR_Program_Sites.geojson")

Reading layer `PPR_Program_Sites' from data source 
  `D:\MUSA5080_Github\portfolio-setup-sihan-yu429\assignments\assignments2\PPR_Program_Sites.geojson' 
  using driver `GeoJSON'
Simple feature collection with 171 features and 10 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -75.2563 ymin: 39.90444 xmax: -74.96944 ymax: 40.12284
Geodetic CRS:  WGS 84

parks <- st_transform(parks, crs = 3857)  # Web Mercator


vars <- c(
  median_income = "B19013_001",
  total_pop = "B02001_001",
  white_pop = "B02001_002"
)

phl_tracts <- get_acs(
  geography = "tract",
  variables = vars,
  state = "PA",
  county = "Philadelphia",
  geometry = TRUE,   
  year = 2023,
  progress_bar = FALSE
) %>%
  select(GEOID, NAME, variable, estimate, geometry) %>%
  tidyr::pivot_wider(names_from = variable, values_from = estimate) %>%
  mutate(
    minority_share = (total_pop - white_pop) / total_pop * 100
  )

head(phl_tracts)

Simple feature collection with 6 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -75.24686 ymin: 39.91545 xmax: -75.15233 ymax: 39.94983
Geodetic CRS:  NAD83
# A tibble: 6 × 7
  GEOID       NAME                    geometry total_pop white_pop median_income
  <chr>       <chr>         <MULTIPOLYGON [°]>     <dbl>     <dbl>         <dbl>
1 42101001500 Censu… (((-75.16558 39.94366, -…      3027      2213        108378
2 42101001800 Censu… (((-75.1662 39.94081, -7…      3285      2425        121719
3 42101002802 Censu… (((-75.16735 39.92658, -…      5868      3525         94427
4 42101004001 Censu… (((-75.17002 39.92314, -…      4118      3165         82258
5 42101006300 Censu… (((-75.24686 39.91876, -…      4380        99         32997
6 42101007700 Censu… (((-75.21292 39.94906, -…      1815      1038         60817
# ℹ 1 more variable: minority_share <dbl>

summary(trees)

     Girth           Height       Volume     
 Min.   : 8.30   Min.   :63   Min.   :10.20  
 1st Qu.:11.05   1st Qu.:72   1st Qu.:19.40  
 Median :12.90   Median :76   Median :24.20  
 Mean   :13.25   Mean   :76   Mean   :30.17  
 3rd Qu.:15.25   3rd Qu.:80   3rd Qu.:37.30  
 Max.   :20.60   Max.   :87   Max.   :77.00

Questions to answer:
- What dataset did you choose and why?
I chose the Philadelphia parks dataset because public parks represent accessible green space for residents. Analyzing their distribution helps assess equity in green space access across different neighborhoods.
- What is the data source and date?
The dataset comes from OpenDataPhilly, specifically the “PPR Program Sites” dataset, most recently updated in 2023.
- How many features does it contain?
The dataset contains 171 park features.
- What CRS is it in? Did you need to transform it?
The parks dataset was originally in WGS84 (EPSG:4326). For consistency with other spatial layers and to facilitate distance and area calculations, I transformed it to EPSG:3857 (Web Mercator projection). —

Pose a research question

Do low-income and minority neighborhoods in Philadelphia have equitable access to green spaces within a 10-minute walk distance?

Conduct spatial analysis

Use at least TWO spatial operations to answer your research question.

Required operations (choose 2+): - Buffers - Spatial joins - Spatial filtering with predicates - Distance calculations - Intersections or unions - Point-in-polygon aggregation

Your Task:

# Your spatial analysis
# buffer a 0.5 mile distance from the park polygon ~10min
parks_buffer <- st_buffer(parks, dist = 800)
parks_buffer <- st_transform(parks_buffer, st_crs(phl_tracts))



#  Spatial join to identify tracts within 0.5 miles of any park 
census_access <- phl_tracts %>%
  mutate(has_access = lengths(st_intersects(geometry, parks_buffer)) > 0)

# Summary statistics 
access_summary <- census_access %>%
  group_by(income_group = case_when(
    median_income < median(median_income, na.rm = TRUE) ~ "Low Income",
    TRUE ~ "High Income"
  )) %>%
  summarise(
    total_tracts = n(),
    tracts_with_access = sum(has_access),
    pct_with_access = round(mean(has_access) * 100, 1)
  )

print(access_summary)

Simple feature collection with 2 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -75.28027 ymin: 39.86705 xmax: -74.95578 ymax: 40.13799
Geodetic CRS:  NAD83
# A tibble: 2 × 5
  income_group total_tracts tracts_with_access pct_with_access
  <chr>               <int>              <int>           <dbl>
1 High Income           218                190            87.2
2 Low Income            190                181            95.3
# ℹ 1 more variable: geometry <MULTIPOLYGON [°]>

ggplot() +
  geom_sf(data = census_access, aes(fill = median_income), color = "white", size = 0.1) +
  
  geom_sf(
    data = parks_buffer,
    fill = "#32CD32", alpha = 0.25, color = NA
  ) +
  
  geom_sf(
    data = parks,
    color = "#228B22", size = 1.2
  ) +

 scale_fill_gradientn(
  name = "Median Household Income ($)",
  colors = c("#4575b4", "#91bfdb", "#fdae61","orange", "#d73027")  # 蓝→浅蓝→浅橙→橙
  ) +
  
  labs(
    title = "Philadelphia Census Tracts by Income and Park Accessibility",
    subtitle = "Red shaded areas represent 500 m park buffer zones",
    caption = "Data: OpenDataPhilly Parks, ACS 2021"
  ) +
  
  theme_minimal()

Your interpretation:

The table shows that park accessibility is 87.2% in high-income areas and slightly higher at 95.3% in low-income areas, indicating that not all high-income tracts are near parks, though there is a cluster of parks in moderate-income areas in South Philly. Maps show that parks and housing are more concentrated in the city center and low-income areas, while high-income areas in the northwest, northeast, and south are larger but have more dispersed parks. This likely reflects land use and planning: low-density neighborhoods with private yards leave fewer public parks. Overall, park distribution appears influenced more by urban layout and planning than by income alone.

Finally - A few comments about your incorporation of feedback!

I really appreciated the detailed feedback on my first assignment. It suggested that I clean up unnecessary content and remove figures that didn’t need to be displayed. Following this advice, I tried to clean my Markdown document this time to try to make it look much cleaner and more professional. I hope to continue improving in this way with future assignments.