Assignment 2: Spatial Analysis and Visualization

Healthcare Access and Equity in Pennsylvania

Author

Ryan Drake

Published

October 14, 2025

Assignment Overview

Learning Objectives:

  • Apply spatial operations to answer policy-relevant research questions
  • Integrate census demographic data with spatial analysis
  • Create publication-quality visualizations and maps
  • Work with spatial data from multiple sources
  • Communicate findings effectively for policy audiences


Part 1: Healthcare Access for Vulnerable Populations

Research Question

Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?

Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.

Required Analysis Steps

Complete the following analysis, documenting each step with code and brief explanations:

Step 1: Data Collection (5 points)

Load the required spatial data:

  • Pennsylvania county boundaries
  • Pennsylvania hospitals (from lecture data)
  • Pennsylvania census tracts

Your Task:

# Load required packages
library(sf)
library(tidyverse)   # includes ggplot2
library(tigris)
library(tidycensus)
library(scales)
library(patchwork)
library(here)

# Set the Census API key (use your own key; never commit a real key to a repository)
census_api_key("YOUR_CENSUS_API_KEY")

# Load spatial data

pa_counties <- st_read(here("Assignments/assignment_02/data/Pennsylvania_County_Boundaries.shp"))
Reading layer `Pennsylvania_County_Boundaries' from data source 
  `C:\Users\rydra\OneDrive - PennO365\CPLN 5080 Public Policy\portfolio-setup-rdrake251\Assignments\assignment_02\data\Pennsylvania_County_Boundaries.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 67 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8963377 ymin: 4825316 xmax: -8314404 ymax: 5201413
Projected CRS: WGS 84 / Pseudo-Mercator
districts <- st_read(here("Assignments/assignment_02/data/districts.geojson"))
Reading layer `U.S._Congressional_Districts_for_Pennsylvania' from data source 
  `C:\Users\rydra\OneDrive - PennO365\CPLN 5080 Public Policy\portfolio-setup-rdrake251\Assignments\assignment_02\data\districts.geojson' 
  using driver `GeoJSON'
Simple feature collection with 17 features and 8 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -80.51939 ymin: 39.71986 xmax: -74.68956 ymax: 42.26935
Geodetic CRS:  WGS 84
hospitals <- st_read(here("Assignments/assignment_02/data/hospitals.geojson"))
Reading layer `hospitals' from data source 
  `C:\Users\rydra\OneDrive - PennO365\CPLN 5080 Public Policy\portfolio-setup-rdrake251\Assignments\assignment_02\data\hospitals.geojson' 
  using driver `GeoJSON'
Simple feature collection with 223 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.49621 ymin: 39.75163 xmax: -74.86704 ymax: 42.13403
Geodetic CRS:  WGS 84
metro_areas <- core_based_statistical_areas(cb = TRUE)

# Transform all layers to a common CRS, matching the county boundaries
hospitals <- hospitals %>%
  st_transform(st_crs(pa_counties))

options(tigris_use_cache = TRUE)
census_tracts <- tracts(state = "PA", cb = TRUE, year = 2024)

# Match the tract CRS to the county boundaries as well
census_tracts <- census_tracts %>%
  st_transform(st_crs(pa_counties))

# Check that all data loaded correctly and the CRSs line up

st_crs(pa_counties)
Coordinate Reference System:
  User input: WGS 84 / Pseudo-Mercator 
  wkt:
PROJCRS["WGS 84 / Pseudo-Mercator",
    BASEGEOGCRS["WGS 84",
        ENSEMBLE["World Geodetic System 1984 ensemble",
            MEMBER["World Geodetic System 1984 (Transit)"],
            MEMBER["World Geodetic System 1984 (G730)"],
            MEMBER["World Geodetic System 1984 (G873)"],
            MEMBER["World Geodetic System 1984 (G1150)"],
            MEMBER["World Geodetic System 1984 (G1674)"],
            MEMBER["World Geodetic System 1984 (G1762)"],
            MEMBER["World Geodetic System 1984 (G2139)"],
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ENSEMBLEACCURACY[2.0]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["Popular Visualisation Pseudo-Mercator",
        METHOD["Popular Visualisation Pseudo Mercator",
            ID["EPSG",1024]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["False easting",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Web mapping and visualisation."],
        AREA["World between 85.06°S and 85.06°N."],
        BBOX[-85.06,-180,85.06,180]],
    ID["EPSG",3857]]
st_crs(hospitals)
st_crs(census_tracts)

(Both print the same "WGS 84 / Pseudo-Mercator" definition shown above for pa_counties, EPSG:3857, confirming the st_transform() calls took effect.)
st_crs(districts)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_crs(metro_areas)
Coordinate Reference System:
  User input: NAD83 
  wkt:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]

Questions to answer:

  • How many hospitals are in your dataset? There are 223 hospitals in the dataset.
  • How many census tracts? There are 3,445 census tracts.
  • What coordinate reference system is each dataset in? After the transformations above, the counties, hospitals, and census tracts are all in WGS 84 / Pseudo-Mercator (EPSG:3857). The congressional districts are in WGS 84 (EPSG:4326), and the metro areas are in NAD83 (EPSG:4269).
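The counts and CRS answers can be checked directly from the loaded objects. A quick sketch, assuming the Step 1 objects are in memory:

```r
library(sf)

# Feature counts reported above
nrow(hospitals)       # number of hospital points
nrow(census_tracts)   # number of PA census tracts

# Compact CRS check: the EPSG code for each layer in one table
sapply(
  list(counties = pa_counties, hospitals = hospitals,
       tracts = census_tracts, districts = districts, metros = metro_areas),
  function(x) st_crs(x)$epsg
)
```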


Step 2: Get Demographic Data

Use tidycensus to download tract-level demographic data for Pennsylvania.

Required variables:

  • Total population
  • Median household income
  • Population 65 years and over (you may need to sum multiple age categories)
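To find which table holds the age categories, one option is tidycensus::load_variables(), which returns the full ACS variable dictionary. A sketch, assuming the 2021 5-year ACS; the 65-and-over rows of the sex-by-age table are B01001_020–025 (male) and B01001_044–049 (female):

```r
library(tidycensus)
library(dplyr)

# Browse the ACS variable codes for the sex-by-age table (B01001)
acs_vars <- load_variables(2021, "acs5", cache = TRUE)

acs_vars %>%
  filter(grepl("^B01001_0(2[0-5]|4[4-9])$", name)) %>%
  select(name, label)
```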

Your Task:

# Get demographic data from ACS
# Define the ACS year and dataset
year <- 2021
survey <- "acs5"

# Get tract-level demographic data for Pennsylvania
pa_demo <- get_acs(
  geography = "tract",
  state = "PA",
  year = year,
  survey = survey,
  variables = c(
    total_pop = "B01003_001",         # Total population
    median_income = "B19013_001",     # Median household income (dollars)
    age_65_66 = "B01001_020",         # Male 65-66
    age_67_69 = "B01001_021",         # Male 67-69
    age_70_74 = "B01001_022",         # Male 70-74
    age_75_79 = "B01001_023",         # Male 75-79
    age_80_84 = "B01001_024",         # Male 80-84
    age_85_up = "B01001_025",         # Male 85+
    f_age_65_66 = "B01001_044",       # Female 65-66
    f_age_67_69 = "B01001_045",       # Female 67-69
    f_age_70_74 = "B01001_046",       # Female 70-74
    f_age_75_79 = "B01001_047",       # Female 75-79
    f_age_80_84 = "B01001_048",       # Female 80-84
    f_age_85_up = "B01001_049"        # Female 85+
  ),
  geometry = FALSE  # Set TRUE later if you want spatial data
)
pa_demo_summary <- pa_demo %>%
  select(GEOID, variable, estimate) %>%
  tidyr::pivot_wider(names_from = variable, values_from = estimate) %>%
  mutate(
    pop_65_over = age_65_66 + age_67_69 + age_70_74 +
                  age_75_79 + age_80_84 + age_85_up +
                  f_age_65_66 + f_age_67_69 + f_age_70_74 +
                  f_age_75_79 + f_age_80_84 + f_age_85_up
  ) %>%
  select(GEOID, total_pop, median_income, pop_65_over)

head(pa_demo_summary)
# A tibble: 6 × 4
  GEOID       total_pop median_income pop_65_over
  <chr>           <dbl>         <dbl>       <dbl>
1 42001030101      2649         73233         534
2 42001030103      2268        103438         394
3 42001030104      3513         68984         556
4 42001030200      5545         60438         928
5 42001030300      4364         80614         670
6 42001030400      5437         83173        1084
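As an aside, the twelve age columns can be summed more compactly with rowSums(across()), which avoids typing out each variable name. A sketch producing the same summary table under the same column names:

```r
library(dplyr)
library(tidyr)

# Equivalent, more compact version of the 65+ sum:
# contains("age_") matches all twelve male and female 65+ columns
pa_demo_summary_alt <- pa_demo %>%
  select(GEOID, variable, estimate) %>%
  pivot_wider(names_from = variable, values_from = estimate) %>%
  mutate(pop_65_over = rowSums(across(contains("age_")))) %>%
  select(GEOID, total_pop, median_income, pop_65_over)
```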
# Join to tract boundaries
pa_tracts <- tracts(state = "PA", cb = TRUE, year = 2024)
pa_demo_joined <- pa_tracts %>%
  left_join(pa_demo_summary, by = "GEOID")

pa_demo_joined
Simple feature collection with 3445 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -80.51989 ymin: 39.7198 xmax: -74.68952 ymax: 42.26986
Geodetic CRS:  NAD83
First 10 features:
   STATEFP COUNTYFP TRACTCE              GEOIDFQ       GEOID   NAME
1       42      001  031101 1400000US42001031101 42001031101 311.01
2       42      013  100400 1400000US42013100400 42013100400   1004
3       42      013  100500 1400000US42013100500 42013100500   1005
4       42      013  100800 1400000US42013100800 42013100800   1008
5       42      013  101900 1400000US42013101900 42013101900   1019
6       42      011  011200 1400000US42011011200 42011011200    112
7       42      011  000200 1400000US42011000200 42011000200      2
8       42      011  011500 1400000US42011011500 42011011500    115
9       42      011  000600 1400000US42011000600 42011000600      6
10      42      011  001900 1400000US42011001900 42011001900     19
              NAMELSAD STUSPS   NAMELSADCO   STATE_NAME LSAD   ALAND AWATER
1  Census Tract 311.01     PA Adams County Pennsylvania   CT 3043185      0
2    Census Tract 1004     PA Blair County Pennsylvania   CT  993724      0
3    Census Tract 1005     PA Blair County Pennsylvania   CT 1130204      0
4    Census Tract 1008     PA Blair County Pennsylvania   CT  996553      0
5    Census Tract 1019     PA Blair County Pennsylvania   CT  573726      0
6     Census Tract 112     PA Berks County Pennsylvania   CT 1539365   9308
7       Census Tract 2     PA Berks County Pennsylvania   CT 1949529 159015
8     Census Tract 115     PA Berks County Pennsylvania   CT 1978380  12469
9       Census Tract 6     PA Berks County Pennsylvania   CT 1460473      0
10     Census Tract 19     PA Berks County Pennsylvania   CT  182420      0
   total_pop median_income pop_65_over                       geometry
1       5124         56234        1343 MULTIPOLYGON (((-77.03108 3...
2       1504         56979         243 MULTIPOLYGON (((-78.42478 4...
3       3381         46169         457 MULTIPOLYGON (((-78.41661 4...
4       1817         59453         256 MULTIPOLYGON (((-78.41067 4...
5       1515         15304         593 MULTIPOLYGON (((-78.40836 4...
6       4496         50978         667 MULTIPOLYGON (((-75.95433 4...
7       4392         21933         627 MULTIPOLYGON (((-75.96071 4...
8       2934         75027         449 MULTIPOLYGON (((-75.99913 4...
9       3766         80147         474 MULTIPOLYGON (((-75.91528 4...
10      1833         19336         434 MULTIPOLYGON (((-75.91819 4...
plot(pa_demo_joined["median_income"])

Questions to answer: - What year of ACS data are you using? - I used the most recent available data, the 2024 ACS. I tried 2025 first out of curiosity, but the Census Bureau returns an error since that data has not been released yet.

  • How many tracts have missing income data?

# Count tracts with missing median household income
missing_income_count <- pa_demo_joined %>%
  summarize(
    n_missing_income = sum(is.na(median_income)),
    total_tracts = n(),
    percent_missing = 100 * n_missing_income / total_tracts
  )

missing_income_count

A total of 65 tracts are missing median household income data (about 1.9% of the 3,445 tracts).

  • What is the median income across all PA census tracts?

# Median of tract-level median household incomes
median_income_pa <- pa_demo_joined %>%
  summarize(
    state_median_income = median(median_income, na.rm = TRUE)
  )

median_income_pa

The median of tract-level median household incomes is $65,195.50.


Step 3: Define Vulnerable Populations

Identify census tracts with vulnerable populations based on TWO criteria: 1. Low median household income (choose an appropriate threshold) 2. Significant elderly population (choose an appropriate threshold)

Your Task:

# Filter for vulnerable tracts based on your criteria
pa_demo_joined <- pa_demo_joined %>%
  mutate(
    pct_65_over = 100 * pop_65_over / total_pop
  )
income_threshold <- 40000
elderly_threshold <- 20

vulnerable_tracts <- pa_demo_joined %>%
  filter(
    median_income <= income_threshold,
    pct_65_over >= elderly_threshold
  )

summary(vulnerable_tracts$median_income)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  11817   24087   32745   30181   36273   39950 
summary(vulnerable_tracts$pct_65_over)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  20.03   21.75   23.82   25.45   27.44   47.37 
nrow(vulnerable_tracts)
[1] 95

Questions to answer: - What income threshold did you choose and why?

I didn't want to use the poverty line, because individuals (or families) who earn just above it can still be significantly income-constrained while remaining ineligible for program benefits. Since this analysis covers all of PA, and not everywhere is as expensive as Philadelphia or Pittsburgh, $40,000 seemed like a reasonable statewide threshold.
  • What elderly population threshold did you choose and why?

    I had to search a bit to find a suitable threshold. A tract where 20% or more of the population is aged 65+ is commonly described as having a "significant elderly population," so I used 20%.

  • How many tracts meet your vulnerability criteria?

95 tracts in total.

  • What percentage of PA census tracts are considered vulnerable by your definition?

About 2.76% (95 of 3,445 tracts).
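These cutoffs are judgment calls, so a quick sensitivity check can show how the tract count moves as they shift. The helper below is illustrative (not part of the assignment); the toy data frame stands in for `pa_demo_joined`, which can be swapped in to run the check for real:

```r
# Hypothetical sensitivity sketch: count how many tracts qualify under a
# given pair of cutoffs, then vary the cutoffs over a small grid.
count_vulnerable <- function(df, income_cutoff, elderly_cutoff) {
  sum(df$median_income <= income_cutoff &
        df$pct_65_over >= elderly_cutoff, na.rm = TRUE)
}

# Toy data for illustration; replace with pa_demo_joined for the real run.
toy <- data.frame(median_income = c(25000, 38000, 60000, NA),
                  pct_65_over   = c(22, 18, 30, 25))

grid <- expand.grid(income = c(30000, 40000, 50000),
                    elderly = c(15, 20, 25))
grid$n_tracts <- mapply(count_vulnerable,
                        income_cutoff = grid$income,
                        elderly_cutoff = grid$elderly,
                        MoreArgs = list(df = toy))
grid
```

If the qualifying-tract count jumps sharply at a nearby cutoff, that is a sign the chosen threshold is doing a lot of work and is worth justifying explicitly.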


Step 4: Calculate Distance to Hospitals

For each vulnerable tract, calculate the distance to the nearest hospital.

Your Task:

# Transform to appropriate projected CRS

# EPSG:26918 (NAD83 / UTM zone 18N): a projected CRS in metres covering Pennsylvania
pa_tracts_proj <- st_transform(pa_demo_joined, 26918)
vulnerable_centroids <- vulnerable_tracts %>%
  st_transform(26918) %>%
  st_centroid()

# Calculate distance from each tract centroid to nearest hospital

hospitals_proj <- st_transform(hospitals, 26918)

dist_matrix <- st_distance(vulnerable_centroids, hospitals_proj)

vulnerable_centroids <- vulnerable_centroids %>%
  mutate(
    dist_to_nearest_hospital_m = apply(dist_matrix, 1, min),
    dist_to_nearest_hospital_miles = dist_to_nearest_hospital_m * 0.000621371
  )

summary(vulnerable_centroids$dist_to_nearest_hospital_miles)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.05916  0.83180  1.53566  2.61855  3.23697 18.68254 
head(vulnerable_centroids[, c("GEOID", "dist_to_nearest_hospital_miles")])
Simple feature collection with 6 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 83328 ymin: 4454655 xmax: 422165.5 ymax: 4524935
Projected CRS: NAD83 / UTM zone 18N
        GEOID dist_to_nearest_hospital_miles                 geometry
1 42013101900                      0.4153821 POINT (211771.5 4490666)
2 42011001900                      0.5534705 POINT (422165.5 4465276)
3 42097082100                      9.5041231   POINT (348805 4524935)
4 42129806700                      0.5526284 POINT (112714.9 4454655)
5 42003483800                      3.4073368    POINT (83328 4484553)
6 42003561500                      3.6908072 POINT (87030.29 4488021)

Requirements: - Use an appropriate projected coordinate system for Pennsylvania - Calculate distances in miles - Explain why you chose your projection
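As an aside, the full distance matrix above is fine at this scale, but sf also offers `st_nearest_feature()`, which avoids building an n x m matrix. A sketch, assuming the `vulnerable_centroids` and `hospitals_proj` objects from the chunk above (both in EPSG:26918, whose units are metres):

```r
library(sf)

# 1 mile = 1609.344 m; EPSG:26918 distances come back in metres
meters_to_miles <- function(m) as.numeric(m) / 1609.344

# For each centroid, index of the closest hospital...
nearest_idx <- st_nearest_feature(vulnerable_centroids, hospitals_proj)

# ...then compute only those n pairwise distances (by_element = TRUE)
dist_miles <- meters_to_miles(
  st_distance(vulnerable_centroids,
              hospitals_proj[nearest_idx, ],
              by_element = TRUE)
)
```

The results should match the `apply(dist_matrix, 1, min)` approach; the difference is memory use, which matters when both layers are large.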

Questions to answer: - What is the average distance to the nearest hospital for vulnerable tracts?

The average distance to the nearest hospital is about 2.62 miles.

  • What is the maximum distance?

The maximum distance is 18.68 miles.

  • How many vulnerable tracts are more than 15 miles from the nearest hospital?

vulnerable_far <- vulnerable_centroids %>%
  filter(dist_to_nearest_hospital_miles > 15)

summary_far <- vulnerable_far %>%
  summarize(
    n_far = n(),
    total_vulnerable = nrow(vulnerable_centroids),
    percent_far = 100 * n_far / total_vulnerable
  )

summary_far

There is only one tract that is more than 15 miles away from the nearest hospital.

Step 5: Identify Underserved Areas

Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.

Your Task:

# Create underserved variable

vulnerable_centroids <- vulnerable_centroids %>%
  mutate(
    underserved = dist_to_nearest_hospital_miles > 15
  )

underserved_summary <- vulnerable_centroids %>%
  summarize(
    n_underserved = sum(underserved, na.rm = TRUE),
    total_vulnerable = n(),
    percent_underserved = 100 * n_underserved / total_vulnerable
  )

underserved_summary
Simple feature collection with 1 feature and 3 fields
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: 40154.66 ymin: 4413225 xmax: 488176.7 ymax: 4676298
Projected CRS: NAD83 / UTM zone 18N
  n_underserved total_vulnerable percent_underserved
1             1               95            1.052632
                        geometry
1 MULTIPOINT ((40154.66 45766...

Questions to answer: - How many tracts are underserved?

There is still just one underserved tract.

  • What percentage of vulnerable tracts are underserved?

    About 1.1% (1 of 95 vulnerable tracts).

  • Does this surprise you? Why or why not?

Yes, I thought there would be more in more rural areas of PA.


Step 6: Aggregate to County Level

Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.

Your Task:

# Spatial join tracts to counties
vulnerable_centroids <- st_transform(vulnerable_centroids, st_crs(pa_counties))
pa_counties <- st_transform(pa_counties, st_crs(vulnerable_centroids))

vulnerable_with_county <- st_join(vulnerable_centroids, pa_counties, join = st_within)

# Aggregate statistics by county

county_stats <- vulnerable_with_county %>%
  st_drop_geometry() %>%
  group_by(county_name = NAMELSADCO) %>% 
  summarize(
    n_vulnerable_tracts = n(),
    n_underserved_tracts = sum(underserved, na.rm = TRUE),
    pct_underserved = 100 * n_underserved_tracts / n_vulnerable_tracts,
    avg_dist_miles = mean(dist_to_nearest_hospital_miles, na.rm = TRUE),
    total_vulnerable_pop = sum(total_pop, na.rm = TRUE)
  ) %>%
  arrange(desc(n_underserved_tracts))
head(county_stats)
# A tibble: 6 × 6
  county_name      n_vulnerable_tracts n_underserved_tracts pct_underserved
  <chr>                          <int>                <int>           <dbl>
1 Cameron County                     1                    1             100
2 Allegheny County                  24                    0               0
3 Beaver County                      2                    0               0
4 Berks County                       1                    0               0
5 Blair County                       2                    0               0
6 Cambria County                     6                    0               0
# ℹ 2 more variables: avg_dist_miles <dbl>, total_vulnerable_pop <dbl>

Required county-level statistics: - Number of vulnerable tracts - Number of underserved tracts
- Percentage of vulnerable tracts that are underserved - Average distance to nearest hospital for vulnerable tracts - Total vulnerable population

Questions to answer: - Which 5 counties have the highest percentage of underserved vulnerable tracts?

 Only Cameron County (100%). It is the only county with any underserved vulnerable tracts, so a top-five list is not meaningful here; every other county is at 0%.
  • Which counties have the most vulnerable people living far from hospitals?

    Allegheny, Cambria, Beaver, Blair, Berks, and Cameron.

  • Are there any patterns in where underserved counties are located?

    They are mostly rural counties, with Allegheny as the urban exception.


Step 7: Create Summary Table

Create a professional table showing the top 10 priority counties for healthcare investment.

Your Task:

# Create and format priority counties table

county_stats <- county_stats %>%
  mutate(
    pct_vulnerable = 100 * n_vulnerable_tracts / sum(n_vulnerable_tracts)
  )

# Put Cameron County at the top regardless of vulnerability rank
# (note: county_name values from NAMELSADCO include the word "County")
cameron <- county_stats %>%
  filter(county_name == "Cameron County") %>%
  mutate(county_name = paste0("**", county_name, "**"))
others <- county_stats %>%
  filter(county_name != "Cameron County") %>%
  arrange(desc(pct_vulnerable))
priority_counties <- bind_rows(cameron, others)

#Select top 10 counties by % vulnerable tracts
priority_counties <- priority_counties %>%
  select(
    county_name,
    n_vulnerable_tracts,
    n_underserved_tracts,
    pct_vulnerable,
    avg_dist_miles,
    total_vulnerable_pop
  ) %>%
  mutate(
    pct_vulnerable = round(pct_vulnerable, 1),
    avg_dist_miles = round(avg_dist_miles, 1),
    total_vulnerable_pop = prettyNum(total_vulnerable_pop, big.mark = ",")
  )

#Display the table
library(knitr)
priority_counties_top10 <- priority_counties %>% slice_head(n = 10)
kable(priority_counties_top10,
      caption = "Top 10 Priority Counties for Healthcare Investment in Pennsylvania",
      col.names = c("County",
                    "Vulnerable Tracts",
                    "Underserved Tracts",
                    "% Vulnerable",
                    "Avg Distance to Hospital (miles)",
                    "Total Vulnerable Population"),
      align = c("l","r","r","r","r","r"),
      format = "html")
Top 10 Priority Counties for Healthcare Investment in Pennsylvania
County Vulnerable Tracts Underserved Tracts % Vulnerable Avg Distance to Hospital (miles) Total Vulnerable Population
Allegheny County 24 0 25.3 2.3 52,636
Philadelphia County 14 0 14.7 0.9 50,632
Westmoreland County 8 0 8.4 3.3 15,663
Cambria County 6 0 6.3 2.9 11,572
Erie County 5 0 5.3 0.7 13,486
Northumberland County 4 0 4.2 11.1 11,428
Lackawanna County 3 0 3.2 1.2 8,023
Lawrence County 3 0 3.2 4.5 7,768
Luzerne County 3 0 3.2 2.5 7,750
Beaver County 2 0 2.1 2.5 5,723

Requirements: - Use knitr::kable() or similar for formatting - Include descriptive column names - Format numbers appropriately (commas for population, percentages, etc.) - Add an informative caption - Sort by priority (you decide the metric)
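One way to make the sort criterion explicit is a composite priority score. The sketch below is illustrative only (the equal weights are an arbitrary assumption, not part of the assignment); it rescales each component to 0-1 so counties with both large vulnerable populations and long average distances rank highest:

```r
library(dplyr)

# Rescale a numeric vector to the [0, 1] range
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Hypothetical composite score on county_stats; weights are illustrative
county_scored <- county_stats %>%
  mutate(
    priority_score = 0.5 * rescale01(total_vulnerable_pop) +
                     0.5 * rescale01(avg_dist_miles)
  ) %>%
  arrange(desc(priority_score))
```

Whatever metric is chosen, stating it in the table caption makes the ranking defensible to a policy audience.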


Part 2: Comprehensive Visualization

Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.

Map 1: County-Level Choropleth

Create a choropleth map showing healthcare access challenges at the county level.

Your Task:

# Create county-level access map
pa_counties <- pa_counties %>%
  rename(county_name = COUNTY_NAM)

pa_counties_map <- pa_counties %>%
  left_join(county_stats %>% 
              select(county_name, pct_underserved),
            by = c("county_name" = "county_name"))

hospitals_proj <- st_transform(hospitals, st_crs(pa_counties_map))

ggplot() +
  geom_sf(data = pa_counties_map, aes(fill = pct_underserved), color = "gray60", size = 0.3) +
  geom_sf(data = hospitals_proj, color = "red", size = 2, shape = 21, fill = "white") +
  scale_fill_viridis_c(
    option = "plasma",
    direction = -1,
    na.value = "white",
    labels = scales::percent_format(scale = 1),
    name = "% Vulnerable Tracts\nUnderserved"
  ) +
  theme_void() +
  labs(
    title = "Healthcare Access Challenges in Pennsylvania",
    subtitle = "County-level % of vulnerable tracts that are underserved",
    caption = "Red points indicate hospital locations"
  ) +
  theme(
    legend.position = "right",
    plot.title = element_text(size = 16, face = "bold"),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(size = 10),
    legend.title = element_text(size = 11),
    legend.text = element_text(size = 10)
  )

Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels


Map 2: Detailed Vulnerability Map

Create a map highlighting underserved vulnerable tracts.

Your Task:

# Create detailed tract-level map
# Put all layers in the counties' CRS so they align
vulnerable_tracts_proj <- st_transform(vulnerable_tracts, st_crs(pa_counties))
pa_counties_proj <- pa_counties
hospitals_proj <- st_transform(hospitals, st_crs(pa_counties))

# Attach the underserved flag computed on the centroids back to the tract polygons
vulnerable_tracts_proj <- vulnerable_tracts_proj %>%
  left_join(
    vulnerable_centroids %>% st_drop_geometry() %>% select(GEOID, underserved),
    by = "GEOID"
  )

underserved_tracts <- vulnerable_tracts_proj %>%
  filter(underserved == TRUE)

ggplot() +
  # County boundaries (subtle gray)
  geom_sf(data = pa_counties_proj, fill = NA, color = "gray50", size = 0.4) +
  
  # All vulnerable tracts (light fill, low alpha)
  geom_sf(data = vulnerable_tracts_proj, fill = "lightblue", color = NA, alpha = 0.3) +
  
  # Underserved vulnerable tracts (stand out)
  geom_sf(data = underserved_tracts, fill = "red", color = NA, alpha = 0.7) +
  
  # Hospital locations
  geom_sf(data = hospitals_proj, shape = 21, color = "black", fill = "white", size = 2) +
  
  # Titles and labels
  labs(
    title = "Underserved Vulnerable Census Tracts in Pennsylvania",
    subtitle = "Red tracts are vulnerable and more than 15 miles from the nearest hospital",
    caption = "County boundaries shown in gray; hospitals shown as white circles with black border"
  ) +
  
  # Clean map
  theme_void() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(size = 10)
  )

Requirements: - Show underserved vulnerable tracts in a contrasting color - Include county boundaries for context - Show hospital locations - Use appropriate visual hierarchy (what should stand out?) - Include informative title and subtitle


Chart: Distribution Analysis

Create a visualization showing the distribution of distances to hospitals for vulnerable populations.

Your Task:

# Create distribution visualization
vulnerable_distances <- vulnerable_centroids %>%
  st_drop_geometry() %>%
  filter(!is.na(dist_to_nearest_hospital_miles))

hist_plot <- ggplot(vulnerable_distances, aes(x = dist_to_nearest_hospital_miles)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "gray40", alpha = 0.8) +
  geom_vline(xintercept = 15, color = "red", linetype = "dashed", size = 1) +
  labs(
    title = "Distribution of Distances to Nearest Hospital for Vulnerable Populations",
    subtitle = "Red dashed line indicates 15-mile underserved threshold",
    x = "Distance to Nearest Hospital (miles)",
    y = "Number of Vulnerable Tracts",
    caption = "Most vulnerable tracts are within 15 miles, but several are farther, indicating potential access gaps."
  ) +
  theme_minimal(base_size = 12) +
  scale_x_continuous(breaks = seq(0, max(vulnerable_distances$dist_to_nearest_hospital_miles), by = 5)) +
  scale_y_continuous(labels = comma)
top_counties <- vulnerable_distances %>%
  group_by(NAMELSADCO) %>%
  summarise(total_vulnerable_pop = sum(total_pop, na.rm = TRUE)) %>%
  arrange(desc(total_vulnerable_pop)) %>%
  slice_head(n = 10) %>%
  pull(NAMELSADCO)

box_plot <- vulnerable_distances %>%
  filter(NAMELSADCO %in% top_counties) %>%
  ggplot(aes(x = reorder(NAMELSADCO, dist_to_nearest_hospital_miles, median), 
             y = dist_to_nearest_hospital_miles)) +
  geom_boxplot(fill = "lightblue", color = "gray40", alpha = 0.8) +
  geom_hline(yintercept = 15, linetype = "dashed", color = "red", size = 0.8) +
  coord_flip() +
  labs(
    title = "Distance to Nearest Hospital by Top 10 Counties (Vulnerable Population)",
    x = "County",
    y = "Distance to Nearest Hospital (miles)",
    caption = "Red dashed line marks the 15-mile threshold for underserved tracts."
  ) +
  theme_minimal(base_size = 12)
  

hist_plot

# To display box plot instead
box_plot

Suggested chart types: - Histogram or density plot of distances - Box plot comparing distances across regions - Bar chart of underserved tracts by county - Scatter plot of distance vs. vulnerable population size

Requirements: - Clear axes labels with units - Appropriate title - Professional formatting - Brief interpretation (1-2 sentences as a caption or in text)
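Another of the suggested chart types, a scatter plot of distance against tract population, can be sketched from the same `vulnerable_distances` data frame built above (this is an optional addition, not a required deliverable):

```r
library(ggplot2)

# Sketch: does distance to care scale with how many people are affected?
scatter_plot <- ggplot(vulnerable_distances,
                       aes(x = dist_to_nearest_hospital_miles,
                           y = total_pop)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_vline(xintercept = 15, linetype = "dashed", color = "red") +
  labs(
    title = "Hospital Distance vs. Tract Population (Vulnerable Tracts)",
    subtitle = "Red dashed line marks the 15-mile underserved threshold",
    x = "Distance to Nearest Hospital (miles)",
    y = "Tract Population"
  ) +
  theme_minimal(base_size = 12)

scatter_plot
```

Points far to the right and high on the y-axis would flag tracts where long distances affect the most people.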


Part 3: Bring Your Own Data Analysis

Choose your own additional spatial dataset and conduct a supplementary analysis.

Challenge Options

Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).

Note these are just loose suggestions to spark ideas - follow or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset, ensuring the projection/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question


Education & Youth Services

Option A: Educational Desert Analysis - Data: Schools, Libraries, Recreation Centers, Census tracts (child population) - Question: “Which neighborhoods lack adequate educational infrastructure for children?” - Operations: Buffer schools/libraries (0.5 mile walking distance), identify coverage gaps, overlay with child population density - Policy relevance: School district planning, library placement, after-school program siting

Option B: School Safety Zones - Data: Schools, Crime Incidents, Bike Network - Question: “Are school zones safe for walking/biking, or are they crime hotspots?” - Operations: Buffer schools (1000ft safety zone), spatial join with crime incidents, assess bike infrastructure coverage - Policy relevance: Safe Routes to School programs, crossing guard placement


Environmental Justice

Option C: Green Space Equity - Data: Parks, Street Trees, Census tracts (race/income demographics) - Question: “Do low-income and minority neighborhoods have equitable access to green space?” - Operations: Buffer parks (10-minute walk = 0.5 mile), calculate tree canopy or park acreage per capita, compare by demographics - Policy relevance: Climate resilience, environmental justice, urban forestry investment —

Public Safety & Justice

Option D: Crime & Community Resources - Data: Crime Incidents, Recreation Centers, Libraries, Street Lights - Question: “Are high-crime areas underserved by community resources?” - Operations: Aggregate crime counts to census tracts or neighborhoods, count community resources per area, spatial correlation analysis - Policy relevance: Community investment, violence prevention strategies —

Infrastructure & Services

Option E: Polling Place Accessibility - Data: Polling Places, SEPTA stops, Census tracts (elderly population, disability rates) - Question: “Are polling places accessible for elderly and disabled voters?” - Operations: Buffer polling places and transit stops, identify vulnerable populations, find areas lacking access - Policy relevance: Voting rights, election infrastructure, ADA compliance


Health & Wellness

Option F: Recreation & Population Health - Data: Recreation Centers, Playgrounds, Parks, Census tracts (demographics) - Question: “Is lack of recreation access associated with vulnerable populations?” - Operations: Calculate recreation facilities per capita by neighborhood, buffer facilities for walking access, overlay with demographic indicators - Policy relevance: Public health investment, recreation programming, obesity prevention


Emergency Services

Option G: EMS Response Coverage - Data: Fire Stations, EMS stations, Population density, High-rise buildings - Question: “Are population-dense areas adequately covered by emergency services?” - Operations: Create service area buffers (5-minute drive = ~2 miles), assess population coverage, identify gaps in high-density areas - Policy relevance: Emergency preparedness, station siting decisions


Arts & Culture

Option H: Cultural Asset Distribution - Data: Public Art, Museums, Historic sites/markers, Neighborhoods - Question: “Do all neighborhoods have equitable access to cultural amenities?” - Operations: Count cultural assets per neighborhood, normalize by population, compare distribution across demographic groups - Policy relevance: Cultural equity, tourism, quality of life, neighborhood identity


Data Sources

OpenDataPhilly: https://opendataphilly.org/datasets/ - Most datasets available as GeoJSON, Shapefile, or CSV with coordinates - Always check the Metadata for a data dictionary of the fields.

Additional Sources: - Pennsylvania Open Data: https://data.pa.gov/ - Census Bureau (via tidycensus): Demographics, economic indicators, commute patterns - TIGER/Line (via tigris): Geographic boundaries

Your Analysis

Your Task:

  1. Find and load additional data
    • Document your data source
    • Check and standardize the CRS
    • Provide basic summary statistics
# Load your additional dataset

state_parks <- st_read(here("Assignments/assignment_02/data/DCNR_StateParks202503.geojson"))
Reading layer `DCNR_StateParks202503' from data source 
  `C:\Users\rydra\OneDrive - PennO365\CPLN 5080 Public Policy\portfolio-setup-rdrake251\Assignments\assignment_02\data\DCNR_StateParks202503.geojson' 
  using driver `GeoJSON'
Simple feature collection with 124 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -80.51957 ymin: 39.72204 xmax: -74.77666 ymax: 42.17311
Geodetic CRS:  WGS 84
state_parks <- st_transform(state_parks, crs = 2272)

st_crs(state_parks)
Coordinate Reference System:
  User input: EPSG:2272 
  wkt:
PROJCRS["NAD83 / Pennsylvania South (ftUS)",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4269]],
    CONVERSION["SPCS83 Pennsylvania South zone (US survey foot)",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39.3333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-77.75,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",40.9666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",39.9333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",1968500,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["US survey foot",0.304800609601219]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["US survey foot",0.304800609601219]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["United States (USA) - Pennsylvania - counties of Adams; Allegheny; Armstrong; Beaver; Bedford; Berks; Blair; Bucks; Butler; Cambria; Chester; Cumberland; Dauphin; Delaware; Fayette; Franklin; Fulton; Greene; Huntingdon; Indiana; Juniata; Lancaster; Lawrence; Lebanon; Lehigh; Mifflin; Montgomery; Northampton; Perry; Philadelphia; Schuylkill; Snyder; Somerset; Washington; Westmoreland; York."],
        BBOX[39.71,-80.53,41.18,-74.72]],
    ID["EPSG",2272]]

Questions to answer: - What dataset did you choose and why? I chose the DCNR State Parks dataset to examine state park access and visualize distribution per capita.

  • What is the data source and date? DCNR, with the file dated March 2025.

  • How many features does it contain? It contains 124 features.

  • What CRS is it in? Did you need to transform it?

The source data is in WGS 84, a geodetic CRS, so I transformed it to EPSG:2272 (NAD83 / Pennsylvania South, in US survey feet), a projected CRS appropriate for area and distance calculations in Pennsylvania.


  1. Pose a research question

What is the state park acreage per capita by county in Pennsylvania? Which counties have the most state park acreage? And where are Park and Ride locations across the state, since these are often how underserved populations access state parks?


  1. Conduct spatial analysis

Use at least TWO spatial operations to answer your research question.

Required operations (choose 2+): - Buffers - Spatial joins - Spatial filtering with predicates - Distance calculations - Intersections or unions - Point-in-polygon aggregation
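For instance, a buffer combined with a spatial filter could be sketched as follows (illustrative only; assumes the `state_parks` layer in EPSG:2272, whose units are US survey feet, and the `pa_demo_joined` tracts from Part 1):

```r
library(sf)

# A 10-minute walk is roughly 0.5 mile; EPSG:2272 units are US survey
# feet, so 0.5 mile = 2640 ft
half_mile_ft <- 0.5 * 5280

# Buffer every state park by a half mile
park_buffers <- st_buffer(state_parks, dist = half_mile_ft)

# Keep only census tracts that intersect at least one buffer
tracts_2272 <- st_transform(pa_demo_joined, 2272)
tracts_near_parks <- st_filter(tracts_2272, park_buffers,
                               .predicate = st_intersects)
```

Buffer distances must always be expressed in the CRS's linear unit, which is why checking `st_crs()` before buffering matters.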

Your Task:

# Your spatial analysis
state_parks <- st_read(here("Assignments/assignment_02/data/DCNR_StateParks202503.geojson"))
Reading layer `DCNR_StateParks202503' from data source 
  `C:\Users\rydra\OneDrive - PennO365\CPLN 5080 Public Policy\portfolio-setup-rdrake251\Assignments\assignment_02\data\DCNR_StateParks202503.geojson' 
  using driver `GeoJSON'
Simple feature collection with 124 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -80.51957 ymin: 39.72204 xmax: -74.77666 ymax: 42.17311
Geodetic CRS:  WGS 84
state_parks <- st_transform(state_parks, crs = 2272)

#get acreage 
state_parks <- state_parks %>%
  mutate(area_sqft = st_area(geometry),
         area_acres = as.numeric(area_sqft) / 43560)

pa_counties <- st_transform(pa_counties, st_crs(state_parks))

parks_by_county <- st_intersection(state_parks, pa_counties) %>%
  # Recompute area on the clipped pieces so a park spanning multiple
  # counties is not counted at its full acreage in each one
  mutate(piece_acres = as.numeric(st_area(geometry)) / 43560) %>%
  st_drop_geometry() %>%
  group_by(county_name) %>%
  summarise(total_park_acres = sum(piece_acres, na.rm = TRUE))

# Total population by county
pa_population <- get_acs(
  geography = "county",
  variables = "B01003_001",  # total population
  state = "PA",
  year = 2023,               # or most recent available
  geometry = FALSE           # set TRUE if you want sf object
) %>%
  select(county_name = NAME, total_population = estimate)

pa_population <- pa_population %>%
  mutate(
    county_name = str_replace(county_name, ", Pennsylvania", ""),  # remove state
    county_name = str_replace(county_name, " County", ""),         # remove "County"
    county_name = str_trim(county_name)                             # remove extra spaces
  )
parks_by_county <- parks_by_county %>%
  mutate(county_name = str_to_lower(county_name))

pa_population <- pa_population %>%
  mutate(county_name = str_to_lower(county_name))

pa_counties <- pa_counties %>%
  mutate(county_name = str_to_lower(county_name))


parks_per_capita <- parks_by_county %>%
  left_join(pa_population %>% select(county_name, total_population), by = "county_name") %>%
  mutate(acres_per_capita = total_park_acres / total_population)

parks_per_capita_sf <- pa_counties %>%
  left_join(parks_per_capita, by = "county_name")

#map parks per capita
pa_counties_map_two <- parks_per_capita_sf

library(scales)
#map
ggplot(pa_counties_map_two) +
  geom_sf(aes(fill = acres_per_capita), color = "grey", size = 0.3) +
  # Gradient coloring
  scale_fill_viridis_c(
    option = "plasma",      # vibrant color palette
    direction = -1,          # high values = darker/brighter
    na.value = "white",
    labels = comma,
    name = "Park Acres per Capita"
  ) +
  
  # Titles and caption
  labs(
    title = "State Park Acres per Capita by County in Pennsylvania",
    subtitle = "Higher values indicate more park land per person",
    caption = "Data: PA State Parks & ACS County Population"
  ) +
  
  # Clean theme
  theme_void() +
  theme(
    legend.position = "right",
    plot.title = element_text(size = 16, face = "bold"),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(size = 10),
    legend.title = element_text(size = 11),
    legend.text = element_text(size = 10)
  )

park_ride <- st_read(here("Assignments/assignment_02/data/Park_and_Ride_Locations_Current_Transportation_20251014.geojson"))
Reading layer `Park_and_Ride_Locations_Current_Transportation_20251014' from data source `C:\Users\rydra\OneDrive - PennO365\CPLN 5080 Public Policy\portfolio-setup-rdrake251\Assignments\assignment_02\data\Park_and_Ride_Locations_Current_Transportation_20251014.geojson' 
  using driver `GeoJSON'
Simple feature collection with 97 features and 28 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.43895 ymin: 39.7517 xmax: -74.85457 ymax: 42.1107
Geodetic CRS:  WGS 84
park_ride <- st_transform(park_ride, crs = 2272)

# park_ride is already an sf object, so no st_as_sf() conversion is needed
park_ride_sf <- park_ride

park_ride_sf <- park_ride_sf %>%
  rename(code = dot_county_code)

parks_per_capita_sf <- parks_per_capita_sf %>%
  rename(code = COUNTY_N_1)

# st_join is spatial: each park-and-ride point inherits the attributes of
# the county polygon it falls within, so no `by` key is needed
joined_sf <- st_join(park_ride_sf, parks_per_capita_sf)

pa_counties_map_three <- joined_sf

library(scales)
#map
ggplot() +
  # County choropleth: state park acres per capita
  geom_sf(data = parks_per_capita_sf,
          aes(fill = acres_per_capita), color = "grey", size = 0.3) +
  scale_fill_viridis_c(
    option = "plasma",
    direction = -1,
    na.value = "white",
    labels = comma,
    name = "Park Acres per Capita"
  ) +
  
  # Park-and-ride locations drawn on top of the fill
  geom_sf(data = joined_sf, color = "blue", size = 2) +
  
  # Titles and caption
  labs(
    title = "Park and Ride Locations and State Park Acres per Capita by County in Pennsylvania",
    subtitle = "Higher values indicate more park land per person",
    caption = "Data: PA State Parks, Park and Ride Locations (PA Open Data Portal), ACS County Population"
  ) +
  
  # Clean theme
  theme_void() +
  theme(
    legend.position = "right",
    plot.title = element_text(size = 16, face = "bold"),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(size = 10),
    legend.title = element_text(size = 11),
    legend.text = element_text(size = 10)
  )

Analysis requirements: - Clear code comments explaining each step - Appropriate CRS transformations - Summary statistics or counts - At least one map showing your findings - Brief interpretation of results (3-5 sentences)

Your interpretation:

[I was curious to see the distribution of state park land per capita in Pennsylvania. Surprisingly, it is fairly evenly distributed, with just a few counties missing data. The park-and-ride locations, however, are not as evenly distributed; a policy recommendation would be to examine their distribution further for access reasons.]

Finally - A few comments about your incorporation of feedback!

Take a few moments to clean up your markdown document and then write a line or two or three about how you may have incorporated feedback that you received after your first assignment.

I am still confused about whether we are supposed to keep most of the instructions in the website version. It is a really long document, and if this is for a portfolio it would need to be cleaned up substantially to be presentable. How much of this should we do?


Submission Requirements

What to submit:

  1. Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
    • Use embed-resources: true in YAML so it’s a single file
    • All code should run without errors
    • All maps and charts should display correctly

File naming: LastName_FirstName_Assignment2.html and LastName_FirstName_Assignment2.qmd