# Join to tract boundaries
tracts_pa <- tracts(state = "PA", year = 2023, cb = TRUE, class = "sf")
tracts_demog <- tracts_pa %>%
  left_join(acs_tract, by = "GEOID")
names(tracts_demog)
[We defined low-income tracts as those with a median household income below $72,943.5, which is the median of the tract-level median household incomes across all Pennsylvania census tracts. We chose this threshold because the median reflects the central tendency of the income distribution and gives a relative, within-state definition of economic disadvantage.]
What elderly population threshold did you choose and why?
[We defined elderly-heavy tracts as those with more than 23.5% of residents aged 65 and older, corresponding to the 75th percentile of the elderly population distribution. This threshold identifies tracts where the elderly population is significantly higher than typical statewide levels, focusing attention on areas with concentrated aging populations.]
How many tracts meet your vulnerability criteria?
[422 tracts meet both criteria (low income and high elderly population).]
What percentage of PA census tracts are considered vulnerable by your definition?
[Out of 3,445 tracts statewide, about 12.2% (422 / 3,445 × 100) are classified as vulnerable under our definition.]
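The threshold logic described above can be sketched on a toy table. This is only an illustration under stated assumptions: the column names `med_income` and `pct_65plus` and all toy values are hypothetical, not the fields or data used in the actual analysis. The last line repeats the percentage arithmetic with the counts reported above.

```r
library(dplyr)

# Toy tract table; column names and values are hypothetical
toy_tracts <- tibble(
  GEOID      = sprintf("42%09d", 1:8),
  med_income = c(41000, 55000, 62000, 70000, 76000, 83000, 95000, 120000),
  pct_65plus = c(30, 12, 26, 18, 25, 15, 22, 10)
)

# Relative thresholds: median income and 75th percentile of elderly share
income_cut  <- median(toy_tracts$med_income, na.rm = TRUE)
elderly_cut <- quantile(toy_tracts$pct_65plus, 0.75, na.rm = TRUE)

# A tract is flagged only if it meets both criteria
toy_flagged <- toy_tracts %>%
  mutate(
    low_income    = med_income < income_cut,
    elderly_heavy = pct_65plus > elderly_cut,
    vulnerable    = low_income & elderly_heavy
  )

n_vulnerable   <- sum(toy_flagged$vulnerable, na.rm = TRUE)
pct_vulnerable <- 100 * n_vulnerable / nrow(toy_flagged)

# Same arithmetic with the counts reported in the answers above
reported_pct <- round(100 * 422 / 3445, 1)
```

Because both cutoffs are computed from the data itself, they shift whenever the set of tracts changes, which is worth keeping in mind when comparing across states or years.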
Step 4: Calculate Distance to Hospitals
For each vulnerable tract, calculate the distance to the nearest hospital.
Your Task:
# Transform to an appropriate projected CRS (EPSG:5070, units in meters)
target_crs <- 5070
vulnerable_p <- st_transform(vulnerable_tracts, target_crs)
hospitals_p <- st_transform(hospitals, target_crs)

# Calculate distance from each tract centroid to nearest hospital
vul_centroids <- st_centroid(vulnerable_p)
nearest_idx <- st_nearest_feature(vul_centroids, hospitals_p)
nearest_dist_m <- st_distance(vul_centroids, hospitals_p[nearest_idx, ], by_element = TRUE)

# Convert meters to miles and drop the units class
dist_miles <- set_units(nearest_dist_m, "mi")
dist_miles_num <- as.numeric(dist_miles)

# Attach the nearest-hospital row index and distance to the vulnerable tracts
vulnerable_with_dist <- vulnerable_p %>%
  mutate(
    nearest_hosp_id = nearest_idx,
    dist_miles = dist_miles_num
  )

# Summarize the distance column created above (in miles)
summary(vulnerable_with_dist$dist_miles)
What percentage of vulnerable tracts are underserved?
[5.9%]
Does this surprise you? Why or why not?
[No. Healthcare facilities are concentrated in urban regions, while the underserved tracts are primarily rural and sparsely populated. Although these tracts are few in number, they represent significant accessibility challenges.]
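A minimal, self-contained sketch of how an underserved share could be computed from nearest-hospital distances. The 15-mile cutoff is an assumption for illustration (this section does not state the threshold used), and the toy points are hypothetical coordinates in EPSG:5070 meters.

```r
library(sf)

# Toy tract centroids and hospitals (hypothetical coordinates, in meters)
toy_centroids <- st_as_sf(
  data.frame(id = 1:3, x = c(0, 10000, 40000), y = c(0, 0, 0)),
  coords = c("x", "y"), crs = 5070
)
toy_hospitals <- st_as_sf(
  data.frame(name = c("A", "B"), x = c(3000, 12000), y = c(4000, 0)),
  coords = c("x", "y"), crs = 5070
)

# Index of the nearest hospital for each centroid, then the matching distance
idx <- st_nearest_feature(toy_centroids, toy_hospitals)
d_m <- st_distance(toy_centroids, toy_hospitals[idx, ], by_element = TRUE)

# Convert meters to miles (1 mile = 1609.344 m)
d_mi <- as.numeric(d_m) / 1609.344

# Assumed cutoff: more than 15 miles from the nearest hospital
threshold_mi    <- 15
underserved     <- d_mi > threshold_mi
pct_underserved <- 100 * mean(underserved)
```

With the real data, the same last three lines would apply to the `dist_miles` column of `vulnerable_with_dist`.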
Step 6: Aggregate to County Level
Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.
Percentage of vulnerable tracts that are underserved
Average distance to nearest hospital for vulnerable tracts
Total vulnerable population
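One way the three county-level statistics listed above might be computed is a point-in-polygon join followed by a grouped summary. The sketch below uses toy geometries; the county names, the `pop` column, and the 15-mile underserved cutoff are all assumptions made for illustration.

```r
library(sf)
library(dplyr)

# Helper: a 10 x 10 square polygon starting at x0 (toy geometry)
sq <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 10, 0), c(x0 + 10, 10), c(x0, 10), c(x0, 0)
  )))
}

# Two toy "counties" with hypothetical names
toy_counties <- st_sf(
  county   = c("Alpha", "Beta"),
  geometry = st_sfc(sq(0), sq(10), crs = 5070)
)

# Toy vulnerable-tract centroids with distance and population attributes
toy_tract_pts <- st_as_sf(
  data.frame(
    x = c(2, 5, 8, 12, 17), y = 5,
    dist_miles = c(3, 18, 6, 20, 22),
    pop = c(1000, 500, 800, 400, 600)
  ),
  coords = c("x", "y"), crs = 5070
)

county_stats <- toy_tract_pts %>%
  st_join(toy_counties, join = st_within) %>%   # assign each tract to a county
  st_drop_geometry() %>%
  mutate(underserved = dist_miles > 15) %>%     # assumed 15-mile cutoff
  group_by(county) %>%
  summarise(
    n_vulnerable         = n(),
    pct_underserved      = 100 * mean(underserved),
    avg_dist_miles       = mean(dist_miles),
    total_vulnerable_pop = sum(pop),
    .groups = "drop"
  ) %>%
  arrange(desc(pct_underserved))
```

Joining on centroids rather than tract polygons avoids double-counting tracts that touch a county boundary. If the tract table already carries a county identifier, the spatial join can be replaced by an attribute join.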
Questions to answer:
Which 5 counties have the highest percentage of underserved vulnerable tracts?
[Sullivan County, Cameron County, Forest County, Clearfield County, Pike County]
Which counties have the most vulnerable people living far from hospitals?
[Huntingdon County, Union County, Dauphin County, Clearfield County, Pike County]
Are there any patterns in where underserved counties are located?
[Yes. Underserved counties are clustered in rural and mountainous regions of northern and western Pennsylvania, such as Sullivan, Cameron, Forest, and Clearfield Counties. These areas tend to have low population density and aging populations but few medical facilities, reflecting a classic rural accessibility gap. In contrast, eastern metropolitan counties around Philadelphia and the Harrisburg corridor have dense hospital networks and shorter average distances to care.]
Step 7: Create Summary Table
Create a professional table showing the top 10 priority counties for healthcare investment.
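A hedged sketch of such a table, using `knitr::kable()` on a toy county summary. The ranking rule (highest percentage of underserved vulnerable tracts first) is an assumption, and the county labels and numbers are placeholders rather than results from this analysis.

```r
library(dplyr)
library(knitr)

# Toy county summary; all names and values are placeholders
toy_county_stats <- tibble(
  county               = paste("County", LETTERS[1:12]),
  pct_underserved      = c(100, 50, 0, 25, 75, 10, 60, 5, 40, 90, 15, 30),
  avg_dist_miles       = c(21.4, 12.3, 4.1, 8.8, 17.9, 6.2, 14.5, 5.0, 11.1, 19.6, 7.3, 9.7),
  total_vulnerable_pop = c(1200, 5400, 9100, 3300, 2100, 7600, 2800, 8800, 4100, 1500, 6900, 3700)
)

# Keep the 10 counties with the highest underserved share (assumed priority rule)
top10 <- toy_county_stats %>%
  slice_max(pct_underserved, n = 10, with_ties = FALSE)

# Render with readable column headers and one decimal place
kable(
  top10,
  digits = 1,
  col.names = c("County", "Underserved tracts (%)",
                "Avg. distance (mi)", "Vulnerable population"),
  caption = "Top 10 priority counties (toy data)"
)
```

A composite score that also weights population size would be an equally defensible ranking. The point is to state the rule explicitly alongside the table.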
Show underserved vulnerable tracts in a contrasting color
Include county boundaries for context
Show hospital locations
Use appropriate visual hierarchy (what should stand out?)
Include informative title and subtitle
Chart: Distribution Analysis
Create a visualization showing the distribution of distances to hospitals for vulnerable populations.
Your Task:
# Create distribution visualization
p_hist <- ggplot(vulnerable_with_dist, aes(x = dist_miles)) +
  # Histogram scaled to density so the density curve shares the same y-axis
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "#D55E00", color = "white") +
  geom_density(linewidth = 1, alpha = 0.6) +
  labs(
    title = "Distribution of Distance to Nearest Hospital (Vulnerable Tracts)",
    x = "Distance to nearest hospital (miles)",
    y = "Density",
    caption = "Most tracts are located within 5–10 miles of a hospital, but the distribution is right-skewed, indicating that while access is generally good for the majority, a small number of tracts are much farther away, exceeding 15 miles and in some extreme cases reaching nearly 30 miles."
  ) +
  theme_minimal(base_size = 12)
p_hist
Suggested chart types:
Histogram or density plot of distances
Box plot comparing distances across regions
Bar chart of underserved tracts by county
Scatter plot of distance vs. vulnerable population size
Requirements:
Clear axes labels with units
Appropriate title
Professional formatting
Brief interpretation (1-2 sentences as a caption or in text)
Part 3: Bring Your Own Data Analysis
Choose your own additional spatial dataset and conduct a supplementary analysis.
Challenge Options
Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).
Note that these are just loose suggestions to spark ideas. Follow them or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset and ensuring that the projection/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question.
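As a rough sketch of the CRS check described above, `sf` can report whether a layer is still in geographic (lat/long) coordinates and whether two layers share a CRS. The toy points and the choice of EPSG:5070 as the target are assumptions for illustration.

```r
library(sf)

# Toy layer in WGS 84 (lat/long), as many downloaded GeoJSON files are
pts_wgs84 <- st_as_sf(
  data.frame(lon = c(-75.16, -75.20), lat = c(39.95, 40.00)),
  coords = c("lon", "lat"), crs = 4326
)

# TRUE means geographic coordinates: not suitable for buffers or distances
is_geographic <- st_is_longlat(pts_wgs84)

# Reproject to a projected CRS in meters (EPSG:5070 chosen as an example)
pts_proj <- st_transform(pts_wgs84, 5070)

# A second layer should be brought to the same CRS before any overlay
other_layer <- st_transform(pts_wgs84, st_crs(pts_proj))
same_crs    <- st_crs(pts_proj) == st_crs(other_layer)
```

Passing `st_crs(pts_proj)` rather than a hard-coded code keeps every layer tied to one reference object, so changing the target CRS later needs only one edit.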
Education & Youth Services
Option A: Educational Desert Analysis
Data: Schools, Libraries, Recreation Centers, Census tracts (child population)
Question: “Which neighborhoods lack adequate educational infrastructure for children?”
Operations: Buffer schools/libraries (0.5 mile walking distance), identify coverage gaps, overlay with child population density
Policy relevance: School district planning, library placement, after-school program siting
Option B: School Safety Zones
Data: Schools, Crime Incidents, Bike Network
Question: “Are school zones safe for walking/biking, or are they crime hotspots?”
Operations: Buffer schools (1000ft safety zone), spatial join with crime incidents, assess bike infrastructure coverage
Policy relevance: Safe Routes to School programs, crossing guard placement
Environmental Justice
Option C: Green Space Equity
Data: Parks, Street Trees, Census tracts (race/income demographics)
Question: “Do low-income and minority neighborhoods have equitable access to green space?”
Operations: Buffer parks (10-minute walk = 0.5 mile), calculate tree canopy or park acreage per capita, compare by demographics
Question: “Are polling places accessible for elderly and disabled voters?”
Operations: Buffer polling places and transit stops, identify vulnerable populations, find areas lacking access
Policy relevance: Voting rights, election infrastructure, ADA compliance
Health & Wellness
Option F: Recreation & Population Health
Data: Recreation Centers, Playgrounds, Parks, Census tracts (demographics)
Question: “Is lack of recreation access associated with vulnerable populations?”
Operations: Calculate recreation facilities per capita by neighborhood, buffer facilities for walking access, overlay with demographic indicators
Policy relevance: Public health investment, recreation programming, obesity prevention
Emergency Services
Option G: EMS Response Coverage
Data: Fire Stations, EMS stations, Population density, High-rise buildings
Question: “Are population-dense areas adequately covered by emergency services?”
Operations: Create service area buffers (5-minute drive = ~2 miles), assess population coverage, identify gaps in high-density areas
Policy relevance: Emergency preparedness, station siting decisions
Arts & Culture
Option H: Cultural Asset Distribution
Data: Public Art, Museums, Historic sites/markers, Neighborhoods
Question: “Do all neighborhoods have equitable access to cultural amenities?”
Operations: Count cultural assets per neighborhood, normalize by population, compare distribution across demographic groups
Policy relevance: Cultural equity, tourism, quality of life, neighborhood identity
Data Sources
OpenDataPhilly: https://opendataphilly.org/datasets/
- Most datasets available as GeoJSON, Shapefile, or CSV with coordinates
- Always check the Metadata for a data dictionary of the fields.
Additional Sources:
Pennsylvania Open Data: https://data.pa.gov/
Census Bureau (via tidycensus): Demographics, economic indicators, commute patterns
TIGER/Line (via tigris): Geographic boundaries
Recommended Starting Points
If you’re feeling confident: Choose an advanced challenge with multiple data layers. If you are a beginner, choose something more manageable that helps you understand the basics.
If you have a different idea: Propose your own question! Just make sure:
You can access the spatial data
You can perform at least 2 spatial operations
Your Analysis
Your Task:
Find and load additional data
Document your data source
Check and standardize the CRS
Provide basic summary statistics
# Load your additional dataset
parks <- st_read("data/PPR_Program_Sites.geojson")
Reading layer `PPR_Program_Sites' from data source
`C:\Users\dell\Documents\GitHub\portfolio-setup-Isabelliiii\assignments\assignment2\data\PPR_Program_Sites.geojson'
using driver `GeoJSON'
Simple feature collection with 171 features and 10 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -75.2563 ymin: 39.90444 xmax: -74.96944 ymax: 40.12284
Geodetic CRS: WGS 84
Questions to answer:
What dataset did you choose and why?
[Dataset chosen: Philadelphia Parks & Recreation Program Sites]
Reason for choice: [This dataset was selected to analyze green space accessibility and recreation equity in Philadelphia. It represents public parks and recreation facilities, which are a key indicator of environmental and community well-being. By examining their spatial distribution relative to neighborhood demographics, we can assess whether low-income communities have equitable access to recreational opportunities.]
What is the data source and date?
[Data were obtained from OpenDataPhilly – PPR Program Sites, published and maintained by Philadelphia Parks & Recreation. The dataset was accessed and downloaded in October 2025.]
How many features does it contain?
[
- Number of features: 171 program sites
- Geometry type: Point
- Coordinate Reference System (CRS): WGS 84 (EPSG:4326)
- Extent: Longitude -75.2563 to -74.9694, Latitude 39.9044 to 40.1228
]
What CRS is it in? Did you need to transform it?
[The dataset was originally in WGS 84 (geographic coordinates), which is not ideal for distance-based analysis. It was reprojected to EPSG:5070 (NAD83 / Conus Albers) for accurate spatial measurement and buffering operations.]
parks <- st_transform(parks, 5070)
Pose a research question
Write a clear research statement that your analysis will answer.
Examples:
“Do vulnerable tracts have adequate public transit access to hospitals?”
“Are EMS stations appropriately located near vulnerable populations?”
“Do areas with low vehicle access have worse hospital access?”
Research Question:
Do low-income neighborhoods in Philadelphia have less access to parks and recreation facilities compared to higher-income areas?
Explanation:
This question investigates whether green space accessibility is distributed equitably across socioeconomic groups in Philadelphia. By analyzing the spatial relationship between parks and recreation sites and census tract median income, this study aims to identify whether environmental and recreational resources are disproportionately concentrated in wealthier neighborhoods, which is an important issue for urban equity and public health planning.
Conduct spatial analysis
Use at least TWO spatial operations to answer your research question.
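The map code in this section relies on a park buffer layer and on the tract columns `income_group` and `has_park_access`. A minimal sketch of how such fields could be derived with two spatial operations (a buffer and an intersection test) is shown here on toy geometries. The median-split income grouping, the toy coordinates, and the `toy_` object names are assumptions, not the code actually used to produce the results.

```r
library(sf)
library(dplyr)

# Helper: a 1000 m square tract starting at x0 (toy geometry)
make_sq <- function(x0, size = 1000) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + size, 0), c(x0 + size, size), c(x0, size), c(x0, 0)
  )))
}

toy_tracts <- st_sf(
  GEOID      = c("t1", "t2", "t3", "t4"),
  med_income = c(30000, 35000, 80000, 90000),
  geometry   = st_sfc(make_sq(0), make_sq(2000), make_sq(4000), make_sq(6000),
                      crs = 5070)
)
toy_parks <- st_as_sf(
  data.frame(x = c(500, 4500, 6500), y = 500),
  coords = c("x", "y"), crs = 5070
)

# Spatial operation 1: 0.5-mile buffer around each park, dissolved into one shape
half_mile_m <- 0.5 * 1609.344
toy_buf <- st_union(st_buffer(toy_parks, dist = half_mile_m))

# Spatial operation 2: which tracts intersect the buffer, and parks per tract
toy_tracts$has_park_access <- lengths(st_intersects(toy_tracts, toy_buf)) > 0
toy_tracts$n_parks         <- lengths(st_intersects(toy_tracts, toy_parks))

# Attribute operation: median split into income groups (an assumed rule)
toy_tracts <- toy_tracts %>%
  mutate(income_group = if_else(med_income < median(med_income),
                                "Low income", "High income"))

access_summary <- toy_tracts %>%
  st_drop_geometry() %>%
  group_by(income_group) %>%
  summarise(
    pct_access = 100 * mean(has_park_access),
    mean_parks = mean(n_parks),
    .groups = "drop"
  )
```

Here a tract counts as having access if any part of it falls inside the buffer. Testing tract centroids instead would be a stricter definition and would usually lower the access rates.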
ggplot() +
  geom_sf(data = tracts_5070, fill = "grey92", color = NA) +
  geom_sf(data = parks_buf, fill = "#A7C957", color = NA, alpha = 0.5) +
  geom_sf(
    data = tracts_5070 %>% filter(income_group == "Low income", !has_park_access),
    fill = "#D55E00", color = "white", linewidth = 0.15, alpha = 0.9
  ) +
  geom_sf(data = parks_5070, color = "darkgreen", size = 0.8, alpha = 0.8) +
  labs(
    title = "Green Space Equity in Philadelphia",
    subtitle = "Low-income census tracts without a park within 0.5 mile (buffers shown in green)",
    caption = "Sources: OpenDataPhilly (PPR Program Sites), 2019–2023 ACS (tidycensus); distances in EPSG:5070"
  ) +
  theme_void(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "grey20")
  )
Analysis requirements:
Clear code comments explaining each step
Appropriate CRS transformations
Summary statistics or counts
At least one map showing your findings
Brief interpretation of results (3-5 sentences)
Your interpretation:
[Using a 0.5-mile buffer as a walkable access threshold, low-income tracts exhibit a lower park access rate than high-income tracts, and they also tend to have fewer parks per tract on average. The map highlights clusters of low-income tracts without nearby parks, pointing to potential recreational access gaps. These findings suggest that future park investments and programming should prioritize underserved low-income neighborhoods to improve equitable access to green space and related health benefits.]
Finally - A few comments about your incorporation of feedback!
Take a few moments to clean up your markdown document and then write a line or two or three about how you may have incorporated feedback that you received after your first assignment.
Comments:
After receiving feedback on my first assignment, I focused on improving clarity and organization. I combined all library calls into one hidden setup block and removed unnecessary print outputs. I also added clearer explanations and captions so the analysis reads more smoothly and professionally.
Submission Requirements
What to submit:
Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
Use embed-resources: true in YAML so it’s a single file
All code should run without errors
All maps and charts should display correctly
File naming: LastName_FirstName_Assignment2.html and LastName_FirstName_Assignment2.qmd