# Load required packages (hint: you need tidycensus, tidyverse, and knitr)
library(tidycensus)
library(tidyverse)
library(knitr)
# Set your Census API key
census_api_key("807ea1c0820a3e1e46dde3c53438622057fcc1ba")
# Choose your state for analysis - assign it to a variable called my_state
my_state <- "California"
Assignment 1: Census Data Quality for Policy Decisions
Evaluating Data Reliability for Algorithmic Decision-Making
Assignment Overview
Scenario
You are a data analyst for the California Department of Human Services. The department is considering implementing an algorithmic system to identify communities that should receive priority for social service funding and outreach programs. Your supervisor has asked you to evaluate the quality and reliability of available census data to inform this decision.
Drawing on our Week 2 discussion of algorithmic bias, you need to assess not just what the data shows, but how reliable it is and what communities might be affected by data quality issues.
Learning Objectives
- Apply dplyr functions to real census data for policy analysis
- Evaluate data quality using margins of error
- Connect technical analysis to algorithmic decision-making
- Identify potential equity implications of data reliability issues
- Create professional documentation for policy stakeholders
Part 1: Portfolio Integration
Setup
State Selection: I have chosen California for this analysis because I am currently working on the wildfires that occurred there with a few partners from UC San Diego, so I know a bit about the state and its population density. I also wish to visit it during the winter break :)!
Part 2: County-Level Resource Assessment
2.1 Data Retrieval
Your Task: Use get_acs() to retrieve county-level data for your chosen state.
Requirements:
- Geography: county level
- Variables: median household income (B19013_001) and total population (B01003_001)
- Year: 2022
- Survey: acs5
- Output format: wide
Hint: Remember to give your variables descriptive names using the variables = c(name = "code") syntax.
# Write your get_acs() code here
county_vars <- c(med_hh_income = "B19013_001", total_pop = "B01003_001")
county_raw <- get_acs(
  geography = "county",
  state = my_state,
  survey = "acs5",
  year = 2022,
  variables = county_vars,
  output = "wide"
)
# Clean the county names to remove state name and "County"
# Hint: use mutate() with str_remove()
county <- county_raw %>%
  mutate(
    county_name = str_remove(NAME, paste0(", ", my_state)),
    county_name = str_remove(county_name, " County$")
  ) %>%
  select(GEOID, county_name, med_hh_incomeE, med_hh_incomeM, total_popE, total_popM)
# Display the first few rows
head(county)
# A tibble: 6 × 6
GEOID county_name med_hh_incomeE med_hh_incomeM total_popE total_popM
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 06001 Alameda 122488 1231 1663823 NA
2 06003 Alpine 101125 17442 1515 206
3 06005 Amador 74853 6048 40577 NA
4 06007 Butte 66085 2261 213605 NA
5 06009 Calaveras 77526 3875 45674 NA
6 06011 Colusa 69619 5745 21811 NA
2.2 Data Quality Assessment
Your Task: Calculate margin of error percentages and create reliability categories.
Requirements:
- Calculate MOE percentage: (margin of error / estimate) * 100
- Create reliability categories:
  - High Confidence: MOE < 5%
  - Moderate Confidence: MOE 5-10%
  - Low Confidence: MOE > 10%
- Create a flag for unreliable estimates (MOE > 10%)
Hint: Use mutate() with case_when() for the categories.
# Calculate MOE percentage and reliability categories using mutate()
county_reliability <- county %>%
  mutate(
    moe_percentage = round((med_hh_incomeM / med_hh_incomeE) * 100, 2),
    Reliability = case_when(
      moe_percentage < 5 ~ "High Confidence",
      moe_percentage >= 5 & moe_percentage <= 10 ~ "Moderate Confidence",
      moe_percentage > 10 ~ "Low Confidence"
    )
  )
# Create a summary showing count of counties in each reliability category
# Hint: use count() and mutate() to add percentages
reliability_summary <- county_reliability %>%
  count(Reliability, name = "Count") %>%
  mutate(Proportion = round(100 * Count / sum(Count), 1))
kable(reliability_summary, caption = "County Income Reliability Categories")
Reliability | Count | Proportion |
---|---|---|
High Confidence | 41 | 70.7 |
Low Confidence | 5 | 8.6 |
Moderate Confidence | 12 | 20.7 |
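The requirements for this section also ask for a flag on unreliable estimates, and the summary table above lists the categories alphabetically rather than by confidence level. The following is a minimal base-R sketch of both ideas, using three rows copied from the head(county) output in 2.1. The helper `classify_moe()` and the column `unreliable` are illustrative names, not part of the original code.

```r
# Three rows copied from the head(county) output in section 2.1.
county_demo <- data.frame(
  county_name    = c("Alameda", "Alpine", "Amador"),
  med_hh_incomeE = c(122488, 101125, 74853),
  med_hh_incomeM = c(1231, 17442, 6048)
)

# Hypothetical helper: map an MOE percentage to the assignment's categories,
# returned as an ordered factor so tables sort High -> Moderate -> Low.
classify_moe <- function(moe_pct) {
  labels <- ifelse(
    moe_pct < 5, "High Confidence",
    ifelse(moe_pct <= 10, "Moderate Confidence", "Low Confidence")
  )
  factor(
    labels,
    levels = c("High Confidence", "Moderate Confidence", "Low Confidence"),
    ordered = TRUE
  )
}

county_demo$moe_percentage <- round(
  100 * county_demo$med_hh_incomeM / county_demo$med_hh_incomeE, 2
)
county_demo$Reliability <- classify_moe(county_demo$moe_percentage)

# Flag for unreliable estimates (MOE above 10% of the estimate).
county_demo$unreliable <- county_demo$moe_percentage > 10

print(county_demo)
```

With the category stored as an ordered factor, count() or table() would report the levels in the order High, Moderate, Low instead of alphabetically.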
2.3 High Uncertainty Counties
Your Task: Identify the 5 counties with the highest MOE percentages.
Requirements:
- Sort by MOE percentage (highest first)
- Select the top 5 counties
- Display: county name, median income, margin of error, MOE percentage, reliability category
- Format as a professional table using kable()
Hint: Use arrange(), slice(), and select() functions.
# Create table of top 5 counties by MOE percentage
top5 <- county_reliability %>%
  arrange(desc(moe_percentage)) %>%
  slice(1:5) %>%
  select(
    county_name,
    med_hh_incomeE,
    med_hh_incomeM,
    moe_percentage,
    Reliability
  )
# Format as table with kable() - include appropriate column names and caption
kable(
top5, caption = "Top 5 Counties by Median Household Income MOE Percentage",
col.names = c("County", "Median Income", "Margin of Error", "MOE %", "Reliability Category"),
digits = 2,
align = c("l", "c", "c", "c", "c"),
format.args = list(big.mark = ",")
)
County | Median Income | Margin of Error | MOE % | Reliability Category |
---|---|---|---|---|
Mono | 82,038 | 15,388 | 18.76 | Low Confidence |
Alpine | 101,125 | 17,442 | 17.25 | Low Confidence |
Sierra | 61,108 | 9,237 | 15.12 | Low Confidence |
Trinity | 47,317 | 5,890 | 12.45 | Low Confidence |
Plumas | 67,885 | 7,772 | 11.45 | Low Confidence |
Data Quality Commentary:
All five of these counties are among the lowest-density, least populous areas in California. Because their populations are so small, the ACS draws on small samples to estimate median income, which makes the estimates more variable and explains the relatively high margins of error (11–19% of the estimate). As a result, algorithms that classify or rank counties using these figures could produce erroneous outcomes if they ignore the margins of error. For example, Alpine County appears to have a median income exceeding $100,000, but its margin of error is more than $17,000, a large uncertainty that reflects how few households can be sampled among its roughly 1,500 residents. This is both a sample-size and a representativeness issue, and it shows how misleading a raw point estimate can be without MOE context.
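To make the ranking concern concrete, here is a rough sketch that checks whether the income gap between Alpine and Mono is distinguishable from sampling noise. It assumes the published ACS margins of error are at the 90 percent confidence level and that the two estimates are independent. `moe_of_difference()` is a hypothetical helper name, and the figures are copied from the top-5 table above.

```r
# Estimates and margins of error copied from the top-5 table above.
alpine <- list(est = 101125, moe = 17442)
mono   <- list(est = 82038,  moe = 15388)

# Hypothetical helper: MOE of a difference of two independent estimates
# (root sum of squares of the individual MOEs).
moe_of_difference <- function(moe_a, moe_b) sqrt(moe_a^2 + moe_b^2)

diff_est <- alpine$est - mono$est
diff_moe <- moe_of_difference(alpine$moe, mono$moe)

# The difference is statistically significant at the 90% level only if
# it is larger than its own margin of error.
is_significant <- abs(diff_est) > diff_moe

cat("Difference:", diff_est, "\n")
cat("MOE of difference:", round(diff_moe), "\n")
cat("Significant at 90%:", is_significant, "\n")
```

Here the gap of $19,087 is smaller than its margin of error (about $23,260), so an algorithm that ranks Alpine above Mono on median income would be relying on a difference the data cannot confirm.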
Part 3: Neighborhood-Level Analysis
3.1 Focus Area Selection
Your Task: Select 2-3 counties from your reliability analysis for detailed tract-level study.
Strategy: Choose counties that represent different reliability levels (e.g., 1 high confidence, 1 moderate, 1 low confidence) to compare how data quality varies.
# Use filter() to select 2-3 counties from your county_reliability data
# Store the selected counties in a variable called selected_counties
selected_counties <- bind_rows(
  county_reliability %>%
    filter(Reliability == "High Confidence") %>%
    slice(1),
  county_reliability %>%
    filter(Reliability == "Moderate Confidence") %>%
    slice(1),
  county_reliability %>%
    filter(Reliability == "Low Confidence") %>%
    slice(1)
) %>%
  select(
    County = county_name,
    `Median Income` = med_hh_incomeE,
    `Margin of Error` = med_hh_incomeM,
    `MOE %` = moe_percentage,
    Reliability = Reliability,
    Population = total_popE
  )
# Display the selected counties with their key characteristics
# Show: county name, median income, MOE percentage, reliability category
kable(
selected_counties, caption = "First County (Alphabetically) in Each Reliability Category",
align = c("l", "r", "r", "r", "l", "r"),
format.args = list(big.mark = ",")
)
County | Median Income | Margin of Error | MOE % | Reliability | Population |
---|---|---|---|---|---|
Alameda | 122,488 | 1,231 | 1.00 | High Confidence | 1,663,823 |
Amador | 74,853 | 6,048 | 8.08 | Moderate Confidence | 40,577 |
Alpine | 101,125 | 17,442 | 17.25 | Low Confidence | 1,515 |
Comment on the output: Because slice(1) takes the first row in each reliability category rather than sampling at random, it quite literally picked the first match it found, which is why all three counties are the alphabetically earliest in their categories. On the positive side, we still have at least one county from each category, and Alpine is still with us :)! It is the lowest-density county in California.
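If the goal is a deliberate pick rather than the first alphabetical match, two alternatives can be sketched: take the most populous county in each category, or draw one at random with a fixed seed so the result is reproducible. The toy data frame below reuses five rows from the head(county) output in 2.1, with categories derived from their MOE percentages. The object names `largest_per_category` and `random_per_category` are illustrative.

```r
library(dplyr)

# Five rows from the head(county) output, with reliability categories
# derived from each county's MOE percentage.
county_demo <- data.frame(
  county_name = c("Alameda", "Alpine", "Amador", "Butte", "Colusa"),
  total_popE  = c(1663823, 1515, 40577, 213605, 21811),
  Reliability = c("High Confidence", "Low Confidence", "Moderate Confidence",
                  "High Confidence", "Moderate Confidence")
)

# Option 1: most populous county in each reliability category.
largest_per_category <- county_demo %>%
  group_by(Reliability) %>%
  slice_max(total_popE, n = 1, with_ties = FALSE) %>%
  ungroup()

# Option 2: one random county per category; set.seed() makes it repeatable.
set.seed(42)
random_per_category <- county_demo %>%
  group_by(Reliability) %>%
  slice_sample(n = 1) %>%
  ungroup()

print(largest_per_category)
print(random_per_category)
```

On the full county_reliability data the same two pipelines would apply unchanged, since they rely only on the Reliability and total_popE columns.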
3.2 Tract-Level Demographics
Your Task: Get demographic data for census tracts in your selected counties.
Requirements:
- Geography: tract level
- Variables: white alone (B03002_003), Black/African American (B03002_004), Hispanic/Latino (B03002_012), total population (B03002_001)
- Use the same state and year as before
- Output format: wide
- Challenge: You’ll need county codes, not names. Look at the GEOID patterns in your county data for hints.
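The GEOID hint can be illustrated with a short sketch. A county GEOID is the 2-digit state FIPS code followed by the 3-digit county code, and a tract GEOID appends a 6-digit tract code. The `split_geoid()` helper is a hypothetical name used only for illustration.

```r
# Hypothetical helper: split county (5-digit) or tract (11-digit) GEOIDs
# into their state, county, and tract components.
split_geoid <- function(geoid) {
  data.frame(
    geoid  = geoid,
    state  = substr(geoid, 1, 2),
    county = substr(geoid, 3, 5),
    tract  = ifelse(nchar(geoid) >= 11, substr(geoid, 6, 11), NA_character_),
    stringsAsFactors = FALSE
  )
}

# County GEOIDs from the county table, plus one 11-digit tract GEOID
# (Census Tract 4001 in Alameda County is coded 400100).
parts <- split_geoid(c("06001", "06003", "06005", "06001400100"))
print(parts)
```

The `county` column holds the 3-digit codes that get_acs() accepts in its county argument when a state is also supplied.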
# Define your race/ethnicity variables with descriptive names
race_vars <- c(
  total = "B03002_001",
  white = "B03002_003",
  black = "B03002_004",
  hispanic = "B03002_012"
)
# Use get_acs() to retrieve tract-level data
# Hint: You may need to specify county codes in the county parameter
selected_geoid <- c("06001", "06003", "06005") # Alameda, Alpine, Amador
county_codes <- stringr::str_sub(selected_geoid, 3, 5) # -> "001","003","005"
tract_raw <- get_acs(
  geography = "tract",
  state = my_state,
  county = county_codes,
  survey = "acs5",
  year = 2022,
  variables = race_vars,
  output = "wide"
)
# Calculate percentage of each group using mutate()
# Create percentages for white, Black, and Hispanic populations
tract_percent <- tract_raw %>%
  mutate(
    county_code = substr(GEOID, 3, 5),
    pct_white = 100 * (whiteE / totalE),
    pct_black = 100 * (blackE / totalE),
    pct_hispanic = 100 * (hispanicE / totalE),
    # Make a readable tract label and remove state/country suffixes
    # Note: the tract names in the output use "; " separators, so these
    # ", " patterns leave NAME unchanged (visible in the table below)
    tract_label = stringr::str_remove(NAME, paste0(", ", my_state)),
    tract_label = stringr::str_remove(tract_label, ", United States$")
  ) %>%
  # Attach county names by matching on the 3-digit county code
  left_join(
    county %>%
      filter(GEOID %in% selected_geoid) %>%
      transmute(
        county_code = substr(GEOID, 3, 5),
        county_name
      ),
    by = "county_code"
  ) %>%
  select(
    GEOID, county_name, tract_label,
    totalE, whiteE, blackE, hispanicE,
    totalM, whiteM, blackM, hispanicM,
    pct_white, pct_black, pct_hispanic
  )
# Add readable tract and county name columns using str_extract() or similar
kable(
  tract_percent %>%
    select(
      County = county_name,
      Tract = tract_label,
      `White (%)` = pct_white,
      `Black (%)` = pct_black,
      `Hispanic (%)` = pct_hispanic
    ) %>%
    arrange(County, Tract),
  caption = "Tract-Level Racial/Ethnic Composition in Selected Counties",
  align = c("l", "l", "r", "r", "r"),
  format.args = list(big.mark = ",", digits = 1)
)
County | Tract | White (%) | Black (%) | Hispanic (%) |
---|---|---|---|---|
Alameda | Census Tract 4001; Alameda County; California | 69 | 5.23 | 5 |
Alameda | Census Tract 4002; Alameda County; California | 70 | 1.77 | 7 |
Alameda | Census Tract 4003; Alameda County; California | 59 | 9.41 | 10 |
Alameda | Census Tract 4004; Alameda County; California | 64 | 11.66 | 7 |
Alameda | Census Tract 4005; Alameda County; California | 41 | 25.15 | 15 |
Alameda | Census Tract 4006; Alameda County; California | 44 | 26.19 | 7 |
Alameda | Census Tract 4007; Alameda County; California | 48 | 23.14 | 10 |
Alameda | Census Tract 4008; Alameda County; California | 51 | 14.83 | 9 |
Alameda | Census Tract 4009; Alameda County; California | 37 | 24.33 | 19 |
Alameda | Census Tract 4010; Alameda County; California | 33 | 23.55 | 22 |
Alameda | Census Tract 4011; Alameda County; California | 42 | 16.36 | 12 |
Alameda | Census Tract 4012; Alameda County; California | 57 | 7.98 | 13 |
Alameda | Census Tract 4013; Alameda County; California | 36 | 28.75 | 9 |
Alameda | Census Tract 4014; Alameda County; California | 26 | 31.82 | 22 |
Alameda | Census Tract 4015; Alameda County; California | 22 | 54.89 | 15 |
Alameda | Census Tract 4016; Alameda County; California | 19 | 32.79 | 29 |
Alameda | Census Tract 4017; Alameda County; California | 45 | 13.80 | 17 |
Alameda | Census Tract 4018; Alameda County; California | 24 | 37.98 | 27 |
Alameda | Census Tract 4022; Alameda County; California | 32 | 27.69 | 23 |
Alameda | Census Tract 4024; Alameda County; California | 13 | 54.59 | 9 |
Alameda | Census Tract 4025; Alameda County; California | 16 | 43.68 | 12 |
Alameda | Census Tract 4026; Alameda County; California | 9 | 38.46 | 6 |
Alameda | Census Tract 4027; Alameda County; California | 26 | 37.42 | 17 |
Alameda | Census Tract 4028.01; Alameda County; California | 27 | 43.36 | 5 |
Alameda | Census Tract 4028.02; Alameda County; California | 23 | 42.87 | 4 |
Alameda | Census Tract 4029; Alameda County; California | 20 | 22.32 | 10 |
Alameda | Census Tract 4030; Alameda County; California | 9 | 5.04 | 2 |
Alameda | Census Tract 4031; Alameda County; California | 40 | 16.64 | 12 |
Alameda | Census Tract 4033.01; Alameda County; California | 10 | 2.79 | 2 |
Alameda | Census Tract 4033.02; Alameda County; California | 52 | 4.81 | 6 |
Alameda | Census Tract 4034.01; Alameda County; California | 42 | 16.45 | 9 |
Alameda | Census Tract 4034.02; Alameda County; California | 43 | 15.17 | 12 |
Alameda | Census Tract 4035.01; Alameda County; California | 38 | 15.09 | 8 |
Alameda | Census Tract 4035.02; Alameda County; California | 49 | 11.52 | 17 |
Alameda | Census Tract 4036; Alameda County; California | 38 | 23.22 | 17 |
Alameda | Census Tract 4037.01; Alameda County; California | 42 | 21.73 | 10 |
Alameda | Census Tract 4037.02; Alameda County; California | 55 | 16.49 | 11 |
Alameda | Census Tract 4038; Alameda County; California | 62 | 15.12 | 7 |
Alameda | Census Tract 4039; Alameda County; California | 63 | 9.27 | 11 |
Alameda | Census Tract 4040; Alameda County; California | 51 | 11.99 | 20 |
Alameda | Census Tract 4041.01; Alameda County; California | 74 | 8.39 | 4 |
Alameda | Census Tract 4041.02; Alameda County; California | 53 | 5.74 | 11 |
Alameda | Census Tract 4042; Alameda County; California | 67 | 4.93 | 7 |
Alameda | Census Tract 4043; Alameda County; California | 75 | 1.28 | 6 |
Alameda | Census Tract 4044; Alameda County; California | 59 | 2.44 | 3 |
Alameda | Census Tract 4045.01; Alameda County; California | 66 | 5.16 | 6 |
Alameda | Census Tract 4045.02; Alameda County; California | 66 | 5.78 | 5 |
Alameda | Census Tract 4046; Alameda County; California | 58 | 5.33 | 4 |
Alameda | Census Tract 4047; Alameda County; California | 68 | 4.57 | 9 |
Alameda | Census Tract 4048; Alameda County; California | 53 | 11.48 | 10 |
Alameda | Census Tract 4049; Alameda County; California | 54 | 6.06 | 16 |
Alameda | Census Tract 4050; Alameda County; California | 57 | 13.45 | 10 |
Alameda | Census Tract 4051; Alameda County; California | 66 | 5.66 | 7 |
Alameda | Census Tract 4052; Alameda County; California | 34 | 13.84 | 6 |
Alameda | Census Tract 4053.01; Alameda County; California | 49 | 13.93 | 17 |
Alameda | Census Tract 4053.02; Alameda County; California | 34 | 23.29 | 6 |
Alameda | Census Tract 4054.01; Alameda County; California | 25 | 12.54 | 24 |
Alameda | Census Tract 4054.02; Alameda County; California | 20 | 15.27 | 23 |
Alameda | Census Tract 4055; Alameda County; California | 21 | 19.51 | 15 |
Alameda | Census Tract 4056; Alameda County; California | 31 | 17.21 | 22 |
Alameda | Census Tract 4057; Alameda County; California | 15 | 27.97 | 25 |
Alameda | Census Tract 4058; Alameda County; California | 12 | 20.33 | 21 |
Alameda | Census Tract 4059.01; Alameda County; California | 5 | 17.11 | 34 |
Alameda | Census Tract 4059.02; Alameda County; California | 12 | 7.46 | 29 |
Alameda | Census Tract 4060; Alameda County; California | 18 | 19.22 | 20 |
Alameda | Census Tract 4061; Alameda County; California | 18 | 8.87 | 47 |
Alameda | Census Tract 4062.01; Alameda County; California | 6 | 13.70 | 48 |
Alameda | Census Tract 4062.02; Alameda County; California | 7 | 12.51 | 62 |
Alameda | Census Tract 4063; Alameda County; California | 11 | 19.32 | 35 |
Alameda | Census Tract 4064; Alameda County; California | 23 | 16.55 | 31 |
Alameda | Census Tract 4065; Alameda County; California | 12 | 13.26 | 49 |
Alameda | Census Tract 4066.01; Alameda County; California | 23 | 28.74 | 24 |
Alameda | Census Tract 4066.02; Alameda County; California | 14 | 22.32 | 34 |
Alameda | Census Tract 4067; Alameda County; California | 46 | 6.80 | 17 |
Alameda | Census Tract 4068; Alameda County; California | 28 | 20.98 | 24 |
Alameda | Census Tract 4069; Alameda County; California | 40 | 16.97 | 8 |
Alameda | Census Tract 4070; Alameda County; California | 18 | 15.64 | 36 |
Alameda | Census Tract 4071.01; Alameda County; California | 12 | 18.48 | 40 |
Alameda | Census Tract 4071.02; Alameda County; California | 7 | 23.69 | 37 |
Alameda | Census Tract 4072; Alameda County; California | 14 | 3.22 | 71 |
Alameda | Census Tract 4073; Alameda County; California | 11 | 8.42 | 65 |
Alameda | Census Tract 4074; Alameda County; California | 6 | 36.49 | 44 |
Alameda | Census Tract 4075; Alameda County; California | 7 | 22.82 | 48 |
Alameda | Census Tract 4076; Alameda County; California | 25 | 25.03 | 34 |
Alameda | Census Tract 4077; Alameda County; California | 45 | 21.17 | 18 |
Alameda | Census Tract 4078; Alameda County; California | 36 | 21.74 | 24 |
Alameda | Census Tract 4079; Alameda County; California | 50 | 12.89 | 13 |
Alameda | Census Tract 4080; Alameda County; California | 61 | 8.22 | 11 |
Alameda | Census Tract 4081; Alameda County; California | 37 | 25.00 | 7 |
Alameda | Census Tract 4082; Alameda County; California | 17 | 40.51 | 29 |
Alameda | Census Tract 4083; Alameda County; California | 22 | 38.13 | 21 |
Alameda | Census Tract 4084; Alameda County; California | 4 | 39.49 | 47 |
Alameda | Census Tract 4085; Alameda County; California | 3 | 39.33 | 49 |
Alameda | Census Tract 4086; Alameda County; California | 6 | 43.23 | 41 |
Alameda | Census Tract 4087; Alameda County; California | 9 | 32.38 | 51 |
Alameda | Census Tract 4088; Alameda County; California | 4 | 34.17 | 46 |
Alameda | Census Tract 4089; Alameda County; California | 4 | 21.13 | 68 |
Alameda | Census Tract 4090; Alameda County; California | 3 | 29.31 | 59 |
Alameda | Census Tract 4091; Alameda County; California | 10 | 20.38 | 62 |
Alameda | Census Tract 4092; Alameda County; California | 1 | 27.30 | 62 |
Alameda | Census Tract 4093; Alameda County; California | 2 | 23.63 | 66 |
Alameda | Census Tract 4094; Alameda County; California | 5 | 18.07 | 66 |
Alameda | Census Tract 4095; Alameda County; California | 12 | 17.69 | 64 |
Alameda | Census Tract 4096; Alameda County; California | 3 | 24.27 | 70 |
Alameda | Census Tract 4097; Alameda County; California | 8 | 35.78 | 36 |
Alameda | Census Tract 4098; Alameda County; California | 18 | 50.19 | 15 |
Alameda | Census Tract 4099; Alameda County; California | 23 | 41.05 | 13 |
Alameda | Census Tract 4100; Alameda County; California | 33 | 47.28 | 5 |
Alameda | Census Tract 4101; Alameda County; California | 10 | 51.86 | 18 |
Alameda | Census Tract 4102; Alameda County; California | 11 | 44.22 | 37 |
Alameda | Census Tract 4103; Alameda County; California | 2 | 16.49 | 78 |
Alameda | Census Tract 4104; Alameda County; California | 5 | 31.26 | 52 |
Alameda | Census Tract 4105; Alameda County; California | 21 | 58.88 | 12 |
Alameda | Census Tract 4201; Alameda County; California | 54 | 4.64 | 13 |
Alameda | Census Tract 4202; Alameda County; California | 35 | 5.65 | 5 |
Alameda | Census Tract 4203.01; Alameda County; California | 38 | 2.42 | 9 |
Alameda | Census Tract 4203.02; Alameda County; California | 35 | 11.73 | 7 |
Alameda | Census Tract 4204.01; Alameda County; California | 32 | 0.00 | 24 |
Alameda | Census Tract 4204.02; Alameda County; California | 32 | 3.13 | 28 |
Alameda | Census Tract 4205; Alameda County; California | 49 | 2.24 | 16 |
Alameda | Census Tract 4206; Alameda County; California | 64 | 1.03 | 8 |
Alameda | Census Tract 4211; Alameda County; California | 77 | 1.66 | 5 |
Alameda | Census Tract 4212; Alameda County; California | 70 | 0.67 | 2 |
Alameda | Census Tract 4213; Alameda County; California | 72 | 0.17 | 3 |
Alameda | Census Tract 4214; Alameda County; California | 78 | 1.89 | 9 |
Alameda | Census Tract 4215; Alameda County; California | 81 | 1.60 | 5 |
Alameda | Census Tract 4216; Alameda County; California | 71 | 2.41 | 4 |
Alameda | Census Tract 4217; Alameda County; California | 65 | 4.09 | 6 |
Alameda | Census Tract 4218; Alameda County; California | 70 | 0.88 | 5 |
Alameda | Census Tract 4219; Alameda County; California | 50 | 13.26 | 7 |
Alameda | Census Tract 4220; Alameda County; California | 49 | 4.85 | 7 |
Alameda | Census Tract 4221; Alameda County; California | 47 | 11.69 | 20 |
Alameda | Census Tract 4222; Alameda County; California | 63 | 2.53 | 7 |
Alameda | Census Tract 4223; Alameda County; California | 53 | 12.45 | 7 |
Alameda | Census Tract 4224; Alameda County; California | 38 | 3.20 | 12 |
Alameda | Census Tract 4225; Alameda County; California | 45 | 4.10 | 13 |
Alameda | Census Tract 4227; Alameda County; California | 36 | 3.87 | 17 |
Alameda | Census Tract 4228; Alameda County; California | 28 | 5.73 | 19 |
Alameda | Census Tract 4229.01; Alameda County; California | 41 | 0.00 | 3 |
Alameda | Census Tract 4229.02; Alameda County; California | 30 | 6.72 | 7 |
Alameda | Census Tract 4230; Alameda County; California | 57 | 7.54 | 8 |
Alameda | Census Tract 4231; Alameda County; California | 41 | 23.74 | 10 |
Alameda | Census Tract 4232; Alameda County; California | 39 | 17.55 | 33 |
Alameda | Census Tract 4233; Alameda County; California | 46 | 19.85 | 17 |
Alameda | Census Tract 4234; Alameda County; California | 47 | 13.28 | 22 |
Alameda | Census Tract 4235; Alameda County; California | 55 | 10.08 | 17 |
Alameda | Census Tract 4236.01; Alameda County; California | 63 | 3.73 | 13 |
Alameda | Census Tract 4236.02; Alameda County; California | 47 | 1.50 | 14 |
Alameda | Census Tract 4237; Alameda County; California | 60 | 2.38 | 17 |
Alameda | Census Tract 4238; Alameda County; California | 73 | 1.08 | 6 |
Alameda | Census Tract 4239.01; Alameda County; California | 52 | 12.67 | 20 |
Alameda | Census Tract 4239.02; Alameda County; California | 69 | 5.95 | 7 |
Alameda | Census Tract 4240.01; Alameda County; California | 46 | 18.12 | 16 |
Alameda | Census Tract 4240.02; Alameda County; California | 38 | 31.17 | 17 |
Alameda | Census Tract 4251.01; Alameda County; California | 45 | 5.93 | 11 |
Alameda | Census Tract 4251.02; Alameda County; California | 29 | 12.77 | 10 |
Alameda | Census Tract 4251.03; Alameda County; California | 31 | 26.62 | 7 |
Alameda | Census Tract 4251.04; Alameda County; California | 42 | 15.55 | 10 |
Alameda | Census Tract 4261; Alameda County; California | 68 | 0.00 | 2 |
Alameda | Census Tract 4262; Alameda County; California | 68 | 1.44 | 6 |
Alameda | Census Tract 4271; Alameda County; California | 58 | 1.21 | 15 |
Alameda | Census Tract 4272; Alameda County; California | 33 | 3.31 | 17 |
Alameda | Census Tract 4273; Alameda County; California | 32 | 16.26 | 8 |
Alameda | Census Tract 4276; Alameda County; California | 29 | 15.57 | 15 |
Alameda | Census Tract 4277; Alameda County; California | 61 | 2.14 | 11 |
Alameda | Census Tract 4278; Alameda County; California | 51 | 4.38 | 9 |
Alameda | Census Tract 4279; Alameda County; California | 52 | 0.66 | 14 |
Alameda | Census Tract 4280; Alameda County; California | 32 | 12.04 | 14 |
Alameda | Census Tract 4281; Alameda County; California | 51 | 6.82 | 10 |
Alameda | Census Tract 4282; Alameda County; California | 51 | 3.32 | 13 |
Alameda | Census Tract 4283.01; Alameda County; California | 26 | 5.11 | 9 |
Alameda | Census Tract 4283.02; Alameda County; California | 44 | 0.87 | 7 |
Alameda | Census Tract 4284; Alameda County; California | 37 | 7.21 | 16 |
Alameda | Census Tract 4285; Alameda County; California | 42 | 10.93 | 16 |
Alameda | Census Tract 4286; Alameda County; California | 42 | 5.97 | 11 |
Alameda | Census Tract 4287; Alameda County; California | 23 | 17.70 | 13 |
Alameda | Census Tract 4301.01; Alameda County; California | 34 | 2.93 | 10 |
Alameda | Census Tract 4301.02; Alameda County; California | 53 | 0.89 | 14 |
Alameda | Census Tract 4302; Alameda County; California | 51 | 3.45 | 10 |
Alameda | Census Tract 4303; Alameda County; California | 44 | 0.49 | 24 |
Alameda | Census Tract 4304; Alameda County; California | 47 | 4.27 | 7 |
Alameda | Census Tract 4305; Alameda County; California | 24 | 37.25 | 16 |
Alameda | Census Tract 4306; Alameda County; California | 39 | 3.92 | 16 |
Alameda | Census Tract 4307; Alameda County; California | 39 | 4.97 | 18 |
Alameda | Census Tract 4308; Alameda County; California | 43 | 0.84 | 16 |
Alameda | Census Tract 4309; Alameda County; California | 28 | 6.48 | 28 |
Alameda | Census Tract 4310; Alameda County; California | 27 | 16.31 | 16 |
Alameda | Census Tract 4311; Alameda County; California | 26 | 27.71 | 26 |
Alameda | Census Tract 4312; Alameda County; California | 36 | 14.15 | 28 |
Alameda | Census Tract 4321; Alameda County; California | 32 | 24.38 | 25 |
Alameda | Census Tract 4322; Alameda County; California | 27 | 9.76 | 37 |
Alameda | Census Tract 4323; Alameda County; California | 25 | 12.60 | 30 |
Alameda | Census Tract 4324; Alameda County; California | 13 | 4.41 | 55 |
Alameda | Census Tract 4325.01; Alameda County; California | 20 | 3.74 | 28 |
Alameda | Census Tract 4325.02; Alameda County; California | 11 | 19.37 | 30 |
Alameda | Census Tract 4326.01; Alameda County; California | 24 | 22.17 | 25 |
Alameda | Census Tract 4326.02; Alameda County; California | 14 | 25.30 | 30 |
Alameda | Census Tract 4327; Alameda County; California | 52 | 4.90 | 21 |
Alameda | Census Tract 4328; Alameda County; California | 37 | 6.84 | 14 |
Alameda | Census Tract 4330; Alameda County; California | 36 | 8.13 | 16 |
Alameda | Census Tract 4331.02; Alameda County; California | 7 | 6.95 | 29 |
Alameda | Census Tract 4331.03; Alameda County; California | 13 | 11.43 | 40 |
Alameda | Census Tract 4331.04; Alameda County; California | 24 | 17.42 | 38 |
Alameda | Census Tract 4332; Alameda County; California | 10 | 9.73 | 33 |
Alameda | Census Tract 4333; Alameda County; California | 18 | 0.50 | 27 |
Alameda | Census Tract 4334; Alameda County; California | 13 | 8.51 | 8 |
Alameda | Census Tract 4335; Alameda County; California | 25 | 2.92 | 21 |
Alameda | Census Tract 4336; Alameda County; California | 28 | 5.07 | 20 |
Alameda | Census Tract 4337; Alameda County; California | 13 | 2.72 | 52 |
Alameda | Census Tract 4338.01; Alameda County; California | 9 | 13.38 | 47 |
Alameda | Census Tract 4338.02; Alameda County; California | 8 | 12.58 | 21 |
Alameda | Census Tract 4339; Alameda County; California | 9 | 26.33 | 43 |
Alameda | Census Tract 4340; Alameda County; California | 18 | 13.14 | 45 |
Alameda | Census Tract 4351.02; Alameda County; California | 25 | 14.47 | 27 |
Alameda | Census Tract 4351.03; Alameda County; California | 35 | 7.40 | 6 |
Alameda | Census Tract 4351.04; Alameda County; California | 15 | 7.38 | 31 |
Alameda | Census Tract 4352; Alameda County; California | 23 | 22.36 | 27 |
Alameda | Census Tract 4353; Alameda County; California | 22 | 15.13 | 34 |
Alameda | Census Tract 4354; Alameda County; California | 23 | 15.79 | 33 |
Alameda | Census Tract 4355; Alameda County; California | 26 | 12.57 | 48 |
Alameda | Census Tract 4356.01; Alameda County; California | 12 | 10.39 | 57 |
Alameda | Census Tract 4356.02; Alameda County; California | 20 | 13.45 | 52 |
Alameda | Census Tract 4357; Alameda County; California | 21 | 2.43 | 50 |
Alameda | Census Tract 4358; Alameda County; California | 21 | 4.50 | 38 |
Alameda | Census Tract 4359; Alameda County; California | 29 | 0.66 | 25 |
Alameda | Census Tract 4360; Alameda County; California | 27 | 0.61 | 40 |
Alameda | Census Tract 4361; Alameda County; California | 17 | 4.35 | 32 |
Alameda | Census Tract 4362; Alameda County; California | 11 | 12.79 | 64 |
Alameda | Census Tract 4363.01; Alameda County; California | 6 | 10.75 | 46 |
Alameda | Census Tract 4363.02; Alameda County; California | 17 | 11.72 | 46 |
Alameda | Census Tract 4364.02; Alameda County; California | 40 | 13.76 | 24 |
Alameda | Census Tract 4364.03; Alameda County; California | 27 | 9.38 | 26 |
Alameda | Census Tract 4364.04; Alameda County; California | 45 | 11.29 | 21 |
Alameda | Census Tract 4365; Alameda County; California | 13 | 9.60 | 49 |
Alameda | Census Tract 4366.01; Alameda County; California | 11 | 10.61 | 58 |
Alameda | Census Tract 4366.02; Alameda County; California | 7 | 6.17 | 53 |
Alameda | Census Tract 4367; Alameda County; California | 11 | 13.31 | 48 |
Alameda | Census Tract 4368; Alameda County; California | 11 | 5.93 | 49 |
Alameda | Census Tract 4369; Alameda County; California | 12 | 6.97 | 58 |
Alameda | Census Tract 4370; Alameda County; California | 22 | 5.79 | 38 |
Alameda | Census Tract 4371.01; Alameda County; California | 10 | 6.27 | 27 |
Alameda | Census Tract 4371.02; Alameda County; California | 10 | 6.91 | 42 |
Alameda | Census Tract 4372; Alameda County; California | 11 | 6.43 | 32 |
Alameda | Census Tract 4373; Alameda County; California | 13 | 12.22 | 34 |
Alameda | Census Tract 4374; Alameda County; California | 15 | 3.75 | 52 |
Alameda | Census Tract 4375; Alameda County; California | 11 | 5.90 | 56 |
Alameda | Census Tract 4376; Alameda County; California | 11 | 9.46 | 31 |
Alameda | Census Tract 4377.01; Alameda County; California | 10 | 15.62 | 50 |
Alameda | Census Tract 4377.02; Alameda County; California | 6 | 1.91 | 85 |
Alameda | Census Tract 4378; Alameda County; California | 15 | 7.69 | 37 |
Alameda | Census Tract 4379; Alameda County; California | 12 | 11.32 | 41 |
Alameda | Census Tract 4380; Alameda County; California | 23 | 11.47 | 23 |
Alameda | Census Tract 4381; Alameda County; California | 12 | 6.31 | 37 |
Alameda | Census Tract 4382.01; Alameda County; California | 7 | 5.62 | 56 |
Alameda | Census Tract 4382.03; Alameda County; California | 25 | 3.62 | 21 |
Alameda | Census Tract 4382.04; Alameda County; California | 9 | 4.03 | 36 |
Alameda | Census Tract 4383; Alameda County; California | 6 | 6.46 | 37 |
Alameda | Census Tract 4384; Alameda County; California | 16 | 10.86 | 24 |
Alameda | Census Tract 4401; Alameda County; California | 40 | 5.19 | 21 |
Alameda | Census Tract 4402; Alameda County; California | 3 | 0.46 | 70 |
Alameda | Census Tract 4403.01; Alameda County; California | 29 | 5.74 | 34 |
Alameda | Census Tract 4403.04; Alameda County; California | 10 | 7.94 | 12 |
Alameda | Census Tract 4403.05; Alameda County; California | 21 | 1.28 | 14 |
Alameda | Census Tract 4403.06; Alameda County; California | 9 | 5.89 | 11 |
Alameda | Census Tract 4403.07; Alameda County; California | 14 | 5.78 | 21 |
Alameda | Census Tract 4403.08; Alameda County; California | 13 | 4.11 | 26 |
Alameda | Census Tract 4403.31; Alameda County; California | 12 | 3.79 | 16 |
Alameda | Census Tract 4403.32; Alameda County; California | 8 | 1.56 | 8 |
Alameda | Census Tract 4403.33; Alameda County; California | 7 | 0.81 | 5 |
Alameda | Census Tract 4403.34; Alameda County; California | 11 | 5.77 | 13 |
Alameda | Census Tract 4403.36; Alameda County; California | 15 | 12.57 | 10 |
Alameda | Census Tract 4403.37; Alameda County; California | 6 | 4.91 | 9 |
Alameda | Census Tract 4403.38; Alameda County; California | 22 | 0.49 | 9 |
Alameda | Census Tract 4411; Alameda County; California | 44 | 0.04 | 16 |
Alameda | Census Tract 4412; Alameda County; California | 32 | 2.60 | 12 |
Alameda | Census Tract 4413.01; Alameda County; California | 20 | 10.90 | 7 |
Alameda | Census Tract 4413.02; Alameda County; California | 16 | 2.71 | 11 |
Alameda | Census Tract 4414.01; Alameda County; California | 18 | 1.15 | 10 |
Alameda | Census Tract 4414.02; Alameda County; California | 18 | 1.27 | 7 |
Alameda | Census Tract 4415.01; Alameda County; California | 8 | 5.01 | 5 |
Alameda | Census Tract 4415.03; Alameda County; California | 7 | 0.24 | 4 |
Alameda | Census Tract 4415.21; Alameda County; California | 11 | 0.25 | 6 |
Alameda | Census Tract 4415.22; Alameda County; California | 18 | 2.62 | 7 |
Alameda | Census Tract 4415.23; Alameda County; California | 11 | 4.20 | 7 |
Alameda | Census Tract 4415.24; Alameda County; California | 5 | 0.36 | 1 |
Alameda | Census Tract 4415.25; Alameda County; California | 7 | 3.49 | 11 |
Alameda | Census Tract 4416.01; Alameda County; California | 31 | 8.17 | 14 |
Alameda | Census Tract 4416.02; Alameda County; California | 28 | 8.46 | 23 |
Alameda | Census Tract 4417.01; Alameda County; California | 12 | 0.46 | 20 |
Alameda | Census Tract 4417.02; Alameda County; California | 19 | 10.01 | 16 |
Alameda | Census Tract 4418; Alameda County; California | 32 | 1.77 | 6 |
Alameda | Census Tract 4419.21; Alameda County; California | 19 | 0.31 | 24 |
Alameda | Census Tract 4419.23; Alameda County; California | 12 | 3.41 | 13 |
Alameda | Census Tract 4419.24; Alameda County; California | 17 | 1.51 | 10 |
Alameda | Census Tract 4419.26; Alameda County; California | 13 | 3.25 | 25 |
Alameda | Census Tract 4419.27; Alameda County; California | 19 | 3.74 | 10 |
Alameda | Census Tract 4419.28; Alameda County; California | 17 | 13.20 | 8 |
Alameda | Census Tract 4419.29; Alameda County; California | 21 | 0.85 | 8 |
Alameda | Census Tract 4420; Alameda County; California | 15 | 0.13 | 10 |
Alameda | Census Tract 4421; Alameda County; California | 9 | 1.60 | 1 |
Alameda | Census Tract 4422; Alameda County; California | 13 | 2.43 | 6 |
Alameda | Census Tract 4423.01; Alameda County; California | 20 | 2.37 | 15 |
Alameda | Census Tract 4423.02; Alameda County; California | 15 | 5.66 | 15 |
Alameda | Census Tract 4424; Alameda County; California | 25 | 1.75 | 25 |
Alameda | Census Tract 4425.01; Alameda County; California | 13 | 3.17 | 28 |
Alameda | Census Tract 4425.02; Alameda County; California | 17 | 4.24 | 30 |
Alameda | Census Tract 4426.01; Alameda County; California | 28 | 3.47 | 28 |
Alameda | Census Tract 4426.02; Alameda County; California | 29 | 10.38 | 17 |
Alameda | Census Tract 4427; Alameda County; California | 30 | 1.05 | 11 |
Alameda | Census Tract 4428; Alameda County; California | 24 | 0.74 | 16 |
Alameda | Census Tract 4429; Alameda County; California | 15 | 5.15 | 13 |
Alameda | Census Tract 4430.01; Alameda County; California | 18 | 4.89 | 32 |
Alameda | Census Tract 4430.02; Alameda County; California | 16 | 1.74 | 17 |
Alameda | Census Tract 4431.02; Alameda County; California | 11 | 0.00 | 6 |
Alameda | Census Tract 4431.03; Alameda County; California | 19 | 1.21 | 3 |
Alameda | Census Tract 4431.04; Alameda County; California | 14 | 6.27 | 2 |
Alameda | Census Tract 4431.05; Alameda County; California | 10 | 0.43 | 2 |
Alameda | Census Tract 4432; Alameda County; California | 14 | 0.30 | 1 |
Alameda | Census Tract 4433.01; Alameda County; California | 18 | 1.02 | 9 |
Alameda | Census Tract 4433.21; Alameda County; California | 3 | 1.69 | 7 |
Alameda | Census Tract 4433.22; Alameda County; California | 19 | 1.09 | 7 |
Alameda | Census Tract 4441; Alameda County; California | 29 | 6.61 | 25 |
Alameda | Census Tract 4442; Alameda County; California | 21 | 1.34 | 30 |
Alameda | Census Tract 4443.01; Alameda County; California | 29 | 0.76 | 26 |
Alameda | Census Tract 4443.03; Alameda County; California | NaN | NaN | NaN |
Alameda | Census Tract 4443.04; Alameda County; California | 12 | 0.81 | 32 |
Alameda | Census Tract 4444; Alameda County; California | 15 | 2.87 | 54 |
Alameda | Census Tract 4445; Alameda County; California | 18 | 4.61 | 44 |
Alameda | Census Tract 4446.01; Alameda County; California | 17 | 1.20 | 18 |
Alameda | Census Tract 4446.02; Alameda County; California | 15 | 5.82 | 10 |
Alameda | Census Tract 4501.01; Alameda County; California | 23 | 2.96 | 8 |
Alameda | Census Tract 4501.02; Alameda County; California | 19 | 14.83 | 14 |
Alameda | Census Tract 4502; Alameda County; California | 40 | 3.57 | 9 |
Alameda | Census Tract 4503; Alameda County; California | 47 | 6.27 | 15 |
Alameda | Census Tract 4504; Alameda County; California | 35 | 3.79 | 19 |
Alameda | Census Tract 4505.01; Alameda County; California | 54 | 0.00 | 13 |
Alameda | Census Tract 4505.02; Alameda County; California | 42 | 0.65 | 9 |
Alameda | Census Tract 4506.01; Alameda County; California | 47 | 0.74 | 7 |
Alameda | Census Tract 4506.03; Alameda County; California | 42 | 0.43 | 15 |
Alameda | Census Tract 4506.04; Alameda County; California | 53 | 0.44 | 13 |
Alameda | Census Tract 4506.05; Alameda County; California | 49 | 0.26 | 9 |
Alameda | Census Tract 4506.06; Alameda County; California | 50 | 0.00 | 5 |
Alameda | Census Tract 4506.07; Alameda County; California | 33 | 1.72 | 15 |
Alameda | Census Tract 4506.08; Alameda County; California | 41 | 1.67 | 8 |
Alameda | Census Tract 4506.09; Alameda County; California | 48 | 2.03 | 14 |
Alameda | Census Tract 4507.01; Alameda County; California | 45 | 1.66 | 7 |
Alameda | Census Tract 4507.41; Alameda County; California | 45 | 1.18 | 21 |
Alameda | Census Tract 4507.42; Alameda County; California | 49 | 0.18 | 9 |
Alameda | Census Tract 4507.43; Alameda County; California | 20 | 7.91 | 13 |
Alameda | Census Tract 4507.44; Alameda County; California | 47 | 0.00 | 13 |
Alameda | Census Tract 4507.45; Alameda County; California | 33 | 0.44 | 7 |
Alameda | Census Tract 4507.46; Alameda County; California | 47 | 0.43 | 19 |
Alameda | Census Tract 4507.50; Alameda County; California | 20 | 3.56 | 7 |
Alameda | Census Tract 4507.51; Alameda County; California | 13 | 2.69 | 5 |
Alameda | Census Tract 4507.52; Alameda County; California | 14 | 2.37 | 7 |
Alameda | Census Tract 4511.02; Alameda County; California | 74 | 0.51 | 7 |
Alameda | Census Tract 4511.03; Alameda County; California | 86 | 1.76 | 12 |
Alameda | Census Tract 4511.04; Alameda County; California | 54 | 0.04 | 18 |
Alameda | Census Tract 4512.01; Alameda County; California | 48 | 1.64 | 25 |
Alameda | Census Tract 4512.02; Alameda County; California | 45 | 0.93 | 17 |
Alameda | Census Tract 4513; Alameda County; California | 54 | 1.22 | 27 |
Alameda | Census Tract 4514.01; Alameda County; California | 39 | 5.35 | 38 |
Alameda | Census Tract 4514.03; Alameda County; California | 58 | 1.60 | 23 |
Alameda | Census Tract 4514.04; Alameda County; California | 32 | 1.06 | 59 |
Alameda | Census Tract 4515.01; Alameda County; California | 59 | 5.07 | 14 |
Alameda | Census Tract 4515.03; Alameda County; California | 51 | 0.59 | 20 |
Alameda | Census Tract 4515.04; Alameda County; California | 46 | 0.00 | 27 |
Alameda | Census Tract 4515.05; Alameda County; California | 66 | 1.01 | 13 |
Alameda | Census Tract 4515.06; Alameda County; California | 37 | 3.97 | 36 |
Alameda | Census Tract 4516.01; Alameda County; California | 70 | 0.00 | 9 |
Alameda | Census Tract 4516.02; Alameda County; California | 65 | 2.20 | 19 |
Alameda | Census Tract 4517.01; Alameda County; California | 54 | 4.26 | 13 |
Alameda | Census Tract 4517.03; Alameda County; California | 61 | 0.73 | 13 |
Alameda | Census Tract 4517.04; Alameda County; California | 69 | 0.23 | 12 |
Alameda | Census Tract 9819; Alameda County; California | 60 | 0.00 | 40 |
Alameda | Census Tract 9820; Alameda County; California | 50 | 10.00 | 0 |
Alameda | Census Tract 9821; Alameda County; California | 28 | 8.88 | 18 |
Alameda | Census Tract 9832; Alameda County; California | 54 | 13.84 | 5 |
Alameda | Census Tract 9900; Alameda County; California | NaN | NaN | NaN |
Alpine | Census Tract 100; Alpine County; California | 58 | 0.00 | 14 |
Amador | Census Tract 1.01; Amador County; California | 80 | 0.44 | 13 |
Amador | Census Tract 1.02; Amador County; California | 85 | 0.32 | 8 |
Amador | Census Tract 2.01; Amador County; California | 86 | 0.72 | 10 |
Amador | Census Tract 2.02; Amador County; California | 73 | 2.30 | 14 |
Amador | Census Tract 3.01; Amador County; California | 46 | 9.89 | 36 |
Amador | Census Tract 3.03; Amador County; California | 80 | 0.09 | 9 |
Amador | Census Tract 3.04; Amador County; California | 78 | 0.80 | 10 |
Amador | Census Tract 4.01; Amador County; California | 79 | 0.70 | 8 |
Amador | Census Tract 4.02; Amador County; California | 78 | 0.17 | 16 |
Amador | Census Tract 5; Amador County; California | 72 | 0.09 | 24 |
3.3 Demographic Analysis
Your Task: Analyze the demographic patterns in your selected areas.
# Find the tract with the highest percentage of Hispanic/Latino residents
# Hint: use arrange() and slice() to get the top tract
top_hispanic_tract <- tract_percent %>%
  arrange(desc(pct_hispanic)) %>%
  slice(1) %>%
  select(GEOID, tract_label, county_name, pct_hispanic)
kable(top_hispanic_tract, caption = "Tract with Highest % Hispanic/Latino")
GEOID | tract_label | county_name | pct_hispanic |
---|---|---|---|
06001437702 | Census Tract 4377.02; Alameda County; California | Alameda | 85 |
# Calculate average demographics by county using group_by() and summarize()
# Show: number of tracts, average percentage for each racial/ethnic group
county_demo_avgs <- tract_percent %>%
  group_by(county_name) %>%
  summarise(
    "Number of Tracts" = n(),
    "Average White Percentage" = mean(pct_white, na.rm = TRUE),
    "Average Black Percentage" = mean(pct_black, na.rm = TRUE),
    "Average Hispanic Percentage" = mean(pct_hispanic, na.rm = TRUE)
  )
# Create a nicely formatted table of your results using kable()
kable(
  county_demo_avgs,
  caption = "Average Tract Demographics by County",
  digits = 1,
  align = c("l", "c", "c", "c", "c"),
  col.names = c("County Names", "Number of Tracts", "Average White Percentage", "Average Black Percentage", "Average Hispanic Percentage")
)
County Names | Number of Tracts | Average White Percentage | Average Black Percentage | Average Hispanic Percentage |
---|---|---|---|---|
Alameda | 379 | 31.0 | 10.7 | 21.4 |
Alpine | 1 | 58.1 | 0.0 | 14.1 |
Amador | 10 | 75.7 | 1.6 | 14.9 |
Part 4: Comprehensive Data Quality Evaluation
4.1 MOE Analysis for Demographic Variables
Your Task: Examine margins of error for demographic variables to see if some communities have less reliable data.
Requirements:
- Calculate MOE percentages for each demographic variable
- Flag tracts where any demographic variable has MOE > 15%
- Create summary statistics
# Calculate MOE percentages for white, Black, and Hispanic variables
# Hint: use the same formula as before (margin/estimate * 100)
# Create a flag for tracts with high MOE on any demographic variable
# Use logical operators (| for OR) in an ifelse() statement
county_codes_state <- unique(stringr::str_sub(county$GEOID, 3, 5))
race_vars <- c(
  total = "B03002_001",
  white = "B03002_003",
  black = "B03002_004",
  hispanic = "B03002_012"
)
tract_state_raw <- get_acs(
  geography = "tract",
  state = my_state,
  county = county_codes_state, # all counties in the state
  survey = "acs5",
  year = 2022,
  variables = race_vars,
  output = "wide"
)
tract_state_percent <- tract_state_raw %>%
  mutate(
    county_code = substr(GEOID, 3, 5),
    pct_white = 100 * (whiteE / totalE),
    pct_black = 100 * (blackE / totalE),
    pct_hispanic = 100 * (hispanicE / totalE),
    tract_label = stringr::str_remove(NAME, paste0(", ", my_state)),
    tract_label = stringr::str_remove(tract_label, ", United States$")
  ) %>%
  left_join(
    county %>%
      transmute(
        county_code = substr(GEOID, 3, 5),
        county_name
      ),
    by = "county_code"
  )
tract_quality <- tract_state_percent %>%
  mutate(
    moe_total_pct = 100 * (totalM / totalE),
    moe_white_pct = 100 * (whiteM / whiteE),
    moe_black_pct = 100 * (blackM / blackE),
    moe_hispanic_pct = 100 * (hispanicM / hispanicE),
    high_moe_flag = (moe_white_pct > 15) | (moe_black_pct > 15) | (moe_hispanic_pct > 15)
  )
tract_quality_summary <- tract_quality %>%
  summarise(
    tracts_total = n(),
    tracts_high_moe = sum(high_moe_flag, na.rm = TRUE),
    percent_high_moe = round(100 * tracts_high_moe / tracts_total, 1)
  )
kable(
  tract_quality_summary,
  caption = "Tract-Level High-MOE Summary (>15% on any demographic variable) — Statewide",
  col.names = c("Total Tracts", "High-MOE Tracts", "Percent High-MOE (%)"),
  align = "c"
)
Total Tracts | High-MOE Tracts | Percent High-MOE (%) |
---|---|---|
9129 | 9123 | 99.9 |
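One possible contributor to the near-universal flag rate is how R handles division by zero in the `margin / estimate * 100` formula. The base-R sketch below uses made-up estimates and margins (hypothetical values, not taken from the ACS pull above) and assumes that a zero estimate can still carry a nonzero margin of error.

```r
# Hypothetical estimates and margins for four tracts (illustrative only)
est <- c(1200, 40, 0, 0)
moe <- c(150, 35, 13, 0)

# Same formula as above: margin / estimate * 100
moe_pct <- 100 * moe / est   # a zero estimate yields Inf (13 / 0) or NaN (0 / 0)

# Inf > 15 is TRUE, so a zero estimate with a nonzero margin is flagged;
# NaN > 15 is NA, which sum(..., na.rm = TRUE) silently drops
flag <- moe_pct > 15
n_flagged <- sum(flag, na.rm = TRUE)

# With the OR operator, one TRUE is enough even if another term is NA
combined <- NA | TRUE

print(data.frame(est, moe, moe_pct, flag))
```

Under these assumptions, any tract with a zero count for one of the three groups is flagged automatically, which is worth checking before interpreting the statewide share.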
4.2 Pattern Analysis
Your Task: Investigate whether data quality problems are randomly distributed or concentrated in certain types of communities.
# Group tracts by whether they have high MOE issues
# Calculate average characteristics for each group:
# - population size, demographic percentages
pattern_table <- tract_quality %>%
  group_by(high_moe_flag, county_name) %>%
  summarise(
    tracts = n(),
    avg_pop = mean(totalE, na.rm = TRUE),
    avg_pct_white = mean(pct_white, na.rm = TRUE),
    avg_pct_black = mean(pct_black, na.rm = TRUE),
    avg_pct_hispanic = mean(pct_hispanic, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(high_moe_flag), county_name)
kable(
  pattern_table,
  caption = "Characteristics by High-MOE Status (Any Demographic Variable > 15% MOE) — Statewide",
  col.names = c(
    "High-MOE (Any >15%)",
    "County",
    "Tracts",
    "Avg Pop",
    "Avg White (%)",
    "Avg Black (%)",
    "Avg Hispanic (%)"
  ),
  align = c("c", "l", "r", "r", "r", "r", "r"),
  digits = c(NA, NA, 0, 0, 1, 1, 1),
  format.args = list(big.mark = ",")
)
High-MOE (Any >15%) | County | Tracts | Avg Pop | Avg White (%) | Avg Black (%) | Avg Hispanic (%) |
---|---|---|---|---|---|---|
TRUE | Alameda | 379 | 4,390 | 31.0 | 10.7 | 21.4 |
TRUE | Alpine | 1 | 1,515 | 58.1 | 0.0 | 14.1 |
TRUE | Amador | 10 | 4,058 | 75.7 | 1.6 | 14.9 |
TRUE | Butte | 54 | 3,956 | 69.3 | 1.5 | 17.4 |
TRUE | Calaveras | 14 | 3,262 | 81.0 | 0.9 | 11.6 |
TRUE | Colusa | 6 | 3,635 | 34.0 | 1.6 | 60.5 |
TRUE | Contra Costa | 242 | 4,804 | 42.6 | 8.0 | 25.1 |
TRUE | Del Norte | 9 | 3,051 | 59.5 | 2.2 | 19.6 |
TRUE | El Dorado | 55 | 3,486 | 76.0 | 0.6 | 13.8 |
TRUE | Fresno | 225 | 4,481 | 28.4 | 4.1 | 54.1 |
TRUE | Glenn | 8 | 3,582 | 54.0 | 0.3 | 39.2 |
TRUE | Humboldt | 36 | 3,781 | 72.1 | 1.3 | 11.6 |
TRUE | Imperial | 40 | 4,489 | 11.5 | 2.6 | 82.2 |
TRUE | Inyo | 6 | 3,138 | 62.1 | 0.8 | 23.2 |
TRUE | Kern | 236 | 3,843 | 33.1 | 4.7 | 54.0 |
TRUE | Kings | 30 | 4,830 | 30.0 | 5.0 | 57.6 |
TRUE | Lake | 21 | 3,239 | 69.3 | 2.5 | 20.0 |
TRUE | Lassen | 8 | 3,020 | 75.0 | 2.8 | 12.0 |
TRUE | Los Angeles | 2,497 | 3,976 | 26.3 | 7.6 | 47.6 |
TRUE | Madera | 33 | 4,552 | 33.9 | 2.0 | 58.3 |
TRUE | Marin | 63 | 4,135 | 69.2 | 2.5 | 16.5 |
TRUE | Mariposa | 6 | 2,855 | 76.8 | 0.9 | 13.4 |
TRUE | Mendocino | 24 | 3,798 | 64.6 | 0.5 | 24.9 |
TRUE | Merced | 63 | 4,481 | 25.4 | 2.8 | 62.1 |
TRUE | Modoc | 4 | 2,163 | 76.6 | 1.4 | 15.1 |
TRUE | Mono | 4 | 3,305 | 64.1 | 0.2 | 27.8 |
TRUE | Monterey | 104 | 4,208 | 35.2 | 2.0 | 52.6 |
TRUE | Napa | 40 | 3,435 | 54.6 | 2.1 | 31.7 |
TRUE | Nevada | 26 | 3,935 | 83.4 | 0.3 | 9.6 |
TRUE | Orange | 614 | 5,171 | 41.3 | 1.5 | 32.4 |
TRUE | Placer | 92 | 4,420 | 70.9 | 1.4 | 14.5 |
TRUE | Plumas | 7 | 2,807 | 85.2 | 0.6 | 8.6 |
TRUE | Riverside | 518 | 4,690 | 34.6 | 5.7 | 49.9 |
TRUE | Sacramento | 363 | 4,350 | 43.2 | 9.1 | 23.8 |
TRUE | San Benito | 12 | 5,396 | 32.8 | 0.8 | 59.5 |
TRUE | San Bernardino | 465 | 4,682 | 28.5 | 7.1 | 53.3 |
TRUE | San Diego | 737 | 4,464 | 45.5 | 4.4 | 33.3 |
TRUE | San Francisco | 244 | 3,488 | 39.5 | 5.1 | 15.1 |
TRUE | San Joaquin | 174 | 4,480 | 29.8 | 6.7 | 43.6 |
TRUE | San Luis Obispo | 70 | 4,024 | 67.1 | 1.3 | 23.2 |
TRUE | San Mateo | 174 | 4,335 | 37.9 | 2.1 | 23.5 |
TRUE | Santa Barbara | 109 | 4,085 | 46.0 | 1.8 | 43.6 |
TRUE | Santa Clara | 408 | 4,698 | 29.6 | 2.3 | 25.3 |
TRUE | Santa Cruz | 70 | 3,837 | 56.7 | 0.8 | 34.2 |
TRUE | Shasta | 50 | 3,637 | 77.7 | 0.9 | 10.6 |
TRUE | Sierra | 1 | 2,916 | 86.6 | 0.2 | 11.4 |
TRUE | Siskiyou | 16 | 2,753 | 73.9 | 1.2 | 14.5 |
TRUE | Solano | 99 | 4,497 | 36.3 | 12.7 | 28.2 |
TRUE | Sonoma | 122 | 4,004 | 63.6 | 1.4 | 25.6 |
TRUE | Stanislaus | 112 | 4,929 | 38.7 | 2.7 | 48.9 |
TRUE | Sutter | 21 | 4,719 | 45.8 | 1.8 | 32.5 |
TRUE | Tehama | 14 | 4,677 | 65.9 | 0.9 | 26.0 |
TRUE | Trinity | 4 | 3,972 | 79.2 | 1.7 | 7.0 |
TRUE | Tulare | 103 | 4,597 | 27.4 | 1.2 | 65.3 |
TRUE | Tuolumne | 18 | 3,055 | 78.1 | 1.8 | 13.5 |
TRUE | Ventura | 190 | 4,432 | 45.0 | 1.7 | 42.1 |
TRUE | Yolo | 53 | 4,097 | 46.5 | 2.7 | 31.1 |
TRUE | Yuba | 19 | 4,300 | 56.0 | 3.2 | 26.7 |
FALSE | Kings | 1 | 7,612 | 18.3 | 25.1 | 49.9 |
FALSE | Lassen | 1 | 7,717 | 30.4 | 22.1 | 42.4 |
FALSE | Los Angeles | 1 | 8,994 | 16.8 | 33.6 | 41.4 |
FALSE | Madera | 1 | 7,043 | 25.2 | 14.9 | 49.2 |
FALSE | San Bernardino | 1 | 3,618 | 15.5 | 25.5 | 50.4 |
FALSE | Solano | 1 | 5,774 | 18.5 | 43.6 | 29.0 |
tract_flag_driver <- tract_quality %>%
  mutate(
    flag_white = moe_white_pct > 15,
    flag_black = moe_black_pct > 15,
    flag_hispanic = moe_hispanic_pct > 15,
    driver_groups = case_when(
      flag_white & !flag_black & !flag_hispanic ~ "White",
      !flag_white & flag_black & !flag_hispanic ~ "Black",
      !flag_white & !flag_black & flag_hispanic ~ "Hispanic",
      flag_white | flag_black | flag_hispanic ~ "Multiple",
      TRUE ~ "None"
    )
  )
driver_totals <- tract_flag_driver %>%
  filter(high_moe_flag) %>%
  summarise(
    White = sum(flag_white, na.rm = TRUE),
    Black = sum(flag_black, na.rm = TRUE),
    Hispanic = sum(flag_hispanic, na.rm = TRUE)
  ) %>%
  tidyr::pivot_longer(everything(),
    names_to = "Group",
    values_to = "Flagged Tracts")
kable(
  driver_totals %>% arrange(desc(`Flagged Tracts`)),
  caption = "Which Groups Drove High-MOE Flags (MOE% > 15) — Statewide",
  col.names = c("Group", "Flagged Tracts"),
  align = c("l", "r"),
  format.args = list(big.mark = ",")
)
Group | Flagged Tracts |
---|---|
Black | 9,109 |
Hispanic | 8,631 |
White | 8,144 |
Pattern Analysis: Under the assignment's rule, 99.9% of census tracts (9,123 of 9,129) are flagged because at least one demographic estimate has an MOE above 15% of the estimate. Because nearly every tract is flagged, the tract-level flag cannot by itself separate reliable areas from unreliable ones, and the problem looks effectively uniform across tracts. Counting flags by demographic group rather than by tract shows that the burden is not evenly shared: Black estimates exceed the threshold in 9,109 tracts, Hispanic estimates in 8,631, and White estimates in 8,144, so Black communities are the most affected. This likely reflects the small samples the ACS draws for smaller racial and ethnic subgroups, which produce large relative errors. Tracts with high MOEs also tend to have smaller populations than the six unflagged tracts, which amplifies sampling error, and even within White-majority tracts the subgroup estimates for Black and Hispanic residents frequently exceed the 15% MOE threshold. In some cases there are so few observations that subgroup estimates are unstable or missing altogether. Together, these dynamics show that while the pattern may look uniform at the tract level, the reliability problem is systematically tied to the representation of minority populations, raising clear concerns for algorithmic decision-making.
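To put the group comparison on a common scale, the counts in the table above can be expressed as shares of the 9,123 flagged tracts. This is a small base-R sketch using only figures already reported in this section; the variable names are illustrative.

```r
# Counts reported in the high-MOE summary and the driver table above
flagged_total <- 9123
group_counts <- c(Black = 9109, Hispanic = 8631, White = 8144)

# Share of flagged tracts in which each group's estimate exceeds 15% MOE
share_pct <- round(100 * group_counts / flagged_total, 1)
print(share_pct)
```

All three groups are flagged in a large majority of tracts, so the gap between groups is real but narrower than the raw counts might suggest.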
Part 5: Policy Recommendations
5.1 Analysis Integration and Professional Summary
Executive Summary:
Across county- and tract-level analyses, two systematic patterns consistently appear. First, tracts and counties with smaller populations tend to have disproportionately high margins of error, making their estimates far less stable than those from larger areas. Second, the reliability of racial and ethnic subgroup estimates varies sharply: Black and Hispanic populations are much more likely to have margins of error above 15%, and in some cases the ACS does not capture enough observations to produce valid estimates. Together, these patterns show that measurement error is pervasive but not random: it reflects structural features of both tract size and demographic composition.
Communities facing the greatest risk of algorithmic bias are those that are either very small and rural or racially/ethnically diverse. Rural tracts, because of small sample sizes, may be flagged as unreliable and thus deprioritized in automated systems, despite having genuine needs. At the same time, urban minority communities, particularly those with large Hispanic or Black populations, often show the highest subgroup MOEs, meaning their conditions could be systematically misclassified or underestimated. In both cases, the communities already at risk of marginalization are the same ones where the data is least reliable.
The drivers of these problems are structural. In rural areas, small sample sizes inflate margins of error, while in diverse urban tracts, underrepresentation of minority subgroups disrupts the accuracy of need assessments. This underrepresentation is tied to long-standing stratification in data collection, where certain groups are less visible in surveys, and to socio-spatial self-selection, where minorities concentrate in particular neighborhoods that are often harder to measure with precision. These processes produce systematic biases: the very communities whose needs are greatest — low-income, minority, and geographically marginalized — are those most likely to be misrepresented in the data.
The Department should treat reliability as central to its algorithmic framework. Specifically, it should (a) adjust for MOE when prioritizing communities, so noisy estimates are not misclassified as real differences; (b) avoid strict cutoffs in low-confidence areas by using broader eligibility bands; (c) supplement ACS data with administrative or community-level sources in minority-dense neighborhoods where subgroup reliability is weakest; and (d) incorporate transparency and equity audits to ensure that stratification and data gaps do not reinforce existing inequalities. By embedding these safeguards, the Department can ensure its allocation strategies are both statistically sound and socially just.
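Recommendation (a) can be illustrated with the standard check for whether two ACS estimates differ by more than sampling noise. The sketch below assumes the usual 90% confidence level for published ACS margins (so standard error = MOE / 1.645). The function name `acs_diff_significant` is hypothetical, and the margins are back-calculated from the rounded MOE percentages in the county table, so they are approximate.

```r
# Compare two ACS estimates, treating them as independent.
# ACS margins are published at the 90% level, hence the 1.645 divisor.
acs_diff_significant <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se1 <- moe1 / 1.645
  se2 <- moe2 / 1.645
  z <- (est1 - est2) / sqrt(se1^2 + se2^2)
  list(z = z, significant = abs(z) > z_crit)
}

# Median household income for two Low Confidence counties; margins are
# approximated as estimate * MOE% / 100 from the rounded table values
alpine <- list(est = 101125, moe = 101125 * 17.25 / 100)
mono   <- list(est = 82038,  moe = 82038 * 18.76 / 100)

res <- acs_diff_significant(alpine$est, alpine$moe, mono$est, mono$moe)
print(res)
```

With these approximate margins the income gap between the two counties is not statistically distinguishable at the 90% level, which is the kind of case where a ranking algorithm should not treat the difference as real.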
5.2 Specific Recommendations
Your Task: Create a decision framework for algorithm implementation.
# Create a summary table using your county reliability data
# Include: county name, median income, MOE percentage, reliability category
recommendations <- county_reliability %>%
  select(
    County = county_name,
    `Median Income` = med_hh_incomeE,
    `MOE %` = moe_percentage,
    `Reliability Category` = Reliability
  ) %>%
  mutate(
    Recommendation = case_when(
      `Reliability Category` == "High Confidence" ~ "Safe for algorithmic decisions",
      `Reliability Category` == "Moderate Confidence" ~ "Use with caution – monitor outcomes",
      `Reliability Category` == "Low Confidence" ~ "Requires manual review or additional data",
      TRUE ~ NA_character_
    )
  )
# Add a new column with algorithm recommendations using case_when():
# - High Confidence: "Safe for algorithmic decisions"
# - Moderate Confidence: "Use with caution - monitor outcomes"
# - Low Confidence: "Requires manual review or additional data"
# Format as a professional table with kable()
kable(
  recommendations %>%
    arrange(Recommendation, County),
  caption = "Decision Framework for Algorithm Implementation (Arranged by Recommendation)",
  col.names = c("County", "Median Income", "MOE %", "Reliability Category", "Recommendation"),
  digits = 2,
  format.args = list(big.mark = ",")
)
County | Median Income | MOE % | Reliability Category | Recommendation |
---|---|---|---|---|
Alpine | 101,125 | 17.25 | Low Confidence | Requires manual review or additional data |
Mono | 82,038 | 18.76 | Low Confidence | Requires manual review or additional data |
Plumas | 67,885 | 11.45 | Low Confidence | Requires manual review or additional data |
Sierra | 61,108 | 15.12 | Low Confidence | Requires manual review or additional data |
Trinity | 47,317 | 12.45 | Low Confidence | Requires manual review or additional data |
Alameda | 122,488 | 1.00 | High Confidence | Safe for algorithmic decisions |
Butte | 66,085 | 3.42 | High Confidence | Safe for algorithmic decisions |
Contra Costa | 120,020 | 1.25 | High Confidence | Safe for algorithmic decisions |
El Dorado | 99,246 | 3.36 | High Confidence | Safe for algorithmic decisions |
Fresno | 67,756 | 1.43 | High Confidence | Safe for algorithmic decisions |
Humboldt | 57,881 | 3.68 | High Confidence | Safe for algorithmic decisions |
Imperial | 53,847 | 4.11 | High Confidence | Safe for algorithmic decisions |
Kern | 63,883 | 2.07 | High Confidence | Safe for algorithmic decisions |
Kings | 68,540 | 3.29 | High Confidence | Safe for algorithmic decisions |
Lake | 56,259 | 4.34 | High Confidence | Safe for algorithmic decisions |
Los Angeles | 83,411 | 0.53 | High Confidence | Safe for algorithmic decisions |
Madera | 73,543 | 3.87 | High Confidence | Safe for algorithmic decisions |
Marin | 142,019 | 2.89 | High Confidence | Safe for algorithmic decisions |
Mendocino | 61,335 | 3.58 | High Confidence | Safe for algorithmic decisions |
Merced | 64,772 | 3.31 | High Confidence | Safe for algorithmic decisions |
Monterey | 91,043 | 2.09 | High Confidence | Safe for algorithmic decisions |
Napa | 105,809 | 2.82 | High Confidence | Safe for algorithmic decisions |
Nevada | 79,395 | 4.82 | High Confidence | Safe for algorithmic decisions |
Orange | 109,361 | 0.81 | High Confidence | Safe for algorithmic decisions |
Placer | 109,375 | 1.70 | High Confidence | Safe for algorithmic decisions |
Riverside | 84,505 | 1.26 | High Confidence | Safe for algorithmic decisions |
Sacramento | 84,010 | 0.97 | High Confidence | Safe for algorithmic decisions |
San Bernardino | 77,423 | 1.04 | High Confidence | Safe for algorithmic decisions |
San Diego | 96,974 | 1.02 | High Confidence | Safe for algorithmic decisions |
San Francisco | 136,689 | 1.43 | High Confidence | Safe for algorithmic decisions |
San Joaquin | 82,837 | 1.75 | High Confidence | Safe for algorithmic decisions |
San Luis Obispo | 90,158 | 2.56 | High Confidence | Safe for algorithmic decisions |
San Mateo | 149,907 | 1.75 | High Confidence | Safe for algorithmic decisions |
Santa Barbara | 92,332 | 2.05 | High Confidence | Safe for algorithmic decisions |
Santa Clara | 153,792 | 1.00 | High Confidence | Safe for algorithmic decisions |
Santa Cruz | 104,409 | 3.04 | High Confidence | Safe for algorithmic decisions |
Shasta | 68,347 | 3.63 | High Confidence | Safe for algorithmic decisions |
Siskiyou | 53,898 | 4.90 | High Confidence | Safe for algorithmic decisions |
Solano | 97,037 | 1.78 | High Confidence | Safe for algorithmic decisions |
Sonoma | 99,266 | 2.00 | High Confidence | Safe for algorithmic decisions |
Stanislaus | 74,872 | 1.83 | High Confidence | Safe for algorithmic decisions |
Sutter | 72,654 | 4.71 | High Confidence | Safe for algorithmic decisions |
Tulare | 64,474 | 2.31 | High Confidence | Safe for algorithmic decisions |
Ventura | 102,141 | 1.50 | High Confidence | Safe for algorithmic decisions |
Yolo | 85,097 | 2.74 | High Confidence | Safe for algorithmic decisions |
Yuba | 66,693 | 4.19 | High Confidence | Safe for algorithmic decisions |
Amador | 74,853 | 8.08 | Moderate Confidence | Use with caution – monitor outcomes |
Calaveras | 77,526 | 5.00 | Moderate Confidence | Use with caution – monitor outcomes |
Colusa | 69,619 | 8.25 | Moderate Confidence | Use with caution – monitor outcomes |
Del Norte | 61,149 | 7.16 | Moderate Confidence | Use with caution – monitor outcomes |
Glenn | 64,033 | 6.19 | Moderate Confidence | Use with caution – monitor outcomes |
Inyo | 63,417 | 8.60 | Moderate Confidence | Use with caution – monitor outcomes |
Lassen | 59,515 | 5.97 | Moderate Confidence | Use with caution – monitor outcomes |
Mariposa | 60,021 | 8.82 | Moderate Confidence | Use with caution – monitor outcomes |
Modoc | 54,962 | 9.80 | Moderate Confidence | Use with caution – monitor outcomes |
San Benito | 104,451 | 5.23 | Moderate Confidence | Use with caution – monitor outcomes |
Tehama | 59,029 | 6.95 | Moderate Confidence | Use with caution – monitor outcomes |
Tuolumne | 70,432 | 6.66 | Moderate Confidence | Use with caution – monitor outcomes |
Key Recommendations:
Your Task: Use your analysis results to provide specific guidance to the department.
Counties suitable for immediate algorithmic implementation: Alameda, Butte, Contra Costa, El Dorado, Fresno, Humboldt, Imperial, Kern, Kings, Lake, Los Angeles, Madera, Marin, Mendocino, Merced, Monterey, Napa, Nevada, Orange, Placer, Riverside, Sacramento, San Bernardino, San Diego, San Francisco, San Joaquin, San Luis Obispo, San Mateo, Santa Barbara, Santa Clara, Santa Cruz, Shasta, Siskiyou, Solano, Sonoma, Stanislaus, Sutter, Tulare, Ventura, Yolo, and Yuba
Counties requiring additional oversight: Amador, Calaveras, Colusa, Del Norte, Glenn, Inyo, Lassen, Mariposa, Modoc, San Benito, Tehama, and Tuolumne
Counties needing alternative approaches: Alpine, Mono, Plumas, Sierra, and Trinity
Questions for Further Investigation
Are high-MOE tracts clustered spatially (e.g., along rural–urban boundaries or in specific regions of the state), or do they appear evenly dispersed?
Do MOE patterns persist across ACS releases, or do they improve over time with larger samples? A time-series comparison could reveal whether underrepresentation of minority or rural communities is a persistent structural issue, much as flood or disaster impacts are tracked across years.
How do MOE patterns for racial and ethnic groups vary across states? Are high MOEs for Hispanic and Black populations a uniquely California phenomenon, or do they reflect a broader national issue embedded in ACS sampling design?
Technical Notes
Data Sources:
- U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates
- Retrieved via tidycensus R package on [date]
Reproducibility:
- All analysis conducted in R version 4.5.1
- Census API key required for replication
- Complete code and documentation available at: https://musa-5080-fall-2025.github.io/portfolio-setup-MohamadAlAbbas-PhD/
Methodology Notes: Margins of error (MOE) were standardized as percentages of the estimate, and counties were classified into High, Moderate, and Low Confidence categories using thresholds of <5%, 5–10%, and >10% respectively. Reliability flags at the tract level were set when any racial/ethnic subgroup estimate exceeded 15% MOE. No smoothing or imputation was applied to extreme or infinite MOE values; tracts with zero subgroup observations were retained as-is to reflect the raw survey limitations.
County codes were extracted directly from GEOID strings to facilitate joins, and descriptive statistics were calculated using simple group means. Data outputs were formatted using kable() for presentation, and no additional modeling or weighting adjustments were performed beyond what the ACS provides.
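As a quick check on the county thresholds described in the methodology notes, the rule can be written as a small base-R function. This is a sketch, not the code used earlier in the analysis: the name `classify_reliability` is hypothetical, and placing exactly 5% and exactly 10% in the Moderate band is an assumption based on the stated cutoffs (<5%, 5–10%, >10%).

```r
# Classify MOE percentages into the three reliability categories.
# Assumes 5 and 10 both fall in "Moderate Confidence".
classify_reliability <- function(moe_pct) {
  ifelse(moe_pct < 5, "High Confidence",
         ifelse(moe_pct <= 10, "Moderate Confidence", "Low Confidence"))
}

# MOE percentages as shown (rounded) in the county decision table
moe_pct <- c(Alameda = 1.00, Calaveras = 5.00, Modoc = 9.80, Plumas = 11.45)
categories <- classify_reliability(moe_pct)
print(categories)
```

The four counties come out in the same categories as in the decision table, which suggests the table follows these cutoffs.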
Limitations: Several limitations should be noted. First, tract-level estimates tend to carry high MOEs, which makes them hard to interpret and makes deploying algorithmic solutions at that scale problematic; county-level analysis may be a better fit. Second, subgroup estimates for racial and ethnic minorities often carried very high MOEs, and in some cases no observations were available, producing infinite or undefined percentages. These issues were left unadjusted to remain consistent with assignment instructions, but they highlight important data reliability challenges.
Third, the analysis is limited to a single 5-year ACS period; no longitudinal comparison was made to assess whether MOE patterns persist or shift over time. Finally, aggregating tract-level characteristics to the county level masks within-county variability that may be relevant for equity considerations.
Submission Checklist
Before submitting your portfolio link on Canvas:
Remember: Submit your portfolio URL on Canvas, not the file itself. Your assignment should be accessible at your-portfolio-url/assignments/assignment_1/your_file_name.html