Assignment 1: Census Data Quality for Policy Decisions

Evaluating Data Reliability for Algorithmic Decision-Making

Author

Mohamad Al Abbas

Published

September 28, 2025

Assignment Overview

Scenario

You are a data analyst for the California Department of Human Services. The department is considering implementing an algorithmic system to identify communities that should receive priority for social service funding and outreach programs. Your supervisor has asked you to evaluate the quality and reliability of available census data to inform this decision.

Drawing on our Week 2 discussion of algorithmic bias, you need to assess not just what the data shows, but how reliable it is and what communities might be affected by data quality issues.

Learning Objectives

  • Apply dplyr functions to real census data for policy analysis
  • Evaluate data quality using margins of error
  • Connect technical analysis to algorithmic decision-making
  • Identify potential equity implications of data reliability issues
  • Create professional documentation for policy stakeholders

Part 1: Portfolio Integration

Setup

# Load required packages (hint: you need tidycensus, tidyverse, and knitr)

library(tidycensus)
library(tidyverse)
library(knitr)

# Set your Census API key
# (read from the CENSUS_API_KEY environment variable so the key is not published with the report)

census_api_key(Sys.getenv("CENSUS_API_KEY"))

# Choose your state for analysis - assign it to a variable called my_state

my_state <- "California"

State Selection: I have chosen California for this analysis because I am currently working on a project about the wildfires that occurred there with a few partners from UC San Diego, so I know a bit about the state and its population density. I also hope to visit it during the winter break :)!

Part 2: County-Level Resource Assessment

2.1 Data Retrieval

Your Task: Use get_acs() to retrieve county-level data for your chosen state.

Requirements:

  • Geography: county level
  • Variables: median household income (B19013_001) and total population (B01003_001)
  • Year: 2022
  • Survey: acs5
  • Output format: wide

Hint: Remember to give your variables descriptive names using the variables = c(name = "code") syntax.

# Write your get_acs() code here
county_vars <- c(med_hh_income = "B19013_001", total_pop = "B01003_001")

county_raw <- get_acs(
  geography = "county",
  state     = my_state,
  survey    = "acs5",
  year      = 2022,
  variables = county_vars,
  output    = "wide"
)

# Clean the county names to remove state name and "County" 
# Hint: use mutate() with str_remove()

county <- county_raw %>%
  mutate(
    county_name = str_remove(NAME, paste0(", ", my_state)),
    county_name = str_remove(county_name, " County$")
  ) %>%
  select(GEOID, county_name, med_hh_incomeE, med_hh_incomeM, total_popE, total_popM)

# Display the first few rows
head(county)
# A tibble: 6 × 6
  GEOID county_name med_hh_incomeE med_hh_incomeM total_popE total_popM
  <chr> <chr>                <dbl>          <dbl>      <dbl>      <dbl>
1 06001 Alameda             122488           1231    1663823         NA
2 06003 Alpine              101125          17442       1515        206
3 06005 Amador               74853           6048      40577         NA
4 06007 Butte                66085           2261     213605         NA
5 06009 Calaveras            77526           3875      45674         NA
6 06011 Colusa               69619           5745      21811         NA

2.2 Data Quality Assessment

Your Task: Calculate margin of error percentages and create reliability categories.

Requirements:

  • Calculate MOE percentage: (margin of error / estimate) * 100
  • Create reliability categories:
      • High Confidence: MOE < 5%
      • Moderate Confidence: MOE 5-10%
      • Low Confidence: MOE > 10%
  • Create a flag for unreliable estimates (MOE > 10%)

Hint: Use mutate() with case_when() for the categories.

# Calculate MOE percentage and reliability categories using mutate()
county_reliability <- county %>%
  mutate(
    moe_percentage = round((med_hh_incomeM / med_hh_incomeE) * 100, 2),
    Reliability = case_when(
      moe_percentage < 5 ~ "High Confidence",
      moe_percentage >= 5 & moe_percentage <= 10 ~ "Moderate Confidence",
      moe_percentage > 10 ~ "Low Confidence"
    ),
    # Flag required by the task: TRUE when the MOE exceeds 10% of the estimate
    unreliable = moe_percentage > 10
  )

# Create a summary showing count of counties in each reliability category
# Hint: use count() and mutate() to add percentages

reliability_summary <- county_reliability %>%
  count(Reliability, name = "Count") %>%
  mutate(Proportion = round(100 * Count / sum(Count), 1))

kable(reliability_summary, caption = "County Income Reliability Categories")
County Income Reliability Categories

Reliability           Count   Proportion
High Confidence          41         70.7
Low Confidence            5          8.6
Moderate Confidence      12         20.7
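
As a quick check on the arithmetic, the MOE percentage and category rule can be worked by hand for three counties from the table in 2.1. This is only a base R sketch: the vector names are illustrative, and the nested ifelse() stands in for the case_when() call used in the analysis.

```r
# Estimates (E) and margins of error (M) copied from the county table in 2.1
county_name    <- c("Alameda", "Amador", "Alpine")
med_hh_incomeE <- c(122488, 74853, 101125)
med_hh_incomeM <- c(1231, 6048, 17442)

# MOE as a percentage of the estimate, rounded to two decimals
moe_percentage <- round(100 * med_hh_incomeM / med_hh_incomeE, 2)

# Same thresholds as the assignment: below 5, 5 to 10, above 10
reliability <- ifelse(
  moe_percentage < 5, "High Confidence",
  ifelse(moe_percentage <= 10, "Moderate Confidence", "Low Confidence")
)

# Flag for estimates the assignment treats as unreliable
unreliable <- moe_percentage > 10

print(data.frame(county_name, moe_percentage, reliability, unreliable))
```

Each of the three counties lands in a different category, which makes them a convenient sanity check before trusting the full 58-county summary.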

2.3 High Uncertainty Counties

Your Task: Identify the 5 counties with the highest MOE percentages.

Requirements:

  • Sort by MOE percentage (highest first)
  • Select the top 5 counties
  • Display: county name, median income, margin of error, MOE percentage, reliability category
  • Format as a professional table using kable()

Hint: Use arrange(), slice(), and select() functions.

# Create table of top 5 counties by MOE percentage

top5 <- county_reliability %>%
  arrange(desc(moe_percentage)) %>%
  slice(1:5) %>%
  select(
    county_name,
    med_hh_incomeE,
    med_hh_incomeM,
    moe_percentage,
    Reliability
  )
  
# Format as table with kable() - include appropriate column names and caption

kable(
  top5,
  caption = "Top 5 Counties by Median Household Income MOE Percentage",
  col.names = c("County", "Median Income", "Margin of Error", "MOE %", "Reliability Category"),
  digits = 2,
  align = c("l", "c", "c", "c", "c"),
  format.args = list(big.mark = ",")
)
Top 5 Counties by Median Household Income MOE Percentage

County    Median Income   Margin of Error   MOE %   Reliability Category
Mono             82,038            15,388   18.76   Low Confidence
Alpine          101,125            17,442   17.25   Low Confidence
Sierra           61,108             9,237   15.12   Low Confidence
Trinity          47,317             5,890   12.45   Low Confidence
Plumas           67,885             7,772   11.45   Low Confidence

Data Quality Commentary:

All five of these counties are among the lowest-density areas in California. Because their populations are so small, the ACS has only a limited sample of households from which to estimate median income, and small samples produce more variable estimates. This explains the relatively high margins of error (11–19% of the estimate). As a result, an algorithm that classifies or ranks counties on these point estimates alone could produce erroneous outcomes. For example, Alpine County appears to have a median income above $100,000, but its margin of error is more than $17,000, a large uncertainty for a county of only about 1,500 residents. This is an issue of both sample size and representativeness, and it shows how misleading a raw point estimate can be without its MOE.
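
To make the ranking concern concrete, here is a rough base R sketch comparing Mono and Alpine using the figures from the table above. It relies on two standard ACS conventions: published margins of error correspond to a 90% confidence level, and the MOE of a difference between two estimates is approximated by the square root of the sum of the squared MOEs. The object names are illustrative only.

```r
# Point estimates and 90% margins of error from the top-5 table above
est <- c(Mono = 82038, Alpine = 101125)
moe <- c(Mono = 15388, Alpine = 17442)

# 90% confidence interval for each county: estimate +/- MOE
ci <- data.frame(
  lower     = est - moe,
  estimate  = est,
  upper     = est + moe,
  row.names = names(est)
)
print(ci)

# Gap between the two point estimates
diff_est <- unname(est["Alpine"] - est["Mono"])

# Approximate MOE of that gap
diff_moe <- sqrt(sum(moe^2))

# The gap is distinguishable from zero only if it exceeds its own MOE
significant <- abs(diff_est) > diff_moe
print(c(difference = diff_est, moe_of_difference = round(diff_moe)))
print(significant)
```

The two intervals overlap and the gap of about $19,000 is smaller than its own margin of error, so a ranking that places Alpine above Mono on income is not supported by the data at the 90% level.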

Part 3: Neighborhood-Level Analysis

3.1 Focus Area Selection

Your Task: Select 2-3 counties from your reliability analysis for detailed tract-level study.

Strategy: Choose counties that represent different reliability levels (e.g., 1 high confidence, 1 moderate, 1 low confidence) to compare how data quality varies.

# Use filter() to select 2-3 counties from your county_reliability data
# Store the selected counties in a variable called selected_counties

selected_counties <- bind_rows(
  county_reliability %>%
    filter(Reliability == "High Confidence") %>%
    slice(1),
  county_reliability %>%
    filter(Reliability == "Moderate Confidence") %>%
    slice(1),
  county_reliability %>%
    filter(Reliability == "Low Confidence") %>%
    slice(1),
) %>%
  select(
    County = county_name,
    `Median Income` = med_hh_incomeE,
    `Margin of Error` = med_hh_incomeM,
    `MOE %` = moe_percentage,
    Reliability = Reliability,
    Population = total_popE
  )

# Display the selected counties with their key characteristics
# Show: county name, median income, MOE percentage, reliability category
kable(
  selected_counties,
  caption = "First County Alphabetically in Each Reliability Category",
  align = c("l", "r", "r", "r", "l", "r"),
  format.args = list(big.mark = ",")
)
First County Alphabetically in Each Reliability Category

County    Median Income   Margin of Error   MOE %   Reliability           Population
Alameda         122,488             1,231    1.00   High Confidence        1,663,823
Amador           74,853             6,048    8.08   Moderate Confidence       40,577
Alpine          101,125            17,442   17.25   Low Confidence             1,515

Comment on the output: Because slice(1) takes the first row in each reliability category rather than sampling at random, it quite literally picked the first match it found, which is why all three are the alphabetically first county in their category. On the positive side, we still have at least one county from each category, and Alpine, the lowest-density county in California, is still with us :)!
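
If the goal were instead to pick the most populous county in each category, one option would be a grouped slice_max() from dplyr. The sketch below is only an illustration on the six counties printed in 2.1 (county_demo is a hypothetical stand-in for county_reliability), so its result applies to those six rows and not to all 58 counties.

```r
library(dplyr)

# The six counties shown by head(county) in 2.1, with the same
# MOE percentage and category rule as the analysis
county_demo <- tibble(
  county_name    = c("Alameda", "Alpine", "Amador", "Butte", "Calaveras", "Colusa"),
  med_hh_incomeE = c(122488, 101125, 74853, 66085, 77526, 69619),
  med_hh_incomeM = c(1231, 17442, 6048, 2261, 3875, 5745),
  total_popE     = c(1663823, 1515, 40577, 213605, 45674, 21811)
) %>%
  mutate(
    moe_percentage = round(100 * med_hh_incomeM / med_hh_incomeE, 2),
    Reliability = case_when(
      moe_percentage < 5   ~ "High Confidence",
      moe_percentage <= 10 ~ "Moderate Confidence",
      TRUE                 ~ "Low Confidence"
    )
  )

# Keep the single most populous county within each reliability category
largest_per_category <- county_demo %>%
  group_by(Reliability) %>%
  slice_max(total_popE, n = 1, with_ties = FALSE) %>%
  ungroup()

print(largest_per_category)
```

Within this small subset the Moderate Confidence pick changes from Amador to Calaveras, whose MOE percentage rounds to exactly 5.00 and therefore falls in the moderate band under the rule used here.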

3.2 Tract-Level Demographics

Your Task: Get demographic data for census tracts in your selected counties.

Requirements:

  • Geography: tract level
  • Variables: white alone (B03002_003), Black/African American (B03002_004), Hispanic/Latino (B03002_012), total population (B03002_001)
  • Use the same state and year as before
  • Output format: wide
  • Challenge: You’ll need county codes, not names. Look at the GEOID patterns in your county data for hints.
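
The GEOID pattern behind the challenge can be sketched in base R. County GEOIDs are a 2-digit state FIPS code followed by a 3-digit county code, and tract GEOIDs append a 6-digit tract code. The tract GEOID below is an assumed example for Census Tract 4001 in Alameda County, and the variable names are illustrative.

```r
# County GEOIDs as they appear in the county table from 2.1
county_geoid <- c("06001", "06003", "06005")  # Alameda, Alpine, Amador

# Characters 1-2 are the state FIPS code, characters 3-5 the county code
state_fips  <- substr(county_geoid, 1, 2)
county_fips <- substr(county_geoid, 3, 5)
print(county_fips)

# Assumed example tract GEOID: state (2) + county (3) + tract (6) digits
tract_geoid <- "06001400100"
tract_parts <- c(
  state  = substr(tract_geoid, 1, 2),
  county = substr(tract_geoid, 3, 5),
  tract  = substr(tract_geoid, 6, 11)
)
print(tract_parts)
```

The 3-digit county codes are what the county argument of get_acs() expects, and the same positions in a tract GEOID let each tract be matched back to its county.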

# Define your race/ethnicity variables with descriptive names

race_vars <- c(
  total    = "B03002_001",
  white    = "B03002_003",
  black    = "B03002_004",
  hispanic = "B03002_012"
)


# Use get_acs() to retrieve tract-level data
# Hint: You may need to specify county codes in the county parameter

selected_geoid <- c("06001","06003","06005")  # Alameda, Alpine, Amador

county_codes <- stringr::str_sub(selected_geoid, 3, 5)  # -> "001","003","005"

tract_raw <- get_acs(
  geography = "tract",
  state     = my_state,
  county    = county_codes,
  survey    = "acs5",
  year      = 2022,
  variables = race_vars,
  output    = "wide"
)

# Calculate percentage of each group using mutate()
# Create percentages for white, Black, and Hispanic populations

tract_percent <- tract_raw %>%
  mutate(
    county_code  = substr(GEOID, 3, 5),
    pct_white    = 100 * (whiteE    / totalE),
    pct_black    = 100 * (blackE    / totalE),
    pct_hispanic = 100 * (hispanicE / totalE),
    # Make a readable tract label and remove state/country suffixes
    tract_label  = stringr::str_remove(NAME, paste0(", ", my_state)),
    tract_label  = stringr::str_remove(tract_label, ", United States$")
  ) %>%
  left_join(
    county %>%
      filter(GEOID %in% selected_geoid) %>%
      transmute(
        county_code = substr(GEOID, 3, 5),
        county_name
      ),
    by = "county_code"
  ) %>%
  select(
    GEOID, county_name, tract_label,
    totalE, whiteE, blackE, hispanicE,
    totalM, whiteM, blackM, hispanicM,
    pct_white, pct_black, pct_hispanic
  )

# Add readable tract and county name columns using str_extract() or similar

kable(
  tract_percent %>%
    select(
      County = county_name,
      Tract  = tract_label,
      `White (%)`    = pct_white,
      `Black (%)`    = pct_black,
      `Hispanic (%)` = pct_hispanic
    ) %>%
    arrange(County, Tract),
  caption = "Tract-Level Racial/Ethnic Composition in Selected Counties",
  align = c("l","l","r","r","r"),
  format.args = list(big.mark = ",", digits = 1)
)
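
Two details of the rendered table may be worth handling. One row (Census Tract 4443.03) shows NaN for all three percentages, which is what R returns for 0 / 0 when a tract's total population estimate is zero. The tract labels also keep the county and state, apparently because the 2022 tract names are separated by semicolons and not commas. The base R sketch below shows one possible way to deal with both. The counts are hypothetical and for illustration only, not ACS values.

```r
# Hypothetical tract rows (illustrative counts, not ACS estimates);
# the second row mimics a tract with zero estimated population
tracts <- data.frame(
  NAME   = c("Census Tract 4001; Alameda County; California",
             "Census Tract 4443.03; Alameda County; California"),
  totalE = c(1000, 0),
  whiteE = c(690, 0)
)

# Guard the denominator so a zero-population tract yields NA, not NaN
tracts$pct_white <- ifelse(
  tracts$totalE > 0,
  100 * tracts$whiteE / tracts$totalE,
  NA_real_
)

# Keep only the text before the first semicolon as a short tract label
tracts$tract_label <- sub(";.*$", "", tracts$NAME)

print(tracts[, c("tract_label", "totalE", "pct_white")])
```

An explicit NA makes it easier to filter or report zero-population tracts separately, so they are not silently carried into later summaries.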
Tract-Level Racial/Ethnic Composition in Selected Counties
County Tract White (%) Black (%) Hispanic (%)
Alameda Census Tract 4001; Alameda County; California 69 5.23 5
Alameda Census Tract 4002; Alameda County; California 70 1.77 7
Alameda Census Tract 4003; Alameda County; California 59 9.41 10
Alameda Census Tract 4004; Alameda County; California 64 11.66 7
Alameda Census Tract 4005; Alameda County; California 41 25.15 15
Alameda Census Tract 4006; Alameda County; California 44 26.19 7
Alameda Census Tract 4007; Alameda County; California 48 23.14 10
Alameda Census Tract 4008; Alameda County; California 51 14.83 9
Alameda Census Tract 4009; Alameda County; California 37 24.33 19
Alameda Census Tract 4010; Alameda County; California 33 23.55 22
Alameda Census Tract 4011; Alameda County; California 42 16.36 12
Alameda Census Tract 4012; Alameda County; California 57 7.98 13
Alameda Census Tract 4013; Alameda County; California 36 28.75 9
Alameda Census Tract 4014; Alameda County; California 26 31.82 22
Alameda Census Tract 4015; Alameda County; California 22 54.89 15
Alameda Census Tract 4016; Alameda County; California 19 32.79 29
Alameda Census Tract 4017; Alameda County; California 45 13.80 17
Alameda Census Tract 4018; Alameda County; California 24 37.98 27
Alameda Census Tract 4022; Alameda County; California 32 27.69 23
Alameda Census Tract 4024; Alameda County; California 13 54.59 9
Alameda Census Tract 4025; Alameda County; California 16 43.68 12
Alameda Census Tract 4026; Alameda County; California 9 38.46 6
Alameda Census Tract 4027; Alameda County; California 26 37.42 17
Alameda Census Tract 4028.01; Alameda County; California 27 43.36 5
Alameda Census Tract 4028.02; Alameda County; California 23 42.87 4
Alameda Census Tract 4029; Alameda County; California 20 22.32 10
Alameda Census Tract 4030; Alameda County; California 9 5.04 2
Alameda Census Tract 4031; Alameda County; California 40 16.64 12
Alameda Census Tract 4033.01; Alameda County; California 10 2.79 2
Alameda Census Tract 4033.02; Alameda County; California 52 4.81 6
Alameda Census Tract 4034.01; Alameda County; California 42 16.45 9
Alameda Census Tract 4034.02; Alameda County; California 43 15.17 12
Alameda Census Tract 4035.01; Alameda County; California 38 15.09 8
Alameda Census Tract 4035.02; Alameda County; California 49 11.52 17
Alameda Census Tract 4036; Alameda County; California 38 23.22 17
Alameda Census Tract 4037.01; Alameda County; California 42 21.73 10
Alameda Census Tract 4037.02; Alameda County; California 55 16.49 11
Alameda Census Tract 4038; Alameda County; California 62 15.12 7
Alameda Census Tract 4039; Alameda County; California 63 9.27 11
Alameda Census Tract 4040; Alameda County; California 51 11.99 20
Alameda Census Tract 4041.01; Alameda County; California 74 8.39 4
Alameda Census Tract 4041.02; Alameda County; California 53 5.74 11
Alameda Census Tract 4042; Alameda County; California 67 4.93 7
Alameda Census Tract 4043; Alameda County; California 75 1.28 6
Alameda Census Tract 4044; Alameda County; California 59 2.44 3
Alameda Census Tract 4045.01; Alameda County; California 66 5.16 6
Alameda Census Tract 4045.02; Alameda County; California 66 5.78 5
Alameda Census Tract 4046; Alameda County; California 58 5.33 4
Alameda Census Tract 4047; Alameda County; California 68 4.57 9
Alameda Census Tract 4048; Alameda County; California 53 11.48 10
Alameda Census Tract 4049; Alameda County; California 54 6.06 16
Alameda Census Tract 4050; Alameda County; California 57 13.45 10
Alameda Census Tract 4051; Alameda County; California 66 5.66 7
Alameda Census Tract 4052; Alameda County; California 34 13.84 6
Alameda Census Tract 4053.01; Alameda County; California 49 13.93 17
Alameda Census Tract 4053.02; Alameda County; California 34 23.29 6
Alameda Census Tract 4054.01; Alameda County; California 25 12.54 24
Alameda Census Tract 4054.02; Alameda County; California 20 15.27 23
Alameda Census Tract 4055; Alameda County; California 21 19.51 15
Alameda Census Tract 4056; Alameda County; California 31 17.21 22
Alameda Census Tract 4057; Alameda County; California 15 27.97 25
Alameda Census Tract 4058; Alameda County; California 12 20.33 21
Alameda Census Tract 4059.01; Alameda County; California 5 17.11 34
Alameda Census Tract 4059.02; Alameda County; California 12 7.46 29
Alameda Census Tract 4060; Alameda County; California 18 19.22 20
Alameda Census Tract 4061; Alameda County; California 18 8.87 47
Alameda Census Tract 4062.01; Alameda County; California 6 13.70 48
Alameda Census Tract 4062.02; Alameda County; California 7 12.51 62
Alameda Census Tract 4063; Alameda County; California 11 19.32 35
Alameda Census Tract 4064; Alameda County; California 23 16.55 31
Alameda Census Tract 4065; Alameda County; California 12 13.26 49
Alameda Census Tract 4066.01; Alameda County; California 23 28.74 24
Alameda Census Tract 4066.02; Alameda County; California 14 22.32 34
Alameda Census Tract 4067; Alameda County; California 46 6.80 17
Alameda Census Tract 4068; Alameda County; California 28 20.98 24
Alameda Census Tract 4069; Alameda County; California 40 16.97 8
Alameda Census Tract 4070; Alameda County; California 18 15.64 36
Alameda Census Tract 4071.01; Alameda County; California 12 18.48 40
Alameda Census Tract 4071.02; Alameda County; California 7 23.69 37
Alameda Census Tract 4072; Alameda County; California 14 3.22 71
Alameda Census Tract 4073; Alameda County; California 11 8.42 65
Alameda Census Tract 4074; Alameda County; California 6 36.49 44
Alameda Census Tract 4075; Alameda County; California 7 22.82 48
Alameda Census Tract 4076; Alameda County; California 25 25.03 34
Alameda Census Tract 4077; Alameda County; California 45 21.17 18
Alameda Census Tract 4078; Alameda County; California 36 21.74 24
Alameda Census Tract 4079; Alameda County; California 50 12.89 13
Alameda Census Tract 4080; Alameda County; California 61 8.22 11
Alameda Census Tract 4081; Alameda County; California 37 25.00 7
Alameda Census Tract 4082; Alameda County; California 17 40.51 29
Alameda Census Tract 4083; Alameda County; California 22 38.13 21
Alameda Census Tract 4084; Alameda County; California 4 39.49 47
Alameda Census Tract 4085; Alameda County; California 3 39.33 49
Alameda Census Tract 4086; Alameda County; California 6 43.23 41
Alameda Census Tract 4087; Alameda County; California 9 32.38 51
Alameda Census Tract 4088; Alameda County; California 4 34.17 46
Alameda Census Tract 4089; Alameda County; California 4 21.13 68
Alameda Census Tract 4090; Alameda County; California 3 29.31 59
Alameda Census Tract 4091; Alameda County; California 10 20.38 62
Alameda Census Tract 4092; Alameda County; California 1 27.30 62
Alameda Census Tract 4093; Alameda County; California 2 23.63 66
Alameda Census Tract 4094; Alameda County; California 5 18.07 66
Alameda Census Tract 4095; Alameda County; California 12 17.69 64
Alameda Census Tract 4096; Alameda County; California 3 24.27 70
Alameda Census Tract 4097; Alameda County; California 8 35.78 36
Alameda Census Tract 4098; Alameda County; California 18 50.19 15
Alameda Census Tract 4099; Alameda County; California 23 41.05 13
Alameda Census Tract 4100; Alameda County; California 33 47.28 5
Alameda Census Tract 4101; Alameda County; California 10 51.86 18
Alameda Census Tract 4102; Alameda County; California 11 44.22 37
Alameda Census Tract 4103; Alameda County; California 2 16.49 78
Alameda Census Tract 4104; Alameda County; California 5 31.26 52
Alameda Census Tract 4105; Alameda County; California 21 58.88 12
Alameda Census Tract 4201; Alameda County; California 54 4.64 13
Alameda Census Tract 4202; Alameda County; California 35 5.65 5
Alameda Census Tract 4203.01; Alameda County; California 38 2.42 9
Alameda Census Tract 4203.02; Alameda County; California 35 11.73 7
Alameda Census Tract 4204.01; Alameda County; California 32 0.00 24
Alameda Census Tract 4204.02; Alameda County; California 32 3.13 28
Alameda Census Tract 4205; Alameda County; California 49 2.24 16
Alameda Census Tract 4206; Alameda County; California 64 1.03 8
Alameda Census Tract 4211; Alameda County; California 77 1.66 5
Alameda Census Tract 4212; Alameda County; California 70 0.67 2
Alameda Census Tract 4213; Alameda County; California 72 0.17 3
Alameda Census Tract 4214; Alameda County; California 78 1.89 9
Alameda Census Tract 4215; Alameda County; California 81 1.60 5
Alameda Census Tract 4216; Alameda County; California 71 2.41 4
Alameda Census Tract 4217; Alameda County; California 65 4.09 6
Alameda Census Tract 4218; Alameda County; California 70 0.88 5
Alameda Census Tract 4219; Alameda County; California 50 13.26 7
Alameda Census Tract 4220; Alameda County; California 49 4.85 7
Alameda Census Tract 4221; Alameda County; California 47 11.69 20
Alameda Census Tract 4222; Alameda County; California 63 2.53 7
Alameda Census Tract 4223; Alameda County; California 53 12.45 7
Alameda Census Tract 4224; Alameda County; California 38 3.20 12
Alameda Census Tract 4225; Alameda County; California 45 4.10 13
Alameda Census Tract 4227; Alameda County; California 36 3.87 17
Alameda Census Tract 4228; Alameda County; California 28 5.73 19
Alameda Census Tract 4229.01; Alameda County; California 41 0.00 3
Alameda Census Tract 4229.02; Alameda County; California 30 6.72 7
Alameda Census Tract 4230; Alameda County; California 57 7.54 8
Alameda Census Tract 4231; Alameda County; California 41 23.74 10
Alameda Census Tract 4232; Alameda County; California 39 17.55 33
Alameda Census Tract 4233; Alameda County; California 46 19.85 17
Alameda Census Tract 4234; Alameda County; California 47 13.28 22
Alameda Census Tract 4235; Alameda County; California 55 10.08 17
Alameda Census Tract 4236.01; Alameda County; California 63 3.73 13
Alameda Census Tract 4236.02; Alameda County; California 47 1.50 14
Alameda Census Tract 4237; Alameda County; California 60 2.38 17
Alameda Census Tract 4238; Alameda County; California 73 1.08 6
Alameda Census Tract 4239.01; Alameda County; California 52 12.67 20
Alameda Census Tract 4239.02; Alameda County; California 69 5.95 7
Alameda Census Tract 4240.01; Alameda County; California 46 18.12 16
Alameda Census Tract 4240.02; Alameda County; California 38 31.17 17
Alameda Census Tract 4251.01; Alameda County; California 45 5.93 11
Alameda Census Tract 4251.02; Alameda County; California 29 12.77 10
Alameda Census Tract 4251.03; Alameda County; California 31 26.62 7
Alameda Census Tract 4251.04; Alameda County; California 42 15.55 10
Alameda Census Tract 4261; Alameda County; California 68 0.00 2
Alameda Census Tract 4262; Alameda County; California 68 1.44 6
Alameda Census Tract 4271; Alameda County; California 58 1.21 15
Alameda Census Tract 4272; Alameda County; California 33 3.31 17
Alameda Census Tract 4273; Alameda County; California 32 16.26 8
Alameda Census Tract 4276; Alameda County; California 29 15.57 15
Alameda Census Tract 4277; Alameda County; California 61 2.14 11
Alameda Census Tract 4278; Alameda County; California 51 4.38 9
Alameda Census Tract 4279; Alameda County; California 52 0.66 14
Alameda Census Tract 4280; Alameda County; California 32 12.04 14
Alameda Census Tract 4281; Alameda County; California 51 6.82 10
Alameda Census Tract 4282; Alameda County; California 51 3.32 13
Alameda Census Tract 4283.01; Alameda County; California 26 5.11 9
Alameda Census Tract 4283.02; Alameda County; California 44 0.87 7
Alameda Census Tract 4284; Alameda County; California 37 7.21 16
Alameda Census Tract 4285; Alameda County; California 42 10.93 16
Alameda Census Tract 4286; Alameda County; California 42 5.97 11
Alameda Census Tract 4287; Alameda County; California 23 17.70 13
Alameda Census Tract 4301.01; Alameda County; California 34 2.93 10
Alameda Census Tract 4301.02; Alameda County; California 53 0.89 14
Alameda Census Tract 4302; Alameda County; California 51 3.45 10
Alameda Census Tract 4303; Alameda County; California 44 0.49 24
Alameda Census Tract 4304; Alameda County; California 47 4.27 7
Alameda Census Tract 4305; Alameda County; California 24 37.25 16
Alameda Census Tract 4306; Alameda County; California 39 3.92 16
Alameda Census Tract 4307; Alameda County; California 39 4.97 18
Alameda Census Tract 4308; Alameda County; California 43 0.84 16
Alameda Census Tract 4309; Alameda County; California 28 6.48 28
Alameda Census Tract 4310; Alameda County; California 27 16.31 16
Alameda Census Tract 4311; Alameda County; California 26 27.71 26
Alameda Census Tract 4312; Alameda County; California 36 14.15 28
Alameda Census Tract 4321; Alameda County; California 32 24.38 25
Alameda Census Tract 4322; Alameda County; California 27 9.76 37
Alameda Census Tract 4323; Alameda County; California 25 12.60 30
Alameda Census Tract 4324; Alameda County; California 13 4.41 55
Alameda Census Tract 4325.01; Alameda County; California 20 3.74 28
Alameda Census Tract 4325.02; Alameda County; California 11 19.37 30
Alameda Census Tract 4326.01; Alameda County; California 24 22.17 25
Alameda Census Tract 4326.02; Alameda County; California 14 25.30 30
Alameda Census Tract 4327; Alameda County; California 52 4.90 21
Alameda Census Tract 4328; Alameda County; California 37 6.84 14
Alameda Census Tract 4330; Alameda County; California 36 8.13 16
Alameda Census Tract 4331.02; Alameda County; California 7 6.95 29
Alameda Census Tract 4331.03; Alameda County; California 13 11.43 40
Alameda Census Tract 4331.04; Alameda County; California 24 17.42 38
Alameda Census Tract 4332; Alameda County; California 10 9.73 33
Alameda Census Tract 4333; Alameda County; California 18 0.50 27
Alameda Census Tract 4334; Alameda County; California 13 8.51 8
Alameda Census Tract 4335; Alameda County; California 25 2.92 21
Alameda Census Tract 4336; Alameda County; California 28 5.07 20
Alameda Census Tract 4337; Alameda County; California 13 2.72 52
Alameda Census Tract 4338.01; Alameda County; California 9 13.38 47
Alameda Census Tract 4338.02; Alameda County; California 8 12.58 21
Alameda Census Tract 4339; Alameda County; California 9 26.33 43
Alameda Census Tract 4340; Alameda County; California 18 13.14 45
Alameda Census Tract 4351.02; Alameda County; California 25 14.47 27
Alameda Census Tract 4351.03; Alameda County; California 35 7.40 6
Alameda Census Tract 4351.04; Alameda County; California 15 7.38 31
Alameda Census Tract 4352; Alameda County; California 23 22.36 27
Alameda Census Tract 4353; Alameda County; California 22 15.13 34
Alameda Census Tract 4354; Alameda County; California 23 15.79 33
Alameda Census Tract 4355; Alameda County; California 26 12.57 48
Alameda Census Tract 4356.01; Alameda County; California 12 10.39 57
Alameda Census Tract 4356.02; Alameda County; California 20 13.45 52
Alameda Census Tract 4357; Alameda County; California 21 2.43 50
Alameda Census Tract 4358; Alameda County; California 21 4.50 38
Alameda Census Tract 4359; Alameda County; California 29 0.66 25
Alameda Census Tract 4360; Alameda County; California 27 0.61 40
Alameda Census Tract 4361; Alameda County; California 17 4.35 32
Alameda Census Tract 4362; Alameda County; California 11 12.79 64
Alameda Census Tract 4363.01; Alameda County; California 6 10.75 46
Alameda Census Tract 4363.02; Alameda County; California 17 11.72 46
Alameda Census Tract 4364.02; Alameda County; California 40 13.76 24
Alameda Census Tract 4364.03; Alameda County; California 27 9.38 26
Alameda Census Tract 4364.04; Alameda County; California 45 11.29 21
Alameda Census Tract 4365; Alameda County; California 13 9.60 49
Alameda Census Tract 4366.01; Alameda County; California 11 10.61 58
Alameda Census Tract 4366.02; Alameda County; California 7 6.17 53
Alameda Census Tract 4367; Alameda County; California 11 13.31 48
Alameda Census Tract 4368; Alameda County; California 11 5.93 49
Alameda Census Tract 4369; Alameda County; California 12 6.97 58
Alameda Census Tract 4370; Alameda County; California 22 5.79 38
Alameda Census Tract 4371.01; Alameda County; California 10 6.27 27
Alameda Census Tract 4371.02; Alameda County; California 10 6.91 42
Alameda Census Tract 4372; Alameda County; California 11 6.43 32
Alameda Census Tract 4373; Alameda County; California 13 12.22 34
Alameda Census Tract 4374; Alameda County; California 15 3.75 52
Alameda Census Tract 4375; Alameda County; California 11 5.90 56
Alameda Census Tract 4376; Alameda County; California 11 9.46 31
Alameda Census Tract 4377.01; Alameda County; California 10 15.62 50
Alameda Census Tract 4377.02; Alameda County; California 6 1.91 85
Alameda Census Tract 4378; Alameda County; California 15 7.69 37
Alameda Census Tract 4379; Alameda County; California 12 11.32 41
Alameda Census Tract 4380; Alameda County; California 23 11.47 23
Alameda Census Tract 4381; Alameda County; California 12 6.31 37
Alameda Census Tract 4382.01; Alameda County; California 7 5.62 56
Alameda Census Tract 4382.03; Alameda County; California 25 3.62 21
Alameda Census Tract 4382.04; Alameda County; California 9 4.03 36
Alameda Census Tract 4383; Alameda County; California 6 6.46 37
Alameda Census Tract 4384; Alameda County; California 16 10.86 24
Alameda Census Tract 4401; Alameda County; California 40 5.19 21
Alameda Census Tract 4402; Alameda County; California 3 0.46 70
Alameda Census Tract 4403.01; Alameda County; California 29 5.74 34
Alameda Census Tract 4403.04; Alameda County; California 10 7.94 12
Alameda Census Tract 4403.05; Alameda County; California 21 1.28 14
Alameda Census Tract 4403.06; Alameda County; California 9 5.89 11
Alameda Census Tract 4403.07; Alameda County; California 14 5.78 21
Alameda Census Tract 4403.08; Alameda County; California 13 4.11 26
Alameda Census Tract 4403.31; Alameda County; California 12 3.79 16
Alameda Census Tract 4403.32; Alameda County; California 8 1.56 8
Alameda Census Tract 4403.33; Alameda County; California 7 0.81 5
Alameda Census Tract 4403.34; Alameda County; California 11 5.77 13
Alameda Census Tract 4403.36; Alameda County; California 15 12.57 10
Alameda Census Tract 4403.37; Alameda County; California 6 4.91 9
Alameda Census Tract 4403.38; Alameda County; California 22 0.49 9
Alameda Census Tract 4411; Alameda County; California 44 0.04 16
Alameda Census Tract 4412; Alameda County; California 32 2.60 12
Alameda Census Tract 4413.01; Alameda County; California 20 10.90 7
Alameda Census Tract 4413.02; Alameda County; California 16 2.71 11
Alameda Census Tract 4414.01; Alameda County; California 18 1.15 10
Alameda Census Tract 4414.02; Alameda County; California 18 1.27 7
Alameda Census Tract 4415.01; Alameda County; California 8 5.01 5
Alameda Census Tract 4415.03; Alameda County; California 7 0.24 4
Alameda Census Tract 4415.21; Alameda County; California 11 0.25 6
Alameda Census Tract 4415.22; Alameda County; California 18 2.62 7
Alameda Census Tract 4415.23; Alameda County; California 11 4.20 7
Alameda Census Tract 4415.24; Alameda County; California 5 0.36 1
Alameda Census Tract 4415.25; Alameda County; California 7 3.49 11
Alameda Census Tract 4416.01; Alameda County; California 31 8.17 14
Alameda Census Tract 4416.02; Alameda County; California 28 8.46 23
Alameda Census Tract 4417.01; Alameda County; California 12 0.46 20
Alameda Census Tract 4417.02; Alameda County; California 19 10.01 16
Alameda Census Tract 4418; Alameda County; California 32 1.77 6
Alameda Census Tract 4419.21; Alameda County; California 19 0.31 24
Alameda Census Tract 4419.23; Alameda County; California 12 3.41 13
Alameda Census Tract 4419.24; Alameda County; California 17 1.51 10
Alameda Census Tract 4419.26; Alameda County; California 13 3.25 25
Alameda Census Tract 4419.27; Alameda County; California 19 3.74 10
Alameda Census Tract 4419.28; Alameda County; California 17 13.20 8
Alameda Census Tract 4419.29; Alameda County; California 21 0.85 8
Alameda Census Tract 4420; Alameda County; California 15 0.13 10
Alameda Census Tract 4421; Alameda County; California 9 1.60 1
Alameda Census Tract 4422; Alameda County; California 13 2.43 6
Alameda Census Tract 4423.01; Alameda County; California 20 2.37 15
Alameda Census Tract 4423.02; Alameda County; California 15 5.66 15
Alameda Census Tract 4424; Alameda County; California 25 1.75 25
Alameda Census Tract 4425.01; Alameda County; California 13 3.17 28
Alameda Census Tract 4425.02; Alameda County; California 17 4.24 30
Alameda Census Tract 4426.01; Alameda County; California 28 3.47 28
Alameda Census Tract 4426.02; Alameda County; California 29 10.38 17
Alameda Census Tract 4427; Alameda County; California 30 1.05 11
Alameda Census Tract 4428; Alameda County; California 24 0.74 16
Alameda Census Tract 4429; Alameda County; California 15 5.15 13
Alameda Census Tract 4430.01; Alameda County; California 18 4.89 32
Alameda Census Tract 4430.02; Alameda County; California 16 1.74 17
Alameda Census Tract 4431.02; Alameda County; California 11 0.00 6
Alameda Census Tract 4431.03; Alameda County; California 19 1.21 3
Alameda Census Tract 4431.04; Alameda County; California 14 6.27 2
Alameda Census Tract 4431.05; Alameda County; California 10 0.43 2
Alameda Census Tract 4432; Alameda County; California 14 0.30 1
Alameda Census Tract 4433.01; Alameda County; California 18 1.02 9
Alameda Census Tract 4433.21; Alameda County; California 3 1.69 7
Alameda Census Tract 4433.22; Alameda County; California 19 1.09 7
Alameda Census Tract 4441; Alameda County; California 29 6.61 25
Alameda Census Tract 4442; Alameda County; California 21 1.34 30
Alameda Census Tract 4443.01; Alameda County; California 29 0.76 26
Alameda Census Tract 4443.03; Alameda County; California NaN NaN NaN
Alameda Census Tract 4443.04; Alameda County; California 12 0.81 32
Alameda Census Tract 4444; Alameda County; California 15 2.87 54
Alameda Census Tract 4445; Alameda County; California 18 4.61 44
Alameda Census Tract 4446.01; Alameda County; California 17 1.20 18
Alameda Census Tract 4446.02; Alameda County; California 15 5.82 10
Alameda Census Tract 4501.01; Alameda County; California 23 2.96 8
Alameda Census Tract 4501.02; Alameda County; California 19 14.83 14
Alameda Census Tract 4502; Alameda County; California 40 3.57 9
Alameda Census Tract 4503; Alameda County; California 47 6.27 15
Alameda Census Tract 4504; Alameda County; California 35 3.79 19
Alameda Census Tract 4505.01; Alameda County; California 54 0.00 13
Alameda Census Tract 4505.02; Alameda County; California 42 0.65 9
Alameda Census Tract 4506.01; Alameda County; California 47 0.74 7
Alameda Census Tract 4506.03; Alameda County; California 42 0.43 15
Alameda Census Tract 4506.04; Alameda County; California 53 0.44 13
Alameda Census Tract 4506.05; Alameda County; California 49 0.26 9
Alameda Census Tract 4506.06; Alameda County; California 50 0.00 5
Alameda Census Tract 4506.07; Alameda County; California 33 1.72 15
Alameda Census Tract 4506.08; Alameda County; California 41 1.67 8
Alameda Census Tract 4506.09; Alameda County; California 48 2.03 14
Alameda Census Tract 4507.01; Alameda County; California 45 1.66 7
Alameda Census Tract 4507.41; Alameda County; California 45 1.18 21
Alameda Census Tract 4507.42; Alameda County; California 49 0.18 9
Alameda Census Tract 4507.43; Alameda County; California 20 7.91 13
Alameda Census Tract 4507.44; Alameda County; California 47 0.00 13
Alameda Census Tract 4507.45; Alameda County; California 33 0.44 7
Alameda Census Tract 4507.46; Alameda County; California 47 0.43 19
Alameda Census Tract 4507.50; Alameda County; California 20 3.56 7
Alameda Census Tract 4507.51; Alameda County; California 13 2.69 5
Alameda Census Tract 4507.52; Alameda County; California 14 2.37 7
Alameda Census Tract 4511.02; Alameda County; California 74 0.51 7
Alameda Census Tract 4511.03; Alameda County; California 86 1.76 12
Alameda Census Tract 4511.04; Alameda County; California 54 0.04 18
Alameda Census Tract 4512.01; Alameda County; California 48 1.64 25
Alameda Census Tract 4512.02; Alameda County; California 45 0.93 17
Alameda Census Tract 4513; Alameda County; California 54 1.22 27
Alameda Census Tract 4514.01; Alameda County; California 39 5.35 38
Alameda Census Tract 4514.03; Alameda County; California 58 1.60 23
Alameda Census Tract 4514.04; Alameda County; California 32 1.06 59
Alameda Census Tract 4515.01; Alameda County; California 59 5.07 14
Alameda Census Tract 4515.03; Alameda County; California 51 0.59 20
Alameda Census Tract 4515.04; Alameda County; California 46 0.00 27
Alameda Census Tract 4515.05; Alameda County; California 66 1.01 13
Alameda Census Tract 4515.06; Alameda County; California 37 3.97 36
Alameda Census Tract 4516.01; Alameda County; California 70 0.00 9
Alameda Census Tract 4516.02; Alameda County; California 65 2.20 19
Alameda Census Tract 4517.01; Alameda County; California 54 4.26 13
Alameda Census Tract 4517.03; Alameda County; California 61 0.73 13
Alameda Census Tract 4517.04; Alameda County; California 69 0.23 12
Alameda Census Tract 9819; Alameda County; California 60 0.00 40
Alameda Census Tract 9820; Alameda County; California 50 10.00 0
Alameda Census Tract 9821; Alameda County; California 28 8.88 18
Alameda Census Tract 9832; Alameda County; California 54 13.84 5
Alameda Census Tract 9900; Alameda County; California NaN NaN NaN
Alpine Census Tract 100; Alpine County; California 58 0.00 14
Amador Census Tract 1.01; Amador County; California 80 0.44 13
Amador Census Tract 1.02; Amador County; California 85 0.32 8
Amador Census Tract 2.01; Amador County; California 86 0.72 10
Amador Census Tract 2.02; Amador County; California 73 2.30 14
Amador Census Tract 3.01; Amador County; California 46 9.89 36
Amador Census Tract 3.03; Amador County; California 80 0.09 9
Amador Census Tract 3.04; Amador County; California 78 0.80 10
Amador Census Tract 4.01; Amador County; California 79 0.70 8
Amador Census Tract 4.02; Amador County; California 78 0.17 16
Amador Census Tract 5; Amador County; California 72 0.09 24

3.3 Demographic Analysis

Your Task: Analyze the demographic patterns in your selected areas.

# Find the tract with the highest percentage of Hispanic/Latino residents
# Hint: use arrange() and slice() to get the top tract

top_hispanic_tract <- tract_percent %>%
  arrange(desc(pct_hispanic)) %>%
  slice(1) %>%
  select(GEOID, tract_label, county_name, pct_hispanic)

kable(top_hispanic_tract, caption = "Tract with Highest % Hispanic/Latino")
Tract with Highest % Hispanic/Latino
GEOID tract_label county_name pct_hispanic
06001437702 Census Tract 4377.02; Alameda County; California Alameda 85
# Calculate average demographics by county using group_by() and summarize()
# Show: number of tracts, average percentage for each racial/ethnic group

county_demo_avgs <- tract_percent %>%
  group_by(county_name) %>%
  summarise(
    "Number of Tracts" = n(),
    "Average White Percentage" = mean(pct_white, na.rm = TRUE),
    "Average Black Percentage" = mean(pct_black, na.rm = TRUE),
    "Average Hispanic Percentage"  = mean(pct_hispanic, na.rm = TRUE)
  )

# Create a nicely formatted table of your results using kable()

kable(
  county_demo_avgs,
  caption = "Average Tract Demographics by County",
  digits = 1,
  align = c("l","c","c","c","c"),
  col.names = c("County Names", "Number of Tracts", "Average White Percentage","Average Black Percentage","Average Hispanic Percentage")
)
Average Tract Demographics by County
County Names Number of Tracts Average White Percentage Average Black Percentage Average Hispanic Percentage
Alameda 379 31.0 10.7 21.4
Alpine 1 58.1 0.0 14.1
Amador 10 75.7 1.6 14.9

Part 4: Comprehensive Data Quality Evaluation

4.1 MOE Analysis for Demographic Variables

Your Task: Examine margins of error for demographic variables to see if some communities have less reliable data.

Requirements:
  • Calculate MOE percentages for each demographic variable
  • Flag tracts where any demographic variable has MOE > 15%
  • Create summary statistics

# Calculate MOE percentages for white, Black, and Hispanic variables
# Hint: use the same formula as before (margin/estimate * 100)

# Create a flag for tracts with high MOE on any demographic variable
# Use logical operators (| for OR) in an ifelse() statement

county_codes_state <- unique(stringr::str_sub(county$GEOID, 3, 5))

race_vars <- c(
  total    = "B03002_001",
  white    = "B03002_003",
  black    = "B03002_004",
  hispanic = "B03002_012"
)

tract_state_raw <- get_acs(
  geography = "tract",
  state     = my_state,
  county    = county_codes_state,  # all counties in the state
  survey    = "acs5",
  year      = 2022,
  variables = race_vars,
  output    = "wide"
)

tract_state_percent <- tract_state_raw %>%
  mutate(
    county_code  = substr(GEOID, 3, 5),
    pct_white    = 100 * (whiteE    / totalE),
    pct_black    = 100 * (blackE    / totalE),
    pct_hispanic = 100 * (hispanicE / totalE),
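    # 2022 ACS names separate fields with "; ", so match either separator before the state name
    tract_label  = stringr::str_remove(NAME, paste0("[;,] ", my_state, "$")),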
    tract_label  = stringr::str_remove(tract_label, ", United States$")
  ) %>%
  left_join(
    county %>%
      transmute(
        county_code = substr(GEOID, 3, 5),
        county_name
      ),
    by = "county_code"
  )

tract_quality <- tract_state_percent %>%
  mutate(
    moe_total_pct    = 100 * (totalM    / totalE),
    moe_white_pct    = 100 * (whiteM    / whiteE),
    moe_black_pct    = 100 * (blackM    / blackE),
    moe_hispanic_pct = 100 * (hispanicM / hispanicE),
    high_moe_flag = (moe_white_pct > 15) | (moe_black_pct > 15) | (moe_hispanic_pct > 15)
  )

tract_quality_summary <- tract_quality %>%
  summarise(
    tracts_total     = n(),
    tracts_high_moe  = sum(high_moe_flag, na.rm = TRUE),
    percent_high_moe = round(100 * tracts_high_moe / tracts_total, 1)
  )

kable(
  tract_quality_summary,
  caption   = "Tract-Level High-MOE Summary (>15% on any demographic variable) — Statewide",
  col.names = c("Total Tracts", "High-MOE Tracts", "Percent High-MOE (%)"),
  align     = "c"
)
Tract-Level High-MOE Summary (>15% on any demographic variable) — Statewide
Total Tracts High-MOE Tracts Percent High-MOE (%)
9129 9123 99.9
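
The near-universal flag rate above may partly reflect how zero estimates are handled: dividing a margin by a zero estimate gives `Inf` (or `NaN` when both are zero), and `Inf > 15` evaluates to `TRUE`. The following is a minimal sketch on made-up values, not ACS data, of one way to keep "undefined" separate from "high MOE". The helper name `relative_moe()` and the `status` labels are hypothetical and not part of the assignment code.

```r
# Hypothetical helper (not from the assignment): relative MOE in percent,
# returning NA when the estimate is zero or missing instead of Inf/NaN.
relative_moe <- function(estimate, moe) {
  ifelse(is.na(estimate) | estimate == 0, NA_real_, 100 * moe / estimate)
}

# Made-up illustrative values, not ACS data
toy <- data.frame(
  estimate = c(1200, 40, 0, 0),
  moe      = c(150, 35, 13, 0)
)

# Naive formula: produces Inf (13 / 0) and NaN (0 / 0) for the zero estimates
toy$naive_pct <- 100 * toy$moe / toy$estimate

# Guarded formula: zero estimates become NA
toy$safe_pct <- relative_moe(toy$estimate, toy$moe)

# Three-way status so undefined values are counted on their own
toy$status <- ifelse(
  is.na(toy$safe_pct), "undefined (zero estimate)",
  ifelse(toy$safe_pct > 15, "high MOE", "acceptable")
)

print(toy)
table(toy$status)
```

Reporting the undefined rows separately would show how much of the 99.9% comes from tracts with no sampled residents in a subgroup rather than from noisy nonzero estimates.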

4.2 Pattern Analysis

Your Task: Investigate whether data quality problems are randomly distributed or concentrated in certain types of communities.

# Group tracts by whether they have high MOE issues
# Calculate average characteristics for each group:
# - population size, demographic percentages

pattern_table <- tract_quality %>%
  group_by(high_moe_flag, county_name) %>%
  summarise(
    tracts         = n(),
    avg_pop        = mean(totalE, na.rm = TRUE),
    avg_pct_white  = mean(pct_white, na.rm = TRUE),
    avg_pct_black  = mean(pct_black, na.rm = TRUE),
    avg_pct_hispanic = mean(pct_hispanic, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(high_moe_flag), county_name)

kable(
  pattern_table,
  caption = "Characteristics by High-MOE Status (Any Demographic Variable > 15% MOE) — Statewide",
  col.names = c(
    "High-MOE (Any >15%)",
    "County",
    "Tracts",
    "Avg Pop",
    "Avg White (%)",
    "Avg Black (%)",
    "Avg Hispanic (%)"
  ),
  align = c("c","l","r","r","r","r","r"),
  digits = c(NA, NA, 0, 0, 1, 1, 1),
  format.args = list(big.mark = ",")
)
Characteristics by High-MOE Status (Any Demographic Variable > 15% MOE) — Statewide
High-MOE (Any >15%) County Tracts Avg Pop Avg White (%) Avg Black (%) Avg Hispanic (%)
TRUE Alameda 379 4,390 31.0 10.7 21.4
TRUE Alpine 1 1,515 58.1 0.0 14.1
TRUE Amador 10 4,058 75.7 1.6 14.9
TRUE Butte 54 3,956 69.3 1.5 17.4
TRUE Calaveras 14 3,262 81.0 0.9 11.6
TRUE Colusa 6 3,635 34.0 1.6 60.5
TRUE Contra Costa 242 4,804 42.6 8.0 25.1
TRUE Del Norte 9 3,051 59.5 2.2 19.6
TRUE El Dorado 55 3,486 76.0 0.6 13.8
TRUE Fresno 225 4,481 28.4 4.1 54.1
TRUE Glenn 8 3,582 54.0 0.3 39.2
TRUE Humboldt 36 3,781 72.1 1.3 11.6
TRUE Imperial 40 4,489 11.5 2.6 82.2
TRUE Inyo 6 3,138 62.1 0.8 23.2
TRUE Kern 236 3,843 33.1 4.7 54.0
TRUE Kings 30 4,830 30.0 5.0 57.6
TRUE Lake 21 3,239 69.3 2.5 20.0
TRUE Lassen 8 3,020 75.0 2.8 12.0
TRUE Los Angeles 2,497 3,976 26.3 7.6 47.6
TRUE Madera 33 4,552 33.9 2.0 58.3
TRUE Marin 63 4,135 69.2 2.5 16.5
TRUE Mariposa 6 2,855 76.8 0.9 13.4
TRUE Mendocino 24 3,798 64.6 0.5 24.9
TRUE Merced 63 4,481 25.4 2.8 62.1
TRUE Modoc 4 2,163 76.6 1.4 15.1
TRUE Mono 4 3,305 64.1 0.2 27.8
TRUE Monterey 104 4,208 35.2 2.0 52.6
TRUE Napa 40 3,435 54.6 2.1 31.7
TRUE Nevada 26 3,935 83.4 0.3 9.6
TRUE Orange 614 5,171 41.3 1.5 32.4
TRUE Placer 92 4,420 70.9 1.4 14.5
TRUE Plumas 7 2,807 85.2 0.6 8.6
TRUE Riverside 518 4,690 34.6 5.7 49.9
TRUE Sacramento 363 4,350 43.2 9.1 23.8
TRUE San Benito 12 5,396 32.8 0.8 59.5
TRUE San Bernardino 465 4,682 28.5 7.1 53.3
TRUE San Diego 737 4,464 45.5 4.4 33.3
TRUE San Francisco 244 3,488 39.5 5.1 15.1
TRUE San Joaquin 174 4,480 29.8 6.7 43.6
TRUE San Luis Obispo 70 4,024 67.1 1.3 23.2
TRUE San Mateo 174 4,335 37.9 2.1 23.5
TRUE Santa Barbara 109 4,085 46.0 1.8 43.6
TRUE Santa Clara 408 4,698 29.6 2.3 25.3
TRUE Santa Cruz 70 3,837 56.7 0.8 34.2
TRUE Shasta 50 3,637 77.7 0.9 10.6
TRUE Sierra 1 2,916 86.6 0.2 11.4
TRUE Siskiyou 16 2,753 73.9 1.2 14.5
TRUE Solano 99 4,497 36.3 12.7 28.2
TRUE Sonoma 122 4,004 63.6 1.4 25.6
TRUE Stanislaus 112 4,929 38.7 2.7 48.9
TRUE Sutter 21 4,719 45.8 1.8 32.5
TRUE Tehama 14 4,677 65.9 0.9 26.0
TRUE Trinity 4 3,972 79.2 1.7 7.0
TRUE Tulare 103 4,597 27.4 1.2 65.3
TRUE Tuolumne 18 3,055 78.1 1.8 13.5
TRUE Ventura 190 4,432 45.0 1.7 42.1
TRUE Yolo 53 4,097 46.5 2.7 31.1
TRUE Yuba 19 4,300 56.0 3.2 26.7
FALSE Kings 1 7,612 18.3 25.1 49.9
FALSE Lassen 1 7,717 30.4 22.1 42.4
FALSE Los Angeles 1 8,994 16.8 33.6 41.4
FALSE Madera 1 7,043 25.2 14.9 49.2
FALSE San Bernardino 1 3,618 15.5 25.5 50.4
FALSE Solano 1 5,774 18.5 43.6 29.0
tract_flag_driver <- tract_quality %>%
  mutate(
    flag_white    = moe_white_pct    > 15,
    flag_black    = moe_black_pct    > 15,
    flag_hispanic = moe_hispanic_pct > 15,
    driver_groups = case_when(
      flag_white & !flag_black & !flag_hispanic ~ "White",
      !flag_white & flag_black & !flag_hispanic ~ "Black",
      !flag_white & !flag_black & flag_hispanic ~ "Hispanic",
      flag_white | flag_black | flag_hispanic   ~ "Multiple",
      TRUE                                       ~ "None"
    )
  )

driver_totals <- tract_flag_driver %>%
  filter(high_moe_flag) %>%
  summarise(
    White    = sum(flag_white,    na.rm = TRUE),
    Black    = sum(flag_black,    na.rm = TRUE),
    Hispanic = sum(flag_hispanic, na.rm = TRUE)
  ) %>%
  tidyr::pivot_longer(everything(),
                      names_to = "Group",
                      values_to = "Flagged Tracts")

kable(
  driver_totals %>% arrange(desc(`Flagged Tracts`)),
  caption   = "Which Groups Drove High-MOE Flags (MOE% > 15) — Statewide",
  col.names = c("Group", "Flagged Tracts"),
  align     = c("l","r"),
  format.args = list(big.mark = ",")
)
Which Groups Drove High-MOE Flags (MOE% > 15) — Statewide
Group Flagged Tracts
Black 9,109
Hispanic 8,631
White 8,144

Pattern Analysis: Under the flagging rule given in the instructions, 9,123 of 9,129 tracts (99.9%) exceed the 15% MOE threshold on at least one demographic variable. Because the flag applies to nearly every tract, it cannot by itself separate reliable from unreliable communities, and at the tract level the problem looks universal rather than concentrated. Breaking the flags down by demographic group shows that the burden is not evenly shared: Black estimates exceed the threshold in 9,109 tracts, Hispanic estimates in 8,631, and White estimates in 8,144, so Black communities are the most affected. This reflects a well-known limitation of ACS sampling: small subgroups yield few sampled households, so their estimates vary widely. Flagged tracts tend to have smaller populations than the six unflagged tracts, which amplifies sampling error, and even within White-majority tracts the subgroup estimates for Black and Hispanic residents frequently exceed the 15% threshold. In some tracts the subgroup estimate is zero, so the MOE percentage is infinite or undefined. Together, these dynamics show that while the flag looks indiscriminate at the tract level, the reliability problem is systematically tied to how well minority populations are represented in the sample, which raises clear concerns for algorithmic decision-making.
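
The flags in this section compare each subgroup count to its own margin. A complementary view is the margin of error of the subgroup's share of the tract, which the Census Bureau approximates as MOE_p = (1/Y) * sqrt(MOE_X^2 - p^2 * MOE_Y^2), switching to a plus sign when the term under the root is negative. The sketch below applies that formula to a made-up tract; the function name `moe_of_share()` and all numbers are illustrative assumptions. The tidycensus package provides `moe_prop()` for the same purpose.

```r
# Hypothetical helper: approximate MOE of a derived proportion (num / denom)
# using the Census Bureau approximation for proportions.
moe_of_share <- function(num, denom, moe_num, moe_denom) {
  p <- num / denom
  radicand <- moe_num^2 - p^2 * moe_denom^2
  # If the radicand is negative, fall back to the ratio form (plus sign)
  radicand <- ifelse(radicand < 0, moe_num^2 + p^2 * moe_denom^2, radicand)
  sqrt(radicand) / denom
}

# Made-up tract: total population 4,000 (+/- 400),
# a large subgroup of 2,000 (+/- 350) and a small subgroup of 80 (+/- 70)
total_est <- 4000; total_moe <- 400
group_est <- c(large = 2000, small = 80)
group_moe <- c(large = 350,  small = 70)

# Relative MOE of the counts (the measure used for the 15% flag)
count_rel_moe <- 100 * group_moe / group_est

# MOE of each group's share of the tract, in percentage points
share_moe_pts <- 100 * moe_of_share(group_est, total_est, group_moe, total_moe)

print(round(count_rel_moe, 1))
print(round(share_moe_pts, 1))
```

In this toy case both subgroups fail the 15% rule on counts, but the small subgroup's share is uncertain by under two percentage points, which suggests that a relative-MOE flag on counts will almost always trip for small groups regardless of how precisely their share is known.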

Part 5: Policy Recommendations

5.1 Analysis Integration and Professional Summary

Executive Summary:

Across county- and tract-level analyses, two systematic patterns consistently appear. First, tracts and counties with smaller populations tend to have disproportionately high margins of error, making their estimates far less stable than those from larger areas. Second, the reliability of racial and ethnic subgroup estimates varies sharply: Black and Hispanic populations are much more likely to have margins of error above 15%, and in some cases, the ACS does not capture enough observations to produce valid estimates. Together, these patterns show that measurement error is pervasive but not random: it reflects structural features of both tract size and demographic composition.

Communities facing the greatest risk of algorithmic bias are those that are either very small and rural or racially/ethnically diverse. Rural tracts, because of small sample sizes, may be flagged as unreliable and thus deprioritized in automated systems, despite having genuine needs. At the same time, urban minority communities, particularly those with large Hispanic or Black populations, often show the highest subgroup MOEs, meaning their conditions could be systematically misclassified or underestimated. In both cases, the communities already at risk of marginalization are the same ones where the data is least reliable.

The drivers of these problems are structural. In rural areas, small sample sizes inflate margins of error, while in diverse urban tracts, underrepresentation of minority subgroups disrupts the accuracy of need assessments. This underrepresentation is tied to long-standing stratification in data collection, where certain groups are less visible in surveys, and to socio-spatial self-selection, where minorities concentrate in particular neighborhoods that are often harder to measure with precision. These processes produce systematic biases: the very communities whose needs are greatest — low-income, minority, and geographically marginalized — are those most likely to be misrepresented in the data.

The Department should treat reliability as central to its algorithmic framework. Specifically, it should (a) adjust for MOE when prioritizing communities, so noisy estimates are not misclassified as real differences; (b) avoid strict cutoffs in low-confidence areas by using broader eligibility bands; (c) supplement ACS data with administrative or community-level sources in minority-dense neighborhoods where subgroup reliability is weakest; and (d) incorporate transparency and equity audits to ensure that stratification and data gaps do not reinforce existing inequalities. By embedding these safeguards, the Department can ensure its allocation strategies are both statistically sound and socially just.
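
Recommendation (a) can be illustrated with the standard ACS significance test: convert each 90% MOE to a standard error (MOE / 1.645) and compare the difference between two estimates to their combined standard error. The sketch below uses median incomes and MOE percentages from the county table in this report; the dollar MOEs are back-calculated from rounded percentages, so they are approximate, and the function name `acs_z_score()` is hypothetical. The tidycensus package offers `significance()` for a similar check.

```r
# Hypothetical helper: z-score for the difference between two ACS estimates,
# assuming both MOEs are published at the 90% confidence level.
acs_z_score <- function(est1, moe1, est2, moe2) {
  se1 <- moe1 / 1.645
  se2 <- moe2 / 1.645
  abs(est1 - est2) / sqrt(se1^2 + se2^2)
}

# Median household income and MOE % as reported in the county table;
# dollar MOEs are reconstructed from the rounded percentages (approximate).
counties <- data.frame(
  county  = c("Alpine", "Mono", "Alameda", "Contra Costa"),
  income  = c(101125, 82038, 122488, 120020),
  moe_pct = c(17.25, 18.76, 1.00, 1.25)
)
counties$moe <- counties$income * counties$moe_pct / 100

# Two small rural counties with a large income gap
z_rural <- acs_z_score(counties$income[1], counties$moe[1],
                       counties$income[2], counties$moe[2])

# Two large urban counties with a small income gap
z_urban <- acs_z_score(counties$income[3], counties$moe[3],
                       counties$income[4], counties$moe[4])

# A difference is treated as significant at the 90% level when z > 1.645
data.frame(
  pair        = c("Alpine vs Mono", "Alameda vs Contra Costa"),
  z           = round(c(z_rural, z_urban), 2),
  significant = c(z_rural, z_urban) > 1.645
)
```

With these inputs the roughly $19,000 gap between Alpine and Mono is not statistically distinguishable (z of about 1.35), while the roughly $2,500 gap between Alameda and Contra Costa is (z of about 2.10). A ranking algorithm that ignores MOE would treat the first gap as the larger signal.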

5.2 Specific Recommendations

Your Task: Create a decision framework for algorithm implementation.

# Create a summary table using your county reliability data
# Include: county name, median income, MOE percentage, reliability category

recommendations <- county_reliability %>%
  select(
    County = county_name,
    `Median Income` = med_hh_incomeE,
    `MOE %` = moe_percentage,
    `Reliability Category` = Reliability
  ) %>%
  mutate(
    Recommendation = case_when(
      `Reliability Category` == "High Confidence"     ~ "Safe for algorithmic decisions",
      `Reliability Category` == "Moderate Confidence" ~ "Use with caution – monitor outcomes",
      `Reliability Category` == "Low Confidence"      ~ "Requires manual review or additional data",
      TRUE                                            ~ NA_character_
    )
  )

# Add a new column with algorithm recommendations using case_when():
# - High Confidence: "Safe for algorithmic decisions"
# - Moderate Confidence: "Use with caution - monitor outcomes"  
# - Low Confidence: "Requires manual review or additional data"

# Format as a professional table with kable()

kable(
  recommendations %>%
    arrange(Recommendation, County),
  caption = "Decision Framework for Algorithm Implementation (Arranged by Recommendation)",
  col.names = c("County", "Median Income", "MOE %", "Reliability Category", "Recommendation"),
  digits = 2,
  format.args = list(big.mark = ",")
)
Decision Framework for Algorithm Implementation (Arranged by Recommendation)
County Median Income MOE % Reliability Category Recommendation
Alpine 101,125 17.25 Low Confidence Requires manual review or additional data
Mono 82,038 18.76 Low Confidence Requires manual review or additional data
Plumas 67,885 11.45 Low Confidence Requires manual review or additional data
Sierra 61,108 15.12 Low Confidence Requires manual review or additional data
Trinity 47,317 12.45 Low Confidence Requires manual review or additional data
Alameda 122,488 1.00 High Confidence Safe for algorithmic decisions
Butte 66,085 3.42 High Confidence Safe for algorithmic decisions
Contra Costa 120,020 1.25 High Confidence Safe for algorithmic decisions
El Dorado 99,246 3.36 High Confidence Safe for algorithmic decisions
Fresno 67,756 1.43 High Confidence Safe for algorithmic decisions
Humboldt 57,881 3.68 High Confidence Safe for algorithmic decisions
Imperial 53,847 4.11 High Confidence Safe for algorithmic decisions
Kern 63,883 2.07 High Confidence Safe for algorithmic decisions
Kings 68,540 3.29 High Confidence Safe for algorithmic decisions
Lake 56,259 4.34 High Confidence Safe for algorithmic decisions
Los Angeles 83,411 0.53 High Confidence Safe for algorithmic decisions
Madera 73,543 3.87 High Confidence Safe for algorithmic decisions
Marin 142,019 2.89 High Confidence Safe for algorithmic decisions
Mendocino 61,335 3.58 High Confidence Safe for algorithmic decisions
Merced 64,772 3.31 High Confidence Safe for algorithmic decisions
Monterey 91,043 2.09 High Confidence Safe for algorithmic decisions
Napa 105,809 2.82 High Confidence Safe for algorithmic decisions
Nevada 79,395 4.82 High Confidence Safe for algorithmic decisions
Orange 109,361 0.81 High Confidence Safe for algorithmic decisions
Placer 109,375 1.70 High Confidence Safe for algorithmic decisions
Riverside 84,505 1.26 High Confidence Safe for algorithmic decisions
Sacramento 84,010 0.97 High Confidence Safe for algorithmic decisions
San Bernardino 77,423 1.04 High Confidence Safe for algorithmic decisions
San Diego 96,974 1.02 High Confidence Safe for algorithmic decisions
San Francisco 136,689 1.43 High Confidence Safe for algorithmic decisions
San Joaquin 82,837 1.75 High Confidence Safe for algorithmic decisions
San Luis Obispo 90,158 2.56 High Confidence Safe for algorithmic decisions
San Mateo 149,907 1.75 High Confidence Safe for algorithmic decisions
Santa Barbara 92,332 2.05 High Confidence Safe for algorithmic decisions
Santa Clara 153,792 1.00 High Confidence Safe for algorithmic decisions
Santa Cruz 104,409 3.04 High Confidence Safe for algorithmic decisions
Shasta 68,347 3.63 High Confidence Safe for algorithmic decisions
Siskiyou 53,898 4.90 High Confidence Safe for algorithmic decisions
Solano 97,037 1.78 High Confidence Safe for algorithmic decisions
Sonoma 99,266 2.00 High Confidence Safe for algorithmic decisions
Stanislaus 74,872 1.83 High Confidence Safe for algorithmic decisions
Sutter 72,654 4.71 High Confidence Safe for algorithmic decisions
Tulare 64,474 2.31 High Confidence Safe for algorithmic decisions
Ventura 102,141 1.50 High Confidence Safe for algorithmic decisions
Yolo 85,097 2.74 High Confidence Safe for algorithmic decisions
Yuba 66,693 4.19 High Confidence Safe for algorithmic decisions
Amador 74,853 8.08 Moderate Confidence Use with caution – monitor outcomes
Calaveras 77,526 5.00 Moderate Confidence Use with caution – monitor outcomes
Colusa 69,619 8.25 Moderate Confidence Use with caution – monitor outcomes
Del Norte 61,149 7.16 Moderate Confidence Use with caution – monitor outcomes
Glenn 64,033 6.19 Moderate Confidence Use with caution – monitor outcomes
Inyo 63,417 8.60 Moderate Confidence Use with caution – monitor outcomes
Lassen 59,515 5.97 Moderate Confidence Use with caution – monitor outcomes
Mariposa 60,021 8.82 Moderate Confidence Use with caution – monitor outcomes
Modoc 54,962 9.80 Moderate Confidence Use with caution – monitor outcomes
San Benito 104,451 5.23 Moderate Confidence Use with caution – monitor outcomes
Tehama 59,029 6.95 Moderate Confidence Use with caution – monitor outcomes
Tuolumne 70,432 6.66 Moderate Confidence Use with caution – monitor outcomes

Key Recommendations:

Your Task: Use your analysis results to provide specific guidance to the department.

  1. Counties suitable for immediate algorithmic implementation: Alameda, Butte, Contra Costa, El Dorado, Fresno, Humboldt, Imperial, Kern, Kings, Lake, Los Angeles, Madera, Marin, Mendocino, Merced, Monterey, Napa, Nevada, Orange, Placer, Riverside, Sacramento, San Bernardino, San Diego, San Francisco, San Joaquin, San Luis Obispo, San Mateo, Santa Barbara, Santa Clara, Santa Cruz, Shasta, Siskiyou, Solano, Sonoma, Stanislaus, Sutter, Tulare, Ventura, Yolo, and Yuba

  2. Counties requiring additional oversight: Amador, Calaveras, Colusa, Del Norte, Glenn, Inyo, Lassen, Mariposa, Modoc, San Benito, Tehama, and Tuolumne

  3. Counties needing alternative approaches: Alpine, Mono, Plumas, Sierra, and Trinity

Questions for Further Investigation

  1. Are high-MOE tracts clustered spatially (e.g., along rural–urban boundaries or in specific regions of the state), or do they appear evenly dispersed?

  2. Do MOE patterns persist across ACS releases, or do they improve over time with larger samples? A time-series comparison could reveal whether underrepresentation of minority or rural communities is a persistent structural issue, much as flood or disaster impacts are tracked across years.

  3. How do MOE patterns for racial and ethnic groups vary across states? Are high MOEs for Hispanic and Black populations a uniquely California phenomenon, or do they reflect a broader national issue embedded in ACS sampling design?

Technical Notes

Data Sources:
  • U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates
  • Retrieved via tidycensus R package on [date]

Reproducibility:
  • All analysis conducted in R version 4.5.1
  • Census API key required for replication
  • Complete code and documentation available at: https://musa-5080-fall-2025.github.io/portfolio-setup-MohamadAlAbbas-PhD/

Methodology Notes: Margins of error (MOE) were standardized as percentages of the estimate, and counties were classified into High, Moderate, and Low Confidence categories using thresholds of <5%, 5–10%, and >10% respectively. Reliability flags at the tract level were set when any racial/ethnic subgroup estimate exceeded 15% MOE. No smoothing or imputation was applied to extreme or infinite MOE values; tracts with zero subgroup observations were retained as-is to reflect the raw survey limitations.

County codes were extracted directly from GEOID strings to facilitate joins, and descriptive statistics were calculated using simple group means. Data outputs were formatted using kable() for presentation, and no additional modeling or weighting adjustments were performed beyond what the ACS provides.
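
The county classification rule described above can be written as a small function. This is a sketch under the stated thresholds (<5%, 5–10%, >10%). The function name `classify_reliability()` is hypothetical, and treating exactly 10% as Moderate is an assumption, since the notes do not say which side that boundary falls on.

```r
# Hypothetical helper: map a county's MOE percentage to a reliability category.
# Thresholds follow the methodology notes; the handling of exactly 10% is assumed.
classify_reliability <- function(moe_pct) {
  ifelse(
    is.na(moe_pct), NA_character_,
    ifelse(
      moe_pct < 5, "High Confidence",
      ifelse(moe_pct <= 10, "Moderate Confidence", "Low Confidence")
    )
  )
}

# MOE percentages taken from the decision framework table
# (Alameda, Calaveras, Modoc, Plumas, Alpine)
example_moe <- c(Alameda = 1.00, Calaveras = 5.00, Modoc = 9.80,
                 Plumas = 11.45, Alpine = 17.25)

classify_reliability(example_moe)
```

For these five counties the function reproduces the categories shown in the decision framework table.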

Limitations: Several limitations should be noted. First, tract-level estimates tend to carry high MOEs, which makes interpreting them and deploying algorithmic solutions at that scale problematic; county-level analysis may be more appropriate. Second, subgroup estimates for racial and ethnic minorities often carried very high MOEs, and in some cases no observations were available, producing infinite or undefined percentages. These issues were left unadjusted to remain consistent with assignment instructions, but they highlight important data reliability challenges.

Third, the analysis is limited to a single 5-year ACS period; no longitudinal comparison was made to assess whether MOE patterns persist or shift over time. Finally, aggregating tract-level characteristics to the county level masks within-county variability that may be relevant for equity considerations.
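
The trade-off between tract-level and county-level analysis noted above can be made concrete. When estimates are summed across areas, the Census Bureau's approximation for the combined margin is the square root of the sum of squared MOEs, so the relative MOE of the total is typically smaller than that of any single piece. The sketch below uses made-up tract values; the tidycensus package provides `moe_sum()` for the same calculation.

```r
# Made-up subgroup estimates and 90% MOEs for four neighboring tracts
tract_est <- c(60, 45, 80, 55)
tract_moe <- c(40, 35, 50, 38)

# Relative MOE of each tract on its own, in percent
tract_rel <- 100 * tract_moe / tract_est

# Combined estimate and approximate MOE of the sum (root sum of squares)
combined_est <- sum(tract_est)
combined_moe <- sqrt(sum(tract_moe^2))
combined_rel <- 100 * combined_moe / combined_est

print(round(tract_rel, 1))
print(round(combined_rel, 1))
```

Here every tract has a relative MOE above 60%, while the pooled estimate is closer to 34%. Aggregation buys precision at the cost of the within-county detail that matters for equity analysis, so the choice of scale is a judgment call rather than a purely technical fix.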


Submission Checklist

Before submitting your portfolio link on Canvas:

Remember: Submit your portfolio URL on Canvas, not the file itself. Your assignment should be accessible at your-portfolio-url/assignments/assignment_1/your_file_name.html