Assignment 1: Census Data Quality for Policy Decisions

Evaluating Data Reliability for Algorithmic Decision-Making

Author

Luciano

Published

December 8, 2025

Setup

# Load required packages
library(tidycensus)
library(tidyverse)
library(knitr)

# Set your Census API key
census_api_key("b236a5b2547ce79c3e203c3e1366ed7fa7b3d463", install = FALSE)
Sys.getenv("CENSUS_API_KEY")
[1] "b236a5b2547ce79c3e203c3e1366ed7fa7b3d463"
# Choose your state for analysis
my_state <- "Missouri"

I have chosen Missouri for this analysis because it combines major metropolitan areas (like St. Louis and Kansas City) with a large number of rural counties. This mix helps illustrate how data quality may vary between urban and rural areas, which is important for equitable policy decisions.

Part 2: County-Level Resource Assessment

# Retrieve county-level ACS data for Missouri
county_data <- get_acs(
  geography = "county",
  state = my_state,
  variables = c(
    median_income = "B19013_001",
    total_pop = "B01003_001"
  ),
  year   = 2022,
  survey = "acs5",
  output = "wide"
)

# Clean county names: remove ", Missouri" and " County"
county_data <- county_data %>%
  mutate(NAME = str_remove(NAME, ", Missouri"),
         NAME = str_remove(NAME, " County"))

# Display the first few rows
head(county_data)
# A tibble: 6 × 6
  GEOID NAME     median_incomeE median_incomeM total_popE total_popM
  <chr> <chr>             <dbl>          <dbl>      <dbl>      <dbl>
1 29001 Adair             51020           4430      25299         NA
2 29003 Andrew            68774           4776      18069         NA
3 29005 Atchison          58521           3686       5270         NA
4 29007 Audrain           51745           2309      24873         NA
5 29009 Barry             55592           5385      34701         NA
6 29011 Barton            48105           5576      11683         NA

2.2 Data Quality Assessment

library(dplyr)
library(stringr)

income_reliability <- county_data %>%
  mutate(income_moe_pct = if_else(
      median_incomeE > 0,
      100 * median_incomeM / median_incomeE,
      NA_real_
  ),
  reliability = case_when(
      income_moe_pct < 5 ~ "High Confidence",
      income_moe_pct >= 5 & income_moe_pct <= 10 ~ "Moderate Confidence",
      income_moe_pct > 10 ~ "Low Confidence",
      TRUE ~ NA_character_
  ),
  unreliable_flag = if_else(income_moe_pct > 10, TRUE, FALSE),
  total_popE = if_else(total_popE < 0, NA_real_, total_popE
  )) %>%
  select(GEOID, NAME, median_incomeE, median_incomeM, income_moe_pct, reliability, unreliable_flag, total_popE)

# Display the first few rows
head(income_reliability)
# A tibble: 6 × 8
  GEOID NAME     median_incomeE median_incomeM income_moe_pct reliability       
  <chr> <chr>             <dbl>          <dbl>          <dbl> <chr>             
1 29001 Adair             51020           4430           8.68 Moderate Confiden…
2 29003 Andrew            68774           4776           6.94 Moderate Confiden…
3 29005 Atchison          58521           3686           6.30 Moderate Confiden…
4 29007 Audrain           51745           2309           4.46 High Confidence   
5 29009 Barry             55592           5385           9.69 Moderate Confiden…
6 29011 Barton            48105           5576          11.6  Low Confidence    
# ℹ 2 more variables: unreliable_flag <lgl>, total_popE <dbl>

2.3 High Uncertainty Counties

# Create table of top 5 counties by MOE percentage
library(knitr)

top5_uncertain <- income_reliability %>%
  arrange(desc(income_moe_pct)) %>%
  slice(1:5) %>%
  select(
    County = NAME,
    total_popE = total_popE,
    `Median Income (Estimate)` = median_incomeE,
    `Margin of Error` = median_incomeM,
    `MOE %` = income_moe_pct,
    `Reliability` = reliability
  )

# Format as table with kable() - include appropriate column names and caption
kable(
  top5_uncertain,
  caption = "Top 5 Counties with Highest Median Income MOE Percentages in Missouri"
)
Top 5 Counties with Highest Median Income MOE Percentages in Missouri

| County      | total_popE | Median Income (Estimate) | Margin of Error |    MOE % | Reliability    |
|:------------|-----------:|-------------------------:|----------------:|---------:|:---------------|
| Shannon     |       7132 |                    46767 |            9920 | 21.21154 | Low Confidence |
| Carter      |       5299 |                    45737 |            8517 | 18.62168 | Low Confidence |
| Mississippi |      12305 |                    40833 |            7546 | 18.48015 | Low Confidence |
| Ozark       |       8688 |                    39125 |            7092 | 18.12652 | Low Confidence |
| Mercer      |       3517 |                    55592 |           10045 | 18.06915 | Low Confidence |

Data Quality Commentary:

The top five counties all have small populations, ranging from about 3,500 to about 12,300 residents. For example, Mercer has only about 3,500 residents, and its MOE reaches 18%. This suggests that smaller populations lead to higher uncertainty in estimates, likely because the ACS samples fewer households there. Carter, Shannon, and Ozark are located in Missouri’s Ozark region, an area characterized by both limited resources and highly dispersed populations. These factors can make data collection harder, resulting in less reliable estimates.
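The link between population size and uncertainty can be checked directly instead of being read off five rows. The sketch below is a minimal base R illustration that uses only the eleven county rows printed in this report (the six from head() plus the top five by MOE). That subset is not a random sample, so the same call would need to be run on the full income_reliability table before drawing conclusions. The names county_sample and pop_moe_rho are hypothetical.

```r
# Population and income figures for the county rows printed in this report.
# NOTE: illustrative subset only, not a random sample of Missouri counties.
county_sample <- data.frame(
  county = c("Adair", "Andrew", "Atchison", "Audrain", "Barry", "Barton",
             "Shannon", "Carter", "Mississippi", "Ozark", "Mercer"),
  total_popE = c(25299, 18069, 5270, 24873, 34701, 11683,
                 7132, 5299, 12305, 8688, 3517),
  median_incomeE = c(51020, 68774, 58521, 51745, 55592, 48105,
                     46767, 45737, 40833, 39125, 55592),
  median_incomeM = c(4430, 4776, 3686, 2309, 5385, 5576,
                     9920, 8517, 7546, 7092, 10045)
)

# Same MOE % formula as in the reliability step: 100 * margin / estimate
county_sample$income_moe_pct <-
  100 * county_sample$median_incomeM / county_sample$median_incomeE

# Rank-based (Spearman) correlation, which does not assume a linear relationship
pop_moe_rho <- cor(county_sample$total_popE,
                   county_sample$income_moe_pct,
                   method = "spearman")
pop_moe_rho
```

A negative coefficient is consistent with the commentary above: in this subset, counties with fewer residents tend to have larger MOE percentages.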

Part 3: Neighborhood-Level Analysis

# Use filter() to select 2-3 counties from the income_reliability data
# Store the selected counties in a variable called selected_counties
selected_counties <- income_reliability %>%
  filter(
    NAME %in% c("St. Louis",    # High Confidence
                "Buchanan",      # Moderate Confidence
                "Texas")     # Low Confidence
  ) %>%
  select(
    County = NAME,
    `Median Income (Estimate)` = median_incomeE,
    `MOE %` = income_moe_pct,
    `Reliability` = reliability
  )

# Display the selected counties with their key characteristics
# Show: county name, median income, MOE percentage, reliability category
selected_counties
# A tibble: 3 × 4
  County    `Median Income (Estimate)` `MOE %` Reliability        
  <chr>                          <dbl>   <dbl> <chr>              
1 Buchanan                       58303    5.07 Moderate Confidence
2 St. Louis                      78067    1.64 High Confidence    
3 Texas                          42870   11.4  Low Confidence     

I selected St. Louis, Buchanan, and Texas counties to represent high, moderate, and low data reliability contexts—urban, mid-sized, and rural areas, respectively.

3.2 Tract-Level Demographics

# Define your race/ethnicity variables with descriptive names
race_vars <- c(
  total_pop = "B03002_001",
  white     = "B03002_003",
  black     = "B03002_004",
  hispanic  = "B03002_012"
)

# Use get_acs() to retrieve tract-level data
# Hint: You may need to specify county codes in the county parameter
county_codes <- income_reliability %>%
  filter(NAME %in% selected_counties$County) %>%
  transmute(county_code = str_sub(GEOID, 3, 5)) %>%
  distinct() %>%
  pull(county_code)
tract_demo_raw <- get_acs(
  geography = "tract",
  state     = my_state,
  county    = county_codes,
  variables = race_vars,
  year      = 2022,
  survey    = "acs5",
  output    = "wide"
)


# Create percentages for white, Black, and Hispanic populations
tract_demo <- tract_demo_raw %>%
  mutate(
    white_pct    = if_else(total_popE > 0, 100 * whiteE   / total_popE, NA_real_),
    black_pct    = if_else(total_popE > 0, 100 * blackE   / total_popE, NA_real_),
    hispanic_pct = if_else(total_popE > 0, 100 * hispanicE/ total_popE, NA_real_),
    total_population = total_popE,
    tract_name   = str_extract(NAME, "Census Tract[^,]+"),
    county_name  = str_extract(NAME, "Census Tract[^,]+")) %>%
  select(
    GEOID, tract_name, county_name,
    total_population, whiteE, blackE, hispanicE,
    white_pct, black_pct, hispanic_pct
  )


tract_demo <- tract_demo %>%
  mutate(
    county_name = county_name %>%
      str_replace_all(",", ";") %>%
      { str_split_fixed(., ";", 3)[, 2] } %>%
      str_trim()
  )

# Add readable tract and county name columns using str_extract() or similar
kable(
  head(tract_demo, 10),
  caption = "Selected Counties: Tract-Level Race/Ethnicity (ACS 2018–2022)"
)
Selected Counties: Tract-Level Race/Ethnicity (ACS 2018–2022)

| GEOID       | tract_name                                   | county_name     | total_population | whiteE | blackE | hispanicE | white_pct | black_pct | hispanic_pct |
|:------------|:---------------------------------------------|:----------------|-----------------:|-------:|-------:|----------:|----------:|----------:|-------------:|
| 29021000100 | Census Tract 1; Buchanan County; Missouri    | Buchanan County |             5914 |   4906 |    265 |       123 |  82.95570 |  4.480893 |     2.079811 |
| 29021000200 | Census Tract 2; Buchanan County; Missouri    | Buchanan County |             4522 |   3369 |    100 |       622 |  74.50243 |  2.211411 |    13.754976 |
| 29021000300 | Census Tract 3; Buchanan County; Missouri    | Buchanan County |             2571 |   2030 |     75 |       283 |  78.95760 |  2.917153 |    11.007390 |
| 29021000400 | Census Tract 4; Buchanan County; Missouri    | Buchanan County |             1444 |   1205 |     43 |        57 |  83.44875 |  2.977839 |     3.947368 |
| 29021000500 | Census Tract 5; Buchanan County; Missouri    | Buchanan County |             3077 |   2453 |     96 |       399 |  79.72051 |  3.119922 |    12.967176 |
| 29021000600 | Census Tract 6; Buchanan County; Missouri    | Buchanan County |             4836 |   3535 |    496 |       288 |  73.09760 | 10.256410 |     5.955335 |
| 29021000701 | Census Tract 7.01; Buchanan County; Missouri | Buchanan County |             4567 |   3554 |    387 |       505 |  77.81914 |  8.473834 |    11.057587 |
| 29021000702 | Census Tract 7.02; Buchanan County; Missouri | Buchanan County |             4198 |   3475 |    226 |       201 |  82.77751 |  5.383516 |     4.787994 |
| 29021000900 | Census Tract 9; Buchanan County; Missouri    | Buchanan County |             4500 |   3432 |    431 |       382 |  76.26667 |  9.577778 |     8.488889 |
| 29021001000 | Census Tract 10; Buchanan County; Missouri   | Buchanan County |             2149 |   1456 |    382 |       161 |  67.75244 | 17.775710 |     7.491857 |

3.3 Demographic Analysis

# Find the tract with the highest percentage of Hispanic/Latino residents
# Hint: use arrange() and slice() to get the top tract
top_hispanic_tract <- tract_demo %>%
  arrange(desc(hispanic_pct)) %>%
  slice(1) %>%
  select(
    GEOID, tract_name, county_name, total_population,
    white_pct, black_pct, hispanic_pct
  )

kable(
  top_hispanic_tract,
  caption = "Tract with Highest Hispanic/Latino Percentage (Selected Counties)"
)
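Tract with Highest Hispanic/Latino Percentage (Selected Counties)

| GEOID       | tract_name                                    | county_name      | total_population | white_pct | black_pct | hispanic_pct |
|:------------|:----------------------------------------------|:-----------------|-----------------:|----------:|----------:|-------------:|
| 29189214700 | Census Tract 2147; St. Louis County; Missouri | St. Louis County |             8305 |  43.66045 |  18.81999 |     32.51054 |
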
# Calculate average demographics by county using group_by() and summarize()
county_summary_unweighted <- tract_demo %>%
  group_by(county_name) %>%
  summarise(
    n_tracts = n(),
    avg_white_pct    = mean(white_pct,    na.rm = TRUE),
    avg_black_pct    = mean(black_pct,    na.rm = TRUE),
    avg_hispanic_pct = mean(hispanic_pct, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_hispanic_pct))

# Show: number of tracts, average percentage for each racial/ethnic group
# Create a nicely formatted table of your results using kable()
kable(
  county_summary_unweighted,
  caption = "Average Demographics by County"
)
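Average Demographics by County

| county_name      | n_tracts | avg_white_pct | avg_black_pct | avg_hispanic_pct |
|:-----------------|---------:|--------------:|--------------:|-----------------:|
| Buchanan County  |       26 |      80.95432 |      5.691374 |         7.305424 |
| St. Louis County |      236 |      61.42962 |     26.113783 |         3.002732 |
| Texas County     |        8 |      90.21293 |      1.883072 |         2.467462 |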
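The averages above are unweighted means of tract percentages, so a tract of 1,400 people counts as much as one of 5,900. A population-weighted share (total group count divided by total population) answers a slightly different question: what share of residents belongs to the group. The sketch below is a minimal base R comparison that uses only the ten Buchanan County tracts printed in section 3.2, not the full county; the name buchanan_sample is hypothetical.

```r
# Ten Buchanan County tracts copied from the tract-level table in section 3.2.
# NOTE: a partial sample of the county's 26 tracts, used only for illustration.
buchanan_sample <- data.frame(
  total_population = c(5914, 4522, 2571, 1444, 3077, 4836, 4567, 4198, 4500, 2149),
  hispanicE        = c(123, 622, 283, 57, 399, 288, 505, 201, 382, 161)
)

# Per-tract percentage, as computed in tract_demo
hispanic_pct <- 100 * buchanan_sample$hispanicE / buchanan_sample$total_population

# Unweighted: every tract counts equally
unweighted_avg <- mean(hispanic_pct)

# Population-weighted: total Hispanic residents over total residents
weighted_avg <- 100 * sum(buchanan_sample$hispanicE) /
  sum(buchanan_sample$total_population)

c(unweighted = unweighted_avg, weighted = weighted_avg)
```

For these ten tracts the two figures are close, with the weighted share slightly lower. The gap can be larger in counties where tract sizes and compositions vary more, which is worth keeping in mind when the unweighted averages are read as county-wide population shares.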

Part 4: Comprehensive Data Quality Evaluation

# Calculate MOE percentages for white, Black, and Hispanic variables
# Hint: use the same formula as before (margin/estimate * 100)
moe_pct <- tract_demo_raw %>%
  transmute(
    GEOID,
    white_moe_pct    = if_else(whiteE    > 0, 100 * whiteM    / whiteE,    NA_real_),
    black_moe_pct    = if_else(blackE    > 0, 100 * blackM    / blackE,    NA_real_),
    hispanic_moe_pct = if_else(hispanicE > 0, 100 * hispanicM / hispanicE, NA_real_)
  )

# Create a flag for tracts with high MOE on any demographic variable
# Use logical operators (| for OR) in an ifelse() statement
tract_quality <- tract_demo %>%
  select(GEOID, county_name, tract_name, total_population,
         white_pct, black_pct, hispanic_pct) %>%
  left_join(moe_pct, by = "GEOID") %>%
  mutate(
    high_moe_flag = ifelse(
      coalesce(white_moe_pct    > 50, FALSE) |
      coalesce(black_moe_pct    > 50, FALSE) |
      coalesce(hispanic_moe_pct > 50, FALSE),
      TRUE, FALSE
    )
  )

# Create summary statistics showing how many tracts have data quality issues
overall_summary <- tract_quality %>%
  summarise(
    n_tracts       = n(),
    n_high_moe     = sum(high_moe_flag, na.rm = TRUE),
    share_high_moe = round(100 * n_high_moe / n_tracts, 1)
  )
kable(overall_summary, caption = "Overall count and share of high-MOE tracts")
Overall count and share of high-MOE tracts

| n_tracts | n_high_moe | share_high_moe |
|---------:|-----------:|---------------:|
|      270 |        255 |           94.4 |

county_summary <- tract_quality %>%
  group_by(county_name) %>%
  summarise(
    n_tracts       = n(),
    n_high_moe     = sum(high_moe_flag, na.rm = TRUE),
    share_high_moe = round(100 * n_high_moe / n_tracts, 1)
  ) %>%
  arrange(desc(share_high_moe), desc(n_high_moe))
kable(county_summary, caption = "High-MOE tracts by county")
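High-MOE tracts by county

| county_name      | n_tracts | n_high_moe | share_high_moe |
|:-----------------|---------:|-----------:|---------------:|
| Buchanan County  |       26 |         26 |          100.0 |
| St. Louis County |      236 |        223 |           94.5 |
| Texas County     |        8 |          6 |           75.0 |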

4.2 Pattern Analysis

# Group tracts by whether they have high MOE issues
# Calculate average characteristics for each group:
# - population size, demographic percentages
pattern_summary <- tract_quality %>%
  group_by(high_moe_flag) %>%
  summarise(
    n_tracts          = n(),
    avg_population    = mean(total_population, na.rm = TRUE),
    avg_white_pct     = mean(white_pct, na.rm = TRUE),
    avg_black_pct     = mean(black_pct, na.rm = TRUE),
    avg_hispanic_pct  = mean(hispanic_pct, na.rm = TRUE)
  )

# Use group_by() and summarize() to create this comparison
# Create a professional table showing the patterns
kable(
  pattern_summary,
  caption = "Comparison of Community Characteristics by Data Quality Flag"
)
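Comparison of Community Characteristics by Data Quality Flag

| high_moe_flag | n_tracts | avg_population | avg_white_pct | avg_black_pct | avg_hispanic_pct |
|:--------------|---------:|---------------:|--------------:|--------------:|-----------------:|
| FALSE         |       15 |       4488.133 |      35.73905 |      53.43592 |         4.259571 |
| TRUE          |      255 |       4085.306 |      65.83459 |      21.66414 |         3.350713 |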

Pattern Analysis: Even after raising the threshold for high margins of error from 15% to 50%, about 94% of tracts (255 of 270) remain flagged as high-MOE. Such an extreme distribution calls the reliability of the data into question: either the ACS sample sizes in these counties are insufficient, or the estimation procedures struggle to produce stable values in small communities. Some results are also counterintuitive. The 15 unflagged tracts have slightly larger populations on average (about 4,490 versus 4,090) and a much higher average Black share (53.4% versus 21.7%), so tracts with more minority residents appear to have lower error rates, a pattern that challenges expectations. One possible reading is that the flag fires whenever any single group has a high MOE, which penalizes tracts where one group's count is very small. At this stage, it is difficult to draw substantive conclusions; what emerges most clearly is the presence of potential bias and instability within the data itself.
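Because the flag combines three variables with OR, it helps to see which variable is responsible when a tract is flagged. The sketch below is a minimal base R illustration on HYPOTHETICAL values (toy_moe is invented for this example and is not ACS data); the same logic could be applied to the moe_pct table built earlier. Missing values are treated as not flagged, mirroring the coalesce(..., FALSE) calls in the flagging step.

```r
# HYPOTHETICAL tract-level MOE percentages, for illustration only
toy_moe <- data.frame(
  GEOID            = c("T1", "T2", "T3", "T4"),
  white_moe_pct    = c(12, 18, 55, 20),
  black_moe_pct    = c(80, 30, NA, 45),
  hispanic_moe_pct = c(95, 120, 60, 40)
)

threshold <- 50
moe_cols  <- c("white_moe_pct", "black_moe_pct", "hispanic_moe_pct")

# Logical matrix: TRUE where a group's MOE % exceeds the threshold.
# NA (estimate of zero) is treated as FALSE, i.e. not flagged.
exceeds <- !is.na(toy_moe[moe_cols]) & toy_moe[moe_cols] > threshold

# How many tracts each variable would flag on its own
flags_by_group <- colSums(exceeds)

# Tract-level flag: TRUE if any of the three variables exceeds the threshold
high_moe_flag <- rowSums(exceeds) > 0

flags_by_group
high_moe_flag
```

Counting flags per variable in this way would show whether the 94.4% share is driven mainly by one group's estimates or spread across all three.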

Part 5: Policy Recommendations

5.1 Analysis Integration and Professional Summary

Executive Summary:

The analysis shows that data reliability problems are not randomly distributed but concentrated in specific variables and geographies. At the county level, median household income estimates (B19013_001) exhibit much higher margins of error in small, rural counties: Shannon (21.2%), Carter (18.6%), Mississippi (18.5%), Ozark (18.1%), and Mercer (18.1%) all far exceed the 10% threshold, while large metropolitan counties such as St. Louis report far lower error rates (1.64%). At the tract level, racial and ethnic variables (B03002) are especially unstable when minority group counts are small. Using a 50% MOE threshold, 94.4% of tracts were flagged as high-MOE, most likely because Black and Hispanic estimates rest on small counts. This pattern is stark in predominantly white counties such as Buchanan (100% of tracts flagged) and Texas (75%), where Black and Hispanic residents each average less than 10% of tract population. Even in St. Louis County, where the sample base is larger, 94.5% of tracts still exceeded the threshold, plausibly because Hispanic residents average only about 3% of tract population there.

Because algorithms allocate resources based on point estimates without accounting for their uncertainty, unstable figures can translate directly into misclassification. In rural tracts with very small Black or Hispanic populations, ACS samples often produce highly volatile estimates. An algorithm that takes these values at face value may conclude that such communities have little or no need for targeted services, even when real needs exist. Conversely, areas where small samples happen to inflate minority counts could be over-prioritized. The core risk is that statistical noise is treated as social reality, leaving already vulnerable groups more exposed to under-investment.

The ACS is a sample survey, which creates three related problems:

  1. Sample size scales with population. For a large city or county, hundreds of households might be surveyed to estimate median income and demographic counts. For a tiny county, only a handful of households are surveyed.
  2. Small subgroups have high variance. When a population has a dominant majority and a few minority members, the minority estimates will have high variance. Missouri’s rural counties often have very few minority residents, and thus data about those residents is scant and uncertain. If the algorithm tries to pinpoint, say, where to fund a minority outreach program, it might miss small communities entirely due to these data gaps.
  3. Response rates differ across groups. Some communities (e.g., very low-income households, remote rural residents, certain ethnic minorities, immigrants) have historically lower response rates to the census and surveys. Language barriers, distrust in government, or simply being hard to reach (no internet access, P.O. box addresses, etc.) can lead to undercounting. That means the ACS might have not just statistical uncertainty but also systematic bias in who is represented.

The algorithm should not treat all data points equally. Wherever possible, include the margin of error or a confidence weight in the algorithm’s calculations. For example, if ranking counties by median income (lowest income = highest need), adjust the ranking to account for MOE. A county with an income of $40k ± $8k should be treated as uncertain and perhaps grouped with counties that have, say, a median of $35k ± $2k, rather than confidently placed above or below them. Set rules that flag communities (counties or tracts) with low data confidence for manual review. If the data is too poor, don’t let the algorithm alone make the call. For instance, the algorithm could produce a preliminary list of priority areas but mark any area with “Low Confidence” data (such as counties with MOE above 10%, or tracts with MOE above 50% on key statistics) for human analysts to double-check. This ensures places like Shannon County or Texas County are not ignored just because of shaky data.
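One way to make the "$40k ± $8k versus $35k ± $2k" comparison concrete is the standard significance test for two survey estimates: convert each 90% MOE to a standard error (MOE / 1.645), combine the two standard errors, and check whether the difference is large relative to that combined error. The sketch below is a minimal base R version under that assumption; the function names moe_to_se, differs_significantly, and needs_review are hypothetical helpers, not part of tidycensus, and the 10% cutoff mirrors the "Low Confidence" rule used in this report.

```r
# Convert a margin of error to a standard error.
# ACS margins of error are published at the 90% level, hence z = 1.645.
moe_to_se <- function(moe, z = 1.645) moe / z

# TRUE if two estimates differ significantly at the chosen confidence level
differs_significantly <- function(est_a, moe_a, est_b, moe_b, z = 1.645) {
  se_diff <- sqrt(moe_to_se(moe_a, z)^2 + moe_to_se(moe_b, z)^2)
  abs(est_a - est_b) / se_diff > z
}

# TRUE if the MOE % exceeds the cutoff, i.e. the area should go to manual review
needs_review <- function(est, moe, cutoff_pct = 10) {
  100 * moe / est > cutoff_pct
}

# The example from the text: $40k +/- $8k versus $35k +/- $2k
same_tier <- !differs_significantly(40000, 8000, 35000, 2000)
review    <- needs_review(c(40000, 35000), c(8000, 2000))

same_tier  # TRUE here means the two counties should not be ranked apart
review     # one logical per county
```

Under these assumptions the two example counties cannot be told apart statistically, and only the first one (MOE of 20%) would be routed to manual review. A ranking algorithm could use the first check to form tiers of indistinguishable counties and the second to decide which ones need a human in the loop.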

5.2 Specific Recommendations

Your Task: Create a decision framework for algorithm implementation.

# Create a summary table using your county reliability data
# Include: county name, median income, MOE percentage, reliability category

# Add a new column with algorithm recommendations using case_when():
# - High Confidence: "Safe for algorithmic decisions"
# - Moderate Confidence: "Use with caution - monitor outcomes"  
# - Low Confidence: "Requires manual review or additional data"
recommendations <- income_reliability %>%
  mutate(
    recommendation = case_when(
      reliability == "High Confidence" ~ "Safe for algorithmic decisions",
      reliability == "Moderate Confidence" ~ "Use with caution - monitor outcomes",
      reliability == "Low Confidence" ~ "Requires manual review or additional data",
      TRUE ~ NA_character_
    )
  ) %>%
  select(
    County = NAME,
    `Median Income (Estimate)` = median_incomeE,
    `MOE %` = income_moe_pct,
    `Reliability` = reliability,
    Recommendation = recommendation
  )

kable(
  head(recommendations, 10),
  caption = "Decision Framework for Algorithm Implementation"
)
Decision Framework for Algorithm Implementation

| County    | Median Income (Estimate) |     MOE % | Reliability         | Recommendation                            |
|:----------|-------------------------:|----------:|:--------------------|:------------------------------------------|
| Adair     |                    51020 |  8.682870 | Moderate Confidence | Use with caution - monitor outcomes       |
| Andrew    |                    68774 |  6.944485 | Moderate Confidence | Use with caution - monitor outcomes       |
| Atchison  |                    58521 |  6.298594 | Moderate Confidence | Use with caution - monitor outcomes       |
| Audrain   |                    51745 |  4.462267 | High Confidence     | Safe for algorithmic decisions            |
| Barry     |                    55592 |  9.686646 | Moderate Confidence | Use with caution - monitor outcomes       |
| Barton    |                    48105 | 11.591311 | Low Confidence      | Requires manual review or additional data |
| Bates     |                    54122 |  9.574665 | Moderate Confidence | Use with caution - monitor outcomes       |
| Benton    |                    50229 |  7.722630 | Moderate Confidence | Use with caution - monitor outcomes       |
| Bollinger |                    52306 | 16.525829 | Low Confidence      | Requires manual review or additional data |
| Boone     |                    66564 |  2.665104 | High Confidence     | Safe for algorithmic decisions            |

# Format as a professional table with kable()

Key Recommendations:

  1. Counties suitable for immediate algorithmic implementation: Major metropolitan and other large counties with reliable data (High Confidence). For Missouri, these include places like St. Louis County, St. Louis City, Jackson County, St. Charles County, Clay County, Greene County, and a few others. These areas have MOEs well under 5% for key metrics. The department can confidently use the algorithm to drive decisions in these locales because any ranking or prioritization based on ACS data is grounded in fairly accurate information.

  2. Counties requiring additional oversight: Mid-sized or somewhat smaller counties with moderate data confidence. This list might include Buchanan, Cole, Jasper, Newton, Platte, Cape Girardeau, etc., roughly counties with populations from a few tens of thousands up to around 100k. In these cases, the algorithm’s output should be reviewed by staff. For instance, if the algorithm ranks Buchanan County as the 10th highest need, staff might, because of the moderate MOE, double-check recent economic conditions in Buchanan (maybe there was a plant closure not fully reflected in the 2018–2022 data, or maybe the MOE means it could actually rank 8th or 12th). Outcomes should also be monitored after funds are allocated. For example, if funds were given to a county for outreach but uptake is low, was it because the need was overestimated? Or if a county not prioritized starts showing signs of distress, was it an oversight due to data noise? Essentially, these counties call for using the algorithm with caution and monitoring outcomes.

  3. Counties needing alternative approaches: Counties with low confidence data (mostly rural counties and those flagged earlier like Shannon, Carter, Ozark, Mississippi, Mercer, and many others). In these cases, I recommend manual review and supplementary analysis as a prerequisite for decision-making. The algorithm might initially misrank these places (perhaps scoring them as not high need because of an overestimated median income, or as not low need because of an unstable population estimate). Supplementary steps could include:

  1. Look at local poverty indicators (e.g., school district free lunch percentages, local food pantry demand).
  2. Consult qualitative reports (maybe county commissioners or local non-profits can speak to the community’s situation).
  3. Possibly use regional grouping: If one county’s data is unreliable, consider looking at a cluster of surrounding similar counties to infer needs.

Questions for Further Investigation

  1. Are there geographic clusters of poor data quality? We observed that many of the highest-MOE counties cluster in certain regions (e.g., the Ozarks). A deeper spatial analysis could reveal regional trends.
  2. Is data reliability improving or worsening over time? For instance, how do the 2018–2022 ACS margins of error compare to the 2010–2014 ACS for these same counties?
  3. How reliable are other variables that an algorithm might use? We focused on median income and a few racial groups; poverty rates, unemployment rates, education levels, and age distributions by tract or county deserve the same check.
  4. What factors best predict high MOE or data issues? Our analysis suggests population size and homogeneity are factors. We could formally test correlations: e.g., does a lower response rate or a higher proportion of rental housing correlate with higher MOEs?

Technical Notes

Data Sources: All data for this analysis comes from the U.S. Census Bureau’s American Community Survey (ACS) 2018–2022 5-Year Estimates. The data was accessed via the tidycensus R package. Key tables used were:

  1. B19013: Median Household Income in the Past 12 Months (in 2022 inflation-adjusted dollars) – for county-level median income and MOE.
  2. B01003: Total Population – for county populations.
  3. B03002: Hispanic or Latino Origin by Race – for tract-level total population and breakdown by White (non-Hispanic), Black (non-Hispanic), and Hispanic (any race) populations and their MOEs.

Reproducibility: The analysis was conducted using R (version 4.x) and the following main packages: tidycensus for data retrieval, tidyverse (dplyr, stringr) for data manipulation, and knitr/kable for presenting tables. A Census API key (personal to the analyst) was used to authenticate data requests; this key is required to replicate the data pulls. All code and documentation for this analysis are available in https://musa-5080-fall-2025.github.io/portfolio-setup-lluluciano0505/

Methodology Notes:

  a. Reliability Thresholds: We defined “High”, “Moderate”, and “Low” confidence using specific MOE percentage cutoffs (5% and 10%). These thresholds are somewhat arbitrary but are common rules of thumb in survey analysis. An MOE below 5% indicates a tight estimate, while an MOE above 10% calls for caution.
  b. High MOE Flag at Tract Level: I chose a 50% MOE as the flag criterion for tract-level demographic data. The assignment prompt mentioned 15% as a possible threshold to consider “unreliable,” but we observed that using 15% would flag virtually every tract. This decision was to focus discussion on the worst data problems; however, it means we understate the prevalence of “moderate” data issues. In reality, even a 20% MOE might be problematic for decision-making, so our flag is conservative in the sense that it captures only the most severe cases.
  c. County Selection: We intentionally picked one county for each reliability category to compare (rather than random selection). This allowed us to illustrate contrasts, but it means some findings (like patterns in St. Louis vs. Texas County) are examples, not exhaustive. If a different high confidence county were chosen (say, Jackson County instead of St. Louis County), many patterns would likely be similar, but specific numbers would differ.
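Since the tract-level cutoff was a judgment call, a quick sensitivity check can show how much the headline share depends on it. The sketch below is a minimal base R illustration on HYPOTHETICAL values: toy_max_moe stands in for each tract's largest MOE % across the three groups and is not ACS data, and share_flagged is a hypothetical helper. Missing values count as not flagged, matching the flagging rule used in Part 4.

```r
# Share (in %) of values above each candidate threshold.
# NA values are counted as not flagged.
share_flagged <- function(moe_pct, thresholds) {
  sapply(thresholds, function(t) 100 * mean(!is.na(moe_pct) & moe_pct > t))
}

# HYPOTHETICAL worst-case MOE % per tract, for illustration only
toy_max_moe <- c(12, 18, 35, 48, 55, 70, 90, 120, 160, NA)

# Candidate cutoffs discussed in the notes above
thresholds <- c(15, 20, 50)

sensitivity <- setNames(share_flagged(toy_max_moe, thresholds),
                        paste0(">", thresholds, "%"))
sensitivity
```

Reporting the flagged share at several cutoffs side by side, rather than at 50% alone, would make the extent of "moderate" data issues visible without changing the main analysis.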

Limitations:

  a. Margin of Error Interpretation: All MOEs discussed are at the 90% confidence level. This means there is a 10% chance the true value lies outside the given interval. If one prefers 95% confidence, margins would be wider. For simplicity, we stuck to ACS’s MOE as given.
  b. ACS Data Limitations: The ACS 5-year data, while comprehensive, has known limitations. Small or hard-to-reach populations (e.g., homeless individuals, transient laborers, undocumented immigrants) may be undercounted. Our analysis doesn’t correct for any undercount bias; it only measures sampling error (MOE).
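To see how much wider the margins would be at 95% confidence, a published 90% MOE can be rescaled by the ratio of the two critical values (1.96 / 1.645). The sketch below is a minimal base R illustration using Shannon County's income MOE from the top-five table; rescale_moe is a hypothetical helper. As far as I know, get_acs() in tidycensus also accepts a moe_level argument (90, 95, or 99) that returns rescaled margins directly, though that option was not used in this analysis.

```r
# Rescale a margin of error from one confidence level to another
# using the ratio of normal critical values (90% -> 1.645, 95% -> 1.96).
rescale_moe <- function(moe, from_z = 1.645, to_z = 1.96) {
  moe * to_z / from_z
}

# Shannon County median income: estimate 46767, 90% MOE 9920 (from this report)
shannon_est    <- 46767
shannon_moe_90 <- 9920
shannon_moe_95 <- rescale_moe(shannon_moe_90)

# MOE as a percentage of the estimate at both confidence levels
c(moe_pct_90 = 100 * shannon_moe_90 / shannon_est,
  moe_pct_95 = 100 * shannon_moe_95 / shannon_est)
```

The rescaling multiplies every MOE by the same factor of roughly 1.19, so the ranking of counties by MOE % is unchanged, but more counties would cross the fixed 5% and 10% cutoffs.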