Assignment 1: Census Data Quality for Policy Decisions

Evaluating Data Reliability for Algorithmic Decision-Making

Author

Luciano

Published

December 8, 2025

Setup

# Load required packages
library(tidycensus)
library(tidyverse)
library(knitr)

# Set your Census API key (read from the environment so the key itself is not published)
census_api_key(Sys.getenv("CENSUS_API_KEY"), install = FALSE)

# Choose your state for analysis
my_state <- "Missouri"

I chose Missouri for this analysis because it combines major metropolitan areas (such as St. Louis and Kansas City) with a large number of rural counties. This contrast helps illustrate how data quality can vary between urban and rural areas, which matters for equitable policy decisions.

Part 2: County-Level Resource Assessment

# Retrieve county-level ACS data for Missouri
county_data <- get_acs(
  geography = "county",
  state = my_state,
  variables = c(
    median_income = "B19013_001",
    total_pop = "B01003_001"
  ),
  year   = 2022,
  survey = "acs5",
  output = "wide"
)

# Clean county names: remove ", Missouri" and " County"
county_data <- county_data %>%
  mutate(NAME = str_remove(NAME, ", Missouri"),
         NAME = str_remove(NAME, " County"))

# Display the first few rows
head(county_data)

# A tibble: 6 × 6
  GEOID NAME     median_incomeE median_incomeM total_popE total_popM
  <chr> <chr>             <dbl>          <dbl>      <dbl>      <dbl>
1 29001 Adair             51020           4430      25299         NA
2 29003 Andrew            68774           4776      18069         NA
3 29005 Atchison          58521           3686       5270         NA
4 29007 Audrain           51745           2309      24873         NA
5 29009 Barry             55592           5385      34701         NA
6 29011 Barton            48105           5576      11683         NA

2.2 Data Quality Assessment

library(dplyr)
library(stringr)

income_reliability <- county_data %>%
  mutate(
    # MOE as a percentage of the estimate (NA when the estimate is not positive)
    income_moe_pct = if_else(
      median_incomeE > 0,
      100 * median_incomeM / median_incomeE,
      NA_real_
    ),
    # Reliability categories: < 5% high, 5-10% moderate, > 10% low
    reliability = case_when(
      income_moe_pct < 5 ~ "High Confidence",
      income_moe_pct >= 5 & income_moe_pct <= 10 ~ "Moderate Confidence",
      income_moe_pct > 10 ~ "Low Confidence",
      TRUE ~ NA_character_
    ),
    unreliable_flag = income_moe_pct > 10,
    # Treat negative population values as missing
    total_popE = if_else(total_popE < 0, NA_real_, total_popE)
  ) %>%
  select(GEOID, NAME, median_incomeE, median_incomeM, income_moe_pct, reliability, unreliable_flag, total_popE)

# Display the first few rows
head(income_reliability)

# A tibble: 6 × 8
  GEOID NAME     median_incomeE median_incomeM income_moe_pct reliability       
  <chr> <chr>             <dbl>          <dbl>          <dbl> <chr>             
1 29001 Adair             51020           4430           8.68 Moderate Confiden…
2 29003 Andrew            68774           4776           6.94 Moderate Confiden…
3 29005 Atchison          58521           3686           6.30 Moderate Confiden…
4 29007 Audrain           51745           2309           4.46 High Confidence   
5 29009 Barry             55592           5385           9.69 Moderate Confiden…
6 29011 Barton            48105           5576          11.6  Low Confidence    
# ℹ 2 more variables: unreliable_flag <lgl>, total_popE <dbl>

2.3 High Uncertainty Counties

# Create table of top 5 counties by MOE percentage
library(knitr)

top5_uncertain <- income_reliability %>%
  arrange(desc(income_moe_pct)) %>%
  slice(1:5) %>%
  select(
    County = NAME,
    total_popE = total_popE,
    `Median Income (Estimate)` = median_incomeE,
    `Margin of Error` = median_incomeM,
    `MOE %` = income_moe_pct,
    `Reliability` = reliability
  )

# Format as table with kable() - include appropriate column names and caption
kable(
  top5_uncertain,
  caption = "Top 5 Counties with Highest Median Income MOE Percentages in Missouri"
)

Top 5 Counties with Highest Median Income MOE Percentages in Missouri
County	total_popE	Median Income (Estimate)	Margin of Error	MOE %	Reliability
Shannon	7132	46767	9920	21.21154	Low Confidence
Carter	5299	45737	8517	18.62168	Low Confidence
Mississippi	12305	40833	7546	18.48015	Low Confidence
Ozark	8688	39125	7092	18.12652	Low Confidence
Mercer	3517	55592	10045	18.06915	Low Confidence

Data Quality Commentary:

The top five counties all have small populations, ranging from about 3,500 to 12,300 residents. Mercer, for example, has only 3,517 residents, and its MOE reaches 18% of the estimate. This suggests that smaller populations lead to higher uncertainty in estimates, most likely because the ACS samples fewer households there. Carter, Shannon, and Ozark are located in Missouri’s Ozark region, an area characterized by both limited resources and highly dispersed populations. These factors can make data collection harder, resulting in less reliable estimates.
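As a rough check on the population and uncertainty link described above, the sketch below computes a rank correlation in base R. It uses only the eleven counties whose values are printed in the tables above. Five of those rows were selected because their MOE is high, so this is an illustration of the method rather than a finding; the full `income_reliability` table would be the proper input. The data frame and column names are assumptions made for this example.

```r
# Values copied from the county tables above (ACS 2018-2022, Missouri).
# NOTE: not a random sample; five rows are the top-5 MOE counties.
counties <- data.frame(
  county     = c("Adair", "Andrew", "Atchison", "Audrain", "Barry", "Barton",
                 "Shannon", "Carter", "Mississippi", "Ozark", "Mercer"),
  population = c(25299, 18069, 5270, 24873, 34701, 11683,
                 7132, 5299, 12305, 8688, 3517),
  income_est = c(51020, 68774, 58521, 51745, 55592, 48105,
                 46767, 45737, 40833, 39125, 55592),
  income_moe = c(4430, 4776, 3686, 2309, 5385, 5576,
                 9920, 8517, 7546, 7092, 10045)
)

# MOE as a percentage of the estimate, same formula as in the analysis
counties$moe_pct <- 100 * counties$income_moe / counties$income_est

# Spearman (rank) correlation: a negative value means smaller counties
# tend to have larger relative margins of error
rho <- cor(counties$population, counties$moe_pct, method = "spearman")
print(round(rho, 2))
```

On these rows the coefficient is negative, which is consistent with the commentary, but the selection of rows limits how much weight it can carry.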

Part 3: Neighborhood-Level Analysis

# Use filter() to select 2-3 counties from your county_reliability data
# Store the selected counties in a variable called selected_counties
selected_counties <- income_reliability %>%
  filter(
    NAME %in% c("St. Louis",    # High Confidence
                "Buchanan",      # Moderate Confidence
                "Texas")     # Low Confidence
  ) %>%
  select(
    County = NAME,
    `Median Income (Estimate)` = median_incomeE,
    `MOE %` = income_moe_pct,
    `Reliability` = reliability
  )

# Display the selected counties with their key characteristics
# Show: county name, median income, MOE percentage, reliability category
selected_counties

# A tibble: 3 × 4
  County    `Median Income (Estimate)` `MOE %` Reliability        
  <chr>                          <dbl>   <dbl> <chr>              
1 Buchanan                       58303    5.07 Moderate Confidence
2 St. Louis                      78067    1.64 High Confidence    
3 Texas                          42870   11.4  Low Confidence

I selected St. Louis, Buchanan, and Texas counties to represent high, moderate, and low data reliability in urban, mid-sized, and rural settings, respectively.

3.2 Tract-Level Demographics

# Define your race/ethnicity variables with descriptive names
race_vars <- c(
  total_pop = "B03002_001",
  white     = "B03002_003",
  black     = "B03002_004",
  hispanic  = "B03002_012"
)

# Use get_acs() to retrieve tract-level data
# Hint: You may need to specify county codes in the county parameter
county_codes <- income_reliability %>%
  filter(NAME %in% selected_counties$County) %>%
  transmute(county_code = str_sub(GEOID, 3, 5)) %>%
  distinct() %>%
  pull(county_code)
tract_demo_raw <- get_acs(
  geography = "tract",
  state     = my_state,
  county    = county_codes,
  variables = race_vars,
  year      = 2022,
  survey    = "acs5",
  output    = "wide"
)


# Create percentages for white, Black, and Hispanic populations
tract_demo <- tract_demo_raw %>%
  mutate(
    white_pct    = if_else(total_popE > 0, 100 * whiteE    / total_popE, NA_real_),
    black_pct    = if_else(total_popE > 0, 100 * blackE    / total_popE, NA_real_),
    hispanic_pct = if_else(total_popE > 0, 100 * hispanicE / total_popE, NA_real_),
    total_population = total_popE,
    # 2022 ACS tract names are semicolon-delimited, so "[^,]+" keeps the full label
    tract_name   = str_extract(NAME, "Census Tract[^,]+"),
    # County is the second field once commas and semicolons are treated alike
    county_name  = NAME %>%
      str_replace_all(",", ";") %>%
      { str_split_fixed(., ";", 3)[, 2] } %>%
      str_trim()
  ) %>%
  select(
    GEOID, tract_name, county_name,
    total_population, whiteE, blackE, hispanicE,
    white_pct, black_pct, hispanic_pct
  )

# Add readable tract and county name columns using str_extract() or similar
kable(
  head(tract_demo, 10),
  caption = "Selected Counties: Tract-Level Race/Ethnicity (ACS 2018–2022)"
)

Selected Counties: Tract-Level Race/Ethnicity (ACS 2018–2022)
GEOID	tract_name	county_name	total_population	whiteE	blackE	hispanicE	white_pct	black_pct	hispanic_pct
29021000100	Census Tract 1; Buchanan County; Missouri	Buchanan County	5914	4906	265	123	82.95570	4.480893	2.079811
29021000200	Census Tract 2; Buchanan County; Missouri	Buchanan County	4522	3369	100	622	74.50243	2.211411	13.754976
29021000300	Census Tract 3; Buchanan County; Missouri	Buchanan County	2571	2030	75	283	78.95760	2.917153	11.007390
29021000400	Census Tract 4; Buchanan County; Missouri	Buchanan County	1444	1205	43	57	83.44875	2.977839	3.947368
29021000500	Census Tract 5; Buchanan County; Missouri	Buchanan County	3077	2453	96	399	79.72051	3.119922	12.967176
29021000600	Census Tract 6; Buchanan County; Missouri	Buchanan County	4836	3535	496	288	73.09760	10.256410	5.955335
29021000701	Census Tract 7.01; Buchanan County; Missouri	Buchanan County	4567	3554	387	505	77.81914	8.473834	11.057587
29021000702	Census Tract 7.02; Buchanan County; Missouri	Buchanan County	4198	3475	226	201	82.77751	5.383516	4.787994
29021000900	Census Tract 9; Buchanan County; Missouri	Buchanan County	4500	3432	431	382	76.26667	9.577778	8.488889
29021001000	Census Tract 10; Buchanan County; Missouri	Buchanan County	2149	1456	382	161	67.75244	17.775710	7.491857

3.3 Demographic Analysis

# Find the tract with the highest percentage of Hispanic/Latino residents
# Hint: use arrange() and slice() to get the top tract
top_hispanic_tract <- tract_demo %>%
  arrange(desc(hispanic_pct)) %>%
  slice(1) %>%
  select(
    GEOID, tract_name, county_name, total_population,
    white_pct, black_pct, hispanic_pct
  )

kable(
  top_hispanic_tract,
  caption = "Tract with Highest Hispanic/Latino Percentage (Selected Counties)"
)

Tract with Highest Hispanic/Latino Percentage (Selected Counties)
GEOID	tract_name	county_name	total_population	white_pct	black_pct	hispanic_pct
29189214700	Census Tract 2147; St. Louis County; Missouri	St. Louis County	8305	43.66045	18.81999	32.51054

# Calculate average demographics by county using group_by() and summarize()
county_summary_unweighted <- tract_demo %>%
  group_by(county_name) %>%
  summarise(
    n_tracts = n(),
    avg_white_pct    = mean(white_pct,    na.rm = TRUE),
    avg_black_pct    = mean(black_pct,    na.rm = TRUE),
    avg_hispanic_pct = mean(hispanic_pct, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_hispanic_pct))

# Show: number of tracts, average percentage for each racial/ethnic group
# Create a nicely formatted table of your results using kable()
kable(
  county_summary_unweighted,
  caption = "Average Demographics by County"
)

Average Demographics by County
county_name	n_tracts	avg_white_pct	avg_black_pct	avg_hispanic_pct
Buchanan County	26	80.95432	5.691374	7.305424
St. Louis County	236	61.42962	26.113783	3.002732
Texas County	8	90.21293	1.883072	2.467462
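The county averages above are simple means of tract percentages, so a tract of 1,400 people counts as much as a tract of 6,000. A population-weighted mean is one alternative. The base R sketch below uses the first three Buchanan County tracts from the tract table; the three-tract subset and the variable names are illustrative assumptions, not part of the original analysis.

```r
# First three Buchanan County tracts, copied from the tract-level table
tracts <- data.frame(
  tract     = c("1", "2", "3"),
  total_pop = c(5914, 4522, 2571),
  hispanic  = c(123, 622, 283)
)

tracts$hispanic_pct <- 100 * tracts$hispanic / tracts$total_pop

# Unweighted: every tract counts equally, regardless of size
unweighted <- mean(tracts$hispanic_pct)

# Weighted by tract population
weighted <- weighted.mean(tracts$hispanic_pct, w = tracts$total_pop)

# Equivalent pooled calculation: total Hispanic residents / total residents
pooled <- 100 * sum(tracts$hispanic) / sum(tracts$total_pop)

print(c(unweighted = unweighted, weighted = weighted, pooled = pooled))
```

For these three tracts the weighted figure is lower than the unweighted one, because the largest tract has the smallest Hispanic share. The weighted and pooled values agree by construction.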

Part 4: Comprehensive Data Quality Evaluation

# Calculate MOE percentages for white, Black, and Hispanic variables
# Hint: use the same formula as before (margin/estimate * 100)
moe_pct <- tract_demo_raw %>%
  transmute(
    GEOID,
    white_moe_pct    = if_else(whiteE    > 0, 100 * whiteM    / whiteE,    NA_real_),
    black_moe_pct    = if_else(blackE    > 0, 100 * blackM    / blackE,    NA_real_),
    hispanic_moe_pct = if_else(hispanicE > 0, 100 * hispanicM / hispanicE, NA_real_)
  )

# Create a flag for tracts with high MOE on any demographic variable
# Use logical operators (| for OR) in an ifelse() statement
tract_quality <- tract_demo %>%
  select(GEOID, county_name, tract_name, total_population,
         white_pct, black_pct, hispanic_pct) %>%
  left_join(moe_pct, by = "GEOID") %>%
  mutate(
    high_moe_flag = ifelse(
      coalesce(white_moe_pct    > 50, FALSE) |
      coalesce(black_moe_pct    > 50, FALSE) |
      coalesce(hispanic_moe_pct > 50, FALSE),
      TRUE, FALSE
    )
  )

# Create summary statistics showing how many tracts have data quality issues
overall_summary <- tract_quality %>%
  summarise(
    n_tracts       = n(),
    n_high_moe     = sum(high_moe_flag, na.rm = TRUE),
    share_high_moe = round(100 * n_high_moe / n_tracts, 1)
  )
kable(overall_summary, caption = "Overall count and share of high-MOE tracts")

Overall count and share of high-MOE tracts
n_tracts	n_high_moe	share_high_moe
270	255	94.4

county_summary <- tract_quality %>%
  group_by(county_name) %>%
  summarise(
    n_tracts       = n(),
    n_high_moe     = sum(high_moe_flag, na.rm = TRUE),
    share_high_moe = round(100 * n_high_moe / n_tracts, 1)
  ) %>%
  arrange(desc(share_high_moe), desc(n_high_moe))
kable(county_summary, caption = "High-MOE tracts by county")

High-MOE tracts by county
county_name	n_tracts	n_high_moe	share_high_moe
Buchanan County	26	26	100.0
St. Louis County	236	223	94.5
Texas County	8	6	75.0

4.2 Pattern Analysis

# Group tracts by whether they have high MOE issues
# Calculate average characteristics for each group:
# - population size, demographic percentages
pattern_summary <- tract_quality %>%
  group_by(high_moe_flag) %>%
  summarise(
    n_tracts          = n(),
    avg_population    = mean(total_population, na.rm = TRUE),
    avg_white_pct     = mean(white_pct, na.rm = TRUE),
    avg_black_pct     = mean(black_pct, na.rm = TRUE),
    avg_hispanic_pct  = mean(hispanic_pct, na.rm = TRUE)
  )

# Use group_by() and summarize() to create this comparison
# Create a professional table showing the patterns
kable(
  pattern_summary,
  caption = "Comparison of Community Characteristics by Data Quality Flag"
)

Comparison of Community Characteristics by Data Quality Flag
high_moe_flag	n_tracts	avg_population	avg_white_pct	avg_black_pct	avg_hispanic_pct
FALSE	15	4488.133	35.73905	53.43592	4.259571
TRUE	255	4085.306	65.83459	21.66414	3.350713

Pattern Analysis: Even after raising the threshold for a high margin of error from 15% to 50%, 94.4% of tracts (255 of 270) remain flagged as high-MOE. Because the flag fires when any one of the three group estimates exceeds the threshold, a tract can be flagged on the strength of a single small count even if its other estimates are precise. Such an extreme distribution still calls the reliability of the tract-level data into question: either the ACS sample sizes in these counties are insufficient, or the estimation procedures struggle to produce stable values for small groups. Some results also look counterintuitive at first. The 15 unflagged tracts have, on average, slightly larger populations (about 4,490 versus 4,090) and a much higher Black share (53.4% versus 21.7%) than the flagged tracts. At this stage, it is difficult to draw substantive conclusions; what emerges most clearly is the presence of potential bias and instability within the data itself.
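The sketch below shows how an "any group over 50%" flag behaves for a single tract. The estimates are those of Census Tract 1 in Buchanan County from the tract table; the MOE values are hypothetical, chosen only to illustrate the arithmetic, since tract-level MOEs are not printed in this report.

```r
# Estimates: Census Tract 1, Buchanan County (from the tract table).
# MOE values are HYPOTHETICAL and serve only to illustrate the calculation.
tract <- data.frame(
  group    = c("white", "black", "hispanic"),
  estimate = c(4906, 265, 123),
  moe      = c(450, 160, 90)
)

# Relative MOE: the same formula used for moe_pct in the analysis
tract$moe_pct <- 100 * tract$moe / tract$estimate
tract$over_50 <- tract$moe_pct > 50

# The tract is flagged if ANY group exceeds the threshold
high_moe_flag <- any(tract$over_50)

print(tract)
print(high_moe_flag)
```

Under these assumed margins the white estimate is well within the threshold, yet the tract is still flagged because the two small counts are not. Reporting a separate flag per group would show which estimate drives each tract's result.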

Part 5: Policy Recommendations

5.1 Analysis Integration and Professional Summary

Executive Summary:

The analysis shows that data reliability problems are not randomly distributed but concentrated in specific variables and geographies. At the county level, median household income estimates (B19013_001) exhibit much higher margins of error in small, rural counties: Shannon (21.2%), Carter (18.6%), Mississippi (18.5%), Ozark (18.1%), and Mercer (18.1%) all far exceed the 10% threshold, while larger metropolitan counties such as St. Louis report far lower error rates (1.64%). At the tract level, racial and ethnic variables (B03002) are especially unstable when minority group counts are small. Using a 50% MOE threshold, 94.4% of tracts were flagged as high-MOE, with the small Black and Hispanic estimates the most likely source of the flag. This pattern is stark in predominantly white counties such as Buchanan (100% of tracts flagged) and Texas (75%), where Black and Hispanic residents each make up less than 10% of the average tract's population. Even in St. Louis County, where the sample base is larger, 94.5% of tracts still exceeded the threshold, probably because Hispanic residents average only about 3% of tract population.

Because algorithms allocate resources based on point estimates without accounting for their uncertainty, unstable figures can translate directly into misclassification. In rural tracts with very small Black or Hispanic populations, ACS samples often produce highly volatile estimates. An algorithm that takes these values at face value may conclude that such communities have little or no need for targeted services, even when real needs exist. Conversely, areas where small samples happen to inflate minority counts could be over-prioritized. The core risk is that statistical noise is treated as social reality, leaving already vulnerable groups more exposed to under-investment.

The ACS is a sample survey, so its estimates are uncertain for three main reasons:

1. Sample size. In a large city or county, hundreds of households may be surveyed, which supports stable estimates of median income and demographic counts. In a tiny county, only a handful of households are surveyed.
2. Small subgroups. When a population has a dominant majority and a few minority members, the minority data will have high variance. Missouri’s rural counties often have very few minority residents, and thus data about those residents is scant and uncertain. If the algorithm tries to pinpoint, say, where to fund a minority outreach program, it might miss small communities entirely because of these data gaps.
3. Differential response. Some communities (e.g., very low-income households, remote rural residents, certain ethnic minorities, immigrants) have historically lower response rates to the census and surveys. Language barriers, distrust in government, or simply being hard to reach (no internet, P.O. box addresses, etc.) can lead to undercounting. That means the ACS might have not just statistical uncertainty but also systematic bias in who is represented.

The algorithm should not treat all data points equally. Wherever possible, include the margin of error or a confidence weight in the algorithm’s calculations. For example, if ranking counties by median income (lowest income = highest need), adjust the ranking to account for MOE. A county with an income of $40k ± $8k should be treated as uncertain and perhaps grouped with counties that have a median of, say, $35k ± $2k, rather than confidently placed above or below them. Set rules that flag communities (counties or tracts) with low data confidence for manual review. If the data are too unreliable, the algorithm alone should not make the call. For instance, the algorithm could produce a preliminary list of priority areas but mark any area with “Low Confidence” data (such as counties with MOE above 10%, or tracts with MOE above 50% on key statistics) for human analysts to double-check. This ensures places like Shannon County or Texas County are not ignored just because of shaky data.
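One way to make the $40k ± $8k versus $35k ± $2k comparison concrete is the standard significance test for two ACS estimates: convert each 90% MOE to a standard error by dividing by 1.645, then compare the difference with its combined standard error. The sketch below is a minimal base R version; the function name `acs_differ` and its arguments are assumptions made for this example, not part of tidycensus.

```r
# Test whether two ACS estimates differ at the 90% confidence level.
# Inputs are point estimates and their published 90% margins of error.
acs_differ <- function(est_a, moe_a, est_b, moe_b, z_crit = 1.645) {
  se_a <- moe_a / 1.645               # 90% MOE -> standard error
  se_b <- moe_b / 1.645
  z <- (est_a - est_b) / sqrt(se_a^2 + se_b^2)
  list(z = z, significant = abs(z) > z_crit)
}

# The example from the text: $40k +/- $8k versus $35k +/- $2k
result <- acs_differ(40000, 8000, 35000, 2000)
print(result)
```

With these inputs the z-score is just under 1, well below 1.645, so the two counties cannot be confidently ranked against each other and could reasonably share a priority tier.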

5.2 Specific Recommendations

Your Task: Create a decision framework for algorithm implementation.

# Create a summary table using your county reliability data
# Include: county name, median income, MOE percentage, reliability category

# Add a new column with algorithm recommendations using case_when():
# - High Confidence: "Safe for algorithmic decisions"
# - Moderate Confidence: "Use with caution - monitor outcomes"  
# - Low Confidence: "Requires manual review or additional data"
recommendations <- income_reliability %>%
  mutate(
    recommendation = case_when(
      reliability == "High Confidence" ~ "Safe for algorithmic decisions",
      reliability == "Moderate Confidence" ~ "Use with caution - monitor outcomes",
      reliability == "Low Confidence" ~ "Requires manual review or additional data",
      TRUE ~ NA_character_
    )
  ) %>%
  select(
    County = NAME,
    `Median Income (Estimate)` = median_incomeE,
    `MOE %` = income_moe_pct,
    `Reliability` = reliability,
    Recommendation = recommendation
  )

kable(
  head(recommendations, 10),
  caption = "Decision Framework for Algorithm Implementation"
)

Decision Framework for Algorithm Implementation
County	Median Income (Estimate)	MOE %	Reliability	Recommendation
Adair	51020	8.682870	Moderate Confidence	Use with caution - monitor outcomes
Andrew	68774	6.944485	Moderate Confidence	Use with caution - monitor outcomes
Atchison	58521	6.298594	Moderate Confidence	Use with caution - monitor outcomes
Audrain	51745	4.462267	High Confidence	Safe for algorithmic decisions
Barry	55592	9.686646	Moderate Confidence	Use with caution - monitor outcomes
Barton	48105	11.591311	Low Confidence	Requires manual review or additional data
Bates	54122	9.574665	Moderate Confidence	Use with caution - monitor outcomes
Benton	50229	7.722630	Moderate Confidence	Use with caution - monitor outcomes
Bollinger	52306	16.525829	Low Confidence	Requires manual review or additional data
Boone	66564	2.665104	High Confidence	Safe for algorithmic decisions


Key Recommendations:

Counties suitable for immediate algorithmic implementation: Major metropolitan and other large counties with reliable data (High Confidence). For Missouri, these include places like St. Louis County, St. Louis City, Jackson County, St. Charles County, Clay County, Greene County, and a few others. These areas have MOEs well under 5% for key metrics. The department can confidently use the algorithm to drive decisions in these locales because any ranking or prioritization based on ACS data is grounded in fairly accurate information.
Counties requiring additional oversight: Mid-sized or somewhat smaller counties with moderate data confidence. This list might include Buchanan, Cole, Jasper, Newton, Platte, Cape Girardeau, etc., roughly counties with populations from a few tens of thousands up to around 100k. In these cases, the algorithm’s output should be reviewed by staff. For instance, if the algorithm ranks Buchanan County as the 10th highest need, staff might, because of the moderate MOE, double-check recent economic conditions there (perhaps a plant closure is not fully reflected in the 2018–2022 data, or the MOE means the county could actually rank 8th or 12th). Outcomes should also be monitored. For example, if funds were given to a county for outreach but uptake is low, was it because the need was overestimated? Or if a county that was not prioritized starts showing signs of distress, was it an oversight due to data noise?
Counties needing alternative approaches: Counties with low-confidence data (mostly rural counties, including those flagged earlier such as Shannon, Carter, Ozark, Mississippi, and Mercer). In these cases, I recommend manual review and supplementary analysis as a prerequisite for decision-making. The algorithm might initially rank these places misleadingly (for example, understating need because median income is overestimated, or misjudging need because of an unstable population estimate). Supplementary steps include:

Look at local poverty indicators (e.g., school district free lunch percentages, local food pantry demand).
Consult qualitative reports (maybe county commissioners or local non-profits can speak to the community’s situation).
Possibly use regional grouping: If one county’s data is flaky, consider looking at a cluster of surrounding similar counties to infer needs.

Questions for Further Investigation

Are there geographic clusters of poor data quality? Many of the highest-MOE counties cluster in certain regions (e.g., the Ozarks), and a deeper spatial analysis could reveal regional trends.
Is data reliability improving or worsening over time? For instance, how do the 2018–2022 ACS margins of error compare with those from the 2010–2014 ACS for the same counties?
How reliable are other variables that an algorithm might use? We focused on median income and a few racial and ethnic groups, but poverty rates, unemployment rates, education levels, and age distributions by tract or county deserve the same scrutiny.
What factors best predict high MOE or other data issues? Our analysis suggests population size and homogeneity are factors. We could formally test correlations: e.g., does a lower response rate or a higher proportion of rental housing correlate with higher MOEs?

Technical Notes

Data Sources: All data for this analysis comes from the U.S. Census Bureau’s American Community Survey (ACS) 2018–2022 5-Year Estimates. The data was accessed via the tidycensus R package. Key tables used were:

B19013: Median Household Income in the Past 12 Months (in 2022 inflation-adjusted dollars) – for county-level median income and MOE.
B01003: Total Population – for county populations.
B03002: Hispanic or Latino Origin by Race – for tract-level total population and breakdown by White (non-Hispanic), Black (non-Hispanic), and Hispanic (any race) populations and their MOEs.

Reproducibility: The analysis was conducted using R (version 4.x) and the following main packages: tidycensus for data retrieval, tidyverse (dplyr, stringr) for data manipulation, and knitr/kable for presenting tables. A Census API key (personal to the analyst) was used to authenticate data requests; this key is required to replicate the data pulls. All code and documentation for this analysis are available in https://musa-5080-fall-2025.github.io/portfolio-setup-lluluciano0505/

Methodology Notes:

a. Reliability Thresholds: We defined “High”, “Moderate”, and “Low” confidence using specific MOE percentage cutoffs (5% and 10%). These thresholds are somewhat arbitrary but are common rules of thumb in survey analysis. An MOE under 5% indicates a very tight estimate, while an MOE above 10% calls for caution.
b. High MOE Flag at Tract Level: We chose a 50% MOE as the flag criterion for tract-level demographic data. The assignment prompt mentioned 15% as a possible threshold for “unreliable,” but we observed that using 15% would flag virtually every tract. The higher cutoff focuses discussion on the worst data problems; however, it means we understate the prevalence of “moderate” data issues. In reality, even a 20% MOE might be problematic for decision-making, so our approach was a conservative one.
c. County Selection: We intentionally picked one county for each reliability category to compare (rather than selecting at random). This allowed us to illustrate contrasts, but it means some findings (like patterns in St. Louis vs. Texas County) are examples, not exhaustive. If a different high-confidence county were chosen (say, Jackson County instead of St. Louis County), many patterns would be similar, but specific numbers would differ.

Limitations:

Margin of Error Interpretation: All MOEs discussed are at the 90% confidence level. This means there is a 10% chance the true value lies outside the given interval. If one prefers 95% confidence, margins would be wider. For simplicity, we stuck to ACS’s MOE as given.
ACS Data Limitations: The ACS 5-year data, while comprehensive, has known limitations. Small or hard-to-reach populations (e.g., homeless individuals, transient laborers, undocumented immigrants) may be undercounted. Our analysis doesn’t correct for any undercount bias; it only measures sampling error (MOE).
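To see how much wider the margins become at 95% confidence, a published 90% MOE can be rescaled by the ratio of the two critical values (1.960 / 1.645). The base R sketch below applies this to Shannon County's median income MOE from the top-five table; the helper name `moe_90_to_95` is an assumption made for this example.

```r
# Rescale a published 90% ACS margin of error to the 95% confidence level
moe_90_to_95 <- function(moe_90) {
  moe_90 * 1.960 / 1.645
}

# Shannon County median household income MOE (90% level) from the table
shannon_moe_90 <- 9920
shannon_moe_95 <- moe_90_to_95(shannon_moe_90)

print(round(shannon_moe_95))
```

Shannon County's ±$9,920 becomes roughly ±$11,820 at 95% confidence, an increase of about 19% that applies uniformly to every MOE in the analysis.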