Assignment 1: Census Data Quality for Policy Decisions

Evaluating Data Reliability for Algorithmic Decision-Making

Author

Isabelle Li

Published

December 11, 2025

Assignment Overview

Scenario

You are a data analyst for the Mississippi Department of Human Services. The department is considering implementing an algorithmic system to identify communities that should receive priority for social service funding and outreach programs. Your supervisor has asked you to evaluate the quality and reliability of available census data to inform this decision.

Drawing on our Week 2 discussion of algorithmic bias, you need to assess not just what the data shows, but how reliable it is and what communities might be affected by data quality issues.

Learning Objectives

  • Apply dplyr functions to real census data for policy analysis
  • Evaluate data quality using margins of error
  • Connect technical analysis to algorithmic decision-making
  • Identify potential equity implications of data reliability issues
  • Create professional documentation for policy stakeholders

Submission Instructions

Submit by posting your updated portfolio link on Canvas. Your assignment should be accessible at your-portfolio-url/assignments/assignment_1/

Make sure to update your _quarto.yml navigation to include this assignment under an “Assignments” menu.

Part 1: Portfolio Integration

Create this assignment in your portfolio repository under an assignments/assignment_1/ folder structure. Update your navigation menu to include:

- text: Assignments
  menu:
    - href: assignments/assignment_1/your_file_name.qmd
      text: "Assignment 1: Census Data Exploration"

If the menu text contains a special character such as a colon or comma, wrap it in double quotes so that Quarto's YAML parser reads it as a plain string.

Setup

# Load required packages (hint: you need tidycensus, tidyverse, and knitr)
library(tidycensus)
library(tidyverse)
library(knitr)
library(scales)

# Set your Census API key
readRenviron("~/.Renviron")
Sys.getenv("CENSUS_API_KEY")
[1] ""
# Choose your state for analysis - assign it to a variable called my_state
my_state <- "Mississippi"
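Running Sys.getenv("CENSUS_API_KEY") at the top level prints the key itself. Whenever a key is set, that would publish it in the rendered page; here it printed an empty string. The sketch below checks for the key without revealing it. It assumes the key lives in the CENSUS_API_KEY environment variable, as in the setup above. has_census_key() is a hypothetical helper, not a tidycensus function.

```r
# Hypothetical helper: TRUE if the environment variable holds a non-empty value
has_census_key <- function(var = "CENSUS_API_KEY") {
  nzchar(Sys.getenv(var))
}

# Report only whether the key is present, never its value
if (!has_census_key()) {
  message("CENSUS_API_KEY is not set; see tidycensus::census_api_key() to install one.")
}
```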

State Selection: I have chosen Mississippi for this analysis because it has one of the highest poverty rates in the United States and a large rural population. These factors make it especially important to evaluate the quality and reliability of census data, since smaller and more rural communities often have higher margins of error in survey estimates.

Part 2: County-Level Resource Assessment

2.1 Data Retrieval

Your Task: Use get_acs() to retrieve county-level data for your chosen state.

Requirements:

  • Geography: county level
  • Variables: median household income (B19013_001) and total population (B01003_001)
  • Year: 2022
  • Survey: acs5
  • Output format: wide

Hint: Remember to give your variables descriptive names using the variables = c(name = "code") syntax.

# Write your get_acs() code here
acs_data <- get_acs(
  geography ="county",
  state =my_state,
  variables=c(
    median_income="B19013_001",
    total_population="B01003_001"
  ),
  year=2022,
  survey="acs5",
  output="wide"
)
# Clean the county names to remove state name and "County" 
library(dplyr)
library(stringr)

acs_data_clean <- acs_data %>%
  mutate(
    county = NAME %>%
      str_remove(", Mississippi$") %>%
      str_remove(" County$") %>%
      str_remove(" Parish$")
  ) %>%
  select(county, everything(), -NAME)
  
# Hint: use mutate() with str_remove()

# Display the first few rows
head(acs_data_clean)
# A tibble: 6 × 6
  county GEOID median_incomeE median_incomeM total_populationE total_populationM
  <chr>  <chr>          <dbl>          <dbl>             <dbl>             <dbl>
1 Adams  28001          37271           4671             29425                NA
2 Alcorn 28003          47716           4160             34717                NA
3 Amite  28005          34866           3839             12683                NA
4 Attala 28007          42680           5034             17842                NA
5 Benton 28009          38750           3034              7637                NA
6 Boliv… 28011          37845           3176             30688                NA

2.2 Data Quality Assessment

Your Task: Calculate margin of error percentages and create reliability categories.

Requirements:

  • Calculate MOE percentage: (margin of error / estimate) * 100
  • Create reliability categories: High Confidence (MOE < 5%), Moderate Confidence (MOE 5-10%), Low Confidence (MOE > 10%)
  • Create a flag for unreliable estimates (MOE > 10%)

Hint: Use mutate() with case_when() for the categories.

# Calculate MOE percentage and reliability categories using mutate()
acs_income_quality<-acs_data_clean %>%
  mutate(
    income_moe_pct=100*median_incomeM/median_incomeE,
    income_moe_pct=if_else(is.finite(income_moe_pct),income_moe_pct,NA_real_),
    income_reliability=case_when(
      is.na(income_moe_pct)~"Missing",
      income_moe_pct<5 ~ "High Confidence",
      income_moe_pct<=10 ~ "Moderate Confidence",
      TRUE ~ "Low Confidence"
    ),
    income_unreliable=income_moe_pct>10
  )
# Create a summary showing count of counties in each reliability category
income_reliability_summary <- acs_income_quality %>%
  count(income_reliability,name="n") %>%
  mutate(
    pct=100*n/sum(n),
    pct=round(pct,1)
  )%>%
  arrange(factor(income_reliability,
                 levels=c("High Confidence","Moderate Confidence","Low Confidence","Missing")))

head(acs_income_quality)
# A tibble: 6 × 9
  county GEOID median_incomeE median_incomeM total_populationE total_populationM
  <chr>  <chr>          <dbl>          <dbl>             <dbl>             <dbl>
1 Adams  28001          37271           4671             29425                NA
2 Alcorn 28003          47716           4160             34717                NA
3 Amite  28005          34866           3839             12683                NA
4 Attala 28007          42680           5034             17842                NA
5 Benton 28009          38750           3034              7637                NA
6 Boliv… 28011          37845           3176             30688                NA
# ℹ 3 more variables: income_moe_pct <dbl>, income_reliability <chr>,
#   income_unreliable <lgl>
income_reliability_summary
# A tibble: 3 × 3
  income_reliability      n   pct
  <chr>               <int> <dbl>
1 High Confidence         9  11  
2 Moderate Confidence    30  36.6
3 Low Confidence         43  52.4
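ACS margins of error are published at the 90% confidence level. Dividing an MOE by 1.645 therefore gives the standard error, and the standard error divided by the estimate gives the coefficient of variation (CV). The MOE percentage used above is simply 1.645 times the CV.

The base-R sketch below makes the conversion explicit and mirrors the case_when() tiers. The function names are illustrative and not part of the original analysis. It is checked against the Adams, Alcorn, and DeSoto values reported in this document.

```r
# ACS MOEs are published at the 90% confidence level (z = 1.645)
moe_to_se <- function(moe) moe / 1.645

# MOE as a percentage of the estimate, as in 2.2
moe_pct <- function(estimate, moe) 100 * moe / estimate

# Coefficient of variation (%): standard error relative to the estimate
cv_pct <- function(estimate, moe) 100 * moe_to_se(moe) / estimate

# Same tiers as the case_when() above: < 5 High, 5-10 Moderate, > 10 Low
classify_moe <- function(pct) {
  ifelse(is.na(pct), "Missing",
         ifelse(pct < 5, "High Confidence",
                ifelse(pct <= 10, "Moderate Confidence", "Low Confidence")))
}

# Median household income for the three counties examined in Part 3
income_est <- c(Adams = 37271, Alcorn = 47716, DeSoto = 79666)
income_moe <- c(Adams = 4671, Alcorn = 4160, DeSoto = 2535)
data.frame(
  moe_pct = round(moe_pct(income_est, income_moe), 1),
  cv_pct  = round(cv_pct(income_est, income_moe), 1),
  tier    = classify_moe(moe_pct(income_est, income_moe))
)
```

Because the tiers are defined on MOE percentage, the 10% cutoff corresponds to a CV of roughly 6%.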

2.3 High Uncertainty Counties

Your Task: Identify the 5 counties with the highest MOE percentages.

Requirements:

  • Sort by MOE percentage (highest first)
  • Select the top 5 counties
  • Display: county name, median income, margin of error, MOE percentage, reliability category
  • Format as a professional table using kable()

Hint: Use arrange(), slice(), and select() functions.

# Create table of top 5 counties by MOE percentage
top5_high_uncertainty<-acs_income_quality %>%
  arrange(desc(income_moe_pct)) %>%
  slice_head(n=5) %>%
  select(
    county,
    median_incomeE,
    median_incomeM,
    income_moe_pct,
    income_reliability
  ) %>%
  mutate(
    'Median Household Income'=dollar(median_incomeE),
    'Margin of Error'=dollar(median_incomeM),
    'MOE(%)'=round(income_moe_pct,1),
    'Reliability'=income_reliability
  ) %>%
  select(
    County=county,
    'Median Household Income',
    'Margin of Error',
    'MOE(%)',
    Reliability
  )
# Format as table with kable() - include appropriate column names and caption
kable(
  top5_high_uncertainty,
  caption="Top 5 Mississippi Counties by Median Income MOE Percentage"
)
Top 5 Mississippi Counties by Median Income MOE Percentage
County     Median Household Income  Margin of Error  MOE(%)  Reliability
Carroll    $42,285                  $14,597          34.5    Low Confidence
Issaquena  $17,900                  $5,191           29.0    Low Confidence
Sharkey    $41,000                  $10,196          24.9    Low Confidence
Tunica     $41,676                  $10,036          24.1    Low Confidence
Wilkinson  $34,928                  $8,245           23.6    Low Confidence

Data Quality Commentary:

These results show that several rural counties in Mississippi, such as Carroll, Issaquena, and Sharkey, have income estimates with very high margins of error (roughly 25% to 35% of the estimate), making them less reliable for decision-making. If an algorithm relied on these estimates without accounting for uncertainty, these communities could be misclassified and receive less funding or outreach than they need. Higher uncertainty is often linked to small population sizes, lower survey response rates, and greater variability in economic conditions, all of which are more common in sparsely populated rural areas.
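Because the published MOE corresponds to a 90% confidence level, estimate ± MOE gives a 90% confidence interval. The sketch below uses the five counties in the table above. It assumes a hypothetical eligibility cutoff of $40,000, which is not a threshold used by the Department or elsewhere in this analysis. It flags counties whose interval straddles the cutoff, meaning sampling error alone could place them on either side of it.

```r
# 90% confidence intervals for median household income (values from the table above)
income <- data.frame(
  county   = c("Carroll", "Issaquena", "Sharkey", "Tunica", "Wilkinson"),
  estimate = c(42285, 17900, 41000, 41676, 34928),
  moe      = c(14597, 5191, 10196, 10036, 8245)
)
income$lower <- income$estimate - income$moe
income$upper <- income$estimate + income$moe

# Hypothetical cutoff: counties below it would be prioritized
cutoff <- 40000
income$straddles_cutoff <- income$lower < cutoff & income$upper > cutoff
income
```

Four of the five intervals include the hypothetical cutoff; only Issaquena's interval lies entirely below it.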

Part 3: Neighborhood-Level Analysis

3.1 Focus Area Selection

Your Task: Select 2-3 counties from your reliability analysis for detailed tract-level study.

Strategy: Choose counties that represent different reliability levels (e.g., 1 high confidence, 1 moderate, 1 low confidence) to compare how data quality varies.

# Use filter() to select 2-3 counties from your county_reliability data
# Store the selected counties in a variable called selected_counties
acs_income_quality %>%
  group_by(income_reliability) %>%
  slice_head(n=3) %>%
  select(county,income_reliability)
# A tibble: 9 × 2
# Groups:   income_reliability [3]
  county   income_reliability 
  <chr>    <chr>              
1 DeSoto   High Confidence    
2 Harrison High Confidence    
3 Jackson  High Confidence    
4 Adams    Low Confidence     
5 Amite    Low Confidence     
6 Attala   Low Confidence     
7 Alcorn   Moderate Confidence
8 Benton   Moderate Confidence
9 Bolivar  Moderate Confidence
# High Confidence: DeSoto; Moderate Confidence: Alcorn; Low Confidence: Adams
selected_counties <- acs_income_quality %>%
  filter(county %in% c("DeSoto","Alcorn","Adams")) %>%
  mutate('MOE(%)'=round(income_moe_pct,1)) %>%
  select(
    County=county,
    'Median Household Income'=median_incomeE,
    'Margin of Error'=median_incomeM,
    'MOE(%)',
    Reliability=income_reliability
  )
selected_counties
# A tibble: 3 × 5
  County `Median Household Income` `Margin of Error` `MOE(%)` Reliability       
  <chr>                      <dbl>             <dbl>    <dbl> <chr>             
1 Adams                      37271              4671     12.5 Low Confidence    
2 Alcorn                     47716              4160      8.7 Moderate Confiden…
3 DeSoto                     79666              2535      3.2 High Confidence   
# Display the selected counties with their key characteristics
# Show: county name, median income, MOE percentage, reliability category

Comment on the output: These results illustrate how data reliability varies across counties. DeSoto, a large suburban county, has a very low MOE (3.2%), meaning its income estimate is highly reliable. Alcorn falls into the moderate range with an MOE of 8.7%, while Adams has a much higher MOE (12.5%), indicating lower confidence. Because the MOE percentage is relative to the estimate, Adams's lower median income also contributes: its dollar MOE ($4,671) is close to Alcorn's ($4,160) but represents a larger share of a smaller estimate. This comparison highlights how smaller or more rural counties like Adams may be poorly represented in survey data, which could affect the fairness of algorithmic decisions that rely on these estimates.
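Whether two county estimates genuinely differ can be checked with the Census Bureau's usual test for comparing ACS estimates. Convert each MOE to a standard error (MOE / 1.645), divide the difference by the combined standard error, and compare the absolute value with 1.645 for 90% confidence.

The sketch below assumes the two estimates are independent, which is reasonable for non-overlapping counties. acs_diff_test() is a hypothetical helper.

```r
# Compare two ACS estimates at the 90% confidence level (assumes independent samples)
acs_diff_test <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se1 <- moe1 / 1.645
  se2 <- moe2 / 1.645
  z <- (est1 - est2) / sqrt(se1^2 + se2^2)
  list(z = z, significant = abs(z) > z_crit)
}

# Alcorn vs. Adams median household income (values from the table above)
acs_diff_test(47716, 4160, 37271, 4671)
```

Here z is about 2.75, so the Alcorn and Adams estimates differ at the 90% level. Two counties with similar estimates and large MOEs, such as Sharkey and Tunica, would not pass the test.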

3.2 Tract-Level Demographics

Your Task: Get demographic data for census tracts in your selected counties.

Requirements:

  • Geography: tract level
  • Variables: White alone, not Hispanic or Latino (B03002_003); Black or African American alone, not Hispanic or Latino (B03002_004); Hispanic or Latino (B03002_012); total population (B03002_001)
  • Use the same state and year as before
  • Output format: wide
  • Challenge: You'll need county codes, not names. Look at the GEOID patterns in your county data for hints.
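A county GEOID is the two-digit state FIPS code followed by the three-digit county code. The county codes can therefore be taken directly from the GEOID column retrieved in 2.1. Tract GEOIDs append a six-digit tract code to the county GEOID. A minimal base-R sketch using GEOIDs that appear in this document:

```r
# County GEOIDs from the table in 2.1: state FIPS (2 digits) + county FIPS (3 digits)
county_geoids <- c(Adams = "28001", Alcorn = "28003", DeSoto = "28033")
state_code <- substr(county_geoids, 1, 2)
county_code <- substr(county_geoids, 3, 5)
county_code

# Tract GEOIDs extend the county GEOID with a 6-digit tract code
tract_geoid <- "28001000101"
substr(tract_geoid, 1, 5)   # parent county GEOID
substr(tract_geoid, 6, 11)  # tract code
```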

# Look up the FIPS county codes for the selected counties
target_counties <- c("Adams", "Alcorn", "DeSoto")

data("fips_codes")
ms_fips <- fips_codes %>%
  filter(state == "MS") %>%                         
  mutate(county_clean = str_remove(county, " County$")) 
county_fips_tbl <- ms_fips %>%
  filter(county_clean %in% target_counties) %>%
  select(county = county_clean, county_code) %>%
  mutate(county_code = as.character(county_code))       

county_fips_tbl
     county county_code
1402  Adams         001
1403 Alcorn         003
1418 DeSoto         033
race_vars <- c(
  total_pop = "B03002_001",
  white="B03002_003",
  black="B03002_004",
  hispanic="B03002_012"
)
# Use get_acs() to retrieve tract-level data
# Hint: You may need to specify county codes in the county parameter
county_fips <- county_fips_tbl$county_code  # "001" (Adams), "003" (Alcorn), "033" (DeSoto)

tract_demo <- get_acs(
  geography = "tract",
  state=my_state,
  county=county_fips,
  variables=race_vars,
  year=2022,
  survey="acs5",
  output="wide"
)
# Calculate percentage of each group using mutate()
# Create percentages for white, Black, and Hispanic populations
tract_demo_pct <- tract_demo %>%
  mutate(
    pct_white    = if_else(total_popE > 0, 100 * whiteE    / total_popE, NA_real_),
    pct_black    = if_else(total_popE > 0, 100 * blackE    / total_popE, NA_real_),
    pct_hispanic = if_else(total_popE > 0, 100 * hispanicE / total_popE, NA_real_)
  ) %>%
# Add readable tract and county name columns using str_extract() or similar
  mutate(
    tract_name = str_replace(NAME, "[,;]\\s*[^,;]+ County[,;]\\s*[^,;]+$", ""),
    county_name = str_extract(NAME, "[,;]\\s*[^,;]+ County") %>%
                  str_remove("^[,;]\\s*") %>%
                  str_remove("\\s*County$")
  ) %>%
  
  select(
    tract_geoid=GEOID,
    tract_name,
    county_name,
    total_popE,total_popM,
    whiteE,whiteM,
    blackE,blackM,
    hispanicE,hispanicM,
    pct_white,pct_black,pct_hispanic
  )

head(tract_demo_pct)
# A tibble: 6 × 14
  tract_geoid tract_name  county_name total_popE total_popM whiteE whiteM blackE
  <chr>       <chr>       <chr>            <dbl>      <dbl>  <dbl>  <dbl>  <dbl>
1 28001000101 Census Tra… Adams             4710        704   3087    399    776
2 28001000102 Census Tra… Adams             2350        557   1340    271    758
3 28001000200 Census Tra… Adams             3930        560    379    198   2891
4 28001000300 Census Tra… Adams             1774        443     33     34   1678
5 28001000400 Census Tra… Adams             2983        610     14      2   2575
6 28001000500 Census Tra… Adams             2966        430    802    137   2082
# ℹ 6 more variables: blackM <dbl>, hispanicE <dbl>, hispanicM <dbl>,
#   pct_white <dbl>, pct_black <dbl>, pct_hispanic <dbl>
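The tract percentages above are derived estimates, so they carry their own uncertainty. The Census Bureau's ACS guidance approximates the MOE of a proportion p = Y / X, where Y is a subset of X, as sqrt(MOE_Y^2 - p^2 * MOE_X^2) / X. When the term under the square root is negative, it falls back to the ratio formula sqrt(MOE_Y^2 + p^2 * MOE_X^2) / X.

The sketch below applies this to the first Adams tract shown above. prop_moe() is a hypothetical helper.

```r
# Approximate MOE for a derived proportion p = y / x (y is a subset of x)
prop_moe <- function(y, moe_y, x, moe_x) {
  p <- y / x
  radicand <- moe_y^2 - p^2 * moe_x^2
  # Fall back to the ratio formula when the proportion formula is undefined
  radicand <- ifelse(radicand < 0, moe_y^2 + p^2 * moe_x^2, radicand)
  sqrt(radicand) / x
}

# Tract 28001000101: White alone 3087 (MOE 399) of 4710 (MOE 704)
p_white <- 3087 / 4710
moe_p_white <- prop_moe(3087, 399, 4710, 704)
round(100 * c(pct = p_white, moe = moe_p_white), 1)
```

For this tract the White share is about 65.5%, with an MOE of roughly 13 percentage points.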

3.3 Demographic Analysis

Your Task: Analyze the demographic patterns in your selected areas.

# Find the tract with the highest percentage of Hispanic/Latino residents
# Hint: use arrange() and slice() to get the top tract
top_hispanic_tract <- tract_demo_pct %>%
  filter(county_name %in% selected_counties$County)%>%
  arrange(desc(pct_hispanic))%>%
  slice_head(n=1)%>%
  transmute(
    'Tract GEOID'=tract_geoid,
    'Tract Name'=tract_name,
    County=county_name,
    'Hispanic/Latino (%)'=round(pct_hispanic,1),
    'Total Pop(E)'=total_popE
  )
  
kable(top_hispanic_tract,
      caption="Top Tract by % Hispanic/Latino")
Top Tract by % Hispanic/Latino
Tract GEOID  Tract Name      County  Hispanic/Latino (%)  Total Pop(E)
28001000200  Census Tract 2  Adams   15.3                 3930
# Calculate average demographics by county using group_by() and summarize()
county_demo_summary <- tract_demo_pct %>%
  filter(county_name %in% selected_counties$County) %>%
  group_by(County = county_name) %>%
  summarize(
    Tracts             = n(),
    `Avg White (%)`    = round(mean(pct_white,    na.rm = TRUE), 1),
    `Avg Black (%)`    = round(mean(pct_black,    na.rm = TRUE), 1),
    `Avg Hispanic (%)` = round(mean(pct_hispanic, na.rm = TRUE), 1),
    .groups = "drop"
  )

# Show: number of tracts, average percentage for each racial/ethnic group

# Create a nicely formatted table of your results using kable()

kable(county_demo_summary, caption = "Average Demographics by County")
Average Demographics by County
County  Tracts  Avg White (%)  Avg Black (%)  Avg Hispanic (%)
Adams   10      34.9           56.1           6.3
Alcorn  10      81.1           9.4            3.5
DeSoto  41      60.2           30.2           5.4
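The county figures above are unweighted means of tract percentages, so a small tract counts as much as a large one. A population-weighted share is one alternative: the sum of subgroup counts divided by the sum of tract totals. The sketch below uses two hypothetical tracts (illustrative numbers, not ACS data) to show how far the two measures can diverge.

```r
# Two hypothetical tracts (illustrative numbers, not ACS data)
tracts <- data.frame(
  total_pop = c(100, 900),
  white     = c(50, 90)
)
tracts$pct_white <- 100 * tracts$white / tracts$total_pop

unweighted <- mean(tracts$pct_white)                         # average of tract percentages
weighted   <- 100 * sum(tracts$white) / sum(tracts$total_pop) # pooled share across tracts
c(unweighted = unweighted, weighted = weighted)
```

The unweighted mean is 30% and the pooled share is 14%. With tidycensus output, the pooled version would be sum(whiteE) / sum(total_popE) within each county.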

Part 4: Comprehensive Data Quality Evaluation

4.1 MOE Analysis for Demographic Variables

Your Task: Examine margins of error for demographic variables to see if some communities have less reliable data.

Requirements:

  • Calculate MOE percentages for each demographic variable
  • Flag tracts where any demographic variable has MOE > 15%
  • Create summary statistics

# Calculate MOE percentages for white, Black, and Hispanic variables
# Hint: use the same formula as before (margin/estimate * 100)
tract_demo_moe <- tract_demo_pct %>%
  filter(county_name %in% selected_counties$County) %>%
  mutate(
    white_moe_pct=if_else(whiteE>0,100*whiteM/whiteE,NA_real_),
    black_moe_pct=if_else(blackE>0,100*blackM/blackE,NA_real_),
    hispanic_moe_pct=if_else(hispanicE>0,100*hispanicM/hispanicE,NA_real_)
  )%>%
  
# Create a flag for tracts with high MOE on any demographic variable
   mutate(
    high_moe_white    = white_moe_pct    > 15,
    high_moe_black    = black_moe_pct    > 15,
    high_moe_hispanic = hispanic_moe_pct > 15,
    high_moe_any      = coalesce(high_moe_white, FALSE) |
                        coalesce(high_moe_black, FALSE) |
                        coalesce(high_moe_hispanic, FALSE)
  )
# Use logical operators (| for OR) in an ifelse() statement
# Create summary statistics showing how many tracts have data quality issues
overall_moe_summary <- tract_demo_moe %>%
  summarize(
    Tracts            = n(),
    `High-MOE Tracts` = sum(high_moe_any, na.rm = TRUE),
    `Share (%)`       = round(100 * `High-MOE Tracts` / Tracts, 1)
  )
by_county_moe_summary <- tract_demo_moe %>%
  group_by(County = county_name) %>%
  summarize(
    Tracts            = n(),
    `High-MOE Tracts` = sum(high_moe_any, na.rm = TRUE),
    `Share (%)`       = round(100 * `High-MOE Tracts` / Tracts, 1)
  )

kable(overall_moe_summary, caption = "Overall: Tracts with high MOE")
Overall: Tracts with high MOE
Tracts  High-MOE Tracts  Share (%)
61      61               100
kable(by_county_moe_summary, caption = "By County: Tracts with high MOE")
By County: Tracts with high MOE
County  Tracts  High-MOE Tracts  Share (%)
Adams   10      10               100
Alcorn  10      10               100
DeSoto  41      41               100
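Since every tract was flagged, it helps to see which variable triggers the flag. The sketch below reports, for each variable separately, the share of tracts above the threshold. It assumes a data frame with the same *_moe_pct columns as tract_demo_moe; the three-row example is hypothetical, not ACS data.

```r
# Hypothetical tract-level MOE percentages (illustrative, not ACS data)
moe_tbl <- data.frame(
  white_moe_pct    = c(12.9, 20.2, NA),
  black_moe_pct    = c(8.0, 35.8, 11.1),
  hispanic_moe_pct = c(60.0, 95.0, 120.0)
)

# Share of tracts (%) where each variable exceeds the threshold; NA counts as not flagged
flag_share <- function(df, threshold = 15) {
  sapply(df, function(x) round(100 * mean(!is.na(x) & x > threshold), 1))
}
flag_share(moe_tbl)
```

On the real data, the equivalent call would be flag_share(tract_demo_moe[c("white_moe_pct", "black_moe_pct", "hispanic_moe_pct")]).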

4.2 Pattern Analysis

Your Task: Investigate whether data quality problems are randomly distributed or concentrated in certain types of communities.

# Group tracts by whether they have high MOE issues
pattern_summary<-tract_demo_moe %>%
  group_by(MOE = if_else(high_moe_any, "High MOE (>15%)", "Lower MOE")) %>%
  summarize(
    `Avg White (%)`    = round(mean(pct_white,    na.rm = TRUE), 1),
    `Avg Black (%)`    = round(mean(pct_black,    na.rm = TRUE), 1),
    `Avg Hispanic (%)` = round(mean(pct_hispanic, na.rm = TRUE), 1),
    .groups = "drop"
  )

kable(pattern_summary,
      caption = "Comparison of Tracts by MOE")
Comparison of Tracts by MOE
MOE              Avg White (%)  Avg Black (%)  Avg Hispanic (%)
High MOE (>15%)  59.5           31             5.3
# Calculate average characteristics for each group:
# - population size, demographic percentages

# Use group_by() and summarize() to create this comparison
# Create a professional table showing the patterns

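Pattern Analysis: Every tract in the three selected counties exceeded the 15% threshold on at least one variable. The tract-level comparison therefore contains only one group and cannot, on its own, show whether data quality problems are concentrated in particular communities. The county-level income results are more informative. DeSoto County demonstrates that large, more prosperous suburban counties can produce low MOE values (3.2%) alongside diverse racial compositions (about 30% Black on average across its tracts), because larger populations support more stable survey estimates. In contrast, Adams County combines racial diversity with a smaller, more rural population base, leading to a higher margin of error (12.5%) and less reliable data. This suggests that population size and socioeconomic context, not diversity itself, are key drivers of data reliability.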
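One way to test the population-size explanation more directly is a rank correlation between county population and income MOE percentage. The sketch below uses only the six counties printed in 2.1, so it is illustrative rather than conclusive.

```r
# Six counties from the table in 2.1
counties <- data.frame(
  county     = c("Adams", "Alcorn", "Amite", "Attala", "Benton", "Bolivar"),
  population = c(29425, 34717, 12683, 17842, 7637, 30688),
  income_est = c(37271, 47716, 34866, 42680, 38750, 37845),
  income_moe = c(4671, 4160, 3839, 5034, 3034, 3176)
)
counties$moe_pct <- 100 * counties$income_moe / counties$income_est

# Spearman's rho: a negative value would mean larger counties tend to have lower MOE %
cor(counties$population, counties$moe_pct, method = "spearman")
```

With six counties, the coefficient (about 0.14 here) carries little weight either way. The meaningful check would run the same call on all 82 counties in acs_income_quality, using total_populationE and income_moe_pct.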

Part 5: Policy Recommendations

5.1 Analysis Integration and Professional Summary

Your Task: Write an executive summary that integrates findings from all four analyses.

Executive Summary Requirements:

  1. Overall Pattern Identification: What are the systematic patterns across all your analyses?
  2. Equity Assessment: Which communities face the greatest risk of algorithmic bias based on your findings?
  3. Root Cause Analysis: What underlying factors drive both data quality issues and bias risk?
  4. Strategic Recommendations: What should the Department implement to address these systematic issues?

Executive Summary:

Overall Pattern Identification

Our analysis of Mississippi's county- and tract-level ACS data reveals systematic differences in data reliability across the state. Larger, more urbanized, and higher-income counties such as DeSoto, Jackson, and Madison consistently show low margins of error (MOE < 5%), making them better suited to algorithmic decision-making. In contrast, smaller, rural, and economically disadvantaged counties, such as Adams, Carroll, Sharkey, and Wilkinson, exhibit much higher MOEs, ranging from 12.5% (Adams) to 34.5% (Carroll). Overall, 43 of Mississippi's 82 counties (52.4%) fall into the low-confidence tier for median household income. These patterns highlight a divide between counties where survey data is robust and those where uncertainty limits the safe use of algorithmic systems.

Equity Assessment

Communities at the greatest risk of algorithmic bias are those that combine demographic diversity with weak survey reliability. Rural Black-majority counties such as Adams, and tracts with small Hispanic populations, often display elevated MOEs. If algorithms rely on these estimates without adjustment, they risk systematically underrepresenting marginalized groups and misallocating resources away from the populations most in need of support. By contrast, suburban counties with stable data would receive more consistent outcomes, reinforcing existing inequities between regions.

Root Cause Analysis

The root causes of these disparities lie in structural differences in population size, socioeconomic context, and geographic accessibility. Smaller populations yield smaller ACS sample sizes, which amplifies statistical uncertainty. Rural areas can also face higher survey nonresponse rates and logistical barriers that further reduce data precision. At the demographic level, minority populations within smaller tracts tend to have particularly high MOEs. This is not because their needs are smaller, but because the survey reaches relatively few respondents from these groups and is less effective at capturing their experiences.

Strategic Recommendations

To address these challenges, the Department should implement a tiered algorithmic framework. For counties with high-confidence data, algorithmic allocation can proceed with minimal oversight. For counties with moderate confidence, algorithms should be deployed but coupled with regular audits and feedback loops to monitor performance. For low-confidence counties, algorithmic reliance should be minimized, with decisions supplemented by local administrative data, targeted surveys, and manual review by policy experts. Beyond the immediate framework, the Department should invest in improving data infrastructure. That means expanding local survey capacity, partnering with community organizations, and ensuring that demographic groups with historically unreliable estimates are better represented.

5.2 Specific Recommendations

Your Task: Create a decision framework for algorithm implementation.

# Create a summary table using your county reliability data
recommendation_tbl<-acs_income_quality %>%
  transmute(
    County=county,
    'Median Household Income'=median_incomeE,
    'MOE(%)'=round(income_moe_pct,1),
    'Reliability'=income_reliability,
    'Algorithm Recommendation'=case_when(
      income_reliability=="High Confidence"~"Safe for algorithmic decisions",
      income_reliability=="Moderate Confidence"~"Use with caution - monitor outcomes",
      income_reliability=="Low Confidence"~"Requires manual review or additional data"
    )
  )%>%
  arrange(
    factor(Reliability,
           levels = c("High Confidence", "Moderate Confidence", "Low Confidence")),
    desc(`MOE(%)`)
  )

kable(
  recommendation_tbl,
  caption = "County-Level Data Reliability and Algorithmic Decisions"
)
County-Level Data Reliability and Algorithmic Decisions
County          Median Household Income  MOE(%)  Reliability          Algorithm Recommendation
Monroe          51190                    4.7     High Confidence      Safe for algorithmic decisions
Lamar           67972                    4.5     High Confidence      Safe for algorithmic decisions
Harrison        55211                    4.3     High Confidence      Safe for algorithmic decisions
Lowndes         53687                    4.2     High Confidence      Safe for algorithmic decisions
Rankin          76460                    4.0     High Confidence      Safe for algorithmic decisions
Jackson         60045                    3.9     High Confidence      Safe for algorithmic decisions
Warren          54117                    3.9     High Confidence      Safe for algorithmic decisions
Madison         79105                    3.6     High Confidence      Safe for algorithmic decisions
DeSoto          79666                    3.2     High Confidence      Safe for algorithmic decisions
Kemper          42947                    9.8     Moderate Confidence  Use with caution - monitor outcomes
George          51349                    9.5     Moderate Confidence  Use with caution - monitor outcomes
Lawrence        41096                    9.4     Moderate Confidence  Use with caution - monitor outcomes
Tippah          47968                    9.4     Moderate Confidence  Use with caution - monitor outcomes
Union           55970                    9.4     Moderate Confidence  Use with caution - monitor outcomes
Pontotoc        54414                    9.3     Moderate Confidence  Use with caution - monitor outcomes
Pearl River     54220                    9.1     Moderate Confidence  Use with caution - monitor outcomes
Newton          49160                    8.8     Moderate Confidence  Use with caution - monitor outcomes
Alcorn          47716                    8.7     Moderate Confidence  Use with caution - monitor outcomes
Calhoun         44505                    8.5     Moderate Confidence  Use with caution - monitor outcomes
Bolivar         37845                    8.4     Moderate Confidence  Use with caution - monitor outcomes
Leake           46669                    8.3     Moderate Confidence  Use with caution - monitor outcomes
Leflore         33115                    8.3     Moderate Confidence  Use with caution - monitor outcomes
Tate            61286                    8.3     Moderate Confidence  Use with caution - monitor outcomes
Holmes          28818                    7.9     Moderate Confidence  Use with caution - monitor outcomes
Benton          38750                    7.8     Moderate Confidence  Use with caution - monitor outcomes
Lincoln         47069                    7.8     Moderate Confidence  Use with caution - monitor outcomes
Chickasaw       40224                    7.7     Moderate Confidence  Use with caution - monitor outcomes
Copiah          46889                    7.7     Moderate Confidence  Use with caution - monitor outcomes
Jones           49451                    7.7     Moderate Confidence  Use with caution - monitor outcomes
Marion          38399                    7.7     Moderate Confidence  Use with caution - monitor outcomes
Coahoma         36075                    7.1     Moderate Confidence  Use with caution - monitor outcomes
Simpson         50867                    6.6     Moderate Confidence  Use with caution - monitor outcomes
Pike            40131                    5.7     Moderate Confidence  Use with caution - monitor outcomes
Forrest         49340                    5.6     Moderate Confidence  Use with caution - monitor outcomes
Hinds           48596                    5.6     Moderate Confidence  Use with caution - monitor outcomes
Lee             64479                    5.4     Moderate Confidence  Use with caution - monitor outcomes
Lafayette       59748                    5.3     Moderate Confidence  Use with caution - monitor outcomes
Hancock         63623                    5.0     Moderate Confidence  Use with caution - monitor outcomes
Lauderdale      45649                    5.0     Moderate Confidence  Use with caution - monitor outcomes
Carroll         42285                    34.5    Low Confidence       Requires manual review or additional data
Issaquena       17900                    29.0    Low Confidence       Requires manual review or additional data
Sharkey         41000                    24.9    Low Confidence       Requires manual review or additional data
Tunica          41676                    24.1    Low Confidence       Requires manual review or additional data
Wilkinson       34928                    23.6    Low Confidence       Requires manual review or additional data
Smith           51983                    20.9    Low Confidence       Requires manual review or additional data
Clarke          46329                    19.8    Low Confidence       Requires manual review or additional data
Walthall        37145                    17.3    Low Confidence       Requires manual review or additional data
Montgomery      36845                    16.3    Low Confidence       Requires manual review or additional data
Panola          47894                    16.3    Low Confidence       Requires manual review or additional data
Scott           44968                    15.5    Low Confidence       Requires manual review or additional data
Jefferson Davis 36473                    15.4    Low Confidence       Requires manual review or additional data
Greene          50000                    14.7    Low Confidence       Requires manual review or additional data
Marshall        51431                    14.5    Low Confidence       Requires manual review or additional data
Tishomingo      45545                    14.5    Low Confidence       Requires manual review or additional data
Franklin        43942                    14.4    Low Confidence       Requires manual review or additional data
Covington       40164                    14.3    Low Confidence       Requires manual review or additional data
Jefferson       31544                    13.2    Low Confidence       Requires manual review or additional data
Wayne           34875                    12.9    Low Confidence       Requires manual review or additional data
Itawamba        57252                    12.7    Low Confidence       Requires manual review or additional data
Winston         45516                    12.7    Low Confidence       Requires manual review or additional data
Adams           37271                    12.5    Low Confidence       Requires manual review or additional data
Quitman         31192                    12.5    Low Confidence       Requires manual review or additional data
Prentiss        51578                    12.3    Low Confidence       Requires manual review or additional data
Tallahatchie    35428                    12.1    Low Confidence       Requires manual review or additional data
Webster         55657                    12.1    Low Confidence       Requires manual review or additional data
Attala          42680                    11.8    Low Confidence       Requires manual review or additional data
Jasper          43914                    11.8    Low Confidence       Requires manual review or additional data
Neshoba         47400                    11.6    Low Confidence       Requires manual review or additional data
Humphreys       31907                    11.5    Low Confidence       Requires manual review or additional data
Stone           55894                    11.5    Low Confidence       Requires manual review or additional data
Claiborne       34282                    11.4    Low Confidence       Requires manual review or additional data
Yalobusha       47006                    11.3    Low Confidence       Requires manual review or additional data
Washington      38394                    11.1    Low Confidence       Requires manual review or additional data
Amite           34866                    11.0    Low Confidence       Requires manual review or additional data
Noxubee         42298                    10.9    Low Confidence       Requires manual review or additional data
Perry           48333                    10.9    Low Confidence       Requires manual review or additional data
Grenada         45745                    10.8    Low Confidence       Requires manual review or additional data
Choctaw         41887                    10.7    Low Confidence       Requires manual review or additional data
Oktibbeha       42953                    10.6    Low Confidence       Requires manual review or additional data
Sunflower       37403                    10.6    Low Confidence       Requires manual review or additional data
Yazoo           41867                    10.6    Low Confidence       Requires manual review or additional data
Clay            37412                    10.5    Low Confidence       Requires manual review or additional data

Key Recommendations:

Your Task: Use your analysis results to provide specific guidance to the department.

  1. Counties suitable for immediate algorithmic implementation:

These counties have High Confidence data (MOE < 5%), meaning the estimates are reliable enough for algorithmic allocation. All nine fall in this tier: DeSoto, Harrison, Jackson, Lamar, Lowndes, Madison, Monroe, Rankin, and Warren. These counties generally have larger or more affluent populations, which improves ACS survey precision. Algorithms can be implemented directly in these counties with comparatively low risk from data uncertainty.

  2. Counties requiring additional oversight:

These are Moderate Confidence counties (MOE 5–10%). Examples include Alcorn (8.7%), Benton (7.8%), Forrest (5.6%), Lee (5.4%), Pike (5.7%) and many others across Mississippi. While the ACS data is usable, there is enough uncertainty that algorithmic decisions could introduce bias or misallocation if left unchecked. Recommendation: Algorithms may be applied here, but outcomes should be monitored carefully. Suggested oversight strategies include:

  • Conducting periodic audits of funding allocations.
  • Comparing algorithmic outputs with administrative or programmatic data.
  • Implementing feedback loops from local service providers to catch anomalies.

  3. Counties needing alternative approaches:

These are Low Confidence counties (MOE > 10%), such as Adams (12.5%), Covington (14.3%), Sharkey (24.9%), Wilkinson (23.6%), Carroll (34.5%) and many others. These counties tend to be smaller, more rural, or economically disadvantaged, which leads to higher survey uncertainty. Relying purely on ACS estimates here risks misidentifying need and exacerbating inequities. Recommendation: Algorithms should not be the sole basis for decision-making in these counties. Alternative approaches include:

  • Supplementing ACS data with local administrative records (e.g., school enrollment, SNAP participation).
  • Commissioning targeted surveys to improve reliability.
  • Using manual review and qualitative input from local stakeholders to guide funding decisions.

Questions for Further Investigation

  • Are certain racial or ethnic groups (e.g., Hispanic populations in smaller tracts) more likely to have unreliable estimates, and how might this bias algorithmic outcomes for those communities?
  • If we compare ACS estimates across multiple years, do the same counties consistently show high margins of error, or are reliability issues shifting over time?
  • Are counties with higher MOE values concentrated in rural regions, and how does this affect equitable distribution of resources across urban vs. rural Mississippi?

Technical Notes

Data Sources:

  • U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates
  • Retrieved via the tidycensus R package on 26/09/2025

Reproducibility:

  • All analysis conducted in R version 4.5.1
  • Census API key required for replication
  • Complete code and documentation available at: https://musa-5080-fall-2025.github.io/portfolio-setup-Isabelliiii/

Methodology Notes:

  • County selection was purposive: one high-confidence (DeSoto), one moderate-confidence (Alcorn), and one low-confidence (Adams) county were chosen for tract-level analysis to illustrate variation in data reliability.

  • Data cleaning steps included removing redundant suffixes from county names, calculating MOE percentages, and categorizing estimates into High, Moderate, and Low confidence tiers.

  • Demographic percentages (White, Black, Hispanic) were derived by dividing subgroup estimates by total population at the tract level.

  • Reliability thresholds (5% and 10% MOE) follow the assignment requirements and serve as practical rules of thumb rather than fixed statistical standards; different cutoffs could produce somewhat different classifications.
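To gauge how sensitive the tiers are to the cutoffs, the same MOE percentages can be reclassified under alternative thresholds and the changes counted. The sketch below applies hypothetical cutoffs of 7.5% and 12% to a handful of rounded MOE values from the recommendation table. tier() is an illustrative helper that mirrors the case_when() logic.

```r
# Classify MOE percentages with configurable cutoffs (mirrors the case_when() logic)
tier <- function(pct, high_cut = 5, low_cut = 10) {
  ifelse(pct < high_cut, "High", ifelse(pct <= low_cut, "Moderate", "Low"))
}

# A few county MOE percentages from the recommendation table
county_moe_pct <- c(DeSoto = 3.2, Hinds = 5.6, Kemper = 9.8, Amite = 11.0, Carroll = 34.5)

baseline    <- tier(county_moe_pct)
alternative <- tier(county_moe_pct, high_cut = 7.5, low_cut = 12)  # hypothetical cutoffs
data.frame(baseline, alternative, changed = baseline != alternative)
```

Two of the five counties (Hinds and Amite) change tier under the alternative cutoffs.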

Limitations:

  • ACS margins of error are systematically higher in small and rural counties, producing uneven reliability across geographies.

  • At the tract level, all 61 tracts in the three selected counties exceeded the 15% MOE threshold on at least one variable. This reflects inherent sampling limitations of the ACS rather than coding errors. Tract populations are small, subgroup counts within them are smaller still, and the flag is triggered when any one of the three variables exceeds the threshold. As a result, tract-level classifications of "high MOE" should be interpreted as relative signals of data quality rather than absolute indicators of unusability.

  • Small denominators for minority populations inflate MOEs disproportionately, which may bias interpretations of racial/ethnic patterns.

  • Recommendations are based on statistical reliability alone; qualitative knowledge and administrative data sources were not incorporated but would be essential in real-world policy design.


Submission Checklist

Before submitting your portfolio link on Canvas:

Remember: Submit your portfolio URL on Canvas, not the file itself. Your assignment should be accessible at your-portfolio-url/assignments/assignment_1/your_file_name.html