Assignment 1: Census Data Quality for Policy Decisions

Evaluating Data Reliability for Algorithmic Decision-Making

Author

Isabelle Li

Published

December 11, 2025

Assignment Overview

Scenario

You are a data analyst for the Mississippi Department of Human Services. The department is considering implementing an algorithmic system to identify communities that should receive priority for social service funding and outreach programs. Your supervisor has asked you to evaluate the quality and reliability of available census data to inform this decision.

Drawing on our Week 2 discussion of algorithmic bias, you need to assess not just what the data shows, but how reliable it is and what communities might be affected by data quality issues.

Learning Objectives

Apply dplyr functions to real census data for policy analysis
Evaluate data quality using margins of error
Connect technical analysis to algorithmic decision-making
Identify potential equity implications of data reliability issues
Create professional documentation for policy stakeholders

Submission Instructions

Submit by posting your updated portfolio link on Canvas. Your assignment should be accessible at your-portfolio-url/assignments/assignment_1/

Make sure to update your _quarto.yml navigation to include this assignment under an “Assignments” menu.

Part 1: Portfolio Integration

Create this assignment in your portfolio repository under an assignments/assignment_1/ folder structure. Update your navigation menu to include:

- text: Assignments
  menu:
    - href: assignments/assignment_1/your_file_name.qmd
      text: "Assignment 1: Census Data Exploration"

If the text contains a special character such as a colon or comma, wrap it in double quotes so that Quarto reads it as text.

Setup

# Load required packages (hint: you need tidycensus, tidyverse, and knitr)
library(tidycensus)
library(tidyverse)
library(knitr)
library(scales)

# Set your Census API key
readRenviron("~/.Renviron")
Sys.getenv("CENSUS_API_KEY")

[1] ""

# Choose your state for analysis - assign it to a variable called my_state
my_state <- "Mississippi"

State Selection: I have chosen Mississippi for this analysis because it has one of the highest poverty rates in the United States and a large rural population. These factors make it especially important to evaluate the quality and reliability of census data, as smaller and more rural communities often have higher margins of error in survey estimates.
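The `Sys.getenv("CENSUS_API_KEY")` call in the setup chunk prints an empty string, which suggests the key was either not visible in that session or was blanked before publishing. A minimal sketch of a guard that fails early when the key is missing, without echoing the key into the rendered page, might look like this. The helper name `check_census_key` is hypothetical and not part of tidycensus.

```r
# Hypothetical helper (not part of tidycensus): confirm that a Census API key
# is available without printing it into the rendered document.
check_census_key <- function(key = Sys.getenv("CENSUS_API_KEY")) {
  if (!nzchar(key)) {
    stop("CENSUS_API_KEY is empty; add it to ~/.Renviron and restart R.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Example: an empty key raises an error, a non-empty placeholder passes.
result_missing <- tryCatch(check_census_key(""), error = function(e) "missing")
result_present <- check_census_key("placeholder-key")
```

Calling `check_census_key()` with no arguments at the top of the document would stop the render before any `get_acs()` request is made.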

Part 2: County-Level Resource Assessment

2.1 Data Retrieval

Your Task: Use get_acs() to retrieve county-level data for your chosen state.

Requirements:
- Geography: county level
- Variables: median household income (B19013_001) and total population (B01003_001)
- Year: 2022
- Survey: acs5
- Output format: wide

Hint: Remember to give your variables descriptive names using the variables = c(name = "code") syntax.

# Write your get_acs() code here
acs_data <- get_acs(
  geography ="county",
  state =my_state,
  variables=c(
    median_income="B19013_001",
    total_population="B01003_001"
  ),
  year=2022,
  survey="acs5",
  output="wide"
)
# Clean the county names to remove state name and "County" 
library(dplyr)
library(stringr)

acs_data_clean <- acs_data %>%
  mutate(
    county=NAME %>%
      str_remove(", Mississippi$")%>%
      str_remove(" County$")%>%
      str_remove(" Parish$")
    )%>%
 select(county,everything(),-NAME)    
  
# Hint: use mutate() with str_remove()

# Display the first few rows
head(acs_data_clean)

# A tibble: 6 × 6
  county GEOID median_incomeE median_incomeM total_populationE total_populationM
  <chr>  <chr>          <dbl>          <dbl>             <dbl>             <dbl>
1 Adams  28001          37271           4671             29425                NA
2 Alcorn 28003          47716           4160             34717                NA
3 Amite  28005          34866           3839             12683                NA
4 Attala 28007          42680           5034             17842                NA
5 Benton 28009          38750           3034              7637                NA
6 Boliv… 28011          37845           3176             30688                NA

2.2 Data Quality Assessment

Your Task: Calculate margin of error percentages and create reliability categories.

Requirements:
- Calculate MOE percentage: (margin of error / estimate) * 100
- Create reliability categories:
  - High Confidence: MOE < 5%
  - Moderate Confidence: MOE 5-10%
  - Low Confidence: MOE > 10%
- Create a flag for unreliable estimates (MOE > 10%)

Hint: Use mutate() with case_when() for the categories.

# Calculate MOE percentage and reliability categories using mutate()
acs_income_quality<-acs_data_clean %>%
  mutate(
    income_moe_pct=100*median_incomeM/median_incomeE,
    income_moe_pct=if_else(is.finite(income_moe_pct),income_moe_pct,NA_real_),
    income_reliability=case_when(
      is.na(income_moe_pct)~"Missing",
      income_moe_pct<5 ~ "High Confidence",
      income_moe_pct<=10 ~ "Moderate Confidence",
      TRUE ~ "Low Confidence"
    ),
    income_unreliable=income_moe_pct>10
  )
# Create a summary showing count of counties in each reliability category
income_reliability_summary <- acs_income_quality %>%
  count(income_reliability,name="n") %>%
  mutate(
    pct=100*n/sum(n),
    pct=round(pct,1)
  )%>%
  arrange(factor(income_reliability,
                 levels=c("High Confidence","Moderate Confidence","Low Confidence","Missing")))

head(acs_income_quality)

# A tibble: 6 × 9
  county GEOID median_incomeE median_incomeM total_populationE total_populationM
  <chr>  <chr>          <dbl>          <dbl>             <dbl>             <dbl>
1 Adams  28001          37271           4671             29425                NA
2 Alcorn 28003          47716           4160             34717                NA
3 Amite  28005          34866           3839             12683                NA
4 Attala 28007          42680           5034             17842                NA
5 Benton 28009          38750           3034              7637                NA
6 Boliv… 28011          37845           3176             30688                NA
# ℹ 3 more variables: income_moe_pct <dbl>, income_reliability <chr>,
#   income_unreliable <lgl>

income_reliability_summary

# A tibble: 3 × 3
  income_reliability      n   pct
  <chr>               <int> <dbl>
1 High Confidence         9  11  
2 Moderate Confidence    30  36.6
3 Low Confidence         43  52.4
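As a worked check of the MOE percentage formula, the sketch below recomputes the reliability category for the six counties printed earlier. It uses base R only so that it runs without an API key; the data frame is typed in by hand from the output above rather than retrieved with `get_acs()`.

```r
# First six counties from the printed output (2022 ACS 5-year estimates).
counties <- data.frame(
  county   = c("Adams", "Alcorn", "Amite", "Attala", "Benton", "Bolivar"),
  estimate = c(37271, 47716, 34866, 42680, 38750, 37845),
  moe      = c(4671, 4160, 3839, 5034, 3034, 3176)
)

# Relative margin of error, expressed as a percentage of the estimate.
counties$moe_pct <- 100 * counties$moe / counties$estimate

# Same thresholds as the assignment: below 5 high, 5 to 10 moderate, above 10 low.
counties$reliability <- ifelse(
  counties$moe_pct < 5, "High Confidence",
  ifelse(counties$moe_pct <= 10, "Moderate Confidence", "Low Confidence")
)

print(counties)
```

For Adams County this gives 100 * 4671 / 37271, or about 12.5%, which places it in the Low Confidence tier.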

2.3 High Uncertainty Counties

Your Task: Identify the 5 counties with the highest MOE percentages.

Requirements:
- Sort by MOE percentage (highest first)
- Select the top 5 counties
- Display: county name, median income, margin of error, MOE percentage, reliability category
- Format as a professional table using kable()

Hint: Use arrange(), slice(), and select() functions.

# Create table of top 5 counties by MOE percentage
top5_high_uncertainty<-acs_income_quality %>%
  arrange(desc(income_moe_pct)) %>%
  slice_head(n=5) %>%
  select(
    county,
    median_incomeE,
    median_incomeM,
    income_moe_pct,
    income_reliability
  ) %>%
  mutate(
    'Median Household Income'=dollar(median_incomeE),
    'Margin of Error'=dollar(median_incomeM),
    'MOE(%)'=round(income_moe_pct,1),
    'Reliability'=income_reliability
  ) %>%
  select(
    County=county,
    'Median Household Income',
    'Margin of Error',
    'MOE(%)',
    Reliability
  )
# Format as table with kable() - include appropriate column names and caption
kable(
  top5_high_uncertainty,
  caption="Top 5 Mississippi Counties by Median Income MOE Percentage"
)

Top 5 Mississippi Counties by Median Income MOE Percentage
County	Median Household Income	Margin of Error	MOE(%)	Reliability
Carroll	$42,285	$14,597	34.5	Low Confidence
Issaquena	$17,900	$5,191	29.0	Low Confidence
Sharkey	$41,000	$10,196	24.9	Low Confidence
Tunica	$41,676	$10,036	24.1	Low Confidence
Wilkinson	$34,928	$8,245	23.6	Low Confidence

Data Quality Commentary:

These results show that several rural counties in Mississippi, such as Carroll, Issaquena, and Sharkey, have income estimates with very high margins of error, making them less reliable for decision-making. If an algorithm were to rely on these estimates without accounting for uncertainty, these communities could be misclassified and potentially receive less funding or outreach than they need. Higher uncertainty is often linked to small population sizes, lower survey response rates, and greater variability in economic conditions, all of which are more common in sparsely populated rural areas.
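To make the size of this uncertainty concrete, the sketch below turns each estimate in the top-5 table into a range (estimate minus MOE to estimate plus MOE). ACS margins of error are published at the 90% confidence level, so dividing by 1.645 gives an approximate standard error. The values are copied by hand from the table above.

```r
# Top five counties by MOE percentage, copied from the table above.
top5 <- data.frame(
  county   = c("Carroll", "Issaquena", "Sharkey", "Tunica", "Wilkinson"),
  estimate = c(42285, 17900, 41000, 41676, 34928),
  moe      = c(14597, 5191, 10196, 10036, 8245)
)

# 90% confidence bounds: estimate plus or minus the published MOE.
top5$lower <- top5$estimate - top5$moe
top5$upper <- top5$estimate + top5$moe

# Approximate standard error (ACS MOEs correspond to z = 1.645).
top5$se <- round(top5$moe / 1.645)

print(top5)
```

Carroll County's plausible range runs from $27,688 to $56,882, so a funding rule with an income cutoff anywhere inside that range could classify the county either way.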

Part 3: Neighborhood-Level Analysis

3.1 Focus Area Selection

Your Task: Select 2-3 counties from your reliability analysis for detailed tract-level study.

Strategy: Choose counties that represent different reliability levels (e.g., 1 high confidence, 1 moderate, 1 low confidence) to compare how data quality varies.

# Use filter() to select 2-3 counties from your county_reliability data
# Store the selected counties in a variable called selected_counties
acs_income_quality %>%
  group_by(income_reliability) %>%
  slice_head(n=3) %>%
  select(county,income_reliability)

# A tibble: 9 × 2
# Groups:   income_reliability [3]
  county   income_reliability 
  <chr>    <chr>              
1 DeSoto   High Confidence    
2 Harrison High Confidence    
3 Jackson  High Confidence    
4 Adams    Low Confidence     
5 Amite    Low Confidence     
6 Attala   Low Confidence     
7 Alcorn   Moderate Confidence
8 Benton   Moderate Confidence
9 Bolivar  Moderate Confidence

#High Confidence:DeSoto Moderate Confidence:Alcorn Low Confidence:Adams
selected_counties <- acs_income_quality %>%
  filter(county %in% c("DeSoto","Alcorn","Adams")) %>%
  mutate('MOE(%)'=round(income_moe_pct,1)) %>%
  select(
    County=county,
    'Median Household Income'=median_incomeE,
    'Margin of Error'=median_incomeM,
    'MOE(%)',
    Reliability=income_reliability
  )
selected_counties

# A tibble: 3 × 5
  County `Median Household Income` `Margin of Error` `MOE(%)` Reliability       
  <chr>                      <dbl>             <dbl>    <dbl> <chr>             
1 Adams                      37271              4671     12.5 Low Confidence    
2 Alcorn                     47716              4160      8.7 Moderate Confiden…
3 DeSoto                     79666              2535      3.2 High Confidence

# Display the selected counties with their key characteristics
# Show: county name, median income, MOE percentage, reliability category

Comment on the output: These results illustrate how data reliability varies across counties. DeSoto, a large suburban county, has a very low MOE (3.2%), meaning its income estimate is highly reliable. Alcorn falls into the moderate range with an MOE of 8.7%, while Adams has a much higher MOE (12.5%), indicating lower confidence. This comparison highlights how smaller or more rural counties like Adams may be poorly represented in survey data, which could affect the fairness of algorithmic decisions that rely on these estimates.
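One way to check whether two counties' incomes can actually be told apart is the standard ACS approximation for the margin of error of a difference: the square root of the sum of the squared MOEs. The sketch below applies it to the three selected counties. The function name `moe_diff_test` is hypothetical, and this check is an addition for illustration rather than part of the original analysis.

```r
# Hypothetical helper: compare two ACS estimates using their 90% MOEs.
# The MOE of a difference is approximated as sqrt(moe_a^2 + moe_b^2).
moe_diff_test <- function(est_a, moe_a, est_b, moe_b) {
  difference <- est_a - est_b
  moe_diff   <- sqrt(moe_a^2 + moe_b^2)
  list(
    difference  = difference,
    moe         = moe_diff,
    significant = abs(difference) > moe_diff  # TRUE if distinguishable at 90%
  )
}

# Values copied from the selected_counties output above.
alcorn_vs_adams  <- moe_diff_test(47716, 4160, 37271, 4671)
desoto_vs_alcorn <- moe_diff_test(79666, 2535, 47716, 4160)

str(alcorn_vs_adams)
```

Here the Alcorn and Adams estimates differ by $10,445 against a combined margin of roughly $6,255, so the gap is larger than sampling noise alone would explain at the 90% level.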

3.2 Tract-Level Demographics

Your Task: Get demographic data for census tracts in your selected counties.

Requirements:
- Geography: tract level
- Variables: white alone (B03002_003), Black/African American (B03002_004), Hispanic/Latino (B03002_012), total population (B03002_001)
- Use the same state and year as before
- Output format: wide
- Challenge: You’ll need county codes, not names. Look at the GEOID patterns in your county data for hints.
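The GEOID hint can be sketched as follows: a county GEOID is the two-digit state FIPS code followed by the three-digit county code, and a tract GEOID appends a six-digit tract code. The example uses GEOIDs that appear in this document's printed output; the variable names are illustrative only.

```r
# County GEOIDs from the county-level output: state (2 digits) + county (3 digits).
county_geoids <- c(Adams = "28001", Alcorn = "28003")

state_code   <- substr(county_geoids, 1, 2)  # "28" is Mississippi
county_codes <- substr(county_geoids, 3, 5)  # "001", "003"

# A tract GEOID adds six more digits for the tract itself.
tract_geoid <- "28001000101"
tract_parts <- c(
  state  = substr(tract_geoid, 1, 2),
  county = substr(tract_geoid, 3, 5),
  tract  = substr(tract_geoid, 6, 11)
)

print(county_codes)
print(tract_parts)
```

Deriving the codes with `substr()` from the county table avoids typing them by hand when the set of selected counties changes.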

# Define your race/ethnicity variables with descriptive names
target_counties <- c("Adams", "Alcorn", "DeSoto")

data("fips_codes")
ms_fips <- fips_codes %>%
  filter(state == "MS") %>%                         
  mutate(county_clean = str_remove(county, " County$")) 
county_fips_tbl <- ms_fips %>%
  filter(county_clean %in% target_counties) %>%
  select(county = county_clean, county_code) %>%
  mutate(county_code = as.character(county_code))       

county_fips_tbl

     county county_code
1402  Adams         001
1403 Alcorn         003
1418 DeSoto         033

race_vars <- c(
  total_pop = "B03002_001",
  white="B03002_003",
  black="B03002_004",
  hispanic="B03002_012"
)
# Use get_acs() to retrieve tract-level data
# Hint: You may need to specify county codes in the county parameter
county_fips <- c("001","003","033")

tract_demo <- get_acs(
  geography = "tract",
  state=my_state,
  county=county_fips,
  variables=race_vars,
  year=2022,
  survey="acs5",
  output="wide"
)
# Calculate percentage of each group using mutate()
# Create percentages for white, Black, and Hispanic populations
tract_demo_pct <- tract_demo %>%
  mutate(
    pct_white=if_else(total_popE>0,100*whiteE/total_popE,NA_real_),
    pct_black    = if_else(total_popE > 0, 100 * blackE   / total_popE, NA_real_),
    pct_hispanic = if_else(total_popE > 0, 100 * hispanicE/ total_popE, NA_real_)
  ) %>%
# Add readable tract and county name columns using str_extract() or similar
  mutate(
    tract_name = str_replace(NAME, "[,;]\\s*[^,;]+ County[,;]\\s*[^,;]+$", ""),
    county_name = str_extract(NAME, "[,;]\\s*[^,;]+ County") %>%
                  str_remove("^[,;]\\s*") %>%
                  str_remove("\\s*County$")
  ) %>%
  
  select(
    tract_geoid=GEOID,
    tract_name,
    county_name,
    total_popE,total_popM,
    whiteE,whiteM,
    blackE,blackM,
    hispanicE,hispanicM,
    pct_white,pct_black,pct_hispanic
  )

head(tract_demo_pct)

# A tibble: 6 × 14
  tract_geoid tract_name  county_name total_popE total_popM whiteE whiteM blackE
  <chr>       <chr>       <chr>            <dbl>      <dbl>  <dbl>  <dbl>  <dbl>
1 28001000101 Census Tra… Adams             4710        704   3087    399    776
2 28001000102 Census Tra… Adams             2350        557   1340    271    758
3 28001000200 Census Tra… Adams             3930        560    379    198   2891
4 28001000300 Census Tra… Adams             1774        443     33     34   1678
5 28001000400 Census Tra… Adams             2983        610     14      2   2575
6 28001000500 Census Tra… Adams             2966        430    802    137   2082
# ℹ 6 more variables: blackM <dbl>, hispanicE <dbl>, hispanicM <dbl>,
#   pct_white <dbl>, pct_black <dbl>, pct_hispanic <dbl>
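The percentages above are derived from two estimates, each with its own margin of error, so they carry uncertainty too. A minimal sketch of the standard ACS approximation for the MOE of a proportion is shown below for the first tract in the output. The helper `moe_proportion` is hypothetical and written in base R; tidycensus provides a `moe_prop()` function for the same purpose.

```r
# Hypothetical base R helper for the MOE of a derived proportion p = num / denom.
# Standard ACS approximation: sqrt(moe_num^2 - p^2 * moe_denom^2) / denom.
# When the value under the root is negative, the ratio form (plus sign) is used.
moe_proportion <- function(num, denom, moe_num, moe_denom) {
  p <- num / denom
  radicand <- moe_num^2 - p^2 * moe_denom^2
  radicand <- ifelse(radicand < 0, moe_num^2 + p^2 * moe_denom^2, radicand)
  sqrt(radicand) / denom
}

# Tract 28001000101: total population 4710 (MOE 704), white alone 3087 (MOE 399).
pct_white     <- 100 * 3087 / 4710
moe_pct_white <- 100 * moe_proportion(3087, 4710, 399, 704)

round(c(pct_white = pct_white, moe_points = moe_pct_white), 1)
```

For this tract the share of white residents is about 65.5%, with a margin of roughly 13 percentage points in either direction.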

3.3 Demographic Analysis

Your Task: Analyze the demographic patterns in your selected areas.

# Find the tract with the highest percentage of Hispanic/Latino residents
# Hint: use arrange() and slice() to get the top tract
top_hispanic_tract <- tract_demo_pct %>%
  filter(county_name %in% selected_counties$County)%>%
  arrange(desc(pct_hispanic))%>%
  slice_head(n=1)%>%
  transmute(
    'Tract GEOID'=tract_geoid,
    'Tract Name'=tract_name,
    County=county_name,
    'Hispanic/Latino (%)'=round(pct_hispanic,1),
    'Total Pop(E)'=total_popE
  )
  
kable(top_hispanic_tract,
      caption="Top Tract by % Hispanic/Latino")

Top Tract by % Hispanic/Latino
Tract GEOID	Tract Name	County	Hispanic/Latino (%)	Total Pop(E)
28001000200	Census Tract 2	Adams	15.3	3930

# Calculate average demographics by county using group_by() and summarize()
county_demo_summary <- tract_demo_pct %>%
  filter(county_name %in% selected_counties$County)%>%
  group_by(County=county_name)%>%
  summarize(
    `Avg White (%)`    = round(mean(pct_white,    na.rm = TRUE), 1),
    `Avg Black (%)`    = round(mean(pct_black,    na.rm = TRUE), 1),
    `Avg Hispanic (%)` = round(mean(pct_hispanic, na.rm = TRUE), 1),
    .groups = "drop"
  )

# Show: number of tracts, average percentage for each racial/ethnic group

# Create a nicely formatted table of your results using kable()

kable(county_demo_summary, caption = "Average Demographics by County")

Average Demographics by County
County	Avg White (%)	Avg Black (%)	Avg Hispanic (%)
Adams	34.9	56.1	6.3
Alcorn	81.1	9.4	3.5
DeSoto	60.2	30.2	5.4
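Note that `mean(pct_black)` averages tract percentages with equal weight, so a tract of 1,774 people counts as much as one of 4,710. The sketch below contrasts that with a population-weighted share, using only the six Adams County tracts visible in the earlier output. Because it covers six of the ten Adams tracts, its numbers will not match the county row in the table above.

```r
# First six Adams County tracts from the printed tract-level output.
total_pop <- c(4710, 2350, 3930, 1774, 2983, 2966)
black_pop <- c(776, 758, 2891, 1678, 2575, 2082)

# Unweighted: average the tract-level percentages (what mean(pct_black) does).
unweighted_pct <- mean(100 * black_pop / total_pop)

# Population-weighted: pool the counts first, then compute one percentage.
weighted_pct <- 100 * sum(black_pop) / sum(total_pop)

round(c(unweighted = unweighted_pct, weighted = weighted_pct), 1)
```

The two figures differ by several percentage points for these tracts, so the choice between them should be stated wherever county averages are reported.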

Part 4: Comprehensive Data Quality Evaluation

4.1 MOE Analysis for Demographic Variables

Your Task: Examine margins of error for demographic variables to see if some communities have less reliable data.

Requirements:
- Calculate MOE percentages for each demographic variable
- Flag tracts where any demographic variable has MOE > 15%
- Create summary statistics

# Calculate MOE percentages for white, Black, and Hispanic variables
# Hint: use the same formula as before (margin/estimate * 100)
tract_demo_moe <- tract_demo_pct %>%
  filter(county_name %in% selected_counties$County) %>%
  mutate(
    white_moe_pct=if_else(whiteE>0,100*whiteM/whiteE,NA_real_),
    black_moe_pct=if_else(blackE>0,100*blackM/blackE,NA_real_),
    hispanic_moe_pct=if_else(hispanicE>0,100*hispanicM/hispanicE,NA_real_)
  )%>%
  
# Create a flag for tracts with high MOE on any demographic variable
   mutate(
    high_moe_white    = white_moe_pct    > 15,
    high_moe_black    = black_moe_pct    > 15,
    high_moe_hispanic = hispanic_moe_pct > 15,
    high_moe_any      = coalesce(high_moe_white, FALSE) |
                        coalesce(high_moe_black, FALSE) |
                        coalesce(high_moe_hispanic, FALSE)
  )
# Use logical operators (| for OR) in an ifelse() statement
# Create summary statistics showing how many tracts have data quality issues
overall_moe_summary<-tract_demo_moe %>%
  summarize(
    Tracts=n(),
    high_moe_any_n = sum(high_moe_any, na.rm = TRUE),
    pct_tracts = round(100 * high_moe_any_n / Tracts, 1)
  )
by_county_moe_summary<-tract_demo_moe %>%
  group_by(County=county_name)%>%
  summarize(
    Tracts=n(),
    high_moe_any_n = sum(high_moe_any, na.rm = TRUE),
    pct_tracts = round(100 * high_moe_any_n / Tracts, 1)
  )

kable(overall_moe_summary,caption="Overall: Tracts with high MOE")

Overall: Tracts with high MOE
Tracts	high_moe_any_n	pct_tracts
61	61	100

kable(by_county_moe_summary,caption="By County: Tracts with high MOE")

By County: Tracts with high MOE
County	Tracts	high_moe_any_n	pct_tracts
Adams	10	10	100
Alcorn	10	10	100
DeSoto	41	41	100
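To see how a single variable contributes to the flag, the sketch below recomputes the white-alone MOE percentage for the six tracts printed earlier and applies the 15% threshold. It is written in base R, where `!is.na(x) & x > 15` plays the role that `coalesce(..., FALSE)` plays in the chunk above: a tract with a zero estimate yields `NA` and is treated as not flagged for that variable.

```r
# White-alone estimates and MOEs for the six tracts shown in the earlier output.
white_e <- c(3087, 1340, 379, 33, 14, 802)
white_m <- c(399, 271, 198, 34, 2, 137)

# MOE as a percentage of the estimate; NA when the estimate is zero.
moe_pct <- function(moe, est) ifelse(est > 0, 100 * moe / est, NA_real_)

# Flag values above the threshold; missing values count as not flagged.
flag_high <- function(x, threshold = 15) !is.na(x) & x > threshold

white_moe_pct  <- moe_pct(white_m, white_e)
high_moe_white <- flag_high(white_moe_pct)

data.frame(white_e, white_m,
           white_moe_pct = round(white_moe_pct, 1),
           high_moe_white)
```

Two of these six tracts stay under 15% on the white-alone variable, so their overall flag must come from one of the other two groups, which is how the "any variable" rule ends up flagging every tract.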

4.2 Pattern Analysis

Your Task: Investigate whether data quality problems are randomly distributed or concentrated in certain types of communities.

# Group tracts by whether they have high MOE issues
pattern_summary<-tract_demo_moe %>%
  group_by(MOE=if_else(high_moe_any,"High MOE (>15%)","Lower MOE"))%>%
   summarize(
    `Avg White (%)`    = round(mean(pct_white,    na.rm = TRUE), 1),
    `Avg Black (%)`    = round(mean(pct_black,    na.rm = TRUE), 1),
    `Avg Hispanic (%)` = round(mean(pct_hispanic, na.rm = TRUE), 1),
    .groups = "drop"
  )

kable(pattern_summary,
      caption="Comparison of Tracts by MOE")

Comparison of Tracts by MOE
MOE	Avg White (%)	Avg Black (%)	Avg Hispanic (%)
High MOE (>15%)	59.5	31	5.3

# Calculate average characteristics for each group:
# - population size, demographic percentages

# Use group_by() and summarize() to create this comparison
# Create a professional table showing the patterns

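Pattern Analysis: Because all 61 tracts in the three selected counties were flagged (each has at least one demographic variable with MOE > 15%), the table above contains a single group and cannot by itself show whether data quality problems are concentrated in particular communities. The county-level comparison is more informative. DeSoto County demonstrates that large, more prosperous, suburban counties can produce low MOE values with diverse racial compositions, due to larger populations and more stable survey estimates. In contrast, Adams County combines racial diversity with a smaller, more rural population base, leading to higher margins of error and less reliable data. This suggests that population size and socioeconomic context, not just diversity, are key drivers of data reliability.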
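When a fixed threshold flags every tract, a relative split can still produce two groups to compare. The sketch below is one possible approach, not the method used in this analysis: it splits the six tracts visible in the earlier output at the median white-alone MOE percentage and compares average population. The median cutoff and the use of a single variable are assumptions made for illustration.

```r
# Six Adams County tracts from the printed tract-level output.
tracts <- data.frame(
  geoid     = c("28001000101", "28001000102", "28001000200",
                "28001000300", "28001000400", "28001000500"),
  total_pop = c(4710, 2350, 3930, 1774, 2983, 2966),
  white_e   = c(3087, 1340, 379, 33, 14, 802),
  white_m   = c(399, 271, 198, 34, 2, 137)
)

tracts$white_moe_pct <- 100 * tracts$white_m / tracts$white_e

# Relative split: above versus at-or-below the median MOE percentage.
cutoff <- median(tracts$white_moe_pct)
tracts$moe_group <- ifelse(tracts$white_moe_pct > cutoff,
                           "Higher MOE", "Lower MOE")

# Average tract population in each group.
group_means <- tapply(tracts$total_pop, tracts$moe_group, mean)
round(group_means)
```

In this small sample the higher-MOE tracts have the smaller average population, which is consistent with the population-size explanation, though six tracts are far too few to draw a conclusion.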

Part 5: Policy Recommendations

5.1 Analysis Integration and Professional Summary

Your Task: Write an executive summary that integrates findings from all four analyses.

Executive Summary Requirements:
1. Overall Pattern Identification: What are the systematic patterns across all your analyses?
2. Equity Assessment: Which communities face the greatest risk of algorithmic bias based on your findings?
3. Root Cause Analysis: What underlying factors drive both data quality issues and bias risk?
4. Strategic Recommendations: What should the Department implement to address these systematic issues?

Executive Summary:

Overall Pattern Identification

Our analysis of Mississippi’s county- and tract-level ACS data reveals systematic differences in data reliability across the state. Larger, more urbanized, and higher-income counties such as DeSoto, Jackson, and Madison show low margins of error on median household income (MOE < 5%), making them suitable for algorithmic decision-making. In contrast, smaller, rural, and economically disadvantaged counties such as Adams, Carroll, Sharkey, and Wilkinson exhibit much higher MOEs, ranging from 12.5% in Adams to 34.5% in Carroll. These patterns highlight a divide between counties where survey data is robust and those where uncertainty limits the safe use of algorithmic systems.

Equity Assessment

Communities at the greatest risk of algorithmic bias are those with both high demographic diversity and weaker survey reliability. Rural Black-majority counties such as Adams, or counties with small Hispanic populations, often display elevated MOEs. If algorithms rely on these estimates without adjustment, they risk systematically underrepresenting marginalized groups and misallocating resources away from the very populations most in need of support. By contrast, suburban counties with stable data will receive more consistent outcomes, reinforcing existing inequities between regions.

Root Cause Analysis

The root causes of these disparities lie in structural differences in population size, socioeconomic context, and geographic accessibility. Smaller populations come with smaller ACS sample sizes, which amplifies statistical uncertainty. Rural areas often face higher survey nonresponse rates and logistical barriers that further reduce data precision. At the demographic level, minority populations within smaller tracts tend to have particularly high MOEs, not because their needs are less, but because the survey infrastructure is less effective at capturing their experiences.

Strategic Recommendations

To address these challenges, the Department should implement a tiered algorithmic framework. For counties with high-confidence data, algorithmic allocation can proceed with minimal oversight. For counties with moderate confidence, algorithms should be deployed but coupled with regular audits and feedback loops to monitor performance. For low-confidence counties, algorithmic reliance should be minimized, with decisions supplemented by local administrative data, targeted surveys, and manual review by policy experts. Beyond the immediate framework, the Department should invest in improving data infrastructure by expanding local survey capacity, partnering with community organizations, and ensuring that demographic groups with historically unreliable estimates are better represented.

5.2 Specific Recommendations

Your Task: Create a decision framework for algorithm implementation.

# Create a summary table using your county reliability data
recommendation_tbl<-acs_income_quality %>%
  transmute(
    County=county,
    'Median Household Income'=median_incomeE,
    'MOE(%)'=round(income_moe_pct,1),
    'Reliability'=income_reliability,
    'Algorithm Recommendation'=case_when(
      income_reliability=="High Confidence"~"Safe for algorithmic decisions",
      income_reliability=="Moderate Confidence"~"Use with caution - monitor outcomes",
      income_reliability=="Low Confidence"~"Requires manual review or additional data"
    )
  )%>%
  # Refer to columns by name (backticks for non-syntactic names); quoted strings
  # would be sorted as constants and leave the rows in their original order
  arrange(factor(Reliability,
                 levels=c("High Confidence","Moderate Confidence","Low Confidence")),
          desc(`MOE(%)`))

kable(
  recommendation_tbl,
  caption="County-Level Data Reliability & Algorithmic Decisions"
)

County-Level Data Reliability & Algorithmic Decisions
County	Median Household Income	MOE(%)	Reliability	Algorithm Recommendation
Monroe	51190	4.7	High Confidence	Safe for algorithmic decisions
Lamar	67972	4.5	High Confidence	Safe for algorithmic decisions
Harrison	55211	4.3	High Confidence	Safe for algorithmic decisions
Lowndes	53687	4.2	High Confidence	Safe for algorithmic decisions
Rankin	76460	4.0	High Confidence	Safe for algorithmic decisions
Jackson	60045	3.9	High Confidence	Safe for algorithmic decisions
Warren	54117	3.9	High Confidence	Safe for algorithmic decisions
Madison	79105	3.6	High Confidence	Safe for algorithmic decisions
DeSoto	79666	3.2	High Confidence	Safe for algorithmic decisions
Kemper	42947	9.8	Moderate Confidence	Use with caution - monitor outcomes
George	51349	9.5	Moderate Confidence	Use with caution - monitor outcomes
Lawrence	41096	9.4	Moderate Confidence	Use with caution - monitor outcomes
Tippah	47968	9.4	Moderate Confidence	Use with caution - monitor outcomes
Union	55970	9.4	Moderate Confidence	Use with caution - monitor outcomes
Pontotoc	54414	9.3	Moderate Confidence	Use with caution - monitor outcomes
Pearl River	54220	9.1	Moderate Confidence	Use with caution - monitor outcomes
Newton	49160	8.8	Moderate Confidence	Use with caution - monitor outcomes
Alcorn	47716	8.7	Moderate Confidence	Use with caution - monitor outcomes
Calhoun	44505	8.5	Moderate Confidence	Use with caution - monitor outcomes
Bolivar	37845	8.4	Moderate Confidence	Use with caution - monitor outcomes
Leake	46669	8.3	Moderate Confidence	Use with caution - monitor outcomes
Leflore	33115	8.3	Moderate Confidence	Use with caution - monitor outcomes
Tate	61286	8.3	Moderate Confidence	Use with caution - monitor outcomes
Holmes	28818	7.9	Moderate Confidence	Use with caution - monitor outcomes
Benton	38750	7.8	Moderate Confidence	Use with caution - monitor outcomes
Lincoln	47069	7.8	Moderate Confidence	Use with caution - monitor outcomes
Chickasaw	40224	7.7	Moderate Confidence	Use with caution - monitor outcomes
Copiah	46889	7.7	Moderate Confidence	Use with caution - monitor outcomes
Jones	49451	7.7	Moderate Confidence	Use with caution - monitor outcomes
Marion	38399	7.7	Moderate Confidence	Use with caution - monitor outcomes
Coahoma	36075	7.1	Moderate Confidence	Use with caution - monitor outcomes
Simpson	50867	6.6	Moderate Confidence	Use with caution - monitor outcomes
Pike	40131	5.7	Moderate Confidence	Use with caution - monitor outcomes
Forrest	49340	5.6	Moderate Confidence	Use with caution - monitor outcomes
Hinds	48596	5.6	Moderate Confidence	Use with caution - monitor outcomes
Lee	64479	5.4	Moderate Confidence	Use with caution - monitor outcomes
Lafayette	59748	5.3	Moderate Confidence	Use with caution - monitor outcomes
Hancock	63623	5.0	Moderate Confidence	Use with caution - monitor outcomes
Lauderdale	45649	5.0	Moderate Confidence	Use with caution - monitor outcomes
Carroll	42285	34.5	Low Confidence	Requires manual review or additional data
Issaquena	17900	29.0	Low Confidence	Requires manual review or additional data
Sharkey	41000	24.9	Low Confidence	Requires manual review or additional data
Tunica	41676	24.1	Low Confidence	Requires manual review or additional data
Wilkinson	34928	23.6	Low Confidence	Requires manual review or additional data
Smith	51983	20.9	Low Confidence	Requires manual review or additional data
Clarke	46329	19.8	Low Confidence	Requires manual review or additional data
Walthall	37145	17.3	Low Confidence	Requires manual review or additional data
Montgomery	36845	16.3	Low Confidence	Requires manual review or additional data
Panola	47894	16.3	Low Confidence	Requires manual review or additional data
Scott	44968	15.5	Low Confidence	Requires manual review or additional data
Jefferson Davis	36473	15.4	Low Confidence	Requires manual review or additional data
Greene	50000	14.7	Low Confidence	Requires manual review or additional data
Marshall	51431	14.5	Low Confidence	Requires manual review or additional data
Tishomingo	45545	14.5	Low Confidence	Requires manual review or additional data
Franklin	43942	14.4	Low Confidence	Requires manual review or additional data
Covington	40164	14.3	Low Confidence	Requires manual review or additional data
Jefferson	31544	13.2	Low Confidence	Requires manual review or additional data
Wayne	34875	12.9	Low Confidence	Requires manual review or additional data
Itawamba	57252	12.7	Low Confidence	Requires manual review or additional data
Winston	45516	12.7	Low Confidence	Requires manual review or additional data
Adams	37271	12.5	Low Confidence	Requires manual review or additional data
Quitman	31192	12.5	Low Confidence	Requires manual review or additional data
Prentiss	51578	12.3	Low Confidence	Requires manual review or additional data
Tallahatchie	35428	12.1	Low Confidence	Requires manual review or additional data
Webster	55657	12.1	Low Confidence	Requires manual review or additional data
Attala	42680	11.8	Low Confidence	Requires manual review or additional data
Jasper	43914	11.8	Low Confidence	Requires manual review or additional data
Neshoba	47400	11.6	Low Confidence	Requires manual review or additional data
Humphreys	31907	11.5	Low Confidence	Requires manual review or additional data
Stone	55894	11.5	Low Confidence	Requires manual review or additional data
Claiborne	34282	11.4	Low Confidence	Requires manual review or additional data
Yalobusha	47006	11.3	Low Confidence	Requires manual review or additional data
Washington	38394	11.1	Low Confidence	Requires manual review or additional data
Amite	34866	11.0	Low Confidence	Requires manual review or additional data
Noxubee	42298	10.9	Low Confidence	Requires manual review or additional data
Perry	48333	10.9	Low Confidence	Requires manual review or additional data
Grenada	45745	10.8	Low Confidence	Requires manual review or additional data
Choctaw	41887	10.7	Low Confidence	Requires manual review or additional data
Oktibbeha	42953	10.6	Low Confidence	Requires manual review or additional data
Sunflower	37403	10.6	Low Confidence	Requires manual review or additional data
Yazoo	41867	10.6	Low Confidence	Requires manual review or additional data
Clay	37412	10.5	Low Confidence	Requires manual review or additional data

Key Recommendations:

Your Task: Use your analysis results to provide specific guidance to the department.

Counties suitable for immediate algorithmic implementation:

These counties have High Confidence data (MOE < 5%), meaning the estimates are reliable enough for algorithmic allocation. Examples include DeSoto, Harrison, Jackson, Lamar, Lowndes, Madison, Monroe, Rankin, and Warren. These counties generally have larger or more affluent populations, which improves ACS survey precision. Algorithms can be implemented directly in these counties with minimal risk.

Counties requiring additional oversight:

These are Moderate Confidence counties (MOE 5–10%). Examples include Alcorn (8.7%), Benton (7.8%), Forrest (5.6%), Lee (5.4%), Pike (5.7%) and many others across Mississippi. While the ACS data is usable, there is enough uncertainty that algorithmic decisions could introduce bias or misallocation if left unchecked. Recommendation: Algorithms may be applied here, but outcomes should be monitored carefully. Suggested oversight strategies include:

Conducting periodic audits of funding allocations.
Comparing algorithmic outputs with administrative or programmatic data.
Implementing feedback loops from local service providers to catch anomalies.

Counties needing alternative approaches:

These are Low Confidence counties (MOE > 10%), such as Adams (12.5%), Covington (14.3%), Sharkey (24.9%), Wilkinson (23.6%), Carroll (34.5%) and many others. These counties tend to be smaller, more rural, or economically disadvantaged, which leads to higher survey uncertainty. Relying purely on ACS estimates here risks misidentifying need and exacerbating inequities. Recommendation: Algorithms should not be the sole basis for decision-making in these counties. Alternative approaches include:

Supplementing ACS data with local administrative records (e.g., school enrollment, SNAP participation).
Commissioning targeted surveys to improve reliability.
Using manual review and qualitative input from local stakeholders to guide funding decisions.

Questions for Further Investigation

Are certain racial or ethnic groups (e.g., Hispanic populations in smaller tracts) more likely to have unreliable estimates, and how might this bias algorithmic outcomes for those communities?
If we compare ACS estimates across multiple years, do the same counties consistently show high margins of error, or are reliability issues shifting over time?
Are counties with higher MOE values concentrated in rural regions, and how does this affect equitable distribution of resources across urban vs. rural Mississippi?

Technical Notes

Data Sources:
- U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates
- Retrieved via tidycensus R package on [26/09/2025]

Reproducibility:
- All analysis conducted in R version [4.5.1]
- Census API key required for replication
- Complete code and documentation available at: [https://musa-5080-fall-2025.github.io/portfolio-setup-Isabelliiii/]

Methodology Notes:

County selection was purposive: one high-confidence (DeSoto), one moderate-confidence (Alcorn), and one low-confidence (Adams) county were chosen for tract-level analysis to illustrate variation in data reliability.
Data cleaning steps included removing redundant suffixes from county names, calculating MOE percentages, and categorizing estimates into High, Moderate, and Low confidence tiers.
Demographic percentages (White, Black, Hispanic) were derived by dividing subgroup estimates by total population at the tract level.
Reliability thresholds (5% and 10% MOE) were selected based on standard practice in survey research, though different cutoffs could produce slightly different classifications.
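The sensitivity to cutoffs mentioned in the last note can be sketched with a handful of county MOE percentages taken from the recommendation table. The alternative cutoffs of 6% and 12% are hypothetical, chosen only to show how classifications near a boundary can move; the input values are the rounded percentages as printed.

```r
# Rounded MOE percentages for six counties, copied from the recommendation table.
moe_pct <- c(DeSoto = 3.2, Hancock = 5.0, Alcorn = 8.7,
             Clay = 10.5, Adams = 12.5, Carroll = 34.5)

# Same rule shape as the analysis: below high_cut is High,
# up to and including low_cut is Moderate, anything above is Low.
classify <- function(x, high_cut, low_cut) {
  unname(ifelse(x < high_cut, "High",
                ifelse(x <= low_cut, "Moderate", "Low")))
}

comparison <- data.frame(
  county   = names(moe_pct),
  moe_pct  = unname(moe_pct),
  cut_5_10 = classify(moe_pct, 5, 10),   # thresholds used in this report
  cut_6_12 = classify(moe_pct, 6, 12)    # hypothetical alternative
)

print(comparison)
```

Hancock and Clay each change tier under the alternative cutoffs, while counties far from a boundary, such as DeSoto and Carroll, keep the same classification.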

Limitations:

ACS margins of error are systematically higher in small and rural counties, producing uneven reliability across geographies.
At the tract level, nearly all estimates exceeded the 15% MOE threshold. This reflects inherent sampling limitations of the ACS rather than coding errors, since smaller tracts with limited populations yield highly unstable estimates. As a result, tract-level classifications of “high MOE” should be interpreted as relative signals of data quality rather than absolute indicators of unusability.
Small denominators for minority populations inflate MOEs disproportionately, which may bias interpretations of racial/ethnic patterns.
Recommendations are based on statistical reliability alone; qualitative knowledge and administrative data sources were not incorporated but would be essential in real-world policy design.

Submission Checklist

Before submitting your portfolio link on Canvas:

All code chunks run without errors
All “[Fill this in]” prompts have been completed
Tables are properly formatted and readable
Executive summary addresses all four required components
Portfolio navigation includes this assignment
Census API key is properly set
Document renders correctly to HTML

Remember: Submit your portfolio URL on Canvas, not the file itself. Your assignment should be accessible at your-portfolio-url/assignments/assignment_1/your_file_name.html