Adding BCM - Post Office Validation Exercise

Date: November 6, 2025
Survey ID: 5407
Exercise Name: BCM - Post Office Validation

Overview

This document details the steps taken to add the “BCM - Post Office Validation” exercise to the data quality automation app. This serves as a reference for adding similar exercises in the future.

Step 1: Identify the Survey

Interactive API Exploration

library(reticulate)
library(dplyr)
use_python(".venv/bin/python")
data_bridges_knots <- import("data_bridges_knots")
config_path <- "data_bridges_api_config.yaml"
client <- data_bridges_knots$DataBridgesShapes(config_path)

# Search for surveys
surveys <- client$get_household_surveys_list()
post_office_surveys <- surveys[grepl("post office", surveys$surveyName, ignore.case = TRUE), ]

Results

Found 3 surveys with “post office” in the name:

Survey 5407: “BCM - Post Office Validation - 2025” - 97 responses ✅
Survey 5406: “OSM - Post Office Validation - 2025” (already integrated)
Survey 5405: “BCM - Post Office Validation - 2025” - 0 responses (duplicate/empty)

Decision: Use Survey 5407
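
The same decision can be reproduced in code rather than by eye. The following is a minimal sketch on a toy data frame that stands in for the API result; only `surveyName` appears in the exploration code above, so the `surveyID` and `responseCount` column names are assumptions, and the response counts for surveys other than 5407 and 5405 are made up.

```r
# Toy stand-in for client$get_household_surveys_list().
# surveyID and responseCount are assumed column names, for illustration only.
surveys <- data.frame(
  surveyID = c(5407L, 5406L, 5405L, 5404L),
  surveyName = c(
    "BCM - Post Office Validation - 2025",
    "OSM - Post Office Validation - 2025",
    "BCM - Post Office Validation - 2025",
    "Welcome Meals"
  ),
  responseCount = c(97L, 50L, 0L, 10L),  # 50 and 10 are placeholder values
  stringsAsFactors = FALSE
)

# Keep BCM post office surveys that actually contain responses.
is_match <- grepl("post office", surveys$surveyName, ignore.case = TRUE) &
  grepl("^BCM", surveys$surveyName)
candidates <- surveys[is_match & surveys$responseCount > 0, ]
selected_id <- candidates$surveyID
print(selected_id)
```

Filtering on the response count is what separates Survey 5407 from its empty duplicate, since both share the same name.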

Step 2: Explore Data Structure

Key Columns Identified

survey_id <- as.integer(5407)
data <- client$get_household_survey(survey_id, "full")

# Data dimensions
dim(data)  # 97 responses x 55 columns

Important Columns

Demographics:
- Q2_1_gender: Gender (female/male)
Location:
- _1_6_Post_office_name: Post office codes (1-85, “bal_circle”)
- governorate: 11 governorates (Amman, Zarqa, Irbid, etc.)
Time Metrics:
- Q3_2_waitingtime: Waiting time in minutes
- Q3_3_processtime: Process time in minutes
Satisfaction:
- Q4_1_were_trated_respectfully: All respondents said yes (1)
- Q4_2_satisfied_w_validation_process: 93 yes, 4 no
- Q3_6_comfortable_using_validation_equipment: 90 yes, 7 no
Awareness:
- Q2_1_how_did_you_know_about_validation: Multi-select (1, 2, 3, 5, “1 2”, “1 3”, “2 1”, “3 1”)
Monitor:
- monitor_name: 11 different monitors

Data Distribution

Post offices: 44 unique locations (Arabic names in metadata)
Governorates: 11 (Amman: 21, Mafraq: 17, Karak: 14, Zarqa: 13, etc.)
Monitors: 11 (Meshael: 24, Malek: 15, Ahmad: 15, etc.)
Gender: Female: 58, Male: 39
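
Counts like the ones above can be produced with base R's table(). This is a minimal sketch on made-up rows that reuse the column names from this survey; it does not reproduce the real totals.

```r
# Toy rows with the survey's column names (values are made up).
toy <- data.frame(
  governorate = c("Amman", "Amman", "Mafraq", "Karak", "Amman", "Mafraq"),
  monitor_name = c("Meshael", "Malek", "Meshael", "Ahmad", "Meshael", "Malek"),
  Q2_1_gender = c("female", "male", "female", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Frequency of each value in one column, most common first.
count_values <- function(df, column) {
  sort(table(df[[column]], useNA = "ifany"), decreasing = TRUE)
}

gov_counts <- count_values(toy, "governorate")
gender_counts <- count_values(toy, "Q2_1_gender")
n_monitors <- length(unique(toy$monitor_name))
print(gov_counts)
```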

Step 3: Process Metadata

Find Metadata File

Located in: metadata/raw/20251103/General Food Assistance xls/BCM_PO_validation_2025.xlsx

Add Filename Mapping

File: app/R/metadata_helpers.r

map_metadata_to_survey_ids <- function(metadata_list) {
  filename_mappings <- list(
    "Welcome Meals.xlsx" = "5404",
    "post_offices_osm_july_2025.xlsx" = "5406",
    "BCM_HD_validation_2025.xlsx" = "5387",
    "BCM_PO_validation_2025.xlsx" = "5407",  # ADDED
    "helpdesk_validation_OSM_2025.xlsx" = "5405",
    "wfp_hd_monitoring_check_List.xlsx" = "5373"
  )
  # ...
}

Process and Save Metadata

library(here)
library(dplyr)
source(here('app/R/metadata_helpers.r'))

# Load and process
xlsform_path <- here('metadata/raw/20251103/General Food Assistance xls/BCM_PO_validation_2025.xlsx')
xlsform <- load_xlsform(xlsform_path)
metadata <- process_xlsform(xlsform)

# Map and save
metadata_list <- list()
metadata_list[['BCM_PO_validation_2025.xlsx']] <- metadata
mapped <- map_metadata_to_survey_ids(metadata_list)
save_processed_metadata(mapped, here('metadata/processed'))

Metadata Contents

Variables: 42 variables extracted
Choice Lists: 7 lists
- complaint: Complaint types
- field: Field monitor names
- gender: Male/Female
- governorate: 12 Jordanian governorates
- post_name: 45+ post office names (in Arabic)
- source: Information sources (SMS, Word of mouth, Facebook, Leaflets, Other)
- yes_no: Yes/No values

Output Files:

metadata/processed/5407_metadata.rds
metadata/processed/all_metadata.rds (updated)

Step 4: Create Exercise Definition

File: app/exercises/bcm_post_office_validation.r

Key Configuration

exercise <- list(
  id = "5407",
  name = "BCM - Post Office Validation",
  type = "survey",
  description = "Jordan BCM post office validation survey data quality",
  
  # Filter configuration
  filter_column = "_1_6_Post_office_name",
  filter_label = "Post Office",
  
  # Data fetching
  fetch_data = function(client) {
    databridges_survey_fetch(
      client = client,
      survey_id = 5407L,
      access_type = "full"
    )
  },
  
  # Visualizations with metadata
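  visualizations = function(data) {
    # Fall back to NULL so the dashboard still renders without metadata
    metadata <- tryCatch(
      load_survey_metadata("5407", "../metadata/processed"),
      error = function(e) NULL
    )
    bcm_post_office_validation_dashboard(data, metadata = metadata)
  },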
  
  # Validation rules (7 rules total)
  run_validations = function(data) {
    # ID columns for CSV export
    options('affirm.id_cols' = c(
      "date", "_1_6_Post_office_name", "governorate", 
      "_submission_time", "start", "end", "_duration", "monitor_name"
    ))
    
    data |>
      duration_check_rule()(id = 1) |>              # 5 min - 24 hours
      same_day_submission_rule()(id = 2) |>         # Same day submission
      late_submission_rule()(id = 3) |>             # Late submission flag
      time_outlier_rule()(id = 4) |>                # Waiting/process time outliers
      conditional_field_rule()(id = 5) |>           # Q2.3.1 & Q4.2.1 logic
      respectful_treatment_flag()(id = 6) |>        # Q4.1 = No flag
      missing_data_rule()(id = 7)                   # Missing data check
  }
)

Step 5: Create Dashboard Function

File: app/R/visualization_helpers.r

Dashboard Structure

bcm_post_office_validation_dashboard <- function(data, metadata = NULL) {
  # Helper for metadata labels
  get_label <- function(column_name, fallback) {
    if (!is.null(metadata)) {
      label <- get_variable_label(metadata, column_name)
      if (length(label) == 1 && !is.na(label) && label != column_name) return(label)
    }
    return(fallback)
  }

  tagList(
    # 1. Service Time Metrics
    #    - Waiting time
    #    - Process time
    
    # 2. Beneficiary Satisfaction
    #    - Treated respectfully
    #    - Satisfied with process
    
    # 3. User Experience
    #    - Comfortable using equipment
    
    # 4. Demographics
    #    - Gender distribution
    
    # 5. Awareness & Information
    #    - How they learned about validation
    
    # 6. Coverage by Post Office
    #    - Visits by post office
    
    # 7. Geographic Coverage
    #    - Visits by governorate
    
    # 8. Monitor Activity
    #    - Surveys by monitor
  )
}

Visualization Components

Time Metrics: time_metric_card() for waiting/process times
Satisfaction: satisfaction_metric_card() for binary yes/no questions
Breakdowns: breakdown_card() for categorical distributions with metadata labels

Step 6: Registration

The exercise is automatically registered because:

The file exists in app/exercises/ directory
It defines an exercise object with an id field
The load_exercises() function in app/R/exercise_registry.r automatically loads all .r files

No manual registration needed!
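
To illustrate the idea of directory-based registration, here is a minimal sketch of such a loader. It is not the actual load_exercises() implementation, which is not shown in this document; `load_exercises_sketch` and the demo files are hypothetical.

```r
# Hypothetical loader; the real load_exercises() may work differently.
load_exercises_sketch <- function(dir) {
  files <- list.files(dir, pattern = "\\.[rR]$", full.names = TRUE)
  registry <- list()
  for (file in files) {
    # Source each file into its own environment to avoid name clashes.
    env <- new.env(parent = globalenv())
    sys.source(file, envir = env)
    # Register only files that define an `exercise` list with an id.
    if (exists("exercise", envir = env, inherits = FALSE)) {
      ex <- get("exercise", envir = env)
      if (!is.null(ex$id)) registry[[ex$id]] <- ex
    }
  }
  registry
}

# Demo with a throwaway directory containing one exercise file.
demo_dir <- file.path(tempdir(), "exercises_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(
  'exercise <- list(id = "5407", name = "BCM - Post Office Validation")',
  file.path(demo_dir, "bcm_post_office_validation.r")
)
writeLines("helper <- function() NULL", file.path(demo_dir, "no_exercise.r"))

registry <- load_exercises_sketch(demo_dir)
print(names(registry))
```

Keying the registry by `id` is why each exercise file needs a unique `id` field.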

Step 7: Testing

Launch the App

# From the app/ directory
shiny::runApp()

Verify

Exercise appears in dropdown: “BCM - Post Office Validation”
Data loads: 97 responses displayed
Column Details: Shows variable labels from metadata
Filter works: Can filter by Post Office
Visualizations display:
- Time metrics show average waiting/process times
- Satisfaction cards show percentages
- Breakdowns show proper labels (not codes)
- Post office names show Arabic text
- Governorates, monitors, and gender all labeled correctly
Multi-select handling: “How did you know” question shows combined labels like “SMS, Facebook”
Validation runs: Can export validation results

Key Technical Details

Multi-Select Question Handling

The breakdown_card() function was previously enhanced to handle multi-select questions as follows:

Values like "1 2" are normalized to sorted order before counting
Display labels are created by splitting codes and mapping each to its label
Results are consolidated (e.g., "1 2" and "2 1" counted together)
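
The three steps above can be sketched as follows. This is an illustration of the technique, not the code inside breakdown_card(); `normalize_multi` and `label_multi` are hypothetical helpers, and the labels are the `source` choices listed in this document.

```r
# Labels for the `source` choice list, as listed in this document.
source_labels <- c(
  "1" = "SMS", "2" = "Word of mouth", "3" = "Facebook",
  "4" = "Leaflets", "5" = "Other"
)

# Sort the codes inside one answer so "2 1" and "1 2" become the same key.
# Sorting is alphabetical, which is enough for single-digit codes.
normalize_multi <- function(value) {
  codes <- strsplit(trimws(value), "\\s+")[[1]]
  paste(sort(codes), collapse = " ")
}

# Turn a normalized key such as "1 3" into "SMS, Facebook".
# Codes without a label are shown as-is.
label_multi <- function(key, labels) {
  codes <- strsplit(key, " ", fixed = TRUE)[[1]]
  mapped <- ifelse(codes %in% names(labels), labels[codes], codes)
  paste(mapped, collapse = ", ")
}

answers <- c("1", "1 2", "2 1", "3 1", "1 3", "5")  # toy responses
keys <- vapply(answers, normalize_multi, character(1), USE.NAMES = FALSE)
tab <- table(keys)
counts <- setNames(
  as.integer(tab),
  vapply(names(tab), label_multi, character(1),
         labels = source_labels, USE.NAMES = FALSE)
)
print(counts)
```

Normalizing before counting means the order in which a respondent ticked the options does not split one combination into two bars.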

Metadata Integration

The exercise loads metadata in the visualizations() function:

metadata <- tryCatch({
  load_survey_metadata("5407", "../metadata/processed")
}, error = function(e) {
  NULL
})

This metadata is then passed to dashboard functions and visualization helpers for:

Variable label lookup via get_variable_label(metadata, column_name)
Choice value mapping via choice lists (e.g., metadata$choices$post_name)
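
As an illustration of choice value mapping, the sketch below assumes each choice list is a data frame with XLSForm's `name` (code) and `label` columns. That shape is an assumption; the structure written by process_xlsform() may differ, and `map_choice_labels` is a hypothetical helper.

```r
# Assumed metadata shape: one data frame per choice list with `name` and
# `label` columns. The real processed structure may differ.
metadata <- list(
  choices = list(
    gender = data.frame(
      name = c("female", "male"),
      label = c("Female", "Male"),
      stringsAsFactors = FALSE
    )
  )
)

# Map codes to labels, keeping the raw code when no label is found
# or when no choice list is available at all.
map_choice_labels <- function(values, choice_list) {
  values <- as.character(values)
  if (is.null(choice_list)) return(values)
  idx <- match(values, choice_list$name)
  ifelse(is.na(idx), values, choice_list$label[idx])
}

labelled <- map_choice_labels(c("male", "female", "unknown"),
                              metadata$choices$gender)
no_metadata <- map_choice_labels("male", NULL)
print(labelled)
```

Falling back to the raw code mirrors the tryCatch() above: missing metadata degrades the labels but does not break the dashboard.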

Choice Lists Used

gender: Maps 0/1 or male/female codes to labels
source: Maps 1-5 to SMS, Word of mouth, Facebook, Leaflets, Other
post_name: Maps numeric codes to Arabic post office names
governorate: Maps governorate codes to proper names
field: Maps monitor name codes (if needed)

Files Changed

app/R/metadata_helpers.r: Added "BCM_PO_validation_2025.xlsx" = "5407" mapping
app/exercises/bcm_post_office_validation.r: New exercise definition (70 lines)
app/R/visualization_helpers.r: Added bcm_post_office_validation_dashboard() function (110 lines)
metadata/processed/5407_metadata.rds: New processed metadata file
metadata/processed/all_metadata.rds: Updated with new survey

Step 8: Additional Validations Based on Requirements

After reviewing the official requirements document, additional validation rules were implemented:

New Validation Rules Added

File: app/R/validation_rules.r

1. Time Outlier Rule (`time_outlier_rule`)

Flags waiting and process times that are significantly above or below average:

Q3.2 Waiting time: Flags values outside 0.5x to 2x the average (average is about 10 min)
Q3.3 Process time: Flags values outside 0.5x to 2x the average (average is about 3.5 min)
Purpose: Requirement “3.1 & 3.2 – highlight values that are above or below average entries”
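
The 0.5x to 2x band can be expressed in a few lines. This is a minimal sketch of the thresholding logic only; `flag_time_outliers` is a hypothetical helper and does not reproduce how time_outlier_rule() reports its results.

```r
# Flag values below `lower` x mean or above `upper` x mean.
# The 0.5 and 2 multipliers come from the rule description above.
# Missing values are never flagged here; they belong to the missing data rule.
flag_time_outliers <- function(x, lower = 0.5, upper = 2) {
  avg <- mean(x, na.rm = TRUE)
  !is.na(x) & (x < lower * avg | x > upper * avg)
}

waiting <- c(10, 12, 8, 3, 25, NA, 11)  # toy waiting times in minutes
flags <- flag_time_outliers(waiting)
print(waiting[flags])
```

With these toy values the average is 11.5 minutes, so the band is 5.75 to 23 minutes and only the 3 and 25 minute entries are flagged.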

2. Conditional Field Rule (`conditional_field_rule`)

Ensures follow-up questions are answered when triggered:

Q2.3.1: Must be answered when Q2.3 = No (didn’t know which PO)
Q4.2.1: Must be answered when Q4.2 = No (not satisfied with process)
Purpose: Requirement “2.3.1 and 4.2.1 should not be NA or N/A”
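
The conditional logic can be sketched as below. The column names in the toy data are placeholders rather than the real export names, `missing_followup` is a hypothetical helper, and "No" is assumed to be coded as 0, in line with the yes/no coding described elsewhere in this document.

```r
# TRUE where the trigger question is "No" but the follow-up is unanswered.
# NA, "", "NA" and "N/A" all count as unanswered, per the requirement text.
missing_followup <- function(trigger, followup, no_value = "0") {
  answer <- trimws(as.character(followup))
  unanswered <- is.na(answer) | answer %in% c("", "NA", "N/A")
  triggered <- !is.na(trigger) & as.character(trigger) == no_value
  triggered & unanswered
}

# Toy data; column names are placeholders, not the real export names.
toy <- data.frame(
  q4_2_satisfied = c("1", "0", "0", "0", NA),
  q4_2_1_reason = c(NA, "Long queue", "N/A", NA, NA),
  stringsAsFactors = FALSE
)
toy$flag <- missing_followup(toy$q4_2_satisfied, toy$q4_2_1_reason)
print(toy)
```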

3. Respectful Treatment Flag (`respectful_treatment_flag`)

Flags any response where Q4.1 (treated respectfully) = No:

Q4.1: Flags if value = 0 (not treated respectfully)
Purpose: Requirement “flag 4.1 if the answer is ‘No’”
Note: Current data shows 0 “No” responses (all 97 respondents said Yes)

Dashboard Enhancement

Added Average Survey Duration display:

Shows mean duration across all surveys
Displays value in minutes
Includes count of surveys
Purpose: Requirement “Average time duration”
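
A minimal sketch of the calculation behind this display, computed from the `start` and `end` columns. The timestamp format is an assumption about the export, and the rows are toy data; the dashboard itself may derive the value from `_duration` instead.

```r
# Toy start/end timestamps; the "YYYY-MM-DD HH:MM:SS" format is assumed.
toy <- data.frame(
  start = c("2025-11-03 09:00:00", "2025-11-03 10:15:00", "2025-11-04 08:30:00"),
  end = c("2025-11-03 09:12:00", "2025-11-03 10:21:00", "2025-11-04 08:48:00"),
  stringsAsFactors = FALSE
)

start_time <- as.POSIXct(toy$start, tz = "UTC")
end_time <- as.POSIXct(toy$end, tz = "UTC")

# Duration of each survey in minutes, then the mean and survey count.
duration_min <- as.numeric(difftime(end_time, start_time, units = "mins"))
summary_text <- sprintf(
  "Average duration: %.1f min (n = %d)",
  mean(duration_min), length(duration_min)
)
print(summary_text)
```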

Validation Coverage Summary

From requirements document:

Part 1 - Data Quality:

❌ GPS check (not implemented - skipped per user request)
⚠️ Q2.1.1a “Other” text validation (not implemented - low priority)
✅ Q2.3.1 and Q4.2.1 conditional checks (implemented)
✅ Time outlier detection for requirement items 3.1 & 3.2, i.e. waiting time (Q3_2_waitingtime) and process time (Q3_3_processtime) (implemented)

Part 2 - Survey Practice:

✅ Duration 5 min - 24 hours (existing duration_check_rule)
✅ Average time duration display (added to dashboard)
✅ Same-day submission check (existing same_day_submission_rule)
❌ Case ID/phone duplicate check (N/A - these columns don’t exist in data)
✅ Late submission flagging (existing late_submission_rule)

Part 2.2 - Preliminary Analysis:

✅ Flag Q4.1 = No (implemented respectful_treatment_flag)

Total Validation Rules: 7

Duration check (5 min - 24 hours)
Same-day submission (submitted same day as collection)
Late submission (flag delays)
Time outliers (NEW - waiting/process time outliers)
Conditional fields (NEW - Q2.3.1 & Q4.2.1 logic)
Respectful treatment (NEW - flag Q4.1 = No)
Missing data (general missing data check)
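
For orientation, the first two checks in this list can be sketched on toy timestamps as follows. This illustrates the logic only and is not the implementation of duration_check_rule() or same_day_submission_rule(); `check_survey_timing` is a hypothetical helper and the timestamp format is assumed.

```r
# Returns one row per survey with two logical checks:
#   duration_ok: survey took between 5 minutes and 24 hours
#   same_day:    submission date equals the collection (start) date
check_survey_timing <- function(start, end, submitted, tz = "UTC") {
  start <- as.POSIXct(start, tz = tz)
  end <- as.POSIXct(end, tz = tz)
  submitted <- as.POSIXct(submitted, tz = tz)
  minutes <- as.numeric(difftime(end, start, units = "mins"))
  data.frame(
    duration_ok = minutes >= 5 & minutes <= 24 * 60,
    same_day = as.Date(start, tz = tz) == as.Date(submitted, tz = tz)
  )
}

result <- check_survey_timing(
  start = c("2025-11-03 09:00:00", "2025-11-03 10:00:00", "2025-11-03 11:00:00"),
  end = c("2025-11-03 09:12:00", "2025-11-03 10:02:00", "2025-11-03 11:20:00"),
  submitted = c("2025-11-03 09:13:00", "2025-11-03 10:03:00", "2025-11-04 08:00:00")
)
print(result)
```

In this toy data the second survey fails the duration check (2 minutes) and the third fails the same-day check (submitted the next morning).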

Summary

Successfully integrated BCM Post Office Validation (Survey 5407) with:

✅ 97 survey responses
✅ 42 variables with labels
✅ 7 choice lists for value mapping
✅ 9 visualization sections (including average duration)
✅ 7 validation rules (3 new survey-specific rules)
✅ Full metadata integration
✅ Multi-select question support
✅ Arabic text support for post office names
✅ Compliance with official requirements document

The exercise is now live and accessible in the app dropdown!

Future Reference

When adding new exercises, follow this pattern:

Identify survey ID via API exploration
Explore data structure and key columns
Find and process metadata file
Add filename mapping to metadata_helpers.r
Create exercise definition in exercises/
Create dashboard function in visualization_helpers.r
Test thoroughly

For more details, see:

/docs/guides/ADDING_NEW_EXERCISES.md
/docs/guides/ADDING_METADATA.md