Adding BCM - Post Office Validation Exercise
Date: November 6, 2025
Survey ID: 5407
Exercise Name: BCM - Post Office Validation
Overview
This document details the steps taken to add the “BCM - Post Office Validation” exercise to the data quality automation app. This serves as a reference for adding similar exercises in the future.
Step 1: Identify the Survey
Interactive API Exploration
library(reticulate)
library(dplyr)
use_python(".venv/bin/python")
data_bridges_knots <- import("data_bridges_knots")
config_path <- "data_bridges_api_config.yaml"
client <- data_bridges_knots$DataBridgesShapes(config_path)
# Search for surveys
surveys <- client$get_household_surveys_list()
post_office_surveys <- surveys[grepl("post office", surveys$surveyName, ignore.case = TRUE), ]
Results
Found 3 surveys with “post office” in the name:
- Survey 5407: “BCM - Post Office Validation - 2025” - 97 responses ✅
- Survey 5406: “OSM - Post Office Validation - 2025” (already integrated)
- Survey 5405: “BCM - Post Office Validation - 2025” - 0 responses (duplicate/empty)
Decision: Use Survey 5407 (the only BCM post office survey with responses; 5405 is an empty duplicate)
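The selection above can be sketched as a small helper. It keeps name matches that have at least one response and returns the ID of the fullest survey. This is a minimal sketch, not the app's code. `pick_survey()` is hypothetical, and the `surveyId` and `responses` column names are assumptions about the data frame returned by `get_household_surveys_list()`.

```r
# Toy stand-in for client$get_household_surveys_list(); the surveyId and
# responses column names are assumptions, and 5406's count is left unknown.
surveys <- data.frame(
  surveyId   = c(5405L, 5406L, 5407L),
  surveyName = c("BCM - Post Office Validation - 2025",
                 "OSM - Post Office Validation - 2025",
                 "BCM - Post Office Validation - 2025"),
  responses  = c(0L, NA, 97L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive name match, drop empty duplicates,
# then return the ID of the survey with the most responses.
pick_survey <- function(surveys, pattern) {
  hits <- surveys[grepl(pattern, surveys$surveyName, ignore.case = TRUE), ]
  hits <- hits[!is.na(hits$responses) & hits$responses > 0, ]
  hits$surveyId[which.max(hits$responses)]
}

pick_survey(surveys, "^BCM - post office")  # 5407
```

If nothing matches, the helper returns an empty integer vector rather than erroring.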
Step 2: Explore Data Structure
Key Columns Identified
survey_id <- as.integer(5407)
data <- client$get_household_survey(survey_id, "full")
# Data dimensions
# 97 responses x 55 columns
Important Columns
- Demographics:
Q2_1_gender: Gender (female/male)
- Location:
_1_6_Post_office_name: Post office codes (1-85, “bal_circle”)
governorate: 11 governorates (Amman, Zarqa, Irbid, etc.)
- Time Metrics:
Q3_2_waitingtime: Waiting time in minutes
Q3_3_processtime: Process time in minutes
- Satisfaction:
Q4_1_were_trated_respectfully: All respondents said yes (1)
Q4_2_satisfied_w_validation_process: 93 yes, 4 no
Q3_6_comfortable_using_validation_equipment: 90 yes, 7 no
- Awareness:
Q2_1_how_did_you_know_about_validation: Multi-select (1, 2, 3, 5, “1 2”, “1 3”, “2 1”, “3 1”)
- Monitor:
monitor_name: 11 different monitors
Data Distribution
Post offices: 44 unique locations (Arabic names in metadata)
Governorates: 11 (Amman: 21, Mafraq: 17, Karak: 14, Zarqa: 13, etc.)
Monitors: 11 (Meshael: 24, Malek: 15, Ahmad: 15, etc.)
Gender: Female: 58, Male: 39
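Counts like these can be reproduced with base R's `table()`. The sketch below runs on a few toy rows, not on the real 97 responses. The `governorate`, `monitor_name` and `Q2_1_gender` column names come from the list above. `summarise_distribution()` is a hypothetical helper.

```r
# Toy rows standing in for the real survey data (values are illustrative only)
data <- data.frame(
  governorate  = c("Amman", "Amman", "Mafraq", "Zarqa", "Amman"),
  monitor_name = c("Meshael", "Malek", "Meshael", "Ahmad", "Meshael"),
  Q2_1_gender  = c("female", "male", "female", "female", "male"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one frequency table per column, largest count first
summarise_distribution <- function(data, cols) {
  lapply(setNames(cols, cols), function(col) {
    sort(table(data[[col]]), decreasing = TRUE)
  })
}

dist <- summarise_distribution(data, c("governorate", "monitor_name", "Q2_1_gender"))
dist$governorate                  # Amman (3) comes first
length(unique(data$governorate))  # number of distinct governorates
```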
Step 3: Process Metadata
Find Metadata File
Located in: metadata/raw/20251103/General Food Assistance xls/BCM_PO_validation_2025.xlsx
Add Filename Mapping
File: app/R/metadata_helpers.r
map_metadata_to_survey_ids <- function(metadata_list) {
filename_mappings <- list(
"Welcome Meals.xlsx" = "5404",
"post_offices_osm_july_2025.xlsx" = "5406",
"BCM_HD_validation_2025.xlsx" = "5387",
"BCM_PO_validation_2025.xlsx" = "5407", # ADDED
"helpdesk_validation_OSM_2025.xlsx" = "5405",
"wfp_hd_monitoring_check_List.xlsx" = "5373"
)
# ...
}
Process and Save Metadata
library(here)
library(dplyr)
source(here('app/R/metadata_helpers.r'))
# Load and process
xlsform_path <- here('metadata/raw/20251103/General Food Assistance xls/BCM_PO_validation_2025.xlsx')
xlsform <- load_xlsform(xlsform_path)
metadata <- process_xlsform(xlsform)
# Map and save
metadata_list <- list()
metadata_list[['BCM_PO_validation_2025.xlsx']] <- metadata
mapped <- map_metadata_to_survey_ids(metadata_list)
save_processed_metadata(mapped, here('metadata/processed'))
Metadata Contents
- Variables: 42 variables extracted
- Choice Lists: 7 lists
complaint: Complaint types
field: Field monitor names
gender: Male/Female
governorate: 12 Jordanian governorates
post_name: 45+ post office names (in Arabic)
source: Information sources (SMS, Word of mouth, Facebook, Leaflets, Other)
yes_no: Yes/No values
Output Files:
metadata/processed/5407_metadata.rds
metadata/processed/all_metadata.rds (updated)
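The filename mapping renames entries of the metadata list, replacing each XLSForm filename with its survey ID. The sketch below is a hypothetical re-implementation for illustration only. `map_by_filename()` is not the project's function, and the real `map_metadata_to_survey_ids()` may handle unmapped files differently. Warning on unmapped files is a design choice: it makes a forgotten mapping entry visible instead of silently dropping that survey's metadata.

```r
# Subset of the mappings from app/R/metadata_helpers.r
filename_mappings <- c(
  "BCM_PO_validation_2025.xlsx"     = "5407",
  "post_offices_osm_july_2025.xlsx" = "5406"
)

# Hypothetical: rename list entries from XLSForm filename to survey ID,
# warning about (and dropping) files with no mapping entry.
map_by_filename <- function(metadata_list, mappings) {
  ids <- unname(mappings[names(metadata_list)])
  unmapped <- names(metadata_list)[is.na(ids)]
  if (length(unmapped) > 0) {
    warning("No survey ID mapping for: ", paste(unmapped, collapse = ", "))
  }
  result <- metadata_list[!is.na(ids)]
  names(result) <- ids[!is.na(ids)]
  result
}

metadata_list <- list("BCM_PO_validation_2025.xlsx" = list(n_variables = 42))
mapped <- map_by_filename(metadata_list, filename_mappings)
names(mapped)  # "5407"
```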
Step 4: Create Exercise Definition
File: app/exercises/bcm_post_office_validation.r
Key Configuration
exercise <- list(
id = "5407",
name = "BCM - Post Office Validation",
type = "survey",
description = "Jordan BCM post office validation survey data quality",
# Filter configuration
filter_column = "_1_6_Post_office_name",
filter_label = "Post Office",
# Data fetching
fetch_data = function(client) {
databridges_survey_fetch(
client = client,
survey_id = 5407L,
access_type = "full"
)
},
# Visualizations with metadata
visualizations = function(data) {
metadata <- load_survey_metadata("5407", "../metadata/processed")
bcm_post_office_validation_dashboard(data, metadata = metadata)
},
# Validation rules (7 rules total)
run_validations = function(data) {
# ID columns for CSV export
options('affirm.id_cols' = c(
"date", "_1_6_Post_office_name", "governorate",
"_submission_time", "start", "end", "_duration", "monitor_name"
))
data |>
duration_check_rule()(id = 1) |> # 5 min - 24 hours
same_day_submission_rule()(id = 2) |> # Same day submission
late_submission_rule()(id = 3) |> # Late submission flag
time_outlier_rule()(id = 4) |> # Waiting/process time outliers
conditional_field_rule()(id = 5) |> # Q2.3.1 & Q4.2.1 logic
respectful_treatment_flag()(id = 6) |> # Q4.1 = No flag
missing_data_rule()(id = 7) # Missing data check
}
)
Step 5: Create Dashboard Function
File: app/R/visualization_helpers.r
Dashboard Structure
bcm_post_office_validation_dashboard <- function(data, metadata = NULL) {
# Helper for metadata labels
get_label <- function(column_name, fallback) {
if (!is.null(metadata)) {
label <- get_variable_label(metadata, column_name)
if (isTRUE(label != column_name)) return(label) # also skips NULL/NA labels
}
return(fallback)
}
tagList(
# 1. Service Time Metrics
# - Waiting time
# - Process time
# 2. Beneficiary Satisfaction
# - Treated respectfully
# - Satisfied with process
# 3. User Experience
# - Comfortable using equipment
# 4. Demographics
# - Gender distribution
# 5. Awareness & Information
# - How they learned about validation
# 6. Coverage by Post Office
# - Visits by post office
# 7. Geographic Coverage
# - Visits by governorate
# 8. Monitor Activity
# - Surveys by monitor
)
}
Visualization Components
- Time Metrics: time_metric_card() for waiting/process times
- Satisfaction: satisfaction_metric_card() for binary yes/no questions
- Breakdowns: breakdown_card() for categorical distributions with metadata labels
Step 6: Registration
The exercise is automatically registered because:
- The file exists in the app/exercises/ directory
- It defines an exercise object with an id field
- The load_exercises() function in app/R/exercise_registry.r automatically loads all .r files
No manual registration needed!
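The auto-registration described above follows a common directory-scan pattern. Below is a hypothetical sketch of such a loader. `load_exercises_sketch()` is illustrative, and the real `load_exercises()` may differ. Each `.r` file is sourced into its own environment, so helper objects in one exercise file cannot overwrite another's. Only files that define an `exercise` with an `id` are kept.

```r
# Hypothetical directory-scan registry; the real load_exercises() in
# app/R/exercise_registry.r may differ.
load_exercises_sketch <- function(dir) {
  files <- list.files(dir, pattern = "\\.[rR]$", full.names = TRUE)
  registry <- list()
  for (f in files) {
    env <- new.env(parent = globalenv())  # isolate each file's objects
    sys.source(f, envir = env)
    if (exists("exercise", envir = env, inherits = FALSE)) {
      ex <- get("exercise", envir = env)
      if (!is.null(ex$id)) registry[[as.character(ex$id)]] <- ex
    }
  }
  registry
}

# Usage with a temporary exercises/ directory
dir <- tempfile("exercises_demo")
dir.create(dir)
writeLines('exercise <- list(id = "5407", name = "BCM - Post Office Validation")',
           file.path(dir, "bcm_post_office_validation.r"))
writeLines("helper_value <- 1", file.path(dir, "not_an_exercise.r"))
reg <- load_exercises_sketch(dir)
names(reg)  # "5407"
```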
Step 7: Testing
Launch the App
# From the app/ directory
shiny::runApp()
Verify
- Exercise appears in dropdown: “BCM - Post Office Validation”
- Data loads: 97 responses displayed
- Column Details: Shows variable labels from metadata
- Filter works: Can filter by Post Office
- Visualizations display:
- Time metrics show average waiting/process times
- Satisfaction cards show percentages
- Breakdowns show proper labels (not codes)
- Post office names show Arabic text
- Governorates, monitors, and gender all labeled correctly
- Multi-select handling: “How did you know” question shows combined labels like “SMS, Facebook”
- Validation runs: Can export validation results
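The multi-select behaviour checked above produces combined labels such as "SMS, Facebook". It works by normalizing and relabeling space-separated codes, and the normalization and labeling steps can be sketched in base R as below. The helpers are hypothetical and are not the actual `breakdown_card()` internals. The `source` labels assume that codes 1-5 map in the order listed in the metadata (SMS, Word of mouth, Facebook, Leaflets, Other).

```r
# Assumed code-to-label mapping for the "source" choice list
source_labels <- c("1" = "SMS", "2" = "Word of mouth", "3" = "Facebook",
                   "4" = "Leaflets", "5" = "Other")

# Hypothetical: "2 1" -> "1 2"; non-numeric tokens are dropped, empty results become NA
normalize_multiselect <- function(x) {
  parts <- strsplit(trimws(as.character(x)), "\\s+")
  vapply(parts, function(codes) {
    codes <- suppressWarnings(as.integer(codes))
    codes <- sort(unique(codes[!is.na(codes)]))
    if (length(codes) == 0) NA_character_ else paste(codes, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical: "1 3" -> "SMS, Facebook"; unknown codes are kept as-is
label_multiselect <- function(normalized, labels) {
  vapply(strsplit(normalized, " "), function(codes) {
    if (length(codes) == 1 && is.na(codes)) return(NA_character_)
    mapped <- unname(labels[codes])
    mapped[is.na(mapped)] <- codes[is.na(mapped)]
    paste(mapped, collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
}

answers <- c("1", "2 1", "1 2", "3 1", "5", NA)
norm <- normalize_multiselect(answers)
table(label_multiselect(norm, source_labels))  # "1 2" and "2 1" share one bucket
```

Normalizing before counting is what consolidates "1 2" and "2 1". Labeling only afterwards keeps the lookup to one pass per distinct combination.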
Key Technical Details
Multi-Select Question Handling
The breakdown_card() function was previously enhanced to handle multi-select questions where:
- Space-separated values such as "2 1" are normalized to sorted order ("1 2") before counting
- Display labels are created by splitting the codes and mapping each one to its label
- Results are consolidated (e.g., "1 2" and "2 1" are counted together)
Metadata Integration
The exercise loads metadata in the visualizations() function:
metadata <- tryCatch({
load_survey_metadata("5407", "../metadata/processed")
}, error = function(e) {
NULL
})
This metadata is then passed to dashboard functions and visualization helpers for:
- Variable label lookup via get_variable_label(metadata, column_name)
- Choice value mapping via choice lists (e.g., metadata$choices$post_name)
Choice Lists Used
- gender: Maps 0/1 or male/female codes to labels
- source: Maps 1-5 to SMS, Word of mouth, Facebook, Leaflets, Other
- post_name: Maps numeric codes to Arabic post office names
- governorate: Maps governorate codes to proper names
- field: Maps monitor name codes (if needed)
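Mapping stored codes to display labels is a lookup against the relevant choice list. This sketch assumes that a processed choice list is a data frame with XLSForm-style `name` (stored value) and `label` columns. That may not match the structure of `metadata$choices` exactly. `map_choice_labels()` is hypothetical. Unknown codes pass through unchanged, so they stay visible in breakdowns instead of collapsing into NA.

```r
# Assumed structure: XLSForm-style choices with `name` (stored value)
# and `label` (display text) columns.
gender_choices <- data.frame(
  name  = c("female", "male"),
  label = c("Female", "Male"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: unknown codes pass through unchanged, NA stays NA
map_choice_labels <- function(values, choices) {
  values <- as.character(values)
  out <- choices$label[match(values, as.character(choices$name))]
  ifelse(is.na(out), values, out)
}

map_choice_labels(c("male", "female", "other", NA), gender_choices)
# "Male" "Female" "other" NA
```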
Files Changed
- app/R/metadata_helpers.r: Added "BCM_PO_validation_2025.xlsx" = "5407" mapping
- app/exercises/bcm_post_office_validation.r: New exercise definition (70 lines)
- app/R/visualization_helpers.r: Added bcm_post_office_validation_dashboard() function (110 lines)
- metadata/processed/5407_metadata.rds: New processed metadata file
- metadata/processed/all_metadata.rds: Updated with new survey
Step 8: Additional Validations Based on Requirements
After reviewing the official requirements document, additional validation rules were implemented:
New Validation Rules Added
File: app/R/validation_rules.r
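The pipeline in Step 4 calls each rule as `rule()(id = n)`. The rule constructor returns a function, and the native pipe passes the data as that function's first argument. The sketch below illustrates this shape with a plain logical flag column, using the Q4.1 check described below as the example. `make_flag_rule()`, `respectful_treatment_sketch()` and the `rule_<id>` column naming are hypothetical. The app's real rules appear to use the `affirm` package (given the `affirm.id_cols` option set in Step 4) and will differ.

```r
# Hypothetical rule constructor: returns function(data, id) that appends a
# logical column rule_<id>; missing values are not flagged here (that is
# left to a separate missing-data rule).
make_flag_rule <- function(condition) {
  function(data, id) {
    flagged <- condition(data)
    flagged[is.na(flagged)] <- FALSE
    data[[paste0("rule_", id)]] <- flagged
    data
  }
}

# Q4.1 = No is stored as 0 in this survey
respectful_treatment_sketch <- function() {
  make_flag_rule(function(d) d$Q4_1_were_trated_respectfully == 0)
}

data <- data.frame(Q4_1_were_trated_respectfully = c(1, 0, NA))
# Requires R >= 4.1: the pipe rewrites this to respectful_treatment_sketch()(data, id = 6)
out <- data |> respectful_treatment_sketch()(id = 6)
out$rule_6  # FALSE TRUE FALSE
```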
1. Time Outlier Rule (time_outlier_rule)
Flags waiting and process times that are significantly above or below average:
- Q3.2 Waiting time: Flags values below 0.5x or above 2x the average (average ~10 min)
- Q3.3 Process time: Flags values below 0.5x or above 2x the average (average ~3.5 min)
- Purpose: Requirement “3.1 & 3.2 – highlight values that are above or below average entries”
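Below is a minimal sketch of the 0.5x to 2x band check. It assumes the average is the plain mean of the non-missing values in the column. The real `time_outlier_rule` may compute it differently, for example per post office. `flag_time_outliers()` is a hypothetical helper, and the waiting times are toy values.

```r
# Hypothetical helper: flag values below low*mean or above high*mean;
# missing values are never flagged.
flag_time_outliers <- function(x, low = 0.5, high = 2) {
  avg <- mean(x, na.rm = TRUE)
  !is.na(x) & (x < low * avg | x > high * avg)
}

waiting <- c(10, 12, 3, 25, NA, 8, 12)  # toy waiting times in minutes
flag_time_outliers(waiting)
# mean is about 11.7, so the band is about 5.8 to 23.3: only 3 and 25 are flagged
```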
2. Conditional Field Rule (conditional_field_rule)
Ensures follow-up questions are answered when triggered:
- Q2.3.1: Must be answered when Q2.3 = No (didn’t know which PO)
- Q4.2.1: Must be answered when Q4.2 = No (not satisfied with process)
- Purpose: Requirement “2.3.1 and 4.2.1 should not be NA or N/A”
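The conditional check reduces to one test: flag a row when the trigger answer is No and the follow-up is empty. The column names `Q2_3` and `Q2_3_1` in this sketch are hypothetical, so check the XLSForm for the real ones. No is assumed to be coded 0, as in the Q4.1 rule. Empty strings and the literal strings "NA" and "N/A" count as unanswered, matching the requirement text.

```r
# Hypothetical helper: TRUE where an answer is missing or a placeholder
is_unanswered <- function(x) {
  x <- trimws(as.character(x))
  is.na(x) | x == "" | toupper(x) == "N/A" | toupper(x) == "NA"
}

# Hypothetical helper: flag rows where trigger == No (0) but follow-up is empty
flag_missing_followup <- function(trigger, followup, trigger_value = 0) {
  !is.na(trigger) & trigger == trigger_value & is_unanswered(followup)
}

d <- data.frame(
  Q2_3   = c(1, 0, 0, 0),
  Q2_3_1 = c(NA, "Asked at the counter", "N/A", ""),
  stringsAsFactors = FALSE
)
flag_missing_followup(d$Q2_3, d$Q2_3_1)  # FALSE FALSE TRUE TRUE
```

The same helper applies to Q4.2 / Q4.2.1 by passing those columns instead.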
3. Respectful Treatment Flag (respectful_treatment_flag)
Flags any response where Q4.1 (treated respectfully) = No:
- Q4.1: Flags if value = 0 (not treated respectfully)
- Purpose: Requirement “flag 4.1 if the answer is ‘No’”
- Note: Current data shows 0 “No” responses (all 97 respondents said Yes)
Dashboard Enhancement
Added Average Survey Duration display:
- Shows mean duration across all surveys
- Displays value in minutes
- Includes count of surveys
- Purpose: Requirement “Average time duration”
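The average-duration display and the existing 5 min to 24 hour bound can both be computed from one duration column. This sketch assumes `_duration` holds seconds, which is an assumption to verify against the data. `duration_summary()` is a hypothetical helper, not the dashboard's code.

```r
# Hypothetical helper; assumes duration values are in seconds.
duration_summary <- function(duration_sec, min_minutes = 5, max_minutes = 24 * 60) {
  mins <- duration_sec / 60
  list(
    mean_minutes = round(mean(mins, na.rm = TRUE), 1),  # value shown on the dashboard
    n_surveys    = sum(!is.na(mins)),                    # survey count shown alongside
    out_of_range = !is.na(mins) & (mins < min_minutes | mins > max_minutes)
  )
}

s <- duration_summary(c(600, 240, 1200, 90000))
s$mean_minutes  # 383.5
s$out_of_range  # FALSE TRUE FALSE TRUE
```

A single very long survey pulls the mean far above the typical duration. Showing the median alongside the mean may be worth considering.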
Validation Coverage Summary
From requirements document:
Part 1 - Data Quality:
- ✅ GPS check (skipped per user request)
- ⚠️ Q2.1.1a “Other” text validation (not implemented - low priority)
- ✅ Q2.3.1 and Q4.2.1 conditional checks (implemented)
- ✅ Time outlier detection for Q3.1 & Q3.2 (implemented)
Part 2 - Survey Practice:
- ✅ Duration 5 min - 24 hours (existing duration_check_rule)
- ✅ Average time duration display (added to dashboard)
- ✅ Same-day submission check (existing same_day_submission_rule)
- ❌ Case ID/phone duplicate check (N/A - these columns don’t exist in data)
- ✅ Late submission flagging (existing late_submission_rule)
Part 2.2 - Preliminary Analysis:
- ✅ Flag Q4.1 = No (implemented as respectful_treatment_flag)
Total Validation Rules: 7
- Duration check (5 min - 24 hours)
- Same-day submission (submitted same day as collection)
- Late submission (flag delays)
- Time outliers (NEW - waiting/process time outliers)
- Conditional fields (NEW - Q2.3.1 & Q4.2.1 logic)
- Respectful treatment (NEW - flag Q4.1 = No)
- Missing data (general missing data check)
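The same-day rule compares calendar dates, so the time zone matters. For example, an interview just after midnight local time still falls on the previous day in UTC. This is a sketch under stated assumptions: `start` and `_submission_time` are already parsed to POSIXct, and local time is `Asia/Amman`. `flag_not_same_day()` is hypothetical.

```r
# Hypothetical: TRUE when the submission date differs from the collection
# date, both evaluated in the survey's local time zone (assumed Asia/Amman).
flag_not_same_day <- function(start, submitted, tz = "Asia/Amman") {
  start_day  <- as.Date(start, tz = tz)
  submit_day <- as.Date(submitted, tz = tz)
  !is.na(start_day) & !is.na(submit_day) & start_day != submit_day
}

start     <- as.POSIXct(c("2025-10-20 09:00:00", "2025-10-20 15:00:00"), tz = "UTC")
submitted <- as.POSIXct(c("2025-10-20 12:00:00", "2025-10-22 08:00:00"), tz = "UTC")
flag_not_same_day(start, submitted)  # FALSE TRUE
```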
Summary
Successfully integrated BCM Post Office Validation (Survey 5407) with:
- ✅ 97 survey responses
- ✅ 42 variables with labels
- ✅ 7 choice lists for value mapping
- ✅ 9 visualization sections (including average duration)
- ✅ 7 validation rules (3 new survey-specific rules)
- ✅ Full metadata integration
- ✅ Multi-select question support
- ✅ Arabic text support for post office names
- ✅ Compliance with official requirements document
The exercise is now live and accessible in the app dropdown!
Future Reference
When adding new exercises, follow this pattern:
- Identify survey ID via API exploration
- Explore data structure and key columns
- Find and process metadata file
- Add filename mapping to metadata_helpers.r
- Create exercise definition in exercises/
- Create dashboard function in visualization_helpers.r
- Test thoroughly
For more details, see:
/docs/guides/ADDING_NEW_EXERCISES.md
/docs/guides/ADDING_METADATA.md