Adding Metadata to Exercises
This guide shows you how to integrate metadata (variable labels and choice lists) into your exercises.
Quick Start
Follow these 4 steps to add metadata to an exercise:
1. Download and Place Raw Metadata
Download the XLSForm file from SharePoint and place it in:
metadata/raw/YYYYMMDD/[category]/
Example: metadata/raw/20251201/General Food Assistance xls/WFP - FSOM Q3 2025 - Script.xlsx
2. Add Survey ID Mapping
Edit app/R/metadata_helpers.r and add your mapping in map_metadata_to_survey_ids():
filename_mappings <- list(
"Welcome Meals.xlsx" = "5404",
"BCM_Helpdesk_mon_260525.xlsx" = "5387",
"hm_schools_hc_20250320.xlsx" = "5422",
"hm_schools_hc_20250819_camps.xlsx" = "5423",
"WFP - FSOM Q3 2025 - Script.xlsx" = "5485",
"your_survey.xlsx" = "YOUR_SURVEY_ID" # Add this line
)3. Process the Metadata
Create a processing script (or modify existing):
# tools/process_your_survey_metadata.r
library(here)
source(here("app/R/metadata_helpers.r"))
# Load and process
xlsform <- load_xlsform(here("metadata/raw/.../your_survey.xlsx"))
metadata <- process_xlsform(xlsform)
# Save
saveRDS(metadata, here("metadata/processed/YOUR_SURVEY_ID_metadata.rds"))
# Update combined file
all_metadata <- load_all_metadata(here("metadata/processed"))
all_metadata[["YOUR_SURVEY_ID"]] <- metadata
saveRDS(all_metadata, here("metadata/processed/all_metadata.rds"))Run it:
Rscript tools/process_your_survey_metadata.r4. Update Exercise Definition
Edit your exercise file (e.g., app/exercises/your_exercise.r):
visualizations = function(data) {
# Load metadata for this exercise
metadata <- tryCatch({
load_survey_metadata("YOUR_SURVEY_ID", "../metadata/processed")
}, error = function(e) {
NULL
})
# Pass to your dashboard
your_dashboard(data, metadata = metadata)
}Using Metadata in Visualizations
Update Dashboard Function Signature
your_dashboard <- function(data, metadata = NULL) {
# ... dashboard code ...
}Add Variable Labels to Breakdown Cards
# Before - shows raw values
breakdown_card(data, "status", "Application Status")
# After - shows labeled values
breakdown_card(
data,
"status",
"Application Status",
metadata = metadata,
choice_list_name = "application_status" # Name from XLSForm choices sheet
)Get Individual Labels
# Get variable label
label <- get_variable_label(metadata, "gender")
# Returns: "Gender of Respondent"
# Get choice label
label <- get_choice_label(metadata, "yesno", "Yes")
# Returns: "Yes" (or localized version if available)Finding Choice List Names
Choice list names come from the list_name column in the XLSForm choices sheet.
Example XLSForm:
survey sheet:
type | name | label
select_one yesno | consent| Do you consent?
select_one app_status | status | Application Status
choices sheet:
list_name | name | label
yesno | Yes | Yes
yesno | No | No
app_status | pending | Pending Review
app_status | approved | Approved
app_status | rejected | Rejected
In this case:
- Use
choice_list_name = "yesno"for the consent field - Use
choice_list_name = "app_status"for the status field
Note: name vs value Column
Some XLSForm files use value instead of name in the choices sheet. The metadata processing functions handle both automatically:
- Standard format: Uses
namecolumn for choice values - Alternative format: Uses
valuecolumn for choice values (e.g.,gfa_sbcc_endline_2025_v2.xls)
Both formats are supported - the processing function will detect and use whichever column is present.
Verifying Metadata
Check What’s in the Metadata
source("app/R/metadata_helpers.r")
metadata <- load_survey_metadata("YOUR_SURVEY_ID")
# See all variables
View(metadata$variables)
# See all choice lists
names(metadata$choices)
# See specific choice list
metadata$choices$your_list_nameTest Script
# Test loading
metadata <- load_survey_metadata("YOUR_SURVEY_ID", "metadata/processed")
print(paste("Variables:", nrow(metadata$variables)))
print(paste("Choice lists:", length(metadata$choices)))
# Test specific lookups
print(get_variable_label(metadata, "your_column"))
print(get_choice_label(metadata, "your_list", "your_value"))Common Issues
Issue: Metadata Not Loading
Symptom: Variable labels don’t appear, gender still shows codes
Solutions:
- Check file exists:
file.exists("metadata/processed/YOUR_ID_metadata.rds") - Check survey ID mapping in
map_metadata_to_survey_ids() - Verify
all_metadata.rdswas updated - Restart Shiny app
Issue: Choice Labels Not Showing
Symptom: Still seeing raw values instead of labels
Solutions:
- Verify choice list name matches XLSForm:
names(metadata$choices) - Check data values match choice names exactly (case-sensitive!)
- Ensure metadata is being passed to
breakdown_card() - Check that dashboard function accepts
metadataparameter
Issue: Wrong Column Name
Symptom: if (!column %in% names(data)) returns NULL
Solutions:
- Check actual column names in data:
names(data) - XLSForm
namemight differ from API column name - Some columns might be auto-prefixed (e.g.,
Q2_1_gendervsgender) - Use
grep("gender", names(data), value = TRUE)to find it
Issue: Choice Lists Not Extracted (0 choice lists found)
Symptom: Metadata processes successfully but shows 0 choice lists
Possible causes:
- Choices sheet uses
valuecolumn instead ofnamecolumn - Missing
list_namecolumn in choices sheet - All rows have NA values in
list_nameor value columns
Solutions:
- Check choices sheet structure:
read_excel("your_file.xlsx", sheet = "choices") - Verify column names: Should have
list_nameand eithernameorvalue - The processing function now handles both
nameandvaluecolumns automatically - If still not working, check for empty rows or formatting issues in Excel file
Complete Example
See app/exercises/bcm_helpdesk_validation.r for a complete working example:
exercise <- list(
id = "5387",
name = "BCM - Helpdesk Validation",
# ... other fields ...
visualizations = function(data) {
# Load metadata
metadata <- tryCatch({
load_survey_metadata("5387", "../metadata/processed")
}, error = function(e) {
NULL
})
# Pass to dashboard
bcm_monitoring_dashboard(data, metadata = metadata)
}
)And in app/R/visualization_helpers.r:
bcm_monitoring_dashboard <- function(data, metadata = NULL) {
# ... other visualizations ...
# Gender breakdown with labels
breakdown_card(
data,
"gender", # Column name in data
"Gender Distribution", # Display title
metadata = metadata, # Metadata object
choice_list_name = "gender" # Choice list from XLSForm
)
}Best Practices
- Always handle missing metadata gracefully - Use
tryCatch()and default to NULL - Keep metadata processing separate - Don’t process in app, use preprocessed RDS files
- Use descriptive choice list names - Match them to XLSForm conventions
- Test with actual data - Verify codes in data match XLSForm choice names
- Update combined file - Always update
all_metadata.rdsafter processing new surveys - Document mappings - Add comments explaining non-obvious survey ID mappings
GPS Coordinates Metadata
GPS coordinates are stored separately from survey metadata and are used for location validation. This allows you to validate that GPS coordinates entered during data collection match the location selected from a dropdown menu.
GPS Coordinates File Structure
GPS coordinates are stored in metadata/raw/GPS coordinates.xlsx with multiple sheets:
- HelpDesk sheet: Contains helpdesk locations with columns:
list_name: Filter identifier (e.g., “helpdesk”, “Bread”, “Shops”)name: Location identifier used in survey (e.g., “future_amman”, “red_crescent”)label: Human-readable location name (e.g., “Future Club Huassin”, “Jordan Red Crescent”)Gov: GovernorateGPS coordinate: Coordinate string in format “lat, lon” (e.g., “32.0, 35.9”)Comment: Optional notes
- POs sheet: Contains post office locations (similar structure)
Processing GPS Coordinates
To process GPS coordinates from the Excel file:
# Run the GPS processing script
source("tools/process_gps_coordinates.r")This script will: 1. Load GPS coordinates from metadata/raw/GPS coordinates.xlsx 2. Parse GPS coordinate strings (format: “lat, lon”) into separate lat and lon columns 3. Remove rows with invalid coordinates 4. Save processed .rds files to metadata/processed/: - gps_HelpDesk.rds - Helpdesk locations (17 locations) - gps_POs.rds - Post office locations (92 locations) - all_gps_coordinates.rds - Combined file with all sheets
Example output:
GPS COORDINATES PROCESSING PIPELINE
================================================================================
Loading GPS coordinates from: .../metadata/raw/GPS coordinates.xlsx
=== Loading Sheets ===
Found sheets: POs, HelpDesk
Processing sheet: HelpDesk
Rows: 30
Columns: list_name, name, label, Gov, GPS coordinate, Comment
Valid coordinates: 30
✓ Processed successfully
=== Saving Processed GPS Coordinates ===
Saved: .../metadata/processed/gps_HelpDesk.rds
Rows: 30
Loading GPS Coordinates
# Load GPS coordinates for a specific sheet
gps_data <- load_gps_coordinates("HelpDesk", "../metadata/processed")
# Check what's loaded
print(head(gps_data[, c("name", "label", "lat", "lon")]))
# Load all GPS coordinates
all_gps <- load_all_gps_coordinates("../metadata/processed")Using GPS Coordinates in Validation
GPS coordinates are used to validate that entered GPS coordinates match the selected location. See the complete example in GPS Validation Guide.
Quick reference:
# In your exercise's run_validations function
# For HelpDesk locations:
gps_data <- tryCatch({
load_gps_coordinates("HelpDesk", "../metadata/processed")
}, error = function(e) {
NULL
})
# For Post Office locations:
# gps_data <- tryCatch({
# load_gps_coordinates("POs", "../metadata/processed")
# }, error = function(e) {
# NULL
# })
if (!is.null(gps_data)) {
result <- result |>
gps_location_match_rule(
location_column = "_1_6_Helpdesk_name", # Your location dropdown column
gps_data = gps_data,
list_name_filter = "helpdesk", # Must match list_name in GPS file
max_distance_meters = 100 # Distance threshold
)(id = 7, data_name = exercise_name)
}
# The validation automatically adds a 'gps_distance_to_target_m' column
# showing the distance in meters for each recordImportant Notes:
list_name_filtermust match: Thelist_name_filterparameter must match thelist_namevalues in your GPS coordinates file. For example, if your GPS file haslist_name = "helpdesk", uselist_name_filter = "helpdesk".Location matching: The validation uses a robust multi-strategy matching approach (in order of preference):
- Exact match by
name(most common - numeric codes like “1”, “2”, “3” for post offices) - Exact match by
label(Arabic or English names) - Case-insensitive exact match by
nameorlabel - Partial string matching (fallback) - checks if location name appears in GPS data or vice versa
- All matching includes whitespace trimming for reliability
- Matching is optimized to handle common data variations and formatting differences
- Exact match by
GPS column formats: The validation automatically handles:
_geolocationcolumn: List/vector formatc(lat, lon)or string"lat lon"gpscolumn: String format"lat lon alt accuracy"- Separate columns:
latitude/longitudeorlat/lon
Keep GPS columns: Don’t remove
_geolocationinprepare_data- it’s needed for validation!Distance column: The validation automatically adds a
gps_distance_to_target_mcolumn showing the distance in meters between entered GPS coordinates and the expected location. This column:- Appears in the affirm validation report
- Is included in CSV exports (add
"gps_distance_to_target_m"toaffirm.id_cols) - Shows
NAfor records without GPS data or when location is not found - Helps identify how far off GPS coordinates are from expected locations
Updating GPS Coordinates
When GPS coordinates need to be updated or corrected:
- Edit the Excel file: Update
metadata/raw/GPS coordinates.xlsxwith new or corrected coordinates - Reprocess: Run
source("tools/process_gps_coordinates.r")to regenerate processed files - Restart app: Restart the Shiny app to load new GPS coordinates
Adding new locations: - Add rows to the appropriate sheet (HelpDesk, POs, etc.) - Ensure list_name matches what you’ll use in list_name_filter - Format GPS coordinates as “lat, lon” (e.g., “32.0, 35.9”) - Run the processing script to update .rds files
Example: Complete GPS Validation Setup
See these complete working examples:
- BCM Helpdesk Validation (
app/exercises/bcm_helpdesk_validation.r): Uses HelpDesk GPS coordinates withlist_name_filter = "helpdesk" - BCM Post Office Validation (
app/exercises/bcm_post_office_validation.r): Uses Post Office GPS coordinates withlist_name_filter = "post_name" - OSM Post Office Validation (
app/exercises/osm_post_office.r): Uses Post Office GPS coordinates withlist_name_filter = "post_name" - FSOM 2025 Q3 (
app/exercises/fsom.r): Non-GPS example that derivesvisit_datefromstarttimeand uses standard metadata
The GPS examples demonstrate: - Processing GPS coordinates - Keeping _geolocation column in prepare_data - Loading GPS data in run_validations - Applying GPS validation with proper error handling - Including gps_distance_to_target_m in CSV exports
References
- Full integration details:
docs/development/METADATA_INTEGRATION.md - Metadata helpers API:
app/R/metadata_helpers.r - Example usage:
tools/examples/explore_metadata.r - Test script:
tools/test_metadata_integration.r