
User Guide

Quick start

Python

Run the following example to extract commodity data:

from data_bridges_knots import DataBridgesShapes

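# Path to the DataBridges API configuration file
CONFIG_PATH = r"data_bridges_api_config.yaml"

# Create client instance
client = DataBridgesShapes(CONFIG_PATH)

# COMMODITY DATA
# Get commodity unit list for Tanzania
commodity_units_list = client.get_commodity_units_list(country_iso3="TZA", commodity_unit_name="Kg", page=1, format='json')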

R

library(reticulate)

# Point to our virtual environment's Python (must be set before any module is imported)
use_python(".venv/bin/python")

# Import the Python module through reticulate
data_bridges_knots <- import("data_bridges_knots")

# Create client instance
config_path <- "data_bridges_api_config.yaml"
client <- data_bridges_knots$DataBridgesShapes(config_path)

# COMMODITY DATA
# Get commodity unit list for Tanzania
commodity_units <- client$get_commodity_units_list(
  country_iso3 = "TZA",
  commodity_unit_name = "Kg",
  page = 1L,
  format = "json"
)
Examples of how to use the package are in the examples folder and in the API Reference document.

Getting variable and choice labels

DataBridgesKnots comes with helper functions that make datasets more human-readable.

data_bridges_knots.labels.get_variable_labels(xlsform_df, format='dict')

Build a mapping between variable names and variable labels from a DataBridges XLSForm and return it in the requested format.

Empty labels default to the corresponding name. For duplicate names, the latest occurrence overrides earlier values.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| xlsform_df | DataFrame | DataFrame with at least "name" and "label" columns. | required |
| format | str | One of "dict", "json", or "df". "dict" returns dict[str, str]; "json" returns a JSON-formatted str; "df" returns a pandas.DataFrame with columns ["colName", "label"]. | 'dict' |

Returns:

| Type | Description |
|---|---|
| Union[dict[str, str], str, DataFrame] | dict, str, or pandas.DataFrame: Labels mapping in the requested format. |

Raises:

| Type | Description |
|---|---|
| KeyError | If required columns are missing. |
| ValueError | If format is not one of {"dict", "json", "df"}. |

Examples:

>>> import pandas as pd
>>> from data_bridges_knots.labels import get_variable_labels
>>> df = pd.DataFrame({'name': ['n1', 'n2', 'n2'], 'label': ['L1', '', 'L2']})
>>> get_variable_labels(df, 'dict')
{'n1': 'L1', 'n2': 'L2'}
>>> get_variable_labels(df, 'df')
  colName label
0      n1    L1
1      n2    L2
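To make the documented rules concrete (empty labels fall back to the name, the latest duplicate wins, three output formats), here is a minimal sketch in plain pandas. `variable_labels_sketch` is a hypothetical name and not part of DataBridgesKnots, and treating missing values (NaN) like empty strings is an assumption; in real code, call `get_variable_labels` itself.

```python
import json
from typing import Union

import pandas as pd


def variable_labels_sketch(
    xlsform_df: pd.DataFrame, format: str = "dict"
) -> Union[dict[str, str], str, pd.DataFrame]:
    """Hypothetical illustration of the documented get_variable_labels rules."""
    if format not in {"dict", "json", "df"}:
        raise ValueError(f'format must be one of "dict", "json", "df"; got {format!r}')

    # Selecting a missing column raises KeyError, as documented.
    names, labels = xlsform_df["name"], xlsform_df["label"]

    mapping: dict[str, str] = {}
    for name, label in zip(names, labels):
        # Empty labels (and, by assumption, NaN/None) fall back to the variable name.
        # Re-assigning an existing key means the latest occurrence wins.
        mapping[name] = name if pd.isna(label) or label == "" else label

    if format == "dict":
        return mapping
    if format == "json":
        return json.dumps(mapping)
    # One row per variable, in first-seen order.
    return pd.DataFrame({"colName": list(mapping), "label": list(mapping.values())})
```

Because a Python dict keeps the position of a key's first insertion, re-assigning "n2" updates its label without moving it, which matches the row order in the example above.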

data_bridges_knots.labels.get_choice_labels(xlsform_df, format='dict')

Build a mapping from each XLSForm question name to its choice value labels, and return it as a dictionary, JSON string, or DataFrame.

The function expects an input DataFrame with
  • a column "name" for the question (field) names, and
  • a column "choiceList" whose rows contain a structure with a "choices" list. Each item in choices is a dict with "name" (the choice value/code) and "label" (the human-readable label).

Duplicate question names are merged, with later entries updating earlier ones.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| xlsform_df | DataFrame | Input DataFrame containing at least the columns "name" and "choiceList". Each choiceList entry should include a "choices" list of dicts with keys "name" and "label". | required |
| format | str | Output format; one of "dict", "json", or "df". "dict" returns dict[str, dict[str, str]] mapping each question name to a dict of choice name to choice label; "json" returns a JSON-formatted str of the same mapping; "df" returns a pandas.DataFrame with columns ["colName", "label"], where "label" contains the nested dict of choice labels for each question. | 'dict' |

Raises:

| Type | Description |
|---|---|
| KeyError | If required columns (e.g., "name", "choiceList") or keys within choiceList (e.g., "choices", "name", "label") are missing. |
| ValueError | If format is not one of {"dict", "json", "df"}. |

Returns:

| Type | Description |
|---|---|
| Union[dict[str, str], str, DataFrame] | dict[str, dict[str, str]], str, or pandas.DataFrame: Labels mapping in the requested format. |

Examples:

>>> import pandas as pd
>>> from data_bridges_knots.labels import get_choice_labels
>>> df = pd.DataFrame({
...     "name": ["q1", "q2"],
...     "choiceList": [
...         {"choices": [{"name": "yes", "label": "Yes"}, {"name": "no", "label": "No"}]},
...         {"choices": [{"name": "a", "label": "Option A"}, {"name": "b", "label": "Option B"}]}
...     ]
... })
>>> get_choice_labels(df, format="dict")
{'q1': {'yes': 'Yes', 'no': 'No'}, 'q2': {'a': 'Option A', 'b': 'Option B'}}
>>> print(get_choice_labels(df, format="json"))
>>> get_choice_labels(df, format="df")
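The nested structure that get_choice_labels returns can be sketched as follows. `choice_labels_sketch` is hypothetical, not the library's implementation, and it reads "merged" as a per-choice update (a later row for the same question adds or overwrites individual choices); the library may instead replace the whole inner dict.

```python
import json
from typing import Union

import pandas as pd


def choice_labels_sketch(
    xlsform_df: pd.DataFrame, format: str = "dict"
) -> Union[dict[str, dict[str, str]], str, pd.DataFrame]:
    """Hypothetical illustration of the documented get_choice_labels output."""
    if format not in {"dict", "json", "df"}:
        raise ValueError(f'format must be one of "dict", "json", "df"; got {format!r}')

    mapping: dict[str, dict[str, str]] = {}
    # Missing columns or keys ("choices", "name", "label") raise KeyError.
    for question, choice_list in zip(xlsform_df["name"], xlsform_df["choiceList"]):
        choices = {c["name"]: c["label"] for c in choice_list["choices"]}
        # Duplicate question names: merge, letting later choices overwrite earlier ones.
        mapping.setdefault(question, {}).update(choices)

    if format == "dict":
        return mapping
    if format == "json":
        return json.dumps(mapping)
    # One row per question; the "label" column holds the nested choice dict.
    return pd.DataFrame({"colName": list(mapping), "label": list(mapping.values())})
```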

data_bridges_knots.labels.map_value_labels(survey_df, xlsform_df)

Replace coded choice values in a DataFrame with the human-readable labels defined in the XLSForm choices.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| survey_df | DataFrame | The survey data with coded values. | required |
| xlsform_df | DataFrame | DataFrame containing "name" and "choiceList". Each choiceList entry includes a "choices" list of dicts with keys "name" (code) and "label" (display text). | required |

Raises:

| Type | Description |
|---|---|
| KeyError | If required columns ("name", "choiceList") or keys in the choices ("name", "label") are missing. |

Example:

>>> import pandas as pd
>>> from data_bridges_knots.labels import map_value_labels
>>> survey = pd.DataFrame({"q1": ["yes", "no"], "q2": ["a", "b"]})
>>> xls = pd.DataFrame({
...     "name": ["q1", "q2"],
...     "choiceList": [
...         {"choices": [{"name": "yes", "label": "Yes"}, {"name": "no", "label": "No"}]},
...         {"choices": [{"name": "a", "label": "Option A"}, {"name": "b", "label": "Option B"}]}
...     ]
... })
>>> map_value_labels(survey, xls)

Returns:

| Type | Description |
|---|---|
| DataFrame | pandas.DataFrame: A copy of survey_df where columns present in the XLSForm mapping have codes replaced by labels. |
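A rough sketch of how map_value_labels could apply those mappings to survey data. `map_value_labels_sketch` is hypothetical. Leaving unmatched codes and columns unchanged, and retrying the lookup with str(value) so that numeric survey codes match string choice names, are assumptions rather than documented behavior.

```python
import pandas as pd


def map_value_labels_sketch(survey_df: pd.DataFrame, xlsform_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical illustration of replacing choice codes with labels."""
    # Build {question: {code: label}}, merging duplicate question names.
    mapping: dict[str, dict[str, str]] = {}
    for question, choice_list in zip(xlsform_df["name"], xlsform_df["choiceList"]):
        choices = {c["name"]: c["label"] for c in choice_list["choices"]}
        mapping.setdefault(question, {}).update(choices)

    labelled = survey_df.copy()  # never modify the caller's DataFrame
    for column, codes in mapping.items():
        if column not in labelled.columns:
            continue  # XLSForm questions absent from the survey are ignored
        # Try the raw value, then its string form (XLSForm choice names are usually
        # strings while survey codes may be numeric); otherwise keep the value as-is.
        labelled[column] = labelled[column].map(
            lambda value, codes=codes: codes.get(value, codes.get(str(value), value))
        )
    return labelled
```

Building the full mapping before touching the survey data means each column is relabelled exactly once, even when a question name appears on several XLSForm rows.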