User Guide

Quick start

Python

Run the following example to extract commodity data:

from data_bridges_knots import DataBridgesKnots

CONFIG_PATH = r"data_bridges_api_config.yaml"

client = DataBridgesKnots(CONFIG_PATH)

# COMMODITY DATA
commodity_units_list = client.get_commodity_units_list(
    country_iso3="TZA",
    commodity_unit_name="Kg",
    page=1,
    format="json",
)

Additional examples are in the User Documentation.

R

library(reticulate)

# Point reticulate to our virtual environment's Python
# (this must run before the first import, which initializes Python)
use_python(".venv/bin/python")

# Import the Python module through reticulate
data_bridges_knots <- import("data_bridges_knots")

# Create client instance
config_path <- "data_bridges_api_config.yaml"
client <- data_bridges_knots$DataBridgesKnots(config_path)

# COMMODITY DATA
# Get commodity unit list for Tanzania
commodity_units <- client$get_commodity_units_list(
  country_iso3 = "TZA",
  commodity_unit_name = "Kg",
  page = 1L,
  format = "json"
)

Getting variable and choice labels

DataBridgesKnots comes with helper functions that make the datasets more human-readable.

data_bridges_knots.labels.get_variable_labels(xlsform_df, format='dict')

Build a mapping from variable names to variable labels from a DataBridges XLSForm and return it in the desired format.

Empty labels default to the corresponding name. For duplicate names, the latest occurrence overrides earlier values.

Parameters:

xlsform_df (DataFrame, required): DataFrame with at least "name" and "label" columns.

format (str, default 'dict'): Output format. One of:
- "dict": returns dict[str, str].
- "json": returns a JSON-formatted str.
- "df": returns a pandas.DataFrame with columns ["colName", "label"].

Returns:

dict[str, str] | str | pandas.DataFrame: Labels mapping in the requested format.

Raises:

KeyError: If required columns are missing.

ValueError: If format is not one of {"dict", "json", "df"}.

Examples:

>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['n1', 'n2', 'n2'], 'label': ['L1', '', 'L2']})
>>> get_variable_labels(df, 'dict')
{'n1': 'L1', 'n2': 'L2'}
>>> get_variable_labels(df, 'df')
  colName label
0      n1    L1
1      n2    L2
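
The defaulting and override rules described above can be illustrated without the library. The following is a minimal sketch, not the package's implementation: `sketch_variable_labels` is a hypothetical name, and it takes plain (name, label) pairs instead of a DataFrame so that it runs with the standard library only. The "df" format is left out for the same reason.

```python
import json


def sketch_variable_labels(rows, format="dict"):
    """Hypothetical stand-in illustrating the documented labelling rules.

    rows: iterable of (name, label) pairs, in XLSForm order.
    """
    if format not in {"dict", "json"}:
        # The documented function also accepts "df"; it is omitted here
        # to avoid a pandas dependency in this sketch.
        raise ValueError(f"unsupported format: {format!r}")

    mapping = {}
    for name, label in rows:
        # An empty label falls back to the variable name; a later row
        # with the same name overwrites the earlier entry.
        mapping[name] = label if label else name

    return json.dumps(mapping) if format == "json" else mapping


# Same data as the doctest above: "n2" appears twice, first with an empty label.
rows = [("n1", "L1"), ("n2", ""), ("n2", "L2")]
labels = sketch_variable_labels(rows)
labels_json = sketch_variable_labels(rows, format="json")
print(labels)
```

A typical use of such a mapping is renaming survey columns for display, for example by passing the dictionary to a DataFrame rename call.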

data_bridges_knots.labels.get_choice_labels(xlsform_df, format='dict')

Build a mapping from each XLSForm question name to its choice value labels, and return it as a dictionary, JSON string, or DataFrame.

Parameters:

xlsform_df (DataFrame, required): Input DataFrame containing at least the columns "name" and "choiceList".

format (str, default 'dict'): Output format. One of:
- "dict": returns dict[str, dict[str, str]].
- "json": returns a JSON-formatted str.
- "df": returns a pandas.DataFrame.

Returns:

dict[str, dict[str, str]] | str | pandas.DataFrame: Labels mapping in the requested format.

Raises:

KeyError: If required columns are missing.

ValueError: If format is not one of {"dict", "json", "df"}.

Examples:

>>> import pandas as pd
>>> df = pd.DataFrame({
...     "name": ["q1", "q2"],
...     "choiceList": [
...         {"choices": [{"name": "yes", "label": "Yes"}, {"name": "no", "label": "No"}]},
...         {"choices": [{"name": "a", "label": "Option A"}, {"name": "b", "label": "Option B"}]}
...     ]
... })
>>> get_choice_labels(df, format="dict")
{'q1': {'yes': 'Yes', 'no': 'No'}, 'q2': {'a': 'Option A', 'b': 'Option B'}}
>>> print(get_choice_labels(df, format="json"))
>>> get_choice_labels(df, format="df")
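
To make the nested structure concrete, here is a rough sketch of how a question-to-choices mapping can be built from records shaped like the "name" and "choiceList" columns. It is an illustration under assumptions, not the library's code: `sketch_choice_labels` is a hypothetical name, the input is a plain list of dicts rather than a DataFrame, and every record is assumed to carry a "choices" list.

```python
import json


def sketch_choice_labels(questions):
    """Hypothetical stand-in: build {question name: {choice code: label}}.

    questions: list of dicts with "name" and "choiceList" keys, mirroring
    the two columns the documentation requires.
    """
    mapping = {}
    for question in questions:
        # A missing "choiceList", "name", or "label" key raises KeyError here.
        choices = question["choiceList"]["choices"]
        mapping[question["name"]] = {
            choice["name"]: choice["label"] for choice in choices
        }
    return mapping


questions = [
    {"name": "q1", "choiceList": {"choices": [
        {"name": "yes", "label": "Yes"}, {"name": "no", "label": "No"}]}},
    {"name": "q2", "choiceList": {"choices": [
        {"name": "a", "label": "Option A"}, {"name": "b", "label": "Option B"}]}},
]
choice_labels = sketch_choice_labels(questions)

# A JSON rendering of the same mapping, comparable to format="json".
print(json.dumps(choice_labels, indent=2))
```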

data_bridges_knots.labels.map_value_labels(survey_df, xlsform_df)

Replace coded choice values in a survey DataFrame with human-readable labels taken from the XLSForm choices.

Parameters:

survey_df (DataFrame, required): The survey data with coded values.

xlsform_df (DataFrame, required): DataFrame containing "name" and "choiceList" columns. Each choiceList entry includes a "choices" list of dicts with keys "name" (code) and "label" (display text).

Raises:

KeyError: If required columns ("name", "choiceList") or keys in the choices ("name", "label") are missing.

Examples:

>>> import pandas as pd
>>> survey = pd.DataFrame({"q1": ["yes", "no"], "q2": ["a", "b"]})
>>> xls = pd.DataFrame({
...     "name": ["q1", "q2"],
...     "choiceList": [
...         {"choices": [{"name": "yes", "label": "Yes"}, {"name": "no", "label": "No"}]},
...         {"choices": [{"name": "a", "label": "Option A"}, {"name": "b", "label": "Option B"}]}
...     ]
... })
>>> map_value_labels(survey, xls)

Returns:

pandas.DataFrame: A copy of survey_df where columns present in the XLSForm mapping have codes replaced by labels.
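
The code-to-label replacement can be pictured with plain Python data. This is a minimal sketch under stated assumptions, not the package's implementation: `sketch_map_value_labels` is a hypothetical name, survey rows are plain dicts instead of DataFrame rows, and codes with no matching label are assumed to be left unchanged (the documentation does not specify this case).

```python
def sketch_map_value_labels(survey_rows, choice_labels):
    """Hypothetical stand-in: return new rows with codes replaced by labels."""
    labelled = []
    for row in survey_rows:
        new_row = {}
        for column, value in row.items():
            # Only columns present in the mapping are translated; other
            # columns, and codes without a label, pass through unchanged.
            new_row[column] = choice_labels.get(column, {}).get(value, value)
        labelled.append(new_row)
    return labelled


choice_labels = {
    "q1": {"yes": "Yes", "no": "No"},
    "q2": {"a": "Option A", "b": "Option B"},
}
survey_rows = [
    {"q1": "yes", "q2": "a", "age": 31},
    {"q1": "no", "q2": "b", "age": 45},
]
labelled_rows = sketch_map_value_labels(survey_rows, choice_labels)
print(labelled_rows)
```

Building new rows instead of editing in place mirrors the documented behaviour of returning a copy: the original coded data stays available for analysis.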