Analyzing spatiotemporal data - a seasonal monitoring example
Quick recap¶
In our first session, we saw how to define our area of interest, load data, and explore it visually.
We define our AOI using the AnalysisArea object.
from hip.analysis import AnalysisArea
BBOX = (37.968750,0.878872,50.976563,12.554564) # Horn of Africa
DATES = "2022-01-01/2023-03-31"
area = AnalysisArea(
    bbox=BBOX,
    datetime_range=DATES
)
We use AnalysisArea.get_dataset() to connect to data sources and load them into memory when appropriate.
spi3m = area.get_dataset(["CHIRPS","R3S_DEKAD"])
spi3m = spi3m.load()
And we can explore the data with tools in the hip.viz accessor.
spi3m.hip.viz.point_timeseries()
In today's session, we look at some hip-analysis features that are particularly relevant for national-scale monitoring applications on medium-resolution data (0.05-0.01 deg), for example seasonal monitoring. These features include the use of administrative boundaries, zonal aggregations, and further mapping and timeseries visualizations.
Note that this tutorial is primarily intended to demonstrate the available functionality. It does not necessarily present a scientifically sound approach to seasonal monitoring.
Define your AOI from administrative boundaries¶
Many analyses are primarily interested in reporting at admin level. To facilitate this, one can define the AnalysisArea object using the country's ISO3 code and the desired administrative level. This will automatically read the administrative boundaries from the VAM GeoAPI, store them in the AnalysisArea object, and configure the appropriate GeoBox.
Let's look at Ethiopia at admin level 1.
DATES = "2022-01-01/2023-03-31"
area = AnalysisArea.from_admin_boundaries(iso3='ETH', admin_level=1, datetime_range=DATES)
area.geobox
GeoBox output (abridged): geographic CRS WGS 84 (EPSG:4326), latitude/longitude axes in degrees.
The preloaded administrative boundaries are stored in a GeoPandas DataFrame under the area.BASE_AREA_DATASET key. They will be used as the default by any AnalysisArea function that needs geometries, unless you explicitly provide alternative geometries.
area.get_dataset([area.BASE_AREA_DATASET])
| Name | geometry | Code | Name |
|---|---|---|---|
| Addis Ababa | POLYGON ((38.65820 9.01460, 38.65940 8.98060, ... | 900532 | Addis Ababa |
| Oromia | POLYGON ((34.12950 9.20560, 34.13180 9.17360, ... | 900539 | Oromia |
| SNNPR | POLYGON ((34.19420 7.09660, 34.19510 7.06000, ... | 900540 | SNNPR |
| Somali | POLYGON ((41.79490 10.99190, 41.76660 10.99410... | 900541 | Somali |
| Tigray | POLYGON ((39.98320 14.44730, 39.95580 14.41530... | 900542 | Tigray |
| Afar | POLYGON ((39.98320 14.44730, 39.99450 14.38800... | 900533 | Afar |
| Amhara | POLYGON ((36.45450 13.76740, 36.42240 13.72880... | 900534 | Amhara |
| B. Gumuz | MULTIPOLYGON (((36.38140 9.47180, 36.38190 9.4... | 900535 | B. Gumuz |
| Dire Dawa | POLYGON ((42.34190 9.72130, 42.29680 9.76080, ... | 900536 | Dire Dawa |
| Gambela | POLYGON ((35.26420 7.54080, 35.26890 7.52180, ... | 900537 | Gambela |
| Harari | POLYGON ((42.07370 9.30730, 42.06370 9.26920, ... | 900538 | Harari |
It is also possible to query any administrative areas of interest at any point in your analysis, using the AnalysisArea.get_admin_boundaries method. Note, however, that these geometries won't have the special status given to the geometries used to create the AnalysisArea object.
area.get_admin_boundaries(iso3='SOM', admin_level=1)
|   | geometry | Code | Name |
|---|---|---|---|
| 0 | MULTIPOLYGON (((43.44780 11.48900, 43.45170 11... | 2688 | Awdal |
| 1 | POLYGON ((42.88530 4.30790, 42.89090 4.29150, ... | 2689 | Bakool |
| 2 | POLYGON ((50.17720 8.32820, 50.18250 8.33160, ... | 2690 | Bari |
| 3 | POLYGON ((43.12090 3.67150, 43.10210 3.25740, ... | 2691 | Bay |
| 4 | POLYGON ((45.60600 2.18480, 45.58760 2.18320, ... | 2692 | Banadir |
| 5 | POLYGON ((46.53630 6.51890, 46.46950 6.45040, ... | 2693 | Galgaduud |
| 6 | POLYGON ((42.88530 4.30790, 42.86240 4.28940, ... | 2694 | Gedo |
| 7 | POLYGON ((44.59210 4.93490, 44.60280 4.87170, ... | 2695 | Hiraan |
| 8 | MULTIPOLYGON (((42.89030 0.00090, 42.90510 0.0... | 2696 | Juba Hoose |
| 9 | POLYGON ((42.98090 1.46170, 43.26100 1.04640, ... | 2697 | Shabelle Hoose |
| 10 | POLYGON ((42.40090 1.95170, 41.64090 1.94170, ... | 2698 | Juba Dhexe |
| 11 | POLYGON ((45.39100 2.15170, 45.46100 2.17160, ... | 2699 | Shabelle Dhexe |
| 12 | POLYGON ((47.55920 7.56190, 47.55620 7.56230, ... | 2700 | Mudug |
| 13 | POLYGON ((49.08110 9.14050, 48.73100 8.78100, ... | 2701 | Nugaal |
| 14 | POLYGON ((47.41320 11.17750, 47.40670 11.17430... | 2702 | Sanaag |
| 15 | POLYGON ((46.04940 10.05040, 46.15100 9.44100,... | 2703 | Sool |
| 16 | POLYGON ((46.04940 10.05040, 46.01100 10.08090... | 2704 | Togdheer |
| 17 | POLYGON ((44.69100 8.77180, 44.69100 9.31100, ... | 2705 | Woqooyi Galbeed |
Zonal statistics¶
We are often interested in getting data over an area rather than at pixel level. We can use the AnalysisArea.zonal_stats method to compute statistics over an area, including 'mean', 'max', 'min', 'sum', 'std', 'var' and 'count'.
Here we will calculate the mean and standard deviation of the monthly rainfall Standardized Precipitation Index (SPI) over the administrative units we used to create the AnalysisArea object.
Technical note: the mean of per-pixel SPIs (calculated here) is not equal to the more accurate SPI of the mean rainfall across pixels.
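To make the distinction concrete, here is a minimal, self-contained illustration (plain NumPy, not hip-analysis code) using a simple z-score as a stand-in for the full SPI fitting procedure; it shows that the mean of per-pixel standardized values generally differs from the standardized value of the spatial mean.

import numpy as np

# Synthetic rainfall: 120 months x 4 pixels (illustrative only)
rng = np.random.default_rng(0)
rain = rng.gamma(shape=2.0, scale=30.0, size=(120, 4))

def zscore(x, axis=0):
    # Simple standardization, used here as a stand-in for the SPI calculation
    return (x - x.mean(axis=axis)) / x.std(axis=axis)

mean_of_z = zscore(rain).mean(axis=1)        # "mean of SPIs over pixels"
z_of_mean = zscore(rain.mean(axis=1))        # "SPI of the mean across pixels"

print(np.abs(mean_of_z - z_of_mean).max())   # generally non-zero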
admin_spi = area.zonal_stats(['CHIRPS','R1S_DEKAD'], stats=['mean','std'])
admin_spi
| zone | time | mean | std |
|---|---|---|---|
| Addis Ababa | 2022-01-01 | -183.894737 | 164.336271 |
| Oromia | 2022-01-01 | -58.526425 | 834.220157 |
| SNNPR | 2022-01-01 | 127.673807 | 737.566795 |
| Somali | 2022-01-01 | -395.969627 | 665.164207 |
| Tigray | 2022-01-01 | 1865.853372 | 1463.635691 |
| ... | ... | ... | ... |
| Amhara | 2023-03-21 | 522.217703 | 799.686746 |
| B. Gumuz | 2023-03-21 | -212.592525 | 446.880870 |
| Dire Dawa | 2023-03-21 | 1920.181818 | 68.074137 |
| Gambela | 2023-03-21 | 132.798561 | 263.439440 |
| Harari | 2023-03-21 | 2180.545455 | 60.233430 |
495 rows × 2 columns
The AnalysisArea.zonal_stats method takes care of the necessary steps for you: it loads the data if you have not done so already and calculates the aggregate statistics using the default admin boundaries (i.e. those used to create the AnalysisArea object).
Custom boundaries can also be provided, either as a GeoPandas GeoDataFrame or as an xarray DataArray. We will see an example of this later in this session. See the method reference for more details.
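As a preview of the GeoDataFrame variant, a hedged sketch could look like the following. The zones= parameter name is borrowed from the DataArray example later in this session and is an assumption here, so check the zonal_stats reference for the exact signature.

# Hypothetical sketch: passing custom geometries (a GeoDataFrame) to zonal_stats.
# The zones= parameter name is assumed; see the method reference.
eth_adm2 = area.get_admin_boundaries(iso3='ETH', admin_level=2)
adm2_spi = area.zonal_stats(['CHIRPS', 'R1S_DEKAD'], zones=eth_adm2, stats=['mean'])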
Visualising zonal statistics¶
We can use the hvplot package to analyse the timeseries interactively.
import hvplot.pandas
admin_spi.hvplot(groupby='zone')
To map zonal statistics we need to combine the calculated statistics with their corresponding geometries. We can use AnalysisArea.join_zonal_stats for this.
zonal_gdf = area.join_zonal_stats(admin_spi['mean']) # Note: can only join one statistic at a time
Now that the data is combined back with the corresponding geometries, we can use the hip.viz accessor to make maps for reporting.
from hdc.colors.rainfall import rxs
date = '2022-03-21'
zonal_gdf.hip.viz.map(
    title=f"1-Month SPI {date}",
    column=date,       # Select one date to plot
    annotate='Name',   # Column to use to add labels to areas
    cmap=rxs.cmap,
    vmin=-2000, vmax=2000,
    legend=True
)
Your turn!¶
In this exercise, you will calculate area statistics over admin areas and visualize the results.
- Create an AnalysisArea object from the administrative units of a country of your choice.
- Choose a data source offered by hip-analysis, and calculate some zonal stats over the admin areas.
- Use hvplot, hip.viz.map, or any other tools you are familiar with, to inspect the results.
Extra credit
- Use AnalysisArea.get_admin_boundaries to source a different set of geometries from the ones used to create the AnalysisArea. Looking at the zonal_stats reference, try to calculate zonal stats over these different geometries instead of the AnalysisArea's default geometries.
Monitoring the season¶
Seasonal monitoring applications are, naturally, most interested in the evolution of key variables during the growing season. Season timings, however, can vary significantly across space. Here we show how to mask out-of-season data on a pixel-by-pixel basis.
To do so, we will use a dataset created by RAM, identifying the (climatological) start and end of up to two seasons for each pixel. These are stored as dekadal indices, where 0 is the first dekad in January and dekads over 35 describe seasons that end in the following year. It can be accessed via the ["WFP_RAM","SOC_EOC"] key.
Technical note: the seasonality dataset may be subject to revision as we improve the methodology used to create it.
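As a quick aid to interpreting this convention, the helper below (illustrative only, not part of hip-analysis) converts a dekadal index into a year offset, month and dekad-of-month, assuming 3 dekads per month and 36 per year.

# Illustrative helper: interpret a dekadal index, where 0 is the first dekad of
# January and indices above 35 wrap into the following calendar year.
def dekad_to_month_dekad(idx: int) -> tuple[int, int, int]:
    year_offset, idx_in_year = divmod(idx, 36)
    month = idx_in_year // 3 + 1          # 1 = January ... 12 = December
    dekad_of_month = idx_in_year % 3 + 1  # 1, 2 or 3
    return year_offset, month, dekad_of_month

print(dekad_to_month_dekad(0))   # (0, 1, 1)  -> first dekad of January
print(dekad_to_month_dekad(8))   # (0, 3, 3)  -> third dekad of March
print(dekad_to_month_dekad(38))  # (1, 1, 3)  -> third dekad of January, next year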
seasons = area.get_dataset(["WFP_RAM","SOC_EOC"])
WARNING:root:nodata attribute is not set in the product '['WFP_RAM', 'SOC_EOC']'. Computing without masking nodata values.
For example, the map below shows the start of season 1, showing a prevalence of season starts in dekads ~5-10 (Feb-Apr) in the south east, and a later season start in the north west.
seasons.sel(soc_eoc_idx='soc_1').index.hip.viz.map(title='Start of season 1', vmin=0)
We can use this general information to extract specific season dates for a given point in time, and then use this information to mask out-of-season data on a pixel-by-pixel basis. The following code block is commented to explain the steps. Note that this sequence of steps is likely to be wrapped into a single function in the near future.
# Imports the necessary packages and methods
import datetime
import numpy as np
from hip.analysis.ops._seasonality import get_season_dates
# Pick the 'point in time'. This routine will find the dates of the most recent completed seasons before this date.
run_date = datetime.datetime(2023,3,31)
# Convert generic dekadal indices to specific season dates before run_date
ss1_dates, ss2_dates = get_season_dates(seasons, date=run_date)
# Pick season 1 or 2, whichever was most recent
seasondates = ss2_dates.where(
    (
        (np.datetime64(run_date) - ss2_dates[:, :, 1])
        < (np.datetime64(run_date) - ss1_dates[:, :, 1])
    ),
    ss1_dates,
)
# Load 1 month SPI data
spi = area.get_dataset(['CHIRPS','R1S_DEKAD'])
# Mask all pixel-timestep combinations that are out-of-season.
spi_season = spi.where(
    (spi.time >= seasondates.sel(soc_eoc_idx="soc"))
    & (spi.time <= seasondates.sel(soc_eoc_idx="eoc")),
    drop=True,  # out-of-season values are filled with NaN by default
)
The spi_season xr.DataArray now has null values whenever a pixel is out of season. We can now pass this data to the AnalysisArea.zonal_stats function to calculate seasonally-masked admin timeseries.
Note how AnalysisArea.zonal_stats can accept either a DataSource key or an xr.DataArray / xr.Dataset as input data.
admin_spi = area.zonal_stats(spi_season, stats=['mean','std'])
Finally, we present a further visualization tool that can be particularly handy when one wants to inspect the evolution of a variable over a panel of units, like a set of administrative areas.
The hip.viz.heatmap accessor can be called on any result of AnalysisArea.zonal_stats to render a heatmap showing the timeseries of a variable across all admin areas.
admin_spi['mean'].hip.viz.heatmap(
    title='Admin 1-Month SPI',
    kwargs={
        'cmap': 'RdBu',
        'vmax': '2000',
        'vmin': '-2000'}
)
Data-driven spatial clustering¶
Administrative units are convenient for reporting and policy responses. However, as they are not climatologically homogeneous, aggregations over their area may hide the heterogeneity of the seasonal evolution within them.
As an alternative approach, we demonstrate how to define areas based on similarity in their underlying data. We use the popular k-means algorithm to derive a spatial clustering. We use this technique here in the context of seasonal monitoring, but of course the method can be used in a variety of applications.
Recall that we already have an spi xr.DataArray at hand.
spi
<xarray.DataArray 'band' (time: 45, latitude: 230, longitude: 300)> Size: 25MB
dask.array<add, shape=(45, 230, 300), dtype=float64, chunksize=(1, 230, 300), chunktype=numpy.ndarray>
Coordinates:
* latitude (latitude) float64 2kB 14.88 14.82 14.78 ... 3.525 3.475 3.425
* longitude (longitude) float64 2kB 33.02 33.07 33.12 ... 47.88 47.92 47.98
spatial_ref int32 4B 4326
* time (time) datetime64[ns] 360B 2022-01-01 2022-01-11 ... 2023-03-21
Attributes:
nodata: nan

The following code clusters the XY space into 10 clusters based on the similarity of their SPI timeseries.
from hip.analysis.ops.clustering import xrKMeans
clustering = xrKMeans(n_clusters=10, random_state=0)
clustering.preprocess_data(spi)
clustering.fit()
xrKMeans(n_clusters=10, random_state=0)
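For orientation, the clustering step is conceptually similar to flattening each pixel's timeseries into a feature vector and running standard k-means on the pixels. The rough sketch below uses plain scikit-learn on the spi DataArray loaded earlier; it is not the xrKMeans implementation, just an illustration of the idea.

import numpy as np
from sklearn.cluster import KMeans

# Flatten (time, latitude, longitude) into one feature vector per pixel
arr = spi.transpose("time", "latitude", "longitude").values   # (T, Y, X)
T, Y, X = arr.shape
features = arr.reshape(T, Y * X).T                            # (pixels, T)

# KMeans cannot handle NaNs, so only cluster fully valid pixels
valid = ~np.isnan(features).any(axis=1)
labels = np.full(Y * X, np.nan)
labels[valid] = KMeans(n_clusters=10, random_state=0, n_init=10).fit_predict(features[valid])
labels_grid = labels.reshape(Y, X)                            # cluster id per pixel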
The resulting clustered areas are stored in the labels_da attribute, and can be visualized with hip.viz.
clustering.labels_da.hip.viz.map(
    title="Areas clustered by SPI",
    cmap='tab10'
)
As we are carrying out an analysis for Ethiopia, we would like to clip these clustered areas to the country's boundaries. AnalysisArea.get_admin_boundaries comes in handy here, as it returns a rasterized geometry when the rasterize=True flag is set. We can then mask the pixels that fall outside of the country boundaries.
# Imports
import xarray as xr
# Get ETH geometries as raster
admin_raster = area.get_admin_boundaries(iso3='ETH', admin_level=1, rasterize=True)
# Turn clustering.labels_da into zones, with null values outside of ETH
zones = xr.where(admin_raster!=-1, clustering.labels_da, np.nan)
# Inspect the results in a map
zones.hip.viz.map(
    title="ETH Areas clustered by SPI",
    cmap='tab10'
)
Finally, we can re-run our zonal statistics over the clustered areas. To do so, note how we pass an argument to the zones parameter of AnalysisArea.zonal_stats.
spi_custom_zones = area.zonal_stats(spi_season, zones=zones, stats=['mean','std'])
Here we display the same heatmap seen previously. However, now each row represents a spatial area that has experienced the displayed timeseries sequence homogeneously throughout its extent. Though not obvious in this case, this approach may help uncover specific data sequences that might get hidden when aggregating over climatologically arbitrary administrative boundaries.
spi_custom_zones['mean'].hip.viz.heatmap(
    title='1-Month SPI clusters',
    kwargs={
        'cmap': 'RdBu',
        'vmax': '2000',
        'vmin': '-2000'
    }
)
The following code helps understand where a given zone is found.
zone_id = 6
(zones==zone_id).hip.viz.map(cmap='Reds')
Your turn!¶
By now you have seen many of the features offered by hip-analysis, including a large DataSource collection; the ability to access administrative boundaries or to cluster custom areas based on data; the zonal_stats method; and a variety of visualization tools in hip.viz.
This exercise consists of attempting a minimal end-to-end analysis. Look at the DataSource catalogue and identify a dataset that might help answer a question of interest. Use the hip-analysis tools you have heard about so far, along with any other tools you already know, and carry out an end-to-end workflow around your topic of interest, including data loading, calculations and visualization. A bare-bones starter skeleton you can adapt is sketched below.
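If you would like a scaffold to start from, the skeleton below is one possible path; the country, dates and dataset keys are placeholders to replace with your own choices from the catalogue.

from hip.analysis import AnalysisArea

# Placeholders: swap the country, dates and dataset key for your own choices
area = AnalysisArea.from_admin_boundaries(
    iso3="KEN",
    admin_level=1,
    datetime_range="2022-01-01/2023-03-31",
)

data = area.get_dataset(["CHIRPS", "R1S_DEKAD"])
stats = area.zonal_stats(data, stats=["mean"])

stats["mean"].hip.viz.heatmap(title="My indicator by admin area")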
Good luck! We are here to answer any questions!