Loading and visualizing spatiotemporal data
Defining your area of interest¶
Humanitarian analyses are always situated in a geography and a time period. The first step is therefore to define our area of interest.
We do so by creating an AnalysisArea object. As we will see in due course, the AnalysisArea is central to carrying out analyses. Apart from defining an AOI, it also acts as an entry point to data sources and holds configuration parameters about how we want to load and preprocess data.
The simplest way to create an AnalysisArea is with a bounding box of the form min_lon, min_lat, max_lon, max_lat and a time range of the form YYYY-MM-DD/YYYY-MM-DD.
from hip.analysis import AnalysisArea
bbox = (120.300293,5.922045,127.089844,13.560562)
time_range = "2020-01-01/2023-06-30"
area = AnalysisArea(
    bbox=bbox,
    datetime_range=time_range,
)
The AnalysisArea converts our input into a GeoBox object. A GeoBox is a grid fully defined by its coordinate reference system, resolution, and bounding box (expressed in that CRS). We can inspect the AnalysisArea geobox like this:
area.geobox
GeoBox (WKT abridged): WGS 84 geographic CRS, axes latitude/longitude in degrees, EPSG:4326
Notice how the AnalysisArea has defaulted to a 0.05° (~5 km) resolution. This is suitable for most analyses of areas ranging from sub-national to regional scope, but not for analysing high-resolution data (e.g. Landsat) over smaller areas. In that case, the hi_res=True flag changes the default resolution to 30 m.
bbox = (14.536457,38.551253,14.598598,38.586015)
time_range = "2020-01-01/2023-06-30"
area = AnalysisArea(
    bbox=bbox,
    datetime_range=time_range,
    hi_res=True,
)
area.geobox
GeoBox (WKT abridged): WGS 84 / UTM zone 33N projected CRS, axes easting/northing in metres, EPSG:32633
While it doesn't show up in the zoomed-in map above, we are looking at the beautiful Sicilian island of Filicudi!
Your turn!¶
Choose a bounding box (do you know bboxfinder?) and define your AnalysisArea.
- Did you manage to select the part of the world you intended?
- Is the resolution what you would expect? If not, did you set the correct hi_res flag?
- Is the EPSG code of the coordinate reference system the one you would expect?
Extra credit:
- A floating point number takes 8 bytes of memory. How much memory will 1 time step of your grid for 1 variable take? What about 1 year of 1 variable at dekadal frequency?
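One way to approach the extra-credit question is a quick back-of-envelope calculation. This sketch assumes the default 0.05° resolution and the Horn of Africa bounding box used later in this notebook; the actual grid shape may differ slightly after snapping:

```python
# Back-of-envelope memory estimate for a float64 (8 bytes per value) grid.
bbox = (37.968750, 0.878872, 50.976563, 12.554564)
res = 0.05                               # default AnalysisArea resolution, degrees
nx = round((bbox[2] - bbox[0]) / res)    # longitude cells
ny = round((bbox[3] - bbox[1]) / res)    # latitude cells
bytes_per_step = nx * ny * 8             # one time step, one variable
dekads_per_year = 36                     # 3 dekads per month x 12 months
print(f"{nx} x {ny} grid, {bytes_per_step / 1e6:.1f} MB per time step")
print(f"{bytes_per_step * dekads_per_year / 1e6:.1f} MB per year at dekadal frequency")
```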
bbox = (37.968750,0.878872,50.976563,12.554564)
time_range = "2023-01-01/2023-04-01"
area = AnalysisArea(
    bbox=bbox,
    datetime_range=time_range,
)
area.geobox
GeoBox (WKT abridged): WGS 84 geographic CRS, axes latitude/longitude in degrees, EPSG:4326
The AnalysisArea has a powerful get_dataset() method that gives access to a variety of datasets and derived indicators.
Each dataset is identified by a key: a list of two strings. For example, this line connects to CHIRPS rainfall estimates.
rfh = area.get_dataset(["CHIRPS","RFH_DEKAD"])
rfh
GDAL_DATA = /Users/paolo/miniforge3/envs/hip-analysis-dev-env/share/gdal GDAL_DISABLE_READDIR_ON_OPEN = EMPTY_DIR GDAL_HTTP_MAX_RETRY = 10 GDAL_HTTP_RETRY_DELAY = 0.5
<xarray.DataArray 'band' (time: 10, latitude: 235, longitude: 261)>
dask.array<add, shape=(10, 235, 261), dtype=float64, chunksize=(1, 235, 261), chunktype=numpy.ndarray>
Coordinates:
* latitude (latitude) float64 12.57 12.52 12.47 ... 0.975 0.925 0.875
* longitude (longitude) float64 37.98 38.02 38.08 ... 50.88 50.93 50.98
spatial_ref int32 4326
* time (time) datetime64[ns] 2023-01-01 2023-01-11 ... 2023-04-01
Attributes:
    nodata:  nan
Raster data will be loaded into an xarray.DataArray or xarray.Dataset. You can learn more about how to use xarray here.
The output above tells us a lot about the structure of the returned data, including its coordinates, dimensions, size, and type. We'll cover these aspects in more detail later.
For the moment, let's plot the rainfall values for one date:
rfh.sel(time='2023-03-21').plot()
<matplotlib.collections.QuadMesh at 0x1687837f0>
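A detail worth knowing: .sel with an exact date raises a KeyError when that date is not on the time axis. xarray's nearest-neighbour selection sidesteps this. A minimal sketch, with a synthetic array standing in for rfh (the fabricated 10-day spacing only approximates true dekads):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in for the rfh DataArray: 10 time steps, 10 days apart.
times = pd.date_range("2023-01-01", periods=10, freq="10D")
rfh = xr.DataArray(np.arange(10.0), dims="time", coords={"time": times})

# An exact match for 2023-03-20 would fail; nearest selection succeeds.
value = rfh.sel(time="2023-03-20", method="nearest")
```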
Customising the get_dataset call¶
The get_dataset method allows the user to override some of the defaults. By default it returns data for the AnalysisArea's geobox and datetime_range, but both can be customised in the call itself.
This is handy, for example, when reading long-term averages from the Humanitarian Data Cube, where dates are set to the arbitrary year 1970. So, to get the average rainfall in the 3rd dekad of March, we can:
rfh_lta = area.get_dataset(["CHIRPS","RFH_DEKAD_LTA"], dates="1970-03-21")
rfh_lta.plot()
<matplotlib.collections.QuadMesh at 0x168926620>
Finding datasets¶
hip-analysis comes with a growing set of predefined data sources out of the box. The full list, with documentation for each, can be found in the package docs page here. This catalogue is a work in progress and will improve significantly over the next few months.
For example, let's look at NDVI from Landsat. We'll need to define a smaller bounding box, and remember the hi_res flag.
bbox = (44.622946,5.267375,44.658566,5.297913)
time_range = "2023-01-01/2024-01-01"
area = AnalysisArea(
    bbox=bbox,
    datetime_range=time_range,
    hi_res=True,
)
ndvi = area.get_dataset(['LANDSAT', 'NDVI'])
ndvi.isel(time=-1).plot()
<matplotlib.collections.QuadMesh at 0x1681dba90>
Notice the white patches? This is one example of the pre-processing of data that hip-analysis takes care of. In this case, cloud-covered pixels have automatically been masked with missing values.
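To quantify the masking rather than eyeball it, you can compute the fraction of missing pixels. A minimal sketch, with a small synthetic array standing in for the ndvi result:

```python
import numpy as np
import xarray as xr

# Stand-in for the ndvi DataArray: 2 of 4 pixels masked as NaN.
ndvi = xr.DataArray(np.array([[0.3, np.nan], [np.nan, 0.7]]), dims=("y", "x"))

# isnull() marks NaN pixels; the mean of that boolean mask is the masked fraction.
masked_fraction = float(ndvi.isnull().mean())
print(f"{masked_fraction:.0%} of pixels are masked")
```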
Beyond curated datasets¶
hip-analysis connects to multiple STAC APIs: WFP's Humanitarian Data Cube, Microsoft's Planetary Computer, Digital Earth Africa, and Copernicus Dataspace. For the most commonly used datasets, the connection to the appropriate STAC API is taken care of by hip-analysis. For example, the above ["LANDSAT","NDVI"] key does not reveal where the data is loaded from (psst... it's Planetary Computer). hip-analysis also takes care of some pre-processing operations which, depending on the datasource, include scaling and offsetting data as needed, masking no data, and removing outliers.
However, datasets that are not already mapped to a curated key can still be loaded from the above STAC APIs via the get_dataset method. For example, the code below loads the crop_mask product from Digital Earth Africa without it being explicitly listed among the curated datasets in hip-analysis. Note, however, that the data will be loaded as-is, without any preprocessing or cleaning.
Note: For Copernicus Dataspace products, you'll need to set up S3 credentials as described in the credentials guide.
BBOX = (32.244186,-2.698206,32.470093,-2.526723)
DATES = None
area = AnalysisArea(bbox=BBOX, hi_res=True, datetime_range=DATES)
crop_mask = area.get_dataset(["DEA","crop_mask"])
crop_mask
GDAL_DATA = /Users/paolo/miniforge3/envs/hip-analysis-dev-env/share/gdal GDAL_DISABLE_READDIR_ON_OPEN = EMPTY_DIR GDAL_HTTP_MAX_RETRY = 10 GDAL_HTTP_RETRY_DELAY = 0.5 AWS_S3_ENDPOINT = s3.af-south-1.amazonaws.com AWS_NO_SIGN_REQUEST = YES
WARNING:root:nodata attribute is not set in the product '['DEA', 'crop_mask']'. Computing without masking nodata values.
<xarray.Dataset>
Dimensions: (y: 633, x: 839, time: 1)
Coordinates:
* y (y) float64 9.721e+06 9.721e+06 ... 9.702e+06 9.702e+06
* x (x) float64 4.16e+05 4.16e+05 4.16e+05 ... 4.411e+05 4.411e+05
spatial_ref int32 32736
* time (time) datetime64[ns] 2019-01-01
Data variables:
mask (time, y, x) float32 dask.array<chunksize=(1, 633, 839), meta=np.ndarray>
prob (time, y, x) float32 dask.array<chunksize=(1, 633, 839), meta=np.ndarray>
filtered (time, y, x) float32 dask.array<chunksize=(1, 633, 839), meta=np.ndarray>
Your turn!¶
Look through the data sources offered by hip-analysis, or the wider list available from the supported STAC APIs. Pick one, load the data, and plot one time-step. For the moment, please use a small datetime_range argument, for example 3 months.
- Are the resolution, number of timesteps, and other properties of the data what you expected?
- Have missing values been masked?
- Was the loading time what you expected?
Extra credit:
- Try a second datasource!
Visualizing data¶
Let's return to the Horn of Africa and make a map of the 3-month Standardized Precipitation Index.
bbox = (37.968750,0.878872,50.976563,12.554564)
time_range = "2018-01-01/2024-01-01"
area = AnalysisArea(
    bbox=bbox,
    datetime_range=time_range,
)
spi3m = area.get_dataset(["CHIRPS","R3S_DEKAD"])
(A tangent) lazy evaluation¶
Before we can proceed to exploring the data further, we should touch on a concept that is very important when handling large datasets: lazy evaluation.
Lazy evaluation is a computational design principle whereby data loading and/or computations are carried out only when explicitly requested or necessary. Until then, the program merely records the instructions to be performed, without actually touching the data. You can learn more about lazy evaluation (and parallel computing) in the context of xarray here.
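The idea can be seen with dask directly (assuming dask is installed): building an array only records a task graph, and nothing is computed until we ask for a result.

```python
import dask.array as da

# This line only builds a task graph; the result is not held in memory yet.
lazy = da.ones((1000, 1000), chunks=(100, 1000)) * 2

# Computation (and loading) happens only when explicitly requested:
result = lazy.mean().compute()
print(result)  # 2.0
```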
spi3m
<xarray.DataArray 'band' (time: 217, latitude: 235, longitude: 261)>
dask.array<add, shape=(217, 235, 261), dtype=float64, chunksize=(1, 235, 261), chunktype=numpy.ndarray>
Coordinates:
* latitude (latitude) float64 12.57 12.52 12.47 ... 0.975 0.925 0.875
* longitude (longitude) float64 37.98 38.02 38.08 ... 50.88 50.93 50.98
spatial_ref int32 4326
* time (time) datetime64[ns] 2018-01-01 2018-01-11 ... 2024-01-01
Attributes:
    nodata:  nan
For the moment, the important thing to be aware of is that when we see words like dask and chunks in the DataArray view above, the data may not actually have been loaded yet. Since the entire dataset is not very large (101 MiB), loading it into memory will make it easier to explore.
spi3m = spi3m.load()
Making a map¶
hip-analysis currently offers some very basic map-making capabilities, which we intend to enrich in the near future. Currently, an xarray.DataArray can be mapped via the .hip.viz.map accessor. Once fully developed, this accessor will allow the creation of report-quality maps with a single line of code.
As we are plotting the SPI from HDC, we can make use of the pre-defined colour-map available through hdc-colors:
from hdc.colors.rainfall import rxs
spi3m.isel(time=0).hip.viz.map(title="SPI 3months", cmap=rxs.cmap)
Explore data interactively¶
Many spatiotemporal analyses involve exploring the evolution of a variable across timesteps and inspecting its timeseries at specific locations.
We recently exposed the point_timeseries accessor to carry out such analyses interactively.
spi3m.hip.viz.point_timeseries()
Exporting and sharing data¶
You can export a specific timestep to GeoTIFF format, for example for analysis in QGIS, using:
from odc.geo.xr import write_cog
write_cog(spi3m.sel(time='2022-01-01'), "./my_geotiff.tiff")
Static plots and maps can be saved as png files, for example:
figure = spi3m.sel(time='2020-03-21').hip.viz.map(title="SPI 3months", cmap=rxs.cmap)
figure.savefig("/shared/public/hip-analysis-training-spi-horn-of-africa.png")
RAM-C's JupyterHub offers the /shared/public/ folder for easy sharing of outputs. Any file saved there is available via a browser by navigating to https://jupyter.earthobservation.vam.wfp.org/public/. For example, the SPI map above is available here.
Your turn!¶
Look through the data sources offered by hip-analysis, or the wider list available from the supported STAC APIs. Pick one and load a few years of data.
- Use the visualization tools presented to explore the data.
- Make a map and share it in the public folder under
/shared/public/hip-analysis-training/
Further reading:
- Visualization and animation examples from our recent work on seasonality
- Interactive plots using hvplot