Seasonality¶
Welcome to this tutorial on using hip-analysis for seasonality detection. We'll walk you through each step of the analysis process, including preparing masks, categorizing mono-seasonal and bi-seasonal regions, conducting short-term and long-term analyses, and exploring various visualization options.
Before we dive into the analysis, let's make sure that the data you're working with is stored as a raster with three dimensions: time, geo_x_axis, and geo_y_axis. It's important to note that some tasks require a minimum amount of time-related data, which we'll explain in more detail in the corresponding sections.
First let's get prepared by importing necessary packages.
from hip.analysis.aoi import AnalysisArea
import hip.analysis
import pandas as pd
import xarray as xr
import numpy as np
Preparation¶
In this section, we'll lay the groundwork for creating masks that identify regions with year-round or out-of-season activity, as well as discern whether an area displays a single or dual seasonal pattern.
Creating non-computing mask¶
The non-computing mask serves to exclude unnecessary calculations and identify year-round or out-of-season regions. To achieve this, we recommend using long-term average data, such as r1h_dekad_lta. These datasets offer smoothed values between data points, providing generalized seasonal patterns.
It's essential to operate with two additional dataset copies covering a three-year span, denoted as dup_data in the coding section. The use of a one-year dataset could potentially result in the incorrect identification of a starting date. This becomes particularly significant if a season was already in progress during the previous year.
bbox = (-20, -35, 56, 38) # Africa for example
area = AnalysisArea(
bbox=bbox,
datetime_range="1970-01-01/1970-12-31"
)
lta = area.get_dataset(["CHIRPS","r1h_dekad_lta"])
def make_rep_data(da, tot_yr):
da_c = da.copy()
for i in range(tot_yr - 1):
da_c['time'] = pd.to_datetime(da_c.time) + pd.offsets.DateOffset(years=1)
da = xr.concat([da, da_c], dim='time')
return da
dup_data = make_rep_data(lta, 3).chunk({'time':-1, 'latitude':146, 'longitude':152})
GDAL_DATA = /envs/user/hip-workshop/lib/python3.10/site-packages/rasterio/gdal_data GDAL_DISABLE_READDIR_ON_OPEN = EMPTY_DIR GDAL_HTTP_MAX_RETRY = 10 GDAL_HTTP_RETRY_DELAY = 0.5 AWS_ACCESS_KEY_ID = xx..xx AWS_SECRET_ACCESS_KEY = xx..xx AWS_SESSION_TOKEN = xx..xx
STARTING_THRESHOLD = 50
ENDING_THRESHOLD = 50
PERIOD = 36
idx_ts = dup_data.hip.seasons.get_seasons_by_threshold(
s_threshold=STARTING_THRESHOLD,
e_threshold=ENDING_THRESHOLD,
period=PERIOD,
output_type='soc_eoc_idxs')
dry_mask = xr.where(dup_data.isel(time=0).notnull() & idx_ts.isel(soc_eoc_idx=0).isnull(), True, np.nan)
wet_mask = xr.where(idx_ts.isel(soc_eoc_idx=0) == -1, True, np.nan)
Determining unimodal and bimodal regions¶
In specific analytical contexts, particularly in long-term analyses, our aim extends beyond assessing seasonal patterns in unimodal regions. We also seek to evaluate these patterns for each individual season within bimodal regions, involving the classification of mono-seasonal and bi-seasonal regimes. To achieve this, we recommend using harmonic analysis directly on the original long-term time-series raster or alternatively on an averaged one-year dataset, such as r1h_dekad_lta.
Assuming your dataset is named avg_data, the outcome will be a mask containing 'True' values in bimodal regions, 'False' values in unimodal areas, and 'null' values for oceanic regions.
avg_data = lta.chunk({'time':-1, 'latitude':146, 'longitude':152})
bimodal_mask = avg_data.hip.seasons.harmonic_analysis(output_type='bimodal_mask')
Climatological Start and End (SoC, EoC)¶
The climatological start and end analysis aims to unveil dominant seasonal patterns over an extended period. This process utilizes two distinct methods: the threshold-based method and the anomaly-based method, both yielding comparable results.
Threshold-based method¶
This method follows a procedure similar to that outlined in the previous section. To enhance accuracy and prevent misleading start dates, two additional copies of the dataset are employed. Notably, pixel-based thresholding is applied, allowing the use of 2D DataArray inputs for s_threshold and e_threshold. The function call is almost the same as calculating idx_ts here, while we suggest adding repeat_end_constraint to rule out potential fake ends.
Anomaly-based method¶
Contrasting the threshold-based method, the anomaly-based approach requires a long-term time series (long_term_ts) as input instead of an averaged quantity over a standard period.
The exploration process hinges on your specific focus. Optionally, you can specify a precomputed bimodal mask using bimodal_mask. In the absence of a provided mask, the function will autonomously calculate it in advance.
area = AnalysisArea(
bbox=bbox,
datetime_range="2015-01-01/2020-12-31"
)
long_term_ts = area.get_dataset(["CHIRPS","r1h_dekad"]).chunk({'time':-1, 'latitude':146, 'longitude':152})
GDAL_DATA = /envs/user/hip-workshop/lib/python3.10/site-packages/rasterio/gdal_data GDAL_DISABLE_READDIR_ON_OPEN = EMPTY_DIR GDAL_HTTP_MAX_RETRY = 10 GDAL_HTTP_RETRY_DELAY = 0.5 AWS_ACCESS_KEY_ID = xx..xx AWS_SECRET_ACCESS_KEY = xx..xx AWS_SESSION_TOKEN = xx..xx
soc_eoc = long_term_ts.hip.seasons.get_seasons_by_anomaly(
output_type='soc_eoc_ts'
)
Yearly Start and End¶
In yearly analysis, our primary objectives revolve around saving historical Start of Season (SoS) and End of Season (EoS) indices for future reference, as well as deriving meaningful statistics from this data.
- Saving Historical Indices: To preserve historical indices for future analysis, our primary goal is to extract Start of Season (SoS) and End of Season (EoS) indices within a search window defined by climatological start and end points. Two methods share this common approach. Following this extraction, we employ post-processing techniques, marking SoS and EoS in humid regions as -1 and eliminating short seasons. All these steps are performed on the one-year extended period of the long-term observational raster (
etd_ts).
etd_ts = long_term_ts
# Threshold-based method
SEARCH_WINDOW = 4
SOC_EOC = idx_ts
sos_eos_t = etd_ts.hip.seasons.get_seasons_by_threshold(
s_threshold=STARTING_THRESHOLD,
e_threshold=ENDING_THRESHOLD,
soc_eoc_idx=SOC_EOC,
search_wd=SEARCH_WINDOW,
output_type='yearly_idxs')
# Anomaly-based method
## bimodal_mask and soc_eoc can be skipped
BIMODAL_MASK = bimodal_mask # or None
SOC_EOC = soc_eoc # or None
sos_eos_a = etd_ts.hip.seasons.get_seasons_by_anomaly(
bimodal_mask=BIMODAL_MASK,
soc_eoc=SOC_EOC,
output_type='sos_eos_ts')
- Calculating Descriptive Statistics: Our second objective is to summarize valuable insights from the historical indices using circular statistics. To accomplish this, we calculate the circular mean and variance for the SoS indices. The circular mean provides us with a single value that represents the average timing of the SoS events over the years, while the circular variance helps us gauge the variation in these events. These statistics are essential for summarizing historical patterns and gaining a deeper understanding of the data. To provide a practical illustration, here's an example of applying these principles to the start of the first season (
SoS_1):
from hip.analysis.ops._statistics import cir_mean, cir_var
# Calculate Circular Mean for each SoS and EoS
cir_m = cir_mean(sos_eos_t, input_core_dim='year')
# Calculate Circular Variance for each SoS and EoS
cir_v = cir_var(sos_eos_t, input_core_dim='year')
Visualization¶
In this section, we will explore diverse visualization techniques for presenting seasons. These methods include static maps, animations, interactive plots, and web applications. But keep in mind that there isn't a single "best" way to plot seasonality detection results – it depends on the situation. We will provide insights into the recommended use cases for each approach. We'll start with the simplest techniques that need fewer external tools and move on to more advanced ones.
Static map with cartopy¶
An easy-to-use technique is to call the .plot() method on a single DataArray time layer. This layer can be extracted by specifying a particular datetime or time index from a 3D seasonality output raster. To make the map more informative, it's a good idea to incorporate country borders and coastlines, and that's where the cartopy library comes in.
Here's an example: Let's say we want to show when the season first starts (as discussed in Section 1. To make things clearer, you can adjust the map's transparency alpha so that country borders don't get hidden by dark colors. If you want a color scale with discrete colors, you can set how many sections you want using the levels setting. Since the values we're dealing with are in the range of [0, period - 1], you can set levels to period + 1, and also set the lowest (vmin) and highest (vmax) values accordingly.
idx = cir_m
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, subplot_kw=dict(projection=ccrs.PlateCarree()))
p = idx.isel(idx=0).plot(cmap='nipy_spectral_r',
levels=PERIOD+1,
transform=ccrs.PlateCarree(),
alpha=0.75,
vmin=0,
vmax=PERIOD)
ax.coastlines()
ax.add_feature(cfeature.BORDERS)
ax.set_title('Your title', fontsize=15)
Text(0.5, 1.0, 'Your title')
Animation with matplotlib.animation¶
In certain cases, it's valuable to gain a broader perspective on the transition between seasons or regime shifts over an extended timeframe. While static plots for individual time points are informative, they might become overwhelming when dealing with a large number of plots or when you need to focus on specific areas. To address this, we introduce an alternative approach: FuncAnimation.
This technique involves generating data for the initial frame and subsequently modifying this data for each subsequent frame, which offers a dynamic way to explore temporal changes in the data. It proves particularly advantageous when dealing with masked time series visualization and examining shifts in single-seasonal and bi-seasonal patterns. Here's an example for masked time series visualization. For other video format, please refer to the saving animations chapter.
rfh = long_term_ts.isel(time = slice(0,5))
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter
# Get masked time series
mts = rfh.hip.seasons.get_seasons_by_threshold(output_type = 'mts_idxs')
# Set up figure with matplotlib
fig, ax = plt.subplots(
1, 1, figsize=(8, 6),
subplot_kw=dict(projection=ccrs.PlateCarree())
)
# Create base plot and static features (section 3.1)
p = mts.isel(time=0).plot.imshow(ax=ax,
transform=ccrs.PlateCarree(),
animated=True,
cmap='nipy_spectral_r',
levels=PERIOD+1,
alpha=0.75,
vmin=0,
vmax=PERIOD)
ax.coastlines()
ax.add_feature(cfeature.BORDERS)
# Update animated elements for a specific time
def update(t):
# print(t)
ax.set_title(f'Your title at time {t}', fontsize=15)
p.set_array(mts.isel(time=t))
return p,
# Run the animation and apply `update()` for each time points
anim = FuncAnimation(fig,
update,
frames=range(len(mts.time)),
blit=False,
repeat_delay=10000)
# Show the animation
plt.show()
# Save it to .gif
writergif = PillowWriter(fps=2)
anim.save('My animation.gif',writer=writergif)
Interactive plot with hvplot¶
When regular plots aren't enough and you need a closer look at your data down to the tiniest detail, hvplot steps in. It's a tool that powers your visualizations with cool interactive features. In this guide, we'll look at two simple ways to use hvplot in seasonality detection:
- Comparing Data on Maps: Sometimes, you want to compare data on two maps. hvplot makes it easy. Just use
.hvplot()instead of.plot()after your DataArray. You can put multiple plots side by side using+, or combine them into one using*. But be careful, using hvplot can be memory-intensive, even if you you decide to make a static plot withdatashade=True.
data_layer1 = idx.isel(idx=0)
data_layer2 = idx.isel(idx=2)
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import hvplot.xarray
from holoviews import opts
import holoviews as hv
vmin = 0
vmax = PERIOD
plot1 = data_layer1.hvplot(frame_width=350,
frame_height=350,
geo=True,
cmap='nipy_spectral_r',
alpha=0.75,
features=['borders','coastline'],
title='title 1', clim=(vmin,vmax))
plot2 = data_layer2.hvplot(frame_width=350,
frame_height=350,
geo=True,
cmap='nipy_spectral_r',
alpha=0.75,
features=['borders','coastline'],
title='title 2', clim=(vmin,vmax))
plot1 + plot2
- Tracking Time Series for a Clicked Pixel: Imagine you are tracking changes over time on a map. With hvplot's tap stream feature, you can click on a specific point on the map and access the time series data for that precise location. This is achieved by utilizing
holoviews.streams.Tapto retain the clicked coordinates and subsequently passing them to a second plot throughholoviews.DynamicMap. We recommend combining line and scatter elements in your time series graph. This combination effectively portrays both the overarching trends and individual data points.
left_data_layer1 = data_layer1
right_data1 = lta
bbox = (-20, -35, 56, 38) # Africa for example
area = AnalysisArea(
bbox=bbox,
datetime_range="1970-01-01/1970-12-31"
)
right_data2 = area.get_dataset(["CHIRPS","rfh_dekad_lta"])
GDAL_DATA = /envs/user/hip-workshop/lib/python3.10/site-packages/rasterio/gdal_data GDAL_DISABLE_READDIR_ON_OPEN = EMPTY_DIR GDAL_HTTP_MAX_RETRY = 10 GDAL_HTTP_RETRY_DELAY = 0.5 AWS_ACCESS_KEY_ID = xx..xx AWS_SECRET_ACCESS_KEY = xx..xx AWS_SESSION_TOKEN = xx..xx
# Create the map to be clicked on
map_left = left_data_layer1.hvplot(geo=True,
frame_width=400,
frame_height=400,
alpha=0.75,
cmap='nipy_spectral_r',
features=['borders','coastline'],
title="Map to be clicked on")
# Initialize the x-y position
posxy = hv.streams.Tap(source=map_left.image.I, x=(bbox[0]+bbox[2])/2, y=(bbox[3]+bbox[1])/2)
# Define the right plot with x-y position input
def tap_histogram(x, y):
# Map clicked x-y into nearest lat, lon values
y = right_data1.latitude.values[np.abs(y-right_data1.latitude.values).argmin()]
x = right_data1.longitude.values[np.abs(x-right_data1.longitude.values).argmin()]
right_plot_1 = right_data1.sel(latitude=y, longitude=x).hvplot.line(label='first line') *\
right_data1.sel(latitude=y, longitude=x).hvplot.scatter(label='first line')
right_plot_2 = right_data2.sel(latitude=y, longitude=x).hvplot.line(label='second line') *\
right_data2.sel(latitude=y, longitude=x).hvplot.scatter(label='second line')
return (right_plot_1 * right_plot_2).opts(
legend_position='top_left',
frame_width=700,
frame_height=400,
title='Latitude: %0.3f, Longitude: %0.3f' % (y, x))
# Bind the time series plot with chosen x-y from the left map
tap_dmap = hv.DynamicMap(tap_histogram, streams=[posxy])
# Show the map and time series
map_left + tap_dmap