3.1 Clustering of stations based on seasonality measures
Directional statistics can be used to define similarity measures from
the timing of zero-flow conditions. The first step is to convert each
zero-flow date into its day-in-year (the day count within a year that
starts on 1 April) and then into an angular value (Burn, 1997):
\(\theta_{i}=\left(\text{Julian Date}\right)_{i}\,\frac{2\pi}{365}\) (1)
where \(\theta_{i}\) is the angular value in radians for
zero-discharge day \(i\). In leap years, the denominator is increased
by one (to 366). All zero-discharge days can then be seen as vectors with
unit magnitude and direction given by \(\theta_{i}\). For a sample of
\(n\) dates, the \(\overline{x}\) and \(\overline{y}\) coordinates of the
mean date can be determined as:
\(\overline{x}=\frac{1}{n}\sum_{i=1}^{n}{\cos\left(\theta_{i}\right)}\)(2)
\(\overline{y}=\frac{1}{n}\sum_{i=1}^{n}{\sin\left(\theta_{i}\right)}\)(3)
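A minimal Python sketch of Eqs. (1) to (3) is given below. It assumes that dates are available as `datetime.date` objects, that day 1 is 1 April, and that the denominator becomes 366 whenever the 1 April to 31 March year containing the date includes 29 February; this reading of the leap-year rule is an assumption, and the function names are illustrative rather than taken from the study.

```python
import math
from datetime import date


def _year_start(d: date) -> date:
    """1 April that opens the (assumed) 1 April - 31 March year containing d."""
    start_year = d.year if d >= date(d.year, 4, 1) else d.year - 1
    return date(start_year, 4, 1)


def day_in_year(d: date) -> int:
    """Day-in-year counted from 1 April (1 April -> 1)."""
    return (d - _year_start(d)).days + 1


def year_length(d: date) -> int:
    """365, or 366 when the assumed 1 April - 31 March year includes 29 February."""
    start = _year_start(d)
    return (date(start.year + 1, 4, 1) - start).days


def to_angle(d: date) -> float:
    """Eq. (1): theta_i in radians, with the leap-year denominator adjustment."""
    return day_in_year(d) * 2.0 * math.pi / year_length(d)


def mean_vector(dates):
    """Eqs. (2)-(3): mean cosine and mean sine of the zero-flow dates' angles."""
    thetas = [to_angle(d) for d in dates]
    n = len(thetas)
    x_bar = sum(math.cos(t) for t in thetas) / n
    y_bar = sum(math.sin(t) for t in thetas) / n
    return x_bar, y_bar
```

With this convention, 31 March maps to an angle of \(2\pi\) and 1 April to \(2\pi/365\), so zero-flow days on either side of the year boundary are treated as neighbours.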
The mean direction (the mean date) \(\overline{\theta}\in[0,\ 2\pi)\) of
the zero-flow dates for a given station can then be obtained from:
\(\overline{\theta}=\arctan^{*}\left(\frac{\overline{y}}{\overline{x}}\right)\) (4)
where \(\arctan^{*}()\) is the quadrant-specific inverse of the tangent
function. A measure of the variability of the \(n\) occurrences
around the mean date is the mean resultant length \(\overline{r}\):
\(\overline{r}=\sqrt{\overline{x}^{2}+\overline{y}^{2}}\) (5)
It should be noted that \(0\leq\overline{r}\leq 1\): a value of
\(\overline{r}\) near 1 implies little variation and a high concentration
of the dates, whereas a value near 0 implies large variation and wide
dispersion around the mean date.
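Eqs. (4) and (5) can be sketched in Python as follows, taking `math.atan2` as the quadrant-specific arctangent \(\arctan^{*}\) and shifting its output from \((-\pi, \pi]\) into \([0, 2\pi)\). The helpers `circular_summary` and `angle_to_day` are hypothetical names; when \(\overline{x}=\overline{y}=0\) the mean direction is undefined and the returned angle should be ignored.

```python
import math

TWO_PI = 2.0 * math.pi


def circular_summary(thetas):
    """Return (theta_bar, r_bar) for a sequence of angles in radians (Eqs. 2-5)."""
    n = len(thetas)
    x_bar = sum(math.cos(t) for t in thetas) / n
    y_bar = sum(math.sin(t) for t in thetas) / n
    # math.atan2 is quadrant-aware; its (-pi, pi] result is shifted into [0, 2*pi).
    theta_bar = math.atan2(y_bar, x_bar) % TWO_PI
    if theta_bar >= TWO_PI:  # a tiny negative angle can round up to exactly 2*pi
        theta_bar = 0.0
    r_bar = math.hypot(x_bar, y_bar)  # Eq. (5): mean resultant length
    return theta_bar, r_bar


def angle_to_day(theta_bar, days_in_year=365):
    """Map a mean direction back to a fractional day-in-year by inverting Eq. (1)."""
    return theta_bar * days_in_year / TWO_PI
```

For two dates on either side of the year boundary (angles 0.1 and \(2\pi-0.1\)), this sketch places the mean date at the boundary with \(\overline{r}=\cos 0.1\approx 0.995\), whereas a plain arithmetic mean of the two angles would wrongly put it near mid-year (\(\pi\)).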
The clustering is based on the \(\overline{\theta}\) and \(\overline{r}\)
metrics calculated for winter and summer, using the Ward distance. The
optimal number of clusters is identified with the help of the silhouette
plot and visual inspection of the clusters obtained.
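One possible sketch of this step uses SciPy's Ward linkage and computes the mean silhouette width for each candidate number of clusters. Two choices here are assumptions, not details given in the text: each season's \((\overline{\theta}, \overline{r})\) pair is encoded as the point \((\overline{r}\cos\overline{\theta}, \overline{r}\sin\overline{\theta})\) so that mean dates on either side of 1 April remain close, and the candidate range `k_range` is illustrative. The silhouette is computed directly with NumPy to keep dependencies minimal.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def seasonal_features(theta_w, r_w, theta_s, r_s):
    """One feature row per station from winter and summer (theta_bar, r_bar).

    ASSUMPTION: each season is embedded as (r cos theta, r sin theta) so that
    circular distances are respected; the source does not state its encoding.
    """
    return np.column_stack([
        r_w * np.cos(theta_w), r_w * np.sin(theta_w),
        r_s * np.cos(theta_s), r_s * np.sin(theta_s),
    ])


def mean_silhouette(X, labels):
    """Average silhouette width s(i) = (b - a) / max(a, b); singletons get 0."""
    D = squareform(pdist(X))  # Euclidean distance matrix
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] = 0, so self is excluded
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s.mean()


def ward_clusters(X, k_range=range(2, 7)):
    """Ward linkage, then the mean silhouette for each candidate cluster count k."""
    Z = linkage(X, method="ward")
    scores = {}
    for k in k_range:
        labels = fcluster(Z, t=k, criterion="maxclust")
        if len(np.unique(labels)) > 1:  # the silhouette needs at least two clusters
            scores[k] = mean_silhouette(X, labels)
    return Z, scores
```

The mean silhouette only ranks the candidates; as stated above, the final choice also relies on the silhouette plot and on visual inspection of the resulting clusters.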