3.1 Clustering of stations based on seasonality measures
Directional statistics can be used to define similarity measures from the timing of zero-flow conditions. The first step is to convert each date into its day-in-year, i.e. the day number counted from 1 April, and then into an angular value (Burn, 1997):
\(\theta_{i}=\left(\text{Julian\ Date}\right)_{i}\left(\frac{2\pi}{365}\right)\) (1)
where \(\theta_{i}\) is the angular value in radians for the zero-discharge day \(i\). In leap years, the denominator was increased by one. All zero-discharge days can then be seen as vectors with unit magnitude and direction given by \(\theta_{i}\). Then, for a sample of \(n\) dates, the \(\overline{x}\) and \(\overline{y}\) coordinates of the mean date can be determined as:
\(\overline{x}=\frac{1}{n}\sum_{i=1}^{n}{\cos\left(\theta_{i}\right)}\) (2)
\(\overline{y}=\frac{1}{n}\sum_{i=1}^{n}{\sin\left(\theta_{i}\right)}\) (3)
The mean direction (the mean date) \(\overline{\theta}\in[0,\ 2\pi)\) of zero-flow dates for a given station can then be obtained from:
\(\overline{\theta}=\arctan^{*}\left(\frac{\overline{y}}{\overline{x}}\right)\) (4)
where \(\arctan^{*}()\) is the quadrant-specific inverse of the tangent function. The measure of the variability of the n occurrences around the mean date is the mean resultant length:
\(\overline{r}=\sqrt{\overline{x}^{2}+\overline{y}^{2}}\) (5)
It should be noted that \(0\leq\overline{r}\leq 1\), and that \(\overline{r}\) near 1 implies little variation and a high concentration of the data, whereas \(\overline{r}\) near 0 implies large variation and wide dispersion around the mean date.
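As an illustration, Eqs. (1) to (5) can be sketched in a few lines of Python. This is a minimal sketch rather than the procedure actually used in the study: the function names `day_in_year` and `seasonality` are hypothetical, and the leap-year rule is interpreted here as using the length (365 or 366 days) of the April-to-March year that contains each date, which is an assumption.

```python
import math
from datetime import date


def day_in_year(d, start_month=4, start_day=1):
    """Return (day number counted from 1 April = day 1, length of that year).

    Assumption: the "year" runs from 1 April to 31 March, and its length is
    366 days when it contains 29 February, otherwise 365.
    """
    if (d.month, d.day) >= (start_month, start_day):
        start_year = d.year
    else:
        start_year = d.year - 1
    start = date(start_year, start_month, start_day)
    end = date(start_year + 1, start_month, start_day)
    return (d - start).days + 1, (end - start).days


def seasonality(dates):
    """Mean direction (radians, in [0, 2*pi)) and mean resultant length."""
    angles = []
    for d in dates:
        day, year_length = day_in_year(d)
        angles.append(day * 2.0 * math.pi / year_length)      # Eq. (1)
    n = len(angles)
    x_bar = sum(math.cos(a) for a in angles) / n               # Eq. (2)
    y_bar = sum(math.sin(a) for a in angles) / n               # Eq. (3)
    # atan2 is the quadrant-specific arctangent; the modulo maps the
    # result from (-pi, pi] onto [0, 2*pi).
    theta_bar = math.atan2(y_bar, x_bar) % (2.0 * math.pi)     # Eq. (4)
    r_bar = math.hypot(x_bar, y_bar)                           # Eq. (5)
    return theta_bar, r_bar
```

For example, zero-flow days that fall on the same calendar day every year give a mean resultant length of 1, whereas two dates roughly half a year apart give a value close to 0.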
The clustering is based on the \(\overline{\theta}\) and \(\overline{r}\) metrics calculated for winter and summer, using the Ward distance. The optimal number of clusters is identified with the help of the silhouette plot and visual inspection of the resulting clusters.
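The clustering step could look roughly as follows. This is a hedged sketch using scikit-learn and synthetic data, not the study's implementation. The text does not state how the angular metric is encoded for the distance computation, so the sketch assumes that each seasonal pair of mean direction and mean resultant length is converted to Cartesian coordinates, which avoids the wrap-around at 0 and 2π. The helper names `to_features`, `silhouette_by_k` and `synthetic_group`, and all numerical values, are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score


def to_features(theta_w, r_w, theta_s, r_s):
    """Stack winter and summer metrics as Cartesian coordinates.

    Assumption: (theta, r) is encoded as (r*cos(theta), r*sin(theta)) so that
    directions just below 2*pi and just above 0 are treated as close.
    """
    return np.column_stack([
        r_w * np.cos(theta_w), r_w * np.sin(theta_w),
        r_s * np.cos(theta_s), r_s * np.sin(theta_s),
    ])


def silhouette_by_k(features, k_values):
    """Average silhouette width of Ward clusterings for each candidate k."""
    scores = {}
    for k in k_values:
        model = AgglomerativeClustering(n_clusters=k, linkage="ward")
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores


# Synthetic stations for illustration only: two groups with different
# seasonality, perturbed by small Gaussian noise.
rng = np.random.default_rng(0)


def synthetic_group(n, theta_w, r_w, theta_s, r_s, sd=0.05):
    noise = rng.normal(0.0, sd, size=(4, n))
    return to_features(
        theta_w + noise[0], np.clip(r_w + noise[1], 0.0, 1.0),
        theta_s + noise[2], np.clip(r_s + noise[3], 0.0, 1.0),
    )


features = np.vstack([
    synthetic_group(20, 1.0, 0.9, 4.0, 0.8),
    synthetic_group(20, 3.0, 0.3, 5.5, 0.4),
])
scores = silhouette_by_k(features, range(2, 7))
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The average silhouette width is only a numerical summary of the silhouette plot; in practice the candidate number of clusters would still be checked against the plot itself and against a visual inspection of the clusters.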