Identification of buffered data in time series preprocessing: application on surface river temperature

Nelly Moulin; Frederic Gresselin; Bruno Dardaillon; Zahra Thomas

doi:10.22541/au.172114549.97246945/v1

Abstract

With the growing number of sensor technologies, the production of many types of data allows finer observation of our environment. Among them, time series represent a valuable heritage, both for the time invested in recording them and for the information they contain. However, analyzing time series produced by a monitoring network generally requires preprocessing steps to separate data carrying meaningful information from data affected by sensor malfunctions or particular measurement conditions. In this context, outliers have been well studied and several methods have been developed to identify them. In this paper, we propose a complementary method to identify buffered data. Buffered data are characterized by a lower amplitude than the rest of the time series and can arise naturally (for example, through groundwater influence) or from measurement defects (for example, a sensor covered by sediment movements). The need to identify buffered signals arose from the use of data drawn from several databases with different levels of qualification. Buffered signals are not necessarily filtered out by conventional preprocessing methods and can affect the analysis when they are unrelated to the studied phenomena. The identification method proposed in this study relies on a normalized diurnal range index. It was developed on surface river temperature time series recorded in metropolitan France, covering a wide variety of regional climates and measurement environments. The method highlights buffered data within a time series. Furthermore, it separates occasional from regular buffered signal periods, whether naturally caused or not. The study then uses the preprocessed time series to analyze the distribution of regular buffered data according to season of occurrence and a climate typology.
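
The abstract names a normalized diurnal range index but does not define it. The Python sketch below is therefore only one possible reading, not the authors' method: it computes the daily temperature range (maximum minus minimum), divides it by the median daily range of the whole series, and flags days whose index falls below a cutoff. The normalization by the series median, the function names, the threshold of 0.5, and the synthetic data are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def daily_ranges(samples):
    """Map each calendar day to (max - min) of the temperatures recorded that day."""
    by_day = defaultdict(list)
    for timestamp, temperature in samples:
        by_day[timestamp.date()].append(temperature)
    return {day: max(values) - min(values) for day, values in sorted(by_day.items())}


def normalized_diurnal_range(ranges):
    """Divide each daily range by the median daily range of the whole series.

    ASSUMPTION: normalizing by the series median is illustrative only;
    the abstract does not say how the index is normalized.
    """
    reference = median(ranges.values())
    if reference == 0:
        raise ValueError("median daily range is zero; index is undefined")
    return {day: value / reference for day, value in ranges.items()}


def flag_buffered(index, threshold=0.5):
    """Return the days whose index falls below a hypothetical threshold."""
    return [day for day, value in index.items() if value < threshold]


# Synthetic example: six days of hourly data, with the fourth day damped.
start = datetime(2020, 7, 1)
samples = []
for hour in range(6 * 24):
    day = hour // 24
    amplitude = 0.2 if day == 3 else 2.0  # reduced amplitude on one day only
    phase = 1 - abs((hour % 24) - 12) / 12  # triangular daily cycle peaking at noon
    samples.append((start + timedelta(hours=hour), 15.0 + amplitude * phase))

index = normalized_diurnal_range(daily_ranges(samples))
buffered_days = flag_buffered(index)
```

In this synthetic series only the damped day is flagged. A real application would need a normalization and a threshold justified by the data, and would have to handle gaps and days with too few records, none of which this sketch attempts.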