Self-supervised learning based clustering workflow for exploring seismological data from dense networks
  • Joachim Rimpot,
  • Clément Hibert,
  • Jean-Philippe Malet,
  • Germain Forestier,
  • Jonathan Weber
Joachim Rimpot
University of Strasbourg, Institut Terre et Environnement de Strasbourg (ITES), CNRS UMR 7063

Corresponding Author:[email protected]

Clément Hibert
University of Strasbourg, Institut Terre et Environnement de Strasbourg (ITES), CNRS UMR 7063, Ecole et Observatoire des Sciences de la Terre (EOST), CNRS UAR 830, Université de Strasbourg
Jean-Philippe Malet
University of Strasbourg, Institut Terre et Environnement de Strasbourg (ITES), CNRS UMR 7063, Ecole et Observatoire des Sciences de la Terre (EOST), CNRS UAR 830, Université de Strasbourg
Germain Forestier
Institut de Recherche en Informatique, Mathématiques, Automatique et Signal (IRIMAS), UR 7499, University of Haute-Alsace
Jonathan Weber
Institut de Recherche en Informatique, Mathématiques, Automatique et Signal (IRIMAS), UR 7499, University of Haute-Alsace

Abstract

A key challenge in environmental seismology is processing seismic data to study source physics, natural and human-induced forcings, and geological structures such as landslides, glaciers, and volcanoes. Seismic arrays with dozens of stations have expanded dataset sizes; combined with signal complexity and high noise levels, this makes the data difficult to analyze with traditional event detection and labeling methods, especially for low-energy or rare events. Clustering continuous data offers a comprehensive way to explore these datasets and detect all relevant events. In this study, we present an iterative clustering workflow based on self-supervised learning (SSL), designed to handle datasets ranging in size from thousands to millions of events. This approach enables automated clustering of continuous data from seismic arrays containing dozens of stations. When applied to the “Marie-sur-Tinée” landslide dataset, our workflow processed 10 million 30-second windows and identified four main groups: Potential Endogenous Landslide Seismic Events, Potential Regional Earthquakes, Potential Rainfall-Induced Signals, and Noise. Although the clusters were consistent overall, some noise remained in the event-related clusters, highlighting areas for further improvement in clustering methods. Nevertheless, the proposed iterative SSL-based clustering workflow shows great potential for efficient exploration of seismic datasets containing millions of events and could be a solution for the blind exploration of similarly large datasets.