A key challenge in environmental seismology is processing seismic data to study source physics, natural and human-induced forcings, and geological structures such as landslides, glaciers, and volcanoes. Seismic arrays with dozens of stations have greatly expanded dataset sizes, and this growth, combined with signal complexity and high noise levels, makes these datasets difficult to analyze with traditional event detection and labeling methods, especially for low-energy or rare events. Clustering continuous data offers a comprehensive way to explore these datasets and detect all relevant events. In this study, we present an iterative clustering workflow based on self-supervised learning (SSL) designed to handle datasets ranging in size from thousands to millions of events. This approach enables automated clustering of continuous data from seismic arrays containing dozens of stations. Applied to the "Marie-sur-Tinée" landslide dataset, our workflow processed 10 million 30-second windows and identified four main groups: Potential Endogenous Landslide Seismic Events, Potential Regional Earthquakes, Potential Rainfall-Induced Signals, and Noise. Despite the overall consistency, some noise remained in the event-related clusters, highlighting areas for further improvement in clustering methods. Nevertheless, the proposed iterative SSL-based clustering workflow shows great potential for efficient exploration of seismic datasets of millions of events and could be a solution for the blind exploration of similarly large datasets.
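
To make the overall idea concrete, the sketch below illustrates the generic pattern of windowing a continuous trace, embedding each window, and clustering the embeddings. It is only a conceptual illustration under stated assumptions: the 100 Hz sampling rate, the placeholder `ssl_encoder` (a simple spectral summary standing in for a learned SSL representation), and the use of `MiniBatchKMeans` are illustrative choices, not the authors' actual network or clustering method.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

SAMPLING_RATE = 100          # Hz (assumed; not specified in the abstract)
WINDOW_SECONDS = 30          # 30-second windows, as in the abstract
WINDOW_SAMPLES = SAMPLING_RATE * WINDOW_SECONDS

def ssl_encoder(window: np.ndarray) -> np.ndarray:
    """Placeholder embedding: a coarse spectral summary standing in for a
    learned self-supervised representation (hypothetical, not the authors' model)."""
    spectrum = np.abs(np.fft.rfft(window))
    return np.log1p(spectrum[:64])

def window_trace(trace: np.ndarray) -> np.ndarray:
    """Cut a continuous trace into non-overlapping 30-second windows."""
    n_windows = len(trace) // WINDOW_SAMPLES
    return trace[: n_windows * WINDOW_SAMPLES].reshape(n_windows, WINDOW_SAMPLES)

# Synthetic data standing in for one station's continuous record (one hour).
rng = np.random.default_rng(0)
trace = rng.standard_normal(SAMPLING_RATE * 3600)
windows = window_trace(trace)
embeddings = np.stack([ssl_encoder(w) for w in windows])

# MiniBatchKMeans scales to millions of windows; 4 clusters mirrors the
# four groups reported for the landslide dataset.
labels = MiniBatchKMeans(n_clusters=4, random_state=0).fit_predict(embeddings)
```

In a real application, the placeholder encoder would be replaced by the trained SSL model, and the embedding and clustering steps would be repeated iteratively as described in the workflow.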