Self-supervised learning based clustering workflow for exploring
seismological data from dense networks
Abstract
A key challenge in environmental seismology is processing seismic data
to study source physics, natural and human-induced forcings and
geological structures such as landslides, glaciers, and volcanoes.
Seismic arrays with dozens of stations have greatly increased dataset
sizes; combined with signal complexity and high noise levels, this makes
the data difficult to analyze with traditional event detection and labeling
methods, especially for low-energy or rare events. Clustering continuous
data offers a comprehensive method for exploring the datasets and
detecting all relevant events. In this study, we present an iterative
clustering workflow based on self-supervised learning (SSL) designed to
handle datasets ranging in size from thousands to millions of events.
This approach enables automated clustering of continuous data from
seismic arrays containing dozens of stations. When applied to the
“Marie-sur-Tinée” landslide dataset, our workflow processed 10 million
30-second windows and identified four main groups: Potential Endogenous
Landslide Seismic Events, Potential Regional Earthquakes, Potential
Rainfall-Induced Signals, and Noise. Despite the overall consistency,
some noise remained in the event-related clusters, highlighting areas
for further improvement in clustering methods. Nevertheless, the
proposed iterative SSL-based clustering workflow shows great potential
for efficient exploration of seismic datasets containing millions of events
and could be a solution for the blind exploration of similarly large
datasets.
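
As a minimal illustration of the input unit named above, the following sketch cuts one continuous single-station record into 30-second windows. It is not the workflow's implementation: the function name `split_into_windows`, the 100 Hz sampling rate, and the choice of non-overlapping windows are assumptions made here for illustration only.

```python
def split_into_windows(samples, sampling_rate_hz, window_s=30.0):
    """Cut a continuous record into fixed-length, non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped.
    The non-overlapping layout is an assumption of this sketch.
    """
    samples_per_window = int(round(window_s * sampling_rate_hz))
    if samples_per_window <= 0:
        raise ValueError("window must contain at least one sample")
    n_windows = len(samples) // samples_per_window
    return [
        samples[i * samples_per_window:(i + 1) * samples_per_window]
        for i in range(n_windows)
    ]


# Hypothetical example: 5 minutes of a flat trace sampled at 100 Hz (assumed rate).
trace = [0.0] * (5 * 60 * 100)
windows = split_into_windows(trace, sampling_rate_hz=100.0)
print(len(windows), len(windows[0]))  # 10 3000
```

Under these assumptions, each station contributes 2,880 windows per day, which gives a sense of how an array of dozens of stations reaches millions of windows.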