1. Introduction
Many marine mammal species have been observed in the Ross Sea region
marine protected area (Ainley and David, 2010; Erbe et al., 2017; Giorli
et al., 2019), one of the most pristine regions in the world and one of
the least human-impacted marine environments. This region, known for its
biodiversity hotspots, includes the world’s largest marine protected
area, established by the Commission for the Conservation of Antarctic
Marine Living Resources (CCAMLR) (CCAMLR, 2016). Marine mammals occupy
high trophic levels in the food web and are indicators of the health of
the marine ecosystem (Nelms et al., 2019; Ross, 2000); their abundance
and diversity can therefore serve as sentinels of the state of the
ocean. Because marine mammals spend most of their time
underwater, the traditional visual detection method is complemented by
underwater acoustic monitoring, which can record long-term data
independent of the time of day and weather conditions. Passive acoustic
monitoring (PAM) is an effective method for investigating the presence,
distribution, and behavior of soniferous marine mammals, especially in
polar environments where the observation conditions are poor.
The leopard seal Hydrurga leptonyx is classified as “least
concern” on the International Union for Conservation of Nature (IUCN)
Red List of Threatened Species (Hückstädt, 2015). This species, rarely
sighted and usually observed close to pack ice, is a top predator in the
ecosystem of the Ross Sea and is highly vociferous during the austral
spring and summer (Stirling and Siniff, 1979; Rogers et al., 1996). Ray
first reported one underwater vocalization of leopard seals in 1970.
Stirling and Siniff quantitatively described four call types in 1979
within a frequency range of 150 to 5,900 Hz. Vocalizations have since
been categorized into two groups: “broadcast calls” and “local
calls” (Rogers et al., 1996). Broadcast calls, produced by mature
individuals and not directed at a particular receiver, may function in
mating and/or territorial signaling over long distances; they consist of
high double trill (HDT), medium single trill (MST), low double trill
(LDT), descending trill (DT), hoot with single trill (HST), and hoot
(H). Highly stereotyped broadcast calls are observed only from December
to January when female seals exhibit elevated estradiol levels
associated with sexual receptivity (Rogers et al., 1996). Call types of
medium double trill (MDT) and ascending trill (AT) have also been
reported (Stirling and Siniff, 1979; Klinck, 2008), and call repertoires
and acoustic characteristics show geographic differences (Rogers et al.,
1995; Kreiss et al., 2013; Klinck, 2008). “Local calls”, which have
received less study, are associated with close interactions between
two seals and consist of growl, snort, thump pulse, noseblast, roar, and
blast (Rogers et al., 1996). Ultrasonic sounds up to 164 kHz were also
recorded from captive seals as they chased fish (Thomas et al., 1983).
Adult male seals produce highly stereotyped broadcast calls, whereas
subadult males produce more variable calls (Rogers, 2007). In addition,
source levels, calculated assuming spherical spreading, have been
estimated at 153 to 177 dB re 1 μPa (Rogers, 2014).
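Under spherical spreading, transmission loss grows as 20 log10(r/r0) with a reference range r0 = 1 m, so a source level can be back-calculated from a received level as SL = RL + 20 log10(r). The minimal Python sketch below illustrates only that arithmetic. The function names and the received level and range in the example are hypothetical, absorption and other propagation effects are ignored, and it is not presented as the procedure used by Rogers (2014).

```python
import math


def spherical_spreading_loss_db(range_m, ref_range_m=1.0):
    """Transmission loss (dB) for spherical spreading: TL = 20*log10(r / r0)."""
    if range_m <= 0 or ref_range_m <= 0:
        raise ValueError("ranges must be positive")
    return 20.0 * math.log10(range_m / ref_range_m)


def source_level_db(received_level_db, range_m):
    """Back-calculate source level (dB re 1 uPa at 1 m) as SL = RL + TL.

    Simplifying assumption: ignores absorption, boundary reflections
    and refraction; only geometric (spherical) spreading is modelled.
    """
    return received_level_db + spherical_spreading_loss_db(range_m)


# Hypothetical values: a call received at 117 dB re 1 uPa at 100 m range.
# TL = 20*log10(100) = 40 dB, so SL = 157 dB re 1 uPa at 1 m.
print(source_level_db(117.0, 100.0))  # 157.0
```

Because the loss term is logarithmic, every tenfold increase in range adds 20 dB, so errors in the assumed range translate directly into errors in the estimated source level.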
Most studies on the acoustic characteristics of leopard seal
vocalizations have concentrated on broadcast calls rather than local
calls, particularly HDT and LDT, which constitute the largest portion of
the underwater vocal repertoire (Kreiss et al., 2013; Rogers and Cato,
2002; Van Opzeeland et al., 2010). These studies have primarily used
spectrograms to estimate acoustic characteristics such as frequency
bandwidth (bounded by the minimum and maximum frequencies), peak
frequency, call duration, and pulse repetition rate (PRR). The inherent
variability in both the amplitude and structure of vocal signals,
coupled with spatiotemporal variability in background noise levels, can
complicate both the estimation of these characteristics and the reliable
detection of calls. Adaptive extraction methods that account for the
composition and variation of the vocal signals are therefore needed to
obtain more precise estimates of acoustic characteristics. Regarding
detection, automatic methods for detecting marine mammal vocalizations
in long-term PAM data have been proposed (Erbe and King, 2008; Miller et
al., 2021), but their application to leopard seals has been limited
(Klinck et al., 2008; Klinck, 2008). Algorithms based on artificial
intelligence have recently been developed actively (Shamir et al., 2014;
Shiu et al., 2020), and well-labeled datasets based on manual detection
results are essential for developing robust models. Underwater acoustic
data, which are more difficult to obtain than data from terrestrial
environments, are therefore valuable resources for developing automated
detection and analysis algorithms, especially for protected species such
as leopard seals.
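To illustrate the spectrogram-based measurements and the idea of noise-adaptive extraction discussed above, the following Python/NumPy sketch estimates peak frequency, minimum and maximum frequency, call duration, and PRR from a single recording. It is an illustrative example under stated assumptions, not a method from the cited studies. The helper names (`stft_power`, `call_features`), the Hann window length (256 samples), the hop (64 samples), the 10 dB threshold above the recording's median level, and the 150 to 5,900 Hz analysis band (borrowed from the frequency range reported by Stirling and Siniff) are all hypothetical choices. The test signal is a synthetic pulsed tone, not a real leopard seal call.

```python
import numpy as np


def stft_power(x, fs, nperseg=256, hop=64):
    """Power spectrogram from a Hann-windowed short-time FFT (NumPy only)."""
    window = np.hanning(nperseg)
    n_frames = 1 + (len(x) - nperseg) // hop
    frames = np.stack([x[i * hop:i * hop + nperseg] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (frames, bins)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + nperseg / 2) / fs  # frame centres (s)
    return freqs, times, power.T                            # (bins, frames)


def call_features(x, fs, band=(150.0, 5900.0), snr_db=10.0):
    """Peak/min/max frequency (Hz), duration (s) and PRR (Hz) of one call.

    Hypothetical helper. Thresholds are relative to median levels of this
    recording, so they follow the local background noise (assumes the call
    fills < half the frames). Returns None if nothing exceeds the threshold.
    """
    freqs, times, p = stft_power(x, fs)
    keep = (freqs >= band[0]) & (freqs <= band[1])
    f, p = freqs[keep], p[keep]
    gain = 10.0 ** (snr_db / 10.0)

    # Frames whose in-band energy exceeds the median frame energy by snr_db.
    energy = p.sum(axis=0)
    active = energy > gain * np.median(energy)
    if not active.any():
        return None
    idx = np.flatnonzero(active)

    # Spectral features: mean spectrum of active frames vs per-bin noise floor.
    call_spec = p[:, idx].mean(axis=1)
    occupied = np.flatnonzero(call_spec > gain * np.median(p, axis=1))
    if occupied.size == 0:
        occupied = np.array([np.argmax(call_spec)])

    # Temporal features: span of active frames, and PRR from the median
    # interval between pulse onsets (rising edges of the activity mask).
    onsets = np.flatnonzero(np.diff(np.concatenate(([0], active.astype(int)))) == 1)
    prr = 1.0 / np.median(np.diff(times[onsets])) if onsets.size > 1 else float("nan")
    return {
        "peak_hz": float(f[np.argmax(call_spec)]),
        "fmin_hz": float(f[occupied[0]]),
        "fmax_hz": float(f[occupied[-1]]),
        "duration_s": float(times[idx[-1]] - times[idx[0]]),
        "prr_hz": float(prr),
    }


# Synthetic test signal (hypothetical, not a recorded call): a 3 kHz tone
# gated into 50 ms pulses at 10 Hz between 1 s and 2 s, in white noise.
fs = 16000
n = np.arange(3 * fs)
rng = np.random.default_rng(0)
gate = (n >= fs) & (n < 2 * fs) & ((n - fs) % 1600 < 800)
x = gate * np.sin(2 * np.pi * 3000.0 * n / fs) + 0.1 * rng.standard_normal(n.size)
feats = call_features(x, fs)
print(feats)
```

On this synthetic 10 Hz pulse train, the sketch should report a peak near 3 kHz and a PRR near 10 Hz. Using median levels as the noise reference is what lets the thresholds adapt to each recording's background noise. That choice assumes calls occupy less than half of the analysed frames. Time resolution is limited by the 4 ms hop, and taking the median inter-onset interval keeps the PRR estimate robust to an occasional spurious onset near the threshold.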
Our study presents the first results of underwater passive acoustic
monitoring in Seaview Bay off Inexpressible Island, which is designated
as Antarctic Specially Protected Area (ASPA) No. 178 by the Secretariat
of the Antarctic Treaty (ATS) (MOE, 2020). We focused on leopard seal
vocalizations, which are rarely studied but commonly recorded in this
region during the mating season, and investigated the acoustic
characteristics and temporal variation of each call type. An unmanned
aerial vehicle (UAV) and a monitoring camera, both increasingly used in
marine environmental monitoring, were operated concurrently to overcome
a key limitation of underwater acoustic monitoring, namely that sound
sources must otherwise be classified on the basis of previous reports.