1. Introduction
Many marine mammal species have been observed in the Ross Sea region marine protected area (Ainley and David, 2010; Erbe et al., 2017; Giorli et al., 2019), which is regarded as one of the most pristine and least human-impacted marine environments in the world. This region, known for its biodiversity hotspots, includes the world’s largest marine protected area, established by the Commission for the Conservation of Antarctic Marine Living Resources (CCAMLR) (CCAMLR, 2016). Marine mammals occupy high trophic levels of the food web and serve as sentinels of marine ecosystem health (Nelms et al., 2019; Ross, 2000), so their abundance and diversity can be important indicators of the state of the ocean. Because marine mammals spend most of their time underwater, traditional visual surveys are complemented by underwater acoustic monitoring, which can record long-term data independent of the time of day and weather conditions. Passive acoustic monitoring (PAM) is an effective method for investigating the presence, distribution, and behavior of soniferous marine mammals, especially in polar environments where conditions for visual observation are poor.
The leopard seal Hydrurga leptonyx is classified as “least concern” on the International Union for Conservation of Nature (IUCN) Red List of threatened species (Hückstädt, 2015). This species, rarely sighted and usually observed close to pack ice, is a top predator in the ecosystem of the Ross Sea and is highly vociferous during the austral spring and summer (Stirling and Siniff, 1979; Rogers et al., 1996). Ray first reported a single underwater vocalization of leopard seals in 1970. Stirling and Siniff quantitatively described four call types in 1979 within a frequency range of 150 to 5,900 Hz, and vocalizations have since been categorized into two groups: “broadcast calls” and “local calls” (Rogers et al., 1996). “Broadcast calls”, made by mature individuals and not intended for a particular receiver, may function in mating and/or territorial signaling over long distances and consist of high double trill (HDT), medium single trill (MST), low double trill (LDT), descending trill (DT), hoot with single trill (HST), and hoot (H). Highly stereotyped broadcast calls are observed only from December to January, when female seals exhibit elevated estradiol levels associated with sexual receptivity (Rogers et al., 1996). Call types of medium double trill (MDT) and ascending trill (AT) have also been reported (Stirling and Siniff, 1979; Klinck, 2008), and call repertoires and acoustic characteristics show geographic differences (Rogers et al., 1995; Kreiss et al., 2013; Klinck, 2008). “Local calls”, which are less studied, are associated with close interactions between two seals and consist of growl, snort, thump pulse, noseblast, roar, and blast (Rogers et al., 1996). Ultrasonic sounds up to 164 kHz were also recorded from captive seals as they chased fish (Thomas et al., 1983). Adult male seals produce highly stereotyped broadcast calls, whereas subadult males produce more variable calls (Rogers, 2007). In addition, the source level, estimated assuming spherical spreading, ranged from 153 to 177 dB re 1 \(\mu\text{Pa}\) (Rogers, 2014).
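As an illustration of how a source level is back-calculated under the spherical spreading assumption mentioned above, the following minimal sketch adds a transmission loss of 20 log10(r) to a received level. The function name and the example values (a received level of 120 dB re 1 \(\mu\text{Pa}\) at a range of 100 m) are hypothetical and are not taken from Rogers (2014); frequency-dependent absorption and boundary effects are ignored.

```python
import math


def source_level_db(received_level_db: float, range_m: float) -> float:
    """Back-calculate a source level assuming spherical spreading only.

    Transmission loss is modeled as 20*log10(r), with r in meters
    referenced to 1 m. Absorption and boundary reflections are
    neglected (an assumption of this sketch).
    """
    if range_m <= 0:
        raise ValueError("range_m must be positive")
    transmission_loss_db = 20.0 * math.log10(range_m)
    return received_level_db + transmission_loss_db


# Hypothetical example: 120 dB received at 100 m implies 40 dB of spreading loss.
print(source_level_db(120.0, 100.0))
```

Because the loss grows with the logarithm of range, an error in the assumed range of a factor of two shifts the estimated source level by about 6 dB, which is one reason reported source levels span a wide interval.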
Most studies on the acoustic characteristics of leopard seal vocalizations have concentrated on broadcast calls rather than local calls, with a particular focus on HDT and LDT, which constitute the largest portion of the underwater vocal repertoire (Kreiss et al., 2013; Rogers and Cato, 2002; Van Opzeeland et al., 2010). These studies have primarily used spectrograms to estimate acoustic characteristics such as frequency bandwidth (bounded by the maximum and minimum frequencies), peak frequency, call duration, and pulse repetition rate (PRR). The inherent variability in both the amplitude and structure of vocal signals, coupled with the spatiotemporal variability of background noise levels, may make it difficult to estimate these characteristics and to detect call signals effectively. Therefore, adaptive extraction methods that account for the composition and variation of the vocal signals are needed to estimate acoustic characteristics more precisely. Automatic detection methods for marine mammal vocalizations in long-term PAM data have been proposed (Erbe and King, 2008; Miller et al., 2021), but their application to leopard seals has been limited (Klinck et al., 2008; Klinck, 2008). Algorithms based on artificial intelligence have recently been under active development (Shamir et al., 2014; Shiu et al., 2020), and well-labeled datasets based on manual detection results are imperative for developing robust models. Underwater acoustic data, which are harder to obtain than data from terrestrial environments, are therefore a valuable resource for the development of automated detection and analysis algorithms, especially for protected species such as leopard seals.
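The spectrogram-derived characteristics listed above can be sketched in a few lines. The example below is only an illustrative assumption, not the procedure used in the cited studies or in this work: it takes a single power spectrum (levels in dB per frequency bin) and a list of pulse onset times, and returns the peak frequency, the minimum and maximum frequencies of the contiguous band lying within a chosen drop below the peak, and the PRR. The function names, the 10 dB drop, and the use of the median inter-pulse interval are all hypothetical choices.

```python
from statistics import median


def peak_and_band(freqs_hz, power_db, drop_db=10.0):
    """Return (peak, minimum, maximum) frequency of one spectrum.

    The band is the contiguous run of bins around the peak whose level
    stays within drop_db of the peak level. Tying the threshold to the
    peak rather than to a fixed level is an assumption of this sketch.
    """
    peak_idx = max(range(len(power_db)), key=lambda i: power_db[i])
    threshold = power_db[peak_idx] - drop_db
    lo = peak_idx
    while lo > 0 and power_db[lo - 1] >= threshold:
        lo -= 1
    hi = peak_idx
    while hi < len(power_db) - 1 and power_db[hi + 1] >= threshold:
        hi += 1
    return freqs_hz[peak_idx], freqs_hz[lo], freqs_hz[hi]


def pulse_repetition_rate(pulse_times_s):
    """Pulses per second, taken as the inverse of the median inter-pulse interval."""
    intervals = [b - a for a, b in zip(pulse_times_s, pulse_times_s[1:])]
    if not intervals:
        raise ValueError("at least two pulse times are required")
    return 1.0 / median(intervals)


# Hypothetical values for illustration only.
freqs = [100, 200, 300, 400, 500]
levels = [60.0, 75.0, 80.0, 72.0, 50.0]
print(peak_and_band(freqs, levels))
print(pulse_repetition_rate([0.0, 0.25, 0.5, 0.75]))
```

A threshold defined relative to the peak follows changes in call amplitude, but it remains sensitive to background noise that rises to within the chosen drop of the peak, which is the kind of difficulty described in the paragraph above.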
Our study presents the first results of underwater passive acoustic monitoring in Seaview Bay off Inexpressible Island, designated as Antarctic Specially Protected Area (ASPA) No. 178 by the Secretariat of the Antarctic Treaty (ATS) (MOE, 2020). We focused on leopard seal vocalizations, which are rarely studied but commonly recorded in this region during the mating season, and investigated the acoustic characteristics and temporal variations of each call type. An unmanned aerial vehicle (UAV) and a monitoring camera, both increasingly used for marine environmental monitoring, were operated concurrently to overcome a limitation of underwater acoustic monitoring, namely that acoustic sources must be classified on the basis of previous reports.