4. Discussion
Our research was motivated by the objective of implementing passive acoustic monitoring in a region known for its well-preserved marine environment. During the analysis, we identified leopard seal vocalizations as the dominant sound source. Unlike many previous studies, which examined the vocalizations of identified individuals of a target species from an ecological perspective, our research aimed to interpret, after the measurements, the vocalizations produced by unidentified individuals of both sexes and various ages.
One of our most significant findings was the identification of a new call type, the triple ascending trill. The ascending trill has been mentioned in only four previous studies (Shabangu and Rogers, 2021; Van Opzeeland et al., 2010; Klinck, 2008; Thomas and Golladay, 1995). Specifically, Van Opzeeland et al. (2010) and Klinck (2008) reported it as a single ascending trill with a spectrogram, while Thomas and Golladay (1995) described it as call 4, consisting of two or three components. Most of the ascending trills observed in our data comprised three distinguishable trill parts. In contrast, the single ascending trills recorded in our measurements had a very low signal-to-noise ratio, and some triple ascending trills exhibited faint second and third trill parts (Figure 7A, B), making clear identification challenging. This suggests that the ascending trill is fundamentally composed of three parts, and that a weak single ascending trill may be recorded when the distance between the recorder and the individual is large. Consequently, the single ascending trill was excluded from detection and acoustic characterization because of its significantly higher detection uncertainty. To verify whether the triple ascending trill is a leopard seal call, we examined video data containing the airborne vocalizations of a male leopard seal, recorded for 26.9 minutes with a monitoring camera (audio sampling rate of 48,000 Hz) on December 12, after the underwater acoustic measurements. From the 4.4 minutes of video that remained after excluding signals made indistinct by wind noise and sections without vocalizations, we extracted 55 vocal signals, which were divided into seven call types: HDT, MST, LDT, DT, triple ascending trill (Figure 7D), low single trill (Figure 7E), and hoot (Figure 7F). The call pattern of the triple ascending trill observed in air was similar to that recorded underwater.
The vocal spectrograms recorded in air lack a colorbar because we were unable to obtain the receiving voltage sensitivity of the camera microphone from the manufacturer, which made quantitative analysis of the airborne acoustic data impossible. The airborne vocalizations exhibit clearer trill patterns and harmonic components than those recorded underwater, with much less reverberation. Reverberation in the underwater acoustic waveguide lasts longer than in air because of interaction with ocean boundaries such as the sea surface and seafloor (Katsnelson et al., 2012). This supports the validity of determining the end point of a call from the end point of its amplitude modulation pattern. While most HDT, MST, and LDT waveforms exhibited relatively distinct amplitude modulation patterns, those of the trill parts in HST, DT, and AT were not clearly visible. This difference may be caused by their vocal mechanisms, but further verification is required. Because we focused on the underwater vocalizations of leopard seals and used the airborne data to support the observations of the triple ascending trill, we refer readers to previous studies for detailed information on airborne vocalizations (Rogers et al., 1995).
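As an illustration of why an end-point criterion based on the amplitude modulation pattern can be less sensitive to reverberation than one based on energy alone, the following minimal Python sketch marks the call end at the last analysis window whose envelope is still strongly modulated. It is not the procedure used in this study: the function names, the frame and window lengths, the modulation-depth threshold of 0.5, and the synthetic test signal are all assumptions chosen for illustration.

```python
import math


def frame_rms(samples, frame_len):
    """RMS amplitude of consecutive, non-overlapping frames."""
    n_frames = len(samples) // frame_len
    return [
        math.sqrt(
            sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        )
        for i in range(n_frames)
    ]


def modulation_depth(envelope_window):
    """(max - min) / (max + min) of an envelope window; 0.0 for silence."""
    hi, lo = max(envelope_window), min(envelope_window)
    return 0.0 if hi + lo == 0 else (hi - lo) / (hi + lo)


def call_end_time(samples, sample_rate, frame_len=40,
                  frames_per_window=10, min_depth=0.5):
    """End time (s) of the last window whose envelope modulation depth
    exceeds min_depth, or None if no window does.

    All default values are illustrative assumptions, not values from the study.
    """
    env = frame_rms(samples, frame_len)
    last = None
    for w in range(len(env) // frames_per_window):
        window = env[w * frames_per_window:(w + 1) * frames_per_window]
        if modulation_depth(window) > min_depth:
            last = w
    if last is None:
        return None
    return (last + 1) * frames_per_window * frame_len / sample_rate


def synthetic_call(sample_rate=8000):
    """Hypothetical test signal: 1 s of a 300 Hz tone pulsed at 40 Hz,
    followed by 1 s of a smoothly decaying, reverberation-like tail."""
    out = []
    for n in range(2 * sample_rate):
        t = n / sample_rate
        carrier = math.sin(2 * math.pi * 300 * t)
        if n < sample_rate:
            amp = 0.5 * (1 + math.cos(2 * math.pi * 40 * t))  # pulsed part
        else:
            amp = math.exp(-(t - 1.0) / 0.3)  # smooth tail, no pulsing
        out.append(amp * carrier)
    return out


if __name__ == "__main__":
    print(call_end_time(synthetic_call(), 8000))
```

For this synthetic signal the estimate falls at the end of the pulsed section rather than within the decaying tail, whereas a plain energy threshold would place the end later. Real recordings contain ambient noise, so a practical version would probably also need a minimum-level check before a window is accepted.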
In our study, call types were categorized based on previous research; however, we observed low single trills, LDT calls without a strong narrow component, and indistinct double trills resembling the low single trill, all in the frequency band overlapping with LDT. As these were identified as variant calls of LDT in a previous study (Rogers, 2007), we did not classify them as separate call types. Additionally, HST exhibited large variations in the duration and interval of the hoot and trill. Consequently, the uncertainty in the call counts of low-frequency vocalizations, including DT, was high. Furthermore, single hoots were detected in all the acoustic data; however, because of the large variability in their sound pressure level and the difficulty of distinguishing them from HST, they were not included in the call count, as with the single ascending trill. From 101 sample single-hoot signals with relatively high signal-to-noise ratios, the estimated peak, minimum, and maximum frequencies were 182 (± 10 SD), 163 (± 8 SD), and 201 (± 10 SD) Hz, respectively, and the call duration was 2.7 (± 0.4 SD) seconds. A representative spectrogram of a single hoot is shown in Figure 7C. We also addressed the process of calculating the upper and lower limit frequencies of the HDT, MST, and triple ascending trill, which are relatively broadband calls, based on their contrast against the ambient noise level; this is also a meaningful contribution of this study. Despite these efforts, the estimated call rates and acoustic characteristics of each call type may be subject to uncertainty, because calls were detected and the start points of sample calls were determined manually. In particular, call detection under low signal-to-noise ratio conditions remains technically challenging.
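The idea of setting the lower and upper limit frequencies of a broadband call from its contrast against the ambient noise level can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure of this study: the function name `band_limits`, the 3 dB contrast threshold, the choice to grow the band outward from the bin of maximum contrast, and the example levels are all hypothetical.

```python
def band_limits(freqs_hz, call_db, noise_db, contrast_db=3.0):
    """Return (f_low, f_high) of the contiguous band around the bin of
    maximum contrast in which the call level exceeds the ambient noise
    level by at least contrast_db. Returns None if no bin qualifies.

    The 3 dB default is an illustrative assumption.
    """
    if not (len(freqs_hz) == len(call_db) == len(noise_db)):
        raise ValueError("all inputs must have the same length")
    if not freqs_hz:
        raise ValueError("inputs must not be empty")

    # Contrast of the call spectrum against the ambient noise, per bin.
    contrast = [c - n for c, n in zip(call_db, noise_db)]
    peak = max(range(len(contrast)), key=contrast.__getitem__)
    if contrast[peak] < contrast_db:
        return None

    # Grow the band downward and upward while the contrast stays high.
    lo = peak
    while lo > 0 and contrast[lo - 1] >= contrast_db:
        lo -= 1
    hi = peak
    while hi < len(contrast) - 1 and contrast[hi + 1] >= contrast_db:
        hi += 1
    return freqs_hz[lo], freqs_hz[hi]


if __name__ == "__main__":
    # Hypothetical spectrum levels (dB) in 50 Hz bins over flat noise.
    freqs = [100, 150, 200, 250, 300, 350, 400, 450]
    noise = [60.0] * 8
    call = [60.0, 61.0, 66.0, 72.0, 70.0, 64.0, 61.0, 60.0]
    print(band_limits(freqs, call, noise))  # -> (200, 350)
```

Growing the band from the maximum-contrast bin keeps isolated noise peaks elsewhere in the spectrum from widening the estimate. The result still depends on the chosen threshold and on how the ambient noise spectrum is estimated.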
We have established datasets of calls that are clustered within a narrow low-frequency band; these will serve as foundational data for developing automatic detection and classification algorithms in future studies.