Species Distribution Models
Albacore habitat suitability was modeled as a function of environmental variables using three binomial generalized additive models (GAMs) that differ in how they combine the data sources and treat spatial dependence (Table 1): a data-pooled habitat-envelope model, a data-pooled Gaussian field model, and a joint-likelihood Gaussian field model. All models used a logit link and were fitted with the Integrated Nested Laplace Approximation (INLA) framework (Rue et al. 2009), which offers a computationally efficient alternative to other methods of Bayesian inference (e.g., Markov chain Monte Carlo) while also enabling smoothing approaches akin to those of frequentist GAMs (e.g., as implemented in the mgcv R package; Lezama-Ochoa et al. 2020). Each model included three environmental covariates: sea surface temperature (SST), mixed layer depth (MLD), and bathymetry, which have previously been shown to be important drivers of habitat use for juvenile albacore tuna (Muhling et al. 2019, Farchadi et al. 2024). Here, we outline the design of each SDM; further details on model parameterization and structure are provided in the Supporting Information.
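As a hedged illustration of the backbone shared by all three models, the observation-level structure of a binomial GAM with a logit link can be written as follows. Here y_i is 1 for a presence and 0 for a pseudo-absence, p_i is habitat suitability, and f_1 to f_3 denote smooth effects of each covariate. The notation is ours, and representing the smooths in INLA as, for example, second-order random walks over binned covariate values is an assumption rather than a detail stated in this section.

```latex
y_i \sim \operatorname{Bernoulli}(p_i), \qquad
\operatorname{logit}(p_i) = \log\!\left(\frac{p_i}{1 - p_i}\right)
  = \beta_0 + f_1(\mathrm{SST}_i) + f_2(\mathrm{MLD}_i) + f_3(\mathrm{Bathymetry}_i)
```

Inverting the link, p_i = 1 / (1 + exp(-(right-hand side))), is what keeps predicted habitat suitability between 0 and 1. Under this notation, the GF model adds a random spatial field term to the right-hand side, and the joint-likelihood model uses one such predictor per data source.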
The first model employed a spatially implicit approach, often referred to as a “habitat-envelope” model (hereafter HE; Brodie et al. 2020), which relies on species-environment relationships to explain variation in species distributions without considering the spatial arrangement of habitats and species occurrences. This approach is consistent with many traditional applications of SDMs and has been widely adopted to model highly migratory marine species, including albacore tuna (Muhling et al. 2019). To combine the albacore logbook and tag data in this model, we applied data-pooling methods that merge the two datasets into a single aggregated dataset for modeling. Data pooling has been shown to enhance the performance of similarly designed HE SDMs by mitigating biases inherent in individual datasets, making it a robust option for species distribution modeling (Braun et al. 2023b).
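A minimal sketch of the simplest form of data pooling, row-wise concatenation, is shown below, assuming each dataset is a list of (lon, lat, month, presence) tuples. The `Record` schema and `pool_datasets` function are hypothetical, and any weighting, filtering, or gridding applied in the actual workflow is not represented. The key property is that the pooled model ignores the source label, so both datasets inform a single set of parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One presence (1) or pseudo-absence (0) point; hypothetical schema."""
    lon: float
    lat: float
    month: int
    presence: int
    source: str  # dataset of origin, kept for bookkeeping only


def pool_datasets(logbook, tags):
    """Concatenate two datasets into one table for a single-likelihood model.

    Each input is a list of (lon, lat, month, presence) tuples. A pooled model
    never uses `source`, so both datasets inform the same parameters.
    """
    pooled = [Record(*row, source="logbook") for row in logbook]
    pooled += [Record(*row, source="tag") for row in tags]
    return pooled


# Usage with made-up coordinates (illustration only)
logbook = [(-125.0, 42.0, 7, 1), (-124.5, 41.0, 7, 0)]
tags = [(-130.2, 38.5, 7, 1)]
pooled = pool_datasets(logbook, tags)
print(len(pooled), pooled[-1].source)  # 3 tag
```

The joint-likelihood model takes the opposite approach and keeps the datasets in separate likelihood components.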
Building upon the HE model, the second model introduced Gaussian fields (GF) to account for spatial dependence while retaining the data-pooling approach. Unlike the spatially implicit HE model, this second model (hereafter “GF”) used a geostatistical framework that explicitly incorporates spatial information through random spatial fields, which capture unmeasured spatial processes across the study domain (Stock et al. 2020). In INLA, random spatial fields are approximated as discrete Gaussian Markov random fields (GMRFs) using the stochastic partial differential equation (SPDE) approach with a Matérn covariance structure (Lindgren et al. 2011), which acts as a smoother based on the assumption that nearby locations are more similar than distant ones (Krainski et al. 2018). To account for temporally varying spatial autocorrelation in the training dataset, the GF model was fitted with season-specific GMRFs linked through a cyclic first-order autoregressive (AR1) spatiotemporal structure, in which neighboring seasons (e.g., winter and fall) are more closely correlated than seasons farther apart (e.g., winter and summer).
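To show why a cyclic AR1 makes winter and fall "neighbors", the sketch below computes the implied correlation between seasons, assuming four seasons and a generic circulant AR1 precision matrix (1 + rho^2 on the diagonal, -rho between consecutive seasons, including the wrap-around pair). The function name and the value rho = 0.7 are illustrative assumptions; INLA's internal parameterization may differ, for example in how marginal variances are scaled, but the correlation pattern is the point here.

```python
import math


def cyclic_ar1_correlation(rho, n_seasons=4):
    """Correlation between seasons k steps apart under a cyclic AR(1).

    Assumes a circulant precision Q with 1 + rho**2 on the diagonal and -rho
    between consecutive seasons, including the (last, first) wrap-around.
    The inverse of a circulant matrix is circulant, so its first row follows
    from the eigenvalues lambda_j = 1 + rho**2 - 2*rho*cos(2*pi*j/n).
    Requires |rho| < 1 so that every eigenvalue is positive.
    """
    n = n_seasons
    eig = [1 + rho**2 - 2 * rho * math.cos(2 * math.pi * j / n) for j in range(n)]
    # cov[k]: covariance between two seasons k steps apart (first row of Q^-1)
    cov = [
        sum(math.cos(2 * math.pi * j * k / n) / eig[j] for j in range(n)) / n
        for k in range(n)
    ]
    return [c / cov[0] for c in cov]


corr = cyclic_ar1_correlation(rho=0.7)
# Index 1 and 3 are adjacent seasons (e.g., fall and winter); index 2 is the
# opposite season (e.g., winter and summer).
print([round(c, 3) for c in corr])  # → [1.0, 0.841, 0.79, 0.841]
```

Because the chain wraps around, the season before winter (fall) and the season after it (spring) are equally correlated with winter, while summer, two steps away in either direction, is the least correlated.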
The third model extended the GF model by using joint-likelihood methods to explicitly integrate the two disparate data sources within a single integrated SDM (iSDM) framework. This model used two separate linear predictors, one per data source, with each data source directly informing albacore habitat suitability through shared parameters in a jointly estimated likelihood. Both sub-models included shared effects for the three environmental covariates and the seasonal random spatial fields (Table 1). However, unlike the data-pooled GF model, the iSDM included two sets of random spatial fields: (A) seasonal random spatial fields describing the spatiotemporal autocorrelation in the tag data and shared with the logbook linear predictor (Barber et al. 2021), and (B) a climatological spatial field, informed only by the logbook data, to account for any residual autocorrelation not explained by either the shared seasonal spatial fields or the environmental covariates (Simmonds et al. 2020). The decision to share the archival tag-estimated spatial fields across the linear predictors was informed by prior studies that used iSDMs to investigate predator-prey interactions (Barber et al. 2021), which are analogous to fisheries systems in which the spatial dynamics of the prey (i.e., albacore tuna) influence the distribution of the predator (i.e., the fishery). With this structure, we assume that the spatial patterns influencing albacore tuna distributions, as indicated by the fishery-independent archival tag data, directly correspond to the patterns explaining albacore catch in the fishery-dependent dataset. The second spatial field was modeled without a temporal component (i.e., as a climatology) because the distribution of the U.S. pole-and-line and troll fishery varies minimally during the fishing season (May–November; Figure S2).
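The sketch below illustrates, under stated assumptions, how a joint likelihood ties the two linear predictors together: both share the covariate coefficients `beta` and the seasonal field `w_season`, while only the logbook predictor adds the climatological field `xi`. It evaluates only the data log-likelihood for fixed parameter values; INLA additionally places priors on the fields and integrates over them. Dataset-specific intercepts, linear (rather than smooth) covariate effects, fields indexed by discrete site instead of projected from an SPDE mesh, and sharing the seasonal field without a scaling coefficient are all simplifying assumptions, and every name is hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bernoulli_loglik(y, eta):
    """Bernoulli log-likelihood with a logit link: sum of y*eta - log(1 + e^eta)."""
    return sum(yi * ei - softplus(ei) for yi, ei in zip(y, eta))


def joint_loglik(params, tag, logbook):
    """Sum of tag and logbook log-likelihoods with shared beta and w_season.

    Records are (covariates, season_index, site_index, y) tuples.
    """
    beta, w, xi = params["beta"], params["w_season"], params["xi"]

    def linear(x):
        return sum(b * v for b, v in zip(beta, x))

    def eta_tag(x, season, site):
        return params["alpha_tag"] + linear(x) + w[season][site]

    def eta_log(x, season, site):
        # Same beta and seasonal field as the tag predictor, plus a
        # logbook-only climatological field xi (no season index).
        return params["alpha_log"] + linear(x) + w[season][site] + xi[site]

    ll_tag = bernoulli_loglik([r[3] for r in tag], [eta_tag(*r[:3]) for r in tag])
    ll_log = bernoulli_loglik([r[3] for r in logbook], [eta_log(*r[:3]) for r in logbook])
    return ll_tag + ll_log


# Usage: with every predictor equal to 0, each record contributes log(0.5).
params = {
    "alpha_tag": 0.0, "alpha_log": 0.0,
    "beta": [0.0, 0.0, 0.0],  # SST, MLD, bathymetry (standardized)
    "w_season": [[0.0, 0.0] for _ in range(4)],  # 4 seasons x 2 sites
    "xi": [0.0, 0.0],
}
tag = [([0.5, -1.0, 2.0], 0, 1, 1), ([0.1, 0.3, -0.2], 2, 0, 0)]
logbook = [([1.2, 0.0, 0.4], 1, 0, 1), ([-0.7, 0.9, 1.1], 3, 1, 0)]
print(joint_loglik(params, tag, logbook))  # equals 4 * log(0.5)
```

Changing `xi` alters only the logbook term, whereas changing `beta` or `w_season` moves both terms, which is how the tag data inform the spatial pattern used to explain logbook catches.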
Because both data sources include only positive observations of occurrence (presences), models were constructed following dynamic SDM techniques that use simulated data to represent where individuals were likely absent (i.e., “pseudo-absences”), as described in Farchadi et al. (in revision). These methods have been shown to capture dynamic habitat use at daily temporal scales (Hazen et al. 2018). Briefly, pseudo-absences were generated by background sampling within the monthly spatial extent of each dataset at a 1:1 ratio of pseudo-absences to presences for both datasets (Barbet-Massin et al. 2012), an approach that has proven effective for other pelagic species (Hazen et al. 2021, Braun et al. 2023b). Model outputs from each modeling approach describe albacore “habitat suitability” as continuous values ranging from 0 (low habitat suitability) to 1 (high habitat suitability).
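A minimal sketch of 1:1 background sampling is given below, applied to one dataset at a time. It assumes each month's spatial extent can be approximated by the bounding box of that month's presences and draws one uniform random point per presence. The function name is hypothetical, and the sketch does not mask land, weight by grid-cell area, or snap points to an environmental grid, steps a real workflow would likely need.

```python
import random


def sample_pseudo_absences(presences, seed=0):
    """Draw 1:1 background pseudo-absences within each month's extent.

    presences: list of (lon, lat, month) tuples for a single dataset.
    The monthly extent is approximated by the bounding box of that month's
    presences (a simplifying assumption); one point is drawn uniformly in
    lon/lat space for every presence.
    """
    rng = random.Random(seed)
    by_month = {}
    for lon, lat, month in presences:
        by_month.setdefault(month, []).append((lon, lat))

    absences = []
    for month, pts in sorted(by_month.items()):
        lons = [p[0] for p in pts]
        lats = [p[1] for p in pts]
        for _ in pts:  # 1:1 ratio of pseudo-absences to presences
            absences.append((rng.uniform(min(lons), max(lons)),
                             rng.uniform(min(lats), max(lats)),
                             month))
    return absences


presences = [(-125.0, 40.0, 6), (-120.0, 45.0, 6),
             (-130.0, 35.0, 7), (-128.0, 37.0, 7), (-126.0, 36.0, 7)]
absences = sample_pseudo_absences(presences, seed=42)
print(len(absences))  # 5, one pseudo-absence per presence
```

Sampling month by month keeps the background tied to where each dataset actually had coverage at that time, which is what allows the models to contrast used and available habitat dynamically rather than against a single static extent.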