Species Distribution Models
Albacore habitat suitability was modeled in relation to environmental
variables using three distinct binomial generalized additive models
(GAMs), which differed in how they combined data and treated spatial
dependence (Table 1): a data-pooled habitat-envelope model, a
data-pooled Gaussian field model, and a joint-likelihood Gaussian field
model. All models used a logit link and were fitted with the Integrated
Nested Laplace Approximation (INLA) framework (Rue et al. 2009), which
offers a computationally efficient alternative to other methods of
Bayesian inference (e.g., Markov chain Monte Carlo) while also enabling
smoothing approaches akin to frequentist GAMs (e.g., as implemented in
the mgcv R package; Lezama-Ochoa et al. 2020). Each model included three
environmental covariates: sea surface temperature (SST), mixed layer
depth (MLD), and bathymetry, which have previously been shown to be
important drivers of habitat use for juvenile albacore tuna (Muhling et
al. 2019, Farchadi et al. 2024). Here, we outline the design of each
SDM; further details on model parameterization and structure are
provided in the Supporting Information.
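As a minimal sketch of the structure shared by all three models, a binomial GAM with a logit link and three covariate effects can be written as below. The notation ($y_i$, $p_i$, $\beta_0$, $f_1$ to $f_3$) is assumed here for illustration only, with $y_i = 1$ for an observed occurrence and $y_i = 0$ for a pseudo-absence; the exact parameterization is the one given in the Supporting Information.

```latex
y_i \sim \mathrm{Bernoulli}(p_i), \qquad
\mathrm{logit}(p_i) = \log\frac{p_i}{1 - p_i}
  = \beta_0 + f_1(\mathrm{SST}_i) + f_2(\mathrm{MLD}_i) + f_3(\mathrm{Bathymetry}_i)
```

Under this notation, $p_i$ is the quantity interpreted as habitat suitability, and the spatially explicit models add random spatial field terms to the right-hand side.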
The first model employed a spatially implicit approach, often referred
to as a “habitat-envelope” model (hereafter HE; Brodie et al. 2020),
which relies on species-environment relationships to explain the
variation in species distributions without considering the spatial
arrangement of habitats and species occurrences. This approach is
consistent with many traditional applications of SDMs and has been
widely adopted to model highly migratory marine species including
albacore tuna (Muhling et al. 2019). To combine the albacore logbook and
tag data in this model, we pooled the two datasets into a single
aggregated dataset for modeling. Data pooling has been shown to enhance
model performance in similarly designed HE SDMs by mitigating biases
inherent in individual datasets, making it a robust option for species
distribution modeling (Braun et al. 2023b).
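The pooling step can be sketched as a simple concatenation of the two datasets before model fitting. This is an illustrative sketch only: the function name `pool_datasets`, the dictionary record layout, and the added `source` label are assumptions, not the authors' code.

```python
def pool_datasets(logbook_rows, tag_rows):
    """Stack logbook and tag records into one aggregated training set.

    Each record is a dict of response and covariate values (hypothetical
    layout). A 'source' label is added so provenance is kept, although a
    pooled model treats all rows as one dataset.
    """
    pooled = []
    for source, rows in (("logbook", logbook_rows), ("tag", tag_rows)):
        for row in rows:
            # Copy the record so the input datasets are left unchanged.
            pooled.append({**row, "source": source})
    return pooled
```

After pooling, a single likelihood is fitted to all rows, which is what distinguishes this approach from the joint-likelihood model described later.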
Building upon the HE model, the second model introduced Gaussian fields
(GF) to account for spatial dependence while retaining the data-pooling
approach. Unlike the spatially implicit HE model, this second model
(hereafter “GF”) used a geostatistical
framework that explicitly incorporates spatial information through
random spatial fields, which capture unmeasured spatial processes across
the study domain (Stock et al. 2020). In INLA, random spatial fields are
approximated as discrete Gaussian Markov random fields (GMRFs) using the
stochastic partial differential equation approach with a Matérn
covariance structure (Lindgren et al. 2011), which serves as a smoother
based on the assumption that nearby locations are more similar than
distant ones (Krainski et al. 2018). To account for temporally varying
spatial autocorrelation in the training dataset, the GF model was fitted
with season-specific GMRFs, formulated as a cyclic first-order
autoregressive spatiotemporal structure in which neighboring seasons
(e.g., winter and fall) are more closely correlated than those farther
apart (e.g., winter and summer).
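To illustrate why a cyclic first-order autoregressive structure links winter to fall as closely as to spring, the sketch below computes the between-season correlation implied by a stationary circular AR(1) process with n seasons, (rho^d + rho^(n-d)) / (1 + rho^n) for a lag of d seasons. The function name, the season ordering, and the value of rho are assumptions for illustration; the source does not state how INLA parameterizes this term.

```python
SEASONS = ("winter", "spring", "summer", "fall")  # assumed ordering


def cyclic_ar1_corr(season_a, season_b, rho, seasons=SEASONS):
    """Correlation between two seasons under a circular AR(1) process.

    rho is the lag-one autoregressive coefficient (0 < rho < 1 here).
    The expression is symmetric in lag and n - lag, so the season
    sequence wraps around: fall is adjacent to winter.
    """
    n = len(seasons)
    lag = abs(seasons.index(season_a) - seasons.index(season_b))
    return (rho ** lag + rho ** (n - lag)) / (1.0 + rho ** n)
```

With a hypothetical rho of 0.5, the winter-fall correlation is about 0.59 and the winter-summer correlation about 0.47, matching the ordering described in the text.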
The third model extended the GF model by employing joint-likelihood
methods to explicitly integrate the two disparate data sources within a
single iSDM framework. This model used two separate linear predictors,
with each data source directly informing albacore habitat suitability
through shared parameters in a jointly estimated likelihood. Both
sub-models incorporated shared effects for the three environmental
covariates and the random spatial fields (Table 1). However, unlike the
data-pooled GF model, the iSDM included two sets of random spatial
fields: (A) seasonal random spatial fields describing the spatiotemporal
autocorrelation for the tag data and shared with the logbook linear
predictor (Barber et al. 2021), and (B) a climatological spatial field,
informed only by the logbook data, to account for any residual
autocorrelation not explained by either the shared seasonal spatial
fields or environmental covariates (Simmonds et al. 2020). The decision
to share the archival tag-estimated spatial fields across the linear
predictors was informed by prior studies that used iSDMs to investigate
predator-prey interactions (Barber et al. 2021), akin to fisheries
systems where prey spatial dynamics (i.e., albacore tuna) influence
predator distributions (i.e., the fishery). With this structure, we
assume that the spatial patterns influencing albacore tuna
distributions, as indicated by fishery-independent archival tag data,
directly correspond to the patterns explaining albacore catch in the
fishery-dependent dataset. The second spatial field was modeled without
a temporal component (i.e., as a climatology) because the distribution of
the U.S. pole-and-line and troll fishery varies minimally during the
fishing season (May–November; Figure S2).
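The joint-likelihood structure described above can be sketched as two Bernoulli sub-models that share covariate effects and the seasonal spatial fields. The notation is assumed for illustration: $\omega_{t}$ denotes the seasonal field for season $t$, $\nu$ the logbook-only climatological field, and $\theta$ all parameters. Dataset-specific intercepts and an unscaled shared field are also assumptions, since these details are left to Table 1 and the Supporting Information.

```latex
\begin{aligned}
\mathrm{logit}\!\left(p^{\mathrm{tag}}_i\right)
  &= \beta^{\mathrm{tag}}_0 + \sum_{j=1}^{3} f_j(x_{ij}) + \omega_{t(i)}(\mathbf{s}_i), \\
\mathrm{logit}\!\left(p^{\mathrm{log}}_k\right)
  &= \beta^{\mathrm{log}}_0 + \sum_{j=1}^{3} f_j(x_{kj}) + \omega_{t(k)}(\mathbf{s}_k) + \nu(\mathbf{s}_k), \\
\mathcal{L}(\theta)
  &= \prod_{i} \mathrm{Bernoulli}\!\left(y^{\mathrm{tag}}_i \mid p^{\mathrm{tag}}_i\right)
     \prod_{k} \mathrm{Bernoulli}\!\left(y^{\mathrm{log}}_k \mid p^{\mathrm{log}}_k\right)
\end{aligned}
```

Because $f_j$ and $\omega_t$ appear in both linear predictors, both data sources inform them, whereas $\nu$ absorbs residual spatial structure in the logbook data alone.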
Because the data sources include only positive observations of
occurrence, models were constructed following dynamic SDM techniques
that use simulated data to represent where individuals were likely
absent (i.e., “pseudo-absences”), as described in Farchadi et al. (in
revision). These methods have been shown to capture dynamic habitat use
at daily temporal scales (Hazen et al. 2018). Briefly, pseudo-absences
were generated by background sampling within the monthly spatial
extent of each dataset at a 1:1 ratio for both datasets
(Barbet-Massin et al. 2012), an approach shown to be effective for other
pelagic species (Hazen et al. 2021, Braun et al. 2023b). Model outputs
from each modeling approach describe albacore “habitat suitability” as
continuous values ranging from 0 (low habitat suitability) to 1 (high
habitat suitability).
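The background sampling step can be sketched as follows for a single dataset. This is a simplified illustration, not the published procedure: it assumes the "monthly spatial extent" is the bounding box of that month's presences, draws points uniformly in longitude and latitude (which is not equal-area), and uses a hypothetical function name and record layout.

```python
import random
from collections import defaultdict


def sample_pseudo_absences(presences, ratio=1, seed=0):
    """Draw pseudo-absences within each month's bounding box.

    presences: list of (month, lon, lat) tuples for one dataset.
    ratio: pseudo-absences generated per presence (1 gives a 1:1 ratio).
    Returns a list of (month, lon, lat) tuples.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    by_month = defaultdict(list)
    for month, lon, lat in presences:
        by_month[month].append((lon, lat))

    pseudo = []
    for month, points in sorted(by_month.items()):
        lons = [p[0] for p in points]
        lats = [p[1] for p in points]
        # Sample uniformly inside the month's lon/lat bounding box.
        for _ in range(ratio * len(points)):
            pseudo.append((month,
                           rng.uniform(min(lons), max(lons)),
                           rng.uniform(min(lats), max(lats))))
    return pseudo
```

Running the function separately on the logbook and tag presences keeps the 1:1 ratio within each dataset, and the resulting pseudo-absences would be coded as zeros in the binomial response.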