The Power of Integrated Models
Leveraging multiple data types, whether through pooling or explicit
integration such as via joint likelihood approaches, has been shown to
generally improve SDM performance by estimating more precise and
accurate environmental relationships (Fletcher et al. 2019, Paradinas et
al. 2023, Braun et al. 2023b). Although recent research has highlighted
the application of combining various data for SDMs (Bedriñana-Romano et
al. 2018, Rufener et al. 2021, Paradinas et al. 2023, Braun et al.
2023b), few studies have demonstrated their capacity to forecast and
project potential distributional shifts under novel environmental
conditions (Chevalier et al. 2021). Our study suggests that while all
model approaches used here perform well during periods of normal
environmental conditions, joint likelihood approaches that explicitly
account for the biases in each data source (i.e., iSDMs) maintain robust
and ecologically realistic forecasts as environmental conditions become
increasingly novel. We demonstrate that iSDMs effectively mitigate
issues that are broadly known to limit a model’s forecast skill. Our
findings confirm that explicit integration of diverse datasets
represents a promising approach to overcome the potential biases
inherent in a single data source, as it enables harnessing the strength
of various data types to facilitate more accurate inferences about a
species’ distribution (Isaac et al. 2020). The models we tested all
exhibited high predictive skill (average AUC > 0.83, MAE
< 0.25) and strong ecological realism. Data integration can be
particularly beneficial for highly migratory pelagic species, such as
albacore, as using a single data source may only capture a portion of
their range, such as that represented by a fishery, which could lead to
mischaracterizing a species’ realized niche (Paradinas et al. 2023,
Braun et al. 2023b). However, our results also suggest that predictive
skill may be higher for fishery-dependent data compared to
fishery-independent sources, as seen in the deviations observed in early
2016 (Figure 3). This aligns with previous findings (Braun et al. 2023b;
Farchadi et al. in revision), where models were more effective at
predicting the fishery’s interaction with a species rather than broader
habitat suitability. These differences underscore the need to carefully
consider the representativeness of each data source when interpreting
forecasted distributions.
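
As an illustration of the skill metrics reported above, the following is a minimal sketch of how AUC and MAE can be computed from observed presence/absence and predicted probabilities. It assumes MAE is taken between the predicted probability and the binary observation, which may differ from the exact evaluation used in this study; the function names and the toy vectors `observed` and `predicted` are hypothetical.

```python
def auc(labels, scores):
    """Rank-based AUC: probability that a randomly chosen presence
    receives a higher score than a randomly chosen absence (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mae(labels, scores):
    """Mean absolute error between binary observations and predicted
    probabilities (an assumed definition, for illustration only)."""
    return sum(abs(y - s) for y, s in zip(labels, scores)) / len(labels)


# Hypothetical toy data: 1 = presence, 0 = absence.
observed = [1, 1, 0, 0, 1, 0]
predicted = [0.9, 0.6, 0.4, 0.2, 0.3, 0.1]

print(round(auc(observed, predicted), 3), round(mae(observed, predicted), 3))
```

Evaluating these metrics separately for each data source, rather than on the pooled observations, is one way to expose the kind of source-specific differences in skill discussed above.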
The improved predictive performance of iSDMs under increasing
environmental novelty may stem from differences in the fitted
species-environmental response curves (Thuiller et al. 2004) and their
ability to account for spatiotemporal variation (Muhling et al. 2019,
Simmonds et al. 2020). Previous studies evaluating SDM forecasting
performance, whether in the near-term (Muhling et al. 2020, Barnes et
al. 2022) or long-term (Thuiller et al. 2004, Karp et al. 2023), have
emphasized that biased or limited species-environmental response curves
can lead to erroneous predictions. This limitation often stems from an
inherent bias in training data, such as fishery catch data that capture
only a portion of the species’ preferred habitat conditions due to sampling
bias (e.g., clustering, gear selectivity, limited spatial and/or
temporal coverage), resulting in truncated species-environmental
response curves (Chevalier et al. 2021, Barnes et al. 2022, Paradinas et
al. 2023). Our results indicate that leveraging diverse data types can
help capture the full range of environmental conditions a species
occupies, but the species response curves depend on how the model
framework combines data types. For example, more generalized
species-environmental relationships were estimated for both spatially
explicit models (i.e., GF, iSDM), which performed better than the
spatially implicit model (HE). This is likely due, at least in part, to HE
response curves that exhibited greater overfitting and were heavily
biased towards distributions of the more data-rich vessel logbook
records, particularly for MLD (Figure 5). Notably, the GF and iSDM
response curves for MLD closely matched the known diving behavior of
juvenile albacore tuna, which regularly dive to approximately 100 meters
(Frawley et al. 2024) but are often vertically limited by colder
temperatures below the mixed layer (Graham and Dickson 1981). In
contrast, the HE model suggested albacore suitability declined with
deeper MLDs, particularly > 10 meters, a pattern that
mirrors the environmental conditions targeted by the pole-and-line and
troll fisheries along the U.S. West Coast (Figure S1). This demonstrates
that the inclusion of GMRFs in the spatially explicit models helped
account for unmeasured variation in albacore distribution. By modeling
the spatial structure separately, these models provided more reliable
estimates of environmental relationships, reducing the risk of response
curves being artifacts of sampling biases in the fishery data.
Our results also highlight how approaches to spatial dependence and
combining disparate data sources can influence an SDM’s capacity to
accurately forecast species distributions under novel environmental
conditions. Consistent with previous studies, we found that habitat
envelope models produce narrower response curves than spatially explicit
frameworks, likely due to their inability to capture residual
variability (Thorson 2018, Simmonds et al. 2020). Consequently, tightly
fit response curves may fail to account for non-stationary
species-environment relationships under novel conditions. In contrast,
the broader, more generalized response curves generated by iSDMs better
capture these dynamics over time (Yates et al. 2018, Muhling et al.
2020; Figure 5). Additionally, the strong performance of spatially
explicit models may stem from their ability to incorporate variation
across multiple temporal scales. Consistent with prior findings, our
analysis suggests that including GMRFs—analogous to seasonal or
climatological covariates—enhances forecast skill, particularly in the
near term (Barnes et al. 2022). Furthermore, differences between the two
spatially explicit models, GF and iSDM, highlight the influence of data
integration methods. While the GF model pools data sources, potentially
masking differences in sampling design (Fletcher et al. 2019), iSDMs
estimate data-specific spatial fields, allowing for improved handling of
spatiotemporal variation and biases while also balancing
disproportionate sample sizes. This, in turn, can lead to a more accurate
representation of the underlying ecology of the species. Given the
challenges of identifying and addressing bias in different data sources,
ongoing evaluation of integration methods remains essential for
optimizing predictive performance in species distribution modeling.
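
The contrast between pooling and joint-likelihood integration can be sketched in a deliberately simplified form. The example below is not the model fitted in this study: it assumes a Bernoulli observation model, a single environmental covariate with a shared slope, and a dataset-specific intercept standing in for the data-specific spatial fields of an iSDM. All names and the toy data are hypothetical.

```python
import math


def bernoulli_loglik(obs, env, intercept, slope):
    """Log-likelihood of presence/absence data under a logistic model."""
    total = 0.0
    for y, x in zip(obs, env):
        p = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def pooled_loglik(datasets, intercept, slope):
    """Pooling: every dataset shares one intercept and one slope, so
    differences in sampling design are absorbed into the shared terms."""
    return sum(
        bernoulli_loglik(d["obs"], d["env"], intercept, slope)
        for d in datasets
    )


def joint_loglik(datasets, intercepts, slope):
    """Joint likelihood: the environmental slope is shared, but each
    dataset has its own intercept (a crude stand-in for a
    data-specific spatial field)."""
    return sum(
        bernoulli_loglik(d["obs"], d["env"], a, slope)
        for d, a in zip(datasets, intercepts)
    )


# Hypothetical toy data: same environments, different detection rates.
fishery = {"obs": [1, 1, 1, 0], "env": [0.5, 1.0, 1.5, -1.0]}
survey = {"obs": [0, 0, 1, 0], "env": [0.5, 1.0, 1.5, -1.0]}
datasets = [fishery, survey]

pooled = pooled_loglik(datasets, intercept=0.0, slope=1.0)
joint = joint_loglik(datasets, intercepts=[1.0, -1.0], slope=1.0)
print(pooled, joint)
```

With these toy values the joint formulation attains a higher log-likelihood at the same slope, because the dataset-specific terms absorb the difference in observation rates instead of forcing it into the shared environmental relationship. In a full iSDM the intercepts would be replaced by spatially structured random fields, and all parameters would be estimated rather than fixed.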