Model Performance and Environmental Novelty Analysis
We evaluated each model’s forecasting ability during a period of multiple MHWs in the NEP by training each model on the full datasets from 1995–2013 and testing its performance at monthly time steps on the held-out data from 2014–2019 (Figure 2). Model performance was assessed along two dimensions: predictive skill and ecological realism. Predictive skill denotes a model’s ability to distinguish locations where a species was present from those where it was absent, using independent occurrence data. We quantified predictive skill using two metrics: the area under the receiver operating characteristic curve (AUC) and mean absolute error (MAE). Using AUC and MAE together has been recommended for assessing SDMs, as they provide complementary insights into model performance while offsetting the shortcomings inherent in each metric’s assumptions (Konowalik and Nosol 2021). In addition to predictive skill, we evaluated ecological realism, which considers a model’s capacity to estimate biologically plausible species-environment relationships and to predict spatiotemporal patterns consistent with observed ecological processes. Ecological realism was assessed qualitatively by 1) comparing spatial predictions for a forecasted month to the known distribution of albacore during a MHW and 2) analyzing partial response curves to determine whether they aligned with observed environmental covariate distributions during the training (1995–2013) and forecasting (2014–2019) periods.
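As an illustration of the two predictive skill metrics, the following minimal sketch computes AUC and MAE from observed presence/absence labels and predicted probabilities of occurrence. The function names, the NumPy-only implementation, and the toy values are assumptions made for illustration and are not the code used in this study. AUC is computed here as the proportion of presence-absence pairs in which the presence received the higher prediction (ties counted as one half), which is equivalent to the area under the ROC curve.

```python
import numpy as np

def auc_score(observed, predicted):
    """AUC as the fraction of presence-absence pairs ranked correctly."""
    observed = np.asarray(observed, dtype=bool)
    predicted = np.asarray(predicted, dtype=float)
    pres = predicted[observed]    # predictions at presence locations
    absn = predicted[~observed]   # predictions at absence locations
    if pres.size == 0 or absn.size == 0:
        raise ValueError("AUC requires at least one presence and one absence")
    # Pairwise differences; ties contribute one half to the score.
    diff = pres[:, None] - absn[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / (pres.size * absn.size))

def mae_score(observed, predicted):
    """Mean absolute difference between 0/1 observations and predictions."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(observed - predicted)))

# Hypothetical held-out month: 1 = presence, 0 = absence.
observed = [1, 1, 0, 0]
predicted = [0.9, 0.4, 0.6, 0.1]
auc = auc_score(observed, predicted)
mae = mae_score(observed, predicted)
```

In this sketch AUC depends only on how predictions are ranked, whereas MAE depends on how far the predicted probabilities are from the observed 0/1 values, which is one way the two metrics complement each other.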
Beyond assessing overall model performance, we evaluated how effectively each model type handled environmental novelty using Hellinger Distance (Legendre and Legendre 2012, Johnson and Watson 2021, Karp et al. 2023), which measures the difference between two probability distributions (see Karp et al. 2023 for formulas). We calculated Hellinger Distance for each of the two dynamic environmental covariates (SST and MLD), comparing each month-year in the test data with the climatological conditions of the same month in the training data. Hellinger Distance quantifies the extent of extrapolation required of the fitted SDM when making predictions. It ranges from 0 to 1, where a value of 0 indicates that the two distributions share the same information (i.e., complete data overlap), while values >0.5 indicate greater dissimilarity than similarity between the two distributions (Johnson and Watson 2021). Finally, we assessed the impact of environmental novelty on forecast skill by examining how AUC and MAE varied with Hellinger Distance across the different models.
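A minimal sketch of the Hellinger Distance calculation for one covariate is given below, using the standard discrete form of the distance applied to histograms over shared bins. The function name, the choice of 30 equal-width bins, and the toy SST values are assumptions made for illustration; the exact formulation used in this study is the one given in Karp et al. (2023).

```python
import numpy as np

def hellinger_distance(train_values, test_values, bins=30):
    """Hellinger Distance between two samples, binned on a shared grid.

    Discrete form: H = sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))**2)),
    where p and q are the bin proportions of each sample.
    """
    train = np.asarray(train_values, dtype=float)
    test = np.asarray(test_values, dtype=float)
    lo = min(train.min(), test.min())
    hi = max(train.max(), test.max())
    if lo == hi:
        return 0.0  # both samples consist of a single identical value
    # Shared bin edges so that the two histograms are directly comparable.
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(train, bins=edges)
    q, _ = np.histogram(test, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

# Hypothetical SST values (degrees C) for one calendar month:
# training climatology versus a single test month-year.
train_sst = [14.0, 14.5, 15.0, 15.5, 16.0]
test_sst = [17.5, 18.0, 18.5, 19.0, 19.5]
novelty = hellinger_distance(train_sst, test_sst)
```

Because the two toy samples do not overlap, the resulting distance is at its upper bound of 1. With real data the value depends on the binning, so the number of bins would need to be held fixed across months and covariates for values to be comparable.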