Species distribution models (SDM)
We modelled the distribution of the three main groups retrieved by the genetic analyses (I-III, IV and V) as a function of major environmental factors. Random Forest classification models (Breiman, 2001) were fitted in R (R Core Team, 2020) using the package randomForest (Liaw & Wiener, 2002). Each model comprised 2000 trees, with one third of the predictor variables randomly sampled as split candidates at each node (Liaw & Wiener, 2002).
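The per-split sampling of predictors can be illustrated with a minimal sketch. It is written in Python using only the standard library, not in the R code used for the analyses. The predictor identifiers and both function names are hypothetical, and rounding the one-third fraction down (with a minimum of one variable) is an assumption, since the text does not state how the fraction was rounded.

```python
import random

# Hypothetical identifiers for the five environmental predictors named in the text.
PREDICTORS = ["depth", "salinity", "temperature", "oxygen", "poc_flux"]
N_TREES = 2000  # number of trees per forest, as stated in the text


def n_split_candidates(n_predictors):
    """Number of predictors tried at each split.

    Assumption: one third of the predictors, rounded down, but at least one.
    """
    return max(1, n_predictors // 3)


def draw_split_candidates(predictors, rng):
    """Draw, without replacement, the predictors considered at a single split."""
    k = n_split_candidates(len(predictors))
    return rng.sample(predictors, k)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
candidates = draw_split_candidates(PREDICTORS, rng)
print(n_split_candidates(len(PREDICTORS)), candidates)
```

Under this rounding assumption, five predictors yield a single candidate variable per split; a different rounding rule would yield two.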
Two types of models were fitted: (a) a multiclass model in which all three groups (I-III, IV and V) are present and the model assigns the most probable class, and (b) a separate binary model for each group that estimates the probability of presence versus absence of that group. For type (b), a presence/absence matrix was produced for the groups I-IIIRAD, IVRAD and VRAD. To avoid bias towards the absence class (the more common class), each tree was grown on equal numbers of presence records and randomly chosen absence records. Predictor layers representing the major environmental factors structuring the area (depth, bottom water salinity, bottom water temperature, bottom water oxygen, and particulate organic carbon (POC) flux) were downloaded from the Global Marine Environmental Dataset (GMED; http://gmed.auckland.ac.nz). The resulting models were used to predict the probability of occurrence at 88,822 geographic locations in the study area, after excluding locations situated on land or lacking values for one or more predictor variables. Whether the observed model errors deviated significantly from random expectation was tested with the function MVSF.test from the package RFtools (https://github.com/pmartinezarbizu/RFtools).
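The balanced sampling used for the per-group models can be sketched as follows. This is an illustration in Python using only the standard library, not the R code used for the analyses. The function name and the toy labels are hypothetical. Drawing as many records per class as the rarer class contains, and drawing them with replacement, are both assumptions; the text states only that each tree received equal numbers of presences and randomly chosen absences.

```python
import random


def balanced_tree_sample(labels, rng):
    """Return row indices for one tree with equal numbers of presences and absences.

    labels: sequence of 1 (presence) and 0 (absence).
    Assumptions: the per-class sample size equals the size of the rarer class,
    and records are drawn with replacement within each class.
    """
    presence = [i for i, y in enumerate(labels) if y == 1]
    absence = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(presence), len(absence))
    if n == 0:
        raise ValueError("both presence and absence records are required")
    chosen_presence = [rng.choice(presence) for _ in range(n)]
    chosen_absence = [rng.choice(absence) for _ in range(n)]
    return chosen_presence + chosen_absence


# Toy example: 4 presences and 20 absences (made-up data, for illustration only).
labels = [1] * 4 + [0] * 20
rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = balanced_tree_sample(labels, rng)
n_presence = sum(labels[i] for i in sample)
print(len(sample), n_presence, len(sample) - n_presence)
```

Each call yields a fresh draw, so across many trees every absence record has a chance of being used even though each individual tree sees the two classes in equal proportion.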
Results