Species distribution models (SDM)
We modelled the distribution of the three main groups retrieved by the
genetic analyses (I-III, IV and V) based on major environmental factors.
Random Forest classification models (Breiman, 2001) were fitted in R
(R Core Team, 2020) using the package randomForest (Liaw & Wiener,
2002). Each model comprised 2000 trees, with one third of the predictor
variables randomly sampled as candidates at each split (Liaw & Wiener, 2002).
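As a minimal sketch of such a model, the following R code fits a multiclass Random Forest with 2000 trees using the randomForest package. The data are simulated stand-ins, the column names are hypothetical, and rounding one third of the predictors down to a whole number is an assumption, since the rounding rule is not stated here.

```r
library(randomForest)

set.seed(1)  # reproducible toy data

# Hypothetical stand-in data: five environmental predictors, three groups.
n <- 150
env <- data.frame(
  depth       = runif(n, 100, 5000),
  salinity    = runif(n, 34, 36),
  temperature = runif(n, -1, 10),
  oxygen      = runif(n, 3, 8),
  poc         = runif(n, 0.5, 20)
)
group <- factor(sample(c("I-III", "IV", "V"), n, replace = TRUE))

# One third of the predictors per split; rounding down is an assumption.
mtry_third <- max(1, floor(ncol(env) / 3))

rf_multi <- randomForest(x = env, y = group,
                         ntree = 2000, mtry = mtry_third)

# Most probable class for each record (out-of-bag predictions).
head(predict(rf_multi))
```

Because the toy labels are assigned at random, the predictions carry no ecological meaning; the sketch only illustrates the model settings.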
Two types of models were calculated: a) a multiclass model in which all
three groups (I-III, IV and V) are present and the model assigns the most
probable class, and b) a separate model for each group, which computes
the probability of presence and absence of that group alone. For type b),
a presence/absence matrix was produced for the groups I-IIIRAD, IVRAD
and VRAD. To avoid bias towards the absence class (which is the more
common), each tree was built from equal numbers of presence records and
randomly chosen absence records. Predictor layers representing the major
environmental factors structuring the area (depth, bottom water
salinity, temperature and oxygen, and particulate organic carbon (POC)
flux) were downloaded from the Global Marine Environmental Dataset
(GMED; http://gmed.auckland.ac.nz).
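The balanced sampling used for the type b) models can be sketched with the `strata` and `sampsize` arguments of randomForest, which draw a fixed number of records per class for every tree. This is one possible way to balance the classes, not necessarily the procedure used here; the data and column names are simulated and hypothetical.

```r
library(randomForest)

set.seed(2)  # reproducible toy data

# Hypothetical stand-in predictors.
n <- 300
env <- data.frame(
  depth       = runif(n, 100, 5000),
  salinity    = runif(n, 34, 36),
  temperature = runif(n, -1, 10),
  oxygen      = runif(n, 3, 8),
  poc         = runif(n, 0.5, 20)
)

# Presence is rare (about 15% of records), so absence dominates.
presence <- factor(ifelse(runif(n) < 0.15, "presence", "absence"),
                   levels = c("absence", "presence"))

# Size of the smaller class; every tree draws this many records per class.
n_min <- min(table(presence))

rf_pa <- randomForest(
  x = env, y = presence,
  ntree = 2000,
  mtry = max(1, floor(ncol(env) / 3)),  # rounding down is an assumption
  strata = presence,
  sampsize = c(n_min, n_min)            # order follows the factor levels
)

# Out-of-bag confusion matrix with per-class error rates.
rf_pa$confusion
```

One such model would be fitted per group, each with its own presence/absence vector.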
The resulting models were used to predict probability of occurrence for
88,822 geographic locations in the study area, which remained after
excluding locations situated on land or lacking values for one or more
predictor variables. Whether the observed model errors deviated
significantly from random expectation was tested with the function
MVSF.test from the package RFtools
(https://github.com/pmartinezarbizu/RFtools).
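The prediction step can be sketched as follows: locations on land or with missing predictor values are dropped, and class probabilities are then predicted for the remaining locations. The grid, the `on_land` flag and the two predictors are hypothetical, and the model is a small toy fit. The significance test with MVSF.test is not shown, since its interface is not described here.

```r
library(randomForest)

set.seed(3)  # reproducible toy data

# Toy presence/absence model on two hypothetical predictors.
n <- 200
train <- data.frame(depth       = runif(n, 100, 5000),
                    temperature = runif(n, -1, 10))
presence <- factor(ifelse(train$depth < 2000, "presence", "absence"),
                   levels = c("absence", "presence"))
rf <- randomForest(x = train, y = presence, ntree = 500)

# Hypothetical prediction grid; NA marks a missing predictor value.
grid <- data.frame(
  lon         = c(-20, -19, -18, -17),
  lat         = c(60, 60, 61, 61),
  on_land     = c(FALSE, TRUE, FALSE, FALSE),
  depth       = c(1500, NA, 3200, 800),
  temperature = c(4.2, NA, NA, 6.1)
)

# Keep marine locations with a value for every predictor.
predictors <- c("depth", "temperature")
keep <- !grid$on_land & complete.cases(grid[, predictors])
grid_ok <- grid[keep, ]

# Class probabilities: one column per class, one row per location.
prob <- predict(rf, newdata = grid_ok[, predictors], type = "prob")
grid_ok$p_presence <- prob[, "presence"]

grid_ok[, c("lon", "lat", "p_presence")]
```

In this toy grid two of the four locations are retained; the second is on land and the third lacks a temperature value.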
Results