Should we exploit opportunistic databases with Joint Species
Distribution Models? Artificial and real data suggest it depends on the
completeness percentage.
Abstract
Anticipating the effects of global change on biodiversity has become a
global challenge that requires new methods. Approaches such as Species
Distribution Models (SDMs) have important limitations (for example, they
model each species separately and so cannot capture species cooccurrence
patterns), which have fuelled the development of Joint Species
Distribution Models (JSDMs). However, JSDMs typically rely on community
data from structured surveys, and their suitability for unstructured data
from opportunistic databases has not yet been assessed. Here we test the
performance of JSDMs fitted to opportunistic databases. Using artificial data
that mimic the limitations of such databases by subsampling complete
cooccurrence matrices, we analysed how the completeness of opportunistic
databases affects JSDMs in terms of (a) the effect of independent
variables on species occurrence, (b) residual species cooccurrence (as a
proxy for biotic interactions), and (c) species distributions. Moreover,
using a case study of forest tree species in Europe, we show how to
evaluate the completeness of real data at the pixel level and assess the
effect of data completeness on model estimation. Our results with
artificial data demonstrate that decreasing the retention percentage
increases false negatives and negative cooccurrence probabilities,
leading to a loss of ecological information. However, the tolerance of
JSDMs to data degradation depends on the model component considered.
Models fitted with 50% of the data missing remain valid for estimating
species niches and distributions, whereas interaction matrices would
require more complete databases, with at least 75% data retention.
Furthermore, in most cases JSDMs fitted to subsampled matrices predict the
original data even better than the subsampled data themselves, in both the
training and testing subsets. All these findings were confirmed by the
analysis of the real case study. We conclude that opportunistic databases are a
valuable data source for JSDMs, but their use requires a prior analysis
of data completeness for the target taxa in the study area, at the
spatial resolution of interest.
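To make the subsampling idea concrete, the following is a minimal sketch, assuming that an opportunistic database can be mimicked by deleting a fixed fraction of the presence records of a complete site-by-species matrix uniformly at random, so that every deleted presence becomes a false negative (a recorded 0). The deletion scheme and the function names (`subsample_presences`, `completeness`, `count_false_negatives`, `cooccurrences`) are hypothetical illustrations, not the authors' procedure; no JSDM is fitted here, and the pixel-level completeness assessment used for the real data is not reproduced.

```python
import random
from typing import List, Tuple

# Hypothetical representation: rows are sites (pixels), columns are species,
# 1 = presence and 0 = absence (or "not recorded").
Matrix = List[List[int]]


def subsample_presences(complete: Matrix, retention: float, seed: int = 0) -> Matrix:
    """Keep round(retention * n) of the n presence records, chosen uniformly at random.

    Assumption: opportunistic data are mimicked by deleting presences at random;
    each deleted presence turns into a 0, i.e. a false negative.
    """
    if not 0.0 <= retention <= 1.0:
        raise ValueError("retention must be between 0 and 1")
    presences: List[Tuple[int, int]] = [
        (i, j) for i, row in enumerate(complete) for j, v in enumerate(row) if v == 1
    ]
    k = round(retention * len(presences))
    kept = set(random.Random(seed).sample(presences, k))
    return [
        [1 if (i, j) in kept else 0 for j in range(len(row))]
        for i, row in enumerate(complete)
    ]


def completeness(complete: Matrix, observed: Matrix) -> float:
    """Fraction of the true presences that are still recorded."""
    total = sum(map(sum, complete))
    return sum(map(sum, observed)) / total if total else 1.0


def count_false_negatives(complete: Matrix, observed: Matrix) -> int:
    """Number of cells that are true presences but appear as absences."""
    return sum(
        1
        for row_c, row_o in zip(complete, observed)
        for c, o in zip(row_c, row_o)
        if c == 1 and o == 0
    )


def cooccurrences(m: Matrix, a: int, b: int) -> int:
    """Number of sites where species a and species b are both recorded."""
    return sum(1 for row in m if row[a] == 1 and row[b] == 1)


if __name__ == "__main__":
    # Toy complete matrix: 4 sites x 3 species, 8 presences in total.
    complete = [
        [1, 1, 0],
        [1, 1, 1],
        [0, 1, 1],
        [1, 0, 0],
    ]
    for r in (1.0, 0.75, 0.5):
        obs = subsample_presences(complete, r, seed=1)
        print(r, completeness(complete, obs), count_false_negatives(complete, obs),
              cooccurrences(obs, 0, 1))
```

Under this uniform-deletion assumption, a site keeps a joint record of two species only if both presences survive, so the expected number of recorded cooccurrences shrinks roughly with the square of the retention fraction (about 0.56 of the original at 75% retention and 0.25 at 50%), faster than single-species presences do. This is an arithmetic property of the toy scheme, not a result reported by the study.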