New deep learning-based methods for visualizing ecosystem properties
using environmental DNA metabarcoding data
Abstract
1. Metabarcoding of environmental DNA (eDNA) has recently improved our
understanding of biodiversity patterns in marine and terrestrial
ecosystems. However, the complexity of these data prevents current
methods from extracting and analyzing all the relevant ecological
information they contain. Therefore, ecological modeling could greatly benefit from
new methods providing better dimensionality reduction and clustering.
2. Here we present two new deep learning-based methods that combine
different types of neural networks to ordinate eDNA samples and
visualize ecosystem properties in a two-dimensional space: the first is
based on variational autoencoders (VAEs) and the second on deep metric
learning (DML). The strength of our new methods lies in the combination
of several inputs: the number of sequences found for each molecular
operational taxonomic unit (MOTU), together with the genetic sequence
information of each detected MOTU within an eDNA sample.
3. Using three different datasets, we show that our methods represent
three different ecological indicators well in a two-dimensional latent
space: MOTU richness per sample, sequence α-diversity per sample, and
sequence β-diversity between samples. We show that our nonlinear methods are
better at extracting features from eDNA datasets while avoiding the
major biases associated with eDNA. Our methods outperform traditional
dimension reduction methods such as Principal Component Analysis,
t-distributed Stochastic Neighbour Embedding, and Uniform Manifold
Approximation and Projection for dimension reduction.
4. Our results
suggest that neural networks provide a more efficient way of extracting
structure from eDNA metabarcoding data, thereby improving their
ecological interpretation and thus biodiversity monitoring.
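
To make the quantities named above concrete, the following minimal sketch computes MOTU richness per sample and a two-dimensional PCA ordination from a read-count matrix, i.e. the kind of linear baseline against which the VAE and DML methods are compared. It is an illustration only: the toy matrix, the log transform, and the function names `motu_richness` and `pca_2d` are assumptions made for this example and are not the implementation used in this work.

```python
import numpy as np


def motu_richness(counts):
    """Count the MOTUs with at least one read in each sample (rows = samples)."""
    return (np.asarray(counts) > 0).sum(axis=1)


def pca_2d(counts):
    """Project samples onto the first two principal components.

    Read counts are log-transformed (an assumption of this sketch) and
    column-centered before the singular value decomposition.
    """
    x = np.log1p(np.asarray(counts, dtype=float))
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :2] * s[:2]


# Hypothetical toy data: 3 eDNA samples (rows) x 4 MOTUs (columns) of read counts.
counts = np.array([
    [10, 0, 3, 0],
    [0, 5, 0, 0],
    [7, 2, 1, 4],
])

richness = motu_richness(counts)  # one richness value per sample
coords = pca_2d(counts)           # one (x, y) coordinate pair per sample
```

Such a baseline uses read counts only; the methods presented here additionally take the genetic sequence of each detected MOTU as input.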