Margaux Perhirin1*, Hannah Gossner2, Jessica Godfrey2, Rodney Johnson2, Leocadio Blanco-Bercial2, Sakina-Dorothée Ayata1
1 Sorbonne Université, UMR 7159 CNRS-IRD-MNHN, LOCEAN-IPSL, Paris, France
2 Bermuda Institute of Ocean Sciences, Arizona State University, St. Georges, Bermuda
* corresponding author:
Abstract (250 words)
Mesozooplankton is a very diverse group of small animals ranging in size from 0.2 to 20 mm not able to swim against ocean currents. It is a key component of pelagic ecosystems through its roles in the trophic networks and the biological carbon pump. Traditionally studied through microscopes, recent methods have been however developed to rapidly acquire large amounts of data (morphological, molecular) at the individual scale, making it possible to study mesozooplankton using a trait-based approach. Here, combining quantitative imaging with metabarcoding time-series data obtained in the Sargasso Sea at the Bermuda Atlantic Time-series Study (BATS) site, we showed that organisms’ transparency might be an important trait to also consider regarding mesozooplankton impact on carbon export, contrary to the common assumption that just size is the master trait directing most mesozooplankton-linked processes. Three distinct communities were defined based on taxonomic composition, and succeeded one another throughout the study period, with changing levels of transparency among the community. A co-occurrences’ network was built from metabarcoding data revealing six groups of taxa. These were related to changes in the functioning of the ecosystem and/or in the community’s morphology. The importance of Diel Vertical Migration at BATS was confirmed by the existence of a group made of taxa known to be strong migrators. Finally, we assessed if metabarcoding can provide a quantitative approach to biomass and/or abundance of certain taxa. Knowing more about mesozooplankton diversity and its impact on ecosystem functioning would allow to better represent them in biogeochemical models.
Key words (4 to 6)
Transparency, carbon export, metabarcoding, quantitative imaging, trait-based approaches, zooplankton
Introduction
Mesozooplankton is a group comprised of small metazoans of 0.2 to 20 mm in size drifting with ocean currents. They constitute a critical link between primary producers and higher trophic levels and play a major role in the biological carbon pump. They contribute directly to Particulate Organic Carbon (POC) export through the production of sinking faecal pellets, moults and carcasses (Lomas et al., 2010; Steinberg & Landry, 2017). Species in this group can also repackage particles into larger and faster-sinking faecal pellets or can fragment sinking POC, particularly by sloppy feeding, into smaller and slower-sinking particles. Additionally, Diel Vertical Migrations (DVM) realised by some mesozooplankton species between the surface and the deep ocean increase the export of dissolved carbon and the remineralisation of nutrients at depth (Kelly et al., 2019; Tarrant et al., 2021). Hence, mesozooplankton is a highly diverse group in terms of species, ecological strategies, and size, among others (Litchman et al., 2013). Its biomass and community composition can impact both the productivity and the biogeochemistry of the environment (Schnetzer & Steinberg, 2002). Better understanding of how variations in mesozooplankton diversity impact ecosystem functionalities will be a fundamental step to improve the general knowledge of ocean ecosystems. This would also contribute to improving the representation of mesozooplankton diversity in biogeochemical models since this key component of the marine ecosystem is usually represented by only one variable (Quéré et al., 2005).
The Sargasso Sea has been identified as one of the highest zooplanktonic diversity locations in the world (Rombouts et al., 2010). The Bermuda Atlantic Time-series Study (BATS, Lomas et al., 2013) site is located in the vicinity of Bermuda Islands (31°40‘N, 64°10‘W) in the North Atlantic subtropical oligotrophic gyre and has been studied since 1988. Long-term sampling and research have shown that environmental variables and zooplanktonic biomass follow a classical seasonal cycle. The phytoplankton bloom, stimulated by optimal nutrients and light conditions, is followed by a peak in zooplankton biomass, usually in March or April. In summer, thermal stratification is strong and leads to nutrient depleted upper waters and thus, a decrease in zooplankton biomass (Deevey, 1971; Madin, 2001). Secondary peaks can occur in late summer or early fall, when the mixed layer deepens again. Zooplankton biomass is minimal during winter (Ivory et al., 2019) when nutrient flux fertilises the surface layer thanks to deep mixing and mode water formation (Steinberg et al., 2001). Distinct taxa assemblages, based exclusively on the community composition, are associated with each of the four seasonal hydrographic periods (Blanco-Bercial, 2020).
Historically, zooplankton have been investigated by collecting samples using plankton nets and identifying specimens at the species (or genus) level under a microscope. However, this technique of visual identification is time consuming and complex, partly due to the presence of cryptic species, species with multiple life stages or with distinct sexual dimorphism (Huo et al., 2020; Rey et al., 2020). Moreover, taxonomic identification can be strongly biased by expert’s specialisation (Harvey et al., 2017) and more generally, biased towards the largest and most known plankton species. Molecular studies have revealed that plankton knowledge is biased towards sampled or cultivable species, representing less than 30% of the total estimated plankton diversity, leading to a ”Mare Incognitum” (Chust et al., 2017).
Next-generation sequencing technologies, such as DNA metabarcoding, could therefore be an alternative to overcome these microscopy-based monitoring limitations (Abad et al., 2016). DNA metabarcoding allows for the identification of taxa by identifying specific genes (Hebert et al., 2003) from a bulk sample containing DNA from dozens of taxa. This inexpensive and rapid method can detect rare and cryptic species (Huo et al., 2020), and be used to explore the diversity and structure of marine eukaryotic communities (Chen et al., 2021) as well as their ecosystem dynamics (Matthews et al., 2021). However, it is not free of several technical biases: barcode and primers selection, DNA amplification’s efficiency, bioinformatics pipelines’ parameters choice, or reference database quality are known sources of biases (Harvey et al., 2017). This method produces compositional data, meaning that values inferred from counts of reads for any taxon will be influenced by the ones from other taxa (MacNeil et al., 2022; Brisbin et al., 2020). Hence, the quantitative property of these data is still controversial and hard to evaluate (Ershova et al., 2021), necessitating verification by imaging methods (Harvey et al., 2017).
Over the past few decades, imaging systems (e.g., Underwater Vision Profiler, Picheral et al., 2010; In Situ Ichthyoplankton Imaging System, Cowen and Guigand, 2008 or Zooscan, Gorsky et al., 2010) have been developed, along with machine learning algorithms, to acquire and analyze large numbers of individual organisms’ images in a quantitative way (Irisson et al., 2022). These systems are less or non (if in situ ) invasive compared to classical sampling methods (Martini et al., 2021), allowing for relatively rapid acquisition of a large quantity of data, and offering new opportunities to study some previously neglected organisms (Biard et al., 2016).
These images give consistent rich morphological information of individual organisms, making it possible to use a trait-based approach and thus a more ecosystem-focused point of view than a species or taxonomic one (Martini et al., 2021). Trait-based approaches refer to functional traits, i.e., any individual’s characteristic (morphological, physiological) impacting its fitness (Violle et al., 2007), and are based on the assumption that the fitness of an individual is based on its success regarding feeding, reproduction and survival which would depend on a few key traits (Brun et al., 2017). Size is easily measured, and due to its influence on many physiological and ecological aspects of an organism’s life, it has been defined, notably in zooplankton, as a “master trait” (Barton et al., 2013; Litchman et al., 2013). For example, larger organisms will produce larger faecal pellets (Uye & Kaname, 1994) and/or have a stronger migration behaviour, hence playing an important role in export flux out of the euphotic zone (Stamieszkin et al., 2015; Steinberg et al., 2001). Organisms’ transparency, although an important zooplankton morphological characteristic (Orenstein et al., 2022), is however understudied (Johnsen & Widder, 1998). Quantifying transparency might give information about the organism’s ecology (reproduction with gonads or eggs, feeding with guts content, Orenstein et al., 2022). Transparency and colour can also be related to physiological functions (Vilgrain et al., 2022), e.g., carotenoids effects against oxidative stress (Brüsin et al., 2016), or possibly the utilization of pigmentation and DVM in order to escape visual predators (Hays et al., 1994).
While both metabarcoding and imaging techniques can provide information on the temporal and spatial distribution of zooplankton communities and their environmental dynamics (Vilgrain et al., 2021), imaging data is also able to quantify taxa’s abundance and biomass (e.g.,Monferrer et al., 2022). Hence, it would be ideal to combine high throughput image acquisition methods with metabarcoding to improve zooplanktonic community studies (Bucklin et al., 2019). Each of these approaches has uncertainties (Monferrer et al., 2022) but when combined, many resolution issues are resolved. For example, metabarcoding fills in some of the limited size spectrum resolution of imaging techniques (MacNeil et al., 2022) and allows a joint study of zooplankton communities taxonomic and functional aspects. Previous studies have tried to assess the quantitative property of metabarcoding by comparing counts of sequence reads to (relative) biomass (e.g., Djurhuus et al., 2018; Durkin et al., 2022 on particles) or to (relative) abundance (e.g., Ibarbalz et al., 2019; Rey et al., 2020) for various zooplankton taxonomic groups but results from each approach seldom agree (Harvey et al., 2017), requiring further studies.
This study, therefore, seeks to use the complimentary methods of metabarcoding and high throughput imaging in order to link variations in mesozooplankton community composition throughout the year to morphological changes in the highly diverse ecosystem of the Sargasso Sea. Additionally, we evaluate how modifications in community composition will impact the ecosystem’s export intensity. Finally, utilising the potential of our “twin” morphological and molecular datasets, we aim to test the correlations between abundance and biomass, and relative read counts, for wide taxonomic categories.
Material & Methods
Environmental data acquisition
Data were collected at BATS site (31°40′N; 64°10′W) on a monthly basis, from March 2016 to May 2017, except in March 2017 due to rough weather. Environmental samples were taken during the day, except in February-2017 when they were taken at night. Environmental data were collected vertically using a SeaBird 911 CTD rosette equipped with temperature, conductivity, fluorescence, oxygen and photosynthetic active radiation (PAR) sensors. Density (σθ) was calculated from temperature and conductivity. Depth of the Deep Chlorophyll Maximum (DCM, in metre) was defined as the maximum of the fluorescence profile. Mixed Layer Depth (MLD, in metre) was calculated as the depth for which σθ is larger or equal to σθsurface + 0.125 kg m-3 (Levitus, 1982). Sediment traps were allowed to drift for 72 hours at 200 m (Steinberg et al., 2001). The total organic carbon (TOC) measured at 200 m (in mg m-2 d-1) was used as a proxy of carbon export to the mesopelagic layer. TOC data were averaged from three replicates and blank corrected, following BATS protocol (Steinberg et al., 2001).
Zooplankton data acquisition
Zooplankton sampling was conducted by employing a 1 m² rectangular net with 202 µm mesh, towed at a speed of about 1 knot, paying out 250 m of wire at 15 m per minute both ways (maximum depth sampled: 168.68 m ± 37.60 m, Supp. Table I). Three replicate day and night tows were made for each cruise, except in October 2016 for which the day sample for molecular analysis was not obtained due to rough weather. Samples from two tows were preserved in formaldehyde at 4% for Zooscan analyses (see following sections). Zooplankton from the last net was fixed in 95% undenatured ethanol then stored at -20°C until returning to BIOS (1 to 3 days later) for molecular analyses. Ethanol was changed right after arrival to BIOS.
Zooscan process and analyses
Zooscan process
A fraction of each half-split sample from two net replicates (obtained with a Motoda splitter; Motoda, 1959) was scanned using the Zooscan system (Gorsky et al., 2010). All individuals larger than 2 cm were selected by eye and scanned separately. Remaining samples were sieved through a 1000 µm sieve and at least 1500 particles from two size classes (202-1000 µm and >1000 µm) were scanned. Morphological variables (as in Vilgrain et al., 2021, Supp. Table II) were calculated for each individual image of zooplankton organism using the ZooProcess software (Gorsky et al., 2010). Individual images, their metadata, and their morphological descriptors were stored in the EcoTaxa web application ( Picheral et al., 2017); project ‘BATS_timeseries [4236]’). Automatic taxonomic classification was done on these images at a relatively high level using a random forest algorithm (Irisson et al., 2022; 42 categories, Supp. Table III). Taxonomic identification of each individual image was then validated by a human expert. In total, the dataset gathered 133,389 images. Only the images from 29 taxonomic categories (in bold in Supp. Table III) and with a major axis length higher than 500 µm were studied, to focus on mesozooplankton. Moreover, Actinopterygii images with a major axis length longer than 2 cm were removed to consider only larvae that cannot swim against the flow, leading to a total of 69,949 mesozooplankton images. “Oikopleura” category was renamed Larvacea to take into account all the Appendicularia’s orders but not be mistaken with the genus name Appendicularia . For each image, a density factor \(d\)was computed to account for the water volume sampled as follows:
\(d\ =\ \frac{\text{Motoda\ fraction\ scanned}}{\text{Total\ volume\ sampled}}\).
Biovolume and dry weight computations from images
Area, major and minor axes from Ecotaxa were converted from pixels to millimetres (1 pixel = 0.0053 mm). The biovolume was computed for all the 69,949 mesozooplankton images using the Equivalent Spherical Diameter (ESD).
\begin{equation} ESD=2\times\sqrt[]{\frac{\text{area}}{\pi}}\nonumber \\ \end{equation}\begin{equation} Biovolume\ =\ \frac{4}{3}\times\pi\times({\frac{\text{ESD}}{2})}^{3}\nonumber \\ \end{equation}
This computation was chosen to prevent overestimation of the individual’s biovolume, due to unfolded copepods’ antennae for example (as done in Monferrer et al., 2022). Then, each individual biovolume was multiplied by its density factor \(d\) to account for the water volume sampled, and a taxonomic-specific factor \(t\) to get the individual dry weight from individual biovolume value. These taxonomic-specific values were obtained and adapted from Maas et al. (2021) who defined linear relationships between biomass and biovolume for various taxa present at BATS (in bold, Supp. Table III).
\begin{equation} Biomass\ =\ Biovolume\ \times\ d\ \times\ t\nonumber \\ \end{equation}
Relative biomasses and abundances
For comparison with relative read counts from metabarcoding (see below), normalised biomasses were transformed into relative biomasses for each taxonomic category \(i\), going from 1 to \(a\ =\ \)29, and per sample\(j\), the sample rank going from 1 to \(b\ =\ \)27, as follows:
\begin{equation} \text{Relative\ Biomass}_{\text{ta}x_{\text{i\ j}}}\ =\ \frac{\sum_{i\ =\ 1}^{i\ =\ a}{\text{Biomas}s_{\text{ta}x_{\text{i\ j}}}}}{\sum_{j\ =\ 1}^{j\ =\ b}{\sum_{i\ =\ 1}^{i\ =\ a}\text{Biomass}_{\text{ta}x_{\text{i\ j}}}}\ }\nonumber \\ \end{equation}
For the same purpose, relative abundances for each taxonomic category were computed per sample, using the same formula with\(Abundance\ =\#ind\ \times d\ \) instead of Biomass. Data are presented in Supp. Table IV for relative biomasses, expressed as a biomass per volume of water, normalised by the volume of water sampled (mg m-3); and in Supp. Table V for relative abundances (ind m-3).
Construction of a multivariate morphological space from Zooscan images
To summarise the morphological characteristics of each organism in the mesozooplankton community and identify the main criteria of morphological variations, a morphological space was defined as in Vilgrain et al. (2021) by performing a weighted PCA (Legendre & Legendre, 2012) on a selection of 18 morphological variables (Supp. Table II) for the 69,949 mesozooplankton images, weighted by their density factor \(d\) to give more weight to the most abundant organisms. The 18 morphological descriptors could be divided into four main categories, describing the shape, the size, the transparency, and the complexity of the organism itself, at the individual scale. The weighted PCA then allows to define a reduced morphological space (i.e. , a morphological space with reduced dimensions) in which each image is located depending on its main morphological characteristics. By doing so, we can identify a few significant axes that summarise the main morphological traits that vary among images. 0.5% of the extreme variables were transformed into NAs to avoid being influenced by outlying points. They were all then normalised by the Yeo-Johnson transformation to satisfy the conditions for PCA application. Axes were considered significant only if their associated eigenvalue was greater than the average of all the eigenvalues (Kaiser-Guttman criterion; Cattell, 1966). The position of each mesozooplankton image in the PCA space is thus a function of its morphology: this PCA space will then be named morphospace. To ease the visualisation of the morphospace, example images were mapped to their position in the morphological space as follows: for each pair of principal components (i.e., PC1 vs PC2 or PC3 vs PC4) a 15-by-15 grid was defined and five images were randomly chosen and superposed at each node of the grid, to represent a morphotype for each node. Individuals, meaning each of the 69,949 images, can also be plotted with dots in the morphospace. Supplementary variables were added to this PCA and represented in the morphological space as follows. Carbon export at 200 metres was added as a quantitative variable and represented by arrows that pointed towards the morphospace area positively correlated to their variations. Pearson’s correlation coefficients and their associated p-values were computed based on the coordinates of each individual in the morphospace and the carbon export value obtained during the matching sample. The three molecular-based clusters (see below), the sample time (Day or Night) and taxonomic categories were also added as qualitative supplementary variables.
Linear relationship between size and transparency traits and carbon export
Pearson’s correlation coefficients were computed between carbon export (ln-transformed to meet the normality assumption), opacity, and size variables. Each individual’s trait value was multiplied by its density factor to account for the water volume sampled. Data were averaged by sample dates (no Day/Night distinction) since there was only one carbon export measurement per cruise. For significant correlations (p < 0.05), linear relationships were represented in the graphs.
Metabarcoding process and analyses
DNA extraction and 18S amplification
Half of each ethanol-preserved sample (obtained with a Motoda splitter; Motoda, 1959) was used for DNA extraction. The other half was kept as archival material at BIOS. Since we focused the study on mesozooplankton, macroplankton (> 2 cm) were removed from the sample (mostly fish larvae). Samples were filtered through a 53 μm mesh, and gently washed with milliQ water to remove as much ethanol as possible. They were then transferred to a 50 mL falcon tube and centrifuged at 3,500 g for 10 minutes to remove as much water and ethanol as possible. The pellet is then weighed and transferred to a new 50 mL falcon tube and 15 or 25 ml SDS buffer (10 mM Tris-HCl, 100 mM EDTA pH 8.0, 200 mM NaCl, 1% SDS) depending on the total mass of the pellet. Samples were then vortexed to ensure rapid mixing of the plankton with the buffer and thus degradation of the DNA. They were then ground using a Fisher Scientific FSH125 homogenizer for 5 minutes, at 1/3 of maximum speed to avoid excessive DNA fragmentation and rapid heating of the sample. After the grinding step, the homogenizer was rinsed with 10 mL SDS buffer (or 15 mL depending on sample mass) to minimise sample loss. Between samples, the homogenizer is washed and sterilised. Proteinase K (Sigma-Aldrich; St. Louis, MO, USA) was added to the buffer (0.2 mg mL-1 final concentration), with 5 mL sterilised stainless steel Shot, 3/32′′ Ellipse (Rio Grande, CA, USA). Samples were then placed in an oven at 60°C for 4 h, vortexing at maximum speed for 30 s every 30 min. Tubes were centrifuged at 3,500 g for 15 min and three 400 μL replicates of supernatant were transferred to 1.5 mL Eppendorf tubes. DNA was extracted using the E.Z.N.A. Mollusc DNA Kit (Omega Bio-Tek, Norcross, GA, USA), with some modifications explained in Blanco-Bercial (2020). Quality control of extracted DNA was performed by fluorimetry using a NanoDropTM One instrument. Extracted DNA was stored at -20°C until amplification.
Amplification of the V1V2 region of the 18S (about 350 bp) was done following Fonseca et al. (2010). Samples were then sent to the Genomics Research Center of Rochester University (NY, USA) for sequencing. Sequencing was performed in an Illumina MiSeq using the MiSeq Reagent Kit v3 (600-cycles; 2 × 300) V3 chemistry. To test for replicability within the procedure, two samples were submitted using different indexes in separate batches.
Bioinformatic pipeline
Demultiplexed samples were analysed in MOTHUR ver. 1.39.5. (Schloss et al., 2009). The entire annotated pipeline is accessible at the following link: . Unique sequences were aligned against a V1V2 region of the SILVA 138 database (Quast et al., 2013). Sequences were trimmed to the length of the V1V2 region, and only complete ones (starting at the first base and ending at the last base of the alignment) were kept in order to avoid bias due to unfinished amplifications. After gap removal, chimaeras were removed using UCHIME (Edgar et al., 2011) as implemented in MOTHUR, and unique sequences, called thereafter Amplicon Sequence Variant (ASV) were obtained after denoising using UNOISE2 (Edgar, 2016) as implemented in MOTHUR (diffs = 1). ASVs taxonomy was retrieved from a database developed from the complete SILVA 138 release database for SSU, modified to define clade levels (kingdom, phylum class, order, family, genus and species) to fixed columns in the taxonomy file (Bucklin et al., 2021). Using this classification and to focus only on mesozooplankton, only metazoans were kept for the downstream analyses (44,441 ASVs removed).
Metabarcoding standardised analyses
ASVs belonging to Vertebrata taxa were eliminated, meaning that fish ASVs were removed to not be affected by cells from fish larvae able to swim against currents and thus not belonging to mesozooplankton (17,479 ASVs removed). Moreover, ASVs with less than 2 reads were removed. All samples were standardised randomly to a minimum number of reads,i.e. , 92,150 reads. Percentages of relative abundance in the entire sampling period were computed per ASV, as follows, with \(j\) the sample rank going from 1 to \(b\ =\ \)27 and \(k\) the ASV rank going from 1 to \(c=\ \)92,150.
\begin{equation} {\%RA}_{\text{ASV}_{\text{TOTk}}}\ =\ \frac{\sum_{j\ =\ 1}^{j\ =\ b}\text{reads}_{\text{ASV}_{\text{jk}}}}{\sum_{j\ =\ 1}^{j\ =\ b}{\sum_{k\ =\ 1}^{k\ =\ c}\text{reads}_{\text{ASV}_{\text{jk}}}}\ }\times 100\nonumber \\ \end{equation}
A minimum percentage of relative abundance threshold of 0.05% was then applied for all the following analyses. Indeed, metabarcoding reads are a viable proxy of taxon relative abundances for the major taxonomic groups, but it should be interpreted cautiously for the less abundant ones (Matthews et al., 2021). Hence, only 225 ASVs are considered in this study out of 22,814 ASVs. Finally, relative abundances (in percentage) estimated per ASV and per sample were computed as follows, with \(j\) the sample rank going from 1 to \(b\ =\ \)27 and \(k\) the ASV rank going from 1 to \(c=\ \)225.
\begin{equation} {\%RA}_{\text{ASV}_{\text{jk}}}\ =\ \frac{\text{reads}_{\text{ASV}_{\text{jk}}}}{\sum_{j\ =\ 1}^{j\ =\ b}{\sum_{k\ =\ 1}^{k\ =\ c}\text{reads}_{\text{ASV}_{\text{jk}}}}\ }\times 100\nonumber \\ \end{equation}
These were then Hellinger-transformed (square-root transformation on relative abundances) to reduce the weight of very abundant ASVs (Legendre & Gallagher, 2001).
ASVs relative counts of read per Zooscan categories and correlation with abundance and biomass
For direct comparison with image analyses, 22,814 ASVs were grouped into the taxonomic levels used in Zooscan/EcoTaxa systems. For each sample, the relative number of reads per coarse taxonomic category (in bold, Supp. Table III) was computed. Pearson’s correlation coefficients were then computed between relative counts of read and relative normalised abundances and biomasses from Zooscan data.
Succession of ASVs assemblies throughout the sampling period
To explore the differences in mesozooplankton community throughout the sampling period, a Principal COordinates Analysis (PCoA, Legendre and Legendre, 2012) based on Bray-Curtis dissimilarity index (Bray & Curtis, 1957) was performed. Sample dates were clustered by hierarchical classification carried out on their coordinates along the first seven axes of the PCoA to explain more than 70% of the variance with at least 4% in each axis (72.59% of variance explained), using a Euclidean distance and a synoptic Ward linkage (Murtagh & Legendre, 2014). A threshold (-0.55, Figure 3) was set to minimise the total within-cluster variations, defining three meaningful groups of stations with homogeneous values for the variables used. The significance of the PCoA clustering pattern was tested using an ANalysis Of SIMilarity (ANOSIM, 1000 permutations, Clarke, 1993) and permutational multivariate analysis of variance (ADONIS, 1000 permutations, Anderson, 2001) Holm corrected (Holm, 1979).
Co-occurrences network and “modules” identification
A network of co-occurring ASVs was built on the relative Hellinger-transformed abundances. Nodes represented ASVs and edges were the positive correlations between those, computed with Spearman’s correlation coefficient (rho, Spearman, 1904). Cut-offs of rho and p-values were respectively 0.6 and 0.01 to consider the most statistically robust correlations, meaning only links between pairs of ASVs that often appeared at the same time in the samples and reducing the risks to consider by-chance co-occurrences (Barberán et al., 2012). Modularity estimates the tendency of a network to subdivide in densely connected modules (or clusters) (Bellisario et al., 2019). Networks with high modularity have dense connections between the nodes within modules but limited connections between nodes in different modules. If modularity is higher than 0.40, then we consider the network can be divided into modules, i.e. , groups of ASVs there are more densely connected between each other than with the rest of the network (Newman, 2006). The Leading Eigenvector community detection algorithm was used (Newman, 2006) to identify the modules. In a few words, the network was splitted into two parts that would be kept if the separation leads to a significant increase in the modularity, and each part would be divided again into two parts at the next step of the algorithm. If the separation did not lead to a significant increase in the modularity, the network was splitted into two other parts, and the process started again. The algorithm stopped when no further separation led to a significant increase in the modularity.
Identification of taxa composing the modules
A more precise and trustable taxonomy of ASVs belonging to modules was assigned via BLASTN from the NCBI website (, Altschul et al., 1990) followed by Batch Entrez analyses to retrieve species identities from Accession Hit numbers (). Then, obtained files were formatted in R. Taxonomy was obtained from LifeWatch Belgium services (). Unique best hits, meaning that an ASV matches with a unique sequence from the NCBI database at 100% of similitude, were firstly chosen. When multiple species had equal identity (and when one was not a clear misidentification), the lowest common taxonomic rank was kept,e.g . if an ASV matches at 95% of similitude with two sequences in NCBI database belonging to Candacia truncata andCandacia elongata , then it was characterised as Candaciasp. The complete taxonomy of the 225 most abundant ASVs analysed in this study can be found in Supplementary Table VII.
For each module \(m\) (between 1 and \(e\ =\ \)6), their relative abundances were computed per sample as follows, with \(j\) the sample rank going from 1 to \(b\ =\ \)27.
\begin{equation} {\%RA}_{\text{Mod}_{\text{jm}}}\ =\ \frac{\sum_{m\ =\ 1}^{m\ =\ e}\text{reads}_{\text{ASV}_{\text{jm}}}}{\sum_{j\ =\ 1}^{j\ =\ b}{\ \sum_{m\ =\ 1}^{m\ =\ e}\text{reads}_{\text{ASV}_{\text{jm}}}}}\times 100\nonumber \\ \end{equation}
To test the Day/Night effect on the modules’ composition, a Wilcoxon-Mann-Whitney test (Mann and Whitney, 1947) was conducted on the percentages of relative abundance per sample, for all the modules. Finally, a Kruskal-Wallis analysis (Kruskal and Wallis, 1952) helped to test the impact of the three clusters obtained from the PCoA on the modules’ percentages of relative abundance, followed by Dunn pairwise test (Dunn, 1961) with Holm p-values correction for multiple comparisons to identify which cluster affected the most the modules’ percentages of relatives abundance. In both cases, non-parametric statistical tests were realised because Shapiro (Shapiro and Wilk, 1965) and Levene’s (Brown and Forsythe, 1974) tests showed that not all modules’ percentages of relative abundance were normally distributed with similar variances among groups. Moreover, Pearson’s correlation coefficients were computed between percentages of relative abundance per module and per sample, and their coordinates along each morphological dimension per sample to have information about the overall morphology of the module; and carbon export per cruise to see if some modules were related to higher or lower carbon export.
Numerical tools
All statistical analyses were conducted in the programming environment R 4.1.2 (R Core Team 2021). The package tidyverse (Wickham et al., 2019) was used for data manipulations, lubridate to deal with temporal variables (David James and Kurt Hornik, 2020), vegan (Oksanen et al., 2009) and FactoMineR (Lê et al., 2008) for multivariate analysis, especially the function dimdesc was used to compute correlations between carbon export and morphospace axes. pairwiseAdonis () was used for permutational multivariate analyses of variance, and igraph (Gabor Csardi & Tamas Nepusz, 2006) for network and modularity analyses. rstatix (Kassambara, 2021), NbClust (Charrad et al., 2014) and Hmisc (Harrell Jr & Dupon, 2022) were used for statistical analyses. The packages RColorBrewer (Neuwirth, 2022), ggplot2 (Wickham et al., 2021), ggpubr (Kassambara, 2020), ggrepel (Slowikowski et al., 2021) and cowplot (Wilke, 2020) were used to produce general graphics, morphr (https://github.com/jiho/morphr ) was used to represent individual images in the morphological space.
Results
Seasonal dynamics of the BATS ecosystem
Temporal changes in environmental variables followed a classical seasonal cycle. As expected, temperatures reflected a deep winter mixing, from December 2016 to April 2017. Summer stratification was strong and deep (about 40 m) from July 2016 to September 2016 (Supp. Figure 1). MLD increased in autumn and allowed the re-injection of nutrients in the surface layer, creating ideal conditions for an autumnal bloom. Fluorescence was low in surface waters except during the deep mixing in January 2017 and April 2017 (Supp. Figure 1). DCM was around 100 metres. Carbon export was the highest in April 2017 and in May 2016 and nearly absent in September 2016 (0.75 mgC m-2 day-1). Otherwise, it fluctuated between 6 mgC m-2 day-1 and 25 mgC m-2 day-1.
Carbon export is more transparency-dependant than size-dependant
The four first axes of the PCA based on the morphological variables were significant. They explained about 87.68% of the morphological variance observed in mesozooplankton samples (Figure 1). PC1 (42.87%) was mostly influenced by size and shape related variables (e.g., Perimeter, Circularity, FerretDiameter, or Symmetry) with smaller and more circular organisms for PC1 < 0 and larger and more elongated ones for PC1 > 0. PC2 (23.89%) characterised the grey level of mesozooplankton and their size, with PC2 < 0 corresponding to larger, less transparent organisms and PC2 > 0 corresponding to smaller, more transparent ones. PC3 (14.53%) referred to the elongation of the organisms and how their appendages were positioned when organisms were scanned. Negative PC3 < 0 corresponded to organisms with visible appendages while PC3 > 0 represented elongated organisms with a well-defined body (folded or no appendages). Finally, PC4 (6.59%) described the heterogeneity of grey colour within each organism, with organisms with more heterogeneous grey colour for PC4 < 0 and organisms with homogeneous grey colour for PC4 > 0. Morphology of mesozooplankton was quite similar between Day and Night samples, except along PC4 (homogeneity in grey colour) with organisms presenting less homogeneous grey levels during the Day (Figure 1).
Correlations between carbon export and principal components were significant for PC1, PC2, and PC3 and reached -0.04, -0.30 and +0.08, respectively (p < 0.01, Figure 1). Thus, according to the visible signal in morphological space (Figure 1), confirmed by the correlations just mentioned, carbon export seems to be more strongly linked to PC2, the transparency of organisms, than to the other axes. To validate this signal, linear relationships were computed between carbon export values and size-related morphological descriptors (to test the assumption that size is a master-trait), and also transparency-related morphological descriptors (correlated to PC2). The results are visible in Figure 2, no significant linear relationships (p-values > 0.05) were found between carbon export and size-related morphological descriptors. However, significant linear relationships (p-values < 0.05) were found between morphological descriptors related to transparency (correlated to PC2) and carbon export. From these linear relationships, it is possible to infer the relationships between imaged organism’s transparency and carbon export rates. First, lower organisms’ transparency, signified by low values of the three grey intensity variables (Figure 2. F, G, H) was associated with higher carbon export. Second, carbon export was higher when StdDevGrey values were high (Figure 2. I), i.e. , when mesozooplankton individuals presented larger ranges of grey level values. Third, carbon export was higher when SkewnessGrey values were higher (Figure 2. J). This descriptor indicates how skewed the normal distribution of the grey level values is per image. Therefore, a more normal distribution of grey values is associated with higher carbon export.
Three communities with different levels of transparency succeed one another at BATS
A PCoA was conducted to analyse the temporal evolution of the taxonomic composition from metabarcoding data (Figure 3). The first PCoA axis explained about 22.99% of the variance in the taxonomic composition, the 2nd one explained 14.84%. Hierarchical clustering from the first seven PCoA axes (72% of variance explained) coordinates revealed that three communities could be distinguished from their ASV composition (Figure 3). Molecular-based cluster 1 corresponds to all the samplings done between June 2016 and October 2016. The second contains day and night samplings from May 2016 and April 2017. Molecular-based cluster 3 is composed of all the other samplings. Based on their timeline, these groups represent communities present during the summer stratification period (cluster #1), the zooplankton bloom period (cluster #2) and the deep winter mixing season, except samplings done in April 2016 and May 2017 (cluster #3). These patterns were confirmed by the identification of ASVs specific to each of the molecular-based clusters whose identity and functional traits match with environmental conditions (Supp. Analysis I). Clustering pattern is significant (p = 0.001) and communities are all significantly different from each other (Holm p-adjusted = 0.003 between all pairs of clusters).
Fitting the three molecular-based communities in the morphospace, it appears that they were quite similar along PC1 (size and circularity) (Figure 4). However, individuals belonging to the bloom community tended to be less transparent (lowest coordinate along PC2 axis) than winter mixing cluster ones. Individuals from the molecular-based summer cluster were the most transparent ones (lowest coordinate along PC2 axis). This signal of transparency changes, along PC2, matches for carbon export intensity, as explained in the previous paragraph. A similar trend is observed along PC3 (Supp. Figure II) with taxa from the bloom community tending to be more elongated with fewer or no visible appendages. On the opposite end, taxa from the summer stratification community were less elongated with more visible appendages. The winter mixing community was between both. This succession of communities along PC2 and PC3 also matches for carbon export intensity.
Metabarcoding read counts and abundance and biomass values from Zooscan are positively correlated, for some taxonomic groups
Correlations between relative number of reads and relative abundance or relative biomass derived from Zooscan images were computed for each taxonomic category (Table 1). Cyclopoida, Doliolida, Harpacticoida and Larvacea relative abundance and biomass were positively correlated to relative read count. Only relative biomass was positively correlated to relative read count for Cephalopoda (Table 1), while relative abundance was positively correlated to relative read count for Chaetognatha and Cladocera. It was negatively correlated for Ostracoda.
Migrating and seasonal of groups’ effects on carbon export
To better identify the degree to which specific taxa drove morphological variations and carbon export changes described above, a co-occurrence network was built on the 225 ASVs. It contained 413 edges (average of 3.67 co-occurrences per ASV). It had a modularity coefficient of 0.72, meaning that the network can be divided into modules. A total of 15 modules were found. Six modules were composed of at least 10 ASVs, named hereafter A to F. They contained between 11 (module F) and 36 (module D) ASVs.
Taxa were quite different from one module to another. Module A contained many Euphausiacea, Ostracoda from three genus, and some gelatinous taxa (Supp. Table VI). Module B was exclusively composed of Arthropoda (Supp. Table VI). Module D included some Appendicularia and Thaliacea, among other organisms (Supp. Table VI). Modules’ entire taxonomic composition can be found in Supp. Table VII.
Relative abundances trends were highly variable among modules (Figure 5). They were very similar between day and night samples, except for Module A (Wilcoxon-Mann-Whitney test; p < 0.001, Figure 6 & Table 2), especially during the winter period (from November 2016 to January 2017). All modules except A there were significant changes in their relative abundances between molecular-based cluster samplings (Kruskal-Wallis test, p < 0.05 except for module A: p = 0.182 ; Figure 6). Compared to the overall study period, module B was relatively more abundant during the bloom period (molecular-based cluster 2) while module D was relatively less abundant during the same period (Dunn pairwise comparison test; Figure 6 & Table 2). Taxa included in module C were relatively more abundant in BATS mesozooplankton community during the summer stratification period (molecular-based cluster 1) (p < 0.001, Figure 6 & Table 2). Percentages of relative abundance from module F were steady and low throughout the study period. Morphology differed also between modules (Table 2). When ASVs from module E were more present, they tended to increase the size and reduce the circularity (PC1 > 0) of the overall mesozooplankton community. Sampling positions along the second axis were significantly and positively correlated to percentages of relative abundances of modules C and F (respectively +0.74 and +0.55). Hence, when ASVs from these modules are more abundant, the overall mesozooplankton community tended to be more transparent (PC2 > 0) than usual. Finally, Pearson’s correlation coefficient was significantly positive between relative abundances of module D and sampling positions along the third axis of the morphospace (+0.67). It was negative between relative abundances of module F, and sampling positions along the third axis of the morphospace (-0.54). Thus, the overall mesozooplankton community appeared with folded appendices, or was more composed of taxa without appendices (PC3 > 0) when ASVs from module D are relatively more abundant, and inversely when module F is relatively more present. Relative abundances of module E and sampling positions along the first axis of the morphospace were positively correlated (+0.56, Table 2). Finally, carbon export was positively correlated to relative abundance of module D (+0.87) and negatively correlated to modules C (-0.51) and F (-0.60).