2.3 Bioinformatics
Sequence reads were processed as described in detail in our previous
study (Leese, Sander et al., 2021). Briefly, JAMP v0.67
(https://github.com/VascoElbrecht/JAMP; Elbrecht et al., 2018) was used
with default settings to merge paired-end reads and, where
needed, to build the reverse complements of the sequences. Primer
sequences were removed. To retain only reads of the expected fragment
length, sequences deviating by more than 15 bp from that length were
excluded from further analyses. Reads with an expected error of >0.5
and singletons were removed before clustering the sequences into OTUs
at a similarity of ≥97 %. To maximize the number of
reads retained, the dereplicated sequences, including singletons, were
mapped with a similarity of ≥97 % to the generated OTU dataset. Only
OTUs with a minimum read abundance of 0.01 % in at least one sample
were retained for further analyses. OTU centroid sequences were compared
to the BOLD database for taxonomic annotation using BOLDigger 1.1.4
(Buchner & Leese, 2020).
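The read-level and OTU-level filters described above can be illustrated with a minimal Python sketch. It is not the JAMP implementation; the function names, the `expected_len` parameter (the expected fragment length is not restated here), and the OTU-table layout (OTU id mapped to per-sample read counts) are assumptions made for illustration only. The expected error of a read is taken as the sum of its per-base error probabilities derived from Phred scores, and the 0.01 % threshold is assumed to refer to the total read count of each sample.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def keep_read(seq, phred_scores, expected_len, max_dev=15, max_ee=0.5):
    """Return True if a read passes both the length and the error filter.

    expected_len is a hypothetical parameter: the target fragment length.
    Reads deviating by more than max_dev bp, or with an expected error
    above max_ee, are rejected.
    """
    length_ok = abs(len(seq) - expected_len) <= max_dev
    return length_ok and expected_errors(phred_scores) <= max_ee


def filter_otus_by_abundance(otu_table, min_fraction=0.0001):
    """Keep OTUs reaching min_fraction (0.01 %) of reads in >= 1 sample.

    otu_table maps OTU id -> {sample id: read count} (assumed layout).
    Per-sample totals are computed from the table itself (assumption).
    """
    totals = {}
    for counts in otu_table.values():
        for sample, n in counts.items():
            totals[sample] = totals.get(sample, 0) + n
    return {
        otu: counts
        for otu, counts in otu_table.items()
        if any(
            totals[sample] > 0 and n / totals[sample] >= min_fraction
            for sample, n in counts.items()
        )
    }
```

For example, `keep_read(seq, quals, expected_len=100)` rejects a 120 bp read regardless of its quality, and `filter_otus_by_abundance` drops an OTU with 5 reads in a sample of 100,000 reads if it does not reach the threshold in any other sample.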
For further analyses, we only considered OTUs with a similarity of
≥90 % to a reference sequence in BOLD. OTUs with a similarity of ≥98 %
were assigned to species, ≥95 % to genus and ≥90 % to family level.
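The similarity thresholds above amount to a simple lookup, sketched here in Python. The function name and the `None` return value for OTUs below 90 % are illustrative assumptions, not part of BOLDigger.

```python
def assign_level(similarity):
    """Map a percent similarity to the lowest supported taxonomic level."""
    if similarity >= 98:
        return "species"
    if similarity >= 95:
        return "genus"
    if similarity >= 90:
        return "family"
    return None  # below 90 %: OTU is not considered further
```

An OTU matching a BOLD reference at 96.4 % would thus be assigned to genus level only.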
Replicates were merged by summing the reads of each OTU and dividing
the sum by two. OTUs with conflicting taxonomic results were checked
manually, taking into account whether the reference specimens had been
identified by taxonomic experts. Further, the obtained taxa list was
compared to the RMO database, which contains detailed information on
morphologically identified taxa occurring in this area, and the taxa
list was additionally checked by three taxonomic experts to exclude
terrestrial taxa and taxa that are impossible or unlikely to occur in
the study area.
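The replicate merging described earlier in this section (read counts summed per OTU and divided by two) can be sketched as follows. The function name and the dictionary layout are hypothetical, and the sketch assumes that an OTU absent from one replicate contributes zero reads for that replicate.

```python
def merge_replicates(rep_a, rep_b):
    """Average read counts per OTU over two replicates (sum / 2).

    rep_a and rep_b map OTU id -> read count for the same sample.
    An OTU missing from one replicate is counted as 0 reads (assumption).
    """
    otus = sorted(set(rep_a) | set(rep_b))
    return {otu: (rep_a.get(otu, 0) + rep_b.get(otu, 0)) / 2 for otu in otus}
```

With this convention, an OTU detected in only one replicate keeps half of its read count after merging.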