Discussion
In this study, we investigated the local genetic structure of C.
nozawae using eDNA analysis of the D-loop region and compared the
results with those obtained from tissue samples to evaluate the
applicability of eDNA to landscape genetics.
The haplotype distributions obtained from eDNA and tissue samples showed
similar patterns, although individual haplotype frequencies differed
somewhat. Major spatial patterns, including the presence of genetically
isolated sites and differences corresponding to spatial structure, were
also detected by both approaches. Note that the population structure
inferred from the D-loop region is based on a single locus; thus,
it appeared weaker than the structure
inferred from genome-wide SNP data, which are commonly used in
present-day tissue studies (Figure 2). Nevertheless, the
haplotype distribution obtained from eDNA should be useful for
capturing broad spatial patterns across an entire watershed.
All statistics of genetic diversity and differentiation calculated from
eDNA were significantly correlated with those obtained from tissue
samples. For genetic diversity, the correlation with tissue-based
values was lower for h_r than for h_S (Figure
S2). This is likely because statistics based on the number of haplotypes
tend to be more affected than those based on gene frequencies by
potentially erroneous sequences generated during next-generation sequencing. As for
genetic differentiation, the main subject of this study, both statistics
used were based on gene frequencies, and their eDNA- and tissue-based
values were highly correlated when the same marker was used. The
correlation coefficients were similar to that reported by a previous study that
calculated a statistic identical to D_PS from eDNA
and found a correlation of r = 0.76 with tissue samples (Andres et al.
2023a). What is novel and particularly important in this study is that
these high correlations were observed not only for the entire
dataset but also for datasets restricted to nearby sites (upstream dataset
and 15-km dataset) (Table 2). This indicates that genetic
differentiation can be estimated from eDNA with nearly the same accuracy as from
tissue samples even at spatial scales where gene flow is the dominant
factor in shaping the strength of genetic differentiation. On the other
hand, some distant sites did not share any haplotypes (i.e., D_PS = 1; Table S3), and the
strength of genetic differentiation between such pairs of sites is
difficult to compare with this marker.
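The two sensitivity arguments in this paragraph can be made concrete with a minimal Python sketch. It assumes that h_S behaves like a frequency-based gene diversity (here simply 1 − Σp², without a small-sample correction) and uses the raw number of distinct haplotypes as a crude stand-in for h_r, which is normally rarefied. D_PS is assumed to be one minus the proportion of shared haplotypes, 1 − Σ min(p_x, p_y). All read counts, haplotype names, and helper functions are hypothetical.

```python
def to_freq(counts):
    """Convert haplotype read counts to relative frequencies."""
    total = sum(counts.values())
    return {hap: c / total for hap, c in counts.items()}


def gene_diversity(counts):
    """Frequency-based diversity 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p ** 2 for p in to_freq(counts).values())


def haplotype_count(counts):
    """Number of distinct haplotypes with at least one read (crude richness proxy)."""
    return sum(1 for c in counts.values() if c > 0)


def d_ps(freq_x, freq_y):
    """Assumed form of D_PS: 1 minus the summed minimum frequency over haplotypes."""
    haps = set(freq_x) | set(freq_y)
    shared = sum(min(freq_x.get(h, 0.0), freq_y.get(h, 0.0)) for h in haps)
    return 1.0 - shared


# Hypothetical reads at one site, before and after three low-abundance errors slip in
clean = {"HapA": 600, "HapB": 300, "HapC": 100}
noisy = dict(clean, Err1=2, Err2=2, Err3=2)

print(haplotype_count(clean), haplotype_count(noisy))  # 3 6: the count doubles
print(round(gene_diversity(clean), 3), round(gene_diversity(noisy), 3))  # 0.54 0.545

# Hypothetical site frequencies for D_PS
site_1 = to_freq({"HapA": 60, "HapB": 40})
site_2 = to_freq({"HapA": 50, "HapC": 50})
site_3 = to_freq({"HapD": 100})

print(d_ps(site_1, site_2))  # 0.5: only HapA is shared
print(d_ps(site_1, site_3))  # 1.0: no shared haplotypes
```

Three spurious sequences double the haplotype count but move the frequency-based diversity by only about 0.005, which mirrors the explanation given for the weaker eDNA-tissue correlation in h_r. The second pair shows the D_PS = 1 ceiling: once two sites share nothing, the statistic can no longer rank how different they are.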
Another important point for local-scale inference is that eDNA-based
differentiation was not necessarily inferior to tissue-based
differentiation using the same region when SNP-based inferences
are treated as the more reliable reference (see D_PS in the
upstream dataset and 15-km dataset in Table 2). In population genetics,
a larger number of individuals × loci generally leads to more accurate
inference, whereas inferences from small samples are biased by the
particular individuals sampled. In eDNA, although there
are usually fewer available loci, the samples may
contain information from many individuals (Tsuji et al. 2020a) and may
therefore better reflect the “gene pool” at each site. How many
individuals are generally represented in eDNA samples is still under study
(Couton et al. 2023), but because the studied area supports high
population densities of C. nozawae (about 50 individuals per 100
m²; Suzuki et al. 2021), eDNA might capture information from a large number
of individuals. Nevertheless, no clear
difference in accuracy between the eDNA- and tissue-based approaches
could be identified. The take-home message at this stage is that eDNA
analysis is not necessarily inferior to tissue-based analysis of
approximately 16 individuals per population when the same marker is
used.
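The sample-size argument can be illustrated with the binomial standard error of a haplotype frequency, sqrt(p(1 − p)/n). This sketch rests on a strong simplifying assumption that is precisely what remains under study: that an eDNA sample behaves like an independent draw of n individuals. The frequency of 0.3 and the 200 contributing individuals are hypothetical values chosen only for illustration; the 16 individuals correspond to the tissue sample size mentioned in the text.

```python
import math


def frequency_se(p, n):
    """Binomial standard error of a haplotype frequency p estimated from n haploid individuals.

    Assumes individuals are sampled independently; real eDNA reads are not
    independent draws of individuals, so this is only an idealized case.
    """
    return math.sqrt(p * (1.0 - p) / n)


p_true = 0.3                          # hypothetical haplotype frequency at a site
se_tissue = frequency_se(p_true, 16)  # about 16 individuals per site, as in the text
se_edna = frequency_se(p_true, 200)   # hypothetical number of individuals contributing DNA

print(f"tissue SE: {se_tissue:.3f}, eDNA SE: {se_edna:.3f}")  # tissue SE: 0.115, eDNA SE: 0.032
```

The model ignores PCR and sequencing error and uneven DNA shedding among individuals, so it should be read as an optimistic limit rather than a prediction of eDNA accuracy.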
When exploring regional haplotypes or new haplotype distributions from a
phylogeographic perspective, it is paramount to minimize false
positives and negatives (Turon et al. 2020; Tsuji et al. 2023). However,
genetic differentiation at the local scale, such as that addressed in
this study, depends fundamentally on gene frequencies within each
population, and the analysis targets their overall spatial patterns or
relationships with the environment. In addition, although haplotypes
detected at low frequencies have relatively little effect on the
statistics, the sharing of the same low-frequency haplotype among
multiple sites can provide important information for comparing
the strength of gene flow (Slatkin and Barton 1989). On
the other hand, because denoising primarily removes erroneous sequences
that co-occur with the correct sequences (Callahan et al. 2016),
the remaining false positives are likely to be randomly distributed in
small amounts across all samples and thus may have little impact on
inferences. For these reasons, bulk removal of low-frequency sequences
may not be beneficial in landscape genetics. At the same time, it is
worth noting that the semi-quantitative approach
(treatment 3) did not decrease the correlation with tissue samples but
rather increased it in some datasets (Table 2). The fact that
accuracy changed little when detailed values were converted to
coarse ones suggests that the calculated statistics should be used to
capture overall trends rather than to interpret slight differences
in values.
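Two points in this paragraph, that bulk removal of rare sequences can erase a shared low-frequency haplotype and that coarsened frequencies can give similar differentiation values, can be checked with a small Python sketch. It again assumes D_PS = 1 − Σ min(p_x, p_y). The 5% removal threshold and the three abundance classes in the hypothetical `coarse_freqs` helper are illustrative assumptions; the actual rules of treatment 3 are not given in this section, so this is not a reimplementation of it.

```python
def normalize(values):
    """Rescale non-negative values so they sum to 1."""
    total = sum(values.values())
    return {hap: v / total for hap, v in values.items()}


def d_ps(freq_x, freq_y):
    """Assumed form of D_PS: 1 minus the summed minimum frequency over haplotypes."""
    haps = set(freq_x) | set(freq_y)
    return 1.0 - sum(min(freq_x.get(h, 0.0), freq_y.get(h, 0.0)) for h in haps)


def drop_rare(freqs, threshold):
    """Hypothetical bulk filter: remove haplotypes below `threshold`, then renormalize."""
    return normalize({hap: p for hap, p in freqs.items() if p >= threshold})


def coarse_freqs(freqs):
    """Hypothetical semi-quantitative scoring: map each frequency to class 1-3, then renormalize."""
    def score(p):
        if p < 0.1:
            return 1
        if p < 0.5:
            return 2
        return 3
    return normalize({hap: score(p) for hap, p in freqs.items() if p > 0})


# 1) A rare haplotype (HapR) shared by two otherwise distinct sites
x = {"HapA": 0.70, "HapB": 0.27, "HapR": 0.03}
y = {"HapC": 0.65, "HapD": 0.32, "HapR": 0.03}
print(round(d_ps(x, y), 2))                          # 0.97
print(d_ps(drop_rare(x, 0.05), drop_rare(y, 0.05)))  # 1.0: the shared signal is gone

# 2) Exact versus coarse frequencies for two similar sites
u = {"HapA": 0.62, "HapB": 0.30, "HapC": 0.08}
v = {"HapA": 0.55, "HapB": 0.38, "HapD": 0.07}
print(round(d_ps(u, v), 3), round(d_ps(coarse_freqs(u), coarse_freqs(v)), 3))  # 0.15 0.167
```

In the first case, a 5% filter turns a pair that shares one haplotype into a pair with D_PS = 1; in the second, coarse classes shift D_PS only from 0.15 to about 0.17. Whether real data behave this way depends on the classes and thresholds actually used.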