Discussion
In this study, we investigated the local genetic structure of C. nozawae using eDNA analysis of the D-loop region and compared the results with those obtained from tissue samples to evaluate the applicability of eDNA to landscape genetics.
The haplotype distributions obtained from eDNA and tissue samples showed similar patterns, although the frequencies of individual haplotypes differed somewhat. Both approaches also detected the major spatial patterns, including genetically isolated sites and genetic differences corresponding to spatial structure. Note that the population structure inferred from the D-loop region is based on a single locus; thus, it was less pronounced than the structure inferred from genome-wide SNP data, which are commonly used in present-day tissue-based studies (Figure 2). Nevertheless, the haplotype distribution obtained from eDNA should be useful for capturing broad spatial patterns across an entire watershed.
All statistics of genetic diversity and differentiation calculated from eDNA were significantly correlated with those obtained from tissue samples. For genetic diversity, the correlation with the tissue-based values was lower for hr than for hS (Figure S2). This is likely because statistics based on the number of haplotypes are more sensitive to erroneous sequences generated during next-generation sequencing than statistics based on gene frequencies: a single spurious sequence adds a whole haplotype to the count, whereas at a low read frequency it barely changes a frequency-based statistic. For genetic differentiation, the main subject of this study, both statistics used were based on gene frequencies and were highly correlated between the eDNA- and tissue-based calculations using the same marker. The correlation coefficients were similar to that reported by a previous study, which calculated a statistic identical to DPS from eDNA and found a correlation of r = 0.76 with tissue samples (Andres et al. 2023a). What is novel and particularly important in this study is that these high correlations were observed not only for the entire dataset but also for the datasets restricted to nearby sites (the upstream and 15-km datasets) (Table 2). This indicates that genetic differentiation can be estimated from eDNA with nearly the same accuracy as from tissue samples, even at spatial scales where gene flow is the dominant factor shaping the strength of genetic differentiation. On the other hand, some distant sites shared no haplotypes (i.e., DPS = 1; Table S3), and the strength of genetic differentiation cannot be compared among such pairs of sites with the marker used.
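The contrast between count-based and frequency-based statistics, and the ceiling of DPS at 1 for sites without shared haplotypes, can be illustrated with a minimal sketch. It assumes that per-site haplotype read counts are given as a dictionary, uses a raw count of distinct haplotypes as a simplified stand-in for hr (no rarefaction), uses 1 − Σp² without small-sample correction as a simplified stand-in for hS, and computes DPS as 1 − Σ min(p_x, p_y), one common definition based on the proportion of shared haplotypes. The function names and read counts are hypothetical and are not the estimators or data used in this study.

```python
def frequencies(counts):
    """Convert haplotype read counts (dict) to relative frequencies."""
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


def haplotype_count(counts):
    """Number of distinct haplotypes detected (simplified stand-in for hr)."""
    return sum(1 for n in counts.values() if n > 0)


def gene_diversity(freqs):
    """1 - sum(p^2), without small-sample correction (simplified stand-in for hS)."""
    return 1.0 - sum(p * p for p in freqs.values())


def d_ps(freqs_x, freqs_y):
    """Distance from the proportion of shared haplotypes: 1 - sum(min(px, py))."""
    haps = set(freqs_x) | set(freqs_y)
    shared = sum(min(freqs_x.get(h, 0.0), freqs_y.get(h, 0.0)) for h in haps)
    return 1.0 - shared


# Hypothetical site: two true haplotypes, plus one spurious sequence at 5 reads.
site_a = {"H1": 600, "H2": 400}
site_a_err = {"H1": 600, "H2": 400, "err": 5}

print(haplotype_count(site_a), haplotype_count(site_a_err))  # count jumps from 2 to 3
print(round(gene_diversity(frequencies(site_a)), 4),
      round(gene_diversity(frequencies(site_a_err)), 4))     # changes only slightly

site_b = {"H1": 300, "H3": 700}  # shares H1 with site_a
site_c = {"H4": 10}              # shares nothing with site_a
print(round(d_ps(frequencies(site_a), frequencies(site_b)), 3))
print(d_ps(frequencies(site_a), frequencies(site_c)))  # saturates at 1.0
```

In this toy example, the spurious sequence raises the haplotype count by 50%, whereas gene diversity shifts by less than 0.01. Any pair of sites without shared haplotypes receives the same DPS of 1.0, regardless of how isolated the sites actually are.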
Another important point for local-scale inference is that, when SNP-based inferences are regarded as the more reliable reference, eDNA-based differentiation was not necessarily inferior to tissue-based differentiation estimated from the same region (see DPS in the upstream and 15-km datasets in Table 2). In population genetics, a larger number of individuals × loci leads to more accurate inference, whereas inference from small samples is biased by the particular individuals sampled. In eDNA, although fewer loci are usually available, a sample may contain DNA from many individuals (Tsuji et al. 2020a) and may therefore better reflect the "gene pool" at each site. How many individuals are generally represented in an eDNA sample is still under study (Couton et al. 2023), but because population densities of C. nozawae are high in the study area (about 50 individuals per 100 m²; Suzuki et al. 2021), eDNA may capture information from a large number of individuals. However, no clear difference in accuracy between the eDNA- and tissue-based approaches could be identified. The take-home message at present is that, when the same markers are used, eDNA analysis is not necessarily inferior to tissue-based analysis of approximately 16 individuals per population.
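The effect of the number of sampled individuals on accuracy can be sketched through the sampling error of a single haplotype frequency. This is an idealized sketch: it assumes a mitochondrial marker (one haplotype per individual), individuals drawn independently from a large population, and, for eDNA, equal contributions of DNA from every individual. Real eDNA samples do not guarantee this last assumption and also carry PCR and sequencing noise, which the sketch ignores. The function names and the sample sizes other than 16 are hypothetical.

```python
import math
import random


def frequency_se(p, n):
    """Standard error of a haplotype frequency estimated from n haploid (mtDNA) draws."""
    return math.sqrt(p * (1.0 - p) / n)


def simulated_rmse(p, n, reps=20000, seed=1):
    """Monte Carlo check: root-mean-square error of p_hat over repeated samples of size n."""
    rng = random.Random(seed)
    sq_err = 0.0
    for _ in range(reps):
        hits = sum(1 for _ in range(n) if rng.random() < p)
        sq_err += (hits / n - p) ** 2
    return math.sqrt(sq_err / reps)


# Worst case p = 0.5: about 16 tissue samples vs. hypothetical larger effective pools.
for n in (16, 100, 400):
    print(n, round(frequency_se(0.5, n), 3))

print(round(simulated_rmse(0.5, 16), 3))  # should be close to the analytic value for n = 16
```

Under these assumptions, the standard error for n = 16 is 0.125 at p = 0.5 and shrinks with 1/√n. This is why DNA pooled from many individuals could, in principle, offset having fewer loci, although the number of individuals actually represented in an eDNA sample remains uncertain.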
When exploring regional haplotypes or their new distributions from a phylogeographic perspective, it is paramount to minimize false positives and false negatives (Turon et al. 2020; Tsuji et al. 2023). However, local-scale genetic differentiation, such as that addressed in this study, depends fundamentally on gene frequencies within each population, and the analysis focuses on their overall spatial patterns or their relationship with the environment. In addition, although haplotypes detected at low frequencies have relatively little effect on the statistics, the sharing of the same low-frequency haplotype among multiple sites can provide important information for comparing the strength of gene flow (Slatkin and Barton 1989). Meanwhile, because denoising primarily removes erroneous sequences that co-occur with the correct sequences (Callahan et al. 2016), the remaining false positives are likely to be scattered randomly at low abundance across all samples and may not have a large impact on inference. For these reasons, bulk removal of low-frequency sequences may not be beneficial in landscape genetics. It is also worth noting that the semi-quantitative approach (treatment 3) did not decrease the correlation with tissue samples and even increased it in some datasets (Table 2). That accuracy changed little when detailed values were converted to rough values suggests that the calculated statistics should be used to capture overall trends rather than to interpret slight differences in values.
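The argument against bulk removal of low-frequency sequences can be illustrated with a toy pair of sites that share only one rare haplotype. The sketch assumes a hypothetical within-sample threshold of 5% and computes DPS as 1 − Σ min(p_x, p_y). The threshold, haplotype labels, and read counts are invented for illustration and do not correspond to any treatment or data in this study.

```python
def to_freqs(counts):
    """Convert haplotype read counts to relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads left in sample")
    return {h: n / total for h, n in counts.items()}


def drop_rare(counts, min_freq):
    """Hypothetical bulk filter: remove haplotypes below min_freq within a sample."""
    total = sum(counts.values())
    return {h: n for h, n in counts.items() if n / total >= min_freq}


def d_ps(fx, fy):
    """Distance from the proportion of shared haplotypes: 1 - sum(min(px, py))."""
    haps = set(fx) | set(fy)
    return 1.0 - sum(min(fx.get(h, 0.0), fy.get(h, 0.0)) for h in haps)


# Two sites dominated by different haplotypes that share only a rare one (H9).
site_x = {"H1": 970, "H9": 30}
site_y = {"H5": 960, "H9": 40}

print(round(d_ps(to_freqs(site_x), to_freqs(site_y)), 3))  # 0.97
print(d_ps(to_freqs(drop_rare(site_x, 0.05)),
           to_freqs(drop_rare(site_y, 0.05))))            # 1.0
```

Without filtering, the shared rare haplotype keeps DPS just below 1, recording a trace of connectivity between the two sites. After filtering, the pair becomes indistinguishable from pairs that share no haplotypes at all.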