Phylogenetic analysis of the chemoreceptor genes in arthropods
As noted above, a highly contiguous assembly makes it possible to annotate most copies of medium to large multigene families as “complete genes”, a task that is almost out of reach in most of the available, highly fragmented genomes of non-model chelicerates. These improved annotations, which contribute more and longer family copies to the multiple sequence alignments, in turn yield much more accurate phylogenetic analyses, with increased evolutionary signal and improved node support. In many cases, moreover, these newly complete copies can add valuable information about, for instance, recent bursts of gene duplication and retention.
The present phylogenetic analyses of the Gr and Ir families, which are based on the high-quality annotations from the new assembly of D. silvatica, are clear examples of these benefits. Our analyses clearly reflect the high gene turnover rates of these families in chelicerates (and, in general, in panarthropods; Vizueta, Escuer, Frías-López, et al., 2020; Vizueta et al., 2018). However, after including the complete chemoreceptor set, a remarkable evolutionary hallmark emerges in the D. silvatica lineage (Figures 6, 7, S3 and S4). Only a small group of Ir genes, probably involved in essential chemoreception functions, seems to be fairly conserved between insects and spiders: the co-receptors (Ir25a/8a-related sequences) and the receptors involved in thermosensation, hygrosensation and amino acid taste in Drosophila (i.e., Ir93a- and Ir76b-related sequences) (Ni, 2021). Although this extreme feature is well known in arthropods, where most family copies cluster in species-specific clades in the phylogenetic trees, the present analysis is the first to incorporate nearly complete information on most copies of these two families in a chelicerate. The quality of the data allowed us to explore the origin and diversification trends of D. silvatica chemoreceptors with unprecedented precision and robustness. We found, for instance, that the distribution of gene ages in the Gr family is similar in D. silvatica and D. melanogaster, with most family members being old (likely originating during the early diversification of these subphyla). In the Gr family of D. silvatica, however, we uncovered very recent duplication events that created at least one new genomic cluster, comprising at least 10 genes, within a very short period in scaffold U29.
The contrast between D. silvatica and D. melanogaster is much more pronounced in the Ir family. Particularly noteworthy is the presence of two very recent bursts of gene duplication that gave rise to 116 new Ir genes (83 and 33 copies, respectively; Figures 7 and S4A). Interestingly, most of these novel, nearly identical receptors map to multiple clusters in many of the smallest scaffolds. This feature suggests that they may in fact be part of much larger genomic clusters that were not well assembled because of the high number of repetitive Ir copies arranged in tandem in the same genomic region (Clifton et al., 2020). Such duplication bursts, which generate many new chemoreceptor genes, could reflect relevant evolutionary events related to the chemosensory biology of these organisms and therefore deserve to be investigated in more depth, especially regarding the role of selective and non-selective forces in their retention and divergence. In fact, because currently available chelicerate genomes either do not allow such copies to be detected accurately or have them annotated as separate partial sequences (Vizueta, Escuer, Frías-López, et al., 2020; Vizueta et al., 2018), these copies have rarely been included in phylogenetic analyses. This has prevented a precise understanding of the events that shaped the repertoire size of large gene families over very short time scales. Our results demonstrate that new high-quality data are especially useful for conducting comprehensive studies of the evolution of large multigene families.
We have also used our new assembly to test whether the LBD domain (PF00060) has enough phylogenetic signal to classify the different subfamilies within the Ir/iGluR superfamily (a strategy that we previously used in fragmented assemblies, e.g. Vizueta et al., 2018). Here, we take advantage of the fact that the high contiguity of the assembly permitted the complete annotation of many iGluR genes, with both the ANF-receptor (PF01094) and the LBD domains in the same gene model. This combination, which is characteristic of the iGluR subfamily, is never found in Ir genes, which lack the ANF-receptor domain (Croset et al., 2010). This domain architecture makes it possible to unequivocally distinguish iGluR from Ir genes. Our phylogenetic trees based on the complete sequences of this superfamily were fully consistent with those built using only the LBD domains identified in D. silvatica (Figure S4C). This result demonstrates that the LBD domain by itself holds enough subfamily-specific information to correctly place the proteins that have the ANF domain in the phylogenetic tree (i.e., close to the D. melanogaster iGluRs and separated from the Ir sequences of both species). In fact, the information in the LBD domain allowed us to correctly classify as iGluR some copies of this superfamily for which we were not able to identify an ANF domain in the genomic sequences (and that, in principle, would have been annotated as Ir).
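The domain-architecture rule described above can be illustrated with a minimal Python sketch. It is not the annotation pipeline used in this study: the function name, the label strings and the gene identifiers are hypothetical, and the input is assumed to be a list of Pfam accessions already assigned to each gene model by a separate domain search. Only the two accessions (PF01094 and PF00060) come from the text.

```python
# Minimal sketch of the domain-architecture rule, under the assumptions stated above.
# Pfam accessions are taken from the text; all other names are hypothetical.
ANF_RECEPTOR = "PF01094"  # ANF-receptor domain, reported only in iGluR gene models
LBD = "PF00060"           # LBD domain, shared by iGluR and Ir genes


def classify_by_domains(domains):
    """Return a provisional subfamily label from the Pfam accessions of one gene model."""
    found = set(domains)
    if LBD not in found:
        # Without the LBD domain the model is not assigned to the superfamily here.
        return "unclassified"
    if ANF_RECEPTOR in found:
        # ANF-receptor plus LBD in the same gene model: the iGluR architecture.
        return "iGluR"
    # LBD only: provisionally Ir. An undetected ANF domain cannot be excluded,
    # so this label should be checked against the LBD-based phylogeny.
    return "Ir candidate"


# Hypothetical gene models and their (assumed) domain hits.
example_models = {
    "gene_A": ["PF01094", "PF00060"],
    "gene_B": ["PF00060"],
    "gene_C": [],
}
labels = {name: classify_by_domains(hits) for name, hits in example_models.items()}
```

The "Ir candidate" label is deliberately provisional, since some copies lacking a detectable ANF domain were nevertheless placed among the iGluRs by the LBD-based tree.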