Phylogenetic analysis of the chemoreceptor genes in arthropods
As noted above, a highly contiguous assembly makes it possible to
annotate as “complete genes” most copies of medium- to large-sized
multigene families, a task that is almost out of reach in most of the
available, highly fragmented genomes of non-model chelicerates. These
improved annotations (which add more and longer family
copies to the multiple sequence alignments), in turn, yield much more
accurate phylogenetic analyses, increasing the evolutionary signal and
improving node support. In many cases, moreover, these newly
complete copies can add very valuable information about, for instance,
recent bursts of duplication and gene retention.
Our phylogenetic analyses of the Gr and Ir families,
which are based on the high-quality annotations from the new assembly of D. silvatica, are clear examples of these benefits. Our analysis
clearly reflects the high gene turnover rates of these families in
chelicerates (and, in general, in panarthropods; Vizueta, Escuer,
Frías-López, et al., 2020; Vizueta et al., 2018). However, after
including the complete chemoreceptor set, a remarkable evolutionary
hallmark emerges in the D. silvatica lineage (Figures 6, 7, S3
and S4). Only a small group of Ir genes, probably involved in
essential chemoreception functions, seems to be fairly conserved
between insects and spiders: the co-receptors (Ir25a/8a-related
sequences) and the receptors involved in thermosensation,
hygrosensation and amino acid taste in Drosophila (i.e., Ir93a- and Ir76b-related
sequences) (Ni, 2021). Although this extreme pattern, in which most
family copies cluster in species-specific clades in the phylogenetic
trees, is well known in arthropods, the current analysis is the first to
incorporate nearly complete information for most copies of these two
families in a chelicerate. The quality of the data allowed us to explore the origin
and diversification trends of D. silvatica chemoreceptors with
unprecedented precision and robustness. We found, for instance, that the
distribution of gene ages in the Gr family is similar in D.
silvatica and D. melanogaster, with most family members being
old (likely having originated during the early diversification of these
subphyla).
In the Gr family of D. silvatica, however, we uncovered very recent duplication
events that created, in a very short period, (at least) one new genomic
cluster in scaffold U29 (with at least 10 genes in the cluster).
The contrasting pattern between D. silvatica and D.
melanogaster is much more pronounced in the Ir family.
Particularly noteworthy is the presence of two very recent bursts of
gene duplication that gave rise to 116 new Ir genes (83 and 33
copies, respectively; Figures 7 and S4A). Interestingly, most of these
novel, nearly identical receptors map to multiple clusters in many of the
smallest scaffolds; this feature suggests that they could in fact be part
of much larger genomic clusters, which were not well assembled because of the
high number of repetitive Ir copies arranged in tandem in the
same genomic region (Clifton et al., 2020). Such duplication bursts
generating many new chemoreceptor genes could reflect relevant
evolutionary events related to the chemosensory biology of these
organisms and, consequently, deserve to be investigated in more
depth, especially in relation to the role of selective and non-selective
forces in their retention and divergence. In fact, since currently
available chelicerate genomes either do not allow such copies to be detected
accurately or lead to their annotation as separate partial sequences
(Vizueta, Escuer, Frías-López, et al., 2020; Vizueta et al., 2018), they
have rarely been included in phylogenetic analyses, which has prevented a
precise understanding of the events that shaped the repertoire
size of large gene families over very short time scales. Our results
demonstrate that new high-quality data are especially useful for conducting
comprehensive studies of the evolution of large multigene families.
We have also used our new assembly to test whether the LBD domain
(PF00060) has enough phylogenetic signal to classify the different
subfamilies within the Ir/GluR superfamily (a strategy that we
previously used in fragmented assemblies, e.g. Vizueta et al., 2018).
Here, we took advantage of the fact that the high contiguity of the assembly
permitted the complete annotation of many iGluR genes, with both
the ANF-receptor (PF01094) and the LBD domains in the same gene model.
This combination, which is characteristic of the iGluR subfamily, is never found in Ir genes, which lack the
ANF-receptor domain (Croset et al., 2010). This domain architecture makes
it possible to unequivocally distinguish iGluR from Ir genes. Our phylogenetic trees based on the complete sequences of this
superfamily were fully consistent with those built using only the LBD
domains identified in D. silvatica (Figure S4C). This result
demonstrates that this domain, by itself, holds enough subfamily-specific
information to correctly place the proteins having the ANF domain in the
phylogenetic tree (i.e., close to the D. melanogaster iGluRs and separated from the Ir sequences of both
species). In fact, the information in the LBD domain allowed us to
classify correctly as iGluR some copies of this superfamily for
which we were not able to identify an ANF domain in the genomic
sequences (and that, in principle, would have been annotated as Ir).
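
The domain-based reasoning above can be illustrated with a minimal Python sketch. It is not the annotation pipeline used in this work: the function name, the labels and the gene identifiers are hypothetical, and the input is assumed to be the list of Pfam accessions already detected in each gene model by some domain search. The sketch only encodes the rule stated in the text: a gene model with both the ANF-receptor (PF01094) and the LBD (PF00060) domains is labelled as iGluR, whereas a model with the LBD domain alone is left ambiguous, because it may be an Ir or an iGluR whose ANF domain was not identified.

```python
# Pfam accessions as given in the text
ANF = "PF01094"  # ANF-receptor domain
LBD = "PF00060"  # LBD domain


def classify_by_domains(domains):
    """Return a first-pass label from the Pfam accessions of one gene model.

    The labels are hypothetical names chosen for this illustration.
    """
    found = set(domains)
    if ANF in found and LBD in found:
        # Combination described in the text as characteristic of iGluRs
        return "iGluR"
    if LBD in found:
        # Ambiguous: an Ir, or an iGluR with an undetected ANF domain;
        # the LBD-based phylogeny is needed to decide
        return "Ir_or_iGluR_check_tree"
    return "unclassified"


# Hypothetical gene models and their detected domains (illustration only)
gene_models = {
    "gene_A": ["PF01094", "PF00060"],
    "gene_B": ["PF00060"],
    "gene_C": [],
}

labels = {gene: classify_by_domains(doms) for gene, doms in gene_models.items()}
for gene, label in labels.items():
    print(gene, label)
```

Keeping the LBD-only case ambiguous, rather than calling it Ir directly, mirrors the observation that some copies lacking a detectable ANF domain were nevertheless placed among the iGluRs by the LBD-based tree.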