Chemoreceptor genes are unevenly distributed across the genome of D. silvatica
Although not all chemoreceptor genes could be mapped to the scaffolds corresponding to the main cytologically described chromosomes, the new high-quality assembly allowed us to study the genomic organization and evolution of a large number of paralogous copies of each family. According to our criterion (see methods), we identified 83 genomic clusters, 17 and 66 of them including Gr and Ir genes, respectively (Figures 5A, 5C, S2A and S2C). These clusters, which harbor up to 10 copies of the same family, were found in all major scaffolds of D. silvatica.
To gain insight into the evolutionary meaning of this gene clustering structure, we investigated the relationship between pairwise evolutionary divergences, measured as d_ij (the number of amino acid substitutions per site between two sequences), and physical distances (in kb). We found that C_ST values are high in all pseudochromosomes, ranging from 0.418 to 0.982 and from 0.428 to 0.894 for the Gr and Ir gene families, respectively, considering all identified sequences (Table 3); these values are similar when using only the complete data set (Table S4). These high C_ST values translate into significantly lower evolutionary distances among family copies included in clusters than among those dispersed along the genome, both at the chromosome level (Mann–Whitney U-test, p-values < 0.05 for nearly all pseudochromosomes) and at the whole-genome level (p-values < 0.001 in all cases) (Tables 3 and S4; Figures 5 and S2). This result, together with the large number of genomic clusters found across the D. silvatica genome, points to the recent origin of many of the chemoreceptors in this species, and to unequal crossing-over as a major mechanism accounting for this origin. After gene duplication, the paralogs that are retained long enough (i.e., those that are not lost by genetic drift or purifying selection) continuously diverge at the sequence level, and likely at the functional level (at least in terms of ligand specificity or signaling characteristics). We expect, therefore, that over time the evolutionary distances between these retained copies increase with physical distance, just as we have found (Figures 5B, 5D, S2B and S2D). This genomic architecture could have relevant functional and evolutionary implications. For instance, the presence of distantly related family members within the same genomic cluster could be the hallmark of an interaction between functional and gene regulation constraints that prevents cluster breaking.
These specific cases deserve a more comprehensive analysis.
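The comparison between clustered and dispersed copies can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, and it rests on several assumptions: the toy gene names, cluster labels and distances are hypothetical; C_ST is assumed here to be (D_T - D_C) / D_T, where D_T is the mean pairwise distance over all pairs and D_C is the mean over pairs within the same cluster (the definition given in the methods takes precedence); every pair that does not share a cluster is treated as "dispersed"; and the Mann–Whitney p-value comes from a plain normal approximation, without tie or continuity correction.

```python
from itertools import combinations
from math import erfc, sqrt
from statistics import mean


def split_pairs(genes, dist):
    """Split pairwise distances into within-cluster pairs and all other pairs.

    genes: dict gene name -> cluster label (None for unclustered copies).
    dist:  dict frozenset({a, b}) -> pairwise amino acid distance d_ij.
    """
    within, other = [], []
    for a, b in combinations(sorted(genes), 2):
        d = dist[frozenset((a, b))]
        same_cluster = genes[a] is not None and genes[a] == genes[b]
        (within if same_cluster else other).append(d)
    return within, other


def c_st(within, other):
    """Assumed definition: (D_T - D_C) / D_T."""
    d_t = mean(within + other)  # mean distance over all pairs
    d_c = mean(within)          # mean distance within clusters
    return (d_t - d_c) / d_t


def mann_whitney_u(x, y):
    """U statistic for x (midranks for ties) and a two-sided p-value
    from the normal approximation (no tie or continuity correction)."""
    values = sorted(x + y)
    rank = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    u1 = sum(rank[v] for v in x) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, erfc(abs(z) / sqrt(2))


# Hypothetical toy data: three copies in one cluster, two dispersed copies.
genes = {"g1": "c1", "g2": "c1", "g3": "c1", "g4": None, "g5": None}
toy = {
    ("g1", "g2"): 0.10, ("g1", "g3"): 0.15, ("g2", "g3"): 0.20,
    ("g1", "g4"): 0.90, ("g1", "g5"): 1.10, ("g2", "g4"): 0.95,
    ("g2", "g5"): 1.05, ("g3", "g4"): 1.00, ("g3", "g5"): 1.20,
    ("g4", "g5"): 1.30,
}
dist = {frozenset(pair): d for pair, d in toy.items()}

within, other = split_pairs(genes, dist)
u, p = mann_whitney_u(within, other)
print(f"C_ST = {c_st(within, other):.3f}, U = {u}, p = {p:.4f}")
```

With real data, the distances would come from the aligned protein sequences and the cluster labels from the clustering criterion described in the methods. A library routine with an exact or tie-corrected test would be preferable to the approximation used here.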