Physical versus evolutionary distances
We used the best-fit amino acid substitution model identified by IQ-TREE v. 2.1.2 (Minh et al., 2020) to estimate the evolutionary distances (measured as the number of amino acid replacements per site) for all pairwise comparisons. Distances were computed with the command-line version of MEGA, MEGA-CC 10.2.4 (Kumar, Stecher, Peterson, & Tamura, 2012), under the JTT substitution model (Jones, Taylor, & Thornton, 1992) with gamma-distributed rate variation among sites (5 and 7 discrete classes for the Gr and Ir families, respectively).
We investigated the relationship between physical and evolutionary distances by means of the \(C_{\text{ST}}\) statistic, which measures the proportion of the evolutionary distance that is attributable to unclustered genes. We computed \(C_{\text{ST}}\) independently in the two chemoreceptor families, and separately for each scaffold (or for the whole genome). \(C_{\text{ST}}\) is estimated as:
\begin{equation} C_{\text{ST}}=\frac{D_{T}-D_{C}}{D_{T}}\nonumber \end{equation}
where \(D_{T}\), the average of the pairwise evolutionary distances (amino acid replacements per site) between gene family copies, is estimated as:
\begin{equation} D_{T}=\frac{2}{n(n-1)}\sum_{i<j}d_{ij}\nonumber \end{equation}
where \(n\) is the number of gene family members in the surveyed scaffold (or in the entire genome), and \(d_{ij}\) is the evolutionary distance between sequences \(i\) and \(j\).
\(D_{C}\), the average of the pairwise evolutionary distances between copies within a cluster, averaged across all clusters of the same scaffold (or across the genome), is estimated as:
\begin{equation} D_{C}=\frac{1}{m}\sum_{k=1}^{m}D_{Ck}\nonumber \end{equation}
\begin{equation} D_{Ck}=\frac{2}{n_{k}(n_{k}-1)}\sum_{i<j}d_{ij}\nonumber \end{equation}
where \(m\) is the number of clusters in the scaffold (or in the genome), \(n_{k}\) is the number of copies in cluster \(k\), and \(d_{ij}\) is the amino acid-based distance between sequences \(i\) and \(j\) of that cluster.
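To illustrate how the total and within-cluster averages combine into the statistic, the following minimal Python sketch computes it for a toy example. The gene names, distance values, cluster assignments and function names (`mean_pairwise`, `c_st`) are hypothetical and chosen only for illustration; they are not data or code from this study.

```python
from itertools import combinations


def mean_pairwise(ids, dist):
    """Average distance over all unordered pairs i<j of `ids`.

    Dividing the sum by the number of pairs, n(n-1)/2, is the same as
    multiplying it by 2 / (n(n-1)).
    """
    pairs = list(combinations(ids, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def c_st(dist, genes, clusters):
    """(D_T - D_C) / D_T for one scaffold (or one genome)."""
    d_t = mean_pairwise(genes, dist)  # all family members
    # Mean of the per-cluster averages (unweighted across clusters).
    d_c = sum(mean_pairwise(c, dist) for c in clusters) / len(clusters)
    return (d_t - d_c) / d_t


# Hypothetical scaffold: five family members, two clusters, one singleton.
genes = ["A", "B", "C", "D", "E"]
clusters = [["A", "B"], ["C", "D"]]

# Hypothetical distances (replacements per site): 1.0 between all pairs
# except the two within-cluster pairs, which are much closer.
dist = {frozenset(p): 1.0 for p in combinations(genes, 2)}
dist[frozenset(("A", "B"))] = 0.1
dist[frozenset(("C", "D"))] = 0.3

print(round(c_st(dist, genes, clusters), 3))
```

In this toy case the total average is 0.84 and the within-cluster average is 0.2, so most of the distance lies between genes that do not share a cluster. The statistic is undefined for a scaffold with no cluster or with fewer than two family members, which the sketch does not handle beyond raising an error for an empty pair list.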
We used the Mann–Whitney U-test to determine whether the evolutionary distances between copies of the same family in genomic clusters (in a particular scaffold) are significantly different from those estimated across unclustered genes.
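The comparison can be sketched in plain Python as below. This is an illustrative implementation that assumes a two-sided test with the normal approximation and a tie correction (no continuity correction); the function name `mann_whitney_u` and the two distance lists are hypothetical, and the software actually used for the test is not specified here. For small samples an exact test, such as the one offered by `scipy.stats.mannwhitneyu`, would be preferable.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j (1-based).
        midrank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Variance of U with the standard correction for ties.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Hypothetical pairwise distances (replacements per site).
clustered = [0.10, 0.15, 0.20, 0.25]
unclustered = [0.60, 0.70, 0.80, 0.90, 1.00]

u, p = mann_whitney_u(clustered, unclustered)
print(u, p)
```

Here every clustered distance is smaller than every unclustered one, so U for the clustered sample is 0. The sketch does not guard against the degenerate case in which all pooled values are identical, where the variance is zero.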