Physical versus evolutionary distances
We estimated the evolutionary distances (measured as the number of amino
acid replacements per amino acid site) for all pairwise comparisons
using the best-fit amino acid substitution model identified by IQ-TREE
v. 2.1.2 (Minh et al., 2020). Distances were computed with MEGA-CC
10.2.4 (command-line version) (Kumar, Stecher, Peterson, & Tamura, 2012)
under the JTT substitution model (Jones, Taylor, & Thornton, 1992), with
gamma-distributed rate heterogeneity among sites (5 and 7 discrete
classes for the Gr and Ir families, respectively).
We investigated the relationship between physical and evolutionary
distances by means of the \(C_{\text{ST}}\) statistic, which
measures the proportion of the evolutionary distance that is
attributable to unclustered genes. We computed \(C_{\text{ST}}\)
independently in the two chemoreceptor families, and separately for
each scaffold (or for the whole genome). \(C_{\text{ST}}\) is
estimated as:
\begin{equation}
C_{\text{ST}}=\frac{D_{T}-D_{C}}{D_{T}}\nonumber
\end{equation}
where \(D_{T}\), the average of the pairwise evolutionary distances
(amino acid replacements per site) between gene family copies, is
estimated as:
\begin{equation}
D_{T}=\frac{2}{n(n-1)}\sum_{i<j}d_{ij}\nonumber
\end{equation}
where \(n\) is the number of gene family members in the surveyed
scaffold (or in the entire genome), and \(d_{ij}\) is
the evolutionary distance between sequences \(i\) and \(j\).
In turn, \(D_{C}\), the average pairwise evolutionary distance between
copies within a cluster, averaged across all clusters of the same
scaffold (or across the genome), is estimated as:
\begin{equation}
D_{C}=\frac{1}{m}\sum_{k=1}^{m}D_{Ck}\nonumber
\end{equation}
\begin{equation}
D_{Ck}=\frac{2}{n_{k}(n_{k}-1)}\sum_{i<j}d_{ij}\nonumber
\end{equation}
where \(m\) is the number of clusters in the scaffold (or in the genome), \(n_{k}\) is the number of copies in cluster \(k\), and \(d_{ij}\) is the amino acid-based distance between sequences \(i\) and \(j\) of that cluster.
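The definitions above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names (`mean_pairwise`, `c_st`), the input layout (a symmetric distance matrix plus lists of gene indices per cluster) and the toy distances are all assumptions made for illustration, and every cluster is assumed to contain at least two copies.

```python
from itertools import combinations


def mean_pairwise(dist, members):
    """Average of d_ij over all unordered pairs i < j drawn from `members`."""
    pairs = list(combinations(members, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def c_st(dist, clusters):
    """C_ST = (D_T - D_C) / D_T for one scaffold (or one genome).

    dist     -- symmetric matrix of pairwise distances among all family members
    clusters -- list of clusters, each a list of row indices into `dist`
                (hypothetical layout; each cluster needs >= 2 copies)
    """
    d_t = mean_pairwise(dist, range(len(dist)))  # all family members
    d_c = sum(mean_pairwise(dist, c) for c in clusters) / len(clusters)
    return (d_t - d_c) / d_t


# Toy data: five genes, two clusters of two copies, one unclustered gene.
dist = [[0.0 if i == j else 1.0 for j in range(5)] for i in range(5)]
dist[0][1] = dist[1][0] = 0.1  # within cluster 1
dist[2][3] = dist[3][2] = 0.3  # within cluster 2
clusters = [[0, 1], [2, 3]]

value = c_st(dist, clusters)
print(round(value, 3))
```

In this toy case \(D_{T}=0.84\) and \(D_{C}=0.2\), so \(C_{\text{ST}}\) is about 0.76: most of the average distance is found between copies that do not share a cluster.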
We used the Mann–Whitney U-test to determine whether the evolutionary
distances between copies of the same family in genomic clusters (in a
particular scaffold) are significantly different from those estimated
across unclustered genes.
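As a rough illustration of this comparison, the sketch below computes a two-sided Mann–Whitney U-test in plain Python using the normal approximation with a tie correction and no continuity correction. The function name `mann_whitney_u` and the two toy samples are hypothetical, and the implementation used in this study may differ. For small samples an exact test is preferable, for instance the one offered by `scipy.stats.mannwhitneyu`.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i                    # size of this group of tied values
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    # Variance of U with tie correction (zero if every value is tied).
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


# Toy distances (hypothetical): within-cluster pairs vs. unclustered pairs.
within_cluster = [0.1, 0.2, 0.3]
unclustered = [0.8, 0.9, 1.0, 1.1]

u, p = mann_whitney_u(within_cluster, unclustered)
print(u, p)
```

Here every within-cluster distance is smaller than every unclustered distance, so U for the first sample is 0, the smallest value it can take.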