Cluster definition and analysis
We tested whether the members of a given chemoreceptor gene family lie physically closer together on the pseudochromosomes (i.e., form clusters) than expected by chance by analyzing the distribution of pairwise physical distances between members of the same gene family located on the same scaffold. We classified the paralogous copies as “clustered” or “non-clustered”. Operationally, we considered n closely linked genes of a gene family to be clustered if they lie within a genomic region spanning less than a cut-off value, $C_{L}$, following Vieira, Sánchez-Gracia, & Rozas (2007):
\begin{equation} C_{L} = g\left(n-1\right) \nonumber \end{equation}
where $C_{L}$ is the maximum length of a cluster that contains two or more copies of the same family, and g is the maximum distance between two copies of a given family for them to be considered clustered. Here we set the value of g to 100 kb. The gene density of the Ir family in the D. silvatica genome is about one copy every 3.32 Mb (and even lower in the Gr family). Assuming a uniform distribution of gene family members across the genome, the probability of finding by chance two (or more) Ir genes in a 100 kb stretch is p = 0.0004 (Poisson distribution, λ = 0.0301 expected copies per 100 kb); this p-value is even lower for the Gr family. Thus, the selected g yields conservative $C_{L}$ lengths for the two chemoreceptor families. Pairwise physical distances between gene family copies were processed with the R package ComplexHeatmap (Gu, Eils, & Schlesner, 2016) and plotted as heatmaps to facilitate the visualization of gene clustering across scaffolds.
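For illustration only, the criterion and the Poisson calculation above can be sketched in Python. This is not the pipeline used in the study. It assumes that each gene is represented by a single coordinate on its scaffold (for example, its midpoint) and that clusters are built by chaining consecutive genes separated by less than g, a simple procedure under which any group of n genes necessarily spans less than g(n - 1). The function names and example coordinates are hypothetical.

```python
import math
from typing import List

G_MAX = 100_000  # g in bp (100 kb, as stated in the text)


def cluster_length_cutoff(n: int, g: int = G_MAX) -> int:
    """Return C_L = g(n - 1), the maximum span of a cluster of n copies."""
    if n < 2:
        raise ValueError("a cluster needs at least two copies")
    return g * (n - 1)


def find_clusters(positions: List[int], g: int = G_MAX) -> List[List[int]]:
    """Group gene coordinates from one family and one scaffold into clusters.

    Assumption: consecutive genes are chained while the gap between them
    is below g, so a chain of n genes always spans less than g(n - 1).
    Genes left alone are non-clustered and are not returned.
    """
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in sorted(positions):
        if current and pos - current[-1] >= g:
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


def prob_two_or_more(window_bp: float, spacing_bp: float) -> float:
    """Poisson probability of two or more copies in a window.

    lambda is the expected number of copies per window under a uniform
    distribution: P(X >= 2) = 1 - exp(-lambda) * (1 + lambda).
    """
    lam = window_bp / spacing_bp
    return 1.0 - math.exp(-lam) * (1.0 + lam)


# Hypothetical coordinates (bp) of family members on one scaffold.
example_positions = [10_000, 60_000, 150_000, 900_000, 2_000_000, 2_050_000]
example_clusters = find_clusters(example_positions)

# Ir family: about one copy every 3.32 Mb, window of 100 kb.
p_ir = prob_two_or_more(100_000, 3_320_000)
```

With these example coordinates, the first three genes form one cluster, the gene at 900,000 bp is non-clustered, and the last two genes form a second cluster; `p_ir` rounds to the reported value of 0.0004.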