Cluster definition and analysis
We determined whether the members of a given chemoreceptor gene family
are physically closer (forming a cluster) in the pseudochromosomes than
expected by chance by analyzing the distribution of pairwise physical
distances between members of the same gene family located on the same scaffold.
We classified the paralogous copies as “clustered” or “non-clustered”.
Operationally, we consider that n closely linked
genes from a gene family are clustered if they are arranged within a
genomic region that spans less than a certain cut-off value, CL, following Vieira, Sánchez-Gracia, &
Rozas (2007):
\begin{equation}
C_{L} = g\left(n-1\right)\nonumber
\end{equation}
where CL is the maximum length of a cluster that
contains two or more copies of the same family, and g is the
maximum distance between two copies of a given family to consider that
they are clustered. Here we set the value of g to 100 kb. The
gene density of the Ir family in the D. silvatica genome
is about one copy every 3.32 Mb (and even lower in the Gr family). Assuming a uniform distribution of gene family members across
the genome, the probability of finding by chance two (or more) Ir genes in a 100 kb stretch is p = 0.0004 (Poisson distribution, λ
= 0.0301); this p-value is even lower for the Gr family.
Thus, the selected g guarantees conservative CL lengths for the two chemoreceptor families.
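The chance expectation quoted above can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the original analysis; it assumes, as in the text, that Ir copies follow a Poisson distribution with rate λ = g / (mean spacing between copies), and the variable and function names are hypothetical.

```python
import math

# Values taken from the text: one Ir copy every 3.32 Mb, window g = 100 kb.
GENE_SPACING_BP = 3.32e6
WINDOW_BP = 100e3


def prob_two_or_more(window_bp: float, spacing_bp: float) -> float:
    """Return P(X >= 2) for X ~ Poisson(lam), with lam = window / spacing."""
    lam = window_bp / spacing_bp
    # P(X >= 2) = 1 - P(X = 0) - P(X = 1) = 1 - exp(-lam) * (1 + lam)
    return 1.0 - math.exp(-lam) * (1.0 + lam)


lam = WINDOW_BP / GENE_SPACING_BP
p = prob_two_or_more(WINDOW_BP, GENE_SPACING_BP)
print(f"lambda = {lam:.4f}, p = {p:.4f}")  # lambda = 0.0301, p = 0.0004
```

Because the Gr family has a lower gene density, the same function applied to a larger mean spacing returns a smaller probability, in line with the statement above.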
Pairwise physical distances between gene family copies were processed
with the R package ComplexHeatmap (Gu, Eils, & Schlesner, 2016), and
plotted as heatmaps to facilitate the visualization of gene clustering
across scaffolds.
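As an illustration of the operational definition and of the pairwise distances described above, the following Python sketch computes a distance matrix for the copies on one scaffold and groups them into clusters. It is a minimal example under stated assumptions, not the pipeline used in the study: each gene is reduced to a single coordinate, clusters are built by chaining consecutive copies separated by less than g (one possible reading of the rule, which guarantees that a run of n copies spans less than CL = g(n - 1)), and the positions and function names are hypothetical. Heatmap plotting with ComplexHeatmap is not reproduced here.

```python
G_BP = 100_000  # g from the text: maximum distance between clustered copies


def pairwise_distances(positions):
    """Symmetric matrix of absolute distances (bp) between gene positions."""
    return [[abs(a - b) for b in positions] for a in positions]


def call_clusters(positions, g=G_BP):
    """Group positions into runs whose consecutive gaps are all < g.

    A run of n >= 2 genes therefore spans less than C_L = g * (n - 1).
    Returns a list of clusters (lists of positions); singletons are omitted.
    """
    clusters, run = [], []
    for pos in sorted(positions):
        # A gap of g or more closes the current run.
        if run and pos - run[-1] >= g:
            if len(run) >= 2:
                clusters.append(run)
            run = []
        run.append(pos)
    if len(run) >= 2:
        clusters.append(run)
    return clusters


# Hypothetical gene positions (bp) on one scaffold, for illustration only.
example = [10_000, 60_000, 140_000, 900_000, 2_500_000, 2_550_000]
matrix = pairwise_distances(example)
clusters = call_clusters(example)
for c in clusters:
    n, span = len(c), c[-1] - c[0]
    print(n, span, span < G_BP * (n - 1))
```

With these made-up coordinates, the first three copies form one cluster, the last two form another, and the copy at 900 kb is classified as non-clustered.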