A new, high-quality genome assembly and annotation
The new genome assembly of D. silvatica, which has a total size of 1.37 Gb (Table 1), shows high completeness: 86.3% and 92.9% of the BUSCO genes in the arachnida and eukaryota data sets, respectively, were detected in the genome sequence (Table 1; Table S2). Despite comprising 15,360 scaffolds, the assembly is highly continuous, as shown by its N50 and L50 values (174.2 Mb and 4 scaffolds, respectively). The seven largest scaffolds (or pseudochromosomes), including the largest scaffold, which likely corresponds to the X chromosome (317.9 Mb long, nearly twice the size of the second largest scaffold), represent ~87% of the total assembly size and match the haploid chromosome complement of this species (6 autosomes and the X chromosome; Bellvert & Arnedo, unpublished data; Figure 2).
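For readers less familiar with the continuity statistics quoted above, the following minimal sketch shows how N50 and L50 are conventionally computed from a list of scaffold lengths. The function name `n50_l50` and the toy lengths are illustrative assumptions; they are not the pipeline or the data used for this assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    N50: length of the scaffold at which the running sum of lengths,
         taken from longest to shortest, first reaches half of the total.
    L50: number of scaffolds needed to reach that point.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Hypothetical toy lengths, not the real D. silvatica scaffolds.
toy_lengths = [50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

Under this definition, an L50 of 4 together with an N50 of 174.2 Mb means that the four longest scaffolds alone account for at least half of the assembly.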
The structural annotation shows that the genome of D. silvatica encodes 33,275 protein-coding genes (35,370 transcripts) and 37,198 putative tRNAs; this annotation includes 90% and 95% of the genes in the BUSCO arthropoda and eukaryota data sets, respectively, supporting the completeness of the annotation (Table 2). Sequence similarity-based searches uncovered 28,904 sequences with positive hits against the surveyed protein databases (16,241, 25,917 and 22,604 against the Swiss-Prot, ArthropodDB and InterPro databases, respectively). Furthermore, 22,093 of these functionally annotated sequences also have at least one associated GO term.
We identified ~3.2 million repetitive sequences, which together encompass 53.0% of the total assembly size (Table 2; Table S3). The great majority of these sequences (51.6%) correspond to transposable elements, many of them (22.1%) without detectable homologs in known databases; class II elements are the most abundant type (16.5%), followed by retrotransposons (class I), including LINEs (10.6%) and SINEs (1.8%).
The great majority of structurally annotated genes in the new genome assembly are shared across Arthropoda (63.5%), and 36.3% of them are also present in Ecdysozoa (Figure S1; Table S1). In addition, 25.0% of these genes are spider-specific (order Araneae), and 17.4% were identified as lineage-specific in D. silvatica. When only functionally annotated genes are considered (n = 28,904), the analysis yields equivalent results (Figure 3a), although with a slightly higher fraction of genes shared within Arthropoda (72.8%) and fewer D. silvatica lineage-specific genes (2,200 genes; 7.6%). Remarkably, the number of lineage-specific genes is nearly half of that initially reported by Sánchez-Herrero et al. (2019); this difference could be partly explained by the broader Araneae data set used here for the searches, although the higher quality of the new assembly would also have allowed many more proteins to be accurately annotated. Homology-search results based on OrthoDB (Figure 3b) were more similar to those obtained in Sánchez-Herrero et al. (2019), indicating that the large improvement in assembly continuity does not affect the quality of orthology inference.
Overall, the new chromosome-scale assembly of D. silvatica represents a major improvement over our previous draft assembly. In terms of continuity, the scaffold N50 increases more than 4,500-fold, from 38 kb in the previous assembly to 174.2 Mb. This improvement is also reflected in the high number of annotated genes (Table 2; Table S2), even though the number of gene models drops from 48,619 (75% of them with functional annotation) in the previous version to 33,275 (87% with functional annotation) in the current one. Moreover, the new reference sequence encompasses sequence data exclusively from D. silvatica, whereas version 1 was generated using information from several individuals, one of which has now been identified as D. enghoffi Arnedo, Oromí & Ribera, 1997, a close phylogenetic relative of D. silvatica that is also endemic to La Gomera (Arnedo et al., 2007; see also Adrián-Serrano, Lozano-Fernandez, Pons, Rozas, & Arnedo (2021) for the new mtDNA data). The new chromosome-level assembly of D. silvatica is the first highly continuous genome of a representative of the spider clade Synspermiata, which currently includes 17 families, and the third within the order Araneae (the other two belong to the superfamily Araneoidea); this is an extremely poor and biased genomic representation of the taxonomic and evolutionary diversity of the spider tree of life (Figure 1). Our assembly therefore represents a valuable resource for further molecular evolutionary and functional studies in spiders and their relatives.
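The headline figures in this section can be cross-checked with simple arithmetic. The sketch below recomputes the N50 fold improvement and two of the reported percentages from the values given in the text; the helper names `fold_change` and `percent` are illustrative assumptions, and all inputs are the rounded values quoted above.

```python
def fold_change(new_value, old_value):
    """Ratio of the new value to the old value."""
    return new_value / old_value


def percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100 * part / whole


# Scaffold N50 values quoted in the text, converted to base pairs.
new_n50_bp = 174.2e6   # 174.2 Mb, current assembly
old_n50_bp = 38e3      # 38 kb, previous draft assembly

# Gene counts quoted in the text.
genes_total = 33_275        # structurally annotated genes
genes_functional = 28_904   # genes with functional annotation
lineage_specific = 2_200    # lineage-specific genes among the functionally annotated set

n50_fold = fold_change(new_n50_bp, old_n50_bp)        # a little over 4,500-fold
functional_pct = percent(genes_functional, genes_total)   # close to 87%
lineage_pct = percent(lineage_specific, genes_functional)  # close to 7.6%

print(f"N50 fold improvement: {n50_fold:.0f}x")
print(f"Functionally annotated genes: {functional_pct:.1f}%")
print(f"Lineage-specific genes: {lineage_pct:.1f}%")
```

These checks agree with the "more than 4,500 times" improvement, the 87% functional annotation rate and the 7.6% lineage-specific fraction reported in the text.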