A new, high-quality genome assembly and annotation
The new genome assembly of D. silvatica, which is 1.37 Gb in size
(Table 1), is highly complete: 86.3% and 92.9% of the BUSCO genes of the
arachnida and eukaryota data sets, respectively, were detected in the
genome sequence (Table 1; Table S2). Despite comprising 15,360 scaffolds,
the assembly is highly continuous, with N50 and L50 values of 174.2 Mb
and 4 scaffolds, respectively. The seven largest scaffolds (or
pseudochromosomes), including the largest scaffold, which likely
corresponds to the X chromosome (317.9 Mb long, nearly twice the size of
the second largest scaffold), represent ~87% of the total assembly size,
matching perfectly the haploid chromosome complement of this species (6
autosomes and the X chromosome; Bellvert & Arnedo, unpublished data;
Figure 2).
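N50 and L50 are standard contiguity metrics: after sorting scaffolds from longest to shortest, L50 is the number of scaffolds needed to cover at least half of the total assembly length, and N50 is the length of the scaffold at which that threshold is reached. The minimal Python sketch below (the function name `n50_l50` is hypothetical and is not part of the pipeline used for this assembly) illustrates why an assembly with many scaffolds can still have a high N50 when a handful of very large scaffolds dominate the total length. The lengths in the usage example are made-up values, not data from D. silvatica.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a non-empty list of sequence lengths.

    N50: length of the shortest sequence among the longest sequences that
    together cover at least half of the total length.
    L50: number of those sequences needed to reach that half.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    cumulative = 0
    # Walk scaffolds from longest to shortest, accumulating their lengths.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        cumulative += length
        # Compare 2 * cumulative with total to avoid floating-point halves.
        if 2 * cumulative >= total:
            return length, count


# Hypothetical lengths (arbitrary units): three large scaffolds followed by
# fifty tiny ones. The tiny scaffolds barely move the halfway point, so N50
# stays high and L50 stays small.
print(n50_l50([100, 80, 60] + [1] * 50))  # (80, 2)
```

Here the three large scaffolds sum to 240 of 290 units, so the halfway point (145) is already passed by the second scaffold, regardless of how many tiny scaffolds follow.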
The structural annotation shows that the genome of D. silvatica encodes
33,275 protein-coding genes (35,370 transcripts) and 37,198 putative
tRNAs; this annotation recovers 90% and 95% of the BUSCO arthropoda and
eukaryota data sets, respectively, supporting the completeness of the
annotation (Table 2).
Sequence similarity-based searches
uncovered 28,904 sequences with positive hits against the surveyed
protein databases (16,241, 25,917 and 22,604 against Swiss-Prot,
ArthropodDB and InterPro databases, respectively).
Furthermore, 22,093 of these functionally annotated sequences also have
at least one associated GO term.
We identified ~3.2 million repetitive sequences, which together span
53.0% of the total assembly size (Table 2; Table S3). The great majority
of these sequences correspond to transposable elements (51.6% of the
assembly), many of them (22.1%) without detectable homologs in known
databases; class II elements (DNA transposons) are the most abundant type
(16.5%), followed by retrotransposons (class I), including LINEs (10.6%)
and SINEs (1.8%).
Most of the structurally annotated genes in the new genome assembly are
shared across Arthropoda (63.5%), with 36.3% of them also present in
Ecdysozoa (Figure S1; Table S1). In addition, 25.0% of these genes are
spider-specific (order Araneae), and 17.4% were identified as
lineage-specific in D. silvatica. When only functionally annotated genes
(n = 28,904) are considered, this analysis yields similar results
(Figure 3a), although a higher fraction of genes shared within Arthropoda
(72.8%) and fewer D. silvatica lineage-specific genes (2,200 genes; 7.6%)
were detected.
Remarkably, the number of lineage-specific genes is nearly half of that
initially reported by Sánchez-Herrero et al. (2019). This reduction could
be partly explained by the broader Araneae dataset used here for the
searches, and also by the higher quality of the new assembly, which
would have allowed many more proteins to be accurately annotated.
Homology-search results based on OrthoDB (Figure 3b) were very similar to
those obtained by Sánchez-Herrero et al. (2019), indicating that the
large improvement in assembly continuity does not substantially alter the
quality of orthology inference.
Overall, the new chromosome-scale assembly of D. silvatica represents a
major improvement over our previous draft assembly. In terms of
continuity, the scaffold N50 increases more than 4,500-fold, from 38 kb
in the previous assembly to 174.2 Mb. This improvement is also reflected
in the gene annotation (Table 2; Table S2): although the number of gene
models drops from 48,619 in the previous version (75% of them with
functional annotation) to 33,275 in the current one, a larger fraction of
them (87%) have functional annotation. In addition, the new reference
sequence comprises sequence data exclusively from D. silvatica, whereas
version 1 was generated using data from several individuals, one of which
has since been identified as D. enghoffi Arnedo, Oromí & Ribera, 1997, a
close phylogenetic relative of D. silvatica that is also endemic to
La Gomera (Arnedo et al., 2007; see also Adrián-Serrano,
Lozano-Fernandez, Pons, Rozas, & Arnedo (2021) for the new mtDNA data).
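As a quick check, the fold change in scaffold N50 and the fraction of functionally annotated gene models in the current version can be worked out directly from the values reported in this section. This sketch assumes decimal units (1 Mb = 1,000 kb) and takes the 28,904 functionally annotated sequences reported above as the numerator:

```latex
\frac{174.2\,\text{Mb}}{38\,\text{kb}}
  = \frac{174\,200\,\text{kb}}{38\,\text{kb}}
  \approx 4\,584 \quad (> 4\,500\text{-fold}),
\qquad
\frac{28\,904}{33\,275} \approx 0.869 \approx 87\%.
```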
The new chromosome-level assembly of D. silvatica is the first highly
continuous genome of a representative of the spider clade Synspermiata,
which currently includes 17 families, and the third within the order
Araneae (the other two belong to the superfamily Araneoidea). Such a
small number of genomes is an extremely poor and biased representation of
the taxonomic and evolutionary diversity of the spider tree of life
(Figure 1). Our assembly therefore represents a valuable resource for
further molecular evolutionary and functional studies in spiders and
their relatives.