A high-quality genome assembly for the most endangered seabird in
Europe
The assembly length and the GC content of the Balearic shearwater hybrid
assembly presented here are similar to those reported in the seven
Procellariiformes genomes released by the B10K (Feng et al., 2020).
However, repetitive content is remarkably higher (+33.4%) in the
Balearic shearwater than in the other genomes of the order, although it
remains within the range of avian genomes (G. Zhang et al., 2014). This
difference of up to a third may be because we included a Procellariiform
(C. borealis) repeat library prior to running RepeatMasker, yielding a
more comprehensive library that includes clade-specific repeats that are
present in the genome but not found by the de novo RepeatModeler
library. The genome assembly completeness (BUSCO 95.9%)
is slightly higher than that obtained for other recently published bird
genomes (Feng et al., 2020; Prost et al., 2019), and even higher than
that of genome assemblies including optical mapping (Peñalba et al.,
2020).
Although the assembly is not chromosome-scale, contiguity is also quite
high (N50 2.1 Mbp), exceeding that of recent avian MaSuRCA hybrid
assemblies (Gan et al., 2019; Leroy et al., 2019).
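For readers less familiar with the contiguity metric quoted above, N50 is
the length of the shortest sequence in the smallest set of longest
sequences that together cover at least half of the total assembly length.
The following is a minimal sketch of that calculation; the function name
and the toy lengths are illustrative assumptions and are not taken from
the assembly reported here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    # Sort from longest to shortest and accumulate until half of the
    # total assembly length has been covered.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up lengths (not data from this study).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

In practice the lengths would be read from the assembly FASTA file, and
the same value is reported by standard assembly statistics tools.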
The retrieved proteome (21,959 protein-coding genes) is similar in size
to those of previously published genomes (Liu et al., 2021; Recuerda et
al., 2021), but larger than those of the B10K 2020 genomes used in the
comparative analyses in this work (mean of 16K). This is probably because
the B10K annotation pipeline is based entirely on homology, whereas we
also used de novo prediction. The functional annotation quality, measured
as the proportion of genes with at least one GO term (85.9%), is
comparable to that of recent chromosome-scale genomes (Recuerda et al.,
2021).
The mitogenome of P. mauretanicus spans 19,885 bp and exhibits the
same gene order and the same nad6 gene duplication observed in P.
lherminieri (Torres et al., 2018). We did not find any cob duplication
such as that occurring in the Diomedeidae family (Abbott, Double,
Trueman, Robinson, & Cockburn, 2005). Our result supports the conclusion
of Torres et al. (2018) that the nad6 duplication could be widespread in
Procellariiformes and, like cob, could have undergone various
events of deletion or addition during the diversification of the order.
Nevertheless, since some of the reported duplications could be
artifactual (Formenti et al., 2021; Urantówka, Kroczak, & Mackiewicz,
2020), fully identifying the true number of gene duplications/deletions
will require additional, specific experimental analyses.