A high-quality genome assembly for the most endangered seabird in Europe
The assembly length and GC content of the Balearic shearwater hybrid assembly presented here are similar to those reported for the seven Procellariiformes genomes released by the B10K (Feng et al., 2020). The repetitive content, however, is remarkably higher (+33.4%) in the Balearic shearwater than in the other genomes of the order, although it remains within the range of avian genomes (G. Zhang et al., 2014). This difference of up to a third may be due to our inclusion of a Procellariiform (C. borealis) repeat library prior to running RepeatMasker, which yielded a more precise library that includes clade-related repeats present in the genome but not found by the de novo RepeatModeler library. The genome assembly completeness (BUSCO 95.9%) is slightly higher than that obtained for other recently published bird genomes (Feng et al., 2020; Prost et al., 2019), and even higher than that of genome assemblies including optical mapping (Peñalba et al., 2020). Despite not being a chromosome-scale assembly, contiguity is also quite high (N50 2.1 Mbp), and higher than that of recent avian MaSuRCA hybrid assemblies (Gan et al., 2019; Leroy et al., 2019).
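For readers less familiar with the contiguity statistic quoted above, the following is a minimal sketch of how N50 is conventionally computed from a list of contig or scaffold lengths. It is an illustration only, not the pipeline used in this work; the function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the assembly.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running to total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example with hypothetical contig lengths (not data from this study).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

Under this definition, an N50 of 2.1 Mbp means that at least half of the assembled bases lie in sequences of 2.1 Mbp or longer.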
The retrieved proteome (21,959 protein-coding genes) is similar in size to those of previously published genomes (Liu et al., 2021; Recuerda et al., 2021), but larger than those of the B10K 2020 genomes used in the comparative analyses in this work (mean of 16K). This is probably because the B10K annotation pipeline is fully based on homology, whereas we also used de novo prediction. The functional annotation quality, measured as the proportion of genes with at least one GO term (85.9%), is comparable to that of recent chromosome-scale genomes (Recuerda et al., 2021).
The mitogenome of P. mauretanicus spans 19,885 bp and exhibits the same gene order and the nad6 gene duplication observed in P. lherminieri (Torres et al., 2018). We did not find any cob duplication such as that reported in the Diomedeidae family (Abbott, Double, Trueman, Robinson, & Cockburn, 2005). Our result supports the conclusion of Torres et al. (2018) that the nad6 duplication could be widespread in Procellariiformes and, like cob, could have undergone various deletion or addition events during the diversification of the order. Nevertheless, since some of the reported duplications could be artificial (Formenti et al., 2021; Urantówka, Kroczak, & Mackiewicz, 2020), fully identifying the true number of gene duplications and deletions will require additional, specific experimental analyses.