RESULTS

We recovered an average of 1,227,032 (SD = 815,798) PE-ddRAD reads per sample (Table S1) that were assembled to an average of 24,621 loci per sample, with a mean coverage per sample of 39x (SD = 19). Locus length ranged from 140 to 239 bp with a median of 198 bp (SD = 25.5).

Bayesian divergence time estimation with SNP data

The SNAPP phylogeny revealed largely the same topology as a previous phylogenetic study based on the same data (Ferrer-Obiol et al. under review), except for the poorly supported relationship betweenArdenna and Puffinus and the relationship between A. grisea and A. tenuirostris (Table S2; Figure S1). Both incongruences were already identified in the previous study using different methods and datasets, and were caused by high levels of ILS.
Using a constraint for the age of the root, we estimated the time-calibrated tree shown in Figure 1. The time to the most recent common ancestor (TMRCA) of Puffinus was the oldest among the three genera, estimated at 9.98 Mya (95% HPD: 12.26-7.65 Mya). The TMRCA of Ardenna was inferred to be 4.52 Mya (95% HPD: 5.64-3.53 Mya) and the TMRCA of Calonectris 3.54 Mya (95% HPD: 4.57-2.55 Mya). If the divergence times are accurate, then shearwater speciation increased during the Pliocene reaching a peak by the late Pliocene (~2.58 Mya; Figure 1), when most of the modern biogeographical groups of shearwaters were already present.
Using the same three fossil calibrations (see Material and Methods), shearwater divergence times inferred using the MSC were 28-94% younger than those estimated by Ferrer-Obiol et al. (under review) using concatenation (Table S2). MSC analyses using these fossil calibrations resulted in slightly older estimates (8.9% older on average) compared to the same analyses using a single age constraint on the root (Figure S2, Table S2). Conversely, fixing the phylogeny had very little effect on age estimates (0.4% older on average).
The mean population size across all shearwater species estimated by SNAPP was N=63,555 individuals (95% HPD: 50,390–77,155) when assuming the lowest generation time estimated for a shearwater species (13 years; Genovart et al., 2016), and N=43,485 individuals (95% HPD: 34,477–52,790) when assuming the highest estimated value (19 years; Birdlife International 2020). However, SNAPP analysis without age constraints showed a notable variation in θ estimates even between sister species (Figure S3) suggesting frequent changes in population size in the evolutionary history of shearwaters.

Biogeographic Analysis

Under all tested models, ancestral range estimation analyses, including a dispersal matrix restricting dispersal between areas connected by main historical and present surface ocean currents, had lower AICc than models with an unrestricted dispersal matrix (Table 1). DIVALIKE and DEC models had lower AICc than BAYAREALIKE models, especially when the founder event parameter (j) was not included, suggesting that vicariance is an important mode of speciation in shearwaters. The slightly lower AICc for DIVALIKE models compared to DEC models further supported the importance of widespread vicariance. However, in models including founder event speciation, the j parameter ranged from 0.0874 to 0.1733 and the rate of range expansion (d) was an order of magnitude smaller, showing that founder events have a higher probability of explaining most of the data than range expansion. Indeed, the likelihood ratio test (LRT) between the best DIVALIKE and DIVALIKE + j models showed that DIVALIKE + j was strongly favoured (P = 1.9 x 10-5).
Under the best DIVALIKE + j model, the South Australia - New Zealand area showed the highest support as the ancestral region of shearwaters (marginal ML probability = 0.44), followed by the North and Tropical Pacific (0.33) (Figure 1 and Figure S4). The origin of Ardennawas also traced to the South Australia - New Zealand area (0.54). On the other hand, Calonectris had an unequivocal origin in the Northern Hemisphere (North Atlantic and North and Tropical Pacific = 0.45, North and Tropical Pacific = 0.45), whereas the ancestral area of the MRCA ofPuffinus was estimated as either the North and Tropical Pacific (0.37), the South Australia - New Zealand area (0.27) or both (0.16).

Phylogenetic generalized least squares of body size

The PGLS analyses recovered several significant correlations (FDR < 0.05) between body size measures and the predictors (Figure 2a; Table S3). Mean body mass showed significant correlations with all predictors, suggesting that this trait is strongly influenced by ecological factors. Overall, migratory strategy and latitudinal range were the best predictors, suggesting that body size in shearwaters is associated with movement capacity. Indeed, migratory strategy explained 75% of the variance in mean body mass (Figure S5a; long-distance migrants were the heaviest and sedentary/dispersive species the lightest) and latitudinal range explained 67% of the variance in body mass range (Figure S5b shows the positive correlation between body mass range and latitudinal range occupied by a species year-round). Breeding latitude was also a good predictor of mean body mass, with the strongest correlation recovered for maximum breeding latitude (Figure S5c; adjusted R2=0.212). As shown in the phenogram of ancestral state reconstructions for body mass in Figure 2b, striking differences in body mass between sister clades are common in shearwaters, showing that body mass changes may be important during speciation. The ancestral state reconstruction of migratory behaviour showed that the MRCAs of Calonectris and Ardenna were most likely long-distance migrants (Figure S6; marginal ML probability = 0.94 and 0.86, respectively). Conversely, the MRCA of Puffinus was most likely either a short-distance migrant or a sedentary species (marginal ML probability = 0.47 and 0.37, respectively).

Effects of life-history traits on the substitution rate

Results from a previous study found that rates of mitochondrial DNA (mtDNA) evolution were slower for larger taxa in the Procellariiformes (Nunn & Stanley, 1998), yet we did not find any significant correlations between the substitution rate of our PE-ddRAD-Seq dataset and the LHT (Table S4). This may have been influenced by the high variance in the substitution rates of our dataset. However, consistent with GC-biased gene conversion (gBGC): a recombination-associated mechanism that leads to the preferential fixation of G and C in AT/GC heterozygotes, we found that GC* had a positive significant correlation with the number of breeding pairs (correlation coefficient = 0.684, posterior probability = 0.94). These results are in agreement with the hypothesis that the impact of gBGC is strongest in species with high population sizes (Weber, Boussau, Romiguier, Jarvis, & Ellegren, 2014).

Genomic divergence and taxonomy

The fineRADStructure analysis identified three major clusters corresponding to the three shearwater genera (Figure 3). Further subdivisions within each group largely supported the most recent shearwater phylogeny (Ferrer-Obiol et al. under review), and all the species and subspecies included in the study were recovered as unique clusters by the fineStructure clustering algorithm (Lawson, Hellenthal, Myers, & Falush, 2012), except for P. bailloni nicolae and P. bailloni dichrous , P. mauretanicus and P. yelkouan , andA. creatopus and A. carneipes , that were in each case, clustered into a single population.
Overall, the distributions of genetic distances were consistent with the current taxonomy. However, the distributions of distances within and among species showed some overlap (Figure 4). The genetic distances between A. creatopus and A. carneipes , and betweenP. mauretanicus and P. yelkouan , were within the distribution of genetic distances within the same subspecies (first interval delimited in Figure 4b to ease qualitative comparison). In addition, the genetic distances between P. boydi and P. baroli , and between the different Atlantic Calonectris species were within the interval of genetic distances among different subspecies (second interval in Figure 4b).