RESULTS
We recovered an average of 1,227,032 (SD = 815,798) PE-ddRAD reads per
sample (Table S1) that were assembled to an average of 24,621 loci per
sample, with a mean coverage per sample of 39x (SD = 19). Locus length
ranged from 140 to 239 bp with a median of 198 bp (SD = 25.5).
Bayesian divergence time estimation
with SNP
data
The SNAPP phylogeny revealed largely the same topology as a previous
phylogenetic study based on the same data (Ferrer-Obiol et al. under
review), except for the poorly supported relationship betweenArdenna and Puffinus and the relationship between A.
grisea and A. tenuirostris (Table S2; Figure S1). Both
incongruences were already identified in the previous study using
different methods and datasets, and were caused by high levels of ILS.
Using a constraint for the age of the root, we estimated the
time-calibrated tree shown in Figure 1. The time to the most recent
common ancestor (TMRCA) of Puffinus was the oldest among the
three genera, estimated at 9.98 Mya (95% HPD: 12.26-7.65 Mya). The
TMRCA of Ardenna was inferred to be 4.52 Mya (95% HPD: 5.64-3.53
Mya) and the TMRCA of Calonectris 3.54 Mya (95% HPD: 4.57-2.55
Mya). If the divergence times are accurate, then shearwater speciation
increased during the Pliocene reaching a peak by the late Pliocene
(~2.58 Mya; Figure 1), when most of the modern
biogeographical groups of shearwaters were already present.
Using the same three fossil calibrations (see Material and Methods),
shearwater divergence times inferred using the MSC were 28-94% younger
than those estimated by Ferrer-Obiol et al. (under review) using
concatenation (Table S2). MSC analyses using these fossil calibrations
resulted in slightly older estimates (8.9% older on average) compared
to the same analyses using a single age constraint on the root (Figure
S2, Table S2). Conversely, fixing the phylogeny had very little effect
on age estimates (0.4% older on average).
The mean population size across all shearwater species estimated by
SNAPP was N=63,555 individuals (95% HPD: 50,390–77,155) when assuming
the lowest generation time estimated for a shearwater species (13 years;
Genovart et al., 2016), and
N=43,485 individuals (95% HPD: 34,477–52,790) when assuming the
highest estimated value (19 years; Birdlife International 2020).
However, SNAPP analysis without age constraints showed a notable
variation in θ estimates even between sister species (Figure S3)
suggesting frequent changes in population size in the evolutionary
history of shearwaters.
Biogeographic
Analysis
Under all tested models, ancestral range estimation analyses, including
a dispersal matrix restricting dispersal between areas connected by main
historical and present surface ocean currents, had lower AICc than
models with an unrestricted dispersal matrix (Table 1). DIVALIKE and DEC
models had lower AICc than BAYAREALIKE models, especially when the
founder event parameter (j) was not included, suggesting that vicariance
is an important mode of speciation in shearwaters. The slightly lower
AICc for DIVALIKE models compared to DEC models further supported the
importance of widespread vicariance. However, in models including
founder event speciation, the j parameter ranged from 0.0874 to 0.1733
and the rate of range expansion (d) was an order of magnitude smaller,
showing that founder events have a higher probability of explaining most
of the data than range expansion. Indeed, the likelihood ratio test
(LRT) between the best DIVALIKE and DIVALIKE + j models showed that
DIVALIKE + j was strongly favoured (P = 1.9 x 10-5).
Under the best DIVALIKE + j model, the South Australia - New Zealand
area showed the highest support as the ancestral region of shearwaters
(marginal ML probability = 0.44), followed by the North and Tropical
Pacific (0.33) (Figure 1 and Figure S4). The origin of Ardennawas also traced to the South Australia - New Zealand area (0.54). On the
other hand, Calonectris had an unequivocal origin in the Northern
Hemisphere (North Atlantic and North and Tropical Pacific = 0.45, North
and Tropical Pacific = 0.45), whereas the ancestral area of the MRCA ofPuffinus was estimated as either the North and Tropical Pacific
(0.37), the South Australia - New Zealand area (0.27) or both (0.16).
Phylogenetic generalized least
squares of body
size
The PGLS analyses recovered several significant correlations (FDR
< 0.05) between body size measures and the predictors (Figure
2a; Table S3). Mean body mass showed significant correlations with all
predictors, suggesting that this trait is strongly influenced by
ecological factors. Overall, migratory strategy and latitudinal range
were the best predictors, suggesting that body size in shearwaters is
associated with movement capacity. Indeed, migratory strategy explained
75% of the variance in mean body mass (Figure S5a; long-distance
migrants were the heaviest and sedentary/dispersive species the
lightest) and latitudinal range explained 67% of the variance in body
mass range (Figure S5b shows the positive correlation between body mass
range and latitudinal range occupied by a species year-round). Breeding
latitude was also a good predictor of mean body mass, with the strongest
correlation recovered for maximum breeding latitude (Figure S5c;
adjusted R2=0.212). As shown in the phenogram of
ancestral state reconstructions for body mass in Figure 2b, striking
differences in body mass between sister clades are common in
shearwaters, showing that body mass changes may be important during
speciation. The ancestral state reconstruction of migratory behaviour
showed that the MRCAs of Calonectris and Ardenna were most
likely long-distance migrants (Figure S6; marginal ML probability = 0.94
and 0.86, respectively). Conversely, the MRCA of Puffinus was
most likely either a short-distance migrant or a sedentary species
(marginal ML probability = 0.47 and 0.37, respectively).
Effects of life-history traits on
the substitution
rate
Results from a previous study found that rates of mitochondrial DNA
(mtDNA) evolution were slower for larger taxa in the Procellariiformes
(Nunn & Stanley, 1998), yet
we did not find any significant correlations between the substitution
rate of our PE-ddRAD-Seq dataset and the LHT (Table S4). This may have
been influenced by the high variance in the substitution rates of our
dataset. However, consistent with GC-biased gene conversion (gBGC): a
recombination-associated mechanism that leads to the preferential
fixation of G and C in AT/GC heterozygotes, we found that GC* had a
positive significant correlation with the number of breeding pairs
(correlation coefficient = 0.684, posterior probability = 0.94). These
results are in agreement with the hypothesis that the impact of gBGC is
strongest in species with high population sizes
(Weber, Boussau, Romiguier,
Jarvis, & Ellegren, 2014).
Genomic divergence and
taxonomy
The fineRADStructure analysis identified three major clusters
corresponding to the three shearwater genera (Figure 3). Further
subdivisions within each group largely supported the most recent
shearwater phylogeny (Ferrer-Obiol et al. under review), and all the
species and subspecies included in the study were recovered as unique
clusters by the fineStructure clustering algorithm
(Lawson, Hellenthal, Myers,
& Falush, 2012), except for P. bailloni nicolae and P.
bailloni dichrous , P. mauretanicus and P. yelkouan , andA. creatopus and A. carneipes , that were in each case,
clustered into a single population.
Overall, the distributions of genetic distances were consistent with the
current taxonomy. However, the distributions of distances within and
among species showed some overlap (Figure 4). The genetic distances
between A. creatopus and A. carneipes , and betweenP. mauretanicus and P. yelkouan , were within the
distribution of genetic distances within the same subspecies (first
interval delimited in Figure 4b to ease qualitative comparison). In
addition, the genetic distances between P. boydi and P.
baroli , and between the different Atlantic Calonectris species
were within the interval of genetic distances among different subspecies
(second interval in Figure 4b).