2.5 Phylogeographic scenarios
Reciprocal monophyly was inferred on all markers as partition by gene trees, which were used to evaluate the degree of divergence among lineages using MrBayes v. 3.2.6 (Ronquist et al. 2012; Supplementary Material 4). To reconstruct the scenario of divergence among the different populations, we used species trees, inferred using two different methods. We first ran an analysis with *BEAST v 2.2.0 (Bouckaert et al. 2014) using the three mitochondrial markers. We choose to link time-trees for the three mitochondrial markers, since they are on the same plasmid, where no recombination is expected at the time scale considered in this study. However, the three markers have different composition and different evolutionary rates, so we did not link the molecular clock and evolution models. We tested the hypothesis of molecular clock with the Clock Test using ML implemented in MEGA v7.0.20 (Kumar et al. 2016). This hypothesis was rejected (p-value < 0.0005). We therefore used, for each marker, an uncorrelated lognormal relaxed clock model. The clock rate was fixed to 0.00553 ± 0.00115 substitution per site per million year for cox1 and 0.00631 ± 0.0035 for cytb (rates inferred for Procellariiformes by (Nabholz et al. 2016)). The rate for the control region was estimated by the model as no rate is published for Procellariiformes. Existing rates for other birds could not be used either because they belong to groups that are too distant (e.g. Moas (Baker et al. 2005) or Peafowls (Kimball et al. 1997)), and the control region is highly variable among groups (Ruokonen & Kvist 2002). Published rates for the CR were however set in *BEAST as priors and did not affect the results (data not shown). We used for each marker a model consistent with the result of jModelTest2, and a Yule process species tree prior with a continuous population size model. As for the MrBayes analysis, we ran each MCMC chain with 50*106 generations, sampled every 1,000 generations, the first 25% of generations were discarded as burn-in and we inspected the stationarity of the chains using Tracer (Rambaut et al. 2018). To test whether the colonization of the Atlantic could result from birds of the Pacific passing through the Panama Isthmus, individuals from the central Pacific (Marquesas archipelago, taxondichrous ) were added to the *BEAST inference. This taxon can be considered as the best representative taxon for the central Pacific, as it is the most widespread and numerous, since polynesiae is considered synonym to dichrous (Austin et al. 2004), whilebannermani is best recognised as a species on its own (Kawakami et al. 2018). The taxon gunax (from Vanuatu) has never been sequenced, and actually the location of its breeding colonies is unknown. Three dichrous cytb sequences retrieved from Genbank (AY219949-AY219951) and three individuals from our own collection were used. We ran a second *BEAST analysis by adding all the nuclear markers independently to the three mitochondrial markers, using the same MCMC parameters. Clock rates priors were set to 0.019 substitutions per site per million year for βfib and 0.013 substitution per site per million year for rag1 , since these rates were estimated for birds (Groth & Barrowclough 1999; Prychitko & Moore 1997). The clock rate for nuclear markers was estimated by the model as no rate has been produced for petrels. In this latter case, we kept only the individuals for which we had sequences for all markers. To investigate the demographic history of lineages, we estimated population size through time by estimating Extended Bayesian Skyline plot as implemented in *BEAST, for each genetic unit previously delimited, considering a generation time of 15 years (Precheur et al. 2016).
We also used a coalescent-based ABC approach to explore the best demographic scenario describing the dataset of the combined mitochondrial and nuclear markers using the program DIYABC v. 2.1.0 (Cornuet et al. 2014). ABC methods consist in the simulation of datasets similar to the real dataset in terms of population and marker sizes. First, in the Indian Ocean, we tested if the three populations emerged simultaneously in a radiation event or in two disjoint events by comparing the posterior probability of these three scenarios, and in the case of the latter, which population was ancestral to the two others and which one of the two remaining populations was ancestral to the other. This hierarchical strategy was applied to each lineage independently, then to the three Atlantic lineages, and finally considering the five lineages together (see Supplementary Material 5 for a description of all tested scenarii). For each possible scenario, 106pseudo-observed datasets were simulated, with the same ploidy and number of loci per population as observed in the real dataset. We fixed uniform priors for population sizes, times of size variation and divergence and mutation and admixture rates priors (see Supplementary Material 4 for details), from which we simulated the datasets. Summary statistics were calculated from the simulated datasets and compared to the same statistics obtained from the real dataset. The Euclidean distance was calculated between the statistics obtained for each normalized simulated dataset and those for the observed dataset (Beaumont et al. 2002). Posterior probability of each scenario was then calculated using a logistic regression on summary statistics produced by the 1% of the simulated datasets closest to the real dataset. To reduce the dimensionality of the data, a linear discriminant analysis was preliminarily applied to the summary statistics (Estoup et al. 2012). The scenario with the highest posterior probability value with a non-overlapping 95% confidence interval (95% CI) was selected.
Results