4.2 Sequencing artifacts due to mtDNA duplication and uncertainties about molecular clock rates
The mitochondrial genome of several Procellariiformes presents tandem repeats (Abbott et al. 2005; Lounsberry et al. 2015), including P. lherminieri (Torres et al. 2018). In this taxon, a duplicated region comprising two copies of the CR was found, yielding the presence of double peaks on chromatograms. By treating gDNA with an exonuclease (which effectively removed all linear DNA), we did not observe triple- or quadruple-peaks on chromatograms, and thus hypothesized that only two copies of the CR sequences were amplified by PCR. Therefore we considered two haplotypic phases for all CR sequences, assuming that removing all CR sequences would have led to a loss of information and of statistical power in the analyses. To check the robustness of this approach however, we replicated all our analyses using two other data subsets: one in which all ambiguous CR sequences were removed, and another one from which all CR sequences were removed. Removing all CR sequences led to a strong loss of information, an increase of the estimated differentiation and a decrease of the estimated divergence times (Supplementary Material 6). Removing only the ambiguous sequences led to estimations close to the estimations of the complete dataset. This suggests that the noise caused by the multiple copies of CR was swamped by the signal contained in that marker and so our analyses using all the individuals are not significantly biased by these sequences.
Another major issue concerns the choice of a molecular clock rate for dating the splitting events. We found a clear hierarchical structuration of populations in the Atlantic and Indo-Pacific Oceans, which diverged around 1.76 to 2.71 My ago either using all markers or only mtDNA markers, respectively. In contrast, using the same taxa (but fewer specimens and only cytb ), Austin et al. (2004) dated this split at 3.2-3.8 My, and rather suggested that the closure of the Isthmus of Panama erected a barrier to gene flow between Indo-Pacific and Atlantic populations. This difference in divergence times might be due to taxon sampling, gene sampling, or the calibration of the molecular clock (0.631%/My here, vs. 0.9%/My in Austin et al). This latter parameter is probably a major contributor to the difference observed between our studies, since using the same value as Austin et al, and only sequences from cytb , we found a divergence time of 2.2-3.8 My, thus very close to Austin et al. Many substitution rates for both Mt DNA markers were proposed for petrels, ranging from 0.19%/My (Pacheco et al. 2011) to 1.544%/My (Pereira & Baker 2006) for cox1 , and from 0.18%/My (Pacheco et al. 2011) to 0.88-0.92%/My (Nunn & Stanley 1998; the latter rate was used by Austin et al), 1.022%/My (Pereira & Baker 2006) and 1.89%/My (Weir & Schluter 2008) for cytb . Using lowest rates (Pacheco et al. 2011) resulted in a divergence time between Atlantic and Indo-Pacific lineages around 12.8 My ago (Supplementary Material 10), an unlikely value given the presence of an ocean between the Americas before the Isthmus of Panama erected, despite the fact thatPuffinus is dating back to the Oligocene (Henderson & Gill 2010). Here we used the substitution rates that were estimated in Nabholz et al. 2016, as the most recent review. There are two further arguments against Austin et al. (2004) scenario: first, the closure of the Isthmus of Panama has been actually dated at 2.8 My ago (O’Dea et al. 2016), thus later than Austin et al’s scenario. Second, we found that the pacific taxon, dichrous , was not ancestral to the Atlantic taxon, but was embedded within Indian ocean taxa. The use of precise fossil calibration would bring a more robust estimation of divergence times and so a supplemental evidence of the origin of the colonization.