Abstract
Evolutionary processes such as speciation happen gradually over time,
making such processes time-dependent. Many studies conducted over the
past two decades have aimed at providing accurate, fossil-calibrated
estimates of the divergence times of both extant and extinct species in
most lineages of the tree of life, including fish, amphibians, reptiles,
birds, and mammals. Data from more than 4 000 of these studies are now
publicly available from a central time tree resource and provide
opportunities to retrieve divergence times, evolutionary timelines,
and time trees in various formats to enhance scientific investigations
of evolution. There is, however, still limited functionality when
studying large lists of species that would require the batch retrieval
of data. To overcome this, a PYTHON package called Python Automated
Retrieval of Time Tree data, abbreviated as PAReTT, was created to
facilitate the interaction with the time tree resource when working with
species lists. This package was recently used in a meta-analysis of
candidate genes to study migration genetics and was able to successfully
retrieve data for forty or more species to illustrate the relationship
between divergence times and genetic data. The PAReTT package is freely
available for download from GitHub to implement in PYTHON or as a
pre-compiled Windows executable. Extensive documentation on dependencies,
installation, and implementation of the various functions is available
on the PAReTT GitHub wiki pages.
Keywords: PAReTT, PYTHON, Time tree, Divergence time,
Timelines, Diversification rate.
INTRODUCTION
Evolutionary processes are linked to time, whether through diversification
within a lineage, which may lead to the emergence of a new species, or
through subtle molecular changes over several generations that steadily
drive phenotypic variation (Wagner, 2018; Francisco Henao Diaz et al.,
2019). For example, some primary divisions between entire taxonomic
orders of birds happened approximately 75 million years ago while more
recent divisions between sub-species occurred as recently as 1 million
years ago (Prum et al., 2015). Many evolutionary processes are
studied in the context of the ecological and geographic processes that
shape the landscape within which selection, adaptation, and extinction
would have taken place. The paleogeography (Scotese, 2016; Müller et
al., 2018), which incorporates continental drift and known major
periods of glaciation, can only be factored into the evolutionary
history of a species if the time periods for speciation are known
(Figure 1). Furthermore, accurate estimates of diversification
rates within species depend on comparing the temporal ranges
within which lineages, species, and subspecies are formed (Jetz et
al., 2012). It is therefore crucial that studies on evolutionary
processes are contextualized within the time frames in which they
happened.
Over the past few decades, a plethora of molecular studies have been
published using various methods, from fossil-calibrated Bayesian
inference (Rannala and Yang, 2003; Kumar and Hedges, 2016) to comparable
relative-time approaches (Yang and Yoder, 2003; Tamura et al.,
2012), to establish the timeline for the emergence and diversification of
most species, both living and extinct, including many mammals (Nyakatura
and Bininda-Emonds, 2012; Springer, Murphy and Roca, 2018), reptiles
(Tucker et al., 2017), and birds (Barker et al., 2015; Prum et al.,
2015). These studies have greatly advanced our
understanding of evolutionary processes within the context of ecological
changes and the time constraints in which they occur (Scholl and Wiens,
2016), and have helped clarify many questions regarding
the taxonomy and phylogeny of species, which have frequently been at
odds with each other (Sangster, 2014; Springer, Murphy and Roca, 2018).
The result is a compendium of at least 4 075 studies that has
culminated in a central “Time Tree” resource (Hedges, Dudley and
Kumar, 2006), which collects and compiles divergence time estimates and
time trees from published, peer-reviewed studies. From this resource,
estimates of divergence times and related timelines and time trees are
available online (Kumar et al., 2017), including an incorporated
version in MEGA (Mello, 2018) and a mobile phone app (Kumar and Hedges,
2011). Collectively, this provides access to accurate divergence time
estimates for use in calibrating phylogenetic trees according to time,
comparing clades in phylogenetic trees to known clades of shared common
ancestry, as well as comparing genetic distance between species to their
temporal or evolutionary distance. There is, however, still limited
functionality in retrieving divergence times from the resource when
dealing with species lists rather than individual pairs of species. As
evolutionary studies frequently focus on multiple species, and even
multiple lineages, at a time, this presents a significant roadblock
to streamlining the integration of divergence time data into
larger studies.
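To illustrate why pair-by-pair retrieval scales poorly for species lists, the following is a minimal sketch in PYTHON and not the PAReTT implementation. It enumerates every unordered species pair in a list and passes each pair to a lookup function. The names `species_pairs`, `batch_divergence_times`, and `placeholder_lookup` are hypothetical and introduced only for this illustration; the placeholder returns no data and stands in for whatever retrieval mechanism a user has available.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Pair = Tuple[str, str]


def species_pairs(species: Iterable[str]) -> List[Pair]:
    """Return every unordered pair of distinct species names."""
    # Remove blank entries and duplicates, then sort for a stable order.
    unique = sorted({name.strip() for name in species if name.strip()})
    return list(combinations(unique, 2))


def batch_divergence_times(
    species: Iterable[str],
    lookup: Callable[[str, str], Optional[float]],
) -> Dict[Pair, Optional[float]]:
    """Apply a pairwise lookup function to every pair in a species list."""
    return {pair: lookup(*pair) for pair in species_pairs(species)}


def placeholder_lookup(species_a: str, species_b: str) -> Optional[float]:
    """Hypothetical stand-in for a real query; it retrieves nothing."""
    return None


if __name__ == "__main__":
    names = ["Homo sapiens", "Pan troglodytes", "Mus musculus"]
    results = batch_divergence_times(names, placeholder_lookup)
    print(len(results))  # 3 pairs for 3 species

    # The number of pairs grows as n * (n - 1) / 2.
    print(len(species_pairs(f"species_{i}" for i in range(40))))  # 780
```

A list of forty species already implies 780 pairwise queries, which is impractical to enter by hand one pair at a time.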
Previous attempts at automation to facilitate batch retrieval provided
limited utility and were poorly maintained, resulting in the removal
of the package from the CRAN repository. The Time Tree resource has
continued to develop and expand, and the need for such capabilities is
evident in the ever-expanding field of evolutionary biology. To this
end, we have endeavoured to create an easily accessible and freely
available resource to retrieve relevant data on evolutionary histories
from the Time Tree site for the seamless integration of divergence time
data in molecular studies. PAReTT, short for Python Automated Retrieval
of Time Tree data, is a menu-driven and user-friendly PYTHON package that
automates interaction with the Time Tree resource for retrieving batch
data with lists of species. It is freely available on GitHub or as a
stand-alone Windows executable.
METHODS