Within-individual repeatability in telomere length: a meta-analysis in non-mammalian vertebrates
Running title: Within-individual repeatability in TL
Tiia Kärkkäinen (a,*,$), Michael Briga (a,*), Toni Laaksonen (a), Antoine Stier (a,$)
(a) Department of Biology, University of Turku, Turku, Finland
* Equal contribution
$ Corresponding authors: tmakark@gmail.com / antoine.stier@gmail.com
0. Abstract
Telomere length is increasingly used as a biomarker of long-term life
history costs, ageing and future survival prospects. Yet, to have the
potential to predict long-term outcomes, telomere length should exhibit
a relatively high within-individual repeatability over time, which has
been largely overlooked in past studies. To fill this gap, we conducted
a meta-analysis on 74 studies reporting longitudinal telomere length
assessment in non-mammalian vertebrates, with the aim to establish the
current pattern of within-individual repeatability in telomere length
and to identify the methodological (e.g. qPCR/TRF, study length)
and biological factors (e.g. taxon, wild/captive, age class,
species lifespan, phylogeny) that may affect it. While the median
within-individual repeatability of telomere length was moderate to high
(R = 0.55; 95% CI: 0.05-0.95; N = 82), marked heterogeneity
between studies was evident. Measurement method strongly affected the
repeatability estimates, with TRF-based studies exhibiting high
repeatability (R = 0.80; 95% CI: 0.34-0.96; N = 25), while
repeatability of qPCR-based studies was only half of that and more
variable (R = 0.46; 95% CI: 0.04-0.82; N = 57). While phylogeny
explained some variance in repeatability, phylogenetic signal was not
significant (λ = 0.32; 95% CI: 0.00-0.83). None of the
biological factors investigated here had a statistically significant
association with the repeatability of telomere length, being potentially
obscured by methodological noise. Our meta-analysis highlights the need
to carefully evaluate and consider within-individual repeatability in
telomere studies to ensure the robustness of using telomere length as a
biomarker of long-term survival and fitness prospects.
Keywords: ageing biomarker, qPCR, TRF, lifespan, phylogeny
1. Introduction
Telomeres are highly conserved repetitive sequences of non-coding DNA
that cap the ends of linear chromosomes of eukaryotic species and
contribute to genomic integrity maintenance (Blackburn, 1991). Telomeres
shorten with every cell division due to the end replication problem
(inability of DNA polymerase to copy terminal DNA) (Levy, Allsopp,
Futcher, Greider, & Harley, 1992). Additionally, telomere shortening
can be accentuated by cellular stressors, such as oxidative stress
(Reichert & Stier, 2017; von Zglinicki, 2002) or substantially
increased energy demands (Casagrande & Hau, 2019; Ludlow et al., 2008).
When telomeres reach critically short length, they induce cell
senescence, apoptosis, or genomic instability, which in turn contribute
to ageing phenotypes (Campisi, 2005). Short telomeres have been
associated with increased risks of developing degenerative diseases in
humans (e.g. cardiovascular and Alzheimer diseases), while long
telomeres could increase the risk of neoplastic diseases (Aviv & Shay,
2018). Yet, short telomeres have been associated with increased
mortality risk in both humans and non-model vertebrates (Arbeev et al.,
2020; Boonekamp, Simons, Hemerik, & Verhulst, 2013; Wilbourn et al.,
2018). While a causal role of telomeres in organismal ageing has been
questioned (Simons, 2015; Young, 2018), some recent evidence suggests
that experimentally increasing telomere length could extend lifespan in
laboratory mice (Muñoz-Lorente, Cano-Martin, & Blasco, 2019).
Irrespective of causality controversies, telomere shortening is
considered to be a hallmark of ageing (López-Otín, Blasco, Partridge,
Serrano, & Kroemer, 2013) and telomere length has been suggested to act
as a biomarker of past stress exposure (Chatelain, Drobniak, & Szulkin,
2020; Pepper, Bateson, & Nettle, 2018), phenotypic quality (Angelier,
Weimerskirch, Barbraud, & Chastel, 2019), future disease risk
(Fasching, 2018), survival probability (Wilbourn et al., 2018) and even
fitness prospects (Eastwood et al., 2019).
Making inferences about past stress exposure or predictions about future
long-term consequences based on a given telomere length requires that
past and future telomere length are reasonably correlated with the
current length. This correlation can be quantified as the
within-individual repeatability R (Nakagawa & Schielzeth, 2010).
The repeatability expresses the reproducible proportion of the total
variance among repeated measurements, while the non-repeatable
proportion consists of individual flexibility and measurement error
(Nakagawa & Schielzeth, 2010). Because telomere length is dynamic, i.e. it changes over time, repeated telomere measurements are not
expected to be perfectly repeatable even in the absence of any
measurement error. R is an important measure because it
quantifies the association between repeated telomere length measurements
and can vary from high (R ~ [0.5 - 1.0]; Fig.
1a) to moderate (R ~ [0.25 - 0.5]; Fig. 1b) or
low (R ~ [0.0 - 0.25]; Fig. 1c), even when the
overall population telomere shortening is set at a fixed level (Fig.
1a-1c). Benetos et al. (2019) evaluated within-individual repeatability
of telomere length in humans to be high (R = [0.85-0.91], but
see Martens et al. (2021) for somewhat lower correlation estimates), but
the few studies that reported R of telomere length in other
species have provided more variable estimates ranging from 0.03 to 0.97
(Bichet et al., 2020; Boonekamp, Bauch, Mulder, & Verhulst, 2017;
Fairlie et al., 2016; Nettle et al., 2016; Pérez-Rodríguez et al., 2019;
Spurgin et al., 2018; van Lieshout et al., 2019).
Nonetheless, longitudinal studies
very rarely report within-individual repeatability, while such
information appears critical to evaluate the potential for telomere
length at a given time to make inferences about the past or the future
(Fig. 1).
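For reference, the repeatability described above can be written as an intra-class correlation, following the general definition in Nakagawa & Schielzeth (2010). The symbols below are notation chosen for this sketch rather than taken from the original text: \sigma^2_{\alpha} is the between-individual variance and \sigma^2_{\varepsilon} the within-individual variance, which is split here into true change over time and measurement error.

```latex
R = \frac{\sigma^2_{\alpha}}{\sigma^2_{\alpha} + \sigma^2_{\varepsilon}},
\qquad
\sigma^2_{\varepsilon} = \sigma^2_{\mathrm{change}} + \sigma^2_{\mathrm{error}}
```

Under this decomposition, R stays below 1 whenever individuals differ in their telomere dynamics (\sigma^2_{\mathrm{change}} > 0), even if \sigma^2_{\mathrm{error}} = 0.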
While initial studies of telomeres were mostly cross-sectional and
measured telomere length only once per individual, the last decade has
been characterized by a marked increase in longitudinal studies
measuring telomere length at least twice from each individual. In such
longitudinal studies, telomeres are generally expected to shorten with
time/age, at least in most endotherm vertebrate species (i.e. mammals and birds; e.g. Stier, Reichert, Criscuolo, & Bize, 2015
for a review in non-mammalian vertebrates). Indeed, while the enzyme
telomerase enabling telomere elongation is mainly suppressed in somatic
tissues of adult birds and mammals, this is not the case in many
ectotherm vertebrate species (i.e. fish, amphibians and reptiles;
Gomes, Shay, & Wright, 2010), which could explain the diversity of
telomere dynamics observed in such taxa (Olsson, Wapstra, & Friesen,
2018; Simide, Angelier, Gaillard, & Stier, 2016). Yet, some
longitudinal studies in endotherms have also reported telomere
lengthening, which is suggested not to be explained by measurement error
alone (e.g. Spurgin et al., 2018; van Lieshout et al., 2019). The
increasing availability of longitudinal studies now makes it possible to get a
general picture of the within-individual repeatability in telomere
length in a variety of species and gives the opportunity to identify the
factors that could explain variation in such an important parameter.
Here, we aim at providing, to the best of our knowledge, the first
meta-analysis of within-individual repeatability of telomere length and
factors affecting it by focusing on non-mammalian vertebrates. We
address several methodological and biological factors that could create
variation in the within-individual repeatability of telomere length. 1)
As the commonly used quantitative PCR (qPCR) method of telomere length
measurement is more prone to measurement error than the telomere restriction
fragment (TRF) assay (Aviv et al., 2011), qPCR studies can be expected
to have lower within-individual repeatability estimates (Nettle, Seeker,
Nussey, Froy, & Bateson, 2019; Nettle, Gadalla, Susser, Bateson, &
Aviv 2020 preprint). 2) Studies measuring telomere length with a long
time interval between subsequent sampling occasions are expected to have
lower within-individual repeatability than studies using samples taken
only a few days from each other, due to both inter-individual
differences in telomere shortening rate, and potentially due to slight
differences in sample handling protocols, such as storage method
(Reichert et al., 2017). 3) As most ectotherms can maintain telomerase
activity in adulthood, allowing the possibility for telomere restoration
(Gomes et al., 2010), they are expected to have lower within-individual
repeatability than endotherms that mainly suppress their telomerase
activity. 4) Fast rate of telomere shortening could potentially lower
the estimates of within-individual repeatability. Consequently,
juveniles are expected to have lower R than adults, as most of
the telomere shortening occurs in early life during growth (Spurgin et
al., 2018; Stier, Metcalfe, & Monaghan, 2020). 5) Similarly, as there
is evidence that telomeres shorten more slowly in long-lived wild species
than in short-lived species (Dantzer & Fletcher, 2015; Tricola et al.,
2018), species with short lifespan are expected to have lower
within-individual repeatability than species with long lifespans. 6)
Finally, the higher the between-individual variation in telomere
shortening rate is, due to, for example, environmental heterogeneity or
differences in sensitivity to the same stressor, the lower the within-individual
repeatability will be. Thus, studies on species living in the wild would
be expected to have lower R than species that have been living in
stable captive conditions, sometimes for generations. By testing the
importance of these factors, we aim to increase knowledge about
within-individual repeatability of telomere length and factors
potentially affecting it. This should help to assess the validity of
telomere length as a biomarker for past stress exposure and future
long-term costs in particular study systems. Additionally, it may help
researchers that wish to estimate any past experiences or long-term
costs based on given telomere length to design their research to aim at
high within-individual repeatability.
2. Material and Methods
2.1 Literature search and data collection
We performed literature searches (last search on September 30, 2019)
using the Web of Science search engine and the following search terms: “telome*
AND bird*”, “telome* AND reptile*”, “telome* AND ectotherm*”, and
“telomere dynamics”. We identified a total of 1292 records in these
searches (Fig. S1). In addition, we screened all the studies citing
Heidinger et al. (2012), and the reference list of Olsson et al. (2018)
to identify additional studies not found in the original searches (N =
6). We also included 4 unpublished datasets, of which one was provided
by M. Haussmann (Bucknell University) while others are authors’ own
unpublished data. After duplicate removal, 1005 records remained, and
their titles and abstracts were screened for eligibility. 124 full text
articles were assessed using our inclusion criteria. We included studies
that (1) used a non-mammalian (bird or ectotherm) vertebrate study
species, (2) measured telomere length at least twice, (3) had at least
one day between the telomere measurements, and (4) provided the raw data
online/upon request. We obtained the raw datasets as many published
articles do not report the within-individual repeatability in telomere
length, which enabled us to calculate the within-individual
repeatabilities in a standardized way. Thus, if the raw data were not
available online, we contacted the corresponding authors with a request
to provide us with the raw data or to run standardized analyses using an
R script that we provided. We chose not to include mammals for two main
reasons: 1) human studies are mostly outside of our eco-evolutionary
scope, and 2) longitudinal telomere measurements in mammals are almost
exclusively measured from white blood cells, and the natural changes in
white-blood cell composition (e.g. with season or age) have been
previously highlighted to seriously bias the estimation of telomere
length (Beaulieu, Benoit, Abaga, Kappeler, & Charpentier, 2017). In
non-mammalian vertebrates, longitudinal telomere measurements are almost
exclusively measured from nucleated red blood cells, which represent a
more homogenous population of blood cells (Stier et al. 2015). We found
71 studies that met our inclusion criteria. Additionally, we were able
to include three studies using data that we extracted from scatter plots
using the METADIGITIZE package in R (Pick, Nakagawa, & Noble, 2019),
adding up to 74 eligible studies (Fig. S1). We hence excluded 50 full
text articles for the following reasons: (i) In 27 studies, the data
were non-longitudinal; (ii) in 14 cases all or part of the data was used
in more than one publication, in which case we used the first
encountered article, or the most complete dataset; (iii) in seven cases
we were not able to obtain the raw data; and (iv) in two cases the data
were not comparable due to major methodological differences (Fig. S1).
From each eligible publication we recorded the following biological and
methodological factors: Taxon,
Species, Sample size, Number of telomere samples, Study length, Telomere
measurement method, Age class of individuals, and Environment (Fig. 2).
Additionally, we obtained maximum lifespan estimates for each species
from the AnAge database of animal ageing and longevity (Tacutu et
al., 2018). For three species there was no lifespan estimate available,
and we used the mean estimate for the genus. We obtained all other
predictor data directly from the studies. If present, individuals with
only one measurement were excluded from the datasets. If the number of
samples was unequal between individuals, we used an average number of
samples per study. Study length was determined as time between
successive telomere measurements. If there was variation in time between
different sampling points within one study, we used the average time
between the samples per study. Some datasets included data for different
levels of a categorical variable, for example, some datasets included
telomere measurements for both juvenile and adult individuals, or for
individuals that were sampled only as juveniles and others that were
sampled as both juveniles and adults. Because we have reason to believe
that within-individual telomere length repeatability might differ
between juveniles and adults due to distinct growth patterns
(i.e. fast growth vs essentially no growth at all), in these
cases we split the data into two or three according to the age class. In
two datasets we did a similar split when the same samples were measured
with both qPCR and TRF methods. One dataset included data for two
different species and we thus split the data into their respective
species. Therefore, we obtained 82 effect size estimates from 74 studies
(Table 1).
2.2 Statistical analyses
We carried out all analyses in R (v. 4.0.1) (R Core Team, 2020). We
performed our analyses following Holtmann, Lagisz, & Nakagawa (2017).
First, we checked the distribution of telomere length variable in all 82
datasets, and where needed, we transformed these data using log, square
root, or box-cox transformation to fulfill the assumptions of normality.
For each of these datasets, we estimated the within-individual
repeatability using a linear mixed model (LMM) approach (Nakagawa &
Schielzeth, 2010), i.e. repeatability is an intra-class
correlation coefficient that captures the between-individual variance
(by individual identity as a random intercept while controlling for the
time between successive measurements) relative to the total variance,
with the function ‘rpt’ of the package ‘rptR’ (Stoffel, Nakagawa, &
Schielzeth, 2017). For one study with repeatability <0.005, we
estimated the within-individual repeatability using an ANOVA approach as
the LMM approach biases very low repeatability values upwards (Holtmann
et al., 2017; Nakagawa & Schielzeth, 2010). Confidence intervals (95%
CI) around the repeatability were estimated based on 1,000 bootstraps.
These 95% CI are similarly constrained between 0 and 1 thereby
underestimating the lower 95% CI of studies with low repeatabilities
(upper 95% CI were never close enough to 1 to be biased). To avoid this
bias, for studies with a lower 95% CI < 0.005, we took the
symmetry of the (bootstrapped) upper 95% CI using the standard error
and t-value of the t-distribution matching the study’s sample size (1.96
when number of individuals >100). Performing this
estimation for the whole data range confirmed that the bias emerges when
lower 95% CI < 0.005 (Fig. S2).
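The ANOVA-based estimate and the bootstrapped confidence interval described above can be illustrated with a minimal sketch. It is not the authors' pipeline, which relied on the R package ‘rptR’ and controlled for the time between measurements; it assumes a balanced design (the same number of measurements per individual), resamples individuals with replacement, and uses simulated data. All function names and parameter values are hypothetical.

```python
import random
from statistics import mean


def anova_repeatability(groups):
    """One-way ANOVA repeatability for a balanced design.

    groups: one list of k >= 2 measurements per individual.
    Returns (MS_among - MS_within) / (MS_among + (k - 1) * MS_within),
    which, unlike an LMM estimate, is not bounded at zero.
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 individuals with the same k >= 2 measurements")
    grand = mean(x for g in groups for x in g)
    ind_means = [mean(g) for g in groups]
    ms_among = k * sum((m - grand) ** 2 for m in ind_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, ind_means) for x in g
    ) / (n * (k - 1))
    return (ms_among - ms_within) / (ms_among + (k - 1) * ms_within)


def bootstrap_ci(groups, n_boot=1000, seed=1):
    """Percentile 95% CI obtained by resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(groups)
    estimates = sorted(
        anova_repeatability([groups[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


def simulate(n=200, sd_between=1.0, sd_within=0.5, seed=42):
    """Hypothetical data: two measurements per individual, no mean change."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n):
        true_tl = rng.gauss(10.0, sd_between)
        groups.append([true_tl + rng.gauss(0.0, sd_within) for _ in range(2)])
    return groups


data = simulate()
r_hat = anova_repeatability(data)
ci_low, ci_high = bootstrap_ci(data, n_boot=500)
print(round(r_hat, 2), round(ci_low, 2), round(ci_high, 2))
```

With the simulated variances used here, the expected repeatability is 1.0 / (1.0 + 0.25) = 0.8, so the estimate should fall close to that value.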
We performed mixed-model meta-analyses using general and generalized
linear models in R (R Core Team, 2020). In these models the
within-individual repeatability of telomere length is the dependent
variable, and we performed the analyses using two distributions. First,
we followed Holtmann et al. (2017) and standardized all the
repeatability estimates and their variance using Fisher’s
Z-transformation. This transformation renders repeatability estimates
close to a normal distribution and after normalizing the heavy tail
using the Lambert W x F transformation (Goerg, 2011), we could perform
all analyses assuming a normal error distribution. In this approach, we
weighted each study according to its Fisher-Z-transformed variance.
Second, we performed all analyses on untransformed repeatability
estimates, acknowledging that within-individual repeatability is a
continuous distribution of a proportion with 2 categories (within- vs. between-individual variance). As such, the repeatability
follows a beta distribution (Douma & Weedon, 2019; Ferrari &
Cribari-Neto, 2004). Hence, we also performed all analyses using a beta
distribution and a logit link function, weighting each study according
to the inverse of their sample size. For both approaches, model
residuals fulfilled all assumptions, followed the quantiles of the used
distribution with variance homogeneity and without influential
datapoints, as checked with the functions ‘influence’ and
‘testResiduals’ of the packages ‘influence.ME’ and ‘DHARMa’ (Hartig,
2019; Nieuwenhuis, te Grotenhuis, & Pelzer, 2012). Both approaches gave
consistent results and below we present the results of the first
approach using Fisher’s Z-transformation and a normal error
distribution. For the ease of interpretation, we back-transformed Z
values to effect size (intra-class correlation coefficient ICC) values
and their 95% CI following equation 6 in Holtmann et al. (2017).
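As an illustration of the transformation step, the sketch below uses the form of Fisher's Z-transformation commonly applied to intra-class correlations with k measurements per individual, together with its sampling variance and back-transformation. Treating these as the exact equations applied in this study is an assumption; the function names are hypothetical.

```python
import math


def icc_to_z(r, k):
    """Fisher's Z for an intra-class correlation r with k measurements per individual."""
    return 0.5 * math.log((1.0 + (k - 1.0) * r) / (1.0 - r))


def z_variance(n, k):
    """Approximate sampling variance of Z for n individuals and k measurements."""
    return k / (2.0 * (n - 2.0) * (k - 1.0))


def z_to_icc(z, k):
    """Back-transform Z to the intra-class correlation scale."""
    e = math.exp(2.0 * z)
    return (e - 1.0) / (e + k - 1.0)


# Hypothetical study: R = 0.55 from 60 individuals measured twice each.
z = icc_to_z(0.55, 2)
weight = 1.0 / z_variance(60, 2)  # inverse-variance weight
print(round(z, 3), round(weight, 1), round(z_to_icc(z, 2), 2))
```

For k = 2 the transformation reduces to the familiar atanh(r) used for ordinary correlation coefficients.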
The models contained as fixed effects: (i) measurement method (qPCR or
TRF), (ii) study length (continuous variable), (iii) taxon (i.e.
ectotherm or endotherm), (iv) environment (captive, semi-wild or wild),
(v) age class (juveniles, adults or juveniles to adults) and (vi) species
maximum lifespan (continuous variable). We standardized continuous fixed
effects with a mean of 0 and a variance of 1. We analyzed whether there
were biases in predictor variables between measurement methods using
permutation tests with the function ‘independence_test’ of the package
‘coin’ (Hothorn, Van De Wiel, Hornik, & Zeileis, 2008) based on 10,000
permutations. We used χ² tests to identify deviations
from 50/50. qPCR was used more than twice as often as TRF, accounting
for 70% (N=57) of the estimates and 76% (N=3757) of the individuals
(Fig. 2; χ² = 5.4; p = 0.02). There were no TRF
measurements in fish, but otherwise there was no taxon-specific or
system-specific bias between both methods (Table S1). Indeed, studies
from the wild were equally overrepresented in both methods, respectively
at 63% (N=36) and 76% (N=19) of the qPCR and TRF estimates (Z = 1.02;
p = 0.40). Species measured by qPCR were somewhat shorter-lived than those
measured by TRF, with median lifespans of 15.0 years (95%
CI: 7.4 - 36.8) and 20.3 years (95% CI: 6.0 - 34.4; Table S1), respectively, but
this difference was not statistically significant (Z = -1.29; p = 0.20).
Study length was somewhat shorter in qPCR than in TRF studies, at
3.3 months (95% CI: 0.6 - 128) and 10.0 months (95% CI:
0.8 - 85; Table S1), respectively, but this difference was not statistically
significant (Z = -0.38; p = 0.70). When adjusting study length relative
to species lifespan, the difference between qPCR and TRF became even
smaller, at 1.0% (95% CI: 0.13 - 19) and 3.0% (95% CI:
0.1 - 15), respectively (Z = 0.05; p = 0.96). There was also no statistically
significant difference between both methods in monitored age classes
(juvenile versus adult or both; Table S1; Z = 1.59; p = 0.20). Hence, we
did not detect any species-specific bias in the data distribution
between TRF and qPCR measurement methods.
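The logic of the permutation tests used above to compare predictors between measurement methods can be sketched as follows. This is a simplified two-group test on the difference in means, not a reimplementation of ‘independence_test’ from the ‘coin’ package, which uses a different test statistic. The function name and the example values are hypothetical.

```python
import random


def permutation_test(x, y, n_perm=10000, seed=1):
    """Two-sided permutation test on the difference in group means.

    Returns the observed difference and a p-value with the usual +1
    correction, so the p-value can never be exactly zero.
    """
    rng = random.Random(seed)
    n_x = len(x)
    observed = sum(x) / n_x - sum(y) / len(y)
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / len(y)
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical study lengths (months) for two measurement methods.
qpcr_months = [1.0, 2.5, 3.3, 4.0, 12.0, 24.0]
trf_months = [0.8, 6.0, 10.0, 11.0, 36.0]
diff, p_value = permutation_test(qpcr_months, trf_months, n_perm=5000)
print(round(diff, 2), round(p_value, 3))
```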
To account for the fact that several studies came from the same
laboratory (N = 22), we included laboratory identity as random
intercept. To account for multiple measurements of the same species (N =
42), we included species identity as a random intercept. Species are
however related by phylogeny (Fig. 3). We therefore also included
phylogeny as a random term in this analysis following Hadfield &
Nakagawa (2010) and de Villemereuil & Nakagawa (2014), in which the
phylogeny is captured in the variance-covariance matrix between species
in the mixed model. This model contains both species and phylogeny as
random intercepts because these terms capture different variances:
species identity captures the within-species variance, while phylogeny accounts
for the relatedness between species (de Villemereuil & Nakagawa, 2014).
To identify whether there was a role of phylogeny in the
within-individual repeatability of telomere length, we estimated the
phylogenetic signal lambda (λ), which is the ratio of the
variance explained by phylogeny relative to the total variance explained
by the model and hence its value ranges from 0 (no signal) to 1
(Freckleton, Harvey, & Pagel, 2002; Hadfield & Nakagawa, 2010). We
constructed a phylogeny of the 42 species in this study using the Open
Tree of Life (Hinchliff et al., 2015) with the package ‘rotl’
(Michonneau, Brown, & Winter, 2016). We set the branch lengths
following Grafen (1989) with the function ‘compute.brlen’ from the
package ‘ape’ (Paradis & Schliep, 2019).
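The definition of the phylogenetic signal given above, as the share of the total variance attributed to phylogeny, can be written out as a short sketch. Which variance components enter the denominator (here species, laboratory and residual) is an assumption of this example, and the numerical values are hypothetical rather than estimates from this study.

```python
def phylogenetic_signal(var_phylo, other_vars):
    """Lambda as the phylogenetic variance divided by the total variance."""
    total = var_phylo + sum(other_vars)
    if var_phylo < 0 or total <= 0:
        raise ValueError("variances must be non-negative with a positive total")
    return var_phylo / total


def summarise(draws):
    """Median and 95% interval of a list of values (linear interpolation)."""
    values = sorted(draws)

    def quantile(p):
        pos = p * (len(values) - 1)
        low = int(pos)
        high = min(low + 1, len(values) - 1)
        return values[low] + (values[high] - values[low]) * (pos - low)

    return quantile(0.5), quantile(0.025), quantile(0.975)


# Hypothetical posterior draws of (phylogeny, species, laboratory, residual) variances.
posterior = [
    (0.08, 0.02, 0.05, 0.10),
    (0.01, 0.04, 0.06, 0.12),
    (0.15, 0.01, 0.04, 0.09),
    (0.05, 0.03, 0.05, 0.11),
]
lambdas = [phylogenetic_signal(d[0], d[1:]) for d in posterior]
print([round(v, 2) for v in summarise(lambdas)])
```

Computing lambda per posterior draw and then summarising it keeps the uncertainty in all variance components in the reported interval.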
We performed the mixed-model meta-analyses without and with phylogeny,
which gave consistent results (Table S2). Here we present the analyses
with phylogeny performed using a Bayesian approach with the function
‘brm’ from the package ‘brms’ (Bürkner, 2017), but note that the
conclusions were consistent with those based on a frequentist approach
with the functions ‘lmer’ and ‘gls’ of the packages ‘lme4’ and ‘nlme’
respectively (results not shown). For the Bayesian models, we used
weakly informative priors and ran 4 chains each with 1,500,000
iterations, a burn-in of 100,000 and a thinning of 250, resulting in a
posterior effective sample size of >2000 and an Rhat of 1,
which together with pareto-k-diagnostics (k<0.7), visual
inspection of the trace plots and potential scale reduction factor
showed that simulations had run properly (Bürkner, 2017). We evaluated
the relative fits of the Bayesian model on the data using the
leave-one-out cross-validation (LOO) approach (Vehtari, Gelman, &
Gabry, 2017) and compared the models’ relative weight with the functions
‘loo’ and ‘model_weights’ of the package ‘brms’. In brief, a model’s
weight is an estimate of the probability that the model will make the
best predictions on new data, conditional on the alternative models
considered with the weights of all models adding up to 1. We determined
the statistical ‘significance’ of the fixed effects and random effects
based on their model fit (loo weights for Bayesian models) and the
overlap with 0 of the 95% CI of coefficients or variance estimates.
We assessed publication bias based on visual inspection of funnel plots
of ‘meta-analytic’ residuals of the model in Table 2 (Fig. S5), the
Egger’s test on the residuals of this model (Egger, Smith, Schneider, &
Minder, 1997; Nakagawa & Santos, 2012) and using the trim and fill
method (Duval & Tweedie, 2000) with the function ‘trimfill’ in the
package ‘metafor’ (Viechtbauer, 2010). None of these approaches
indicated there was publication bias.
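The regression behind Egger's test can be sketched in a few lines: the standardized effect (effect divided by its standard error) is regressed on precision (1 / standard error), and an intercept that deviates from zero suggests funnel-plot asymmetry. The sketch below applies this to raw effects for simplicity, whereas the analysis above used meta-analytic residuals; the function name and data are hypothetical.

```python
import math


def egger_regression(effects, std_errors):
    """Ordinary least squares of effect/SE on 1/SE.

    Returns (intercept, slope, standard error of the intercept).
    """
    y = [e / s for e, s in zip(effects, std_errors)]
    x = [1.0 / s for s in std_errors]
    n = len(x)
    if n < 3:
        raise ValueError("need at least three studies")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(rss / (n - 2) * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, slope, se_intercept


# Hypothetical effects (Z scale) and standard errors for six studies.
effects = [0.62, 0.55, 0.70, 0.48, 0.66, 0.58]
std_errors = [0.10, 0.15, 0.20, 0.12, 0.25, 0.18]
intercept, slope, se_intercept = egger_regression(effects, std_errors)
print(round(intercept, 2), round(slope, 2), round(se_intercept, 2))
```

The intercept divided by its standard error can be compared against a t-distribution with n - 2 degrees of freedom to obtain a p-value.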
3. Results
We obtained 82 repeatability estimates from 74 studies on 42 species
measured in 22 laboratories based on a total of 4918 individuals.
Individuals were measured on average 2.3 times (95% CI: 2.0 – 3.0) and
were monitored for a median of 4.2 months (95% CI: 0.6 – 121) or 1.4%
of their lifespan (95% CI: 0.1 – 19.0). Birds were overrepresented
relative to reptiles or fish, accounting for 87% (N = 71, Fig. 2) of
the estimates and 90% (N = 4439) of the individuals. Studies on wild
systems were twice as abundant as studies on semi-wild or captive
systems, accounting for 67% (N = 55) of the estimates and 70% (N =
3422) of the individuals (Fig. 2).
The within-individual
repeatability of telomere length was overall moderate to high, with a
median value of R = 0.55. Yet, there was marked variation between
studies, as exemplified by the large 95% CI around the median R value, from 0.05 to 0.95. There was no clear phylogenetic signal for
the repeatability of telomere length (λ for R = 0.38; 95% CI:
0.00-0.85; Fig. S4A; λ for the model in Table 2 = 0.32; 95% CI: 0.00-0.83,
Fig. S4B), but phylogeny captured some variance (SD: 0.28; 95%CI:
0.02-0.70; Table 2) and a model with phylogeny (Fig. 3) was favored over
one without phylogeny, albeit moderately (respective loo-weights 0.57
vs. 0.43). As a control, we checked in the same dataset, the
phylogenetic signal of lifespan, which was moderate but statistically
significant (λ =0.38; 95%CI: 0.03-0.77; Fig. S3; Fig. S4C) and a
model with phylogeny was strongly favored over a model without phylogeny
(loo-weights of 0.83 vs. 0.17).
There was a statistically significant effect of telomere length
measurement method (Table 2): the median within-individual repeatability
of TRF-based studies was high at R = 0.80 (N = 25; 95% CI: 0.34
- 0.96; Fig. 4; Fig. 5B), while that of qPCR-based studies was only
about half of that and more variable at R = 0.46 (N = 57; 95%
CI: 0.04 - 0.82; Fig. 4; Fig. 5B). Within-individual repeatability of
telomere length also decreased with the length of the study (β = -0.08;
95% CI: -0.15 to -0.01; Table 2; Fig. 5E). Once
these methodological variables were accounted for, none of the
biological variables we tested (i.e. taxon, species lifespan,
environment, age class) had a statistically significant effect on the
repeatability of telomere length (Table 2; Fig. 5 A-G). There was a weak
negative association between the within-individual repeatability of
telomere length and species lifespan, but the 95% CI of this
coefficient overlapped with zero (β = -0.06; 95% CI: -0.16 to 0.05;
Table 2; Fig. 5F), indicating little statistical
support for an association between species lifespan and the
repeatability of telomere length.
4. Discussion
The within-individual repeatability (R) of measurements over time
is a key requirement to identify the dynamics of variables and the
factors driving these dynamics. Telomere length changes over time and
this change is known to be affected by various factors, e.g. pre-
and postnatal environmental conditions (Kärkkäinen, Teerikorpi, Schuett,
Stier, & Laaksonen, 2021; Stier et al., 2020; Stier, Metcalfe, &
Monaghan, 2020), stress exposure (Chatelain et al., 2020), and
reproductive effort (Reichert et al., 2014). Yet, to be informative of
future performances (e.g. remaining lifespan), telomere length
also needs to be repeatable to some extent. Here, we performed a
meta-analysis of the within-individual repeatability of telomere length
and investigated some biological and methodological variables that might
affect this repeatability. Overall, we found telomere length to be
relatively repeatable at R = 0.55 but the repeatability was
highly variable across studies varying from almost 0 to almost 1. The
repeatability was mainly driven by measurement method, with studies
using the qPCR method showing a repeatability that was about half of, and more
variable than, that of studies using TRF. The within-individual repeatability of
telomere length declined with the length of study and tended to decline
with species lifespan, although the latter was statistically
non-significant. Phylogeny explained a minor but statistically
significant part of the variance in within-individual repeatability. None
of the other tested biological variables had a statistically
significant effect on the repeatability estimates. Here we
discuss three major implications of our study.
First, while repeatabilities of physiological traits have been
investigated before, to the best of our knowledge this is the first
meta-analysis on the within-individual repeatability of telomere length.
The within-individual repeatability of glucocorticoid levels has been
reported to be about half of that of telomere length at R ≈ 0.3
(Schoenemann & Bonier, 2018; Taff, Schoenle, & Vitousek, 2018). This
is not surprising as repeatability is not expected to be really high for
labile traits (Bonier & Martin, 2016), and most arguably
within-individual hormone levels are more variable in time than
within-individual telomere length. The repeatabilities of metabolic
rates (basal, standard and resting) were more similar to our estimates
of the repeatability of telomere length: R ≈ 0.60-0.80 in Nespolo
& Franco (2007), R ≈ 0.45-0.55 in Holtmann et al. (2017), R ≈ 0.42-0.65 in Auer et al. (2016), and R ≈ 0.40-0.50 (but a
mass-adjusted repeatability of R ≈ 0.30-0.40) in Briga & Verhulst
(2017). Phylogeny accounted for a large fraction of the variation in the
repeatability of metabolic rate between studies (Holtmann et al., 2017).
In contrast with metabolic rate, the within-individual repeatability of
telomere length was little affected by phylogeny or the biological
factors studied here. Contrary to our prediction, we found a negative
association between the within-individual repeatability of telomere
length and species lifespan, albeit not statistically significant. There
is some evidence that long-lived bird species maintain telomerase
activity throughout life (Haussmann, Winkler, Huntington, Nisbet, &
Vleck, 2007), which could explain this negative association. However,
the lack of statistically significant biological factors and the minor
role of phylogeny in explaining the variation in the within-individual
repeatability of telomere length indicate that current measurements of
telomere length are not able to detect much role of biology in the
repeatability of telomere length.
Second, methodological factors explained most of the variation in the
within-individual repeatability of telomere length. Consistent with our
expectation, the repeatability of telomere length declined with study
length. A similar association was also found for the repeatability of
metabolic rate (Auer et al., 2016; Briga & Verhulst, 2017). However,
most of the variation in the repeatability was captured by the
measurement method. Consistent with our expectation, studies using TRF
method to measure telomere length yielded higher repeatability estimates
than studies using qPCR. The difference most probably arises from how
the methods quantify telomere length. While TRF measures the terminal
telomere length by the use of gel electrophoresis, qPCR is based on PCR
and amplification of the target sequences, and measures the relative
telomere length by calculating the amount of telomeric sequence (T) in
the sample in relation to the amount of a non-variable single-copy gene sequence
(S); this magnifies measurement errors in telomere and/or single copy
gene reactions in the resulting T/S ratio (Nettle et al., 2019).
Consequently, it has been shown that measurement error alone can
decrease within-individual repeatability in telomere length in
longitudinal qPCR studies (Nettle et al., 2019), even if cross-sectional
accuracy is confirmed with TRF (Nettle et al. 2020 preprint). The
measurement reliability of these methods has been discussed
extensively elsewhere (Martin-Ruiz et al., 2015; Martin-Ruiz
et al., 2015; Verhulst et al., 2016, 2015). It is also important to note
that both qPCR and TRF might over-estimate the within-individual
repeatability of telomere length. Indeed, qPCR also includes the
interstitial telomeric sequences (ITS) in the measure of relative
telomere length. The amount of ITS can vary between individuals of the same
species but is not thought to change over time (Foote, Vleck, & Vleck,
2013), although this remains to be shown. If the amount of ITS is indeed
constant within an individual over time, this would artificially
increase the within-individual repeatability of telomere length.
Similarly, in addition to measuring the terminal telomeric sequences,
TRF also includes some amount of subtelomeric regions (Baird, 2005),
which, like ITS, can vary between individuals but are assumed to remain
stable within an individual over time. Thus, the occurrence of ITS and
of subtelomeric regions might inflate the repeatability of telomere
length, which might explain both the importance of measurement method
(effect size, Fig. 5G), and the large variation observed between qPCR
studies (i.e. ITS amount is very variable between species; Foote
et al., 2013).
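To illustrate how measurement error in the T and S reactions can lower the within-individual repeatability of a T/S ratio, the following is a minimal simulation sketch and not the procedure used in this meta-analysis. It assumes that true relative telomere length is constant within individuals, that errors in T and S are independent, normally distributed and proportional to the signal, and that each individual is measured twice. The function names, the parameter `cv_error` and all numerical values are hypothetical choices made for illustration.

```python
import random
import statistics


def repeatability(pairs):
    """One-way ANOVA repeatability (intraclass correlation), two measures per individual."""
    k = 2
    n = len(pairs)
    grand_mean = statistics.fmean(x for pair in pairs for x in pair)
    ind_means = [statistics.fmean(pair) for pair in pairs]
    ms_among = k * sum((m - grand_mean) ** 2 for m in ind_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for pair, m in zip(pairs, ind_means) for x in pair
    ) / (n * (k - 1))
    return (ms_among - ms_within) / (ms_among + (k - 1) * ms_within)


def simulate_ts_repeatability(cv_error, n_ind=2000, seed=1):
    """Repeatability of simulated T/S ratios for a given proportional error in T and in S."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_ind):
        # Assumption: true relative telomere length does not change between samplings.
        true_tl = rng.gauss(1.0, 0.15)
        measures = []
        for _ in range(2):
            t = true_tl * (1 + rng.gauss(0, cv_error))  # telomere reaction
            s = 1.0 * (1 + rng.gauss(0, cv_error))      # single copy gene reaction
            measures.append(t / s)
        pairs.append(tuple(measures))
    return repeatability(pairs)


if __name__ == "__main__":
    for cv in (0.0, 0.02, 0.10):
        print(f"error CV = {cv:.2f}: R = {simulate_ts_repeatability(cv):.2f}")
```

Under these assumptions, repeatability equals 1 without measurement error and declines as the error in either reaction grows, even though the underlying telomere length never changes.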
Third, given the importance of measurement method, there are a number of
methodological practices that are known to affect the quality and
repeatability of telomere length measurements, especially in qPCR
telomere measurements: differences in sample storage (Eastwood, Mulder,
Verhulst, & Peters, 2018; Reichert et al., 2017), DNA extraction method
(Dagnall et al., 2017; Seeker et al., 2016) and even the type of qPCR
master mix being used (Morinha, Magalhães, & Blanco, 2020a).
Furthermore, while DNA integrity is widely known to be important in TRF,
it is traditionally thought to be less crucial in qPCR (Aviv et al.,
2011). However, recent evidence suggests that DNA degradation can either
increase or decrease telomere length measured with qPCR (Ropio et al.,
2020; Tolios, Teupser, & Holdt, 2015). Currently, DNA integrity is
rarely assessed before performing qPCR analyses, and standard agarose
gel electrophoresis might be insufficient to assess DNA integrity for
qPCR telomere length measurement (AS, pers. obs.). Practices
during the analytical phase can also affect the
repeatability of telomere measurements. In longitudinal studies,
especially those using long-term data, samples are often analyzed in
batches and/or clusters. Failure to consider how samples are distributed
among batches and clusters can confound biological variation with
between-batch/cluster variation, making the two impossible to
separate (van Lieshout et al., 2020). When analyzing
longitudinal samples, the samples from the same individual are often
analyzed on the same plate/gel to increase statistical power to detect
within-individual effects. However, doing so without controlling for
the plate/gel effect can inadvertently inflate the within-individual
repeatability, whereas analyzing the samples from the same individual on
different plates/gels will often decrease the within-individual
repeatability. Thus, there are a number of methodological practices that
need to be taken into account to improve the repeatability of telomere
length in both qPCR and TRF methods. Quantifying the relative importance
of these practices is beyond the scope of this meta-analysis but
warrants further investigation.
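The effect of plate/gel layout described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions and not an analysis performed in this study: individuals are measured twice, plate effects are random and additive, all variance components are set to 1, and the plate effect is left uncorrected. The function names and parameter values are hypothetical.

```python
import random
import statistics


def repeatability(pairs):
    """One-way ANOVA repeatability (intraclass correlation), two measures per individual."""
    k = 2
    n = len(pairs)
    grand_mean = statistics.fmean(x for pair in pairs for x in pair)
    ind_means = [statistics.fmean(pair) for pair in pairs]
    ms_among = k * sum((m - grand_mean) ** 2 for m in ind_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for pair, m in zip(pairs, ind_means) for x in pair
    ) / (n * (k - 1))
    return (ms_among - ms_within) / (ms_among + (k - 1) * ms_within)


def simulate_layout(same_plate, sd_plate=1.0, n_ind=3000, per_plate=20, seed=7):
    """Repeatability when both samples of an individual share a plate, or do not."""
    rng = random.Random(seed)
    n_plates = n_ind // per_plate
    plate_effect = [rng.gauss(0, sd_plate) for _ in range(n_plates)]
    pairs = []
    for i in range(n_ind):
        individual = rng.gauss(0, 1.0)  # consistent among-individual difference
        p1 = i // per_plate
        # Second sample goes on the same plate or on the next plate.
        p2 = p1 if same_plate else (p1 + 1) % n_plates
        y1 = individual + plate_effect[p1] + rng.gauss(0, 1.0)
        y2 = individual + plate_effect[p2] + rng.gauss(0, 1.0)
        pairs.append((y1, y2))
    return repeatability(pairs)


if __name__ == "__main__":
    print("no plate effect :", round(simulate_layout(True, sd_plate=0.0), 2))
    print("same plate      :", round(simulate_layout(True), 2))
    print("different plates:", round(simulate_layout(False), 2))
```

With these settings, the expected repeatability is 0.5 without plate effects, about 0.67 when both samples share a plate (the shared plate effect is counted as among-individual variance) and about 0.33 when they are on different plates (the plate effect adds to within-individual variance).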
To conclude, telomere length is increasingly used as a biomarker for
past stress exposure (Chatelain et al., 2020) and future performance
(Eastwood et al., 2019; Heidinger et al., 2012). However, for telomere
length to be truly informative about the past or the future, telomere
length requires a reasonable within-individual repeatability over the
life course. For instance, inferring any long-lasting effects of
early-life environmental conditions or predicting future survival
probability when the within-individual repeatability is close
to 0 (in ca. 23% of qPCR studies) is likely to lead to spurious
conclusions. Similarly, repeatability often sets an upper limit to
heritability (Falconer and Mackay, 1996; Lynch and Walsh, 1997, but see
Dohm, 2002) and low repeatability will conceal the heritable component
of telomere length. Accordingly, heritability estimates are usually
higher for TRF than qPCR studies (Bauch, Boonekamp, Korsten, Mulder, &
Verhulst, 2020), which is in line with the strong method effect found
here for within-individual repeatability. Our study indeed indicates
that the within-individual repeatability of telomere length is mainly
driven by telomere measurement method, with the repeatability being
significantly lower with qPCR than with TRF. Unfortunately, the majority
of the longitudinal telomere studies to date have used qPCR, which may
partly mask the role of biological variation in telomere length and
dynamics. It might be worth reassessing whether biological factors drive
within-individual repeatability in telomere length once more TRF studies
are available and/or the repeatability of qPCR has improved.
Meanwhile, we encourage scientists to design their research and
laboratory practices to aim at high repeatability by reducing factors
that potentially create variation in telomere length measurements. It is
noteworthy that while TRF studies regularly showed higher repeatability
estimates than qPCR studies, it is also possible to achieve high
repeatability using qPCR. In qPCR studies it is important to pay
particular attention both to sample handling and storage and to optimizing
the qPCR protocol (see Morinha, Magalhães, & Blanco, 2020b for specific
guidelines). We particularly encourage scientists to estimate and report
the within-individual repeatability R in their study systems.
Repeatabilities are driven by measurement error and/or biological
variables. To decompose the contribution of these two components, studies
should report the within-individual repeatability of all longitudinal
samples along with the technical repeatability of their methodology
(e.g. based on the repeated measurement of the same samples).
This should capture at least part of the loss of repeatability due to
measurement error. It would thus allow the estimation of a
biology-driven repeatability and hence testing some of the predictions
we proposed in this study. Longitudinal studies still rarely report
the within-individual repeatability of telomere length, even though it
can be a key statistic for interpreting both the reliability of the
methodology and the biology driving the dynamics of telomere length and,
more generally, of individual traits.
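One possible way to approximate the biology-driven repeatability mentioned above is the classical correction for attenuation, sketched below. This is an illustrative assumption and not a method applied in this meta-analysis. It assumes that total variance is the sum of independent among-individual, within-individual biological and measurement error components, and that technical repeatability is estimated on the same pool of longitudinal samples. Under these assumptions, observed repeatability is V_ind / (V_ind + V_bio + V_err), technical repeatability is (V_ind + V_bio) / (V_ind + V_bio + V_err), and their ratio is V_ind / (V_ind + V_bio). The function name and the example values are hypothetical.

```python
def biological_repeatability(r_observed, r_technical):
    """Approximate repeatability after removing measurement error.

    Assumes additive, independent variance components, so that
    r_observed / r_technical = V_ind / (V_ind + V_bio).
    """
    if not 0.0 <= r_observed <= 1.0:
        raise ValueError("r_observed must lie between 0 and 1")
    if not 0.0 < r_technical <= 1.0:
        raise ValueError("r_technical must lie in (0, 1]")
    # Sampling error can push the ratio above 1, so cap it at 1.
    return min(1.0, r_observed / r_technical)


if __name__ == "__main__":
    # Hypothetical values: V_ind = 2, V_bio = 1, V_err = 1,
    # giving r_observed = 0.5 and r_technical = 0.75.
    print(biological_repeatability(0.5, 0.75))
```

Because both input repeatabilities are estimated with uncertainty, the resulting value is best treated as an approximation, ideally reported with a confidence interval obtained for instance by bootstrapping.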