Within-individual repeatability in telomere length: a meta-analysis in non-mammalian vertebrates
Running title: Within-individual repeatability in TL
Tiia Kärkkäinen(a,*,$), Michael Briga(a,*), Toni Laaksonen(a), Antoine Stier(a,$)
(a) Department of Biology, University of Turku, Turku, Finland
* Equal contribution
$ Corresponding authors: tmakark@gmail.com / antoine.stier@gmail.com
0. Abstract
Telomere length is increasingly used as a biomarker of long-term life history costs, ageing and future survival prospects. Yet, to have the potential to predict long-term outcomes, telomere length should exhibit relatively high within-individual repeatability over time, a property that has been largely overlooked in past studies. To fill this gap, we conducted a meta-analysis of 74 studies reporting longitudinal telomere length assessments in non-mammalian vertebrates, with the aim of establishing the current pattern of within-individual repeatability in telomere length and identifying the methodological (e.g. qPCR/TRF, study length) and biological factors (e.g. taxon, wild/captive, age class, species lifespan, phylogeny) that may affect it. While the median within-individual repeatability of telomere length was moderate to high (R = 0.55; 95% CI: 0.05-0.95; N = 82), there was marked heterogeneity between studies. Measurement method strongly affected repeatability estimates: TRF-based studies exhibited high repeatability (R = 0.80; 95% CI: 0.34-0.96; N = 25), whereas the repeatability of qPCR-based studies was only slightly more than half of that and more variable (R = 0.46; 95% CI: 0.04-0.82; N = 57). While phylogeny explained some variance in repeatability, the phylogenetic signal was not significant (λ = 0.32; 95% CI: 0.00-0.83). None of the biological factors investigated here had a statistically significant association with the repeatability of telomere length, possibly because such associations were obscured by methodological noise. Our meta-analysis highlights the need to carefully evaluate and consider within-individual repeatability in telomere studies to ensure the robustness of using telomere length as a biomarker of long-term survival and fitness prospects.
Keywords: ageing biomarker, qPCR, TRF, lifespan, phylogeny
1. Introduction
Telomeres are highly conserved repetitive sequences of non-coding DNA that cap the ends of the linear chromosomes of eukaryotes and contribute to the maintenance of genomic integrity (Blackburn, 1991). Telomeres shorten with every cell division due to the end-replication problem (the inability of DNA polymerase to fully copy the terminal DNA) (Levy, Allsopp, Futcher, Greider, & Harley, 1992). Additionally, telomere shortening can be accentuated by cellular stressors, such as oxidative stress (Reichert & Stier, 2017; von Zglinicki, 2002) or substantially increased energy demands (Casagrande & Hau, 2019; Ludlow et al., 2008). When telomeres reach a critically short length, they induce cell senescence, apoptosis, or genomic instability, which in turn contribute to ageing phenotypes (Campisi, 2005). Short telomeres have been associated with increased risks of developing degenerative diseases in humans (e.g. cardiovascular and Alzheimer's diseases), while long telomeres could increase the risk of neoplastic diseases (Aviv & Shay, 2018). Nevertheless, short telomeres have been associated with increased mortality risk in both humans and non-model vertebrates (Arbeev et al., 2020; Boonekamp, Simons, Hemerik, & Verhulst, 2013; Wilbourn et al., 2018). While a causal role of telomeres in organismal ageing has been questioned (Simons, 2015; Young, 2018), some recent evidence suggests that experimentally increasing telomere length could extend lifespan in laboratory mice (Muñoz-Lorente, Cano-Martin, & Blasco, 2019).
Irrespective of causality controversies, telomere shortening is considered to be a hallmark of ageing (López-Otín, Blasco, Partridge, Serrano, & Kroemer, 2013) and telomere length has been suggested to act as a biomarker of past stress exposure (Chatelain, Drobniak, & Szulkin, 2020; Pepper, Bateson, & Nettle, 2018), phenotypic quality (Angelier, Weimerskirch, Barbraud, & Chastel, 2019), future disease risk (Fasching, 2018), survival probability (Wilbourn et al., 2018) and even fitness prospects (Eastwood et al., 2019).
Making inferences about past stress exposure or predictions about future long-term consequences based on a given telomere length requires that past and future telomere length are reasonably correlated with the current length. This correlation can be quantified as the within-individual repeatability R (Nakagawa & Schielzeth, 2010). The repeatability expresses the reproducible proportion of the total variance among repeated measurements, while the non-repeatable proportion consists of individual flexibility and measurement error (Nakagawa & Schielzeth, 2010). Because telomere length is dynamic, i.e. it changes over time, repeated telomere measurements are not expected to be perfectly repeatable even in the absence of any measurement error. R is an important measure because it quantifies the association between repeated telomere length measurements, and it can be high (R ≈ 0.5-1.0; Fig. 1a), moderate (R ≈ 0.25-0.5; Fig. 1b) or low (R ≈ 0.0-0.25; Fig. 1c), even when overall population telomere shortening is fixed (Fig. 1a-c). Benetos et al. (2019) estimated the within-individual repeatability of telomere length in humans to be high (R = 0.85-0.91; but see Martens et al. (2021) for somewhat lower correlation estimates), but the few studies that have reported R of telomere length in other species have provided more variable estimates, ranging from 0.03 to 0.97 (Bichet et al., 2020; Boonekamp, Bauch, Mulder, & Verhulst, 2017; Fairlie et al., 2016; Nettle et al., 2016; Pérez-Rodríguez et al., 2019; Spurgin et al., 2018; van Lieshout et al., 2019). Nonetheless, longitudinal studies rarely report within-individual repeatability, although such information is critical to evaluate whether telomere length at a given time can be used to make inferences about the past or the future (Fig. 1).
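To make the definition concrete, R can be written as the between-individual variance divided by the sum of between- and within-individual variance, R = σ²_between / (σ²_between + σ²_within). The Python sketch below computes this from raw repeated measurements using the one-way ANOVA estimator described by Nakagawa & Schielzeth (2010). It is an illustration only: it assumes no covariates (unlike the LMMs used in our analyses, which control for time between measurements), and the function name is hypothetical.

```python
from statistics import mean


def anova_repeatability(groups):
    """Within-individual repeatability R via one-way ANOVA (illustrative sketch).

    groups: one list of repeated telomere measurements per individual
    (at least two individuals, at least one with two or more measurements).
    """
    a = len(groups)                                  # number of individuals
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)                             # total number of measurements
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(n * (mean(g) - grand_mean) ** 2 for n, g in zip(sizes, groups))
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (n_total - a)            # within-individual variance
    # n0: effective number of measurements per individual (handles unequal sizes)
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (a - 1)
    var_between = (ms_between - ms_within) / n0      # between-individual variance
    return var_between / (var_between + ms_within)


print(anova_repeatability([[1, 2], [3, 4]]))  # two individuals, two measurements each
```

For [[1, 2], [3, 4]] the between-individual variance is 1.75 and the within-individual variance 0.5, giving R = 1.75 / 2.25 ≈ 0.78. Unlike an LMM estimate, this ANOVA estimate can become negative when within-individual variance exceeds between-individual variance.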
While initial studies of telomeres were mostly cross-sectional and measured telomere length only once per individual, the last decade has seen a marked increase in longitudinal studies measuring telomere length at least twice in each individual. In such longitudinal studies, telomeres are generally expected to shorten with time/age, at least in most endothermic vertebrate species (i.e. mammals and birds; see e.g. Stier, Reichert, Criscuolo, & Bize, 2015 for a review in non-mammalian vertebrates). Indeed, while telomerase, the enzyme enabling telomere elongation, is mainly suppressed in somatic tissues of adult birds and mammals, this is not the case in many ectothermic vertebrate species (i.e. fish, amphibians and reptiles; Gomes, Shay, & Wright, 2010), which could explain the diversity of telomere dynamics observed in these taxa (Olsson, Wapstra, & Friesen, 2018; Simide, Angelier, Gaillard, & Stier, 2016). Yet, some longitudinal studies in endotherms have also reported telomere lengthening, which does not appear to be explained by measurement error alone (e.g. Spurgin et al., 2018; van Lieshout et al., 2019). The increasing number of longitudinal studies now makes it possible to obtain a general picture of the within-individual repeatability of telomere length across a variety of species and to identify the factors that could explain variation in this important parameter.
Here, we aim to provide, to the best of our knowledge, the first meta-analysis of the within-individual repeatability of telomere length and the factors affecting it, focusing on non-mammalian vertebrates. We address several methodological and biological factors that could create variation in the within-individual repeatability of telomere length. 1) As the commonly used quantitative PCR (qPCR) method of telomere length measurement is more prone to measurement error than the telomere restriction fragment (TRF) assay (Aviv et al., 2011), qPCR studies can be expected to yield lower within-individual repeatability estimates (Nettle, Seeker, Nussey, Froy, & Bateson, 2019; Nettle, Gadalla, Susser, Bateson, & Aviv, 2020 preprint). 2) Studies with a long time interval between successive sampling occasions are expected to have lower within-individual repeatability than studies using samples taken only a few days apart, both because of inter-individual differences in telomere shortening rate and, potentially, because of slight differences in sample handling protocols, such as storage method (Reichert et al., 2017). 3) As most ectotherms can maintain telomerase activity in adulthood, allowing for telomere restoration (Gomes et al., 2010), they are expected to have lower within-individual repeatability than endotherms, which mainly suppress telomerase activity. 4) A fast rate of telomere shortening could lower within-individual repeatability. Consequently, juveniles are expected to have lower R than adults, as most telomere shortening occurs in early life during growth (Spurgin et al., 2018; Stier, Metcalfe, & Monaghan, 2020).
5) Similarly, as there is evidence that telomeres shorten more slowly in long-lived wild species than in short-lived species (Dantzer & Fletcher, 2015; Tricola et al., 2018), species with short lifespans are expected to have lower within-individual repeatability than species with long lifespans. 6) Finally, the higher the between-individual variation in telomere shortening rate, due for example to environmental heterogeneity or to differences in sensitivity to the same stressor, the lower the within-individual repeatability will be. Thus, studies of wild populations would be expected to yield lower R than studies of populations living in stable captive conditions, sometimes for generations. By testing the importance of these factors, we aim to improve our understanding of the within-individual repeatability of telomere length and the factors potentially affecting it. This should help to assess the validity of telomere length as a biomarker of past stress exposure and future long-term costs in particular study systems. It may also help researchers who wish to infer past experiences or long-term costs from telomere length to design studies that maximize within-individual repeatability.
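The reasoning behind points 2 and 6 can be illustrated with a simple random-intercept, random-slope model. Assuming, hypothetically, that TL_i(t) = a_i + b_i·t + e_it with mutually independent a_i, b_i and e_it, the correlation between measurements at time 0 and time t is σ²_a / √((σ²_a + σ²_e)(σ²_a + t²σ²_b + σ²_e)). The mean shortening rate does not appear, so the correlation falls with time and with between-individual variance in slopes even when population-level shortening is fixed. This two-time-point Pearson correlation is a close relative of R, not the LMM estimate itself, and the function below is a sketch under these assumptions.

```python
import math


def expected_correlation(var_intercept, var_slope, var_error, t):
    """Expected correlation between telomere length at time 0 and time t.

    Assumes a hypothetical model TL_i(t) = a_i + b_i * t + e_it in which the
    intercepts a_i, slopes b_i and errors e_it are mutually independent.
    The mean slope (population-level shortening) cancels out.
    """
    cov = var_intercept                                  # Cov(a + e0, a + b*t + et) = Var(a)
    var_0 = var_intercept + var_error                    # variance at time 0
    var_t = var_intercept + t ** 2 * var_slope + var_error  # variance at time t
    return cov / math.sqrt(var_0 * var_t)
```

With var_intercept = 1.0 and var_error = 0.25, the correlation stays at 0.8 for any t when var_slope = 0, but declines as t or var_slope increases.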
2. Material and Methods
2.1 Literature search and data collection
We performed literature searches (last search on September 30, 2019) using the Web of Science search engine and the following search terms: “telome* AND bird*”, “telome* AND reptile*”, “telome* AND ectotherm*”, and “telomere dynamics”. We identified a total of 1292 records in these searches (Fig. S1). In addition, we screened all the studies citing Heidinger et al. (2012) and the reference list of Olsson et al. (2018) to identify additional studies not found in the original searches (N = 6). We also included 4 unpublished datasets, one provided by M. Haussmann (Bucknell University) and the others being the authors’ own unpublished data. After duplicate removal, 1005 records remained, and their titles and abstracts were screened for eligibility. A total of 124 full-text articles were then assessed against our inclusion criteria. We included studies that (1) used a non-mammalian (bird or ectotherm) vertebrate study species, (2) measured telomere length at least twice, (3) had at least one day between telomere measurements, and (4) provided the raw data online or upon request. We obtained the raw datasets because many published articles do not report the within-individual repeatability of telomere length, and the raw data enabled us to calculate within-individual repeatabilities in a standardized way. Thus, if the raw data were not available online, we contacted the corresponding authors with a request to provide the raw data or to run standardized analyses using an R script that we provided. We chose not to include mammals for two main reasons: 1) human studies are mostly outside our eco-evolutionary scope, and 2) longitudinal telomere measurements in mammals are almost exclusively made on white blood cells, and natural changes in white blood cell composition (e.g. with season or age) have previously been highlighted as a serious source of bias in the estimation of telomere length (Beaulieu, Benoit, Abaga, Kappeler, & Charpentier, 2017).
In non-mammalian vertebrates, longitudinal telomere measurements are almost exclusively made on nucleated red blood cells, which represent a more homogeneous population of blood cells (Stier et al., 2015). We found 71 studies that met our inclusion criteria. Additionally, we were able to include three studies using data that we extracted from scatter plots with the metaDigitise package in R (Pick, Nakagawa, & Noble, 2019), bringing the total to 74 eligible studies (Fig. S1). We hence excluded 50 full-text articles for the following reasons: (i) in 27 studies, the data were not longitudinal; (ii) in 14 cases, all or part of the data were used in more than one publication, in which case we used the first encountered article or the most complete dataset; (iii) in seven cases, we were not able to obtain the raw data; and (iv) in two cases, the data were not comparable due to major methodological differences (Fig. S1).
From each eligible publication we recorded the following biological and methodological factors: Taxon, Species, Sample size, Number of telomere samples, Study length, Telomere measurement method, Age class of individuals, and Environment (Fig. 2). Additionally, we obtained maximum lifespan estimates for each species from the AnAge database of animal ageing and longevity (Tacutu et al., 2018). For three species no lifespan estimate was available, so we used the mean estimate for the genus. We obtained all other predictor data directly from the studies. Individuals with only one measurement were excluded from the datasets. If the number of samples differed between individuals, we used the average number of samples per individual in each study. Study length was defined as the time between successive telomere measurements; if this time varied between sampling points within a study, we used the average time between samples for that study. Some datasets included data for different levels of a categorical variable; for example, some datasets included telomere measurements for both juvenile and adult individuals, or for individuals that were sampled only as juveniles and others that were sampled as both juveniles and adults. Because within-individual repeatability of telomere length might differ between juveniles and adults due to distinct growth patterns (i.e. fast growth vs. essentially no growth), in these cases we split the data into two or three subsets according to age class. In two datasets we made a similar split because the same samples were measured with both qPCR and TRF. One dataset included data for two different species, which we therefore split by species. In total, we obtained 82 effect size estimates from 74 studies (Table 1).
2.2 Statistical analyses
We carried out all analyses in R (v. 4.0.1) (R Core Team, 2020), following Holtmann, Lagisz, & Nakagawa (2017). First, we checked the distribution of the telomere length variable in all 82 datasets and, where needed, transformed these data using log, square root, or Box-Cox transformation to fulfill the assumption of normality. For each dataset, we estimated the within-individual repeatability using a linear mixed model (LMM) approach (Nakagawa & Schielzeth, 2010) with the function ‘rpt’ of the package ‘rptR’ (Stoffel, Nakagawa, & Schielzeth, 2017). Repeatability is an intra-class correlation coefficient: the between-individual variance (captured by individual identity as a random intercept, while controlling for the time between successive measurements) relative to the total variance. For one study with repeatability < 0.005, we estimated the within-individual repeatability using the ANOVA approach, because the LMM approach biases very low repeatability values upwards (Holtmann et al., 2017; Nakagawa & Schielzeth, 2010). Confidence intervals (95% CI) around the repeatability were estimated based on 1,000 bootstraps. Because these bootstrapped 95% CI are likewise constrained between 0 and 1, they truncate the lower 95% CI of studies with low repeatabilities (upper 95% CI were never close enough to 1 to be biased). To avoid this bias, for studies with a lower 95% CI < 0.005, we set the lower 95% CI symmetrically to the (bootstrapped) upper 95% CI, using the standard error and the t-value of the t-distribution matching the study’s sample size (1.96 when the number of individuals was > 100). Performing this estimation over the whole data range confirmed that the bias emerges when the lower 95% CI < 0.005 (Fig. S2).
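The bootstrap step can be sketched as follows. This is a hedged illustration, not the rptR implementation: rptR uses a parametric bootstrap based on the fitted model, whereas this sketch swaps in a nonparametric bootstrap that resamples individuals with replacement, re-estimates R with the ANOVA estimator, and clips each estimate to [0, 1]. The clipping mimics the boundary constraint discussed above and shows why lower percentile bounds pile up at 0 for weakly repeatable traits. Function names and the example data are hypothetical.

```python
import random
from statistics import mean


def anova_repeatability(groups):
    """ANOVA-based repeatability from a list of per-individual measurement lists."""
    a = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand = mean(x for g in groups for x in g)
    ms_between = sum(n * (mean(g) - grand) ** 2 for n, g in zip(sizes, groups)) / (a - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n_total - a)
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (a - 1)
    var_between = (ms_between - ms_within) / n0
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0  # guard degenerate resamples


def bootstrap_ci(groups, n_boot=1000, seed=1):
    """Percentile 95% CI of R from resampling individuals (hypothetical helper)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(groups) for _ in groups]  # resample individuals with replacement
        r = anova_repeatability(sample)
        estimates.append(min(1.0, max(0.0, r)))        # clip to [0, 1] like a bounded estimate
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Hypothetical telomere measurements: two samples per individual
data = [[1.0, 1.2], [2.0, 1.8], [3.1, 2.9], [0.9, 1.3], [2.5, 2.4], [1.7, 2.1]]
print(anova_repeatability(data), bootstrap_ci(data))
```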
We performed mixed-model meta-analyses in R (R Core Team, 2020) using linear and generalized linear mixed models. In these models, the within-individual repeatability of telomere length is the dependent variable, and we performed the analyses using two distributions. First, we followed Holtmann et al. (2017) and standardized all repeatability estimates and their variances using Fisher’s Z-transformation. This transformation brings repeatability estimates close to a normal distribution; after normalizing the heavy tail using the Lambert W × F transformation (Goerg, 2011), we could perform all analyses assuming a normal error distribution. In this approach, we weighted each study according to its Fisher-Z-transformed variance. Second, we performed all analyses on untransformed repeatability estimates, acknowledging that within-individual repeatability is a continuous proportion of two components (within- vs. between-individual variance). As such, repeatability follows a beta distribution (Douma & Weedon, 2019; Ferrari & Cribari-Neto, 2004). Hence, we also performed all analyses using a beta distribution and logit link function, weighting each study according to the inverse of its sample size. For both approaches, model residuals fulfilled all assumptions: they followed the quantiles of the distribution used, with homogeneous variance and without influential datapoints, as checked with the functions ‘influence’ and ‘testResiduals’ of the packages ‘influence.ME’ and ‘DHARMa’ (Hartig, 2019; Nieuwenhuis, te Grotenhuis, & Pelzer, 2012). Both approaches gave consistent results, and below we present the results of the first approach using Fisher’s Z-transformation and a normal error distribution. For ease of interpretation, we back-transformed Z values to effect sizes (intra-class correlation coefficient, ICC) and their 95% CI following equation 6 in Holtmann et al. (2017).
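The Fisher Z-transformation for an intra-class correlation and its back-transformation can be sketched as below. We assume the standard form for an ICC with k measurements per individual, Z = ½ ln[(1 + (k − 1)R) / (1 − R)], together with a commonly quoted large-sample variance k / (2(k − 1)(n − 2)) for n individuals. The back-transformation here is simply the algebraic inverse, R = (e^{2Z} − 1) / (e^{2Z} + k − 1), and is not copied from equation 6 of Holtmann et al. (2017). For k = 2 the transform reduces to atanh(R), the familiar Fisher transform of a Pearson correlation. The Lambert W × F step is not included, and the example study is hypothetical.

```python
import math


def fisher_z(r, k):
    """Fisher's Z for an intra-class correlation r with k measurements per individual."""
    return 0.5 * math.log((1 + (k - 1) * r) / (1 - r))


def fisher_z_variance(n, k):
    """Approximate sampling variance of Z for n individuals (assumed large-sample formula)."""
    return k / (2 * (k - 1) * (n - 2))


def back_transform(z, k):
    """Algebraic inverse of fisher_z: recover R from Z."""
    e2z = math.exp(2 * z)
    return (e2z - 1) / (e2z + k - 1)


# Hypothetical study: R = 0.55 from 40 individuals measured 2.3 times on average
z = fisher_z(0.55, k=2.3)
weight = 1 / fisher_z_variance(n=40, k=2.3)  # inverse-variance weight in the meta-analysis
print(round(back_transform(z, k=2.3), 6))     # prints 0.55
```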
The models contained as fixed effects: (i) measurement method (qPCR or TRF), (ii) study length (continuous), (iii) taxon (ectotherm or endotherm), (iv) environment (captive, semi-wild or wild), (v) age class (juveniles, adults, or juveniles to adults) and (vi) species maximum lifespan (continuous). We standardized continuous fixed effects to a mean of 0 and a variance of 1. We analyzed whether predictor variables were biased between measurement methods using permutation tests with the function ‘independence_test’ of the package ‘coin’ (Hothorn, Van De Wiel, Hornik, & Zeileis, 2008) based on 10,000 permutations. We used χ2 tests to identify deviations from a 50/50 split. qPCR was used more than twice as often as TRF, accounting for 70% (N = 57) of the estimates and 76% (N = 3757) of the individuals (Fig. 2; χ2 = 5.4; p = 0.02). There were no TRF measurements in fish, but otherwise there was no taxon-specific or system-specific bias between the two methods (Table S1). Indeed, studies from the wild were similarly overrepresented for both methods, at 63% (N = 36) and 76% (N = 19) of the qPCR and TRF estimates, respectively (Z = 1.02; p = 0.40). Species measured by qPCR had somewhat shorter lifespans than those measured by TRF, with median lifespans of 15.0 years (95% CI: 7.4-36.8) and 20.3 years (95% CI: 6.0-34.4; Table S1), respectively, but this difference was not statistically significant (Z = -1.29; p = 0.20). Study length was somewhat shorter in qPCR than in TRF studies, at 3.3 months (95% CI: 0.6-128) and 10.0 months (95% CI: 0.8-85; Table S1), respectively, but this difference was not statistically significant (Z = -0.38; p = 0.70). When study length was expressed relative to species lifespan, the difference between qPCR and TRF became even smaller, at 1.0% (95% CI: 0.13-19) and 3.0% (95% CI: 0.1-15), respectively (Z = 0.05; p = 0.96).
There was also no statistically significant difference between the two methods in the age classes monitored (juveniles versus adults or both; Table S1; Z = 1.59; p = 0.20). Hence, we did not detect any systematic bias in the data distribution between the TRF and qPCR measurement methods.
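Two preprocessing steps described above, standardizing continuous predictors and testing for differences between measurement methods by permutation, can be sketched as below. The permutation test here is a simple two-sample test on the absolute difference in a summary statistic (median by default); it is a swapped-in illustration, not the conditional inference framework of coin's independence_test, and the helper names are hypothetical. Note that standardize uses the population standard deviation, whereas R's scale() divides by the sample standard deviation, so the two differ slightly for small samples.

```python
import random
from statistics import mean, median, pstdev


def standardize(values):
    """Scale a continuous predictor to mean 0 and (population) variance 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def permutation_test(x, y, stat=median, n_perm=10_000, seed=1):
    """Two-sided permutation p-value for a difference in `stat` between two groups."""
    rng = random.Random(seed)
    observed = abs(stat(x) - stat(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(stat(pooled[:len(x)]) - stat(pooled[len(x):]))
        hits += diff >= observed
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0
```

For example, permutation_test([1, 2, 3, 4, 5], [10, 11, 12, 13, 14], stat=mean) returns a small p-value, because only the two complete separations of the pooled values reproduce the observed difference in means.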
To account for the fact that several studies came from the same laboratory (N = 22 laboratories), we included laboratory identity as a random intercept. To account for multiple estimates from the same species (N = 42 species), we included species identity as a random intercept. Species are, however, related by phylogeny (Fig. 3). We therefore also included phylogeny as a random term following Hadfield & Nakagawa (2010) and de Villemereuil & Nakagawa (2014), whereby phylogeny is captured by the variance-covariance matrix between species in the mixed model. This model contains both species and phylogeny as random intercepts because these terms capture different variance components: the species term captures species-specific effects independent of phylogeny, while the phylogeny term accounts for relatedness between species (de Villemereuil & Nakagawa, 2014). To identify whether phylogeny played a role in the within-individual repeatability of telomere length, we estimated the phylogenetic signal lambda (λ), which is the ratio of the variance explained by phylogeny to the total variance explained by the model, and hence ranges from 0 (no signal) to 1 (Freckleton, Harvey, & Pagel, 2002; Hadfield & Nakagawa, 2010). We constructed a phylogeny of the 42 species in this study using the Open Tree of Life (Hinchliff et al., 2015) with the package ‘rotl’ (Michonneau, Brown, & Winter, 2016). We set the branch lengths following Grafen (1989) with the function ‘compute.brlen’ from the package ‘ape’ (Paradis & Schliep, 2019).
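Grafen's (1989) branch-length method, as we understand it from the ape documentation for compute.brlen with method = "Grafen", assigns each node a height equal to its number of descendant tips minus one (0 for tips), scales heights so that the root has height 1, raises them to a power (1 by default), and sets each branch length to the parent's height minus the child's height. The Python sketch below applies these rules to a small hypothetical tree written as nested tuples; it is an illustration, not the ape implementation, and the tip names are placeholders.

```python
def grafen_branch_lengths(tree, power=1.0):
    """Grafen (1989)-style branch lengths for a rooted tree given as nested tuples.

    Tips are strings; internal nodes are tuples of children. Returns a dict
    mapping each non-root node (tip name or subtree tuple) to its branch length.
    """

    def n_tips(node):
        # A tip counts as one leaf; an internal node sums over its children
        return 1 if isinstance(node, str) else sum(n_tips(child) for child in node)

    root_height = n_tips(tree) - 1  # unscaled height of the root

    def height(node):
        # (descendant tips - 1), scaled so the root is 1, then raised to `power`
        return ((n_tips(node) - 1) / root_height) ** power

    lengths = {}

    def walk(node):
        if isinstance(node, str):
            return
        for child in node:
            lengths[child] = height(node) - height(child)
            walk(child)

    walk(tree)
    return lengths


# Hypothetical three-species tree: ((A, B), C)
print(grafen_branch_lengths((("A", "B"), "C")))
```

For ((A, B), C), the internal node has height 0.5, so A and B each get branches of 0.5, the branch leading to their common ancestor is 0.5, and C gets a branch of 1.0; every root-to-tip path sums to 1, i.e. the tree is ultrametric.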
We performed the mixed-model meta-analyses both without and with phylogeny, which gave consistent results (Table S2). Here we present the analyses with phylogeny performed using a Bayesian approach with the function ‘brm’ from the package ‘brms’ (Bürkner, 2017), but note that the conclusions were consistent with those of a frequentist approach using the functions ‘lmer’ and ‘gls’ of the packages ‘lme4’ and ‘nlme’, respectively (results not shown). For the Bayesian models, we used weakly informative priors and ran 4 chains, each with 1,500,000 iterations, a burn-in of 100,000 and a thinning of 250, resulting in a posterior effective sample size of > 2000 and an Rhat of 1, which, together with Pareto-k diagnostics (k < 0.7), visual inspection of the trace plots and the potential scale reduction factor, showed that the simulations had run properly (Bürkner, 2017). We evaluated the relative fits of the Bayesian models using leave-one-out cross-validation (LOO; Vehtari, Gelman, & Gabry, 2017) and compared the models’ relative weights with the functions ‘loo’ and ‘model_weights’ of the package ‘brms’. In brief, a model’s weight is an estimate of the probability that the model will make the best predictions on new data, conditional on the alternative models considered; the weights of all models add up to 1. We determined the statistical ‘significance’ of fixed and random effects based on their model fit (loo weights for Bayesian models) and on whether the 95% CI of coefficients or variance estimates overlapped with 0.
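To give some intuition for how LOO-based weights summarize relative predictive performance, the sketch below computes Akaike-type (pseudo-BMA) weights, w_i = exp(elpd_i) / Σ_j exp(elpd_j), from hypothetical expected log predictive densities. This is an assumption-labelled illustration: brms and loo also offer stacking and Bayesian-bootstrap pseudo-BMA+ weights, which are computed differently, so this is not necessarily the method behind the weights reported here.

```python
import math


def pseudo_bma_weights(elpd_values):
    """Akaike-type weights from LOO elpd values (no Bayesian bootstrap, no stacking)."""
    best = max(elpd_values)
    # Subtract the maximum before exponentiating to avoid overflow/underflow
    raw = [math.exp(e - best) for e in elpd_values]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical elpd_loo values for a model with and without a phylogeny term
print(pseudo_bma_weights([-120.0, -120.5]))
```

Under this particular scheme, a 0.57 vs. 0.43 split corresponds to an elpd difference of only ln(0.57/0.43) ≈ 0.28, which conveys how modest such a preference is.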
We assessed publication bias by visual inspection of funnel plots of the ‘meta-analytic’ residuals of the model in Table 2 (Fig. S5), by Egger’s test on the residuals of this model (Egger, Smith, Schneider, & Minder, 1997; Nakagawa & Santos, 2012) and by the trim and fill method (Duval & Tweedie, 2000) with the function ‘trimfill’ in the package ‘metafor’ (Viechtbauer, 2010). None of these approaches indicated publication bias.
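The logic of Egger's regression test can be sketched with ordinary least squares: each effect size divided by its standard error (the standardized effect) is regressed on precision (1/SE), and an intercept far from zero indicates funnel-plot asymmetry. The sketch below, with hypothetical inputs, returns only the intercept and slope; the formal test additionally compares the intercept with its standard error, and our analysis applied the test to meta-analytic residuals (Nakagawa & Santos, 2012) rather than to raw effect sizes.

```python
from statistics import mean


def egger_regression(effects, std_errors):
    """Egger's regression: standardized effect (effect / SE) regressed on precision (1 / SE).

    Returns (intercept, slope) from ordinary least squares. An intercept far
    from zero suggests funnel-plot asymmetry; no significance test is computed.
    """
    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1 / s for s in std_errors]                   # precisions
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical, perfectly symmetric case: identical effects, varying precision
print(egger_regression([0.5, 0.5, 0.5, 0.5], [0.1, 0.2, 0.3, 0.4]))
```

In the symmetric example the intercept is essentially 0 and the slope equals the common effect (0.5); if imprecise studies report systematically larger effects, the intercept moves away from 0.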
3. Results
We obtained 82 repeatability estimates from 74 studies on 42 species measured in 22 laboratories based on a total of 4918 individuals. Individuals were measured on average 2.3 times (95% CI: 2.0 – 3.0) and were monitored for a median of 4.2 months (95% CI: 0.6 – 121) or 1.4 % of their lifespan (95% CI: 0.1 – 19.0). Birds were overrepresented relative to reptiles or fish, accounting for 87% (N = 71, Fig. 2) of the estimates and 90% (N = 4439) of the individuals. Studies on wild systems were twice as abundant as studies on semi-wild or captive systems, accounting for 67% (N = 55) of the estimates and 70% (N = 3422) of the individuals (Fig. 2).
The within-individual repeatability of telomere length was overall moderate to high, with a median value of R = 0.55. Yet, there was marked variation between studies, as shown by the wide 95% CI around the median R value (0.05-0.95). There was no clear phylogenetic signal for the repeatability of telomere length (λ for R = 0.38; 95% CI: 0.00-0.85; Fig. S4A; λ for the model in Table 2 = 0.32; 95% CI: 0.00-0.83; Fig. S4B), but phylogeny captured some variance (SD: 0.28; 95% CI: 0.02-0.70; Table 2) and a model with phylogeny (Fig. 3) was favored over one without phylogeny, albeit moderately (respective loo-weights 0.57 vs. 0.43). As a control, we checked the phylogenetic signal of lifespan in the same dataset, which was moderate but statistically significant (λ = 0.38; 95% CI: 0.03-0.77; Fig. S3; Fig. S4C), and a model with phylogeny was strongly favored over a model without phylogeny (loo-weights of 0.83 vs. 0.17).
There was a statistically significant effect of telomere length measurement method (Table 2): the median within-individual repeatability of TRF-based studies was high at R = 0.80 (N = 25; 95% CI: 0.34-0.96; Fig. 4; Fig. 5B), while that of qPCR-based studies was only slightly more than half of that and more variable at R = 0.46 (N = 57; 95% CI: 0.04-0.82; Fig. 4; Fig. 5B). Within-individual repeatability of telomere length also decreased with study length (β = -0.08; 95% CI: -0.15 to -0.01; Table 2; Fig. 5E). Once these methodological variables were accounted for, none of the biological variables we tested (i.e. taxon, species lifespan, environment, age class) had a statistically significant effect on the repeatability of telomere length (Table 2; Fig. 5A-G). There was a weak negative association between the within-individual repeatability of telomere length and species lifespan, but the 95% CI of this coefficient overlapped with zero (β = -0.06; 95% CI: -0.16 to 0.05; Table 2; Fig. 5F), indicating little statistical support for an association between species lifespan and the repeatability of telomere length.
4. Discussion
The within-individual repeatability (R) of measurements over time is a key requirement for identifying the dynamics of variables and the factors driving these dynamics. Telomere length changes over time and this change is known to be affected by various factors, e.g. pre- and postnatal environmental conditions (Kärkkäinen, Teerikorpi, Schuett, Stier, & Laaksonen, 2021; Stier et al., 2020; Stier, Metcalfe, & Monaghan, 2020), stress exposure (Chatelain et al., 2020), and reproductive effort (Reichert et al., 2014). Yet, to be informative of future performance (e.g. remaining lifespan), telomere length also needs to be repeatable to some extent. Here, we performed a meta-analysis of the within-individual repeatability of telomere length and investigated some biological and methodological variables that might affect this repeatability. Overall, we found telomere length to be relatively repeatable (R = 0.55), but repeatability was highly variable across studies, ranging from almost 0 to almost 1. Repeatability was mainly driven by measurement method: qPCR-based studies showed a repeatability only slightly more than half that of TRF-based studies, and a more variable one. The within-individual repeatability of telomere length declined with study length and tended to decline with species lifespan, although the latter trend was not statistically significant. Phylogeny explained a minor but statistically significant part of the variance in within-individual repeatability. None of the other biological variables tested had a statistically significant effect on the repeatability estimates. Here we discuss three major implications of our study.
First, while repeatabilities of physiological traits have been investigated before, to the best of our knowledge this is the first meta-analysis of the within-individual repeatability of telomere length. The within-individual repeatability of glucocorticoid levels has been reported to be about half that of telomere length, at R ≈ 0.3 (Schoenemann & Bonier, 2018; Taff, Schoenle, & Vitousek, 2018). This is not surprising, as repeatability is not expected to be very high for labile traits (Bonier & Martin, 2016), and hormone levels arguably vary more over time within individuals than telomere length does. The repeatabilities of metabolic rates (basal, standard and resting) were more similar to our estimates of the repeatability of telomere length: R ≈ 0.60-0.80 in Nespolo & Franco (2007), R ≈ 0.45-0.55 in Holtmann et al. (2017), R ≈ 0.42-0.65 in Auer et al. (2016), and R ≈ 0.40-0.50 (mass-adjusted R ≈ 0.30-0.40) in Briga & Verhulst (2017). Phylogeny accounted for a large fraction of the variation in the repeatability of metabolic rate between studies (Holtmann et al., 2017). In contrast, the within-individual repeatability of telomere length was little affected by phylogeny or by the biological factors studied here. Contrary to our prediction, we found a negative association between the within-individual repeatability of telomere length and species lifespan, albeit not statistically significant. There is some evidence that long-lived bird species maintain telomerase activity throughout life (Haussmann, Winkler, Huntington, Nisbet, & Vleck, 2007), which could explain this negative association. However, the lack of statistically significant biological factors and the minor role of phylogeny in explaining variation in the within-individual repeatability of telomere length suggest that current telomere length measurements have limited power to detect biological effects on this repeatability.
Second, methodological factors explained most of the variation in the within-individual repeatability of telomere length. Consistent with our expectation, the repeatability of telomere length declined with study length. A similar association has been found for the repeatability of metabolic rate (Auer et al., 2016; Briga & Verhulst, 2017). However, most of the variation in repeatability was captured by measurement method. Consistent with our expectation, studies using TRF to measure telomere length yielded higher repeatability estimates than studies using qPCR. The difference most probably arises from how the two methods quantify telomere length. TRF measures the length of terminal telomeric fragments using gel electrophoresis, whereas qPCR relies on amplification of target sequences and measures relative telomere length as the amount of telomeric sequence (T) in the sample relative to the amount of a non-variable single copy gene sequence (S); measurement errors in either the telomere or the single copy gene reaction are therefore magnified in the resulting T/S ratio (Nettle et al., 2019). Consequently, measurement error alone has been shown to decrease the within-individual repeatability of telomere length in longitudinal qPCR studies (Nettle et al., 2019), even when cross-sectional accuracy is confirmed with TRF (Nettle et al., 2020 preprint). The measurement reliability of these methods has been discussed extensively elsewhere (Martin-Ruiz et al., 2015; Martin-Ruiz et al., 2015; Verhulst et al., 2016, 2015). It is also important to note that both qPCR and TRF might overestimate the within-individual repeatability of telomere length. Indeed, qPCR also includes interstitial telomeric sequences (ITS) in the measure of relative telomere length. The amount of ITS can vary between individuals of the same species but is assumed not to change over time (Foote, Vleck, & Vleck, 2013), although this remains to be demonstrated.
This would artificially increase the within-individual repeatability of telomere length if ITS amount is indeed constant within an individual over time. Relatively similarly, in addition to measuring the terminal telomeric sequences, TRF also includes some amount of subtelomeric regions (Baird, 2005), which, like ITS, can vary between individuals but are assumed to remain stable within an individual over time. Thus, the occurrence of ITS and of subtelomeric regions might inflate the repeatability of telomere length, which might explain both the importance of measurement method (effect size, Fig. 5G), and the large variation observed between qPCR studies (i.e. ITS amount is very variable between species; (Foote et al., 2013).
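The two opposing mechanisms above (reaction error lowering repeatability, a stable ITS component inflating it) can be made concrete with a toy simulation. The sketch below is illustrative only: the functions `repeatability` and `simulate`, all variance values, and the simplification that errors in the T and S reactions and a per-individual ITS offset add independently on the log scale are assumptions, not a model fitted in this study. Under these assumptions the expected repeatability is R = (V_ind + V_ITS) / (V_ind + V_ITS + V_within + V_T + V_S), so error in either reaction lowers R while a constant ITS offset raises it.

```python
import numpy as np


def repeatability(y):
    """One-way ANOVA repeatability for a balanced design.

    y has shape (n_individuals, k_repeats); returns
    R = s2_among / (s2_among + s2_within).
    """
    n, k = y.shape
    ind_means = y.mean(axis=1)
    ms_among = k * np.sum((ind_means - y.mean()) ** 2) / (n - 1)
    ms_within = np.sum((y - ind_means[:, None]) ** 2) / (n * (k - 1))
    s2_among = (ms_among - ms_within) / k
    return s2_among / (s2_among + ms_within)


def simulate(n_ind=2000, k=3, v_ind=1.0, v_within=0.5, v_t=0.25, v_s=0.25,
             v_its=0.5, seed=1):
    """Toy log-scale model of longitudinal qPCR telomere data (hypothetical values)."""
    rng = np.random.default_rng(seed)
    ind = rng.normal(0, np.sqrt(v_ind), (n_ind, 1))        # stable between-individual TL
    within = rng.normal(0, np.sqrt(v_within), (n_ind, k))  # biological change over time
    err_t = rng.normal(0, np.sqrt(v_t), (n_ind, k))        # telomere (T) reaction error
    err_s = rng.normal(0, np.sqrt(v_s), (n_ind, k))        # single-copy gene (S) reaction error
    its = rng.normal(0, np.sqrt(v_its), (n_ind, 1))        # ITS offset, constant per individual

    true_tl = ind + within
    ts = true_tl + err_t - err_s   # log(T/S): errors from both reactions enter
    ts_its = ts + its              # same, but T also counts interstitial repeats
    return {name: repeatability(y) for name, y in
            [("true", true_tl), ("qpcr", ts), ("qpcr_its", ts_its)]}


print(simulate())
```

With these default values the expected repeatabilities are about 0.67 for true telomere length, 0.50 for the noisy T/S ratio, and 0.60 once the ITS offset is added; the simulated estimates should fall close to these values.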
Third, given the importance of measurement method, several methodological practices are known to affect the quality and repeatability of telomere length measurements, especially with qPCR: sample storage (Eastwood, Mulder, Verhulst, & Peters, 2018; Reichert et al., 2017), DNA extraction method (Dagnall et al., 2017; Seeker et al., 2016) and even the type of qPCR master mix (Morinha, Magalhães, & Blanco, 2020a). Furthermore, while DNA integrity is widely known to be important for TRF, it has traditionally been considered less crucial for qPCR (Aviv et al., 2011). However, recent evidence suggests that DNA degradation can either increase or decrease telomere length measured with qPCR (Ropio et al., 2020; Tolios, Teupser, & Holdt, 2015). DNA integrity is currently rarely assessed before qPCR analyses, and standard agarose gel electrophoresis might be insufficient to assess DNA integrity for qPCR telomere length measurement (AS, pers. obs.). Practices during the analytical phase can also affect the repeatability of telomere measurements. In longitudinal studies, especially long-term ones, samples are often analyzed in batches and/or clusters. Failing to account for how samples are distributed among batches and clusters can confound between-batch/cluster variation with biological variation, making the two impossible to separate (van Lieshout et al., 2020). Samples from the same individual are often analyzed on the same plate/gel to increase statistical power to detect within-individual effects. However, doing so without controlling for plate/gel effects can inadvertently inflate within-individual repeatability, because a shared plate/gel effect makes an individual's measurements more similar to each other. Conversely, analyzing samples from the same individual on different plates/gels will often decrease within-individual repeatability, because plate/gel effects then add to within-individual variation.
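The plate/gel layout effect described above can be illustrated with a similar toy simulation. The function `plate_layout_demo`, its parameter values, and the balanced rotation used to spread an individual's samples over plates are hypothetical; the sketch adds a random plate effect to simulated telomere values and then estimates R while ignoring plate identity, as a naive analysis would.

```python
import numpy as np


def repeatability(y):
    """One-way ANOVA repeatability; y has shape (n_individuals, k_samples)."""
    n, k = y.shape
    ind_means = y.mean(axis=1)
    ms_among = k * np.sum((ind_means - y.mean()) ** 2) / (n - 1)
    ms_within = np.sum((y - ind_means[:, None]) ** 2) / (n * (k - 1))
    s2_among = (ms_among - ms_within) / k
    return s2_among / (s2_among + ms_within)


def plate_layout_demo(n_ind=1200, k=3, ind_per_plate=12, v_ind=1.0,
                      v_within=1.0, v_plate=0.5, seed=2):
    """Compare naive R estimates under two plate layouts (hypothetical values)."""
    assert n_ind % ind_per_plate == 0
    rng = np.random.default_rng(seed)
    n_plates = n_ind // ind_per_plate
    true_tl = (rng.normal(0, np.sqrt(v_ind), (n_ind, 1))
               + rng.normal(0, np.sqrt(v_within), (n_ind, k)))
    plate_effect = rng.normal(0, np.sqrt(v_plate), n_plates)

    group = np.arange(n_ind) // ind_per_plate  # block of individuals
    # Layout 1: all samples of an individual on the same plate.
    same = np.repeat(group[:, None], k, axis=1)
    # Layout 2: the k samples of an individual go to k different plates.
    diff = (group[:, None] + np.arange(k)[None, :]) % n_plates

    return {
        "no_plate_effect": repeatability(true_tl),
        "same_plate": repeatability(true_tl + plate_effect[same]),
        "different_plates": repeatability(true_tl + plate_effect[diff]),
    }


print(plate_layout_demo())
```

With these defaults the expected values are roughly 0.6 (same plate), 0.5 (no plate effect) and 0.4 (different plates): a shared plate effect is counted as between-individual variance, whereas plate effects spread over different plates are counted as within-individual variance. Modelling plate/gel explicitly, for example as a random effect in a mixed model, is one common way to remove this bias.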
Thus, there are a number of methodological practices that need to be taken into account to improve the repeatability of telomere length in both qPCR and TRF methods. Quantifying the relative importance of these practices is beyond the scope of this meta-analysis but warrants further investigation.
To conclude, telomere length is increasingly used as a biomarker of past stress exposure (Chatelain et al., 2020) and future performance (Eastwood et al., 2019; Heidinger et al., 2012). However, for telomere length to be truly informative about the past or the future, it needs reasonably high within-individual repeatability over the life course. For instance, inferring long-lasting effects of early-life environmental conditions, or predicting future survival probability, when within-individual repeatability is close to 0 (as in ca. 23% of qPCR studies) is likely to lead to spurious conclusions. Similarly, repeatability often sets an upper limit to heritability (Falconer & Mackay, 1996; Lynch & Walsh, 1997; but see Dohm, 2002), and low repeatability will conceal the heritable component of telomere length. Accordingly, heritability estimates are usually higher for TRF than for qPCR studies (Bauch, Boonekamp, Korsten, Mulder, & Verhulst, 2020), in line with the strong method effect on within-individual repeatability found here. Our study indicates that the within-individual repeatability of telomere length is mainly driven by measurement method, with repeatability being significantly lower for qPCR than for TRF. Unfortunately, most longitudinal telomere studies to date have used qPCR, which may partly mask the role of biological variation in telomere length and dynamics. It may be worth reassessing whether biological factors drive within-individual repeatability in telomere length once more TRF studies are available and/or the repeatability of qPCR has improved. Meanwhile, we encourage researchers to design their studies and laboratory practices to maximize repeatability by minimizing sources of variation in telomere length measurements.
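The upper-limit argument follows from the classical partition of phenotypic variance for a repeatedly measured trait (Falconer & Mackay, 1996). Writing V_P for phenotypic variance, V_G for genetic variance (of which V_A is the additive part), V_{E_g} for permanent ("general") environmental variance, and V_{E_s} for temporary ("special") environmental variance, which also absorbs measurement error:

```latex
\begin{aligned}
V_P &= V_G + V_{E_g} + V_{E_s}, \qquad V_A \le V_G,\\
R &= \frac{V_G + V_{E_g}}{V_P}, \qquad H^2 = \frac{V_G}{V_P}, \qquad h^2 = \frac{V_A}{V_P},\\
&\Rightarrow\quad h^2 \le H^2 \le R \quad \text{since } V_{E_g} \ge 0 .
\end{aligned}
```

Because measurement error enters V_{E_s}, it lowers R and, since it also inflates V_P, lowers estimated heritability as well. The inequality relies on the assumptions of this partition; Dohm (2002) discusses situations where it does not hold.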
It is noteworthy that, although TRF studies generally showed higher repeatability estimates than qPCR studies, high repeatability can also be achieved with qPCR. In qPCR studies, particular attention should be paid to sample handling and storage and to optimizing the qPCR protocol (see Morinha, Magalhães, & Blanco, 2020b for specific guidelines). We particularly encourage researchers to estimate and report the within-individual repeatability R in their study systems. Repeatability reflects both measurement error and biological variation; to decompose the contributions of these two components, studies should report the within-individual repeatability across all longitudinal samples along with the technical repeatability of their methodology (e.g. based on repeated measurements of the same samples). The technical repeatability should capture at least part of the loss of repeatability due to measurement error, and would thus allow the estimation of a biology-driven repeatability and the testing of some of the predictions proposed in this study. Longitudinal studies still rarely report the within-individual repeatability of telomere length, even though it is a key statistic for interpreting both the reliability of the methodology and the biology driving the dynamics of telomere length and, more generally, of individual traits.
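As a starting point for reporting R, the sketch below implements the standard one-way ANOVA repeatability estimator with the n0 correction for individuals sampled different numbers of times, which is common in longitudinal data. The same function can be applied to technical replicates by grouping measurements by DNA sample instead of by individual. The second function, `biological_repeatability`, is a hypothetical correction, not a method used in this study. It assumes that measurement error is independent with constant variance, that the longitudinal R is computed from single measurements per sample, and that the technically replicated samples represent the same population of samples. Under those assumptions R_long = V_ind / (V_ind + V_within + V_err) and R_tech = (V_ind + V_within) / (V_ind + V_within + V_err), so their ratio V_ind / (V_ind + V_within) is the repeatability with measurement error removed.

```python
import numpy as np


def repeatability_unbalanced(groups):
    """ANOVA repeatability allowing unequal numbers of measurements per group.

    groups: list of 1-D sequences, one per individual (or one per DNA
    sample when estimating technical repeatability).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    a = len(groups)
    n_i = np.array([g.size for g in groups], dtype=float)
    N = n_i.sum()
    if a < 2 or N <= a:
        raise ValueError("need >= 2 groups and at least one repeated measure")
    grand = np.concatenate(groups).mean()
    means = np.array([g.mean() for g in groups])
    ms_among = np.sum(n_i * (means - grand) ** 2) / (a - 1)
    ms_within = sum(np.sum((g - g.mean()) ** 2) for g in groups) / (N - a)
    # n0: effective number of measurements per group for unequal group sizes
    n0 = (N - np.sum(n_i ** 2) / N) / (a - 1)
    s2_among = (ms_among - ms_within) / n0
    return s2_among / (s2_among + ms_within)


def biological_repeatability(r_longitudinal, r_technical):
    """Hypothetical correction R_bio = R_long / R_tech (assumptions in text)."""
    if not 0 < r_technical <= 1:
        raise ValueError("technical repeatability must be in (0, 1]")
    return r_longitudinal / r_technical


# Three individuals, two samples each: R = 3.75 / 4.25, about 0.882
print(repeatability_unbalanced([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

In practice, mixed models (for example with individual and plate as random effects) are more flexible than this ANOVA estimator, which is shown here only because its arithmetic is transparent.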