The Need for New Proxies of Ancient Sequence Preservation
The number of specimens reported to preserve molecular sequences decreases substantially beyond geologic ages of ~0.13-0.24Ma and ~0.8-1.0Ma for DNA and proteins, respectively (Buckley, Larkin, & Collins, 2011; Froese et al., 2017; Lindqvist et al., 2010; Meyer et al., 2017; Mitchell & Rawlence, 2021; Wadsworth & Buckley, 2014; Welker et al., 2019). This excludes specimens from permafrost settings and some cave deposits, as these settings often confer exceptional preservation potential for molecular sequences (Dabney et al., 2013; Meyer et al., 2016; Meyer et al., 2014; Ngatia et al., 2019; Orlando et al., 2013; van der Valk et al., 2021; Welker et al., 2019). The decrease in reported sequences from specimens exceeding these timepoints suggests that substantial diagenetic alteration of fossil/sub-fossil biomolecules occurs over these timeframes. The extent of this alteration is such that, in many cases, molecular sequences are degraded beyond the limit of detection of commonly used sequencing protocols. Still, molecular sequences, particularly protein sequences, have been reported from a few non-cave/permafrost specimens with geologic ages exceeding these thresholds (Asara, Schweitzer, Freimark, Phillips, & Cantley, 2007; Buckley et al., 2019; Cappellini et al., 2019; Cleland et al., 2015; Demarchi et al., 2016; Rybczynski et al., 2013; Schroeter et al., 2017; Schweitzer et al., 2009). For example, type-1 collagen peptides were recovered from a Pliocene camel tibia from Ellesmere Island, Nunavut, Canada (Buckley et al., 2019; Rybczynski et al., 2013). This tibia and two Mesozoic dinosaur specimens (Asara et al., 2007; Cleland et al., 2015; Schroeter et al., 2017; Schweitzer et al., 2009) are the only pre-Pleistocene bones currently reported to harbor sequenceable proteins.
Two other camels from Miocene and Pliocene formations of Nebraska were analyzed in the Ellesmere Island tibia study yet failed to yield detectable peptide sequences (Buckley et al., 2019; Rybczynski et al., 2013). This raises the question of why some specimens, like the exceptional Ellesmere Island tibia, preserve protein and/or DNA sequence information while many other pre-Pleistocene, early Pleistocene, and even mid-Pleistocene specimens do not.
The prevailing view in the paleogenomic and paleoproteomic literature is that the greater thermal exposure of the temperate Nebraska specimens facilitated protein degradation relative to the Ellesmere Island tibia (Demarchi et al., 2016; Hofreiter et al., 2015; Kistler et al., 2017; Wadsworth et al., 2017; Welker et al., 2019). A warmer thermal setting accelerates the rate of diagenetic reactions affecting biomolecular histology, including molecular sequences (Demarchi et al., 2016; Kistler et al., 2017; Ramsøe et al., 2021). Greater geologic age lengthens the period over which these reactions can progress and accumulate (Kistler et al., 2017; Lindahl, 1993; Ramsøe et al., 2021). Hence, a lower geologic age combined with a cooler thermal setting is hypothesized to limit the extent of such diagenetic reactions and thereby limit molecular sequence degradation. These two factors, together with the degree to which materials such as bone, dentine, enamel, eggshell, and others resist degradation (Demarchi et al., 2016; Wadsworth & Buckley, 2014; Welker et al., 2019), are often cited as key variables explaining examples of exceptional sequence preservation.
Indeed, a fossil or sub-fossil’s thermal setting/history and geologic age are generally used as proxies for predicting sequence preservation potential (Demarchi et al., 2016; Hofreiter et al., 2015; Kistler et al., 2017; Wadsworth et al., 2017; Welker et al., 2019). However, even ancient specimens from similar timepoints and depositional environments are known to display great variation in sequence preservation (Fortes et al., 2016; C. L. Hill & Schweitzer, 1999; Kistler et al., 2017; Letts & Shapiro, 2012; Mackie et al., 2017; Presslee et al., 2019; Wadsworth & Buckley, 2014; Wadsworth et al., 2017). In a study of 118 Xenarthrans from temperate to tropical locales, 6 specimens from the Santa Clara formation (~8.5-128Ka) of Camet Norte, Buenos Aires, Argentina, were analyzed. Of these 6 specimens, only 2 demonstrated substantial evidence of protein preservation (Presslee et al., 2019). In this case, geologic age and thermal setting are relatively inaccurate proxies: all specimens came from the same formation and would be expected to share a similar thermal history, yet not all preserve sequence information to a similar degree. Furthermore, Mackie et al. (2017) examined the dental calculus of 21 Roman-era Homo sapiens specimens from 3 European burial sites using LC-MS/MS sequencing. Reported sequence preservation varied widely between specimens and could not be attributed to any specific variable.
These differences in preservation likely result from a combination of other variables, including differences in the composition (Briggs, 2003; Collins et al., 2002; Collins et al., 1995; Gupta, 2014; Kendall et al., 2018; Lindahl, 1993; Schweitzer et al., 2019; Schweitzer et al., 2014), moisture content (Briggs, 2003; Collins et al., 2002; Gupta, 2014; Kendall et al., 2018; Lennartz et al., 2020; Lindahl, 1993; Nielsen-Marsh et al., 2000; Schweitzer et al., 2019; Trueman et al., 2004), and oxygen content (Briggs, 2003; Collins et al., 2002; Gupta, 2014; Kendall et al., 2018; Lennartz et al., 2020; Lindahl, 1993; Schweitzer et al., 2019; Wiemann et al., 2020; Wiemann et al., 2018) of burial sediments, among others. The complex range of variables potentially affecting sequence preservation indicates that factors beyond geologic age and thermal history contribute to exceptional sequence preservation. This limits the usefulness of any single diagenetic variable, such as geologic age or thermal history, as a proxy for DNA and protein sequence preservation.
A proposed solution to this limitation is to directly use fossil/sub-fossil biomolecular histology as a proxy for molecular sequence preservation. Biomolecular histology is the underlying reason why diagenetic variables such as thermal history and geologic age can be used, in any capacity, as proxies for predicting sequence preservation. The cumulative effects of diagenetic variables are reflected in the preservational condition of a fossil or sub-fossil’s biomolecular histology (Briggs, 2003; Briggs et al., 2000; Gupta, 2014). Directly studying biomolecular histology and correlating it with degree of sequence preservation bypasses the need to study any one of these variables individually. Thus, biomolecular histology is hypothesized to be usable as an accurate proxy for molecular sequence preservation. Yet little empirical research to date has examined how the biomolecular histology of fossil and sub-fossil specimens varies with degree of sequence preservation.
Biomolecular Histology as a Novel Proxy for Ancient Sequence Preservation
A biomolecular histological approach directly examines the interface between tissue morphology and constituent biomolecules. One approach to studying biomolecular histology is to analyze biomolecule morphology. An example is electron microscope imaging of the ~67nm banded fibrils that are the direct manifestation of type-1 collagen peptide sequences (Boatman et al., 2019; Gottardi et al., 2016; Lin, Douglas, & Erlandsen, 1993; Rabotyagova, Cebe, & Kaplan, 2008; Tzaphlidou, 2005). Another example is the microscopic imaging of cellular membranes, which are primarily the manifestation of phospholipid bilayers with associated proteins and sterols (Lamparter & Galic, 2020). In both cases, the direct morphological manifestation of biomolecules is being examined. In contrast, prior studies correlating fossil/sub-fossil tissue histology with molecular sequence preservation have generally done so using petrographic thin sections of mineralized tissue, such as whole bone (Collins et al., 2002; Hollund et al., 2017; Kontopoulos et al., 2019; van der Sluis et al., 2014). The presence of abundant biogenic and/or exogenous minerals hinders the observation of biomolecule morphology (Armitage & Anderson, 2013; Collins et al., 2002; Kontopoulos et al., 2019; van der Sluis et al., 2014), obscuring features of collagenous matrix, cellular membranes, vascular tissue, and other such structures. With light microscopy, such structures often may not be observable at all in petrographic thin sections. A biomolecular histological approach would instead isolate biomolecular tissue portions from surrounding biogenic minerals. For example, in the case of bone, collagenous matrix, blood vessels, osteocytes, and similar structures would be individually isolated from surrounding bioapatite mineral, potentially via incubation in dilute acid (Armitage & Anderson, 2013; Lindgren et al., 2018; Schweitzer, Wittmeyer, et al., 2007; Schweitzer et al., 2013; Surmik et al., 2016).
These biomolecular tissue portions would then be observable unhindered by biogenic minerals. Properties of these biomolecular structures such as flexibility, robustness, and color, among others, can then be readily observed using light microscopy, and surface structure is less obscured during electron microscope imaging (Lindgren et al., 2011; Schweitzer, Wittmeyer, et al., 2007; Schweitzer et al., 2005; Surmik et al., 2016). For example, the absence of ~67nm banding in the type-1 collagen fibrils of demineralized collagenous matrix directly evidences a substantial degree of collagen sequence degradation (Carrilho et al., 2009; Hashimoto et al., 2003; Rabotyagova et al., 2008). Such changes to the fibrils indicate that the type-1 collagen peptide sequences have shifted in structure at the molecular level, which corresponds to degradation (Rabotyagova et al., 2008). The degree of change is hypothesized to correspond to the degree of chemical sequence degradation, although, as this manuscript discusses, this has not yet been tested for ancient specimens. By contrast, a petrographic thin section demonstrating somewhat degraded bone histology does not necessarily indicate poorly preserved collagenous matrix/peptides (although it certainly may be suggestive). Such degradation could instead be due to alteration of the biogenic apatite or to the incorporation of exogenous minerals or other contaminants. Biomolecular histology is therefore proposed as a more precise, higher-resolution proxy of molecular preservation relative to the historical methods that analyze whole mineralized hard tissue.
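As an illustration only, the presence or absence of ~67nm banding could in principle be quantified from an intensity profile traced along a fibril axis in an electron micrograph. The sketch below is not a method used in this manuscript or in the cited studies; the function name `dominant_period_nm`, the pixel size, and the synthetic profile are all hypothetical. It uses a plain discrete Fourier transform to report the strongest spatial period and the fraction of spectral power it carries.

```python
import cmath
import math


def dominant_period_nm(profile, nm_per_pixel):
    """Return (period in nm, fraction of spectral power) for the strongest
    non-zero-frequency component of an intensity profile, or None if the
    profile is exactly constant. Hypothetical helper for illustration."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [value - mean for value in profile]  # remove the DC offset

    # Power at each positive frequency index k (k cycles across the profile).
    powers = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * i / n)
            for i, c in enumerate(centered)
        )
        powers[k] = abs(coeff) ** 2

    total = sum(powers.values())
    if total == 0.0:
        return None  # no variation along the profile at all

    best_k = max(powers, key=powers.get)
    return n * nm_per_pixel / best_k, powers[best_k] / total


# Synthetic, noise-free profile (assumption): 335 pixels at 2 nm per pixel,
# i.e. a 670 nm trace containing exactly ten 67 nm repeats.
NM_PER_PIXEL = 2.0
profile = [
    1.0 + 0.5 * math.cos(2 * math.pi * (i * NM_PER_PIXEL) / 67.0)
    for i in range(335)
]

period, fraction = dominant_period_nm(profile, NM_PER_PIXEL)
print(f"dominant period: {period:.1f} nm, power fraction: {fraction:.2f}")
```

For real micrographs, the profile would be noisy and the trace length would rarely be an exact multiple of the repeat, so the recovered period is only as fine as the frequency spacing allows. A low power fraction at the expected period would be one possible, untested way to flag weak or absent banding.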
Regarding the study of biomolecular histology, an alternative to examining biomolecule morphology, as described above, is to chemically localize/map biomolecular signal to histological structure. For example, time-of-flight secondary ion mass spectrometry (ToF-SIMS) can map biomolecular signals across the surface of a histological structure. Ionized fatty acids and phospholipids, along with a variety of ionic fragments (including those of proteins), can be localized/mapped across cellular membranes (Sodhi, 2004; Thiel & Sjövall, 2011; Touboul & Brunelle, 2016). Historically, studies correlating biomolecular signal with molecular sequence preservation have done so using homogenized samples (demineralized or powdered) (Campos et al., 2012; Hollund et al., 2017; Kontopoulos et al., 2019; Kontopoulos et al., 2020; Kontopoulos, Presslee, Penkman, & Collins, 2018; Leskovar, Pajnič, Geršak, Jerman, & Črešnar, 2020; van der Sluis et al., 2014). In Kontopoulos et al. (2019, 2020), organics are directly examined using FT-IR to obtain amide/phosphate signal ratios for specimens. However, these ratios are obtained from homogeneously ground bone, which precludes localizing the biomolecular signal to its histological source (Kontopoulos et al., 2019; Kontopoulos et al., 2020). A biomolecular histological approach would instead use imaging FT-IR to localize chemical signal to specific histological structures (Pan & Hu, 2019), such as collagen matrix fibrils or cellular membranes. This approach improves confidence that the chemical signal is indeed endogenous and that it originates from the target structure. For example, if FT-IR data specifically for collagen protein are desired, imaging FT-IR allows structure consistent with collagen fibrils to be analyzed directly, as opposed to analysis of homogenized samples.
This reduces the likelihood of potential contamination and allows for a more direct and replicable comparison of biomolecular structure, such as collagenous matrix or cellular membranes, across fossil/sub-fossil specimens. Again, biomolecular histology is here proposed as a novel proxy for sequence preservation that is more precise and higher resolution relative to prior methodologies such as sampling of homogenized tissue.
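The contrast between homogenized and imaging measurements can be sketched numerically. The example below is a simplified illustration under stated assumptions, not the procedure of the cited FT-IR studies: the wavenumber windows, the Gaussian band shapes, the two synthetic "pixels", and all function names are hypothetical, and no baseline correction is applied. It computes an amide/phosphate peak-height ratio per pixel and compares it with the ratio obtained after averaging the spectra, which mimics homogenization.

```python
import math

# Wavenumber windows in cm^-1. These are illustrative assumptions, not the
# band definitions used in the cited studies.
AMIDE_WINDOW = (1600.0, 1700.0)
PHOSPHATE_WINDOW = (900.0, 1200.0)


def peak_height(wavenumbers, absorbance, window):
    """Maximum absorbance inside a wavenumber window (no baseline correction)."""
    low, high = window
    return max(a for w, a in zip(wavenumbers, absorbance) if low <= w <= high)


def amide_phosphate_ratio(wavenumbers, absorbance):
    """Ratio of the amide peak height to the phosphate peak height."""
    amide = peak_height(wavenumbers, absorbance, AMIDE_WINDOW)
    phosphate = peak_height(wavenumbers, absorbance, PHOSPHATE_WINDOW)
    return amide / phosphate


def gaussian(w, center, height, width=30.0):
    """Simple Gaussian band used only to build synthetic spectra."""
    return height * math.exp(-(((w - center) / width) ** 2))


wavenumbers = [800.0 + 10.0 * i for i in range(111)]  # 800 to 1900 cm^-1


def synthetic_spectrum(amide_height, phosphate_height):
    return [
        gaussian(w, 1650.0, amide_height) + gaussian(w, 1030.0, phosphate_height)
        for w in wavenumbers
    ]


# Two hypothetical pixels from an imaging FT-IR map of one specimen.
pixels = {
    "fibril-like region": synthetic_spectrum(0.8, 0.2),
    "mineral-rich region": synthetic_spectrum(0.1, 1.0),
}

# Imaging approach: one ratio per histological location.
per_pixel_ratio = {
    name: amide_phosphate_ratio(wavenumbers, spectrum)
    for name, spectrum in pixels.items()
}

# Homogenized approach: spectra are averaged before the ratio is taken.
bulk_spectrum = [sum(values) / len(values) for values in zip(*pixels.values())]
bulk_ratio = amide_phosphate_ratio(wavenumbers, bulk_spectrum)

for name, ratio in per_pixel_ratio.items():
    print(f"{name}: {ratio:.2f}")
print(f"homogenized: {bulk_ratio:.2f}")
```

In this toy case the homogenized ratio falls between the two per-pixel values and matches neither, which is the loss of histological localization described above.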
In general, the use of biomolecular histology as a proxy relies on fewer assumptions than historical methods. Stable isotope, spectroscopic, and other analyses applied to homogenized samples generally assume (to an extent) that chemical signals arise from endogenous biomolecules (Campos et al., 2012; Chadefaux, Hô, Bellot-Gurlet, & Reiche, 2009; Hollund et al., 2017; Kontopoulos et al., 2020; Kontopoulos et al., 2018; Lebon, Reiche, Gallet, Bellot-Gurlet, & Zazzo, 2016; van der Sluis et al., 2014); in the case of homogenized bone, this signal is generally attributed to the highly abundant collagenous matrix (Hollund et al., 2017; Kontopoulos et al., 2019; Kontopoulos et al., 2020; Leskovar et al., 2020). For petrographic thin sections and other methods utilizing mineralized tissue histology, histological preservation of mineralized tissue is generally assumed to correlate, to an extent, with degree of biomolecular preservation (Collins et al., 2002; Kontopoulos et al., 2019; Nielsen-Marsh et al., 2007). For younger sub-fossil specimens that are relatively unaltered, these may well be reasonable assumptions. When dealing with increasingly ancient and/or diagenetically altered remains, however, the potential for unknown variables affecting specimen diagenesis increases (Alleon et al., 2012; Buckley, Warwood, van Dongen, Kitchener, & Manning, 2017; Hollund et al., 2017; Schweitzer, Hill, Asara, Lane, & Pincus, 2002). Furthermore, claims of endogenous organics are especially scrutinized for increasingly ancient specimens, or for those likely subjected to extensive diagenetic alteration, such as specimens from tropical and subtropical thermal settings. In such cases, making assumptions about the source of biomolecular signals is especially risky, and such claims are likely to be challenged (Alleon et al., 2012; Buckley et al., 2017; Kaye, Gaugler, & Sawlowicz, 2008; Saitta et al., 2019).
Rather than analyzing complex samples, such as whole bone sections or homogeneously ground bone, specific changes to the collagen protein fibrils of bone could instead be observed. As mentioned earlier, such observations include microscopic imaging of the ~67nm banded fibril structure formed by type-1 collagen peptide sequences (Boatman et al., 2019; Gottardi et al., 2016; Lin et al., 1993; Rabotyagova et al., 2008; Tzaphlidou, 2005), or the mapping and localization of biochemical signal to the collagen fibrils. This direct observation of the type-1 collagen fibrils themselves limits the potential for unknown variables to impact the use of such biomolecular histological data as a proxy. The result is a potentially higher-resolution, more precise proxy for ancient DNA and protein sequence preservation.
A study by Cappellini et al. (2012) used LC-MS/MS to recover peptide sequences from bones of a permafrost Mammuthus primigenius specimen (~43Ka, Yakutia, Russia) and two temperate Mammuthus columbi specimens (~11Ka, Colorado, United States; ~18Ka, Nebraska, United States). For the M. primigenius specimen, 1,139 unique peptides were recovered, matching 269 different proteins with at least one unique peptide sequence; 126 proteins were reported to match at least 2 unique peptide sequences for this specimen. In contrast, 342 and 243 unique peptides were recovered, matching 35 and 19 different proteins, from the Colorado and Nebraska M. columbi specimens, respectively. This corresponds to a >85% decrease (approximately 87% and 93%, respectively) in the number of unique proteins identified between the permafrost M. primigenius and the two temperate M. columbi. As a preliminary investigation demonstrating the concept of correlating biomolecular histology with degree of sequence preservation, bone samples relatively comparable (similar geologic age and thermal history) to those sequenced by Cappellini et al. (2012) were obtained for this study, along with an extant control. Scanning electron microscope images of demineralized bone matrix were obtained (Figure 1) from a Bos taurus long bone (extant, fresh and frozen only once, purchased from Whole Foods Market), a Beringian permafrost Mammuthus primigenius innominate fragment (Pleistocene, Little Blanche Creek, Yukon Territory, Canada, YG 610.2397), and a temperate Mammuthus columbi femur specimen (~14-15Ka in calibrated years, ~12.5Ka in radiocarbon years, Lindsay/Deer Creek, Montana, United States, MOR 91.72).
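The decrease in identified proteins and peptides quoted above can be checked with a short calculation. The counts are those given in the text for Cappellini et al. (2012); the variable and function names are illustrative only.

```python
# Counts as quoted in the text for Cappellini et al. (2012).
permafrost = {"peptides": 1139, "proteins": 269}  # M. primigenius, Yakutia
temperate = {
    "M. columbi (Colorado)": {"peptides": 342, "proteins": 35},
    "M. columbi (Nebraska)": {"peptides": 243, "proteins": 19},
}


def percent_decrease(baseline, value):
    """Percentage drop from baseline to value."""
    return 100.0 * (baseline - value) / baseline


# Decrease of each temperate specimen relative to the permafrost specimen.
decreases = {
    name: {
        measure: percent_decrease(permafrost[measure], count)
        for measure, count in counts.items()
    }
    for name, counts in temperate.items()
}

for name, result in decreases.items():
    print(
        f"{name}: proteins -{result['proteins']:.1f}%, "
        f"peptides -{result['peptides']:.1f}%"
    )
```

Both protein decreases exceed 85% (about 87% for Colorado and 93% for Nebraska), while the corresponding decreases in unique peptides are smaller, at roughly 70% and 79%.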