Algorithmic detection of elemental biosignatures

Jesse Murray

doi:10.1002/essoar.10506132.1

Abstract

Machine learning (ML) models that classify a sample as non-indicative or indicative of life can play an important role in planning life-detection missions. They apply clearly defined, consistent algorithms regardless of sample type or origin, and they base their predictions on weighted combinations of multiple features rather than on any single feature. These weighted combinations can reveal which measurements are most informative within the operational constraints of a life-detection mission. The Ladder of Life Detection (Neveu et al. 2018) identifies the need to understand how combinations of multiple biosignatures affect overall confidence. The present work provides a starting point for addressing this need; future work will expand the data types to obtain even more predictive combinations of features. Elemental composition and isotope fractionation were chosen as the data types because they are available for both biogenic and abiogenic systems and are not unique to Earth biochemistry. Measurements of these data types across a wide range of unambiguously non-indicative or indicative samples were gathered from the published literature and then integrated into twenty-one representative samples. The ML models made only binary classifications (non-indicative or indicative of life); nonetheless, the indicative samples broadly fell into three categories: mixed, non-alive, and alive. Four classification algorithms were trained and tested with Monte Carlo simulations using a 70:30 training-to-validation split. Across the models, around 75% of the test samples were correctly classified, with the models differing in sensitivity and specificity. Among the elemental features, all models ranked Ti and Si as strong predictors and Fe, Al, Mn, and Mg as medium predictors of a non-indicative sample. For an indicative sample, all models ranked C, N, and carbon-13 as strong predictors and K, H, P, and Ca as medium predictors.
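The Monte Carlo evaluation described above can be sketched as repeated random 70:30 splits, with confusion counts and feature weights pooled across runs. This is a minimal sketch under stated assumptions, not the method used in this work. The four algorithms are not named here, so a nearest-centroid classifier stands in for them; it is written in its equivalent linear form (a weighted sum of features plus a bias) so the weights show how features are combined. The split is assumed to be stratified by class so every validation set contains both classes, label 1 is assumed to mean indicative of life, and the data are hypothetical synthetic values rather than the twenty-one representative samples. The helper names (`stratified_split`, `fit_linear_centroid`, `monte_carlo_eval`) are illustrative.

```python
import random


def stratified_split(labels, train_frac, rng):
    """Split indices train_frac : (1 - train_frac) within each class (assumed stratification)."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train += idx[:n_train]
        test += idx[n_train:]
    return train, test


def fit_linear_centroid(X, y, idx):
    """Stand-in model: nearest-centroid classifier written as w.x + b.

    Choosing the closer class mean is equivalent to w = mu1 - mu0 and
    b = (|mu0|^2 - |mu1|^2) / 2, so w is a weighted combination of features.
    """
    def mean(cls):
        rows = [X[i] for i in idx if y[i] == cls]
        return [sum(col) / len(rows) for col in zip(*rows)]

    mu0, mu1 = mean(0), mean(1)
    w = [a - c for a, c in zip(mu1, mu0)]
    b = (sum(c * c for c in mu0) - sum(a * a for a in mu1)) / 2
    return w, b


def predict(model, x):
    w, b = model
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0)  # 1 = indicative of life


def monte_carlo_eval(X, y, n_runs=1000, train_frac=0.7, seed=0):
    """Pool confusion counts and average feature weights over random train/validation splits."""
    rng = random.Random(seed)
    tp = fn = tn = fp = 0
    weight_sums = [0.0] * len(X[0])
    for _ in range(n_runs):
        train, test = stratified_split(y, train_frac, rng)
        model = fit_linear_centroid(X, y, train)
        weight_sums = [s + wj for s, wj in zip(weight_sums, model[0])]
        for i in test:
            pred = predict(model, X[i])
            if y[i] == 1:
                tp, fn = tp + pred, fn + (1 - pred)
            else:
                tn, fp = tn + (1 - pred), fp + pred
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # indicative samples classified as indicative
        "specificity": tn / (tn + fp),  # non-indicative samples classified as non-indicative
        "mean_weights": [s / n_runs for s in weight_sums],
    }


# Hypothetical synthetic data (NOT the study's measurements): 10 samples per class,
# three features assumed to be normalized to comparable scales.
data_rng = random.Random(42)
X, y = [], []
for label, center in ((0, [0.2, 0.2, 0.8]), (1, [0.8, 0.7, 0.2])):
    for _ in range(10):
        X.append([c + data_rng.gauss(0, 0.15) for c in center])
        y.append(label)

print(monte_carlo_eval(X, y))
```

In this sketch a positive mean weight pushes a sample toward "indicative" and a negative one toward "non-indicative"; comparing weight magnitudes is only meaningful if the features share a common scale, which is why normalization is assumed.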
A weighted combination of multiple biosignatures is shown to be a more effective approach to classifying sample data than relying on any individual biosignature or on an unweighted group of biosignatures. Different models also made different recurring misclassifications, suggesting that combining the outputs of multiple models may be more effective than relying on the output of a single model. Which type of model to use may depend on the application; for example, higher-sensitivity models might be preferred in first-pass screening, where false negatives are more costly than false positives. Lastly, the weighted combination of measurements in a model suggests how biosignatures can be combined to affect the overall confidence of a classification. These results provide evidence of elemental biosignatures beyond the CHNOPS elements of Earth-based life and serve as a proof of concept for algorithmic biosignature classification.
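The suggestion to combine the outputs of several models, and to favor sensitivity in first-pass screening, can be illustrated with a k-of-n vote over binary predictions. This is a hedged sketch: the vote rule, the helper names `vote` and `sensitivity_specificity`, and the labels and model outputs below are all hypothetical and are not results from this work. Each toy model is given a different recurring error so that the effect of combining them is visible.

```python
def vote(predictions, min_votes):
    """Label a sample indicative (1) if at least `min_votes` models call it indicative."""
    return [int(sum(col) >= min_votes) for col in zip(*predictions)]


def sensitivity_specificity(y_true, y_pred):
    """Return (true-positive rate, true-negative rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels and model outputs (1 = indicative of life); not results from this work.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 0, 1, 1, 0, 0, 0, 0]  # misses index 1
model_b = [1, 1, 1, 0, 0, 1, 0, 0]  # misses index 3, false alarm at index 5
model_c = [1, 1, 0, 1, 0, 0, 0, 1]  # misses index 2, false alarm at index 7
models = [model_a, model_b, model_c]

for k in (1, 2, 3):
    print(k, sensitivity_specificity(y_true, vote(models, k)))
```

With these toy inputs each model alone reaches a sensitivity of 0.75, while the 2-of-3 majority vote classifies all eight samples correctly. Requiring only one vote (k = 1) flags every sample that any model calls indicative, which suits a first-pass screen at the cost of specificity (0.5 here); requiring all three votes (k = 3) is conservative and drops sensitivity to 0.25.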