
MrIML: Multi-response interpretable machine learning to map genomic landscapes
Nichola Fountain-Jones, University of Tasmania
Corresponding Author: [email protected]
Christopher Kozakiewicz, Washington State University
Brenna Forester, Colorado State University
Erin Landguth, University of Montana
Scott Carver, University of Tasmania
Michael Charleston, University of Tasmania
Roderick Gagne
Brandon Greenwell, University of Cincinnati
Simona Kraberger, Colorado State University
Daryl Trumbo, Colorado State University
Michael Mayer, Actuarial department
Nicholas Clark, University of Queensland
Gustavo Machado, North Carolina State University

Abstract

We introduce a new R package, ‘MrIML’ (Multi-response Interpretable Machine Learning). MrIML provides a powerful and interpretable framework that enables users to harness recent advances in machine learning to map multi-locus genomic relationships, to identify loci of interest for future landscape genetics studies and to gain new insights into adaptation across environmental gradients. Relationships between genetic change and environment are often non-linear, interactive and autocorrelated. Our package helps capture this complexity: it offers functions that construct, fit and conduct inference on a wide range of highly flexible models that are routinely used in single-locus landscape genetics studies but are rarely extended to estimate response functions for multiple loci. To demonstrate the package’s broad functionality, we test its ability to recover landscape relationships from simulated genomic data. We also apply the package to two empirical case studies. In the first, we estimate variation in the population-level genetic composition of North American balsam poplar (Populus balsamifera, Salicaceae); in the second, we recover individual-level landscapes while estimating host drivers of feline immunodeficiency virus genetic spread in bobcats (Lynx rufus). The ability to model thousands of loci collectively, and to compare models ranging from linear regression to extreme gradient boosting within the same analytical framework, has the potential to be transformative. The MrIML framework is also extendable and not limited to mapping genetic change; for example, it can be used to quantify the environmental drivers of microbiomes and coinfection dynamics.
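To make the multi-response idea concrete, the following is a minimal sketch. MrIML itself is an R package; this is written in Python purely for illustration and does not reproduce the MrIML API. It assumes a single environmental covariate and one numeric response per locus (for example, population allele frequencies). Each candidate learner is fitted separately to every locus, and in-sample R² is reported per locus and learner. A one-split regression tree (a decision stump), the kind of weak learner that gradient boosting combines, stands in here for extreme gradient boosting. All function names and the toy data are hypothetical.

```python
from statistics import mean

# Hypothetical illustration only: not the MrIML (R) API.

def fit_linear(x, y):
    """Ordinary least squares fit of y = a + b*x; returns a predict function."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda v: a + b * v

def fit_stump(x, y):
    """One-split regression tree (decision stump), standing in here for the
    weak learners that gradient boosting combines; captures threshold responses."""
    best = None
    for t in sorted(set(x))[:-1]:  # every threshold that leaves both sides non-empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        ml, mr = mean(left), mean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # only one distinct covariate value: predict the mean
        m = mean(y)
        return lambda v: m
    _, t, ml, mr = best
    return lambda v: ml if v <= t else mr

def r_squared(predict, x, y):
    """In-sample coefficient of determination."""
    my = mean(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / ss_tot if ss_tot else 0.0

def fit_multi_response(env, loci, learners):
    """Fit every learner separately to every locus (response variable)
    and return {locus: {learner: R^2}}."""
    return {
        locus: {name: r_squared(fit(env, y), env, y) for name, fit in learners.items()}
        for locus, y in loci.items()
    }

# Toy data: one environmental gradient and two loci (allele frequencies).
env = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
loci = {
    "locus_linear": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],     # steady change
    "locus_threshold": [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9],  # abrupt shift
}
learners = {"linear": fit_linear, "stump": fit_stump}

results = fit_multi_response(env, loci, learners)
for locus, scores in results.items():
    print(locus, {name: round(r2, 3) for name, r2 in scores.items()})
```

On this toy data the linear model fits the locus whose allele frequency changes steadily along the gradient, while the stump fits the locus with an abrupt shift. This is the kind of locus-by-locus difference that motivates comparing linear and non-linear learners within one framework rather than committing to a single model class.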