Abstract
We introduce a new R package, ‘MrIML’ (Multi-response Interpretable
Machine Learning). MrIML provides a powerful and interpretable framework
that enables users to harness recent advances in machine learning to map
multi-locus genomic relationships, to identify loci of interest for
future landscape genetics studies and to gain new insights into
adaptation across environmental gradients. Relationships between genetic
change and environment are often non-linear, interactive and
autocorrelated. Our package helps capture this complexity and offers
functions that construct, fit and conduct inference on a wide range of
highly flexible models that are routinely used for single-locus
landscape genetics studies but are rarely extended to estimate response
functions for multiple loci. To demonstrate the package’s broad
functionality, we test its ability to recover landscape relationships
from simulated genomic data. We also apply the package to two empirical
case studies. In the first, we estimate variation in the population-level
genetic composition of North American balsam poplar (Populus
balsamifera, Salicaceae); in the second, we recover individual-level
landscapes while estimating host drivers of feline immunodeficiency
virus genetic spread in bobcats (Lynx rufus). The ability to model
thousands of loci collectively and compare models from linear regression
to extreme gradient boosting, within the same analytical framework, has
the potential to be transformative. The MrIML framework is also
extendable and not limited to mapping genetic change: for example, it
can be used to quantify the environmental drivers of microbiomes and
coinfection dynamics.