6.6 ML-enhanced conformational sampling
Advances in ML-derived force fields promise to revolutionize classical simulations by defining energy landscapes directly from more accurate quantum mechanical calculations [159, 160]. Moreover, in particle-based molecular simulations, the efficient sampling of high-dimensional conformational spaces remains a significant challenge in the computational molecular sciences, limiting long-timescale molecular dynamics (MD) simulations of molecular systems in biophysical chemistry and materials science. Combining MD simulations with ML provides a powerful approach to address these challenges [161]. The last decade has seen significant advances in the use of electronic structure calculations to train ML potentials for atomistic simulations, enabling large system sizes and long timescales to be reached with accurate and reliable energies and forces. More recently, ML approaches have proved useful in learning high-dimensional free energy surfaces [162, 163] and in providing a low-dimensional set of collective variables (CVs) [164]. Some examples are discussed below.
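To make the idea of an ML potential concrete, the following is a minimal, illustrative sketch (in PyTorch, with synthetic placeholder descriptors and energies rather than any published model or dataset): a small neural network is fit to reference energies associated with atomic-environment descriptors, and forces then follow from the learned energy by automatic differentiation.

```python
import torch
import torch.nn as nn

# Synthetic stand-ins for descriptors and reference energies; in practice these
# would come from electronic-structure (e.g., DFT) calculations.
torch.manual_seed(0)
n_configs, n_features = 512, 8
X = torch.randn(n_configs, n_features)            # per-configuration descriptors
E_ref = (X ** 2).sum(dim=1, keepdim=True)         # placeholder "reference energies"

# Simple feed-forward potential: descriptors -> scalar energy.
model = nn.Sequential(
    nn.Linear(n_features, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(200):                          # fit energies only, for brevity
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), E_ref)
    loss.backward()
    opt.step()

# Forces follow from the learned energy by automatic differentiation
# (here with respect to the descriptors, purely for illustration).
x = X[:1].clone().requires_grad_(True)
energy = model(x).sum()
forces = -torch.autograd.grad(energy, x)[0]
```

In practice the descriptors, architectures, and training targets (energies and forces from electronic structure calculations) are considerably more elaborate than in this toy example.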
6.6.1 Boltzmann generators: The primary difficulty in sampling physical realizations, or microstates, of a system from the Boltzmann distribution lies in the nature of the potential energy. In large, complex systems, the conformational space comprises the positions of hundreds of thousands to millions of atoms. The potential energy is best viewed as a vast, rugged landscape in this high-dimensional space, characterized by an exponentially large number of low-energy regions, or minima, all separated by ridges. It is now possible to train a deep neural network to learn a transformation from the conformational space to another variable space in which the variables follow a simple distribution, such as a Gaussian. One can then back-map, through the inverse transformation, onto a high-probability region of the original conformational space [161].
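The core ingredient of such a network is an invertible transformation with a tractable Jacobian. The sketch below (a single affine-coupling block in PyTorch, operating on placeholder data) only illustrates this mechanism; actual Boltzmann generators [161] stack many such layers and additionally train against the Boltzmann weights of generated configurations, which is omitted here.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One invertible affine-coupling block: half of the coordinates are
    rescaled and shifted conditioned on the other half, so both the inverse
    and the log-determinant of the Jacobian are cheap to evaluate."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.Tanh(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):                       # configuration -> latent
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=1), s.sum(dim=1)

    def inverse(self, z):                       # latent -> configuration (back-mapping)
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(z1).chunk(2, dim=1)
        x2 = (z2 - t) * torch.exp(-s)
        return torch.cat([z1, x2], dim=1)

# Maximum-likelihood training on sampled configurations (placeholder data here)
# pushes the mapped variables toward a standard Gaussian in latent space.
dim = 10
flow = AffineCoupling(dim)
x = torch.randn(256, dim)                       # placeholder configurations
z, log_det = flow(x)
nll = 0.5 * (z ** 2).sum(dim=1) - log_det       # negative log-likelihood under N(0, I), up to a constant
nll.mean().backward()

# New configurations are then proposed by sampling z ~ N(0, I) and back-mapping.
with torch.no_grad():
    proposals = flow.inverse(torch.randn(256, dim))
```

Training by maximum likelihood pushes sampled configurations toward a standard Gaussian in the latent space; drawing latent vectors from that Gaussian and applying the inverse map then proposes configurations in high-probability regions of the original conformational space.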
6.6.2 ML-enabled enhanced conformational sampling: The choice of appropriate collective variables (CVs, also called order parameters in the earlier section) for enhanced sampling methods such as metadynamics remains a challenge. Recent advances have enabled supervised ML tools to define appropriate CVs, with the ML-identified features then used as CVs for enhanced sampling simulations [165]. Another route to ML-identified CVs is through variational autoencoders [166], deep neural networks that perform dimensionality reduction similar in spirit to principal component analysis. The neural encoder maps a high-dimensional input vector to a lower-dimensional latent vector; the neural decoder then takes the latent variable as input and attempts to reconstruct the original high-dimensional input, trained by minimizing a reconstruction loss with standard optimization techniques [167].
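As an illustration of this encoder-decoder structure, the following is a minimal variational-autoencoder sketch in PyTorch; the input dimensions and features are hypothetical stand-ins for internal coordinates or other features extracted from MD frames, and the two-dimensional latent code plays the role of the learned CVs.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal variational autoencoder; the low-dimensional latent code plays
    the role of the learned collective variables."""

    def __init__(self, n_features, n_latent=2, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.Tanh())
        self.to_mu = nn.Linear(hidden, n_latent)
        self.to_logvar = nn.Linear(hidden, n_latent)
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, hidden), nn.Tanh(),
            nn.Linear(hidden, n_features),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(x, x_rec, mu, logvar):
    recon = nn.functional.mse_loss(x_rec, x)                       # reconstruction term
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # regularization toward N(0, I)
    return recon + kl

# Hypothetical usage: x would hold features (e.g., internal coordinates) of MD frames.
x = torch.randn(128, 30)
model = VAE(n_features=30)
x_rec, mu, logvar = model(x)
vae_loss(x, x_rec, mu, logvar).backward()
```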
6.6.3 ML-enhanced adaptive path sampling: While enhanced sampling methods are efficient at exploring the landscape along pre-defined CVs, new variables are often discovered during sampling. Such newly relevant variables are frequently orthogonal to the original CVs and cannot easily be incorporated adaptively into the free energy landscape. One approach is to combine enhanced sampling such as metadynamics with path sampling and to introduce the newly identified CVs into the path sampling through a path action [168]. In this approach, path sampling proceeds by adaptively modifying the path action, and the free energy landscape based on the original CVs can be refined iteratively. The path-action formulation is also easily extended to include other ML strategies, such as reinforcement learning, to guide the system along non-Boltzmann paths.
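Schematically, the path-action idea can be sketched as follows; the quadratic "dynamical" term, the bias along a newly identified CV, and the Metropolis-style path move below are placeholders chosen for brevity, not the specific functional forms of Ref. [168].

```python
import numpy as np

def path_action(path, cv, bias_strength):
    """Toy path action: a quadratic 'dynamical' term penalizing large steps plus
    a bias along a newly identified CV (both terms are illustrative placeholders)."""
    dynamical = 0.5 * np.sum(np.diff(path, axis=0) ** 2)
    bias = bias_strength * np.sum(cv(path))
    return dynamical + bias

def metropolis_path_move(path, cv, bias_strength, rng, step=0.1):
    """Propose a perturbed path and accept or reject it on the change in action."""
    trial = path + step * rng.standard_normal(path.shape)
    dS = path_action(trial, cv, bias_strength) - path_action(path, cv, bias_strength)
    if dS <= 0 or rng.random() < np.exp(-dS):
        return trial
    return path

# Hypothetical CV: projection of each frame onto the first coordinate.
cv = lambda frames: frames[:, 0]
rng = np.random.default_rng(0)
path = rng.standard_normal((50, 3))             # 50 frames of a 3D toy system
for _ in range(100):
    path = metropolis_path_move(path, cv, bias_strength=0.5, rng=rng)
```

Because newly identified CVs enter only through the action, the bias can be modified adaptively as sampling proceeds, and the free energy landscape over the original CVs refined iteratively, as described above.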