6.6 ML-enhanced conformational sampling
Advances in ML-derived force fields promise to revolutionize classical simulations by defining energy landscapes directly from more accurate quantum mechanical calculations [159, 160]. Moreover, in particle-based simulations of MSMs, the efficient sampling of high-dimensional conformational spaces remains a significant challenge in the computational molecular sciences, limiting long-timescale molecular dynamics (MD) simulations of molecular systems in biophysical chemistry and materials science. Combining MD simulations with ML provides a powerful approach to address these challenges [161]. The last decade has seen significant advances in the use of electronic structure calculations to train ML potentials for atomistic simulations capable of reaching large system sizes and long timescales with accurate and reliable energies and forces. More recently, ML approaches have proved useful in learning high-dimensional free energy surfaces [162, 163] and in providing low-dimensional sets of collective variables (CVs) [164]. Some examples are discussed below.
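As an illustration of training an ML potential on energies and forces, the sketch below fits a small neural network to reference data, with forces obtained by differentiating the predicted energy with respect to atomic positions. The descriptor (raw pairwise distances), network size, and synthetic reference data are placeholder assumptions standing in for electronic structure training data; this is a schematic example, not a production ML force field.

```python
# Schematic fit of an ML potential to reference energies and forces (PyTorch).
# All data and hyperparameters below are illustrative placeholders.
import torch
import torch.nn as nn

n_atoms = 8
model = nn.Sequential(nn.Linear(n_atoms * (n_atoms - 1) // 2, 64), nn.Tanh(),
                      nn.Linear(64, 1))                       # descriptors -> energy

def descriptors(pos):
    # Pairwise interatomic distances as a simple toy descriptor.
    diff = pos.unsqueeze(1) - pos.unsqueeze(0)
    d = diff.norm(dim=-1)
    i, j = torch.triu_indices(n_atoms, n_atoms, offset=1)
    return d[i, j]

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# Placeholder reference data; in practice these come from electronic structure runs.
ref_pos = torch.randn(100, n_atoms, 3)
ref_E = torch.randn(100)
ref_F = torch.randn(100, n_atoms, 3)

for epoch in range(20):
    for pos, E_ref, F_ref in zip(ref_pos, ref_E, ref_F):
        pos = pos.clone().requires_grad_(True)
        E = model(descriptors(pos)).squeeze()
        F = -torch.autograd.grad(E, pos, create_graph=True)[0]   # forces from -dE/dR
        loss = (E - E_ref) ** 2 + ((F - F_ref) ** 2).mean()      # joint energy + force loss
        opt.zero_grad()
        loss.backward()
        opt.step()
```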
6.6.1 Boltzmann generators: The primary difficulty in sampling physical realizations, or microstates, of a system from the Boltzmann distribution lies in the nature of the potential energy. In large, complex systems, the conformational space comprises the positions of hundreds of thousands to millions of atoms. The potential energy is best viewed as a vast, rugged landscape in this high-dimensional space, characterized by an exponentially large number of low-energy regions, or minima, separated by ridges. It is now possible to train a deep neural network to learn an invertible transformation from the conformational space to a latent space in which the variables follow a simple distribution, such as a Gaussian. Samples drawn in the latent space can then be back-mapped through the inverse transformation onto high-probability regions of the original conformational space [161].
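To make the idea concrete, the sketch below trains a small invertible network (RealNVP-style affine coupling layers) so that latent Gaussian samples map to low-energy configurations of a toy two-dimensional double-well potential. The toy energy function, network sizes, and training loop are simplifying assumptions for illustration, not the Boltzmann generator implementation of Ref. [161].

```python
# Toy Boltzmann-generator-style invertible map trained by energy (PyTorch).
import torch
import torch.nn as nn

def energy(x):
    # Toy 2D potential: a double well along x[:, 0] and a harmonic term in x[:, 1].
    return (x[:, 0] ** 2 - 1.0) ** 2 + 0.5 * x[:, 1] ** 2

class Coupling(nn.Module):
    # Affine coupling layer: invertible, with a tractable Jacobian determinant.
    def __init__(self, hidden=64, flip=False):
        super().__init__()
        self.flip = flip
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2))    # outputs scale and shift

    def forward(self, z):
        za, zb = (z[:, 1:], z[:, :1]) if self.flip else (z[:, :1], z[:, 1:])
        s, t = self.net(za).chunk(2, dim=1)
        xb = zb * torch.exp(s) + t                        # transform half, condition on the rest
        x = torch.cat([xb, za], 1) if self.flip else torch.cat([za, xb], 1)
        return x, s.sum(1)                                # log|det J| of this layer

layers = [Coupling(flip=False), Coupling(flip=True)]
opt = torch.optim.Adam([p for l in layers for p in l.parameters()], lr=1e-3)

for step in range(2000):
    z = torch.randn(256, 2)                               # sample the simple latent Gaussian
    x, logdet = z, torch.zeros(256)
    for layer in layers:
        x, ld = layer(x)
        logdet = logdet + ld
    # Energy-based (KL) training: favor low-energy samples; the -logdet entropy
    # term keeps the map from collapsing onto a single minimum.
    loss = (energy(x) - logdet).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```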
6.6.2 ML-enabled conformational enhanced sampling: The choice of appropriate collective variables (CVs, also called order parameters in the earlier section) for enhanced sampling methods such as metadynamics remains a challenge. Recent advances have enabled supervised-learning tools to define appropriate CVs: in a typical pipeline, ML-identified features are used as CVs for subsequent enhanced sampling simulations [165]. Another route to ML-identified CVs is through variational autoencoders [166], deep neural networks that perform nonlinear dimensionality reduction analogous to principal component analysis. The neural encoder maps a high-dimensional input vector to a lower-dimensional latent vector; the neural decoder then takes this latent variable as input and attempts to reconstruct the original high-dimensional input, with the network trained by standard optimization of a reconstruction loss [167].
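A minimal sketch of this encoder-decoder idea is given below, assuming PyTorch; the input dimension, latent size, and training data are placeholders rather than any specific published model.

```python
# Toy variational autoencoder for extracting a low-dimensional CV from MD features.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, n_feat=30, latent=2, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_feat, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * latent))   # mean and log-variance
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_feat))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        return self.dec(z), mu, logvar

model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
features = torch.randn(4096, 30)          # placeholder for featurized MD frames

for epoch in range(50):
    recon, mu, logvar = model(features)
    recon_loss = ((recon - features) ** 2).mean()                 # reconstruction error
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()    # regularizes the latent space
    loss = recon_loss + 1e-3 * kl
    opt.zero_grad()
    loss.backward()
    opt.step()

# The encoder mean then serves as the low-dimensional CV that can be handed to
# an enhanced sampling engine such as metadynamics.
```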
6.6.3 ML-enhanced adaptive path sampling: While enhanced sampling methods are efficient at exploring the landscape along pre-defined CVs, new variables are often discovered during sampling. Such newly relevant variables are often orthogonal to the original CVs and cannot readily be incorporated adaptively into the free energy landscape. One approach is to combine enhanced sampling, such as metadynamics, with path sampling and to incorporate the newly identified CVs into the path sampling through a path action [168]. In this approach, path sampling proceeds by adaptively modifying the path action, and the free energy landscape defined by the original CVs can be refined iteratively. The path-action approach can also be customized to include other ML strategies, such as reinforcement learning, to guide the system through non-Boltzmann paths.
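The sketch below conveys the flavor of such a scheme on a toy two-dimensional double well: trial paths are generated by shooting-style moves, accepted or rejected by a Metropolis test on a path action that accumulates a bias along a newly identified CV, and the action is updated adaptively as sampling proceeds. The potential, the choice of CV, the form of the action, and the update rule are all illustrative assumptions, not the algorithm of Ref. [168].

```python
# Toy adaptive path sampling with a bias-based path action (NumPy).
import numpy as np

rng = np.random.default_rng(0)
beta, dt, n_steps = 2.0, 0.01, 400

def force(x):
    # -grad V for the toy landscape V(x, y) = (x^2 - 1)^2 + 2 y^2.
    return np.array([-4.0 * x[0] * (x[0] ** 2 - 1.0), -4.0 * x[1]])

def run_path(x0):
    # Overdamped (Brownian) dynamics trajectory started from x0.
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        path.append(x + dt * force(x) + np.sqrt(2.0 * dt / beta) * rng.normal(size=2))
    return np.array(path)

def new_cv(path):
    # A variable "discovered" during sampling (here simply the y coordinate).
    return path[:, 1]

bias_centers, bias_height, bias_width = [], 0.2, 0.3

def path_action(path):
    # Path action: bias accumulated along the new CV; lower action is favored.
    cv = new_cv(path)
    return sum(bias_height * np.exp(-(cv - c) ** 2 / (2.0 * bias_width ** 2)).sum()
               for c in bias_centers)

current = run_path(np.array([-1.0, 0.0]))
for sweep in range(200):
    trial = run_path(current[rng.integers(len(current))])   # shooting-style trial path
    d_action = path_action(trial) - path_action(current)
    if rng.random() < np.exp(-max(d_action, 0.0)):          # Metropolis test on the path action
        current = trial
    bias_centers.append(new_cv(current).mean())             # adaptive update of the action
```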