5.1 Enhanced sampling methods
The second law of thermodynamics states that natural systems seek a state of minimum free energy at equilibrium. Thus, the computation of a system’s free energy is essential in comparing the results of simulation and experiment. Several different methods have been implemented for calculation of the free energy of various chemical and biomolecular systems, and here we will discuss three of the more commonly employed techniques, namely the free energy perturbation (FEP) method [86] and umbrella sampling [6], and metadynamics.
5.1.1 Free energy perturbation (FEP): In molecular systems, the free energy problem is typically presented in terms of computing a free energy difference, ΔF, between two defined thermodynamic states, for example, a ligand-bound versus unbound molecule. The free energy difference between the two states is expressed as [87]:
\(F=-\frac{1}{\beta}\ln\left\langle\exp[-\beta v\left(x\right)]\right\rangle_{0}\);\(\beta=\frac{1}{k_{B}T}\), (Eq. 14)
where, the subscript zero indicates configurational averaging over the ensemble of configurations representative of the initial state of the system, kB is the Boltzmann constant, T is the temperature, and v (x ) is the potential energy function that depends on the Cartesian coordinates of the system, [x ]. ΔF can also be computed by the reverse integration:
\(F=-\frac{1}{\beta}\ln\left\langle\exp[-\beta v\left(x\right)]\right\rangle_{1}\), (Eq. 15)
where the subscript one indicates averaging over the ensemble of configurations representative of the final state of the system. However, for systems where the free energy difference is significantly larger, a series of intermediate states must be defined and must differ by no more than 2kBT. The total ΔF can then be computed by summing the ΔFs between the intermediate states:
\(F=-\frac{1}{\beta}\sum_{i=1}^{M+1}{\ln\left\langle\exp[-\beta\left[v\left(x;\lambda_{i+1}\right)-v\left(x;\lambda_{i}\right)\right]]\right\rangle_{\lambda_{i}}}\), (Eq. 16)
where M indicates the number of intermediate states, and λ is the coupling parameter, a continuous parameter that marks the extent of the transition from the initial to the final state. As λ is varied from 0 (initial state) to 1 (final state), the potential energy functionv (x; λ ) passes from v 0 tov 1.
5.1.2 Umbrella sampling: This procedure enables the calculation of the potential of mean force (free energy density) along an a priori chosen set of reaction coordinates or order parameters, from which free energy changes can be calculated by numerical integration (see for example, [13]). For the free energy calculation, the probability distribution P(S) is calculated by dividing the range of order parameter S into several windows. The histograms for each window are collected by harvesting and binning trajectories in that window, from which the potential of mean force Λ(S) is calculated; the potential of mean force Λ(S) is given by [88, 89],
Λ(S) = −kBT ln(P(S)) + Constant; then, exp(-βΔF)= ∫exp(-βΛi(S)) dS (Eq. 17)
The functions Λ(S) in different windows are pieced together by matching the constants such that the Λ function is continuous at the boundaries of the windows. Thus, the arbitrary constant associated with each window is adjusted to make the Λ function continuous. Note that Λ(S) here is the same function as F(S) in (Eq. 6) The standard deviation in each window of the potential of mean force calculations is estimated by dividing the set of trajectories into two blocks and collecting separate histograms. The calculation of the multi-dimensional potential of mean force (multiple reaction coordinates) using the weighted histogram analysis method (WHAM) reviewed by Roux [90], which enables an easy and accurate recipe for unbiasing and combining the results of umbrella sampling calculations, which simplifies considerably, the task of recombining the various windows of sampling in complex systems and computing ΔF.
5.1.3 Metadynamics : In metadynamics, the equations of motion are augmented by a history-dependent potential V(S,t)=kBΔT[1+N(S,t)/τ], where N(S,t) represents the histograms of previously visited configurations up to time t. With this choice of the biasing potential, the evolution equation of V is derived and is solved together with the equation of motion. One can show that the unbiased free energy can be constructed from the biased dynamics using the equation F(S)=[(T+ΔT)/ΔT]V(S). Metadynamics accelerates rare events along chosen collective variables (CV) S. Well-tempered metadynamics (WTMD) [65, 91] is widely used to sample the large scale configurational space between the configurations in large biomolecular systems.
5.1.4 Methods for determining reaction paths: Path-based methodologies seek to describe transition pathways connecting two well defined states [92-94]; practical applications of this ideology are available through methods such as stochastic path approach [95], nudged elastic band [96-98], finite temperature string [99], and transition path sampling [66, 100, 101], which each exploit the separation in timescales in activated processes, namely, the existence of a shorter time scale of relaxation at the kinetic bottle neck or the transition state (τrelax), in comparison to a much longer timescale of activation at the transition state itself (τTS). Below, we review the path-based method of transition path sampling.
Transition path sampling (TPS) [66, 100] aims to capture rare events (excursions or jumps between metastable basins in the free energy landscape) in molecular processes by essentially performing Monte Carlo sampling of dynamics trajectories; the acceptance or rejection criteria are determined by selected statistical objectives that characterize the ensemble of trajectories. In transition path sampling, time-reversible MD trajectories in each transition state region are harvested using the shooting algorithm [101] to connect two metastable states via a Monte Carlo protocol in trajectory space. Essentially, for a given dynamics trajectory, the state of the system (i.e., basin A or B) is characterized by defining a set of order parameters χ=[χ12,…]. Each trajectory is expressed as a time series of length τ. To formally identify a basin, the population operator hA=1 if and only if a particular molecular configuration associated with a time t of a trajectory belongs to basin A; otherwise hA=0. The trajectory operator HB=1 if and only if the trajectory visits basin B in duration τ, i.e., there is at least one time-slice for which hB=1; otherwise HB=0. The idea in TPS is to generate many trajectories that connect A to B from one such existing pathway. This is accomplished by a Metropolis algorithm that generates an ensemble of trajectories [χτ] according to a path action S[χτ] given by: S[χτ]=ρ(0)hA0)HBτ], where ρ(0) is the probability of observing the configuration at t=0 (ρ(0)∝exp(-E(0)/kBT), in the canonical ensemble). Trajectories are harvested using the shooting algorithm [101]: a new trajectory χ*τ is generated from an existing one χτ by perturbing the momenta of atoms at a randomly chosen time t in a symmetric manner [101], i.e., by conserving detailed balance. The perturbation scheme is symmetric, i.e., the probability of generating a new set of momenta from the old set is the same as the reverse probability. Moreover, the scheme conserves the equilibrium distribution of momenta and the total linear momentum (and, if desired, the total angular momentum). The acceptance probability implied by the above procedure is given by Pacc=min(1, S[χ*τ]/S[χτ]). With sufficient sampling in trajectory space, the protocol converges to yield physically-meaningful trajectories passing through the true transition state (saddle) region.