5.1 Enhanced sampling methods
The second law of thermodynamics states that natural systems seek a
state of minimum free energy at equilibrium. Thus, the computation of a
system’s free energy is essential in comparing the results of simulation
and experiment. Several different methods have been implemented for
calculation of the free energy of various chemical and biomolecular
systems, and here we will discuss three of the more commonly employed
techniques, namely the free energy perturbation (FEP) method [86]
and umbrella sampling [6], and metadynamics.
5.1.1 Free energy perturbation (FEP): In molecular systems, the
free energy problem is typically presented in terms of computing a free
energy difference, ΔF, between two defined thermodynamic states, for
example, a ligand-bound versus unbound molecule. The free energy
difference between the two states is expressed as [87]:
\(F=-\frac{1}{\beta}\ln\left\langle\exp[-\beta v\left(x\right)]\right\rangle_{0}\);\(\beta=\frac{1}{k_{B}T}\), (Eq. 14)
where, the subscript zero indicates configurational averaging over the
ensemble of configurations representative of the initial state of the
system, kB is the Boltzmann constant, T is the
temperature, and v (x ) is the potential energy function
that depends on the Cartesian coordinates of the system, [x ].
ΔF can also be computed by the reverse integration:
\(F=-\frac{1}{\beta}\ln\left\langle\exp[-\beta v\left(x\right)]\right\rangle_{1}\),
(Eq. 15)
where the subscript one indicates averaging over the ensemble of
configurations representative of the final state of the system. However,
for systems where the free energy difference is significantly larger, a
series of intermediate states must be defined and must differ by no more
than 2kBT. The total ΔF can then be computed by summing
the ΔFs between the intermediate states:
\(F=-\frac{1}{\beta}\sum_{i=1}^{M+1}{\ln\left\langle\exp[-\beta\left[v\left(x;\lambda_{i+1}\right)-v\left(x;\lambda_{i}\right)\right]]\right\rangle_{\lambda_{i}}}\),
(Eq. 16)
where M indicates the number of intermediate states, and λ is the
coupling parameter, a continuous parameter that marks the extent of the
transition from the initial to the final state. As λ is varied from 0
(initial state) to 1 (final state), the potential energy functionv (x; λ ) passes from v 0 tov 1.
5.1.2 Umbrella sampling: This procedure enables the calculation
of the potential of mean force (free energy density) along an a
priori chosen set of reaction coordinates or order parameters, from
which free energy changes can be calculated by numerical integration
(see for example, [13]). For the free energy calculation, the
probability distribution P(S) is calculated by dividing the range of
order parameter S into several windows. The histograms for each window
are collected by harvesting and binning trajectories in that window,
from which the potential of mean force Λ(S) is calculated; the potential
of mean force Λ(S) is given by [88, 89],
Λ(S) = −kBT ln(P(S)) + Constant; then, exp(-βΔF)=
∫exp(-βΛi(S)) dS (Eq. 17)
The functions Λ(S) in different windows are pieced together by matching
the constants such that the Λ function is continuous at the boundaries
of the windows. Thus, the arbitrary constant associated with each window
is adjusted to make the Λ function continuous. Note that Λ(S) here is
the same function as F(S) in (Eq. 6) The standard deviation in each
window of the potential of mean force calculations is estimated by
dividing the set of trajectories into two blocks and collecting separate
histograms. The calculation of the multi-dimensional potential of mean
force (multiple reaction coordinates) using the weighted histogram
analysis method (WHAM) reviewed by Roux [90], which enables an easy
and accurate recipe for unbiasing and combining the results of umbrella
sampling calculations, which simplifies considerably, the task of
recombining the various windows of sampling in complex systems and
computing ΔF.
5.1.3 Metadynamics : In metadynamics, the equations of motion are
augmented by a history-dependent potential
V(S,t)=kBΔT[1+N(S,t)/τ], where N(S,t) represents the
histograms of previously visited configurations up to time t. With this
choice of the biasing potential, the evolution equation of V is derived
and is solved together with the equation of motion. One can show that
the unbiased free energy can be constructed from the biased dynamics
using the equation F(S)=[(T+ΔT)/ΔT]V(S). Metadynamics accelerates
rare events along chosen collective variables (CV) S. Well-tempered
metadynamics (WTMD) [65, 91] is widely used to sample the large
scale configurational space between the configurations in large
biomolecular systems.
5.1.4 Methods for determining reaction paths: Path-based
methodologies seek to describe transition pathways connecting two well
defined states [92-94]; practical applications of this ideology are
available through methods such as stochastic path approach [95],
nudged elastic band [96-98], finite temperature string [99], and
transition path sampling [66, 100, 101], which each exploit the
separation in timescales in activated processes, namely, the existence
of a shorter time scale of relaxation at the kinetic bottle neck or the
transition state (τrelax), in comparison to a much
longer timescale of activation at the transition state itself
(τTS). Below, we review the path-based method of
transition path sampling.
Transition path sampling (TPS) [66, 100] aims to capture rare events
(excursions or jumps between metastable basins in the free energy
landscape) in molecular processes by essentially performing Monte Carlo
sampling of dynamics trajectories; the acceptance or rejection criteria
are determined by selected statistical objectives that characterize the
ensemble of trajectories. In transition path sampling, time-reversible
MD trajectories in each transition state region are harvested using the
shooting algorithm [101] to connect two metastable states via a
Monte Carlo protocol in trajectory space. Essentially, for a given
dynamics trajectory, the state of the system (i.e., basin A or B) is
characterized by defining a set of order parameters
χ=[χ1,χ2,…]. Each trajectory
is expressed as a time series of length τ. To formally identify a basin,
the population operator hA=1 if and only if a particular
molecular configuration associated with a time t of a trajectory belongs
to basin A; otherwise hA=0. The trajectory operator
HB=1 if and only if the trajectory visits basin B in
duration τ, i.e., there is at least one time-slice for which
hB=1; otherwise HB=0. The idea in TPS is
to generate many trajectories that connect A to B from one such existing
pathway. This is accomplished by a Metropolis algorithm that generates
an ensemble of trajectories [χτ] according to a
path action S[χτ] given by:
S[χτ]=ρ(0)hA(χ0)HB[χτ],
where ρ(0) is the probability of observing the configuration at t=0
(ρ(0)∝exp(-E(0)/kBT), in the canonical ensemble).
Trajectories are harvested using the shooting algorithm [101]: a new
trajectory χ*τ is generated from an existing one
χτ by perturbing the momenta of atoms at a randomly
chosen time t in a symmetric manner [101], i.e., by conserving
detailed balance. The perturbation scheme is symmetric, i.e., the
probability of generating a new set of momenta from the old set is the
same as the reverse probability. Moreover, the scheme conserves the
equilibrium distribution of momenta and the total linear momentum (and,
if desired, the total angular momentum). The acceptance probability
implied by the above procedure is given by Pacc=min(1,
S[χ*τ]/S[χτ]). With
sufficient sampling in trajectory space, the protocol converges to yield
physically-meaningful trajectories passing through the true transition
state (saddle) region.