Fragment-based Linear Scaling Computation Methods
QM methods have been widely applied in chemistry because, unlike
molecular mechanical (MM) methods, they explicitly evaluate electron
interactions. However, due to the steep computational scaling associated with
system size, it is difficult, or even impossible, to perform quantum
calculations for large molecular systems, such as biomolecules that
contain hundreds or thousands of atoms. The desire to study systems
larger than what was computationally feasible led to the development of
novel methods, including QM/MM methods, semi-empirical approaches, and
reduced-scaling methods.11 Another approach is linear
scaling,24 a thriving field of research into the efficient
calculation of large molecular systems with only modest requirements
for memory and CPU time. Some linear scaling methods rely on screening
or approximating electron-repulsion integrals,25-30 such as the fast multipole method25 and linear scaling
exchange29; however, they do not split the whole
molecule into fragments and can only achieve linear scaling for
one-dimensional systems, such as alkane chains. In contrast, FLSMs,
another important class of linear scaling methods, can achieve true linear
scaling for 3D systems, such as proteins.10 These
represent examples of the significant progress made in developing and
applying new fragmentation methods.31-40
Li et al.32 grouped FLSMs into density- and
energy-based methods. In the present study, we divided all FLSMs into
‘overlapping’ and ‘disjoint’ methods according to fragment formation.
FLSMs such as MFCC, FMO, generalized energy-based fragmentation (GEBF),
the molecular-tailoring approach (MTA), the kernel energy method, and X-Pol
(previously referred to as MODEL) achieve strictly linear scaling and have
received increased attention.10,14,15,32,41-48 For
example, FMO was used by Heifetz et al.49,50 to
investigate agonist–orexin-2 receptor interactions and optimize
interleukin-2-inducible T cell kinase inhibitors. Additionally, MFCC was
used by Liu et al.51 for geometry optimization and
vibrational spectrum calculation of proteins, and Singh et al.52 used the MTA method to estimate binding energies
for large water clusters.
Performing an FLSM calculation usually requires three steps. The first
step is dividing the large molecular system into subsystems, which may
or may not contain buffer atoms, depending on the method.
Ideally, the fragmentation preserves the local
interactions of every fragment. Second, input files for an appropriate
quantum chemistry program are prepared and used for the subsystem
calculations. The final step is assembling the results of the subsystem
calculations to obtain the original system’s properties, such as
charge, energy, or energy gradient. Performing FLSM calculations
correctly and efficiently is not straightforward, especially the molecular
fragmentation step, which is cumbersome. Combining the three steps into
a single, automated solution in one platform or software package would
significantly lower the barrier to using FLSMs.
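As an illustration only, the three steps above can be sketched in Python as follows. The function names, the signed-fragment representation, and the additive toy energy are assumptions made for this sketch; they are not the implementation described in this study. In a real workflow, run_subsystems would generate input files and run a quantum chemistry program.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a fragment is (sign, unit indices). The sign
# lets overlapping schemes subtract regions that were counted twice.
Fragment = Tuple[int, Tuple[int, ...]]


def fragment_chain(n_units: int, size: int) -> List[Fragment]:
    """Step 1: split a linear chain of n_units into disjoint blocks of `size` units."""
    return [(+1, tuple(range(start, min(start + size, n_units))))
            for start in range(0, n_units, size)]


def run_subsystems(fragments: Sequence[Fragment],
                   energy_fn: Callable[[Tuple[int, ...]], float]) -> List[float]:
    """Step 2: evaluate every subsystem (stand-in for running a QM program)."""
    return [energy_fn(units) for _, units in fragments]


def assemble(fragments: Sequence[Fragment], energies: Sequence[float]) -> float:
    """Step 3: combine the signed subsystem energies into a total energy."""
    return sum(sign * energy for (sign, _), energy in zip(fragments, energies))


def toy_energy(units: Tuple[int, ...]) -> float:
    # Purely additive toy model (-1.0 per unit); not a physical energy.
    return -1.0 * len(units)


frags = fragment_chain(n_units=10, size=4)
energies = run_subsystems(frags, toy_energy)
total = assemble(frags, energies)
print(len(frags), total)
```

Because the toy energy is strictly additive, the assembled total equals the energy of the unfragmented chain; for a real system, the quality of the result depends on how well the fragmentation captures local interactions.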
We chose two FLSMs that belong to different subclasses, MFCC and FMO,
for implementation in this study. MFCC was proposed
in 2003 by Zhang et al.14 and is an
inclusion-exclusion principle-based method that belongs to the
‘overlapping’ FLSM subclass. It is ideally suited for calculations
involving large biomolecules, such as calculations of ligand–protein binding energies.
Other methods, including generalized (G)MFCC/MM and
electrostatically embedded (EE)-GMFCC,14,53,54 have been developed from the MFCC method; however, no platform or software
suite that simplifies the use of the MFCC method in research is currently
available. In this study, we constructed an automated process to make
MFCC a useful tool, especially for users unfamiliar with such methods.
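The inclusion-exclusion idea behind MFCC can be illustrated with a short Python sketch: the energies of the capped fragments are summed, and the energies of the conjugate caps (concaps), which would otherwise be counted twice, are subtracted. The function name, the restriction to a linear chain with one concap per cut bond, and the numerical values are assumptions made for illustration; they do not come from this study.

```python
from typing import Sequence


def mfcc_energy(capped_fragment_energies: Sequence[float],
                concap_energies: Sequence[float]) -> float:
    """Inclusion-exclusion sum: add the capped fragments, subtract the concaps.

    Assumes a linear chain cut into N fragments, so there are N - 1 cut
    bonds and one concap per cut.
    """
    if len(concap_energies) != len(capped_fragment_energies) - 1:
        raise ValueError("expected one concap per cut bond (N - 1 for N fragments)")
    return sum(capped_fragment_energies) - sum(concap_energies)


# Made-up energies in hartree, for illustration only.
fragments = [-10.0, -12.0, -11.0]
concaps = [-3.0, -4.0]
total = mfcc_energy(fragments, concaps)
print(total)
```

Additional cuts, for example across disulfide bridges, would add further concap terms and are not handled by this simplified check.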
Kitaura and co-workers originally proposed the FMO method, which belongs
to the ‘disjoint’ FLSM subclass, in 1999 to calculate the energies of
large molecular systems, with properties obtained from a many-body
expansion over fragment calculations.15,55-59 FMO is a
well-established tool for calculating energies and other properties,
optimizing structures, studying time evolution with molecular dynamics
(MD), and investigating interactions in large molecular
systems.10,57,60 Several improved methods have been
developed based on FMO, including effective fragment potential
FMO,61 the FMO/polarizable continuum model
(PCM),62 and FMO with long-range corrected density
functional tight binding.63 Additionally, FMO has been
implemented in several programs, including General Atomic and Molecular
Electronic Structure System [GAMESS (US)], ABINIT-MP, OpenFMO, and
parallelized ab initio calculation system
(PAICS).64-67 GAMESS (US) incorporates a majority of
FMO-related methods and is a widely used package for FMO
research.57 OpenFMO is an open-architecture program
targeting effective FMO calculations on massively parallel computers,
especially GPU-accelerated computers.67 The
availability of graphical user interfaces makes it relatively easy to
prepare FMO calculations and visualize
results.34,40,68 For example, FragIt is used to
prepare input files for FMO calculations in GAMESS, but it cannot use
HPCs to accelerate the computation, and its ability to analyze
results is limited.35 Additionally, Facio is
used for FMO input-file preparation for PC-GAMESS; however, its
implementation as a Windows application restricts its
use.69 BioStation Viewer and PAICSView are user
interfaces for ABINIT-MP and PAICS, respectively, and both have limited
ability to interact with HPCs. In the present study, we implemented
the full FMO method of GAMESS into GridMol to allow users to prepare FMO
input files easily and use HPCs to accelerate the computation process.
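The many-body expansion underlying FMO, mentioned above, is commonly truncated at the two-body level (FMO2), where the total energy is approximated as the sum of the monomer energies plus a pair correction for each dimer. The Python sketch below illustrates only this summation. The function name, the dictionary layout, and the numbers are hypothetical. In an actual FMO calculation, the monomer and dimer energies are computed in the electrostatic field of the remaining fragments by a program such as GAMESS.

```python
from itertools import combinations
from typing import Dict, Tuple


def fmo2_energy(monomer: Dict[int, float],
                dimer: Dict[Tuple[int, int], float]) -> float:
    """Two-body expansion: E = sum_I E_I + sum_{I<J} (E_IJ - E_I - E_J)."""
    total = sum(monomer.values())
    for i, j in combinations(sorted(monomer), 2):
        # Pair correction: the dimer energy minus its two monomer energies.
        total += dimer[(i, j)] - monomer[i] - monomer[j]
    return total


# Made-up energies in hartree, for illustration only.
monomers = {1: -1.0, 2: -2.0, 3: -3.0}
dimers = {(1, 2): -3.5, (1, 3): -4.0, (2, 3): -5.25}
total = fmo2_energy(monomers, dimers)
print(total)
```

In this toy data set, fragments 1 and 3 do not interact (their dimer energy equals the sum of the two monomer energies), so only the (1, 2) and (2, 3) pairs contribute corrections.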