Below you will find a short summary of several of my graduate school research projects. In each case, the relevant literature citation is provided for further information. Please feel free to contact me with questions.
Under construction!!!
The Dual-Basis Idea
Self-consistent field (SCF) calculations form the basis of electronic structure theory. Though relatively primitive, the Hartree-Fock (HF) method is the backbone of nearly all more accurate correlated wavefunctions. Kohn-Sham density functional theory (DFT) is also based on this formalism.
In an SCF calculation, molecular orbitals are typically expanded in a basis of atom-centered Gaussian basis functions (AOs), such as the commonly used Pople-style (6-31G*) and Dunning-style (cc-pVDZ) basis sets. The computational bottleneck for most systems becomes the formation of the two-electron repulsion integrals, formally an $O(N^4)$ process, where $N$ is the size of the basis. Due to the decay of the one-electron density matrix (with which these integrals are contracted), this quartic scaling naturally reduces to quadratic scaling for many systems. Many methods have further reduced this scaling to linear for large systems. Therefore, the cost with respect to the number of atoms in the system practically scales somewhere between quadratic and linear.
Unfortunately, the prefactor in front of this scaling involves the size of the AO basis. And this prefactor does scale quartically with the size of the AO basis. SCF and, in particular, correlated calculations converge slowly with respect to the size of this basis set, so accurate calculations can still be very expensive, despite the presence of "linear scaling" methods.
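To give a feel for this quartic growth, here is a minimal Python sketch that counts the symmetry-unique two-electron integrals for a single water molecule in three Dunning basis sets. The function names are hypothetical, the basis-function counts assume spherical-harmonic functions, and the count ignores any screening or integral batching that a real program would use.

```python
def n_unique_eri(n_basis: int) -> int:
    """Count symmetry-unique integrals (ij|kl) for real basis functions.

    Uses the eightfold permutational symmetry: first count unique index
    pairs (ij), then unique pairs of those pairs.
    """
    n_pairs = n_basis * (n_basis + 1) // 2
    return n_pairs * (n_pairs + 1) // 2


# Assumed basis-function counts for one H2O molecule (spherical functions).
WATER_BASIS_SIZES = {"cc-pVDZ": 24, "cc-pVTZ": 58, "cc-pVQZ": 115}


def growth_table(sizes: dict) -> dict:
    """Integral count in each basis relative to the smallest basis."""
    reference = n_unique_eri(min(sizes.values()))
    return {name: n_unique_eri(n) / reference for name, n in sizes.items()}


if __name__ == "__main__":
    for name, factor in growth_table(WATER_BASIS_SIZES).items():
        n = WATER_BASIS_SIZES[name]
        print(f"{name:8s} N = {n:4d}  unique integrals = {n_unique_eri(n):>10d}  "
              f"relative cost = {factor:7.1f}")
```

Going from double-zeta to quadruple-zeta increases the basis by less than a factor of five, yet the formal integral count grows by a factor of several hundred for the same molecule.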
In fact, many improvements in the subsequent correlation calculation--such as RI-MP2--have left the underlying SCF the computational bottleneck. As a practical example, consider an MP2 calculation of an alanine tetrapeptide using the cc-pVQZ basis. The combination of so-called "local" methods with the RI approximation (RI-TRIM-MP2) reduces the cost of the correlation calculation to a mere 2 hours. However, the underlying SCF to obtain the MOs took 6 days!
The dual-basis SCF (DB-SCF) method is a simple, cost-effective alternative designed to tackle this basis set size scaling. Essentially perturbation theory for the basis set problem, DB-SCF combines a full, iterative calculation in a small basis set with a subsequent non-iterative correction in the target basis.
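As a rough sketch of the non-iterative correction, the dual-basis energy is commonly written as a first-order expansion about the small-basis density. The symbols below are conventional notation assumed for illustration, not taken from this page: P is the converged small-basis density matrix projected into the target basis, F[P] is the target-basis Fock matrix built from it, and P' is the density matrix obtained from a single diagonalization of F[P].

```latex
E_{\mathrm{DB}} \approx E[\mathbf{P}]
  + \mathrm{Tr}\!\left[\left(\mathbf{P}' - \mathbf{P}\right)\mathbf{F}[\mathbf{P}]\right]
```

Only one Fock build and one diagonalization are needed in the target basis, which is where the savings come from.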
[fig here]
Dual-Basis MP2:
Second-order perturbation theory (MP2) is the simplest means to add electron correlation to a HF calculation:
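For reference, the standard textbook form of the MP2 energy in a spin-orbital basis is sketched below. The notation is assumed rather than taken from this page: i, j label occupied orbitals, a, b label virtual orbitals, the epsilons are HF orbital energies, and the bracket is an antisymmetrized two-electron integral.

```latex
E_{\mathrm{MP2}} = E_{\mathrm{HF}} + E^{(2)}, \qquad
E^{(2)} = -\frac{1}{4} \sum_{ijab}
  \frac{\left|\langle ij \| ab \rangle\right|^{2}}
       {\varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j}
```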
As mentioned above, many recent developments in perturbative correlation calculations--in particular, the "resolution-of-the-identity" (RI) approximation--have made the correlation part of these methods faster than the underlying SCF calculation for many systems of interest. Thus, the DB-HF approximation is well-suited for these correlation calculations. In a DB-RI-MP2 calculation, the reference HF calculation is replaced by its DB counterpart:
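Schematically, and in notation assumed here for illustration, the substitution can be written as follows. The second-order term is evaluated with the orbitals and orbital energies from the one-step target-basis diagonalization, and the RI approximation expands each four-index integral in an auxiliary basis with functions P and Q.

```latex
E_{\mathrm{DB\text{-}RI\text{-}MP2}} = E_{\mathrm{DB\text{-}HF}} + E^{(2)}_{\mathrm{RI}}, \qquad
(ia|jb) \approx \sum_{PQ} (ia|P)\,\left[(P|Q)\right]^{-1}\,(Q|jb)
```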
The DB approximation leads to speedups of a factor of 8-10 in the SCF, while the RI approximation drastically reduces the cost of the correlation calculation. Overall savings for accurate, large-basis calculations were demonstrated to be on the order of 95%.
Most importantly, this speed comes with a marginal tradeoff in accuracy. On the G3 set of 225 molecules, for example, atomization energy errors were on the order of only a few hundredths of a kcal/mol--for quantities that range from tens to thousands of kcal/mol across the set.
[table here]
Furthermore, errors in the relative conformational energies of alanine tetrapeptides were shown to be only 0.014 kcal/mol, and the results were qualitatively consistent with full, target-basis MP2 calculations.
Reference:
"Dual-basis second-order Moller-Plesset perturbation theory: A reduced-cost reference for correlation calculations"
R. P. Steele, R. A. DiStasio, Jr., Y. Shao, J. Kong, and M. Head-Gordon. J. Chem. Phys. 125 074108 (2006).
Dual-Basis HF/DFT First Derivatives:
Optimizing molecular structures and performing molecular dynamics simulations both require first derivatives of the potential energy surface with respect to the positions of the nuclei. Qualitatively, analytic derivatives require two things: derivatives of the two-electron integrals and derivatives of the MOs. For standard SCF calculations, the orbitals are fully optimized, so the latter is not required (within linear response). Thus, only a single set of integral derivatives is required.
Dual-basis SCF methods, however, deal with unconverged orbitals. Thus, a response term is required in the gradient. Though seemingly a fatal complication, this response term can be solved entirely in the small basis. The savings in the integral derivative term (stemming from the structure of the density matrix), combined with the savings in the underlying energy calculation, still lead to cost speedups in the gradient by a factor of 3-5.
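The distinction can be sketched with a schematic chain rule. The notation is assumed for illustration and is not the working equation of the method: x is a nuclear coordinate and C holds the MO coefficients.

```latex
\frac{dE}{dx} =
  \left.\frac{\partial E}{\partial x}\right|_{\mathbf{C}}
  + \sum_{\mu p} \frac{\partial E}{\partial C_{\mu p}}\,
                 \frac{\partial C_{\mu p}}{\partial x}
```

For converged SCF orbitals the energy is stationary with respect to orbital variations, so the second term requires no orbital derivatives beyond the orthonormality contribution carried by overlap derivatives. For the one-step dual-basis orbitals the energy is not stationary, and the second term survives as the response contribution described above.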
[Fig here]
As might have been guessed from the promising energy results above, errors in molecular structures due to the DB approximation are quite tolerable. Relative to target-basis structures, errors are on the order of a few thousandths of an Angstrom. Furthermore, when compared against experimental values, DB and target-basis structures are essentially indistinguishable. (The 6-31G/6-31G** pairing systematically--though fortuitously--outperforms its single-basis counterpart.) Therefore, for a negligible tradeoff in accuracy, molecular geometries may be obtained roughly 35-75% faster.
[Tables here]
For some perspective, we can contrast DB-SCF derivatives with run-of-the-mill SCF and a standard dynamics method: Car-Parrinello Molecular Dynamics (CPMD). In the latter, only single SCF steps are taken--akin to our method--but a "fictitious mass" is carried along with the electrons to (roughly) keep them converged. In standard SCF derivatives, we require an iterative energy calculation, followed by a "one-shot" gradient. In CPMD, the energy is a "one-shot", whereas the gradient should be iterative. The fictitious mass approximately makes this true, leading to efficient dynamics (energy conservation be damned). In DB-SCF, we have the seemingly counter-productive situation involving an iterative calculation in both the energy and gradient. The saving grace is that both iterative calculations may be performed in the smaller basis set, whereas only a (cheap) single step may be taken in the target basis set.
Reference:
"Dual-basis Analytic Gradients: 1. Self-Consistent Field Theory"
R. P. Steele, Y. Shao, R. A. DiStasio, Jr., and M. Head-Gordon. J. Phys. Chem. A 110 13915 (2006).
Dual-Basis Pairings for 6-31G* Calculations:
While the above results for analytic gradients showed promise for large target basis sets, the timing performance for small basis sets (of the 6-31G* ilk) was not stellar. The cost of the underlying small basis set (for both SCF and response calculations) was simply too high. Given that the ratio of basis set size was close to unity--as opposed to more successful pairings, where the ratio was 2-3--some work remained to be done in this area. With only a single set of polarization (d) functions to eliminate, 6-31G* was not an ideal candidate for dual-basis implementation, yet its widespread use compelled us to try.
The solution for this regime was, in essence, to make 6-31G smaller! Its "31" split-valence structure was re-contracted into a 6-4G minimal basis. Importantly, the exponents and the necessary relative weightings were retained so that it remained a proper subset of 6-31G by primitives. As such, the large-basis steps remain identical in cost to a 6-31G/6-31G* pairing, but the small-basis steps are reduced to a minimal-basis calculation.
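The effect on the basis-size ratio can be illustrated with a small Python sketch. The per-atom function counts are assumptions made for this example (Cartesian d functions for 6-31G*, and a first-row minimal basis of five functions per heavy atom for 6-4G), and alanine is used only as a convenient test molecule.

```python
# Assumed number of contracted basis functions per atom in each basis.
FUNCTIONS_PER_ATOM = {
    "6-4G":   {"H": 1, "C": 5,  "N": 5,  "O": 5},   # minimal: 1s, 2s, 2p
    "6-31G":  {"H": 2, "C": 9,  "N": 9,  "O": 9},   # split valence
    "6-31G*": {"H": 2, "C": 15, "N": 15, "O": 15},  # adds 6 Cartesian d
}


def basis_size(basis: str, formula: dict) -> int:
    """Total number of basis functions for a molecular formula."""
    per_atom = FUNCTIONS_PER_ATOM[basis]
    return sum(per_atom[element] * count for element, count in formula.items())


def size_ratio(small: str, target: str, formula: dict) -> float:
    """Ratio of target-basis size to small-basis size."""
    return basis_size(target, formula) / basis_size(small, formula)


ALANINE = {"C": 3, "H": 7, "N": 1, "O": 2}  # C3H7NO2

if __name__ == "__main__":
    for small in ("6-31G", "6-4G"):
        ratio = size_ratio(small, "6-31G*", ALANINE)
        print(f"{small}/6-31G*: {basis_size(small, ALANINE)} -> "
              f"{basis_size('6-31G*', ALANINE)} functions, ratio {ratio:.2f}")
```

Under these assumptions the 6-31G/6-31G* ratio for alanine is about 1.5, while the 6-4G/6-31G* ratio is about 2.8, which falls in the range of the more successful pairings.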
The basis set was, therefore, re-optimized to fit the energies of a small set of atomic and (small) molecular systems, similar to the original optimization of 6-31G(*). The resulting basis was tested for reaction energies, molecular structures, and even harmonic frequencies. While DB results were not exact reproductions of the target basis, they effectively captured the lion's share of the difference between single-basis calculations in either basis set alone.
[Fig here]
Timings were also much improved. For the glycine hexadecapeptide, the cost of a 6-4G/6-31G** derivative calculation is reduced to less than half the cost of the SCF alone in the target basis set.
[Fig here]
Reference:
"Dual-basis self-consistent field methods: 6-31G* calculations with a minimal 6-4G primary basis"
R. P. Steele and M. Head-Gordon. Mol. Phys. 105 2455 (2007) [Peter Pulay Special Issue].
Photochemical Dynamics of Co(CO)3NO:
Summary here.
Dual-Basis Pairings for Augmented Basis Sets & Application to PDI Dimer:
While we typically view chemistry as the making and breaking of covalent bonds, much of chemistry (and biology and materials...) is dictated by non-covalent interactions. The properties of liquid water, for example, are strongly dictated by the interaction of the dipole moments of water molecules. These same interactions are responsible for the solvation of biomolecules.
Describing such interactions from a quantum chemical perspective, however, is extremely challenging. With several subtle, competing components, non-covalent interactions typically necessitate accurate electron correlation and, therefore, large basis sets. Dispersion interactions, for example, are inherently an electron correlation effect that cannot be described by mean-field calculations.
The fact that calculations for these systems typically reside in the large basis set regime makes them particularly attractive targets for dual-basis methods. Since large-basis MP2 is a standard routine for non-covalent complexes, the dual-basis analogue provides a cost-effective tool for systems that are otherwise intractable.
The long-range nature of these interactions makes diffuse ("augmented") basis sets appropriate and often more quickly convergent than higher angular momentum functions. Accordingly, we constructed and exhaustively tested pairings for the Dunning-style aug-cc-pV{D,T,Q}Z series of basis sets. The balanced structure of these basis sets--along with their associated dual-basis truncations--is shown below:
[Fig here]
Reference:
"Non-Covalent Interactions with Dual-Basis Methods: Pairings for Augmented Basis Sets"
R. P. Steele, R. A. DiStasio, Jr., and M. Head-Gordon. J. Chem. Theor. Comput. 5 1560 (2009).
Dual-Basis RI-MP2 Analytical Gradient (Rob DiStasio):
Summary here.