SCF convergence

vibe-qc’s SCF surface is uniform across the four molecular drivers (run_rhf / run_uhf / run_rks / run_uks). The periodic drivers (run_rhf_periodic_gamma, multi-k variants) expose the same option fields, but accelerator dispatch is currently partial outside the Γ-only direct path; see SCF accelerators below for the exact coverage matrix. The user-facing selection knobs split into four groups:

  • Fock-build mode — how the two-electron matrices are constructed each iteration (opts.scf_mode). Default is SCFMode.AUTO, which picks between in-core CONVENTIONAL and on-the-fly screened DIRECT based on basis-set size. See SCF Fock-build modes.

  • Initial guess — where the SCF starts from (opts.initial_guess). Default is InitialGuess.AUTO, which routes through the unified GuessEngine: closed-shell light-atom molecular → SAP, open-shell / transition-metal / any periodic → SAD, with HCORE / PATOM / HUECKEL / MINAO / READ available as explicit selections. The richer story (theory, math, when to use which, references) lives in Initial guesses; the brief summary here is enough to pick a knob.

  • Convergence aids — composable knobs that stack freely (damping, dynamic_damping, fock_mixing, level_shift, smearing_temperature for periodic).

  • SCF accelerators — mutually exclusive choices for how the Fock matrix is extrapolated each iteration (scf_accelerator).

  • Second-order finalizers — mutually exclusive opt-in choices for how the asymptotic regime is closed out (newton_threshold, soscf_threshold, trah_threshold, quadratic_fallback_iter).

Quick example exercising every group:

import vibeqc as vq
from vibeqc import RHFOptions

opts = RHFOptions()

# Stopping criteria
opts.max_iter = 100
opts.conv_tol_energy = 1e-8
opts.conv_tol_grad = 1e-6

# Initial guess (see docs/user_guide/initial_guess.md for details)
opts.initial_guess = vq.InitialGuess.AUTO     # AUTO / SAP / SAD / HCORE

# Convergence aids — compose freely
opts.damping = 0.5                            # static density mixing
opts.dynamic_damping = False                  # adaptive α (Z-H 1979)
opts.fock_mixing = 0.0                        # CRYSTAL FMIXING
opts.level_shift = 0.0                        # Saunders-Hillier

# SCF accelerator — pick one (molecular default: EDIIS_DIIS)
opts.scf_accelerator = vq.SCFAccelerator.EDIIS_DIIS  # default; or DIIS / KDIIS / EDIIS / ADIIS

# Second-order finalizer — opt in for the asymptotic regime
opts.newton_threshold = 0.0                   # D2c full-Hessian Newton
opts.soscf_threshold = 0.0                    # D2d Neese SOSCF
opts.trah_threshold = 0.0                     # D2e TRAH (trust-region AH)

All of these options exist on UHFOptions, RKSOptions, UKSOptions, PeriodicRHFOptions, PeriodicSCFOptions, PeriodicKSOptions. The periodic options additionally expose level_shift_warmup_cycles and smearing_temperature for metals / small-gap insulators.


Initial guess

The unified GuessEngine (v0.9.x) dispatches every initial-guess request to a concrete builder. Default for every options class is InitialGuess.AUTO, which inspects the system and picks the right guess automatically.

opts.initial_guess

Algorithm

Status

When to use

InitialGuess.AUTO

GuessEngine::resolve_auto picks per the table below

✅ shipped

Default — let the engine decide.

InitialGuess.SAD

Superposition of atomic densities — fractional-occupation atomic SCF per unique element, summed at AO indices

✅ shipped

Open-shell, transition / f-block atoms, any periodic system. AUTO routes here.

InitialGuess.SAP

Superposition of atomic potentials (Lehtola/Visscher/Engel 2020) — sum tabulated atomic effective potentials, diagonalise T + V_SAP

✅ shipped (v0.9.x)

Closed-shell molecules with light atoms. AUTO routes here. Cheaper than SAD (no per-element atomic SCF).

InitialGuess.HCORE

Diagonalise H_core; build D from occupied MOs

✅ shipped

Diagnostic / back-compat. Slow on most systems; known to lock onto false minima for some open-shell cases (OH·/6-31G*).

InitialGuess.PATOM

Atomic-block diagonal Fock guess (ORCA PModel)

🟡 scaffolded

Roadmap.

InitialGuess.HUECKEL

Extended Hückel

🟡 scaffolded

Roadmap.

InitialGuess.MINAO

Minimal-AO projection (PySCF default)

🟡 scaffolded

Roadmap.

InitialGuess.READ

Read prior orbitals (chkfile / restart)

🟡 scaffolded

Roadmap.

AUTO dispatch table (GuessEngine::resolve_auto):

Hint

AUTO resolves to

periodic, any

SAD

molecular, open-shell

SAD

molecular, transition / f-block atom

SAD

molecular, closed-shell, light atoms

SAP

opts.initial_guess = vq.InitialGuess.AUTO    # default; engine picks
opts.initial_guess = vq.InitialGuess.SAD     # explicit SAD
opts.initial_guess = vq.InitialGuess.SAP     # explicit SAP (closed-shell light atoms)
opts.initial_guess = vq.InitialGuess.HCORE   # diagnostic / back-compat

The v0.6.2 periodic-HCORE calibration freeze was lifted in v0.9.x: periodic drivers now default to AUTO SAD for any periodic system (fixes the NaCl/MgO bombing class of failure). Converged energies are bit-identical across guess choices (the initial guess can’t change the converged minimum); only iteration counts and trace shape differ.


Convergence aids — compose freely

These are independent knobs. Set any subset; they all stack on top of whichever SCF accelerator and finalizer you’ve selected.

Static damping

Linear density mixing before the next Fock build:

D_used = α · D_prev + (1 − α) · D_new
opts.damping = 0.5      # default; 0.0 disables

Skipped automatically once DIIS / KDIIS / EDIIS / EDIIS+DIIS takes over (the accelerator does the same job better). Skipped during the second-order phases (Newton / SOSCF / quadratic_fallback).

Dynamic damping (Zerner-Hehenberger 1979)

Adapts α iteration-by-iteration based on the energy decrease: increased toward dynamic_damping_max when the energy oscillates upward, decreased toward dynamic_damping_min when the energy is monotonically decreasing.

opts.damping = 0.5               # initial α
opts.dynamic_damping = True      # adapt over iterations
opts.dynamic_damping_min = 0.0   # default; lower bound
opts.dynamic_damping_max = 0.95  # default; upper bound

When false (default), damping is held constant — bit-for-bit back-compat with the pre-dynamic-damping path.

CRYSTAL FMIXING — static Fock-matrix mixing

F_diag = (1 − α_F) · F_current + α_F · F_previous

Independent of density damping. Stabilises the residual that density damping leaves oscillating on tight ionic crystals.

opts.fock_mixing = 0.0    # default; 0.30 mirrors CRYSTAL FMIXING 30

Saunders-Hillier level shift

Adds b · (S ½ S D S) to F before diagonalisation, raising virtual-orbital eigenvalues by b Hartree while leaving occupied orbitals untouched. Inert at the converged density (the SCF fixed point doesn’t change), only the iteration dynamics do. Useful when DIIS oscillates between near-degenerate occupied/virtual swaps on small-HOMO–LUMO-gap insulators.

opts.level_shift = 0.0    # default; typical values: 0.1 – 0.5 Hartree

For periodic drivers, an additional level_shift_warmup_cycles controls a CRYSTAL-style warm-up: -1 for the automatic five-cycle startup with at least one unshifted tail cycle, 0 for a persistent shift, or a positive integer for an explicit restart point.

Smearing (periodic only)

Fermi-Dirac fractional occupations for metals / small-gap insulators. Adds an electronic-entropy contribution to the free energy.

opts.smearing_temperature = 0.0    # default; 0.001 – 0.02 Hartree typical

SCF accelerators — pick one

opts.scf_accelerator

Algorithm

When to use

SCFAccelerator.DIIS

Pulay 1980/1982. Error vector is the AO commutator e = F D S S D F.

Standard organic-chemistry workhorse; use when you want bit-for-bit reproducibility against older vibe-qc / PySCF runs.

SCFAccelerator.KDIIS

Kollmar 2005 (ORCA’s strict-convergence default). Same Pulay machinery, error vector is the orbital-rotation gradient g_{ai} = F^MO_{ai} (occ-vir block in MO basis) instead of the AO commutator.

Often robust where DIIS oscillates on transition-metal / open-shell cases where the AO-commutator is dominated by the wrong eigenmodes.

SCFAccelerator.EDIIS

Energy-DIIS (Kudin/Scuseria/Cancès 2002). Minimises a quadratic energy functional on the convex hull of stored (F, D) pairs.

Diagnostic — pure EDIIS plateaus near convergence; prefer the hybrid below.

SCFAccelerator.ADIIS

Augmented-Roothaan-Hall DIIS (Hu & Yang 2010). The energy-functional sibling of EDIIS — minimises the ARH energy model expanded about the most recent iterate, same positive-simplex QP.

Diagnostic / comparison — Garza/Scuseria 2012 found it near-identical to EDIIS at the HF level; like pure EDIIS it plateaus near convergence.

SCFAccelerator.EDIIS_DIIS

Default (molecular, v0.8.0). Garza/Scuseria 2012 hybrid. Use EDIIS while ‖e‖_F > ediis_diis_switch_threshold (default 0.1), then switch to plain DIIS for the asymptotic regime.

The recommended setting for transition-metal complexes, broken-symmetry singlets, and any case where plain DIIS plateaus or oscillates. Matches PySCF / ORCA defaults.

opts.scf_accelerator = vq.SCFAccelerator.EDIIS_DIIS    # production hybrid
opts.ediis_diis_switch_threshold = 1e-1                # default, PySCF-style
opts.diis_subspace_size = 8                            # rolling-history cap

Per-backend accelerator support

The same scf_accelerator field is on every molecular and periodic options class, but the set of values each backend actually implements is narrower. Pinning a value that the chosen backend does not support raises a clear NotImplementedError with a pointer to the working route; nothing is silently downgraded.

Backend

DIIS

KDIIS

EDIIS

EDIIS_DIIS

ADIIS

dynamic_damping

Default

Molecular RHF / UHF / RKS / UKS (C++)

EDIIS_DIIS (v0.8.0)

C++ direct-truncated periodic (Γ)

DIIS

C++ direct-truncated periodic (multi-k)

❌¹

DIIS

Python Γ-Ewald (RHF/RKS/UHF/UKS)

DIIS

Python multi-k Ewald (RHF/RKS/UHF)

✅³

✅⁴

✅⁴

✅⁴

DIIS

Python BIPOLE (RHF/RKS/UHF/UKS)

✅³

✅⁴

✅⁴

✅⁴

DIIS

Python GDF (Γ + multi-k)

❌²

❌²

❌²

❌²

❌²

DIIS

¹ KDIIS multi-k on the C++ direct-truncated path is still deferred (per-k MO-basis projection not yet wired into the C++ kernel — cpp/src/periodic_scf.cpp:406-411); selecting it there raises a clear error.

² Non-DIIS accelerators and dynamic_damping = True raise NotImplementedError (via python/vibeqc/periodic_rhf_ewald.py::_reject_unsupported_python_accelerator) with a pointer to the C++ direct-truncated path. Only the multi-k Python GDF backend still relies on this gate today (BIPOLE and multi-k Ewald migrated in M2c–M3); GDF rollout is the M4 milestone.

³ KDIIS on the Python multi-k Ewald path uses the per-k orbital- rotation-gradient design landed in M2c (MultiKPeriodicSCFAccelerator in python/vibeqc/periodic_scf_accelerators.py): g(k)_{ai} = (C(k)^† F(k) C(k))_{ai} with a Pulay B-matrix B_{ij} = Σ_k w_k · Re Tr(g_i(k)^† g_j(k)). This is a strict extension of the molecular and Γ-point KDIIS error metric to a k-weighted sum, and is what’s used on every Python multi-k Ewald driver today.

⁴ EDIIS / ADIIS / EDIIS_DIIS on the Python multi-k Ewald path use the stacked-real-block bridge landed in M2e (per_k_to_stacked_real_blocks / stacked_real_blocks_to_per_k in python/vibeqc/periodic_scf_accelerators.py). Each per-k Hermitian matrix M(k) = M_R(k) + i M_I(k) is mapped to two real blocks √w_k · M_R(k) and √w_k · M_I(k), stacked across k. The C++ block-vector kernel’s sum-of-Frobenius bilinear form on a paired stack then evaluates to Σ_k w_k Re Tr[F(k) D(k)] — exactly the periodic per-k energy bilinear form used by E_elec = ½ Σ_k w_k Re Tr[D(k)(Hcore(k) + F(k))] in the multi-k Ewald driver — so the EDIIS / ADIIS QP linear term (E in Hartree) and quadratic cross-term are in matching units and the QP minimum coincides with the per-k energy minimum on the simplex.

The bridge supersedes the M2b Bloch (per-k ↔ per-cell) bridge, which was found to misweight the QP cross-term: the inv_bloch composition multiplies the per-cell inner product by N_g · w_k² rather than w_k, so even on Bloch-Floquet matched cells the QP coefficient set drifted from the per-k EDIIS minimum, and convergence failed end-to-end on a kmesh=[2,1,1] H₂ chain (commit 5c333372). The stacked-real bridge avoids the per-cell representation entirely. EDIIS / ADIIS are asymptotically slower than DIIS for tight conv_tol_grad (no quadratic-information correction past the convex-hull guess); allow extra max_iter if pinning either of them as the sole accelerator, or prefer EDIIS_DIIS (the hybrid inherits DIIS’s fast asymptote once the commutator-error norm drops below ediis_diis_switch_threshold).

The C++ direct-truncated multi-k EDIIS / EDIIS_DIIS / ADIIS branches maintain a full per-cell history: each iterate stores its complete LatticeMatrixSet of Fock and density blocks, and the simplex-QP cross-term ⟨F_i | D_j⟩ = Σ_g (F_i(g) D_j(g)).sum() is summed over the real-space cell list. That sum is exactly the periodic energy bilinear form E_elec = ½ Σ_g (P(g) [H(g) + F(g)]).sum(), so the EDIIS / ADIIS energy functional minimised by the QP coincides with the SCF energy iterate-by-iterate — no Γ-fold approximation, no uniform-per-cell distribution of a Γ-only extrapolation. (Plain DIIS still extrapolates the Γ-folded Fock against a Γ-folded commutator error; a per-cell commutator-error metric is a separate roadmap item.)


Second-order finalizers — pick zero or one

These are opt-in (default off, threshold 0). When activated — by the SCF gradient norm dropping below the configured threshold — the SCF swaps the diagonalize-F update for a step on the orbital-rotation manifold. DIIS / KDIIS / EDIIS+DIIS / damping / level-shift / FMIXING are all skipped during the second-order phase (the second-order step is its own update mechanism).

The four finalizers are mutually exclusive. If more than one is configured, the precedence is:

quadratic_fallback (C1c)  >  Newton (D2c)  >  TRAH (D2e)  >  SOSCF (D2d)

Field

Algorithm

Hessian

Per-step cost

Convergence

When to use

quadratic_fallback_iter > 0

Single damped Newton step in MO space

Diagonal preconditioner only

One eigsolve, no Fock build

Linear

Cheap escape from DIIS oscillation; the C1c periodic-SCF fallback.

newton_threshold > 0

Newton (D2c). Full orbital Hessian via preconditioned CG.

Full Hessian, exact application via CG matvecs through JKBuilder (+ XCKernelBuilder for KS)

Multiple Fock builds per step (one per CG iter, ~5-15 typical); KS adds one W^XC matvec per CG iter

Quadratic in ~3-5 outer iters

The recommended finalizer for the asymptotic regime. RHF / UHF / RKS / UKS, across LDA + GGA + hybrid-GGA (UKS-GGA via the Phase 17e polarised GGA f_xc kernel). Meta-GGA second-order SCF still raises — τ-dependent polarised f_xc is unplumbed.

trah_threshold > 0

TRAH (D2e Helmich-Paris 2022). Same Hessian + CG as Newton, plus Powell-ρ adaptive trust radius.

Full Hessian, exact application

Same as Newton plus a ρ-driven trust-radius update each iter

Quadratic in the trust region; more robust than Newton when the initial trust is too large

Hard-convergence cases where Newton’s fixed trust radius overshoots. Same coverage matrix as Newton.

soscf_threshold > 0

SOSCF (D2d Neese 2000). Augmented-Hessian eigsolve with diagonal-dominant Hessian.

Diagonal Hessian + AH λ-shift; carries V_xc only through F^MO

Single small eigsolve, no Fock build

Linear (plateaus for pure DFT — see note)

Cheaper than Newton when Newton’s matvec is too expensive. All four flavours; works at tight 1e-9 Ha for HF / hybrid functionals, plateaus at ~1e-4 Ha for pure DFT (V_xc[ρ+δρ]−V_xc[ρ] is sizable in pure DFT but absent from SOSCF’s diagonal-only Hessian — that’s literally the motivation for Newton/TRAH).

# Recommended production stack: EDIIS+DIIS warm-up + Newton finalizer
opts.scf_accelerator = vq.SCFAccelerator.EDIIS_DIIS     # default
opts.newton_threshold = 1.0              # Fischer-Almlöf 1992 convention
opts.newton_opts.cg_tol = 1e-4
opts.newton_opts.trust_radius = 0.3

# Same recipe works for RKSOptions / UKSOptions — Newton/TRAH carry the
# XC contribution to the orbital Hessian automatically (the driver
# builds an XCKernelBuilder pinned at the current density each iter).
# Example (H₂O / cc-pVDZ / PBE on planetx): DIIS = 18 iter, Newton = 11 iter,
# ΔE = 4e-14 Ha. Same iter-count win on B3LYP and LDA.

# Cheaper alternative: TRAH instead of Newton (same matvec; adaptive radius)
opts.trah_threshold = 1.0
opts.trah_opts.initial_trust_radius = 0.3
opts.trah_opts.max_trust_radius = 0.7

# Cheap-per-step alternative: SOSCF (no Fock build inside the step)
opts.soscf_threshold = 1.0               # use 0.01 for pure-DFT UKS / RKS
opts.soscf_opts.trust_radius = 0.3

KS extension (v0.8.0). RKSOptions / UKSOptions expose the same newton_threshold / trah_threshold / soscf_threshold / *_opts fields as RHFOptions / UHFOptions. When activated, the RKS / UKS driver builds an XC kernel pinned at the current SCF density and passes it through to the Hessian matvec:

  • RKSmake_unpolarised_xc_kernel_builder (closed-shell LDA / GGA / hybrid-GGA).

  • UKSmake_polarised_xc_kernel_builder, which dispatches by functional kind: LDA → polarised-LDA builder, GGA / hybrid-GGA → polarised-GGA builder (the Phase 17e spin-polarised f_xc kernel, with the αβ coupling carried through v2rho2_αβ, v2rhosigma_*, v2sigma2_* and the σ_αβ cross term).

Meta-GGA second-order SCF (RKS or UKS) is still gated — the τ-dependent f_xc is unplumbed and the kernel factory raises with a clear roadmap pointer for that functional class.

The trace’s SCFIteration.newton_cg_iter field records CG iters per Newton step (0 on iterations that didn’t enter Newton).


Application order in the SCF inner loop

Every iteration runs the same pipeline; each enabled knob slots into a fixed position. The standard CRYSTAL/ORCA order:

for iter in 1..max_iter:
    1. D_used  = damping(D_prev, D)              # static or dynamic α
    2. F       = Hcore + G[D_used]                # JKBuilder
    3. F       = fock_mixing(F, F_prev)           # CRYSTAL FMIXING
    4. F       = level_shift(F, S, D_used)        # Saunders-Hillier
    5. F_extra = accelerator.extrapolate(F, ...)  # DIIS | KDIIS | EDIIS | EDIIS+DIIS
    6. C, eps  = step(F_extra, ...)               # diagonalize OR Newton OR SOSCF OR quadratic
    7. D       = build_density(C)
    8. update dynamic damping α from energy decrease (if enabled)
    9. check convergence

The accelerator (step 5) is fed every iteration so its history builds up; the extrapolated F replaces the raw F only once iter >= diis_start_iter. The second-order finalizers (step 6, Newton / SOSCF) replace the diagonalize-F step entirely once their gradient threshold is crossed; in that case steps 5 (extrapolation) and step 1 (damping) are skipped — the second-order step is its own update.



Diagnosing non-convergence

Every result struct carries a full scf_trace:

for it in result.scf_trace:
    print(f"iter {it.iter:3d}  E = {it.energy:+.10f}  "
          f"dE = {it.delta_e:+.2e}  |grad| = {it.grad_norm:.2e}  "
          f"diis dim = {it.diis_subspace}  "
          f"newton CG = {it.newton_cg_iter}")

Pretty-format utilities:

from vibeqc import format_scf_trace, log_scf_trace
print(format_scf_trace(result))
log_scf_trace(result)         # emits via Python logging

format_scf_trace and log_scf_trace are post-mortem — they render the trace stored on the result object after the SCF returns. For live per-iteration output during a long run (the typical nohup > log 2>&1 & ; tail -f log workflow on a remote box), pass progress=True to any periodic SCF entry point or to run_job. See Output files → Progress logging for the full description and the entry-point coverage table.

If the energy oscillates: try dynamic_damping = True, increase damping to 0.7, switch to accelerator = KDIIS or EDIIS_DIIS, or add level_shift = 0.2.

If the energy diverges: check your initial guess (try initial_guess = SAD) and geometry — a bad structure will never converge. The periodic SCF additionally raises a “POSSIBLY CONDUCTING STATE” warning when convergence stalls in a way consistent with metallic character (try smearing).


Parallelism

vibe-qc’s compute-heavy kernels — molecular and periodic Fock builds, analytic gradients, lattice-summed one-electron integrals, the Ewald lattice sums, and the AO evaluation used by DFT — are parallelised with OpenMP via a one-engine-per-thread pool.

Three ways to set the thread count, in increasing precedence:

  1. OMP_NUM_THREADS environment variable, before launching Python:

    OMP_NUM_THREADS=8 python my-calc.py
    
  2. vibeqc.set_num_threads(n) from Python, pinning the count for the rest of the process:

    import vibeqc
    vibeqc.set_num_threads(8)
    print(vibeqc.get_num_threads())   # 8
    

    n <= 0 restores the default (reads OMP_NUM_THREADS or falls back to the hardware logical-core count).

  3. num_threads= keyword argument on vibeqc.run_job:

    run_job(mol, basis="6-31g*", method="rhf", output="h2o",
            num_threads=4)
    

    The actual thread count is recorded in the output file as Threads: 4  (OpenMP shared-memory parallelism).

Every .out file includes a timing block at the end:

Timings (wall clock, seconds)
----------------------------------------------------
SCF total                              0.326
SCF avg. per iteration                 0.036  (9 iters)
Job total                              0.328
Used 4 OpenMP threads.

For systematic benchmarking, scripts/bench.py in the repository runs a small fixed suite across a sweep of thread counts and prints a speedup table.

Good scaling requires enough work per thread — tiny test systems (diatomics in minimal bases) won’t show much because the OpenMP start-up overhead dominates. Bigger molecules and periodic calculations with many lattice cells benefit much more (up to near-linear in the Fock build).