SCF convergence¶
vibe-qc’s SCF surface is uniform across the four molecular drivers
(run_rhf / run_uhf / run_rks / run_uks). The periodic drivers
(run_rhf_periodic_gamma, multi-k variants) expose the same option
fields, but accelerator dispatch is currently partial outside the
Γ-only direct path; see SCF accelerators
below for the exact coverage matrix. The user-facing selection knobs
split into four groups:
Fock-build mode — how the two-electron matrices are constructed each iteration (
opts.scf_mode). Default isSCFMode.AUTO, which picks between in-core CONVENTIONAL and on-the-fly screened DIRECT based on basis-set size. See SCF Fock-build modes.Initial guess — where the SCF starts from (
opts.initial_guess). Default isInitialGuess.AUTO, which routes through the unifiedGuessEngine: closed-shell light-atom molecular → SAP, open-shell / transition-metal / any periodic → SAD, with HCORE / PATOM / HUECKEL / MINAO / READ available as explicit selections. The richer story (theory, math, when to use which, references) lives in Initial guesses; the brief summary here is enough to pick a knob.Convergence aids — composable knobs that stack freely (
damping,dynamic_damping,fock_mixing,level_shift,smearing_temperaturefor periodic).SCF accelerators — mutually exclusive choices for how the Fock matrix is extrapolated each iteration (
scf_accelerator).Second-order finalizers — mutually exclusive opt-in choices for how the asymptotic regime is closed out (
newton_threshold,soscf_threshold,trah_threshold,quadratic_fallback_iter).
Quick example exercising every group:
import vibeqc as vq
from vibeqc import RHFOptions
opts = RHFOptions()
# Stopping criteria
opts.max_iter = 100
opts.conv_tol_energy = 1e-8
opts.conv_tol_grad = 1e-6
# Initial guess (see docs/user_guide/initial_guess.md for details)
opts.initial_guess = vq.InitialGuess.AUTO # AUTO / SAP / SAD / HCORE
# Convergence aids — compose freely
opts.damping = 0.5 # static density mixing
opts.dynamic_damping = False # adaptive α (Z-H 1979)
opts.fock_mixing = 0.0 # CRYSTAL FMIXING
opts.level_shift = 0.0 # Saunders-Hillier
# SCF accelerator — pick one (molecular default: EDIIS_DIIS)
opts.scf_accelerator = vq.SCFAccelerator.EDIIS_DIIS # default; or DIIS / KDIIS / EDIIS / ADIIS
# Second-order finalizer — opt in for the asymptotic regime
opts.newton_threshold = 0.0 # D2c full-Hessian Newton
opts.soscf_threshold = 0.0 # D2d Neese SOSCF
opts.trah_threshold = 0.0 # D2e TRAH (trust-region AH)
All of these options exist on UHFOptions, RKSOptions, UKSOptions,
PeriodicRHFOptions, PeriodicSCFOptions, PeriodicKSOptions. The
periodic options additionally expose level_shift_warmup_cycles and
smearing_temperature for metals / small-gap insulators.
Initial guess¶
The unified GuessEngine (v0.9.x) dispatches every initial-guess
request to a concrete builder. Default for every options class is
InitialGuess.AUTO, which inspects the system and picks the right
guess automatically.
|
Algorithm |
Status |
When to use |
|---|---|---|---|
|
|
✅ shipped |
Default — let the engine decide. |
|
Superposition of atomic densities — fractional-occupation atomic SCF per unique element, summed at AO indices |
✅ shipped |
Open-shell, transition / f-block atoms, any periodic system. AUTO routes here. |
|
Superposition of atomic potentials (Lehtola/Visscher/Engel 2020) — sum tabulated atomic effective potentials, diagonalise |
✅ shipped (v0.9.x) |
Closed-shell molecules with light atoms. AUTO routes here. Cheaper than SAD (no per-element atomic SCF). |
|
Diagonalise H_core; build D from occupied MOs |
✅ shipped |
Diagnostic / back-compat. Slow on most systems; known to lock onto false minima for some open-shell cases (OH·/6-31G*). |
|
Atomic-block diagonal Fock guess (ORCA |
🟡 scaffolded |
Roadmap. |
|
Extended Hückel |
🟡 scaffolded |
Roadmap. |
|
Minimal-AO projection (PySCF default) |
🟡 scaffolded |
Roadmap. |
|
Read prior orbitals (chkfile / restart) |
🟡 scaffolded |
Roadmap. |
AUTO dispatch table (GuessEngine::resolve_auto):
Hint |
AUTO resolves to |
|---|---|
periodic, any |
SAD |
molecular, open-shell |
SAD |
molecular, transition / f-block atom |
SAD |
molecular, closed-shell, light atoms |
SAP |
opts.initial_guess = vq.InitialGuess.AUTO # default; engine picks
opts.initial_guess = vq.InitialGuess.SAD # explicit SAD
opts.initial_guess = vq.InitialGuess.SAP # explicit SAP (closed-shell light atoms)
opts.initial_guess = vq.InitialGuess.HCORE # diagnostic / back-compat
The v0.6.2 periodic-HCORE calibration freeze was lifted in v0.9.x:
periodic drivers now default to AUTO → SAD for any periodic system
(fixes the NaCl/MgO bombing class of failure). Converged energies are
bit-identical across guess choices (the initial guess can’t change
the converged minimum); only iteration counts and trace shape differ.
Convergence aids — compose freely¶
These are independent knobs. Set any subset; they all stack on top of whichever SCF accelerator and finalizer you’ve selected.
Static damping¶
Linear density mixing before the next Fock build:
D_used = α · D_prev + (1 − α) · D_new
opts.damping = 0.5 # default; 0.0 disables
Skipped automatically once DIIS / KDIIS / EDIIS / EDIIS+DIIS takes over (the accelerator does the same job better). Skipped during the second-order phases (Newton / SOSCF / quadratic_fallback).
Dynamic damping (Zerner-Hehenberger 1979)¶
Adapts α iteration-by-iteration based on the energy decrease:
increased toward dynamic_damping_max when the energy oscillates
upward, decreased toward dynamic_damping_min when the energy is
monotonically decreasing.
opts.damping = 0.5 # initial α
opts.dynamic_damping = True # adapt over iterations
opts.dynamic_damping_min = 0.0 # default; lower bound
opts.dynamic_damping_max = 0.95 # default; upper bound
When false (default), damping is held constant — bit-for-bit
back-compat with the pre-dynamic-damping path.
CRYSTAL FMIXING — static Fock-matrix mixing¶
F_diag = (1 − α_F) · F_current + α_F · F_previous
Independent of density damping. Stabilises the residual that density damping leaves oscillating on tight ionic crystals.
opts.fock_mixing = 0.0 # default; 0.30 mirrors CRYSTAL FMIXING 30
Saunders-Hillier level shift¶
Adds b · (S − ½ S D S) to F before diagonalisation, raising
virtual-orbital eigenvalues by b Hartree while leaving occupied
orbitals untouched. Inert at the converged density (the SCF fixed
point doesn’t change), only the iteration dynamics do. Useful when
DIIS oscillates between near-degenerate occupied/virtual swaps on
small-HOMO–LUMO-gap insulators.
opts.level_shift = 0.0 # default; typical values: 0.1 – 0.5 Hartree
For periodic drivers, an additional level_shift_warmup_cycles
controls a CRYSTAL-style warm-up: -1 for the automatic five-cycle
startup with at least one unshifted tail cycle, 0 for a persistent
shift, or a positive integer for an explicit restart point.
Smearing (periodic only)¶
Fermi-Dirac fractional occupations for metals / small-gap insulators. Adds an electronic-entropy contribution to the free energy.
opts.smearing_temperature = 0.0 # default; 0.001 – 0.02 Hartree typical
SCF accelerators — pick one¶
|
Algorithm |
When to use |
|---|---|---|
|
Pulay 1980/1982. Error vector is the AO commutator |
Standard organic-chemistry workhorse; use when you want bit-for-bit reproducibility against older vibe-qc / PySCF runs. |
|
Kollmar 2005 (ORCA’s strict-convergence default). Same Pulay machinery, error vector is the orbital-rotation gradient |
Often robust where DIIS oscillates on transition-metal / open-shell cases where the AO-commutator is dominated by the wrong eigenmodes. |
|
Energy-DIIS (Kudin/Scuseria/Cancès 2002). Minimises a quadratic energy functional on the convex hull of stored (F, D) pairs. |
Diagnostic — pure EDIIS plateaus near convergence; prefer the hybrid below. |
|
Augmented-Roothaan-Hall DIIS (Hu & Yang 2010). The energy-functional sibling of EDIIS — minimises the ARH energy model expanded about the most recent iterate, same positive-simplex QP. |
Diagnostic / comparison — Garza/Scuseria 2012 found it near-identical to EDIIS at the HF level; like pure EDIIS it plateaus near convergence. |
|
Default (molecular, v0.8.0). Garza/Scuseria 2012 hybrid. Use EDIIS while |
The recommended setting for transition-metal complexes, broken-symmetry singlets, and any case where plain DIIS plateaus or oscillates. Matches PySCF / ORCA defaults. |
opts.scf_accelerator = vq.SCFAccelerator.EDIIS_DIIS # production hybrid
opts.ediis_diis_switch_threshold = 1e-1 # default, PySCF-style
opts.diis_subspace_size = 8 # rolling-history cap
Per-backend accelerator support¶
The same scf_accelerator field is on every molecular and periodic
options class, but the set of values each backend actually
implements is narrower. Pinning a value that the chosen backend
does not support raises a clear NotImplementedError with a
pointer to the working route; nothing is silently downgraded.
Backend |
DIIS |
KDIIS |
EDIIS |
EDIIS_DIIS |
ADIIS |
dynamic_damping |
Default |
|---|---|---|---|---|---|---|---|
Molecular RHF / UHF / RKS / UKS (C++) |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
C++ direct-truncated periodic (Γ) |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
C++ direct-truncated periodic (multi-k) |
✅ |
❌¹ |
✅ |
✅ |
✅ |
✅ |
|
Python Γ-Ewald (RHF/RKS/UHF/UKS) |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
Python multi-k Ewald (RHF/RKS/UHF) |
✅ |
✅³ |
✅⁴ |
✅⁴ |
✅⁴ |
✅ |
|
Python BIPOLE (RHF/RKS/UHF/UKS) |
✅ |
✅³ |
✅⁴ |
✅⁴ |
✅⁴ |
✅ |
|
Python GDF (Γ + multi-k) |
✅ |
❌² |
❌² |
❌² |
❌² |
❌² |
|
¹ KDIIS multi-k on the C++ direct-truncated path is still deferred
(per-k MO-basis projection not yet wired into the C++ kernel —
cpp/src/periodic_scf.cpp:406-411); selecting it there raises a
clear error.
² Non-DIIS accelerators and dynamic_damping = True raise
NotImplementedError (via
python/vibeqc/periodic_rhf_ewald.py::_reject_unsupported_python_accelerator)
with a pointer to the C++ direct-truncated path. Only the multi-k
Python GDF backend still relies on this gate today (BIPOLE and
multi-k Ewald migrated in M2c–M3); GDF rollout is the M4 milestone.
³ KDIIS on the Python multi-k Ewald path uses the per-k orbital-
rotation-gradient design landed in M2c (MultiKPeriodicSCFAccelerator
in python/vibeqc/periodic_scf_accelerators.py):
g(k)_{ai} = (C(k)^† F(k) C(k))_{ai} with a Pulay B-matrix
B_{ij} = Σ_k w_k · Re Tr(g_i(k)^† g_j(k)). This is a strict
extension of the molecular and Γ-point KDIIS error metric to a
k-weighted sum, and is what’s used on every Python multi-k Ewald
driver today.
⁴ EDIIS / ADIIS / EDIIS_DIIS on the Python multi-k Ewald path use
the stacked-real-block bridge landed in M2e
(per_k_to_stacked_real_blocks / stacked_real_blocks_to_per_k
in python/vibeqc/periodic_scf_accelerators.py). Each per-k
Hermitian matrix M(k) = M_R(k) + i M_I(k) is mapped to two
real blocks √w_k · M_R(k) and √w_k · M_I(k), stacked across
k. The C++ block-vector kernel’s sum-of-Frobenius bilinear form
on a paired stack then evaluates to
Σ_k w_k Re Tr[F(k) D(k)] — exactly the periodic per-k energy
bilinear form used by E_elec = ½ Σ_k w_k Re Tr[D(k)(Hcore(k) + F(k))] in the multi-k Ewald driver — so the EDIIS / ADIIS QP
linear term (E in Hartree) and quadratic cross-term are in
matching units and the QP minimum coincides with the per-k
energy minimum on the simplex.
The bridge supersedes the M2b Bloch (per-k ↔ per-cell)
bridge, which was found to misweight the QP cross-term: the
inv_bloch composition multiplies the per-cell inner product by
N_g · w_k² rather than w_k, so even on Bloch-Floquet matched
cells the QP coefficient set drifted from the per-k EDIIS
minimum, and convergence failed end-to-end on a kmesh=[2,1,1]
H₂ chain (commit 5c333372). The stacked-real bridge avoids
the per-cell representation entirely. EDIIS / ADIIS are
asymptotically slower than DIIS for tight conv_tol_grad
(no quadratic-information correction past the convex-hull
guess); allow extra max_iter if pinning either of them as
the sole accelerator, or prefer EDIIS_DIIS (the hybrid
inherits DIIS’s fast asymptote once the commutator-error norm
drops below ediis_diis_switch_threshold).
The C++ direct-truncated multi-k EDIIS / EDIIS_DIIS / ADIIS branches
maintain a full per-cell history: each iterate stores its complete
LatticeMatrixSet of Fock and density blocks, and the simplex-QP
cross-term ⟨F_i | D_j⟩ = Σ_g (F_i(g) ⊙ D_j(g)).sum() is summed over
the real-space cell list. That sum is exactly the periodic energy
bilinear form E_elec = ½ Σ_g (P(g) ⊙ [H(g) + F(g)]).sum(), so the
EDIIS / ADIIS energy functional minimised by the QP coincides with
the SCF energy iterate-by-iterate — no Γ-fold approximation, no
uniform-per-cell distribution of a Γ-only extrapolation. (Plain DIIS
still extrapolates the Γ-folded Fock against a Γ-folded commutator
error; a per-cell commutator-error metric is a separate roadmap
item.)
Second-order finalizers — pick zero or one¶
These are opt-in (default off, threshold 0). When activated — by the SCF gradient norm dropping below the configured threshold — the SCF swaps the diagonalize-F update for a step on the orbital-rotation manifold. DIIS / KDIIS / EDIIS+DIIS / damping / level-shift / FMIXING are all skipped during the second-order phase (the second-order step is its own update mechanism).
The four finalizers are mutually exclusive. If more than one is configured, the precedence is:
quadratic_fallback (C1c) > Newton (D2c) > TRAH (D2e) > SOSCF (D2d)
Field |
Algorithm |
Hessian |
Per-step cost |
Convergence |
When to use |
|---|---|---|---|---|---|
|
Single damped Newton step in MO space |
Diagonal preconditioner only |
One eigsolve, no Fock build |
Linear |
Cheap escape from DIIS oscillation; the C1c periodic-SCF fallback. |
|
Newton (D2c). Full orbital Hessian via preconditioned CG. |
Full Hessian, exact application via CG matvecs through |
Multiple Fock builds per step (one per CG iter, ~5-15 typical); KS adds one W^XC matvec per CG iter |
Quadratic in ~3-5 outer iters |
The recommended finalizer for the asymptotic regime. RHF / UHF / RKS / UKS, across LDA + GGA + hybrid-GGA (UKS-GGA via the Phase 17e polarised GGA |
|
TRAH (D2e Helmich-Paris 2022). Same Hessian + CG as Newton, plus Powell-ρ adaptive trust radius. |
Full Hessian, exact application |
Same as Newton plus a ρ-driven trust-radius update each iter |
Quadratic in the trust region; more robust than Newton when the initial trust is too large |
Hard-convergence cases where Newton’s fixed trust radius overshoots. Same coverage matrix as Newton. |
|
SOSCF (D2d Neese 2000). Augmented-Hessian eigsolve with diagonal-dominant Hessian. |
Diagonal Hessian + AH λ-shift; carries V_xc only through F^MO |
Single small eigsolve, no Fock build |
Linear (plateaus for pure DFT — see note) |
Cheaper than Newton when Newton’s matvec is too expensive. All four flavours; works at tight 1e-9 Ha for HF / hybrid functionals, plateaus at ~1e-4 Ha for pure DFT (V_xc[ρ+δρ]−V_xc[ρ] is sizable in pure DFT but absent from SOSCF’s diagonal-only Hessian — that’s literally the motivation for Newton/TRAH). |
# Recommended production stack: EDIIS+DIIS warm-up + Newton finalizer
opts.scf_accelerator = vq.SCFAccelerator.EDIIS_DIIS # default
opts.newton_threshold = 1.0 # Fischer-Almlöf 1992 convention
opts.newton_opts.cg_tol = 1e-4
opts.newton_opts.trust_radius = 0.3
# Same recipe works for RKSOptions / UKSOptions — Newton/TRAH carry the
# XC contribution to the orbital Hessian automatically (the driver
# builds an XCKernelBuilder pinned at the current density each iter).
# Example (H₂O / cc-pVDZ / PBE on planetx): DIIS = 18 iter, Newton = 11 iter,
# ΔE = 4e-14 Ha. Same iter-count win on B3LYP and LDA.
# Cheaper alternative: TRAH instead of Newton (same matvec; adaptive radius)
opts.trah_threshold = 1.0
opts.trah_opts.initial_trust_radius = 0.3
opts.trah_opts.max_trust_radius = 0.7
# Cheap-per-step alternative: SOSCF (no Fock build inside the step)
opts.soscf_threshold = 1.0 # use 0.01 for pure-DFT UKS / RKS
opts.soscf_opts.trust_radius = 0.3
KS extension (v0.8.0). RKSOptions / UKSOptions expose the same
newton_threshold / trah_threshold / soscf_threshold / *_opts
fields as RHFOptions / UHFOptions. When activated, the RKS / UKS
driver builds an XC kernel pinned at the current SCF density and
passes it through to the Hessian matvec:
RKS —
make_unpolarised_xc_kernel_builder(closed-shell LDA / GGA / hybrid-GGA).UKS —
make_polarised_xc_kernel_builder, which dispatches by functional kind: LDA → polarised-LDA builder, GGA / hybrid-GGA → polarised-GGA builder (the Phase 17e spin-polarisedf_xckernel, with the αβ coupling carried throughv2rho2_αβ,v2rhosigma_*,v2sigma2_*and theσ_αβcross term).
Meta-GGA second-order SCF (RKS or UKS) is still gated — the
τ-dependent f_xc is unplumbed and the kernel factory raises with a
clear roadmap pointer for that functional class.
The trace’s SCFIteration.newton_cg_iter field records CG iters per
Newton step (0 on iterations that didn’t enter Newton).
Application order in the SCF inner loop¶
Every iteration runs the same pipeline; each enabled knob slots into a fixed position. The standard CRYSTAL/ORCA order:
for iter in 1..max_iter:
1. D_used = damping(D_prev, D) # static or dynamic α
2. F = Hcore + G[D_used] # JKBuilder
3. F = fock_mixing(F, F_prev) # CRYSTAL FMIXING
4. F = level_shift(F, S, D_used) # Saunders-Hillier
5. F_extra = accelerator.extrapolate(F, ...) # DIIS | KDIIS | EDIIS | EDIIS+DIIS
6. C, eps = step(F_extra, ...) # diagonalize OR Newton OR SOSCF OR quadratic
7. D = build_density(C)
8. update dynamic damping α from energy decrease (if enabled)
9. check convergence
The accelerator (step 5) is fed every iteration so its history builds
up; the extrapolated F replaces the raw F only once
iter >= diis_start_iter. The second-order finalizers (step 6,
Newton / SOSCF) replace the diagonalize-F step entirely once
their gradient threshold is crossed; in that case steps 5
(extrapolation) and step 1 (damping) are skipped — the second-order
step is its own update.
Recommended workflows¶
System |
Recommended stack |
|---|---|
Routine closed-shell molecule (organic, main-group) |
|
Hard molecule (TM complex, broken-symmetry, small-gap) |
|
Open-shell (UHF/UKS doublet, triplet) |
Same as hard molecule + |
Periodic insulator (small-medium cell) |
|
Periodic ionic crystal (NaCl, MgO, …) |
Add |
Periodic with charge sloshing (metallic, small gap) |
Add |
Anywhere DIIS oscillates |
Try |
When Newton/TRAH pay off (v0.8.0 benchmark)¶
A representative benchmark on planetx (Linux x86-64, OpenBLAS+LAPACKE,
4 CPUs) — full table in scripts/bench_scf_second_order.py:
System |
Functional |
DIIS iter / s |
Newton iter / s |
Iter win |
Wall win |
|---|---|---|---|---|---|
H₂O / cc-pVDZ |
LDA |
19 / 2.5s |
11 / 4.9s |
1.7× |
0.5× |
H₂O / cc-pVDZ |
PBE |
18 / 2.7s |
11 / 6.5s |
1.6× |
0.4× |
H₂O / cc-pVDZ |
B3LYP |
18 / 3.6s |
10 / 8.6s |
1.8× |
0.4× |
Benzene / 6-31G* |
PBE |
26 / 41.7s |
18 / 84.2s |
1.4× |
0.5× |
Benzene / 6-31G* |
B3LYP |
18 / 30.0s |
16 / 80.2s |
1.1× |
0.4× |
OH• / cc-pVDZ |
UKS-LDA |
136 / 8.8s |
12 / 4.7s |
11× |
1.9× |
Two patterns:
Iter count always wins — Newton’s quadratic convergence kicks in near the SCF fixed point, so even on easy systems it saves 5-10 iters. Energies match DIIS to 1e-12 Ha or tighter.
Wall time depends on Fock-build cost vs CG iter count. Each Newton step does ~5-15 Fock builds (one per preconditioned-CG iter). DIIS does one Fock build per iter. So Newton wins wall time only when the iter saving outweighs the per-iter Fock-build overhead. The benzene rows show the unfavourable case (small basis, no DF — Fock-build is cheap, DIIS already converges quickly). The OH• row shows the dramatic favourable case (DIIS struggles with open-shell radicals + small HOMO-LUMO gaps; Newton bypasses the oscillation entirely).
Practical recipe: for routine closed-shell organics keep the
default (EDIIS_DIIS, no second-order finalizer). Enable
newton_threshold = 1.0 whenever DIIS takes more than ~30 iters
or the system has known SCF-difficulty markers (open-shell, near-
degeneracies, TM complexes, small gap, broken symmetry).
Diagnosing non-convergence¶
Every result struct carries a full scf_trace:
for it in result.scf_trace:
print(f"iter {it.iter:3d} E = {it.energy:+.10f} "
f"dE = {it.delta_e:+.2e} |grad| = {it.grad_norm:.2e} "
f"diis dim = {it.diis_subspace} "
f"newton CG = {it.newton_cg_iter}")
Pretty-format utilities:
from vibeqc import format_scf_trace, log_scf_trace
print(format_scf_trace(result))
log_scf_trace(result) # emits via Python logging
format_scf_trace and log_scf_trace are post-mortem — they render
the trace stored on the result object after the SCF returns. For live
per-iteration output during a long run (the typical
nohup … > log 2>&1 & ; tail -f log workflow on a remote box), pass
progress=True to any periodic SCF entry point or to run_job. See
Output files → Progress logging for
the full description and the entry-point coverage table.
If the energy oscillates: try dynamic_damping = True, increase
damping to 0.7, switch to accelerator = KDIIS or EDIIS_DIIS,
or add level_shift = 0.2.
If the energy diverges: check your initial guess (try
initial_guess = SAD) and geometry — a bad structure will never
converge. The periodic SCF additionally raises a “POSSIBLY CONDUCTING
STATE” warning when convergence stalls in a way consistent with
metallic character (try smearing).
Parallelism¶
vibe-qc’s compute-heavy kernels — molecular and periodic Fock builds, analytic gradients, lattice-summed one-electron integrals, the Ewald lattice sums, and the AO evaluation used by DFT — are parallelised with OpenMP via a one-engine-per-thread pool.
Three ways to set the thread count, in increasing precedence:
OMP_NUM_THREADSenvironment variable, before launching Python:OMP_NUM_THREADS=8 python my-calc.py
vibeqc.set_num_threads(n)from Python, pinning the count for the rest of the process:import vibeqc vibeqc.set_num_threads(8) print(vibeqc.get_num_threads()) # 8
n <= 0restores the default (readsOMP_NUM_THREADSor falls back to the hardware logical-core count).num_threads=keyword argument onvibeqc.run_job:run_job(mol, basis="6-31g*", method="rhf", output="h2o", num_threads=4)
The actual thread count is recorded in the output file as
Threads: 4 (OpenMP shared-memory parallelism).
Every .out file includes a timing block at the end:
Timings (wall clock, seconds)
----------------------------------------------------
SCF total 0.326
SCF avg. per iteration 0.036 (9 iters)
Job total 0.328
Used 4 OpenMP threads.
For systematic benchmarking, scripts/bench.py in the repository
runs a small fixed suite across a sweep of thread counts and prints a
speedup table.
Good scaling requires enough work per thread — tiny test systems (diatomics in minimal bases) won’t show much because the OpenMP start-up overhead dominates. Bigger molecules and periodic calculations with many lattice cells benefit much more (up to near-linear in the Fock build).