Tutorial 26: Cross-validating against ORCA, Psi4, PySCF

You’ll learn: how to run the same calculation through vibe-qc and one or more external QC codes, build a comparison table, and assert numerical agreement to a target tolerance.

Why: every QC code has bugs; cross-checking against independent implementations is the gold-standard way to know your number is right. Cross-validation is also how new methods get defensible — agreement with ORCA-CCSD(T) or PySCF-MP2 is what turns “vibe-qc says X” into “vibe-qc + ORCA + PySCF agree on X.”

Prerequisites:

  • vibe-qc installed (Phase A v0.5+ for the dipole / Hessian / polarizability cross-checks; energy + forces work in any v0.4+).

  • PySCF (pip install pyscf into the vibe-qc venv — Apache-2.0, cheap, recommended).

  • Optionally ORCA, Psi4, NWChem — see the external codes user guide for install paths.

This tutorial works at three levels of degradation:

  1. vibe-qc + PySCF — the minimum, ships with the test dependency. PySCF gives most of what we need for energy + force validation.

  2. + ORCA — adds dispersion (D3-BJ), grid-density-independent DFT cross-checks, and a different integral library so a numerical bug in our integrals shows up as a real disagreement.

  3. + Psi4 — adds a third independent implementation with different conventions; useful when you need to pinpoint where a disagreement comes from.

Setup: detect what’s available

from vibeqc.benchmark import print_calculator_availability

print_calculator_availability()
# vibeqc     ✓ available  (Python-native)
# pyscf      ✓ available  (Python-native)
# orca       ✓ available  (/opt/orca_6_1_1/orca)
# psi4       ✗ missing    (Install with `pip install psi4`...)
# ...

The framework auto-detects ORCA via $ASE_ORCA_COMMAND / $ORCA_COMMAND / $ORCA_PATH / which orca. Psi4 needs import psi4 to succeed in the vibe-qc venv (see the external codes guide for the conda Python-version trap). NWChem / Gaussian / Q-Chem are similarly auto-detected if their binaries are on $PATH.

Level 1: H₂O / RHF agreement

The simplest validation: same molecule, same basis, three codes, identical SCF skeleton. Energies and forces should agree to ~µHa.

from ase.build import molecule
from vibeqc.ase import VibeQC
from vibeqc.benchmark import (
    PySCFCalculator, compare_calculators, make_orca_calculator,
)

atoms = molecule("H2O")          # ASE's stock H2O geometry

calculators = [
    ("vibe-qc/RHF/6-31G*", VibeQC(basis="6-31g*")),
    ("PySCF/RHF/6-31G*",   PySCFCalculator(basis="6-31g*",
                                            method="RHF")),
]
orca = make_orca_calculator(orcasimpleinput="HF 6-31G* EnGrad")
if orca is not None:
    calculators.append(("ORCA/RHF/6-31G*", orca))

results = compare_calculators(
    atoms,
    calculators,
    properties=("energy", "forces"),
)
results.print_table()
results.assert_agreement(reference="vibe-qc/RHF/6-31G*",
                         tol={"energy": 1e-5, "forces": 1e-4})

Expected output (verified end-to-end on the v0.5 development machine):

label              │ status │ wall_time_s │ energy          │ |F|max
────────────────────────────────────────────────────────────────────────
vibe-qc/RHF/6-31G* │ ok     │  0.04s      │ -2068.294643 eV │ 1.4934e+00
PySCF/RHF/6-31G*   │ ok     │  0.18s      │ -2068.294643 eV │ 1.4934e+00
ORCA/RHF/6-31G*    │ ok     │  0.72s      │ -2068.294643 eV │ 1.4934e+00

✓ all calculators agree to within 1e-5 eV / 1e-4 eV·Å⁻¹

Six-decimal agreement on energy AND |F|max. That’s the contract for “different codes, same RHF/basis combination”: SCF identifies the ground state to whatever convergence threshold the code is set to (typically 1e-7 Hartree), and the energy + forces are then deterministic.

Level 2: H₂O / DFT — different grids, different residuals

DFT introduces a grid-discretization error that’s calculator-specific. Each code chooses its own DFT integration grid by default; energies typically agree to a few mHa, not 1 µHa like HF.

from vibeqc.ase import VibeQC
from vibeqc.benchmark import (
    PySCFCalculator, compare_calculators, make_orca_calculator,
)
from ase.build import molecule

atoms = molecule("H2O")

calculators = [
    ("vibe-qc/PBE/6-31G*", VibeQC(basis="6-31g*",
                                  functional="PBE")),
    ("PySCF/PBE/6-31G*",   PySCFCalculator(basis="6-31g*",
                                            method="RKS",
                                            functional="PBE")),
]
orca = make_orca_calculator(
    orcasimpleinput="PBE 6-31G* EnGrad",
)
if orca is not None:
    calculators.append(("ORCA/PBE/6-31G*", orca))

results = compare_calculators(
    atoms, calculators,
    properties=("energy", "forces", "dipole"),
)
results.print_table()

# Looser tolerance for DFT — different grid choices contribute
# O(0.1-1 mHa) to the energy. Tighter agreement requires matching
# grid sizes across codes (out of scope here).
results.assert_agreement(
    reference="vibe-qc/PBE/6-31G*",
    tol={"energy": 5e-3, "forces": 1e-2, "dipole": 1e-2},
)

Expected output:

label              │ status  │ wall_time │ energy          │ |F|max     │ dipole
────────────────────────────────────────────────────────────────────────────────
vibe-qc/PBE/6-31G* │ ok      │ 1.47s     │ -2076.781785 eV │ 5.39e-01   │ 0.4256 eÅ
PySCF/PBE/6-31G*   │ partial │ 0.52s     │ -2076.781779 eV │ 5.39e-01   │ —
ORCA/PBE/6-31G*    │ ok      │ 1.19s     │ -2076.784666 eV │ 5.36e-01   │ 0.4256 eÅ

vibe-qc and ORCA agree on dipole to 4 decimals (0.4256 eÅ). PySCF row shows partial status — its energy is fine, but our shim doesn’t expose dipole through ASE (it’s not an interesting property to plumb through; the energy/force comparison is what matters here).

The 3 mHa energy gap between vibe-qc and ORCA is the per-code DFT-grid difference. Visible, but well within “chemical accuracy” (40 mHa = 1 kcal/mol). Tightening this requires aligning the grid between codes, which is method-and-grid-spec-dependent and beyond the scope of a smoke comparison.

Level 3: dispersion (D3-BJ) on the water dimer

Adds Grimme’s D3 with Becke-Johnson damping on top of HF/6-31G*. Vibe-qc and ORCA both implement D3-BJ from the same Grimme parameter set, so the dispersion contribution to the interaction energy should agree to ~µHa.

from vibeqc.benchmark import (
    PySCFCalculator, compare_calculators, make_orca_calculator,
)
from vibeqc.ase import VibeQC
from ase import Atoms

monomer = Atoms(symbols=["O", "H", "H"], positions=[
    (0.000, 0.000,  0.000), (0.762, 0.000, -0.566),
    (-0.762, 0.000, -0.566)])
dimer = Atoms(symbols=["O", "H", "H", "O", "H", "H"], positions=[
    (0.000, 0.000,  0.000), (0.762, 0.000, -0.566),
    (-0.762, 0.000, -0.566),
    (0.000, 0.000,  2.920), (-0.000, 0.787, 3.469),
    (-0.000, -0.787, 3.469)])

# With D3-BJ
panel_d3 = [
    ("vibe-qc", VibeQC(basis="6-31g*", dispersion="hf")),
]
orca = make_orca_calculator(
    orcasimpleinput="HF 6-31G* D3BJ EnGrad",
)
if orca is not None:
    panel_d3.append(("ORCA", orca))

# Compare interaction energies E(dimer) - 2·E(monomer)
e_mon = compare_calculators(monomer, panel_d3,
                            properties=("energy",))
e_dim = compare_calculators(dimer, panel_d3,
                            properties=("energy",))

for label in (l for l, _ in panel_d3):
    e_int = (e_dim.rows[0].properties["energy"]
             - 2.0 * e_mon.rows[0].properties["energy"])
    print(f"{label}:  E_int = {e_int*1000:+.2f} meV")

The dispersion-only contribution (E_int with D3 − E_int without D3) should agree across vibe-qc and ORCA to <0.01 kcal/mol — see examples/ase_compare/compare-water-dimer-dispersion.py for the full script with monomer-and-dimer panels and an explicit assertion.

Level 4: basis-convergence sweep across three codes

Runs the same molecule through five basis sets — STO-3G → 6-31G* → cc-pVDZ → cc-pVTZ → def2-TZVP — through every available calculator. Every code at every basis should produce the same RHF energy.

from ase.build import molecule
from vibeqc.ase import VibeQC
from vibeqc.benchmark import (
    PySCFCalculator, compare_calculators, make_orca_calculator,
)

BASES = [
    ("sto-3g",     "STO-3G"),
    ("6-31g*",     "6-31G*"),
    ("cc-pvdz",    "cc-pVDZ"),
    ("cc-pvtz",    "cc-pVTZ"),
    ("def2-tzvp",  "def2-TZVP"),
]

atoms = molecule("H2O")
table = {}
for vqname, orcaname in BASES:
    calcs = [
        ("vibe-qc", VibeQC(basis=vqname)),
        ("PySCF",   PySCFCalculator(basis=vqname, method="RHF")),
    ]
    orca = make_orca_calculator(orcasimpleinput=f"HF {orcaname}")
    if orca:
        calcs.append(("ORCA", orca))

    results = compare_calculators(atoms, calcs,
                                  properties=("energy",))
    table[vqname] = {row.label: row.properties.get("energy")
                     for row in results if row.status == "ok"}

# Cross-code consistency check
for vqname, _ in BASES:
    ref = table[vqname]["vibe-qc"]
    for code, val in table[vqname].items():
        if code == "vibe-qc":
            continue
        gap_meV = (val - ref) * 1000
        assert abs(gap_meV) < 1.0, (
            f"{vqname}/{code} disagrees by {gap_meV} meV"
        )
print("✓ all 3 codes × 5 bases agree to <1 meV — "
      "integrals + SCF + basis parsing consistent across the stack.")

Verified result on the v0.5 development machine (full output under examples/ase_compare/compare-basis-convergence.py):

basis        ORCA          PySCF        vibe-qc
sto-3g       -2039.885358  -2039.885358  -2039.885358
6-31g*       -2068.294643  -2068.294643  -2068.294643
cc-pvdz      -2068.773588  -2068.773588  -2068.773588
cc-pvtz      -2069.592889  -2069.592889  -2069.592889
def2-tzvp    -2069.645665  -2069.645665  -2069.645665

per-basis Δ from vibe-qc:  0.0001 meV (worst case)

Five basis sets × three codes, all agree to ~0.04 µHa. That’s the chemistry-grade convergence verification the comparison framework was built for.

Level 5: NEB transition-state cross-check

Both vibe-qc and ORCA can drive ASE’s climbing-image NEB. The script examples/ase_compare/compare-nh3-neb-barrier.py runs the full NH₃ umbrella inversion through both, asserts agreement on the barrier height to 1 kcal/mol, and plots both MEPs side-by-side.

from vibeqc.ase import VibeQC
from vibeqc.benchmark import make_orca_calculator
from ase import Atoms
from ase.mep import NEB
from ase.optimize import BFGSLineSearch, FIRE

# (Endpoint relaxation + NEB-CI setup — see the script for the
# full ~80 lines.)

barrier_vqc = run_neb(VibeQC(basis="sto-3g"))
barrier_orca = run_neb(lambda: make_orca_calculator(
    orcasimpleinput="HF STO-3G EnGrad"))

assert abs(barrier_vqc - barrier_orca) < 1.0  # kcal/mol

The NEB protocol is the same in both runs (5 intermediates, FIRE, fmax = 0.10 eV/Å), so the only source of difference is the underlying SCF — which we’ve already cross-validated to µHa above. Barrier agreement is therefore a system-integration check rather than a methods check.

Tolerances by use case

A reference for how tight to set tol={...} based on what you’re validating:

Use case

Tolerance

Rationale

Same RHF/UHF, same basis, different codes

1e-5 eV / 1e-4 eV·Å⁻¹

Identical SCF skeleton; SCF threshold dominates.

Same DFT functional, same basis, different codes

5e-3 eV / 1e-2 eV·Å⁻¹

Per-code DFT-grid differences (~mHa).

MP2 / CCSD same basis

1e-3 eV

Iteration thresholds + integral cutoffs.

Different basis → “same chemistry”

(no tolerance — depends on the system)

Use as a basis-convergence check, not agreement.

D3-BJ contribution alone

1e-5 eV (~µHa)

Same Grimme parameters, deterministic sum.

NEB barriers

1 kcal/mol

NEB-protocol drift dominates over per-code SCF.

Where this leaves us

After running the five comparison scripts in examples/ase_compare/ against vibe-qc + PySCF (and ORCA where available), you have:

  • Energy agreement to µHa across HF in 5 basis sets

  • DFT agreement to mHa (grid-difference floor)

  • Force agreement to 1e-4 eV/Å on every checked system

  • D3-BJ agreement to <0.01 kcal/mol

  • NEB barrier agreement to <1 kcal/mol

That’s chemistry-grade cross-validation — the kind of comparison table you’d put in a Methods section to defend a vibe-qc result. For a forwarding strategy (when vibe-qc gains new methods, what should the cross-check look like?), see the ASE-tutorial-parity matrix in the roadmap.

See also