Experimental reference data (vqfetch reference)

vqfetch reference (Phase 2 of the v0.8.0 external-data integration) pulls experimental and computed reference values from the NIST Computational Chemistry Comparison and Benchmark DataBase (CCCBDB). Use it to validate vibe-qc calculations against the standard QC reference set without hand-curating the numbers.

The data is public domain (US Government work; cite as NIST Standard Reference Database 101, DOI 10.18434/T47C7Z). One URL per molecule, one parser, one HTTP request, full provenance preserved.

What it pulls

For each molecule (looked up by CAS number), fetch_cccbdb() returns an ExperimentalReference record carrying:

| Property | Field | Unit |
| --- | --- | --- |
| Atomization energy (D₀, ZPE-included) | atomization_energy_kcal_per_mol | kcal/mol |
| Enthalpy of formation at 0 K | enthalpy_of_formation_0_kj_per_mol | kJ/mol |
| Enthalpy of formation at 298 K | enthalpy_of_formation_298_kj_per_mol | kJ/mol |
| Ionization energy | ionization_energy_ev | eV |
| Vibrational fundamentals | vibrational_fundamentals_cm_inv | cm⁻¹ (tuple) |
| Vibrational harmonics (where available) | vibrational_harmonics_cm_inv | cm⁻¹ (tuple) |
| Dipole moment | dipole_moment_debye | Debye |
| Polarizability | polarizability_au | atomic units |
| Cartesian geometry | experimental_geometry | Å (per-atom) |

Each scalar field has a paired _uncertainty_* field where NIST publishes one. Missing values come back as None.
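Because missing values come back as None, downstream code has to filter before doing arithmetic. The sketch below shows one way to do that. ReferenceStandIn is a hypothetical three-field stand-in so the example runs without network access; it is not the real ExperimentalReference. Since ExperimentalReference is described as a dataclass, the same dataclasses.fields() approach should carry over, but that is an assumption to verify against the installed version.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReferenceStandIn:
    """Hypothetical stand-in for ExperimentalReference (three fields only)."""

    atomization_energy_kcal_per_mol: Optional[float] = None
    dipole_moment_debye: Optional[float] = None
    vibrational_fundamentals_cm_inv: Optional[Tuple[float, ...]] = None


def available_fields(record) -> dict:
    """Return only the fields that carry a value (i.e. are not None)."""
    return {
        f.name: getattr(record, f.name)
        for f in fields(record)
        if getattr(record, f.name) is not None
    }


# A record with no dipole moment, to exercise the None branch.
partial = ReferenceStandIn(
    atomization_energy_kcal_per_mol=219.349,
    vibrational_fundamentals_cm_inv=(1594.59, 3656.65, 3755.79),
)
present = available_fields(partial)
print(sorted(present))
```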

⚠️ The D₀ vs Dₑ footgun

atomization_energy_kcal_per_mol is D₀, not Dₑ.

That is: it’s the thermodynamic atomization energy (includes zero-point vibrational energy), not the electronic atomization energy (the bare energy difference between molecule minimum and dissociated atoms — what most QC textbooks and benchmark papers call “the atomization energy” without qualification).

This matters because a raw SCF atomization energy is Dₑ, not D₀. Comparing CCCBDB’s atomization_energy_kcal_per_mol against it is off by the ZPE: roughly 2 kcal/mol for a diatomic such as O₂, about 13 kcal/mol for H₂O, and more for larger molecules.

To get D₀ from your calculation (matches CCCBDB):

D₀ = Dₑ − ZPE
ZPE = ½ × Σᵢ νᵢ × 0.0028591459 kcal·mol⁻¹ per cm⁻¹    (νᵢ: fundamentals in cm⁻¹)
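The same conversion can be sketched in Python. The helper names below (zpe_kcal_per_mol, d0_from_de) are illustrative and not part of vibe-qc. The conversion factor is derived from the exact SI values of h, c and N_A plus the thermochemical calorie, which reproduces the 0.0028591 kcal·mol⁻¹ per cm⁻¹ used on this page to about nine decimal places.

```python
# Exact SI defining constants (2019 redefinition).
PLANCK_J_S = 6.62607015e-34
LIGHT_SPEED_CM_PER_S = 2.99792458e10
AVOGADRO_PER_MOL = 6.02214076e23
JOULES_PER_KCAL = 4184.0  # thermochemical calorie

# Molar energy of one wavenumber unit, in kcal/mol (about 0.0028591).
CM_INV_TO_KCAL_PER_MOL = (
    PLANCK_J_S * LIGHT_SPEED_CM_PER_S * AVOGADRO_PER_MOL / JOULES_PER_KCAL
)


def zpe_kcal_per_mol(frequencies_cm_inv):
    """ZPE = 0.5 * sum(nu_i), converted from cm^-1 to kcal/mol."""
    return 0.5 * sum(frequencies_cm_inv) * CM_INV_TO_KCAL_PER_MOL


def d0_from_de(de_kcal_per_mol, frequencies_cm_inv):
    """Thermodynamic atomization energy: D_0 = D_e - ZPE."""
    return de_kcal_per_mol - zpe_kcal_per_mol(frequencies_cm_inv)


# Water fundamentals as quoted in the quick start on this page.
water_zpe = zpe_kcal_per_mol((1594.59, 3656.65, 3755.79))
print(f"ZPE(H2O) = {water_zpe:.2f} kcal/mol")
```

For molecules with degenerate modes (CH₄, CO₂), each degenerate frequency has to be counted once per mode in the sum. Whether the tuple returned by fetch_cccbdb() already repeats degenerate entries is not stated here, so check the record before summing.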

To get Dₑ from CCCBDB (matches your raw SCF):

from vibeqc.fetch import fetch_cccbdb

ref = fetch_cccbdb(cas="7732-18-5")  # H₂O
fundamentals = ref.vibrational_fundamentals_cm_inv  # tuple of cm⁻¹
zpe_kcal_per_mol = 0.5 * sum(fundamentals) * 0.0028591459  # cm⁻¹ → kcal/mol
de_kcal_per_mol = ref.atomization_energy_kcal_per_mol + zpe_kcal_per_mol

The footgun is documented inside the Provenance.notes field of every CCCBDB record, so callers reading the record cold should see the warning. It’s also documented in the ExperimentalReference dataclass docstring.

Quick start: water

vqfetch reference --cas 7732-18-5
# Output:
# examples/regression/references/cccbdb_7732-18-5.json

Then in Python:

from vibeqc.fetch import fetch_cccbdb

ref = fetch_cccbdb(cas="7732-18-5")
print(ref.atomization_energy_kcal_per_mol)   # 219.349 (D_0!)
print(ref.dipole_moment_debye)               # 1.8546
print(ref.vibrational_fundamentals_cm_inv)   # (1594.59, 3656.65, 3755.79)

The same record is also written to examples/regression/references/cccbdb_7732-18-5.json so the regression suite can reattach it to the H₂O test cases (see Regression-suite integration below).

Eight-molecule canonical set (round-trip-verified)

The v1 acceptance harness covers eight canonical molecules representative of the small-molecule QC benchmark literature:

| Molecule | CAS | Formula | Why it’s in the canonical set |
| --- | --- | --- | --- |
| Water | 7732-18-5 | H₂O | Universal QC test molecule |
| Methane | 74-82-8 | CH₄ | Classic single-bond / Td reference |
| Ammonia | 7664-41-7 | NH₃ | C₃ᵥ, lone pair, polar H bond |
| Hydrogen fluoride | 7664-39-3 | HF | Polar diatomic; high IE benchmark |
| Oxygen | 7782-44-7 | O₂ | Open-shell triplet ground state |
| Ozone | 10028-15-6 | O₃ | Multireference; classic DFT failure case |
| Carbon dioxide | 124-38-9 | CO₂ | Linear, π system |
| Formaldehyde | 50-00-0 | H₂CO | Smallest carbonyl; pyramidalisation tests |

Atomization energies for this set match CCCBDB’s published values to within the precision of the parser (for example HF 135.4, CH₄ 392.4, CO₂ 381.9, and H₂O 219.3 kcal/mol).
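Since every lookup is keyed by CAS number, a mistyped CAS is an easy way to fetch the wrong molecule. CAS Registry Numbers carry a check digit (the last digit equals the position-weighted sum of the other digits, counted from the right, modulo 10), so a typo can be caught before any HTTP request. The sketch below validates the eight canonical CAS numbers. cas_check_digit_ok is an illustrative helper, not a vibe-qc function, and whether vqfetch performs this check itself is not stated here.

```python
def cas_check_digit_ok(cas: str) -> bool:
    """Validate a CAS Registry Number such as '7732-18-5' via its check digit."""
    parts = cas.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    if not (2 <= len(parts[0]) <= 7 and len(parts[1]) == 2 and len(parts[2]) == 1):
        return False
    body = parts[0] + parts[1]
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    weighted = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return weighted % 10 == int(parts[2])


# CAS numbers of the eight-molecule canonical set listed above.
CANONICAL_SET = {
    "Water": "7732-18-5",
    "Methane": "74-82-8",
    "Ammonia": "7664-41-7",
    "Hydrogen fluoride": "7664-39-3",
    "Oxygen": "7782-44-7",
    "Ozone": "10028-15-6",
    "Carbon dioxide": "124-38-9",
    "Formaldehyde": "50-00-0",
}

invalid = [name for name, cas in CANONICAL_SET.items() if not cas_check_digit_ok(cas)]
print("invalid CAS entries:", invalid)
```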

Round-trip recipe: experimental geometry → vibe-qc Molecule

CCCBDB carries experimental Cartesian geometries. The geometry-bridge helper builds a vibe-qc Molecule directly:

from vibeqc.fetch.references import (
    fetch_cccbdb,
    experimental_geometry_to_molecule_spec,
)
from vibeqc import run_job

ref = fetch_cccbdb(cas="7732-18-5")          # NIST H₂O
mol_spec = experimental_geometry_to_molecule_spec(ref)
                                              # → MoleculeSpec with NIST geometry
                                              #   (r(O–H) = 0.958 Å, ∠HOH = 104.4776°)
mol = mol_spec.to_molecule()
run_job(mol, basis="sto-3g", method="rhf", output="h2o-nist-geom")
                                              # → E = -74.96302314 Ha

The fetched geometry differs slightly from hand-curated textbook geometries. Szabo-Ostlund’s H₂O, for example, uses r(O–H) = 0.957 Å and ∠HOH = 104.5°, which shifts the HF/STO-3G energy by about 1 mHa.
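To see which geometry a calculation actually used, it helps to recover bond lengths and angles from the Cartesian coordinates. The sketch below builds water from the internal coordinates quoted above and measures them back. The (symbol, (x, y, z)) layout is an assumption made for illustration; the real structure of experimental_geometry may differ, so adapt the unpacking accordingly.

```python
import math


def angle_deg(a, center, b):
    """Angle a-center-b in degrees for three Cartesian points."""
    v1 = [p - q for p, q in zip(a, center)]
    v2 = [p - q for p, q in zip(b, center)]
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (
        math.hypot(*v1) * math.hypot(*v2)
    )
    # Clamp against rounding error before taking the arccosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def water_from_internals(r_oh, hoh_deg):
    """Place O at the origin and both H atoms symmetrically in the xy plane."""
    half = math.radians(hoh_deg) / 2.0
    return [
        ("O", (0.0, 0.0, 0.0)),
        ("H", (r_oh * math.sin(half), r_oh * math.cos(half), 0.0)),
        ("H", (-r_oh * math.sin(half), r_oh * math.cos(half), 0.0)),
    ]


# Internal coordinates quoted for the NIST water geometry.
atoms = water_from_internals(0.958, 104.4776)
(_, o), (_, h1), (_, h2) = atoms
r_oh = math.dist(o, h1)      # bond length in angstrom
hoh = angle_deg(h1, o, h2)   # bond angle in degrees
print(f"r(O-H) = {r_oh:.3f} A, angle(HOH) = {hoh:.4f} deg")
```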

Regression-suite integration

Attach a CCCBDB reference to every molecular case in the regression suite:

python -m examples.regression.run_suite \
    --include-experimental-reference cccbdb \
    --output-md

The generated summary.md gains an “Experimental references” section, with the NIST DOI footer and per-record permalinks back to CCCBDB. Cases without a CAS number in the SPEC pass silently (no attachment).

Cache + offline mode

vqfetch reference uses the same XDG-compliant cache as vqfetch for structures, under ~/.cache/vqfetch/cccbdb/. NIST records are effectively immutable, so although the default TTL is 30 days, a cached record remains valid for years.

  • --no-cache — bypass cache reads (still writes through).

  • --cache-only — refuse live HTTP entirely.

This is useful for offline reproducibility on cluster compute nodes without external network access: pre-warm the cache locally, then copy ~/.cache/vqfetch/ to the cluster with ssh or rsync.
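The interaction between the TTL and the two flags can be sketched as a small decision function. This is a guess at the semantics described above, not vqfetch's actual implementation. In particular, it assumes that --cache-only serves a record regardless of its age and that combining both flags is an error. The directory resolution follows the XDG Base Directory convention (XDG_CACHE_HOME if set and non-empty, otherwise ~/.cache).

```python
import os
import time
from pathlib import Path

DEFAULT_TTL_SECONDS = 30 * 24 * 3600  # the 30-day default TTL


def cache_root(env=None) -> Path:
    """Resolve the CCCBDB cache directory following the XDG convention."""
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "vqfetch" / "cccbdb"


def file_age_seconds(path: Path, now=None):
    """Age of a cached file in seconds, or None if it does not exist."""
    if not path.exists():
        return None
    now = time.time() if now is None else now
    return now - path.stat().st_mtime


def should_read_cache(age_seconds, *, no_cache=False, cache_only=False,
                      ttl_seconds=DEFAULT_TTL_SECONDS) -> bool:
    """Decide whether to serve a record from disk (assumed semantics)."""
    if no_cache and cache_only:
        raise ValueError("no_cache and cache_only are mutually exclusive")
    if no_cache:
        return False  # bypass reads; the caller still writes through
    if cache_only:
        if age_seconds is None:
            raise LookupError("record not cached and live HTTP is disabled")
        return True  # assumption: age is ignored when offline
    return age_seconds is not None and age_seconds < ttl_seconds


print(cache_root({"XDG_CACHE_HOME": "/scratch/cache"}))
```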

Provenance

Every CCCBDB record carries:

Provenance(
    source_db="nist_cccbdb",
    source_id="7732-18-5",                                # CAS
    source_url="https://cccbdb.nist.gov/exp2x.asp?casno=7732185",
    original_doi="10.18434/T47C7Z",                       # NIST SRD 101
    license="public domain (US Government work)",
    fetched_at="2026-05-09T14:00:00+00:00",
    notes="atomization_energy_kcal_per_mol is D_0 ...",
)
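When results are archived, it can be convenient to store the provenance next to them as JSON. The sketch below uses ProvenanceStandIn, a hypothetical dataclass that mirrors only the field names shown above, and round-trips it through json to confirm nothing is lost. The cccbdb_url helper infers the URL pattern (CAS number with hyphens removed) from the single example above, so treat it as an assumption rather than a documented scheme.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceStandIn:
    """Hypothetical mirror of the Provenance fields shown above."""

    source_db: str
    source_id: str
    source_url: str
    original_doi: str
    license: str
    fetched_at: str
    notes: str


def cccbdb_url(cas: str) -> str:
    """Build a record URL by dropping the hyphens from the CAS number."""
    return "https://cccbdb.nist.gov/exp2x.asp?casno=" + cas.replace("-", "")


prov = ProvenanceStandIn(
    source_db="nist_cccbdb",
    source_id="7732-18-5",
    source_url=cccbdb_url("7732-18-5"),
    original_doi="10.18434/T47C7Z",
    license="public domain (US Government work)",
    fetched_at="2026-05-09T14:00:00+00:00",
    notes="atomization_energy_kcal_per_mol is D_0 ...",
)

# Serialize to JSON and rebuild the record from the decoded dict.
restored = ProvenanceStandIn(**json.loads(json.dumps(asdict(prov))))
print(restored == prov)
```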

For published work, cite NIST Standard Reference Database 101:

NIST Computational Chemistry Comparison and Benchmark Database, NIST Standard Reference Database Number 101, Release 22, May 2022, Editor: Russell D. Johnson III. DOI: 10.18434/T47C7Z.

CCCBDB also publishes NIST’s standard disclaimer about data uncertainty on its About page; none of the data in CCCBDB carries a warranty of any kind.

Why this is part of vibe-qc

vibe-qc’s molecular validation strategy uses two reference sources:

  1. Software-vs-software parity — vibe-qc vs PySCF / ORCA to machine precision. Catches regressions, validates numerical stability.

  2. Software-vs-experiment — vibe-qc vs NIST CCCBDB on a well-defined small-molecule set. Catches physical wrongness that software-vs-software parity can’t (e.g. a functional that gives the wrong atomization trend even though both vibe-qc and PySCF agree).

vqfetch reference makes (2) accessible without manually maintaining a curated reference table.

See also