Experimental reference data (vqfetch reference)
vqfetch reference (Phase 2 of the v0.8.0 external-data
integration) pulls experimental and computed reference values
from the
NIST Computational Chemistry Comparison and Benchmark DataBase
(CCCBDB). Use it to validate vibe-qc
calculations against the standard QC reference set without
hand-curating the numbers.
The data is public domain (US Government work; cite as NIST Standard Reference Database 101, DOI 10.18434/T47C7Z). One URL per molecule, one parser, one HTTP request, full provenance preserved.
What it pulls
For each molecule (looked up by CAS number), fetch_cccbdb()
returns an ExperimentalReference record carrying:
| Property | Field | Unit |
|---|---|---|
| Atomization energy (D₀, ZPE-included) | atomization_energy_kcal_per_mol | kcal/mol |
| Enthalpy of formation at 0 K | | kJ/mol |
| Enthalpy of formation at 298 K | | kJ/mol |
| Ionization energy | | eV |
| Vibrational fundamentals | vibrational_fundamentals_cm_inv | cm⁻¹ (tuple) |
| Vibrational harmonics (where available) | | cm⁻¹ (tuple) |
| Dipole moment | dipole_moment_debye | Debye |
| Polarizability | | atomic units |
| Cartesian geometry | | Å (per-atom) |
Each scalar field has a paired _uncertainty_* field where
NIST publishes one. Missing values come back as None.
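Because both a value and its uncertainty can be missing, code that prints or tabulates a record has to handle three cases. The helper below is a minimal sketch of that handling; `format_measurement` is a hypothetical name and is not part of vibe-qc, and it takes plain numbers rather than assuming any particular uncertainty field name.

```python
from typing import Optional


def format_measurement(value: Optional[float],
                       uncertainty: Optional[float],
                       unit: str) -> str:
    """Render a reference value, tolerating a missing value or uncertainty."""
    if value is None:
        return "n/a"                      # no value published for this property
    if uncertainty is None:
        return f"{value:g} {unit}"        # value published without an uncertainty
    return f"{value:g} ± {uncertainty:g} {unit}"


# Water dipole moment as printed in the quick start, no uncertainty supplied.
print(format_measurement(1.8546, None, "D"))
# A property that came back as None.
print(format_measurement(None, None, "eV"))
```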
⚠️ The D₀ vs Dₑ footgun
atomization_energy_kcal_per_mol is D₀, not Dₑ.
That is, it is the thermodynamic atomization energy, which includes the zero-point vibrational energy (ZPE). It is not the electronic atomization energy, the bare energy difference between the molecule at its minimum and the dissociated atoms, which is what most QC textbooks and benchmark papers call “the atomization energy” without qualification.
This matters because a raw SCF calculation produces Dₑ, not D₀.
Comparing CCCBDB’s atomization_energy_kcal_per_mol against your
raw SCF atomization energy will be off by the ZPE: a few kcal/mol
for diatomics, about 13 kcal/mol for H₂O (from the fundamentals
shown in the quick start), and more for larger molecules.
To get D₀ from your calculation (matches CCCBDB):
```text
D_0 = D_e − ZPE
ZPE = 0.5 × Σᵢ νᵢ × 0.0028591459   (νᵢ: fundamentals in cm⁻¹; the factor converts cm⁻¹ to kcal/mol)
```
To get Dₑ from CCCBDB (matches your raw SCF):
```python
from vibeqc.fetch import fetch_cccbdb

ref = fetch_cccbdb(cas="7732-18-5")  # H₂O
fundamentals = ref.vibrational_fundamentals_cm_inv  # tuple of cm⁻¹
zpe_kcal_per_mol = 0.5 * sum(fundamentals) * 0.0028591459  # cm⁻¹ → kcal/mol
de_kcal_per_mol = ref.atomization_energy_kcal_per_mol + zpe_kcal_per_mol
```
The footgun is documented inside the Provenance.notes field
of every CCCBDB record, so callers reading the record cold
should see the warning. It’s also documented in the
ExperimentalReference dataclass docstring.
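The D₀ ↔ Dₑ conversion can be checked without network access by plugging in the water values printed in the quick start below. This is a self-contained sketch: the function names are illustrative and not part of vibe-qc, and the ZPE is the same half-sum-of-fundamentals estimate used in this section, not an anharmonic ZPE.

```python
CM_INV_TO_KCAL_PER_MOL = 0.0028591459  # conversion factor used in this section


def zpe_kcal_per_mol(fundamentals_cm_inv):
    """Estimate the ZPE as half the sum of the fundamentals (cm⁻¹ in, kcal/mol out)."""
    return 0.5 * sum(fundamentals_cm_inv) * CM_INV_TO_KCAL_PER_MOL


def d0_to_de(d0, fundamentals_cm_inv):
    """Add the ZPE back to a D_0 value to estimate D_e."""
    return d0 + zpe_kcal_per_mol(fundamentals_cm_inv)


def de_to_d0(de, fundamentals_cm_inv):
    """Subtract the ZPE from a computed D_e to estimate D_0."""
    return de - zpe_kcal_per_mol(fundamentals_cm_inv)


# Water values as printed in the quick-start example.
H2O_FUNDAMENTALS = (1594.59, 3656.65, 3755.79)
H2O_D0 = 219.349

zpe = zpe_kcal_per_mol(H2O_FUNDAMENTALS)
de = d0_to_de(H2O_D0, H2O_FUNDAMENTALS)
print(f"ZPE ≈ {zpe:.2f} kcal/mol")  # ZPE ≈ 12.88 kcal/mol
print(f"D_e ≈ {de:.2f} kcal/mol")   # D_e ≈ 232.23 kcal/mol
```

One caveat for symmetric molecules such as CH₄ or CO₂: a degenerate mode contributes once per component, so if a tuple lists a degenerate frequency only once, the sum has to be weighted by the degeneracy. How the record lists degenerate modes is worth checking before relying on the estimate.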
Quick start: water
```shell
vqfetch reference --cas 7732-18-5
# Output:
# examples/regression/references/cccbdb_7732-18-5.json
```
Then in Python:
```python
from vibeqc.fetch import fetch_cccbdb

ref = fetch_cccbdb(cas="7732-18-5")
print(ref.atomization_energy_kcal_per_mol)  # 219.349 (D_0!)
print(ref.dipole_moment_debye)              # 1.8546
print(ref.vibrational_fundamentals_cm_inv)  # (1594.59, 3656.65, 3755.79)
```
The same record is also written to
examples/regression/references/cccbdb_7732-18-5.json so the
regression suite can reattach it to the H₂O test cases (see
Regression-suite integration below).
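The written JSON file can also be inspected without importing vibe-qc. The sketch below uses only the standard library and assumes, without confirmation from this page, that the file holds a single JSON object whose keys mirror the record's field names; `load_reference` and `present_fields` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_reference(path):
    """Load a cached reference record as a plain dict."""
    with Path(path).open(encoding="utf-8") as handle:
        record = json.load(handle)
    if not isinstance(record, dict):
        # Assumption: one JSON object per record file.
        raise ValueError(f"{path}: expected a JSON object at top level")
    return record


def present_fields(record):
    """Names of top-level fields whose value is not null (None)."""
    return sorted(key for key, value in record.items() if value is not None)


path = Path("examples/regression/references/cccbdb_7732-18-5.json")
if path.exists():  # only meaningful after the vqfetch command above has run
    print(present_fields(load_reference(path)))
```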
Eight-molecule canonical set (round-trip-verified)
The v1 acceptance harness covers eight canonical molecules representative of the small-molecule QC benchmark literature:
| Molecule | CAS | Formula | Why it’s in the canonical set |
|---|---|---|---|
| Water | 7732-18-5 | H₂O | Universal QC test molecule |
| Methane | 74-82-8 | CH₄ | Classic single-bond / Td reference |
| Ammonia | 7664-41-7 | NH₃ | C₃ᵥ, lone pair, polar H bond |
| Hydrogen fluoride | 7664-39-3 | HF | Polar diatomic; high IE benchmark |
| Oxygen | 7782-44-7 | O₂ | Open-shell triplet ground state |
| Ozone | 10028-15-6 | O₃ | Multireference; classic DFT failure case |
| Carbon dioxide | 124-38-9 | CO₂ | Linear, π system |
| Formaldehyde | 50-00-0 | H₂CO | Smallest carbonyl; pyramidalisation tests |
Atomization energies match CCCBDB’s published values within the precision of the parser (HF 135.4, CH₄ 392.4, CO₂ 381.9, H₂O 219.3 kcal/mol etc.).
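A comparison of computed D₀ values against the four reference values quoted above might be sketched as follows. The helper names are illustrative, and the "computed" numbers in the usage lines are synthetic placeholders (reference plus 1.0 kcal/mol), not results from vibe-qc or any other code.

```python
# D_0 reference values quoted in this section (kcal/mol).
PUBLISHED_D0_KCAL_PER_MOL = {
    "HF": 135.4,
    "CH4": 392.4,
    "CO2": 381.9,
    "H2O": 219.3,
}


def signed_errors(computed, reference):
    """Return computed minus reference for every molecule present in both."""
    return {name: computed[name] - reference[name]
            for name in reference if name in computed}


def mean_absolute_error(errors):
    """Mean of the absolute signed errors."""
    if not errors:
        raise ValueError("no overlapping molecules to compare")
    return sum(abs(err) for err in errors.values()) / len(errors)


# Synthetic placeholder input: every reference value shifted by +1.0 kcal/mol.
placeholder = {name: d0 + 1.0 for name, d0 in PUBLISHED_D0_KCAL_PER_MOL.items()}
errors = signed_errors(placeholder, PUBLISHED_D0_KCAL_PER_MOL)
print(f"MAE = {mean_absolute_error(errors):.2f} kcal/mol")
```

Remember to convert a raw SCF Dₑ to D₀ before passing it in, otherwise the ZPE shows up as a systematic error.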
Round-trip recipe: experimental geometry → vibe-qc Molecule
CCCBDB carries experimental Cartesian geometries. The
geometry-bridge helper builds a vibe-qc Molecule directly:
```python
from vibeqc.fetch.references import (
    fetch_cccbdb,
    experimental_geometry_to_molecule_spec,
)
from vibeqc import run_job

ref = fetch_cccbdb(cas="7732-18-5")  # NIST H₂O
mol_spec = experimental_geometry_to_molecule_spec(ref)
# → MoleculeSpec with NIST geometry
#   (r(O–H) = 0.958 Å, ∠HOH = 104.4776°)
mol = mol_spec.to_molecule()
run_job(mol, basis="sto-3g", method="rhf", output="h2o-nist-geom")
# → E = -74.96302314 Ha
```
The fetched geometry differs slightly from hand-curated textbook geometries. Szabo-Ostlund’s H₂O, for example, is at r(O–H) = 0.957 Å and ∠HOH = 104.5°, a difference of about 1 mHa at HF/STO-3G.
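When comparing two geometries it helps to convert Cartesian coordinates back to internal coordinates. The sketch below is independent of vibe-qc: it builds a water geometry from the bond length and angle quoted above, then recovers both from the Cartesian points. The function names and the atom placement are illustrative choices, not how the geometry bridge works internally.

```python
import math


def distance(a, b):
    """Euclidean distance between two Cartesian points (same unit as input)."""
    return math.dist(a, b)


def angle_deg(a, center, b):
    """Angle a-center-b in degrees."""
    v1 = [x - c for x, c in zip(a, center)]
    v2 = [x - c for x, c in zip(b, center)]
    cos_theta = sum(p * q for p, q in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def water_cartesian(r_oh, hoh_deg):
    """Place O at the origin and both H atoms symmetrically in the xy-plane."""
    half = math.radians(hoh_deg) / 2.0
    o = (0.0, 0.0, 0.0)
    h1 = (r_oh * math.sin(half), r_oh * math.cos(half), 0.0)
    h2 = (-r_oh * math.sin(half), r_oh * math.cos(half), 0.0)
    return o, h1, h2


o, h1, h2 = water_cartesian(0.958, 104.4776)
print(f"r(O-H) = {distance(o, h1):.3f} Å")      # r(O-H) = 0.958 Å
print(f"∠HOH   = {angle_deg(h1, o, h2):.4f}°")  # ∠HOH   = 104.4776°
```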
Regression-suite integration
Attach a CCCBDB reference to every molecular case in the regression suite:
```shell
python -m examples.regression.run_suite \
    --include-experimental-reference cccbdb \
    --output-md
```
The generated summary.md gains an “Experimental references”
section, with the NIST DOI footer and per-record permalinks
back to CCCBDB. Cases without a CAS number in the SPEC pass
silently (no attachment).
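The skip-if-no-CAS behaviour described above can be pictured with a short sketch. It is not the suite's implementation: the case layout (dicts with "name" and optional "cas" keys), the case names, and the injected `fetch` callable are all assumptions made for illustration.

```python
def attach_references(cases, fetch):
    """Return {case name: reference} for the cases that declare a CAS number."""
    attached = {}
    for case in cases:
        cas = case.get("cas")
        if not cas:
            continue  # no CAS in the spec: skip silently, nothing attached
        attached[case["name"]] = fetch(cas)
    return attached


# Hypothetical cases; the second one has no CAS and is left unattached.
cases = [
    {"name": "h2o_rhf_sto3g", "cas": "7732-18-5"},
    {"name": "case_without_cas"},
]


def stub_fetch(cas):
    """Stand-in for a real fetcher so the sketch runs offline."""
    return {"source_id": cas}


print(attach_references(cases, stub_fetch))
```

Passing the fetcher in as an argument keeps the loop testable offline; a real run would pass a function that performs the CCCBDB lookup.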
Cache + offline mode
vqfetch reference uses the same XDG-compliant cache as
vqfetch for structures
(~/.cache/vqfetch/cccbdb/). NIST records are immutable in
practice, so although the default TTL is 30 days, a cached
record stays valid for years.
- --no-cache: bypass cache reads (still writes through).
- --cache-only: refuse live HTTP entirely.
Useful for offline reproducibility on cluster compute nodes
without external network — pre-warm the cache locally, then
ssh / rsync ~/.cache/vqfetch/ to the cluster.
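The interplay of cache location, TTL, and the two flags can be sketched as below. This is an illustration under stated assumptions, not vqfetch's code: the function names are hypothetical, the directory follows the XDG convention of falling back to ~/.cache when XDG_CACHE_HOME is unset or empty, and it is assumed that --cache-only serves any cached record regardless of age and takes precedence over --no-cache.

```python
import os
import time
from pathlib import Path

DEFAULT_TTL_SECONDS = 30 * 24 * 3600  # the 30-day default TTL


def cache_root(env=None):
    """Resolve the CCCBDB cache directory following the XDG convention."""
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "vqfetch" / "cccbdb"


def is_fresh(path, ttl_seconds=DEFAULT_TTL_SECONDS, now=None):
    """True if the cached file exists and is younger than the TTL."""
    path = Path(path)
    if not path.is_file():
        return False
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) < ttl_seconds


def plan_fetch(cached, fresh, no_cache=False, cache_only=False):
    """Decide where a record comes from: 'cache', 'network', or 'error'."""
    if cache_only:
        # Assumption: never touch the network, even for stale records.
        return "cache" if cached else "error"
    if no_cache or not fresh:
        return "network"  # skip the cache read; the result is still written back
    return "cache"


print(cache_root())
print(plan_fetch(cached=True, fresh=False))                    # network
print(plan_fetch(cached=False, fresh=False, cache_only=True))  # error
```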
Provenance
Every CCCBDB record carries:
```python
Provenance(
    source_db="nist_cccbdb",
    source_id="7732-18-5",  # CAS
    source_url="https://cccbdb.nist.gov/exp2x.asp?casno=7732185",
    original_doi="10.18434/T47C7Z",  # NIST SRD 101
    license="public domain (US Government work)",
    fetched_at="2026-05-09T14:00:00+00:00",
    notes="atomization_energy_kcal_per_mol is D_0 ...",
)
```
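The source_id and source_url fields above suggest that the permalink is the CAS number with its hyphens removed. The sketch below rebuilds the URL on that assumption, which is inferred from this single example and not from CCCBDB documentation, and first validates the CAS number with the standard CAS check-digit rule. Both function names are hypothetical.

```python
def cas_is_valid(cas):
    """Check CAS Registry Number format (NNNNNNN-NN-N) and its check digit."""
    parts = cas.split("-")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return False
    first, second, check = parts
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    # Weight the digits 1, 2, 3, ... from right to left, excluding the check digit.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(first + second), start=1))
    return total % 10 == int(check)


def cccbdb_url(cas):
    """Build a record URL following the pattern of the example above (assumed)."""
    if not cas_is_valid(cas):
        raise ValueError(f"invalid CAS number: {cas!r}")
    return "https://cccbdb.nist.gov/exp2x.asp?casno=" + cas.replace("-", "")


print(cccbdb_url("7732-18-5"))
# https://cccbdb.nist.gov/exp2x.asp?casno=7732185
```

Validating the check digit before any request catches transposed or mistyped CAS numbers locally, without spending an HTTP round trip.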
For published work, cite NIST Standard Reference Database 101:
NIST Computational Chemistry Comparison and Benchmark Database, NIST Standard Reference Database Number 101, Release 22, May 2022, Editor: Russell D. Johnson III. DOI: 10.18434/T47C7Z.
The CCCBDB record also surfaces NIST’s standard disclaimer about data uncertainty in its About page; none of the data in CCCBDB carries a warranty of any kind.
Why this is part of vibe-qc
vibe-qc’s molecular validation strategy uses two reference sources:
1. Software-vs-software parity: vibe-qc vs PySCF / ORCA to machine precision. Catches regressions, validates numerical stability.
2. Software-vs-experiment: vibe-qc vs NIST CCCBDB on a well-defined small-molecule set. Catches physical wrongness that software-vs-software parity can’t (e.g. a functional that gives the wrong atomization trend even though both vibe-qc and PySCF agree).
vqfetch reference makes (2) accessible without manually
maintaining a curated reference table.
See also
- external_structures.md: vqfetch’s structure subcommands (OPTIMADE / MP / COD / NOMAD).
- docs/license.md: full per-source licensing inventory.
- CCCBDB homepage: browse records online.