Every numerical parameter shipped with vibe-qc — basis sets, ECPs,
dispersion damping constants, EEQ atomic-charge parameters, gCP
geometric-counterpoise fits, integration grids, custom XC-functional
alias definitions — lives under
python/vibeqc/data_library/.
That single tree is the canonical source of truth: it carries the
publication citation, DOI, redistribution license, and per-element
coverage matrix for every parameter set vibe-qc consumes.
This page is the user-facing summary. The full normative standard is
in the directory’s README.md;
the per-kind schema is in
_schema.toml.
vibe-qc made three architectural decisions about parameter data, in
order of priority:
1. Ship everything that’s published + redistributable.
   Optional-dependency packages (dftd3, dftd4) remain as backends
   for users who already have them, but vibe-qc itself bundles the
   reference numerical data. No paper should fail to reproduce
   because the user forgot a pip install.
2. License + citation discipline is mechanical, not vibes. Every
   .toml file carries metadata.citation, metadata.doi, and
   metadata.license as structurally required fields. CI rejects
   any new parameter file missing those. The .system job manifest
   and the .out SCF log surface them at run time so the citation
   chain reaches downstream papers.
3. Extending coverage is a one-file diff. Adding the gCP
   parameters for a new element at an existing basis means dropping
   an [elements.<symbol>] table into the right .toml. Adding a new
   basis means dropping in a whole .toml. No C++ recompile is needed
   for cold-path data (gCP, alias definitions, custom hybrid mixes).
Every parameter file under data_library/ follows the same top-level
structure:
    [metadata]
    name = "def2-svp"            # canonical lowercase id (matches file stem)
    kind = "gcp_parameters"      # schema discriminator
    citation = "Kruse & Grimme, J. Chem. Phys. 136, 154101 (2012)"
    doi = "10.1063/1.3700154"
    license = "Open data: numerical results from a publication"
    source = "Tables 2 + 3 of the cited paper"
    status = "partial"           # complete | partial | pending
    coverage = "H, C, N, O, F"
    note = "..."

    [parameters]
    sigma = 0.4048
    alpha = 0.5263
    beta = 1.4009

    [elements.H]
    e_mis = 0.000539
    n_virt = 4.0
    zeta = 1.0000

    [elements.C]
    e_mis = 0.027326
    # ...
Element keys are atomic symbols (H, C, Fe, …). The loader is
case-insensitive on the symbol. Per-kind body fields are documented
in _schema.toml; for gCP, they’re (sigma, alpha, beta) at the
top level plus (e_mis, n_virt, zeta) per element.
The metadata.status field takes one of three values:
- complete: every element declared in coverage is populated.
  Safe for production use.
- partial: some elements are present; the others raise the kind’s
  domain-specific data-missing exception (e.g. vibeqc.GCPDataMissing)
  with the missing-element list and a contribute-via-PR pointer.
- pending: metadata and fit constants are registered so the
  dispatcher knows about the basis, but [elements.*] blocks have
  not yet been bundled. The kind’s loader raises a clear
  “… parameters not yet bundled; see …” error so
  users hit an actionable wall, not a wrong-answer trap.
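The way the three status values translate into loader behaviour can be sketched as follows. The class `DataMissing` and the function `require_elements` are hypothetical stand-ins (the real exception named on this page is vibeqc.GCPDataMissing, whose constructor and message format are not documented here), and the documents are plain dictionaries shaped like a parsed parameter file.

```python
class DataMissing(LookupError):
    """Hypothetical stand-in for a kind-specific exception such as vibeqc.GCPDataMissing."""


def require_elements(doc: dict, symbols: list[str]) -> dict:
    """Return the rows for `symbols`, or raise instead of computing with gaps."""
    meta = doc["metadata"]
    if meta["status"] == "pending":
        # Registered basis, but no [elements.*] blocks bundled yet.
        raise DataMissing(f"{meta['name']}: parameters not yet bundled")
    rows = {key.lower(): row for key, row in doc.get("elements", {}).items()}
    missing = [s for s in symbols if s.lower() not in rows]
    if missing:
        # Partial coverage: name exactly which elements are absent.
        raise DataMissing(f"{meta['name']}: no parameters for {', '.join(missing)}")
    return {s: rows[s.lower()] for s in symbols}


partial = {
    "metadata": {"name": "def2-svp", "status": "partial"},
    "elements": {"H": {"e_mis": 0.000539, "n_virt": 4.0, "zeta": 1.0}},
}
pending = {"metadata": {"name": "minix", "status": "pending"}}

print(require_elements(partial, ["h"]))
for doc, symbols in ((partial, ["H", "Fe"]), (pending, ["H"])):
    try:
        require_elements(doc, symbols)
    except DataMissing as err:
        print("DataMissing:", err)
```

The point of the design is that a missing row always surfaces as an exception before any energy is computed, never as a silent zero contribution.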
The v0.9.0 first cut ships partial gCP coverage for def2-SVP and
def2-TZVP (H/C/N/O/F — the published Kruse-Grimme 2012 reference
slice) and pending stubs for the composite-3c-specific bases
(MINIX, def2-mSVP, def2-mTZVP, vDZP) until their per-element rows
are bundled in v0.9.x patches.
The data_library/ standard is read-only at runtime for the
files vibe-qc consumes today. Files that need to live in compiled
C++ for inner-loop performance (D3-BJ coefficient lookup, EEQ
partial charges, Lebedev grids) will migrate to the same standard
in v0.9.x patches via a small generator pipeline:
1. data_library/dispersion/<functional>.toml holds the
   authoritative parameter set.
2. scripts/generate_cpp_data.py <kind> <name> regenerates
   cpp/src/<kind>_data.cpp from the TOML at maintenance time
   (not at every build: the generated .cpp is committed so the
   provenance shows up in git blame).
3. The C++ inner loop continues to read the compiled table, so
   there’s zero runtime cost.
This mirrors the existing
setup_basis_library.sh
regeneration pattern for basis sets: TOML source-of-truth, .cpp
generated cache, both committed.
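The generator step can be pictured with a short sketch. This is not the contents of scripts/generate_cpp_data.py: the function `render_cpp`, the namespace name, and the emitted C++ layout are all assumptions made for illustration. The input reuses the sigma/alpha/beta values from the gCP example on this page only because they are the numbers at hand; the kinds that would actually go through such a pipeline are the hot-path ones listed above.

```python
def render_cpp(doc: dict) -> str:
    """Render the [parameters] table of a parsed file as C++ constants.

    Keys are emitted in sorted order so regenerating from an unchanged
    TOML file produces a byte-identical .cpp and an empty diff.
    """
    meta = doc["metadata"]
    lines = [
        "// GENERATED FILE: edit the TOML source and regenerate.",
        f"// Source: {meta['name']}.toml",
        f"// Citation: {meta['citation']}",
        f"// DOI: {meta['doi']}",
        "namespace generated_data {  // hypothetical namespace",
    ]
    for key in sorted(doc["parameters"]):
        value = float(doc["parameters"][key])
        lines.append(f"constexpr double {key} = {value!r};")
    lines.append("}  // namespace generated_data")
    return "\n".join(lines) + "\n"


doc = {
    "metadata": {
        "name": "def2-svp",
        "citation": "Kruse & Grimme, J. Chem. Phys. 136, 154101 (2012)",
        "doi": "10.1063/1.3700154",
    },
    "parameters": {"sigma": 0.4048, "alpha": 0.5263, "beta": 1.4009},
}

cpp_source = render_cpp(doc)
print(cpp_source)
```

Carrying the citation and DOI into the header comment keeps the provenance visible in the committed .cpp as well as in the TOML.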