External structures (vqfetch)¶
vqfetch is the v0.8.0 console-script that pulls crystal
structures from open databases and emits two artefacts:
- A regression PeriodicSpec module (under examples/regression/systems/periodic/), so the structure becomes part of the regression matrix.
- An executable vibe-qc input script (under examples/), so you can run an SCF on the fetched cell with one command.
The pull preserves full per-record provenance: source DB,
ID, permalink URL, original DOI (where available), license
string, and fetched-at timestamp. A vqfetch-pulled structure
therefore carries everything needed to cite and audit it, with
no “where did this come from” gap.
Sources¶
| Source | Default per-record license | Use it for | CLI subcommand |
|---|---|---|---|
| OPTIMADE federation | per-provider (varies) | formula-based federated query across providers | vqfetch optimade |
| Materials Project | CC-BY 4.0 | computed structures; routed through MP’s OPTIMADE endpoint | |
| COD (Crystallography Open Database) | CC0 / public domain | experimentally determined CIFs | |
| NOMAD | CC-BY 4.0 (data); CC0 (metadata) | computed materials data | use |
| Canonical set | (whichever the slug’s primary provider applies) | the five round-trip-verified structures used as smoke tests | vqfetch canonical |
| Candidate discovery | (whichever the provider specifies) | interactive search: list all matching structures with dedup + ranking | vqfetch list-candidates |
The list-candidates subcommand (v0.13.x, VFETCH-X1) queries the
OPTIMADE federation and displays a tabular report of distinct
candidate structures ranked by space-group plurality, so you can
pick the right polymorph before pulling:
$ vqfetch list-candidates --formula MgO
provider  id          sg      a (Å)   atoms  merged
-----------------------------------------------------------------------
mp        mp-1265     Fm-3m   4.1940  8      0
mp        mp-1009127  Pm-3m   2.7521  2      0
mp        mp-1191789  P6_3mc  3.2322  4      0
3 candidate(s) shown. Pick one:
  vqfetch optimade --formula MgO
  # OR id-lookup: vqfetch optimade --optimade-id mp/mp-1265
Each row is a structurally distinct polymorph. When two providers
return the same structure, they’re merged into one row with the
merged count tracking duplicates. The table is ranked by
space-group plurality — the most common space group across all
returned structures floats to the top.
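The ranking rule can be illustrated with a minimal sketch. It assumes plurality is counted over the raw hits before merging, which the text does not spell out; the function name and the placeholder hits are hypothetical and are not part of vqfetch.

```python
from collections import Counter


def rank_by_space_group_plurality(hits):
    """Order hits so the most common space group comes first.

    hits: list of (provider, structure_id, space_group) tuples.
    """
    counts = Counter(sg for _, _, sg in hits)
    # sorted() is stable, so hits whose space groups tie on frequency
    # keep their original relative order.
    return sorted(hits, key=lambda hit: -counts[hit[2]])


# Placeholder hits: the ids are invented for illustration, not real records.
hits = [
    ("prov-a", "a-1", "Pm-3m"),
    ("prov-a", "a-2", "Fm-3m"),
    ("prov-b", "b-7", "Fm-3m"),
    ("prov-c", "c-3", "P6_3mc"),
]
ranked = rank_by_space_group_plurality(hits)
for provider, structure_id, space_group in ranked:
    print(provider, structure_id, space_group)
```

Here the two Fm-3m hits rise above the single Pm-3m and P6_3mc hits, which is the behaviour the table ordering describes.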
Flags:
| Flag | Default | Purpose |
|---|---|---|
| --formula | (required) | |
| | | Narrow to one provider: |
| | | Maximum candidates to display. |
| | off | Show structurally-identical entries separately. |
| --no-cache | off | Bypass cache reads (still writes through). |
| --cache-only | off | Refuse live HTTP; fail on cache miss. |
The full license inventory is in
docs/license.md.
Install¶
vqfetch is part of the optional [fetch] extra:
pip install -e '.[fetch]' # development install
# OR
pip install 'vibe-qc[fetch]' # once published
This pulls in optimade>=1.0,<2, ase>=3.22,
beautifulsoup4>=4.12,<5, and lxml>=4.9. Without the extra,
vqfetch will not be on $PATH.
Quick start: round-trip MgO from the canonical set¶
# 1. Fetch the canonical MgO entry → emit SPEC + input script.
vqfetch canonical mgo_rocksalt --quick
# Output (one path per line; both are written to disk):
# examples/regression/systems/periodic/mgo_rocksalt.py
# examples/input-mgo_rocksalt-sto-3g.py
# 2. Run the SCF on planetx via vq (LDA + sto-3g converges in ~13 iters):
vq submit examples/input-mgo_rocksalt-sto-3g.py --cpus 16 --wall-time-seconds 14400
# returns a 12-char jobid; poll with `vq status <jobid>`.
The SCF run produces:
- output-mgo_rocksalt-sto-3g.out: banner, SCF trace, energy breakdown, orbital table, plus wall-clock timings.
- output-mgo_rocksalt-sto-3g.system: runtime manifest (TOML). Records vibe-qc version, host OS, OMP threads, library versions, and SCF wall time.
- output-mgo_rocksalt-sto-3g.perf: per-phase timing breakdown.
Reference: live planetx round-trip on 2026-05-09 produced E = −950.4204308512 Ha (13 SCF iters, ~2h 20m on 16 cores).
Five canonical structures (round-trip-verified)¶
The canonical subcommand walks a hand-curated five-structure
table that the v1 acceptance harness round-trips end-to-end on
every commit:
vqfetch canonical mgo_rocksalt # MgO via Materials Project mp-1265
vqfetch canonical nacl_rocksalt # NaCl via Materials Project mp-22862
vqfetch canonical lih_rocksalt # LiH via Materials Project mp-23703
vqfetch canonical si_diamond # Si via Materials Project mp-149
vqfetch canonical c_diamond # C via Materials Project mp-66
Use --quick to drop the recommended basis to sto-3g for a
fast smoke test (default behaviour is the heuristic-recommended
basis per § Recommended basis below).
The table itself lives at
python/vibeqc/fetch/canonical_set.py
with per-entry expected space group + conventional-cell atom
count + lattice constant for sanity-check assertions in the
smoke harness.
Per-record provenance¶
Every fetched record carries the following fields (visible in
the emitted PeriodicSpec module’s Provenance dataclass and
echoed into the SCF run header):
Provenance(
    source_db="OPTIMADE/mp",  # "OPTIMADE/<provider>" | "COD" | "CCCBDB" | "manual"
    source_id="mp-1265",  # provider-specific id
    source_url="https://optimade.materialsproject.org/structures/mp-1265",
    original_reference="10.17188/1199994",  # DOI when known, "" otherwise
    license="CC-BY-4.0",
    fetched_at="2026-05-09T21:30:00Z",
    fetcher_version="0.1.0",  # vibeqc.fetch.__version__
    notes="",  # free-form; aliases when dedup runs
)
When you re-run a calculation later, the provenance bundle travels with the SPEC. For published work, cite the source DB per its terms of use (Materials Project: cite per their terms; COD: CC0 — citation appreciated, not legally required; NOMAD: cite the contributing author + NOMAD).
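As a sketch of how the provenance fields might be turned into a one-line source note for a paper or lab notebook: the dataclass below is a stand-in that only mirrors the field names listed above (it is not the class vibe-qc ships), and provenance_line is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Stand-in mirroring the documented fields; not the vibeqc class."""
    source_db: str
    source_id: str
    source_url: str
    original_reference: str = ""
    license: str = ""
    fetched_at: str = ""
    fetcher_version: str = ""
    notes: str = ""


def provenance_line(p):
    """Format a provenance record as a single human-readable line."""
    parts = [f"{p.source_db}/{p.source_id}", p.source_url]
    if p.original_reference:  # the DOI is "" when unknown
        parts.append(f"doi:{p.original_reference}")
    parts.append(f"license {p.license}")
    parts.append(f"fetched {p.fetched_at}")
    return "; ".join(parts)


example = Provenance(
    source_db="OPTIMADE/mp",
    source_id="mp-1265",
    source_url="https://optimade.materialsproject.org/structures/mp-1265",
    original_reference="10.17188/1199994",
    license="CC-BY-4.0",
    fetched_at="2026-05-09T21:30:00Z",
)
print(provenance_line(example))
```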
Cache + offline mode¶
vqfetch caches every successful fetch on disk per XDG (default
root: $XDG_CACHE_HOME/vibeqc/fetch/, falling back to
~/.cache/vibeqc/fetch/). Subdirectories per source DB
(OPTIMADE_mp/, COD/, CCCBDB/, …), one JSON file per
(source_db, source_id). Repeated calls do not re-hit the
API until the cached entry expires:
- 30 days TTL for OPTIMADE / MP / NOMAD / CCCBDB.
- Infinite TTL for COD (CIFs are immutable post-publication).
Two relevant flags:
- --no-cache: bypass cache reads but still write through after a live fetch. Useful when you suspect the upstream record has been updated.
- --cache-only: refuse live HTTP entirely; fail fast if the record isn’t in the cache. Useful for offline / reproducible runs (e.g. on cluster compute nodes without network). Equivalent env var: VIBEQC_FETCH_CACHE_ONLY=1.
Override the cache root with $VIBEQC_FETCH_CACHE_ROOT for
tests or per-project caches.
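The cache-root lookup and TTL policy described in this section could be sketched as follows. The environment variables and TTL values come from the text above; the function names, and the assumption that freshness is judged from a stored fetch time in epoch seconds, are illustrative only and do not describe vqfetch internals.

```python
import os
import time
from pathlib import Path

THIRTY_DAYS = 30 * 24 * 3600  # seconds


def cache_root(env=None):
    """Resolve the cache root: explicit override, then XDG, then ~/.cache."""
    env = os.environ if env is None else env
    override = env.get("VIBEQC_FETCH_CACHE_ROOT")
    if override:
        return Path(override)
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg) if xdg else Path.home() / ".cache"
    return base / "vibeqc" / "fetch"


def ttl_seconds(source_db):
    """Return the TTL in seconds, or None for entries that never expire."""
    return None if source_db == "COD" else THIRTY_DAYS


def is_fresh(source_db, fetched_epoch, now=None):
    """True if a cached record is still within its TTL."""
    ttl = ttl_seconds(source_db)
    if ttl is None:
        return True
    now = time.time() if now is None else now
    return (now - fetched_epoch) <= ttl


print(cache_root({"XDG_CACHE_HOME": "/tmp/xdg"}))
```

Passing the environment as a plain dict keeps the lookup easy to exercise in tests without touching the real process environment.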
Common flags¶
All structure subcommands accept the same emission flags:
| Flag | Default | Purpose |
|---|---|---|
| --basis | (heuristic; see below) | Override the recommended basis (e.g. |
| | | SCF method baked into the emitted input script. Choices: |
| --quick | off | Force the recommended basis to sto-3g. |
| | | Output directory for the emitted SPEC module. |
| | | Output directory for the emitted input script. |
| --no-cache | off | Bypass cache reads (still writes through). |
| --cache-only | off | Refuse live HTTP. |
| | (auto-generated or canonical-set slug) | Override the emitted SPEC’s |
Recommended basis (heuristic)¶
When --basis is not specified, vqfetch picks per heuristic
(python/vibeqc/fetch/heuristics.py):
| System type | Default basis | Reason |
|---|---|---|
| Periodic (any composition) | pob-tzvp | Bredow / Peintinger-Vilela-Oliveira-Bredow periodic-tuned TZVP; the standard solid-state reference. For transition-metal systems: verify ECP coverage in your build. |
| Molecule | | Standard molecular split-valence; bump to |
| --quick | sto-3g | Smoke-test minimum. |
Override at any time via --basis <name>. The bundled basis
inventory + per-family citations are at
docs/license.md.
SCF defaults baked into the emitted script¶
The fetcher picks SCF stability knobs per the regression-suite patterns:
| Trigger | | |
|---|---|---|
| Cell contains any of Li / Na / K / Mg / Ca / Al / Cl / transition metal | | |
| Otherwise (rare-gas, light-covalent: H / B / C / N / O / Si / F / …) | | |
k-mesh seed by cell size:
| atoms in cell | k-mesh seed |
|---|---|
| ≤ 4 | (4, 4, 4) |
| 5–20 | (2, 2, 2) |
| > 20 | (1, 1, 1) |
These are seeds for a convergence study, not converged
values. The emitted script includes a # TODO: k-mesh convergence comment referencing
examples/input-k-mesh-convergence.py.
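The cell-size rule for the k-mesh seed is simple enough to write down directly. The thresholds are the ones listed above; the function name kmesh_seed is hypothetical and not taken from the vibe-qc source.

```python
def kmesh_seed(n_atoms):
    """Return a starting Monkhorst-Pack mesh for a cell with n_atoms atoms.

    This is a seed for a convergence study, not a converged mesh.
    """
    if n_atoms < 1:
        raise ValueError("a cell needs at least one atom")
    if n_atoms <= 4:
        return (4, 4, 4)
    if n_atoms <= 20:
        return (2, 2, 2)
    return (1, 1, 1)


# The 8-atom conventional MgO cell falls in the middle bracket.
print(kmesh_seed(8))
```

For the 8-atom conventional MgO cell this gives (2, 2, 2), matching the default_kmesh in the emitted SPEC.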
What the emitted files look like¶
After vqfetch canonical mgo_rocksalt --quick:
examples/regression/systems/periodic/mgo_rocksalt.py — a
drop-in PeriodicSpec the regression suite picks up
automatically:
"""mgo_rocksalt — fetched rocksalt (OPTIMADE/mp/mp-1265).
Auto-generated by vibeqc.fetch (fetcher version 0.1.0).
Do not edit by hand — re-run the fetcher to regenerate.
"""
from __future__ import annotations
from examples.regression.core.spec import AtomFrac, PeriodicSpec, Provenance
SPEC = PeriodicSpec(
    id="mgo_rocksalt",
    family="rocksalt",
    lattice_ang=(
        (4.19400279, 0.0, 0.0),
        (0.0, 4.19400279, 0.0),
        (0.0, 0.0, 4.19400279),
    ),
    space_group="Fm-3m",
    atoms=(
        AtomFrac(symbol="Mg", z=12, frac=(0.0, 0.0, 0.0)),
        AtomFrac(symbol="Mg", z=12, frac=(0.0, 0.5, 0.5)),
        AtomFrac(symbol="Mg", z=12, frac=(0.5, 0.0, 0.5)),
        AtomFrac(symbol="Mg", z=12, frac=(0.5, 0.5, 0.0)),
        AtomFrac(symbol="O", z=8, frac=(0.5, 0.5, 0.5)),
        AtomFrac(symbol="O", z=8, frac=(0.5, 0.0, 0.0)),
        AtomFrac(symbol="O", z=8, frac=(0.0, 0.5, 0.0)),
        AtomFrac(symbol="O", z=8, frac=(0.0, 0.0, 0.5)),
    ),
    default_kmesh=(2, 2, 2),
    default_initial_guess="SAD",
    default_damping=0.85,
    recommended_basis="sto-3g",  # --quick was set; default is pob-tzvp
    is_open_shell=False,
    provenance=Provenance(
        source_db="OPTIMADE/mp",
        source_id="mp-1265",
        source_url="https://optimade.materialsproject.org/structures/mp-1265",
        original_reference="",
        license="CC-BY-4.0",
        fetched_at="2026-05-09T21:30:00Z",
        fetcher_version="0.1.0",
        notes="",
    ),
)
examples/input-mgo_rocksalt-sto-3g.py — runnable SCF script
mirroring the style of examples/input-mgo-pob-tzvp.py:
"""mgo_rocksalt (rocksalt) — auto-generated by vibeqc.fetch."""
import os, time
from pathlib import Path
import numpy as np
import vibeqc as vq
from vibeqc.progress import ProgressLogger
ANGSTROM_TO_BOHR = 1.0 / 0.529177210903
LATTICE_ANG = np.array([[4.194, 0, 0], [0, 4.194, 0], [0, 0, 4.194]])
MG_FRAC = [(0,0,0), (0,0.5,0.5), (0.5,0,0.5), (0.5,0.5,0)]
O_FRAC = [(0.5,0.5,0.5), (0.5,0,0), (0,0.5,0), (0,0,0.5)]
plog = ProgressLogger(log_path="output-mgo_rocksalt-sto-3g.out", verbose=True)
with vq.perf_log("output-mgo_rocksalt-sto-3g.perf"):
    lat_bohr = LATTICE_ANG * ANGSTROM_TO_BOHR
    unit_cell = [vq.Atom(12, [fx*lat_bohr[0,0], fy*lat_bohr[1,1], fz*lat_bohr[2,2]])
                 for fx, fy, fz in MG_FRAC]
    unit_cell += [vq.Atom(8, [fx*lat_bohr[0,0], fy*lat_bohr[1,1], fz*lat_bohr[2,2]])
                  for fx, fy, fz in O_FRAC]
    system = vq.PeriodicSystem(dim=3, lattice=lat_bohr, unit_cell=unit_cell)
    basis = vq.BasisSet(system.unit_cell_molecule(), "sto-3g")
    vq.attach_symmetry(system, symprec=1e-4)
    opts = vq.PeriodicKSOptions()
    opts.functional = "LDA"
    opts.lattice_opts.coulomb_method = vq.CoulombMethod.EWALD_3D
    opts.damping = 0.85
    opts.initial_guess = vq.InitialGuess.SAD
    kpts = vq.KPoints.monkhorst_pack(system, [2, 2, 2], symmetry=True)
    result = vq.run_rks_periodic_scf(system, basis, kpts, opts, progress=plog)
The emitted script uses vibe-qc’s native periodic-SCF API
(vq.PeriodicSystem, vq.PeriodicKSOptions,
vq.run_rks_periodic_scf, …) — not a higher-level wrapper.
That keeps every SCF knob visible at the call site, which is
the right tradeoff for a benchmark workload that the user will
inevitably want to tweak.
Combining with the regression suite¶
Once a SPEC lands in examples/regression/systems/periodic/,
the regression suite picks it up automatically. Run a focused
matrix entry:
python -m examples.regression.run_suite \
--systems mgo_rocksalt --bases sto-3g --methods rks-lda
…and the new system shows up alongside the hand-curated test
set, with the per-source provenance footer included in the
generated summary.md.
Multi-candidate Python API¶
For interactive workflows (notebooks / scripts) where you want to see ALL polymorphs a formula returns and pick by hand:
from vibeqc.fetch import fetch_optimade
candidates = fetch_optimade(
    formula="MgO",
    max_results=10,  # walk all hits, not just the first
    dedup=True,      # structurally-identical entries merged
)                    # → list[PeriodicSpec] sorted by space-group
                     #   plurality + larger-cell tiebreak
for spec in candidates:
    p = spec.provenance
    print(f"{p.source_db}/{p.source_id}: sg={spec.space_group} "
          f"a={spec.lattice_ang[0][0]:.3f} Å natoms={len(spec.atoms)}")
Structurally identical entries across providers get merged
into one survivor — Provenance.notes carries the alias list
("also at: OPTIMADE/oqmd/oqmd-1234, OPTIMADE/aflow/aflow:abc").
Dedup hashes the formula + cell + sorted-rounded fractional
positions to 1e-4 Å resolution. Non-orthorhombic entries are
silently dropped in the multi-candidate path (the single-pick
path raises with the full rejection list).
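One way such a dedup key could be built is sketched below. This is an assumption-laden illustration, not the vqfetch implementation: the function name, the choice of SHA-256, and the wrapping of fractional coordinates into [0, 1) are all hypothetical. It also only matches entries that share the same cell setting and origin.

```python
import hashlib


def dedup_key(formula, lattice_ang, atoms, ndigits=4):
    """Hash formula + cell + sorted, rounded fractional positions.

    atoms: iterable of (symbol, (fx, fy, fz)) pairs.
    """
    def wrap(x):
        # Map into [0, 1) and round; a value that rounds up to 1.0 is
        # the same site as 0.0 under periodic boundary conditions.
        r = round(x % 1.0, ndigits)
        return 0.0 if r >= 1.0 else r

    # Adding 0.0 turns a rounded -0.0 into 0.0 so both hash identically.
    cell = tuple(round(c, ndigits) + 0.0 for row in lattice_ang for c in row)
    sites = sorted((sym, tuple(wrap(f) for f in frac)) for sym, frac in atoms)
    payload = repr((formula, cell, sites)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


cell = ((4.194, 0.0, 0.0), (0.0, 4.194, 0.0), (0.0, 0.0, 4.194))
key_a = dedup_key("MgO", cell, [("Mg", (0, 0, 0)), ("O", (0.5, 0.5, 0.5))])
# Same structure with a different atom order and tiny numerical noise.
key_b = dedup_key("MgO", cell, [("O", (0.50000001, 0.5, 0.5)), ("Mg", (1.0, 0, 0))])
print(key_a == key_b)
```

Sorting the sites before hashing makes the key independent of the order in which a provider lists atoms, which is the property the merge step relies on.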
A CLI surface for this (vqfetch list-candidates) is in the
v0.8.x maintenance line.
When NOT to use vqfetch¶
- You already have a CIF on disk. Use ASE’s CIF reader directly: ase.io.read("my.cif"), then build a PeriodicSystem from the result. vqfetch’s value is the fetch + provenance + emission round-trip, not the CIF parsing itself.
- You want a structure NOT in any open database. vqfetch’s sources cover ~99% of well-studied solids; for exotic / proprietary structures, hand-build the PeriodicSystem and document provenance manually.
- You need a non-orthorhombic cell (monoclinic / triclinic). vqfetch standardises through spglib.standardize_cell(..., to_primitive=False); if the conventional setting is also non-orthorhombic, it refuses with a clear message rather than silently misconfiguring the Ewald lattice sum. Triclinic-Ewald support is on the roadmap.
- You need cell relaxation. vqfetch emits a static-SCF script; geometry optimisation is a follow-on step the user drives.
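If you are hand-building a cell and want a quick pre-check before going further, a plain orthogonality test on the lattice vectors may help. This sketch is not the spglib-based standardisation vqfetch performs; it only tests the cell as given, so a primitive cell that fails here may still have an orthogonal conventional setting. The function name and tolerance are hypothetical.

```python
def is_orthogonal_cell(lattice, tol=1e-6):
    """True if the three lattice vectors are mutually perpendicular.

    lattice: three row vectors (a, b, c) in any consistent length unit.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    a, b, c = lattice
    for u, v in ((a, b), (a, c), (b, c)):
        # Compare against the product of the vector lengths so the
        # tolerance does not depend on the unit or the cell size.
        scale = (dot(u, u) * dot(v, v)) ** 0.5
        if abs(dot(u, v)) > tol * scale:
            return False
    return True


cubic = ((4.194, 0.0, 0.0), (0.0, 4.194, 0.0), (0.0, 0.0, 4.194))
tilted = ((5.0, 0.0, 0.0), (0.0, 6.0, 0.0), (-1.5, 0.0, 7.0))
print(is_orthogonal_cell(cubic), is_orthogonal_cell(tilted))
```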
Deferred to the v0.8.x maintenance line¶
These items are designed but not shipped in v0.8.0:
- vqfetch list-candidates CLI subcommand (Python multi-candidate API shipped in v0.8.0; the CLI surface follows).
- WebBook fallback for structures missing from MP / COD.
- Materials Project property fields beyond what OPTIMADE surfaces (client_mp.py is currently a thin OPTIMADE pass-through; band gap / formation energy / magnetisation pull follows when the v0.8.x line lands).
- NOMAD raw-archive hook (client_nomad.py is a stub today; pulls the input-archive URL into Provenance.notes when it lands).
- Cross-provider consistency check (alert when MP / COD / NOMAD disagree on the same canonical structure).
- Bulk-sweep tooling (vqfetch --bulk <list-file>) for populating the cache with a curated reference set.
- runner_crystal.py for the regression suite: runs CRYSTAL14 via vq subprocess (textbook external-codes pattern); unlocks three-way periodic parity (vibe-qc / PySCF.pbc / CRYSTAL) for fetched SPECs.
These are tracked in docs/roadmap.md under the v0.8.x
maintenance window — none blocking the v0.8.0 tag.
See also¶
- reference_data.md: vqfetch’s CCCBDB integration for experimental reference data (atomization energies, ΔHf°, vibrational frequencies, IE).
- external_codes.md: vibe-qc’s policy on external programs vs vendored libraries; how parity runs against PySCF / ORCA / CRYSTAL are wired.
- docs/license.md: full per-source licensing + bundled-data inventory.