External structures (vqfetch)
vqfetch is a console script, introduced in v0.8.0, that pulls crystal
structures from open databases and emits two artefacts:
A regression PeriodicSpec module (under examples/regression/systems/periodic/), so the structure becomes part of the regression matrix.
An executable vibe-qc input script (under examples/), so you can run an SCF on the fetched cell with one command.
Each pull preserves full per-record provenance: source DB,
ID, permalink URL, original DOI (where available), license
string, and fetched-at timestamp. A vqfetch-pulled structure
can therefore be cited in a publication as-is, with no
“where did this come from” gap to reconstruct afterwards.
Sources
| Source | Default per-record license | Use it for | CLI subcommand |
|---|---|---|---|
| OPTIMADE federation | per-provider (varies) | formula-based federated query across providers | |
| Materials Project | CC-BY 4.0 | computed structures + properties; primary for many slugs | vqfetch mp |
| COD (Crystallography Open Database) | CC0 / public domain | experimentally determined CIFs | |
| NOMAD | CC-BY 4.0 (data); CC0 (metadata) | computed materials data | (use |
| Canonical set | (whichever the slug’s primary provider applies) | the five round-trip-verified structures used as smoke tests | vqfetch canonical |
The full license inventory is in
docs/license.md.
Install
vqfetch is part of the optional [fetch] extra:
pip install -e '.[fetch]' # development install
# OR
pip install 'vibe-qc[fetch]' # once published
This pulls in optimade>=1.0,<2, ase>=3.22,
beautifulsoup4>=4.12,<5, and lxml>=4.9. Without the extra,
vqfetch will not be on $PATH.
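To confirm that the extra actually landed in the active environment, a check along the following lines can help. It is a sketch rather than part of vqfetch: the helper name check_fetch_extra is made up for illustration, and the module names are the packages' usual import names (beautifulsoup4 imports as bs4).

```python
import importlib.util
import shutil

# Import names of the packages pulled in by the [fetch] extra.
# beautifulsoup4 installs the module "bs4".
FETCH_MODULES = ("optimade", "ase", "bs4", "lxml")


def check_fetch_extra():
    """Report which [fetch] dependencies import and whether vqfetch is on PATH.

    Hypothetical helper, not shipped with vibe-qc.
    """
    report = {
        name: importlib.util.find_spec(name) is not None
        for name in FETCH_MODULES
    }
    report["vqfetch"] = shutil.which("vqfetch") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_fetch_extra().items():
        print(f"{name:10s} {'ok' if ok else 'MISSING'}")
```

The check only tests presence; it does not verify the version ranges listed above.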
Quick start: round-trip MgO from Materials Project
# 1. Fetch the structure → emit SPEC + input script.
vqfetch mp --id mp-1265 --basis sto-3g --method rks-lda
# Output (one path per line; both are written to disk):
# examples/regression/systems/periodic/mp_mp-1265.py
# examples/scf-mp_mp-1265.py
# 2. Run the SCF (uses the venv's Python).
.venv/bin/python examples/scf-mp_mp-1265.py
The SCF run produces:
mp_mp-1265.out: banner, SCF trace, energy breakdown, orbital table, the source DB / ID / DOI / license recorded in the run header, plus wall-clock timings.
mp_mp-1265.molden: molecular orbitals.
mp_mp-1265.traj: ASE trajectory (single frame for static SCF; multi-frame for optimize=True).
Reference: live planetx round-trip on 2026-05-09 produced E = −950.4204308512 Ha (13 SCF iters, ~2h 20m on 16 cores).
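After a run, it can be useful to confirm that all three output files were written. The sketch below assumes the files are named after the output stem and land in a directory you pass in (the working directory by default); both helper names are hypothetical.

```python
from pathlib import Path

# Suffixes of the files the SCF run is described as producing.
OUTPUT_SUFFIXES = (".out", ".molden", ".traj")


def expected_outputs(stem, directory="."):
    """Return the paths a run with this output stem should leave behind."""
    # Plain string concatenation avoids Path.with_suffix, which would
    # misread a stem that already contains a dot.
    return [Path(directory) / f"{stem}{suffix}" for suffix in OUTPUT_SUFFIXES]


def missing_outputs(stem, directory="."):
    """Return the expected output files that are not on disk."""
    return [p for p in expected_outputs(stem, directory) if not p.is_file()]


if __name__ == "__main__":
    for path in missing_outputs("mp_mp-1265"):
        print(f"missing: {path}")
```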
Five canonical structures (round-trip-verified)
The canonical subcommand walks a hand-curated five-structure
table that the v1 acceptance harness round-trips end-to-end on
every commit:
vqfetch canonical mgo_rocksalt # MgO via Materials Project
vqfetch canonical nacl_rocksalt # NaCl via Materials Project
vqfetch canonical lih_rocksalt # LiH via Materials Project
vqfetch canonical si_diamond # Si via Materials Project
vqfetch canonical c_diamond # C via Materials Project
Use --quick to drop the recommended basis to sto-3g for a
fast smoke test (default behaviour is the heuristic-recommended
basis per § Recommended basis below).
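To fetch all five canonical structures in one go, the commands above can be driven from a short script. This is a sketch: the function names are hypothetical, and placing --quick after the slug is an assumption about argument order.

```python
import subprocess

# Slugs taken from the canonical commands listed above.
CANONICAL_SLUGS = (
    "mgo_rocksalt",
    "nacl_rocksalt",
    "lih_rocksalt",
    "si_diamond",
    "c_diamond",
)


def canonical_commands(quick=True):
    """Build one `vqfetch canonical <slug>` argv list per structure."""
    extra = ["--quick"] if quick else []
    return [["vqfetch", "canonical", slug, *extra] for slug in CANONICAL_SLUGS]


def fetch_all(quick=True):
    """Run the commands in order, stopping at the first failure."""
    for argv in canonical_commands(quick):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    fetch_all(quick=True)
```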
Per-record provenance
Every fetched record carries the following fields (visible in
the SPEC module’s Provenance dataclass and surfaced in the
SCF log + per-run .system manifest):
Provenance(
source_db="materials_project",
source_id="mp-1265",
source_url="https://next-gen.materialsproject.org/materials/mp-1265",
original_doi="10.17188/1199994", # if known
license="CC-BY 4.0",
fetched_at="2026-05-09T21:30:00+00:00",
provider="mp", # OPTIMADE provider key
notes="...", # free-form, e.g. "via canonical_set"
)
When you re-run a calculation later, the provenance bundle travels with the SPEC. For published work, cite the source DB per its terms of use (Materials Project: cite per their terms; COD: CC0 — citation appreciated, not legally required; NOMAD: cite the contributing author + NOMAD).
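To make the shape of the record concrete, here is an illustrative stand-in for the Provenance dataclass together with a formatter for the header lines. ProvenanceSketch and header_lines are hypothetical names, not the real class; which fields are optional and the wording of the DOI line are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceSketch:
    """Stand-in mirroring the fields listed above (not the real class)."""

    source_db: str
    source_id: str
    source_url: str
    license: str
    fetched_at: str
    # Assumed optional: these three do not appear in every record shown.
    original_doi: Optional[str] = None
    provider: Optional[str] = None
    notes: str = ""


def header_lines(p):
    """Format a record as provenance header lines."""
    lines = [
        f"Source: {p.source_db} {p.source_id}",
        f"URL: {p.source_url}",
        f"License: {p.license}",
        f"Fetched: {p.fetched_at}",
    ]
    if p.original_doi:
        # Assumed wording for the DOI line.
        lines.append(f"DOI: {p.original_doi}")
    return lines
```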
Cache + offline mode
vqfetch caches every successful fetch on disk per XDG (default:
~/.cache/vqfetch/). Repeated vqfetch mp --id mp-1265 does
not re-hit the API. TTL defaults:
30 days for OPTIMADE / MP / NOMAD.
Infinite for COD (CIFs are immutable post-publication).
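The TTL rule and the XDG default can be sketched as follows. This is not vqfetch's own code: the source keys ("optimade", "mp", "nomad", "cod") and both function names are hypothetical, and fetched_at is assumed to be a timezone-aware datetime.

```python
import os
from datetime import datetime, timedelta, timezone
from pathlib import Path

# TTLs from the list above; None means the entry never expires.
TTL_BY_SOURCE = {
    "optimade": timedelta(days=30),
    "mp": timedelta(days=30),
    "nomad": timedelta(days=30),
    "cod": None,
}


def cache_dir():
    """Resolve the cache directory per the XDG base-directory convention."""
    # An unset or empty XDG_CACHE_HOME falls back to ~/.cache.
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "vqfetch"


def is_fresh(source, fetched_at, now=None):
    """Return True while a cached record is still inside its TTL."""
    ttl = TTL_BY_SOURCE[source]
    if ttl is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - fetched_at <= ttl
```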
Two relevant flags:
--no-cache: bypass cache reads but still write through after a live fetch. Useful when you suspect the upstream record has been updated.
--cache-only: refuse live HTTP entirely; fail fast if the record isn’t in the cache. Useful for offline / reproducible runs (e.g. on cluster compute nodes without network).
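The way the two flags interact with the cache can be summarised as a small decision function. This is a sketch under stated assumptions: plan_fetch is a hypothetical name, in_cache means a usable entry exists, and rejecting both flags together is an assumption, since the combined behaviour is not documented here.

```python
def plan_fetch(in_cache, no_cache=False, cache_only=False):
    """Decide how to satisfy a request: 'cache', 'live', or 'fail'."""
    if no_cache and cache_only:
        # Assumption: the two flags contradict each other.
        raise ValueError("--no-cache and --cache-only are mutually exclusive")
    if cache_only:
        # Never touch the network; a cache miss fails immediately.
        return "cache" if in_cache else "fail"
    if no_cache:
        # Skip the cache read; the live result is still written back.
        return "live"
    # Default: serve from cache when possible, otherwise fetch live.
    return "cache" if in_cache else "live"
```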
Common flags
All structure subcommands accept the same emission flags:
| Flag | Default | Purpose |
|---|---|---|
| --basis | (heuristic; see below) | Override the recommended basis (e.g. |
| --method | | SCF method baked into the emitted input script. Choices: |
| --quick | off | Force |
| | | Output directory for the emitted SPEC module. |
| | | Output directory for the emitted input script. |
| --no-cache | off | Bypass cache reads (still writes through). |
| --cache-only | off | Refuse live HTTP. |
| | (auto-generated) | Override the SPEC id slug. |
Recommended basis (heuristic)
When --basis is not specified, vqfetch picks one using a heuristic
(fetch/heuristics.py):
| Structure family | Default basis | Reason |
|---|---|---|
| Light-element periodic (Z ≤ 18, no transition metals) | pob-tzvp | Designed for periodic; standard reference |
| Includes transition metals or 4th+ period | | Extended element coverage; periodic-tuned |
| Molecule | | Standard molecular triple-ζ |
| Anything with --quick | sto-3g | Smoke-test minimum |
Override at any time via --basis <name>. The bundled basis
inventory + per-family citations are at
docs/license.md.
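The family classification can be sketched as below. This is not the contents of fetch/heuristics.py: the family labels and function names are hypothetical. Basis names are filled in only where this page states them (pob-tzvp for the MgO example, sto-3g for --quick); the other two families are left as None rather than guessed.

```python
# Only basis names confirmed on this page are filled in.
BASIS_BY_FAMILY = {
    "quick": "sto-3g",
    "light_periodic": "pob-tzvp",
    "extended_periodic": None,
    "molecule": None,
}


def structure_family(atomic_numbers, periodic, quick=False):
    """Classify a structure into one of the heuristic's families."""
    if quick:
        return "quick"
    if not periodic:
        return "molecule"
    # Z <= 18 covers periods 1-3, and every transition metal has Z >= 21,
    # so a single comparison covers both conditions.
    if max(atomic_numbers) <= 18:
        return "light_periodic"
    return "extended_periodic"


def recommended_basis(atomic_numbers, periodic, quick=False):
    """Look up the basis for the family, or None where it is not stated."""
    return BASIS_BY_FAMILY[structure_family(atomic_numbers, periodic, quick)]
```

For MgO (Z = 12 and 8) in a periodic cell, this yields pob-tzvp, matching the recommended_basis in the emitted SPEC example.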
What the emitted files look like
After vqfetch mp --id mp-1265:
examples/regression/systems/periodic/mp_mp-1265.py —
import-as-module SPEC for the regression suite:
"""MgO rocksalt — fetched from Materials Project mp-1265."""
from examples.regression.core.spec import (
PeriodicSpec, Provenance, ReferenceKind,
)
mp_mp_1265 = PeriodicSpec(
id="mp_mp-1265",
formula="MgO",
lattice_vectors=[[0.0, 2.106, 2.106],
[2.106, 0.0, 2.106],
[2.106, 2.106, 0.0]],
atoms=[("Mg", [0.0, 0.0, 0.0]),
("O", [2.106, 2.106, 2.106])],
recommended_basis="pob-tzvp",
provenance=Provenance(
source_db="materials_project",
source_id="mp-1265",
source_url="https://next-gen.materialsproject.org/materials/mp-1265",
license="CC-BY 4.0",
fetched_at="2026-05-09T21:30:00+00:00",
),
)
examples/scf-mp_mp-1265.py — runnable SCF script:
"""SCF on mp_mp-1265 — generated by vqfetch on 2026-05-09."""
from vibeqc import Atom, PeriodicSystem, run_periodic_job
cell = PeriodicSystem(
lattice_vectors=[[0.0, 2.106, 2.106],
[2.106, 0.0, 2.106],
[2.106, 2.106, 0.0]],
atoms=[Atom(12, [0.0, 0.0, 0.0]),
Atom(8, [2.106, 2.106, 2.106])],
)
run_periodic_job(
cell,
basis="pob-tzvp",
method="RKS",
functional="LDA",
output="mp_mp-1265",
)
# Provenance bundle (preserved in the SCF log header):
# Source: materials_project mp-1265
# URL: https://next-gen.materialsproject.org/materials/mp-1265
# License: CC-BY 4.0
# Fetched: 2026-05-09T21:30:00+00:00
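A quick geometric sanity check on the emitted cell needs only the standard library. The sketch below computes the cell volume and the shortest Mg to O distance across periodic images; the helper names are hypothetical, and it assumes the coordinates are Cartesian and in ångström, which this page does not state.

```python
import itertools
import math

LATTICE = [[0.0, 2.106, 2.106],
           [2.106, 0.0, 2.106],
           [2.106, 2.106, 0.0]]
MG = [0.0, 0.0, 0.0]
O = [2.106, 2.106, 2.106]


def cell_volume(lattice):
    """Volume of the cell as |a . (b x c)|."""
    a, b, c = lattice
    cross = [b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0]]
    return abs(sum(x * y for x, y in zip(a, cross)))


def nearest_image_distance(p, q, lattice):
    """Shortest distance from p to q or its images in adjacent cells."""
    best = math.inf
    for n in itertools.product((-1, 0, 1), repeat=3):
        # Lattice translation n[0]*a + n[1]*b + n[2]*c.
        shift = [sum(n[i] * lattice[i][k] for i in range(3)) for k in range(3)]
        image = [q[k] + shift[k] for k in range(3)]
        best = min(best, math.dist(p, image))
    return best


if __name__ == "__main__":
    print(f"volume:   {cell_volume(LATTICE):.4f}")
    print(f"d(Mg, O): {nearest_image_distance(MG, O, LATTICE):.4f}")
```

For this cell the volume works out to 2 × 2.106³ and the nearest Mg to O distance to 2.106, half the conventional rocksalt lattice parameter of 4.212.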
Combining with the regression suite
Once a SPEC lands in examples/regression/systems/periodic/,
the regression suite picks it up automatically. Run the full
matrix:
python -m examples.regression.run_suite \
--include examples/regression/systems/periodic/mp_mp-1265.py \
--output-md
…and the new system shows up alongside the hand-curated test
set, with the per-source provenance footer included in the
generated summary.md.
When NOT to use vqfetch
You already have a CIF on disk. Use ASE’s CIF reader directly: ase.io.read("my.cif"), then build a PeriodicSystem from the result. vqfetch’s value is the fetch + provenance + emission round-trip, not the CIF parsing itself.
You want a structure NOT in any open database. vqfetch’s sources cover ~99% of well-studied solids; for exotic / proprietary structures, hand-build the PeriodicSystem and document provenance manually.
You need cell relaxation. vqfetch emits a static-SCF script; add optimize=True to the emitted call yourself if you want geometry optimisation. The fetched cell is the input geometry; the optimised cell is yours to record.
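For the CIF-on-disk case, the conversion might look like the sketch below. The helper names are hypothetical. The PeriodicSystem and Atom calls mirror the emitted script shown earlier on this page, and it is an assumption that they accept Cartesian positions in the units ASE returns (ångström).

```python
def cell_inputs(lattice_rows, atomic_numbers, positions):
    """Convert array-like ASE data into plain lists and (Z, position) pairs."""
    lattice = [[float(x) for x in row] for row in lattice_rows]
    atoms = [(int(z), [float(x) for x in pos])
             for z, pos in zip(atomic_numbers, positions)]
    return lattice, atoms


def periodic_system_from_cif(path):
    """Read a CIF with ASE and build a PeriodicSystem (needs ase and vibeqc)."""
    # Imported lazily so the pure helper above works without either package.
    import ase.io
    from vibeqc import Atom, PeriodicSystem

    structure = ase.io.read(path)
    lattice, atoms = cell_inputs(
        structure.cell[:],                # 3x3 array of lattice vectors
        structure.get_atomic_numbers(),
        structure.get_positions(),        # Cartesian positions
    )
    return PeriodicSystem(
        lattice_vectors=lattice,
        atoms=[Atom(z, pos) for z, pos in atoms],
    )
```

Since nothing is fetched in this path, the provenance of the CIF has to be recorded by hand.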
Phase 3 (deferred to v0.9.0+)
These items are designed but not shipped in v0.8.0:
WebBook fallback for structures missing from MP / COD.
Bulk-sweep tooling (vqfetch list-candidates) for multi-candidate workflows.
OQMD as a primary provider (currently reachable via OPTIMADE federation, not as a top-level subcommand).
Cross-provider consistency check (alert when MP / COD / NOMAD disagree on the same canonical structure).
The vqfetch chat owns these; tracked in
docs/roadmap.md § vqfetch Phase 3.
See also
reference_data.md: vqfetch’s CCCBDB integration for experimental reference data (atomization energies, ΔHf°, vibrational frequencies, IE).
docs/license.md: full per-source licensing + bundled-data inventory.