MgO from the Materials Project: vqfetch end-to-end

This tutorial exercises the v0.8.0 vqfetch external-data integration: pull an MgO crystal structure from Materials Project, emit a regression PeriodicSpec + an executable input script, and run a periodic SCF with full provenance preserved.

You will:

  • Install the [fetch] optional extra.

  • Use vqfetch mp --id mp-1265 to pull the MgO rocksalt structure.

  • Inspect the emitted SPEC + input-script files.

  • Run the SCF and verify the per-record provenance bundle in the SCF log header.

Background: see user_guide/external_structures.md for the full vqfetch CLI and provenance contract, and docs/license.md for the per-source licensing inventory.

Setup

vqfetch is part of the optional [fetch] extra:

pip install -e '.[fetch]'

This pulls in optimade, ase, beautifulsoup4, and lxml. After installation, the vqfetch console script is on $PATH.

Fetch + emit

Materials Project ID mp-1265 is the rocksalt MgO entry:

vqfetch mp --id mp-1265 --basis sto-3g --method rks-lda
# Output (one path per line; both written to disk):
# examples/regression/systems/periodic/mp_mp-1265.py
# examples/scf-mp_mp-1265.py

For a smoke test (recommended on first run), append --quick to force sto-3g:

vqfetch mp --id mp-1265 --quick
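If you want to drive the fetch from a Python workflow instead of the shell, a thin subprocess wrapper may be enough. The sketch below is not part of vibe-qc: fetch_mp and parse_emitted_paths are hypothetical helper names, and it assumes that vqfetch prints the emitted paths to stdout, one per line, as in the output shown above. Only flags that appear in this tutorial are used.

```python
import subprocess
from pathlib import Path


def parse_emitted_paths(stdout: str) -> list[Path]:
    """Turn vqfetch output (ASSUMED: one path per line) into Path objects."""
    return [Path(line.strip()) for line in stdout.splitlines() if line.strip()]


def fetch_mp(mp_id: str, quick: bool = True) -> list[Path]:
    """Run `vqfetch mp --id <mp_id>` and return the paths it reports.

    Hypothetical helper: requires the vqfetch console script on $PATH.
    """
    argv = ["vqfetch", "mp", "--id", mp_id]
    if quick:
        argv.append("--quick")
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return parse_emitted_paths(result.stdout)


# Parse the output shown above without touching the network.
sample = (
    "examples/regression/systems/periodic/mp_mp-1265.py\n"
    "examples/scf-mp_mp-1265.py\n"
)
spec_path, script_path = parse_emitted_paths(sample)
print(spec_path.name, script_path.name)
```

With vqfetch installed, `fetch_mp("mp-1265")` would return the same two paths for the live fetch.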

What gets written

The emitted SPEC module is plain Python data that the regression suite can import directly:

# examples/regression/systems/periodic/mp_mp-1265.py
from examples.regression.core.spec import (
    PeriodicSpec, Provenance,
)

mp_mp_1265 = PeriodicSpec(
    id="mp_mp-1265",
    formula="MgO",
    lattice_vectors=[[0.0, 2.106, 2.106],
                     [2.106, 0.0, 2.106],
                     [2.106, 2.106, 0.0]],
    atoms=[("Mg", [0.0, 0.0, 0.0]),
           ("O",  [2.106, 2.106, 2.106])],
    recommended_basis="sto-3g",
    provenance=Provenance(
        source_db="materials_project",
        source_id="mp-1265",
        source_url="https://next-gen.materialsproject.org/materials/mp-1265",
        license="CC-BY 4.0",
        fetched_at="2026-05-09T21:30:00+00:00",
    ),
)
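Before spending SCF time on a fetched cell, it can be worth checking that the geometry is what you expect. The following standalone sketch uses only the Python standard library and the numbers from the SPEC above; it assumes the coordinates are Cartesian and in angstrom, which the SPEC itself does not state. It computes the primitive-cell volume, the implied conventional fcc lattice constant, and the nearest Mg-O distance under periodic boundary conditions.

```python
import itertools
import math

# Values copied from the emitted PeriodicSpec (ASSUMED: Cartesian, angstrom).
lattice = [[0.0, 2.106, 2.106],
           [2.106, 0.0, 2.106],
           [2.106, 2.106, 0.0]]
mg = [0.0, 0.0, 0.0]
ox = [2.106, 2.106, 2.106]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def min_image_distance(r1, r2, cell, shells=1):
    """Shortest distance between r1 and any nearby periodic image of r2."""
    best = math.inf
    for n in itertools.product(range(-shells, shells + 1), repeat=3):
        # Lattice translation n[0]*a1 + n[1]*a2 + n[2]*a3.
        shift = [sum(n[k] * cell[k][x] for k in range(3)) for x in range(3)]
        image = [r2[x] + shift[x] for x in range(3)]
        best = min(best, math.dist(r1, image))
    return best


volume = abs(det3(lattice))              # primitive-cell volume
a_conv = (4.0 * volume) ** (1.0 / 3.0)   # fcc: V_primitive = a^3 / 4
d_mg_o = min_image_distance(mg, ox, lattice)

print(f"V = {volume:.3f} A^3, a = {a_conv:.3f} A, d(Mg-O) = {d_mg_o:.3f} A")
```

For a rocksalt cell the nearest cation-anion distance should equal half the conventional lattice constant, so `d_mg_o` matching `a_conv / 2` is a quick consistency check on the fetched structure.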

The runnable script:

# examples/scf-mp_mp-1265.py
"""SCF on mp_mp-1265 — generated by vqfetch on 2026-05-09."""
from vibeqc import Atom, PeriodicSystem, run_periodic_job

cell = PeriodicSystem(
    3,
    [[0.0, 2.106, 2.106],
     [2.106, 0.0, 2.106],
     [2.106, 2.106, 0.0]],
    [Atom(12, [0.0, 0.0, 0.0]),
     Atom(8,  [2.106, 2.106, 2.106])],
)

run_periodic_job(
    cell,
    basis="sto-3g",
    method="RKS",
    functional="LDA",
    output="mp_mp-1265",
)

# Provenance bundle (preserved in the SCF log header):
#   Source: materials_project mp-1265
#   URL:    https://next-gen.materialsproject.org/materials/mp-1265
#   License: CC-BY 4.0
#   Fetched: 2026-05-09T21:30:00+00:00

Run the SCF

Execute the emitted input script to run the periodic LDA SCF on the fetched MgO cell:

.venv/bin/python examples/scf-mp_mp-1265.py

Three output files in the working directory:

  • mp_mp-1265.out: banner, SCF trace, energy breakdown, orbital table, and the source DB / ID / DOI / license recorded in the run header (search for “Provenance:” in the file).

  • mp_mp-1265.molden: molecular orbitals (open with Avogadro or Jmol).

  • mp_mp-1265.traj: ASE trajectory (single frame for static SCF; multi-frame if optimize=True).

Reference: a live planetx round-trip on 2026-05-09 produced E = −950.4204308512 Ha in 13 SCF iterations (~2h 20m on 16 cores at sto-3g; sub-minute on a laptop with OMP_NUM_THREADS=4 and the same basis).
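To verify the provenance bundle programmatically instead of searching the .out file by eye, something like the sketch below could work. The exact header layout is an assumption here: the sample text is hypothetical, modelled on the comment block in the emitted input script, and extract_provenance is an illustrative helper, not a vibe-qc function. Adjust the parsing to whatever the real header looks like.

```python
def extract_provenance(log_text: str) -> dict[str, str]:
    """Collect 'Key: value' lines that follow a 'Provenance:' marker.

    ASSUMPTION: fields follow the marker one per line, and the block
    ends at the first blank line.
    """
    fields: dict[str, str] = {}
    in_block = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if not in_block:
            in_block = line.startswith("Provenance:")
            continue
        if not line:
            break
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# HYPOTHETICAL header text, modelled on the emitted script's comment block.
sample_log = """\
Provenance:
  Source: materials_project mp-1265
  URL:    https://next-gen.materialsproject.org/materials/mp-1265
  License: CC-BY 4.0
  Fetched: 2026-05-09T21:30:00+00:00

... rest of the log ...
"""

prov = extract_provenance(sample_log)
missing = {"Source", "URL", "License", "Fetched"} - prov.keys()
print(prov["License"], "| missing fields:", sorted(missing))
```

On a real run you would pass `open("mp_mp-1265.out").read()` instead of `sample_log` and fail the check if `missing` is non-empty.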

Step up to a real basis + multi-k

The fetched cell is just data; you can re-run it at any basis or k-mesh by editing the input script:

run_periodic_job(
    cell,
    basis="pob-tzvp",                  # solid-state-tuned triple-zeta
    method="RKS",
    functional="pbe",                  # GGA workhorse
    kmesh=(4, 4, 4),                   # multi-k via KRKS
    output="mp_mp-1265-pbe-tzvp-444",
)

This drives the v0.8.0 run_krks_periodic_gdf driver. See LiH at multiple k-points: KRHF vs Peintinger 2013 for a multi-k worked example that targets a published reference, and user_guide/multi_k_scf.md for the algorithm and scope caveats.
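To get a feel for what kmesh=(4, 4, 4) implies, the sketch below enumerates a standard Monkhorst-Pack grid in fractional reciprocal coordinates. This illustrates the textbook construction only: whether vibe-qc uses this shifted grid or a Gamma-centred one, and how it reduces the points by symmetry, is not stated here and is not assumed. monkhorst_pack is a hypothetical helper name.

```python
import itertools
from fractions import Fraction


def monkhorst_pack(mesh):
    """Fractional k-points of a Monkhorst-Pack grid.

    Along an axis with q points: u_r = (2r - q - 1) / (2q), r = 1..q.
    """
    axes = [
        [Fraction(2 * r - q - 1, 2 * q) for r in range(1, q + 1)]
        for q in mesh
    ]
    return list(itertools.product(*axes))


kpts = monkhorst_pack((4, 4, 4))
first_axis = sorted({k[0] for k in kpts})
print(len(kpts), [str(u) for u in first_axis])
```

A 4x4x4 mesh gives 64 points before any symmetry reduction, so a useful rule of thumb is to converge the basis at a coarse mesh first and only then refine the k-sampling.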

Try other databases + structures

The same fetcher reaches other providers and canonical fixtures: COD for experimental CIFs, federated OPTIMADE search by formula, and the round-trip-verified canonical slugs:

# COD CIF (experimental geometry):
vqfetch cod --id 1011027 --basis pob-tzvp

# Federated OPTIMADE search by formula (default provider: MP):
vqfetch optimade --formula NaCl

# OPTIMADE search restricted to NOMAD:
vqfetch optimade --formula CaCO3 --provider nomad

# Canonical-set slug (round-trip-verified):
vqfetch canonical mgo_rocksalt
vqfetch canonical lih_rocksalt
vqfetch canonical si_diamond

Each subcommand caches its result on disk under the XDG cache directory (~/.cache/vqfetch/), so re-running the same query is a cache hit rather than a new fetch, which is useful for offline and reproducible runs.
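For scripts that need to locate or clear the cache, the directory can be resolved using the XDG Base Directory convention, roughly as sketched below. The directory lookup follows the XDG rule (use $XDG_CACHE_HOME if set, otherwise ~/.cache); the cache-key scheme and the .json suffix are purely hypothetical and are not claimed to match how vqfetch names its cache entries.

```python
import hashlib
import os
from pathlib import Path


def xdg_cache_dir(app: str = "vqfetch") -> Path:
    """Resolve the per-user cache directory per the XDG Base Directory spec."""
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / app


def cache_key(subcommand: str, **query: str) -> str:
    """HYPOTHETICAL key: hash of the subcommand plus sorted query arguments."""
    args = "&".join(f"{k}={v}" for k, v in sorted(query.items()))
    canonical = f"{subcommand}?{args}"
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


key = cache_key("mp", id="mp-1265")
print(xdg_cache_dir() / f"{key}.json")
```

Sorting the query arguments before hashing makes the key independent of argument order, which is what lets an identical query map to the same cache entry.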

What this tells you

  • Provenance is non-negotiable. Every fetched record carries the source DB, ID, URL, original DOI (where available), license, and fetch timestamp, and these travel with the SCF result through to the .out file. This closes the “where did this geometry come from?” auditability gap.

  • Open databases are first-class. vibe-qc treats COD / MP / NOMAD / OPTIMADE as input fixtures; you don’t need to wrangle CIFs by hand.

  • The license matters. vqfetch surfaces the per-record license string, so when you cite the result you can cite the source according to its terms (Materials Project: CC-BY 4.0, attribution required; COD: CC0, citation appreciated but not legally required).

See also