33. MgO from Materials Project — vqfetch end-to-end

This tutorial exercises the v0.8.0 vqfetch external-data integration: pull an MgO crystal structure from Materials Project, emit a regression PeriodicSpec and an executable input script, and run a periodic SCF with full provenance preserved.

You will:

  • Install the [fetch] optional extra.

  • Use vqfetch mp --id mp-1265 to pull the MgO rocksalt structure.

  • Inspect the emitted SPEC + input-script files.

  • Run the SCF and verify the per-record provenance bundle in the SCF log header.

Background: see user_guide/external_structures.md for the full vqfetch CLI and provenance contract, and docs/license.md for the per-source licensing inventory.

Setup

vqfetch is part of the optional [fetch] extra:

pip install -e '.[fetch]'

This pulls in optimade, ase, beautifulsoup4, and lxml. After install, the vqfetch console script is on $PATH.
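
If you want to confirm the extra installed cleanly before fetching anything, a small check along these lines may help. It is a sketch, not part of vibe-qc: the helper name check_fetch_extra is hypothetical, and the only assumptions are the import names of the four packages listed above (beautifulsoup4 is imported as bs4).

```python
import importlib.util
import shutil

# Import names for the four packages listed above; the beautifulsoup4
# distribution is imported as "bs4".
FETCH_MODULES = ("optimade", "ase", "bs4", "lxml")


def check_fetch_extra():
    """Hypothetical helper: map each requirement to True (found) or False."""
    status = {name: importlib.util.find_spec(name) is not None
              for name in FETCH_MODULES}
    # The console script should be discoverable on $PATH after install.
    status["vqfetch (console script)"] = shutil.which("vqfetch") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_fetch_extra().items():
        print(f"{name:28s} {'ok' if ok else 'MISSING'}")
```

Any line reported as MISSING suggests re-running the pip install command above inside the active virtual environment.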

Fetch + emit

Materials Project ID mp-1265 is the rocksalt MgO entry:

vqfetch mp --id mp-1265 --basis sto-3g --method rks-lda
# Output (one path per line; both written to disk):
# examples/regression/systems/periodic/mp_mp-1265.py
# examples/scf-mp_mp-1265.py

For a smoke test (recommended on first run), append --quick to force sto-3g:

vqfetch mp --id mp-1265 --quick

What gets written

The emitted SPEC module is plain Python data that the regression suite can import directly:

# examples/regression/systems/periodic/mp_mp-1265.py
from examples.regression.core.spec import (
    PeriodicSpec, Provenance,
)

mp_mp_1265 = PeriodicSpec(
    id="mp_mp-1265",
    formula="MgO",
    lattice_vectors=[[0.0, 2.106, 2.106],
                     [2.106, 0.0, 2.106],
                     [2.106, 2.106, 0.0]],
    atoms=[("Mg", [0.0, 0.0, 0.0]),
           ("O",  [2.106, 2.106, 2.106])],
    recommended_basis="sto-3g",
    provenance=Provenance(
        source_db="materials_project",
        source_id="mp-1265",
        source_url="https://next-gen.materialsproject.org/materials/mp-1265",
        license="CC-BY 4.0",
        fetched_at="2026-05-09T21:30:00+00:00",
    ),
)
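
Before spending SCF time on a fetched cell, it can be worth checking that the geometry is what you expect. The sketch below uses only the standard library and the numbers copied from the SPEC above; it does not call any vibe-qc API. It assumes the atom positions are Cartesian coordinates in angstrom, which is consistent with the oxygen sitting at half the sum of the three lattice vectors.

```python
import math

# Values copied from the emitted SPEC (assumed Cartesian, in angstrom).
lattice = [[0.0, 2.106, 2.106],
           [2.106, 0.0, 2.106],
           [2.106, 2.106, 0.0]]
mg = [0.0, 0.0, 0.0]
ox = [2.106, 2.106, 2.106]


def cross(u, v):
    """Cross product of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


a1, a2, a3 = lattice

# Primitive-cell volume from the scalar triple product.
volume = abs(dot(a1, cross(a2, a3)))

# An fcc conventional cell holds four primitive cells, so a = (4 V)^(1/3).
a_conv = (4.0 * volume) ** (1.0 / 3.0)

# Nearest Mg-O distance, searching the neighbouring periodic images of O.
shifts = (-1, 0, 1)
nearest = min(
    math.dist(mg, [ox[k] + i * a1[k] + j * a2[k] + l * a3[k] for k in range(3)])
    for i in shifts for j in shifts for l in shifts
)

print(f"primitive volume     : {volume:.3f} A^3")
print(f"conventional constant: {a_conv:.3f} A")   # 4.212
print(f"nearest Mg-O distance: {nearest:.3f} A")  # 2.106, i.e. a/2
```

A nearest-neighbour distance of a/2 is the rocksalt signature; a value far from that would suggest a unit or fractional/Cartesian mix-up in the fetched record.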

The runnable script:

# examples/scf-mp_mp-1265.py
"""SCF on mp_mp-1265 — generated by vqfetch on 2026-05-09."""
from vibeqc import Atom, PeriodicSystem, run_periodic_job

cell = PeriodicSystem(
    lattice_vectors=[[0.0, 2.106, 2.106],
                     [2.106, 0.0, 2.106],
                     [2.106, 2.106, 0.0]],
    atoms=[Atom(12, [0.0, 0.0, 0.0]),
           Atom(8,  [2.106, 2.106, 2.106])],
)

run_periodic_job(
    cell,
    basis="sto-3g",
    method="RKS",
    functional="LDA",
    output="mp_mp-1265",
)

# Provenance bundle (preserved in the SCF log header):
#   Source: materials_project mp-1265
#   URL:    https://next-gen.materialsproject.org/materials/mp-1265
#   License: CC-BY 4.0
#   Fetched: 2026-05-09T21:30:00+00:00

Run the SCF

.venv/bin/python examples/scf-mp_mp-1265.py

The run writes three output files to the working directory:

  • mp_mp-1265.out: banner, SCF trace, energy breakdown, orbital table, and the source DB / ID / DOI / license recorded in the run header (search for “Provenance:” in the file).

  • mp_mp-1265.molden: molecular orbitals (open with Avogadro or Jmol).

  • mp_mp-1265.traj: ASE trajectory (single frame for static SCF; multi-frame if optimize=True).

Reference: live planetx round-trip on 2026-05-09 produced E = −950.4204308512 Ha in 13 SCF iterations (~2h 20m on 16 cores at sto-3g; sub-minute on a laptop with OMP_NUM_THREADS=4 and the same basis).
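
To verify the provenance bundle programmatically rather than by eye, something like the following could work. It is a minimal sketch under one loud assumption: that the .out header repeats the "Key: value" lines shown in the comment block of the generated script (Source, URL, License, Fetched). The exact header layout is not shown in this tutorial, so adjust the pattern to what your file contains. The function name read_provenance is hypothetical.

```python
import re

# ASSUMPTION: header lines look like the comment block in the generated
# script, optionally prefixed by "#" or whitespace.
_LINE = re.compile(r"^\W*(Source|URL|License|Fetched):\s*(.+?)\s*$")

SAMPLE = """\
# Provenance bundle (preserved in the SCF log header):
#   Source: materials_project mp-1265
#   URL:    https://next-gen.materialsproject.org/materials/mp-1265
#   License: CC-BY 4.0
#   Fetched: 2026-05-09T21:30:00+00:00
"""


def read_provenance(text):
    """Hypothetical helper: collect the first occurrence of each field."""
    found = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match and match.group(1) not in found:
            found[match.group(1)] = match.group(2)
    return found


if __name__ == "__main__":
    # For a real run, replace SAMPLE with the contents of mp_mp-1265.out.
    for key, value in read_provenance(SAMPLE).items():
        print(f"{key:8s} {value}")
```

A check like this can sit in a regression test that fails whenever one of the four fields is missing from a run log.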

Step up to a real basis + multi-k

The fetched cell is just data; you can re-run it at any basis or k-mesh by editing the input script:

run_periodic_job(
    cell,
    basis="pob-tzvp",                  # solid-state-tuned triple-zeta
    method="RKS",
    functional="pbe",                  # GGA workhorse
    kmesh=(4, 4, 4),                   # multi-k via KRKS
    output="mp_mp-1265-pbe-tzvp-444",
)

This drives the v0.8.0 run_krks_periodic_gdf driver — see Tutorial 34 for a multi-k worked example that targets a published reference, and user_guide/multi_k_scf.md for the algorithm + scope caveats.
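
To get a feel for what kmesh=(4, 4, 4) costs, the sketch below enumerates a uniform grid of fractional k-points. It assumes a Gamma-centered grid with points at n_i / N_i; whether vibe-qc uses this convention, a shifted Monkhorst-Pack grid, or symmetry reduction is not stated here, so treat the count as an upper bound on the number of distinct k-points rather than a description of the driver.

```python
from itertools import product


def gamma_centered_mesh(kmesh):
    """Fractional k-points n_i / N_i for a uniform grid that includes Gamma.

    ASSUMPTION: Gamma-centered, no symmetry reduction. This is illustrative
    only and is not taken from the vibe-qc source.
    """
    n1, n2, n3 = kmesh
    return [(i / n1, j / n2, k / n3)
            for i, j, k in product(range(n1), range(n2), range(n3))]


kpoints = gamma_centered_mesh((4, 4, 4))
print(len(kpoints))  # 64
```

Each k-point needs its own Fock build and diagonalization per SCF iteration, so going from a single k-point to a 4 × 4 × 4 mesh multiplies that part of the work by up to 64 before any symmetry savings.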

Try other databases + structures

# COD CIF (experimental geometry):
vqfetch cod --id 1011027 --basis pob-tzvp

# Federated OPTIMADE search by formula (default provider: MP):
vqfetch optimade --formula NaCl

# OPTIMADE search restricted to NOMAD:
vqfetch optimade --formula CaCO3 --provider nomad

# Canonical-set slug (round-trip-verified):
vqfetch canonical mgo_rocksalt
vqfetch canonical lih_rocksalt
vqfetch canonical si_diamond

Each subcommand caches the result on disk per XDG (~/.cache/vqfetch/), so re-running the same query is a no-op cache hit — useful for offline / reproducible runs.
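
If you need to locate or clear the cache from a script, the directory can be resolved the way the XDG Base Directory specification describes. This is a sketch: it assumes vqfetch honors XDG_CACHE_HOME (implied by "per XDG" above, but not confirmed here), and the helper name vqfetch_cache_dir is hypothetical. The layout of files inside the directory is not documented in this tutorial.

```python
import os
from pathlib import Path


def vqfetch_cache_dir(env=None):
    """Hypothetical helper: resolve the cache directory per the XDG spec.

    XDG_CACHE_HOME is used when it is set to an absolute path; otherwise
    the default ~/.cache applies.
    """
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME", "")
    if base and os.path.isabs(base):
        root = Path(base)
    else:
        root = Path.home() / ".cache"
    return root / "vqfetch"


if __name__ == "__main__":
    cache = vqfetch_cache_dir()
    print(cache, "(exists)" if cache.is_dir() else "(not created yet)")
```

Archiving this directory alongside a project is one way to make a set of runs reproducible offline.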

What this tells you

  • Provenance is non-negotiable. Every fetched record carries the source DB, ID, URL, original DOI (where available), license, and fetched timestamp — and they travel with the SCF result through to the .out file. No more “where did this geometry come from?” auditability gap.

  • Open databases are first-class. vibe-qc treats COD / MP / NOMAD / OPTIMADE as input fixtures; you don’t need to wrangle CIFs by hand.

  • The license matters. vqfetch surfaces the per-record license string so that when you cite the result, you cite the source per its terms (Materials Project: CC-BY 4.0, attribution required; COD: CC0, citation appreciated but not legally required).

See also