Handover — F1 RSH K-build wiring (post-v0.9.0)

The v0.9.0 release lands the F1a scaffolding for range-separated hybrid (RSH) functionals:

  • Functional::cam_omega() / cam_alpha() / cam_beta() / is_range_separated() accessors (cpp/include/vibeqc/xc.hpp, cpp/src/xc.cpp).

  • xc_hyb_cam_coef extraction at Functional ctor time, validated against four RSH aliases (HSE06, ωB97X, ωB97X-V, CAM-B3LYP).

  • JKBuilder::build_K_lr(D, omega) abstract-base default that throws RuntimeError with an actionable “not yet wired” message (cpp/include/vibeqc/jk_builder.hpp).

  • tests/test_xc_rsh.py pinning the F1a contract.

The F1b (per-builder erfc-K implementation) and F1c (composite recipe flipping + literature validation) pieces are the v0.9.x follow-ups. This doc is the handover.

What needs to land for ωB97X-3c and HSE-3c to flip to RUNNABLE

F1b — long-range K-build in each JKBuilder concrete

The Fock build for an RSH functional needs two K matrices:

K_full(D)        — built with the standard Coulomb kernel 1/r₁₂
                    (the existing JKBuilder::build_K)
K_LR(D, ω)       — built with the erf-attenuated kernel
                    erf(ω · r₁₂) / r₁₂
                    (the NEW JKBuilder::build_K_lr)

The RSH Fock contribution is then

F_K = α · K_full(D)  +  β · (K_full(D) − K_LR(D, ω))
    = (α + β) · K_full(D)  −  β · K_LR(D, ω)

per the libxc CAM convention where:

  • α = SR + LR HF fraction (= total HF at small r)

  • β = LR-only HF fraction (extra HF activated at large r)

  • ω = range-separation parameter (inverse bohr)

For pure-RSH like ωB97X, α = 1, β = -0.842 → 100% LR HF + ~16% SR HF. For HSE06, α = 0, β = 0.25 → 0% LR HF + 25% SR HF (screened hybrid).

The four JKBuilder concretes need build_K_lr overrides:

FourIndexJKBuilder (cpp/src/jk_builder.cpp:20-44)

Smallest lift. The existing compute_eri(basis) materialises the full (μν|λρ) 4-index tensor using libint’s Operator::coulomb. Add a parallel compute_eri_lr(basis, omega) that uses Operator::erf_coulomb with the same machinery:

// cpp/src/integrals.cpp — new function
Eri4D compute_eri_lr(const BasisSet& basis, double omega) {
    libint2::Engine engine(libint2::Operator::erf_coulomb,
                           basis.max_nprim(),
                           basis.max_l());
    engine.set_params(omega);
    // … same shell-quartet loop as compute_eri …
}

Then in FourIndexJKBuilder cache the LR tensor lazily and dispatch:

class FourIndexJKBuilder final : public JKBuilder {
public:
    FourIndexJKBuilder(const BasisSet& basis)
        : basis_(&basis), eri_(compute_eri(basis)) {}

    Eigen::MatrixXd build_K_lr(const Eigen::MatrixXd& D,
                               double omega) const override {
        if (!eri_lr_ || omega != cached_omega_) {
            eri_lr_ = std::make_unique<Eri4D>(
                compute_eri_lr(*basis_, omega));
            cached_omega_ = omega;
        }
        return build_exchange(*eri_lr_, D);
    }

private:
    const BasisSet* basis_;
    Eri4D eri_;
    mutable std::unique_ptr<Eri4D> eri_lr_;
    mutable double cached_omega_ = 0.0;
};

Memory: ~doubles for RSH (caches two (n_bf)⁴ tensors). Same n_bf ≤ 250 ceiling as the standard FourIndex path.

DirectJKBuilder (cpp/src/jk_builder.cpp:59-...)

Medium lift. The existing direct_compute_k walks the 8-fold shell-quartet loop with on-the-fly libint evaluation. Mirror it with an Operator::erf_coulomb engine:

// cpp/src/jk_direct.cpp — new function
Eigen::MatrixXd direct_compute_k_lr(const BasisSet& basis,
                                    const Eigen::MatrixXd& D,
                                    const Eigen::MatrixXd& Q_lr,
                                    double schwarz_threshold,
                                    double omega);

Where Q_lr is the Schwarz-bound matrix built with the erf-Coulomb kernel (the standard Q is built with the unscreened kernel). The existing compute_schwarz_factors function (cpp/src/schwarz.cpp) takes a libint Engine prototype; pass an Operator::erf_coulomb engine to get the erf-Coulomb Q.

Caveat: the Almlöf incremental-ΔP cache (G_prev_) needs a parallel G_prev_lr_ cache, doubling the state. Manageable but requires careful invalidation when ω changes between SCF runs.

DFJKBuilder (cpp/src/jk_builder.cppmake_df_jk_builder)

Largest lift. The DF path uses the RI auxiliary basis:

B[A, μν] = Σ_B V[A, B]^(-1/2) · (B | μν)
J(D) = Σ_A B[A] · (Σ_μν D_{μν} B[A, μν])
K(D) = Σ_A B^T[A] · D · B[A]            (per-aux dot products)

For the LR variant, both the metric V and the (Aux | OrbOrb) integrals are built with the erf-attenuated Coulomb kernel. libint’s Operator::erf_coulomb works for these too, but the B-tensor caching strategy in DensityFitting needs a parallel “LR B-tensor” cache.

The existing vibeqc::DensityFitting class (cpp/include/vibeqc/density_fitting.hpp) likely needs:

class DensityFitting {
public:
    // existing — uses Coulomb metric
    const Eigen::Tensor<double, 3>& B_tensor() const;
    // NEW — uses erf-Coulomb metric with the given ω
    const Eigen::Tensor<double, 3>& B_tensor_lr(double omega) const;
};

Memory: ~doubles. The LR B-tensor is the same shape as the Coulomb B-tensor.

COSXJKBuilder (cpp/src/cosx.cpp)

Smallest performance footprint, biggest analytic complexity. COSX splits K into:

K_{μν}(D) = Σ_g w_g · χ_μ(g) · A_g[D]
A_g[D]    = Σ_λρ D_{λρ} χ_λ(g) · (V_{ρ}(g))
V_{ρ}(g)  = Σ_σ ∫ χ_ρ(r) χ_σ(r) / |r - r_g| dr

The numerator V_ρ(g) uses the analytic Coulomb integral; for the LR version it uses the erf-Coulomb integral. libint’s erf_coulomb operator supports the compute_nuclear_attraction-style call signature that produces V_ρ(g); the COSX kernel just needs to swap the operator when building the per-grid-point V.

Mardirossian-Head-Gordon 2014 ωB97X-V is typically run with COSX in production (Q-Chem); validating against their reference energies requires the COSX-LR path.

F1c — SCF Fock-build dispatcher + composite flips

Once build_K_lr is wired in the four concretes, the RKS/UKS Fock build needs a thin dispatch layer:

// cpp/src/rks.cpp — inside the SCF inner loop
Eigen::MatrixXd K = jk->build_K(D);
double alpha_K = functional.hf_exchange_fraction();
if (functional.is_range_separated()) {
    const double omega = functional.cam_omega();
    const double alpha_cam = functional.cam_alpha();
    const double beta_cam  = functional.cam_beta();
    Eigen::MatrixXd K_lr = jk->build_K_lr(D, omega);
    // F_K = (α + β) · K_full − β · K_LR
    F.noalias() -= 0.5 * (alpha_cam + beta_cam) * K;
    F.noalias() += 0.5 * beta_cam * K_lr;
} else {
    // Standard hybrid path (unchanged from v0.9.0).
    if (alpha_K != 0.0) F.noalias() -= 0.5 * alpha_K * K;
}

UKS is the same dispatcher with separate K_α, K_β builds for each spin density.

Then in python/vibeqc/composites.py, flip the recipe availabilities:

# ωB97X-3c: Availability.PENDING_F1 → Availability.RUNNABLE
# HSE-3c:   Availability.PENDING_F1 → Availability.RUNNABLE

F1c validation

Literature reference for the final correctness check:

  • ωB97X-V / def2-TZVPP / H₂O ≈ -76.467 Ha (Mardirossian-Head-Gordon 2014 supporting info).

  • HSE06 / def2-TZVPP / H₂O ≈ -76.435 Ha (Heyd-Scuseria-Ernzerhof 2003 paper or NIST reference data).

  • ωB97X-3c (with vDZP basis, gCP, D4) — Müller-Hansen-Grimme 2023 supporting info has S22 / TM26 benchmark numbers.

Tolerance: sub-mHa at converged SCF + tight grid.

Scope estimate per F1b sub-piece

Piece

LOC est.

Days

Risk

compute_eri_lr

~80

0.5

low — pattern match

FourIndex K_lr

~30

0.5

low

Direct K_lr

~120

1.0

medium — incremental cache state

DF K_lr

~150

1.5

high — B-tensor caching surface

COSX K_lr

~100

1.0

medium — operator swap in kernel

RKS/UKS dispatch

~40

0.5

low

Tests + validate

~100

1.0

low if reference numbers in hand

Total: 3-5 days of focused work for full F1 closure across all four JKBuilder concretes. A minimal proof-of-concept on FourIndex only is ~1 day and would land ωB97X-V SCF (with the existing in-core path; ≤250 BF), validating the F1 architecture end-to-end. The Direct/DF/COSX additions could then come as v0.9.x patches one at a time.