BLAS + LAPACK backend

vibe-qc’s C++ core uses Eigen for dense linear algebra. When a BLAS+LAPACK library is linked, Eigen delegates its matrix products, eigendecompositions, Cholesky factorisations, and dense LU / QR / SVD to that library. Two macros control this: EIGEN_USE_BLAS covers dense * and .noalias() products and triangular solves; EIGEN_USE_LAPACKE covers Eigen::LLT, Eigen::SelfAdjointEigenSolver, Eigen::PartialPivLU, Eigen::HouseholderQR, Eigen::JacobiSVD, etc. The delegation is fixed at compile time, so no source-code change is needed to benefit from it.

This page describes:

  • which BLAS gets linked on which platform, and how to check;

  • how vibe-qc’s OpenMP layer interacts with BLAS-internal threading;

  • when the vendored OpenBLAS path is worth it (and when the system install is enough);

  • a candid note on what the linkage does not fix, most notably RIJCOSX performance, whose main optimisation lever lies elsewhere.

What’s linked: read it off the banner

vibeqc.print_banner() prints a linked: line that names the BLAS backend in plain English:

╔══════════════════════════════════════════════════════════════════════════════════════════╗
║ dev 0.7.5 "Löwdin's Compass" (main @ 713cae9)  —  Quantum chemistry for molecules and …  ║
║ © Michael F. Peintinger · MPL 2.0  ·  https://vibe-qc.com                                ║
║ linked: libint 2.13.1 · libxc 7.0.0 · spglib 2.7.0 · blas Accelerate                     ║
╚══════════════════════════════════════════════════════════════════════════════════════════╝

The blas field is one of:

Banner label             What it means
-----------------------  ----------------------------------------------------------------------------
blas Accelerate          Apple Accelerate framework (macOS default)
blas OpenBLAS            System or vendored OpenBLAS, no LAPACKE
blas OpenBLAS +LAPACKE   OpenBLAS plus the LAPACK C interface (Eigen’s dense solvers also delegate)
blas MKL                 Intel MKL
blas netlib BLAS         Reference netlib BLAS (functional but slow; see below)
blas none                VIBEQC_USE_BLAS=OFF at build, Eigen-generic kernels
blas unknown             C extension didn’t load or blas_info() not in this build

Programmatically:

>>> import vibeqc
>>> from vibeqc.banner import library_versions
>>> library_versions()["blas"]
'OpenBLAS +LAPACKE'

>>> from vibeqc._vibeqc_core import blas_info
>>> blas_info()
{'libraries': '/usr/lib/libopenblas.so',
 'blas_enabled': True,
 'lapacke_enabled': True}

The raw libraries string is the literal BLAS_LIBRARIES value CMake’s FindBLAS returned at configure time — useful when debugging which .so actually got resolved.
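
A startup check can build on the label lookup above. The following is a minimal sketch: blas_label_ok and the SLOW_OR_MISSING set are hypothetical names introduced here for illustration, and treating netlib BLAS, none, and unknown as "worth a warning" is an assumption based on the table above, not vibe-qc behaviour.

```python
# Hypothetical helper, not part of vibe-qc. The label strings are taken from
# the banner table; which ones count as "slow or missing" is an assumption.
SLOW_OR_MISSING = {"netlib BLAS", "none", "unknown"}


def blas_label_ok(label: str) -> bool:
    """Return True when the label names an optimised BLAS backend."""
    # library_versions()["blas"] returns e.g. 'OpenBLAS +LAPACKE'; also accept
    # the banner form with its leading "blas " prefix.
    name = label.removeprefix("blas ").strip()
    return name not in SLOW_OR_MISSING


if __name__ == "__main__":
    try:
        from vibeqc.banner import library_versions
    except ImportError:
        print("vibeqc is not importable in this environment")
    else:
        label = library_versions()["blas"]
        status = "ok" if blas_label_ok(label) else "consider installing OpenBLAS"
        print(f"BLAS backend: {label} ({status})")
```

Keeping the classification in a pure function makes it easy to reuse in a CI job that should fail when a build silently falls back to the reference BLAS.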

Backend selection per platform

macOS — Apple Accelerate (default)

Accelerate ships with the OS and is highly tuned for Apple Silicon and recent Intel Macs. CMake picks it by setting BLA_VENDOR=Apple before calling find_package(BLAS). Nothing to install, nothing to configure.

otool -L on the compiled _vibeqc_core*.so confirms:

$ otool -L .../vibeqc/_vibeqc_core.cpython-*-darwin.so | grep Accelerate
  /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate
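
The same check can be scripted for otool -L (macOS) or ldd (Linux) output. This is a rough sketch: find_blas_in_linkage and the BLAS_PATTERNS table are hypothetical, and the substrings are assumptions about typical library file names that may need adjusting on your system.

```python
import re

# Substrings that identify a BLAS provider in a shared-library path.
# Illustrative assumption: extend or adjust for your platform's file names.
BLAS_PATTERNS = {
    "Accelerate": r"Accelerate\.framework",
    "OpenBLAS": r"libopenblas",
    "MKL": r"libmkl_",
    "netlib BLAS": r"libblas\.",
}


def find_blas_in_linkage(linker_output: str) -> list[str]:
    """Return the BLAS providers named in `otool -L` or `ldd` output."""
    return [
        name
        for name, pattern in BLAS_PATTERNS.items()
        if re.search(pattern, linker_output)
    ]


sample = "  /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate"
print(find_blas_in_linkage(sample))  # ['Accelerate']
```

To feed it real data, capture the tool's output with subprocess.run([...], capture_output=True, text=True).stdout, pointing the tool at the path of the compiled _vibeqc_core extension.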

EIGEN_USE_LAPACKE is not set on macOS: Accelerate’s LAPACKE bridge has version-specific quirks relative to what Eigen expects, and Accelerate’s BLAS already covers the SCF matrix products that matter. To force OpenBLAS on macOS anyway (mostly for reproducible cross-platform builds), build the vendored OpenBLAS (see below) and pass -DVIBEQC_BLAS_VENDOR=OpenBLAS to CMake.

Linux — system OpenBLAS / MKL / netlib auto-detect

CMake’s FindBLAS scans well-known vendors in order. With no override, the first one found wins. If you have admin access, the cleanest path is to install OpenBLAS via the package manager:

# Arch / Manjaro:
sudo pacman -S blas-openblas      # replaces reference BLAS system-wide

# Debian / Ubuntu:
sudo apt install libopenblas-dev liblapacke-dev

# Fedora / RHEL:
sudo dnf install openblas-devel lapack-devel

Then pip install -e . will pick up the system OpenBLAS automatically — no rebuild flag needed.

If only liblapack / libblas (netlib reference) is installed, CMake will still link, but the banner reads blas netlib BLAS — a perf trap (see “When BLAS linkage does and does not matter” below). scripts/setup_native_deps.sh probes for this case and prints a setup-time recommendation.

Linux — vendored OpenBLAS (no-sudo / HPC / CI)

For HPC users without root, locked-down workstations, or CI boxes that want a byte-identical BLAS across builds, an opt-in vendored OpenBLAS lives at third_party/openblas/install/:

WITH_OPENBLAS=1 ./scripts/setup_native_deps.sh
# ...or directly:
./scripts/build_openblas.sh

Requirements:

  • A Fortran compiler (gfortran) — netlib LAPACK’s Fortran sources are bundled into the resulting libopenblas.so to give EIGEN_USE_LAPACKE-grade dense solvers. Install hints per-distro are printed by build_openblas.sh if it can’t find one.

  • About 5–10 minutes of build time on first run.

Once third_party/openblas/install/ exists, CMake auto-prepends it to CMAKE_PREFIX_PATH and pins BLA_VENDOR=OpenBLAS, so the vendored library wins deterministically over any system BLAS. The next pip install -e . rebuilds the C extension against libopenblas.so (with RPATH baked, so import vibeqc finds it without LD_LIBRARY_PATH fiddling) and the banner switches to blas OpenBLAS +LAPACKE.

Build flags used (see scripts/build_openblas.sh for the full list and rationale):

  • DYNAMIC_ARCH=1 — runtime CPU dispatch. The vendored binary picks the right kernel on Haswell / Skylake / Zen / Apple Silicon / … at load time, so the build host’s SIMD support doesn’t constrain where the binary can run.

  • USE_LAPACK=1 USE_LAPACKE=1 — single libopenblas.so carries BLAS, LAPACK, and the LAPACKE C interface (so both Eigen delegation macros activate).

  • USE_THREAD=1 USE_OPENMP=0 NUM_THREADS=128 NO_AFFINITY=1 — pthreads-internal threading, capped, no CPU pinning. Combined with OPENBLAS_NUM_THREADS=1 (set by default in python/vibeqc/__init__.py) this gives a sane BLAS-serial, OpenMP-parallel posture — see the next section.

Threading: BLAS-serial, app-parallel

vibe-qc parallelises its own inner loops with OpenMP, controlled by OMP_NUM_THREADS. If the linked BLAS also threads internally (OpenBLAS-pthreads, MKL), N OpenMP threads that each call into a BLAS running K threads yield N×K threads contending for cores, which is measurably worse than either layer alone.
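
The N×K contention can be made concrete with a little arithmetic. The sketch below uses hypothetical helper names and a hypothetical 8-core machine; it only counts threads and does not model actual scheduling cost.

```python
def total_threads(omp_threads: int, blas_threads: int) -> int:
    """Worst-case thread count when every OpenMP thread calls a threaded BLAS."""
    return omp_threads * blas_threads


def oversubscription(omp_threads: int, blas_threads: int, cores: int) -> float:
    """Ratio of runnable threads to cores; values above 1.0 mean contention."""
    return total_threads(omp_threads, blas_threads) / cores


# Hypothetical 8-core machine.
print(oversubscription(8, 8, 8))  # 8.0: both layers threaded, 64 threads on 8 cores
print(oversubscription(8, 1, 8))  # 1.0: BLAS serial, OpenMP parallel
```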

python/vibeqc/__init__.py pins BLAS-internal threading to 1 by default before the C extension loads:

def _pin_blas_threads() -> None:
    for key in (
        "OPENBLAS_NUM_THREADS",
        "MKL_NUM_THREADS",
        "VECLIB_MAXIMUM_THREADS",
        "BLIS_NUM_THREADS",
    ):
        os.environ.setdefault(key, "1")

os.environ.setdefault only fills a key that isn’t already set, so an explicit OPENBLAS_NUM_THREADS=4 in your shell wins. If you specifically want BLAS-internal threading (e.g., for a single-threaded outer loop where BLAS parallelism is the only parallelism available), export it before import vibeqc.
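
The precedence rule is easy to verify in isolation. This sketch applies the same setdefault loop to a plain dict standing in for os.environ; pin_threads is a hypothetical name used only for this illustration.

```python
def pin_threads(env: dict) -> dict:
    """Fill in a default of "1" for each BLAS thread variable not already set."""
    for key in (
        "OPENBLAS_NUM_THREADS",
        "MKL_NUM_THREADS",
        "VECLIB_MAXIMUM_THREADS",
        "BLIS_NUM_THREADS",
    ):
        env.setdefault(key, "1")  # leaves an existing value untouched
    return env


# Simulate a shell that exported OPENBLAS_NUM_THREADS=4 before Python started.
user_env = pin_threads({"OPENBLAS_NUM_THREADS": "4"})
print(user_env["OPENBLAS_NUM_THREADS"])  # 4 (the explicit setting wins)
print(user_env["MKL_NUM_THREADS"])       # 1 (the default was filled in)
```

In a real script the equivalent is to assign os.environ["OPENBLAS_NUM_THREADS"] = "4" on a line that runs before import vibeqc.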

The default — BLAS serial, OpenMP parallel via OMP_NUM_THREADS — is the standard posture for mixed OpenMP-and-BLAS codes (ORCA, NWChem, PySCF, Psi4 all do the same).

When BLAS linkage does — and does not — matter

vibe-qc inherits its perf model from Eigen + the vendored numerical libraries (libint, libxc, libecpint) and its own hand-rolled kernels. EIGEN_USE_BLAS+LAPACKE matters where Eigen ops are the bottleneck — and not where they aren’t.

Where it helps: density-fitting SCF (density_fit=True) on larger systems, periodic-SCF Cholesky factorisations, MP2 amplitude updates, Hessian assembly. Anything dominated by dense matrix products on Fock-sized matrices (≳ 500 basis functions) benefits.

Where it’s near-flat: RIJCOSX-heavy HF / hybrid DFT at small systems. The cost there is in the COSX grid loop, AO integral assembly, and DF 3-center kernel evaluation — paths that don’t go through Eigen’s dense kernels, so EIGEN_USE_BLAS doesn’t touch them. Measured on butanethiol (C₄H₁₀S) HF / def2-TZVP / RIJCOSX, the wall time difference between no-BLAS, system netlib BLAS, and vendored OpenBLAS+LAPACKE is within 7% (~298 s across all three).
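
Amdahl's law gives a quick way to see why the backend barely moves such runs. In the sketch below the function name, the fractions, and the kernel speedup are all hypothetical illustration values, not measurements from vibe-qc.

```python
def overall_speedup(eigen_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-run speedup when only the Eigen-bound part gets faster.

    eigen_fraction: share of wall time spent in dense Eigen operations (0..1).
    kernel_speedup: how much faster those operations run with an optimised BLAS.
    """
    if not 0.0 <= eigen_fraction <= 1.0:
        raise ValueError("eigen_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - eigen_fraction) + eigen_fraction / kernel_speedup)


# Hypothetical profiles, assuming a 10x faster dense kernel in both cases.
print(round(overall_speedup(0.05, 10.0), 3))  # 1.047: grid/integral-bound run
print(round(overall_speedup(0.80, 10.0), 3))  # 3.571: dense-algebra-bound run
```

When only a few percent of the wall time passes through Eigen's dense kernels, even a large kernel speedup changes the total by a few percent, which is consistent with the near-flat RIJCOSX timings above.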

If your run is RIJCOSX-bound and you need it faster, the inner-loop work is in cpp/src/cosx.cpp (Q-junction caching, shell-pair Schwarz screening on the K build, A_g block sparsity) rather than the BLAS backend. Track that work in the perf chat on claude/perf-vs-orca.

Disabling BLAS (debugging)

For numerical regressions where you suspect Eigen / BLAS roundoff differences (rare; these should produce ulp-level discrepancies at most, well within SCF tolerances), pass -DVIBEQC_USE_BLAS=OFF at CMake configure time to force Eigen’s generic kernels:

pip install --no-build-isolation -e . \
    -Ccmake.define.VIBEQC_USE_BLAS=OFF

The banner will then read blas none. Run side-by-side against a default build to isolate which kernel the discrepancy lives in.
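
For the side-by-side comparison, a small helper can express the difference between two total energies both in absolute terms and in ulps. This is a sketch only: energy_gap and within_tolerance are hypothetical names, and the 1e-8 default tolerance is an assumed stand-in for whatever SCF convergence threshold your run uses.

```python
import math


def energy_gap(e_default: float, e_noblas: float) -> tuple[float, float]:
    """Return (absolute difference, difference in units of ulp at e_default)."""
    diff = abs(e_default - e_noblas)
    # math.ulp gives the spacing between adjacent doubles near e_default.
    return diff, diff / math.ulp(e_default)


def within_tolerance(e_default: float, e_noblas: float, tol: float = 1e-8) -> bool:
    """True when the two energies agree to within tol (assumed SCF tolerance)."""
    return abs(e_default - e_noblas) <= tol


# Hypothetical energies from a default build and a VIBEQC_USE_BLAS=OFF build.
e_a, e_b = -76.02663273, -76.02663273 + 1e-12
abs_diff, ulps = energy_gap(e_a, e_b)
print(f"abs diff = {abs_diff:.3e}, about {ulps:.0f} ulp")
print(within_tolerance(e_a, e_b))  # True
```

A gap of a handful to a few hundred ulps points to summation-order roundoff; a gap above the SCF tolerance suggests the discrepancy lies elsewhere.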

Citation